Single-sampled Color Compression
================================

Starting with Ivy Bridge, Intel graphics hardware provides a form of color
compression for single-sampled surfaces. In its initial form, this provided an
acceleration of render target clear operations that, in the common case, allows
you to avoid almost all of the bandwidth of a full-surface clear operation. On
Sky Lake, single-sampled color compression was extended to allow for the
compression of color values from actual rendering and not just the initial
clear. From here on, the older Ivy Bridge form of color compression will be
called "fast-clears" and the term "color compression" will be reserved for the
more powerful Sky Lake form.

The documentation for Ivy Bridge through Broadwell overloads the term MCS to
refer both to the *multisample control surface* used for multisample
compression and the control surface used for fast-clears. In ISL, the
:c:enumerator:`isl_aux_usage.ISL_AUX_USAGE_MCS` enum always refers to
multisample color compression while the
:c:enumerator:`isl_aux_usage.ISL_AUX_USAGE_CCS_D` and
:c:enumerator:`isl_aux_usage.ISL_AUX_USAGE_CCS_E` enums always refer to
single-sampled color compression. Throughout this chapter and the rest of the
ISL documentation, we will use the term "color control surface", abbreviated
CCS, to denote the control surface used for both fast-clears and color
compression. While this is still an overloaded term, Ivy Bridge fast-clears
are much closer to Sky Lake color compression than they are to multisample
compression.

CCS data
--------

Fast clears and CCS are possibly the single most poorly documented aspect of
surface layout/setup for Intel graphics hardware (with HiZ coming in a close
second). All the documentation really says is that you can use an MCS buffer on
single-sampled surfaces (we will call it the CCS in this case).
It also provides some documentation on how to program the hardware to perform
clear operations, but that's it. How big is this buffer? What does it contain?
Those questions are left as exercises to the reader. Almost everything we know
about the contents of the CCS is gleaned from reverse-engineering of the
hardware. The best bit of documentation we have ever had comes from the
display section of the Sky Lake PRM Vol 12 section on planes (p. 159):

    The Color Control Surface (CCS) contains the compression status of the
    cache-line pairs. The compression state of the cache-line pair is
    specified by 2 bits in the CCS. Each CCS cache-line represents an area
    on the main surface of 16x16 sets of 128 byte Y-tiled cache-line-pairs.
    CCS is always Y tiled.

While this is technically for color compression and not fast-clears, it
provides a good bit of insight into how color compression and fast-clears
operate. Each cache-line pair in the main surface corresponds to 1 or 2 bits
in the CCS. The primary difference, as far as the current discussion is
concerned, is that fast-clears use only 1 bit per cache-line pair whereas color
compression uses 2 bits.

What is a cache-line pair? Both the X and Y tiling formats are arranged as an
8x8 grid of cache lines. (See the :doc:`chapter on tiling <tiling>` for more
details.) In either case, a cache-line pair is a pair of cache lines whose
starting addresses differ by 512 bytes or 8 cache lines. This results in the
two cache lines being vertically adjacent when the main surface is X-tiled and
horizontally adjacent when the main surface is Y-tiled. For an X-tiled surface
this forms an area of 64B x 2 rows and for a Y-tiled surface this forms an area
of 32B x 4 rows. In either case, it is guaranteed that, regardless of surface
format, each 2x2 subspan coming out of a shader will land entirely within one
cache-line pair.
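
As a rough illustration of the pairing rule described above, the following C
sketch computes the other cache line of a pair and the footprint of a pair in
pixels. The helper names (``pair_partner`` and ``pair_extent``) are
hypothetical and are not part of ISL. The sketch also assumes that pairs are
aligned, so that the two cache lines of a pair differ only in address bit 9
(the 512-byte bit); the text above implies this but does not state it
outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_BYTES  64u   /* one cache line */
#define PAIR_STRIDE_BYTES 512u  /* 8 cache lines between the two halves */

/* Hypothetical helper: start address of the cache line containing offset. */
static uint32_t cache_line_start(uint32_t offset)
{
   return offset & ~(CACHE_LINE_BYTES - 1u);
}

/* Hypothetical helper: start address of the other cache line in the pair.
 * Assumption: pairs are aligned, so the partner differs only in bit 9. */
static uint32_t pair_partner(uint32_t offset)
{
   return cache_line_start(offset) ^ PAIR_STRIDE_BYTES;
}

struct pair_extent {
   uint32_t width_px;  /* width of the pair in main-surface pixels */
   uint32_t height_px; /* height of the pair in main-surface rows */
};

/* Footprint of one cache-line pair: 64B x 2 rows when the main surface is
 * X-tiled, 32B x 4 rows when it is Y-tiled. */
static struct pair_extent pair_extent(bool y_tiled, uint32_t bpp)
{
   assert(bpp >= 8u && bpp % 8u == 0u);

   const uint32_t bytes_per_px = bpp / 8u;
   struct pair_extent e;
   e.width_px = (y_tiled ? 32u : 64u) / bytes_per_px;
   e.height_px = y_tiled ? 4u : 2u;
   return e;
}
```

For a 32bpp surface this gives 16x2 pixels per pair when X-tiled and 8x4
pixels per pair when Y-tiled. A 16x16 block of such Y-tiled pairs, which is
what the PRM says one CCS cache line covers, then spans 128x64 pixels.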

What is the correspondence between bits and cache-line pairs? The best model I
(Faith) know of is to consider the CCS as having a 1-bit color format for
fast-clears and a 2-bit format for color compression and a special tiling
format. The CCS tiling formats operate on a 1 or 2-bit granularity rather than
the byte granularity of most tiling formats.

The following table represents the bit-layouts that yield the CCS tiling format
on different hardware generations. Bits 0-11 correspond to the regular swizzle
of bytes within a 4KB page whereas the negative bits represent the address of
the particular 1 or 2-bit portion of a byte. (Note: The Haswell data was
gathered on a dual-channel system so bit-6 swizzling was enabled. It's unclear
how this affects the CCS layout.)

============ ======== =========== =========== ====================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========
Generation   Tiling   11          10          9                      8           7           6           5           4           3           2           1           0           -1          -2          -3
============ ======== =========== =========== ====================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========
Ivy Bridge   X or Y   :math:`u_6` :math:`u_5` :math:`u_4`            :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`v_1` :math:`v_0` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0`
Haswell      X        :math:`u_6` :math:`u_5` :math:`v_3 \oplus u_1` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`v_1` :math:`v_0` :math:`u_4` :math:`u_3` :math:`u_2` :math:`u_0`
Haswell      Y        :math:`u_6` :math:`u_5` :math:`v_2 \oplus u_1` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`v_1` :math:`v_0` :math:`u_4` :math:`u_3` :math:`u_2` :math:`u_0`
Broadwell    X        :math:`u_6` :math:`u_5` :math:`u_4`            :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`u_3` :math:`v_3` :math:`u_2` :math:`u_1` :math:`u_0` :math:`v_2` :math:`v_1` :math:`v_0`
Broadwell    Y        :math:`u_6` :math:`u_5` :math:`u_4`            :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`u_3` :math:`u_2` :math:`u_1` :math:`v_1` :math:`v_0` :math:`u_0`
Sky Lake     Y        :math:`u_6` :math:`u_5` :math:`u_4`            :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_3` :math:`v_2` :math:`v_1` :math:`u_3` :math:`u_2` :math:`u_1` :math:`v_0` :math:`u_0`
============ ======== =========== =========== ====================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========

CCS surface layout
------------------

Starting with Broadwell, fast-clears and color compression can be used on
mipmapped and array surfaces. When considered from a higher level, the CCS is
laid out like any other surface. The Broadwell and Sky Lake PRMs describe
this as follows:

Broadwell PRM Vol 7, "MCS Buffer for Render Target(s)" (p. 676):

    Mip-mapped and arrayed surfaces are supported with MCS buffer layout with
    these alignments in the RT space: Horizontal Alignment = 256 and Vertical
    Alignment = 128.

Broadwell PRM Vol 2d, "RENDER_SURFACE_STATE" (p. 279):

    For non-multisampled render target's auxiliary surface, MCS, QPitch must be
    computed with Horizontal Alignment = 256 and Surface Vertical Alignment =
    128. These alignments are only for MCS buffer and not for associated render
    target.

Sky Lake PRM Vol 7, "MCS Buffer for Render Target(s)" (p. 632):

    Mip-mapped and arrayed surfaces are supported with MCS buffer layout with
    these alignments in the RT space: Horizontal Alignment = 128 and Vertical
    Alignment = 64.

Sky Lake PRM Vol 2d, "RENDER_SURFACE_STATE" (p. 435):

    For non-multisampled render target's CCS auxiliary surface, QPitch must be
    computed with Horizontal Alignment = 128 and Surface Vertical Alignment
    = 256. These alignments are only for CCS buffer and not for associated
    render target.

Empirical evidence seems to confirm this. On Sky Lake, the vertical alignment
is always one cache line. The horizontal alignment, however, varies by main
surface format: 1 cache line for 32bpp, 2 for 64bpp and 4 cache lines for
128bpp formats. This nicely corresponds to the alignment of 128x64 pixels in
the primary color surface. The second PRM citation about Sky Lake CCS above
gives a vertical alignment of 256 rather than 64. Based on a little
experimentation, this additional alignment appears to only apply to QPitch and
not to the miplevels within a slice.

On Broadwell, each miplevel in the CCS is aligned to a cache-line pair
boundary: horizontal when the primary surface is X-tiled and vertical when
Y-tiled. For a 32bpp format, this works out to an alignment of 256x128 main
surface pixels regardless of X or Y tiling. On Sky Lake, the alignment is
a single cache line which works out to an alignment of 128x64 main surface
pixels.

TODO: More than just 32bpp formats on Broadwell!

Once armed with the above alignment information, we can lay out the CCS surface
itself. The way ISL does CCS layout calculations is by a very careful and
subtle application of its normal surface layout code.

Above, we described the CCS data layout as a mapping of address bits. In
ISL, this is represented by :c:enumerator:`isl_tiling.ISL_TILING_CCS`. Its
logical and physical tile dimensions correspond to the above mapping.

We also have special :c:enum:`isl_format` enums for CCS.
These formats are 1 bit-per-pixel on Ivy Bridge through Broadwell and 2
bits-per-pixel on Sky Lake and above to correspond to the 1 and 2-bit values
represented in the CCS data. They have a block size (similar to a block
compressed format such as BC or ASTC) which says what area (in surface
elements) in the main surface is covered by a single CCS element (1 or 2-bit).
Because this depends on the main surface tiling and format, we have several
different CCS formats.

Once the appropriate :c:enum:`isl_format` has been selected, computing the
size and layout of a CCS surface is as simple as passing the same surface
creation parameters to :c:func:`isl_surf_init_s` as were used to create the
primary surface, only with :c:enumerator:`isl_tiling.ISL_TILING_CCS` and the
correct CCS format. This not only results in a correctly sized surface but
most other ISL helpers for things such as computing offsets into surfaces work
correctly as well.

CCS on Tigerlake and above
--------------------------

Starting with Tigerlake, CCS is no longer done via a surface and, instead, the
term CCS gets overloaded once again (gotta love it!) to now refer to a form of
universal compression which can be applied to almost any surface. Nothing in
this chapter applies to any hardware with a graphics IP version 12 or above.