1TGSI 2==== 3 4TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language 5for describing shaders. Since Gallium is inherently shaderful, shaders are 6an important part of the API. TGSI is the only intermediate representation 7used by all drivers. 8 9Basics 10------ 11 12All TGSI instructions, known as *opcodes*, operate on arbitrary-precision 13floating-point four-component vectors. An opcode may have up to one 14destination register, known as *dst*, and between zero and three source 15registers, called *src0* through *src2*, or simply *src* if there is only 16one. 17 18Some instructions, like :opcode:`I2F`, permit re-interpretation of vector 19components as integers. Other instructions permit using registers as 20two-component vectors with double precision; see :ref:`doubleopcodes`. 21 22When an instruction has a scalar result, the result is usually copied into 23each of the components of *dst*. When this happens, the result is said to be 24*replicated* to *dst*. :opcode:`RCP` is one such instruction. 25 26Source Modifiers 27^^^^^^^^^^^^^^^^ 28 29TGSI supports 32-bit negate and absolute value modifiers on floating-point 30inputs, and 32-bit integer negates on some drivers. The negate applies after 31absolute value if both are present. 32 33The type of an input can be found by ``tgsi_opcode_infer_src_type()``, and 34TGSI_OPCODE_MOV and the second and third operands of TGSI_OPCODE_UCMP (which 35return TGSI_TYPE_UNTYPED) are also considered floats for the purpose of source 36modifiers. 37 38 39Other Modifiers 40^^^^^^^^^^^^^^^ 41 42The saturate modifier clamps 32-bit destination stores to [0.0, 1.0]. 43 44For arithmetic instruction having a precise modifier certain optimizations 45which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be 46optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused 47MAD instruction. 48 49Instruction Set 50--------------- 51 52Core ISA 53^^^^^^^^^^^^^^^^^^^^^^^^^ 54 55These opcodes are guaranteed to be available regardless of the driver being 56used. 57 58.. opcode:: ARL - Address Register Load 59 60 .. math:: 61 62 dst.x = (int) \lfloor src.x\rfloor 63 64 dst.y = (int) \lfloor src.y\rfloor 65 66 dst.z = (int) \lfloor src.z\rfloor 67 68 dst.w = (int) \lfloor src.w\rfloor 69 70 71.. opcode:: MOV - Move 72 73 .. math:: 74 75 dst.x = src.x 76 77 dst.y = src.y 78 79 dst.z = src.z 80 81 dst.w = src.w 82 83 84.. opcode:: LIT - Light Coefficients 85 86 .. math:: 87 88 dst.x = 1 89 90 dst.y = max(src.x, 0) 91 92 dst.z = 93 \left\{ 94 \begin{array}{ c l } 95 max(src.y, 0)^{clamp(src.w, -128, 128)} & \quad \textrm{if } src.x \gt 0 \\ 96 0 & \quad \textrm{otherwise} 97 \end{array} 98 \right. 99 100 dst.w = 1 101 102 103.. opcode:: RCP - Reciprocal 104 105 This instruction replicates its result. 106 107 .. math:: 108 109 dst = \frac{1}{src.x} 110 111 112.. opcode:: RSQ - Reciprocal Square Root 113 114 This instruction replicates its result. The results are undefined for *src* <= 0. 115 116 .. math:: 117 118 dst = \frac{1}{\sqrt{src.x}} 119 120 121.. opcode:: SQRT - Square Root 122 123 This instruction replicates its result. The results are undefined for *src* < 0. 124 125 .. math:: 126 127 dst = {\sqrt{src.x}} 128 129 130.. opcode:: EXP - Approximate Exponential Base 2 131 132 .. math:: 133 134 dst.x &= 2^{\lfloor src.x\rfloor} \\ 135 dst.y &= src.x - \lfloor src.x\rfloor \\ 136 dst.z &= 2^{src.x} \\ 137 dst.w &= 1 138 139 140.. opcode:: LOG - Approximate Logarithm Base 2 141 142 .. math:: 143 144 dst.x &= \lfloor\log_2{|src.x|}\rfloor \\ 145 dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\ 146 dst.z &= \log_2{|src.x|} \\ 147 dst.w &= 1 148 149 150.. opcode:: MUL - Multiply 151 152 .. math:: 153 154 dst.x = src0.x \times src1.x 155 156 dst.y = src0.y \times src1.y 157 158 dst.z = src0.z \times src1.z 159 160 dst.w = src0.w \times src1.w 161 162 163.. opcode:: ADD - Add 164 165 .. math:: 166 167 dst.x = src0.x + src1.x 168 169 dst.y = src0.y + src1.y 170 171 dst.z = src0.z + src1.z 172 173 dst.w = src0.w + src1.w 174 175 176.. opcode:: DP3 - 3-component Dot Product 177 178 This instruction replicates its result. 179 180 .. math:: 181 182 \begin{aligned} 183 dst = & src0.x \times src1.x +\\ 184 & src0.y \times src1.y +\\ 185 & src0.z \times src1.z 186 \end{aligned} 187 188 189.. opcode:: DP4 - 4-component Dot Product 190 191 This instruction replicates its result. 192 193 .. math:: 194 195 \begin{aligned} 196 dst = & src0.x \times src1.x +\\ 197 & src0.y \times src1.y +\\ 198 & src0.z \times src1.z +\\ 199 & src0.w \times src1.w 200 \end{aligned} 201 202 203.. opcode:: DST - Distance Vector 204 205 .. math:: 206 207 dst.x &= 1\\ 208 dst.y &= src0.y \times src1.y\\ 209 dst.z &= src0.z\\ 210 dst.w &= src1.w 211 212 213.. opcode:: MIN - Minimum 214 215 .. math:: 216 217 dst.x = min(src0.x, src1.x) 218 219 dst.y = min(src0.y, src1.y) 220 221 dst.z = min(src0.z, src1.z) 222 223 dst.w = min(src0.w, src1.w) 224 225 226.. opcode:: MAX - Maximum 227 228 .. math:: 229 230 dst.x = max(src0.x, src1.x) 231 232 dst.y = max(src0.y, src1.y) 233 234 dst.z = max(src0.z, src1.z) 235 236 dst.w = max(src0.w, src1.w) 237 238 239.. opcode:: SLT - Set On Less Than 240 241 .. math:: 242 243 dst.x = (src0.x < src1.x) ? 1.0F : 0.0F 244 245 dst.y = (src0.y < src1.y) ? 1.0F : 0.0F 246 247 dst.z = (src0.z < src1.z) ? 1.0F : 0.0F 248 249 dst.w = (src0.w < src1.w) ? 1.0F : 0.0F 250 251 252.. opcode:: SGE - Set On Greater Equal Than 253 254 .. math:: 255 256 dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F 257 258 dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F 259 260 dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F 261 262 dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F 263 264 265.. opcode:: MAD - Multiply And Add 266 267 Perform a * b + c. The implementation is free to decide whether there is an 268 intermediate rounding step or not. 269 270 .. math:: 271 272 dst.x = src0.x \times src1.x + src2.x 273 274 dst.y = src0.y \times src1.y + src2.y 275 276 dst.z = src0.z \times src1.z + src2.z 277 278 dst.w = src0.w \times src1.w + src2.w 279 280 281.. opcode:: LRP - Linear Interpolate 282 283 .. math:: 284 285 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x 286 287 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y 288 289 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z 290 291 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w 292 293 294.. opcode:: FMA - Fused Multiply-Add 295 296 Perform a * b + c with no intermediate rounding step. 297 298 .. math:: 299 300 dst.x = src0.x \times src1.x + src2.x 301 302 dst.y = src0.y \times src1.y + src2.y 303 304 dst.z = src0.z \times src1.z + src2.z 305 306 dst.w = src0.w \times src1.w + src2.w 307 308 309.. opcode:: FRC - Fraction 310 311 .. math:: 312 313 dst.x = src.x - \lfloor src.x\rfloor 314 315 dst.y = src.y - \lfloor src.y\rfloor 316 317 dst.z = src.z - \lfloor src.z\rfloor 318 319 dst.w = src.w - \lfloor src.w\rfloor 320 321 322.. opcode:: FLR - Floor 323 324 .. math:: 325 326 dst.x = \lfloor src.x\rfloor 327 328 dst.y = \lfloor src.y\rfloor 329 330 dst.z = \lfloor src.z\rfloor 331 332 dst.w = \lfloor src.w\rfloor 333 334 335.. opcode:: ROUND - Round 336 337 .. math:: 338 339 dst.x = round(src.x) 340 341 dst.y = round(src.y) 342 343 dst.z = round(src.z) 344 345 dst.w = round(src.w) 346 347 348.. opcode:: EX2 - Exponential Base 2 349 350 This instruction replicates its result. 351 352 .. math:: 353 354 dst = 2^{src.x} 355 356 357.. opcode:: LG2 - Logarithm Base 2 358 359 This instruction replicates its result. 360 361 .. math:: 362 363 dst = \log_2{src.x} 364 365 366.. opcode:: POW - Power 367 368 This instruction replicates its result. 369 370 .. math:: 371 372 dst = src0.x^{src1.x} 373 374 375.. opcode:: LDEXP - Multiply Number by Integral Power of 2 376 377 *src1* is an integer. 378 379 .. math:: 380 381 dst.x = src0.x * 2^{src1.x} 382 383 dst.y = src0.y * 2^{src1.y} 384 385 dst.z = src0.z * 2^{src1.z} 386 387 dst.w = src0.w * 2^{src1.w} 388 389 390.. opcode:: COS - Cosine 391 392 This instruction replicates its result. 393 394 .. math:: 395 396 dst = \cos{src.x} 397 398 399.. opcode:: DDX, DDX_FINE - Derivative Relative To X 400 401 The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is 402 advertised. When it is, the fine version guarantees one derivative per 403 row while DDX is allowed to be the same for the entire 2x2 quad. 404 405 .. math:: 406 407 dst.x = partialx(src.x) 408 409 dst.y = partialx(src.y) 410 411 dst.z = partialx(src.z) 412 413 dst.w = partialx(src.w) 414 415 416.. opcode:: DDY, DDY_FINE - Derivative Relative To Y 417 418 The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is 419 advertised. When it is, the fine version guarantees one derivative per 420 column while DDY is allowed to be the same for the entire 2x2 quad. 421 422 .. math:: 423 424 dst.x = partialy(src.x) 425 426 dst.y = partialy(src.y) 427 428 dst.z = partialy(src.z) 429 430 dst.w = partialy(src.w) 431 432 433.. opcode:: PK2H - Pack Two 16-bit Floats 434 435 This instruction replicates its result. 436 437 .. math:: 438 439 \begin{aligned} 440 dst = & f32\_to\_f16(src.x) | \\ 441 ( & f32\_to\_f16(src.y) \ll 16) 442 \end{aligned} 443 444.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars 445 446 This instruction replicates its result. 447 448 .. math:: 449 450 \begin{aligned} 451 dst = & f32\_to\_unorm16(src.x) | \\ 452 ( & f32\_to\_unorm16(src.y) \ll 16) 453 \end{aligned} 454 455 456.. opcode:: PK4B - Pack Four Signed 8-bit Scalars 457 458 This instruction replicates its result. 459 460 .. math:: 461 462 \begin{aligned} 463 dst = & f32\_to\_snorm8(src.x) | \\ 464 ( & f32\_to\_snorm8(src.y) \ll 8) | \\ 465 ( & f32\_to\_snorm8(src.z) \ll 16) | \\ 466 ( & f32\_to\_snorm8(src.w) \ll 24) 467 \end{aligned} 468 469 470.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars 471 472 This instruction replicates its result. 473 474 .. math:: 475 476 \begin{aligned} 477 dst = & f32\_to\_unorm8(src.x) | \\ 478 ( & f32\_to\_unorm8(src.y) \ll 8) | \\ 479 ( & f32\_to\_unorm8(src.z) \ll 16) | \\ 480 ( & f32\_to\_unorm8(src.w) \ll 24) 481 \end{aligned} 482 483 484.. opcode:: SEQ - Set On Equal 485 486 .. math:: 487 488 dst.x = (src0.x == src1.x) ? 1.0F : 0.0F 489 490 dst.y = (src0.y == src1.y) ? 1.0F : 0.0F 491 492 dst.z = (src0.z == src1.z) ? 1.0F : 0.0F 493 494 dst.w = (src0.w == src1.w) ? 1.0F : 0.0F 495 496 497.. opcode:: SGT - Set On Greater Than 498 499 .. math:: 500 501 dst.x = (src0.x > src1.x) ? 1.0F : 0.0F 502 503 dst.y = (src0.y > src1.y) ? 1.0F : 0.0F 504 505 dst.z = (src0.z > src1.z) ? 1.0F : 0.0F 506 507 dst.w = (src0.w > src1.w) ? 1.0F : 0.0F 508 509 510.. opcode:: SIN - Sine 511 512 This instruction replicates its result. 513 514 .. math:: 515 516 dst = \sin{src.x} 517 518 519.. opcode:: SLE - Set On Less Equal Than 520 521 .. math:: 522 523 dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F 524 525 dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F 526 527 dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F 528 529 dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F 530 531 532.. opcode:: SNE - Set On Not Equal 533 534 .. math:: 535 536 dst.x = (src0.x != src1.x) ? 1.0F : 0.0F 537 538 dst.y = (src0.y != src1.y) ? 1.0F : 0.0F 539 540 dst.z = (src0.z != src1.z) ? 1.0F : 0.0F 541 542 dst.w = (src0.w != src1.w) ? 1.0F : 0.0F 543 544 545.. opcode:: TEX - Texture Lookup 546 547 for array textures *src0.y* contains the slice for 1D, 548 and *src0.z* contain the slice for 2D. 549 550 for shadow textures with no arrays (and not cube map), 551 *src0.z* contains the reference value. 552 553 for shadow textures with arrays, *src0.z* contains 554 the reference value for 1D arrays, and *src0.w* contains 555 the reference value for 2D arrays and cube maps. 556 557 for cube map array shadow textures, the reference value 558 cannot be passed in *src0.w*, and TEX2 must be used instead. 559 560 .. math:: 561 562 coord = src0 563 564 shadow\_ref = src0.z \textrm{ or } src0.w \textrm{ (optional)} 565 566 unit = src1 567 568 dst = texture\_sample(unit, coord, shadow\_ref) 569 570 571.. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only) 572 573 this is the same as TEX, but uses another reg to encode the 574 reference value. 575 576 .. math:: 577 578 coord = src0 579 580 shadow\_ref = src1.x 581 582 unit = src2 583 584 dst = texture\_sample(unit, coord, shadow\_ref) 585 586 587.. opcode:: TXD - Texture Lookup with Derivatives 588 589 .. math:: 590 591 coord = src0 592 593 ddx = src1 594 595 ddy = src2 596 597 unit = src3 598 599 dst = texture\_sample\_deriv(unit, coord, ddx, ddy) 600 601 602.. opcode:: TXP - Projective Texture Lookup 603 604 .. math:: 605 606 coord.x = src0.x / src0.w 607 608 coord.y = src0.y / src0.w 609 610 coord.z = src0.z / src0.w 611 612 coord.w = src0.w 613 614 unit = src1 615 616 dst = texture\_sample(unit, coord) 617 618 619.. opcode:: UP2H - Unpack Two 16-Bit Floats 620 621 .. math:: 622 623 dst.x = f16\_to\_f32(src0.x \& 0xffff) 624 625 dst.y = f16\_to\_f32(src0.x \gg 16) 626 627 dst.z = f16\_to\_f32(src0.x \& 0xffff) 628 629 dst.w = f16\_to\_f32(src0.x \gg 16) 630 631 .. note:: 632 633 Considered for removal. 634 635.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars 636 637 TBD 638 639 .. note:: 640 641 Considered for removal. 642 643.. opcode:: UP4B - Unpack Four Signed 8-Bit Values 644 645 TBD 646 647 .. note:: 648 649 Considered for removal. 650 651.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars 652 653 TBD 654 655 .. note:: 656 657 Considered for removal. 658 659 660.. opcode:: ARR - Address Register Load With Round 661 662 .. math:: 663 664 dst.x = (int) round(src.x) 665 666 dst.y = (int) round(src.y) 667 668 dst.z = (int) round(src.z) 669 670 dst.w = (int) round(src.w) 671 672 673.. opcode:: SSG - Set Sign 674 675 .. math:: 676 677 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0 678 679 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0 680 681 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0 682 683 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0 684 685 686.. opcode:: CMP - Compare 687 688 .. math:: 689 690 dst.x = (src0.x < 0) ? src1.x : src2.x 691 692 dst.y = (src0.y < 0) ? src1.y : src2.y 693 694 dst.z = (src0.z < 0) ? src1.z : src2.z 695 696 dst.w = (src0.w < 0) ? src1.w : src2.w 697 698 699.. opcode:: KILL_IF - Conditional Discard 700 701 Conditional discard. Allowed in fragment shaders only. 702 703 Pseudocode:: 704 705 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0) 706 discard 707 endif 708 709 710.. opcode:: KILL - Discard 711 712 Unconditional discard. Allowed in fragment shaders only. 713 714 715.. opcode:: DEMOTE - Demote Invocation to a Helper 716 717 This demotes the current invocation to a helper, but continues 718 execution (while KILL may or may not terminate the 719 invocation). After this runs, all the usual helper invocation rules 720 apply about discarding buffer and render target writes. This is 721 useful for having accurate derivatives in the other invocations 722 which have not been demoted. 723 724 Allowed in fragment shaders only. 725 726 727.. opcode:: READ_HELPER - Reads Invocation Helper Status 728 729 This is identical to ``TGSI_SEMANTIC_HELPER_INVOCATION``, except 730 this will read the current value, which might change as a result of 731 a ``DEMOTE`` instruction. 732 733 Allowed in fragment shaders only. 734 735 736.. opcode:: TXB - Texture Lookup With Bias 737 738 for cube map array textures and shadow cube maps, the bias value 739 cannot be passed in *src0.w*, and TXB2 must be used instead. 740 741 if the target is a shadow texture, the reference value is always 742 in *src.z* (this prevents shadow 3d and shadow 2d arrays from 743 using this instruction, but this is not needed). 744 745 .. math:: 746 747 coord.x = src0.x 748 749 coord.y = src0.y 750 751 coord.z = src0.z 752 753 coord.w = none 754 755 bias = src0.w 756 757 unit = src1 758 759 dst = texture\_sample(unit, coord, bias) 760 761 762.. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only) 763 764 this is the same as TXB, but uses another reg to encode the 765 LOD bias value for cube map arrays and shadow cube maps. 766 Presumably shadow 2d arrays and shadow 3d targets could use 767 this encoding too, but this is not legal. 768 769 if the target is a shadow cube map array, the reference value is in 770 *src1.y*. 771 772 .. math:: 773 774 coord = src0 775 776 bias = src1.x 777 778 unit = src2 779 780 dst = texture\_sample(unit, coord, bias) 781 782 783.. opcode:: DIV - Divide 784 785 .. math:: 786 787 dst.x = \frac{src0.x}{src1.x} 788 789 dst.y = \frac{src0.y}{src1.y} 790 791 dst.z = \frac{src0.z}{src1.z} 792 793 dst.w = \frac{src0.w}{src1.w} 794 795 796.. opcode:: DP2 - 2-component Dot Product 797 798 This instruction replicates its result. 799 800 .. math:: 801 802 \begin{aligned} 803 dst = & src0.x \times src1.x + \\ 804 & src0.y \times src1.y 805 \end{aligned} 806 807.. opcode:: TEX_LZ - Texture Lookup With LOD = 0 808 809 This is the same as TXL with LOD = 0. Like every texture opcode, it obeys 810 pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod. 811 There is no way to override those two in shaders. 812 813 .. math:: 814 815 coord.x = src0.x 816 817 coord.y = src0.y 818 819 coord.z = src0.z 820 821 coord.w = none 822 823 lod = 0 824 825 unit = src1 826 827 dst = texture\_sample(unit, coord, lod) 828 829 830.. opcode:: TXL - Texture Lookup With explicit LOD 831 832 for cube map array textures, the explicit LOD value 833 cannot be passed in *src0.w*, and TXL2 must be used instead. 834 835 if the target is a shadow texture, the reference value is always 836 in *src.z* (this prevents shadow 3d / 2d array / cube targets from 837 using this instruction, but this is not needed). 838 839 .. math:: 840 841 coord.x = src0.x 842 843 coord.y = src0.y 844 845 coord.z = src0.z 846 847 coord.w = none 848 849 lod = src0.w 850 851 unit = src1 852 853 dst = texture\_sample(unit, coord, lod) 854 855 856.. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only) 857 858 this is the same as TXL, but uses another reg to encode the 859 explicit LOD value. 860 Presumably shadow 3d / 2d array / cube targets could use 861 this encoding too, but this is not legal. 862 863 if the target is a shadow cube map array, the reference value is in 864 *src1.y*. 865 866 .. math:: 867 868 coord = src0 869 870 lod = src1.x 871 872 unit = src2 873 874 dst = texture\_sample(unit, coord, lod) 875 876 877Compute ISA 878^^^^^^^^^^^^^^^^^^^^^^^^ 879 880These opcodes are primarily provided for special-use computational shaders. 881Support for these opcodes indicated by a special pipe capability bit (TBD). 882 883XXX doesn't look like most of the opcodes really belong here. 884 885.. opcode:: CEIL - Ceiling 886 887 .. math:: 888 889 dst.x = \lceil src.x\rceil 890 891 dst.y = \lceil src.y\rceil 892 893 dst.z = \lceil src.z\rceil 894 895 dst.w = \lceil src.w\rceil 896 897 898.. opcode:: TRUNC - Truncate 899 900 .. math:: 901 902 dst.x = trunc(src.x) 903 904 dst.y = trunc(src.y) 905 906 dst.z = trunc(src.z) 907 908 dst.w = trunc(src.w) 909 910 911.. opcode:: MOD - Modulus 912 913 .. math:: 914 915 dst.x = src0.x \bmod src1.x 916 917 dst.y = src0.y \bmod src1.y 918 919 dst.z = src0.z \bmod src1.z 920 921 dst.w = src0.w \bmod src1.w 922 923 924.. opcode:: UARL - Integer Address Register Load 925 926 Moves the contents of the source register, assumed to be an integer, into the 927 destination register, which is assumed to be an address (ADDR) register. 928 929 930.. opcode:: TXF - Texel Fetch 931 932 As per :ext:`GL_NV_gpu_program4`, extract a single texel from a specified 933 texture image or PIPE_BUFFER resource. The source sampler may not be a 934 CUBE or SHADOW. *src0* is a 935 four-component signed integer vector used to identify the single texel 936 accessed. 3 components + level. If the texture is multisampled, then 937 the fourth component indicates the sample, not the mipmap level. 938 Just like texture instructions, an optional 939 offset vector is provided, which is subject to various driver restrictions 940 (regarding range, source of offsets). This instruction ignores the sampler 941 state. 942 943 TXF(uint_vec coord, int_vec offset). 944 945 946.. opcode:: TXQ - Texture Size Query 947 948 As per :ext:`GL_NV_gpu_program4`, retrieve the dimensions of the texture 949 depending on the target. For 1D (width), 2D/RECT/CUBE (width, height), 950 3D (width, height, depth), 1D array (width, layers), 2D array (width, 951 height, layers). Also return the number of accessible levels 952 (last_level - first_level + 1) in W. 953 954 For components which don't return a resource dimension, their value 955 is undefined. 956 957 .. math:: 958 959 lod = src0.x 960 961 dst.x = texture\_width(unit, lod) 962 963 dst.y = texture\_height(unit, lod) 964 965 dst.z = texture\_depth(unit, lod) 966 967 dst.w = texture\_levels(unit) 968 969 970.. opcode:: TXQS - Texture Samples Query 971 972 This retrieves the number of samples in the texture, and stores it 973 into the x component as an unsigned integer. The other components are 974 undefined. If the texture is not multisampled, this function returns 975 (1, undef, undef, undef). 976 977 .. math:: 978 979 dst.x = texture\_samples(unit) 980 981 982.. opcode:: TG4 - Texture Gather 983 984 As per :ext:`GL_ARB_texture_gather`, gathers the four texels to be used in a 985 bi-linear filtering operation and packs them into a single register. 986 Only works with 2D, 2D array, cubemaps, and cubemaps arrays. For 2D 987 textures, only the addressing modes of the sampler and the top level of any 988 mip pyramid are used. Set W to zero. It behaves like the TEX instruction, 989 but a filtered sample is not generated. The four samples that contribute to 990 filtering are placed into XYZW in clockwise order, starting with the (u,v) 991 texture coordinate delta at the following locations (-, +), (+, +), (+, -), 992 (-, -), where the magnitude of the deltas are half a texel. 993 994 PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample 995 depth compares, single component selection, and a non-constant offset. It 996 doesn't allow support for the GL independent offset to get i0,j0. This would 997 require another CAP is HW can do it natively. For now we lower that before 998 TGSI. 999 1000 PIPE_CAP_TGSI_TG4_COMPONENT_IN_SWIZZLE changes the encoding so that component 1001 is stored in the sampler source swizzle x. 1002 1003 (without TGSI_TG4_COMPONENT_IN_SWIZZLE) 1004 1005 .. math:: 1006 1007 coord = src0 1008 1009 component = src1 1010 1011 dst = texture\_gather4 (unit, coord, component) 1012 1013 (with TGSI_TG4_COMPONENT_IN_SWIZZLE) 1014 1015 .. math:: 1016 1017 coord = src0 1018 1019 dst = texture\_gather4 (unit, coord) 1020 1021 \text{component is encoded in sampler swizzle.} 1022 1023 (with SM5 - cube array shadow) 1024 1025 .. math:: 1026 1027 coord = src0 1028 1029 compare = src1 1030 1031 dst = texture\_gather (uint, coord, compare) 1032 1033.. opcode:: LODQ - level of detail query 1034 1035 Compute the LOD information that the texture pipe would use to access the 1036 texture. The Y component contains the computed LOD lambda_prime. The X 1037 component contains the LOD that will be accessed, based on min/max LODs 1038 and mipmap filters. 1039 1040 .. math:: 1041 1042 coord = src0 1043 1044 dst.xy = lodq(uint, coord); 1045 1046.. opcode:: CLOCK - retrieve the current shader time 1047 1048 Invoking this instruction multiple times in the same shader should 1049 cause monotonically increasing values to be returned. The values 1050 are implicitly 64-bit, so if fewer than 64 bits of precision are 1051 available, to provide expected wraparound semantics, the value 1052 should be shifted up so that the most significant bit of the time 1053 is the most significant bit of the 64-bit value. 1054 1055 .. math:: 1056 1057 dst.xy = clock() 1058 1059 1060Integer ISA 1061^^^^^^^^^^^^^^^^^^^^^^^^ 1062These opcodes are used for integer operations. 1063Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?) 1064 1065 1066.. opcode:: I2F - Signed Integer To Float 1067 1068 Rounding is unspecified (round to nearest even suggested). 1069 1070 .. math:: 1071 1072 dst.x = (float) src.x 1073 1074 dst.y = (float) src.y 1075 1076 dst.z = (float) src.z 1077 1078 dst.w = (float) src.w 1079 1080 1081.. opcode:: U2F - Unsigned Integer To Float 1082 1083 Rounding is unspecified (round to nearest even suggested). 1084 1085 .. math:: 1086 1087 dst.x = (float) src.x 1088 1089 dst.y = (float) src.y 1090 1091 dst.z = (float) src.z 1092 1093 dst.w = (float) src.w 1094 1095 1096.. opcode:: F2I - Float to Signed Integer 1097 1098 Rounding is towards zero (truncate). 1099 Values outside signed range (including NaNs) produce undefined results. 1100 1101 .. math:: 1102 1103 dst.x = (int) src.x 1104 1105 dst.y = (int) src.y 1106 1107 dst.z = (int) src.z 1108 1109 dst.w = (int) src.w 1110 1111 1112.. opcode:: F2U - Float to Unsigned Integer 1113 1114 Rounding is towards zero (truncate). 1115 Values outside unsigned range (including NaNs) produce undefined results. 1116 1117 .. math:: 1118 1119 dst.x = (unsigned) src.x 1120 1121 dst.y = (unsigned) src.y 1122 1123 dst.z = (unsigned) src.z 1124 1125 dst.w = (unsigned) src.w 1126 1127 1128.. opcode:: UADD - Integer Add 1129 1130 This instruction works the same for signed and unsigned integers. 1131 The low 32bit of the result is returned. 1132 1133 .. math:: 1134 1135 dst.x = src0.x + src1.x 1136 1137 dst.y = src0.y + src1.y 1138 1139 dst.z = src0.z + src1.z 1140 1141 dst.w = src0.w + src1.w 1142 1143 1144.. opcode:: UMAD - Integer Multiply And Add 1145 1146 This instruction works the same for signed and unsigned integers. 1147 The multiplication returns the low 32bit (as does the result itself). 1148 1149 .. math:: 1150 1151 dst.x = src0.x \times src1.x + src2.x 1152 1153 dst.y = src0.y \times src1.y + src2.y 1154 1155 dst.z = src0.z \times src1.z + src2.z 1156 1157 dst.w = src0.w \times src1.w + src2.w 1158 1159 1160.. opcode:: UMUL - Integer Multiply 1161 1162 This instruction works the same for signed and unsigned integers. 1163 The low 32bit of the result is returned. 1164 1165 .. math:: 1166 1167 dst.x = src0.x \times src1.x 1168 1169 dst.y = src0.y \times src1.y 1170 1171 dst.z = src0.z \times src1.z 1172 1173 dst.w = src0.w \times src1.w 1174 1175 1176.. opcode:: IMUL_HI - Signed Integer Multiply High Bits 1177 1178 The high 32bits of the multiplication of 2 signed integers are returned. 1179 1180 .. math:: 1181 1182 dst.x = (src0.x \times src1.x) \gg 32 1183 1184 dst.y = (src0.y \times src1.y) \gg 32 1185 1186 dst.z = (src0.z \times src1.z) \gg 32 1187 1188 dst.w = (src0.w \times src1.w) \gg 32 1189 1190 1191.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits 1192 1193 The high 32bits of the multiplication of 2 unsigned integers are returned. 1194 1195 .. math:: 1196 1197 dst.x = (src0.x \times src1.x) \gg 32 1198 1199 dst.y = (src0.y \times src1.y) \gg 32 1200 1201 dst.z = (src0.z \times src1.z) \gg 32 1202 1203 dst.w = (src0.w \times src1.w) \gg 32 1204 1205 1206.. opcode:: IDIV - Signed Integer Division 1207 1208 TBD: behavior for division by zero. 1209 1210 .. math:: 1211 1212 dst.x = \frac{src0.x}{src1.x} 1213 1214 dst.y = \frac{src0.y}{src1.y} 1215 1216 dst.z = \frac{src0.z}{src1.z} 1217 1218 dst.w = \frac{src0.w}{src1.w} 1219 1220 1221.. opcode:: UDIV - Unsigned Integer Division 1222 1223 For division by zero, ``0xffffffff`` is returned. 1224 1225 .. math:: 1226 1227 dst.x = \frac{src0.x}{src1.x} 1228 1229 dst.y = \frac{src0.y}{src1.y} 1230 1231 dst.z = \frac{src0.z}{src1.z} 1232 1233 dst.w = \frac{src0.w}{src1.w} 1234 1235 1236.. opcode:: UMOD - Unsigned Integer Remainder 1237 1238 If *src1* is zero, ``0xffffffff`` is returned. 1239 1240 .. math:: 1241 1242 dst.x = src0.x \bmod src1.x 1243 1244 dst.y = src0.y \bmod src1.y 1245 1246 dst.z = src0.z \bmod src1.z 1247 1248 dst.w = src0.w \bmod src1.w 1249 1250 1251.. opcode:: NOT - Bitwise Not 1252 1253 .. math:: 1254 1255 dst.x = \sim src.x 1256 1257 dst.y = \sim src.y 1258 1259 dst.z = \sim src.z 1260 1261 dst.w = \sim src.w 1262 1263 1264.. opcode:: AND - Bitwise And 1265 1266 .. math:: 1267 1268 dst.x = src0.x \& src1.x 1269 1270 dst.y = src0.y \& src1.y 1271 1272 dst.z = src0.z \& src1.z 1273 1274 dst.w = src0.w \& src1.w 1275 1276 1277.. opcode:: OR - Bitwise Or 1278 1279 .. math:: 1280 1281 dst.x = src0.x | src1.x 1282 1283 dst.y = src0.y | src1.y 1284 1285 dst.z = src0.z | src1.z 1286 1287 dst.w = src0.w | src1.w 1288 1289 1290.. opcode:: XOR - Bitwise Xor 1291 1292 .. math:: 1293 1294 dst.x = src0.x \oplus src1.x 1295 1296 dst.y = src0.y \oplus src1.y 1297 1298 dst.z = src0.z \oplus src1.z 1299 1300 dst.w = src0.w \oplus src1.w 1301 1302 1303.. opcode:: IMAX - Maximum of Signed Integers 1304 1305 .. math:: 1306 1307 dst.x = max(src0.x, src1.x) 1308 1309 dst.y = max(src0.y, src1.y) 1310 1311 dst.z = max(src0.z, src1.z) 1312 1313 dst.w = max(src0.w, src1.w) 1314 1315 1316.. opcode:: UMAX - Maximum of Unsigned Integers 1317 1318 .. math:: 1319 1320 dst.x = max(src0.x, src1.x) 1321 1322 dst.y = max(src0.y, src1.y) 1323 1324 dst.z = max(src0.z, src1.z) 1325 1326 dst.w = max(src0.w, src1.w) 1327 1328 1329.. opcode:: IMIN - Minimum of Signed Integers 1330 1331 .. math:: 1332 1333 dst.x = min(src0.x, src1.x) 1334 1335 dst.y = min(src0.y, src1.y) 1336 1337 dst.z = min(src0.z, src1.z) 1338 1339 dst.w = min(src0.w, src1.w) 1340 1341 1342.. opcode:: UMIN - Minimum of Unsigned Integers 1343 1344 .. math:: 1345 1346 dst.x = min(src0.x, src1.x) 1347 1348 dst.y = min(src0.y, src1.y) 1349 1350 dst.z = min(src0.z, src1.z) 1351 1352 dst.w = min(src0.w, src1.w) 1353 1354 1355.. opcode:: SHL - Shift Left 1356 1357 The shift count is masked with ``0x1f`` before the shift is applied. 1358 1359 .. math:: 1360 1361 dst.x = src0.x \ll (0x1f \& src1.x) 1362 1363 dst.y = src0.y \ll (0x1f \& src1.y) 1364 1365 dst.z = src0.z \ll (0x1f \& src1.z) 1366 1367 dst.w = src0.w \ll (0x1f \& src1.w) 1368 1369 1370.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer) 1371 1372 The shift count is masked with ``0x1f`` before the shift is applied. 1373 1374 .. math:: 1375 1376 dst.x = src0.x \gg (0x1f \& src1.x) 1377 1378 dst.y = src0.y \gg (0x1f \& src1.y) 1379 1380 dst.z = src0.z \gg (0x1f \& src1.z) 1381 1382 dst.w = src0.w \gg (0x1f \& src1.w) 1383 1384 1385.. opcode:: USHR - Logical Shift Right 1386 1387 The shift count is masked with ``0x1f`` before the shift is applied. 1388 1389 .. math:: 1390 1391 dst.x = src0.x \gg (unsigned) (0x1f \& src1.x) 1392 1393 dst.y = src0.y \gg (unsigned) (0x1f \& src1.y) 1394 1395 dst.z = src0.z \gg (unsigned) (0x1f \& src1.z) 1396 1397 dst.w = src0.w \gg (unsigned) (0x1f \& src1.w) 1398 1399 1400.. opcode:: UCMP - Integer Conditional Move 1401 1402 .. math:: 1403 1404 dst.x = src0.x ? src1.x : src2.x 1405 1406 dst.y = src0.y ? src1.y : src2.y 1407 1408 dst.z = src0.z ? src1.z : src2.z 1409 1410 dst.w = src0.w ? src1.w : src2.w 1411 1412 1413 1414.. opcode:: ISSG - Integer Set Sign 1415 1416 .. math:: 1417 1418 dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0 1419 1420 dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0 1421 1422 dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0 1423 1424 dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0 1425 1426 1427 1428.. opcode:: FSLT - Float Set On Less Than (ordered) 1429 1430 Same comparison as SLT but returns integer instead of 1.0/0.0 float 1431 1432 .. math:: 1433 1434 dst.x = (src0.x < src1.x) ? \sim 0 : 0 1435 1436 dst.y = (src0.y < src1.y) ? \sim 0 : 0 1437 1438 dst.z = (src0.z < src1.z) ? \sim 0 : 0 1439 1440 dst.w = (src0.w < src1.w) ? \sim 0 : 0 1441 1442 1443.. opcode:: ISLT - Signed Integer Set On Less Than 1444 1445 .. math:: 1446 1447 dst.x = (src0.x < src1.x) ? \sim 0 : 0 1448 1449 dst.y = (src0.y < src1.y) ? \sim 0 : 0 1450 1451 dst.z = (src0.z < src1.z) ? \sim 0 : 0 1452 1453 dst.w = (src0.w < src1.w) ? \sim 0 : 0 1454 1455 1456.. opcode:: USLT - Unsigned Integer Set On Less Than 1457 1458 .. math:: 1459 1460 dst.x = (src0.x < src1.x) ? \sim 0 : 0 1461 1462 dst.y = (src0.y < src1.y) ? \sim 0 : 0 1463 1464 dst.z = (src0.z < src1.z) ? \sim 0 : 0 1465 1466 dst.w = (src0.w < src1.w) ? \sim 0 : 0 1467 1468 1469.. opcode:: FSGE - Float Set On Greater Equal Than (ordered) 1470 1471 Same comparison as SGE but returns integer instead of 1.0/0.0 float 1472 1473 .. math:: 1474 1475 dst.x = (src0.x >= src1.x) ? \sim 0 : 0 1476 1477 dst.y = (src0.y >= src1.y) ? \sim 0 : 0 1478 1479 dst.z = (src0.z >= src1.z) ? \sim 0 : 0 1480 1481 dst.w = (src0.w >= src1.w) ? \sim 0 : 0 1482 1483 1484.. opcode:: ISGE - Signed Integer Set On Greater Equal Than 1485 1486 .. math:: 1487 1488 dst.x = (src0.x >= src1.x) ? \sim 0 : 0 1489 1490 dst.y = (src0.y >= src1.y) ? \sim 0 : 0 1491 1492 dst.z = (src0.z >= src1.z) ? \sim 0 : 0 1493 1494 dst.w = (src0.w >= src1.w) ? \sim 0 : 0 1495 1496 1497.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than 1498 1499 .. math:: 1500 1501 dst.x = (src0.x >= src1.x) ? \sim 0 : 0 1502 1503 dst.y = (src0.y >= src1.y) ? \sim 0 : 0 1504 1505 dst.z = (src0.z >= src1.z) ? \sim 0 : 0 1506 1507 dst.w = (src0.w >= src1.w) ? \sim 0 : 0 1508 1509 1510.. opcode:: FSEQ - Float Set On Equal (ordered) 1511 1512 Same comparison as SEQ but returns integer instead of 1.0/0.0 float 1513 1514 .. math:: 1515 1516 dst.x = (src0.x == src1.x) ? \sim 0 : 0 1517 1518 dst.y = (src0.y == src1.y) ? \sim 0 : 0 1519 1520 dst.z = (src0.z == src1.z) ? \sim 0 : 0 1521 1522 dst.w = (src0.w == src1.w) ? \sim 0 : 0 1523 1524 1525.. opcode:: USEQ - Integer Set On Equal 1526 1527 .. math:: 1528 1529 dst.x = (src0.x == src1.x) ? \sim 0 : 0 1530 1531 dst.y = (src0.y == src1.y) ? \sim 0 : 0 1532 1533 dst.z = (src0.z == src1.z) ? \sim 0 : 0 1534 1535 dst.w = (src0.w == src1.w) ? \sim 0 : 0 1536 1537 1538.. opcode:: FSNE - Float Set On Not Equal (unordered) 1539 1540 Same comparison as SNE but returns integer instead of 1.0/0.0 float 1541 1542 .. math:: 1543 1544 dst.x = (src0.x != src1.x) ? \sim 0 : 0 1545 1546 dst.y = (src0.y != src1.y) ? \sim 0 : 0 1547 1548 dst.z = (src0.z != src1.z) ? \sim 0 : 0 1549 1550 dst.w = (src0.w != src1.w) ? \sim 0 : 0 1551 1552 1553.. opcode:: USNE - Integer Set On Not Equal 1554 1555 .. math:: 1556 1557 dst.x = (src0.x != src1.x) ? \sim 0 : 0 1558 1559 dst.y = (src0.y != src1.y) ? \sim 0 : 0 1560 1561 dst.z = (src0.z != src1.z) ? \sim 0 : 0 1562 1563 dst.w = (src0.w != src1.w) ? \sim 0 : 0 1564 1565 1566.. opcode:: INEG - Integer Negate 1567 1568 Two's complement. 1569 1570 .. math:: 1571 1572 dst.x = -src.x 1573 1574 dst.y = -src.y 1575 1576 dst.z = -src.z 1577 1578 dst.w = -src.w 1579 1580 1581.. opcode:: IABS - Integer Absolute Value 1582 1583 .. math:: 1584 1585 dst.x = |src.x| 1586 1587 dst.y = |src.y| 1588 1589 dst.z = |src.z| 1590 1591 dst.w = |src.w| 1592 1593Bitwise ISA 1594^^^^^^^^^^^ 1595These opcodes are used for bit-level manipulation of integers. 1596 1597.. opcode:: IBFE - Signed Bitfield Extract 1598 1599 Like GLSL bitfieldExtract. Extracts a set of bits from the input, and 1600 sign-extends them if the high bit of the extracted window is set. 1601 1602 Pseudocode:: 1603 1604 def ibfe(value, offset, bits): 1605 if offset < 0 or bits < 0 or offset + bits > 32: 1606 return undefined 1607 if bits == 0: return 0 1608 # Note: >> sign-extends 1609 return (value << (32 - offset - bits)) >> (32 - bits) 1610 1611.. opcode:: UBFE - Unsigned Bitfield Extract 1612 1613 Like GLSL bitfieldExtract. Extracts a set of bits from the input, without 1614 any sign-extension. 1615 1616 Pseudocode:: 1617 1618 def ubfe(value, offset, bits): 1619 if offset < 0 or bits < 0 or offset + bits > 32: 1620 return undefined 1621 if bits == 0: return 0 1622 # Note: >> does not sign-extend 1623 return (value << (32 - offset - bits)) >> (32 - bits) 1624 1625.. opcode:: BFI - Bitfield Insert 1626 1627 Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits 1628 of 'insert'. 1629 1630 Pseudocode:: 1631 1632 def bfi(base, insert, offset, bits): 1633 if offset < 0 or bits < 0 or offset + bits > 32: 1634 return undefined 1635 # << defined such that mask == ~0 when bits == 32, offset == 0 1636 mask = ((1 << bits) - 1) << offset 1637 return ((insert << offset) & mask) | (base & ~mask) 1638 1639.. opcode:: BREV - Bitfield Reverse 1640 1641 See SM5 instruction BFREV. Reverses the bits of the argument. 1642 1643.. opcode:: POPC - Population Count 1644 1645 See SM5 instruction COUNTBITS. Counts the number of set bits in the argument. 1646 1647.. opcode:: LSB - Index of lowest set bit 1648 1649 See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set 1650 bit of the argument. Returns -1 if none are set. 1651 1652.. opcode:: IMSB - Index of highest non-sign bit 1653 1654 See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest 1655 non-sign bit of the argument (i.e. highest 0 bit for negative numbers, 1656 highest 1 bit for positive numbers). Returns -1 if all bits are the same 1657 (i.e. for inputs 0 and -1). 1658 1659.. opcode:: UMSB - Index of highest set bit 1660 1661 See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest 1662 set bit of the argument. Returns -1 if none are set. 1663 1664Geometry ISA 1665^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1666 1667These opcodes are only supported in geometry shaders; they have no meaning 1668in any other type of shader. 1669 1670.. opcode:: EMIT - Emit 1671 1672 Generate a new vertex for the current primitive into the specified vertex 1673 stream using the values in the output registers. 1674 1675 1676.. opcode:: ENDPRIM - End Primitive 1677 1678 Complete the current primitive in the specified vertex stream (consisting of 1679 the emitted vertices), and start a new one. 1680 1681 1682GLSL ISA 1683^^^^^^^^^^ 1684 1685These opcodes are part of :term:`GLSL`'s opcode set. Support for these 1686opcodes is determined by a special capability bit, ``GLSL``. 1687Some require glsl version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH). 1688 1689.. opcode:: CAL - Subroutine Call 1690 1691 Pseudocode:: 1692 1693 push(pc) 1694 pc = target 1695 1696 1697.. opcode:: RET - Subroutine Call Return 1698 1699 Pseudocode:: 1700 1701 pc = pop() 1702 1703 1704.. opcode:: CONT - Continue 1705 1706 Unconditionally moves the point of execution to the instruction after the 1707 last BGNLOOP. The instruction must appear within a BGNLOOP/ENDLOOP. 1708 1709.. note:: 1710 1711 Support for CONT is determined by a special capability bit, 1712 ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information. 1713 1714 1715.. opcode:: BGNLOOP - Begin a Loop 1716 1717 Start a loop. Must have a matching ENDLOOP. 1718 1719 1720.. opcode:: BGNSUB - Begin Subroutine 1721 1722 Starts definition of a subroutine. Must have a matching ENDSUB. 1723 1724 1725.. opcode:: ENDLOOP - End a Loop 1726 1727 End a loop started with BGNLOOP. 1728 1729 1730.. opcode:: ENDSUB - End Subroutine 1731 1732 Ends definition of a subroutine. 1733 1734 1735.. opcode:: NOP - No Operation 1736 1737 Do nothing. 1738 1739 1740.. opcode:: BRK - Break 1741 1742 Unconditionally moves the point of execution to the instruction after the 1743 next ENDLOOP or ENDSWITCH. The instruction must appear within a 1744 BGNLOOP/ENDLOOP or SWITCH/ENDSWITCH. 1745 1746 1747.. opcode:: IF - Float If 1748 1749 Start an IF ... ELSE .. ENDIF block. Condition evaluates to true if 1750 1751 *src0.x* != 0.0 1752 1753 where *src0.x* is interpreted as a floating point register. 1754 1755 1756.. opcode:: UIF - Bitwise If 1757 1758 Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if 1759 1760 *src0.x* != 0 1761 1762 where *src0.x* is interpreted as an integer register. 1763 1764 1765.. opcode:: ELSE - Else 1766 1767 Starts an else block, after an IF or UIF statement. 1768 1769 1770.. opcode:: ENDIF - End If 1771 1772 Ends an IF or UIF block. 1773 1774 1775.. opcode:: SWITCH - Switch 1776 1777 Starts a C-style switch expression. The switch consists of one or multiple 1778 CASE statements, and at most one DEFAULT statement. Execution of a statement 1779 ends when a BRK is hit, but just like in C falling through to other cases 1780 without a break is allowed. Similarly, DEFAULT label is allowed anywhere not 1781 just as last statement, and fallthrough is allowed into/from it. 1782 CASE *src* arguments are evaluated at bit level against the SWITCH *src* argument. 1783 1784 Example:: 1785 1786 SWITCH src[0].x 1787 CASE src[0].x 1788 (some instructions here) 1789 (optional BRK here) 1790 DEFAULT 1791 (some instructions here) 1792 (optional BRK here) 1793 CASE src[0].x 1794 (some instructions here) 1795 (optional BRK here) 1796 ENDSWITCH 1797 1798 1799.. opcode:: CASE - Switch case 1800 1801 This represents a switch case label. The *src* arg must be an integer immediate. 1802 1803 1804.. opcode:: DEFAULT - Switch default 1805 1806 This represents the default case in the switch, which is taken if no other 1807 case matches. 1808 1809 1810.. opcode:: ENDSWITCH - End of switch 1811 1812 Ends a switch expression. 1813 1814 1815Interpolation ISA 1816^^^^^^^^^^^^^^^^^ 1817 1818The interpolation instructions allow an input to be interpolated in a 1819different way than its declaration. This corresponds to the GLSL 4.00 1820interpolateAt* functions. The first argument of each of these must come from 1821``TGSI_FILE_INPUT``. 1822 1823.. opcode:: INTERP_CENTROID - Interpolate at the centroid 1824 1825 Interpolates the varying specified by *src0* at the centroid 1826 1827.. opcode:: INTERP_SAMPLE - Interpolate at the specified sample 1828 1829 Interpolates the varying specified by *src0* at the sample id 1830 specified by *src1.x* (interpreted as an integer) 1831 1832.. opcode:: INTERP_OFFSET - Interpolate at the specified offset 1833 1834 Interpolates the varying specified by *src0* at the offset *src1.xy* 1835 from the pixel center (interpreted as floats) 1836 1837 1838.. _doubleopcodes: 1839 1840Double ISA 1841^^^^^^^^^^^^^^^ 1842 1843The double-precision opcodes reinterpret four-component vectors into 1844two-component vectors with doubled precision in each component. 1845 1846.. opcode:: DABS - Absolute 1847 1848 .. math:: 1849 1850 dst.xy = |src0.xy| 1851 1852 dst.zw = |src0.zw| 1853 1854.. opcode:: DADD - Add 1855 1856 .. math:: 1857 1858 dst.xy = src0.xy + src1.xy 1859 1860 dst.zw = src0.zw + src1.zw 1861 1862.. opcode:: DSEQ - Set on Equal 1863 1864 .. math:: 1865 1866 dst.x = src0.xy == src1.xy ? \sim 0 : 0 1867 1868 dst.z = src0.zw == src1.zw ? \sim 0 : 0 1869 1870.. opcode:: DSNE - Set on Not Equal 1871 1872 .. math:: 1873 1874 dst.x = src0.xy != src1.xy ? \sim 0 : 0 1875 1876 dst.z = src0.zw != src1.zw ? \sim 0 : 0 1877 1878.. opcode:: DSLT - Set on Less than 1879 1880 .. math:: 1881 1882 dst.x = src0.xy < src1.xy ? \sim 0 : 0 1883 1884 dst.z = src0.zw < src1.zw ? \sim 0 : 0 1885 1886.. opcode:: DSGE - Set on Greater equal 1887 1888 .. math:: 1889 1890 dst.x = src0.xy >= src1.xy ? \sim 0 : 0 1891 1892 dst.z = src0.zw >= src1.zw ? \sim 0 : 0 1893 1894.. opcode:: DFRAC - Fraction 1895 1896 .. math:: 1897 1898 dst.xy = src.xy - \lfloor src.xy\rfloor 1899 1900 dst.zw = src.zw - \lfloor src.zw\rfloor 1901 1902.. opcode:: DTRUNC - Truncate 1903 1904 .. math:: 1905 1906 dst.xy = trunc(src.xy) 1907 1908 dst.zw = trunc(src.zw) 1909 1910.. opcode:: DCEIL - Ceiling 1911 1912 .. math:: 1913 1914 dst.xy = \lceil src.xy\rceil 1915 1916 dst.zw = \lceil src.zw\rceil 1917 1918.. opcode:: DFLR - Floor 1919 1920 .. math:: 1921 1922 dst.xy = \lfloor src.xy\rfloor 1923 1924 dst.zw = \lfloor src.zw\rfloor 1925 1926.. opcode:: DROUND - Fraction 1927 1928 .. math:: 1929 1930 dst.xy = round(src.xy) 1931 1932 dst.zw = round(src.zw) 1933 1934.. opcode:: DSSG - Set Sign 1935 1936 .. math:: 1937 1938 dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0 1939 1940 dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0 1941 1942.. opcode:: DLDEXP - Multiply Number by Integral Power of 2 1943 1944 This opcode is the inverse of frexp. The second 1945 source is an integer. 1946 1947 .. math:: 1948 1949 dst.xy = src0.xy \times 2^{src1.x} 1950 1951 dst.zw = src0.zw \times 2^{src1.z} 1952 1953.. opcode:: DMIN - Minimum 1954 1955 .. math:: 1956 1957 dst.xy = min(src0.xy, src1.xy) 1958 1959 dst.zw = min(src0.zw, src1.zw) 1960 1961.. opcode:: DMAX - Maximum 1962 1963 .. math:: 1964 1965 dst.xy = max(src0.xy, src1.xy) 1966 1967 dst.zw = max(src0.zw, src1.zw) 1968 1969.. opcode:: DMUL - Multiply 1970 1971 .. math:: 1972 1973 dst.xy = src0.xy \times src1.xy 1974 1975 dst.zw = src0.zw \times src1.zw 1976 1977 1978.. opcode:: DMAD - Multiply And Add 1979 1980 .. math:: 1981 1982 dst.xy = src0.xy \times src1.xy + src2.xy 1983 1984 dst.zw = src0.zw \times src1.zw + src2.zw 1985 1986 1987.. opcode:: DFMA - Fused Multiply-Add 1988 1989 Perform a * b + c with no intermediate rounding step. 1990 1991 .. math:: 1992 1993 dst.xy = src0.xy \times src1.xy + src2.xy 1994 1995 dst.zw = src0.zw \times src1.zw + src2.zw 1996 1997 1998.. opcode:: DDIV - Divide 1999 2000 .. math:: 2001 2002 dst.xy = \frac{src0.xy}{src1.xy} 2003 2004 dst.zw = \frac{src0.zw}{src1.zw} 2005 2006 2007.. opcode:: DRCP - Reciprocal 2008 2009 .. math:: 2010 2011 dst.xy = \frac{1}{src.xy} 2012 2013 dst.zw = \frac{1}{src.zw} 2014 2015.. opcode:: DSQRT - Square Root 2016 2017 .. math:: 2018 2019 dst.xy = \sqrt{src.xy} 2020 2021 dst.zw = \sqrt{src.zw} 2022 2023.. opcode:: DRSQ - Reciprocal Square Root 2024 2025 .. math:: 2026 2027 dst.xy = \frac{1}{\sqrt{src.xy}} 2028 2029 dst.zw = \frac{1}{\sqrt{src.zw}} 2030 2031.. opcode:: F2D - Float to Double 2032 2033 .. math:: 2034 2035 dst.xy = double(src0.x) 2036 2037 dst.zw = double(src0.y) 2038 2039.. opcode:: D2F - Double to Float 2040 2041 .. math:: 2042 2043 dst.x = float(src0.xy) 2044 2045 dst.y = float(src0.zw) 2046 2047.. opcode:: I2D - Int to Double 2048 2049 .. math:: 2050 2051 dst.xy = double(src0.x) 2052 2053 dst.zw = double(src0.y) 2054 2055.. opcode:: D2I - Double to Int 2056 2057 .. math:: 2058 2059 dst.x = int(src0.xy) 2060 2061 dst.y = int(src0.zw) 2062 2063.. opcode:: U2D - Unsigned Int to Double 2064 2065 .. math:: 2066 2067 dst.xy = double(src0.x) 2068 2069 dst.zw = double(src0.y) 2070 2071.. opcode:: D2U - Double to Unsigned Int 2072 2073 .. math:: 2074 2075 dst.x = unsigned(src0.xy) 2076 2077 dst.y = unsigned(src0.zw) 2078 207964-bit Integer ISA 2080^^^^^^^^^^^^^^^^^^ 2081 2082The 64-bit integer opcodes reinterpret four-component vectors into 2083two-component vectors with 64-bits in each component. 2084 2085.. opcode:: I64ABS - 64-bit Integer Absolute Value 2086 2087 .. math:: 2088 2089 dst.xy = |src0.xy| 2090 2091 dst.zw = |src0.zw| 2092 2093.. opcode:: I64NEG - 64-bit Integer Negate 2094 2095 Two's complement. 2096 2097 .. math:: 2098 2099 dst.xy = -src.xy 2100 2101 dst.zw = -src.zw 2102 2103.. opcode:: I64SSG - 64-bit Integer Set Sign 2104 2105 .. math:: 2106 2107 dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0 2108 2109 dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0 2110 2111.. opcode:: U64ADD - 64-bit Integer Add 2112 2113 .. math:: 2114 2115 dst.xy = src0.xy + src1.xy 2116 2117 dst.zw = src0.zw + src1.zw 2118 2119.. opcode:: U64MUL - 64-bit Integer Multiply 2120 2121 .. math:: 2122 2123 dst.xy = src0.xy * src1.xy 2124 2125 dst.zw = src0.zw * src1.zw 2126 2127.. opcode:: U64SEQ - 64-bit Integer Set on Equal 2128 2129 .. math:: 2130 2131 dst.x = src0.xy == src1.xy ? \sim 0 : 0 2132 2133 dst.z = src0.zw == src1.zw ? \sim 0 : 0 2134 2135.. opcode:: U64SNE - 64-bit Integer Set on Not Equal 2136 2137 .. math:: 2138 2139 dst.x = src0.xy != src1.xy ? \sim 0 : 0 2140 2141 dst.z = src0.zw != src1.zw ? \sim 0 : 0 2142 2143.. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than 2144 2145 .. math:: 2146 2147 dst.x = src0.xy < src1.xy ? \sim 0 : 0 2148 2149 dst.z = src0.zw < src1.zw ? \sim 0 : 0 2150 2151.. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal 2152 2153 .. math:: 2154 2155 dst.x = src0.xy >= src1.xy ? \sim 0 : 0 2156 2157 dst.z = src0.zw >= src1.zw ? \sim 0 : 0 2158 2159.. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than 2160 2161 .. math:: 2162 2163 dst.x = src0.xy < src1.xy ? \sim 0 : 0 2164 2165 dst.z = src0.zw < src1.zw ? \sim 0 : 0 2166 2167.. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal 2168 2169 .. math:: 2170 2171 dst.x = src0.xy >= src1.xy ? \sim 0 : 0 2172 2173 dst.z = src0.zw >= src1.zw ? \sim 0 : 0 2174 2175.. opcode:: I64MIN - Minimum of 64-bit Signed Integers 2176 2177 .. math:: 2178 2179 dst.xy = min(src0.xy, src1.xy) 2180 2181 dst.zw = min(src0.zw, src1.zw) 2182 2183.. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers 2184 2185 .. math:: 2186 2187 dst.xy = min(src0.xy, src1.xy) 2188 2189 dst.zw = min(src0.zw, src1.zw) 2190 2191.. opcode:: I64MAX - Maximum of 64-bit Signed Integers 2192 2193 .. math:: 2194 2195 dst.xy = max(src0.xy, src1.xy) 2196 2197 dst.zw = max(src0.zw, src1.zw) 2198 2199.. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers 2200 2201 .. math:: 2202 2203 dst.xy = max(src0.xy, src1.xy) 2204 2205 dst.zw = max(src0.zw, src1.zw) 2206 2207.. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer 2208 2209 The shift count is masked with ``0x3f`` before the shift is applied. 2210 2211 .. math:: 2212 2213 dst.xy = src0.xy \ll (0x3f \& src1.x) 2214 2215 dst.zw = src0.zw \ll (0x3f \& src1.y) 2216 2217.. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer) 2218 2219 The shift count is masked with ``0x3f`` before the shift is applied. 2220 2221 .. math:: 2222 2223 dst.xy = src0.xy \gg (0x3f \& src1.x) 2224 2225 dst.zw = src0.zw \gg (0x3f \& src1.y) 2226 2227.. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer) 2228 2229 The shift count is masked with ``0x3f`` before the shift is applied. 2230 2231 .. math:: 2232 2233 dst.xy = src0.xy \gg (unsigned) (0x3f \& src1.x) 2234 2235 dst.zw = src0.zw \gg (unsigned) (0x3f \& src1.y) 2236 2237.. opcode:: I64DIV - 64-bit Signed Integer Division 2238 2239 .. math:: 2240 2241 dst.xy = \frac{src0.xy}{src1.xy} 2242 2243 dst.zw = \frac{src0.zw}{src1.zw} 2244 2245.. opcode:: U64DIV - 64-bit Unsigned Integer Division 2246 2247 .. math:: 2248 2249 dst.xy = \frac{src0.xy}{src1.xy} 2250 2251 dst.zw = \frac{src0.zw}{src1.zw} 2252 2253.. opcode:: U64MOD - 64-bit Unsigned Integer Remainder 2254 2255 .. math:: 2256 2257 dst.xy = src0.xy \bmod src1.xy 2258 2259 dst.zw = src0.zw \bmod src1.zw 2260 2261.. opcode:: I64MOD - 64-bit Signed Integer Remainder 2262 2263 .. math:: 2264 2265 dst.xy = src0.xy \bmod src1.xy 2266 2267 dst.zw = src0.zw \bmod src1.zw 2268 2269.. opcode:: F2U64 - Float to 64-bit Unsigned Int 2270 2271 .. math:: 2272 2273 dst.xy = (uint64_t) src0.x 2274 2275 dst.zw = (uint64_t) src0.y 2276 2277.. opcode:: F2I64 - Float to 64-bit Int 2278 2279 .. math:: 2280 2281 dst.xy = (int64_t) src0.x 2282 2283 dst.zw = (int64_t) src0.y 2284 2285.. opcode:: U2I64 - Unsigned Integer to 64-bit Integer 2286 2287 This is a zero extension. 2288 2289 .. math:: 2290 2291 dst.xy = (int64_t) src0.x 2292 2293 dst.zw = (int64_t) src0.y 2294 2295.. opcode:: I2I64 - Signed Integer to 64-bit Integer 2296 2297 This is a sign extension. 2298 2299 .. math:: 2300 2301 dst.xy = (int64_t) src0.x 2302 2303 dst.zw = (int64_t) src0.y 2304 2305.. opcode:: D2U64 - Double to 64-bit Unsigned Int 2306 2307 .. math:: 2308 2309 dst.xy = (uint64_t) src0.xy 2310 2311 dst.zw = (uint64_t) src0.zw 2312 2313.. opcode:: D2I64 - Double to 64-bit Int 2314 2315 .. math:: 2316 2317 dst.xy = (int64_t) src0.xy 2318 2319 dst.zw = (int64_t) src0.zw 2320 2321.. opcode:: U642F - 64-bit unsigned integer to float 2322 2323 .. math:: 2324 2325 dst.x = (float) src0.xy 2326 2327 dst.y = (float) src0.zw 2328 2329.. opcode:: I642F - 64-bit Int to Float 2330 2331 .. math:: 2332 2333 dst.x = (float) src0.xy 2334 2335 dst.y = (float) src0.zw 2336 2337.. opcode:: U642D - 64-bit unsigned integer to double 2338 2339 .. math:: 2340 2341 dst.xy = (double) src0.xy 2342 2343 dst.zw = (double) src0.zw 2344 2345.. opcode:: I642D - 64-bit Int to double 2346 2347 .. math:: 2348 2349 dst.xy = (double) src0.xy 2350 2351 dst.zw = (double) src0.zw 2352 2353.. _samplingopcodes: 2354 2355Resource Sampling Opcodes 2356^^^^^^^^^^^^^^^^^^^^^^^^^ 2357 2358Those opcodes follow very closely semantics of the respective Direct3D 2359instructions. If in doubt double check Direct3D documentation. 2360Note that the swizzle on SVIEW (src1) determines texel swizzling 2361after lookup. 2362 2363.. opcode:: SAMPLE 2364 2365 Using provided address, sample data from the specified texture using the 2366 filtering mode identified by the given sampler. The source data may come from 2367 any resource type other than buffers. 2368 2369 Syntax: ``SAMPLE dst, address, sampler_view, sampler`` 2370 2371 Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]`` 2372 2373.. opcode:: SAMPLE_I 2374 2375 Simplified alternative to the SAMPLE instruction. Using the provided 2376 integer address, SAMPLE_I fetches data from the specified sampler view 2377 without any filtering. The source data may come from any resource type 2378 other than CUBE. 2379 2380 Syntax: ``SAMPLE_I dst, address, sampler_view`` 2381 2382 Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]`` 2383 2384 The 'address' is specified as unsigned integers. If the 'address' is out of 2385 range [0...(# texels - 1)] the result of the fetch is always 0 in all 2386 components. As such the instruction doesn't honor address wrap modes, in 2387 cases where that behavior is desirable 'SAMPLE' instruction should be used. 2388 address.w always provides an unsigned integer mipmap level. If the value is 2389 out of the range then the instruction always returns 0 in all components. 2390 address.yz are ignored for buffers and 1d textures. address.z is ignored 2391 for 1d texture arrays and 2d textures. 2392 2393 For 1D texture arrays address.y provides the array index (also as unsigned 2394 integer). If the value is out of the range of available array indices 2395 [0... (array size - 1)] then the opcode always returns 0 in all components. 2396 For 2D texture arrays address.z provides the array index, otherwise it 2397 exhibits the same behavior as in the case for 1D texture arrays. The exact 2398 semantics of the source address are presented in the table below: 2399 2400 +---------------------------+----+-----+-----+---------+ 2401 | resource type | X | Y | Z | W | 2402 +===========================+====+=====+=====+=========+ 2403 | ``PIPE_BUFFER`` | x | | | ignored | 2404 +---------------------------+----+-----+-----+---------+ 2405 | ``PIPE_TEXTURE_1D`` | x | | | mpl | 2406 +---------------------------+----+-----+-----+---------+ 2407 | ``PIPE_TEXTURE_2D`` | x | y | | mpl | 2408 +---------------------------+----+-----+-----+---------+ 2409 | ``PIPE_TEXTURE_3D`` | x | y | z | mpl | 2410 +---------------------------+----+-----+-----+---------+ 2411 | ``PIPE_TEXTURE_RECT`` | x | y | | mpl | 2412 +---------------------------+----+-----+-----+---------+ 2413 | ``PIPE_TEXTURE_CUBE`` | not allowed as source | 2414 +---------------------------+----+-----+-----+---------+ 2415 | ``PIPE_TEXTURE_1D_ARRAY`` | x | idx | | mpl | 2416 +---------------------------+----+-----+-----+---------+ 2417 | ``PIPE_TEXTURE_2D_ARRAY`` | x | y | idx | mpl | 2418 +---------------------------+----+-----+-----+---------+ 2419 2420 Where 'mpl' is a mipmap level and 'idx' is the array index. 2421 2422.. opcode:: SAMPLE_I_MS 2423 2424 Just like SAMPLE_I but allows fetch data from multi-sampled surfaces. 2425 2426 Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample`` 2427 2428.. opcode:: SAMPLE_B 2429 2430 Just like the SAMPLE instruction with the exception that an additional bias 2431 is applied to the level of detail computed as part of the instruction 2432 execution. 2433 2434 Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias`` 2435 2436 Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x`` 2437 2438.. opcode:: SAMPLE_C 2439 2440 Similar to the SAMPLE instruction but it performs a comparison filter. The 2441 operands to SAMPLE_C are identical to SAMPLE, except that there is an 2442 additional float32 operand, reference value, which must be a register with 2443 single-component, or a scalar literal. SAMPLE_C makes the hardware use the 2444 current samplers compare_func (in pipe_sampler_state) to compare reference 2445 value against the red component value for the source resource at each texel 2446 that the currently configured texture filter covers based on the provided 2447 coordinates. 2448 2449 Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value`` 2450 2451 Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x`` 2452 2453.. opcode:: SAMPLE_C_LZ 2454 2455 Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands 2456 for level-zero. 2457 2458 Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value`` 2459 2460 Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x`` 2461 2462 2463.. opcode:: SAMPLE_D 2464 2465 SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for 2466 the source address in the x direction and the y direction are provided by 2467 extra parameters. 2468 2469 Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y`` 2470 2471 Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]`` 2472 2473.. opcode:: SAMPLE_L 2474 2475 SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided 2476 directly as a scalar value, representing no anisotropy. 2477 2478 Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod`` 2479 2480 Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x`` 2481 2482.. opcode:: GATHER4 2483 2484 Gathers the four texels to be used in a bi-linear filtering operation and 2485 packs them into a single register. Only works with 2D, 2D array, cubemaps, 2486 and cubemaps arrays. For 2D textures, only the addressing modes of the 2487 sampler and the top level of any mip pyramid are used. Set W to zero. It 2488 behaves like the SAMPLE instruction, but a filtered sample is not 2489 generated. The four samples that contribute to filtering are placed into 2490 XYZW in counter-clockwise order, starting with the (u,v) texture coordinate 2491 delta at the following locations (-, +), (+, +), (+, -), (-, -), where the 2492 magnitude of the deltas are half a texel. 2493 2494 2495.. opcode:: SVIEWINFO 2496 2497 Query the dimensions of a given sampler view. dst receives width, height, 2498 depth or array size and number of mipmap levels as int4. The dst can have a 2499 writemask which will specify what info is the caller interested in. 2500 2501 Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view`` 2502 2503 Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]`` 2504 2505 src_mip_level is an unsigned integer scalar. If it's out of range then 2506 returns 0 for width, height and depth/array size but the total number of 2507 mipmap is still returned correctly for the given sampler view. The returned 2508 width, height and depth values are for the mipmap level selected by the 2509 src_mip_level and are in the number of texels. For 1d texture array width 2510 is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is 2511 still in dst.w. In contrast to d3d10 resinfo, there's no way in the TGSI 2512 instruction encoding to specify the return type (float/rcpfloat/uint), hence 2513 always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1 2514 resinfo allowing swizzling dst values is ignored (due to the interaction 2515 with rcpfloat modifier which requires some swizzle handling in the state 2516 tracker anyway). 2517 2518.. opcode:: SAMPLE_POS 2519 2520 Query the position of a sample in the given resource or render target 2521 when per-sample fragment shading is in effect. 2522 2523 Syntax: ``SAMPLE_POS dst, source, sample_index`` 2524 2525 dst receives float4 (x, y, undef, undef) indicated where the sample is 2526 located. Sample locations are in the range [0, 1] where 0.5 is the center 2527 of the fragment. 2528 2529 source is either a sampler view (to indicate a shader resource) or temp 2530 register (to indicate the render target). The source register may have 2531 an optional swizzle to apply to the returned result 2532 2533 sample_index is an integer scalar indicating which sample position is to 2534 be queried. 2535 2536 If per-sample shading is not in effect or the source resource or render 2537 target is not multisampled, the result is (0.5, 0.5, undef, undef). 2538 2539 NOTE: no driver has implemented this opcode yet (and no gallium frontend 2540 emits it). This information is subject to change. 2541 2542.. opcode:: SAMPLE_INFO 2543 2544 Query the number of samples in a multisampled resource or render target. 2545 2546 Syntax: ``SAMPLE_INFO dst, source`` 2547 2548 dst receives int4 (n, 0, 0, 0) where n is the number of samples in a 2549 resource or the render target. 2550 2551 source is either a sampler view (to indicate a shader resource) or temp 2552 register (to indicate the render target). The source register may have 2553 an optional swizzle to apply to the returned result 2554 2555 If per-sample shading is not in effect or the source resource or render 2556 target is not multisampled, the result is (1, 0, 0, 0). 2557 2558 NOTE: no driver has implemented this opcode yet (and no gallium frontend 2559 emits it). This information is subject to change. 2560 2561.. opcode:: LOD - level of detail 2562 2563 Same syntax as the SAMPLE opcode but instead of performing an actual 2564 texture lookup/filter, return the computed LOD information that the 2565 texture pipe would use to access the texture. The Y component contains 2566 the computed LOD lambda_prime. The X component contains the LOD that will 2567 be accessed, based on min/max lod's and mipmap filters. 2568 The Z and W components are set to 0. 2569 2570 Syntax: ``LOD dst, address, sampler_view, sampler`` 2571 2572 2573.. _resourceopcodes: 2574 2575Resource Access Opcodes 2576^^^^^^^^^^^^^^^^^^^^^^^ 2577 2578For these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY. 2579 2580.. opcode:: LOAD - Fetch data from a shader buffer or image 2581 2582 Syntax: ``LOAD dst, resource, address`` 2583 2584 Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]`` 2585 2586 Using the provided integer address, LOAD fetches data from the 2587 specified buffer or texture without any filtering. 2588 2589 The 'address' is specified as a vector of unsigned integers. If the 2590 'address' is out of range the result is unspecified. 2591 2592 Only the first mipmap level of a resource can be read from using this 2593 instruction. 2594 2595 For 1D or 2D texture arrays, the array index is provided as an 2596 unsigned integer in address.y or address.z, respectively. address.yz 2597 are ignored for buffers and 1D textures. address.z is ignored for 1D 2598 texture arrays and 2D textures. address.w is always ignored. 2599 2600 A swizzle suffix may be added to the resource argument this will 2601 cause the resource data to be swizzled accordingly. 2602 2603.. opcode:: STORE - Write data to a shader resource 2604 2605 Syntax: ``STORE resource, address, src`` 2606 2607 Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]`` 2608 2609 Using the provided integer address, STORE writes data to the 2610 specified buffer or texture. 2611 2612 The 'address' is specified as a vector of unsigned integers. If the 2613 'address' is out of range the result is unspecified. 2614 2615 Only the first mipmap level of a resource can be written to using 2616 this instruction. 2617 2618 For 1D or 2D texture arrays, the array index is provided as an 2619 unsigned integer in address.y or address.z, respectively. 2620 address.yz are ignored for buffers and 1D textures. address.z is 2621 ignored for 1D texture arrays and 2D textures. address.w is always 2622 ignored. 2623 2624.. opcode:: RESQ - Query information about a resource 2625 2626 Syntax: ``RESQ dst, resource`` 2627 2628 Example: ``RESQ TEMP[0], BUFFER[0]`` 2629 2630 Returns information about the buffer or image resource. For buffer 2631 resources, the size (in bytes) is returned in the x component. For 2632 image resources, .xyz will contain the width/height/layers of the 2633 image, while .w will contain the number of samples for multi-sampled 2634 images. 2635 2636.. opcode:: FBFETCH - Load data from framebuffer 2637 2638 Syntax: ``FBFETCH dst, output`` 2639 2640 Example: ``FBFETCH TEMP[0], OUT[0]`` 2641 2642 This is only valid on ``COLOR`` semantic outputs. Returns the color 2643 of the current position in the framebuffer from before this fragment 2644 shader invocation. May return the same value from multiple calls for 2645 a particular output within a single invocation. Note that result may 2646 be undefined if a fragment is drawn multiple times without a blend 2647 barrier in between. 2648 2649 2650.. _bindlessopcodes: 2651 2652Bindless Opcodes 2653^^^^^^^^^^^^^^^^ 2654 2655These opcodes are for working with bindless sampler or image handles and 2656require PIPE_CAP_BINDLESS_TEXTURE. 2657 2658.. opcode:: IMG2HND - Get a bindless handle for a image 2659 2660 Syntax: ``IMG2HND dst, image`` 2661 2662 Example: ``IMG2HND TEMP[0], IMAGE[0]`` 2663 2664 Sets 'dst' to a bindless handle for 'image'. 2665 2666.. opcode:: SAMP2HND - Get a bindless handle for a sampler 2667 2668 Syntax: ``SAMP2HND dst, sampler`` 2669 2670 Example: ``SAMP2HND TEMP[0], SAMP[0]`` 2671 2672 Sets 'dst' to a bindless handle for 'sampler'. 2673 2674 2675.. _threadsyncopcodes: 2676 2677Inter-thread synchronization opcodes 2678^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2679 2680These opcodes are intended for communication between threads running 2681within the same compute grid. For now they're only valid in compute 2682programs. 2683 2684.. opcode:: BARRIER - Thread group barrier 2685 2686 ``BARRIER`` 2687 2688 This opcode suspends the execution of the current thread until all 2689 the remaining threads in the working group reach the same point of 2690 the program. Results are unspecified if any of the remaining 2691 threads terminates or never reaches an executed BARRIER instruction. 2692 2693.. opcode:: MEMBAR - Memory barrier 2694 2695 ``MEMBAR type`` 2696 2697 This opcode waits for the completion of all memory accesses based on 2698 the type passed in. The type is an immediate bitfield with the following 2699 meaning: 2700 2701 Bit 0: Shader storage buffers 2702 Bit 1: Atomic buffers 2703 Bit 2: Images 2704 Bit 3: Shared memory 2705 Bit 4: Thread group 2706 2707 These may be passed in in any combination. An implementation is free to not 2708 distinguish between these as it sees fit. However these map to all the 2709 possibilities made available by GLSL. 2710 2711.. _atomopcodes: 2712 2713Atomic opcodes 2714^^^^^^^^^^^^^^ 2715 2716These opcodes provide atomic variants of some common arithmetic and 2717logical operations. In this context atomicity means that another 2718concurrent memory access operation that affects the same memory 2719location is guaranteed to be performed strictly before or after the 2720entire execution of the atomic operation. The resource may be a BUFFER, 2721IMAGE, HWATOMIC, or MEMORY. In the case of an image, the offset works 2722the same as for ``LOAD`` and ``STORE``, specified above. For atomic 2723counters, the offset is an immediate index to the base HW atomic 2724counter for this operation. 2725These atomic operations may only be used with 32-bit integer image formats. 2726 2727.. opcode:: ATOMUADD - Atomic integer addition 2728 2729 Syntax: ``ATOMUADD dst, resource, offset, src`` 2730 2731 Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2732 2733 The following operation is performed atomically: 2734 2735 .. math:: 2736 2737 dst_x = resource[offset] 2738 2739 resource[offset] = dst_x + src_x 2740 2741 2742.. opcode:: ATOMFADD - Atomic floating point addition 2743 2744 Syntax: ``ATOMFADD dst, resource, offset, src`` 2745 2746 Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2747 2748 The following operation is performed atomically: 2749 2750 .. math:: 2751 2752 dst_x = resource[offset] 2753 2754 resource[offset] = dst_x + src_x 2755 2756 2757.. opcode:: ATOMXCHG - Atomic exchange 2758 2759 Syntax: ``ATOMXCHG dst, resource, offset, src`` 2760 2761 Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2762 2763 The following operation is performed atomically: 2764 2765 .. math:: 2766 2767 dst_x = resource[offset] 2768 2769 resource[offset] = src_x 2770 2771 2772.. opcode:: ATOMCAS - Atomic compare-and-exchange 2773 2774 Syntax: ``ATOMCAS dst, resource, offset, cmp, src`` 2775 2776 Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]`` 2777 2778 The following operation is performed atomically: 2779 2780 .. math:: 2781 2782 dst_x = resource[offset] 2783 2784 resource[offset] = (dst_x == cmp_x ? src_x : dst_x) 2785 2786 2787.. opcode:: ATOMAND - Atomic bitwise And 2788 2789 Syntax: ``ATOMAND dst, resource, offset, src`` 2790 2791 Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2792 2793 The following operation is performed atomically: 2794 2795 .. math:: 2796 2797 dst_x = resource[offset] 2798 2799 resource[offset] = dst_x \& src_x 2800 2801 2802.. opcode:: ATOMOR - Atomic bitwise Or 2803 2804 Syntax: ``ATOMOR dst, resource, offset, src`` 2805 2806 Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2807 2808 The following operation is performed atomically: 2809 2810 .. math:: 2811 2812 dst_x = resource[offset] 2813 2814 resource[offset] = dst_x | src_x 2815 2816 2817.. opcode:: ATOMXOR - Atomic bitwise Xor 2818 2819 Syntax: ``ATOMXOR dst, resource, offset, src`` 2820 2821 Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2822 2823 The following operation is performed atomically: 2824 2825 .. math:: 2826 2827 dst_x = resource[offset] 2828 2829 resource[offset] = dst_x \oplus src_x 2830 2831 2832.. opcode:: ATOMUMIN - Atomic unsigned minimum 2833 2834 Syntax: ``ATOMUMIN dst, resource, offset, src`` 2835 2836 Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2837 2838 The following operation is performed atomically: 2839 2840 .. math:: 2841 2842 dst_x = resource[offset] 2843 2844 resource[offset] = (dst_x < src_x ? dst_x : src_x) 2845 2846 2847.. opcode:: ATOMUMAX - Atomic unsigned maximum 2848 2849 Syntax: ``ATOMUMAX dst, resource, offset, src`` 2850 2851 Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2852 2853 The following operation is performed atomically: 2854 2855 .. math:: 2856 2857 dst_x = resource[offset] 2858 2859 resource[offset] = (dst_x > src_x ? dst_x : src_x) 2860 2861 2862.. opcode:: ATOMIMIN - Atomic signed minimum 2863 2864 Syntax: ``ATOMIMIN dst, resource, offset, src`` 2865 2866 Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2867 2868 The following operation is performed atomically: 2869 2870 .. math:: 2871 2872 dst_x = resource[offset] 2873 2874 resource[offset] = (dst_x < src_x ? dst_x : src_x) 2875 2876 2877.. opcode:: ATOMIMAX - Atomic signed maximum 2878 2879 Syntax: ``ATOMIMAX dst, resource, offset, src`` 2880 2881 Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2882 2883 The following operation is performed atomically: 2884 2885 .. math:: 2886 2887 dst_x = resource[offset] 2888 2889 resource[offset] = (dst_x > src_x ? dst_x : src_x) 2890 2891 2892.. opcode:: ATOMINC_WRAP - Atomic increment + wrap around 2893 2894 Syntax: ``ATOMINC_WRAP dst, resource, offset, src`` 2895 2896 Example: ``ATOMINC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2897 2898 The following operation is performed atomically: 2899 2900 .. math:: 2901 2902 dst_x = resource[offset] + 1 2903 2904 resource[offset] = dst_x <= src_x ? dst_x : 0 2905 2906 2907.. opcode:: ATOMDEC_WRAP - Atomic decrement + wrap around 2908 2909 Syntax: ``ATOMDEC_WRAP dst, resource, offset, src`` 2910 2911 Example: ``ATOMDEC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2912 2913 The following operation is performed atomically: 2914 2915 .. math:: 2916 2917 dst_x = resource[offset] 2918 2919 resource[offset] = 2920 \left\{ 2921 \begin{array}{ c l } 2922 dst_x - 1 & \quad \textrm{if } dst_x \gt 0 \textrm{ and } dst_x \lt src_x \\ 2923 0 & \quad \textrm{otherwise} 2924 \end{array} 2925 \right. 2926 2927.. _interlaneopcodes: 2928 2929Inter-lane opcodes 2930^^^^^^^^^^^^^^^^^^ 2931 2932These opcodes reduce the given value across the shader invocations 2933running in the current SIMD group. Every thread in the subgroup will receive 2934the same result. The BALLOT operations accept a single-channel argument that 2935is treated as a boolean and produce a 64-bit value. 2936 2937.. opcode:: VOTE_ANY - Value is set in any of the active invocations 2938 2939 Syntax: ``VOTE_ANY dst, value`` 2940 2941 Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x`` 2942 2943 2944.. opcode:: VOTE_ALL - Value is set in all of the active invocations 2945 2946 Syntax: ``VOTE_ALL dst, value`` 2947 2948 Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x`` 2949 2950 2951.. opcode:: VOTE_EQ - Value is the same in all of the active invocations 2952 2953 Syntax: ``VOTE_EQ dst, value`` 2954 2955 Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x`` 2956 2957 2958.. opcode:: BALLOT - Lanemask of whether the value is set in each active 2959 invocation 2960 2961 Syntax: ``BALLOT dst, value`` 2962 2963 Example: ``BALLOT TEMP[0].xy, TEMP[1].x`` 2964 2965 When the argument is a constant true, this produces a bitmask of active 2966 invocations. In fragment shaders, this can include helper invocations 2967 (invocations whose outputs and writes to memory are discarded, but which 2968 are used to compute derivatives). 2969 2970 2971.. opcode:: READ_FIRST - Broadcast the value from the first active 2972 invocation to all active lanes 2973 2974 Syntax: ``READ_FIRST dst, value`` 2975 2976 Example: ``READ_FIRST TEMP[0], TEMP[1]`` 2977 2978 2979.. opcode:: READ_INVOC - Retrieve the value from the given invocation 2980 (need not be uniform) 2981 2982 Syntax: ``READ_INVOC dst, value, invocation`` 2983 2984 Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x`` 2985 2986 invocation.x controls the invocation number to read from for all channels. 2987 The invocation number must be the same across all active invocations in a 2988 sub-group; otherwise, the results are undefined. 2989 2990 2991Explanation of symbols used 2992------------------------------ 2993 2994 2995Functions 2996^^^^^^^^^^^^^^ 2997 2998 2999 :math:`|x|` Absolute value of ``x``. 3000 3001 :math:`\lceil x \rceil` Ceiling of ``x``. 3002 3003 clamp(x,y,z) Clamp x between y and z. 3004 (x < y) ? y : (x > z) ? z : x 3005 3006 :math:`\lfloor x\rfloor` Floor of ``x``. 3007 3008 :math:`\log_2{x}` Logarithm of ``x``, base 2. 3009 3010 max(x,y) Maximum of x and y. 3011 (x > y) ? x : y 3012 3013 min(x,y) Minimum of x and y. 3014 (x < y) ? x : y 3015 3016 partialx(x) Derivative of x relative to fragment's X. 3017 3018 partialy(x) Derivative of x relative to fragment's Y. 3019 3020 pop() Pop from stack. 3021 3022 :math:`x^y` ``x`` to the power ``y``. 3023 3024 push(x) Push x on stack. 3025 3026 round(x) Round x. 3027 3028 trunc(x) Truncate x, i.e. drop the fraction bits. 3029 3030 3031Keywords 3032^^^^^^^^^^^^^ 3033 3034 3035 discard Discard fragment. 3036 3037 pc Program counter. 3038 3039 target Label of target instruction. 3040 3041 3042Other tokens 3043--------------- 3044 3045 3046Declaration 3047^^^^^^^^^^^ 3048 3049 3050Declares a register that is will be referenced as an operand in Instruction 3051tokens. 3052 3053File field contains register file that is being declared and is one 3054of TGSI_FILE. 3055 3056UsageMask field specifies which of the register components can be accessed 3057and is one of TGSI_WRITEMASK. 3058 3059The Local flag specifies that a given value isn't intended for 3060subroutine parameter passing and, as a result, the implementation 3061isn't required to give any guarantees of it being preserved across 3062subroutine boundaries. As it's merely a compiler hint, the 3063implementation is free to ignore it. 3064 3065If Dimension flag is set to 1, a Declaration Dimension token follows. 3066 3067If Semantic flag is set to 1, a Declaration Semantic token follows. 3068 3069If Interpolate flag is set to 1, a Declaration Interpolate token follows. 3070 3071If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows. 3072 3073If Array flag is set to 1, a Declaration Array token follows. 3074 3075Array Declaration 3076^^^^^^^^^^^^^^^^^^^^^^^^ 3077 3078Declarations can optional have an ArrayID attribute which can be referred by 3079indirect addressing operands. An ArrayID of zero is reserved and treated as 3080if no ArrayID is specified. 3081 3082If an indirect addressing operand refers to a specific declaration by using 3083an ArrayID only the registers in this declaration are guaranteed to be 3084accessed, accessing any register outside this declaration results in undefined 3085behavior. Note that for compatibility the effective index is zero-based and 3086not relative to the specified declaration 3087 3088If no ArrayID is specified with an indirect addressing operand the whole 3089register file might be accessed by this operand. This is strongly discouraged 3090and will prevent packing of scalar/vec2 arrays and effective alias analysis. 3091This is only legal for TEMP and CONST register files. 3092 3093Declaration Semantic 3094^^^^^^^^^^^^^^^^^^^^^^^^ 3095 3096Vertex and fragment shader input and output registers may be labeled 3097with semantic information consisting of a name and index. 3098 3099Follows Declaration token if Semantic bit is set. 3100 3101Since its purpose is to link a shader with other stages of the pipeline, 3102it is valid to follow only those Declaration tokens that declare a register 3103either in INPUT or OUTPUT file. 3104 3105SemanticName field contains the semantic name of the register being declared. 3106There is no default value. 3107 3108SemanticIndex is an optional subscript that can be used to distinguish 3109different register declarations with the same semantic name. The default value 3110is 0. 3111 3112The meanings of the individual semantic names are explained in the following 3113sections. 3114 3115TGSI_SEMANTIC_POSITION 3116"""""""""""""""""""""" 3117 3118For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader 3119output register which contains the homogeneous vertex position in the clip 3120space coordinate system. After clipping, the X, Y and Z components of the 3121vertex will be divided by the W value to get normalized device coordinates. 3122 3123For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that 3124fragment shader input (or system value, depending on which one is 3125supported by the driver) contains the fragment's window position. The X 3126component starts at zero and always increases from left to right. 3127The Y component starts at zero and always increases but Y=0 may either 3128indicate the top of the window or the bottom depending on the fragment 3129coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN). 3130The Z coordinate ranges from 0 to 1 to represent depth from the front 3131to the back of the Z buffer. The W component contains the interpolated 3132reciprocal of the vertex position W component (corresponding to gl_Fragcoord, 3133but unlike d3d10 which interpolates the same 1/w but then gives back 3134the reciprocal of the interpolated value). 3135 3136Fragment shaders may also declare an output register with 3137TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows 3138the fragment shader to change the fragment's Z position. 3139 3140 3141 3142TGSI_SEMANTIC_COLOR 3143""""""""""""""""""" 3144 3145For vertex shader outputs or fragment shader inputs/outputs, this 3146label indicates that the register contains an R,G,B,A color. 3147 3148Several shader inputs/outputs may contain colors so the semantic index 3149is used to distinguish them. For example, color[0] may be the diffuse 3150color while color[1] may be the specular color. 3151 3152This label is needed so that the flat/smooth shading can be applied 3153to the right interpolants during rasterization. 3154 3155 3156 3157TGSI_SEMANTIC_BCOLOR 3158"""""""""""""""""""" 3159 3160Back-facing colors are only used for back-facing polygons, and are only valid 3161in vertex shader outputs. After rasterization, all polygons are front-facing 3162and COLOR and BCOLOR end up occupying the same slots in the fragment shader, 3163so all BCOLORs effectively become regular COLORs in the fragment shader. 3164 3165 3166TGSI_SEMANTIC_FOG 3167""""""""""""""""" 3168 3169Vertex shader inputs and outputs and fragment shader inputs may be 3170labeled with TGSI_SEMANTIC_FOG to indicate that the register contains 3171a fog coordinate. Typically, the fragment shader will use the fog coordinate 3172to compute a fog blend factor which is used to blend the normal fragment color 3173with a constant fog color. But fog coord really is just an ordinary vec4 3174register like regular semantics. 3175 3176 3177TGSI_SEMANTIC_PSIZE 3178""""""""""""""""""" 3179 3180Vertex shader input and output registers may be labeled with 3181TGIS_SEMANTIC_PSIZE to indicate that the register contains a point size 3182in the form (S, 0, 0, 1). The point size controls the width or diameter 3183of points for rasterization. This label cannot be used in fragment 3184shaders. 3185 3186When using this semantic, be sure to set the appropriate state in the 3187:ref:`rasterizer` first. 3188 3189 3190TGSI_SEMANTIC_TEXCOORD 3191"""""""""""""""""""""" 3192 3193Only available if PIPE_CAP_TGSI_TEXCOORD is exposed ! 3194 3195Vertex shader outputs and fragment shader inputs may be labeled with 3196this semantic to make them replaceable by sprite coordinates via the 3197sprite_coord_enable state in the :ref:`rasterizer`. 3198The semantic index permitted with this semantic is limited to <= 7. 3199 3200If the driver does not support TEXCOORD, sprite coordinate replacement 3201applies to inputs with the GENERIC semantic instead. 3202 3203The intended use case for this semantic is gl_TexCoord. 3204 3205 3206TGSI_SEMANTIC_PCOORD 3207"""""""""""""""""""" 3208 3209Only available if PIPE_CAP_TGSI_TEXCOORD is exposed ! 3210 3211Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate 3212that the register contains sprite coordinates in the form (x, y, 0, 1), if 3213the current primitive is a point and point sprites are enabled. Otherwise, 3214the contents of the register are undefined. 3215 3216The intended use case for this semantic is gl_PointCoord. 3217 3218 3219TGSI_SEMANTIC_GENERIC 3220""""""""""""""""""""" 3221 3222All vertex/fragment shader inputs/outputs not labeled with any other 3223semantic label can be considered to be generic attributes. Typical 3224uses of generic inputs/outputs are texcoords and user-defined values. 3225 3226 3227TGSI_SEMANTIC_NORMAL 3228"""""""""""""""""""" 3229 3230Indicates that a vertex shader input is a normal vector. This is 3231typically only used for legacy graphics APIs. 3232 3233 3234TGSI_SEMANTIC_FACE 3235"""""""""""""""""" 3236 3237This label applies to fragment shader inputs (or system values, 3238depending on which one is supported by the driver) and indicates that 3239the register contains front/back-face information. 3240 3241If it is an input, it will be a floating-point vector in the form (F, 0, 0, 1), 3242where F will be positive when the fragment belongs to a front-facing polygon, 3243and negative when the fragment belongs to a back-facing polygon. 3244 3245If it is a system value, it will be an integer vector in the form (F, 0, 0, 1), 3246where F is ``0xffffffff`` when the fragment belongs to a front-facing polygon 3247and ``0`` when the fragment belongs to a back-facing polygon. 3248 3249 3250TGSI_SEMANTIC_EDGEFLAG 3251"""""""""""""""""""""" 3252 3253For vertex shaders, this semantic label indicates that an input or 3254output is a boolean edge flag. The register layout is [F, x, x, x] 3255where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader 3256simply copies the edge flag input to the edge flag output. 3257 3258Edge flags are used to control which lines or points are actually 3259drawn when the polygon mode converts triangles/quads/polygons into 3260points or lines. 3261 3262 3263TGSI_SEMANTIC_STENCIL 3264""""""""""""""""""""" 3265 3266For fragment shaders, this semantic label indicates that an output 3267is a writable stencil reference value. Only the Y component is writable. 3268This allows the fragment shader to change the fragments stencil reference 3269value. 3270 3271 3272TGSI_SEMANTIC_VIEWPORT_INDEX 3273"""""""""""""""""""""""""""" 3274 3275For geometry shaders, this semantic label indicates that an output 3276contains the index of the viewport (and scissor) to use. 3277This is an integer value, and only the X component is used. 3278 3279If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is 3280supported, then this semantic label can also be used in vertex or 3281tessellation evaluation shaders, respectively. Only the value written in the 3282last vertex processing stage is used. 3283 3284 3285TGSI_SEMANTIC_LAYER 3286""""""""""""""""""" 3287 3288For geometry shaders, this semantic label indicates that an output 3289contains the layer value to use for the color and depth/stencil surfaces. 3290This is an integer value, and only the X component is used. 3291(Also known as rendertarget array index.) 3292 3293If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is 3294supported, then this semantic label can also be used in vertex or 3295tessellation evaluation shaders, respectively. Only the value written in the 3296last vertex processing stage is used. 3297 3298 3299TGSI_SEMANTIC_CLIPDIST 3300"""""""""""""""""""""" 3301 3302Note this covers clipping and culling distances. 3303 3304When components of vertex elements are identified this way, these 3305values are each assumed to be a float32 signed distance to a plane. 3306 3307For clip distances: 3308Primitive setup only invokes rasterization on pixels for which 3309the interpolated plane distances are >= 0. 3310 3311For cull distances: 3312Primitives will be completely discarded if the plane distance 3313for all of the vertices in the primitive are < 0. 3314If a vertex has a cull distance of NaN, that vertex counts as "out" 3315(as if its < 0); 3316 3317Multiple clip/cull planes can be implemented simultaneously, by 3318annotating multiple components of one or more vertex elements with 3319the above specified semantic. 3320The limits on both clip and cull distances are bound 3321by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines 3322the maximum number of components that can be used to hold the 3323distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT 3324which specifies the maximum number of registers which can be 3325annotated with those semantics. 3326The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED 3327are used to divide up the 2 x vec4 space between clipping and culling. 3328 3329TGSI_SEMANTIC_SAMPLEID 3330"""""""""""""""""""""" 3331 3332For fragment shaders, this semantic label indicates that a system value 3333contains the current sample id (i.e. gl_SampleID) as an unsigned int. 3334Only the X component is used. If per-sample shading is not enabled, 3335the result is (0, undef, undef, undef). 3336 3337Note that if the fragment shader uses this system value, the fragment 3338shader is automatically executed at per sample frequency. 3339 3340TGSI_SEMANTIC_SAMPLEPOS 3341""""""""""""""""""""""" 3342 3343For fragment shaders, this semantic label indicates that a system 3344value contains the current sample's position as float4(x, y, undef, undef) 3345in the render target (i.e. gl_SamplePosition) when per-fragment shading 3346is in effect. Position values are in the range [0, 1] where 0.5 is 3347the center of the fragment. 3348 3349Note that if the fragment shader uses this system value, the fragment 3350shader is automatically executed at per sample frequency. 3351 3352TGSI_SEMANTIC_SAMPLEMASK 3353"""""""""""""""""""""""" 3354 3355For fragment shaders, this semantic label can be applied to either a 3356shader system value input or output. 3357 3358For a system value, the sample mask indicates the set of samples covered by 3359the current primitive. If MSAA is not enabled, the value is (1, 0, 0, 0). 3360 3361For an output, the sample mask is used to disable further sample processing. 3362 3363For both, the register type is uint[4] but only the X component is used 3364(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up 3365to 32x MSAA is supported). 3366 3367TGSI_SEMANTIC_INVOCATIONID 3368"""""""""""""""""""""""""" 3369 3370For geometry shaders, this semantic label indicates that a system value 3371contains the current invocation id (i.e. gl_InvocationID). 3372This is an integer value, and only the X component is used. 3373 3374TGSI_SEMANTIC_INSTANCEID 3375"""""""""""""""""""""""" 3376 3377For vertex shaders, this semantic label indicates that a system value contains 3378the current instance id (i.e. gl_InstanceID). It does not include the base 3379instance. This is an integer value, and only the X component is used. 3380 3381TGSI_SEMANTIC_VERTEXID 3382"""""""""""""""""""""" 3383 3384For vertex shaders, this semantic label indicates that a system value contains 3385the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the 3386base vertex. This is an integer value, and only the X component is used. 3387 3388TGSI_SEMANTIC_VERTEXID_NOBASE 3389""""""""""""""""""""""""""""""" 3390 3391For vertex shaders, this semantic label indicates that a system value contains 3392the current vertex id without including the base vertex (this corresponds to 3393d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX 3394== TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component 3395is used. 3396 3397TGSI_SEMANTIC_BASEVERTEX 3398"""""""""""""""""""""""" 3399 3400For vertex shaders, this semantic label indicates that a system value contains 3401the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls, 3402this contains the first (or start) value instead. 3403This is an integer value, and only the X component is used. 3404 3405TGSI_SEMANTIC_PRIMID 3406"""""""""""""""""""" 3407 3408For geometry and fragment shaders, this semantic label indicates the value 3409contains the primitive id (i.e. gl_PrimitiveID). This is an integer value, 3410and only the X component is used. 3411FIXME: This right now can be either a ordinary input or a system value... 3412 3413 3414TGSI_SEMANTIC_PATCH 3415""""""""""""""""""" 3416 3417For tessellation evaluation/control shaders, this semantic label indicates a 3418generic per-patch attribute. Such semantics will not implicitly be per-vertex 3419arrays. 3420 3421TGSI_SEMANTIC_TESSCOORD 3422""""""""""""""""""""""" 3423 3424For tessellation evaluation shaders, this semantic label indicates the 3425coordinates of the vertex being processed. This is available in XYZ; W is 3426undefined. 3427 3428TGSI_SEMANTIC_TESSOUTER 3429""""""""""""""""""""""" 3430 3431For tessellation evaluation/control shaders, this semantic label indicates the 3432outer tessellation levels of the patch. Isoline tessellation will only have XY 3433defined, triangle will have XYZ and quads will have XYZW defined. This 3434corresponds to gl_TessLevelOuter. 3435 3436TGSI_SEMANTIC_TESSINNER 3437""""""""""""""""""""""" 3438 3439For tessellation evaluation/control shaders, this semantic label indicates the 3440inner tessellation levels of the patch. The X value is only defined for 3441triangle tessellation, while quads will have XY defined. This is entirely 3442undefined for isoline tessellation. 3443 3444TGSI_SEMANTIC_VERTICESIN 3445"""""""""""""""""""""""" 3446 3447For tessellation evaluation/control shaders, this semantic label indicates the 3448number of vertices provided in the input patch. Only the X value is defined. 3449 3450TGSI_SEMANTIC_HELPER_INVOCATION 3451""""""""""""""""""""""""""""""" 3452 3453For fragment shaders, this semantic indicates whether the current 3454invocation is covered or not. Helper invocations are created in order 3455to properly compute derivatives, however it may be desirable to skip 3456some of the logic in those cases. See ``gl_HelperInvocation`` documentation. 3457 3458TGSI_SEMANTIC_BASEINSTANCE 3459"""""""""""""""""""""""""" 3460 3461For vertex shaders, the base instance argument supplied for this 3462draw. This is an integer value, and only the X component is used. 3463 3464TGSI_SEMANTIC_DRAWID 3465"""""""""""""""""""" 3466 3467For vertex shaders, the zero-based index of the current draw in a 3468``glMultiDraw*`` invocation. This is an integer value, and only the X 3469component is used. 3470 3471 3472TGSI_SEMANTIC_WORK_DIM 3473"""""""""""""""""""""" 3474 3475For compute shaders started via OpenCL this retrieves the work_dim 3476parameter to the clEnqueueNDRangeKernel call with which the shader 3477was started. 3478 3479 3480TGSI_SEMANTIC_GRID_SIZE 3481""""""""""""""""""""""" 3482 3483For compute shaders, this semantic indicates the maximum (x, y, z) dimensions 3484of a grid of thread blocks. 3485 3486 3487TGSI_SEMANTIC_BLOCK_ID 3488"""""""""""""""""""""" 3489 3490For compute shaders, this semantic indicates the (x, y, z) coordinates of the 3491current block inside of the grid. 3492 3493 3494TGSI_SEMANTIC_BLOCK_SIZE 3495"""""""""""""""""""""""" 3496 3497For compute shaders, this semantic indicates the maximum (x, y, z) dimensions 3498of a block in threads. 3499 3500 3501TGSI_SEMANTIC_THREAD_ID 3502""""""""""""""""""""""" 3503 3504For compute shaders, this semantic indicates the (x, y, z) coordinates of the 3505current thread inside of the block. 3506 3507 3508TGSI_SEMANTIC_SUBGROUP_SIZE 3509""""""""""""""""""""""""""" 3510 3511This semantic indicates the subgroup size for the current invocation. This is 3512an integer of at most 64, as it indicates the width of lanemasks. It does not 3513depend on the number of invocations that are active. 3514 3515 3516TGSI_SEMANTIC_SUBGROUP_INVOCATION 3517""""""""""""""""""""""""""""""""" 3518 3519The index of the current invocation within its subgroup. 3520 3521 3522TGSI_SEMANTIC_SUBGROUP_EQ_MASK 3523"""""""""""""""""""""""""""""" 3524 3525A bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3526``1 << subgroup_invocation`` in arbitrary precision arithmetic. 3527 3528 3529TGSI_SEMANTIC_SUBGROUP_GE_MASK 3530"""""""""""""""""""""""""""""" 3531 3532A bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3533``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation`` 3534in arbitrary precision arithmetic. 3535 3536 3537TGSI_SEMANTIC_SUBGROUP_GT_MASK 3538"""""""""""""""""""""""""""""" 3539 3540A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3541``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)`` 3542in arbitrary precision arithmetic. 3543 3544 3545TGSI_SEMANTIC_SUBGROUP_LE_MASK 3546"""""""""""""""""""""""""""""" 3547 3548A bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3549``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic. 3550 3551 3552TGSI_SEMANTIC_SUBGROUP_LT_MASK 3553"""""""""""""""""""""""""""""" 3554 3555A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3556``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic. 3557 3558 3559TGSI_SEMANTIC_VIEWPORT_MASK 3560""""""""""""""""""""""""""" 3561 3562A bit mask of viewports to broadcast the current primitive to. See 3563:ext:`GL_NV_viewport_array2` for more details. 3564 3565 3566TGSI_SEMANTIC_TESS_DEFAULT_OUTER_LEVEL 3567"""""""""""""""""""""""""""""""""""""" 3568 3569A system value equal to the default_outer_level array set via set_tess_level. 3570 3571 3572TGSI_SEMANTIC_TESS_DEFAULT_INNER_LEVEL 3573"""""""""""""""""""""""""""""""""""""" 3574 3575A system value equal to the default_inner_level array set via set_tess_level. 3576 3577 3578Declaration Interpolate 3579^^^^^^^^^^^^^^^^^^^^^^^ 3580 3581This token is only valid for fragment shader INPUT declarations. 3582 3583The Interpolate field specifies the way input is being interpolated by 3584the rasterizer and is one of TGSI_INTERPOLATE_*. 3585 3586The Location field specifies the location inside the pixel that the 3587interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that 3588when per-sample shading is enabled, the implementation may choose to 3589interpolate at the sample irrespective of the Location field. 3590 3591 3592Declaration Sampler View 3593^^^^^^^^^^^^^^^^^^^^^^^^ 3594 3595Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW. 3596 3597DCL SVIEW[#], resource, type(s) 3598 3599Declares a shader input sampler view and assigns it to a SVIEW[#] 3600register. 3601 3602resource can be one of BUFFER, 1D, 2D, 3D, 1D_ARRAY and 2D_ARRAY. 3603 3604type must be 1 or 4 entries (if specifying on a per-component 3605level) out of UNORM, SNORM, SINT, UINT and FLOAT. 3606 3607For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes 3608which take an explicit SVIEW[#] source register), there may be optionally 3609SVIEW[#] declarations. In this case, the SVIEW index is implied by the 3610SAMP index, and there must be a corresponding SVIEW[#] declaration for 3611each SAMP[#] declaration. Drivers are free to ignore this if they wish. 3612But note in particular that some drivers need to know the sampler type 3613(float/int/unsigned) in order to generate the correct code, so cases 3614where integer textures are sampled, SVIEW[#] declarations should be 3615used. 3616 3617NOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes 3618in the same shader. 3619 3620Declaration Resource 3621^^^^^^^^^^^^^^^^^^^^ 3622 3623Follows Declaration token if file is TGSI_FILE_RESOURCE. 3624 3625DCL RES[#], resource [, WR] [, RAW] 3626 3627Declares a shader input resource and assigns it to a RES[#] 3628register. 3629 3630resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1D_ARRAY and 36312D_ARRAY. 3632 3633If the RAW keyword is not specified, the texture data will be 3634subject to conversion, swizzling and scaling as required to yield 3635the specified data type from the physical data format of the bound 3636resource. 3637 3638If the RAW keyword is specified, no channel conversion will be 3639performed: the values read for each of the channels (X,Y,Z,W) will 3640correspond to consecutive words in the same order and format 3641they're found in memory. No element-to-address conversion will be 3642performed either: the value of the provided X coordinate will be 3643interpreted in byte units instead of texel units. The result of 3644accessing a misaligned address is undefined. 3645 3646Usage of the STORE opcode is only allowed if the WR (writable) flag 3647is set. 3648 3649Hardware Atomic Register File 3650^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3651 3652Hardware atomics are declared as a 2D array with an optional array id. 3653 3654The first member of the dimension is the buffer resource the atomic 3655is located in. 3656The second member is a range into the buffer resource, either for 3657one or multiple counters. If this is an array, the declaration will have 3658an unique array id. 3659 3660Each counter is 4 bytes in size, and index and ranges are in counters not bytes. 3661DCL HWATOMIC[0][0] 3662DCL HWATOMIC[0][1] 3663 3664This declares two atomics, one at the start of the buffer and one in the 3665second 4 bytes. 3666 3667DCL HWATOMIC[0][0] 3668DCL HWATOMIC[1][0] 3669DCL HWATOMIC[1][1..3], ARRAY(1) 3670 3671This declares 5 atomics, one in buffer 0 at 0, 3672one in buffer 1 at 0, and an array of 3 atomics in 3673the buffer 1, starting at 1. 3674 3675Properties 3676^^^^^^^^^^^^^^^^^^^^^^^^ 3677 3678Properties are general directives that apply to the whole TGSI program. 3679 3680FS_COORD_ORIGIN 3681""""""""""""""" 3682 3683Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin. 3684The default value is UPPER_LEFT. 3685 3686If UPPER_LEFT, the position will be (0,0) at the upper left corner and 3687increase downward and rightward. 3688If LOWER_LEFT, the position will be (0,0) at the lower left corner and 3689increase upward and rightward. 3690 3691OpenGL defaults to LOWER_LEFT, and is configurable with the 3692:ext:`GL_ARB_fragment_coord_conventions` extension. 3693 3694DirectX 9/10 use UPPER_LEFT. 3695 3696FS_COORD_PIXEL_CENTER 3697""""""""""""""""""""" 3698 3699Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention. 3700The default value is HALF_INTEGER. 3701 3702If HALF_INTEGER, the fractional part of the position will be 0.5 3703If INTEGER, the fractional part of the position will be 0.0 3704 3705Note that this does not affect the set of fragments generated by 3706rasterization, which is instead controlled by half_pixel_center in the 3707rasterizer. 3708 3709OpenGL defaults to HALF_INTEGER, and is configurable with the 3710:ext:`GL_ARB_fragment_coord_conventions` extension. 3711 3712DirectX 9 uses INTEGER. 3713DirectX 10 uses HALF_INTEGER. 3714 3715FS_COLOR0_WRITES_ALL_CBUFS 3716"""""""""""""""""""""""""" 3717Specifies that writes to the fragment shader color 0 are replicated to all 3718bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where 3719fragData is directed to a single color buffer, but fragColor is broadcast. 3720 3721VS_PROHIBIT_UCPS 3722"""""""""""""""""""""""""" 3723If this property is set on the program bound to the shader stage before the 3724fragment shader, user clip planes should have no effect (be disabled) even if 3725that shader does not write to any clip distance outputs and the rasterizer's 3726clip_plane_enable is non-zero. 3727This property is only supported by drivers that also support shader clip 3728distance outputs. 3729This is useful for APIs that don't have UCPs and where clip distances written 3730by a shader cannot be disabled. 3731 3732GS_INVOCATIONS 3733"""""""""""""" 3734 3735Specifies the number of times a geometry shader should be executed for each 3736input primitive. Each invocation will have a different 3737TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to 3738be 1. 3739 3740VS_WINDOW_SPACE_POSITION 3741"""""""""""""""""""""""""" 3742If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output 3743is assumed to contain window space coordinates. 3744Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is 3745directly taken from the 4-th component of the shader output. 3746Naturally, clipping is not performed on window coordinates either. 3747The effect of this property is undefined if a geometry or tessellation shader 3748are in use. 3749 3750TCS_VERTICES_OUT 3751"""""""""""""""" 3752 3753The number of vertices written by the tessellation control shader. This 3754effectively defines the patch input size of the tessellation evaluation shader 3755as well. 3756 3757TES_PRIM_MODE 3758""""""""""""" 3759 3760This sets the tessellation primitive mode, one of ``MESA_PRIM_TRIANGLES``, 3761``MESA_PRIM_QUADS``, or ``MESA_PRIM_LINES``. (Unlike in GL, there is no 3762separate isolines settings, the regular lines is assumed to mean isolines.) 3763 3764TES_SPACING 3765""""""""""" 3766 3767This sets the spacing mode of the tessellation generator, one of 3768``PIPE_TESS_SPACING_*``. 3769 3770TES_VERTEX_ORDER_CW 3771""""""""""""""""""" 3772 3773This sets the vertex order to be clockwise if the value is 1, or 3774counter-clockwise if set to 0. 3775 3776TES_POINT_MODE 3777"""""""""""""" 3778 3779If set to a non-zero value, this turns on point mode for the tessellator, 3780which means that points will be generated instead of primitives. 3781 3782NUM_CLIPDIST_ENABLED 3783"""""""""""""""""""" 3784 3785How many clip distance scalar outputs are enabled. 3786 3787NUM_CULLDIST_ENABLED 3788"""""""""""""""""""" 3789 3790How many cull distance scalar outputs are enabled. 3791 3792FS_EARLY_DEPTH_STENCIL 3793"""""""""""""""""""""" 3794 3795Whether depth test, stencil test, and occlusion query should run before 3796the fragment shader (regardless of fragment shader side effects). Corresponds 3797to GLSL early_fragment_tests. 3798 3799NEXT_SHADER 3800""""""""""" 3801 3802Which shader stage will MOST LIKELY follow after this shader when the shader 3803is bound. This is only a hint to the driver and doesn't have to be precise. 3804Only set for VS and TES. 3805 3806CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH 3807""""""""""""""""""""""""""""""""""""" 3808 3809Threads per block in each dimension, if known at compile time. If the block size 3810is known all three should be at least 1. If it is unknown they should all be set 3811to 0 or not set. 3812 3813LEGACY_MATH_RULES 3814""""""""""""""""" 3815 3816The MUL TGSI operation (FP32 multiplication) will return 0 if either 3817of the operands are equal to 0. That means that 0 * Inf = 0. This 3818should be set the same way for an entire pipeline. Note that this 3819applies not only to the literal MUL TGSI opcode, but all FP32 3820multiplications implied by other operations, such as MAD, FMA, DP2, 3821DP3, DP4, DST, LOG, LRP, and possibly others. If there is a 3822mismatch between shaders, then it is unspecified whether this behavior 3823will be enabled. 3824 3825FS_POST_DEPTH_COVERAGE 3826"""""""""""""""""""""" 3827 3828When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples 3829that have failed the depth/stencil tests. This is only valid when 3830FS_EARLY_DEPTH_STENCIL is also specified. 3831 3832LAYER_VIEWPORT_RELATIVE 3833""""""""""""""""""""""" 3834 3835When enabled, the TGSI_SEMATNIC_LAYER output value is relative to the 3836current viewport. This is especially useful in conjunction with 3837TGSI_SEMANTIC_VIEWPORT_MASK. 3838 3839 3840Texture Sampling and Texture Formats 3841------------------------------------ 3842 3843This table shows how texture image components are returned as (x,y,z,w) tuples 3844by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and 3845:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as 3846well. 3847 3848+--------------------+--------------+--------------------+--------------+ 3849| Texture Components | Gallium | OpenGL | Direct3D 9 | 3850+====================+==============+====================+==============+ 3851| R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) | 3852+--------------------+--------------+--------------------+--------------+ 3853| RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) | 3854+--------------------+--------------+--------------------+--------------+ 3855| RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) | 3856+--------------------+--------------+--------------------+--------------+ 3857| RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) | 3858+--------------------+--------------+--------------------+--------------+ 3859| A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) | 3860+--------------------+--------------+--------------------+--------------+ 3861| L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) | 3862+--------------------+--------------+--------------------+--------------+ 3863| LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) | 3864+--------------------+--------------+--------------------+--------------+ 3865| I | (i, i, i, i) | (i, i, i, i) | N/A | 3866+--------------------+--------------+--------------------+--------------+ 3867| UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) | 3868| | | [#envmap-bumpmap]_ | | 3869+--------------------+--------------+--------------------+--------------+ 3870| Z | (z, z, z, z) | (z, z, z, 1) | (0, z, 0, 1) | 3871| | | [#depth-tex-mode]_ | | 3872+--------------------+--------------+--------------------+--------------+ 3873| S | (s, s, s, s) | unknown | unknown | 3874+--------------------+--------------+--------------------+--------------+ 3875 3876.. [#envmap-bumpmap] https://registry.khronos.org/OpenGL/extensions/ATI/ATI_envmap_bumpmap.txt 3877.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z) 3878 or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE. 3879