xref: /aosp_15_r20/external/mesa3d/docs/gallium/tgsi.rst (revision 6104692788411f58d303aa86923a9ff6ecaded22)
1TGSI
2====
3
4TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5for describing shaders. Since Gallium is inherently shaderful, shaders are
6an important part of the API. TGSI is the only intermediate representation
7used by all drivers.
8
9Basics
10------
11
12All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13floating-point four-component vectors. An opcode may have up to one
14destination register, known as *dst*, and between zero and three source
15registers, called *src0* through *src2*, or simply *src* if there is only
16one.
17
18Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19components as integers. Other instructions permit using registers as
20two-component vectors with double precision; see :ref:`doubleopcodes`.
21
22When an instruction has a scalar result, the result is usually copied into
23each of the components of *dst*. When this happens, the result is said to be
24*replicated* to *dst*. :opcode:`RCP` is one such instruction.
25
26Source Modifiers
27^^^^^^^^^^^^^^^^
28
29TGSI supports 32-bit negate and absolute value modifiers on floating-point
30inputs, and 32-bit integer negates on some drivers.  The negate applies after
31absolute value if both are present.
32
33The type of an input can be found by ``tgsi_opcode_infer_src_type()``, and
34TGSI_OPCODE_MOV and the second and third operands of TGSI_OPCODE_UCMP (which
35return TGSI_TYPE_UNTYPED) are also considered floats for the purpose of source
36modifiers.
37
38
39Other Modifiers
40^^^^^^^^^^^^^^^
41
42The saturate modifier clamps 32-bit destination stores to [0.0, 1.0].
43
44For arithmetic instruction having a precise modifier certain optimizations
45which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
46optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
47MAD instruction.
48
49Instruction Set
50---------------
51
52Core ISA
53^^^^^^^^^^^^^^^^^^^^^^^^^
54
55These opcodes are guaranteed to be available regardless of the driver being
56used.
57
58.. opcode:: ARL - Address Register Load
59
60   .. math::
61
62      dst.x = (int) \lfloor src.x\rfloor
63
64      dst.y = (int) \lfloor src.y\rfloor
65
66      dst.z = (int) \lfloor src.z\rfloor
67
68      dst.w = (int) \lfloor src.w\rfloor
69
70
71.. opcode:: MOV - Move
72
73   .. math::
74
75      dst.x = src.x
76
77      dst.y = src.y
78
79      dst.z = src.z
80
81      dst.w = src.w
82
83
84.. opcode:: LIT - Light Coefficients
85
86   .. math::
87
88      dst.x = 1
89
90      dst.y = max(src.x, 0)
91
92      dst.z =
93      \left\{
94      \begin{array}{ c l }
95         max(src.y, 0)^{clamp(src.w, -128, 128)} & \quad \textrm{if } src.x \gt 0 \\
96         0                                       & \quad \textrm{otherwise}
97      \end{array}
98      \right.
99
100      dst.w = 1
101
102
103.. opcode:: RCP - Reciprocal
104
105   This instruction replicates its result.
106
107   .. math::
108
109      dst = \frac{1}{src.x}
110
111
112.. opcode:: RSQ - Reciprocal Square Root
113
114   This instruction replicates its result. The results are undefined for *src* <= 0.
115
116   .. math::
117
118      dst = \frac{1}{\sqrt{src.x}}
119
120
121.. opcode:: SQRT - Square Root
122
123   This instruction replicates its result. The results are undefined for *src* < 0.
124
125   .. math::
126
127      dst = {\sqrt{src.x}}
128
129
130.. opcode:: EXP - Approximate Exponential Base 2
131
132   .. math::
133
134      dst.x &= 2^{\lfloor src.x\rfloor} \\
135      dst.y &= src.x - \lfloor src.x\rfloor \\
136      dst.z &= 2^{src.x} \\
137      dst.w &= 1
138
139
140.. opcode:: LOG - Approximate Logarithm Base 2
141
142   .. math::
143
144      dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
145      dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
146      dst.z &= \log_2{|src.x|} \\
147      dst.w &= 1
148
149
150.. opcode:: MUL - Multiply
151
152   .. math::
153
154      dst.x = src0.x \times src1.x
155
156      dst.y = src0.y \times src1.y
157
158      dst.z = src0.z \times src1.z
159
160      dst.w = src0.w \times src1.w
161
162
163.. opcode:: ADD - Add
164
165   .. math::
166
167      dst.x = src0.x + src1.x
168
169      dst.y = src0.y + src1.y
170
171      dst.z = src0.z + src1.z
172
173      dst.w = src0.w + src1.w
174
175
176.. opcode:: DP3 - 3-component Dot Product
177
178   This instruction replicates its result.
179
180   .. math::
181
182      \begin{aligned}
183      dst = & src0.x \times src1.x +\\
184            & src0.y \times src1.y +\\
185            & src0.z \times src1.z
186      \end{aligned}
187
188
189.. opcode:: DP4 - 4-component Dot Product
190
191   This instruction replicates its result.
192
193   .. math::
194
195      \begin{aligned}
196      dst = & src0.x \times src1.x +\\
197            & src0.y \times src1.y +\\
198            & src0.z \times src1.z +\\
199            & src0.w \times src1.w
200      \end{aligned}
201
202
203.. opcode:: DST - Distance Vector
204
205   .. math::
206
207      dst.x &= 1\\
208      dst.y &= src0.y \times src1.y\\
209      dst.z &= src0.z\\
210      dst.w &= src1.w
211
212
213.. opcode:: MIN - Minimum
214
215   .. math::
216
217      dst.x = min(src0.x, src1.x)
218
219      dst.y = min(src0.y, src1.y)
220
221      dst.z = min(src0.z, src1.z)
222
223      dst.w = min(src0.w, src1.w)
224
225
226.. opcode:: MAX - Maximum
227
228   .. math::
229
230      dst.x = max(src0.x, src1.x)
231
232      dst.y = max(src0.y, src1.y)
233
234      dst.z = max(src0.z, src1.z)
235
236      dst.w = max(src0.w, src1.w)
237
238
239.. opcode:: SLT - Set On Less Than
240
241   .. math::
242
243      dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
244
245      dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
246
247      dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
248
249      dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
250
251
252.. opcode:: SGE - Set On Greater Equal Than
253
254   .. math::
255
256      dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
257
258      dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
259
260      dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
261
262      dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
263
264
265.. opcode:: MAD - Multiply And Add
266
267   Perform a * b + c. The implementation is free to decide whether there is an
268   intermediate rounding step or not.
269
270   .. math::
271
272      dst.x = src0.x \times src1.x + src2.x
273
274      dst.y = src0.y \times src1.y + src2.y
275
276      dst.z = src0.z \times src1.z + src2.z
277
278      dst.w = src0.w \times src1.w + src2.w
279
280
281.. opcode:: LRP - Linear Interpolate
282
283   .. math::
284
285      dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
286
287      dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
288
289      dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
290
291      dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
292
293
294.. opcode:: FMA - Fused Multiply-Add
295
296   Perform a * b + c with no intermediate rounding step.
297
298   .. math::
299
300      dst.x = src0.x \times src1.x + src2.x
301
302      dst.y = src0.y \times src1.y + src2.y
303
304      dst.z = src0.z \times src1.z + src2.z
305
306      dst.w = src0.w \times src1.w + src2.w
307
308
309.. opcode:: FRC - Fraction
310
311   .. math::
312
313      dst.x = src.x - \lfloor src.x\rfloor
314
315      dst.y = src.y - \lfloor src.y\rfloor
316
317      dst.z = src.z - \lfloor src.z\rfloor
318
319      dst.w = src.w - \lfloor src.w\rfloor
320
321
322.. opcode:: FLR - Floor
323
324   .. math::
325
326      dst.x = \lfloor src.x\rfloor
327
328      dst.y = \lfloor src.y\rfloor
329
330      dst.z = \lfloor src.z\rfloor
331
332      dst.w = \lfloor src.w\rfloor
333
334
335.. opcode:: ROUND - Round
336
337   .. math::
338
339      dst.x = round(src.x)
340
341      dst.y = round(src.y)
342
343      dst.z = round(src.z)
344
345      dst.w = round(src.w)
346
347
348.. opcode:: EX2 - Exponential Base 2
349
350   This instruction replicates its result.
351
352   .. math::
353
354      dst = 2^{src.x}
355
356
357.. opcode:: LG2 - Logarithm Base 2
358
359   This instruction replicates its result.
360
361   .. math::
362
363      dst = \log_2{src.x}
364
365
366.. opcode:: POW - Power
367
368   This instruction replicates its result.
369
370   .. math::
371
372      dst = src0.x^{src1.x}
373
374
375.. opcode:: LDEXP - Multiply Number by Integral Power of 2
376
377   *src1* is an integer.
378
379   .. math::
380
381      dst.x = src0.x * 2^{src1.x}
382
383      dst.y = src0.y * 2^{src1.y}
384
385      dst.z = src0.z * 2^{src1.z}
386
387      dst.w = src0.w * 2^{src1.w}
388
389
390.. opcode:: COS - Cosine
391
392   This instruction replicates its result.
393
394   .. math::
395
396      dst = \cos{src.x}
397
398
399.. opcode:: DDX, DDX_FINE - Derivative Relative To X
400
401   The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is
402   advertised. When it is, the fine version guarantees one derivative per
403   row while DDX is allowed to be the same for the entire 2x2 quad.
404
405   .. math::
406
407      dst.x = partialx(src.x)
408
409      dst.y = partialx(src.y)
410
411      dst.z = partialx(src.z)
412
413      dst.w = partialx(src.w)
414
415
416.. opcode:: DDY, DDY_FINE - Derivative Relative To Y
417
418   The fine variant is only used when ``PIPE_CAP_FS_FINE_DERIVATIVE`` is
419   advertised. When it is, the fine version guarantees one derivative per
420   column while DDY is allowed to be the same for the entire 2x2 quad.
421
422   .. math::
423
424      dst.x = partialy(src.x)
425
426      dst.y = partialy(src.y)
427
428      dst.z = partialy(src.z)
429
430      dst.w = partialy(src.w)
431
432
433.. opcode:: PK2H - Pack Two 16-bit Floats
434
435   This instruction replicates its result.
436
437   .. math::
438
439      \begin{aligned}
440      dst = & f32\_to\_f16(src.x) | \\
441          ( & f32\_to\_f16(src.y) \ll 16)
442      \end{aligned}
443
444.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
445
446   This instruction replicates its result.
447
448   .. math::
449
450      \begin{aligned}
451      dst = & f32\_to\_unorm16(src.x) | \\
452          ( & f32\_to\_unorm16(src.y) \ll 16)
453      \end{aligned}
454
455
456.. opcode:: PK4B - Pack Four Signed 8-bit Scalars
457
458   This instruction replicates its result.
459
460   .. math::
461
462      \begin{aligned}
463      dst = & f32\_to\_snorm8(src.x) | \\
464          ( & f32\_to\_snorm8(src.y) \ll 8) | \\
465          ( & f32\_to\_snorm8(src.z) \ll 16) | \\
466          ( & f32\_to\_snorm8(src.w) \ll 24)
467      \end{aligned}
468
469
470.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
471
472   This instruction replicates its result.
473
474   .. math::
475
476      \begin{aligned}
477      dst = & f32\_to\_unorm8(src.x) | \\
478          ( & f32\_to\_unorm8(src.y) \ll 8) | \\
479          ( & f32\_to\_unorm8(src.z) \ll 16) | \\
480          ( & f32\_to\_unorm8(src.w) \ll 24)
481      \end{aligned}
482
483
484.. opcode:: SEQ - Set On Equal
485
486   .. math::
487
488      dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
489
490      dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
491
492      dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
493
494      dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
495
496
497.. opcode:: SGT - Set On Greater Than
498
499   .. math::
500
501      dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
502
503      dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
504
505      dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
506
507      dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
508
509
510.. opcode:: SIN - Sine
511
512   This instruction replicates its result.
513
514   .. math::
515
516      dst = \sin{src.x}
517
518
519.. opcode:: SLE - Set On Less Equal Than
520
521   .. math::
522
523      dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
524
525      dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
526
527      dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
528
529      dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
530
531
532.. opcode:: SNE - Set On Not Equal
533
534   .. math::
535
536      dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
537
538      dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
539
540      dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
541
542      dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
543
544
545.. opcode:: TEX - Texture Lookup
546
547   for array textures *src0.y* contains the slice for 1D,
548   and *src0.z* contain the slice for 2D.
549
550   for shadow textures with no arrays (and not cube map),
551   *src0.z* contains the reference value.
552
553   for shadow textures with arrays, *src0.z* contains
554   the reference value for 1D arrays, and *src0.w* contains
555   the reference value for 2D arrays and cube maps.
556
557   for cube map array shadow textures, the reference value
558   cannot be passed in *src0.w*, and TEX2 must be used instead.
559
560   .. math::
561
562      coord = src0
563
564      shadow\_ref = src0.z \textrm{ or } src0.w \textrm{ (optional)}
565
566      unit = src1
567
568      dst = texture\_sample(unit, coord, shadow\_ref)
569
570
571.. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)
572
573   this is the same as TEX, but uses another reg to encode the
574   reference value.
575
576   .. math::
577
578      coord = src0
579
580      shadow\_ref = src1.x
581
582      unit = src2
583
584      dst = texture\_sample(unit, coord, shadow\_ref)
585
586
587.. opcode:: TXD - Texture Lookup with Derivatives
588
589   .. math::
590
591      coord = src0
592
593      ddx = src1
594
595      ddy = src2
596
597      unit = src3
598
599      dst = texture\_sample\_deriv(unit, coord, ddx, ddy)
600
601
602.. opcode:: TXP - Projective Texture Lookup
603
604   .. math::
605
606      coord.x = src0.x / src0.w
607
608      coord.y = src0.y / src0.w
609
610      coord.z = src0.z / src0.w
611
612      coord.w = src0.w
613
614      unit = src1
615
616      dst = texture\_sample(unit, coord)
617
618
619.. opcode:: UP2H - Unpack Two 16-Bit Floats
620
621   .. math::
622
623      dst.x = f16\_to\_f32(src0.x \& 0xffff)
624
625      dst.y = f16\_to\_f32(src0.x \gg 16)
626
627      dst.z = f16\_to\_f32(src0.x \& 0xffff)
628
629      dst.w = f16\_to\_f32(src0.x \gg 16)
630
631   .. note::
632
633      Considered for removal.
634
635.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
636
637   TBD
638
639   .. note::
640
641      Considered for removal.
642
643.. opcode:: UP4B - Unpack Four Signed 8-Bit Values
644
645   TBD
646
647   .. note::
648
649      Considered for removal.
650
651.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
652
653   TBD
654
655   .. note::
656
657      Considered for removal.
658
659
660.. opcode:: ARR - Address Register Load With Round
661
662   .. math::
663
664      dst.x = (int) round(src.x)
665
666      dst.y = (int) round(src.y)
667
668      dst.z = (int) round(src.z)
669
670      dst.w = (int) round(src.w)
671
672
673.. opcode:: SSG - Set Sign
674
675   .. math::
676
677      dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
678
679      dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
680
681      dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
682
683      dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
684
685
686.. opcode:: CMP - Compare
687
688   .. math::
689
690      dst.x = (src0.x < 0) ? src1.x : src2.x
691
692      dst.y = (src0.y < 0) ? src1.y : src2.y
693
694      dst.z = (src0.z < 0) ? src1.z : src2.z
695
696      dst.w = (src0.w < 0) ? src1.w : src2.w
697
698
699.. opcode:: KILL_IF - Conditional Discard
700
701   Conditional discard.  Allowed in fragment shaders only.
702
703   Pseudocode::
704
705      if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
706         discard
707      endif
708
709
710.. opcode:: KILL - Discard
711
712   Unconditional discard.  Allowed in fragment shaders only.
713
714
715.. opcode:: DEMOTE - Demote Invocation to a Helper
716
717   This demotes the current invocation to a helper, but continues
718   execution (while KILL may or may not terminate the
719   invocation). After this runs, all the usual helper invocation rules
720   apply about discarding buffer and render target writes. This is
721   useful for having accurate derivatives in the other invocations
722   which have not been demoted.
723
724   Allowed in fragment shaders only.
725
726
727.. opcode:: READ_HELPER - Reads Invocation Helper Status
728
729   This is identical to ``TGSI_SEMANTIC_HELPER_INVOCATION``, except
730   this will read the current value, which might change as a result of
731   a ``DEMOTE`` instruction.
732
733   Allowed in fragment shaders only.
734
735
736.. opcode:: TXB - Texture Lookup With Bias
737
738   for cube map array textures and shadow cube maps, the bias value
739   cannot be passed in *src0.w*, and TXB2 must be used instead.
740
741   if the target is a shadow texture, the reference value is always
742   in *src.z* (this prevents shadow 3d and shadow 2d arrays from
743   using this instruction, but this is not needed).
744
745   .. math::
746
747      coord.x = src0.x
748
749      coord.y = src0.y
750
751      coord.z = src0.z
752
753      coord.w = none
754
755      bias = src0.w
756
757      unit = src1
758
759      dst = texture\_sample(unit, coord, bias)
760
761
762.. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)
763
764   this is the same as TXB, but uses another reg to encode the
765   LOD bias value for cube map arrays and shadow cube maps.
766   Presumably shadow 2d arrays and shadow 3d targets could use
767   this encoding too, but this is not legal.
768
769   if the target is a shadow cube map array, the reference value is in
770   *src1.y*.
771
772   .. math::
773
774      coord = src0
775
776      bias = src1.x
777
778      unit = src2
779
780      dst = texture\_sample(unit, coord, bias)
781
782
783.. opcode:: DIV - Divide
784
785   .. math::
786
787      dst.x = \frac{src0.x}{src1.x}
788
789      dst.y = \frac{src0.y}{src1.y}
790
791      dst.z = \frac{src0.z}{src1.z}
792
793      dst.w = \frac{src0.w}{src1.w}
794
795
796.. opcode:: DP2 - 2-component Dot Product
797
798   This instruction replicates its result.
799
800   .. math::
801
802      \begin{aligned}
803      dst = & src0.x \times src1.x + \\
804            & src0.y \times src1.y
805      \end{aligned}
806
807.. opcode:: TEX_LZ - Texture Lookup With LOD = 0
808
809   This is the same as TXL with LOD = 0. Like every texture opcode, it obeys
810   pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod.
811   There is no way to override those two in shaders.
812
813   .. math::
814
815      coord.x = src0.x
816
817      coord.y = src0.y
818
819      coord.z = src0.z
820
821      coord.w = none
822
823      lod = 0
824
825      unit = src1
826
827      dst = texture\_sample(unit, coord, lod)
828
829
830.. opcode:: TXL - Texture Lookup With explicit LOD
831
832   for cube map array textures, the explicit LOD value
833   cannot be passed in *src0.w*, and TXL2 must be used instead.
834
835   if the target is a shadow texture, the reference value is always
836   in *src.z* (this prevents shadow 3d / 2d array / cube targets from
837   using this instruction, but this is not needed).
838
839   .. math::
840
841      coord.x = src0.x
842
843      coord.y = src0.y
844
845      coord.z = src0.z
846
847      coord.w = none
848
849      lod = src0.w
850
851      unit = src1
852
853      dst = texture\_sample(unit, coord, lod)
854
855
856.. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only)
857
858   this is the same as TXL, but uses another reg to encode the
859   explicit LOD value.
860   Presumably shadow 3d / 2d array / cube targets could use
861   this encoding too, but this is not legal.
862
863   if the target is a shadow cube map array, the reference value is in
864   *src1.y*.
865
866   .. math::
867
868      coord = src0
869
870      lod = src1.x
871
872      unit = src2
873
874      dst = texture\_sample(unit, coord, lod)
875
876
877Compute ISA
878^^^^^^^^^^^^^^^^^^^^^^^^
879
880These opcodes are primarily provided for special-use computational shaders.
881Support for these opcodes indicated by a special pipe capability bit (TBD).
882
883XXX doesn't look like most of the opcodes really belong here.
884
885.. opcode:: CEIL - Ceiling
886
887   .. math::
888
889      dst.x = \lceil src.x\rceil
890
891      dst.y = \lceil src.y\rceil
892
893      dst.z = \lceil src.z\rceil
894
895      dst.w = \lceil src.w\rceil
896
897
898.. opcode:: TRUNC - Truncate
899
900   .. math::
901
902      dst.x = trunc(src.x)
903
904      dst.y = trunc(src.y)
905
906      dst.z = trunc(src.z)
907
908      dst.w = trunc(src.w)
909
910
911.. opcode:: MOD - Modulus
912
913   .. math::
914
915      dst.x = src0.x \bmod src1.x
916
917      dst.y = src0.y \bmod src1.y
918
919      dst.z = src0.z \bmod src1.z
920
921      dst.w = src0.w \bmod src1.w
922
923
924.. opcode:: UARL - Integer Address Register Load
925
926   Moves the contents of the source register, assumed to be an integer, into the
927   destination register, which is assumed to be an address (ADDR) register.
928
929
930.. opcode:: TXF - Texel Fetch
931
932   As per :ext:`GL_NV_gpu_program4`, extract a single texel from a specified
933   texture image or PIPE_BUFFER resource. The source sampler may not be a
934   CUBE or SHADOW.  *src0* is a
935   four-component signed integer vector used to identify the single texel
936   accessed. 3 components + level.  If the texture is multisampled, then
937   the fourth component indicates the sample, not the mipmap level.
938   Just like texture instructions, an optional
939   offset vector is provided, which is subject to various driver restrictions
940   (regarding range, source of offsets). This instruction ignores the sampler
941   state.
942
943   TXF(uint_vec coord, int_vec offset).
944
945
946.. opcode:: TXQ - Texture Size Query
947
948   As per :ext:`GL_NV_gpu_program4`, retrieve the dimensions of the texture
949   depending on   the target. For 1D (width), 2D/RECT/CUBE (width, height),
950   3D (width, height, depth), 1D array (width, layers), 2D array (width,
951   height, layers).  Also return the number of accessible levels
952   (last_level - first_level + 1) in W.
953
954   For components which don't return a resource dimension, their value
955   is undefined.
956
957   .. math::
958
959      lod = src0.x
960
961      dst.x = texture\_width(unit, lod)
962
963      dst.y = texture\_height(unit, lod)
964
965      dst.z = texture\_depth(unit, lod)
966
967      dst.w = texture\_levels(unit)
968
969
970.. opcode:: TXQS - Texture Samples Query
971
972   This retrieves the number of samples in the texture, and stores it
973   into the x component as an unsigned integer. The other components are
974   undefined.  If the texture is not multisampled, this function returns
975   (1, undef, undef, undef).
976
977   .. math::
978
979      dst.x = texture\_samples(unit)
980
981
982.. opcode:: TG4 - Texture Gather
983
984   As per :ext:`GL_ARB_texture_gather`, gathers the four texels to be used in a
985   bi-linear   filtering operation and packs them into a single register.
986   Only works with 2D, 2D array, cubemaps, and cubemaps arrays.  For 2D
987   textures, only the addressing modes of the sampler and the top level of any
988   mip pyramid are used. Set W to zero.  It behaves like the TEX instruction,
989   but a filtered sample is not generated. The four samples that contribute to
990   filtering are placed into XYZW in clockwise order, starting with the (u,v)
991   texture coordinate delta at the following locations (-, +), (+, +), (+, -),
992   (-, -), where the magnitude of the deltas are half a texel.
993
994   PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
995   depth compares, single component selection, and a non-constant offset. It
996   doesn't allow support for the GL independent offset to get i0,j0. This would
997   require another CAP is HW can do it natively. For now we lower that before
998   TGSI.
999
1000   PIPE_CAP_TGSI_TG4_COMPONENT_IN_SWIZZLE changes the encoding so that component
1001   is stored in the sampler source swizzle x.
1002
1003   (without TGSI_TG4_COMPONENT_IN_SWIZZLE)
1004
1005   .. math::
1006
1007      coord = src0
1008
1009      component = src1
1010
1011      dst = texture\_gather4 (unit, coord, component)
1012
1013   (with TGSI_TG4_COMPONENT_IN_SWIZZLE)
1014
1015   .. math::
1016
1017      coord = src0
1018
1019      dst = texture\_gather4 (unit, coord)
1020
1021      \text{component is encoded in sampler swizzle.}
1022
1023   (with SM5 - cube array shadow)
1024
1025   .. math::
1026
1027      coord = src0
1028
1029      compare = src1
1030
1031      dst = texture\_gather (uint, coord, compare)
1032
1033.. opcode:: LODQ - level of detail query
1034
1035   Compute the LOD information that the texture pipe would use to access the
1036   texture. The Y component contains the computed LOD lambda_prime. The X
1037   component contains the LOD that will be accessed, based on min/max LODs
1038   and mipmap filters.
1039
1040   .. math::
1041
1042      coord = src0
1043
1044      dst.xy = lodq(uint, coord);
1045
1046.. opcode:: CLOCK - retrieve the current shader time
1047
1048   Invoking this instruction multiple times in the same shader should
1049   cause monotonically increasing values to be returned. The values
1050   are implicitly 64-bit, so if fewer than 64 bits of precision are
1051   available, to provide expected wraparound semantics, the value
1052   should be shifted up so that the most significant bit of the time
1053   is the most significant bit of the 64-bit value.
1054
1055   .. math::
1056
1057      dst.xy = clock()
1058
1059
1060Integer ISA
1061^^^^^^^^^^^^^^^^^^^^^^^^
1062These opcodes are used for integer operations.
1063Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
1064
1065
1066.. opcode:: I2F - Signed Integer To Float
1067
1068   Rounding is unspecified (round to nearest even suggested).
1069
1070   .. math::
1071
1072      dst.x = (float) src.x
1073
1074      dst.y = (float) src.y
1075
1076      dst.z = (float) src.z
1077
1078      dst.w = (float) src.w
1079
1080
1081.. opcode:: U2F - Unsigned Integer To Float
1082
1083   Rounding is unspecified (round to nearest even suggested).
1084
1085   .. math::
1086
1087      dst.x = (float) src.x
1088
1089      dst.y = (float) src.y
1090
1091      dst.z = (float) src.z
1092
1093      dst.w = (float) src.w
1094
1095
1096.. opcode:: F2I - Float to Signed Integer
1097
1098   Rounding is towards zero (truncate).
1099   Values outside signed range (including NaNs) produce undefined results.
1100
1101   .. math::
1102
1103      dst.x = (int) src.x
1104
1105      dst.y = (int) src.y
1106
1107      dst.z = (int) src.z
1108
1109      dst.w = (int) src.w
1110
1111
1112.. opcode:: F2U - Float to Unsigned Integer
1113
1114   Rounding is towards zero (truncate).
1115   Values outside unsigned range (including NaNs) produce undefined results.
1116
1117   .. math::
1118
1119      dst.x = (unsigned) src.x
1120
1121      dst.y = (unsigned) src.y
1122
1123      dst.z = (unsigned) src.z
1124
1125      dst.w = (unsigned) src.w
1126
1127
1128.. opcode:: UADD - Integer Add
1129
1130   This instruction works the same for signed and unsigned integers.
1131   The low 32bit of the result is returned.
1132
1133   .. math::
1134
1135      dst.x = src0.x + src1.x
1136
1137      dst.y = src0.y + src1.y
1138
1139      dst.z = src0.z + src1.z
1140
1141      dst.w = src0.w + src1.w
1142
1143
1144.. opcode:: UMAD - Integer Multiply And Add
1145
1146   This instruction works the same for signed and unsigned integers.
1147   The multiplication returns the low 32bit (as does the result itself).
1148
1149   .. math::
1150
1151      dst.x = src0.x \times src1.x + src2.x
1152
1153      dst.y = src0.y \times src1.y + src2.y
1154
1155      dst.z = src0.z \times src1.z + src2.z
1156
1157      dst.w = src0.w \times src1.w + src2.w
1158
1159
1160.. opcode:: UMUL - Integer Multiply
1161
1162   This instruction works the same for signed and unsigned integers.
1163   The low 32bit of the result is returned.
1164
1165   .. math::
1166
1167      dst.x = src0.x \times src1.x
1168
1169      dst.y = src0.y \times src1.y
1170
1171      dst.z = src0.z \times src1.z
1172
1173      dst.w = src0.w \times src1.w
1174
1175
1176.. opcode:: IMUL_HI - Signed Integer Multiply High Bits
1177
1178   The high 32bits of the multiplication of 2 signed integers are returned.
1179
1180   .. math::
1181
1182      dst.x = (src0.x \times src1.x) \gg 32
1183
1184      dst.y = (src0.y \times src1.y) \gg 32
1185
1186      dst.z = (src0.z \times src1.z) \gg 32
1187
1188      dst.w = (src0.w \times src1.w) \gg 32
1189
1190
1191.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
1192
1193   The high 32bits of the multiplication of 2 unsigned integers are returned.
1194
1195   .. math::
1196
1197      dst.x = (src0.x \times src1.x) \gg 32
1198
1199      dst.y = (src0.y \times src1.y) \gg 32
1200
1201      dst.z = (src0.z \times src1.z) \gg 32
1202
1203      dst.w = (src0.w \times src1.w) \gg 32
1204
1205
1206.. opcode:: IDIV - Signed Integer Division
1207
1208   TBD: behavior for division by zero.
1209
1210   .. math::
1211
1212      dst.x = \frac{src0.x}{src1.x}
1213
1214      dst.y = \frac{src0.y}{src1.y}
1215
1216      dst.z = \frac{src0.z}{src1.z}
1217
1218      dst.w = \frac{src0.w}{src1.w}
1219
1220
1221.. opcode:: UDIV - Unsigned Integer Division
1222
1223   For division by zero, ``0xffffffff`` is returned.
1224
1225   .. math::
1226
1227      dst.x = \frac{src0.x}{src1.x}
1228
1229      dst.y = \frac{src0.y}{src1.y}
1230
1231      dst.z = \frac{src0.z}{src1.z}
1232
1233      dst.w = \frac{src0.w}{src1.w}
1234
1235
1236.. opcode:: UMOD - Unsigned Integer Remainder
1237
1238   If *src1* is zero, ``0xffffffff`` is returned.
1239
1240   .. math::
1241
1242      dst.x = src0.x \bmod src1.x
1243
1244      dst.y = src0.y \bmod src1.y
1245
1246      dst.z = src0.z \bmod src1.z
1247
1248      dst.w = src0.w \bmod src1.w
1249
1250
1251.. opcode:: NOT - Bitwise Not
1252
1253   .. math::
1254
1255      dst.x = \sim src.x
1256
1257      dst.y = \sim src.y
1258
1259      dst.z = \sim src.z
1260
1261      dst.w = \sim src.w
1262
1263
1264.. opcode:: AND - Bitwise And
1265
1266   .. math::
1267
1268      dst.x = src0.x \& src1.x
1269
1270      dst.y = src0.y \& src1.y
1271
1272      dst.z = src0.z \& src1.z
1273
1274      dst.w = src0.w \& src1.w
1275
1276
1277.. opcode:: OR - Bitwise Or
1278
1279   .. math::
1280
1281      dst.x = src0.x | src1.x
1282
1283      dst.y = src0.y | src1.y
1284
1285      dst.z = src0.z | src1.z
1286
1287      dst.w = src0.w | src1.w
1288
1289
1290.. opcode:: XOR - Bitwise Xor
1291
1292   .. math::
1293
1294      dst.x = src0.x \oplus src1.x
1295
1296      dst.y = src0.y \oplus src1.y
1297
1298      dst.z = src0.z \oplus src1.z
1299
1300      dst.w = src0.w \oplus src1.w
1301
1302
1303.. opcode:: IMAX - Maximum of Signed Integers
1304
1305   .. math::
1306
1307      dst.x = max(src0.x, src1.x)
1308
1309      dst.y = max(src0.y, src1.y)
1310
1311      dst.z = max(src0.z, src1.z)
1312
1313      dst.w = max(src0.w, src1.w)
1314
1315
1316.. opcode:: UMAX - Maximum of Unsigned Integers
1317
1318   .. math::
1319
1320      dst.x = max(src0.x, src1.x)
1321
1322      dst.y = max(src0.y, src1.y)
1323
1324      dst.z = max(src0.z, src1.z)
1325
1326      dst.w = max(src0.w, src1.w)
1327
1328
1329.. opcode:: IMIN - Minimum of Signed Integers
1330
1331   .. math::
1332
1333      dst.x = min(src0.x, src1.x)
1334
1335      dst.y = min(src0.y, src1.y)
1336
1337      dst.z = min(src0.z, src1.z)
1338
1339      dst.w = min(src0.w, src1.w)
1340
1341
1342.. opcode:: UMIN - Minimum of Unsigned Integers
1343
1344   .. math::
1345
1346      dst.x = min(src0.x, src1.x)
1347
1348      dst.y = min(src0.y, src1.y)
1349
1350      dst.z = min(src0.z, src1.z)
1351
1352      dst.w = min(src0.w, src1.w)
1353
1354
1355.. opcode:: SHL - Shift Left
1356
1357   The shift count is masked with ``0x1f`` before the shift is applied.
1358
1359   .. math::
1360
1361      dst.x = src0.x \ll (0x1f \& src1.x)
1362
1363      dst.y = src0.y \ll (0x1f \& src1.y)
1364
1365      dst.z = src0.z \ll (0x1f \& src1.z)
1366
1367      dst.w = src0.w \ll (0x1f \& src1.w)
1368
1369
1370.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
1371
1372   The shift count is masked with ``0x1f`` before the shift is applied.
1373
1374   .. math::
1375
1376      dst.x = src0.x \gg (0x1f \& src1.x)
1377
1378      dst.y = src0.y \gg (0x1f \& src1.y)
1379
1380      dst.z = src0.z \gg (0x1f \& src1.z)
1381
1382      dst.w = src0.w \gg (0x1f \& src1.w)
1383
1384
1385.. opcode:: USHR - Logical Shift Right
1386
1387   The shift count is masked with ``0x1f`` before the shift is applied.
1388
1389   .. math::
1390
1391      dst.x = src0.x \gg (unsigned) (0x1f \& src1.x)
1392
1393      dst.y = src0.y \gg (unsigned) (0x1f \& src1.y)
1394
1395      dst.z = src0.z \gg (unsigned) (0x1f \& src1.z)
1396
1397      dst.w = src0.w \gg (unsigned) (0x1f \& src1.w)
1398
1399
1400.. opcode:: UCMP - Integer Conditional Move
1401
1402   .. math::
1403
1404      dst.x = src0.x ? src1.x : src2.x
1405
1406      dst.y = src0.y ? src1.y : src2.y
1407
1408      dst.z = src0.z ? src1.z : src2.z
1409
1410      dst.w = src0.w ? src1.w : src2.w
1411
1412
1413
1414.. opcode:: ISSG - Integer Set Sign
1415
1416   .. math::
1417
1418      dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
1419
1420      dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
1421
1422      dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
1423
1424      dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
1425
1426
1427
1428.. opcode:: FSLT - Float Set On Less Than (ordered)
1429
1430   Same comparison as SLT but returns integer instead of 1.0/0.0 float
1431
1432   .. math::
1433
1434      dst.x = (src0.x < src1.x) ? \sim 0 : 0
1435
1436      dst.y = (src0.y < src1.y) ? \sim 0 : 0
1437
1438      dst.z = (src0.z < src1.z) ? \sim 0 : 0
1439
1440      dst.w = (src0.w < src1.w) ? \sim 0 : 0
1441
1442
1443.. opcode:: ISLT - Signed Integer Set On Less Than
1444
1445   .. math::
1446
1447      dst.x = (src0.x < src1.x) ? \sim 0 : 0
1448
1449      dst.y = (src0.y < src1.y) ? \sim 0 : 0
1450
1451      dst.z = (src0.z < src1.z) ? \sim 0 : 0
1452
1453      dst.w = (src0.w < src1.w) ? \sim 0 : 0
1454
1455
1456.. opcode:: USLT - Unsigned Integer Set On Less Than
1457
1458   .. math::
1459
1460      dst.x = (src0.x < src1.x) ? \sim 0 : 0
1461
1462      dst.y = (src0.y < src1.y) ? \sim 0 : 0
1463
1464      dst.z = (src0.z < src1.z) ? \sim 0 : 0
1465
1466      dst.w = (src0.w < src1.w) ? \sim 0 : 0
1467
1468
1469.. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
1470
1471   Same comparison as SGE but returns integer instead of 1.0/0.0 float
1472
1473   .. math::
1474
1475      dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1476
1477      dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1478
1479      dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1480
1481      dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1482
1483
1484.. opcode:: ISGE - Signed Integer Set On Greater Equal Than
1485
1486   .. math::
1487
1488      dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1489
1490      dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1491
1492      dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1493
1494      dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1495
1496
1497.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
1498
1499   .. math::
1500
1501      dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1502
1503      dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1504
1505      dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1506
1507      dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1508
1509
1510.. opcode:: FSEQ - Float Set On Equal (ordered)
1511
1512   Same comparison as SEQ but returns integer instead of 1.0/0.0 float
1513
1514   .. math::
1515
1516      dst.x = (src0.x == src1.x) ? \sim 0 : 0
1517
1518      dst.y = (src0.y == src1.y) ? \sim 0 : 0
1519
1520      dst.z = (src0.z == src1.z) ? \sim 0 : 0
1521
1522      dst.w = (src0.w == src1.w) ? \sim 0 : 0
1523
1524
1525.. opcode:: USEQ - Integer Set On Equal
1526
1527   .. math::
1528
1529      dst.x = (src0.x == src1.x) ? \sim 0 : 0
1530
1531      dst.y = (src0.y == src1.y) ? \sim 0 : 0
1532
1533      dst.z = (src0.z == src1.z) ? \sim 0 : 0
1534
1535      dst.w = (src0.w == src1.w) ? \sim 0 : 0
1536
1537
1538.. opcode:: FSNE - Float Set On Not Equal (unordered)
1539
1540   Same comparison as SNE but returns integer instead of 1.0/0.0 float
1541
1542   .. math::
1543
1544      dst.x = (src0.x != src1.x) ? \sim 0 : 0
1545
1546      dst.y = (src0.y != src1.y) ? \sim 0 : 0
1547
1548      dst.z = (src0.z != src1.z) ? \sim 0 : 0
1549
1550      dst.w = (src0.w != src1.w) ? \sim 0 : 0
1551
1552
1553.. opcode:: USNE - Integer Set On Not Equal
1554
1555   .. math::
1556
1557      dst.x = (src0.x != src1.x) ? \sim 0 : 0
1558
1559      dst.y = (src0.y != src1.y) ? \sim 0 : 0
1560
1561      dst.z = (src0.z != src1.z) ? \sim 0 : 0
1562
1563      dst.w = (src0.w != src1.w) ? \sim 0 : 0
1564
1565
1566.. opcode:: INEG - Integer Negate
1567
1568  Two's complement.
1569
1570   .. math::
1571
1572      dst.x = -src.x
1573
1574      dst.y = -src.y
1575
1576      dst.z = -src.z
1577
1578      dst.w = -src.w
1579
1580
1581.. opcode:: IABS - Integer Absolute Value
1582
1583   .. math::
1584
1585      dst.x = |src.x|
1586
1587      dst.y = |src.y|
1588
1589      dst.z = |src.z|
1590
1591      dst.w = |src.w|
1592
1593Bitwise ISA
1594^^^^^^^^^^^
1595These opcodes are used for bit-level manipulation of integers.
1596
1597.. opcode:: IBFE - Signed Bitfield Extract
1598
1599   Like GLSL bitfieldExtract. Extracts a set of bits from the input, and
1600   sign-extends them if the high bit of the extracted window is set.
1601
1602   Pseudocode::
1603
1604      def ibfe(value, offset, bits):
1605         if offset < 0 or bits < 0 or offset + bits > 32:
1606            return undefined
1607         if bits == 0: return 0
1608         # Note: >> sign-extends
1609         return (value << (32 - offset - bits)) >> (32 - bits)
1610
1611.. opcode:: UBFE - Unsigned Bitfield Extract
1612
1613   Like GLSL bitfieldExtract. Extracts a set of bits from the input, without
1614   any sign-extension.
1615
1616   Pseudocode::
1617
1618      def ubfe(value, offset, bits):
1619         if offset < 0 or bits < 0 or offset + bits > 32:
1620            return undefined
1621         if bits == 0: return 0
1622         # Note: >> does not sign-extend
1623         return (value << (32 - offset - bits)) >> (32 - bits)
1624
1625.. opcode:: BFI - Bitfield Insert
1626
1627   Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits
1628   of 'insert'.
1629
1630   Pseudocode::
1631
1632      def bfi(base, insert, offset, bits):
1633         if offset < 0 or bits < 0 or offset + bits > 32:
1634            return undefined
1635         # << defined such that mask == ~0 when bits == 32, offset == 0
1636         mask = ((1 << bits) - 1) << offset
1637         return ((insert << offset) & mask) | (base & ~mask)
1638
1639.. opcode:: BREV - Bitfield Reverse
1640
1641   See SM5 instruction BFREV. Reverses the bits of the argument.
1642
1643.. opcode:: POPC - Population Count
1644
1645   See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
1646
1647.. opcode:: LSB - Index of lowest set bit
1648
1649   See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
1650   bit of the argument. Returns -1 if none are set.
1651
1652.. opcode:: IMSB - Index of highest non-sign bit
1653
1654   See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
1655   non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
1656   highest 1 bit for positive numbers). Returns -1 if all bits are the same
1657   (i.e. for inputs 0 and -1).
1658
1659.. opcode:: UMSB - Index of highest set bit
1660
1661   See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
1662   set bit of the argument. Returns -1 if none are set.
1663
1664Geometry ISA
1665^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1666
1667These opcodes are only supported in geometry shaders; they have no meaning
1668in any other type of shader.
1669
1670.. opcode:: EMIT - Emit
1671
1672   Generate a new vertex for the current primitive into the specified vertex
1673   stream using the values in the output registers.
1674
1675
1676.. opcode:: ENDPRIM - End Primitive
1677
1678   Complete the current primitive in the specified vertex stream (consisting of
1679   the emitted vertices), and start a new one.
1680
1681
1682GLSL ISA
1683^^^^^^^^^^
1684
1685These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1686opcodes is determined by a special capability bit, ``GLSL``.
1687Some require glsl version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH).
1688
1689.. opcode:: CAL - Subroutine Call
1690
1691   Pseudocode::
1692
1693      push(pc)
1694      pc = target
1695
1696
1697.. opcode:: RET - Subroutine Call Return
1698
1699   Pseudocode::
1700
1701      pc = pop()
1702
1703
1704.. opcode:: CONT - Continue
1705
1706   Unconditionally moves the point of execution to the instruction after the
1707   last BGNLOOP. The instruction must appear within a BGNLOOP/ENDLOOP.
1708
1709.. note::
1710
1711   Support for CONT is determined by a special capability bit,
1712   ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
1713
1714
1715.. opcode:: BGNLOOP - Begin a Loop
1716
1717   Start a loop. Must have a matching ENDLOOP.
1718
1719
1720.. opcode:: BGNSUB - Begin Subroutine
1721
1722   Starts definition of a subroutine. Must have a matching ENDSUB.
1723
1724
1725.. opcode:: ENDLOOP - End a Loop
1726
1727   End a loop started with BGNLOOP.
1728
1729
1730.. opcode:: ENDSUB - End Subroutine
1731
1732   Ends definition of a subroutine.
1733
1734
1735.. opcode:: NOP - No Operation
1736
1737   Do nothing.
1738
1739
1740.. opcode:: BRK - Break
1741
1742   Unconditionally moves the point of execution to the instruction after the
1743   next ENDLOOP or ENDSWITCH. The instruction must appear within a
1744   BGNLOOP/ENDLOOP or SWITCH/ENDSWITCH.
1745
1746
1747.. opcode:: IF - Float If
1748
1749   Start an IF ... ELSE .. ENDIF block.  Condition evaluates to true if
1750
1751      *src0.x* != 0.0
1752
1753   where *src0.x* is interpreted as a floating point register.
1754
1755
1756.. opcode:: UIF - Bitwise If
1757
1758   Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
1759
1760      *src0.x* != 0
1761
1762   where *src0.x* is interpreted as an integer register.
1763
1764
1765.. opcode:: ELSE - Else
1766
1767   Starts an else block, after an IF or UIF statement.
1768
1769
1770.. opcode:: ENDIF - End If
1771
1772   Ends an IF or UIF block.
1773
1774
1775.. opcode:: SWITCH - Switch
1776
1777   Starts a C-style switch expression. The switch consists of one or multiple
1778   CASE statements, and at most one DEFAULT statement. Execution of a statement
1779   ends when a BRK is hit, but just like in C falling through to other cases
1780   without a break is allowed. Similarly, DEFAULT label is allowed anywhere not
1781   just as last statement, and fallthrough is allowed into/from it.
1782   CASE *src* arguments are evaluated at bit level against the SWITCH *src* argument.
1783
1784   Example::
1785
1786      SWITCH src[0].x
1787      CASE src[0].x
1788      (some instructions here)
1789      (optional BRK here)
1790      DEFAULT
1791      (some instructions here)
1792      (optional BRK here)
1793      CASE src[0].x
1794      (some instructions here)
1795      (optional BRK here)
1796      ENDSWITCH
1797
1798
1799.. opcode:: CASE - Switch case
1800
1801   This represents a switch case label. The *src* arg must be an integer immediate.
1802
1803
1804.. opcode:: DEFAULT - Switch default
1805
1806   This represents the default case in the switch, which is taken if no other
1807   case matches.
1808
1809
1810.. opcode:: ENDSWITCH - End of switch
1811
1812   Ends a switch expression.
1813
1814
1815Interpolation ISA
1816^^^^^^^^^^^^^^^^^
1817
1818The interpolation instructions allow an input to be interpolated in a
1819different way than its declaration. This corresponds to the GLSL 4.00
1820interpolateAt* functions. The first argument of each of these must come from
1821``TGSI_FILE_INPUT``.
1822
1823.. opcode:: INTERP_CENTROID - Interpolate at the centroid
1824
1825   Interpolates the varying specified by *src0* at the centroid
1826
1827.. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
1828
1829   Interpolates the varying specified by *src0* at the sample id
1830   specified by *src1.x* (interpreted as an integer)
1831
1832.. opcode:: INTERP_OFFSET - Interpolate at the specified offset
1833
1834   Interpolates the varying specified by *src0* at the offset *src1.xy*
1835   from the pixel center (interpreted as floats)
1836
1837
1838.. _doubleopcodes:
1839
1840Double ISA
1841^^^^^^^^^^^^^^^
1842
1843The double-precision opcodes reinterpret four-component vectors into
1844two-component vectors with doubled precision in each component.
1845
1846.. opcode:: DABS - Absolute
1847
1848   .. math::
1849
1850      dst.xy = |src0.xy|
1851
1852      dst.zw = |src0.zw|
1853
1854.. opcode:: DADD - Add
1855
1856   .. math::
1857
1858      dst.xy = src0.xy + src1.xy
1859
1860      dst.zw = src0.zw + src1.zw
1861
1862.. opcode:: DSEQ - Set on Equal
1863
1864   .. math::
1865
1866      dst.x = src0.xy == src1.xy ? \sim 0 : 0
1867
1868      dst.z = src0.zw == src1.zw ? \sim 0 : 0
1869
1870.. opcode:: DSNE - Set on Not Equal
1871
1872   .. math::
1873
1874      dst.x = src0.xy != src1.xy ? \sim 0 : 0
1875
1876      dst.z = src0.zw != src1.zw ? \sim 0 : 0
1877
1878.. opcode:: DSLT - Set on Less than
1879
1880   .. math::
1881
1882      dst.x = src0.xy < src1.xy ? \sim 0 : 0
1883
1884      dst.z = src0.zw < src1.zw ? \sim 0 : 0
1885
1886.. opcode:: DSGE - Set on Greater equal
1887
1888   .. math::
1889
1890      dst.x = src0.xy >= src1.xy ? \sim 0 : 0
1891
1892      dst.z = src0.zw >= src1.zw ? \sim 0 : 0
1893
1894.. opcode:: DFRAC - Fraction
1895
1896   .. math::
1897
1898      dst.xy = src.xy - \lfloor src.xy\rfloor
1899
1900      dst.zw = src.zw - \lfloor src.zw\rfloor
1901
1902.. opcode:: DTRUNC - Truncate
1903
1904   .. math::
1905
1906      dst.xy = trunc(src.xy)
1907
1908      dst.zw = trunc(src.zw)
1909
1910.. opcode:: DCEIL - Ceiling
1911
1912   .. math::
1913
1914      dst.xy = \lceil src.xy\rceil
1915
1916      dst.zw = \lceil src.zw\rceil
1917
1918.. opcode:: DFLR - Floor
1919
1920   .. math::
1921
1922      dst.xy = \lfloor src.xy\rfloor
1923
1924      dst.zw = \lfloor src.zw\rfloor
1925
1926.. opcode:: DROUND - Fraction
1927
1928   .. math::
1929
1930      dst.xy = round(src.xy)
1931
1932      dst.zw = round(src.zw)
1933
1934.. opcode:: DSSG - Set Sign
1935
1936   .. math::
1937
1938      dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
1939
1940      dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
1941
1942.. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1943
1944   This opcode is the inverse of frexp. The second
1945   source is an integer.
1946
1947   .. math::
1948
1949      dst.xy = src0.xy \times 2^{src1.x}
1950
1951      dst.zw = src0.zw \times 2^{src1.z}
1952
1953.. opcode:: DMIN - Minimum
1954
1955   .. math::
1956
1957      dst.xy = min(src0.xy, src1.xy)
1958
1959      dst.zw = min(src0.zw, src1.zw)
1960
1961.. opcode:: DMAX - Maximum
1962
1963   .. math::
1964
1965      dst.xy = max(src0.xy, src1.xy)
1966
1967      dst.zw = max(src0.zw, src1.zw)
1968
1969.. opcode:: DMUL - Multiply
1970
1971   .. math::
1972
1973      dst.xy = src0.xy \times src1.xy
1974
1975      dst.zw = src0.zw \times src1.zw
1976
1977
1978.. opcode:: DMAD - Multiply And Add
1979
1980   .. math::
1981
1982      dst.xy = src0.xy \times src1.xy + src2.xy
1983
1984      dst.zw = src0.zw \times src1.zw + src2.zw
1985
1986
1987.. opcode:: DFMA - Fused Multiply-Add
1988
1989   Perform a * b + c with no intermediate rounding step.
1990
1991   .. math::
1992
1993      dst.xy = src0.xy \times src1.xy + src2.xy
1994
1995      dst.zw = src0.zw \times src1.zw + src2.zw
1996
1997
1998.. opcode:: DDIV - Divide
1999
2000   .. math::
2001
2002      dst.xy = \frac{src0.xy}{src1.xy}
2003
2004      dst.zw = \frac{src0.zw}{src1.zw}
2005
2006
2007.. opcode:: DRCP - Reciprocal
2008
2009   .. math::
2010
2011      dst.xy = \frac{1}{src.xy}
2012
2013      dst.zw = \frac{1}{src.zw}
2014
2015.. opcode:: DSQRT - Square Root
2016
2017   .. math::
2018
2019      dst.xy = \sqrt{src.xy}
2020
2021      dst.zw = \sqrt{src.zw}
2022
2023.. opcode:: DRSQ - Reciprocal Square Root
2024
2025   .. math::
2026
2027      dst.xy = \frac{1}{\sqrt{src.xy}}
2028
2029      dst.zw = \frac{1}{\sqrt{src.zw}}
2030
2031.. opcode:: F2D - Float to Double
2032
2033   .. math::
2034
2035      dst.xy = double(src0.x)
2036
2037      dst.zw = double(src0.y)
2038
2039.. opcode:: D2F - Double to Float
2040
2041   .. math::
2042
2043      dst.x = float(src0.xy)
2044
2045      dst.y = float(src0.zw)
2046
2047.. opcode:: I2D - Int to Double
2048
2049   .. math::
2050
2051      dst.xy = double(src0.x)
2052
2053      dst.zw = double(src0.y)
2054
2055.. opcode:: D2I - Double to Int
2056
2057   .. math::
2058
2059      dst.x = int(src0.xy)
2060
2061      dst.y = int(src0.zw)
2062
2063.. opcode:: U2D - Unsigned Int to Double
2064
2065   .. math::
2066
2067      dst.xy = double(src0.x)
2068
2069      dst.zw = double(src0.y)
2070
2071.. opcode:: D2U - Double to Unsigned Int
2072
2073   .. math::
2074
2075      dst.x = unsigned(src0.xy)
2076
2077      dst.y = unsigned(src0.zw)
2078
207964-bit Integer ISA
2080^^^^^^^^^^^^^^^^^^
2081
2082The 64-bit integer opcodes reinterpret four-component vectors into
2083two-component vectors with 64-bits in each component.
2084
2085.. opcode:: I64ABS - 64-bit Integer Absolute Value
2086
2087   .. math::
2088
2089      dst.xy = |src0.xy|
2090
2091      dst.zw = |src0.zw|
2092
2093.. opcode:: I64NEG - 64-bit Integer Negate
2094
2095   Two's complement.
2096
2097   .. math::
2098
2099      dst.xy = -src.xy
2100
2101      dst.zw = -src.zw
2102
2103.. opcode:: I64SSG - 64-bit Integer Set Sign
2104
2105   .. math::
2106
2107      dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0
2108
2109      dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0
2110
2111.. opcode:: U64ADD - 64-bit Integer Add
2112
2113   .. math::
2114
2115      dst.xy = src0.xy + src1.xy
2116
2117      dst.zw = src0.zw + src1.zw
2118
2119.. opcode:: U64MUL - 64-bit Integer Multiply
2120
2121   .. math::
2122
2123      dst.xy = src0.xy * src1.xy
2124
2125      dst.zw = src0.zw * src1.zw
2126
2127.. opcode:: U64SEQ - 64-bit Integer Set on Equal
2128
2129   .. math::
2130
2131      dst.x = src0.xy == src1.xy ? \sim 0 : 0
2132
2133      dst.z = src0.zw == src1.zw ? \sim 0 : 0
2134
2135.. opcode:: U64SNE - 64-bit Integer Set on Not Equal
2136
2137   .. math::
2138
2139      dst.x = src0.xy != src1.xy ? \sim 0 : 0
2140
2141      dst.z = src0.zw != src1.zw ? \sim 0 : 0
2142
2143.. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than
2144
2145   .. math::
2146
2147      dst.x = src0.xy < src1.xy ? \sim 0 : 0
2148
2149      dst.z = src0.zw < src1.zw ? \sim 0 : 0
2150
2151.. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal
2152
2153   .. math::
2154
2155      dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2156
2157      dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2158
2159.. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than
2160
2161   .. math::
2162
2163      dst.x = src0.xy < src1.xy ? \sim 0 : 0
2164
2165      dst.z = src0.zw < src1.zw ? \sim 0 : 0
2166
2167.. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal
2168
2169   .. math::
2170
2171      dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2172
2173      dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2174
2175.. opcode:: I64MIN - Minimum of 64-bit Signed Integers
2176
2177   .. math::
2178
2179      dst.xy = min(src0.xy, src1.xy)
2180
2181      dst.zw = min(src0.zw, src1.zw)
2182
2183.. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers
2184
2185   .. math::
2186
2187      dst.xy = min(src0.xy, src1.xy)
2188
2189      dst.zw = min(src0.zw, src1.zw)
2190
2191.. opcode:: I64MAX - Maximum of 64-bit Signed Integers
2192
2193   .. math::
2194
2195      dst.xy = max(src0.xy, src1.xy)
2196
2197      dst.zw = max(src0.zw, src1.zw)
2198
2199.. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers
2200
2201   .. math::
2202
2203      dst.xy = max(src0.xy, src1.xy)
2204
2205      dst.zw = max(src0.zw, src1.zw)
2206
2207.. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer
2208
2209   The shift count is masked with ``0x3f`` before the shift is applied.
2210
2211   .. math::
2212
2213      dst.xy = src0.xy \ll (0x3f \& src1.x)
2214
2215      dst.zw = src0.zw \ll (0x3f \& src1.y)
2216
2217.. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer)
2218
2219   The shift count is masked with ``0x3f`` before the shift is applied.
2220
2221   .. math::
2222
2223      dst.xy = src0.xy \gg (0x3f \& src1.x)
2224
2225      dst.zw = src0.zw \gg (0x3f \& src1.y)
2226
2227.. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer)
2228
2229   The shift count is masked with ``0x3f`` before the shift is applied.
2230
2231   .. math::
2232
2233      dst.xy = src0.xy \gg (unsigned) (0x3f \& src1.x)
2234
2235      dst.zw = src0.zw \gg (unsigned) (0x3f \& src1.y)
2236
2237.. opcode:: I64DIV - 64-bit Signed Integer Division
2238
2239   .. math::
2240
2241      dst.xy = \frac{src0.xy}{src1.xy}
2242
2243      dst.zw = \frac{src0.zw}{src1.zw}
2244
2245.. opcode:: U64DIV - 64-bit Unsigned Integer Division
2246
2247   .. math::
2248
2249      dst.xy = \frac{src0.xy}{src1.xy}
2250
2251      dst.zw = \frac{src0.zw}{src1.zw}
2252
2253.. opcode:: U64MOD - 64-bit Unsigned Integer Remainder
2254
2255   .. math::
2256
2257      dst.xy = src0.xy \bmod src1.xy
2258
2259      dst.zw = src0.zw \bmod src1.zw
2260
2261.. opcode:: I64MOD - 64-bit Signed Integer Remainder
2262
2263   .. math::
2264
2265      dst.xy = src0.xy \bmod src1.xy
2266
2267      dst.zw = src0.zw \bmod src1.zw
2268
2269.. opcode:: F2U64 - Float to 64-bit Unsigned Int
2270
2271   .. math::
2272
2273      dst.xy = (uint64_t) src0.x
2274
2275      dst.zw = (uint64_t) src0.y
2276
2277.. opcode:: F2I64 - Float to 64-bit Int
2278
2279   .. math::
2280
2281      dst.xy = (int64_t) src0.x
2282
2283      dst.zw = (int64_t) src0.y
2284
2285.. opcode:: U2I64 - Unsigned Integer to 64-bit Integer
2286
2287   This is a zero extension.
2288
2289   .. math::
2290
2291      dst.xy = (int64_t) src0.x
2292
2293      dst.zw = (int64_t) src0.y
2294
2295.. opcode:: I2I64 - Signed Integer to 64-bit Integer
2296
2297   This is a sign extension.
2298
2299   .. math::
2300
2301      dst.xy = (int64_t) src0.x
2302
2303      dst.zw = (int64_t) src0.y
2304
2305.. opcode:: D2U64 - Double to 64-bit Unsigned Int
2306
2307   .. math::
2308
2309      dst.xy = (uint64_t) src0.xy
2310
2311      dst.zw = (uint64_t) src0.zw
2312
2313.. opcode:: D2I64 - Double to 64-bit Int
2314
2315   .. math::
2316
2317      dst.xy = (int64_t) src0.xy
2318
2319      dst.zw = (int64_t) src0.zw
2320
2321.. opcode:: U642F - 64-bit unsigned integer to float
2322
2323   .. math::
2324
2325      dst.x = (float) src0.xy
2326
2327      dst.y = (float) src0.zw
2328
2329.. opcode:: I642F - 64-bit Int to Float
2330
2331   .. math::
2332
2333      dst.x = (float) src0.xy
2334
2335      dst.y = (float) src0.zw
2336
2337.. opcode:: U642D - 64-bit unsigned integer to double
2338
2339   .. math::
2340
2341      dst.xy = (double) src0.xy
2342
2343      dst.zw = (double) src0.zw
2344
2345.. opcode:: I642D - 64-bit Int to double
2346
2347   .. math::
2348
2349      dst.xy = (double) src0.xy
2350
2351      dst.zw = (double) src0.zw
2352
2353.. _samplingopcodes:
2354
2355Resource Sampling Opcodes
2356^^^^^^^^^^^^^^^^^^^^^^^^^
2357
2358Those opcodes follow very closely semantics of the respective Direct3D
2359instructions. If in doubt double check Direct3D documentation.
2360Note that the swizzle on SVIEW (src1) determines texel swizzling
2361after lookup.
2362
2363.. opcode:: SAMPLE
2364
2365   Using provided address, sample data from the specified texture using the
2366   filtering mode identified by the given sampler. The source data may come from
2367   any resource type other than buffers.
2368
2369   Syntax: ``SAMPLE dst, address, sampler_view, sampler``
2370
2371   Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
2372
2373.. opcode:: SAMPLE_I
2374
2375   Simplified alternative to the SAMPLE instruction.  Using the provided
2376   integer address, SAMPLE_I fetches data from the specified sampler view
2377   without any filtering.  The source data may come from any resource type
2378   other than CUBE.
2379
2380   Syntax: ``SAMPLE_I dst, address, sampler_view``
2381
2382   Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
2383
2384   The 'address' is specified as unsigned integers. If the 'address' is out of
2385   range [0...(# texels - 1)] the result of the fetch is always 0 in all
2386   components.  As such the instruction doesn't honor address wrap modes, in
2387   cases where that behavior is desirable 'SAMPLE' instruction should be used.
2388   address.w always provides an unsigned integer mipmap level. If the value is
2389   out of the range then the instruction always returns 0 in all components.
2390   address.yz are ignored for buffers and 1d textures.  address.z is ignored
2391   for 1d texture arrays and 2d textures.
2392
2393   For 1D texture arrays address.y provides the array index (also as unsigned
2394   integer). If the value is out of the range of available array indices
2395   [0... (array size - 1)] then the opcode always returns 0 in all components.
2396   For 2D texture arrays address.z provides the array index, otherwise it
2397   exhibits the same behavior as in the case for 1D texture arrays.  The exact
2398   semantics of the source address are presented in the table below:
2399
2400   +---------------------------+----+-----+-----+---------+
2401   | resource type             | X  |  Y  |  Z  |    W    |
2402   +===========================+====+=====+=====+=========+
2403   | ``PIPE_BUFFER``           | x  |     |     | ignored |
2404   +---------------------------+----+-----+-----+---------+
2405   | ``PIPE_TEXTURE_1D``       | x  |     |     |   mpl   |
2406   +---------------------------+----+-----+-----+---------+
2407   | ``PIPE_TEXTURE_2D``       | x  |  y  |     |   mpl   |
2408   +---------------------------+----+-----+-----+---------+
2409   | ``PIPE_TEXTURE_3D``       | x  |  y  |  z  |   mpl   |
2410   +---------------------------+----+-----+-----+---------+
2411   | ``PIPE_TEXTURE_RECT``     | x  |  y  |     |   mpl   |
2412   +---------------------------+----+-----+-----+---------+
2413   | ``PIPE_TEXTURE_CUBE``     | not allowed as source    |
2414   +---------------------------+----+-----+-----+---------+
2415   | ``PIPE_TEXTURE_1D_ARRAY`` | x  | idx |     |   mpl   |
2416   +---------------------------+----+-----+-----+---------+
2417   | ``PIPE_TEXTURE_2D_ARRAY`` | x  |  y  | idx |   mpl   |
2418   +---------------------------+----+-----+-----+---------+
2419
2420   Where 'mpl' is a mipmap level and 'idx' is the array index.
2421
2422.. opcode:: SAMPLE_I_MS
2423
2424   Just like SAMPLE_I but allows fetch data from multi-sampled surfaces.
2425
2426   Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
2427
2428.. opcode:: SAMPLE_B
2429
2430   Just like the SAMPLE instruction with the exception that an additional bias
2431   is applied to the level of detail computed as part of the instruction
2432   execution.
2433
2434   Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
2435
2436   Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2437
2438.. opcode:: SAMPLE_C
2439
2440   Similar to the SAMPLE instruction but it performs a comparison filter. The
2441   operands to SAMPLE_C are identical to SAMPLE, except that there is an
2442   additional float32 operand, reference value, which must be a register with
2443   single-component, or a scalar literal.  SAMPLE_C makes the hardware use the
2444   current samplers compare_func (in pipe_sampler_state) to compare reference
2445   value against the red component value for the source resource at each texel
2446   that the currently configured texture filter covers based on the provided
2447   coordinates.
2448
2449   Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
2450
2451   Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2452
2453.. opcode:: SAMPLE_C_LZ
2454
2455   Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
2456   for level-zero.
2457
2458   Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
2459
2460   Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2461
2462
2463.. opcode:: SAMPLE_D
2464
2465   SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
2466   the source address in the x direction and the y direction are provided by
2467   extra parameters.
2468
2469   Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
2470
2471   Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
2472
2473.. opcode:: SAMPLE_L
2474
2475   SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
2476   directly as a scalar value, representing no anisotropy.
2477
2478   Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
2479
2480   Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2481
2482.. opcode:: GATHER4
2483
2484   Gathers the four texels to be used in a bi-linear filtering operation and
2485   packs them into a single register.  Only works with 2D, 2D array, cubemaps,
2486   and cubemaps arrays.  For 2D textures, only the addressing modes of the
2487   sampler and the top level of any mip pyramid are used. Set W to zero.  It
2488   behaves like the SAMPLE instruction, but a filtered sample is not
2489   generated. The four samples that contribute to filtering are placed into
2490   XYZW in counter-clockwise order, starting with the (u,v) texture coordinate
2491   delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
2492   magnitude of the deltas are half a texel.
2493
2494
2495.. opcode:: SVIEWINFO
2496
2497   Query the dimensions of a given sampler view.  dst receives width, height,
2498   depth or array size and number of mipmap levels as int4. The dst can have a
2499   writemask which will specify what info is the caller interested in.
2500
2501   Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
2502
2503   Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
2504
2505   src_mip_level is an unsigned integer scalar. If it's out of range then
2506   returns 0 for width, height and depth/array size but the total number of
2507   mipmap is still returned correctly for the given sampler view.  The returned
2508   width, height and depth values are for the mipmap level selected by the
2509   src_mip_level and are in the number of texels.  For 1d texture array width
2510   is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is
2511   still in dst.w.  In contrast to d3d10 resinfo, there's no way in the TGSI
2512   instruction encoding to specify the return type (float/rcpfloat/uint), hence
2513   always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1
2514   resinfo allowing swizzling dst values is ignored (due to the interaction
2515   with rcpfloat modifier which requires some swizzle handling in the state
2516   tracker anyway).
2517
2518.. opcode:: SAMPLE_POS
2519
2520   Query the position of a sample in the given resource or render target
2521   when per-sample fragment shading is in effect.
2522
2523   Syntax: ``SAMPLE_POS dst, source, sample_index``
2524
2525   dst receives float4 (x, y, undef, undef) indicated where the sample is
2526   located. Sample locations are in the range [0, 1] where 0.5 is the center
2527   of the fragment.
2528
2529   source is either a sampler view (to indicate a shader resource) or temp
2530   register (to indicate the render target).  The source register may have
2531   an optional swizzle to apply to the returned result
2532
2533   sample_index is an integer scalar indicating which sample position is to
2534   be queried.
2535
2536   If per-sample shading is not in effect or the source resource or render
2537   target is not multisampled, the result is (0.5, 0.5, undef, undef).
2538
2539   NOTE: no driver has implemented this opcode yet (and no gallium frontend
2540   emits it).  This information is subject to change.
2541
2542.. opcode:: SAMPLE_INFO
2543
2544   Query the number of samples in a multisampled resource or render target.
2545
2546   Syntax: ``SAMPLE_INFO dst, source``
2547
2548   dst receives int4 (n, 0, 0, 0) where n is the number of samples in a
2549   resource or the render target.
2550
2551   source is either a sampler view (to indicate a shader resource) or temp
2552   register (to indicate the render target).  The source register may have
2553   an optional swizzle to apply to the returned result
2554
2555   If per-sample shading is not in effect or the source resource or render
2556   target is not multisampled, the result is (1, 0, 0, 0).
2557
2558   NOTE: no driver has implemented this opcode yet (and no gallium frontend
2559   emits it).  This information is subject to change.
2560
2561.. opcode:: LOD - level of detail
2562
2563   Same syntax as the SAMPLE opcode but instead of performing an actual
2564   texture lookup/filter, return the computed LOD information that the
2565   texture pipe would use to access the texture. The Y component contains
2566   the computed LOD lambda_prime. The X component contains the LOD that will
2567   be accessed, based on min/max lod's and mipmap filters.
2568   The Z and W components are set to 0.
2569
2570   Syntax: ``LOD dst, address, sampler_view, sampler``
2571
2572
2573.. _resourceopcodes:
2574
2575Resource Access Opcodes
2576^^^^^^^^^^^^^^^^^^^^^^^
2577
2578For these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY.
2579
2580.. opcode:: LOAD - Fetch data from a shader buffer or image
2581
2582   Syntax: ``LOAD dst, resource, address``
2583
2584   Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]``
2585
2586   Using the provided integer address, LOAD fetches data from the
2587   specified buffer or texture without any filtering.
2588
2589   The 'address' is specified as a vector of unsigned integers.  If the
2590   'address' is out of range the result is unspecified.
2591
2592   Only the first mipmap level of a resource can be read from using this
2593   instruction.
2594
2595   For 1D or 2D texture arrays, the array index is provided as an
2596   unsigned integer in address.y or address.z, respectively.  address.yz
2597   are ignored for buffers and 1D textures.  address.z is ignored for 1D
2598   texture arrays and 2D textures.  address.w is always ignored.
2599
2600   A swizzle suffix may be added to the resource argument this will
2601   cause the resource data to be swizzled accordingly.
2602
2603.. opcode:: STORE - Write data to a shader resource
2604
2605   Syntax: ``STORE resource, address, src``
2606
2607   Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]``
2608
2609   Using the provided integer address, STORE writes data to the
2610   specified buffer or texture.
2611
2612   The 'address' is specified as a vector of unsigned integers.  If the
2613   'address' is out of range the result is unspecified.
2614
2615   Only the first mipmap level of a resource can be written to using
2616   this instruction.
2617
2618   For 1D or 2D texture arrays, the array index is provided as an
2619   unsigned integer in address.y or address.z, respectively.
2620   address.yz are ignored for buffers and 1D textures.  address.z is
2621   ignored for 1D texture arrays and 2D textures.  address.w is always
2622   ignored.
2623
2624.. opcode:: RESQ - Query information about a resource
2625
2626   Syntax: ``RESQ dst, resource``
2627
2628   Example: ``RESQ TEMP[0], BUFFER[0]``
2629
2630   Returns information about the buffer or image resource. For buffer
2631   resources, the size (in bytes) is returned in the x component. For
2632   image resources, .xyz will contain the width/height/layers of the
2633   image, while .w will contain the number of samples for multi-sampled
2634   images.
2635
2636.. opcode:: FBFETCH - Load data from framebuffer
2637
2638   Syntax: ``FBFETCH dst, output``
2639
2640   Example: ``FBFETCH TEMP[0], OUT[0]``
2641
2642   This is only valid on ``COLOR`` semantic outputs. Returns the color
2643   of the current position in the framebuffer from before this fragment
2644   shader invocation. May return the same value from multiple calls for
2645   a particular output within a single invocation. Note that result may
2646   be undefined if a fragment is drawn multiple times without a blend
2647   barrier in between.
2648
2649
2650.. _bindlessopcodes:
2651
2652Bindless Opcodes
2653^^^^^^^^^^^^^^^^
2654
2655These opcodes are for working with bindless sampler or image handles and
2656require PIPE_CAP_BINDLESS_TEXTURE.
2657
2658.. opcode:: IMG2HND - Get a bindless handle for a image
2659
2660   Syntax: ``IMG2HND dst, image``
2661
2662   Example: ``IMG2HND TEMP[0], IMAGE[0]``
2663
2664   Sets 'dst' to a bindless handle for 'image'.
2665
2666.. opcode:: SAMP2HND - Get a bindless handle for a sampler
2667
2668   Syntax: ``SAMP2HND dst, sampler``
2669
2670   Example: ``SAMP2HND TEMP[0], SAMP[0]``
2671
2672   Sets 'dst' to a bindless handle for 'sampler'.
2673
2674
2675.. _threadsyncopcodes:
2676
2677Inter-thread synchronization opcodes
2678^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2679
2680These opcodes are intended for communication between threads running
2681within the same compute grid.  For now they're only valid in compute
2682programs.
2683
2684.. opcode:: BARRIER - Thread group barrier
2685
2686   ``BARRIER``
2687
2688   This opcode suspends the execution of the current thread until all
2689   the remaining threads in the working group reach the same point of
2690   the program.  Results are unspecified if any of the remaining
2691   threads terminates or never reaches an executed BARRIER instruction.
2692
2693.. opcode:: MEMBAR - Memory barrier
2694
2695   ``MEMBAR type``
2696
2697   This opcode waits for the completion of all memory accesses based on
2698   the type passed in. The type is an immediate bitfield with the following
2699   meaning:
2700
2701   Bit 0: Shader storage buffers
2702   Bit 1: Atomic buffers
2703   Bit 2: Images
2704   Bit 3: Shared memory
2705   Bit 4: Thread group
2706
2707   These may be passed in in any combination. An implementation is free to not
2708   distinguish between these as it sees fit. However these map to all the
2709   possibilities made available by GLSL.
2710
2711.. _atomopcodes:
2712
2713Atomic opcodes
2714^^^^^^^^^^^^^^
2715
2716These opcodes provide atomic variants of some common arithmetic and
2717logical operations.  In this context atomicity means that another
2718concurrent memory access operation that affects the same memory
2719location is guaranteed to be performed strictly before or after the
2720entire execution of the atomic operation. The resource may be a BUFFER,
2721IMAGE, HWATOMIC, or MEMORY.  In the case of an image, the offset works
2722the same as for ``LOAD`` and ``STORE``, specified above. For atomic
2723counters, the offset is an immediate index to the base HW atomic
2724counter for this operation.
2725These atomic operations may only be used with 32-bit integer image formats.
2726
2727.. opcode:: ATOMUADD - Atomic integer addition
2728
2729   Syntax: ``ATOMUADD dst, resource, offset, src``
2730
2731   Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2732
2733   The following operation is performed atomically:
2734
2735   .. math::
2736
2737      dst_x = resource[offset]
2738
2739      resource[offset] = dst_x + src_x
2740
2741
2742.. opcode:: ATOMFADD - Atomic floating point addition
2743
2744   Syntax: ``ATOMFADD dst, resource, offset, src``
2745
2746   Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2747
2748   The following operation is performed atomically:
2749
2750   .. math::
2751
2752      dst_x = resource[offset]
2753
2754      resource[offset] = dst_x + src_x
2755
2756
2757.. opcode:: ATOMXCHG - Atomic exchange
2758
2759   Syntax: ``ATOMXCHG dst, resource, offset, src``
2760
2761   Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2762
2763   The following operation is performed atomically:
2764
2765   .. math::
2766
2767      dst_x = resource[offset]
2768
2769      resource[offset] = src_x
2770
2771
2772.. opcode:: ATOMCAS - Atomic compare-and-exchange
2773
2774   Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
2775
2776   Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]``
2777
2778   The following operation is performed atomically:
2779
2780   .. math::
2781
2782      dst_x = resource[offset]
2783
2784      resource[offset] = (dst_x == cmp_x ? src_x : dst_x)
2785
2786
2787.. opcode:: ATOMAND - Atomic bitwise And
2788
2789   Syntax: ``ATOMAND dst, resource, offset, src``
2790
2791   Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2792
2793   The following operation is performed atomically:
2794
2795   .. math::
2796
2797      dst_x = resource[offset]
2798
2799      resource[offset] = dst_x \& src_x
2800
2801
2802.. opcode:: ATOMOR - Atomic bitwise Or
2803
2804   Syntax: ``ATOMOR dst, resource, offset, src``
2805
2806   Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2807
2808   The following operation is performed atomically:
2809
2810   .. math::
2811
2812      dst_x = resource[offset]
2813
2814      resource[offset] = dst_x | src_x
2815
2816
2817.. opcode:: ATOMXOR - Atomic bitwise Xor
2818
2819   Syntax: ``ATOMXOR dst, resource, offset, src``
2820
2821   Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2822
2823   The following operation is performed atomically:
2824
2825   .. math::
2826
2827      dst_x = resource[offset]
2828
2829      resource[offset] = dst_x \oplus src_x
2830
2831
2832.. opcode:: ATOMUMIN - Atomic unsigned minimum
2833
2834   Syntax: ``ATOMUMIN dst, resource, offset, src``
2835
2836   Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2837
2838   The following operation is performed atomically:
2839
2840   .. math::
2841
2842      dst_x = resource[offset]
2843
2844      resource[offset] = (dst_x < src_x ? dst_x : src_x)
2845
2846
2847.. opcode:: ATOMUMAX - Atomic unsigned maximum
2848
2849   Syntax: ``ATOMUMAX dst, resource, offset, src``
2850
2851   Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2852
2853   The following operation is performed atomically:
2854
2855   .. math::
2856
2857      dst_x = resource[offset]
2858
2859      resource[offset] = (dst_x > src_x ? dst_x : src_x)
2860
2861
2862.. opcode:: ATOMIMIN - Atomic signed minimum
2863
2864   Syntax: ``ATOMIMIN dst, resource, offset, src``
2865
2866   Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2867
2868   The following operation is performed atomically:
2869
2870   .. math::
2871
2872      dst_x = resource[offset]
2873
2874      resource[offset] = (dst_x < src_x ? dst_x : src_x)
2875
2876
2877.. opcode:: ATOMIMAX - Atomic signed maximum
2878
2879   Syntax: ``ATOMIMAX dst, resource, offset, src``
2880
2881   Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2882
2883   The following operation is performed atomically:
2884
2885   .. math::
2886
2887      dst_x = resource[offset]
2888
2889      resource[offset] = (dst_x > src_x ? dst_x : src_x)
2890
2891
2892.. opcode:: ATOMINC_WRAP - Atomic increment + wrap around
2893
2894   Syntax: ``ATOMINC_WRAP dst, resource, offset, src``
2895
2896   Example: ``ATOMINC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2897
2898   The following operation is performed atomically:
2899
2900   .. math::
2901
2902      dst_x = resource[offset] + 1
2903
2904      resource[offset] = dst_x <= src_x ? dst_x : 0
2905
2906
2907.. opcode:: ATOMDEC_WRAP - Atomic decrement + wrap around
2908
2909   Syntax: ``ATOMDEC_WRAP dst, resource, offset, src``
2910
2911   Example: ``ATOMDEC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2912
2913   The following operation is performed atomically:
2914
2915   .. math::
2916
2917      dst_x = resource[offset]
2918
2919      resource[offset] =
2920      \left\{
2921      \begin{array}{ c l }
2922         dst_x - 1 & \quad \textrm{if } dst_x \gt 0 \textrm{ and } dst_x \lt src_x \\
2923         0         & \quad \textrm{otherwise}
2924      \end{array}
2925      \right.
2926
2927.. _interlaneopcodes:
2928
2929Inter-lane opcodes
2930^^^^^^^^^^^^^^^^^^
2931
2932These opcodes reduce the given value across the shader invocations
2933running in the current SIMD group. Every thread in the subgroup will receive
2934the same result. The BALLOT operations accept a single-channel argument that
2935is treated as a boolean and produce a 64-bit value.
2936
2937.. opcode:: VOTE_ANY - Value is set in any of the active invocations
2938
2939   Syntax: ``VOTE_ANY dst, value``
2940
2941   Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x``
2942
2943
2944.. opcode:: VOTE_ALL - Value is set in all of the active invocations
2945
2946   Syntax: ``VOTE_ALL dst, value``
2947
2948   Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x``
2949
2950
2951.. opcode:: VOTE_EQ - Value is the same in all of the active invocations
2952
2953   Syntax: ``VOTE_EQ dst, value``
2954
2955   Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x``
2956
2957
2958.. opcode:: BALLOT - Lanemask of whether the value is set in each active
2959            invocation
2960
2961   Syntax: ``BALLOT dst, value``
2962
2963   Example: ``BALLOT TEMP[0].xy, TEMP[1].x``
2964
2965   When the argument is a constant true, this produces a bitmask of active
2966   invocations. In fragment shaders, this can include helper invocations
2967   (invocations whose outputs and writes to memory are discarded, but which
2968   are used to compute derivatives).
2969
2970
2971.. opcode:: READ_FIRST - Broadcast the value from the first active
2972            invocation to all active lanes
2973
2974   Syntax: ``READ_FIRST dst, value``
2975
2976   Example: ``READ_FIRST TEMP[0], TEMP[1]``
2977
2978
2979.. opcode:: READ_INVOC - Retrieve the value from the given invocation
2980            (need not be uniform)
2981
2982   Syntax: ``READ_INVOC dst, value, invocation``
2983
2984   Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x``
2985
2986   invocation.x controls the invocation number to read from for all channels.
2987   The invocation number must be the same across all active invocations in a
2988   sub-group; otherwise, the results are undefined.
2989
2990
2991Explanation of symbols used
2992------------------------------
2993
2994
2995Functions
2996^^^^^^^^^^^^^^
2997
2998
2999   :math:`|x|`       Absolute value of ``x``.
3000
3001   :math:`\lceil x \rceil` Ceiling of ``x``.
3002
3003   clamp(x,y,z)      Clamp x between y and z.
3004                     (x < y) ? y : (x > z) ? z : x
3005
3006   :math:`\lfloor x\rfloor` Floor of ``x``.
3007
3008   :math:`\log_2{x}` Logarithm of ``x``, base 2.
3009
3010   max(x,y)          Maximum of x and y.
3011                     (x > y) ? x : y
3012
3013   min(x,y)          Minimum of x and y.
3014                     (x < y) ? x : y
3015
3016   partialx(x)       Derivative of x relative to fragment's X.
3017
3018   partialy(x)       Derivative of x relative to fragment's Y.
3019
3020   pop()             Pop from stack.
3021
3022   :math:`x^y`       ``x`` to the power ``y``.
3023
3024   push(x)           Push x on stack.
3025
3026   round(x)          Round x.
3027
3028   trunc(x)          Truncate x, i.e. drop the fraction bits.
3029
3030
3031Keywords
3032^^^^^^^^^^^^^
3033
3034
3035   discard           Discard fragment.
3036
3037   pc                Program counter.
3038
3039   target            Label of target instruction.
3040
3041
3042Other tokens
3043---------------
3044
3045
3046Declaration
3047^^^^^^^^^^^
3048
3049
3050Declares a register that is will be referenced as an operand in Instruction
3051tokens.
3052
3053File field contains register file that is being declared and is one
3054of TGSI_FILE.
3055
3056UsageMask field specifies which of the register components can be accessed
3057and is one of TGSI_WRITEMASK.
3058
3059The Local flag specifies that a given value isn't intended for
3060subroutine parameter passing and, as a result, the implementation
3061isn't required to give any guarantees of it being preserved across
3062subroutine boundaries.  As it's merely a compiler hint, the
3063implementation is free to ignore it.
3064
3065If Dimension flag is set to 1, a Declaration Dimension token follows.
3066
3067If Semantic flag is set to 1, a Declaration Semantic token follows.
3068
3069If Interpolate flag is set to 1, a Declaration Interpolate token follows.
3070
3071If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
3072
3073If Array flag is set to 1, a Declaration Array token follows.
3074
3075Array Declaration
3076^^^^^^^^^^^^^^^^^^^^^^^^
3077
3078Declarations can optional have an ArrayID attribute which can be referred by
3079indirect addressing operands. An ArrayID of zero is reserved and treated as
3080if no ArrayID is specified.
3081
3082If an indirect addressing operand refers to a specific declaration by using
3083an ArrayID only the registers in this declaration are guaranteed to be
3084accessed, accessing any register outside this declaration results in undefined
3085behavior. Note that for compatibility the effective index is zero-based and
3086not relative to the specified declaration
3087
3088If no ArrayID is specified with an indirect addressing operand the whole
3089register file might be accessed by this operand. This is strongly discouraged
3090and will prevent packing of scalar/vec2 arrays and effective alias analysis.
3091This is only legal for TEMP and CONST register files.
3092
3093Declaration Semantic
3094^^^^^^^^^^^^^^^^^^^^^^^^
3095
3096Vertex and fragment shader input and output registers may be labeled
3097with semantic information consisting of a name and index.
3098
3099Follows Declaration token if Semantic bit is set.
3100
3101Since its purpose is to link a shader with other stages of the pipeline,
3102it is valid to follow only those Declaration tokens that declare a register
3103either in INPUT or OUTPUT file.
3104
3105SemanticName field contains the semantic name of the register being declared.
3106There is no default value.
3107
3108SemanticIndex is an optional subscript that can be used to distinguish
3109different register declarations with the same semantic name. The default value
3110is 0.
3111
3112The meanings of the individual semantic names are explained in the following
3113sections.
3114
3115TGSI_SEMANTIC_POSITION
3116""""""""""""""""""""""
3117
3118For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
3119output register which contains the homogeneous vertex position in the clip
3120space coordinate system.  After clipping, the X, Y and Z components of the
3121vertex will be divided by the W value to get normalized device coordinates.
3122
3123For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
3124fragment shader input (or system value, depending on which one is
3125supported by the driver) contains the fragment's window position.  The X
3126component starts at zero and always increases from left to right.
3127The Y component starts at zero and always increases but Y=0 may either
3128indicate the top of the window or the bottom depending on the fragment
3129coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
3130The Z coordinate ranges from 0 to 1 to represent depth from the front
3131to the back of the Z buffer.  The W component contains the interpolated
3132reciprocal of the vertex position W component (corresponding to gl_Fragcoord,
3133but unlike d3d10 which interpolates the same 1/w but then gives back
3134the reciprocal of the interpolated value).
3135
3136Fragment shaders may also declare an output register with
3137TGSI_SEMANTIC_POSITION.  Only the Z component is writable.  This allows
3138the fragment shader to change the fragment's Z position.
3139
3140
3141
3142TGSI_SEMANTIC_COLOR
3143"""""""""""""""""""
3144
3145For vertex shader outputs or fragment shader inputs/outputs, this
3146label indicates that the register contains an R,G,B,A color.
3147
3148Several shader inputs/outputs may contain colors so the semantic index
3149is used to distinguish them.  For example, color[0] may be the diffuse
3150color while color[1] may be the specular color.
3151
3152This label is needed so that the flat/smooth shading can be applied
3153to the right interpolants during rasterization.
3154
3155
3156
3157TGSI_SEMANTIC_BCOLOR
3158""""""""""""""""""""
3159
3160Back-facing colors are only used for back-facing polygons, and are only valid
3161in vertex shader outputs. After rasterization, all polygons are front-facing
3162and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
3163so all BCOLORs effectively become regular COLORs in the fragment shader.
3164
3165
3166TGSI_SEMANTIC_FOG
3167"""""""""""""""""
3168
3169Vertex shader inputs and outputs and fragment shader inputs may be
3170labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
3171a fog coordinate.  Typically, the fragment shader will use the fog coordinate
3172to compute a fog blend factor which is used to blend the normal fragment color
3173with a constant fog color.  But fog coord really is just an ordinary vec4
3174register like regular semantics.
3175
3176
3177TGSI_SEMANTIC_PSIZE
3178"""""""""""""""""""
3179
3180Vertex shader input and output registers may be labeled with
3181TGIS_SEMANTIC_PSIZE to indicate that the register contains a point size
3182in the form (S, 0, 0, 1).  The point size controls the width or diameter
3183of points for rasterization.  This label cannot be used in fragment
3184shaders.
3185
3186When using this semantic, be sure to set the appropriate state in the
3187:ref:`rasterizer` first.
3188
3189
3190TGSI_SEMANTIC_TEXCOORD
3191""""""""""""""""""""""
3192
3193Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3194
3195Vertex shader outputs and fragment shader inputs may be labeled with
3196this semantic to make them replaceable by sprite coordinates via the
3197sprite_coord_enable state in the :ref:`rasterizer`.
3198The semantic index permitted with this semantic is limited to <= 7.
3199
3200If the driver does not support TEXCOORD, sprite coordinate replacement
3201applies to inputs with the GENERIC semantic instead.
3202
3203The intended use case for this semantic is gl_TexCoord.
3204
3205
3206TGSI_SEMANTIC_PCOORD
3207""""""""""""""""""""
3208
3209Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3210
3211Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
3212that the register contains sprite coordinates in the form (x, y, 0, 1), if
3213the current primitive is a point and point sprites are enabled. Otherwise,
3214the contents of the register are undefined.
3215
3216The intended use case for this semantic is gl_PointCoord.
3217
3218
3219TGSI_SEMANTIC_GENERIC
3220"""""""""""""""""""""
3221
3222All vertex/fragment shader inputs/outputs not labeled with any other
3223semantic label can be considered to be generic attributes.  Typical
3224uses of generic inputs/outputs are texcoords and user-defined values.
3225
3226
3227TGSI_SEMANTIC_NORMAL
3228""""""""""""""""""""
3229
3230Indicates that a vertex shader input is a normal vector.  This is
3231typically only used for legacy graphics APIs.
3232
3233
3234TGSI_SEMANTIC_FACE
3235""""""""""""""""""
3236
3237This label applies to fragment shader inputs (or system values,
3238depending on which one is supported by the driver) and indicates that
3239the register contains front/back-face information.
3240
3241If it is an input, it will be a floating-point vector in the form (F, 0, 0, 1),
3242where F will be positive when the fragment belongs to a front-facing polygon,
3243and negative when the fragment belongs to a back-facing polygon.
3244
3245If it is a system value, it will be an integer vector in the form (F, 0, 0, 1),
3246where F is ``0xffffffff`` when the fragment belongs to a front-facing polygon
3247and ``0`` when the fragment belongs to a back-facing polygon.
3248
3249
3250TGSI_SEMANTIC_EDGEFLAG
3251""""""""""""""""""""""
3252
3253For vertex shaders, this semantic label indicates that an input or
3254output is a boolean edge flag.  The register layout is [F, x, x, x]
3255where F is 0.0 or 1.0 and x = don't care.  Normally, the vertex shader
3256simply copies the edge flag input to the edge flag output.
3257
3258Edge flags are used to control which lines or points are actually
3259drawn when the polygon mode converts triangles/quads/polygons into
3260points or lines.
3261
3262
3263TGSI_SEMANTIC_STENCIL
3264"""""""""""""""""""""
3265
3266For fragment shaders, this semantic label indicates that an output
3267is a writable stencil reference value. Only the Y component is writable.
3268This allows the fragment shader to change the fragments stencil reference
3269value.
3270
3271
3272TGSI_SEMANTIC_VIEWPORT_INDEX
3273""""""""""""""""""""""""""""
3274
3275For geometry shaders, this semantic label indicates that an output
3276contains the index of the viewport (and scissor) to use.
3277This is an integer value, and only the X component is used.
3278
3279If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is
3280supported, then this semantic label can also be used in vertex or
3281tessellation evaluation shaders, respectively. Only the value written in the
3282last vertex processing stage is used.
3283
3284
3285TGSI_SEMANTIC_LAYER
3286"""""""""""""""""""
3287
3288For geometry shaders, this semantic label indicates that an output
3289contains the layer value to use for the color and depth/stencil surfaces.
3290This is an integer value, and only the X component is used.
3291(Also known as rendertarget array index.)
3292
3293If PIPE_CAP_VS_LAYER_VIEWPORT or PIPE_CAP_TES_LAYER_VIEWPORT is
3294supported, then this semantic label can also be used in vertex or
3295tessellation evaluation shaders, respectively. Only the value written in the
3296last vertex processing stage is used.
3297
3298
3299TGSI_SEMANTIC_CLIPDIST
3300""""""""""""""""""""""
3301
3302Note this covers clipping and culling distances.
3303
3304When components of vertex elements are identified this way, these
3305values are each assumed to be a float32 signed distance to a plane.
3306
3307For clip distances:
3308Primitive setup only invokes rasterization on pixels for which
3309the interpolated plane distances are >= 0.
3310
3311For cull distances:
3312Primitives will be completely discarded if the plane distance
3313for all of the vertices in the primitive are < 0.
3314If a vertex has a cull distance of NaN, that vertex counts as "out"
3315(as if its < 0);
3316
3317Multiple clip/cull planes can be implemented simultaneously, by
3318annotating multiple components of one or more vertex elements with
3319the above specified semantic.
3320The limits on both clip and cull distances are bound
3321by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
3322the maximum number of components that can be used to hold the
3323distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
3324which specifies the maximum number of registers which can be
3325annotated with those semantics.
3326The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
3327are used to divide up the 2 x vec4 space between clipping and culling.
3328
3329TGSI_SEMANTIC_SAMPLEID
3330""""""""""""""""""""""
3331
3332For fragment shaders, this semantic label indicates that a system value
3333contains the current sample id (i.e. gl_SampleID) as an unsigned int.
3334Only the X component is used.  If per-sample shading is not enabled,
3335the result is (0, undef, undef, undef).
3336
3337Note that if the fragment shader uses this system value, the fragment
3338shader is automatically executed at per sample frequency.
3339
3340TGSI_SEMANTIC_SAMPLEPOS
3341"""""""""""""""""""""""
3342
3343For fragment shaders, this semantic label indicates that a system
3344value contains the current sample's position as float4(x, y, undef, undef)
3345in the render target (i.e.  gl_SamplePosition) when per-fragment shading
3346is in effect.  Position values are in the range [0, 1] where 0.5 is
3347the center of the fragment.
3348
3349Note that if the fragment shader uses this system value, the fragment
3350shader is automatically executed at per sample frequency.
3351
3352TGSI_SEMANTIC_SAMPLEMASK
3353""""""""""""""""""""""""
3354
3355For fragment shaders, this semantic label can be applied to either a
3356shader system value input or output.
3357
3358For a system value, the sample mask indicates the set of samples covered by
3359the current primitive.  If MSAA is not enabled, the value is (1, 0, 0, 0).
3360
3361For an output, the sample mask is used to disable further sample processing.
3362
3363For both, the register type is uint[4] but only the X component is used
3364(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up
3365to 32x MSAA is supported).
3366
3367TGSI_SEMANTIC_INVOCATIONID
3368""""""""""""""""""""""""""
3369
3370For geometry shaders, this semantic label indicates that a system value
3371contains the current invocation id (i.e. gl_InvocationID).
3372This is an integer value, and only the X component is used.
3373
3374TGSI_SEMANTIC_INSTANCEID
3375""""""""""""""""""""""""
3376
3377For vertex shaders, this semantic label indicates that a system value contains
3378the current instance id (i.e. gl_InstanceID). It does not include the base
3379instance. This is an integer value, and only the X component is used.
3380
3381TGSI_SEMANTIC_VERTEXID
3382""""""""""""""""""""""
3383
3384For vertex shaders, this semantic label indicates that a system value contains
3385the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
3386base vertex. This is an integer value, and only the X component is used.
3387
3388TGSI_SEMANTIC_VERTEXID_NOBASE
3389"""""""""""""""""""""""""""""""
3390
3391For vertex shaders, this semantic label indicates that a system value contains
3392the current vertex id without including the base vertex (this corresponds to
3393d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
3394== TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
3395is used.
3396
3397TGSI_SEMANTIC_BASEVERTEX
3398""""""""""""""""""""""""
3399
3400For vertex shaders, this semantic label indicates that a system value contains
3401the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
3402this contains the first (or start) value instead.
3403This is an integer value, and only the X component is used.
3404
3405TGSI_SEMANTIC_PRIMID
3406""""""""""""""""""""
3407
3408For geometry and fragment shaders, this semantic label indicates the value
3409contains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
3410and only the X component is used.
3411FIXME: This right now can be either a ordinary input or a system value...
3412
3413
3414TGSI_SEMANTIC_PATCH
3415"""""""""""""""""""
3416
3417For tessellation evaluation/control shaders, this semantic label indicates a
3418generic per-patch attribute. Such semantics will not implicitly be per-vertex
3419arrays.
3420
3421TGSI_SEMANTIC_TESSCOORD
3422"""""""""""""""""""""""
3423
3424For tessellation evaluation shaders, this semantic label indicates the
3425coordinates of the vertex being processed. This is available in XYZ; W is
3426undefined.
3427
3428TGSI_SEMANTIC_TESSOUTER
3429"""""""""""""""""""""""
3430
3431For tessellation evaluation/control shaders, this semantic label indicates the
3432outer tessellation levels of the patch. Isoline tessellation will only have XY
3433defined, triangle will have XYZ and quads will have XYZW defined. This
3434corresponds to gl_TessLevelOuter.
3435
3436TGSI_SEMANTIC_TESSINNER
3437"""""""""""""""""""""""
3438
3439For tessellation evaluation/control shaders, this semantic label indicates the
3440inner tessellation levels of the patch. The X value is only defined for
3441triangle tessellation, while quads will have XY defined. This is entirely
3442undefined for isoline tessellation.
3443
3444TGSI_SEMANTIC_VERTICESIN
3445""""""""""""""""""""""""
3446
3447For tessellation evaluation/control shaders, this semantic label indicates the
3448number of vertices provided in the input patch. Only the X value is defined.
3449
3450TGSI_SEMANTIC_HELPER_INVOCATION
3451"""""""""""""""""""""""""""""""
3452
3453For fragment shaders, this semantic indicates whether the current
3454invocation is covered or not. Helper invocations are created in order
3455to properly compute derivatives, however it may be desirable to skip
3456some of the logic in those cases. See ``gl_HelperInvocation`` documentation.
3457
3458TGSI_SEMANTIC_BASEINSTANCE
3459""""""""""""""""""""""""""
3460
3461For vertex shaders, the base instance argument supplied for this
3462draw. This is an integer value, and only the X component is used.
3463
3464TGSI_SEMANTIC_DRAWID
3465""""""""""""""""""""
3466
3467For vertex shaders, the zero-based index of the current draw in a
3468``glMultiDraw*`` invocation. This is an integer value, and only the X
3469component is used.
3470
3471
3472TGSI_SEMANTIC_WORK_DIM
3473""""""""""""""""""""""
3474
3475For compute shaders started via OpenCL this retrieves the work_dim
3476parameter to the clEnqueueNDRangeKernel call with which the shader
3477was started.
3478
3479
3480TGSI_SEMANTIC_GRID_SIZE
3481"""""""""""""""""""""""
3482
3483For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3484of a grid of thread blocks.
3485
3486
3487TGSI_SEMANTIC_BLOCK_ID
3488""""""""""""""""""""""
3489
3490For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3491current block inside of the grid.
3492
3493
3494TGSI_SEMANTIC_BLOCK_SIZE
3495""""""""""""""""""""""""
3496
3497For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3498of a block in threads.
3499
3500
3501TGSI_SEMANTIC_THREAD_ID
3502"""""""""""""""""""""""
3503
3504For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3505current thread inside of the block.
3506
3507
3508TGSI_SEMANTIC_SUBGROUP_SIZE
3509"""""""""""""""""""""""""""
3510
3511This semantic indicates the subgroup size for the current invocation. This is
3512an integer of at most 64, as it indicates the width of lanemasks. It does not
3513depend on the number of invocations that are active.
3514
3515
3516TGSI_SEMANTIC_SUBGROUP_INVOCATION
3517"""""""""""""""""""""""""""""""""
3518
3519The index of the current invocation within its subgroup.
3520
3521
3522TGSI_SEMANTIC_SUBGROUP_EQ_MASK
3523""""""""""""""""""""""""""""""
3524
3525A bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3526``1 << subgroup_invocation`` in arbitrary precision arithmetic.
3527
3528
3529TGSI_SEMANTIC_SUBGROUP_GE_MASK
3530""""""""""""""""""""""""""""""
3531
3532A bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3533``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation``
3534in arbitrary precision arithmetic.
3535
3536
3537TGSI_SEMANTIC_SUBGROUP_GT_MASK
3538""""""""""""""""""""""""""""""
3539
3540A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3541``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)``
3542in arbitrary precision arithmetic.
3543
3544
3545TGSI_SEMANTIC_SUBGROUP_LE_MASK
3546""""""""""""""""""""""""""""""
3547
3548A bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3549``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic.
3550
3551
3552TGSI_SEMANTIC_SUBGROUP_LT_MASK
3553""""""""""""""""""""""""""""""
3554
3555A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3556``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
3557
3558
3559TGSI_SEMANTIC_VIEWPORT_MASK
3560"""""""""""""""""""""""""""
3561
3562A bit mask of viewports to broadcast the current primitive to. See
3563:ext:`GL_NV_viewport_array2` for more details.
3564
3565
3566TGSI_SEMANTIC_TESS_DEFAULT_OUTER_LEVEL
3567""""""""""""""""""""""""""""""""""""""
3568
3569A system value equal to the default_outer_level array set via set_tess_level.
3570
3571
3572TGSI_SEMANTIC_TESS_DEFAULT_INNER_LEVEL
3573""""""""""""""""""""""""""""""""""""""
3574
3575A system value equal to the default_inner_level array set via set_tess_level.
3576
3577
3578Declaration Interpolate
3579^^^^^^^^^^^^^^^^^^^^^^^
3580
3581This token is only valid for fragment shader INPUT declarations.
3582
3583The Interpolate field specifies the way input is being interpolated by
3584the rasterizer and is one of TGSI_INTERPOLATE_*.
3585
3586The Location field specifies the location inside the pixel that the
3587interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
3588when per-sample shading is enabled, the implementation may choose to
3589interpolate at the sample irrespective of the Location field.
3590
3591
3592Declaration Sampler View
3593^^^^^^^^^^^^^^^^^^^^^^^^
3594
3595Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
3596
3597DCL SVIEW[#], resource, type(s)
3598
3599Declares a shader input sampler view and assigns it to a SVIEW[#]
3600register.
3601
3602resource can be one of BUFFER, 1D, 2D, 3D, 1D_ARRAY and 2D_ARRAY.
3603
3604type must be 1 or 4 entries (if specifying on a per-component
3605level) out of UNORM, SNORM, SINT, UINT and FLOAT.
3606
3607For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes
3608which take an explicit SVIEW[#] source register), there may be optionally
3609SVIEW[#] declarations.  In this case, the SVIEW index is implied by the
3610SAMP index, and there must be a corresponding SVIEW[#] declaration for
3611each SAMP[#] declaration.  Drivers are free to ignore this if they wish.
3612But note in particular that some drivers need to know the sampler type
3613(float/int/unsigned) in order to generate the correct code, so cases
3614where integer textures are sampled, SVIEW[#] declarations should be
3615used.
3616
3617NOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes
3618in the same shader.
3619
3620Declaration Resource
3621^^^^^^^^^^^^^^^^^^^^
3622
3623Follows Declaration token if file is TGSI_FILE_RESOURCE.
3624
3625DCL RES[#], resource [, WR] [, RAW]
3626
3627Declares a shader input resource and assigns it to a RES[#]
3628register.
3629
3630resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1D_ARRAY and
36312D_ARRAY.
3632
3633If the RAW keyword is not specified, the texture data will be
3634subject to conversion, swizzling and scaling as required to yield
3635the specified data type from the physical data format of the bound
3636resource.
3637
3638If the RAW keyword is specified, no channel conversion will be
3639performed: the values read for each of the channels (X,Y,Z,W) will
3640correspond to consecutive words in the same order and format
3641they're found in memory.  No element-to-address conversion will be
3642performed either: the value of the provided X coordinate will be
3643interpreted in byte units instead of texel units.  The result of
3644accessing a misaligned address is undefined.
3645
3646Usage of the STORE opcode is only allowed if the WR (writable) flag
3647is set.
3648
3649Hardware Atomic Register File
3650^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3651
3652Hardware atomics are declared as a 2D array with an optional array id.
3653
3654The first member of the dimension is the buffer resource the atomic
3655is located in.
3656The second member is a range into the buffer resource, either for
3657one or multiple counters. If this is an array, the declaration will have
3658an unique array id.
3659
3660Each counter is 4 bytes in size, and index and ranges are in counters not bytes.
3661DCL HWATOMIC[0][0]
3662DCL HWATOMIC[0][1]
3663
3664This declares two atomics, one at the start of the buffer and one in the
3665second 4 bytes.
3666
3667DCL HWATOMIC[0][0]
3668DCL HWATOMIC[1][0]
3669DCL HWATOMIC[1][1..3], ARRAY(1)
3670
3671This declares 5 atomics, one in buffer 0 at 0,
3672one in buffer 1 at 0, and an array of 3 atomics in
3673the buffer 1, starting at 1.
3674
3675Properties
3676^^^^^^^^^^^^^^^^^^^^^^^^
3677
3678Properties are general directives that apply to the whole TGSI program.
3679
3680FS_COORD_ORIGIN
3681"""""""""""""""
3682
3683Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
3684The default value is UPPER_LEFT.
3685
3686If UPPER_LEFT, the position will be (0,0) at the upper left corner and
3687increase downward and rightward.
3688If LOWER_LEFT, the position will be (0,0) at the lower left corner and
3689increase upward and rightward.
3690
3691OpenGL defaults to LOWER_LEFT, and is configurable with the
3692:ext:`GL_ARB_fragment_coord_conventions` extension.
3693
3694DirectX 9/10 use UPPER_LEFT.
3695
3696FS_COORD_PIXEL_CENTER
3697"""""""""""""""""""""
3698
3699Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
3700The default value is HALF_INTEGER.
3701
3702If HALF_INTEGER, the fractional part of the position will be 0.5
3703If INTEGER, the fractional part of the position will be 0.0
3704
3705Note that this does not affect the set of fragments generated by
3706rasterization, which is instead controlled by half_pixel_center in the
3707rasterizer.
3708
3709OpenGL defaults to HALF_INTEGER, and is configurable with the
3710:ext:`GL_ARB_fragment_coord_conventions` extension.
3711
3712DirectX 9 uses INTEGER.
3713DirectX 10 uses HALF_INTEGER.
3714
3715FS_COLOR0_WRITES_ALL_CBUFS
3716""""""""""""""""""""""""""
3717Specifies that writes to the fragment shader color 0 are replicated to all
3718bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
3719fragData is directed to a single color buffer, but fragColor is broadcast.
3720
3721VS_PROHIBIT_UCPS
3722""""""""""""""""""""""""""
3723If this property is set on the program bound to the shader stage before the
3724fragment shader, user clip planes should have no effect (be disabled) even if
3725that shader does not write to any clip distance outputs and the rasterizer's
3726clip_plane_enable is non-zero.
3727This property is only supported by drivers that also support shader clip
3728distance outputs.
3729This is useful for APIs that don't have UCPs and where clip distances written
3730by a shader cannot be disabled.
3731
3732GS_INVOCATIONS
3733""""""""""""""
3734
3735Specifies the number of times a geometry shader should be executed for each
3736input primitive. Each invocation will have a different
3737TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
3738be 1.
3739
3740VS_WINDOW_SPACE_POSITION
3741""""""""""""""""""""""""""
3742If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
3743is assumed to contain window space coordinates.
3744Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
3745directly taken from the 4-th component of the shader output.
3746Naturally, clipping is not performed on window coordinates either.
3747The effect of this property is undefined if a geometry or tessellation shader
3748are in use.
3749
3750TCS_VERTICES_OUT
3751""""""""""""""""
3752
3753The number of vertices written by the tessellation control shader. This
3754effectively defines the patch input size of the tessellation evaluation shader
3755as well.
3756
3757TES_PRIM_MODE
3758"""""""""""""
3759
3760This sets the tessellation primitive mode, one of ``MESA_PRIM_TRIANGLES``,
3761``MESA_PRIM_QUADS``, or ``MESA_PRIM_LINES``. (Unlike in GL, there is no
3762separate isolines settings, the regular lines is assumed to mean isolines.)
3763
3764TES_SPACING
3765"""""""""""
3766
3767This sets the spacing mode of the tessellation generator, one of
3768``PIPE_TESS_SPACING_*``.
3769
3770TES_VERTEX_ORDER_CW
3771"""""""""""""""""""
3772
3773This sets the vertex order to be clockwise if the value is 1, or
3774counter-clockwise if set to 0.
3775
3776TES_POINT_MODE
3777""""""""""""""
3778
3779If set to a non-zero value, this turns on point mode for the tessellator,
3780which means that points will be generated instead of primitives.
3781
3782NUM_CLIPDIST_ENABLED
3783""""""""""""""""""""
3784
3785How many clip distance scalar outputs are enabled.
3786
3787NUM_CULLDIST_ENABLED
3788""""""""""""""""""""
3789
3790How many cull distance scalar outputs are enabled.
3791
3792FS_EARLY_DEPTH_STENCIL
3793""""""""""""""""""""""
3794
3795Whether depth test, stencil test, and occlusion query should run before
3796the fragment shader (regardless of fragment shader side effects). Corresponds
3797to GLSL early_fragment_tests.
3798
3799NEXT_SHADER
3800"""""""""""
3801
3802Which shader stage will MOST LIKELY follow after this shader when the shader
3803is bound. This is only a hint to the driver and doesn't have to be precise.
3804Only set for VS and TES.
3805
3806CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
3807"""""""""""""""""""""""""""""""""""""
3808
3809Threads per block in each dimension, if known at compile time. If the block size
3810is known all three should be at least 1. If it is unknown they should all be set
3811to 0 or not set.
3812
3813LEGACY_MATH_RULES
3814"""""""""""""""""
3815
3816The MUL TGSI operation (FP32 multiplication) will return 0 if either
3817of the operands are equal to 0. That means that 0 * Inf = 0. This
3818should be set the same way for an entire pipeline. Note that this
3819applies not only to the literal MUL TGSI opcode, but all FP32
3820multiplications implied by other operations, such as MAD, FMA, DP2,
3821DP3, DP4, DST, LOG, LRP, and possibly others. If there is a
3822mismatch between shaders, then it is unspecified whether this behavior
3823will be enabled.
3824
3825FS_POST_DEPTH_COVERAGE
3826""""""""""""""""""""""
3827
3828When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples
3829that have failed the depth/stencil tests. This is only valid when
3830FS_EARLY_DEPTH_STENCIL is also specified.
3831
3832LAYER_VIEWPORT_RELATIVE
3833"""""""""""""""""""""""
3834
3835When enabled, the TGSI_SEMATNIC_LAYER output value is relative to the
3836current viewport. This is especially useful in conjunction with
3837TGSI_SEMANTIC_VIEWPORT_MASK.
3838
3839
3840Texture Sampling and Texture Formats
3841------------------------------------
3842
3843This table shows how texture image components are returned as (x,y,z,w) tuples
3844by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
3845:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
3846well.
3847
3848+--------------------+--------------+--------------------+--------------+
3849| Texture Components | Gallium      | OpenGL             | Direct3D 9   |
3850+====================+==============+====================+==============+
3851| R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
3852+--------------------+--------------+--------------------+--------------+
3853| RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
3854+--------------------+--------------+--------------------+--------------+
3855| RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
3856+--------------------+--------------+--------------------+--------------+
3857| RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
3858+--------------------+--------------+--------------------+--------------+
3859| A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
3860+--------------------+--------------+--------------------+--------------+
3861| L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
3862+--------------------+--------------+--------------------+--------------+
3863| LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
3864+--------------------+--------------+--------------------+--------------+
3865| I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
3866+--------------------+--------------+--------------------+--------------+
3867| UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
3868|                    |              | [#envmap-bumpmap]_ |              |
3869+--------------------+--------------+--------------------+--------------+
3870| Z                  | (z, z, z, z) | (z, z, z, 1)       | (0, z, 0, 1) |
3871|                    |              | [#depth-tex-mode]_ |              |
3872+--------------------+--------------+--------------------+--------------+
3873| S                  | (s, s, s, s) | unknown            | unknown      |
3874+--------------------+--------------+--------------------+--------------+
3875
3876.. [#envmap-bumpmap] https://registry.khronos.org/OpenGL/extensions/ATI/ATI_envmap_bumpmap.txt
3877.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
3878   or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
3879