# Notes on opcodes

_Notes mainly by Connor Abbott extracted from the disassembler_

LOG_FREXPM:

        // From the ARM patent US20160364209A1:
        // "Decompose v (the input) into numbers x1 and s such that v = x1 * 2^s,
        // and x1 is a floating point value in a predetermined range where the
        // value 1 is within the range and not at one extremity of the range (e.g.
        // choose a range where 1 is towards middle of range)."
        //
        // This computes x1.
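
The decomposition that LOG_FREXPM (x1) and FLOG_FREXPE (s, listed under the ADD ops) perform can be sketched in C. These notes do not give the range the hardware actually uses for x1. The sketch therefore assumes [sqrt(0.5), sqrt(2)) purely for illustration. `log_decompose` is a hypothetical name, and only positive, finite, normal inputs are considered.

```c
#include <math.h>

/* Hypothetical model of v = x1 * 2^s for positive, finite, normal v.
 * ASSUMPTION: x1 is kept in [sqrt(0.5), sqrt(2)); the real hardware range
 * is not documented in these notes. */
static float log_decompose(float v, int *s)
{
    const float lo = 0.70710678f;  /* sqrt(0.5), assumed lower bound */
    int e;
    float x1 = frexpf(v, &e);      /* v = x1 * 2^e with x1 in [0.5, 1) */
    if (x1 < lo) {                 /* move x1 up into [1, sqrt(2)) */
        x1 *= 2.0f;
        e -= 1;
    }
    *s = e;                        /* FLOG_FREXPE-like result */
    return x1;                     /* LOG_FREXPM-like result */
}
```

With this split, log2(v) = log2(x1) + s. For example, 8.0 decomposes into x1 = 1.0 and s = 3, and 3.0 decomposes into x1 = 0.75 and s = 2.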

FRCP_FREXPM:

        // Given a floating point number m * 2^e, returns m * 2^{-1}. This is
        // exactly the same as the mantissa part of frexp().
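
The result is just the frexp() mantissa, so the C standard library can serve as a reference model. This is a minimal sketch that assumes a finite, nonzero input. `frcp_frexpm` is an illustrative name, not a Mesa function.

```c
#include <math.h>

/* Reference model of FRCP_FREXPM for finite, nonzero x: x = m * 2^e with
 * m in [1, 2) gives m * 2^-1, which is exactly frexpf()'s mantissa
 * (magnitude in [0.5, 1), sign of x kept as frexpf() does). */
static float frcp_frexpm(float x)
{
    int e;                 /* exponent is discarded; FRCP_FREXPE covers it */
    return frexpf(x, &e);
}
```

For example, 6.0 = 1.5 * 2^2, so the result is 0.75.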

FSQRT_FREXPM:

        // Given a floating point number m * 2^e, returns m * 2^{-2} if e is even,
        // and m * 2^{-1} if e is odd. In other words, scales the input by a power
        // of 4 so that the result lies within the range [0.25, 1). Used for
        // square root and reciprocal square root.
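
The following is a reference model in C for positive, normal inputs, built on frexpf(). The function name is illustrative. The hardware's handling of zero, denormals and special values is not modelled.

```c
#include <math.h>

/* Model of FSQRT_FREXPM for positive, normal x = m * 2^e, m in [1, 2):
 * returns m * 2^-2 for even e and m * 2^-1 for odd e, i.e. [0.25, 1). */
static float fsqrt_frexpm(float x)
{
    int fe;
    float half_m = frexpf(x, &fe);  /* x = (m / 2) * 2^fe */
    int e = fe - 1;                 /* exponent in the m * 2^e convention */
    /* even e: halve once more to get m * 2^-2; odd e: keep m * 2^-1 */
    return (e % 2 == 0) ? half_m * 0.5f : half_m;
}
```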

FRCP_FREXPE:
        // Given a floating point number m * 2^e, computes -e - 1 as an integer.
        // Zero and infinity/NaN return 0.
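
Together with FRCP_FREXPM, this gives the usual range reduction for a reciprocal: 1/x = (1 / (m * 2^{-1})) * 2^{-e-1}. The sketch below assumes that identity. It uses an exact division where the hardware would use FRCP_TABLE and refinement. Both function names are hypothetical.

```c
#include <math.h>

/* Model of FRCP_FREXPE: for x = m * 2^e with m in [1, 2), return -e - 1.
 * Zero, infinity and NaN return 0, as the note above says. */
static int frcp_frexpe(float x)
{
    if (x == 0.0f || !isfinite(x))
        return 0;
    int fe;
    frexpf(x, &fe);   /* x = (m / 2) * 2^fe, so e = fe - 1 */
    int e = fe - 1;
    return -e - 1;
}

/* Recombine for finite, nonzero x: 1/x = (1 / FRCP_FREXPM(x)) * 2^(-e-1).
 * The exact 1.0f / m_half stands in for the table-based approximation. */
static float reciprocal_via_frexp(float x)
{
    int fe;
    float m_half = frexpf(x, &fe);   /* what FRCP_FREXPM returns */
    return ldexpf(1.0f / m_half, frcp_frexpe(x));
}
```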

FSQRT_FREXPE:
        // Given a floating point number m * 2^e, computes floor(e/2) + 1 as an
        // integer.

FRSQ_FREXPE:
        // Given a floating point number m * 2^e, computes -floor(e/2) - 1 as an
        // integer.

LSHIFT_ADD_LOW32:
        // The LSHIFT_ADD_LOW32 instructions in the FMA slot, together with
        // LSHIFT_ADD_HIGH32.i32 in the ADD slot, allow one to do a 64-bit addition
        // with an extra small shift on one of the sources. There are three
        // possible scenarios:
        //
        // 1) Full 64-bit addition. Do:
        // out.x = LSHIFT_ADD_LOW32.i64 src1.x, src2.x, shift
        // out.y = LSHIFT_ADD_HIGH32.i32 src1.y, src2.y
        //
        // The shift amount is applied to src2 before adding. The shift amount, and
        // any extra bits from src2 plus the overflow bit, are sent directly from
        // FMA to ADD instead of being passed explicitly. Hence, these two must be
        // bundled together into the same instruction.
        //
        // 2) Add a 64-bit value src1 to a zero-extended 32-bit value src2. Do:
        // out.x = LSHIFT_ADD_LOW32.u32 src1.x, src2, shift
        // out.y = LSHIFT_ADD_HIGH32.i32 src1.y, 0
        //
        // Note that in this case, the second argument to LSHIFT_ADD_HIGH32 is
        // ignored, so it can actually be anything. As before, the shift is applied
        // to src2 before adding.
        //
        // 3) Add a 64-bit value src1 to a sign-extended 32-bit value src2. Do:
        // out.x = LSHIFT_ADD_LOW32.i32 src1.x, src2, shift
        // out.y = LSHIFT_ADD_HIGH32.i32 src1.y, 0
        //
        // The only difference is the .i32 instead of .u32. Otherwise, this is
        // exactly the same as before.
        //
        // In all these instructions, the shift amount is stored where the third
        // source would be, so the shift has to be a small immediate from 0 to 7.
        // This is fine for the expected use case of these instructions, which is
        // manipulating 64-bit pointers.
        //
        // These instructions can also be combined with various load/store
        // instructions which normally take a 64-bit pointer in order to add a
        // 32-bit or 64-bit offset to the pointer before doing the operation,
        // optionally shifting the offset. The load/store op implicitly does
        // LSHIFT_ADD_HIGH32.i32 internally. Letting ptr be the pointer, and offset
        // the desired offset, the cases go as follows:
        //
        // 1) Add a 64-bit offset:
        // LSHIFT_ADD_LOW32.i64 ptr.x, offset.x, shift
        // ld_st_op ptr.y, offset.y, ...
        //
        // Note that the output of LSHIFT_ADD_LOW32.i64 is not used, instead being
        // implicitly sent to the load/store op to serve as the low 32 bits of the
        // pointer.
        //
        // 2) Add a 32-bit unsigned offset:
        // temp = LSHIFT_ADD_LOW32.u32 ptr.x, offset, shift
        // ld_st_op temp, ptr.y, ...
        //
        // Now, the low 32 bits of (offset << shift) + ptr are passed explicitly to
        // the ld_st_op, to match the case where there is no offset and ld_st_op is
        // called directly.
        //
        // 3) Add a 32-bit signed offset:
        // temp = LSHIFT_ADD_LOW32.i32 ptr.x, offset, shift
        // ld_st_op temp, ptr.y, ...
        //
        // Again, the same as the unsigned case, except that the offset is
        // sign-extended.
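
A small C reference model can pin down the arithmetic of the three scenarios, and of the matching load/store address forms. It is a sketch under the assumption that the LOW32/HIGH32 pair behaves exactly like ordinary wrap-around 64-bit addition. The names `lshift_add` and `enum lshift_add_mode` are hypothetical, not Mesa APIs. The FMA-to-ADD forwarding is folded into the single 64-bit add rather than modelled separately.

```c
#include <stdint.h>

/* Hypothetical reference model; names are illustrative, not Mesa APIs. */
enum lshift_add_mode {
    LSHIFT_ADD_I64, /* full 64-bit src2 */
    LSHIFT_ADD_U32, /* 32-bit src2, zero-extended */
    LSHIFT_ADD_I32, /* 32-bit src2, sign-extended */
};

/* Returns src1 + (extend(src2) << shift) with 64-bit wrap-around.
 * The low 32 bits correspond to LSHIFT_ADD_LOW32 (out.x) and the high 32
 * bits to LSHIFT_ADD_HIGH32 (out.y); the carry that the hardware forwards
 * from FMA to ADD is simply part of the 64-bit addition here. */
static uint64_t lshift_add(uint64_t src1, uint64_t src2, unsigned shift,
                           enum lshift_add_mode mode)
{
    uint64_t addend;

    switch (mode) {
    case LSHIFT_ADD_U32:
        addend = (uint32_t)src2;
        break;
    case LSHIFT_ADD_I32:
        addend = (uint64_t)(int64_t)(int32_t)(uint32_t)src2;
        break;
    default:
        addend = src2;
        break;
    }

    /* Only 0..7 fits in the third-source slot, so mask to that range. */
    return src1 + (addend << (shift & 7u));
}
```

For example, consider ptr = 0x1_FFFF_FFF0 and a 32-bit unsigned offset of 4 shifted by 2. The low word wraps to 0 and the carry bumps the high word, giving 0x2_0000_0000. With an offset of 0xFFFFFFFF, the .i32 form treats it as -1 and subtracts 8, while the .u32 form adds 0x7_FFFF_FFF8.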

---

ADD ops:

F16_TO_F32.X: // take the low  16 bits (a 16-bit float) and expand them to a 32-bit float
F16_TO_F32.Y: // take the high 16 bits (a 16-bit float) and expand them to a 32-bit float
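
The following is a reference model for both variants. It assumes that the 16-bit halves are IEEE 754 binary16 values and that half-precision denormals are preserved rather than flushed; these notes do not say which the hardware does. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert one IEEE 754 binary16 value, given as raw bits, to float. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1f;
    uint32_t mant = h & 0x3ff;
    uint32_t bits;

    if (exp == 0x1f) {                  /* infinity or NaN, payload kept */
        bits = sign | 0x7f800000u | (mant << 13);
    } else if (exp != 0) {              /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    } else if (mant == 0) {             /* signed zero */
        bits = sign;
    } else {                            /* denormal: renormalize */
        int e = -1;
        do {
            mant <<= 1;
            e++;
        } while (!(mant & 0x400));
        bits = sign | ((uint32_t)(112 - e) << 23) | ((mant & 0x3ff) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);        /* reinterpret bits without UB */
    return f;
}

/* F16_TO_F32.X: low 16 bits of the source register. */
static float f16_to_f32_x(uint32_t src)
{
    return half_bits_to_float((uint16_t)(src & 0xffffu));
}

/* F16_TO_F32.Y: high 16 bits of the source register. */
static float f16_to_f32_y(uint32_t src)
{
    return half_bits_to_float((uint16_t)(src >> 16));
}
```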

MOV:
        // Logically, this should be SWZ.XY, but that's equivalent to a move, and
        // this seems to be the canonical way the blob generates a MOV.

FRCP_FREXPM:
        // Given a floating point number m * 2^e, returns m * 2^{-1}.

FLOG_FREXPE:
        // From the ARM patent US20160364209A1:
        // "Decompose v (the input) into numbers x1 and s such that v = x1 * 2^s,
        // and x1 is a floating point value in a predetermined range where the
        // value 1 is within the range and not at one extremity of the range (e.g.
        // choose a range where 1 is towards middle of range)."
        //
        // This computes s.

LD_UBO.v4i32:
        // src0 = offset, src1 = binding

FRCP_FAST.f32:
        // *_FAST does not exist on G71 (added to G51, G72, and everything after)

FRCP_TABLE:
        // Given a floating point number m * 2^e, produces a table-based
        // approximation of 2/m using the top 17 bits of the mantissa. Includes
        // special cases for infinity, NaN, and zero, and copies the sign bit.

FRCP_FAST.f16.X:
        // Exists on G71 (unlike FRCP_FAST.f32 above)

FRSQ_TABLE:
        // A similar table for inverse square root, using the high 17 bits of the
        // mantissa as well as the low bit of the exponent.

FRCP_APPROX:
        // Used in the argument reduction for log. Given a floating-point number
        // m * 2^e, uses the top 4 bits of m to produce an approximation to 1/m
        // with the exponent forced to 0 and only the top 5 bits nonzero. 0,
        // infinity, and NaN all return 1.0.
        // See the ARM patent US20160364209A1 for more information.

MUX:
        // For each bit i, return src2[i] ? src0[i] : src1[i]. In other words, this
        // is the same as (src2 & src0) | (~src2 & src1).
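
The same definition as a C function could serve as a reference when checking constant folding or lowering. The name `mux` is illustrative.

```c
#include <stdint.h>

/* Bitwise select: each result bit comes from src0 where the matching src2
 * bit is set, and from src1 where it is clear. */
static uint32_t mux(uint32_t src0, uint32_t src1, uint32_t src2)
{
    return (src2 & src0) | (~src2 & src1);
}
```

For example, a selector of 0xFFFF0000 takes the high half from src0 and the low half from src1.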

ST_VAR:
        // Stores a varying, given the address and datatype computed by LD_VAR_ADDR.

LD_VAR_ADDR:
        // Computes the varying address and datatype (for storing in the vertex
        // shader), and stores the vec3 result in the data register. The result is
        // passed as the 3 normal arguments to ST_VAR.

DISCARD:
        // Implements conditional discards (discard_if in NIR). Compares the first
        // two sources and discards if the result is true.

ATEST.f32:
        // Implements alpha-to-coverage, as well as possibly the late depth and
        // stencil tests. The first source is the existing sample mask in R60
        // (possibly modified by gl_SampleMask), and the second source is the alpha
        // value. The sample mask is written right away based on the
        // alpha-to-coverage result using the normal register write mechanism,
        // since that doesn't need to read from any memory, and then written again
        // later based on the result of the stencil and depth tests using the
        // special register.

BLEND:
        // This takes the sample coverage mask (computed by ATEST above) as a
        // regular argument, in addition to the vec4 color in the special register.