# Occlusion queries in the Metal back-end

- OpenGL allows an occlusion query to begin and end at any time.
- Metal, on the other hand, only allows an occlusion query (called a visibility result test) to
  begin and end within a render pass.
- Furthermore, a visibility result buffer must be set (via
  `MTLRenderPassDescriptor.visibilityResultBuffer`) before the render pass is encoded. This buffer
  must be allocated beforehand and stores the visibility results of all queries within the render
  pass. Each query uses one offset within the buffer to store its result. Once encoding of the
  render pass starts, this buffer must not be changed.
- The contents of the visibility result buffer are always reset to zero within the render pass.

### Previous implementation
- The `restart()` method of the Metal back-end object `RenderCommandEncoder` immediately creates an
  instance of the Metal framework's native `MTLRenderCommandEncoder` object to start encoding a
  render pass.
- Afterwards, calling `RenderCommandEncoder` methods such as `draw()`, `setBuffer()`,
  `setTexture()`, etc. directly invokes the equivalent `MTLRenderCommandEncoder` methods.
- Encoding of the render pass ends when `RenderCommandEncoder.endEncoding()` is called.

### Current implementation

- Creation of the `MTLRenderCommandEncoder` is deferred until all information about the render
  pass has been recorded and is known to the Metal back-end.
- Calls to `RenderCommandEncoder` methods such as `draw()`, `setVisibilityResultMode()`,
  `setBuffer()`, etc. are recorded in a back-end-owned buffer instead of being encoded directly
  into an `MTLRenderCommandEncoder` object.
- Whenever an occlusion query begins, an offset within the visibility result buffer is allocated
  for it. This offset is valid only for the current render pass. The capacity of the visibility
  result buffer is extended if needed, so that there is enough storage for all the queries within
  the render pass. The offset is then used (via `setVisibilityResultMode()`) to activate the
  visibility test in the render pass.
- Calling `RenderCommandEncoder.endEncoding()` will:
  - Bind the visibility result buffer allocated above.
  - Create an `MTLRenderCommandEncoder` object.
  - Encode all render commands recorded in the back-end-owned buffer.
- Immediately after `RenderCommandEncoder.endEncoding()`:
  - An extra compute shader or copy pass is added to copy the results from the visibility result
    buffer to the buffers of the respective occlusion queries. Each query simply copies the value
    from its allocated offset in the visibility result buffer.
  - Note that if a query spans multiple render passes, the value from the visibility result buffer
    is accumulated into the query's old value instead of replacing it.
- Special cases:
  - If the user calls `glClear` between a `glBeginQuery`/`glEndQuery` pair, the cleared pixels
    should not be counted by the occlusion test. To avoid counting them, the current visibility
    test is ended and another offset in the visibility result buffer is allocated for the query.
    This new offset is used to continue the test after the `glClear` operation ends. In the final
    step, the values stored at both the old and the new offsets are accumulated together.
  - If the user calls `glBeginQuery`, then `glEndQuery`, then `glBeginQuery` again within a single
    render pass, the query is allocated two offsets, since Metal doesn't allow an offset to be
    reused within a render pass. Only the value stored at the second offset is copied back to the
    query at the end of the render pass, though.

### Future improvements
- One could instead permanently allocate an offset within the visibility result buffer for each
  query. The extra copy step at the end of the render pass could then be removed.
- However, doing so means the visibility result buffer would have to be very large in order to
  store every query object created, even query objects that might never be activated in a render
  pass.
- Furthermore, for the client to read back the result of a query, a host memory synchronization of
  the visibility result buffer must be inserted. This could be slow if the buffer is huge. It is
  also wasteful when some offsets within the buffer are inactive in a render pass, since they are
  synchronized anyway.
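The deferred record-then-replay scheme described under *Current implementation* can be sketched as follows. This is a minimal Python model, not the back-end's C++ code. `DeferredRenderEncoder` and `FakeNativeEncoder` are hypothetical names, and the fake object only logs the calls that a real `MTLRenderCommandEncoder` would receive. The sketch shows the key property of the design: the visibility result buffer can be chosen after every command of the pass has been recorded, yet it is still in place before the native encoder is created.

```python
from typing import Any, Callable, List, Optional, Tuple

Command = Tuple[str, tuple]


class FakeNativeEncoder:
    """Hypothetical stand-in for MTLRenderCommandEncoder: it only logs calls."""

    def __init__(self, visibility_buffer: Optional[str]) -> None:
        # Fixed at creation time, like MTLRenderPassDescriptor.visibilityResultBuffer.
        self.visibility_buffer = visibility_buffer
        self.log: List[Command] = []

    def set_buffer(self, name: str, index: int) -> None:
        self.log.append(("set_buffer", (name, index)))

    def set_visibility_result_mode(self, mode: str, offset: int) -> None:
        self.log.append(("set_visibility_result_mode", (mode, offset)))

    def draw(self, vertex_count: int) -> None:
        self.log.append(("draw", (vertex_count,)))


class DeferredRenderEncoder:
    """Hypothetical back-end encoder that records commands instead of encoding them."""

    def __init__(self, create_native: Callable[[Optional[str]], Any]) -> None:
        self._create_native = create_native  # only called from end_encoding()
        self._commands: List[Command] = []  # the back-end-owned command buffer
        self.visibility_buffer: Optional[str] = None  # may be chosen late in the pass

    def set_buffer(self, name: str, index: int) -> None:
        self._commands.append(("set_buffer", (name, index)))

    def set_visibility_result_mode(self, mode: str, offset: int) -> None:
        self._commands.append(("set_visibility_result_mode", (mode, offset)))

    def draw(self, vertex_count: int) -> None:
        self._commands.append(("draw", (vertex_count,)))

    def end_encoding(self) -> Any:
        # The visibility buffer is handed over first, then the native encoder is created.
        native = self._create_native(self.visibility_buffer)
        # Replay every recorded command, in recording order.
        for name, args in self._commands:
            getattr(native, name)(*args)
        self._commands.clear()
        return native
```

Because the factory is invoked only inside `end_encoding()`, any sizing decision made while recording, such as growing the visibility result buffer, is settled before the native encoder exists.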
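The per-pass offset bookkeeping from *Current implementation* can be modelled the same way, including the copy-back step and the two special cases. This is a hypothetical sketch rather than the back-end's actual code. `PassVisibilityAllocator`, `QueryPassState`, the method names and the capacity-doubling growth policy are all assumptions. In addition, `resolve()` runs on the CPU here, whereas the back-end performs the copy on the GPU with a compute shader or copy pass.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueryPassState:
    # Offsets handed to this query during the current render pass.
    offsets: List[int] = field(default_factory=list)
    # True if glBeginQuery ran in this pass, so the old query value is replaced.
    restarted: bool = False


class PassVisibilityAllocator:
    """Hypothetical per-render-pass bookkeeping for visibility result offsets."""

    def __init__(self, initial_capacity: int = 4) -> None:
        self.capacity = initial_capacity  # result slots the buffer must provide
        self.next_offset = 0
        self.queries: Dict[str, QueryPassState] = {}
        self.active: Optional[str] = None

    def _allocate(self, query: str) -> int:
        # Offsets are never reused within one pass, so always hand out a fresh slot.
        if self.next_offset == self.capacity:
            self.capacity *= 2  # grow while recording; fixed once encoding starts
        offset = self.next_offset
        self.next_offset += 1
        self.queries.setdefault(query, QueryPassState()).offsets.append(offset)
        return offset

    def begin_query(self, query: str) -> int:
        """glBeginQuery in this pass: offsets from an earlier begin are discarded."""
        state = self.queries.setdefault(query, QueryPassState())
        state.offsets.clear()
        state.restarted = True
        self.active = query
        return self._allocate(query)

    def continue_query(self, query: str) -> int:
        """A query begun in an earlier pass keeps counting in this pass."""
        self.active = query
        return self._allocate(query)

    def resume_after_clear(self) -> Optional[int]:
        """The test was stopped around glClear; give the active query a new offset."""
        if self.active is None:
            return None
        return self._allocate(self.active)

    def end_query(self) -> None:
        self.active = None

    def resolve(self, results: List[int], query_values: Dict[str, int]) -> None:
        """Copy step after the pass: sum each query's offsets into its value."""
        for query, state in self.queries.items():
            total = sum(results[offset] for offset in state.offsets)
            if state.restarted:
                query_values[query] = total
            else:
                query_values[query] = query_values.get(query, 0) + total
```

A new allocator would be created for each render pass, because offsets are valid only within the pass that allocated them. Clearing the offset list in `begin_query()` makes a re-begun query report only its second offset. The `restarted` flag decides whether the query's previous value is replaced or accumulated into.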