Buffer mapping patterns
-----------------------

There are two main strategies the driver has for CPU access to GL buffer
objects. One is for the GL calls to allocate temporary storage and blit to the
GPU at
``glBufferSubData()``/``glBufferData()``/``glFlushMappedBufferRange()``/``glUnmapBuffer()``
time. This makes it easy to match the behavior GL specifies. However, it may be
more costly than the other strategy, direct mapping of the GL BO, on some
platforms, and it is essentially not available to tiling GPUs (since tiling
involves running through the command stream multiple times). Thus, GL has
additional interfaces that let apps access memory directly while avoiding
implicit blocking on GPU rendering from those BOs.

Rendering engines have a variety of knobs to set on those GL interfaces for data
upload, and as a whole they seem to take just about every path available. Let's
look at some examples to see how they might constrain GL driver buffer upload
behavior.

Portal 2
========

.. code-block:: text

  1030842 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)
  1030876 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 65536, data = NULL, usage = GL_DYNAMIC_DRAW)
  1030877 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, size = 576, data = blob(576))
  1030896 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 526, count = 252, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  1030915 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 19657, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x1f8, basevertex = 0)
  1030917 glBufferDataARB(target = GL_ARRAY_BUFFER, size = 1572864, data = NULL, usage = GL_DYNAMIC_DRAW)
  1030918 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 128, data = blob(128))
  1030919 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 576, size = 12, data = blob(12))
  1030936 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x240, basevertex = 0)
  1030937 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 128, size = 128, data = blob(128))
  1030938 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 588, size = 12, data = blob(12))
  1030940 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 4, end = 7, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x24c, basevertex = 0)
  [... repeated draws at increasing offsets]
  1033097 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)

From this sequence, we can see that it is important that the driver either
implements ``glBufferSubData()`` as a blit from a streaming uploader in sequence
with the ``glDraw*()`` calls (a common behavior for non-tiled GPUs, particularly
those with dedicated memory), or does both of the following:

1) Track the valid range of the buffer so that you don't have to flush the draws
   and synchronize on each following ``glBufferSubData()``.

2) Reallocate the buffer storage on ``glBufferData()`` so that your first
   ``glBufferSubData()`` of the frame doesn't stall on the last frame's
   rendering completing.

You can't just empty your valid range on ``glBufferData()`` unless you know that
the GPU access from the previous frame has completed. This pattern of
incrementing ``glBufferSubData()`` offsets interleaved with draws from that data
is common among newer Valve games.

.. code-block:: text

  [ during setup ]

  679259 glGenBuffersARB(n = 1, buffers = &1314)
  679260 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  679261 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 3072, data = NULL, usage = GL_STATIC_DRAW)
  679264 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
  679269 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072)
  679270 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

  [... setup of other buffers on this binding point]

  679343 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  679344 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
  679346 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679347 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679348 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 768, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384300
  679350 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679351 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679352 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 1536, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384600
  679354 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679355 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679356 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 2304, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384900
  679358 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679359 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

  [... setup completes and we start drawing later]

  761845 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  761846 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 323, count = 384, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)

This suggests that, for non-blitting drivers, resetting your "might be used on
the GPU" range after a stall could save you a bunch of additional GPU stalls
during setup.

Terraria
========

.. code-block:: text

  167581 glXSwapBuffers(dpy = 0x3004630, drawable = 25165844)

  167585 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
  167586 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 1728, data = blob(1728))
  167588 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 71, count = 108, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  167589 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
  167590 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 27456, data = blob(27456))
  167592 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 7, count = 12, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  167594 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 8)
  167596 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 12)
  [...]

In this game, we can see ``glBufferData()`` being used on the same array buffer
throughout, to get new storage so that the ``glBufferSubData()`` doesn't cause
synchronization.

Don't Starve
============

.. code-block:: text

  7251917 glGenBuffers(n = 1, buffers = &115052)
  7251918 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
  7251919 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
  7251921 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
  7251928 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  7251930 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 114872)
  7251936 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 18)
  7251938 glGenBuffers(n = 1, buffers = &115053)
  7251939 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
  7251940 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
  7251942 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
  7251949 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  7251973 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)
  [... drawing next frame]
  7252388 glDeleteBuffers(n = 1, buffers = &115052)
  7252389 glDeleteBuffers(n = 1, buffers = &115053)
  7252390 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)

In this game we have a lot of tiny ``glBufferData()`` calls, suggesting that we
could see working set wins and possibly CPU overhead reduction by packing small
GL buffers in the same BO. Interestingly, the deletes of the temporary buffers
always happen at the end of the next frame.

Euro Truck Simulator
====================

.. code-block:: text

  [usage of VBO 14,15]
  [...]
  885199 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  885203 glInvalidateBufferData(buffer = 14)
  885204 glInvalidateBufferData(buffer = 15)
  [...]
  889330 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  889334 glInvalidateBufferData(buffer = 12)
  889335 glInvalidateBufferData(buffer = 16)
  [...]
  893461 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  893462 glClientWaitSync(sync = 0x77eee10, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  893463 glDeleteSync(sync = 0x780a630)
  893464 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x78ec730
  893465 glInvalidateBufferData(buffer = 13)
  893466 glInvalidateBufferData(buffer = 17)
  893505 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
  893506 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1000
  893508 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893509 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 15)
  893510 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 32, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034e5df000
  893512 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893532 glBindVertexBuffers(first = 0, count = 2, buffers = {10, 15}, offsets = {0, 0}, strides = {52, 16})
  893552 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
  893609 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  893732 glBindVertexBuffers(first = 0, count = 1, buffers = &14, offsets = &0, strides = &48)
  893733 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 14)
  893744 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0xf0, basevertex = 0)
  893759 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x2e0, basevertex = 6)
  893786 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 600, type = GL_UNSIGNED_SHORT, indices = 0xe87b0, basevertex = 21515)
  893822 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  893845 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
  893846 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 788, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1314
  893848 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893886 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
  893943 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)

At the start of this frame, buffers 14 and 15 haven't been used in the previous
2 frames, and the :ext:`GL_ARB_sync` fence has ensured that the GPU has at least
started frame n-1 as the CPU starts the current frame. The first map is
``offset = 0, INVALIDATE_BUFFER | UNSYNCHRONIZED``, which suggests that the
driver should reallocate storage for the mapping even in the ``UNSYNCHRONIZED``
case, except that the buffer is definitely going to be idle, making reallocation
unnecessary (you may need to empty your valid range, though, to prevent
unnecessary batch flushes).

Also note the use of a totally unrelated binding point for the mapping of the
vertex array: you can't effectively use the binding point as a hint for any
buffer placement in memory. The game does also use ``glCopyBufferSubData()``,
but only on a different buffer.

Plague Inc
==========

.. code-block:: text

  1640732 glXSwapBuffers(dpy = 0xb218f20, drawable = 23068674)
  1640733 glClientWaitSync(sync = 0xb4141430, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1640734 glDeleteSync(sync = 0xb4141430)
  1640735 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0xb4141430

  1640780 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 78)
  1640787 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 79)
  1640788 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
  1640795 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
  1640813 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640814 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4000
  1640815 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640816 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998000
  1640817 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640819 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
  1640820 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640821 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640823 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
  1640824 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640825 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 1096)
  1640831 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1091)
  1640832 glDrawElements(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL)

  1640847 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640848 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 352, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4160
  1640849 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640850 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 88, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998058
  1640851 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640853 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
  1640854 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640855 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640857 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
  1640858 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640863 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x58, basevertex = 4)

At the start of this frame, the VBOs haven't been used in about 6 frames, and
the :ext:`GL_ARB_sync` fence has ensured that the GPU has started frame n-1.

Note the use of ``glFlushMappedBufferRange()`` on a small fraction of the size
of the VBO: it is important that a blitting driver make use of the flush ranges
when in explicit mode.

Darkest Dungeon
===============

.. code-block:: text

  938384 glXSwapBuffers(dpy = 0x377fcd0, drawable = 23068692)

  938385 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938386 glBufferData(target = GL_ARRAY_BUFFER, size = 1048576, data = NULL, usage = GL_STREAM_DRAW)
  938511 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938512 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
  938514 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 512)
  938515 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
  938523 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
  938524 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938525 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = NULL)
  938527 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938528 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
  938530 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 512, length = 512)
  938531 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
  938539 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
  938540 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938541 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x30)
  [... more maps and draws at increasing offsets]

An interesting note for this game: after the initial ``glBufferData()`` in the
frame to reallocate the storage, it maps the whole buffer unsynchronized each
time and just changes which region it flushes. The same GL buffer name is used
in every frame.

Tabletop Simulator
==================

.. code-block:: text

  1287594 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)
  1287595 glClientWaitSync(sync = 0x7abf554e37b0, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1287596 glDeleteSync(sync = 0x7abf554e37b0)
  1287597 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7abf56647490

  1287614 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
  1287615 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7abf2e79a000
  1287642 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 614)
  1287650 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 5)
  1287651 glBufferSubData(target = GL_COPY_WRITE_BUFFER, offset = 0, size = 1088, data = blob(1088))
  1287652 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 615)
  1287653 glDrawElements(mode = GL_TRIANGLES, count = 1788, type = GL_UNSIGNED_SHORT, indices = NULL)
  [... more draw calls]
  1289055 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
  1289057 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384)
  1289058 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1289059 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 480)
  1289066 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 12, count = 4)
  1289068 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 8, count = 4)
  1289553 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)

In this app, buffer 480 gets used like this every other frame. The
:ext:`GL_ARB_sync` fence ensures that frame n-1 has started on the GPU before
CPU work starts on the current frame, so the unsynchronized access to the
buffers is safe.

Hollow Knight
=============

.. code-block:: text

  1873034 glXSwapBuffers(dpy = 0x28609d0, drawable = 23068692)
  1873035 glClientWaitSync(sync = 0x7b1a5ca6e130, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1873036 glDeleteSync(sync = 0x7b1a5ca6e130)
  1873037 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7b1a5ca6e130
  1873038 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873039 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c7e000
  1873040 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873041 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a07430000
  1873065 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873067 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640)
  1873068 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873069 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873071 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720)
  1873072 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873073 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873074 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 8640, length = 576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c801c0
  1873075 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873076 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 720, length = 72, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a074302d0
  1873077 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873079 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 576)
  1873080 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873081 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873083 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 72)
  1873084 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873085 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 29)
  1873096 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 30)
  1873097 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x2d0, basevertex = 240)

In this app, buffers 29 and 30 get used like this, starting from offset 0, every
other frame. The :ext:`GL_ARB_sync` fence is used to make sure that the GPU has
reached the start of the previous frame before the CPU starts writing,
unsynchronized, over the buffers from frame n-2.

Borderlands 2
=============

.. code-block:: text

  3561998 glFlush()
  3562004 glXSwapBuffers(dpy = 0xbaf0f90, drawable = 23068705)
  3562006 glClientWaitSync(sync = 0x231c2ab0, flags = GL_SYNC_FLUSH_COMMANDS_BIT, timeout = 10000000000) = GL_ALREADY_SIGNALED
  3562007 glDeleteSync(sync = 0x231c2ab0)
  3562008 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x231aadc0

  3562050 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
  3562051 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1792, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xde056000
  3562053 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
  3562054 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1194)
  3562055 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1280, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xd9426000
  3562057 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
  [... unrelated draws]
  3563051 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
  3563064 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 875)
  3563065 glDrawElementsInstancedARB(mode = GL_TRIANGLES, count = 72, type = GL_UNSIGNED_SHORT, indices = NULL, instancecount = 28)

The :ext:`GL_ARB_sync` fence ensures that the GPU has started frame n-1 before
the CPU starts on the current frame.

This sequence of buffer uploads appears in each frame with the same buffer
names, so you do need to handle the ``GL_MAP_INVALIDATE_BUFFER_BIT`` as a
reallocate if the buffer is GPU-busy (it wasn't in this trace capture) to avoid
stalls on the n-1 frame completing.

Note that this is just one small buffer. Most of the vertex data goes through a
``glBufferSubData()``/``glDraw*()`` path with the VBO used across multiple
frames, with a ``glBufferData()`` when needing to wrap.

Buffer mapping conclusions
--------------------------

* Non-blitting drivers must track the valid range of a freshly allocated buffer
  as it gets uploaded in ``pipe_transfer_map()`` and avoid stalling on the GPU
  when mapping an undefined portion of the buffer when ``glBufferSubData()`` is
  interleaved with drawing.

* Non-blitting drivers must reallocate storage on ``glBufferData(NULL)`` so that
  the following ``glBufferSubData()`` won't stall. That ``glBufferData(NULL)``
  call will appear in the driver as an ``invalidate_resource()`` call if
  ``PIPE_CAP_INVALIDATE_BUFFER`` is available. (If that flag is not set, then
  mesa/st will create a new ``pipe_resource`` for you.) Storage reallocation may
  be skipped if you for some reason know that the buffer is idle, in which case
  you can just empty the valid region.

* Blitting drivers must use the ``transfer_flush_region()`` region
  instead of the mapped range when ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid
  blitting too much data. (When that bit is unset, you just blit the whole
  mapped range at unmap time.)

* Buffer valid range tracking in non-blitting drivers must use the
  ``transfer_flush_region()`` region instead of the mapped range when
  ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid excess stalls.

* Buffer valid range tracking doesn't need to be fancy: "number of bytes
  valid starting from 0" is sufficient for all examples found.

* Use the ``util_debug_callback`` to report stalls on buffer mapping to ease
  debugging.

* Buffer binding points are not useful for tuning buffer placement (see all the
  ``GL_COPY_WRITE_BUFFER`` instances); you have to track the actual usage
  history of a GL BO name. mesa/st does this for optimizing its state updates
  on reallocation in the ``!PIPE_CAP_INVALIDATE_BUFFER`` case, and if you set
  ``PIPE_CAP_INVALIDATE_BUFFER`` then you have to flag your own internal state
  updates (VBO addresses, XFB addresses, texture buffer addresses, etc.) on
  reallocation based on usage history.
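
As a rough illustration of the valid-range and reallocation points above, here
is a minimal C sketch. It is not Mesa code: ``struct buf_state``,
``write_needs_sync()``, ``upload()``, ``draw()``, ``invalidate()`` and
``count_stalls_portal2_like()`` are hypothetical names, the valid range is the
simple "number of bytes valid starting from 0" form, and storage reallocation is
only simulated by bumping a generation counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer tracking state; not a Mesa/Gallium structure. */
struct buf_state {
    size_t valid_end;     /* bytes [0, valid_end) hold defined data */
    bool gpu_busy;        /* submitted draws may still read the storage */
    unsigned generation;  /* bumped when storage is (notionally) reallocated */
};

/* A CPU write has to wait for the GPU only if the buffer is busy and the
 * write starts inside the valid range. Writes that begin at or past
 * valid_end touch undefined bytes that no earlier draw can depend on. */
static bool write_needs_sync(const struct buf_state *b, size_t offset)
{
    return b->gpu_busy && offset < b->valid_end;
}

/* glBufferSubData()-like upload. Returns true if it had to stall. */
static bool upload(struct buf_state *b, size_t offset, size_t length)
{
    bool stalled = write_needs_sync(b, offset);

    if (stalled)
        b->gpu_busy = false;  /* after waiting, the GPU is done with it */
    if (offset + length > b->valid_end)
        b->valid_end = offset + length;
    return stalled;
}

/* Any draw sourcing from the buffer makes it GPU-busy. */
static void draw(struct buf_state *b)
{
    b->gpu_busy = true;
}

/* glBufferData(NULL)-like invalidation: take fresh storage if the old
 * storage may still be in use, otherwise just empty the valid range. */
static void invalidate(struct buf_state *b)
{
    if (b->gpu_busy) {
        b->generation++;      /* stand-in for allocating new storage */
        b->gpu_busy = false;
    }
    b->valid_end = 0;
}

/* Replays the shape of the Portal 2 index-buffer pattern for two frames,
 * then overwrites data that in-flight draws may still be reading. */
static int count_stalls_portal2_like(void)
{
    struct buf_state ib = {0};
    int stalls = 0;

    for (int frame = 0; frame < 2; frame++) {
        invalidate(&ib);
        stalls += upload(&ib, 0, 576);
        draw(&ib);
        stalls += upload(&ib, 576, 12);
        draw(&ib);
        stalls += upload(&ib, 588, 12);
        draw(&ib);
    }
    stalls += upload(&ib, 0, 16);  /* the only upload that must wait */
    return stalls;
}
```

In this sketch the appending uploads never wait, because each one starts at or
beyond ``valid_end``, and the per-frame ``invalidate()`` swaps storage instead
of waiting for the previous frame. A real driver would also have to honor the
explicit flush regions and the ``UNSYNCHRONIZED`` map flag, which are left out
here.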