Platform: NVIDIA CUDA
  Device: Quadro GV100
    Driver version  : 455.23.05 (Linux x64)
    Compute units   : 80
    Clock frequency : 1627 MHz

    Global memory bandwidth (GBPS)
      float   : 554.70
      float2  : 575.69
      float4  : 258.14
      float8  : 537.39
      float16 : 552.97

    Single-precision compute (GFLOPS)
      float   : 6216.30
      float2  : 11449.58
      float4  : 14290.00
      float8  : 7261.11
      float16 : 7262.70

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7212.01
      double2  : 7191.88
      double4  : 7168.91
      double8  : 4505.54
      double16 : 3124.18

    Integer compute (GIOPS)
      int   : 6219.71
      int2  : 9340.54
      int4  : 14371.73
      int8  : 14373.41
      int16 : 14342.93

    Integer compute Fast 24bit (GIOPS)
      int   : 9037.88
      int2  : 6214.77
      int4  : 6216.47
      int8  : 9337.87
      int16 : 14352.44

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 9.75
      enqueueReadBuffer               : 12.11
      enqueueWriteBuffer non-blocking : 9.32
      enqueueReadBuffer non-blocking  : 11.31
      enqueueMapBuffer(for read)      : 11.25
        memcpy from mapped ptr        : 9.54
      enqueueUnmap(after write)       : 11.26
        memcpy to mapped ptr          : 9.84

    Kernel launch latency : 19.29 us