Platform: NVIDIA CUDA
  Device: Quadro GV100
    Driver version  : 455.23.05 (Linux x64)
    Compute units   : 80
    Clock frequency : 1627 MHz

    Global memory bandwidth (GBPS)
      float   : 554.70
      float2  : 575.69
      float4  : 258.14
      float8  : 537.39
      float16 : 552.97

    Single-precision compute (GFLOPS)
      float   : 6216.30
      float2  : 11449.58
      float4  : 14290.00
      float8  : 7261.11
      float16 : 7262.70

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7212.01
      double2  : 7191.88
      double4  : 7168.91
      double8  : 4505.54
      double16 : 3124.18

    Integer compute (GIOPS)
      int   : 6219.71
      int2  : 9340.54
      int4  : 14371.73
      int8  : 14373.41
      int16 : 14342.93

    Integer compute Fast 24bit (GIOPS)
      int   : 9037.88
      int2  : 6214.77
      int4  : 6216.47
      int8  : 9337.87
      int16 : 14352.44

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 9.75
      enqueueReadBuffer               : 12.11
      enqueueWriteBuffer non-blocking : 9.32
      enqueueReadBuffer non-blocking  : 11.31
      enqueueMapBuffer(for read)      : 11.25
        memcpy from mapped ptr        : 9.54
      enqueueUnmap(after write)       : 11.26
        memcpy to mapped ptr          : 9.84

    Kernel launch latency : 19.29 us