Platform: Portable Computing Language
  Device: NVIDIA A100-SXM4-40GB
    Driver version  : 3.0-rc2 (Linux x64)
    Compute units   : 108
    Clock frequency : 1410 MHz

    Global memory bandwidth (GBPS)
      float   : 1301.28
      float2  : 1369.03
      float4  : 1406.91
      float8  : 1438.37
      float16 : 1460.08

    Single-precision compute (GFLOPS)
      float   : 19402.00
      float2  : 19361.56
      float4  : 19360.86
      float8  : 19281.99
      float16 : 19139.73

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 9718.42
      double2  : 9697.19
      double4  : 9686.17
      double8  : 9653.11
      double16 : 9576.27

    Integer compute (GIOPS)
      int   : 19318.55
      int2  : 19315.23
      int4  : 19360.05
      int8  : 19316.09
      int16 : 19305.90

    Integer compute Fast 24bit (GIOPS)
      int   : 19322.74
      int2  : 19319.41
      int4  : 19333.47
      int8  : 19316.84
      int16 : 19306.22

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 20.22
      enqueueReadBuffer               : 7.93
      enqueueWriteBuffer non-blocking : 20.21
      enqueueReadBuffer non-blocking  : 7.92
      enqueueMapBuffer(for read)      : 141281.83
        memcpy from mapped ptr        : 20.48
      enqueueUnmap(after write)       : 15.90
        memcpy to mapped ptr          : 20.23

    Kernel launch latency : 7195.83 us

  Device: NVIDIA A100-SXM4-40GB
    Driver version  : 3.0-rc2 (Linux x64)
    Compute units   : 108
    Clock frequency : 1410 MHz

    Global memory bandwidth (GBPS)
      float   : 1298.47
      float2  : 1368.92
      float4  : 1406.60
      float8  : 1439.31
      float16 : 1460.02

    Single-precision compute (GFLOPS)
      float   : 19388.10
      float2  : 19356.01
      float4  : 19356.55
      float8  : 19277.93
      float16 : 19135.15

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 9713.43
      double2  : 9692.54
      double4  : 9680.89
      double8  : 9647.49
      double16 : 9570.05

    Integer compute (GIOPS)
      int   : 19316.41
      int2  : 19339.49
      int4  : 19328.43
      int8  : 19311.48
      int16 : 19300.44

    Integer compute Fast 24bit (GIOPS)
      int   : 19317.16
      int2  : 19313.40
      int4  : 19327.89
      int8  : 19311.15
      int16 : 19299.80

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 14.44
      enqueueReadBuffer               : 13.10
      enqueueWriteBuffer non-blocking : 14.41
      enqueueReadBuffer non-blocking  : 13.10
      enqueueMapBuffer(for read)      : 26.35
        memcpy from mapped ptr        : 19.53
      enqueueUnmap(after write)       : 26.77
        memcpy to mapped ptr          : 20.62

    Kernel launch latency : 9458.67 us

  Device: NVIDIA A100-SXM4-40GB
    Driver version  : 3.0-rc2 (Linux x64)
    Compute units   : 108
    Clock frequency : 1410 MHz

    Global memory bandwidth (GBPS)
      float   : 1299.52
      float2  : 1369.10
      float4  : 1406.73
      float8  : 1440.49
      float16 : 1460.83

    Single-precision compute (GFLOPS)
      float   : 19401.13
      float2  : 19356.17
      float4  : 19356.55
      float8  : 19277.87
      float16 : 19135.10

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 9714.25
      double2  : 9693.57
      double4  : 9682.23
      double8  : 9647.81
      double16 : 9571.95

    Integer compute (GIOPS)
      int   : 19317.69
      int2  : 19341.86
      int4  : 19328.53
      int8  : 19312.01
      int16 : 19301.08

    Integer compute Fast 24bit (GIOPS)
      int   : 19317.91
      int2  : 19314.69
      int4  : 19328.53
      int8  : 19311.80
      int16 : 19300.76

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 14.53
      enqueueReadBuffer               : 9.13
      enqueueWriteBuffer non-blocking : 14.44
      enqueueReadBuffer non-blocking  : 9.12
      enqueueMapBuffer(for read)      : 26.35
        memcpy from mapped ptr        : 19.40
      enqueueUnmap(after write)       : 26.77
        memcpy to mapped ptr          : 20.62

    Kernel launch latency : 11937.56 us

  Device: NVIDIA A100-SXM4-40GB
    Driver version  : 3.0-rc2 (Linux x64)
    Compute units   : 108
    Clock frequency : 1410 MHz

    Global memory bandwidth (GBPS)
      float   : 1304.24
      float2  : 1369.08
      float4  : 1406.75
      float8  : 1439.62
      float16 : 1460.71

    Single-precision compute (GFLOPS)
      float   : 19393.56
      float2  : 19365.28
      float4  : 19365.01
      float8  : 19286.58
      float16 : 19144.05

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 9720.38
      double2  : 9699.67
      double4  : 9688.97
      double8  : 9655.90
      double16 : 9580.43

    Integer compute (GIOPS)
      int   : 19324.88
      int2  : 19321.23
      int4  : 19366.62
      int8  : 19321.13
      int16 : 19310.40

    Integer compute Fast 24bit (GIOPS)
      int   : 19327.03
      int2  : 19323.49
      int4  : 19337.24
      int8  : 19320.91
      int16 : 19310.19

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 14.41
      enqueueReadBuffer               : 6.99
      enqueueWriteBuffer non-blocking : 14.38
      enqueueReadBuffer non-blocking  : 7.00
      enqueueMapBuffer(for read)      : 25.94
        memcpy from mapped ptr        : 20.83
      enqueueUnmap(after write)       : 26.77
        memcpy to mapped ptr          : 20.56

    Kernel launch latency : 15067.95 us