# clpeak

[![Build Status](https://app.travis-ci.com/krrishnarraj/clpeak.svg?branch=master)](https://app.travis-ci.com/github/krrishnarraj/clpeak)
[![Snap Status](https://snapcraft.io/clpeak/badge.svg)](https://snapcraft.io/clpeak)

A synthetic benchmarking tool that measures the peak capabilities of OpenCL devices. It reports only the peak metrics achievable with vector operations and does not represent a real-world use case.

## Building

```console
git submodule update --init --recursive --remote
mkdir build
cd build
cmake ..
cmake --build .
```

## Sample

```text
Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 767.48
      float2  : 810.81
      float4  : 843.06
      float8  : 726.12
      float16 : 735.98

    Single-precision compute (GFLOPS)
      float   : 15680.96
      float2  : 15674.50
      float4  : 15645.58
      float8  : 15583.27
      float16 : 15466.50

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7859.49
      double2  : 7849.96
      double4  : 7832.96
      double8  : 7799.82
      double16 : 7740.88

    Integer compute (GIOPS)
      int   : 15653.47
      int2  : 15654.40
      int4  : 15655.21
      int8  : 15659.04
      int16 : 15608.65

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueReadBuffer          : 11.92
      enqueueMapBuffer(for read) : 9.97
        memcpy from mapped ptr   : 8.62
      enqueueUnmap(after write)  : 11.04
        memcpy to mapped ptr     : 9.16

    Kernel launch latency : 7.22 us
```
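
One way to sanity-check a result like the sample above is to compare the measured single-precision figure with a rough theoretical peak derived from the reported compute units and clock frequency. The sketch below is not part of clpeak. The helper `theoretical_gflops` is hypothetical, and the value of 64 FP32 lanes per compute unit and the convention of counting one fused multiply-add as 2 floating-point operations are assumptions about the device, not values clpeak reports.

```python
# Rough theoretical FP32 peak for the device in the sample output.
# ASSUMPTIONS (not reported by clpeak): 64 FP32 lanes per compute unit,
# and one fused multiply-add per lane per cycle, counted as 2 operations.


def theoretical_gflops(compute_units, clock_mhz, lanes_per_cu,
                       flops_per_lane_per_cycle=2):
    """Estimate peak throughput in GFLOPS (hypothetical helper)."""
    # units * lanes * flops/cycle * MHz gives MFLOPS; divide by 1000 for GFLOPS.
    return (compute_units * lanes_per_cu * flops_per_lane_per_cycle
            * clock_mhz / 1000.0)


compute_units = 80          # "Compute units" line of the sample
clock_mhz = 1530            # "Clock frequency" line of the sample
measured_gflops = 15680.96  # "float" row under single-precision compute

peak = theoretical_gflops(compute_units, clock_mhz, lanes_per_cu=64)
ratio = measured_gflops / peak

print(f"estimated peak       : {peak:.1f} GFLOPS")
print(f"measured / estimated : {ratio:.3f}")
```

Under these assumptions the estimate is about 15667 GFLOPS, within roughly 0.1% of the measured `float` figure. For a different device, the lanes-per-compute-unit value has to come from the vendor's documentation.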