# clpeak

[Travis CI build](https://app.travis-ci.com/github/krrishnarraj/clpeak)
[Snap Store](https://snapcraft.io/clpeak)

A synthetic benchmarking tool that measures the peak capabilities of OpenCL devices. It only measures the peak metrics that can be achieved using vector operations and does not represent a real-world use case.

## Building

```console
git submodule update --init --recursive --remote
mkdir build
cd build
cmake ..
cmake --build .
```

## Sample

```text
Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 767.48
      float2  : 810.81
      float4  : 843.06
      float8  : 726.12
      float16 : 735.98

    Single-precision compute (GFLOPS)
      float   : 15680.96
      float2  : 15674.50
      float4  : 15645.58
      float8  : 15583.27
      float16 : 15466.50

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7859.49
      double2  : 7849.96
      double4  : 7832.96
      double8  : 7799.82
      double16 : 7740.88

    Integer compute (GIOPS)
      int   : 15653.47
      int2  : 15654.40
      int4  : 15655.21
      int8  : 15659.04
      int16 : 15608.65

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueReadBuffer          : 11.92
      enqueueMapBuffer(for read) : 9.97
      memcpy from mapped ptr     : 8.62
      enqueueUnmap(after write)  : 11.04
      memcpy to mapped ptr       : 9.16

    Kernel launch latency : 7.22 us
```
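
If you want to compare runs in a script, the plain-text report above is regular enough to parse. The following is only a minimal sketch, assuming the report keeps the layout shown in the sample: headings of the form `Title (UNIT)` followed by `label : value` rows. `parse_clpeak` is a hypothetical helper and not part of clpeak, and the embedded `SAMPLE` is an excerpt of the sample output above.

```python
import re

# Hypothetical helper, not part of clpeak: collects the numeric rows that
# follow each "Title (UNIT)" heading in the plain-text report.
HEADING = re.compile(r"^(?P<title>[^:]+?)\s*\((?P<unit>\w+)\)$")
ROW = re.compile(r"^(?P<label>.+?)\s*:\s*(?P<value>\d+(?:\.\d+)?)$")


def parse_clpeak(text):
    """Return {heading title: {"unit": str, "rows": {label: float}}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            current = {"unit": heading["unit"], "rows": {}}
            sections[heading["title"]] = current
            continue
        row = ROW.match(line)
        # Rows seen before the first heading (device info) are skipped, as
        # are rows with a trailing unit such as "Kernel launch latency".
        if row and current is not None:
            current["rows"][row["label"]] = float(row["value"])
    return sections


# Excerpt of the sample report shown above.
SAMPLE = """\
Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Compute units   : 80

    Global memory bandwidth (GBPS)
      float   : 767.48
      float4  : 843.06

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueMapBuffer(for read) : 9.97

    Kernel launch latency : 7.22 us
"""

report = parse_clpeak(SAMPLE)
# Pick the vector width with the highest global memory bandwidth.
best = max(report["Global memory bandwidth"]["rows"].items(), key=lambda kv: kv[1])
print(best)  # ('float4', 843.06)
```

The sketch deliberately ignores device information and standalone rows that carry their own unit, so extend the patterns if you need those values.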