///
/// Copyright (c) 2017-2021 Arm Limited.
///
/// SPDX-License-Identifier: MIT
///
/// Permission is hereby granted, free of charge, to any person obtaining a copy
/// of this software and associated documentation files (the "Software"), to
/// deal in the Software without restriction, including without limitation the
/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
/// sell copies of the Software, and to permit persons to whom the Software is
/// furnished to do so, subject to the following conditions:
///
/// The above copyright notice and this permission notice shall be included in all
/// copies or substantial portions of the Software.
///
/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
/// SOFTWARE.
///
namespace arm_compute
{
/** @page advanced Advanced

@tableofcontents

@section S1_8_cl_tuner OpenCL Tuner

The OpenCL tuner, a.k.a. CLTuner, is a module of Arm Compute Library that can improve the performance of the OpenCL kernels by tuning the Local-Workgroup-Size (LWS).
The optimal LWS for each unique OpenCL kernel configuration is stored in a table. This table can be imported from or exported to a file.
The OpenCL tuner runs the same OpenCL kernel for a range of local workgroup sizes and keeps the local workgroup size of the fastest run to use in subsequent calls to the kernel. It supports three modes of tuning with different trade-offs between the time taken to tune and the kernel execution time achieved using the best LWS found. Exhaustive mode searches all the supported values of LWS. This mode takes the longest time to tune and is the most likely to find the optimal LWS. Normal mode searches a subset of LWS values to yield a good approximation of the optimal LWS. It takes less time to tune than Exhaustive mode. Rapid mode takes the shortest time to tune and finds an LWS value that is at least as good as the default LWS value. The mode affects only the search for the optimal LWS and has no effect when the LWS value is imported from a file.
For the performance numbers to be meaningful, you must disable the GPU power management and set the GPU to a fixed frequency for the entire duration of the tuning phase.
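
The search described above can be pictured with a minimal, self-contained sketch. It is not the CLTuner implementation: "Lws", "pick_fastest_lws" and the "measure" callback are hypothetical names introduced here for illustration, and "measure" stands in for running the kernel once with a given LWS and returning its execution time.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an OpenCL local workgroup size (x, y, z).
using Lws = std::array<std::size_t, 3>;

// Times every candidate with the caller-supplied "measure" callback and
// returns the fastest one. The default LWS is measured first, so the result
// is never slower than the default according to the measured timings.
Lws pick_fastest_lws(const std::vector<Lws>                   &candidates,
                     const Lws                                &default_lws,
                     const std::function<double(const Lws &)> &measure)
{
    Lws    best      = default_lws;
    double best_time = measure(default_lws);

    for(const Lws &lws : candidates)
    {
        const double time = measure(lws);
        if(time < best_time)
        {
            best_time = time;
            best      = lws;
        }
    }
    return best;
}
```

In this sketch, a longer candidate list corresponds to a more exhaustive and slower search, and a shorter list to a quicker tuning phase that may miss the best value.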

If you wish to know more about LWS and its important role in improving GPU cache utilization, we suggest having a look at the presentation "Even Faster CNNs: Exploring the New Class of Winograd Algorithms", available at the following link:

https://www.embedded-vision.com/platinum-members/arm/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-iodice

Tuning a network from scratch can take a long time and considerably affects the execution time of the first run of your network. For this reason, it is recommended to store the CLTuner's results in a file to amortize this time when you re-use either the same network or functions with the same configurations. The tuning is performed only once for each OpenCL kernel.

CLTuner looks for the optimal LWS for each unique OpenCL kernel configuration. Since a function (e.g. Convolution Layer, Pooling Layer, Fully Connected Layer) can be called multiple times with different parameters, we associate an "id" (called "config_id") with each kernel to distinguish the unique configurations.

    #Example: 2 unique Matrix Multiply configurations
@code{.cpp}
    // Shapes for configuration 0 (32x32) and configuration 1 (64x64)
    TensorShape a0 = TensorShape(32,32);
    TensorShape b0 = TensorShape(32,32);
    TensorShape c0 = TensorShape(32,32);
    TensorShape a1 = TensorShape(64,64);
    TensorShape b1 = TensorShape(64,64);
    TensorShape c1 = TensorShape(64,64);

    // CLGEMM operates on OpenCL tensors
    CLTensor a0_tensor;
    CLTensor b0_tensor;
    CLTensor c0_tensor;
    CLTensor a1_tensor;
    CLTensor b1_tensor;
    CLTensor c1_tensor;

    a0_tensor.allocator()->init(TensorInfo(a0, 1, DataType::F32));
    b0_tensor.allocator()->init(TensorInfo(b0, 1, DataType::F32));
    c0_tensor.allocator()->init(TensorInfo(c0, 1, DataType::F32));
    a1_tensor.allocator()->init(TensorInfo(a1, 1, DataType::F32));
    b1_tensor.allocator()->init(TensorInfo(b1, 1, DataType::F32));
    c1_tensor.allocator()->init(TensorInfo(c1, 1, DataType::F32));

    CLGEMM gemm0;
    CLGEMM gemm1;

    // Configuration 0
    gemm0.configure(&a0_tensor, &b0_tensor, nullptr, &c0_tensor, 1.0f, 0.0f);

    // Configuration 1
    gemm1.configure(&a1_tensor, &b1_tensor, nullptr, &c1_tensor, 1.0f, 0.0f);
@endcode
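
The role of the "config_id" can be sketched as a lookup table in which tuning runs only the first time a given configuration is seen. This is an illustration under assumptions rather than the library's code: "LwsCache", "get_or_tune" and the "tune" callback are hypothetical names, and the id strings used with it are made up.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for an OpenCL local workgroup size (x, y, z).
using Lws = std::array<std::size_t, 3>;

// Hypothetical table that maps a config_id to its tuned LWS.
class LwsCache
{
public:
    // Returns the LWS stored for config_id. The "tune" callback is invoked
    // only when the configuration is not in the table yet.
    Lws get_or_tune(const std::string &config_id, const std::function<Lws()> &tune)
    {
        auto it = _table.find(config_id);
        if(it == _table.end())
        {
            it = _table.emplace(config_id, tune()).first;
        }
        return it->second;
    }

    // Number of unique configurations tuned so far.
    std::size_t size() const
    {
        return _table.size();
    }

private:
    std::map<std::string, Lws> _table;
};
```

With such a table, the two GEMM configurations above would occupy two separate entries, while repeated calls with the same configuration would reuse the stored value.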

@subsection S1_8_1_cl_tuner_how_to How to use it

All the graph examples in the Compute Library's folder "examples" and the arm_compute_benchmark accept an argument to enable the OpenCL tuner and an argument to export/import the LWS values to/from a file.

    #Enable CL tuner
    ./graph_mobilenet --enable-tuner --target=CL
    ./arm_compute_benchmark --enable-tuner

    #Export/Import to/from a file
    ./graph_mobilenet --enable-tuner --target=CL --tuner-file=acl_tuner.csv
    ./arm_compute_benchmark --enable-tuner --tuner-file=acl_tuner.csv

If you are importing the CLTuner's results from a file, the new tuned LWS values will be appended to it.

Whether you are benchmarking the graph examples or the test cases in arm_compute_benchmark, remember to:

    -# Disable the power management
    -# Keep the GPU frequency constant
    -# Run the network multiple times (e.g. 10).

If you are not using the graph API or the benchmark infrastructure, you need to manually pass a CLTuner object to CLScheduler before configuring any function.

@code{.cpp}
CLTuner tuner;

// Set up the scheduler with the tuner
CLScheduler::get().default_init(&tuner);
@endcode

After the first run, the CLTuner's results can be exported to a file using the method "save_to_file()".
- tuner.save_to_file("results.csv");

This file can also be imported using the method "load_from_file()".
- tuner.load_from_file("results.csv");
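
The export/import round trip can be illustrated with a small, self-contained sketch. The text layout used here ("config_id;x;y;z", one entry per line) is an assumption made for the example and is not guaranteed to match the file written by "save_to_file()"; "LwsTable", "save_table" and "load_table" are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for an LWS value and the tuner's table.
using Lws      = std::array<std::size_t, 3>;
using LwsTable = std::map<std::string, Lws>;

// Writes one "config_id;x;y;z" line per entry (assumed layout).
void save_table(std::ostream &os, const LwsTable &table)
{
    for(const auto &entry : table)
    {
        os << entry.first << ';' << entry.second[0] << ';' << entry.second[1] << ';' << entry.second[2] << '\n';
    }
}

// Merges the entries read from "is" into "table" and skips malformed lines.
void load_table(std::istream &is, LwsTable &table)
{
    std::string line;
    while(std::getline(is, line))
    {
        std::istringstream fields(line);
        std::string        id;
        Lws                lws{};
        char               sep1 = 0;
        char               sep2 = 0;

        const bool ok = std::getline(fields, id, ';') && (fields >> lws[0] >> sep1 >> lws[1] >> sep2 >> lws[2]);
        if(ok && !id.empty() && sep1 == ';' && sep2 == ';')
        {
            table[id] = lws;
        }
    }
}
```

Skipping malformed lines is a design choice of this sketch: a damaged file then yields fewer cached entries instead of invalid LWS values.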

@section security_concerns Security Concerns
Here are some security concerns that may affect Compute Library.

@subsection security_same_uid A process running under the same UID could read another process's memory

Processes running under the same user ID (UID) may be able to read each other's memory and running state. This can
lead to information disclosure, and sensitive data can be leaked, such as the weights of the model currently executing.
This mainly affects Linux systems, and it is the responsibility of the system owner to make processes secure against
this vulnerability. Moreover, the Yama Linux security module can be used to restrict such attempts.
It can be selected at kernel compile time with CONFIG_SECURITY_YAMA and configured at runtime by changing
ptrace_scope in /proc/sys/kernel/yama.

Please refer to https://www.kernel.org/doc/html/v4.15/admin-guide/LSM/Yama.html for more information.
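
As a minimal sketch, an application could inspect the Yama setting at start-up before loading sensitive data. The helper names "read_ptrace_scope" and "same_uid_attach_restricted" are hypothetical; the meaning of the values follows the kernel documentation linked above (0: classic ptrace permissions, 1: restricted to descendants or explicitly allowed tracers, 2: admin-only attach, 3: no attach).

```cpp
#include <fstream>
#include <string>

// Reads the Yama ptrace_scope value from the given path. Returns -1 when the
// file cannot be read or parsed, for example when Yama is not enabled.
int read_ptrace_scope(const std::string &path = "/proc/sys/kernel/yama/ptrace_scope")
{
    std::ifstream in(path);
    int           value = -1;
    if(!(in >> value))
    {
        return -1;
    }
    return value;
}

// True when the value indicates that an unrelated process running under the
// same UID cannot freely attach (any scope of 1 or higher).
bool same_uid_attach_restricted(int scope)
{
    return scope >= 1;
}
```

A caller might log a warning or refuse to load a sensitive model when "same_uid_attach_restricted(read_ptrace_scope())" is false.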

@subsection security_file_tampering Malicious users could alter Compute Library related files

Extra care must be taken in order to reduce the possibility of a user altering sensitive files. CLTuner files
should be protected from arbitrary writes, since a tampered file can cause Compute Library to crash or exhaust the system's resources.

@subsection security_various Various concerns

Sensitive applications that use Compute Library should consider possible attack vectors such as shared library hooking,
information leakage from the underlying OpenCL driver or from a previous execution, and running arbitrary networks that consume
all the available resources on the system, leading to denial of service.

*/
} // namespace