///
/// Copyright (c) 2017-2023 Arm Limited.
///
/// SPDX-License-Identifier: MIT
///
/// Permission is hereby granted, free of charge, to any person obtaining a copy
/// of this software and associated documentation files (the "Software"), to
/// deal in the Software without restriction, including without limitation the
/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
/// sell copies of the Software, and to permit persons to whom the Software is
/// furnished to do so, subject to the following conditions:
///
/// The above copyright notice and this permission notice shall be included in all
/// copies or substantial portions of the Software.
///
/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
/// SOFTWARE.
///
namespace arm_compute
{
/** @page versions_changelogs Release Versions and Changelog

@tableofcontents

@section S2_1_versions Release versions

All releases are numbered vYY.MM, where YY is the last two digits of the year and MM is the month number.
If there is more than one release in a month, an extra sequential number is appended at the end:

	v17.03 (First release of March 2017)
	v17.03.1 (Second release of March 2017)
	v17.04 (First release of April 2017)

@note We aim to publish one major public release with new features per quarter. All releases in between will only contain bug fixes.
@note Starting from release 22.05, the 'master' branch is no longer used; it has been replaced by 'main'. Please update your clone jobs accordingly.

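As a minimal sketch of the numbering scheme described in this section, the following C++ snippet parses a version string of the form vYY.MM or vYY.MM.N and orders releases chronologically. ReleaseVersion and parse_release_version are hypothetical names used for illustration only and are not part of the library; the limit of three digits for the sequential number is likewise an assumption of this example. With this ordering, v17.03 sorts before v17.03.1, which sorts before v17.04.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <tuple>

// Hypothetical helper type for illustration; not part of the library.
struct ReleaseVersion
{
    int year;  // YY: last two digits of the year
    int month; // MM: month number, 1 to 12
    int seq;   // 0 for the first release of the month, N for a ".N" suffix
};

// Chronological ordering: year, then month, then sequential number.
inline bool operator<(const ReleaseVersion &a, const ReleaseVersion &b)
{
    return std::tie(a.year, a.month, a.seq) < std::tie(b.year, b.month, b.seq);
}

// Parses "vYY.MM" or "vYY.MM.N". Returns false and leaves out untouched on malformed input.
inline bool parse_release_version(const std::string &s, ReleaseVersion &out)
{
    const auto is_digit = [&s](std::size_t i)
    { return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])) != 0; };

    // Mandatory part: 'v', two digits, '.', two digits.
    if(s.size() < 6 || s[0] != 'v' || !is_digit(1) || !is_digit(2) || s[3] != '.' || !is_digit(4) || !is_digit(5))
    {
        return false;
    }
    ReleaseVersion v{ (s[1] - '0') * 10 + (s[2] - '0'), (s[4] - '0') * 10 + (s[5] - '0'), 0 };
    if(v.month < 1 || v.month > 12)
    {
        return false;
    }
    if(s.size() > 6)
    {
        // Optional part: '.' followed by one to three digits (assumed limit).
        if(s[6] != '.' || s.size() == 7 || s.size() > 10)
        {
            return false;
        }
        for(std::size_t i = 7; i < s.size(); ++i)
        {
            if(!is_digit(i))
            {
                return false;
            }
            v.seq = v.seq * 10 + (s[i] - '0');
        }
    }
    out = v;
    return true;
}
```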
@section S2_2_changelog Changelog
v23.02.1 Public patch release
 - Allow mismatching data layouts between the source tensor and weights for \link cpu::CpuGemmDirectConv2d CpuGemmDirectConv2d \endlink with fixed format kernels.
 - Fixes for experimental CPU only Bazel and CMake builds.

v23.02 Public major release
 - New features:
   - Rework the experimental dynamic fusion interface by identifying auxiliary and intermediate tensors, and specifying an explicit output operator.
   - Add the following operators to the experimental dynamic fusion API:
     - GpuAdd, GpuCast, GpuClamp, GpuDepthwiseConv2d, GpuMul, GpuOutput, GpuPool2d, GpuReshape, GpuResize, GpuSoftmax, GpuSub.
   - Add SME/SME2 kernels for GeMM, Winograd convolution, Depthwise convolution and Pooling.
   - Add new CPU operator AddMulAdd for float and quantized types.
   - Add new flag @ref ITensorInfo::lock_paddings() to tensors to prevent extending tensor paddings.
   - Add experimental support for CPU only Bazel and CMake builds.
 - Performance optimizations:
   - Optimize CPU base-e exponential functions for FP32.
   - Optimize CPU StridedSlice by copying first dimension elements in bulk where possible.
   - Optimize CPU quantized Subtraction by reusing the quantized Addition kernel.
   - Optimize CPU ReduceMean by removing quantization steps and performing the operation in integer domain.
   - Optimize GPU Scale and Dynamic Fusion GpuResize by removing quantization steps and performing the operation in integer domain.
   - Update the heuristic for CLDepthwiseConvolutionNative kernel.
   - Add new optimized OpenCL kernel to compute indirect convolution:
     - \link opencl::kernels::ClIndirectConv2dKernel ClIndirectConv2dKernel \endlink
   - Add new optimized OpenCL kernel to compute transposed convolution:
     - \link opencl::kernels::ClTransposedConvolutionKernel ClTransposedConvolutionKernel \endlink
 - Update recommended/minimum NDK version to r20b.
 - Various optimizations and bug fixes.

v22.11 Public major release
 - New features:
   - Add new experimental dynamic fusion API.
   - Add CPU batch matrix multiplication with adj_x = false and adj_y = false for FP32.
   - Add CPU MeanStdDevNorm for QASYMM8.
   - Add CPU and GPU GELU activation function for FP32 and FP16.
   - Add CPU swish activation function for FP32 and FP16.
 - Performance optimizations:
   - Optimize CPU bilinear scale for FP32, FP16, QASYMM8, QASYMM8_SIGNED, U8 and S8.
   - Optimize CPU activation functions using LUT-based implementation:
     - Sigmoid function for QASYMM8 and QASYMM8_SIGNED.
     - Hard swish function for QASYMM8_SIGNED.
   - Optimize CPU addition for QASYMM8 and QASYMM8_SIGNED using fixed-point arithmetic.
   - Optimize CPU multiplication, subtraction and activation layers by considering tensors as 1D.
   - Optimize GPU depthwise convolution kernel and heuristic.
   - Optimize GPU Conv2d heuristic.
   - Optimize CPU MeanStdDevNorm for FP16.
   - Optimize CPU tanh activation function for FP16 using rational approximation.
 - Improve GPU GeMMLowp start-up time.
 - Various optimizations and bug fixes.

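To illustrate the LUT-based activation optimization listed for v22.11: an 8-bit quantized input can take only 256 distinct values, so an activation can in principle be evaluated once per value and then applied with a single table lookup per element. The sketch below shows this general idea for Sigmoid on unsigned 8-bit data. It is not the library's kernel; QuantParams, build_sigmoid_lut and sigmoid_u8 are hypothetical names, and the affine mapping real = scale * (q - offset) is an assumption of this example. A caller would build the table once for a given pair of input and output quantization parameters and reuse it for every tensor that shares them.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical affine quantization parameters: real = scale * (q - offset).
struct QuantParams
{
    float scale;
    int   offset;
};

// Builds a 256-entry table that maps every possible 8-bit input code to the
// quantized sigmoid of the real value that code represents.
inline std::array<std::uint8_t, 256> build_sigmoid_lut(QuantParams in, QuantParams out)
{
    std::array<std::uint8_t, 256> lut{};
    for(int q = 0; q < 256; ++q)
    {
        const float x = in.scale * static_cast<float>(q - in.offset);              // dequantize
        const float y = 1.0f / (1.0f + std::exp(-x));                              // sigmoid in float
        const int   r = static_cast<int>(std::lround(y / out.scale)) + out.offset; // requantize
        lut[static_cast<std::size_t>(q)] = static_cast<std::uint8_t>(std::min(255, std::max(0, r)));
    }
    return lut;
}

// Applies the activation with one table lookup per element.
inline void sigmoid_u8(const std::uint8_t *src, std::uint8_t *dst, std::size_t n,
                       const std::array<std::uint8_t, 256> &lut)
{
    for(std::size_t i = 0; i < n; ++i)
    {
        dst[i] = lut[src[i]];
    }
}
```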
v22.08 Public major release
 - Various bug fixes.
 - Disable unsafe FP optimizations causing accuracy issues in:
   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
   - \link opencl::kernels::ClDirectConv3dKernel ClDirectConv3dKernel \endlink
   - @ref CLDepthwiseConvolutionLayerNativeKernel
 - Add Dynamic Fusion of Elementwise Operators: Div, Floor, Add.
 - Optimize the gemm_reshaped_rhs_nly_nt OpenCL kernel using the arm_matrix_multiply extension available for Arm® Mali™-G715 and Arm® Mali™-G615.
 - Add support for the arm_matrix_multiply extension in the gemmlowp_mm_reshaped_only_rhs_t OpenCL kernel.
 - Expand GPUTarget list with missing Mali™ GPUs product names: G57, G68, G78AE, G610, G510, G310.
 - Extend the direct convolution 2d interface to configure the block size.
 - Update ClConv2D heuristic to use direct convolution.
 - Use official Khronos® OpenCL extensions:
   - Add cl_khr_integer_dot_product extension support.
   - Add support of OpenCL 3.0 non-uniform workgroup.
 - CPU performance optimizations:
   - Add LUT-based implementation of Hard Swish and Leaky ReLU activation functions for aarch64 build.
   - Optimize Add layer by considering the input tensors as 1D array.
 - Add fixed-format BF16, FP16 and FP32 Neon™ GEMM kernels to support variable weights.
 - Add new Winograd convolution kernels implementation and update the ACL \link arm_compute::cpu::CpuWinogradConv2d CpuWinogradConv2d\endlink operator.
 - Add experimental support for native builds for Windows on Arm®.
 - Build flag interpretation change: arch=armv8.6-a now translates to -march=armv8.6-a CXX flag instead of -march=armv8.2-a + explicit selection of feature extensions.
 - Build flag change: toolchain_prefix, compiler_prefix:
   - Use empty string "" to suppress any prefixes.
   - Use "auto" to use default (auto) prefixes chosen by the build script. This is the default behavior when unspecified.
   - Any other string will be used as custom prefixes to the compiler and the rest of toolchain tools.
   - The default behavior when prefix is unspecified does not change, but its signifier has been changed from empty string "" to "auto".
 - armv7a with Android build will no longer be tested or maintained.

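The three cases for toolchain_prefix and compiler_prefix described in the v22.08 notes can be summarized with the small C++ sketch below. It only mirrors the documented behavior and is not the build script's code: resolve_prefix and tool_command are hypothetical names, and default_prefix stands in for whatever prefix the build script would choose automatically for the selected target (the value is an assumption supplied by the caller). For example, with a default of "aarch64-linux-gnu-", the option "auto" yields "aarch64-linux-gnu-g++" for the tool "g++", while the empty string yields plain "g++".

```cpp
#include <string>

// Hypothetical helper mirroring the documented cases; the real logic lives in the build script.
// option:         value passed for toolchain_prefix or compiler_prefix ("auto" when unspecified)
// default_prefix: the prefix the build script would choose on its own (assumed, supplied by the caller)
inline std::string resolve_prefix(const std::string &option, const std::string &default_prefix)
{
    if(option == "auto")
    {
        return default_prefix; // default behavior, also used when the option is unspecified
    }
    // "" suppresses the prefix; any other string is used as a custom prefix.
    return option;
}

// Forms the command name for a tool, for example "g++" or "ar".
inline std::string tool_command(const std::string &prefix, const std::string &tool)
{
    return prefix + tool;
}
```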
v22.05 Public major release
 - Various bug fixes.
 - Various optimizations.
 - Add support for NDK r23b.
 - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
 - New Arm® Neon™ kernels / functions:
   - \link cpu::kernels::CpuPool3dKernel CpuPool3dKernel \endlink
 - New OpenCL kernels / functions:
   - \link opencl::kernels::ClPool3dKernel ClPool3dKernel \endlink
 - Improve the start-up times for the following OpenCL kernels:
   - \link opencl::kernels::ClWinogradInputTransformKernel ClWinogradInputTransformKernel \endlink
   - \link opencl::kernels::ClWinogradOutputTransformKernel ClWinogradOutputTransformKernel \endlink
   - \link opencl::kernels::ClWinogradFilterTransformKernel ClWinogradFilterTransformKernel \endlink
   - \link opencl::kernels::ClHeightConcatenateKernel ClHeightConcatenateKernel \endlink
 - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
   - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
   - \link cpu::kernels::CpuDepthwiseConv2dNativeKernel CpuDepthwiseConv2dNativeKernel \endlink
   - \link cpu::kernels::CpuGemmMatrixAdditionKernel CpuGemmMatrixAdditionKernel \endlink
   - \link cpu::kernels::CpuGemmMatrixMultiplyKernel CpuGemmMatrixMultiplyKernel \endlink
   - @ref NEFuseBatchNormalizationKernel
   - @ref NEL2NormalizeLayerKernel

v22.02 Public major release
 - Various bug fixes.
 - Various optimizations.
 - Update A510 arm_gemm CPU kernels.
 - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
 - Improve the start-up time for the following OpenCL kernels:
   - @ref CLScale
   - @ref CLGEMM
   - @ref CLDepthwiseConvolutionLayer
   - \link opencl::kernels::ClIm2ColKernel ClIm2ColKernel \endlink
   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
 - Remove functions:
   - CLRemap
   - NERemap
 - Remove padding from OpenCL kernels:
   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
 - Remove padding from Cpu kernels:
   - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
 - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
   - \link cpu::kernels::CpuActivationKernel CpuActivationKernel \endlink
   - \link cpu::kernels::CpuAddKernel CpuAddKernel \endlink
   - \link cpu::kernels::CpuElementwiseKernel CpuElementwiseKernel \endlink
   - \link cpu::CpuSoftmaxGeneric CpuSoftmaxKernel \endlink
   - @ref NEBoundingBoxTransformKernel
   - @ref NECropKernel
   - @ref NEComputeAllAnchorsKernel
   - @ref NEInstanceNormalizationLayerKernel
   - NEMaxUnpoolingLayerKernel
   - @ref NEMeanStdDevNormalizationKernel
   - @ref NERangeKernel
   - @ref NEROIAlignLayerKernel
   - @ref NESelectKernel

v21.11 Public major release
 - Various bug fixes.
 - Various optimizations:
   - Improve performance of bilinear and nearest neighbor Scale on both CPU and GPU for FP32, FP16, Int8, Uint8 data types
   - Improve performance of Softmax on GPU for Uint8/Int8
 - New OpenCL kernels / functions:
   - @ref CLConv3D
 - New Arm® Neon™ kernels / functions:
   - @ref NEConv3D
 - Support configurable builds from a selected subset of the operator list
 - Support MobileBert on Neon™ backend
 - Improve operator/function logging
 - Remove padding from OpenCL kernels:
   - ClPool2dKernel
   - ClScaleKernel
   - ClGemmMatrixMultiplyReshapedKernel
 - Remove padding from Cpu kernels:
   - CpuPool2dKernel
 - Remove Y padding from OpenCL kernels:
   - ClGemmMatrixMultiplyKernel
   - ClGemmReshapedRHSMatrixKernel
 - Remove legacy GeMM kernels in gemm_v1.cl

v21.08 Public major release
 - Various bug fixes.
 - Various optimizations:
   - Improve LWS (Local-Workgroup-Size) heuristic in OpenCL for GeMM, Direct Convolution and Winograd Transformations when OpenCL tuner is not used
   - Improve QASYMM8/QSYMM8 performance on OpenCL for various Arm® Mali™ GPU architectures
   - Add dynamic weights support in Fully connected layer (CPU/GPU)
   - Various performance optimizations for floating-point data types (CPU/GPU)
 - Add a reduced core library build arm_compute_core_v2
 - Expose Operator API
 - Support fat binary build for armv8.2-a via fat_binary build flag
 - Add CPU discovery capabilities
 - Add data type f16 support for:
   - CLRemapKernel
 - Port the following functions to stateless API:
   - @ref CLConvolutionLayer
   - @ref CLFlattenLayer
   - @ref CLFullyConnectedLayer
   - @ref CLGEMM
   - @ref CLGEMMConvolutionLayer
   - @ref CLGEMMLowpMatrixMultiplyCore
   - @ref CLWinogradConvolutionLayer
   - @ref NEConvolutionLayer
   - @ref NEFlattenLayer
   - @ref NEFullyConnectedLayer
   - @ref NEGEMM
   - @ref NEGEMMConv2d
   - @ref NEGEMMConvolutionLayer
   - @ref NEGEMMLowpMatrixMultiplyCore
   - @ref NEWinogradConvolutionLayer
 - Remove the following functions:
   - CLWinogradInputTransform
 - Remove CLCoreRuntimeContext
 - Remove ICPPSimpleKernel
 - Rename file arm_compute/runtime/CL/functions/CLElementWiseUnaryLayer.h to arm_compute/runtime/CL/functions/CLElementwiseUnaryLayer.h

v21.05 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Various documentation updates:
   - Add supported operators and corresponding Android NNAPI operators.
   - Reorganize the documentation into a user guide and a contributor guide.
 - Add support for a global allocator for OpenCL tensors
 - Add experimental support for [CLVK](https://github.com/kpet/clvk).
 - Add data type S32 support for:
   - @ref opencl::kernels::ClArithmeticKernel
 - Add data type QASYMM8 support for:
   - @ref CLROIPoolingLayer
   - @ref CLROIPoolingLayerKernel
   - @ref NEROIPoolingLayer
   - @ref NEROIPoolingLayerKernel
 - Add per-channel quantization support for:
   - @ref CLDeconvolutionLayer
   - @ref CLDirectDeconvolutionLayer
   - @ref NEConvolutionLayer
   - @ref NEDeconvolutionLayer
 - Remove padding from OpenCL kernels:
   - @ref CLL2NormalizeLayerKernel
   - CLDepthwiseConvolutionLayer3x3NHWCKernel
   - @ref CLNormalizationLayerKernel
   - @ref CLNormalizePlanarYUVLayerKernel
   - @ref opencl::kernels::ClMulKernel
   - @ref CLReductionOperationKernel
   - @ref CLROIPoolingLayerKernel
 - Remove computer vision support from Arm® Neon™ backend
 - Remove the following functions:
   - NEAbsoluteDifference
   - NEAccumulate
   - NEBox3x3
   - NECannyEdge
   - NEChannelCombine
   - NEChannelExtract
   - NEColorConvert
   - NEConvolution
   - NEDerivative
   - NEDilate
   - NEEqualizeHistogram
   - NEErode
   - NEFastCorners
   - NEGaussian3x3
   - NEGaussian5x5
   - NEGaussianPyramid
   - NEHOGDescriptor
   - NEHOGDetector
   - NEHOGGradient
   - NEHOGMultiDetection
   - NEHarrisCorners
   - NEHistogram
   - NEIntegralImage
   - NELaplacianPyramid
   - NELaplacianReconstruct
   - NEMagnitude
   - NEMeanStdDev
   - NEMedian3x3
   - NEMinMaxLocation
   - NENonLinearFilter
   - NEOpticalFlow
   - NEPhase
   - NEScharr3x3
   - NESobel3x3
   - NESobel5x5
   - NESobel7x7
   - NETableLookup
   - NEThreshold
   - NEWarpAffine
   - NEWarpPerspectiveKernel
 - Remove all GLES kernels / functions / tests / examples
 - Remove computer vision support from CL backend
 - Remove the following functions:
   - CLAbsoluteDifference
   - CLAccumulate
   - CLBox3x3
   - CLCannyEdge
   - CLChannelCombine
   - CLChannelExtract
   - CLColorConvert
   - CLConvolution
   - CLDerivative
   - CLDilate
   - CLEqualizeHistogram
   - CLErode
   - CLFastCorners
   - CLGaussian3x3
   - CLGaussian5x5
   - CLGaussianPyramid
   - CLHOGDescriptor
   - CLHOGDetector
   - CLHOGGradient
   - CLHOGMultiDetection
   - CLHarrisCorners
   - CLHistogram
   - CLIntegralImage
   - CLLaplacianPyramid
   - CLLaplacianReconstruct
   - CLMagnitude
   - CLMeanStdDev
   - CLMedian3x3
   - CLMinMaxLocation
   - CLNonLinearFilter
   - CLOpticalFlow
   - CLPhase
   - CLScharr3x3
   - CLSobel3x3
   - CLSobel5x5
   - CLSobel7x7
   - CLTableLookup
   - CLThreshold
   - CLWarpAffine
   - CLWarpPerspective

v21.02 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Upgrade C++ standard to C++14
 - Add macOS support
 - Add Armv8-R AArch64 architecture support
 - Add SVE/SVE2 support for:
   - NEScaleKernel
   - @ref NEActivationLayer
   - @ref NEArithmeticAddition
   - @ref NEBatchNormalizationLayerKernel
   - @ref cpu::kernels::CpuLogits1DSoftmaxKernel
   - @ref cpu::kernels::CpuLogits1DMaxKernel
   - @ref cpu::kernels::CpuElementwiseUnaryKernel
 - Remove padding from OpenCL kernels:
   - CLDirectConvolutionLayerKernel
   - @ref CLArgMinMaxLayerKernel
   - @ref CLPadLayerKernel
   - @ref CLROIAlignLayerKernel
   - @ref CLRangeKernel
   - CLScaleKernel
   - @ref CLSelectKernel
   - @ref CLBitwiseKernel
   - @ref opencl::kernels::ClFloorKernel
   - CLTransposeKernel
 - Deprecate functions in CLTuner:
   - add_lws_to_table
   - import_lws_table
   - lws_table
 - Remove functions:
   - NELocallyConnectedLayer / CLLocallyConnectedLayer
   - NEIm2Col
   - NECol2Im
   - NEGEMMInterleave4x4
   - NEGEMMTranspose1xW
   - NEComputeAllAnchors / CLComputeAllAnchors
   - NEGEMMAssemblyDispatch
   - NEUpsampleLayer / CLUpsampleLayer
 - Remove kernels:
   - NEGEMMMatrixVectorMultiplyKernel
   - NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel
   - NEUpsampleLayerKernel / CLUpsampleLayerKernel
 - Extend OpenCL tuner with workgroup batch size support
   - Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units
 - Add functionality to load the OpenCL GEMM heuristics at runtime
   - The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL
 - Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. This is currently under investigation.
 - Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised.

v20.11 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Performance regressions can be noted when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data type.
   This is planned to be resolved in the 21.02 release.
 - Added new data type QASYMM8_SIGNED support for @ref NEROIAlignLayer.
 - Added new data type S32 support for:
   - NEArithmeticSubtraction
   - NEArithmeticSubtractionKernel
   - @ref NEPixelWiseMultiplication
   - NEPixelWiseMultiplicationKernel
   - NEElementwiseDivision
   - NEDivisionOperationKernel
 - Interface change
   - Properly support softmax axis to have the same meaning as other major frameworks. That is, axis now defines the dimension
     on which Softmax/Logsoftmax is performed. E.g. for input of shape 4x5x6 and axis=1, softmax will be applied to 4x6=24 vectors of size 5.
     The supported value range of axis is [-rank, rank).
     This change applies to the following functions:
      - @ref NESoftmaxLayer
      - @ref NELogSoftmaxLayer
      - @ref CLSoftmaxLayer
      - @ref CLLogSoftmaxLayer
      - GCSoftmaxLayer
 - New OpenCL kernels / functions:
   - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
   - @ref CLLogicalNot
   - @ref CLLogicalAnd
   - @ref CLLogicalOr
 - New Arm® Neon™ kernels / functions:
   - @ref NELogicalNot
   - @ref NELogicalAnd
   - @ref NELogicalOr
 - Removed padding from Arm® Neon™ kernels:
   - NEComplexPixelWiseMultiplicationKernel
   - NENonMaximaSuppression3x3Kernel
   - NERemapKernel
   - NEGEMMInterleave4x4Kernel
   - NEDirectConvolutionLayerKernel
   - NEScaleKernel
   - NELocallyConnectedMatrixMultiplyKernel
   - NEGEMMLowpOffsetContributionKernel
   - NEGEMMTranspose1xWKernel
   - NEPoolingLayerKernel
   - NEConvolutionKernel
   - NEDepthwiseConvolutionLayerNativeKernel
   - NEGEMMLowpMatrixMultiplyKernel
   - NEGEMMMatrixMultiplyKernel
   - NEDirectConvolutionLayerOutputStageKernel
   - @ref NEReductionOperationKernel
   - NEGEMMLowpMatrixAReductionKernel
   - NEGEMMLowpMatrixBReductionKernel
 - Removed padding from OpenCL kernels:
   - CLBatchConcatenateLayerKernel
   - CLElementwiseOperationKernel
   - @ref CLBatchNormalizationLayerKernel
   - CLPoolingLayerKernel
   - CLWinogradInputTransformKernel
   - CLGEMMLowpMatrixMultiplyNativeKernel
   - CLGEMMLowpMatrixAReductionKernel
   - CLGEMMLowpMatrixBReductionKernel
   - CLGEMMLowpOffsetContributionOutputStageKernel
   - CLGEMMLowpOffsetContributionKernel
   - CLWinogradOutputTransformKernel
   - CLGEMMLowpMatrixMultiplyReshapedKernel
   - @ref CLFuseBatchNormalizationKernel
   - @ref CLDepthwiseConvolutionLayerNativeKernel
   - CLDepthConvertLayerKernel
   - CLCopyKernel
   - CLDepthwiseConvolutionLayer3x3NHWCKernel
   - CLActivationLayerKernel
   - CLWinogradFilterTransformKernel
   - CLWidthConcatenateLayerKernel
   - CLWidthConcatenate4TensorsKernel
   - CLWidthConcatenate2TensorsKernel
   - CLLogits1DMaxShiftExpSumKernel
   - CLLogits1DNormKernel
   - CLHeightConcatenateLayerKernel
   - CLGEMMMatrixMultiplyKernel
   - CLGEMMLowpQuantizeDownInt32ScaleKernel
   - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
   - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
   - CLDepthConcatenateLayerKernel
   - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
 - Removed OpenCL kernels / functions:
   - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
   - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
   - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
 - Deprecated OpenCL kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
     - CLLocallyConnectedLayer
     - CLLocallyConnectedMatrixMultiplyKernel
     - CLAbsoluteDifference
     - CLAbsoluteDifferenceKernel
     - CLAccumulate
     - CLAccumulateKernel
     - CLAccumulateSquared
     - CLAccumulateSquaredKernel
     - CLAccumulateWeighted
     - CLAccumulateWeightedKernel
     - CLAccumulateWeightedFP16Kernel
     - CLBox3x3
     - CLBox3x3Kernel
     - CLBox3x3FP16Kernel
     - CLCannyEdge
     - CLChannelCombine
     - CLChannelCombineKernel
     - CLChannelExtract
     - CLChannelExtractKernel
     - CLColorConvert
     - CLColorConvertKernel
     - CLConvolution3x3
     - CLConvolutionRectangle
     - CLConvolutionRectangleKernel
     - CLConvolutionSquare
     - CLConvolutionKernel
     - CLDerivative
     - CLDerivativeKernel
     - CLDilate
     - CLDilateKernel
     - CLEqualizeHistogram
     - CLErode
     - CLErodeKernel
     - CLFastCorners
     - CLFastCornersKernel
     - CLGaussian3x3
     - CLGaussian3x3Kernel
     - CLGaussian5x5
     - CLGaussian5x5HorKernel
     - CLGaussian5x5VertKernel
     - CLGaussianPyramid
     - CLGaussianPyramidHalf
     - CLGaussianPyramidOrb
     - CLHarrisCorners
     - CLHarrisScoreKernel
     - CLHarrisScoreFP16Kernel
     - CLHistogram
     - CLHistogramKernel
     - CLHOGOrientationBinningKernel
     - CLHOGBlockNormalizationKernel
     - CLHOGDetectorKernel
     - CLHOGNonMaximaSuppressionKernel
     - CLHOGDescriptor
     - CLHOGDetector
     - CLHOGGradient
     - CLHOGMultiDetection
     - CLIntegralImage
     - CLIntegralImageKernel
     - CLLaplacianReconstruct
     - CLLaplacianPyramid
     - CLMagnitude
     - CLMagnitudePhaseKernel
     - CLMedian3x3
     - CLMedian3x3Kernel
     - CLMinMaxLocation
     - CLMinMaxLocationKernel
     - CLNonLinearFilter
     - CLNonLinearFilterKernel
     - CLNonMaximaSuppression3x3
     - CLNonMaximaSuppression3x3FP16Kernel
     - CLNonMaximaSuppression3x3Kernel
     - CLOpticalFlow
     - CLPhase
     - CLRemap
     - CLRemapKernel
     - CLScharr3x3
     - CLScharr3x3Kernel
     - CLSobel3x3
     - CLSobel3x3Kernel
     - CLSobel5x5
     - CLSobel5x5HorKernel
     - CLSobel5x5VertKernel
     - CLSobel7x7
     - CLSobel7x7HorKernel
     - CLSobel7x7VertKernel
     - CLThreshold
     - CLThresholdKernel
     - CLWarpAffine
     - CLWarpAffineKernel
     - CLWarpPerspective
     - CLWarpPerspectiveKernel
 - Deprecated Arm® Neon™ kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
     - NELocallyConnectedLayer
     - NELocallyConnectedMatrixMultiplyKernel
     - NEAbsoluteDifference
     - NEAbsoluteDifferenceKernel
     - NEAccumulate
     - NEAccumulateKernel
     - NEAccumulateSquared
     - NEAccumulateSquaredKernel
     - NEAccumulateWeighted
     - NEAccumulateWeightedKernel
     - NEAccumulateWeightedFP16Kernel
     - NEBox3x3
     - NEBox3x3Kernel
     - NEBox3x3FP16Kernel
     - NECannyEdge
     - NEChannelCombine
     - NEChannelCombineKernel
     - NEChannelExtract
     - NEChannelExtractKernel
     - NEColorConvert
     - NEColorConvertKernel
     - NEConvolution3x3
     - NEConvolutionRectangle
     - NEConvolutionRectangleKernel
     - NEConvolutionSquare
     - NEConvolutionKernel
     - NEDerivative
     - NEDerivativeKernel
     - NEDilate
     - NEDilateKernel
     - NEEqualizeHistogram
     - NEErode
     - NEErodeKernel
     - NEFastCorners
     - NEFastCornersKernel
     - NEGaussian3x3
     - NEGaussian3x3Kernel
     - NEGaussian5x5
     - NEGaussian5x5HorKernel
     - NEGaussian5x5VertKernel
     - NEGaussianPyramid
     - NEGaussianPyramidHalf
     - NEGaussianPyramidOrb
     - NEHarrisCorners
     - NEHarrisScoreKernel
     - NEHarrisScoreFP16Kernel
     - NEHistogram
     - NEHistogramKernel
     - NEHOGOrientationBinningKernel
     - NEHOGBlockNormalizationKernel
     - NEHOGDetectorKernel
     - NEHOGNonMaximaSuppressionKernel
     - NEHOGDescriptor
     - NEHOGDetector
     - NEHOGGradient
     - NEHOGMultiDetection
     - NEIntegralImage
     - NEIntegralImageKernel
     - NELaplacianReconstruct
     - NELaplacianPyramid
     - NEMagnitude
     - NEMagnitudePhaseKernel
     - NEMedian3x3
     - NEMedian3x3Kernel
     - NEMinMaxLocation
     - NEMinMaxLocationKernel
     - NENonLinearFilter
     - NENonLinearFilterKernel
     - NENonMaximaSuppression3x3
     - NENonMaximaSuppression3x3FP16Kernel
     - NENonMaximaSuppression3x3Kernel
     - NEOpticalFlow
     - NEPhase
     - NERemap
     - NERemapKernel
     - NEScharr3x3
     - NEScharr3x3Kernel
     - NESobel3x3
     - NESobel3x3Kernel
     - NESobel5x5
     - NESobel5x5HorKernel
     - NESobel5x5VertKernel
     - NESobel7x7
     - NESobel7x7HorKernel
     - NESobel7x7VertKernel
     - NEThreshold
     - NEThresholdKernel
     - NEWarpAffine
     - NEWarpAffineKernel
     - NEWarpPerspective
     - NEWarpPerspectiveKernel
 - Deprecated GLES kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together):
     - GCAbsoluteDifference
     - GCActivationLayer
     - GCArithmeticAddition
     - GCBatchNormalizationLayer
     - GCConcatenateLayer
     - GCConvolutionLayer
     - GCDepthwiseConvolutionLayer
     - GCDirectConvolutionLayer
     - GCDropoutLayer
     - GCFillBorder
     - GCFullyConnectedLayer
     - GCGEMM
     - GCGEMMInterleave4x4
     - GCGEMMTranspose1xW
     - GCNormalizationLayer
     - GCNormalizePlanarYUVLayer
     - GCPixelWiseMultiplication
     - GCPoolingLayer
     - GCScale
     - GCSoftmaxLayer
     - GCTensorShift
     - GCTranspose


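The softmax axis semantics introduced by the v20.11 interface change can be illustrated with a plain C++ reference sketch. It is not the library's implementation: wrap_axis and softmax_along_axis are hypothetical names, the sketch assumes a dense buffer in which shape[0] is the fastest-varying dimension, and it assumes that a negative axis counts back from the last dimension, as in other major frameworks. For a tensor of shape 4x5x6 and axis=1 it normalizes 4x6=24 independent vectors of size 5, matching the example in the v20.11 notes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Maps an axis in [-rank, rank) to [0, rank); negative values count from the last dimension (assumed).
inline std::size_t wrap_axis(int axis, std::size_t rank)
{
    const int r = static_cast<int>(rank);
    assert(axis >= -r && axis < r);
    return static_cast<std::size_t>(axis < 0 ? axis + r : axis);
}

// In-place softmax along one axis of a dense tensor.
// Assumption of this sketch: shape[0] is the fastest-varying dimension.
// Returns the number of independent vectors that were normalized.
inline std::size_t softmax_along_axis(std::vector<float> &data, const std::vector<std::size_t> &shape, int axis)
{
    const std::size_t ax = wrap_axis(axis, shape.size());
    std::size_t inner = 1; // product of the dimensions below the axis
    std::size_t outer = 1; // product of the dimensions above the axis
    for(std::size_t d = 0; d < ax; ++d)
    {
        inner *= shape[d];
    }
    for(std::size_t d = ax + 1; d < shape.size(); ++d)
    {
        outer *= shape[d];
    }
    const std::size_t len = shape[ax];
    assert(len > 0 && data.size() == inner * len * outer);

    for(std::size_t o = 0; o < outer; ++o)
    {
        for(std::size_t i = 0; i < inner; ++i)
        {
            // Consecutive elements of one vector are `inner` floats apart.
            float *v = data.data() + (o * len * inner + i);
            float  max_v = v[0];
            for(std::size_t k = 1; k < len; ++k)
            {
                max_v = std::max(max_v, v[k * inner]);
            }
            float sum = 0.0f;
            for(std::size_t k = 0; k < len; ++k)
            {
                v[k * inner] = std::exp(v[k * inner] - max_v); // subtract the max for numerical stability
                sum += v[k * inner];
            }
            for(std::size_t k = 0; k < len; ++k)
            {
                v[k * inner] /= sum;
            }
        }
    }
    return inner * outer;
}
```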
698v20.08 Public major release
699 - Various bug fixes.
700 - Various optimisations.
701 - Added new data type QASYMM8_SIGNED support for:
702   - @ref CLArgMinMaxLayer
703   - @ref CLArgMinMaxLayerKernel
704 - Added new data type U8 support for:
705   - @ref NECropKernel
706   - CLCropKernel
707 - Added align_corner support for nearest neighbor interpolation in:
708   - NEScaleKernel
709   - CLScaleKernel
710 - New OpenCL kernels / functions:
711   - @ref CLMaxUnpoolingLayerKernel
712 - New Arm® Neon™ kernels / functions:
713   - NEMaxUnpoolingLayerKernel
714 - New graph example:
715   - graph_yolov3_output_detector
716 - GEMMTuner improvements:
717   - Added fp16 support
718   - Output json files for easier integration
719   - Enabled tuning for export_to_cl_image_rhs option for RHS tensors
720   - More robust script for running benchmarks
721 - Removed padding from:
722   - NEPixelWiseMultiplicationKernel
723   - NEHeightConcatenateLayerKernel
724   - NEThresholdKernel
725   - NEBatchConcatenateLayerKernel
726   - NETransposeKernel
727   - @ref NEBatchNormalizationLayerKernel
728   - NEArithmeticSubtractionKernel
729   - @ref NEBoundingBoxTransformKernel
730   - NELogits1DMaxKernel
731   - NELogits1DSoftmaxKernel
732   - @ref NEROIPoolingLayerKernel
733   - @ref NEROIAlignLayerKernel
734   - NEYOLOLayerKernel
735   - NEUpsampleLayerKernel
736   - NEFloorKernel
737   - NEWidthConcatenateLayerKernel
738   - NEDepthConcatenateLayerKernel
739   - @ref NENormalizationLayerKernel
740   - @ref NEL2NormalizeLayerKernel
741   - NEFillArrayKernel
742   - NEDepthConvertLayerKernel
743   - @ref NERangeKernel
744   - @ref NEPriorBoxLayer
745 - Removed OpenCL kernels / functions:
746   - CLGEMMLowpQuantizeDownInt32ToUint8Scale
747   - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
748 - Removed Arm® Neon™ kernels / functions:
749   - NEGEMMLowpQuantizeDownInt32ToUint8Scale
750   - NEGEMMMatrixAccumulateBiasesKernel
751 - Deprecated functions / interfaces:
752   - Non-descriptor based interfaces for NEThreshold, CLThreshold
753   - Non-descriptor based interfaces for @ref NEScale, @ref CLScale and GCScale
754   - In @ref NESoftmaxLayer, @ref NELogSoftmaxLayer, @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer :
755      The default "axis" value for @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer is changed from 1 to 0.
756      Only axis 0 is supported.
757      The default "axis" value for @ref NESoftmaxLayer, @ref NELogSoftmaxLayer is changed from 1 to 0.
758      Only axis 0 is supported.
759 - The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity.
760 - Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLIm2ColKernel (NHWC only)
761   - This change allows to use @ref CLGEMMConvolutionLayer without extra padding for the input and output.
762   - Only the weights/bias of @ref CLGEMMConvolutionLayer could require padding for the computation.
763   - Only on Arm® Mali™ Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since CLGEMMMatrixMultiplyKernel is called and currently requires padding.
764 - Added support for exporting the OpenCL buffer object to the OpenCL image object in CLGEMMMatrixMultiplyReshapedKernel and CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
765   - This support allows exporting the OpenCL buffer used for the reshaped RHS matrix to the OpenCL image object.
766   - The padding requirement for the OpenCL image object is taken into account in CLGEMMReshapeRHSMatrixKernel.
767   - The reshaped RHS matrix stores the weights when GEMM is used to accelerate CLGEMMConvolutionLayer.
768
769v20.05 Public major release
770 - Various bug fixes.
771 - Various optimisations.
772 - Updated recommended NDK version to r18b.
773 - Updated recommended gcc version to Linaro 6.3.1.
774 - Added Bfloat16 type support
775 - Added Bfloat16 support in:
776     - NEWeightsReshapeKernel
777     - NEConvolutionLayerReshapeWeights
778     - NEIm2ColKernel
779     - NEIm2Col
780     - NEDepthConvertLayerKernel
781     - @ref NEDepthConvertLayer
782     - @ref NEGEMMConvolutionLayer
783     - NEGEMMAssemblyDispatch
784 - Added new data type QASYMM8_SIGNED support for:
785     - @ref CLDirectConvolutionLayer
786     - @ref CLDeconvolutionLayer
787     - @ref CLDirectDeconvolutionLayer
788     - @ref CLGEMMDeconvolutionLayer
789     - CLGEMMLowpMatrixMultiplyReshapedKernel
790     - CLGEMMLowpQuantizeDownInt32ScaleKernel
791     - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
792     - @ref CLReductionOperation
793     - @ref CLReduceMean
794     - @ref NEScale
795     - NEScaleKernel
796     - NEUpsampleLayer
797     - @ref NECast
798     - @ref NEReductionOperation
799     - @ref NEReduceMean
800     - @ref NEArgMinMaxLayer
801     - @ref NEDeconvolutionLayer
802     - NEGEMMLowpQuantizeDownInt32ScaleKernel
803     - @ref CPPBoxWithNonMaximaSuppressionLimit
804     - @ref CPPDetectionPostProcessLayer
805     - @ref CPPPermuteKernel
806     - @ref CPPPermute
807     - @ref CPPTopKVKernel
808     - @ref CPPTopKV
809     - @ref CPPUpsample
810     - @ref CPPUpsampleKernel
811 - New OpenCL kernels / functions:
812     - @ref CLQLSTMLayer
813     - @ref CLQLSTMLayerNormalizationKernel
814 - New Arm® Neon™ kernels / functions:
815     - @ref NEQLSTMLayer
816     - @ref NEQLSTMLayerNormalizationKernel
817 - Added HARD_SWISH support in:
818     - CLActivationLayerKernel
819     - NEActivationLayerKernel
820 - Deprecated OpenCL kernels / functions:
821     - CLGEMMLowpQuantizeDownInt32ToUint8Scale
822     - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
823 - Deprecated Arm® Neon™ kernels / functions:
824     - NEGEMMLowpQuantizeDownInt32ToUint8Scale
825 - Removed CPP kernels / functions:
826     - CPPFlipWeightsKernel
827 - Removed PoolingLayerInfo constructors without Data Layout.
828 - Removed CLDepthwiseConvolutionLayer3x3
829 - Removed NEDepthwiseConvolutionLayerOptimized
830 - Added support for Winograd 3x3, 4x4 on Arm® Neon™ FP16:
831     - @ref NEWinogradConvolutionLayer
832     - CpuWinogradConv2dTransformInputKernel
833     - CpuWinogradConv2dTransformOutputKernel
834     - CpuWinogradConv2dTransformWeightsKernel
835 - Added CLCompileContext
836 - Added Arm® Neon™ GEMM kernel with 2D window support
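
To illustrate the Bfloat16 type mentioned in this release, the following is a minimal sketch of how a 32-bit float maps to a 16-bit bfloat16 value and back. It assumes plain truncation of the lower 16 bits; the helper names `float_to_bf16_truncate` and `bf16_to_float` are hypothetical and are not part of the library API, whose actual conversion may round rather than truncate.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a library function): convert an IEEE-754 binary32
// value to bfloat16 by keeping the upper 16 bits, i.e. the sign bit, the
// 8 exponent bits and the top 7 mantissa bits. This truncates toward zero.
inline std::uint16_t float_to_bf16_truncate(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits)); // well-defined way to read the bit pattern
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bfloat16 back to float by zero-filling the
// 16 mantissa bits that were dropped.
inline float bf16_to_float(std::uint16_t value)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(value) << 16;
    float result = 0.0f;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

Values whose mantissa fits in 7 bits (for example 1.0f or -2.5f) survive the round trip unchanged; other values lose precision but keep the full float exponent range.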
837
838v20.02.1 Maintenance release
839 - Added Android-NN build script.
840
841v20.02 Public major release
842 - Various bug fixes.
843 - Various optimisations.
844 - Added new data type QASYMM8_SIGNED support for:
845     - @ref CLDepthwiseConvolutionLayer
846     - CLDepthwiseConvolutionLayer3x3
847     - @ref CLGEMMConvolutionLayer
848     - CLGEMMLowpMatrixMultiplyCore
849     - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
850     - CLGEMMLowpMatrixMultiplyNativeKernel
851     - @ref NEActivationLayer
852     - NEComparisonOperationKernel
853     - @ref NEConvolutionLayer
854     - @ref NEDepthwiseConvolutionLayer
855     - NEDepthwiseConvolutionLayer3x3Kernel
856     - NEDirectConvolutionLayerOutputStageKernel
857     - @ref NEElementwiseComparison
858     - @ref NEElementwiseMax
859     - @ref NEElementwiseMin
860     - @ref NEElementwiseSquaredDiff
861     - @ref NEFullyConnectedLayer
862     - NEGEMMMatrixVectorMultiplyKernel
863     - @ref NEPixelWiseMultiplication
864     - @ref NEPoolingLayer
865     - @ref NEPReluLayer
866 - Added support for QSYMM8_PER_CHANNEL in:
867     - NEDepthwiseConvolutionLayer3x3Kernel
868 - Added support for split sizes in:
869     - @ref CLSplit
870     - @ref NESplit
871 - New OpenCL kernels / functions:
872     - @ref CLFill
873     - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
874 - New Arm® Neon™ kernels / functions:
875     - @ref NEFill
876     - NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
877 - Deprecated functions / interfaces:
878     - CLDepthwiseConvolutionLayer3x3
879     - NEDepthwiseConvolutionLayerOptimized
880     - PoolingLayerInfo constructors without Data Layout.
881 - Added support for quantization with multiplier greater than 1 on Arm® Neon™ and CL.
882 - Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer.
883 - Added the ability to build bootcode for bare metal.
884 - Added support for generating synthetic QASYMM8 graphs.
885 - Added support for F16 datatype in VGG16.
886 - Removed pre-built binaries for GLES.
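
The item above about quantization with a multiplier greater than 1 can be pictured with the usual fixed-point decomposition of a real multiplier into a Q0.31 mantissa and a power-of-two exponent. This is a sketch of that general technique, not the library's implementation; `FixedPointMultiplier` and `decompose_multiplier` are hypothetical names. A multiplier of 1 or more simply yields a positive exponent (a left shift) instead of a right shift.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type: real_multiplier ~= (mantissa / 2^31) * 2^exponent.
struct FixedPointMultiplier
{
    std::int32_t mantissa; // Q0.31 value in [2^30, 2^31)
    int          exponent; // positive when real_multiplier >= 1
};

// Hypothetical helper: split a positive real multiplier into mantissa and exponent.
inline FixedPointMultiplier decompose_multiplier(double real_multiplier)
{
    assert(real_multiplier > 0.0);
    int exponent = 0;
    // frexp returns a fraction in [0.5, 1) such that
    // real_multiplier == fraction * 2^exponent.
    const double fraction = std::frexp(real_multiplier, &exponent);
    std::int64_t mantissa = std::llround(fraction * 2147483648.0); // scale by 2^31
    if(mantissa == (std::int64_t{ 1 } << 31))
    {
        // Rounding pushed the fraction up to 1.0: renormalise into range.
        mantissa /= 2;
        ++exponent;
    }
    return { static_cast<std::int32_t>(mantissa), exponent };
}
```

For example, 1.5 decomposes into a fraction of 0.75 with exponent 1, while 0.25 decomposes into a fraction of 0.5 with exponent -1.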
887
888v19.11.1 Public maintenance release
889 - Fix offset calculation in NEReductionOperationKernel.
890 - Fix data layout in NEScaleKernel for nhwc.
891 - Retain configuration step data layout to avoid side-effects.
892 - Perform sqrt in double domain for L2 pooling.
893 - Fix output shape calculation for Reduce Mean
894 - Restrict cases where optimized NEPadLayer runs.
895
896v19.11 Public major release
897 - Various bug fixes.
898 - Various optimisations.
899 - Updated recommended NDK version to r17c.
900 - Deprecated OpenCL kernels / functions:
901    - CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel
902    - CLDepthwiseIm2ColKernel
903    - CLDepthwiseSeparableConvolutionLayer
904    - CLDepthwiseVectorToTensorKernel
905    - CLDirectConvolutionLayerOutputStageKernel
906 - Deprecated Arm® Neon™ kernels / functions:
907    - NEDepthwiseWeightsReshapeKernel
908    - NEDepthwiseIm2ColKernel
909    - NEDepthwiseSeparableConvolutionLayer
910    - NEDepthwiseVectorToTensorKernel
911    - NEDepthwiseConvolutionLayer3x3
912 - New OpenCL kernels / functions:
913    - @ref CLInstanceNormalizationLayerKernel / @ref CLInstanceNormalizationLayer
914    - @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated
915      OpenCL kernels / functions)
916    - @ref CLLogSoftmaxLayer
917 - New Arm® Neon™ kernels / functions:
918    - @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform
919    - @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors
920    - @ref NEDetectionPostProcessLayer
921    - @ref NEGenerateProposalsLayer
922    - @ref NEInstanceNormalizationLayerKernel / @ref NEInstanceNormalizationLayer
923    - @ref NELogSoftmaxLayer
924    - @ref NEROIAlignLayerKernel / @ref NEROIAlignLayer
925 - Added QASYMM8 support for:
926    - @ref CLGenerateProposalsLayer
927    - @ref CLROIAlignLayer
928    - @ref CPPBoxWithNonMaximaSuppressionLimit
929 - Added QASYMM16 support for:
930    - @ref CLBoundingBoxTransform
931 - Added FP16 support for:
932    - CLGEMMMatrixMultiplyReshapedKernel
933 - Added new data type QASYMM8_PER_CHANNEL support for:
934    - CLDequantizationLayer
935    - @ref NEDequantizationLayer
936 - Added new data type QSYMM8_PER_CHANNEL support for:
937    - @ref CLConvolutionLayer
938    - @ref NEConvolutionLayer
939    - @ref CLDepthwiseConvolutionLayer
940    - @ref NEDepthwiseConvolutionLayer
941 - Added FP16 mixed-precision support for:
942    - CLGEMMMatrixMultiplyReshapedKernel
943    - CLPoolingLayerKernel
944 - Added FP32 and FP16 ELU activation for:
945    - @ref CLActivationLayer
946    - @ref NEActivationLayer
947 - Added asymmetric padding support for:
948    - @ref CLDirectDeconvolutionLayer
949    - @ref CLGEMMDeconvolutionLayer
950    - @ref NEDeconvolutionLayer
951 - Added SYMMETRIC and REFLECT modes for @ref CLPadLayerKernel / @ref CLPadLayer.
952 - Replaced the calls to NECopyKernel and NEMemsetKernel with @ref NEPadLayer in @ref NEGenerateProposalsLayer.
953 - Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer.
954 - Improved performance for CL Inception V3 - FP16.
955 - Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision).
956 - Improved Arm® Neon™ performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
957 - Improved Arm® Neon™ performance for MobileNet-SSD by improving the output detection performance.
958 - Optimized @ref CLPadLayer.
959 - Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel.
960 - Reduced memory consumption by implementing weights sharing.
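
As an illustration of the SYMMETRIC and REFLECT padding modes listed above, the sketch below maps an out-of-range index back into the tensor along one dimension. It follows the common definition of these modes (REFLECT does not repeat the edge element, SYMMETRIC does) and assumes the padding is small enough for a single reflection. `PadMode` and `mirrored_index` are hypothetical names, not library identifiers.

```cpp
#include <cassert>

// Hypothetical enum for this sketch only.
enum class PadMode
{
    Reflect,  // mirror around the edge element without repeating it
    Symmetric // mirror around the edge and repeat the edge element
};

// Map a possibly out-of-range index onto [0, size) for a padded read.
// Valid for a single reflection only:
//   Reflect:   index in [-(size - 1), 2 * size - 2]
//   Symmetric: index in [-size,       2 * size - 1]
inline int mirrored_index(int index, int size, PadMode mode)
{
    assert(size > 0);
    const int offset = (mode == PadMode::Symmetric) ? 1 : 0;
    if(index < 0)
    {
        return -index - offset;
    }
    if(index >= size)
    {
        return 2 * (size - 1) - index + offset;
    }
    return index;
}
```

For a row `a b c d` padded by two elements on the left, REFLECT reads `c b | a b c d` and SYMMETRIC reads `b a | a b c d`.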
961
962v19.08.1 Public maintenance release
963 - Fix offset calculation in NEReductionOperationKernel.
964 - Fix data layout in NEScaleKernel for nhwc.
965 - Retain configuration step data layout to avoid side-effects.
966 - Perform sqrt in double domain for L2 pooling.
967 - Fix output shape calculation for Reduce Mean
968 - Fix broadcast CLPixelwiseMultiplication with 5D tensors
969
970v19.08 Public major release
971 - Various bug fixes.
972 - Various optimisations.
973 - Deprecated Arm® Neon™ functions
974    - NEDepthConcatenateLayer
975    - NEWidthConcatenateLayer
976 - Deprecated OpenCL kernels / functions
977    - CLDepthConcatenateLayer
978    - CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4
979    - CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW
980    - CLWidthConcatenateLayer
981 - New Arm® Neon™ kernels / functions:
982    - @ref NEAbsLayer
983    - @ref NECast
984    - @ref NEElementwisePower
985    - @ref NELogLayer
986    - @ref NELSTMLayerQuantized
987    - @ref NENegLayer
988    - @ref NEPReluLayer
989    - @ref NESinLayer
990    - NEBatchConcatenateLayerKernel
991    - @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer
992    - NEDepthwiseConvolutionLayerNativeKernel
993    - NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
994    - @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer
995    - @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer
996 - New OpenCL kernels / functions:
997    - @ref CLAbsLayer
998    - @ref CLElementwisePower
999    - @ref CLLogLayer
1000    - @ref CLLSTMLayerQuantized
1001    - @ref CLNegLayer
1002    - @ref CLPReluLayer
1003    - @ref CLSinLayer
1004    - CLBatchConcatenateLayerKernel
1005    - @ref CLDepthToSpaceLayerKernel / @ref CLDepthToSpaceLayer
1006    - CLGEMMLowpMatrixMultiplyNativeKernel
1007    - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
1008    - CLGEMMMatrixMultiplyNativeKernel
1009    - CLMeanStdDevNormalizationKernel / CLMeanStdDevNormalizationLayer
1010    - @ref CLSpaceToDepthLayerKernel / @ref CLSpaceToDepthLayer
1011 - New examples:
1012    - neon_opticalflow
1013    - cl_cache
1014    - neon_permute
1015 - Added support for FP16 in @ref NEDeconvolutionLayer
1016 - Added support for FP16 in @ref CLDeconvolutionLayer
1017 - Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation
1018 - Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only)
1019 - Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only)
1020 - Re-factored the depthwise convolution layer kernel on Arm® Neon™ for generic cases
1021 - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon™ only)
1022 - Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file
1023 - Altered @ref QuantizationInfo interface to support per-channel quantization.
1024 - The CLDepthwiseConvolutionLayer3x3 will be included by @ref CLDepthwiseConvolutionLayer to accommodate future optimizations.
1025 - The NEDepthwiseConvolutionLayerOptimized will be included by @ref NEDepthwiseConvolutionLayer to accommodate future optimizations.
1026 - Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface
1027 - Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface
1028 - Optimized the Arm® Neon™ assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
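
The per-channel quantization mentioned above means that each channel carries its own scale instead of sharing one scale across the whole tensor. The following is a minimal sketch of symmetric per-channel dequantization under that assumption; `dequantize_per_channel` and its data layout are hypothetical and do not reflect the @ref QuantizationInfo interface itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: dequantize symmetric 8-bit data laid out as
// [channel][element], with one scale per channel:
//   real = scale[channel] * quantized
inline std::vector<float> dequantize_per_channel(const std::vector<std::int8_t> &values,
                                                 const std::vector<float>       &scales,
                                                 std::size_t                     elements_per_channel)
{
    assert(elements_per_channel > 0);
    assert(values.size() == scales.size() * elements_per_channel);
    std::vector<float> out(values.size());
    for(std::size_t i = 0; i < values.size(); ++i)
    {
        const std::size_t channel = i / elements_per_channel;
        out[i] = scales[channel] * static_cast<float>(values[i]);
    }
    return out;
}
```

With scales `{0.5, 0.25}` and two elements per channel, the quantized values `{2, -4, 8, 16}` dequantize to `{1, -2, 2, 4}`.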
1029
1030v19.05 Public major release
1031 - Various bug fixes.
1032 - Various optimisations.
1033 - New Arm® Neon™ kernels / functions:
1034    - @ref NEBatchToSpaceLayerKernel / @ref NEBatchToSpaceLayer
1035    - NEComplexPixelWiseMultiplicationKernel / @ref NEComplexPixelWiseMultiplication
1036    - @ref NECropKernel / @ref NECropResize
1037    - NEDepthwiseConvolutionAssemblyDispatch
1038    - @ref NEFFTDigitReverseKernel
1039    - @ref NEFFTRadixStageKernel
1040    - @ref NEFFTScaleKernel
1041    - NEGEMMLowpOffsetContributionOutputStageKernel
1042    - NEHeightConcatenateLayerKernel
1043    - @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer
1044    - @ref NEFFT1D
1045    - @ref NEFFT2D
1046    - @ref NEFFTConvolutionLayer
1047 - New OpenCL kernels / functions:
1048    - CLComplexPixelWiseMultiplicationKernel / @ref CLComplexPixelWiseMultiplication
1049    - CLCropKernel / @ref CLCropResize
1050    - @ref CLDeconvolutionReshapeOutputKernel
1051    - @ref CLFFTDigitReverseKernel
1052    - @ref CLFFTRadixStageKernel
1053    - @ref CLFFTScaleKernel
1054    - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
1055    - CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
1056    - CLHeightConcatenateLayerKernel
1057    - @ref CLDirectDeconvolutionLayer
1058    - @ref CLFFT1D
1059    - @ref CLFFT2D
1060    - @ref CLFFTConvolutionLayer
1061    - @ref CLGEMMDeconvolutionLayer
1062 - New OpenGLES kernels / functions:
1063    - GCConcatenateLayer
1064 - Deprecated functions/interfaces
1065    - GCDepthConcatenateLayer
1066    - NEWidthConcatenateLayer
1067    - NEDepthConcatenateLayer
1068    - CLWidthConcatenateLayer
1069    - CLDepthConcatenateLayer
1070    - CLGEMMInterleave4x4
1071    - CLGEMMTranspose1xW
1072 - Support different quantization info in CLConcatLayer.
1073 - Add checks for cases where different input/output quantization info is not supported
1074   (i.e. when tensors have different quantization information).
1075 - Add FP16 support checks.
1076 - Fix output quantization in CLDepthwiseConv3x3 when activation is fused.
1077 - New graph examples:
1078     - graph_convolution
1079     - graph_fully_connected
1080     - graph_depthwise_convolution
1081     - Deepspeech v0.4.1
1082 - Add support for QASYMM8 in NEArithmeticSubtractionKernel.
1083 - Add support for QASYMM8 in NEPixelWiseMultiplicationKernel.
1084 - Add support for QASYMM8 NEDeconvolution.
1085 - Add support for DequantizationLayer for Neon/CL.
1086 - Add support for dilation in CLDepthwiseConvolution.
1087 - Fuse offset contribution with the output stage when we use NEGEMMLowpMatrixMultiplyCore.
1088 - Optimize CLDeconvolution.
1089 - Add StackLayer to the graph API.
1090 - Add support for "reflect" padding mode in NEPad.
1091 - Winograd 7x7 NHWC on OpenCL.
1092 - Rework CL ML layers to run exclusively on CL.
1093 - Support different quantization info in PoolingLayer.
1094 - Implement and test import memory interfaces.
1095 - Added new tests and removed old ones.
1096 - Various clang-tidy fixes.
1097
1098v19.02 Public major release
1099 - Various bug fixes.
1100 - Various optimisations.
1101 - New Arm® Neon™ kernels / functions:
1102    - @ref NETileKernel / @ref NETile
1103    - @ref NEFuseBatchNormalizationKernel / @ref NEFuseBatchNormalization
1104    - NEElementwiseOperationKernel
1105    - @ref NEElementwiseMax
1106    - @ref NEElementwiseMin
1107    - @ref NEElementwiseSquaredDiff
1108    - @ref NESelectKernel / @ref NESelect
1109    - @ref NESplit
1110    - @ref NESlice
1111    - @ref NEUnstack
1112    - @ref NEStridedSliceKernel / @ref NEStridedSlice
1113    - NEElementwiseUnaryKernel
1114    - @ref NERsqrtLayer
1115    - @ref NEExpLayer
1116    - @ref NEReverseKernel / @ref NEReverse
1117    - @ref NEArgMinMaxLayer
1118    - @ref NEStackLayerKernel / @ref NEStackLayer
1119    - @ref NERangeKernel / @ref NERange
1120    - @ref NEPadLayer
1121    - NEMemsetKernel
1122    - @ref NEGatherKernel / @ref NEGather
1123    - @ref NEElementwiseComparison
1124    - @ref NEElementwiseComparisonStatic
1125    - NEComparisonOperationKernel
1126    - @ref NEElementwiseDivision
1127 - New OpenCL kernels / functions:
1128    - @ref CLSelectKernel / @ref CLSelect
1129    - @ref CLTileKernel / @ref CLTile
1130    - @ref CLComparisonKernel / @ref CLComparison
1131    - @ref CLArgMinMaxLayer
1132    - @ref CLElementwiseMax
1133    - @ref CLElementwiseMin
1134    - @ref CLElementwiseSquaredDiff
1135    - @ref CLStackLayerKernel / @ref CLStackLayer
1136    - @ref CLReverse / @ref CLReverseKernel
1137    - @ref CLRsqrtLayer
1138    - @ref CLExpLayer
1139    - CLElementWiseUnaryLayerKernel
1140    - CLGEMMReshapeLHSMatrixKernel
1141    - CLGEMMReshapeRHSMatrixKernel
1142    - CLGEMMMatrixMultiplyReshapedKernel
1143    - @ref CLRangeKernel / @ref CLRange
1144    - @ref CLUnstack
1145    - @ref CLGatherKernel / @ref CLGather
1146    - CLGEMMLowpMatrixMultiplyReshapedKernel
1147 - New CPP kernels / functions:
1148    - @ref CPPDetectionOutputLayer
1149    - @ref CPPTopKV / @ref CPPTopKVKernel
1150 - Added new examples:
1151    - graph_ssd_mobilenet.cpp
1152    - graph_mobilenet_v2.cpp
1153    - graph_resnet12.cpp
1154    - graph_srcnn955.cpp
1155    - graph_vgg_vdsr.cpp
1156    - graph_inception_resnet_v1.cpp
1157 - Add 4D tensors support to
1158    - @ref NESoftmaxLayer
1159 - Fused activation in @ref CLWinogradConvolutionLayer
1160 - Extended @ref NEPermute to support more cases
1161 - Added Neon™/SVE GEMM Hybrid kernels
1162 - Added u8 and s8 hybrid assembly kernels
1163 - Introduced GEMM strategy name in NEGEMMAssemblyWrapper
1164 - Improved @ref CLTuner
1165 - Fused the bias addition within @ref CLGEMM
1166 - Added support for QASYMM8 LOGISTIC activation in @ref NEActivationLayer
1167 - Added NHWC data layout support to:
1168    - @ref NEScale for F16
1169    - @ref CLNormalizationLayer IN_MAP_2D for FP32/FP16
1170    - @ref NEL2NormalizeLayer for FP32/FP16
1171    - @ref NENormalizationLayer IN_MAP_2D for FP32/FP16
1172    - @ref CLROIAlignLayer
1173    - @ref CLGenerateProposalsLayer
1174 - Added QASYMM8 support to the following kernels:
1175    - NEArithmeticAdditionKernel
1176    - @ref NEScale
1177 - Added new tests and improved validation and benchmarking suites.
1178 - Deprecated functions/interfaces
1179    - Usage of inner_border_right and inner_border_top has been deprecated in @ref CLDeconvolutionLayer and @ref NEDeconvolutionLayer
1180
1181v18.11 Public major release
1182 - Various bug fixes.
1183 - Various optimisations.
1184 - New Arm® Neon™ kernels / functions:
1185    - @ref NEChannelShuffleLayer / @ref NEChannelShuffleLayerKernel
1186    - @ref NEReduceMean
1187    - @ref NEReorgLayer / @ref NEReorgLayerKernel
1188    - @ref NEPriorBoxLayer / @ref NEPriorBoxLayerKernel
1189    - NEUpsampleLayer / NEUpsampleLayerKernel
1190    - NEYOLOLayer / NEYOLOLayerKernel
1191 - New OpenCL kernels / functions:
1192    - @ref CLBatchToSpaceLayer / @ref CLBatchToSpaceLayerKernel
1193    - @ref CLBoundingBoxTransform / @ref CLBoundingBoxTransformKernel
1194    - @ref CLComputeAllAnchorsKernel
1195    - @ref CLGenerateProposalsLayer
1196    - @ref CLNormalizePlanarYUVLayer / @ref CLNormalizePlanarYUVLayerKernel
1197    - @ref CLReorgLayer / @ref CLReorgLayerKernel
1198    - @ref CLSpaceToBatchLayer / @ref CLSpaceToBatchLayerKernel
1199    - @ref CLPadLayer
1200    - @ref CLReduceMean
1201    - @ref CLPriorBoxLayer / @ref CLPriorBoxLayerKernel
1202    - @ref CLROIAlignLayer / @ref CLROIAlignLayerKernel
1203    - @ref CLSlice
1204    - @ref CLSplit
1205    - @ref CLStridedSlice / @ref CLStridedSliceKernel
1206    - CLUpsampleLayer / CLUpsampleLayerKernel
1207    - CLYOLOLayer / CLYOLOLayerKernel
1208 - New CPP kernels / functions:
1209    - @ref CPPBoxWithNonMaximaSuppressionLimit / @ref CPPBoxWithNonMaximaSuppressionLimitKernel
1210 - Added the validate method in:
1211    - @ref NEDepthConvertLayer
1212    - @ref NEFloor / @ref CLFloor
1213    - NEGEMMMatrixAdditionKernel
1214    - @ref NEReshapeLayer / @ref CLReshapeLayer
1215    - @ref CLScale
1216 - Added new examples:
1217    - graph_shufflenet.cpp
1218    - graph_yolov3.cpp
1219 - Added documentation on how to add a new function or kernel.
1220 - Improved doxygen documentation by adding a list of the existing functions.
1221 - Add 4D tensors support to
1222    - CLWidthConcatenateLayer
1223    - CLFlattenLayer
1224    - @ref CLSoftmaxLayer
1225 - Add dot product support for CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
1226 - Add SVE support
1227 - Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization
1228 - Fused activation in CLDepthwiseConvolutionLayer3x3NCHWKernel, CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
1229 - Added NHWC data layout support to:
1230    - @ref CLChannelShuffleLayer
1231    - @ref CLDeconvolutionLayer
1232    - @ref CLL2NormalizeLayer
1233 - Added QASYMM8 support to the following kernels:
1234    - CLScaleKernel
1235    - NEDepthwiseConvolutionLayer3x3Kernel
1236    - CLPixelWiseMultiplicationKernel
1237 - Added FP16 support to the following kernels:
1238    - CLDepthwiseConvolutionLayer3x3NHWCKernel
1239    - NEDepthwiseConvolutionLayer3x3Kernel
1240    - @ref CLNormalizePlanarYUVLayerKernel
1241    - @ref CLWinogradConvolutionLayer (5x5 kernel)
1242 - More tests added to both validation and benchmarking suites.
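
The batch normalization fusion listed above relies on the standard identity that a batch normalization following a convolution can be folded into the convolution's weights and bias. The sketch below shows that arithmetic for a single output channel. It is an illustration of the general technique only; `FoldedChannel` and `fold_batch_norm` are hypothetical names and not the @ref CLFuseBatchNormalization interface.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical container for the folded parameters of one output channel.
struct FoldedChannel
{
    std::vector<double> weights;
    double              bias;
};

// Fold y = gamma * (conv(x) - mean) / sqrt(variance + epsilon) + beta
// into the convolution itself:
//   scale    = gamma / sqrt(variance + epsilon)
//   weights' = weights * scale
//   bias'    = (bias - mean) * scale + beta
inline FoldedChannel fold_batch_norm(const std::vector<double> &weights, double bias,
                                     double mean, double variance,
                                     double gamma, double beta, double epsilon)
{
    assert(variance + epsilon > 0.0);
    const double  scale = gamma / std::sqrt(variance + epsilon);
    FoldedChannel folded{ weights, (bias - mean) * scale + beta };
    for(double &w : folded.weights)
    {
        w *= scale;
    }
    return folded;
}
```

After folding, running the convolution with the new weights and bias gives the same result as convolution followed by batch normalization, so the separate normalization pass can be skipped at inference time.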
1243
1244v18.08 Public major release
1245 - Various bug fixes.
1246 - Various optimisations.
1247 - Updated recommended NDK version to r17b.
1248 - Removed support for QS8/QS16 data types.
1249 - Added support for grouped convolution in @ref CLConvolutionLayer.
1250 - Added NHWC data layout support to:
1251    - NEDepthConcatenateLayer / CLDepthConcatenateLayer
1252    - @ref NEWinogradConvolutionLayer / @ref CLWinogradConvolutionLayer
1253    - @ref CLDepthwiseConvolutionLayer
1254    - @ref CLDirectConvolutionLayer
1255    - @ref CLConvolutionLayer
1256    - @ref CLScale
1257    - CLIm2ColKernel
1258 - New Arm® Neon™ kernels / functions:
1259    - @ref NERNNLayer
1260 - New OpenCL kernels / functions:
1261    - @ref CLArithmeticDivision
1262 - Introduced prepare() stage support in the graph API for GLES.
1263 - Added support for memory reuse when trying to allocate smaller CLTensors.
1264 - Enabled NHWC execution on graph examples.
1265 - Added JPEG accessor for validation purposes.
1266 - Added validate methods to some kernels / functions.
1267
1268v18.05 Public major release
1269 - Various bug fixes.
1270 - Various optimisations.
1271 - Major redesign in the interface for the Neon™ kernels implemented in assembly.
1272 - Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel
1273 - Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in Neon™ functions.
1274 - Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface.
1275 - Moved Neon™ assembly kernels to the folder src/core/Neon/kernels/arm_gemm.
1276 - Improved doxygen documentation.
1277 - Improved memory management for layer transitions.
1278 - Added support for NHWC data layout in tensors.
1279 - Added NHWC data layout support to:
1280    - @ref NEGEMMConvolutionLayer
1281    - @ref NEDirectConvolutionLayer
1282    - @ref NEPoolingLayer / @ref CLPoolingLayer
1283    - @ref NEBatchNormalizationLayer / @ref CLBatchNormalizationLayer
1284    - @ref NEDepthwiseConvolutionLayer
1285    - @ref NEScale
1286    - NEIm2Col
1287 - Added support for dilated convolutions in @ref NEConvolutionLayer and @ref CLConvolutionLayer.
1288 - New OpenCL kernels / functions:
1289    - @ref CLChannelShuffleLayer / @ref CLChannelShuffleLayerKernel
1290    - CLConvertFullyConnectedWeightsKernel / @ref CLConvertFullyConnectedWeights
1291    - @ref CLCopy / CLCopyKernel
1292    - @ref CLLSTMLayer
1293    - @ref CLRNNLayer
1294    - CLWidthConcatenateLayer / CLWidthConcatenateLayerKernel
1295    - CLWinogradFilterTransformKernel / @ref CLWinogradConvolutionLayer
1296    - CLWinogradInputTransformKernel / CLWinogradInputTransform
1297 - New Arm® Neon™ kernels / functions:
1298    - NEConvertFullyConnectedWeightsKernel / @ref NEConvertFullyConnectedWeights.
1299 - Created the validate method in @ref CLDepthwiseConvolutionLayer.
1300 - Beta and gamma are no longer mandatory arguments in @ref NEBatchNormalizationLayer and @ref CLBatchNormalizationLayer.
1301 - Added depth multiplier support in @ref NEDepthwiseConvolutionLayer and @ref CLDepthwiseConvolutionLayer.
1302 - Added broadcast multiply support in @ref NEPixelWiseMultiplication / NEPixelWiseMultiplicationKernel.
1303 - Port mobilenet example to NHWC data layout.
1304 - Enabled Winograd method in @ref CLConvolutionLayer.
1305 - Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer.
1306 - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm.
1307 - Added memory manager support in GLES functions.
1308 - Major refactoring of the graph API.
1309 - Added GLES backend in the graph API.
1310 - Added support for the memory manager in the graph API.
1311 - Enabled Winograd Convolution method in the graph API.
1312 - Added support for grouped convolutions in the graph API.
1313 - Replaced NEDeconvolutionLayerUpsampleKernel with NEScaleKernel in @ref NEDeconvolutionLayer.
1314 - Added fast maths flag in @ref CLConvolutionLayer.
1315 - Added new tests and benchmarks in validation and benchmark frameworks
1316 - Merge Activation layer with Convolution Layer (Neon™, CL, GLES)
1317 - Added support for OpenCL 2.0 SVM
1318 - Added support for importing memory in OpenCL tensors.
1319 - Added the prepare() method to perform any one-off pre-processing before running the function.
1320 - Added new examples:
1321    - graph_inception_v4.cpp
1322    - graph_resnext50.cpp
1323 - Added memory measurement instrument for CL.
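
The prepare() item above describes a one-off pre-processing stage that runs before the first execution. A minimal sketch of that pattern is shown below, assuming the one-off work is guarded by a flag and triggered lazily from run(). `ScaledSum` is a hypothetical class written for this illustration and is not a Compute Library function.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical function object, not a Compute Library class.
class ScaledSum
{
public:
    explicit ScaledSum(std::vector<int> weights) : _weights(std::move(weights))
    {
    }

    // One-off pre-processing: reduce the weights once and cache the result.
    void prepare()
    {
        if(_is_prepared)
        {
            return;
        }
        _weight_sum = 0;
        for(int w : _weights)
        {
            _weight_sum += w;
        }
        ++_prepare_calls;
        _is_prepared = true;
    }

    // run() calls prepare() itself, so callers may skip the explicit call.
    int run(int input)
    {
        prepare();
        return input * _weight_sum;
    }

    int prepare_calls() const
    {
        return _prepare_calls;
    }

private:
    std::vector<int> _weights;
    int              _weight_sum{ 0 };
    int              _prepare_calls{ 0 };
    bool             _is_prepared{ false };
};
```

Calling run() several times performs the weight reduction only once; calling prepare() explicitly beforehand moves that cost out of the first run().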
1324
1325v18.03 Public maintenance release
1326 - Various bug fixes.
1327 - Fixed bug in @ref NEActivationLayer
1328 - Fix in @ref CLTuner when using batches.
1329 - Updated recommended NDK version to r16b (And fixed warnings).
1330 - Fixed bug in validation code.
1331 - Added Inception v4 graph example.
1332 - Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer
1333
1334v18.02 Public major release
1335 - Various Arm® Neon™ / OpenCL / GLES optimisations.
1336 - Various bug fixes.
1337 - Changed default number of threads on big.LITTLE systems.
1338 - Refactored examples and added:
1339    - graph_mobilenet_qasymm8
1340    - graph_resnet
1341    - graph_squeezenet_v1_1
1342 - Renamed @ref CLConvolutionLayer into @ref CLGEMMConvolutionLayer and created a new @ref CLConvolutionLayer to select the fastest convolution method.
1343 - Renamed @ref NEConvolutionLayer into @ref NEGEMMConvolutionLayer and created a new @ref NEConvolutionLayer to select the fastest convolution method.
1344 - Added in place support to:
1345    - @ref CLActivationLayer
1346    - @ref CLBatchNormalizationLayer
1347 - Added QASYMM8 support to:
1348    - @ref CLActivationLayer
1349    - @ref CLDepthwiseConvolutionLayer
1350    - @ref NEDepthwiseConvolutionLayer
1351    - @ref NESoftmaxLayer
1352 - Added FP16 support to:
1353    - CLDepthwiseConvolutionLayer3x3
1354    - @ref CLDepthwiseConvolutionLayer
1355 - Added broadcasting support to NEArithmeticAddition / @ref CLArithmeticAddition / @ref CLPixelWiseMultiplication
1356 - Added fused batch normalization and activation to @ref CLBatchNormalizationLayer and @ref NEBatchNormalizationLayer
1357 - Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer
1358 - New OpenCL kernels / functions:
1359    - CLDirectConvolutionLayerOutputStageKernel
1360 - New Arm® Neon™ kernels / functions
1361    - Added name() method to all kernels.
1362    - Added support for Winograd 5x5.
1363    - NEPermuteKernel / @ref NEPermute
1364    - CpuWinogradConv2dTransformInputKernel / NEWinogradLayer
1365    - CpuWinogradConv2dTransformOutputKernel / NEWinogradLayer
1366    - CpuWinogradConv2dTransformWeightsKernel / NEWinogradLayer
1367    - Renamed NEWinogradLayerKernel into NEWinogradLayerBatchedGEMMKernel
1368 - New GLES kernels / functions:
1369    - GCTensorShiftKernel / GCTensorShift
1370
1371v18.01 Public maintenance release
1372 - Various bug fixes
1373 - Added some of the missing validate() methods
1374 - Added @ref CLDeconvolutionLayerUpsampleKernel / @ref CLDeconvolutionLayer / @ref CLDeconvolutionLayerUpsample
1375 - Added CLPermuteKernel / @ref CLPermute
1376 - Added method to clean the programs cache in the CL Kernel library.
1377 - Added GCArithmeticAdditionKernel / GCArithmeticAddition
1378 - Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3
1379 - Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer
1380 - Added GCScaleKernel / GCScale
1381 - Added GCWeightsReshapeKernel / GCConvolutionLayer
1382 - Added FP16 support to the following GLES compute kernels:
1383    - GCCol2ImKernel
1384    - GCGEMMInterleave4x4Kernel
1385    - GCGEMMTranspose1xWKernel
1386    - GCIm2ColKernel
1387 - Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
1388 - Added NEDirectConvolutionLayerOutputStageKernel
1389 - Added QASYMM8 support to the following Arm® Neon™ kernels:
1390    - NEDepthwiseConvolutionLayer3x3Kernel
1391    - @ref NEFillBorderKernel
1392    - NEPoolingLayerKernel
1393 - Added new examples:
1394    - graph_cl_mobilenet_qasymm8.cpp
1395    - graph_inception_v3.cpp
1396    - gc_dc.cpp
1397 - More tests added to both validation and benchmarking suites.
1398
1399v17.12 Public major release
1400 - Most machine learning functions on OpenCL support the new data type QASYMM8
1401 - Introduced logging interface
1402 - Introduced OpenCL timer
1403 - Reworked GEMMLowp interface
1404 - Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
1405 - Added validation method for most Machine Learning kernels / functions
1406 - Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
1407 - Added sgemm example for OpenCL
1408 - Added absolute difference example for GLES compute
1409 - Added new tests and benchmarks in validation and benchmark frameworks
1410 - Added new kernels / functions for GLES compute
1411
1412 - New OpenGL ES kernels / functions
1413    - GCAbsoluteDifferenceKernel / GCAbsoluteDifference
1414    - GCActivationLayerKernel / GCActivationLayer
1415    - GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer
1416    - GCCol2ImKernel
1417    - GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer
1418    - GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer
1419    - GCDropoutLayerKernel / GCDropoutLayer
1420    - GCFillBorderKernel / GCFillBorder
1421    - GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4
1422    - GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM
1423    - GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW
1424    - GCIm2ColKernel
1425    - GCNormalizationLayerKernel / GCNormalizationLayer
1426    - GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication
1427    - GCPoolingLayerKernel / GCPoolingLayer
1428    - GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
1429    - GCTransposeKernel / GCTranspose
1430
1431 - New Arm® Neon™ kernels / functions
1432    - arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
1433    - arm_compute::NEHGEMMAArch64FP16Kernel
1434    - NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
1435    - NEGEMMLowpOffsetContributionKernel / NEGEMMLowpMatrixAReductionKernel / NEGEMMLowpMatrixBReductionKernel / NEGEMMLowpMatrixMultiplyCore
1436    - NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
1437    - NEWinogradLayer / NEWinogradLayerKernel
1438
1439 - New OpenCL kernels / functions
1440    - CLGEMMLowpOffsetContributionKernel / CLGEMMLowpMatrixAReductionKernel / CLGEMMLowpMatrixBReductionKernel / CLGEMMLowpMatrixMultiplyCore
1441    - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
1442
1443 - New graph nodes for Arm® Neon™ and OpenCL
1444    - graph::BranchLayer
1445    - graph::DepthConvertLayer
1446    - graph::DepthwiseConvolutionLayer
1447    - graph::DequantizationLayer
1448    - graph::FlattenLayer
1449    - graph::QuantizationLayer
1450    - graph::ReshapeLayer
1451
1452v17.10 Public maintenance release
1453 - Bug fixes:
1454    - Check the maximum local workgroup size supported by OpenCL devices
1455    - Minor documentation updates (Fixed instructions to build the examples)
1456    - Introduced a graph::GraphContext
1457    - Added a few new graph nodes and support for branches and grouping.
1458    - Automatically enable cl_printf in debug builds
1459    - Fixed bare metal builds for armv7a
1460    - Added AlexNet and cartoon effect examples
1461    - Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the runtime part of the library now need to link against both libarm_compute_core and libarm_compute.)
1462
1463v17.09 Public major release
1464 - Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.
1465 - Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager)
1466 - New validation and benchmark frameworks (Boost and Google frameworks replaced by a homemade framework).
1467 - Most machine learning functions support both 8-bit and 16-bit fixed point (QS8, QS16) for both Arm® Neon™ and OpenCL.
1468 - New Arm® Neon™ kernels / functions:
1469    - arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel
1470    - NEDequantizationLayerKernel / @ref NEDequantizationLayer
1471    - NEFloorKernel / @ref NEFloor
1472    - @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer
1473    - NEQuantizationLayerKernel NEMinMaxLayerKernel / @ref NEQuantizationLayer
1474    - @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer
1475    - @ref NEReductionOperationKernel / @ref NEReductionOperation
1476    - NEReshapeLayerKernel / @ref NEReshapeLayer
1477
1478 - New OpenCL kernels / functions:
1479    - CLDepthwiseConvolutionLayer3x3NCHWKernel CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer
1480    - CLDequantizationLayerKernel / CLDequantizationLayer
1481    - CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer
1482    - CLFlattenLayer
1483    - CLFloorKernel / @ref CLFloor
1484    - CLGEMMTranspose1xW
1485    - CLGEMMMatrixVectorMultiplyKernel
1486    - @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer
1487    - CLQuantizationLayerKernel CLMinMaxLayerKernel / @ref CLQuantizationLayer
1488    - @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer
1489    - @ref CLReductionOperationKernel / @ref CLReductionOperation
1490    - CLReshapeLayerKernel / @ref CLReshapeLayer
1491
1492v17.06 Public major release
1493 - Various bug fixes
1494 - Added support for 8-bit fixed point (QS8) to various Arm® Neon™ machine learning kernels.
1495 - Added unit tests and benchmarks (AlexNet, LeNet)
1496 - Added support for sub tensors.
1497 - Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
1498 - Added @ref OMPScheduler (OpenMP) scheduler for Arm® Neon™
1499 - Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (for bare metal)
1500 - Users can specify their own scheduler by implementing the @ref IScheduler interface.
1501 - New OpenCL kernels / functions:
1502    - @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
1503    - CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer
1504    - CLHOGOrientationBinningKernel, CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor, CLHOGDetector, CLHOGGradient, CLHOGMultiDetection
1505    - CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer
1506    - CLWeightsReshapeKernel / CLConvolutionLayerReshapeWeights
1507 - New C++ kernels:
1508    - CPPDetectionWindowNonMaximaSuppressionKernel
1509 - New Arm® Neon™ kernels / functions:
1510    - @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer
1511    - NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
1512    - NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
1513    - NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer
1514    - NEWeightsReshapeKernel / NEConvolutionLayerReshapeWeights
1515
1516v17.05 Public bug fixes release
1517 - Various bug fixes
1518 - Remaining functions ported to use accurate padding.
1519 - The library no longer links against OpenCL (it uses dlopen / dlsym at runtime instead to determine whether OpenCL is available).
1520 - Added "free" method to allocator.
1521 - Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9
1522
1523v17.04 Public bug fixes release
1524
1525 The following functions have been ported to use the new accurate padding:
1526 -  CLColorConvertKernel
1527 -  CLEdgeNonMaxSuppressionKernel
1528 -  CLEdgeTraceKernel
1529 -  CLGaussianPyramidHorKernel
1530 -  CLGaussianPyramidVertKernel
1531 -  CLGradientKernel
1532 -  NEChannelCombineKernel
1533 -  NEFillArrayKernel
1534 -  NEGaussianPyramidHorKernel
1535 -  NEGaussianPyramidVertKernel
1536 -  NEHarrisScoreFP16Kernel
1537 -  NEHarrisScoreKernel
1538 -  NEHOGDetectorKernel
1539 -  NELogits1DMaxKernel
1540 -  NELogits1DShiftExpSumKernel
1541 -  NELogits1DNormKernel
1542 -  NENonMaximaSuppression3x3FP16Kernel
1543 -  NENonMaximaSuppression3x3Kernel
1544
1545v17.03.1 First major public release of the sources
1546 - Renamed the library to arm_compute
1547 - New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions.
1548 - New padding calculation interface introduced and ported most kernels / functions to use it.
1549 - New OpenCL kernels / functions:
1550   - CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
1551 - New Arm® Neon™ kernels / functions:
1552   - @ref NENormalizationLayerKernel / @ref NENormalizationLayer
1553   - NETransposeKernel / @ref NETranspose
1554   - NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
1555   - NEIm2ColKernel, NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
1556   - NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer
1557   - NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp
1558
1559v17.03 Sources preview
1560 - New OpenCL kernels / functions:
1561   - CLGradientKernel, CLEdgeNonMaxSuppressionKernel, CLEdgeTraceKernel / CLCannyEdge
1562   - GEMM refactoring + FP16 support: CLGEMMInterleave4x4Kernel, CLGEMMTranspose1xWKernel, CLGEMMMatrixMultiplyKernel, CLGEMMMatrixAdditionKernel / @ref CLGEMM
1563   - CLGEMMMatrixAccumulateBiasesKernel / @ref CLFullyConnectedLayer
1564   - CLTransposeKernel / @ref CLTranspose
1565   - CLLKTrackerInitKernel, CLLKTrackerStage0Kernel, CLLKTrackerStage1Kernel, CLLKTrackerFinalizeKernel / CLOpticalFlow
1566   - @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer
1567   - CLLaplacianPyramid, CLLaplacianReconstruct
1568 - New Arm® Neon™ kernels / functions:
1569   - NEActivationLayerKernel / @ref NEActivationLayer
1570   - GEMM refactoring + FP16 support (Requires armv8.2 CPU): NEGEMMInterleave4x4Kernel, NEGEMMTranspose1xWKernel, NEGEMMMatrixMultiplyKernel, NEGEMMMatrixAdditionKernel / @ref NEGEMM
1571   - NEPoolingLayerKernel / @ref NEPoolingLayer
1572
1573v17.02.1 Sources preview
1574 - New OpenCL kernels / functions:
1575   - CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer
1576   - CLPoolingLayerKernel / @ref CLPoolingLayer
1577   - CLIm2ColKernel, CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
1578   - CLRemapKernel / CLRemap
1579   - CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
1580   - CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
1581   - CLNonLinearFilterKernel / CLNonLinearFilter
1582 - New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU)
1583   - NEAccumulateWeightedFP16Kernel
1584   - NEBox3x3FP16Kernel
1585   - NENonMaximaSuppression3x3FP16Kernel
1586
1587v17.02 Sources preview
1588 - New OpenCL kernels / functions:
1589   - CLActivationLayerKernel / @ref CLActivationLayer
1590   - CLChannelCombineKernel / CLChannelCombine
1591   - CLDerivativeKernel / CLChannelExtract
1592   - CLFastCornersKernel / CLFastCorners
1593   - CLMeanStdDevKernel / CLMeanStdDev
1594 - New Arm® Neon™ kernels / functions:
1595   - HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
1596   - NENonLinearFilterKernel / NENonLinearFilter
1597 - Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
1598 - Switched all the kernels / functions to use tensors instead of images.
1599 - Updated documentation to include instructions to build the library from sources.
1600
1601v16.12 Binary preview release
1602 - Original release
1603
1604 */
1605} // namespace arm_compute
1606