///
/// Copyright (c) 2017-2023 Arm Limited.
///
/// SPDX-License-Identifier: MIT
///
/// Permission is hereby granted, free of charge, to any person obtaining a copy
/// of this software and associated documentation files (the "Software"), to
/// deal in the Software without restriction, including without limitation the
/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
/// sell copies of the Software, and to permit persons to whom the Software is
/// furnished to do so, subject to the following conditions:
///
/// The above copyright notice and this permission notice shall be included in all
/// copies or substantial portions of the Software.
///
/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
/// SOFTWARE.
///
namespace arm_compute
{
/** @page versions_changelogs Release Versions and Changelog

@tableofcontents

@section S2_1_versions Release versions

All releases are numbered vYY.MM, where YY is the last two digits of the year and MM is the month number.
If there is more than one release in a month, an extra sequential number is appended at the end:

    v17.03 (First release of March 2017)
    v17.03.1 (Second release of March 2017)
    v17.04 (First release of April 2017)

@note We aim to publish one major public release with new features per quarter. All releases in between will only contain bug fixes.
@note Starting from release 22.05, the 'master' branch is no longer used; it has been replaced by 'main'. Please update your clone jobs accordingly.

@section S2_2_changelog Changelog
v23.02.1 Public patch release
 - Allow mismatching data layouts between the source tensor and weights for \link cpu::CpuGemmDirectConv2d CpuGemmDirectConv2d \endlink with fixed format kernels.
 - Fixes for experimental CPU only Bazel and CMake builds.

v23.02 Public major release
 - New features:
   - Rework the experimental dynamic fusion interface by identifying auxiliary and intermediate tensors, and specifying an explicit output operator.
   - Add the following operators to the experimental dynamic fusion API:
     - GpuAdd, GpuCast, GpuClamp, GpuDepthwiseConv2d, GpuMul, GpuOutput, GpuPool2d, GpuReshape, GpuResize, GpuSoftmax, GpuSub.
   - Add SME/SME2 kernels for GeMM, Winograd convolution, Depthwise convolution and Pooling.
   - Add new CPU operator AddMulAdd for float and quantized types.
   - Add new flag @ref ITensorInfo::lock_paddings() to tensors to prevent extending tensor paddings.
   - Add experimental support for CPU only Bazel and CMake builds.
 - Performance optimizations:
   - Optimize CPU base-e exponential functions for FP32.
   - Optimize CPU StridedSlice by copying first dimension elements in bulk where possible.
   - Optimize CPU quantized Subtraction by reusing the quantized Addition kernel.
   - Optimize CPU ReduceMean by removing quantization steps and performing the operation in integer domain.
   - Optimize GPU Scale and Dynamic Fusion GpuResize by removing quantization steps and performing the operation in integer domain.
   - Update the heuristic for CLDepthwiseConvolutionNative kernel.
   - Add new optimized OpenCL kernel to compute indirect convolution:
     - \link opencl::kernels::ClIndirectConv2dKernel ClIndirectConv2dKernel \endlink
   - Add new optimized OpenCL kernel to compute transposed convolution:
     - \link opencl::kernels::ClTransposedConvolutionKernel ClTransposedConvolutionKernel \endlink
 - Update recommended/minimum NDK version to r20b.
 - Various optimizations and bug fixes.

v22.11 Public major release
 - New features:
   - Add new experimental dynamic fusion API.
   - Add CPU batch matrix multiplication with adj_x = false and adj_y = false for FP32.
   - Add CPU MeanStdDevNorm for QASYMM8.
   - Add CPU and GPU GELU activation function for FP32 and FP16.
   - Add CPU swish activation function for FP32 and FP16.
 - Performance optimizations:
   - Optimize CPU bilinear scale for FP32, FP16, QASYMM8, QASYMM8_SIGNED, U8 and S8.
   - Optimize CPU activation functions using LUT-based implementation:
     - Sigmoid function for QASYMM8 and QASYMM8_SIGNED.
     - Hard swish function for QASYMM8_SIGNED.
   - Optimize CPU addition for QASYMM8 and QASYMM8_SIGNED using fixed-point arithmetic.
   - Optimize CPU multiplication, subtraction and activation layers by considering tensors as 1D.
   - Optimize GPU depthwise convolution kernel and heuristic.
   - Optimize GPU Conv2d heuristic.
   - Optimize CPU MeanStdDevNorm for FP16.
   - Optimize CPU tanh activation function for FP16 using rational approximation.
   - Improve GPU GeMMLowp start-up time.
 - Various optimizations and bug fixes.

v22.08 Public major release
 - Various bug fixes.
 - Disable unsafe FP optimizations causing accuracy issues in:
   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
   - \link opencl::kernels::ClDirectConv3dKernel ClDirectConv3dKernel \endlink
   - @ref CLDepthwiseConvolutionLayerNativeKernel
 - Add Dynamic Fusion of Elementwise Operators: Div, Floor, Add.
 - Optimize the gemm_reshaped_rhs_nly_nt OpenCL kernel using the arm_matrix_multiply extension available for Arm® Mali™-G715 and Arm® Mali™-G615.
 - Add support for the arm_matrix_multiply extension in the gemmlowp_mm_reshaped_only_rhs_t OpenCL kernel.
 - Expand the GPUTarget list with missing Mali™ GPU product names: G57, G68, G78AE, G610, G510, G310.
 - Extend the direct convolution 2d interface to configure the block size.
 - Update ClConv2D heuristic to use direct convolution.
 - Use official Khronos® OpenCL extensions:
   - Add cl_khr_integer_dot_product extension support.
   - Add support for OpenCL 3.0 non-uniform workgroups.
 - CPU performance optimizations:
   - Add LUT-based implementation of Hard Swish and Leaky ReLU activation functions for aarch64 build.
   - Optimize Add layer by considering the input tensors as 1D arrays.
   - Add fixed-format BF16, FP16 and FP32 Neon™ GEMM kernels to support variable weights.
   - Add a new implementation of the Winograd convolution kernels and update the ACL \link arm_compute::cpu::CpuWinogradConv2d CpuWinogradConv2d\endlink operator.
 - Add experimental support for native builds for Windows on Arm®.
 - Build flag interpretation change: arch=armv8.6-a now translates to the -march=armv8.6-a CXX flag instead of -march=armv8.2-a + explicit selection of feature extensions.
 - Build flag change: toolchain_prefix, compiler_prefix:
   - Use empty string "" to suppress any prefixes.
   - Use "auto" to use default (auto) prefixes chosen by the build script. This is the default behavior when unspecified.
   - Any other string will be used as a custom prefix for the compiler and the rest of the toolchain tools.
   - The default behaviour when the prefix is unspecified does not change, but its signifier has been changed from empty string "" to "auto".
 - armv7a with Android build will no longer be tested or maintained.

v22.05 Public major release
 - Various bug fixes.
 - Various optimizations.
 - Add support for NDK r23b.
 - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
 - New Arm® Neon™ kernels / functions:
   - \link cpu::kernels::CpuPool3dKernel CpuPool3dKernel \endlink
 - New OpenCL kernels / functions:
   - \link opencl::kernels::ClPool3dKernel ClPool3dKernel \endlink
 - Improve the start-up times for the following OpenCL kernels:
   - \link opencl::kernels::ClWinogradInputTransformKernel ClWinogradInputTransformKernel \endlink
   - \link opencl::kernels::ClWinogradOutputTransformKernel ClWinogradOutputTransformKernel \endlink
   - \link opencl::kernels::ClWinogradFilterTransformKernel ClWinogradFilterTransformKernel \endlink
   - \link opencl::kernels::ClHeightConcatenateKernel ClHeightConcatenateKernel \endlink
 - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
   - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
   - \link cpu::kernels::CpuDepthwiseConv2dNativeKernel CpuDepthwiseConv2dNativeKernel \endlink
   - \link cpu::kernels::CpuGemmMatrixAdditionKernel CpuGemmMatrixAdditionKernel \endlink
   - \link cpu::kernels::CpuGemmMatrixMultiplyKernel CpuGemmMatrixMultiplyKernel \endlink
   - @ref NEFuseBatchNormalizationKernel
   - @ref NEL2NormalizeLayerKernel

v22.02 Public major release
 - Various bug fixes.
 - Various optimizations.
 - Update A510 arm_gemm CPU kernels.
 - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
 - Improve the start-up time for the following OpenCL kernels:
   - @ref CLScale
   - @ref CLGEMM
   - @ref CLDepthwiseConvolutionLayer
   - \link opencl::kernels::ClIm2ColKernel ClIm2ColKernel \endlink
   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
 - Remove functions:
   - CLRemap
   - NERemap
 - Remove padding from OpenCL kernels:
   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
 - Remove padding from Cpu kernels:
   - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
 - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
   - \link cpu::kernels::CpuActivationKernel CpuActivationKernel \endlink
   - \link cpu::kernels::CpuAddKernel CpuAddKernel \endlink
   - \link cpu::kernels::CpuElementwiseKernel CpuElementwiseKernel \endlink
   - \link cpu::CpuSoftmaxGeneric CpuSoftmaxKernel \endlink
   - @ref NEBoundingBoxTransformKernel
   - @ref NECropKernel
   - @ref NEComputeAllAnchorsKernel
   - @ref NEInstanceNormalizationLayerKernel
   - NEMaxUnpoolingLayerKernel
   - @ref NEMeanStdDevNormalizationKernel
   - @ref NERangeKernel
   - @ref NEROIAlignLayerKernel
   - @ref NESelectKernel

v21.11 Public major release
 - Various bug fixes.
177 - Various optimizations: 178 - Improve performance of bilinear and nearest neighbor Scale on both CPU and GPU for FP32, FP16, Int8, Uint8 data types 179 - Improve performance of Softmax on GPU for Uint8/Int8 180 - New OpenCL kernels / functions: 181 - @ref CLConv3D 182 - New Arm® Neon™ kernels / functions: 183 - @ref NEConv3D 184 - Support configurable build by a selected subset of operator list 185 - Support MobileBert on Neon™ backend 186 - Improve operator/function logging 187 - Remove padding from OpenCL kernels: 188 - ClPool2dKernel 189 - ClScaleKernel 190 - ClGemmMatrixMultiplyReshapedKernel 191 - Remove padding from Cpu kernels: 192 - CpuPool2dKernel 193 - Remove Y padding from OpenCL kernels: 194 - ClGemmMatrixMultiplyKernel 195 - ClGemmReshapedRHSMatrixKernel 196 - Remove legacy GeMM kernels in gemm_v1.cl 197 198v21.08 Public major release 199 - Various bug fixes. 200 - Various optimizations: 201 - Improve LWS (Local-Workgroup-Size) heuristic in OpenCL for GeMM, Direct Convolution and Winograd Transformations when OpenCL tuner is not used 202 - Improve QASYMM8/QSYMM8 performance on OpenCL for various Arm® Mali™ GPU architectures 203 - Add dynamic weights support in Fully connected layer (CPU/GPU) 204 - Various performance optimizations for floating-point data types (CPU/GPU) 205 - Add a reduced core library build arm_compute_core_v2 206 - Expose Operator API 207 - Support fat binary build for arm8.2-a via fat_binary build flag 208 - Add CPU discovery capabilities 209 - Add data type f16 support for: 210 - CLRemapKernel 211 - Port the following functions to stateless API: 212 - @ref CLConvolutionLayer 213 - @ref CLFlattenLayer 214 - @ref CLFullyConnectedLayer 215 - @ref CLGEMM 216 - @ref CLGEMMConvolutionLayer 217 - @ref CLGEMMLowpMatrixMultiplyCore 218 - @ref CLWinogradConvolutionLayer 219 - @ref NEConvolutionLayer 220 - @ref NEFlattenLayer 221 - @ref NEFullyConnectedLayer 222 - @ref NEGEMM 223 - @ref NEGEMMConv2d 224 - @ref NEGEMMConvolutionLayer 225 
- @ref NEGEMMLowpMatrixMultiplyCore 226 - @ref NEWinogradConvolutionLayer 227 - Remove the following functions: 228 - CLWinogradInputTransform 229 - Remove CLCoreRuntimeContext 230 - Remove ICPPSimpleKernel 231 - Rename file arm_compute/runtime/CL/functions/CLElementWiseUnaryLayer.h to arm_compute/runtime/CL/functions/CLElementwiseUnaryLayer.h 232 233v21.05 Public major release 234 - Various bug fixes. 235 - Various optimisations. 236 - Various documentation updates: 237 - Add supported operators and corresponding Android NNAPI operators. 238 - Documentation reorg into user guide and contributor guide. 239 - Add support for a global allocator for OpenCL tensors 240 - Add experimental support for [CLVK](https://github.com/kpet/clvk). 241 - Add data type S32 support for: 242 - @ref opencl::kernels::ClArithmeticKernel 243 - Add data type QASYMM8 support for: 244 - @ref CLROIPoolingLayer 245 - @ref CLROIPoolingLayerKernel 246 - @ref NEROIPoolingLayer 247 - @ref NEROIPoolingLayerKernel 248 - Add per-channel quantization support for: 249 - @ref CLDeconvolutionLayer 250 - @ref CLDirectDeconvolutionLayer 251 - @ref NEConvolutionLayer 252 - @ref NEDeconvolutionLayer 253 - Remove padding from OpenCL kernels: 254 - @ref CLL2NormalizeLayerKernel 255 - CLDepthwiseConvolutionLayer3x3NHWCKernel 256 - @ref CLNormalizationLayerKernel 257 - @ref CLNormalizePlanarYUVLayerKernel 258 - @ref opencl::kernels::ClMulKernel 259 - @ref CLReductionOperationKernel 260 - @ref CLROIPoolingLayerKernel 261 - Remove computer vision support from Arm® Neon™ backend 262 - Remove the following functions: 263 - NEAbsoluteDifference 264 - NEAccumulate 265 - NEBox3x3 266 - NECannyEdge 267 - NEChannelCombine 268 - NEChannelExtract 269 - NEColorConvert 270 - NEConvolution 271 - NEDerivative 272 - NEDilate 273 - NEEqualizeHistogram 274 - NEErode 275 - NEFastCorners 276 - NEGaussian3x3 277 - NEGaussian5x5 278 - NEGaussianPyramid 279 - NEHOGDescriptor 280 - NEHOGDetector 281 - NEHOGGradient 282 - 
NEHOGMultiDetection 283 - NEHarrisCorners 284 - NEHistogram 285 - NEIntegralImage 286 - NELaplacianPyramid 287 - NELaplacianReconstruct 288 - NEMagnitude 289 - NEMeanStdDev 290 - NEMedian3x3 291 - NEMinMaxLocation 292 - NENonLinearFilter 293 - NEOpticalFlow 294 - NEPhase 295 - NEScharr3x3 296 - NESobel3x3 297 - NESobel5x5 298 - NESobel7x7 299 - NETableLookup 300 - NEThreshold 301 - NEWarpAffine 302 - NEWarpPerspectiveKernel 303 - Remove all GLES kernels / functions / tests / examples 304 - Remove computer vision support from CL backend 305 - Remove the following functions: 306 - CLAbsoluteDifference 307 - CLAccumulate 308 - CLBox3x3 309 - CLCannyEdge 310 - CLChannelCombine 311 - CLChannelExtract 312 - CLColorConvert 313 - CLConvolution 314 - CLDerivative 315 - CLDilate 316 - CLEqualizeHistogram 317 - CLErode 318 - CLFastCorners 319 - CLGaussian3x3 320 - CLGaussian5x5 321 - CLGaussianPyramid 322 - CLHOGDescriptor 323 - CLHOGDetector 324 - CLHOGGradient 325 - CLHOGMultiDetection 326 - CLHarrisCorners 327 - CLHistogram 328 - CLIntegralImage 329 - CLLaplacianPyramid 330 - CLLaplacianReconstruct 331 - CLMagnitude 332 - CLMeanStdDev 333 - CLMedian3x3 334 - CLMinMaxLocation 335 - CLNonLinearFilter 336 - CLOpticalFlow 337 - CLPhase 338 - CLScharr3x3 339 - CLSobel3x3 340 - CLSobel5x5 341 - CLSobel7x7 342 - CLTableLookup 343 - CLThreshold 344 - CLWarpAffine 345 - CLWarpPerspective 346 347v21.02 Public major release 348 - Various bug fixes. 349 - Various optimisations. 
 - Upgrade C++ standard to C++14
 - Add macOS support
 - Add Armv8-R AArch64 architecture support
 - Add SVE/SVE2 support for:
   - NEScaleKernel
   - @ref NEActivationLayer
   - @ref NEArithmeticAddition
   - @ref NEBatchNormalizationLayerKernel
   - @ref cpu::kernels::CpuLogits1DSoftmaxKernel
   - @ref cpu::kernels::CpuLogits1DMaxKernel
   - @ref cpu::kernels::CpuElementwiseUnaryKernel
 - Remove padding from OpenCL kernels:
   - CLDirectConvolutionLayerKernel
   - @ref CLArgMinMaxLayerKernel
   - @ref CLPadLayerKernel
   - @ref CLROIAlignLayerKernel
   - @ref CLRangeKernel
   - CLScaleKernel
   - @ref CLSelectKernel
   - @ref CLBitwiseKernel
   - @ref opencl::kernels::ClFloorKernel
   - CLTransposeKernel
 - Deprecate functions in CLTuner:
   - add_lws_to_table
   - import_lws_table
   - lws_table
 - Remove functions:
   - NELocallyConnectedLayer / CLLocallyConnectedLayer
   - NEIm2Col
   - NECol2Im
   - NEGEMMInterleave4x4
   - NEGEMMTranspose1xW
   - NEComputeAllAnchors / CLComputeAllAnchors
   - NEGEMMAssemblyDispatch
   - NEUpsampleLayer / CLUpsampleLayer
 - Remove kernels:
   - NEGEMMMatrixVectorMultiplyKernel
   - NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel
   - NEUpsampleLayerKernel / CLUpsampleLayerKernel
 - Extend OpenCL tuner with workgroup batch size support
   - Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units
 - Add functionality to load the OpenCL GEMM heuristics at runtime
   - The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL
 - Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. This is currently under investigation.
 - Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised.

v20.11 Public major release
 - Various bug fixes.
398 - Various optimisations. 399 - Performance regressions can be noted when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data type. 400 This is planned to be resolved in 21.02 release. 401 - Added new data type QASYMM8_SIGNED support for @ref NEROIAlignLayer. 402 - Added new data type S32 support for: 403 - NEArithmeticSubtraction 404 - NEArithmeticSubtractionKernel 405 - @ref NEPixelWiseMultiplication 406 - NEPixelWiseMultiplicationKernel 407 - NEElementwiseDivision 408 - NEDivisionOperationKernel 409 - Interface change 410 - Properly support softmax axis to have the same meaning as other major frameworks. That is, axis now defines the dimension 411 on which Softmax/Logsoftmax is performed. E.g. for input of shape 4x5x6 and axis=1, softmax will be applied to 4x6=24 vectors of size 5. 412 The supported value range of axis is [-rank, rank). 413 This change applies to the following functions: 414 - @ref NESoftmaxLayer 415 - @ref NELogSoftmaxLayer 416 - @ref CLSoftmaxLayer 417 - @ref CLLogSoftmaxLayer 418 - GCSoftmaxLayer 419 - New OpenCL kernels / functions: 420 - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel 421 - @ref CLLogicalNot 422 - @ref CLLogicalAnd 423 - @ref CLLogicalOr 424 - New Arm® Neon™ kernels / functions: 425 - @ref NELogicalNot 426 - @ref NELogicalAnd 427 - @ref NELogicalOr 428 - Removed padding from Arm® Neon™ kernels: 429 - NEComplexPixelWiseMultiplicationKernel 430 - NENonMaximaSuppression3x3Kernel 431 - NERemapKernel 432 - NEGEMMInterleave4x4Kernel 433 - NEDirectConvolutionLayerKernel 434 - NEScaleKernel 435 - NELocallyConnectedMatrixMultiplyKernel 436 - NEGEMMLowpOffsetContributionKernel 437 - NEGEMMTranspose1xWKernel 438 - NEPoolingLayerKernel 439 - NEConvolutionKernel 440 - NEDepthwiseConvolutionLayerNativeKernel 441 - NEGEMMLowpMatrixMultiplyKernel 442 - NEGEMMMatrixMultiplyKernel 443 - NEDirectConvolutionLayerOutputStageKernel 444 - @ref NEReductionOperationKernel 445 - 
NEGEMMLowpMatrixAReductionKernel 446 - NEGEMMLowpMatrixBReductionKernel 447 - Removed padding from OpenCL kernels: 448 - CLBatchConcatenateLayerKernel 449 - CLElementwiseOperationKernel 450 - @ref CLBatchNormalizationLayerKernel 451 - CLPoolingLayerKernel 452 - CLWinogradInputTransformKernel 453 - CLGEMMLowpMatrixMultiplyNativeKernel 454 - CLGEMMLowpMatrixAReductionKernel 455 - CLGEMMLowpMatrixBReductionKernel 456 - CLGEMMLowpOffsetContributionOutputStageKernel 457 - CLGEMMLowpOffsetContributionKernel 458 - CLWinogradOutputTransformKernel 459 - CLGEMMLowpMatrixMultiplyReshapedKernel 460 - @ref CLFuseBatchNormalizationKernel 461 - @ref CLDepthwiseConvolutionLayerNativeKernel 462 - CLDepthConvertLayerKernel 463 - CLCopyKernel 464 - CLDepthwiseConvolutionLayer3x3NHWCKernel 465 - CLActivationLayerKernel 466 - CLWinogradFilterTransformKernel 467 - CLWidthConcatenateLayerKernel 468 - CLWidthConcatenate4TensorsKernel 469 - CLWidthConcatenate2TensorsKernel 470 - CLLogits1DMaxShiftExpSumKernel 471 - CLLogits1DNormKernel 472 - CLHeightConcatenateLayerKernel 473 - CLGEMMMatrixMultiplyKernel 474 - CLGEMMLowpQuantizeDownInt32ScaleKernel 475 - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel 476 - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel 477 - CLDepthConcatenateLayerKernel 478 - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel 479 - Removed OpenCL kernels / functions: 480 - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel 481 - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel 482 - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel 483 - Deprecated OpenCL kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together): 484 - CLLocallyConnectedLayer 485 - CLLocallyConnectedMatrixMultiplyKernel 486 - CLAbsoluteDifference 487 - CLAbsoluteDifferenceKernel 488 - CLAccumulate 489 - CLAccumulateKernel 490 - CLAccumulateSquared 491 - CLAccumulateSquaredKernel 492 - CLAccumulateWeighted 493 - 
CLAccumulateWeightedKernel 494 - CLAccumulateWeightedFP16Kernel 495 - CLBox3x3 496 - CLBox3x3Kernel 497 - CLBox3x3FP16Kernel 498 - CLCannyEdge 499 - CLChannelCombine 500 - CLChannelCombineKernel 501 - CLChannelExtract 502 - CLChannelExtractKernel 503 - CLColorConvert 504 - CLColorConvertKernel 505 - CLConvolution3x3 506 - CLConvolutionRectangle 507 - CLConvolutionRectangleKernel 508 - CLConvolutionSquare 509 - CLConvolutionKernel 510 - CLDerivative 511 - CLDerivativeKernel 512 - CLDilate 513 - CLDilateKernel 514 - CLEqualizeHistogram 515 - CLErode 516 - CLErodeKernel 517 - CLFastCorners 518 - CLFastCornersKernel 519 - CLGaussian3x3 520 - CLGaussian3x3Kernel 521 - CLGaussian5x5 522 - CLGaussian5x5HorKernel 523 - CLGaussian5x5VertKernel 524 - CLGaussianPyramid 525 - CLGaussianPyramidHalf 526 - CLGaussianPyramidOrb 527 - CLHarrisCorners 528 - CLHarrisScoreKernel 529 - CLHarrisScoreFP16Kernel 530 - CLHistogram 531 - CLHistogramKernel 532 - CLHOGOrientationBinningKernel 533 - CLHOGBlockNormalizationKernel 534 - CLHOGDetectorKernel 535 - CLHOGNonMaximaSuppressionKernel 536 - CLHOGDescriptor 537 - CLHOGDetector 538 - CLHOGGradient 539 - CLHOGMultiDetection 540 - CLHOGOrientationBinningKernel 541 - CLHOGBlockNormalizationKernel 542 - CLHOGDetectorKernel 543 - CLIntegralImage 544 - CLIntegralImageKernel 545 - CLLaplacianReconstruct 546 - CLLaplacianPyramid 547 - CLMagnitude 548 - CLMagnitudePhaseKernel 549 - CLMedian3x3 550 - CLMedian3x3Kernel 551 - CLMinMaxLocation 552 - CLMinMaxLocationKernel 553 - CLNonLinearFilter 554 - CLNonLinearFilterKernel 555 - CLNonMaximaSuppression3x3 556 - CLNonMaximaSuppression3x3FP16Kernel 557 - CLNonMaximaSuppression3x3Kernel 558 - CLOpticalFlow 559 - CLPhase 560 - CLRemap 561 - CLRemapKernel 562 - CLScharr3x3 563 - CLScharr3x3Kernel 564 - CLSobel3x3 565 - CLSobel3x3Kernel 566 - CLSobel5x5 567 - CLSobel5x5HorKernel 568 - CLSobel5x5VertKernel 569 - CLSobel7x7 570 - CLSobel7x7HorKernel 571 - CLSobel7x7VertKernel 572 - CLThreshold 573 - 
CLThresholdKernel 574 - CLWarpAffine 575 - CLWarpAffineKernel 576 - CLWarpPerspective 577 - CLWarpPerspectiveKernel 578 - Deprecated Arm® Neon™ kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together): 579 - NELocallyConnectedLayer 580 - NELocallyConnectedMatrixMultiplyKernel 581 - NEAbsoluteDifference 582 - NEAbsoluteDifferenceKernel 583 - NEAccumulate 584 - NEAccumulateKernel 585 - NEAccumulateSquared 586 - NEAccumulateSquaredKernel 587 - NEAccumulateWeighted 588 - NEAccumulateWeightedKernel 589 - NEAccumulateWeightedFP16Kernel 590 - NEBox3x3 591 - NEBox3x3Kernel 592 - NEBox3x3FP16Kernel 593 - NECannyEdge 594 - NEChannelCombine 595 - NEChannelCombineKernel 596 - NEChannelExtract 597 - NEChannelExtractKernel 598 - NEColorConvert 599 - NEColorConvertKernel 600 - NEConvolution3x3 601 - NEConvolutionRectangle 602 - NEConvolutionRectangleKernel 603 - NEConvolutionSquare 604 - NEConvolutionKernel 605 - NEDerivative 606 - NEDerivativeKernel 607 - NEDilate 608 - NEDilateKernel 609 - NEEqualizeHistogram 610 - NEErode 611 - NEErodeKernel 612 - NEFastCorners 613 - NEFastCornersKernel 614 - NEGaussian3x3 615 - NEGaussian3x3Kernel 616 - NEGaussian5x5 617 - NEGaussian5x5HorKernel 618 - NEGaussian5x5VertKernel 619 - NEGaussianPyramid 620 - NEGaussianPyramidHalf 621 - NEGaussianPyramidOrb 622 - NEHarrisCorners 623 - NEHarrisScoreKernel 624 - NEHarrisScoreFP16Kernel 625 - NEHistogram 626 - NEHistogramKernel 627 - NEHOGOrientationBinningKernel 628 - NEHOGBlockNormalizationKernel 629 - NEHOGDetectorKernel 630 - NEHOGNonMaximaSuppressionKernel 631 - NEHOGDescriptor 632 - NEHOGDetector 633 - NEHOGGradient 634 - NEHOGMultiDetection 635 - NEHOGOrientationBinningKernel 636 - NEHOGBlockNormalizationKernel 637 - NEHOGDetectorKernel 638 - NEIntegralImage 639 - NEIntegralImageKernel 640 - NELaplacianReconstruct 641 - NELaplacianPyramid 642 - NEMagnitude 643 - NEMagnitudePhaseKernel 644 - NEMedian3x3 645 - NEMedian3x3Kernel 
646 - NEMinMaxLocation 647 - NEMinMaxLocationKernel 648 - NENonLinearFilter 649 - NENonLinearFilterKernel 650 - NENonMaximaSuppression3x3 651 - NENonMaximaSuppression3x3FP16Kernel 652 - NENonMaximaSuppression3x3Kernel 653 - NEOpticalFlow 654 - NEPhase 655 - NERemap 656 - NERemapKernel 657 - NEScharr3x3 658 - NEScharr3x3Kernel 659 - NESobel3x3 660 - NESobel3x3Kernel 661 - NESobel5x5 662 - NESobel5x5HorKernel 663 - NESobel5x5VertKernel 664 - NESobel7x7 665 - NESobel7x7HorKernel 666 - NESobel7x7VertKernel 667 - NEThreshold 668 - NEThresholdKernel 669 - NEWarpAffine 670 - NEWarpAffineKernel 671 - NEWarpPerspective 672 - NEWarpPerspectiveKernel 673 - Deprecated GLES kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together): 674 - GCAbsoluteDifference 675 - GCActivationLayer 676 - GCArithmeticAddition 677 - GCBatchNormalizationLayer 678 - GCConcatenateLayer 679 - GCConvolutionLayer 680 - GCDepthwiseConvolutionLayer 681 - GCDirectConvolutionLayer 682 - GCDropoutLayer 683 - GCFillBorder 684 - GCFullyConnectedLayer 685 - GCGEMM 686 - GCGEMMInterleave4x4 687 - GCGEMMTranspose1xW 688 - GCNormalizationLayer 689 - GCNormalizePlanarYUVLayer 690 - GCPixelWiseMultiplication 691 - GCPoolingLayer 692 - GCScale 693 - GCSoftmaxLayer 694 - GCTensorShift 695 - GCTranspose 696 697 698v20.08 Public major release 699 - Various bug fixes. 700 - Various optimisations. 
 - Added new data type QASYMM8_SIGNED support for:
   - @ref CLArgMinMaxLayer
   - @ref CLArgMinMaxLayerKernel
 - Added new data type U8 support for:
   - @ref NECropKernel
   - CLCropKernel
 - Added align_corner support for nearest neighbor interpolation in:
   - NEScaleKernel
   - CLScaleKernel
 - New OpenCL kernels / functions:
   - @ref CLMaxUnpoolingLayerKernel
 - New Arm® Neon™ kernels / functions:
   - NEMaxUnpoolingLayerKernel
 - New graph example:
   - graph_yolov3_output_detector
 - GEMMTuner improvements:
   - Added fp16 support
   - Output json files for easier integration
   - Enabled tuning for export_to_cl_image_rhs option for RHS tensors
   - More robust script for running benchmarks
 - Removed padding from:
   - NEPixelWiseMultiplicationKernel
   - NEHeightConcatenateLayerKernel
   - NEThresholdKernel
   - NEBatchConcatenateLayerKernel
   - NETransposeKernel
   - @ref NEBatchNormalizationLayerKernel
   - NEArithmeticSubtractionKernel
   - @ref NEBoundingBoxTransformKernel
   - NELogits1DMaxKernel
   - NELogits1DSoftmaxKernel
   - @ref NEROIPoolingLayerKernel
   - @ref NEROIAlignLayerKernel
   - NEYOLOLayerKernel
   - NEUpsampleLayerKernel
   - NEFloorKernel
   - NEWidthConcatenateLayerKernel
   - NEDepthConcatenateLayerKernel
   - @ref NENormalizationLayerKernel
   - @ref NEL2NormalizeLayerKernel
   - NEFillArrayKernel
   - NEDepthConvertLayerKernel
   - @ref NERangeKernel
   - @ref NEPriorBoxLayer
 - Removed OpenCL kernels / functions:
   - CLGEMMLowpQuantizeDownInt32ToUint8Scale
   - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
 - Removed Arm® Neon™ kernels / functions:
   - NEGEMMLowpQuantizeDownInt32ToUint8Scale
   - NEGEMMMatrixAccumulateBiasesKernel
 - Deprecated functions / interfaces:
   - Non-descriptor based interfaces for NEThreshold, CLThreshold
   - Non-descriptor based interfaces for @ref NEScale, @ref CLScale and GCScale
   - In @ref NESoftmaxLayer, @ref NELogSoftmaxLayer, @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer:
      The default "axis" value for @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer is changed from 1 to 0.
      Only axis 0 is supported.
      The default "axis" value for @ref NESoftmaxLayer, @ref NELogSoftmaxLayer is changed from 1 to 0.
      Only axis 0 is supported.
 - The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity.
 - Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLIm2ColKernel (NHWC only)
   - This change allows @ref CLGEMMConvolutionLayer to be used without extra padding for the input and output.
   - Only the weights/bias of @ref CLGEMMConvolutionLayer could require padding for the computation.
   - Only on Arm® Mali™ Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since CLGEMMMatrixMultiplyKernel is called and currently requires padding.
 - Added support for exporting the OpenCL buffer object to the OpenCL image object in CLGEMMMatrixMultiplyReshapedKernel and CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
   - This support allows the OpenCL buffer used for the reshaped RHS matrix to be exported to the OpenCL image object.
   - The padding requirement for the OpenCL image object is taken into account in CLGEMMReshapeRHSMatrixKernel.
   - The reshaped RHS matrix stores the weights when GEMM is used to accelerate CLGEMMConvolutionLayer.

v20.05 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Updated recommended NDK version to r18b.
 - Updated recommended gcc version to Linaro 6.3.1.
 - Added Bfloat16 type support
 - Added Bfloat16 support in:
   - NEWeightsReshapeKernel
   - NEConvolutionLayerReshapeWeights
   - NEIm2ColKernel
   - NEIm2Col
   - NEDepthConvertLayerKernel
   - @ref NEDepthConvertLayer
   - @ref NEGEMMConvolutionLayer
   - NEGEMMAssemblyDispatch
 - Added new data type QASYMM8_SIGNED support for:
   - @ref CLDirectConvolutionLayer
   - @ref CLDeconvolutionLayer
   - @ref CLDirectDeconvolutionLayer
   - @ref CLGEMMDeconvolutionLayer
   - CLGEMMLowpMatrixMultiplyReshapedKernel
   - CLGEMMLowpQuantizeDownInt32ScaleKernel
   - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
   - @ref CLReductionOperation
   - @ref CLReduceMean
   - @ref NEScale
   - NEScaleKernel
   - NEUpsampleLayer
   - @ref NECast
   - @ref NEReductionOperation
   - @ref NEReduceMean
   - @ref NEArgMinMaxLayer
   - @ref NEDeconvolutionLayer
   - NEGEMMLowpQuantizeDownInt32ScaleKernel
   - @ref CPPBoxWithNonMaximaSuppressionLimit
   - @ref CPPDetectionPostProcessLayer
   - @ref CPPPermuteKernel
   - @ref CPPPermute
   - @ref CPPTopKVKernel
   - @ref CPPTopKV
   - @ref CPPUpsample
   - @ref CPPUpsampleKernel
 - New OpenCL kernels / functions:
   - @ref CLQLSTMLayer
   - @ref CLQLSTMLayerNormalizationKernel
 - New Arm® Neon™ kernels / functions:
   - @ref NEQLSTMLayer
   - @ref NEQLSTMLayerNormalizationKernel
 - Added HARD_SWISH support in:
   - CLActivationLayerKernel
   - NEActivationLayerKernel
 - Deprecated OpenCL kernels / functions:
   - CLGEMMLowpQuantizeDownInt32ToUint8Scale
   - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
 - Deprecated Arm® Neon™ kernels / functions:
   - NEGEMMLowpQuantizeDownInt32ToUint8Scale
 - Removed CPP kernels / functions:
   - CPPFlipWeightsKernel
 - Removed PoolingLayerInfo constructors without Data Layout.
 - Removed CLDepthwiseConvolutionLayer3x3
 - Removed NEDepthwiseConvolutionLayerOptimized
 - Added support for Winograd 3x3,4x4 on Arm® Neon™ FP16:
   - @ref NEWinogradConvolutionLayer
   - CpuWinogradConv2dTransformInputKernel
   - CpuWinogradConv2dTransformOutputKernel
   - CpuWinogradConv2dTransformWeightsKernel
 - Added CLCompileContext
 - Added Arm® Neon™ GEMM kernel with 2D window support

v20.02.1 Maintenance release
 - Added Android-NN build script.

v20.02 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Added new data type QASYMM8_SIGNED support for:
   - @ref CLDepthwiseConvolutionLayer
   - CLDepthwiseConvolutionLayer3x3
   - @ref CLGEMMConvolutionLayer
   - CLGEMMLowpMatrixMultiplyCore
   - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
   - CLGEMMLowpMatrixMultiplyNativeKernel
   - @ref NEActivationLayer
   - NEComparisonOperationKernel
   - @ref NEConvolutionLayer
   - @ref NEDepthwiseConvolutionLayer
   - NEDepthwiseConvolutionLayer3x3Kernel
   - NEDirectConvolutionLayerOutputStageKernel
   - @ref NEElementwiseComparison
   - @ref NEElementwiseMax
   - @ref NEElementwiseMin
   - @ref NEElementwiseSquaredDiff
   - @ref NEFullyConnectedLayer
   - NEGEMMMatrixVectorMultiplyKernel
   - @ref NEPixelWiseMultiplication
   - @ref NEPoolingLayer
   - @ref NEPReluLayer
 - Added support for QSYMM8_PER_CHANNEL in:
   - NEDepthwiseConvolutionLayer3x3Kernel
 - Added support for split sizes in:
   - @ref CLSplit
   - @ref NESplit
 - New OpenCL kernels / functions:
   - @ref CLFill
   - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
 - New Arm® Neon™ kernels / functions:
   - @ref NEFill
   - NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
 - Deprecated Arm® Neon™ functions / interfaces:
   - CLDepthwiseConvolutionLayer3x3
   - NEDepthwiseConvolutionLayerOptimized
   - PoolingLayerInfo constructors without Data Layout.
 - Added support for quantization with multiplier greater than 1 on Arm® Neon™ and CL.
 - Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer.
 - Added the ability to build bootcode for bare metal.
 - Added support for generating synthetic QASYMM8 graphs.
 - Added support for F16 datatype in VGG16.
 - Removed pre-built binaries for GLES.

v19.11.1 Public maintenance release
 - Fix offset calculation in NEReductionOperationKernel.
 - Fix data layout in NEScaleKernel for nhwc.
 - Retain configuration step data layout to avoid side-effects.
 - Perform sqrt in double domain for L2 pooling.
 - Fix output shape calculation for Reduce Mean
 - Restrict cases where optimized NEPadLayer runs.

v19.11 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Updated recommended NDK version to r17c.
 - Deprecated OpenCL kernels / functions:
   - CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel
   - CLDepthwiseIm2ColKernel
   - CLDepthwiseSeparableConvolutionLayer
   - CLDepthwiseVectorToTensorKernel
   - CLDirectConvolutionLayerOutputStageKernel
 - Deprecated Arm® Neon™ kernels / functions:
   - NEDepthwiseWeightsReshapeKernel
   - NEDepthwiseIm2ColKernel
   - NEDepthwiseSeparableConvolutionLayer
   - NEDepthwiseVectorToTensorKernel
   - NEDepthwiseConvolutionLayer3x3
 - New OpenCL kernels / functions:
   - @ref CLInstanceNormalizationLayerKernel / @ref CLInstanceNormalizationLayer
   - @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated OpenCL kernels / functions)
   - @ref CLLogSoftmaxLayer
 - New Arm® Neon™ kernels / functions:
   - @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform
   - @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors
   - @ref NEDetectionPostProcessLayer
   - @ref NEGenerateProposalsLayer
   - @ref NEInstanceNormalizationLayerKernel / @ref NEInstanceNormalizationLayer
   - @ref NELogSoftmaxLayer
   - @ref NEROIAlignLayerKernel / @ref NEROIAlignLayer
 - Added QASYMM8 support for:
   - @ref CLGenerateProposalsLayer
   - @ref CLROIAlignLayer
   - @ref CPPBoxWithNonMaximaSuppressionLimit
 - Added QASYMM16 support for:
   - @ref CLBoundingBoxTransform
 - Added FP16 support for:
   - CLGEMMMatrixMultiplyReshapedKernel
 - Added new data type QASYMM8_PER_CHANNEL support for:
   - CLDequantizationLayer
   - @ref NEDequantizationLayer
 - Added new data type QSYMM8_PER_CHANNEL support for:
   - @ref CLConvolutionLayer
   - @ref NEConvolutionLayer
   - @ref CLDepthwiseConvolutionLayer
   - @ref NEDepthwiseConvolutionLayer
 - Added FP16 mixed-precision support for:
   - CLGEMMMatrixMultiplyReshapedKernel
   - CLPoolingLayerKernel
 - Added FP32 and FP16 ELU activation for:
   - @ref CLActivationLayer
   - @ref NEActivationLayer
 - Added asymmetric padding support for:
   - @ref CLDirectDeconvolutionLayer
   - @ref CLGEMMDeconvolutionLayer
   - @ref NEDeconvolutionLayer
 - Added SYMMETRIC and REFLECT modes for @ref CLPadLayerKernel / @ref CLPadLayer.
 - Replaced the calls to NECopyKernel and NEMemsetKernel with @ref NEPadLayer in @ref NEGenerateProposalsLayer.
 - Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer.
 - Improved performance for CL Inception V3 - FP16.
 - Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision).
 - Improved Arm® Neon™ performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
 - Improved Arm® Neon™ performance for MobileNet-SSD by improving the output detection performance.
 - Optimized @ref CLPadLayer.
 - Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel.
 - Reduced memory consumption by implementing weights sharing.

v19.08.1 Public maintenance release
 - Fix offset calculation in NEReductionOperationKernel.
 - Fix data layout in NEScaleKernel for nhwc.
 - Retain configuration step data layout to avoid side-effects.
 - Perform sqrt in double domain for L2 pooling.
 - Fix output shape calculation for Reduce Mean
 - Fix broadcast CLPixelwiseMultiplication with 5D tensors

v19.08 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Deprecated Arm® Neon™ functions
   - NEDepthConcatenateLayer
   - NEWidthConcatenateLayer
 - Deprecated OpenCL kernels / functions
   - CLDepthConcatenateLayer
   - CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4
   - CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW
   - CLWidthConcatenateLayer
 - New Arm® Neon™ kernels / functions:
   - @ref NEAbsLayer
   - @ref NECast
   - @ref NEElementwisePower
   - @ref NELogLayer
   - @ref NELSTMLayerQuantized
   - @ref NENegLayer
   - @ref NEPReluLayer
   - @ref NESinLayer
   - NEBatchConcatenateLayerKernel
   - @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer
   - NEDepthwiseConvolutionLayerNativeKernel
   - NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
   - @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer
   - @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer
 - New OpenCL kernels / functions:
   - @ref CLAbsLayer
   - @ref CLElementwisePower
   - @ref CLLogLayer
   - @ref CLLSTMLayerQuantized
   - @ref CLNegLayer
   - @ref CLPReluLayer
   - @ref CLSinLayer
   - CLBatchConcatenateLayerKernel
   - @ref CLDepthToSpaceLayerKernel / @ref CLDepthToSpaceLayer
   - CLGEMMLowpMatrixMultiplyNativeKernel
   - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
   - CLGEMMMatrixMultiplyNativeKernel
   - CLMeanStdDevNormalizationKernel / CLMeanStdDevNormalizationLayer
   - @ref CLSpaceToDepthLayerKernel / @ref CLSpaceToDepthLayer
 - New examples:
   - neon_opticalflow
   - cl_cache
   - neon_permute
 - Added support for FP16 in @ref NEDeconvolutionLayer
 - Added support for FP16 in @ref CLDeconvolutionLayer
 - Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation
 - Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only)
 - Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only)
 - Re-factored the depthwise convolution layer kernel on Arm® Neon™ for generic cases
 - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon™ only)
 - Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file
 - Altered @ref QuantizationInfo interface to support per-channel quantization.
 - The CLDepthwiseConvolutionLayer3x3 will be included by @ref CLDepthwiseConvolutionLayer to accommodate for future optimizations.
 - The NEDepthwiseConvolutionLayerOptimized will be included by @ref NEDepthwiseConvolutionLayer to accommodate for future optimizations.
 - Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface
 - Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface
 - Optimized the Arm® Neon™ assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel

v19.05 Public major release
 - Various bug fixes.
 - Various optimisations.
 - New Arm® Neon™ kernels / functions:
   - @ref NEBatchToSpaceLayerKernel / @ref NEBatchToSpaceLayer
   - NEComplexPixelWiseMultiplicationKernel / @ref NEComplexPixelWiseMultiplication
   - @ref NECropKernel / @ref NECropResize
   - NEDepthwiseConvolutionAssemblyDispatch
   - @ref NEFFTDigitReverseKernel
   - @ref NEFFTRadixStageKernel
   - @ref NEFFTScaleKernel
   - NEGEMMLowpOffsetContributionOutputStageKernel
   - NEHeightConcatenateLayerKernel
   - @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer
   - @ref NEFFT1D
   - @ref NEFFT2D
   - @ref NEFFTConvolutionLayer
 - New OpenCL kernels / functions:
   - CLComplexPixelWiseMultiplicationKernel / @ref CLComplexPixelWiseMultiplication
   - CLCropKernel / @ref CLCropResize
   - @ref CLDeconvolutionReshapeOutputKernel
   - @ref CLFFTDigitReverseKernel
   - @ref CLFFTRadixStageKernel
   - @ref CLFFTScaleKernel
   - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
   - CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
   - CLHeightConcatenateLayerKernel
   - @ref CLDirectDeconvolutionLayer
   - @ref CLFFT1D
   - @ref CLFFT2D
   - @ref CLFFTConvolutionLayer
   - @ref CLGEMMDeconvolutionLayer
 - New OpenGLES kernels / functions:
   - GCConcatenateLayer
 - Deprecated functions/interfaces
   - GCDepthConcatenateLayer
   - NEWidthConcatenateLayer
   - NEDepthConcatenateLayer
   - CLWidthConcatenateLayer
   - CLDepthConcatenateLayer
   - CLGEMMInterleave4x4
   - CLGEMMTranspose1xW
 - Support different quantization info in CLConcatLayer.
 - Add checks for cases where different input/output quantization info is not supported.
 - Tensors have different quantization information.
 - Add FP16 support checks.
 - Fix output quantization in CLDepthwiseConv3x3 when activation is fused.
 - New graph examples:
   - graph_convolution
   - graph_fully_connected
   - graph_depthwise_convolution
   - Deepspeech v0.4.1
 - Add support for QASYMM8 in NEArithmeticSubtractionKernel.
 - Add support for QASYMM8 in NEPixelWiseMultiplicationKernel.
 - Add support for QASYMM8 NEDeconvolution.
 - Add support for DequantizationLayer for Neon/CL.
 - Add support for dilation in CLDepthwiseConvolution.
 - Fuse offset contribution with the output stage when we use NEGEMMLowpMatrixMultiplyCore.
 - Optimize CLDeconvolution.
 - Add StackLayer to the graph API.
 - Add support for "reflect" padding mode in NEPad.
 - Winograd 7x7 NHWC on OpenCL.
 - Rework CL ML layers to run exclusively on CL.
 - Support different quantization info in PoolingLayer.
 - Implement and test import memory interfaces.
 - Added new tests and removed old ones.
 - Various clang-tidy fixes.

v19.02 Public major release
 - Various bug fixes.
 - Various optimisations.
 - New Arm® Neon™ kernels / functions:
   - @ref NETileKernel / @ref NETile
   - @ref NEFuseBatchNormalizationKernel / @ref NEFuseBatchNormalization
   - NEElementwiseOperationKernel
   - @ref NEElementwiseMax
   - @ref NEElementwiseMin
   - @ref NEElementwiseSquaredDiff
   - @ref NESelectKernel / @ref NESelect
   - @ref NESplit
   - @ref NESlice
   - @ref NEUnstack
   - @ref NEStridedSliceKernel / @ref NEStridedSlice
   - NEElementwiseUnaryKernel
   - @ref NERsqrtLayer
   - @ref NEExpLayer
   - @ref NEReverseKernel / @ref NEReverse
   - @ref NEArgMinMaxLayer
   - @ref NEStackLayerKernel / @ref NEStackLayer
   - @ref NERangeKernel / @ref NERange
   - @ref NEPadLayer
   - NEMemsetKernel
   - @ref NEGatherKernel / @ref NEGather
   - @ref NEElementwiseComparison
   - @ref NEElementwiseComparisonStatic
   - NEComparisonOperationKernel
   - @ref NEElementwiseDivision
 - New OpenCL kernels / functions:
   - @ref CLSelectKernel / @ref CLSelect
   - @ref CLTileKernel / @ref CLTile
   - @ref CLComparisonKernel / @ref CLComparison
   - @ref CLArgMinMaxLayer
   - @ref CLElementwiseMax
   - @ref CLElementwiseMin
   - @ref CLElementwiseSquaredDiff
   - @ref CLStackLayerKernel / @ref CLStackLayer
   - @ref CLReverse / @ref CLReverseKernel
   - @ref CLRsqrtLayer
   - @ref CLExpLayer
   - CLElementWiseUnaryLayerKernel
   - CLGEMMReshapeLHSMatrixKernel
   - CLGEMMReshapeRHSMatrixKernel
   - CLGEMMMatrixMultiplyReshapedKernel
   - @ref CLRangeKernel / @ref CLRange
   - @ref CLUnstack
   - @ref CLGatherKernel / @ref CLGather
   - CLGEMMLowpMatrixMultiplyReshapedKernel
 - New CPP kernels / functions:
   - @ref CPPDetectionOutputLayer
   - @ref CPPTopKV / @ref CPPTopKVKernel
 - Added new examples:
   - graph_ssd_mobilenet.cpp
   - graph_mobilenet_v2.cpp
   - graph_resnet12.cpp
   - graph_srcnn955.cpp
   - graph_vgg_vdsr.cpp
   - graph_inception_resnet_v1.cpp
 - Add 4D tensors support to
   - @ref NESoftmaxLayer
 - Fused activation in @ref CLWinogradConvolutionLayer
 - Extended @ref NEPermute to support more cases
 - Added Neon™/SVE GEMM Hybrid kernels
 - Added u8 and s8 hybrid assembly kernels
 - Introduced GEMM strategy name in NEGEMMAssemblyWrapper
 - Improved @ref CLTuner
 - Fused the bias addition within @ref CLGEMM
 - Added support for QASYMM8 LOGISTIC activation in @ref NEActivationLayer
 - Added NHWC data layout support to:
   - @ref NEScale for F16
   - @ref CLNormalizationLayer IN_MAP_2D for FP32/FP16
   - @ref NEL2NormalizeLayer for FP32/FP16
   - @ref NENormalizationLayer IN_MAP_2D for FP32/FP16
   - @ref CLROIAlignLayer
   - @ref CLGenerateProposalsLayer
 - Added QASYMM8 support to the following kernels:
   - NEArithmeticAdditionKernel
   - @ref NEScale
 - Added new tests and improved validation and benchmarking suites.
 - Deprecated functions/interfaces
   - Usage of inner_border_right and inner_border_top has been deprecated in @ref CLDeconvolutionLayer and @ref NEDeconvolutionLayer

v18.11 Public major release
 - Various bug fixes.
 - Various optimisations.
 - New Arm® Neon™ kernels / functions:
   - @ref NEChannelShuffleLayer / @ref NEChannelShuffleLayerKernel
   - @ref NEReduceMean
   - @ref NEReorgLayer / @ref NEReorgLayerKernel
   - @ref NEPriorBoxLayer / @ref NEPriorBoxLayerKernel
   - NEUpsampleLayer / NEUpsampleLayerKernel
   - NEYOLOLayer / NEYOLOLayerKernel
 - New OpenCL kernels / functions:
   - @ref CLBatchToSpaceLayer / @ref CLBatchToSpaceLayerKernel
   - @ref CLBoundingBoxTransform / @ref CLBoundingBoxTransformKernel
   - @ref CLComputeAllAnchorsKernel
   - @ref CLGenerateProposalsLayer
   - @ref CLNormalizePlanarYUVLayer / @ref CLNormalizePlanarYUVLayerKernel
   - @ref CLReorgLayer / @ref CLReorgLayerKernel
   - @ref CLSpaceToBatchLayer / @ref CLSpaceToBatchLayerKernel
   - @ref CLPadLayer
   - @ref CLReduceMean
   - @ref CLPriorBoxLayer / @ref CLPriorBoxLayerKernel
   - @ref CLROIAlignLayer / @ref CLROIAlignLayerKernel
   - @ref CLSlice
   - @ref CLSplit
   - @ref CLStridedSlice / @ref CLStridedSliceKernel
   - CLUpsampleLayer / CLUpsampleLayerKernel
   - CLYOLOLayer / CLYOLOLayerKernel
 - New CPP kernels / functions:
   - @ref CPPBoxWithNonMaximaSuppressionLimit / @ref CPPBoxWithNonMaximaSuppressionLimitKernel
 - Added the validate method in:
   - @ref NEDepthConvertLayer
   - @ref NEFloor / @ref CLFloor
   - NEGEMMMatrixAdditionKernel
   - @ref NEReshapeLayer / @ref CLReshapeLayer
   - @ref CLScale
 - Added new examples:
   - graph_shufflenet.cpp
   - graph_yolov3.cpp
 - Added documentation for adding a new function or kernel.
 - Improved doxygen documentation by adding a list of the existing functions.
 - Add 4D tensors support to
   - CLWidthConcatenateLayer
   - CLFlattenLayer
   - @ref CLSoftmaxLayer
 - Add dot product support for CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
 - Add SVE support
 - Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization
 - Fuses activation in CLDepthwiseConvolutionLayer3x3NCHWKernel, CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
 - Added NHWC data layout support to:
   - @ref CLChannelShuffleLayer
   - @ref CLDeconvolutionLayer
   - @ref CLL2NormalizeLayer
 - Added QASYMM8 support to the following kernels:
   - CLScaleKernel
   - NEDepthwiseConvolutionLayer3x3Kernel
   - CLPixelWiseMultiplicationKernel
 - Added FP16 support to the following kernels:
   - CLDepthwiseConvolutionLayer3x3NHWCKernel
   - NEDepthwiseConvolutionLayer3x3Kernel
   - @ref CLNormalizePlanarYUVLayerKernel
   - @ref CLWinogradConvolutionLayer (5x5 kernel)
 - More tests added to both validation and benchmarking suites.

v18.08 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Updated recommended NDK version to r17b.
 - Removed support for QS8/QS16 data types.
 - Added support for grouped convolution in @ref CLConvolutionLayer.
 - Added NHWC data layout support to:
   - NEDepthConcatenateLayer / CLDepthConcatenateLayer
   - @ref NEWinogradConvolutionLayer / @ref CLWinogradConvolutionLayer
   - @ref CLDepthwiseConvolutionLayer
   - @ref CLDirectConvolutionLayer
   - @ref CLConvolutionLayer
   - @ref CLScale
   - CLIm2ColKernel
 - New Arm® Neon™ kernels / functions:
   - @ref NERNNLayer
 - New OpenCL kernels / functions:
   - @ref CLArithmeticDivision
 - Introduced prepare() stage support in the graph API for GLES.
 - Added support for memory reuse when trying to allocate smaller CLTensors.
 - Enabled NHWC execution on graph examples.
 - Added JPEG accessor for validation purposes.
 - Added validate methods to some kernels / functions.

v18.05 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Major redesign in the interface for the Neon™ kernels implemented in assembly.
   - Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel
   - Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in Neon™ functions.
   - Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface.
   - Moved Neon™ assembly kernels to the folder src/core/Neon/kernels/arm_gemm.
 - Improved doxygen documentation.
 - Improved memory management for layer's transitions.
 - Added support for NHWC data layout in tensors.
 - Added NHWC data layout support to:
   - @ref NEGEMMConvolutionLayer
   - @ref NEDirectConvolutionLayer
   - @ref NEPoolingLayer / @ref CLPoolingLayer
   - @ref NEBatchNormalizationLayer / @ref CLBatchNormalizationLayer
   - @ref NEDepthwiseConvolutionLayer
   - @ref NEScale
   - NEIm2Col
 - Added support for dilated convolutions in @ref NEConvolutionLayer and @ref CLConvolutionLayer.
 - New OpenCL kernels / functions:
   - @ref CLChannelShuffleLayer / @ref CLChannelShuffleLayerKernel
   - CLConvertFullyConnectedWeightsKernel / @ref CLConvertFullyConnectedWeights
   - @ref CLCopy / CLCopyKernel
   - @ref CLLSTMLayer
   - @ref CLRNNLayer
   - CLWidthConcatenateLayer / CLWidthConcatenateLayerKernel
   - CLWinogradFilterTransformKernel / @ref CLWinogradConvolutionLayer
   - CLWinogradInputTransformKernel / CLWinogradInputTransform
 - New Arm® Neon™ kernels / functions:
   - NEConvertFullyConnectedWeightsKernel / @ref NEConvertFullyConnectedWeights.
 - Created the validate method in @ref CLDepthwiseConvolutionLayer.
 - Beta and gamma are no longer mandatory arguments in @ref NEBatchNormalizationLayer and @ref CLBatchNormalizationLayer.
 - Added depth multiplier support in @ref NEDepthwiseConvolutionLayer and @ref CLDepthwiseConvolutionLayer.
 - Added broadcast multiply support in @ref NEPixelWiseMultiplication / NEPixelWiseMultiplicationKernel.
 - Port mobilenet example to NHWC data layout.
 - Enabled Winograd method in @ref CLConvolutionLayer.
 - Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer.
 - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm.
 - Added memory manager support in GLES functions.
 - Major refactoring of the graph API.
   - Added GLES backend in the graph API.
   - Added support for the memory manager in the graph API.
   - Enabled Winograd Convolution method in the graph API.
   - Added support for grouped convolutions in the graph API.
 - Replaced NEDeconvolutionLayerUpsampleKernel with NEScaleKernel in @ref NEDeconvolutionLayer.
 - Added fast maths flag in @ref CLConvolutionLayer.
 - Added new tests and benchmarks in validation and benchmark frameworks
 - Merge Activation layer with Convolution Layer (Neon™, CL, GLES)
 - Added support for OpenCL 2.0 SVM
 - Added support for importing memory in OpenCL tensors.
 - Added the prepare() method to perform any one-off pre-processing before running the function.
 - Added new examples:
   - graph_inception_v4.cpp
   - graph_resnext50.cpp
 - Added memory measurement instrument for CL.

v18.03 Public maintenance release
 - Various bug fixes.
 - Fixed bug in @ref NEActivationLayer
 - Fix in @ref CLTuner when using batches.
 - Updated recommended NDK version to r16b (And fixed warnings).
 - Fixed bug in validation code.
 - Added Inception v4 graph example.
 - Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer

v18.02 Public major release
 - Various Arm® Neon™ / OpenCL / GLES optimisations.
 - Various bug fixes.
 - Changed default number of threads on big.LITTLE systems.
 - Refactored examples and added:
   - graph_mobilenet_qassym8
   - graph_resnet
   - graph_squeezenet_v1_1
 - Renamed @ref CLConvolutionLayer into @ref CLGEMMConvolutionLayer and created a new @ref CLConvolutionLayer to select the fastest convolution method.
 - Renamed @ref NEConvolutionLayer into @ref NEGEMMConvolutionLayer and created a new @ref NEConvolutionLayer to select the fastest convolution method.
 - Added in-place support to:
   - @ref CLActivationLayer
   - @ref CLBatchNormalizationLayer
 - Added QASYMM8 support to:
   - @ref CLActivationLayer
   - @ref CLDepthwiseConvolutionLayer
   - @ref NEDepthwiseConvolutionLayer
   - @ref NESoftmaxLayer
 - Added FP16 support to:
   - CLDepthwiseConvolutionLayer3x3
   - @ref CLDepthwiseConvolutionLayer
 - Added broadcasting support to NEArithmeticAddition / @ref CLArithmeticAddition / @ref CLPixelWiseMultiplication
 - Added fused batch normalization and activation to @ref CLBatchNormalizationLayer and @ref NEBatchNormalizationLayer
 - Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer
 - New OpenCL kernels / functions:
   - CLDirectConvolutionLayerOutputStageKernel
 - New Arm® Neon™ kernels / functions
   - Added name() method to all kernels.
   - Added support for Winograd 5x5.
   - NEPermuteKernel / @ref NEPermute
   - CpuWinogradConv2dTransformInputKernel / NEWinogradLayer
   - CpuWinogradConv2dTransformOutputKernel / NEWinogradLayer
   - CpuWinogradConv2dTransformWeightsKernel / NEWinogradLayer
   - Renamed NEWinogradLayerKernel into NEWinogradLayerBatchedGEMMKernel
 - New GLES kernels / functions:
   - GCTensorShiftKernel / GCTensorShift

v18.01 Public maintenance release
 - Various bug fixes
 - Added some of the missing validate() methods
 - Added @ref CLDeconvolutionLayerUpsampleKernel / @ref CLDeconvolutionLayer @ref CLDeconvolutionLayerUpsample
 - Added CLPermuteKernel / @ref CLPermute
 - Added method to clean the programs cache in the CL Kernel library.
 - Added GCArithmeticAdditionKernel / GCArithmeticAddition
 - Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3
 - Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer
 - Added GCScaleKernel / GCScale
 - Added GCWeightsReshapeKernel / GCConvolutionLayer
 - Added FP16 support to the following GLES compute kernels:
   - GCCol2ImKernel
   - GCGEMMInterleave4x4Kernel
   - GCGEMMTranspose1xWKernel
   - GCIm2ColKernel
 - Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
 - Added NEDirectConvolutionLayerOutputStageKernel
 - Added QASYMM8 support to the following Arm® Neon™ kernels:
   - NEDepthwiseConvolutionLayer3x3Kernel
   - @ref NEFillBorderKernel
   - NEPoolingLayerKernel
 - Added new examples:
   - graph_cl_mobilenet_qasymm8.cpp
   - graph_inception_v3.cpp
   - gc_dc.cpp
 - More tests added to both validation and benchmarking suites.

v17.12 Public major release
 - Most machine learning functions on OpenCL support the new data type QASYMM8
 - Introduced logging interface
 - Introduced opencl timer
 - Reworked GEMMLowp interface
 - Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
 - Added validation method for most Machine Learning kernels / functions
 - Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
 - Added sgemm example for OpenCL
 - Added absolute difference example for GLES compute
 - Added new tests and benchmarks in validation and benchmark frameworks
 - Added new kernels / functions for GLES compute

 - New OpenGL ES kernels / functions
   - GCAbsoluteDifferenceKernel / GCAbsoluteDifference
   - GCActivationLayerKernel / GCActivationLayer
   - GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer
   - GCCol2ImKernel
   - GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer
   - GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer
   - GCDropoutLayerKernel / GCDropoutLayer
   - GCFillBorderKernel / GCFillBorder
   - GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4
   - GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM
   - GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW
   - GCIm2ColKernel
   - GCNormalizationLayerKernel / GCNormalizationLayer
   - GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication
   - GCPoolingLayerKernel / GCPoolingLayer
   - GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
   - GCTransposeKernel / GCTranspose

 - New Arm® Neon™ kernels / functions
   - arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
   - arm_compute::NEHGEMMAArch64FP16Kernel
   - NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
   - NEGEMMLowpOffsetContributionKernel / NEGEMMLowpMatrixAReductionKernel / NEGEMMLowpMatrixBReductionKernel / NEGEMMLowpMatrixMultiplyCore
   - NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
   - NEWinogradLayer / NEWinogradLayerKernel

 - New OpenCL kernels / functions
   - CLGEMMLowpOffsetContributionKernel / CLGEMMLowpMatrixAReductionKernel / CLGEMMLowpMatrixBReductionKernel / CLGEMMLowpMatrixMultiplyCore
   - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint

 - New graph nodes for Arm® Neon™ and OpenCL
   - graph::BranchLayer
   - graph::DepthConvertLayer
   - graph::DepthwiseConvolutionLayer
   - graph::DequantizationLayer
   - graph::FlattenLayer
   - graph::QuantizationLayer
   - graph::ReshapeLayer

v17.10 Public maintenance release
 - Bug fixes:
   - Check the maximum local workgroup size supported by OpenCL devices
   - Minor documentation updates (Fixed instructions to build the examples)
   - Introduced a graph::GraphContext
   - Added a few new Graph nodes, support for branches and grouping.
   - Automatically enable cl_printf in debug builds
   - Fixed bare metal builds for armv7a
   - Added AlexNet and cartoon effect examples
   - Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the Runtime part of the library now need to link against both libarm_compute_core and libarm_compute.)

v17.09 Public major release
 - Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.
 - Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager)
 - New validation and benchmark frameworks (Boost and Google frameworks replaced by homemade framework).
 - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both Arm® Neon™ and OpenCL.
 - New Arm® Neon™ kernels / functions:
   - arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel
   - NEDequantizationLayerKernel / @ref NEDequantizationLayer
   - NEFloorKernel / @ref NEFloor
   - @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer
   - NEQuantizationLayerKernel NEMinMaxLayerKernel / @ref NEQuantizationLayer
   - @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer
   - @ref NEReductionOperationKernel / @ref NEReductionOperation
   - NEReshapeLayerKernel / @ref NEReshapeLayer

 - New OpenCL kernels / functions:
   - CLDepthwiseConvolutionLayer3x3NCHWKernel CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer
   - CLDequantizationLayerKernel / CLDequantizationLayer
   - CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer
   - CLFlattenLayer
   - CLFloorKernel / @ref CLFloor
   - CLGEMMTranspose1xW
   - CLGEMMMatrixVectorMultiplyKernel
   - @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer
   - CLQuantizationLayerKernel CLMinMaxLayerKernel / @ref CLQuantizationLayer
   - @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer
   - @ref CLReductionOperationKernel / @ref CLReductionOperation
   - CLReshapeLayerKernel / @ref CLReshapeLayer

v17.06 Public major release
 - Various bug fixes
 - Added support for fixed point 8 bit (QS8) to the various Arm® Neon™ machine learning kernels.
 - Added unit tests and benchmarks (AlexNet, LeNet)
 - Added support for sub tensors.
 - Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
 - Added @ref OMPScheduler (OpenMP) scheduler for Arm® Neon™
 - Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (for bare metal)
 - Users can specify their own scheduler by implementing the @ref IScheduler interface.
 - New OpenCL kernels / functions:
    - @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
    - CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer
    - CLHOGOrientationBinningKernel CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor CLHOGDetector CLHOGGradient CLHOGMultiDetection
    - CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer
    - CLWeightsReshapeKernel / CLConvolutionLayerReshapeWeights
 - New C++ kernels:
    - CPPDetectionWindowNonMaximaSuppressionKernel
 - New Arm® Neon™ kernels / functions:
    - @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer
    - NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
    - NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
    - NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer
    - NEWeightsReshapeKernel / NEConvolutionLayerReshapeWeights

v17.05 Public bug fixes release
 - Various bug fixes
 - Remaining functions ported to use accurate padding.
 - Library does not link against OpenCL anymore (it uses dlopen / dlsym at runtime instead to determine whether or not OpenCL is available).
 - Added "free" method to allocator.
 - Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9

v17.04 Public bug fixes release

 The following functions have been ported to use the new accurate padding:
 - CLColorConvertKernel
 - CLEdgeNonMaxSuppressionKernel
 - CLEdgeTraceKernel
 - CLGaussianPyramidHorKernel
 - CLGaussianPyramidVertKernel
 - CLGradientKernel
 - NEChannelCombineKernel
 - NEFillArrayKernel
 - NEGaussianPyramidHorKernel
 - NEGaussianPyramidVertKernel
 - NEHarrisScoreFP16Kernel
 - NEHarrisScoreKernel
 - NEHOGDetectorKernel
 - NELogits1DMaxKernel
 - NELogits1DShiftExpSumKernel
 - NELogits1DNormKernel
 - NENonMaximaSuppression3x3FP16Kernel
 - NENonMaximaSuppression3x3Kernel

v17.03.1 First major public release of the sources
 - Renamed the library to arm_compute
 - New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions.
 - New padding calculation interface introduced and ported most kernels / functions to use it.
 - New OpenCL kernels / functions:
    - CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
 - New Arm® Neon™ kernels / functions:
    - @ref NENormalizationLayerKernel / @ref NENormalizationLayer
    - NETransposeKernel / @ref NETranspose
    - NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
    - NEIm2ColKernel, NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
    - NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer
    - NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp

v17.03 Sources preview
 - New OpenCL kernels / functions:
    - CLGradientKernel, CLEdgeNonMaxSuppressionKernel, CLEdgeTraceKernel / CLCannyEdge
    - GEMM refactoring + FP16 support: CLGEMMInterleave4x4Kernel, CLGEMMTranspose1xWKernel, CLGEMMMatrixMultiplyKernel, CLGEMMMatrixAdditionKernel / @ref CLGEMM
    - CLGEMMMatrixAccumulateBiasesKernel / @ref CLFullyConnectedLayer
    - CLTransposeKernel / @ref CLTranspose
    - CLLKTrackerInitKernel, CLLKTrackerStage0Kernel, CLLKTrackerStage1Kernel, CLLKTrackerFinalizeKernel / CLOpticalFlow
    - @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer
    - CLLaplacianPyramid, CLLaplacianReconstruct
 - New Arm® Neon™ kernels / functions:
    - NEActivationLayerKernel / @ref NEActivationLayer
    - GEMM refactoring + FP16 support (Requires armv8.2 CPU): NEGEMMInterleave4x4Kernel, NEGEMMTranspose1xWKernel, NEGEMMMatrixMultiplyKernel, NEGEMMMatrixAdditionKernel / @ref NEGEMM
    - NEPoolingLayerKernel / @ref NEPoolingLayer

v17.02.1 Sources preview
 - New OpenCL kernels / functions:
    - CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer
    - CLPoolingLayerKernel / @ref CLPoolingLayer
    - CLIm2ColKernel, CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
    - CLRemapKernel / CLRemap
    - CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
    - CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
    - CLNonLinearFilterKernel / CLNonLinearFilter
 - New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU)
    - NEAccumulateWeightedFP16Kernel
    - NEBox3x3FP16Kernel
    - NENonMaximaSuppression3x3FP16Kernel

v17.02 Sources preview
 - New OpenCL kernels / functions:
    - CLActivationLayerKernel / @ref CLActivationLayer
    - CLChannelCombineKernel / CLChannelCombine
    - CLDerivativeKernel / CLChannelExtract
    - CLFastCornersKernel / CLFastCorners
    - CLMeanStdDevKernel / CLMeanStdDev
 - New Arm® Neon™ kernels / functions:
    - HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
    - NENonLinearFilterKernel / NENonLinearFilter
 - Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
 - Switched all the kernels / functions to use tensors instead of images.
 - Updated documentation to include instructions to build the library from sources.

v16.12 Binary preview release
 - Original release

 */
} // namespace arm_compute