# RenderScript Intrinsics Replacement Toolkit - v0.8 BETA

This Toolkit provides a collection of high-performance image manipulation functions
like blur, blend, and resize. It can be used as a stand-alone replacement for most
of the deprecated RenderScript Intrinsics functions.

The Toolkit provides ten image manipulation functions:
* blend,
* blur,
* color matrix,
* convolve,
* histogram and histogramDot,
* LUT (lookup table) and LUT 3D,
* resize, and
* YUV to RGB.

The Toolkit provides a C++ and a Java/Kotlin interface. It is packaged as an Android
library that you can add to your project.

These functions execute multithreaded on the CPU. They take advantage of Neon/AdvSimd
on Arm processors and SSE on Intel processors.

Compared to the RenderScript Intrinsics, this Toolkit is simpler to use and twice as fast
when executing on the CPU. However, RenderScript Intrinsics allow more flexibility in
the types of allocations supported. This Toolkit does not support allocations of floats;
most of the functions support ByteArrays and Bitmaps.

You should instantiate the Toolkit once and reuse it throughout your application.
On instantiation, the Toolkit creates a thread pool that's used for processing all the functions.
You can limit the number of threads used by the Toolkit via the constructor. The threads
are destroyed once the Toolkit is destroyed, after any pending work is done.

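
The lifecycle described above (create the pool once, cap its thread count in the constructor, let destruction wait for pending work) can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the Toolkit's actual implementation: `SketchPool`, `submit`, and `runCountingTasks` are hypothetical names invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration only; this is NOT the Toolkit's own thread pool.
class SketchPool {
  public:
    // threadCount caps the number of worker threads, similar in spirit to
    // the constructor parameter mentioned in the text.
    explicit SketchPool(std::size_t threadCount) {
        assert(threadCount > 0);
        for (std::size_t i = 0; i < threadCount; ++i) {
            mWorkers.emplace_back([this] { workerLoop(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~SketchPool() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mStopping = true;
        }
        mWorkAvailable.notify_all();
        for (std::thread& worker : mWorkers) {
            worker.join();
        }
    }

    SketchPool(const SketchPool&) = delete;
    SketchPool& operator=(const SketchPool&) = delete;

    // Queues a task; any idle worker may pick it up.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mTasks.push(std::move(task));
        }
        mWorkAvailable.notify_one();
    }

  private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mMutex);
                mWorkAvailable.wait(lock, [this] { return mStopping || !mTasks.empty(); });
                // An empty queue here means we were asked to stop and all work is done.
                if (mTasks.empty()) {
                    return;
                }
                task = std::move(mTasks.front());
                mTasks.pop();
            }
            task();  // Run outside the lock so other workers can proceed.
        }
    }

    std::mutex mMutex;
    std::condition_variable mWorkAvailable;
    std::queue<std::function<void()>> mTasks;
    bool mStopping = false;
    std::vector<std::thread> mWorkers;
};

// Submits taskCount increments and returns the total after the pool is destroyed.
int runCountingTasks(std::size_t threadCount, int taskCount) {
    std::atomic<int> counter{0};
    {
        SketchPool pool(threadCount);
        for (int i = 0; i < taskCount; ++i) {
            pool.submit([&counter] { ++counter; });
        }
    }  // The destructor returns only after every queued task has run.
    return counter.load();
}
```

Because each pool owns its own threads, creating several instances multiplies the thread count, which is why a single shared instance is the recommended pattern.
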

This library is thread safe. You can call methods from different threads. The functions will
execute sequentially.


## Future improvement ideas:

* Turn the Java version of the Toolkit into a singleton, to reduce the chance that someone inadvertently
creates multiple thread pools.

* Support ByteBuffer. It should be straightforward to use GetDirectBufferAddress in JniEntryPoints.cpp.
See https://developer.android.com/training/articles/perf-jni and jni_helper.h.

* The RenderScript Intrinsics support floats for colorMatrix, convolve, and resize. The Toolkit does not.

* Allow in-place update of buffers, or writing to an existing byte array.

* For Blur, we could have a version that accepts a mask. This is commonly used for background
blurring. We should allow the mask to be smaller than the original, since neural network models
that do segmentation are downscaled.

* Allow yuvToRgb to have a Restriction.

* Add support for YUV_420_888, the YUV format favored by Camera2. Allow various strides to be specified.

* When passing a Restriction, it would be nice to say "Create a smaller output".
The original RenderScript does not allow that. It's not that useful when outputting new buffers as
our Java library does.

* For Resize, a Restriction working on the input buffer would be more useful, but that's not what RenderScript does.

* Integrate and test with imageprocessing_jb.
Do the same for [github/renderscript-samples/](https://github.com/android/renderscript-samples/tree/main/RenderScriptIntrinsic).

* Allow Bitmaps with rowSize != width * vectorSize. We could do this also for ByteArray.

* In TaskProcessor.cpp, the code below is fine and clean, but probably a bit inefficient.
When this wakes up another thread, it may have to immediately go back to sleep, since we still hold the lock.
It could instead set a need_to_notify flag and test that after releasing the lock (both places).
That might avoid some context switches.
```cpp
if (mTilesInProcess == 0 && mTilesNotYetStarted == 0) {
    mWorkIsFinished.notify_one();
}
```

* When compiled as part of Android, librenderscript_toolkit.so is 101,456 bytes. When compiled by Android Studio as part of an .aar, it's 387K. Figure out why and slim it down.

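
The `need_to_notify` idea from the TaskProcessor.cpp item above could look roughly like the sketch below. It is an assumption-laden illustration, not the actual TaskProcessor code: `TileTracker`, `startTile`, `finishTile`, `waitUntilFinished`, and `processAllTiles` are hypothetical names, and only the member names `mTilesInProcess`, `mTilesNotYetStarted`, and `mWorkIsFinished` are borrowed from the snippet. The point it shows is deciding whether to notify while holding the lock, but calling `notify_one()` only after the lock is released.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the tile bookkeeping; not the real TaskProcessor.
struct TileTracker {
    std::mutex mMutex;
    std::condition_variable mWorkIsFinished;
    int mTilesInProcess = 0;
    int mTilesNotYetStarted = 0;

    explicit TileTracker(int tiles) : mTilesNotYetStarted(tiles) {}

    // Claims one tile; returns false when no tiles are left to start.
    bool startTile() {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mTilesNotYetStarted == 0) {
            return false;
        }
        --mTilesNotYetStarted;
        ++mTilesInProcess;
        return true;
    }

    // Marks one tile as done. The decision to notify is made under the lock,
    // but the notification itself happens after the lock is released, so the
    // woken thread does not immediately block on the mutex.
    void finishTile() {
        bool needToNotify = false;
        {
            std::lock_guard<std::mutex> lock(mMutex);
            --mTilesInProcess;
            needToNotify = (mTilesInProcess == 0 && mTilesNotYetStarted == 0);
        }
        if (needToNotify) {
            mWorkIsFinished.notify_one();
        }
    }

    // Blocks until every tile has been started and finished.
    void waitUntilFinished() {
        std::unique_lock<std::mutex> lock(mMutex);
        mWorkIsFinished.wait(lock, [this] {
            return mTilesInProcess == 0 && mTilesNotYetStarted == 0;
        });
    }
};

// Processes `tiles` tiles on `workers` threads and returns how many were handled.
int processAllTiles(int tiles, int workers) {
    TileTracker tracker(tiles);
    std::atomic<int> processed{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&tracker, &processed] {
            while (tracker.startTile()) {
                ++processed;  // Placeholder for the real per-tile work.
                tracker.finishTile();
            }
        });
    }
    tracker.waitUntilFinished();
    // Join before `tracker` goes out of scope: a late notify_one() must never
    // touch a destroyed condition variable.
    for (std::thread& t : threads) {
        t.join();
    }
    return processed.load();
}
```

One caveat with notifying outside the lock: the condition variable has to outlive the `notify_one()` call, which is why this sketch joins the workers before the tracker is destroyed. Whether the change actually saves context switches would need to be measured.
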