# sse2neon

A C/C++ header file that converts Intel SSE intrinsics to Arm/AArch64 NEON intrinsics.

## Introduction

`sse2neon` is a translator of Intel SSE (Streaming SIMD Extensions) intrinsics
to [Arm NEON](https://developer.arm.com/architectures/instruction-sets/simd-isas/neon),
shortening the time needed to get a working program on Arm, which can then be used to
extract profiles and to identify hot paths in the code.
The header file `sse2neon.h` contains several of the functions provided by Intel
intrinsic headers such as `<xmmintrin.h>`, only implemented with NEON-based counterparts
to produce the exact semantics of the intrinsics.

## Mapping and Coverage

Header file | Extension |
---|---|
`<mmintrin.h>` | MMX |
`<xmmintrin.h>` | SSE |
`<emmintrin.h>` | SSE2 |
`<pmmintrin.h>` | SSE3 |
`<tmmintrin.h>` | SSSE3 |
`<smmintrin.h>` | SSE4.1 |
`<nmmintrin.h>` | SSE4.2 |
`<wmmintrin.h>` | AES |

`sse2neon` aims to support the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES extensions.

`sse2neon` aims to deliver NEON-equivalent intrinsics for all widely used SSE intrinsics.
Please be aware that some SSE intrinsics have a direct mapping to a concrete
NEON-equivalent intrinsic. Others lack a 1-to-1 mapping, which means their
equivalents are implemented using several NEON intrinsics.

For example, the SSE intrinsic `_mm_loadu_si128` has a direct NEON mapping (`vld1q_s32`),
but the SSE intrinsic `_mm_maddubs_epi16` has to be implemented with 13+ NEON instructions.
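
To illustrate why an intrinsic such as `_mm_maddubs_epi16` cannot map to a single NEON intrinsic, the following is a scalar reference model of what it computes, as described in the Intel Intrinsics Guide: unsigned bytes from `a` are multiplied by signed bytes from `b`, adjacent products are added, and each sum is saturated to 16 bits. This is only a sketch for checking semantics, not the NEON code used by `sse2neon`; the names `saturate_i16` and `maddubs_epi16_ref` are hypothetical.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t saturate_i16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t) v;
}

/* Scalar model of _mm_maddubs_epi16 (hypothetical helper, for illustration):
 * a holds 16 unsigned bytes, b holds 16 signed bytes, dst receives 8 results.
 */
void maddubs_epi16_ref(const uint8_t a[16], const int8_t b[16], int16_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        /* Mixed-sign widening multiplies of the two adjacent byte pairs. */
        int32_t even = (int32_t) a[2 * i] * (int32_t) b[2 * i];
        int32_t odd = (int32_t) a[2 * i + 1] * (int32_t) b[2 * i + 1];
        /* Horizontal add of the pair, then saturate to 16 bits. */
        dst[i] = saturate_i16(even + odd);
    }
}
```

The mixed-sign widening multiply, the pairwise add, and the saturation are separate steps here, which gives a rough idea of why the NEON equivalent needs a sequence of instructions.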

## Usage

- Put the file `sse2neon.h` into your source code directory.

- Locate the following SSE header files included in the code:
```C
#include <xmmintrin.h>
#include <emmintrin.h>
```
  {p,t,s,n,w}mmintrin.h should be replaceable as well, though the coverage of these extensions might be limited.

- Replace them with:
```C
#include "sse2neon.h"
```

- Explicitly specify platform-specific options to gcc/clang compilers.
  * On ARMv8-A targets, you should specify the following compiler option: (Remove `crypto` and/or `crc` if your architecture does not support cryptographic and/or CRC32 extensions)
  ```shell
  -march=armv8-a+fp+simd+crypto+crc
  ```
  * On ARMv7-A targets, you need to append the following compiler option:
  ```shell
  -mfpu=neon
  ```

## Compile-time Configurations

Considering the balance between correctness and performance, `sse2neon` recognizes the following compile-time configurations:
* `SSE2NEON_PRECISE_MINMAX`: Enable precise implementation of `_mm_min_ps` and `_mm_max_ps`. If you need consistent results such as NaN special cases, enable it.
* `SSE2NEON_PRECISE_DIV`: Enable precise implementation of `_mm_rcp_ps` and `_mm_div_ps` by additional Newton-Raphson iteration for accuracy.
* `SSE2NEON_PRECISE_SQRT`: Enable precise implementation of `_mm_sqrt_ps` and `_mm_rsqrt_ps` by additional Newton-Raphson iteration for accuracy.

The above are turned off by default, and you should define the corresponding macro(s) as `1` before including `sse2neon.h` if you need the precise implementations.
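
As a minimal sketch of this opt-in, the macros can be defined immediately before the include. The architecture guard below is an assumption of this example (so that the same file still builds against `<xmmintrin.h>` on x86), and the helper `min_first_lane` is hypothetical; on Arm the sketch also assumes that `sse2neon.h` is on the include path.

```c
#if defined(__aarch64__) || defined(__ARM_NEON) || defined(__ARM_NEON__)
/* Opt in to the precise variants; these must be defined before the include. */
#define SSE2NEON_PRECISE_MINMAX 1
#define SSE2NEON_PRECISE_DIV 1
#define SSE2NEON_PRECISE_SQRT 1
#include "sse2neon.h"
#else
/* Assumption of this sketch: fall back to the native SSE header on x86. */
#include <xmmintrin.h>
#endif

/* Hypothetical helper: broadcast x and y to all lanes, take the lane-wise
 * minimum with _mm_min_ps, and return the lowest lane.
 */
float min_first_lane(float x, float y)
{
    __m128 a = _mm_set1_ps(x);
    __m128 b = _mm_set1_ps(y);
    return _mm_cvtss_f32(_mm_min_ps(a, b));
}
```

Defining a macro on the compiler command line, for example with `-DSSE2NEON_PRECISE_MINMAX=1`, should have the same effect, since it is also visible before the header is processed.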

## Run Built-in Test Suite

`sse2neon` provides a unified interface for developing test cases. These test
cases are located in the `tests` directory, and the input data is specified at
runtime. Use the following command to run the test cases:
```shell
$ make check
```

You can specify a GNU toolchain for cross compilation as well.
[QEMU](https://www.qemu.org/) should be installed in advance.
```shell
$ make CROSS_COMPILE=aarch64-linux-gnu- check # ARMv8-A
```
or
```shell
$ make CROSS_COMPILE=arm-linux-gnueabihf- check # ARMv7-A
```

Check the details via [Test Suite for SSE2NEON](tests/README.md).

## Adoptions
Here is a partial list of open source projects that have adopted `sse2neon` for Arm/AArch64 support.
* [aether-game-utils](https://github.com/johnhues/aether-game-utils) is a collection of cross platform utilities for quickly creating small game prototypes in C++.
* [Apache Impala](https://impala.apache.org/) provides lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.
* [Apache Kudu](https://kudu.apache.org/) completes Hadoop's storage layer to enable fast analytics on fast data.
* [ART](https://github.com/dinosaure/art) is an implementation in OCaml of [Adaptive Radix Tree](https://db.in.tum.de/~leis/papers/ART.pdf) (ART).
* [Async](https://github.com/romange/async) is a set of C++ primitives that allows efficient and rapid development in C++17 on GNU/Linux systems.
* [Blender](https://www.blender.org/) is the free and open source 3D creation suite, supporting the entirety of the 3D pipeline.
* [Boo](https://github.com/AxioDL/boo) is a cross-platform windowing and event manager similar to SDL or SFML, with additional 3D rendering functionality.
* [CARTA](https://github.com/CARTAvis/carta-backend) is a new visualization tool designed for viewing radio astronomy images in CASA, FITS, MIRIAD, and HDF5 formats (using the IDIA custom schema for HDF5).
* [Catcoon](https://github.com/i-evi/catcoon) is a [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network) implementation in C.
* [dab-cmdline](https://github.com/JvanKatwijk/dab-cmdline) provides entries for the functionality to handle Digital audio broadcasting (DAB)/DAB+ through some simple calls.
* [EDGE](https://github.com/3dfxdev/EDGE) is an advanced OpenGL source port spawned from the DOOM engine, with focus on easy development and expansion for modders and end-users.
* [Embree](https://github.com/embree/embree) is a collection of high-performance ray tracing kernels. Its target users are graphics application engineers who want to improve the performance of their photo-realistic rendering application by leveraging Embree's performance-optimized ray tracing kernels.
* [emp-tool](https://github.com/emp-toolkit/emp-tool) aims to provide a benchmark for secure computation and to allow other researchers to experiment with and extend it.
* [FoundationDB](https://www.foundationdb.org) is a distributed database designed to handle large volumes of structured data across clusters of commodity servers.
* [iqtree_arm_neon](https://github.com/joshlvmh/iqtree_arm_neon) is the Arm NEON port of [IQ-TREE](http://www.iqtree.org/), a fast and effective stochastic algorithm to infer phylogenetic trees by maximum likelihood.
* [kram](https://github.com/alecazam/kram) is a wrapper to several popular encoders to and from PNG/[KTX](https://www.khronos.org/opengles/sdk/tools/KTX/file_format_spec/) files with [LDR/HDR and BC/ASTC/ETC2](https://developer.arm.com/solutions/graphics-and-gaming/developer-guides/learn-the-basics/adaptive-scalable-texture-compression/single-page).
* [libscapi](https://github.com/cryptobiu/libscapi) stands for the "Secure Computation API", providing reliable, efficient, and highly flexible cryptographic infrastructure.
* [libmatoya](https://github.com/matoya/libmatoya) is a cross-platform application development library, providing various features such as common cryptography tasks.
* [Madronalib](https://github.com/madronalabs/madronalib) enables efficient audio DSP on SIMD processors with readable and brief C++ code.
* [minimap2](https://github.com/lh3/minimap2) is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database.
* [MMseqs2](https://github.com/soedinglab/MMseqs2) (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets.
* [MRIcroGL](https://github.com/rordenlab/MRIcroGL) is a cross-platform tool for viewing NIfTI, DICOM, MGH, MHD, NRRD, AFNI format medical images.
* [N2](https://github.com/oddconcepts/n2o) is an approximate nearest neighbor algorithm library written in C++, providing a much faster search speed than other implementations when modeling large datasets.
* [niimath](https://github.com/rordenlab/niimath) is a general image calculator with superior performance.
* [OBS Studio](https://github.com/obsproject/obs-studio) is software designed for capturing, compositing, encoding, recording, and streaming video content efficiently.
* [OGRE](https://github.com/OGRECave/ogre) is a scene-oriented, flexible 3D engine written in C++ designed to make it easier and more intuitive for developers to produce games and demos utilising 3D hardware.
* [OpenXRay](https://github.com/OpenXRay/xray-16) is an improved version of the X-Ray engine, used in the world-famous S.T.A.L.K.E.R. game series by GSC Game World.
* [parallel-n64](https://github.com/libretro/parallel-n64) is an optimized/rewritten Nintendo 64 emulator made specifically for [Libretro](https://www.libretro.com/).
* [PFFFT](https://github.com/marton78/pffft) does 1D Fast Fourier Transforms of single precision real and complex vectors.
* [PlutoSDR Firmware](https://github.com/seanstone/plutosdr-fw) is the customized firmware for the [PlutoSDR](https://wiki.analog.com/university/tools/pluto) that can be used to introduce fundamentals of Software Defined Radio (SDR) or Radio Frequency (RF) or Communications as advanced topics in electrical engineering in a self- or instructor-led setting.
* [Pygame](https://www.pygame.org) is cross-platform and designed to make it easy to write multimedia software, such as games, in Python.
* [simd_utils](https://github.com/JishinMaster/simd_utils) is a header-only library implementing common mathematical functions using SIMD intrinsics.
* [SMhasher](https://github.com/rurban/smhasher) provides comprehensive hash function quality and speed tests.
* [Spack](https://github.com/spack/spack) is a multi-platform package manager that builds and installs multiple versions and configurations of software.
* [srsLTE](https://github.com/srsLTE/srsLTE) is an open source SDR LTE software suite.
* [Surge](https://github.com/surge-synthesizer/surge) is an open source digital synthesizer.
* [XMRig](https://github.com/xmrig/xmrig) is an open source CPU miner for [Monero](https://web.getmonero.org/) cryptocurrency.

## Related Projects
* [SIMDe](https://github.com/simd-everywhere/simde): fast and portable implementations of SIMD
  intrinsics on hardware which doesn't natively support them, such as calling SSE functions on ARM.
* [CatBoost's sse2neon](https://github.com/catboost/catboost/blob/master/library/cpp/sse/sse2neon.h)
* [ARM\_NEON\_2\_x86\_SSE](https://github.com/intel/ARM_NEON_2_x86_SSE)
* [AvxToNeon](https://github.com/kunpengcompute/AvxToNeon)
* [POWER/PowerPC support for GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000) contains a series of headers simplifying porting x86_64 code that
  makes explicit use of Intel intrinsics to powerpc64le (pure little-endian mode that has been introduced with the [POWER8](https://en.wikipedia.org/wiki/POWER8)).
  - implementation: [xmmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/xmmintrin.h), [emmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/emmintrin.h), [pmmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/pmmintrin.h), [tmmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/tmmintrin.h), [smmintrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/smmintrin.h)

## Reference
* [Intel Intrinsics Guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide/)
* [Arm Neon Intrinsics Reference](https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
* [Neon Programmer's Guide for Armv8-A](https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/neon-programmers-guide-for-armv8-a)
* [NEON Programmer's Guide](https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf)
* [qemu/target/i386/ops_sse.h](https://github.com/qemu/qemu/blob/master/target/i386/ops_sse.h): Comprehensive SSE instruction emulation in C. Ideal for semantic checks.

## Licensing

`sse2neon` is freely redistributable under the MIT License.