arm-optimized-routines/math/README.contributors

STYLE REQUIREMENTS
==================

1. Most code in this sub-directory is expected to be upstreamed into glibc, so
   the GNU Coding Standards and glibc-specific conventions should be followed
   to ease upstreaming.

2. ABI and symbols: the code should be written so that it is suitable for
   inclusion into a libc with minimal changes. For example, internal symbols
   should be hidden and in the implementation-reserved namespace according to
   ISO C and POSIX rules. If possible, the built shared libraries and static
   library archives should be usable to override libc symbols at link time (or
   at runtime via LD_PRELOAD). This requires the symbols to follow the glibc
   ABI (other than symbol versioning). This cannot be done reliably for static
   linking, so it is a best-effort requirement.

3. API: include headers should be suitable for benchmarking and testing code
   and should not conflict with libc headers.
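
The symbol rules in item 2 can be illustrated with a minimal sketch. All names
below (HIDDEN, __example_poly, example_f) are hypothetical and chosen only for
illustration. The sketch assumes a GCC or Clang compatible compiler on an ELF
target, where the visibility attribute is available.

```c
/* Sketch only: every name here is hypothetical, not taken from the library.  */

/* GCC and Clang accept the visibility attribute; other compilers get an
   empty macro so the code still builds.  */
#if defined(__GNUC__)
# define HIDDEN __attribute__ ((visibility ("hidden")))
#else
# define HIDDEN
#endif

/* Internal helper: the leading double underscore puts the name in the
   namespace that ISO C reserves for the implementation, and hidden
   visibility keeps it out of a shared library's dynamic symbol table.  */
HIDDEN double __example_poly (double x);

/* Public entry point (hypothetical name).  */
double example_f (double x);

double
__example_poly (double x)
{
  /* 1 + x + x^2/2 evaluated in Horner form.  */
  return 1.0 + x * (1.0 + x * 0.5);
}

double
example_f (double x)
{
  return __example_poly (x);
}
```

A reserved name such as __example_poly is only appropriate in code that is
meant to become part of a libc; ordinary application code should not define
identifiers in that namespace.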


CONTRIBUTION GUIDELINES FOR math SUB-DIRECTORY
==============================================

1. Math functions have quality and performance requirements.

2. Quality:
   - Worst-case ULP error should be small in the entire input domain. For most
     common double precision scalar functions the target is < 0.66 ULP error,
     and < 1 ULP for single precision. Even a performance-optimized function
     variant should not have > 5 ULP error if the goal is to be a drop-in
     replacement for a standard math function. This should be tested
     statistically (or on all inputs if that is possible in a reasonable
     amount of time). The ulp tool is for this, and runulp.sh should be
     updated for new functions.

   - All standard rounding modes need to be supported, but in non-default
     rounding modes the quality requirement can be relaxed. (Non-nearest
     rounded computation can be slow and inaccurate but has to be correct for
     conformance reasons.)

   - Special cases and error handling need to follow ISO C Annex F
     requirements, POSIX requirements, IEEE 754-2008 requirements and glibc
     requirements:
     https://www.gnu.org/software/libc/manual/html_mono/libc.html#Errors-in-Math-Functions
     This should be tested by direct tests (the glibc test system may be used
     for it).

   - Error handling code should be decoupled from the approximation code as much
     as possible. (There are helper functions; these take care of errno as well
     as exception raising.)

   - Vector math code does not need to work in non-nearest rounding modes, and
     error handling side effects (fenv exceptions and errno) need not happen,
     but the result should be correct (within quality requirements, which are
     lower for vector code than for scalar code).

   - Error bounds of the approximation should be clearly documented.

   - The code should build and pass tests on arm, aarch64 and x86_64 GNU/Linux
     systems. (Routines and features can be disabled on specific targets, but
     the build must complete.) On aarch64, both little- and big-endian targets
     are supported, as well as valid combinations of architecture extensions.
     The configurations that should be tested depend on the contribution.

3. Performance:
   - Common math code should be benchmarked on modern aarch64 microarchitectures
     over typical inputs.

   - Performance improvements should be documented (relative numbers can be
     published; it is enough to use the mathbench microbenchmark tool, which
     should be updated for new functions).

   - Attention should be paid to the compilation flags: for aarch64, fma
     contraction should be on and math errno turned off so that some builtins
     can be inlined.

   - The code should be reasonably performant on x86_64 too. For example, some
     rounding instructions and fma may not be available on x86_64, in which
     case such builtins turn into libc calls with slow code. Such a slowdown
     is not acceptable; a faster fallback should be present, since glibc and
     bionic use the same code on all targets. (This does not apply to vector
     math code.)
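
As an illustration of the kind of fallback the last item asks for, the sketch
below rounds to the nearest integer without a rounding instruction or a libc
call by adding and subtracting a large constant. It is a well-known technique
shown here under stated assumptions, not necessarily what any routine in this
directory does: round_fallback is a hypothetical name, the rounding mode is
assumed to be round-to-nearest, |x| is assumed to be below 0x1p51, double
arithmetic is assumed to be evaluated without excess precision, and the code
must not be built with options such as -ffast-math that allow the compiler to
simplify the expression.

```c
/* Hypothetical fallback: round X to the nearest integer (ties to even)
   using only an add and a subtract.  Assumes the default round-to-nearest
   mode and |x| < 0x1p51.  */
static inline double
round_fallback (double x)
{
  /* 1.5 * 2^52: doubles near this value are spaced exactly 1 apart, so
     the addition itself rounds x to an integer.  */
  const double shift = 0x1.8p52;
  double t = x + shift;
  /* Subtracting the constant again is exact and recovers the integer.  */
  return t - shift;
}
```

With these assumptions round_fallback (2.5) gives 2.0 and round_fallback (3.5)
gives 4.0, because ties go to the even neighbour. Outside the default rounding
mode the result follows the current mode instead, so a scalar routine using
this trick still has to be checked in the other rounding modes.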