1*46dbe239SXin Li# FXdiv 2*46dbe239SXin Li 3*46dbe239SXin Li[](https://github.com/Maratyszcza/FXdiv/blob/master/LICENSE) 4*46dbe239SXin Li[](https://travis-ci.org/Maratyszcza/FXdiv) 5*46dbe239SXin Li 6*46dbe239SXin Li 7*46dbe239SXin LiHeader-only library for division via fixed-point multiplication by inverse 8*46dbe239SXin Li 9*46dbe239SXin LiOn modern CPUs and GPUs integer division is several times slower than multiplication. FXdiv implements an algorithm to replace an integer division with a multiplication and two shifts. This algorithm improves performance when an application performs repeated divisions by the same divisor. 10*46dbe239SXin Li 11*46dbe239SXin Li## Features 12*46dbe239SXin Li 13*46dbe239SXin Li- Integer division for `uint32_t`, `uint64_t`, and `size_t` 14*46dbe239SXin Li- Header-only library, no installation or build required 15*46dbe239SXin Li- Compatible with C99, C++, OpenCL, and CUDA 16*46dbe239SXin Li- Uses platform-specific compiler intrinsics for optimal performance 17*46dbe239SXin Li- Covered with unit tests and microbenchmarks 18*46dbe239SXin Li 19*46dbe239SXin Li## Example 20*46dbe239SXin Li 21*46dbe239SXin Li```c 22*46dbe239SXin Li#include <fxdiv.h> 23*46dbe239SXin Li 24*46dbe239SXin Li/* Division of array by a constant: reference implementation */ 25*46dbe239SXin Livoid divide_array_c(size_t length, uint32_t array[], uint32_t divisor) { 26*46dbe239SXin Li for (size_t i = 0; i < length; i++) { 27*46dbe239SXin Li array[i] /= divisor; 28*46dbe239SXin Li } 29*46dbe239SXin Li} 30*46dbe239SXin Li 31*46dbe239SXin Li/* Division of array by a constant: implementation with FXdiv */ 32*46dbe239SXin Livoid divide_array_fxdiv(size_t length, uint32_t array[], uint32_t divisor) { 33*46dbe239SXin Li const struct fxdiv_divisor_uint32_t precomputed_divisor = 34*46dbe239SXin Li fxdiv_init_uint32_t(divisor); 35*46dbe239SXin Li for (size_t i = 0; i < length; i++) { 36*46dbe239SXin Li array[i] = fxdiv_quotient_uint32_t(array[i], precomputed_divisor); 37*46dbe239SXin Li } 38*46dbe239SXin Li} 39*46dbe239SXin Li``` 40*46dbe239SXin Li 41*46dbe239SXin Li## Status 42*46dbe239SXin Li 43*46dbe239SXin LiCurrently working features: 44*46dbe239SXin Li 45*46dbe239SXin Li| Platform | uint32_t | uint64_t | size_t | 46*46dbe239SXin Li| --------------- |:--------:|:--------:|:--------:| 47*46dbe239SXin Li| x86-64 gcc | Works | Works | Works | 48*46dbe239SXin Li| x86-64 clang | Works | Works | Works | 49*46dbe239SXin Li| x86-64 MSVC | Works | Works | Works | 50*46dbe239SXin Li| x86 gcc | Works | Works | Works | 51*46dbe239SXin Li| x86 clang | Works | Works | Works | 52*46dbe239SXin Li| x86 MSVC | Works | Works | Works | 53*46dbe239SXin Li| ARMv7 gcc | Works | Works | Works | 54*46dbe239SXin Li| ARMv7 clang | Works | Works | Works | 55*46dbe239SXin Li| ARMv7 MSVC* | Compiles | Compiles | Compiles | 56*46dbe239SXin Li| ARM64 gcc | Works | Works | Works | 57*46dbe239SXin Li| ARM64 clang | Works | Works | Works | 58*46dbe239SXin Li| ARM64 MSVC* | Compiles | Compiles | Compiles | 59*46dbe239SXin Li| PPC64 gcc | Works | Works | Works | 60*46dbe239SXin Li| WAsm clang | Works | Works | Works | 61*46dbe239SXin Li| Asm.js clang | Works | Works | Works | 62*46dbe239SXin Li| PNaCl clang | Works | Works | Works | 63*46dbe239SXin Li| CUDA | Untested | Untested | Untested | 64*46dbe239SXin Li| OpenCL | Untested | Untested | Untested | 65*46dbe239SXin Li 66*46dbe239SXin Li*ARMv7 and ARM64 builds with MSVC are presumed to work, but were only verified to compile successfully 67*46dbe239SXin Li 68*46dbe239SXin Li## References 69*46dbe239SXin Li 70*46dbe239SXin Li- Granlund, Torbjörn, and Peter L. Montgomery. "Division by invariant integers using multiplication." In ACM SIGPLAN Notices, vol. 29, no. 6, pp. 61-72. ACM, 1994. Available: [gmplib.org/~tege/divcnst-pldi94.pdf](https://gmplib.org/~tege/divcnst-pldi94.pdf) 71