Bench Template Library

****************************************
Introduction :

The aim of this project is to compare the performance
of available numerical libraries. The code is designed
to be as generic and modular as possible. Thus, adding new
numerical libraries or new numerical tests should
require minimal effort.


*****************************************

Installation :

BTL uses cmake / ctest:

1 - create a build directory:

  $ mkdir build
  $ cd build

2 - configure:

  $ ccmake ..

3 - run the bench using ctest:

  $ ctest -V

You can run the benchmarks only on libraries matching a given regular expression:
  ctest -V -R <regexp>
For instance:
  ctest -V -R eigen2

You can also select a given set of actions by defining the environment variable BTL_CONFIG this way:
  BTL_CONFIG="-a action1{:action2}*" ctest -V
An example:
  BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata" ctest -V -R eigen2

Finally, if bench results already exist (the bench*.dat files), they are merged by keeping the best result for each matrix size.
If you want to overwrite the previous ones, simply add the "--overwrite" option:
  BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata --overwrite" ctest -V -R eigen2

4 - analyze the results: different data files (.dat) are produced in each libs directory.
 If gnuplot is available, choose a directory name in the data directory to store the results and type:
        $ cd data
        $ mkdir my_directory
        $ cp ../libs/*/*.dat my_directory
 Build the data utilities in this (data) directory:
        make
 Then you can look at the raw data:
        go_mean my_directory
 or smooth the data first:
        smooth_all.sh my_directory
        go_mean my_directory_smooth


*************************************************

Files and directories :

 generic_bench : all the bench sources common to all libraries

 actions : sources for different action wrappers (axpy, matrix-matrix product) to be tested.

 libs/* : bench sources specific to each tested library.

 machine_dep : directory used to store machine specific Makefile.in

 data : directory used to store gnuplot scripts and data analysis utilities

**************************************************

Principles : the code modularity is achieved by defining two concepts :

 ****** Action concept : This is a class defining which kind
  of test must be performed (e.g. a matrix_vector_product).
  An Action should define the following methods :

        *** Ctor using the size of the problem (matrix or vector size) as an argument
            Action action(size);
        *** initialize : this method initializes the calculation (e.g. initializes the matrix and vector arguments)
            action.initialize();
        *** calculate : this method actually launches the calculation to be benchmarked
            action.calculate();
        *** nb_op_base() : this method returns the complexity of the calculate method (allowing the mflops evaluation)
        *** name() : this method returns the name of the action (std::string)

 ****** Interface concept : This is a class or namespace defining how to use a given library and
  its specific containers (matrix and vector). Up to now, an interface should define the following types

        *** real_type : kind of float to be used (float or double)
        *** stl_vector : must correspond to std::vector<real_type>
        *** stl_matrix : must correspond to std::vector<stl_vector>
        *** gene_vector : the vector type for this interface        --> e.g. (real_type *) for the C_interface
        *** gene_matrix : the matrix type for this interface        --> e.g. (gene_vector *) for the C_interface

        + the following common methods

        *** free_matrix(gene_matrix & A, int N)  deallocation of an N sized gene_matrix A
        *** free_vector(gene_vector & B)  deallocation of an N sized gene_vector B
        *** matrix_from_stl(gene_matrix & A, stl_matrix & A_stl)  copy the content of an stl_matrix A_stl into a gene_matrix A.
             The allocation of A is done in this function.
        *** vector_from_stl(gene_vector & B, stl_vector & B_stl)  copy the content of an stl_vector B_stl into a gene_vector B.
             The allocation of B is done in this function.
        *** matrix_to_stl(gene_matrix & A, stl_matrix & A_stl)  copy the content of a gene_matrix A into an stl_matrix A_stl.
             The size of A_stl must correspond to the size of A.
        *** vector_to_stl(gene_vector & A, stl_vector & A_stl)  copy the content of a gene_vector A into an stl_vector A_stl.
             The size of A_stl must correspond to the size of A.
        *** copy_matrix(gene_matrix & source, gene_matrix & cible, int N) : copy the content of source into cible (the destination). Both source
                and cible must be sized NxN.
        *** copy_vector(gene_vector & source, gene_vector & cible, int N) : copy the content of source into cible (the destination). Both source
                and cible must be sized N.

        and the following methods, one for each action to be benchmarked :

        *** matrix_vector_product(const gene_matrix & A, const gene_vector & B, gene_vector & X, int N)
        *** matrix_matrix_product(const gene_matrix & A, const gene_matrix & B, gene_matrix & X, int N)
        *** ata_product(const gene_matrix & A, gene_matrix & X, int N)
        *** aat_product(const gene_matrix & A, gene_matrix & X, int N)
        *** axpy(real coef, const gene_vector & X, gene_vector & Y, int N)

  The bench algorithm (generic_bench/bench.hh) is templated with an action itself templated with
  an interface.
  A typical main.cpp source stored in a given library directory libs/A_LIB
  looks like :

  bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

  This function produces an XY data file containing the measured mflops as a function of the size, for 50
  sizes between 10 and 1000.

  This algorithm can be adapted by providing a given Perf_Analyzer object which determines how the time
  measurements must be done. For example, the X86_Perf_Analyzer uses the asm rdtsc instruction and provides
  a very fast and accurate (but less portable) timing method. The default is the Portable_Perf_Analyzer,
  so

  bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

  is equivalent to

  bench< Portable_Perf_Analyzer,AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

  If your system supports it, we suggest using a mixed implementation (X86_Perf_Analyzer+Portable_Perf_Analyzer):
  replace
     bench<Portable_Perf_Analyzer,Action>(size_min,size_max,nb_point);
  with
     bench<Mixed_Perf_Analyzer,Action>(size_min,size_max,nb_point);
  in generic_bench/bench.hh
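
  To make the Action and Interface concepts more concrete, here is a minimal sketch of an
  interface and of an action templated with it. It is not taken from the BTL sources: the names
  sketch_interface, sketch_axpy and sketch_run are hypothetical, only the vector part of the
  interface is shown, gene_vector is assumed to be a raw array (as in the C_interface example),
  and the operation count of 2*N for axpy is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface, NOT part of BTL: vectors only, stored as raw arrays.
template <class real>
struct sketch_interface {
  typedef real real_type;
  typedef std::vector<real_type> stl_vector;
  typedef real_type* gene_vector;

  static std::string name() { return "sketch"; }

  // Allocates B, then copies the content of B_stl into it.
  static void vector_from_stl(gene_vector& B, stl_vector& B_stl) {
    B = new real_type[B_stl.size()];
    for (std::size_t i = 0; i < B_stl.size(); ++i) B[i] = B_stl[i];
  }

  // Copies B into B_stl; B_stl must already have the size of B.
  static void vector_to_stl(gene_vector& B, stl_vector& B_stl) {
    for (std::size_t i = 0; i < B_stl.size(); ++i) B_stl[i] = B[i];
  }

  static void free_vector(gene_vector& B) {
    delete[] B;
    B = 0;
  }

  // Y = coef * X + Y
  static void axpy(real coef, const gene_vector& X, gene_vector& Y, int N) {
    for (int i = 0; i < N; ++i) Y[i] += coef * X[i];
  }
};

// Hypothetical action, templated with an interface.
template <class Interface>
class sketch_axpy {
 public:
  typedef typename Interface::real_type real_type;
  typedef typename Interface::stl_vector stl_vector;
  typedef typename Interface::gene_vector gene_vector;

  // Ctor using the size of the problem as an argument.
  explicit sketch_axpy(int size)
      : size_(size), coef_(2), x_stl_(size, real_type(1)),
        y_stl_(size, real_type(3)), x_(0), y_(0) {
    Interface::vector_from_stl(x_, x_stl_);
  }

  ~sketch_axpy() {
    Interface::free_vector(x_);
    Interface::free_vector(y_);
  }

  static std::string name() { return "axpy_" + Interface::name(); }

  // Assumed operation count: one multiply and one add per element.
  double nb_op_base() const { return 2.0 * size_; }

  // Resets Y to its initial content; must be called before calculate().
  void initialize() {
    Interface::free_vector(y_);
    Interface::vector_from_stl(y_, y_stl_);
  }

  // The operation a benchmark driver would time.
  void calculate() { Interface::axpy(coef_, x_, y_, size_); }

  // Copies the current Y back into an STL vector (helper for checking only).
  stl_vector result() {
    stl_vector out(size_);
    Interface::vector_to_stl(y_, out);
    return out;
  }

 private:
  sketch_axpy(const sketch_axpy&);  // not copyable: the class owns raw arrays
  sketch_axpy& operator=(const sketch_axpy&);

  int size_;
  real_type coef_;
  stl_vector x_stl_, y_stl_;
  gene_vector x_, y_;
};

// Runs one initialize/calculate cycle and returns the first result entry.
inline double sketch_run(int size) {
  assert(size > 0);
  sketch_axpy<sketch_interface<double> > action(size);
  action.initialize();
  action.calculate();
  return action.result()[0];
}
```

  Calling sketch_run(4) from a main function exercises the ctor, initialize and calculate in
  the order a benchmark driver would use them. With X filled with 1, Y filled with 3 and a
  coefficient of 2, every entry of the result is 5. A real BTL interface would also provide the
  matrix types and methods listed above.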