Bench Template Library

****************************************
Introduction :

The aim of this project is to compare the performance
of available numerical libraries. The code is designed
to be as generic and modular as possible. Thus, adding new
numerical libraries or new numerical tests should
require minimal effort.


*****************************************

Installation :

BTL uses cmake / ctest:

1 - create a build directory:

  $ mkdir build
  $ cd build

2 - configure:

  $ ccmake ..

3 - run the bench using ctest:

  $ ctest -V

You can run the benchmarks only on libraries matching a given regular expression:
  ctest -V -R <regexp>
For instance:
  ctest -V -R eigen2

You can also select a given set of actions by defining the environment variable BTL_CONFIG this way:
  BTL_CONFIG="-a action1{:action2}*" ctest -V
An example:
  BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata" ctest -V -R eigen2

Finally, if bench results already exist (the bench*.dat files), the new results are merged with them by keeping the best value for each matrix size. If you want to overwrite the previous ones, simply add the "--overwrite" option:
  BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata --overwrite" ctest -V -R eigen2

4 - analyze the results. Different data files (.dat) are produced in each library's directory under libs.
 If gnuplot is available, choose a directory name in the data directory to store the results and type:
        $ cd data
        $ mkdir my_directory
        $ cp ../libs/*/*.dat my_directory
 Build the data utilities in this (data) directory:
        $ make
 Then you can look at the raw data:
        $ go_mean my_directory
 or smooth the data first:
        $ smooth_all.sh my_directory
        $ go_mean my_directory_smooth


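The merge policy for existing bench*.dat files described in step 3 can be pictured with the
following sketch. It is not BTL's actual code: the Results type and the merge_keep_best
function are hypothetical, and "best" is assumed to mean the highest mflops value.

```cpp
#include <algorithm>
#include <map>

// Hypothetical representation of one result file:
// matrix size -> mflops measured for that size.
using Results = std::map<int, double>;

// Sketch of the merge policy (assumption: "best" is the highest mflops).
// Sizes present in only one of the two runs are kept unchanged.
Results merge_keep_best(const Results& previous, const Results& current) {
  Results merged = previous;
  for (const auto& entry : current) {
    auto it = merged.find(entry.first);
    if (it == merged.end()) {
      merged.insert(entry);  // size seen for the first time
    } else {
      it->second = std::max(it->second, entry.second);  // keep the best value
    }
  }
  return merged;
}
```

With "--overwrite", the equivalent behaviour would simply be to keep the current results and
discard the previous ones.
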
*************************************************

Files and directories :

 generic_bench : all the bench sources common to all libraries

 actions : sources for the different action wrappers (axpy, matrix-matrix product) to be tested

 libs/* : bench sources specific to each tested library

 machine_dep : directory used to store the machine-specific Makefile.in

 data : directory used to store gnuplot scripts and data analysis utilities

**************************************************

Principles : the code modularity is achieved by defining two concepts :

 ****** Action concept : This is a class defining which kind
  of test must be performed (e.g. a matrix_vector_product).
  An Action should define the following methods :

        *** Ctor using the size of the problem (matrix or vector size) as an argument
            Action action(size);
        *** initialize() : this method initializes the calculation (e.g. initializes the matrix and vector arguments)
            action.initialize();
        *** calculate() : this method actually launches the calculation to be benchmarked
            action.calculate();
        *** nb_op_base() : this method returns the complexity of the calculate method (allowing the mflops evaluation)
        *** name() : this method returns the name of the action (std::string)

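 As an illustration, a class providing the methods listed above could look like the sketch below.
 The class name Axpy_Action_Sketch, its members and the result() accessor are hypothetical, and
 it works directly on std::vector for brevity, whereas a real BTL action is templated with an
 interface.

```cpp
#include <string>
#include <vector>

// Hypothetical action computing y += coef * x on std::vector data.
class Axpy_Action_Sketch {
 public:
  // Ctor using the size of the problem as an argument.
  explicit Axpy_Action_Sketch(int size)
      : size_(size), coef_(1.5), x_(size), y_(size) {}

  // Resets the operands so that each run starts from the same data.
  void initialize() {
    for (int i = 0; i < size_; ++i) {
      x_[i] = i;
      y_[i] = 1.0;
    }
  }

  // The calculation to be benchmarked.
  void calculate() {
    for (int i = 0; i < size_; ++i) y_[i] += coef_ * x_[i];
  }

  // One multiplication and one addition per element.
  double nb_op_base() const { return 2.0 * size_; }

  static std::string name() { return "axpy_sketch"; }

  // Not part of the concept: exposes y so the result can be checked.
  const std::vector<double>& result() const { return y_; }

 private:
  int size_;
  double coef_;
  std::vector<double> x_;
  std::vector<double> y_;
};
```
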
 ****** Interface concept : This is a class or namespace defining how to use a given library and
  its specific containers (matrix and vector). Up to now an interface should define the following types

        *** real_type : kind of float to be used (float or double)
        *** stl_vector : must correspond to std::vector<real_type>
        *** stl_matrix : must correspond to std::vector<stl_vector>
        *** gene_vector : the vector type for this interface        --> e.g. (real_type *) for the C_interface
        *** gene_matrix : the matrix type for this interface        --> e.g. (gene_vector *) for the C_interface

        + the following common methods

        *** free_matrix(gene_matrix & A, int N)  deallocation of an N sized gene_matrix A
        *** free_vector(gene_vector & B)  deallocation of a gene_vector B
        *** matrix_from_stl(gene_matrix & A, stl_matrix & A_stl) copy the content of an stl_matrix A_stl into a gene_matrix A.
             The allocation of A is done in this function.
        *** vector_from_stl(gene_vector & B, stl_vector & B_stl)  copy the content of an stl_vector B_stl into a gene_vector B.
             The allocation of B is done in this function.
        *** matrix_to_stl(gene_matrix & A, stl_matrix & A_stl) copy the content of a gene_matrix A into an stl_matrix A_stl.
             The size of A_stl must correspond to the size of A.
        *** vector_to_stl(gene_vector & B, stl_vector & B_stl) copy the content of a gene_vector B into an stl_vector B_stl.
             The size of B_stl must correspond to the size of B.
        *** copy_matrix(gene_matrix & source, gene_matrix & cible, int N) : copy the content of source into cible (the target).
             Both source and cible must be sized NxN.
        *** copy_vector(gene_vector & source, gene_vector & cible, int N) : copy the content of source into cible (the target).
             Both source and cible must be sized N.

        and the following methods corresponding to the actions one wants to benchmark :

        ***  matrix_vector_product(const gene_matrix & A, const gene_vector & B, gene_vector & X, int N)
        ***  matrix_matrix_product(const gene_matrix & A, const gene_matrix & B, gene_matrix & X, int N)
        ***  ata_product(const gene_matrix & A, gene_matrix & X, int N)
        ***  aat_product(const gene_matrix & A, gene_matrix & X, int N)
        ***  axpy(real coef, const gene_vector & X, gene_vector & Y, int N)

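 To make the Interface concept more concrete, here is a partial sketch built on raw C arrays.
 It is not one of the interfaces shipped with BTL: the name Raw_Array_Interface_Sketch is
 hypothetical, only a subset of the methods is shown, and storing the matrix as one array per
 row (A[i] is row i) is an assumption made for the example. The function that copies an
 stl_vector into a gene_vector is called vector_from_stl here, by analogy with matrix_from_stl.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical interface sketch; "real" is float or double.
template <class real>
struct Raw_Array_Interface_Sketch {
  typedef real real_type;
  typedef std::vector<real> stl_vector;
  typedef std::vector<stl_vector> stl_matrix;
  typedef real* gene_vector;
  typedef gene_vector* gene_matrix;  // assumption: A[i] points to row i

  static void free_vector(gene_vector& B) {
    delete[] B;
    B = nullptr;
  }

  static void free_matrix(gene_matrix& A, int N) {
    for (int i = 0; i < N; ++i) delete[] A[i];
    delete[] A;
    A = nullptr;
  }

  // Allocates B and copies B_stl into it.
  static void vector_from_stl(gene_vector& B, stl_vector& B_stl) {
    B = new real[B_stl.size()];
    for (std::size_t i = 0; i < B_stl.size(); ++i) B[i] = B_stl[i];
  }

  // Copies B into B_stl, which must already have the right size.
  static void vector_to_stl(gene_vector& B, stl_vector& B_stl) {
    for (std::size_t i = 0; i < B_stl.size(); ++i) B_stl[i] = B[i];
  }

  // Allocates A and copies A_stl into it, row by row.
  static void matrix_from_stl(gene_matrix& A, stl_matrix& A_stl) {
    A = new gene_vector[A_stl.size()];
    for (std::size_t i = 0; i < A_stl.size(); ++i) vector_from_stl(A[i], A_stl[i]);
  }

  // X = A * B for an NxN matrix A.
  static void matrix_vector_product(const gene_matrix& A, const gene_vector& B,
                                    gene_vector& X, int N) {
    for (int i = 0; i < N; ++i) {
      real sum = 0;
      for (int j = 0; j < N; ++j) sum += A[i][j] * B[j];
      X[i] = sum;
    }
  }

  // Y = Y + coef * X.
  static void axpy(real coef, const gene_vector& X, gene_vector& Y, int N) {
    for (int i = 0; i < N; ++i) Y[i] += coef * X[i];
  }
};
```

 An action templated with such an interface would call vector_from_stl / matrix_from_stl in its
 initialization and one of the product methods in calculate().
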
 The bench algorithm (generic_bench/bench.hh) is templated with an action, itself templated with
 an interface. A typical main.cpp source stored in a given library directory libs/A_LIB
 looks like :

 bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

 This function will produce an XY data file containing the measured mflops as a function of the size for 50
 sizes between 10 and 1000.

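 Each point of the data file pairs a problem size with a mflops value. A minimal sketch of how
 such a value can be derived from the operation count returned by nb_op_base() and a measured
 time is given below. The helper name mflops is hypothetical and the exact formula used by BTL
 is not shown in this README.

```cpp
// Hypothetical helper: converts the operation count of one calculate()
// call and its duration in seconds into millions of floating point
// operations per second.
double mflops(double nb_op_base, double seconds) {
  return nb_op_base / (seconds * 1e6);
}
```

 For instance, an action performing 2e6 operations in one second runs at 2 mflops.
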
 This algorithm can be adapted by providing a given Perf_Analyzer object which determines how the time
 measurements must be done. For example, the X86_Perf_Analyzer uses the rdtsc assembly instruction and provides
 a very fast and accurate (but less portable) timing method. The default is the Portable_Perf_Analyzer,
 so

 bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

 is equivalent to

 bench< Portable_Perf_Analyzer,AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

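 The idea of a portable timing method can be sketched with the standard <chrono> clock. This is
 not the code of Portable_Perf_Analyzer, whose interface is not described here: the function
 time_calculate and its nb_calc parameter are hypothetical, and it only assumes an action
 providing initialize() and calculate().

```cpp
#include <chrono>

// Hypothetical portable timer: runs action.calculate() nb_calc times
// and returns the mean duration of one call, in seconds.
template <class Action>
double time_calculate(Action& action, int nb_calc) {
  action.initialize();
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < nb_calc; ++i) action.calculate();
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count() / nb_calc;
}
```

 Repeating the calculation several times averages out the resolution of the clock, which matters
 for small problem sizes where a single call is very short.
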
 If your system supports it, we suggest using a mixed implementation (X86_Perf_Analyzer + Portable_Perf_Analyzer):
 replace
     bench<Portable_Perf_Analyzer,Action>(size_min,size_max,nb_point);
 with
     bench<Mixed_Perf_Analyzer,Action>(size_min,size_max,nb_point);
 in generic_bench/bench.hh