BTL++: From Performance Assessment to Optimal Libraries

  • Laurent Plagne
  • Frank Hülsemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5103)


This paper presents the Benchmark Template Library in C++, in short BTL++, which is a flexible framework to assess the run time of user-defined computational kernels. When the same kernel is implemented in several different ways, the collected performance data can be used to automatically construct an interface library that dispatches a function call to the fastest variant available.

The benchmark examples in this article are mostly functions from the dense linear algebra BLAS API. However, BTL++ can be applied to any kernel that can be called by a function from a C++ main program. Within the same framework, we are able to compare different implementations of the operations to be benchmarked, ranging from libraries such as ATLAS, through procedural solutions in Fortran and C, to more recent C++ libraries with a higher level of abstraction. Results of single-threaded and multi-threaded computations are included.
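The idea of timing several variants of one kernel and then routing calls to the fastest one can be illustrated with a minimal sketch. This is not the BTL++ API: the names `AxpyKernel`, `axpy_loop`, `axpy_unrolled`, `time_kernel`, `fastest_variant` and `AxpyDispatcher` are hypothetical, the kernel is the BLAS-style axpy operation (y ← a·x + y) chosen only as an example, and the selection here happens at run time for a single problem size, which is an assumption and may differ from how the generated interface library works.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical kernel signature: y <- a*x + y (the BLAS "axpy" operation).
using AxpyKernel =
    std::function<void(double, const std::vector<double>&, std::vector<double>&)>;

// Variant 1: plain indexed loop.
void axpy_loop(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
}

// Variant 2: the same operation, manually unrolled by four.
void axpy_unrolled(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; ++i) y[i] += a * x[i];  // remainder elements
}

// Best wall-clock time (seconds) of one kernel over `repeats` runs at size n.
double time_kernel(const AxpyKernel& kernel, std::size_t n, int repeats) {
    std::vector<double> x(n, 1.0), y(n, 2.0);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        kernel(0.5, x, y);
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}

// Name of the variant with the smallest measured time.
std::string fastest_variant(const std::map<std::string, AxpyKernel>& variants,
                            std::size_t n, int repeats) {
    if (variants.empty()) throw std::invalid_argument("no variants to benchmark");
    std::string best_name;
    double best_time = std::numeric_limits<double>::infinity();
    for (const auto& entry : variants) {
        const double t = time_kernel(entry.second, n, repeats);
        if (best_name.empty() || t < best_time) {
            best_time = t;
            best_name = entry.first;
        }
    }
    return best_name;
}

// Benchmarks all variants once at construction, then forwards every call
// to the variant that was fastest in that measurement.
class AxpyDispatcher {
public:
    AxpyDispatcher(const std::map<std::string, AxpyKernel>& variants,
                   std::size_t n, int repeats)
        : chosen_(fastest_variant(variants, n, repeats)),
          kernel_(variants.at(chosen_)) {}

    void operator()(double a, const std::vector<double>& x,
                    std::vector<double>& y) const {
        kernel_(a, x, y);
    }

    const std::string& chosen() const { return chosen_; }

private:
    std::string chosen_;  // declared first so it is initialised before kernel_
    AxpyKernel kernel_;
};
```

A caller would build a map such as `{{"loop", axpy_loop}, {"unrolled", axpy_unrolled}}`, construct an `AxpyDispatcher` with a representative size and repeat count, and then invoke the dispatcher like an ordinary function. Which variant wins depends on the compiler, flags and machine, so the sketch makes no claim about the outcome.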


Keywords (added by machine, not by the authors): Computational Action; Computational Kernel; Expression Template; Performance Evaluation Method; High Performance Computing Application



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Laurent Plagne (1)
  • Frank Hülsemann (1)
  1. EDF R&D, Clamart, France
