Abstract
High-performance Basic Linear Algebra Subprograms (BLAS) routines are frequently required in numerical computation. We have implemented Dynamically Load-balanced BLAS (DL-BLAS) to improve BLAS performance on multi-core CPUs when other tasks are competing for CPU resources. DL-BLAS tiles matrices into submatrices to construct subtasks and assigns these subtasks to CPU cores dynamically. We found that the dimensions of the submatrices affect performance, so achieving high performance requires solving an optimization problem whose variables are the submatrix dimensions. The search space of this problem is so vast that an exhaustive search is unrealistic. We propose an autotuning search algorithm that consists of Diagonal Search, Reductive Search, and Parameter Selection. The proposed algorithm provides semi-optimal parameters in realistic computing time, and in most cases the parameters it finds give the best performance. As a result, in several performance evaluation tests, DL-BLAS achieved higher performance than ATLAS or GotoBLAS.
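The tiling and dynamic assignment described above can be illustrated with a minimal sketch. It is not the DL-BLAS implementation: the function name `tiled_matmul`, the parameters `tile_m` and `tile_n`, and the use of a shared task queue are assumptions made for illustration only. Each subtask computes one tile of the result, and idle workers pull the next tile from the queue, so a worker slowed down by other load simply takes fewer tiles. Python threads do not run this arithmetic in parallel, so the sketch shows only the scheduling structure, not the performance behavior.

```python
import queue
import threading


def tiled_matmul(a, b, tile_m, tile_n, workers=4):
    """Compute a (m x k) times b (k x n) with one subtask per output tile.

    tile_m and tile_n are hypothetical names for the submatrix dimensions
    that an autotuner would choose; they are not taken from DL-BLAS.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]

    # Build the subtasks: each one is the index range of one output tile.
    # Tiles on the right and bottom edges may be smaller than tile_m x tile_n.
    tasks = queue.Queue()
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            tasks.put((i0, min(i0 + tile_m, m), j0, min(j0 + tile_n, n)))

    def worker():
        # Dynamic assignment: a worker takes the next tile whenever it is
        # free, instead of receiving a fixed share of tiles up front.
        while True:
            try:
                i0, i1, j0, j1 = tasks.get_nowait()
            except queue.Empty:
                return
            # Output tiles are disjoint, so no lock is needed on c.
            for i in range(i0, i1):
                for j in range(j0, j1):
                    c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c
```

Calling `tiled_matmul(a, b, 1, 1)` creates one subtask per output element, while a tile as large as the whole matrix creates a single subtask. The tile dimensions therefore trade scheduling overhead against load balance, which is the kind of parameter choice the autotuning search addresses.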
Acknowledgements
This study was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas, “New IT Infrastructure for the information-explosion Era”, by MEXT, Japan, and by the 4th IJARC Blue Sky program of Microsoft Corporation.
Copyright information
© 2011 Springer New York
About this chapter
Cite this chapter
Sawa, Y., Suda, R. (2011). Autotuning Method for Deciding Block Size Parameters in Dynamically Load-Balanced BLAS. In: Naono, K., Teranishi, K., Cavazos, J., Suda, R. (eds) Software Automatic Tuning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6935-4_3
DOI: https://doi.org/10.1007/978-1-4419-6935-4_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6934-7
Online ISBN: 978-1-4419-6935-4
eBook Packages: Engineering, Engineering (R0)