Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction
In this paper we present new hybrid CPU-GPU routines that accelerate the solution of linear systems with a band coefficient matrix by off-loading the bulk of the computation to the GPU and leveraging highly tuned BLAS implementations for the graphics processor. In our experiments with an nVidia S2070 GPU, the hybrid band solver based on the LU factorization attains speed-ups of up to 6× over the analogous CPU-only routines in Intel’s MKL. As a practical demonstration of these benefits, we plug the new CPU-GPU codes into a sparse matrix Lyapunov equation solver, obtaining a 3× acceleration in the solution of a large-scale benchmark arising in model reduction.
Keywords: Band linear systems, linear algebra, graphics processors (GPUs), high performance, control theory
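To make concrete what a band solver based on the LU factorization computes, the following is a minimal CPU-only sketch in pure Python. It is not the hybrid CPU-GPU routine described in the abstract: it is unblocked, uses no BLAS, performs no pivoting (so it assumes that no zero pivot arises, as for a diagonally dominant matrix), and stores the matrix densely for readability, whereas LAPACK's band routines use a compact band storage. The function name `band_lu_solve` and the parameters `kl` and `ku` (number of sub- and super-diagonals) are illustrative choices, not names taken from the paper.

```python
def band_lu_solve(A, b, kl, ku):
    """Solve A x = b for a band matrix A with kl sub- and ku super-diagonals.

    Illustrative sketch: unblocked LU without pivoting, dense row storage.
    Assumes every pivot is nonzero (e.g. A is diagonally dominant).
    """
    n = len(b)
    U = [row[:] for row in A]  # work on copies; the inputs are left untouched
    y = list(b)

    # Forward elimination: only rows and columns inside the band are visited,
    # so the cost grows as O(n * kl * ku) instead of O(n^3).
    for k in range(n - 1):
        last_row = min(k + kl, n - 1)
        last_col = min(k + ku, n - 1)
        for i in range(k + 1, last_row + 1):
            m = U[i][k] / U[k][k]  # multiplier (an entry of L)
            for j in range(k, last_col + 1):
                U[i][j] -= m * U[k][j]
            y[i] -= m * y[k]  # apply the same row operation to the right-hand side

    # Back substitution: without pivoting, U keeps ku super-diagonals.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i]
        for j in range(i + 1, min(i + ku, n - 1) + 1):
            s -= U[i][j] * x[j]
        x[i] = s / U[i][i]
    return x
```

For a tridiagonal system one would call `band_lu_solve(A, b, 1, 1)`. The inner update over rows `k+1..k+kl` and columns `k..k+ku` is the kind of operation that a blocked formulation turns into matrix-matrix products, which is where a tuned BLAS, on the CPU or the GPU, can do most of the work.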