Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction

  • Peter Benner
  • Ernesto Dufrechou
  • Pablo Ezzatti
  • Pablo Igounet
  • Enrique S. Quintana-Ortí
  • Alfredo Remón
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8584)


In this paper we present new hybrid CPU-GPU routines that accelerate the solution of linear systems with a band coefficient matrix by off-loading the major part of the computations to the GPU and leveraging highly tuned implementations of the BLAS for the graphics processor. Our experiments with an nVidia S2070 GPU report speed-ups of up to 6× for the hybrid band solver based on the LU factorization over the analogous CPU-only routines in Intel’s MKL. As a practical demonstration of these benefits, we plug the new CPU-GPU codes into a sparse matrix Lyapunov equation solver, showing a 3× acceleration in the solution of a large-scale benchmark arising in model reduction.
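To make the notion of a band LU solve concrete, the following is a minimal CPU-only sketch in pure Python. It is not the authors' hybrid CPU-GPU routine and not the blocked LAPACK/MKL algorithm: it is an unblocked LU factorization without pivoting, so it assumes a matrix for which no zero pivots arise (for example, a strictly diagonally dominant one). The function name `band_lu_solve` and the parameters `kl` and `ku` (numbers of sub- and superdiagonals) are chosen here for illustration only. The point is that every loop is restricted to the band, so the work grows roughly as n·kl·ku instead of n³.

```python
def band_lu_solve(a, b, kl, ku):
    """Solve a x = b, where a is an n-by-n band matrix (list of lists)
    with kl subdiagonals and ku superdiagonals.

    Illustrative assumption: no pivoting is performed, so the matrix
    must not produce zero pivots (e.g. strictly diagonally dominant).
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on copies, leave the inputs untouched
    x = list(b)

    # LU factorization, fused with the forward substitution on the
    # right-hand side. Only rows k+1..k+kl and columns k+1..k+ku are
    # touched, because everything outside the band is zero.
    for k in range(n):
        pivot = lu[k][k]
        last_row = min(k + kl, n - 1)
        last_col = min(k + ku, n - 1)
        for i in range(k + 1, last_row + 1):
            m = lu[i][k] / pivot      # multiplier, an entry of L
            lu[i][k] = m
            for j in range(k + 1, last_col + 1):
                lu[i][j] -= m * lu[k][j]
            x[i] -= m * x[k]

    # Back substitution with U; without pivoting, U keeps ku superdiagonals.
    for i in range(n - 1, -1, -1):
        s = x[i]
        for j in range(i + 1, min(i + ku, n - 1) + 1):
            s -= lu[i][j] * x[j]
        x[i] = s / lu[i][i]
    return x
```

For a tridiagonal system one would call `band_lu_solve(a, b, 1, 1)`. A production code would instead use compact band storage and partial pivoting, as the LAPACK band routines do.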


Band linear systems; linear algebra; graphics processors (GPUs); high performance; control theory





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Peter Benner (1)
  • Ernesto Dufrechou (2)
  • Pablo Ezzatti (2)
  • Pablo Igounet (2)
  • Enrique S. Quintana-Ortí (3)
  • Alfredo Remón (1)
  1. Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
  2. Instituto de Computación, Universidad de la República, Montevideo, Uruguay
  3. Dep. de Ingeniería y Ciencia de la Computación, Universidad Jaime I, Castellón, Spain
