Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU

  • Luc Buatois
  • Guillaume Caumon
  • Bruno Lévy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4782)


A wide class of geometry processing and PDE resolution methods needs to solve a linear system in which the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. GPUs, with their ever-growing parallel computing power, are a tempting resource for such numerical computations. New APIs (CTM from ATI and CUDA from NVIDIA) help by giving direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. However, existing GPU linear solvers are restricted to specific types of matrices, or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a general-purpose sparse linear solver that outperforms leading-edge CPU counterparts (MKL / ACML).
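To make the storage strategy named in the abstract more concrete, the following is a minimal CPU-side sketch of block compressed row storage (BCRS) and the matching matrix-vector product. It is not the authors' GPU implementation: the 2x2 block size, the function names `dense_to_bcrs` and `bcrs_spmv`, and the small test matrix are all assumptions chosen for illustration. The per-block-row accumulator only hints at register blocking, where partial sums for a block row would be kept in registers.

```python
"""Sketch of block compressed row storage (BCRS); names and sizes are hypothetical."""

BS = 2  # block size (assumption, chosen only for illustration)


def dense_to_bcrs(a, bs=BS):
    """Convert a dense square matrix (list of lists) to BCRS arrays.

    Returns (row_ptr, col_ind, blocks), where blocks[k] is a flat,
    row-major list of bs*bs values for the k-th stored block.
    """
    n = len(a)
    assert n % bs == 0, "matrix size must be a multiple of the block size"
    nb = n // bs
    row_ptr, col_ind, blocks = [0], [], []
    for bi in range(nb):
        for bj in range(nb):
            block = [a[bi * bs + i][bj * bs + j]
                     for i in range(bs) for j in range(bs)]
            if any(v != 0 for v in block):  # store only non-empty blocks
                col_ind.append(bj)
                blocks.append(block)
        row_ptr.append(len(col_ind))
    return row_ptr, col_ind, blocks


def bcrs_spmv(row_ptr, col_ind, blocks, x, bs=BS):
    """Compute y = A x for a matrix stored in BCRS."""
    nb = len(row_ptr) - 1
    y = [0.0] * (nb * bs)
    for bi in range(nb):
        # Accumulators for one block row; on a GPU these would sit in registers.
        acc = [0.0] * bs
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_ind[k]
            # One column-index lookup serves bs*bs matrix entries,
            # and the bs entries of x are reused across bs rows.
            xs = x[bj * bs:(bj + 1) * bs]
            blk = blocks[k]
            for i in range(bs):
                for j in range(bs):
                    acc[i] += blk[i * bs + j] * xs[j]
        y[bi * bs:(bi + 1) * bs] = acc
    return y


A = [
    [4.0, 1.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 2.0, 0.0, 5.0],
]
row_ptr, col_ind, blocks = dense_to_bcrs(A)
y = bcrs_spmv(row_ptr, col_ind, blocks, [1.0, 2.0, 3.0, 4.0])
```

Compared with plain compressed row storage, which keeps one column index per non-zero entry, this layout keeps one column index per block, so fewer index fetches are needed per floating-point operation. The cost is that zeros inside a partly filled block are stored explicitly.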


Memory Bandwidth · Graphic Card · Sparse Matrix · Assembly Code · Graphic Memory





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Luc Buatois (1)
  • Guillaume Caumon (2)
  • Bruno Lévy (3)
  1. Gocad Research Group, INRIA, Nancy Université, France
  2. ENSG/CRPG, Nancy Université, France
  3. ALICE - INRIA Lorraine, Nancy, France
