The Journal of Supercomputing, Volume 69, Issue 1, pp 200–224

Parallel sparse linear solver with GMRES method using minimization techniques of communications for GPU clusters

  • Lilia Ziane Khodja
  • Raphaël Couturier
  • Arnaud Giersch
  • Jacques M. Bahi


In this paper, we aim to exploit the computing power of a graphics processing unit (GPU) cluster to solve large sparse linear systems. We implement a parallel version of the generalized minimal residual (GMRES) iterative method using the Compute Unified Device Architecture (CUDA) programming language and the MPI parallel environment. The experiments show that a GPU cluster is more efficient than a CPU cluster. To optimize performance, we use a compressed storage format for the sparse vectors and hypergraph partitioning. These solutions improve the spatial and temporal locality of the data shared between the computing nodes of the GPU cluster.
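The idea behind a compressed format for the exchanged vector can be illustrated with a minimal sketch. It is not the authors' CUDA/MPI implementation: it is a pure-Python stand-in with no real communication, and it assumes a row-block distribution of the matrix in which each node owns a contiguous range of the vector. The function names (`needed_columns`, `pack`, `spmv_local`) and the 4×4 test matrix are hypothetical. Each node works out which vector entries owned by other nodes its local rows actually reference, and only those entries are packed into the message instead of the whole sub-vector.

```python
def needed_columns(local_rows, owned):
    """Global column indices used by the local rows but owned by another node.

    local_rows: list of rows, each a list of (global_column, value) pairs.
    owned: (lo, hi) half-open range of vector entries stored on this node.
    """
    lo, hi = owned
    cols = {j for row in local_rows for j, _ in row}
    return sorted(j for j in cols if not (lo <= j < hi))


def pack(x_local, offset, wanted):
    """Compressed message: only the values at the requested global indices."""
    return [x_local[j - offset] for j in wanted]


def spmv_local(local_rows, x_owned, owned, halo):
    """Local sparse matrix-vector product using owned entries plus the halo.

    halo maps a global index to the value received from its owner.
    """
    lo, hi = owned
    y = []
    for row in local_rows:
        s = 0.0
        for j, a in row:
            s += a * (x_owned[j - lo] if lo <= j < hi else halo[j])
        y.append(s)
    return y


# Hypothetical tridiagonal 4x4 system split over two "nodes" (2 rows each).
rows0 = [[(0, 4.0), (1, -1.0)], [(0, -1.0), (1, 4.0), (2, -1.0)]]
rows1 = [[(1, -1.0), (2, 4.0), (3, -1.0)], [(2, -1.0), (3, 4.0)]]
x0, x1 = [1.0, 2.0], [3.0, 4.0]          # node 0 owns x[0:2], node 1 owns x[2:4]

want0 = needed_columns(rows0, (0, 2))     # entries node 0 must receive
want1 = needed_columns(rows1, (2, 4))     # entries node 1 must receive

# Each owner packs only what the other node asked for (one value each here,
# instead of the full two-entry sub-vector).
halo0 = dict(zip(want0, pack(x1, 2, want0)))
halo1 = dict(zip(want1, pack(x0, 0, want1)))

y0 = spmv_local(rows0, x0, (0, 2), halo0)
y1 = spmv_local(rows1, x1, (2, 4), halo1)
```

In this toy case each message shrinks from two values to one. The saving grows with the sparsity of the matrix, and a partitioning that groups rows sharing the same columns reduces the number of requested indices further.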


Parallel GMRES · Cluster of GPUs · Communication reduction



This paper is based upon work supported by the Région de Franche-Comté and partially funded by the Labex ACTION program (contract ANR-11-LABX-01-01).



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Lilia Ziane Khodja (1)
  • Raphaël Couturier (1)
  • Arnaud Giersch (1)
  • Jacques M. Bahi (1)

  1. FEMTO-ST Institute, University of Franche-Comte, IUT Belfort-Montbéliard, Belfort, France
