Abstract
In this paper, we aim to exploit the computing power of a graphics processing unit (GPU) cluster for solving large sparse linear systems. We implement a parallel algorithm of the generalized minimal residual (GMRES) iterative method using the Compute Unified Device Architecture (CUDA) programming language and the MPI parallel environment. The experiments show that a GPU cluster is more efficient than a CPU cluster. To optimize performance, we use a compressed storage format for the sparse vectors and hypergraph partitioning. These solutions improve the spatial and temporal locality of the data shared between the computing nodes of the GPU cluster.
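As a rough illustration of the compressed storage idea mentioned above, the following Python sketch shows one plausible reading: a node that owns a block of matrix rows only needs the vector entries at the columns where those rows have nonzeros, so only those entries are packed and exchanged. This is not the authors' CUDA/MPI implementation; the function names (`needed_columns`, `pack`, `unpack`, `local_spmv`) and the dictionary-based row layout are assumptions made for the example.

```python
def needed_columns(rows):
    """Return the sorted column indices referenced by a local block of rows.

    `rows` is a list of dicts mapping column index -> nonzero value
    (a hypothetical layout chosen for readability, not a GPU format).
    """
    return sorted({col for row in rows for col in row})


def pack(x, indices):
    """Sender side: keep only the vector entries the receiver needs."""
    return [x[i] for i in indices]


def unpack(indices, values):
    """Receiver side: rebuild a sparse view of the vector as index -> value."""
    return dict(zip(indices, values))


def local_spmv(rows, x_sparse):
    """Multiply the local rows by the sparse view of the vector."""
    return [sum(val * x_sparse[col] for col, val in row.items()) for row in rows]


# Global vector of length 6; the local block touches only columns 0, 3 and 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rows = [{0: 2.0, 3: 1.0}, {3: 4.0, 5: -1.0}]

indices = needed_columns(rows)          # columns the local block depends on
packed = pack(x, indices)               # 3 values exchanged instead of 6
y = local_spmv(rows, unpack(indices, packed))
```

In this toy case the exchanged message shrinks from six values to three. A partitioning of the rows that reduces the number of distinct columns each node references would shrink it further, which is the role the abstract assigns to hypergraph partitioning.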
Acknowledgments
This paper is based upon work supported by the Région de Franche-Comté and partially funded by the Labex ACTION program (contract ANR-11-LABX-01-01).
Cite this article
Ziane Khodja, L., Couturier, R., Giersch, A. et al. Parallel sparse linear solver with GMRES method using minimization techniques of communications for GPU clusters. J Supercomput 69, 200–224 (2014). https://doi.org/10.1007/s11227-014-1143-8
DOI: https://doi.org/10.1007/s11227-014-1143-8