A communication reduction approach to iteratively solve large sparse linear systems on a GPGPU cluster

Chen, Chong; Taha, Tarek M.

doi:10.1007/s10586-013-0279-2

Published: 22 June 2013

Cluster Computing, Volume 17, pages 327–337 (2014)

Abstract

Finite Element Methods (FEM) are widely used in academia and industry, especially in mechanical, civil, aerospace, and electrical engineering. These methods usually convert partial differential equations into large sparse linear systems, and for complex problems, solving these systems is time-consuming. This paper presents a parallelized iterative solver for large sparse linear systems implemented on a GPGPU cluster. Solvers for these problems traditionally scale poorly on GPGPU clusters, so the paper presents an approach that reduces the communication between cluster compute nodes. In addition, computation and communication are overlapped to reduce the cost of data exchange. The parallelized system achieved a speedup of up to 15.3 times on 16 NVIDIA Tesla GPUs compared to a single GPU. The paper also presents an analytical evaluation of the algorithm, including equations for predicting its performance, and validates those equations.
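The abstract describes the approach only at a high level, so the following is a hedged, single-process sketch of the general idea, not the authors' implementation. It assumes an unpreconditioned Conjugate Gradient solver (the abstract does not name the Krylov method), a contiguous row-block partition across simulated nodes, and a 1D Poisson test matrix; the helpers `poisson_1d`, `partition`, `halo_indices`, `spmv_partitioned`, and `cg` are hypothetical names. Each node receives only the "halo" entries of the vector that its rows actually reference, instead of the whole vector, and splits its sparse matrix-vector product into an interior phase that needs no remote data and a boundary phase that does. That split is what would let computation overlap with data exchange on a real cluster.

```python
# Illustrative sketch only: CG, row-block partitioning, and the test matrix
# are assumptions; the paper's exact scheme is not given in the abstract.
from typing import List, Tuple

Row = List[Tuple[int, float]]  # one sparse row: (column index, value) pairs


def poisson_1d(n: int) -> List[Row]:
    """Tridiagonal 1D Poisson matrix (2 on the diagonal, -1 beside it); SPD."""
    rows: List[Row] = []
    for i in range(n):
        row: Row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


def partition(n: int, p: int) -> List[range]:
    """Split rows 0..n-1 into p contiguous blocks, one per simulated node."""
    base, extra = divmod(n, p)
    blocks, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def halo_indices(rows: List[Row], block: range) -> List[int]:
    """Off-node entries of x that this block's rows reference.
    Only these must be received, instead of the whole vector."""
    return sorted({c for i in block for c, _ in rows[i] if c not in block})


def spmv_partitioned(rows, blocks, halos, x):
    """y = A x, computed node by node in two phases."""
    y = [0.0] * len(x)
    for block, halo in zip(blocks, halos):
        # Phase 1 (interior): uses only entries of x owned by this node, so on
        # a real cluster it could run while the halo exchange is in flight.
        for i in block:
            y[i] = sum(v * x[c] for c, v in rows[i] if c in block)
        # Phase 2 (boundary): the simulated exchange copies only len(halo) values.
        received = {c: x[c] for c in halo}
        for i in block:
            y[i] += sum(v * received[c] for c, v in rows[i] if c not in block)
    return y


def dot(a, b):
    # On a cluster: local partial sums followed by an all-reduce of one scalar.
    return sum(ai * bi for ai, bi in zip(a, b))


def cg(rows, blocks, halos, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned Conjugate Gradient built on the partitioned SpMV."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = spmv_partitioned(rows, blocks, halos, p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Usage: 64 rows split across 4 simulated nodes.
n, nodes = 64, 4
A = poisson_1d(n)
blocks = partition(n, nodes)
halos = [halo_indices(A, blk) for blk in blocks]
print([len(h) for h in halos])  # [1, 2, 2, 1]
print(n - len(blocks[0]))       # 48 values per node with a full all-gather
x = cg(A, blocks, halos, [1.0] * n)
```

With 64 rows on 4 nodes, each node receives at most 2 vector entries per SpMV instead of the 48 it would need under a naive all-gather; how much this saves on real FEM matrices depends on the sparsity pattern and partitioning. For reference, the reported 15.3× speedup on 16 GPUs corresponds to a parallel efficiency of 15.3/16 ≈ 0.96.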

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Dayton, Dayton, OH, 45469, USA
Chong Chen & Tarek M. Taha

Corresponding author

Correspondence to Tarek M. Taha.

About this article

Cite this article

Chen, C., Taha, T.M. A communication reduction approach to iteratively solve large sparse linear systems on a GPGPU cluster. Cluster Comput 17, 327–337 (2014). https://doi.org/10.1007/s10586-013-0279-2

Received: 17 February 2013
Accepted: 13 May 2013
Published: 22 June 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10586-013-0279-2
