Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs

  • Conference paper
  • First Online:
Parallel Processing and Applied Mathematics (PPAM 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8384)

Abstract

The convergence of Krylov subspace methods is affected by round-off errors. Reducing these errors by using quadruple precision arithmetic instead of double precision can decrease the number of iterations required for convergence. We implemented the CG and BiCGStab methods using quadruple precision arithmetic and compared their performance with the standard double precision implementations on an NVIDIA Tesla K20X GPU. Our results show that in some cases the quadruple precision implementations outperform the double precision versions. We show that quadruple precision arithmetic is not costly for the CG and BiCGStab methods on GPUs and that it may be a more effective alternative to preconditioning.
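Although the abstract does not spell out the arithmetic, quadruple precision on GPUs is typically emulated in software as double-double arithmetic: each value is stored as an unevaluated sum of two doubles and manipulated with error-free transformations, in the spirit of Dekker's technique and the QD library. The CUDA sketch below only illustrates that idea under these assumptions; it is not the authors' implementation, and the names (dd_real, two_sum, dot_dd_single) are hypothetical.

// Minimal sketch (assumption, not the paper's code): quadruple precision
// emulated as double-double values, i.e. an unevaluated sum hi + lo of two
// doubles, built from error-free transformations.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

struct dd_real { double hi, lo; };   // value = hi + lo

// Error-free addition of two doubles (Knuth's two-sum).
__device__ inline dd_real two_sum(double a, double b) {
    double s = a + b;
    double v = s - a;
    double e = (a - (s - v)) + (b - v);
    return { s, e };
}

// Error-free multiplication of two doubles using fused multiply-add.
__device__ inline dd_real two_prod(double a, double b) {
    double p = a * b;
    double e = fma(a, b, -p);
    return { p, e };
}

// Double-double addition (roughly 32 significant decimal digits).
__device__ inline dd_real dd_add(dd_real x, dd_real y) {
    dd_real s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return two_sum(s.hi, s.lo);      // renormalize
}

// Hypothetical single-thread kernel: dot product accumulated in double-double.
// Inner products are where round-off accumulates in CG and BiCGStab; a real
// kernel would use a parallel reduction instead of one thread.
__global__ void dot_dd_single(const double *x, const double *y, int n,
                              dd_real *out) {
    dd_real acc = { 0.0, 0.0 };
    for (int i = 0; i < n; ++i)
        acc = dd_add(acc, two_prod(x[i], y[i]));
    *out = acc;
}

int main() {
    const int n = 1000;
    std::vector<double> hx(n, 1.0e-3), hy(n, 7.0);
    double *dx, *dy; dd_real *dres;
    cudaMalloc(&dx, n * sizeof(double));
    cudaMalloc(&dy, n * sizeof(double));
    cudaMalloc(&dres, sizeof(dd_real));
    cudaMemcpy(dx, hx.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    dot_dd_single<<<1, 1>>>(dx, dy, n, dres);
    dd_real res;
    cudaMemcpy(&res, dres, sizeof(dd_real), cudaMemcpyDeviceToHost);
    printf("dot = %.17g + %.17g\n", res.hi, res.lo);
    cudaFree(dx); cudaFree(dy); cudaFree(dres);
    return 0;
}

Here two_sum and two_prod capture the exact rounding error of each double operation, so the accumulated sum carries roughly twice the significand of a plain double; this is the kind of round-off reduction that can lower the iteration count of CG and BiCGStab.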



Acknowledgment

This research was supported by JST, CREST.

Author information

Corresponding author

Correspondence to Daichi Mukunoki.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mukunoki, D., Takahashi, D. (2014). Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2013. Lecture Notes in Computer Science, vol. 8384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55224-3_59

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-55224-3_59

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55223-6

  • Online ISBN: 978-3-642-55224-3

  • eBook Packages: Computer Science, Computer Science (R0)
