Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs

  • Conference paper
  • First Online:
Parallel Processing and Applied Mathematics (PPAM 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8384)

Abstract

The convergence of Krylov subspace methods is affected by round-off errors. Reducing these errors by using quadruple precision arithmetic instead of double precision can decrease the number of iterations required for convergence. We implemented the CG and BiCGStab methods using quadruple precision arithmetic and compared their performance with the standard double precision implementations on an NVIDIA Tesla K20X GPU. Our results show that in some cases the quadruple precision implementations outperform the double precision versions. We show that quadruple precision arithmetic is not costly for the CG and BiCGStab methods on GPUs and that it may be a more effective alternative to preconditioning.
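Although the abstract does not spell out the arithmetic, quadruple precision on GPUs is typically emulated in software as double-double arithmetic: each value is stored as an unevaluated sum of two doubles and manipulated with error-free transformations, in the spirit of Dekker's technique and the QD library. The CUDA sketch below only illustrates that idea under these assumptions; it is not the authors' implementation, and the names (dd_real, two_sum, dot_dd_single) are hypothetical.

// Minimal sketch (assumption, not the paper's code): quadruple precision
// emulated as double-double values, i.e. an unevaluated sum hi + lo of two
// doubles, built from error-free transformations.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

struct dd_real { double hi, lo; };   // value = hi + lo

// Error-free addition of two doubles (Knuth's two-sum).
__device__ inline dd_real two_sum(double a, double b) {
    double s = a + b;
    double v = s - a;
    double e = (a - (s - v)) + (b - v);
    return { s, e };
}

// Error-free multiplication of two doubles using fused multiply-add.
__device__ inline dd_real two_prod(double a, double b) {
    double p = a * b;
    double e = fma(a, b, -p);
    return { p, e };
}

// Double-double addition (roughly 32 significant decimal digits).
__device__ inline dd_real dd_add(dd_real x, dd_real y) {
    dd_real s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return two_sum(s.hi, s.lo);      // renormalize
}

// Hypothetical single-thread kernel: dot product accumulated in double-double.
// Inner products are where round-off accumulates in CG and BiCGStab; a real
// kernel would use a parallel reduction instead of one thread.
__global__ void dot_dd_single(const double *x, const double *y, int n,
                              dd_real *out) {
    dd_real acc = { 0.0, 0.0 };
    for (int i = 0; i < n; ++i)
        acc = dd_add(acc, two_prod(x[i], y[i]));
    *out = acc;
}

int main() {
    const int n = 1000;
    std::vector<double> hx(n, 1.0e-3), hy(n, 7.0);
    double *dx, *dy; dd_real *dres;
    cudaMalloc(&dx, n * sizeof(double));
    cudaMalloc(&dy, n * sizeof(double));
    cudaMalloc(&dres, sizeof(dd_real));
    cudaMemcpy(dx, hx.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    dot_dd_single<<<1, 1>>>(dx, dy, n, dres);
    dd_real res;
    cudaMemcpy(&res, dres, sizeof(dd_real), cudaMemcpyDeviceToHost);
    printf("dot = %.17g + %.17g\n", res.hi, res.lo);
    cudaFree(dx); cudaFree(dy); cudaFree(dres);
    return 0;
}

Here two_sum and two_prod capture the exact rounding error of each double operation, so the accumulated sum carries roughly twice the significand of a plain double; this is the kind of round-off reduction that can lower the iteration count of CG and BiCGStab.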



Acknowledgment

This research was supported by JST, CREST.

Author information

Corresponding author

Correspondence to Daichi Mukunoki.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mukunoki, D., Takahashi, D. (2014). Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2013. Lecture Notes in Computer Science, vol. 8384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55224-3_59

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-55224-3_59

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55223-6

  • Online ISBN: 978-3-642-55224-3

  • eBook Packages: Computer Science, Computer Science (R0)
