GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement

  • Hartwig Anzt
  • Piotr Luszczek
  • Jack Dongarra
  • Vincent Heuveline
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7484)


In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that can be used to leverage the computing power of SIMD accelerators such as GPUs in the iterative solution of linear systems of equations. Although they pursue this goal in very different ways, they share the basic idea of compensating for the weaker convergence properties of a numerically inferior algorithm through more efficient use of the available computing power. In this paper, we analyze the potential of combining both techniques. To this end, we derive a mixed precision iterative refinement algorithm that uses a block-asynchronous iteration as the error correction solver, and we compare its performance with a pure block-asynchronous iteration and with an iterative refinement method that uses double precision for the error correction solver. For matrices from the University of Florida Matrix Collection, we report the convergence behaviour and the total solver runtime on different GPU architectures.
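The combination described above can be sketched as a minimal, sequential Python emulation. This is an illustration under stated assumptions, not the authors' GPU implementation: single precision is emulated by rounding every intermediate result through `struct`, the block-asynchronous solver is emulated by visiting the blocks in a random order (each block reading whatever values are currently available and then running a few local Jacobi sweeps), and the helper names (`block_async_solve`, `mixed_precision_ir`), block size, iteration counts and tolerance are hypothetical choices.

```python
import random
import struct


def f32(v: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def block_async_solve(A, r, block_size=2, global_iters=5, local_iters=2, seed=0):
    """Approximately solve A c = r in emulated single precision.

    Hypothetical sequential emulation of block-asynchronous relaxation (not the
    authors' GPU kernel): each global iteration visits the blocks in a random
    order; a block reads the current values of c for off-block components and
    then performs a few local Jacobi sweeps on its own components.
    """
    n = len(r)
    rng = random.Random(seed)
    As = [[f32(a) for a in row] for row in A]  # single-precision copy of A
    rs = [f32(v) for v in r]                   # single-precision right-hand side
    c = [0.0] * n
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    for _ in range(global_iters):
        order = blocks[:]
        rng.shuffle(order)  # unpredictable block order, mimicking GPU scheduling
        for blk in order:
            in_blk = set(blk)
            # Off-block contributions are read once and frozen for the local sweeps.
            off = {}
            for i in blk:
                s = rs[i]
                for j in range(n):
                    if j not in in_blk:
                        s = f32(s - f32(As[i][j] * c[j]))
                off[i] = s
            for _ in range(local_iters):
                # Local Jacobi sweep: every new value uses the old block values.
                new = {}
                for i in blk:
                    s = off[i]
                    for j in blk:
                        if j != i:
                            s = f32(s - f32(As[i][j] * c[j]))
                    new[i] = f32(s / As[i][i])
                for i, v in new.items():
                    c[i] = v
    return c


def residual(A, x, b):
    """r = b - A x, computed in double precision."""
    return [b[i] - sum(a * xj for a, xj in zip(A[i], x)) for i in range(len(b))]


def norm(v):
    """Maximum norm."""
    return max(abs(t) for t in v)


def mixed_precision_ir(A, b, tol=1e-12, max_outer=50):
    """Iterative refinement: double-precision residuals and solution updates,
    single-precision block-asynchronous error correction."""
    x = [0.0] * len(b)
    bnorm = norm(b)
    for k in range(max_outer):
        r = residual(A, x, b)                  # residual in double precision
        if norm(r) <= tol * bnorm:
            return x, k                        # converged after k corrections
        c = block_async_solve(A, r, seed=k)    # inexact correction in single precision
        x = [xi + ci for xi, ci in zip(x, c)]  # update the solution in double precision
    return x, max_outer


if __name__ == "__main__":
    # Small diagonally dominant tridiagonal test system (chosen for illustration).
    n = 8
    A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
         for i in range(n)]
    b = [float(i + 1) for i in range(n)]
    x, outer = mixed_precision_ir(A, b)
    print(outer, norm(residual(A, x, b)))
```

The outer loop only requires the inner solver to reduce the error by some constant factor; the attainable accuracy is governed by the double-precision residual and update. This is why an inexpensive, low-precision, nondeterministic error correction solver can still yield a double-precision result, provided the system is well enough conditioned and the asynchronous iteration converges (as it does here for a diagonally dominant matrix).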


Keywords: mixed precision iterative refinement, block-asynchronous iteration, GPU, linear system, relaxation





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hartwig Anzt (1)
  • Piotr Luszczek (2)
  • Jack Dongarra (2, 3, 4)
  • Vincent Heuveline (1)

  1. Karlsruhe Institute of Technology, Karlsruhe, Germany
  2. University of Tennessee, Knoxville, USA
  3. Oak Ridge National Laboratory, Oak Ridge, USA
  4. University of Manchester, Manchester, UK
