Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems

  • Hartwig Anzt
  • Stanimire Tomov
  • Jack Dongarra
  • Vincent Heuveline
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7640)


In this paper, we analyze the potential of using weights in block-asynchronous relaxation methods on GPUs. For this purpose, we introduce different weighting techniques similar to those applied in block smoothers for multigrid methods. For test matrices taken from the University of Florida Matrix Collection, we report the convergence behavior and the total runtime for the different techniques. Analyzing the results, we observe that using weights may accelerate the convergence rate of block-asynchronous iteration considerably. While component-wise relaxation methods are seldom applied directly to systems of linear equations, they often make an important contribution to finite element solvers when used as smoothers in a multigrid framework. Since the parallelization potential of classical smoothers like SOR and Gauss-Seidel is usually very limited, replacing them with weighted block-asynchronous smoothers may be beneficial to the overall multigrid performance. Due to the increasing heterogeneity of today's architecture designs, the significance of and the need for highly parallel asynchronous smoothers are expected to grow.
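To illustrate the basic building block the abstract refers to, the following is a minimal serial sketch of a weighted block-Jacobi sweep. It is a hypothetical emulation, not the paper's GPU implementation: on a GPU each block would be relaxed by a separate thread block that may read stale values of the other blocks (the "asynchronous" aspect), whereas here blocks are swept in order against a fixed copy of the previous iterate. The function name, dense-matrix representation, and parameters are illustrative assumptions.

```python
# Hedged sketch of a weighted (damped) block-Jacobi sweep, the building
# block of block-asynchronous relaxation. A is a dense list-of-lists
# matrix here purely for simplicity; the paper targets sparse systems.
def weighted_block_jacobi(A, b, x, block_size, omega, sweeps):
    n = len(b)
    for _ in range(sweeps):
        # Values visible to all blocks during this sweep. On a GPU these
        # reads could be arbitrarily stale; here they are one sweep old.
        x_old = list(x)
        for start in range(0, n, block_size):
            end = min(start + block_size, n)
            for i in range(start, end):
                # Off-diagonal contribution using the (possibly stale) iterate.
                s = sum(A[i][j] * x_old[j] for j in range(n) if j != i)
                x_jac = (b[i] - s) / A[i][i]
                # Weighted update: omega = 1 recovers plain Jacobi,
                # omega < 1 damps the correction (useful as a smoother).
                x[i] = (1.0 - omega) * x_old[i] + omega * x_jac
    return x
```

For example, on the 1D Poisson matrix tridiag(-1, 2, -1) with right-hand side b = A·1, the iteration converges to the all-ones vector for any 0 < omega ≤ 1, since the damped Jacobi iteration matrix has spectral radius below one for this system.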


Keywords: asynchronous relaxation · weighted block-asynchronous iteration methods · multigrid smoother · GPU





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Hartwig Anzt (1)
  • Stanimire Tomov (2)
  • Jack Dongarra (2, 3, 4)
  • Vincent Heuveline (1)

  1. Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
  2. University of Tennessee, Knoxville, USA
  3. Oak Ridge National Laboratory, Oak Ridge, USA
  4. University of Manchester, Manchester, UK
