Iterative Sparse Triangular Solves for Preconditioning

  • Hartwig Anzt
  • Edmond Chow
  • Jack Dongarra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9233)


Sparse triangular solvers are typically parallelized using level-scheduling techniques, but parallel efficiency is poor on high-throughput architectures like GPUs. We propose an iterative approach for solving sparse triangular systems when an approximate solution is acceptable. This approach will not work for all problems, but it can be successful for sparse triangular matrices arising from incomplete factorizations, where an approximate solve suffices. We demonstrate the performance gains this approach can achieve on GPUs in the context of solving sparse linear systems with a preconditioned Krylov subspace method. We also illustrate the effect of using asynchronous iterations.
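The idea sketched in the abstract can be illustrated with a few lines of code: instead of an exact (inherently sequential) forward substitution, a lower triangular system L x = b is solved approximately with Jacobi sweeps, x_{k+1} = D^{-1}(b - (L - D) x_k), where each sweep is just a sparse matrix-vector product and hence highly parallel. This is a minimal sketch of the technique, not the authors' GPU implementation; the function name and the toy matrix are ours.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

def jacobi_triangular_solve(L, b, sweeps=4, x0=None):
    """Approximately solve the lower triangular system L x = b
    using Jacobi sweeps x_{k+1} = D^{-1} (b - (L - D) x_k).

    Each sweep costs one sparse matrix-vector product, which
    parallelizes well, unlike level-scheduled substitution.
    Because -D^{-1}(L - D) is strictly lower triangular (nilpotent),
    the iteration converges in at most n sweeps; for triangular
    factors from incomplete factorizations, a few sweeps often
    give a preconditioner of comparable quality."""
    d = L.diagonal()
    E = L - sp.diags(d)                      # strictly lower part of L
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(sweeps):
        x = (b - E @ x) / d                  # one parallel Jacobi sweep
    return x

# A small unit lower triangular factor, as would arise from an ILU(0)
# factorization (illustrative values only).
L = sp.csr_matrix(np.array([[1.0,  0.0,  0.0, 0.0],
                            [0.5,  1.0,  0.0, 0.0],
                            [0.0,  0.25, 1.0, 0.0],
                            [0.1,  0.0,  0.5, 1.0]]))
b = np.array([1.0, 2.0, 3.0, 4.0])

x_exact  = spsolve_triangular(L, b, lower=True)  # sequential exact solve
x_approx = jacobi_triangular_solve(L, b, sweeps=4)
```

For this 4x4 unit lower triangular factor, the iteration matrix is nilpotent of index at most 4, so four sweeps reproduce the exact solution; in practice the point is that far fewer sweeps than n already yield a usable approximate preconditioner application.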



This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Numbers DE-SC-0012538 and DE-SC-0010042. Support from NVIDIA is also gratefully acknowledged.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. University of Tennessee, Knoxville, USA
  2. Georgia Institute of Technology, Atlanta, USA
