A Comparison of Soft-Fault Error Models in the Parallel Preconditioned Flexible GMRES

  • Evan Coleman
  • Aygul Jamal
  • Marc Baboulin
  • Amal Khabou (email author)
  • Masha Sosonkina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10777)

Abstract

We evaluate the effect of two soft-fault error models on the convergence of the parallel flexible GMRES (FGMRES) iterative method when solving an elliptic PDE problem on a regular grid. We consider two types of preconditioners: an incomplete LU factorization with dual threshold (ILUT), and an algebraic recursive multilevel solver (ARMS) combined with random butterfly transformation (RBT). The experiments quantify the difference between the two soft-fault error models and compare their potential impact on convergence.

Keywords

Fault tolerance, Soft fault models, FGMRES, Parallel iterative linear solvers, Preconditioners, ARMS, ILUT, RBT randomization

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Evan Coleman (1, 2)
  • Aygul Jamal (3)
  • Marc Baboulin (3)
  • Amal Khabou (3), email author
  • Masha Sosonkina (2)

  1. Naval Surface Warfare Center - Dahlgren Division, Dahlgren, USA
  2. Old Dominion University, Norfolk, USA
  3. Université Paris-Sud, Université Paris-Saclay, Orsay, France
