Towards an Exascale Enabled Sparse Solver Repository

  • Conference paper
Software for Exascale Computing - SPPEXA 2013-2015

Abstract

As we approach the exascale computing era, disruptive changes in the software landscape are required to tackle the challenges posed by manycore CPUs and accelerators. We discuss the development of a new ‘exascale enabled’ sparse solver repository (the ESSR) that addresses these challenges—from fundamental design considerations and development processes to actual implementations of some prototypical iterative schemes for computing eigenvalues of sparse matrices. Key features of the ESSR include holistic performance engineering, tight integration between software layers and mechanisms to mitigate hardware failures.

Notes

  1. Equipping Sparse Solvers for the Exascale, http://blogs.fau.de/essex, funded by the priority program “Software for Exascale Computing” (SPPEXA) of the German Research Foundation (DFG)

  2. see, e.g., https://www.olcf.ornl.gov/summit/

  3. see http://bitbucket.org/essex

  4. see http://www.sppexa.de/

  5. https://github.com/google/googletest

  6. https://github.org/google/sanitizers

  7. https://doc.itc.rwth-aachen.de/display/CCP/Project+MUST

  8. http://docs.nvidia.com/cuda/cuda-memcheck/

  9. https://jenkins-ci.org

  10. http://www.cscs.ch/computers/pizdaint/index.html

  11. https://www.lrz.de/services/compute/supermuc/

Acknowledgements

This work was supported by the German Research Foundation (DFG) through the Priority Program 1648 “Software for Exascale Computing” under project ESSEX. We would like to thank Michael Meinel (DLR Simulation and Software Technology, software engineering group) for helpful comments on the manuscript.

Author information

Corresponding author

Correspondence to Jonas Thies.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Thies, J. et al. (2016). Towards an Exascale Enabled Sparse Solver Repository. In: Bungartz, HJ., Neumann, P., Nagel, W. (eds) Software for Exascale Computing - SPPEXA 2013-2015. Lecture Notes in Computational Science and Engineering, vol 113. Springer, Cham. https://doi.org/10.1007/978-3-319-40528-5_13
