ESSEX: Equipping Sparse Solvers for Exascale

  • Andreas Alvermann
  • Achim Basermann
  • Holger Fehske
  • Martin Galgon
  • Georg Hager
  • Moritz Kreutzer
  • Lukas Krämer
  • Bruno Lang
  • Andreas Pieper
  • Melven Röhrig-Zöllner
  • Faisal Shahzad
  • Jonas Thies
  • Gerhard Wellein
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8806)

Abstract

The ESSEX project investigates computational issues arising at exascale for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The project pursues a coherent co-design of all software layers, in which a holistic performance engineering process guides code development across the classic boundaries of application, numerical method, and basic kernel library. Within ESSEX, the numerical methods cover widely applicable solvers such as classic Krylov, Jacobi-Davidson, or the recent FEAST methods, as well as domain-specific iterative schemes relevant for the ESSEX quantum physics application. This report introduces the project structure and presents selected results that demonstrate the potential impact of ESSEX for efficient sparse solvers on highly scalable heterogeneous supercomputers.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Andreas Alvermann (1)
  • Achim Basermann (2)
  • Holger Fehske (1)
  • Martin Galgon (3)
  • Georg Hager (4)
  • Moritz Kreutzer (4)
  • Lukas Krämer (3)
  • Bruno Lang (3)
  • Andreas Pieper (1)
  • Melven Röhrig-Zöllner (2)
  • Faisal Shahzad (4)
  • Jonas Thies (2)
  • Gerhard Wellein (4)

  1. Ernst-Moritz-Arndt-Universität Greifswald, Greifswald, Germany
  2. German Aerospace Center, Köln, Germany
  3. Bergische Universität Wuppertal, Wuppertal, Germany
  4. Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
