Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects

Abstract

We first briefly report on the status and recent achievements of the ELPA-AEO (Eigenvalue soLvers for Petaflop Applications—Algorithmic Extensions and Optimizations) and ESSEX-II (Equipping Sparse Solvers for Exascale) projects. In both collaborative efforts, application scientists, mathematicians, and computer scientists work together to develop and make available efficient, highly parallel methods for solving eigenvalue problems. We then focus on a topic addressed in both projects: the use of mixed precision computations to enhance efficiency. We describe in more detail our approaches for benefiting from either lower or higher precision in three selected contexts, and the results obtained.

Acknowledgements

The authors thank the unknown referees for their valuable comments that helped to improve and clarify the presentation.

Author information

Corresponding author

Correspondence to Bruno Lang.

Additional information

This work has been supported by the Deutsche Forschungsgemeinschaft through the priority programme 1648 “Software for Exascale Computing” (SPPEXA) under the project ESSEX-II and by the Federal Ministry of Education and Research through the project “Eigenvalue soLvers for Petaflop Applications—Algorithmic Extensions and Optimizations” (ELPA-AEO) under Grant No. 01H15001.

About this article

Cite this article

Alvermann, A., Basermann, A., Bungartz, HJ. et al. Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects. Japan J. Indust. Appl. Math. 36, 699–717 (2019). https://doi.org/10.1007/s13160-019-00360-8

Keywords

  • ELPA-AEO
  • ESSEX
  • Eigensolver
  • Parallel
  • Mixed precision

Mathematics Subject Classification

  • 65F15
  • 65F25
  • 65Y05
  • 65Y99