Optimization of Condensed Matter Physics Application with OpenMP Tasking Model

  • Joel Criado
  • Marta Garcia-Gasulla
  • Jesús Labarta
  • Arghya Chatterjee
  • Oscar Hernandez
  • Raül Sirvent
  • Gonzalo Alvarez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11718)

Abstract

The Density Matrix Renormalization Group (DMRG++) is a condensed matter physics application used to study the superconductivity properties of materials. Its main computation is the construction of the Hamiltonian matrix, which requires sparse matrix-vector multiplications. This paper presents task-based parallelization and optimization strategies for the Hamiltonian algorithm. The algorithm is implemented as a mini-application in C++ and parallelized with OpenMP. The optimization leverages tasking features, such as dependencies and priorities, included in the OpenMP 4.5 standard. The code refactoring targets performance as much as programmability. The optimized version achieves a speedup of 8.0× with 8 threads and 20.5× with 40 threads on a Power9 computing node, while reducing memory consumption to 90 MB with respect to the original code, by adding fewer than ten OpenMP directives.

Keywords

OpenMP · Tasks · Dependencies · Optimization · Analysis

Notes

Acknowledgments

This work is partially supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology (project TIN2015-65316-P), by the Generalitat de Catalunya (contract 2017-SGR-1414) and by the BSC-IBM Deep Learning Research Agreement, under JSA “Application porting, analysis and optimization for POWER and POWER AI”. This work was partially supported by the Scientific Discovery through Advanced Computing (SciDAC) program funded by U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences, Division of Materials Sciences and Engineering. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Joel Criado (1)
  • Marta Garcia-Gasulla (1)
  • Jesús Labarta (1)
  • Arghya Chatterjee (2)
  • Oscar Hernandez (2)
  • Raül Sirvent (1)
  • Gonzalo Alvarez (2)

  1. Barcelona Supercomputing Center, Barcelona, Spain
  2. Oak Ridge National Laboratory, Oak Ridge, USA
