Optimization of Condensed Matter Physics Application with OpenMP Tasking Model
The Density Matrix Renormalization Group (DMRG++) is a condensed matter physics application used to study superconductivity properties of materials. It’s main computations consist of calculating hamiltonian matrix which requires sparse matrix-vector multiplications. This paper presents task-based parallelization and optimization strategies of the Hamiltonian algorithm. The algorithm is implemented as a mini-application in C++ and parallelized with OpenMP. The optimization leverages tasking features, such as dependencies or priorities included in the OpenMP standard 4.5. The code refactoring targets performance as much as programmability. The optimized version achieves a speedup of 8.0\(\times \) with 8 threads and 20.5\(\times \) with 40 threads on a Power9 computing node while reducing the memory consumption to 90 MB with respect to the original code, by adding less than ten OpenMP directives.
KeywordsOpenMP Tasks Dependencies Optimization Analysis
This work is partially supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology (project TIN2015-65316-P), by the Generalitat de Catalunya (contract 2017-SGR-1414) and by the BSC-IBM Deep Learning Research Agreement, under JSA “Application porting, analysis and optimization for POWER and POWER AI”. This work was partially supported by the Scientific Discovery through Advanced Computing (SciDAC) program funded by U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences, Division of Materials Sciences and Engineering. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
- 1.Extrae website. https://tools.bsc.es/extrae. Accessed May 2019
- 2.Paraver website. https://tools.bsc.es/paraver. Accessed June 2019
- 3.Power9 CTE User’s Guide. https://www.bsc.es/support/POWER_CTE-ug.pdf. Accessed May 2019
- 4.Alvarez, G.: DMRG++ website. https://g1257.github.com/dmrgPlusPlus
- 8.Chatterjee, A., Alvarez, G., D’Azevedo, E., Elwasif, W., Hernandez, O., Sarkar, V.: Porting DMRG++ scientific application to OpenPOWER. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds.) ISC High Performance 2018. LNCS, vol. 11203, pp. 418–431. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02465-9_29CrossRefGoogle Scholar
- 10.Garcia-Gasulla, M., Mantovani, F., Josep-Fabrego, M., Eguzkitza, B., Houzeaux, G.: Runtime mechanisms to survive new HPC architectures: a use case in human respiratory simulations. Int. J. High Perform. Comput. Appl. (2019, online)Google Scholar
- 11.Llort, G., Servat, H., González, J., Giménez, J., Labarta, J.: On the usefulness of object tracking techniques in performance analysis. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 29. ACM (2013)Google Scholar
- 12.Martineau, M., McIntosh-Smith, S.: The productivity, portability and performance of OpenMP 4.5 for scientific applications targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 185–200. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_13CrossRefGoogle Scholar
- 13.OpenMP Architecture Review Board: OpenMP 4.5 Specification. Technical report, November 2015. https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
- 14.Pillet, V., Labarta, J., Cortes, T., Girona, S.: PARAVER: a tool to visualize and analyze parallel code. In: Proceedings of WoTUG-18: Transputer and OCCAM Developments, vol. 44, pp. 17–31. IOS Press (1995)Google Scholar