Abstract
We describe the transition of the VASP application (a widely used electronic structure code written in Fortran) from an MPI-only to a hybrid code base. The hybrid code exploits the three levels of parallelism that are relevant for efficient execution on modern computer platforms: multiprocessing, multithreading, and SIMD vectorization. To achieve code portability, we rely on MPI parallelization together with OpenMP threading and SIMD constructs. Combining the latter two can be challenging in complex code bases. Our optimization targets are the combination of multithreading and vectorization in different calling contexts, as well as whole-function vectorization. In addition to outlining the design decisions made throughout the code transformation process, we demonstrate the effectiveness of the code adaptations using different compilers (GNU, Intel) and target platforms (CPU, Intel Xeon Phi (KNL)).
Notes
1. Benchmarks were run on Cori, a Cray XC40 system at NERSC. Cori has over 9300 Intel Xeon Phi 7250 (KNL) nodes, each with 68 CPU cores (272 hardware threads) @1.4 GHz and 96 GB of DDR4 main memory. In addition, Cori has over 2000 dual-socket 16-core Intel Xeon E5-2698v3 (“Haswell”) nodes, each with 32 CPU cores (64 hardware threads) @2.3 GHz, a 256-bit wide vector unit per CPU core, and 128 GB of DDR4 memory. Cori’s nodes are interconnected by Cray’s Aries network with a Dragonfly topology. A comprehensive study of the parameters and options for building and running VASP on Cori is given in [8].
2. At the time of writing, we used the GNU compiler gfortran-6.3. This version does not fully support OpenMP 4.5 for Fortran. The same appears to be true for gfortran-7.1, which we tested on a local workstation. See the main text for further remarks on this.
3. gfortran-6.3 rejects the !$omp declare simd (foo) directive for subroutine definitions within Fortran modules (but not for function definitions), reporting that foo is already host associated. Working around this by moving the subroutines outside the module causes conflicts with variable scoping. We did not implement that workaround, because subroutine vectorization fails only with the GNU compiler, and only in the module context.
References
1. Kresse, G., Furthmüller, J.: Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996)
2. Kresse, G., Furthmüller, J.: Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6(1), 15–50 (1996)
3. Marsman, M., Paier, J., Stroppa, A., Kresse, G.: Hybrid functionals applied to extended systems. J. Phys. Condens. Matter 20(6), 064201 (2008)
4. Kaltak, M., Klimeš, J., Kresse, G.: Cubic scaling algorithm for the random phase approximation: self-interstitials and vacancies in Si. Phys. Rev. B Condens. Matter Mater. Phys. 90(5), 054115 (2014)
5. Liu, P., Kaltak, M., Klimeš, J., Kresse, G.: Cubic scaling GW: towards fast quasiparticle calculations. Phys. Rev. B: Condens. Matter 94(16), 165109 (2016)
6. Sodani, A., Gramunt, R., Corbal, J., Kim, H.S., Vinod, K., Chinthamani, S., Hutsell, S., Agarwal, R., Liu, Y.C.: Knights Landing: second-generation Intel Xeon Phi product. IEEE Micro 36(2), 34–46 (2016)
7. Kresse, G., Joubert, D.: From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999)
8. Zhao, Z., Marsman, M., Wende, F., Kim, J.: Performance of hybrid MPI/OpenMP VASP on Cray XC40 based on Intel Knights Landing many integrated core architecture. In: CUG Proceedings (2017)
9. Klemm, M., Duran, A., Tian, X., Saito, H., Caballero, D., Martorell, X.: Extending OpenMP* with vector constructs for modern multicore SIMD architectures. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds.) IWOMP 2012. LNCS, vol. 7312, pp. 59–72. Springer, Heidelberg (2012). doi:10.1007/978-3-642-30961-8_5
10. OpenMP Architecture Review Board: OpenMP Application Program Interface, Version 4.0 (2013). http://www.openmp.org
11. OpenMP Architecture Review Board: OpenMP Application Program Interface, Version 4.5 (2015). http://www.openmp.org/
12. Wende, F., Noack, M., Schütt, T., Sachs, S., Steinke, T.: Application performance on a Cray XC30 evaluation system with Xeon Phi coprocessors at HLRN-III. In: Cray User Group (2015)
13. Wende, F., Noack, M., Steinke, T., Klemm, M., Zitzlsberger, G., Newburn, C.J.: Portable SIMD performance with OpenMP* 4.x compiler directives. In: Euro-Par 2016, Parallel Processing, 22nd International Conference on Parallel and Distributed Computing (2016)
14. Senkevich, A.: Libmvec (2015). https://sourceware.org/glibc/wiki/libmvec
Acknowledgements
This work was partially supported by Intel within the IPCC activities at ZIB, and by the ASCR Office in the DOE, Office of Science, under contract number DE-AC02-05CH11231. It used the resources of the National Energy Research Scientific Computing Center (NERSC).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wende, F., Marsman, M., Zhao, Z., Kim, J. (2017). Porting VASP from MPI to MPI+OpenMP [SIMD]. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_8
DOI: https://doi.org/10.1007/978-3-319-65578-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65577-2
Online ISBN: 978-3-319-65578-9
eBook Packages: Computer Science, Computer Science (R0)