Porting VASP from MPI to MPI+OpenMP [SIMD]

Wende, Florian; Marsman, Martijn; Zhao, Zhengji; Kim, Jeongnim

doi:10.1007/978-3-319-65578-9_8


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10468)

Included in the following conference series:

International Workshop on OpenMP


Abstract

For the VASP application (a widely used electronic structure code written in Fortran), we describe the transition from an MPI-only to a hybrid code base that leverages the three levels of parallelism relevant to efficient execution on modern computer platforms: multiprocessing, multithreading, and SIMD vectorization. To achieve code portability, we rely on MPI parallelization together with OpenMP threading and SIMD constructs. Combining the latter two can be challenging in complex code bases. Our optimization targets are combining multithreading and vectorization in different calling contexts, as well as whole-function vectorization. In addition to outlining the design decisions made throughout the code transformation process, we demonstrate the effectiveness of the code adaptations using different compilers (GNU, Intel) and target platforms (CPU, Intel Xeon Phi (KNL)).
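
The abstract names two SIMD-related targets: combining multithreading and vectorization in different calling contexts, and whole-function vectorization. The following C sketch illustrates both with the OpenMP 4.x constructs declare simd, parallel for simd, and simd. It is an illustration under stated assumptions, not VASP code: VASP is written in Fortran, where the corresponding directives are !$omp declare simd, !$omp parallel do simd, and !$omp simd, and the names rational_kernel, apply_threaded, and apply_vector_only are hypothetical.

```c
/* Hypothetical element-wise kernel (not taken from VASP). With OpenMP
   enabled, declare simd asks the compiler to also generate a vector
   variant that evaluates several x values per call (whole-function
   vectorization). notinbranch: it is only called unconditionally. */
#pragma omp declare simd notinbranch
double rational_kernel(double x)
{
    return x / (1.0 + x * x);
}

/* Calling context 1: this routine opens the parallel region itself, so
   one combined construct spreads iterations over threads and SIMD lanes. */
void apply_threaded(const double *in, double *out, int n)
{
#pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        out[i] = rational_kernel(in[i]);
}

/* Calling context 2: intended for callers that already run inside a
   parallel region (for example, one work item per thread), so only SIMD
   vectorization is requested and no nested thread team is created. */
void apply_vector_only(const double *in, double *out, int n)
{
#pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] = rational_kernel(in[i]);
}
```

Compiled without OpenMP support (for GCC, without -fopenmp), the pragmas are ignored and both routines run serially with identical results, which keeps a single source usable across build configurations.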


Notes

1. Benchmarks were done on Cori, a Cray XC40 system at NERSC. It has over 9300 Intel Xeon Phi 7250 (KNL) nodes with 68 CPU cores (272 threads) @1.4 GHz and 96 GB DDR4 main memory per node. In addition, Cori has over 2000 dual-socket 16-core Intel Xeon E5-2698v3 ("Haswell") nodes, each with 32 CPU cores (64 threads) @2.3 GHz, a 256-bit wide vector unit per CPU core, and 128 GB DDR4 memory. Cori's nodes are interconnected with Cray's Aries network in a Dragonfly topology. A comprehensive study of the parameters and options for building and running VASP on Cori is given in [8].
2. At the time of writing, we used the GNU compiler gfortran-6.3. This version does not fully support OpenMP 4.5 for Fortran (the same appears to be true for gfortran-7.1, tested on a local workstation). For remarks on this, see the main text.
3. gfortran-6.3 rejects the !$omp declare simd (foo) directive for subroutine definitions within Fortran modules (but not for functions), reporting that foo has already been host associated. The workaround of moving the subroutines out of the module causes conflicts with variable scoping. We did not implement it, because subroutine vectorization fails only with the GNU compiler, and only in the module context.
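
The node figures in Note 1 translate into the parallelism available per node at the thread level and, per vector instruction, at the SIMD level. The lane counts below assume double-precision (64-bit) elements; the 256-bit width for Haswell is stated in the note, while the 512-bit (AVX-512) width for KNL is a known property of that processor that the note does not state.

$$
\begin{aligned}
\text{KNL:}\quad & 68 \times 4 = 272 \ \text{hardware threads}, && 512/64 = 8 \ \text{double-precision SIMD lanes} \\
\text{Haswell:}\quad & (2 \times 16) \times 2 = 64 \ \text{hardware threads}, && 256/64 = 4 \ \text{double-precision SIMD lanes}
\end{aligned}
$$

In a hybrid run, the number of MPI ranks times the OpenMP threads per rank is typically chosen not to exceed the hardware-thread count, while SIMD vectorization targets the lane count.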
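
Note 2 concerns how much of OpenMP 4.5 a given compiler supports. A quick first check is the version a compiler announces: in C and C++ it is the yyyymm value of the _OPENMP macro, and Fortran code can read the openmp_version named constant from the omp_lib module. The sketch below is a hypothetical helper, not part of any library; it maps a few well-known macro values to specification versions. The announced version may differ between a compiler's C/C++ and Fortran front ends, and a claimed version does not guarantee that every feature works in every context, as Note 3 illustrates.

```c
#include <stdio.h>

/* Hypothetical helper: map the yyyymm value of _OPENMP to the OpenMP
   specification version it denotes. Only a few releases are listed. */
const char *openmp_spec_name(long yyyymm)
{
    switch (yyyymm) {
    case 201107: return "3.1";
    case 201307: return "4.0";
    case 201511: return "4.5";
    case 201811: return "5.0";
    case 202011: return "5.1";
    case 202111: return "5.2";
    default:     return "unknown";
    }
}

/* Print what this translation unit's compiler claims for C. */
void report_openmp_version(void)
{
#ifdef _OPENMP
    printf("_OPENMP = %ld (OpenMP %s)\n",
           (long)_OPENMP, openmp_spec_name((long)_OPENMP));
#else
    printf("compiled without OpenMP support\n");
#endif
}
```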
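
Note 3 concerns declare simd on subroutines rather than functions. In C, the closest analogue of a Fortran subroutine is a void function that writes its result through an argument. The sketch below (hypothetical names axpy_update and axpy_all, not VASP code) marks such a routine for whole-function vectorization: the base pointers and the scalar factor are uniform (identical in every SIMD lane), and the index is linear with step 1, so consecutive lanes touch consecutive elements. In Fortran, the corresponding directive is !$omp declare simd (name) with analogous clauses in the subroutine's specification part; according to Note 3, that is the form gfortran-6.3 rejected for module subroutines.

```c
/* Subroutine-style kernel: no return value, result written through out.
   uniform(out, in, alpha): same value in every SIMD lane.
   linear(i:1): lane k of a vector call works on index i + k.
   out and in are assumed not to overlap. */
#pragma omp declare simd uniform(out, in, alpha) linear(i:1) notinbranch
void axpy_update(double *out, const double *in, double alpha, int i)
{
    out[i] += alpha * in[i];
}

/* Caller: a SIMD loop that can invoke the vector variant of the routine. */
void axpy_all(double *out, const double *in, double alpha, int n)
{
#pragma omp simd
    for (int i = 0; i < n; ++i)
        axpy_update(out, in, alpha, i);
}
```

Passing the array bases as uniform and the index as linear keeps the clause semantics unambiguous, since no pointer argument has to advance per lane.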

References

[1] Kresse, G., Furthmüller, J.: Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996)
[2] Kresse, G., Furthmüller, J.: Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6(1), 15–50 (1996)
[3] Marsman, M., Paier, J., Stroppa, A., Kresse, G.: Hybrid functionals applied to extended systems. J. Phys. Condens. Matter 20(6), 064201 (2008)
[4] Kaltak, M., Klimeš, J., Kresse, G.: Cubic scaling algorithm for the random phase approximation: self-interstitials and vacancies in Si. Phys. Rev. B 90(5), 054115 (2014)
[5] Liu, P., Kaltak, M., Klimeš, J., Kresse, G.: Cubic scaling GW: towards fast quasiparticle calculations. Phys. Rev. B 94(16), 165109 (2016)
[6] Sodani, A., Gramunt, R., Corbal, J., Kim, H.S., Vinod, K., Chinthamani, S., Hutsell, S., Agarwal, R., Liu, Y.C.: Knights Landing: second-generation Intel Xeon Phi product. IEEE Micro 36(2), 34–46 (2016)
[7] Kresse, G., Joubert, D.: From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999)
[8] Zhao, Z., Marsman, M., Wende, F., Kim, J.: Performance of hybrid MPI/OpenMP VASP on Cray XC40 based on Intel Knights Landing many integrated core architecture. In: CUG Proceedings (2017)
[9] Klemm, M., Duran, A., Tian, X., Saito, H., Caballero, D., Martorell, X.: Extending OpenMP* with vector constructs for modern multicore SIMD architectures. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds.) IWOMP 2012. LNCS, vol. 7312, pp. 59–72. Springer, Heidelberg (2012). doi:10.1007/978-3-642-30961-8_5
[10] OpenMP Architecture Review Board: OpenMP Application Program Interface, Version 4.0 (2013). http://www.openmp.org
[11] OpenMP Architecture Review Board: OpenMP Application Program Interface, Version 4.5 (2015). http://www.openmp.org/
[12] Wende, F., Noack, M., Schütt, T., Sachs, S., Steinke, T.: Application performance on a Cray XC30 evaluation system with Xeon Phi coprocessors at HLRN-III. In: Cray User Group (2015)
[13] Wende, F., Noack, M., Steinke, T., Klemm, M., Zitzlsberger, G., Newburn, C.J.: Portable SIMD performance with OpenMP* 4.x compiler directives. In: Euro-Par 2016: Parallel Processing, 22nd International Conference on Parallel and Distributed Computing (2016)
[14] Senkevich, A.: Libmvec (2015). https://sourceware.org/glibc/wiki/libmvec


Acknowledgements

This work was partially supported by Intel within the IPCC activities at ZIB, and by the ASCR Office in the DOE Office of Science under contract number DE-AC02-05CH11231. It used resources of the National Energy Research Scientific Computing Center (NERSC).

Author information

Authors and Affiliations

Zuse Institute Berlin, Berlin, Germany
Florian Wende
University of Vienna, Vienna, Austria
Martijn Marsman
National Energy Research Scientific Computing Center, Berkeley, USA
Zhengji Zhao
Intel Corporation, Hillsboro, USA
Jeongnim Kim


Corresponding author

Correspondence to Florian Wende.

Editor information

Editors and Affiliations

Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
Sandia National Laboratories, Albuquerque, New Mexico, USA
Stephen L. Olivier
RWTH Aachen University, Aachen, Germany
Christian Terboven
Stony Brook University, Stony Brook, New York, USA
Barbara M. Chapman
RWTH Aachen University, Aachen, Germany
Matthias S. Müller


About this paper

Cite this paper

Wende, F., Marsman, M., Zhao, Z., Kim, J. (2017). Porting VASP from MPI to MPI+OpenMP [SIMD]. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol. 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_8


DOI: https://doi.org/10.1007/978-3-319-65578-9_8
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65577-2
Online ISBN: 978-3-319-65578-9
eBook Packages: Computer Science, Computer Science (R0)
