Abstract
The most widely used technique to parallelize molecular dynamics simulations is spatial domain decomposition, where the physical geometry is divided into boxes, one per processor. This technique inherently produces computational load imbalance when either the spatial distribution of particles or the computational cost per particle is not uniform. This paper shows the benefits of using a hybrid MPI+OpenMP model to deal with this load imbalance. We consider LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator), a prototypical molecular dynamics simulator that provides its own balancing mechanism and an OpenMP implementation for many of its modules, allowing for a hybrid setup. In this work, we extend and optimize the OpenMP implementation of LAMMPS and evaluate three different setups: MPI-only, MPI with the LAMMPS balance mechanism, and a hybrid setup using our improved OpenMP version. The comparison uses the five standard benchmarks included in the LAMMPS distribution plus two additional test cases. Results show that the hybrid approach handles load imbalance more effectively than the LAMMPS balance mechanism (a 50% improvement over MPI-only on a highly imbalanced test case, versus 43% for the balance mechanism) and can also improve simulations whose bottlenecks are unrelated to load imbalance.
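To make the three setups concrete, the sketch below shows how each might be launched for a generic LAMMPS run. The executable name, core counts, input file name, and balance-fix thresholds are illustrative assumptions, not the exact configurations used in this work; only the command-line flags (`-in`, `-sf omp`, `-pk omp`) and the `fix balance` syntax come from the LAMMPS documentation.

```sh
# (1) MPI-only: one MPI rank per core (48 cores assumed for illustration)
mpirun -np 48 lmp -in in.benchmark

# (2) MPI + LAMMPS balance mechanism: same launch, but the input script
#     adds a dynamic balancing fix that shifts subdomain boundaries every
#     1000 steps while the imbalance factor exceeds 1.1, e.g.:
#       fix lb all balance 1000 1.1 shift xyz 10 1.1
mpirun -np 48 lmp -in in.benchmark_balanced

# (3) Hybrid MPI+OpenMP: fewer ranks with several OpenMP threads each;
#     -sf omp selects the OpenMP-enabled styles, -pk omp 4 sets 4 threads
export OMP_NUM_THREADS=4
mpirun -np 12 lmp -sf omp -pk omp 4 -in in.benchmark
```

The intuition behind setup (3) is that grouping several cores under one MPI rank enlarges each spatial subdomain, so per-subdomain work varies less across ranks, and the OpenMP threads within a rank share whatever work their subdomain receives.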
Acknowledgements
This work is partially supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology (TIN2015-65316-P), by the Generalitat de Catalunya (2017-SGR-1414), and by the European POP CoE (GA n. 824080). This work is also funded as part of the European Union Horizon 2020 research and innovation program under grant agreement nos. 800925 (VECMA project; www.vecma.eu) and 823712 (CompBioMed2 Centre of Excellence; www.compbiomed.eu), as well as by the UK EPSRC through the UK High-End Computing Consortium (grant no. EP/R029598/1).
Cite this article
Morillo, J., Vassaux, M., Coveney, P.V. et al. Hybrid parallelization of molecular dynamics simulations to reduce load imbalance. J Supercomput 78, 9184–9215 (2022). https://doi.org/10.1007/s11227-021-04214-4
Keywords
- Load balance
- Parallel computing
- Molecular dynamics
- MPI
- OpenMP
- Hybrid programming model