Hybrid parallelization of molecular dynamics simulations to reduce load imbalance

Abstract

The most widely used technique for parallelizing molecular dynamics simulations is spatial domain decomposition, in which the physical geometry is divided into boxes, one per processor. This technique can inherently produce computational load imbalance when either the spatial distribution of particles or the computational cost per particle is not uniform. This paper shows the benefits of using a hybrid MPI+OpenMP model to deal with this load imbalance. We consider LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator), a prototypical molecular dynamics simulator that provides its own balancing mechanism and an OpenMP implementation of many of its modules, allowing for a hybrid setup. In this work, we extend and optimize the OpenMP implementation of LAMMPS and evaluate three setups: MPI-only, MPI with the LAMMPS balance mechanism, and a hybrid setup using our improved OpenMP version. The comparison uses the five standard benchmarks included in the LAMMPS distribution plus two additional test cases. Results show that the hybrid approach handles load-balancing problems more effectively than the LAMMPS balance mechanism (a 50% improvement over MPI-only for a highly imbalanced test case, versus 43% for the balance mechanism) and can also improve simulations whose performance issues are unrelated to load imbalance.
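
As a concrete illustration of the hybrid scheme described above, the following C++ sketch shows how an MPI rank that owns one spatial box can thread its per-atom force loop with OpenMP. It is a minimal sketch under assumed, simplified data structures, not LAMMPS source code; all names and numbers in it are illustrative.

    // Hybrid MPI+OpenMP sketch: one MPI rank per spatial box (domain
    // decomposition); inside each rank, OpenMP threads share the per-atom
    // force work with a dynamic schedule. Build with e.g. mpicxx -fopenmp.
    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>
    #include <random>
    #include <vector>

    struct Atom { double x, y, z, fx, fy, fz; };

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Each rank deliberately gets a different atom count to mimic a
        // non-uniform particle distribution (the source of load imbalance).
        std::mt19937 gen(42 + rank);
        std::uniform_real_distribution<double> pos(0.0, 10.0);
        const int natoms = 2000 + 2000 * rank;
        std::vector<Atom> atoms(natoms);
        for (Atom &a : atoms)
            a = Atom{pos(gen), pos(gen), pos(gen), 0.0, 0.0, 0.0};

        const double cutsq = 2.5 * 2.5;  // squared interaction cutoff

        // Intra-rank parallelism: threads pick up atoms dynamically, so a few
        // expensive atoms do not leave the other threads of the rank idle.
        // Each thread writes only to atoms[i], avoiding data races without
        // per-thread force arrays.
        #pragma omp parallel for schedule(dynamic, 32)
        for (int i = 0; i < natoms; ++i) {
            for (int j = 0; j < natoms; ++j) {
                if (i == j) continue;
                double dx = atoms[i].x - atoms[j].x;
                double dy = atoms[i].y - atoms[j].y;
                double dz = atoms[i].z - atoms[j].z;
                double rsq = dx * dx + dy * dy + dz * dz;
                if (rsq > cutsq) continue;
                double r6 = 1.0 / (rsq * rsq * rsq);                 // Lennard-Jones-like
                double fpair = (48.0 * r6 * r6 - 24.0 * r6) / rsq;
                atoms[i].fx += fpair * dx;
                atoms[i].fy += fpair * dy;
                atoms[i].fz += fpair * dz;
            }
        }

        printf("rank %d: %d atoms, up to %d threads\n",
               rank, natoms, omp_get_max_threads());
        // In a real MD step, halo exchange and communication of ghost-atom
        // forces with neighboring boxes would follow here.
        MPI_Finalize();
        return 0;
    }

The intuition behind the hybrid setup is visible in the sketch: the dynamic schedule evens out work among the threads of a node, and using fewer (hence larger) MPI subdomains per node tends to reduce the imbalance between boxes that a purely spatial MPI-only decomposition would suffer.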

Notes

  1. https://pop-coe.eu/.

Acknowledgements

This work is partially supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology (TIN2015-65316-P), by the Generalitat de Catalunya (2017-SGR-1414), and by the European POP CoE (GA n. 824080). This work is also funded as part of the European Union Horizon 2020 research and innovation program under grant agreement nos. 800925 (VECMA project; www.vecma.eu) and 823712 (CompBioMed2 Centre of Excellence; www.compbiomed.eu), as well as by the UK EPSRC through the UK High-End Computing Consortium (grant no. EP/R029598/1).

Author information

Correspondence to Julian Morillo.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Morillo, J., Vassaux, M., Coveney, P.V. et al. Hybrid parallelization of molecular dynamics simulations to reduce load imbalance. J Supercomput 78, 9184–9215 (2022). https://doi.org/10.1007/s11227-021-04214-4

Keywords

  • Load balance
  • Parallel computing
  • Molecular dynamics
  • MPI
  • OpenMP
  • Hybrid programming model