Computational Particle Mechanics, Volume 6, Issue 2, pp 271–295

Comparison between pure MPI and hybrid MPI-OpenMP parallelism for Discrete Element Method (DEM) of ellipsoidal and poly-ellipsoidal particles

  • Beichuan Yan
  • Richard A. Regueiro


Parallel computing of 3D Discrete Element Method (DEM) simulations can be achieved in different modes, two of which are pure MPI and hybrid MPI-OpenMP. The hybrid MPI-OpenMP mode allows flexible combinations of mapping schemes on contemporary multiprocessing supercomputers. This paper profiles the computational components and floating-point operation features of complex-shaped 3D DEM, develops a space decomposition-based MPI parallelism and various thread-based OpenMP parallelisms, and carries out performance comparison and analysis from intranode to internode scales across four orders of magnitude of problem size (namely, number of particles). The influences of the memory/cache hierarchy, process/thread pinning, variation of the hybrid MPI-OpenMP mapping scheme, and ellipsoids versus poly-ellipsoids are carefully examined. It is found that OpenMP is able to achieve high efficiency in interparticle contact detection, but the unparallelizable code prevents it from achieving the same high efficiency for overall performance; pure MPI achieves not only lower computational granularity (thus higher spatial locality of particles) but also lower communication granularity (thus faster MPI transmission) than hybrid MPI-OpenMP using the same computational resources; the cache miss rate is sensitive to the shrinkage of memory consumption per processor, and of the three cache levels of modern microprocessors, the last-level cache contributes most significantly to the strong superlinear speedup; in hybrid MPI-OpenMP mode, as the number of MPI processes increases (and the number of threads per MPI process decreases accordingly), the total execution time decreases until the maximum performance is obtained in pure MPI mode; process/thread pinning on NUMA architectures improves performance significantly when there are multiple threads per process, whereas the improvement becomes less pronounced as the number of threads per process decreases; both the communication time and the computation time increase substantially from ellipsoids to poly-ellipsoids. Overall, pure MPI outperforms hybrid MPI-OpenMP in 3D DEM modeling of ellipsoidal and poly-ellipsoidal particles.
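
Two ideas in the abstract can be made concrete with a small sketch: the set of hybrid mapping schemes available for a fixed number of cores, and the way a serial (unparallelizable) portion of the code caps the speedup that threading alone can deliver. The Python below is an illustration only and is not taken from the paper. The function names, the 16-core count, and the 0.9 parallel fraction are all assumptions chosen for the example, and the speedup bound is the standard Amdahl's law rather than a model fitted to the authors' measurements.

```python
def mapping_schemes(total_cores):
    """List every (mpi_processes, threads_per_process) pair that uses
    exactly total_cores cores. The last pair is pure MPI (1 thread each).
    Hypothetical helper for illustration."""
    return [(p, total_cores // p)
            for p in range(1, total_cores + 1)
            if total_cores % p == 0]


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: upper bound on speedup when only parallel_fraction
    of the serial runtime is divided among the given number of workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    cores = 16              # assumed cores available, not a value from the paper
    parallel_fraction = 0.9  # assumed share of runtime that threads can speed up

    for processes, threads in mapping_schemes(cores):
        print(f"{processes:2d} MPI processes x {threads:2d} threads per process")

    bound = amdahl_speedup(parallel_fraction, cores)
    print(f"speedup bound with {cores} threads: {bound:.2f}")
    print(f"corresponding parallel efficiency: {bound / cores:.2f}")
```

Under these assumed numbers, even a loop that threads perfectly leaves the overall efficiency well below 1, which is consistent in spirit with the finding that high OpenMP efficiency in contact detection does not carry over to overall performance. The sketch does not model communication cost, cache effects, or pinning, all of which the paper examines separately.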


Parallelism, Discrete element, Ellipsoidal, MPI, OpenMP, Granularity



We would like to acknowledge the support provided by ONR MURI Grant N00014-11-1-0691, and the DoD High Performance Computing Modernization Program (HPCMP) for granting us the computing resources required to conduct this work.

Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest.



Copyright information

© OWZ 2018

Authors and Affiliations

  1. Department of Civil, Environmental, and Architectural Engineering, University of Colorado Boulder, Boulder, USA
