Structured mesh-oriented framework design and optimization for a coarse-grained parallel CFD solver based on hybrid MPI/OpenMP programming

  • Feng He
  • Xiaoshe Dong
  • Nianjun Zou
  • Weiguo Wu
  • Xingjun Zhang


To address the shortcomings of the hybrid MPI/OpenMP parallel model as it is widely used in massively parallel CFD solvers, this paper develops a set of coarse-grained MPI/OpenMP hybrid communication mapping rules for structured meshes. It establishes a mapping among the geometric topology, the boundary communication topology, the topology of the process and thread groups, and the communication buffers. Building on two key techniques, nonblocking asynchronous message communication and fine-grained mutex synchronization with a double-buffer mechanism for shared-memory communication, the paper designs a coarse-grained hybrid MPI/OpenMP parallel CFD solver framework for structured meshes. Experimental results show that the framework achieves high parallel performance and excellent scalability.
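The abstract names a double-buffer mechanism with fine-grained mutex synchronization for the shared-memory (thread-to-thread) part of the communication. The paper's own code is not shown here, so what follows is only a minimal Python sketch under stated assumptions: a 1D array stands in for a structured-mesh block, each thread owns one contiguous block with one ghost cell per side, a Jacobi smoothing step stands in for the solver update, and the names `DoubleBufferChannel`, `serial_jacobi`, and `threaded_jacobi` are hypothetical. Each one-way boundary channel keeps two slots used alternately by step parity, and each slot has its own lock, so a thread waits only on the neighbor it exchanges data with rather than on a global barrier. Python threads are serialized by the GIL, so this models the synchronization protocol, not performance; the inter-node MPI side (where nonblocking exchange would typically use `MPI_Isend`/`MPI_Irecv`) is not modeled.

```python
# Hypothetical sketch: not the paper's implementation. Illustrates a
# double-buffered, per-slot-locked boundary exchange between threads.
import threading


class DoubleBufferChannel:
    """One-way boundary channel; slot = step % 2, one Condition per slot."""

    def __init__(self):
        self._data = [None, None]
        self._step = [-1, -1]          # step number stored in each slot
        self._consumed = [True, True]  # True: slot may be overwritten
        self._cond = [threading.Condition(), threading.Condition()]

    def put(self, step, value):
        i = step % 2
        with self._cond[i]:
            # Blocks only if the value from step - 2 has not been read yet.
            self._cond[i].wait_for(lambda: self._consumed[i])
            self._data[i], self._step[i] = value, step
            self._consumed[i] = False
            self._cond[i].notify_all()

    def get(self, step):
        i = step % 2
        with self._cond[i]:
            # Wait until the producer has published this step's value.
            self._cond[i].wait_for(
                lambda: self._step[i] == step and not self._consumed[i])
            self._consumed[i] = True
            self._cond[i].notify_all()
            return self._data[i]


def serial_jacobi(u, steps):
    """Reference 1D Jacobi smoothing; the two end values stay fixed."""
    u = list(u)
    for _ in range(steps):
        u = [u[0]] + [0.5 * (u[k - 1] + u[k + 1])
                      for k in range(1, len(u) - 1)] + [u[-1]]
    return u


def threaded_jacobi(u, steps, nblocks):
    """One thread per contiguous block, one ghost cell on each side."""
    n_total = len(u)
    size = -(-n_total // nblocks)  # ceiling division
    bounds = [(lo, min(lo + size, n_total)) for lo in range(0, n_total, size)]
    nb = len(bounds)
    to_left = [DoubleBufferChannel() for _ in range(nb)]   # block b -> b - 1
    to_right = [DoubleBufferChannel() for _ in range(nb)]  # block b -> b + 1
    results = [None] * nb

    def worker(b):
        lo, hi = bounds[b]
        n = hi - lo
        local = [0.0] + list(u[lo:hi]) + [0.0]  # ghost cells at both ends
        for step in range(steps):
            # Publish own boundary cells first, then fetch the neighbors'.
            if b > 0:
                to_left[b].put(step, local[1])
            if b < nb - 1:
                to_right[b].put(step, local[n])
            if b > 0:
                local[0] = to_right[b - 1].get(step)
            if b < nb - 1:
                local[n + 1] = to_left[b + 1].get(step)
            new = local[:]
            for k in range(1, n + 1):
                g = lo + k - 1  # global index of local cell k
                if 0 < g < n_total - 1:  # global end values stay fixed
                    new[k] = 0.5 * (local[k - 1] + local[k + 1])
            local = new
        results[b] = local[1:n + 1]

    threads = [threading.Thread(target=worker, args=(b,)) for b in range(nb)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [x for part in results for x in part]


if __name__ == "__main__":
    u0 = [1.0] + [0.0] * 18 + [2.0]
    assert threaded_jacobi(u0, 50, 4) == serial_jacobi(u0, 50)
    print("threaded result matches serial reference")
```

With two slots per channel, a producer can run at most one step ahead of its consumer: the `put` for step n+2 blocks until the step-n value has been read, which is what prevents the second buffer from being overwritten before it is consumed. Locking each slot separately keeps contention limited to the two threads that share a boundary.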


Keywords: Structured mesh; Hybrid MPI/OpenMP model; Parallel computational fluid dynamics; High performance computing; Mutex synchronization



This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200902 and by the NSFC project under Grant No. 61572394.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Xi’an Jiaotong University, Xi’an, China
  2. Jiuquan Satellite Launch Center, Jiuquan, China
