Structured mesh-oriented framework design and optimization for a coarse-grained parallel CFD solver based on hybrid MPI/OpenMP programming
Abstract
To address the shortcomings of the MPI/OpenMP hybrid parallel model widely employed in massively parallel CFD solvers, this paper proposes a set of coarse-grained MPI/OpenMP hybrid communication mapping rules for structured meshes and establishes a mapping relationship among the geometric topology, the boundary communication topology, the process and thread group topology, and the communication buffers. Building on nonblocking asynchronous message passing and fine-grained mutex synchronization with a double-buffer mechanism for shared-memory communication, a coarse-grained MPI/OpenMP hybrid parallel CFD solver framework for structured meshes is designed. Experimental results show that the framework achieves high parallel performance and excellent scalability.
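The mechanism summarized above, namely one OpenMP thread advancing one mesh block within an MPI process, nonblocking MPI for inter-process halo faces, and a mutex-guarded double buffer for intra-process faces, can be illustrated with a minimal C sketch. This is not the authors' implementation; `Block`, `NB`, `HALO`, and the ring-neighbour pattern are assumptions made for brevity.

```c
/* Minimal illustrative sketch (not the paper's code): coarse-grained hybrid
 * parallelism in which one OpenMP thread advances one mesh block, inter-process
 * halo faces use nonblocking MPI, and intra-process faces use a lock-protected
 * double buffer in shared memory. NB, HALO, Block and the ring-neighbour
 * pattern are assumptions made for brevity. Requires MPI_THREAD_MULTIPLE. */
#include <mpi.h>
#include <omp.h>
#include <string.h>

#define NB   4      /* mesh blocks (= OpenMP threads) per MPI process, assumed */
#define HALO 1024   /* doubles per halo face, assumed                          */

typedef struct {
    double send[HALO], recv[HALO];   /* buffers for inter-process (MPI) faces   */
    double shared[2][HALO];          /* double buffer for intra-process faces   */
    omp_lock_t lock[2];              /* one mutex per half of the double buffer */
} Block;

static Block blk[NB];

/* One time step: post halo exchanges, overlap them with interior work,
 * then complete the boundary update. */
static void exchange_and_advance(int step, int rank, int nprocs)
{
    #pragma omp parallel num_threads(NB)
    {
        int tid  = omp_get_thread_num();
        Block *b = &blk[tid];
        int buf  = step & 1;                     /* select double-buffer half   */

        /* Inter-process faces: nonblocking send/recv along a 1-D process ring. */
        int right = (rank + 1) % nprocs;
        int left  = (rank - 1 + nprocs) % nprocs;
        MPI_Request req[2];
        MPI_Irecv(b->recv, HALO, MPI_DOUBLE, left,  tid, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(b->send, HALO, MPI_DOUBLE, right, tid, MPI_COMM_WORLD, &req[1]);

        /* Intra-process faces: copy my halo into the neighbour block's
         * shared double buffer under a fine-grained per-buffer mutex. */
        Block *nbr = &blk[(tid + 1) % NB];
        omp_set_lock(&nbr->lock[buf]);
        memcpy(nbr->shared[buf], b->send, sizeof b->send);
        omp_unset_lock(&nbr->lock[buf]);

        /* ... interior flux/update work would go here, hiding MPI latency ...  */

        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);  /* MPI halos now in b->recv   */
        #pragma omp barrier                        /* shared halos now visible   */
        /* ... boundary-cell update using b->recv and b->shared[buf] ...         */
    }
}

int main(int argc, char **argv)
{
    int provided, rank, nprocs;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (int i = 0; i < NB; ++i) {
        omp_init_lock(&blk[i].lock[0]);
        omp_init_lock(&blk[i].lock[1]);
    }
    for (int step = 0; step < 10; ++step)
        exchange_and_advance(step, rank, nprocs);

    MPI_Finalize();
    return 0;
}
```

The double buffer indexed by step parity lets a block write the next step's halo while its neighbour is still reading the previous one, so only a lightweight per-buffer lock and a barrier are needed rather than a global synchronization of all threads.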
Keywords
Structured mesh · Hybrid MPI/OpenMP model · Parallel computational fluid dynamics · High performance computing · Mutex synchronization
Acknowledgements
This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200902 and the NSFC project under Grant No. 61572394.