A Novel Shared-Memory Thread-Pool Implementation for Hybrid Parallel CFD Solvers
The Computational Fluid Dynamics (CFD) solver TAU for unstructured grids is widely used in the European aerospace industry. TAU runs on High-Performance Computing (HPC) clusters with several thousand cores using MPI-based domain decomposition. To make more efficient use of current multi-core CPUs and to prepare TAU for the many-core era, a shared-memory parallelization has been added to one of TAU’s solvers to obtain a hybrid parallelization: MPI-based domain decomposition plus multi-threaded processing of each domain.
For the edge-based solver considered, a simple loop-based approach via OpenMP FOR directives would – due to the Amdahl trap – not deliver the required speed-up. A more sophisticated, thread-pool-based shared-memory parallelization has been developed which allows for a relaxed thread synchronization with automatic and dynamic load balancing.
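The "Amdahl trap" mentioned above can be made concrete with Amdahl's law. The following is a worked illustration only: the parallel fraction p = 0.9 and the thread count N = 8 are assumed values chosen for the arithmetic, not measurements from TAU.

```latex
% Amdahl's law: speed-up S on N threads when a fraction p of the
% run time is parallelized and the remainder (1 - p) stays serial.
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}

% Illustrative values (assumed, not taken from the paper): p = 0.9, N = 8
S(8) = \frac{1}{0.1 + 0.9/8} = \frac{1}{0.2125} \approx 4.7

% Upper bound as N grows without limit:
\lim_{N \to \infty} S(N) = \frac{1}{1 - p} = 10
```

With these assumed numbers, eight threads yield a speed-up below five, and no thread count can exceed ten. Any code left outside the parallelized loops, together with the synchronization at the end of each loop, adds to the serial share 1 - p.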
In this paper we describe the concept behind this shared-memory parallelization and explain how the multi-threaded computation of a domain works. Some details of its implementation in TAU as well as some first performance results are presented. We emphasize that the concept is not TAU-specific: the design pattern appears to be generic and may well be applied to other grid/mesh/graph-based codes.
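The abstract gives no implementation details, so the following C++ sketch only illustrates the general idea of an edge loop processed by a group of worker threads with dynamic load balancing and mutual exclusion. Everything in it is an assumption made for illustration: the `Edge` type, the placeholder flux `u[b] - u[a]`, the function names, the chunking via an atomic counter, and the single mutex guarding the residual array. It is not TAU's scheme, and it starts its threads per call rather than keeping a persistent pool.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical edge type: connects grid nodes a and b.
struct Edge { std::size_t a, b; };

// Serial reference: every edge adds a flux to one end point and
// subtracts it from the other.
std::vector<double> edge_loop_serial(const std::vector<Edge>& edges,
                                     const std::vector<double>& u) {
    std::vector<double> res(u.size(), 0.0);
    for (const Edge& e : edges) {
        const double flux = u[e.b] - u[e.a];  // placeholder flux function
        res[e.a] += flux;
        res[e.b] -= flux;
    }
    return res;
}

// Threaded variant (chunk_size must be > 0). Workers fetch the next
// unprocessed chunk of edges from a shared atomic counter, so faster
// threads automatically take on more chunks.
std::vector<double> edge_loop_threaded(const std::vector<Edge>& edges,
                                       const std::vector<double>& u,
                                       unsigned num_threads,
                                       std::size_t chunk_size) {
    std::vector<double> res(u.size(), 0.0);
    std::atomic<std::size_t> next_chunk{0};
    std::mutex res_mutex;  // guards all writes to res
    const std::size_t num_chunks =
        (edges.size() + chunk_size - 1) / chunk_size;

    auto worker = [&]() {
        // Thread-private buffer of (node, contribution) pairs.
        std::vector<std::pair<std::size_t, double>> local;
        for (;;) {
            const std::size_t c = next_chunk.fetch_add(1);
            if (c >= num_chunks) break;
            const std::size_t begin = c * chunk_size;
            const std::size_t end =
                std::min(begin + chunk_size, edges.size());
            local.clear();
            for (std::size_t i = begin; i < end; ++i) {
                const Edge& e = edges[i];
                const double flux = u[e.b] - u[e.a];
                local.emplace_back(e.a, flux);
                local.emplace_back(e.b, -flux);
            }
            // Two chunks may share a node, so unsynchronized updates of
            // res would be a data race; merge under mutual exclusion.
            std::lock_guard<std::mutex> lock(res_mutex);
            for (const auto& p : local) res[p.first] += p.second;
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
    return res;
}
```

A call such as `edge_loop_threaded(edges, u, 4, 64)` would process the edges in chunks of 64 on four threads. The single mutex keeps the sketch short but serializes every merge; a production code would need a finer-grained way to keep threads off shared nodes.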
Keywords: Computational Fluid Dynamics, Unstructured Grid, Mutual Exclusion, Data Race, Edge Loop