A Novel Shared-Memory Thread-Pool Implementation for Hybrid Parallel CFD Solvers

  • Jens Jägersküpper
  • Christian Simmendinger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6853)


The Computational Fluid Dynamics (CFD) solver TAU for unstructured grids is widely used in the European aerospace industry. TAU runs on High-Performance Computing (HPC) clusters with several thousand cores using MPI-based domain decomposition. To make more efficient use of current multi-core CPUs and to prepare TAU for the many-core era, a shared-memory parallelization has been added to one of TAU’s solvers, yielding a hybrid parallelization: MPI-based domain decomposition plus multi-threaded processing of each domain.
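The two levels of parallelism can be pictured as a nested decomposition. The sketch below is illustrative only. It splits a range of grid-point indices into contiguous per-rank domains (level 1, MPI) and then splits each domain into smaller blocks for the threads of that rank (level 2, shared memory). Real unstructured-grid codes use graph partitioners rather than contiguous index ranges, and the function names and sizes here are hypothetical.

```python
def split(lo, hi, parts):
    """Split the half-open range [lo, hi) into `parts` contiguous, near-equal pieces."""
    n = hi - lo
    return [(lo + i * n // parts, lo + (i + 1) * n // parts) for i in range(parts)]


def hybrid_decomposition(n_points, n_ranks, blocks_per_rank):
    """Level 1: one domain per MPI rank. Level 2: blocks of that domain for its thread pool."""
    return {
        rank: split(d_lo, d_hi, blocks_per_rank)
        for rank, (d_lo, d_hi) in enumerate(split(0, n_points, n_ranks))
    }


if __name__ == "__main__":
    for rank, blocks in hybrid_decomposition(100, 2, 4).items():
        print(rank, blocks)
```

Only level 1 involves message passing; the blocks of level 2 all live in the shared memory of one rank, which is what lets threads work on them without exchanging halo data.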

For the edge-based solver considered, a simple loop-based approach via OpenMP FOR directives would not deliver the required speed-up because of the Amdahl trap. Instead, a more sophisticated, thread-pool-based shared-memory parallelization has been developed that allows for relaxed thread synchronization with automatic and dynamic load balancing.
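A back-of-the-envelope form of the Amdahl trap, with illustrative numbers that are assumptions rather than measurements from TAU: if only a fraction $f$ of the run time of a domain is covered by parallelized loops and the remainder stays serial, the speed-up on $p$ threads is limited by Amdahl's law,

$$
S(p) = \frac{1}{(1-f) + f/p}, \qquad \lim_{p \to \infty} S(p) = \frac{1}{1-f}.
$$

For example, with $f = 0.9$ and $p = 12$ threads, $S = 1/(0.1 + 0.075) \approx 5.7$, and no number of threads can push the speed-up beyond $10$. Loop-level parallelization therefore has to cover nearly all of the work, including the code between loops, before adding cores pays off.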

In this paper we describe the concept behind this shared-memory parallelization and explain how the multi-threaded computation of a domain works. We also present some details of its implementation in TAU and first performance results. We emphasize that the concept is not TAU-specific: this design pattern appears to be very generic and may well be applied to other grid-, mesh-, or graph-based codes.
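The general idea of letting a pool of threads pick blocks of a domain dynamically, while avoiding data races on shared grid points, can be sketched as follows. This is not TAU's implementation but a minimal illustration under stated assumptions. The grid is a hypothetical chain of points whose edges are grouped into contiguous blocks, so adjacent blocks share one boundary point. A block may be claimed only when neither neighboring block is being processed (mutual exclusion). Any idle thread takes any ready block, so load balancing is automatic. All names are hypothetical, and because of CPython's global interpreter lock the sketch shows the scheduling logic, not a speed-up.

```python
import threading


def process_domain(n_points, n_blocks, n_threads):
    """Block-wise edge loop over a chain of points, run by a small thread pool.

    Hypothetical kernel: edge e connects points e and e+1 and adds 1.0 to both
    (a stand-in for a flux scatter). Neighboring blocks share a point, so they
    are never processed at the same time.
    """
    n_edges = n_points - 1
    if n_edges < n_blocks:
        raise ValueError("need at least one edge per block")
    bounds = [b * n_edges // n_blocks for b in range(n_blocks + 1)]
    values = [0.0] * n_points
    state = ["todo"] * n_blocks        # each block goes "todo" -> "busy" -> "done"
    processed_by = [None] * n_blocks   # which thread handled each block
    cond = threading.Condition()       # guards `state` only, not `values`

    def claimable(b):
        neighbors = [nb for nb in (b - 1, b + 1) if 0 <= nb < n_blocks]
        return state[b] == "todo" and all(state[nb] != "busy" for nb in neighbors)

    def worker(tid):
        while True:
            with cond:
                while True:
                    if all(s == "done" for s in state):
                        return
                    ready = [b for b in range(n_blocks) if claimable(b)]
                    if ready:
                        b = ready[0]
                        state[b] = "busy"
                        break
                    cond.wait()  # some neighbor is busy; wait for it to finish
            # The edge loop itself runs without holding the lock.
            for e in range(bounds[b], bounds[b + 1]):
                values[e] += 1.0
                values[e + 1] += 1.0
            processed_by[b] = tid
            with cond:
                state[b] = "done"
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values, processed_by


if __name__ == "__main__":
    values, owners = process_domain(1001, 16, 4)
    print(sum(values), owners)
```

Calling `process_domain(1001, 16, 4)` yields 1.0 at the two end points and 2.0 at every interior point, regardless of which thread handled which block; `processed_by` varies from run to run. Only the scheduling decision is taken under the lock, which keeps synchronization light in this sketch; a production code would avoid the linear scan over all blocks.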







Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jens Jägersküpper (1)
  • Christian Simmendinger (2)

  1. Institute of Aerodynamics and Flow Technology, Center of Computer Applications in Aerospace Science and Engineering (C2A2S2E), German Aerospace Center (DLR), Braunschweig, Germany
  2. T-Systems Solutions for Research (SfR), Stuttgart, Germany
