An Efficient OpenMP Runtime System for Hierarchical Architectures

  • Samuel Thibault
  • François Broquedis
  • Brice Goglin
  • Raymond Namyst
  • Pierre-André Wacrenier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4935)


Exploiting the full computational power of increasingly deep hierarchical multiprocessor machines requires a very careful distribution of threads and data across the underlying non-uniform architecture. With the emergence of multi-core chips and NUMA machines, it becomes important to minimize the number of remote memory accesses, to favor cache affinities, and to guarantee fast completion of synchronization steps. By using the BubbleSched platform as a threading backend for the GOMP OpenMP compiler, we are able to easily translate the affinities of thread teams into scheduling hints using abstractions called bubbles. We then propose a scheduling strategy suited to nested OpenMP parallelism. Preliminary performance evaluations show a significant improvement in speedup on a typical NAS OpenMP benchmark application.
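To make the notion of nested OpenMP parallelism concrete, here is a minimal sketch in standard OpenMP C. It is not taken from the paper: the function name `process_zones`, the zone/cell workload, and the inner team size of 2 are illustrative assumptions. Each outer thread opens its own inner team, and the threads of one inner team work on the same zone. That grouping is the kind of team affinity which, according to the abstract, the runtime turns into a scheduling hint through a bubble.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical workload: `zones` independent zones of `cells` cells each.
 * The outer region uses one thread per zone; each outer thread then opens
 * an inner region whose threads all work on that zone's data. */
long process_zones(int zones, int cells)
{
    assert(zones > 0 && cells > 0);
    long total = 0;

#ifdef _OPENMP
    /* Standard OpenMP call: allow two active levels of parallelism. */
    omp_set_max_active_levels(2);
#endif

    #pragma omp parallel for num_threads(zones) reduction(+:total)
    for (int z = 0; z < zones; z++) {
        long zone_sum = 0;

        /* Inner team (size 2 is an arbitrary choice for illustration). */
        #pragma omp parallel for num_threads(2) reduction(+:zone_sum)
        for (int c = 0; c < cells; c++) {
            zone_sum += (long)z * cells + c;  /* stand-in for real work */
        }
        total += zone_sum;
    }
    return total;
}
```

Built with `gcc -fopenmp`, a call such as `process_zones(4, 8)` sums the indices 0 through 31 and returns 496. Without OpenMP support the pragmas are ignored and the function runs sequentially with the same result. The sketch only expresses the nesting. How the resulting teams are placed on the machine is left to the runtime, and that placement is the question the paper addresses.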


OpenMP, Nested Parallelism, Hierarchical Thread Scheduling, Bubbles, Multi-Core, NUMA, SMP





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Samuel Thibault (1)
  • François Broquedis (1)
  • Brice Goglin (1)
  • Raymond Namyst (1)
  • Pierre-André Wacrenier (1)

  1. INRIA Futurs - LaBRI, Talence cedex, France
