Scheduling Dynamic OpenMP Applications over Multicore Architectures

  • François Broquedis
  • François Diakhaté
  • Samuel Thibault
  • Olivier Aumage
  • Raymond Namyst
  • Pierre-André Wacrenier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5004)


Approaching the theoretical performance of hierarchical multicore machines requires a very careful distribution of threads and data across the underlying non-uniform architecture in order to minimize cache misses and NUMA penalties. While it is acknowledged that OpenMP can enhance the quality of thread scheduling on such architectures in a portable way, by transmitting valuable information about the affinities between threads and data to the underlying runtime system, most OpenMP runtime systems are actually unable to efficiently support highly irregular, massively parallel applications on NUMA machines.

In this paper, we present a thread scheduling policy suited to the execution of OpenMP programs featuring irregular, massively nested parallelism over hierarchical architectures. Our policy enforces a distribution of threads that maximizes the proximity of threads belonging to the same parallel region, and uses a NUMA-aware work-stealing strategy when load balancing is needed. It has been developed as a plug-in to the forestGOMP OpenMP platform [TBG+07]. We demonstrate the efficiency of our approach with a highly irregular recursive OpenMP program resulting from the generic parallelization of a surface reconstruction application, achieving a speedup of 14 on a 16-core machine with no application-level optimization.
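The idea behind NUMA-aware work stealing can be illustrated with a small C sketch: an idle core first tries to steal from cores on its own NUMA node, and only crosses to remote nodes when its local neighbours have nothing to give. This is not the authors' forestGOMP plug-in. It assumes a flat two-level topology in which consecutive cores share a NUMA node, and the names `numa_node_of`, `build_victim_order`, `pick_victim` and the `queue_len` snapshot are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat topology: cores [k*cores_per_node, (k+1)*cores_per_node)
 * share NUMA node k. Real machines should be queried instead. */
static int numa_node_of(int core, int cores_per_node) {
    return core / cores_per_node;
}

/* Fill order[] with every core except self: cores on self's NUMA node
 * first (cheap, local steals), then remote nodes, starting with the next
 * node so that thieves on different nodes start their remote search at
 * different places. Returns the number of entries written (ncores - 1). */
static int build_victim_order(int self, int ncores, int cores_per_node, int *order) {
    assert(cores_per_node > 0 && ncores % cores_per_node == 0);
    int nnodes = ncores / cores_per_node;
    int home = numa_node_of(self, cores_per_node);
    int n = 0;
    for (int k = 0; k < nnodes; k++) {
        int node = (home + k) % nnodes;   /* k == 0 is the local node */
        for (int c = node * cores_per_node; c < (node + 1) * cores_per_node; c++)
            if (c != self)
                order[n++] = c;
    }
    return n;
}

/* Return the first core, in NUMA-aware order, whose run queue is
 * non-empty, or -1 if every other queue is empty. queue_len[] is a
 * hypothetical snapshot of per-core queue lengths. */
static int pick_victim(int self, int ncores, int cores_per_node, const int *queue_len) {
    int order[256];
    assert(ncores <= 256);              /* sketch-only fixed bound */
    int n = build_victim_order(self, ncores, cores_per_node, order);
    for (int i = 0; i < n; i++)
        if (queue_len[order[i]] > 0)
            return order[i];
    return -1;
}
```

With 16 cores grouped four per node, a thief on core 5 that sees work queued on cores 1 and 6 picks core 6, its NUMA-local neighbour, before reaching across to core 1 on another node. A real runtime would walk the actual topology (caches, sockets, node distances) rather than this two-level simplification.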


Keywords: OpenMP, Nested Parallelism, Hierarchical Thread Scheduling, Bubbles, Multi-Core, NUMA, SMP




  1. [ACD+07]
    Ayguade, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Su, E., Unnikrishnan, P., Zhang, G.: A proposal for task parallelism in OpenMP. In: Third International Workshop on OpenMP (IWOMP 2007), Beijing, China (2007)
  2. [AGMJ04]
    Ayguade, E., Gonzalez, M., Martorell, X., Jost, G.: Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications. In: 18th International Parallel and Distributed Processing Symposium (IPDPS) (2004)
  3. [aMST07]
    an Mey, D., Sarholz, S., Terboven, C.: Nested Parallelization with OpenMP. Parallel Computing 35(5), 459–476 (2007)
  4. [BDG+04]
    Balart, J., Duran, A., Gonzàlez, M., Martorell, X., Ayguadé, E., Labarta, J.: Nanos Mercurium: A research compiler for OpenMP. In: European Workshop on OpenMP (EWOMP) (October 2004)
  5. [BS05]
    Blikberg, R., Sørevik, T.: Load balancing and OpenMP implementation of nested parallelism. Parallel Computing 31(10-12), 984–998 (2005)
  6. [CHJ+06]
    Chapman, B.M., Huang, L., Jin, H., Jost, G., de Supinski, B.R.: Extending OpenMP worksharing directives for multithreading. In: EuroPar 2006 Parallel Processing (2006)
  7. [DGC05]
    Duran, A., Gonzàlez, M., Corbalán, J.: Automatic Thread Distribution for Nested Parallelism in OpenMP. In: 19th ACM International Conference on Supercomputing, Cambridge, MA, USA, June 2005, pp. 121–130 (2005)
  8. [DSCL04]
    Duran, A., Silvera, R., Corbalán, J., Labarta, J.: Runtime adjustment of parallel nested loops. In: Chapman, B.M. (ed.) WOMPAT 2004. LNCS, vol. 3349, Springer, Heidelberg (2005)
  9. [FLR98]
    Frigo, M., Leiserson, C.E., Randall, K.H.: The Implementation of the Cilk-5 Multithreaded Language. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada (June 1998)
  10. [gom]
    GOMP – An OpenMP implementation for GCC,
  11. [GOM+01]
    Gonzalez, M., Oliver, J., Martorell, X., Ayguade, E., Labarta, J., Navarro, N.: OpenMP Extensions for Thread Groups and Their Run-Time Support. In: Languages and Compilers for Parallel Computing, Springer, Heidelberg (2001)
  12. [GSS+06]
    Gao, G.R., Sterling, T., Stevens, R., Hereld, M., Zhu, W.: Hierarchical multithreading: programming model and system software. In: 20th International Parallel and Distributed Processing Symposium (IPDPS) (April 2006)
  13. [GSW+06]
    Gerndt, A., Sarholz, S., Wolter, M., an Mey, D., Bischof, C., Kuhlen, T.: Nested OpenMP for Efficient Computation of 3D Critical Points in Multi-Block CFD Datasets. In: Supercomputing (November 2006)
  14. [HD07]
    Hadjidoukas, P.E., Dimakopoulos, V.V.: Nested Parallelism in the OMPi OpenMP/C compiler. In: EuroPar, Rennes, France, July 2007, ACM, New York (2007)
  15. [Kar05]
    Karlsson, S.: An Introduction to Balder - An OpenMP Run-time Library for Clusters of SMPs. In: International Workshop on OpenMP (IWOMP) (June 2005)
  16. [MAN+99]
    Martorell, X., Ayguadé, E., Navarro, N., Corbalán, J., González, M., Labarta, J.: Thread Fork/Join Techniques for Multi-Level Parallelism Exploitation in NUMA Multiprocessors. In: International Conference on SuperComputing, pp. 294–301. ACM Press, New York (1999)
  17. [OBA+03]
    Ohtake, Y., Belyaev, A., Alexa, M., Turk, G., Seidel, H.-P.: Multi-level partition of unity implicits. ACM Trans. Graph. 22(3), 463–470 (2003)
  18. [STH+04]
    Su, E., Tian, X., Haab, M.G.G., Shah, S., Petersen, P.: Compiler Support of the Workqueuing Execution Model for Intel SMP Architectures. In: European Workshop on OpenMP (EWOMP) (October 2004)
  19. [TBG+07]
    Thibault, S., Broquedis, F., Goglin, B., Namyst, R., Wacrenier, P.-A.: An Efficient OpenMP Runtime System for Hierarchical Architectures. In: International Workshop on OpenMP (IWOMP), Beijing, China, June 2007, pp. 148–159 (2007)
  20. [TGBS05]
    Tian, X., Girkar, M., Bik, A., Saito, H.: Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs. Comput. J. 48(5), 588–601 (2005)
  21. [TGS+03]
    Tian, X., Girkar, M., Shah, S., Armstrong, D., Su, E., Petersen, P.: Compiler and Runtime Support for Running OpenMP Programs on Pentium- and Itanium-Architectures. In: Eighth International Workshop on High-Level Parallel Programming Models and Supportive Environments, April 2003, pp. 47–55 (2003)
  22. [THH+05]
    Tian, X., Hoeflinger, J.P., Haab, G., Chen, Y.-K., Girkar, M., Shah, S.: A compiler for exploiting nested parallelism in OpenMP programs. Parallel Comput. 31(10-12), 960–983 (2005)
  23. [TTSY00]
    Tanaka, Y., Taura, K., Sato, M., Yonezawa, A.: Performance evaluation of OpenMP applications with nested parallelism. In: Languages, Compilers, and Run-Time Systems for Scalable Computers, pp. 100–112 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • François Broquedis (1)
  • François Diakhaté (1)
  • Samuel Thibault (1)
  • Olivier Aumage (1)
  • Raymond Namyst (1)
  • Pierre-André Wacrenier (1)

  1. INRIA Futurs - LaBRI, Université Bordeaux 1, France
