Scheduling Dynamic OpenMP Applications over Multicore Architectures

Broquedis, François; Diakhaté, François; Thibault, Samuel; Aumage, Olivier; Namyst, Raymond; Wacrenier, Pierre-André

doi:10.1007/978-3-540-79561-2_15

François Broquedis¹,
François Diakhaté¹,
Samuel Thibault¹,
Olivier Aumage¹,
Raymond Namyst¹ &
…
Pierre-André Wacrenier¹

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 5004))

Included in the following conference series:

International Workshop on OpenMP

958 Accesses
13 Citations

Abstract

Approaching the theoretical performance of hierarchical multicore machines requires a very careful distribution of threads and data among the underlying non-uniform architecture in order to minimize cache misses and NUMA penalties. While it is acknowledged that OpenMP can enhance the quality of thread scheduling on such architectures in a portable way, by transmitting precious information about the affinities between threads and data to the underlying runtime system, most OpenMP runtime systems are actually unable to efficiently support highly irregular, massively parallel applications on NUMA machines.

In this paper, we present a thread scheduling policy suited to the execution of OpenMP programs featuring irregular and massive nested parallelism over hierarchical architectures. Our policy enforces a distribution of threads that maximizes the proximity of threads belonging to the same parallel region, and uses a NUMA-aware work stealing strategy when load balancing is needed. It has been developed as a plug-in to the forestGOMP OpenMP platform [TBG+07]. We demonstrate the efficiency of our approach with a highly irregular recursive OpenMP program resulting from the generic parallelization of a surface reconstruction application. We achieve a speedup of 14 on a 16-core machine with no application-level optimization.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Ayguade, E., Copty, N., Duranl, A., Hoeflinger, J., Lin, Y., Massaioli, F., Su, E., Unnikrishnan, P., Zhang, G.: A proposal for task parallelism in OpenMP. In: Third International Workshop on OpenMP (IWOMP 2007), Beijing, China (2007)
Google Scholar
Ayguade, E., Gonzalez, M., Martorell, X., Jost, G.: Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications. In: 18th International Parallel and Distributed Processing Symposium (IPDPS) (2004)
Google Scholar
an Mey, D., Sarholz, S., Terboven, C.: Nested Parallelization with OpenMP. Parallel Computing 35(5), 459–476 (2007)
MATH Google Scholar
Balart, J., Duran, A., Gonzàlez, M., Martorell, X., Ayguadé, E., Labarta, J.: Nanos mercurium: A research compiler for openmp. In: European Workshop on OpenMP (EWOMP) (October 2004)
Google Scholar
Blikberg, R., Sørevik, T.: Load balancing and OpenMP implementation of nested parallelism. Parallel Computing, 31(10-12):984–998 (October 2005)
Google Scholar
Chapman, B.M., Huang, L., Jin, H., Jost, G., de Supinski, B.R.: Extending openmp worksharing directives for multithreading. In: EuroPar 2006 Parallel Processing (2006)
Google Scholar
Duran, A., Gonzàles, M., Corbalán, J.: Automatic Thread Distribution for Nested Parallelism in OpenMP. In: 19th ACM International Conference on Supercomputing, Cambridge, MA, USA, June 2005, pp. 121–130 (2005)
Google Scholar
Duran, A., Silvera, R., Corbalán, J., Labarta, J.: Runtime adjustment of parallel nested loops. In: Chapman, B.M. (ed.) WOMPAT 2004. LNCS, vol. 3349, Springer, Heidelberg (2005)
Google Scholar
Frigo, M., Leiserson, C.E., Randall, K.H.: The Implementation of the Cilk-5 Multithreaded Language. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada (June 1998)
Google Scholar
GOMP – An OpenMP implementation for GCC, http://gcc.gnu.org/projects/gomp/
Gonzalez, M., Oliver, J., Martorell, X., Ayguade, E., Labarta, J., Navarro, N.: OpenMP Extensions for Thread Groups and Their Run-Time Support. In: Languages and Compilers for Parallel Computing, Springer, Heidelberg (2001)
Google Scholar
Gao, G.R., Sterling, T., Stevens, R., Hereld, M., Zhu, W.: Hierarchical multithreading: programming model and system software. In: 20th International Parallel and Distributed Processing Symposium (IPDPS) (April 2006)
Google Scholar
Gerndt, A., Sarholz, S., Wolter, M., an Mey, D., Bischof, C., Kuhlen, T.: Nested OpenMP for Efficient Computation of 3D Critical Points in Multi-Block CFD Datasets. In: Super Computing (November 2006)
Google Scholar
Hadjidoukas, P.E., Dimakopoulos, V.V.: Nested Parallelism in the OMPi OpenMP/C compiler. In: EuroPar, Rennes,France, July 2007, ACM, New York (2007)
Google Scholar
Karlsson, S.: An Introduction to Balder - An OpenMP Run-time Library for Clusters of SMPs. In: International Workshop on OpenMP (IWOMP) (June 2005)
Google Scholar
Martorell, X., Ayguadé, E., Navarro, N., Corbalán, J., González, M., Labarta, J.: Thread Fork/Join Techniques for Multi-Level Parallelism Exploitation in NUMA Multiprocessors. In: International Conference on SuperComputing, pp. 294–301. ACM Press, New York (1999)
Google Scholar
Ohtake, Y., Belyaev, A., Alexa, M., Turk, G., Seidel, H.-P.: Multi-level partition of unity implicits. ACM Trans. Graph. 22(3), 463–470 (2003)
Article Google Scholar
Su, E., Tian, X., Haab, M.G.G., Shah, S., Petersen, P.: Compiler Support of the Workqueuing Execution Model for Intel SMP Architectures. In: European Workshop on OpenMP (EWOMP) (October 2004)
Google Scholar
Thibault, S., Broquedis, F., Goglin, B., Namyst, R., Wacrenier, P.-A.: An Efficient OpenMP Runtime System for Hierarchical Architectures. In: International Workshop on OpenMP (IWOMP), Beijing,China, June 2007, pp. 148–159 (2007)
Google Scholar
Tian, X., Girkar, M., Bik, A., Saito, H.: Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs. Comput. J. 48(5), 588–601 (2005)
Article Google Scholar
Tian, X., Girkar, M., Shah, S., Armstrong, D., Su, E., Petersen, P.: Compiler and Runtime Support for Running OpenMP Programs on Pentium- and Itanium-Architectures. In: Eighth International Workshop on High-Level Parallel Programming Models and Supportive Environments, April 2003, pp. 47–55 (2003)
Google Scholar
Tian, X., Hoeflinger, J.P., Haab, G., Chen, Y.-K., Girkar, M., Shah, S.: A compiler for exploiting nested parallelism in OpenMP programs. Parallel Comput. 31(10-12), 960–983 (2005)
Article Google Scholar
Tanaka, Y., Taura, K., Sato, M., Yonezawa, A.: Performance evaluation of openmp applications with nested parallelism. In: Languages, Compilers, and Run-Time Systems for Scalable Computers, pp. 100–112 (2000)
Google Scholar

Download references

Author information

Authors and Affiliations

INRIA Futurs - LaBRI, Université Bordeaux 1, France
François Broquedis, François Diakhaté, Samuel Thibault, Olivier Aumage, Raymond Namyst & Pierre-André Wacrenier

Authors

François Broquedis
View author publications
You can also search for this author in PubMed Google Scholar
François Diakhaté
View author publications
You can also search for this author in PubMed Google Scholar
Samuel Thibault
View author publications
You can also search for this author in PubMed Google Scholar
Olivier Aumage
View author publications
You can also search for this author in PubMed Google Scholar
Raymond Namyst
View author publications
You can also search for this author in PubMed Google Scholar
Pierre-André Wacrenier
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Rudolf Eigenmann Bronis R. de Supinski

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Broquedis, F., Diakhaté, F., Thibault, S., Aumage, O., Namyst, R., Wacrenier, PA. (2008). Scheduling Dynamic OpenMP Applications over Multicore Architectures. In: Eigenmann, R., de Supinski, B.R. (eds) OpenMP in a New Era of Parallelism. IWOMP 2008. Lecture Notes in Computer Science, vol 5004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79561-2_15

Download citation

DOI: https://doi.org/10.1007/978-3-540-79561-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79560-5
Online ISBN: 978-3-540-79561-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics