Abstract
OpenMP 4.0 introduced dependent tasks, which give the programmer a way to express fine grain parallelism. Using appropriate OS support (such as NUMA libraries), the runtime can rely on the information in the depend clause to dynamically map the tasks to the architecture topology. Controlling data locality is one of the key factors to reach a high level of performance when targeting NUMA architectures. On this topic, OpenMP does not provide a lot of flexibility to the programmer yet, which lets the runtime decide where a task should be executed. In this paper, we present a class of applications which would benefit from having such a control and flexibility over tasks and data placement. We also propose our own interpretation of the new affinity clause for the task directive, which is being discussed by the OpenMP Architecture Review Board. This clause enables the programmer to give hints to the runtime about tasks placement during the program execution, which can be used to control the data mapping on the architecture. In our proposal, the programmer can express affinity between a task and the following resources: a thread, a NUMA node, and a data. We then present an implementation of this proposal in the Clang-3.8 compiler, and an implementation of the corresponding extensions in our OpenMP runtime libKOMP. Finally, we present a preliminary evaluation of this work running two task-based OpenMP kernels on a 192-core NUMA architecture, that shows noticeable improvements both in terms of performance and scalability.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bleuse, R., Gautier, T., Lima, J.V.F., Mounié, G., Trystram, D.: Scheduling data flow program in XKaapi: a new affinity based algorithm for heterogeneous architectures. In: Silva, F., Dutra, I., Santos Costa, V. (eds.) Euro-Par 2014 Parallel Processing. LNCS, vol. 8632, pp. 560–571. Springer, Heidelberg (2014)
Broquedis, F., Furmento, N., Goglin, B., Wacrenier, P.-A., Namyst, R.: ForestGOMP: an efficient OpenMP environment for NUMA architectures. Int. J. Parallel Programm. 38(5), 418–439 (2010)
Broquedis, F., Gautier, T., Danjean, V.: libKOMP, an efficient OpenMP runtime system for both fork-join and data flow paradigms. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds.) IWOMP 2012. LNCS, vol. 7312, pp. 102–115. Springer, Heidelberg (2012)
Drebes, A., Heydemann, K., Drach, N., Pop, A., Cohen, A.: Topology-aware and dependence-aware scheduling and memory allocation for task-parallel languages. ACM Trans. Archit. Code Optim. 11(3), 30:1–30:25 (2014). Special Issue on OpenMP; Müller, M.S., Ayguade, E. (eds.)
Durand, M., Broquedis, F., Gautier, T., Raffin, B.: OpenMP in the Era of Low Power Devices and Accelerators, pp. 141–155. Springer, Berlin, Heidelberg (2013)
Huang, L., Jin, H., Yi, L., Chapman, B.: Enabling locality-aware computations in openmp. Sci. Program. 18(3–4), 169–181 (2010)
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
Kennedy, K., Koelbel, C., Zima, H.: The rise and fall of high performance fortran: an historical object lesson. In: Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages, HOPL III, pp. 7-1–7-22. ACM, New York (2007)
Lima, J.V.F., Gautier, T., Danjean, V., Raffin, B., Maillard, N.: Design and analysis of scheduling strategies for multi-CPU and multi-GPU architectures. Parallel Comput. 44, 37–52 (2015)
Marowka, A., Liu, Z., Chapman, B.: Openmp-oriented applications for distributed shared memory architectures: research articles. Concurr. Comput. Pract. Exper. 16, 371–384 (2004)
Olivier, S., Porterfield, A., Wheeler, K.B., Spiegel, M., Prins, J.F.: Openmp task scheduling strategies for multicore NUMA systems. IJHPCA 26(2), 110–124 (2012)
Olivier, S.L., de Supinski, B.R., Schulz, M., Prins, J.F.: Characterizing and mitigating work time inflation in task parallel programs. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2012 (2012)
Saad, Y.: Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, 2nd edn. SIAM, Philadelphia (2003)
Virouleau, P., Brunet, P., Broquedis, F., Furmento, N., Thibault, S., Aumage, O., Gautier, T.: Evaluation of OpenMP dependent tasks with the KASTORS benchmark suite. In: DeRose, L., Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2014. LNCS, vol. 8766, pp. 16–29. Springer, Heidelberg (2014)
Acknowledgments
This work is integrated and supported by the ELCI project, a French FSN (“Fond pour la Société Numérique”) project that associates academic and industrial partners to design and provide a software environment for very high performance computing.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Virouleau, P., Roussel, A., Broquedis, F., Gautier, T., Rastello, F., Gratien, JM. (2016). Description, Implementation and Evaluation of an Affinity Clause for Task Directives. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science(), vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_5
Download citation
DOI: https://doi.org/10.1007/978-3-319-45550-1_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer ScienceComputer Science (R0)