Locality-Aware Task Scheduling and Data Distribution on NUMA Systems

Muddukrishna, Ananya; Jonsson, Peter A.; Vlassov, Vladimir; Brorsson, Mats

doi:10.1007/978-3-642-40698-0_12

Ananya Muddukrishna¹⁹,
Peter A. Jonsson²⁰,
Vladimir Vlassov¹⁹ &
…
Mats Brorsson^19,20

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 8122))

Included in the following conference series:

International Workshop on OpenMP

1509 Accesses
19 Citations

Abstract

Modern parallel computer systems exhibit Non-Uniform Memory Access (NUMA) behavior. For best performance, any parallel program therefore has to match data allocation and scheduling of computations to the memory architecture of the machine. When done manually, this becomes a tedious process and since each individual system has its own peculiarities this also leads to programs that are not performance-portable.

We propose the use of a data distribution scheme in which NUMA hardware peculiarities are abstracted away from the programmer and data distribution is delegated to a runtime system which is generated once for each machine. In addition we propose using task data dependence information now possible with the OpenMP 4.0RC2 proposal to guide the scheduling of OpenMP tasks to further reduce data stall times.

We demonstrate the viability and performance of our proposals on a four socket AMD Opteron machine with eight NUMA nodes. We identify that both data distribution and locality-aware task scheduling improves performance compared to default policies while still providing an architecture-oblivious approach for the programmer.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 49.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Olivier, S.L., de Supinski, B.R., Schulz, M., Prins, J.F.: Characterizing and mitigating work time inflation in task parallel programs. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012, pp. 1–12 (2012)
Google Scholar
Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: A generic framework for managing hardware affinities in hpc applications. In: 2010 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 180–186 (2010)
Google Scholar
Ribeiro, C.P., Mhaut, J.F.: Minas: Memory affinity management framework (2009)
Google Scholar
Kleen, A.: A numa api for linux. Novel Inc. (2005)
Google Scholar
Terboven, C., Schmidl, D., Cramer, T.: an Mey, D.: Assessing OpenMP Tasking Implementations on NUMA Architectures. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds.) IWOMP 2012. LNCS, vol. 7312, pp. 182–195. Springer, Heidelberg (2012)
Chapter Google Scholar
McCurdy, C., Vetter, J.S.: Memphis: Finding and fixing NUMA-Related performance problems on multi-core platforms. Proceedings of the IEEE (2010)
Google Scholar
Duran, A., Teruel, X., Ferrer, R., Martorell, X., Ayguade, E.: Barcelona openmp tasks suite: A set of benchmarks targeting the exploitation of task parallelism in openmp. In: International Conference on Parallel Processing, ICPP 2009, pp. 124–131 (2009)
Google Scholar
AMD: BIOS and kernel developers guide for AMD family 10h processors
Google Scholar
Conway, P., Kalyanasundharam, N., Donley, G., Lepak, K., Hughes, B.: Cache hierarchy and memory subsystem of the AMD opteron processor. IEEE Micro 30(2), 16–29 (2010)
Article Google Scholar
Molka, D., Schne, R., Hackenberg, D., Mller, M.: Memory performance and SPEC OpenMP scalability on quad-socket x86_64 systems. Algorithms and Architectures for Parallel Processing, 170–181 (2011)
Google Scholar
Pillet, V., Labarta, J., Cortes, T., Girona, S.: Paraver: A tool to visualize and analyze parallel code. WoTUG-18, 17–31 (1995)
Google Scholar
Huang, L., Jin, H., Yi, L., Chapman, B.: Enabling locality-aware computations in OpenMP. Scientific Programming 181, 169–181 (2010)
Article Google Scholar
Majo, Z., Gross, T.R.: Matching memory access patterns and data placement for NUMA systems. In: Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp. 230–241 (2012)
Google Scholar
Nikolopoulos, D.S., Papatheodorou, T.S., Polychronopoulos, C.D., Labarta, J.: Is data distribution necessary in OpenMP? In: Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (CDROM), p. 47 (2000)
Google Scholar
Terboven, C., Schmidl, D., Jin, H., Reichstein, T.: Data and thread affinity in openmp programs. In: Proceedings of the 2008 Workshop on Memory Access on Future Processors: a Solved Problem? pp. 377–384 (2008)
Google Scholar
Broquedis, F., Furmento, N., Goglin, B., Namyst, R., Wacrenier, P.-A.: Dynamic task and data placement over NUMA architectures: An openMP runtime perspective. In: Müller, M.S., de Supinski, B.R., Chapman, B.M. (eds.) IWOMP 2009. LNCS, vol. 5568, pp. 79–92. Springer, Heidelberg (2009)
Chapter MATH Google Scholar
Goglin, B., Furmento, N.: Enabling high-performance memory migration for multithreaded applications on linux. In: IEEE International Symposium on Parallel & Distributed Processing, IPDPS 2009, pp. 1–9 (2009)
Google Scholar
Wittmann, M., Hager, G.: Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems. arXiv preprint arXiv:1101 (2010)
Google Scholar
Olivier, S.L., Porterfield, A.K., Wheeler, K.B., Spiegel, M., Prins, J.F.: OpenMP task scheduling strategies for multicore NUMA systems. International Journal of High Performance Computing Applications 26(2), 110–124 (2012)
Article Google Scholar
Pilla, L.L., Ribeiro, C.P., Cordeiro, D., Mhaut, J.F.: Charm++ on NUMA platforms: the impact of SMP optimizations and a NUMA-aware load balancer. In: 4th Workshop of the INRIA-Illinois Joint Laboratory on Petascale Computing, Urbana, IL, USA (2010)
Google Scholar
Schmidl, D., Terboven, C.: an Mey, D.: Towards NUMA Support with Distance Information. In: Chapman, B.M., Gropp, W.D., Kumaran, K., Müller, M.S. (eds.) IWOMP 2011. LNCS, vol. 6665, pp. 69–79. Springer, Heidelberg (2011)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

KTH Royal Institute of Technology, Sweden
Ananya Muddukrishna, Vladimir Vlassov & Mats Brorsson
SICS Swedish ICT AB, Sweden
Peter A. Jonsson & Mats Brorsson

Authors

Ananya Muddukrishna
View author publications
You can also search for this author in PubMed Google Scholar
Peter A. Jonsson
View author publications
You can also search for this author in PubMed Google Scholar
Vladimir Vlassov
View author publications
You can also search for this author in PubMed Google Scholar
Mats Brorsson
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Research School of Computer Science, The Australian National University, Australia
Alistair P. Rendell
Dept. of Computer Science, University of Houston Oak Ridge National Laboratory, USA
Barbara M. Chapman
Lehrstuhl für Hochleistungsrechnen und Rechen- und Kommunikationszentrum, RWTH Aachen University, Seffenter Weg 23, 52074, Aachen, Germany
Matthias S. Müller

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Muddukrishna, A., Jonsson, P.A., Vlassov, V., Brorsson, M. (2013). Locality-Aware Task Scheduling and Data Distribution on NUMA Systems. In: Rendell, A.P., Chapman, B.M., Müller, M.S. (eds) OpenMP in the Era of Low Power Devices and Accelerators. IWOMP 2013. Lecture Notes in Computer Science, vol 8122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40698-0_12

Download citation

DOI: https://doi.org/10.1007/978-3-642-40698-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40697-3
Online ISBN: 978-3-642-40698-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics