Assessing Task-to-Data Affinity in the LLVM OpenMP Runtime

  • Conference paper
  • In: Evolving OpenMP for Evolving Architectures (IWOMP 2018)

Abstract

In modern shared-memory NUMA systems, which typically consist of two or more multi-core processor packages with local memory, affinity of data to computation is crucial for achieving high performance with an OpenMP program. OpenMP* 3.0 introduced support for task-parallel programs in 2008, and subsequent versions have continued to extend its applicability and expressiveness. However, the ability to support data affinity of tasks is still missing. In this paper, we investigate several approaches for task-to-data affinity that combine locality-aware task distribution and task stealing. We introduce the task affinity clause that will be part of OpenMP 5.0 and provide the reasoning behind its design. Evaluation with our experimental implementation in the LLVM OpenMP runtime shows that task affinity improves execution performance by up to 4.5x on an 8-socket NUMA machine and significantly reduces runtime variability of OpenMP tasks. Our results demonstrate that a variety of applications can benefit from task affinity and that the presented clause closes the task-to-data affinity gap in OpenMP 5.0.

Under the terms of Contract DE-NA0003525, there is a non-exclusive license for use of this work by or on behalf of the U.S. Government.
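To make the abstract's contribution concrete: the OpenMP 5.0 affinity clause attaches a list of storage locations to a task as a scheduling hint, so that a locality-aware runtime such as the modified LLVM OpenMP runtime can place the task close to the NUMA node holding that data. The following is a minimal C sketch, not taken from the paper's benchmarks; the names process_block, N, and BS are illustrative, and it assumes a compiler and runtime with OpenMP 5.0 task affinity support.

    #include <omp.h>

    #define N  4096   /* illustrative problem size */
    #define BS 512    /* illustrative block size   */

    /* Toy per-block kernel standing in for the real work on one block. */
    static void process_block(double *block, int len)
    {
        for (int j = 0; j < len; ++j)
            block[j] = 2.0 * block[j] + 1.0;
    }

    void process(double A[N])
    {
        #pragma omp parallel
        #pragma omp single
        for (int i = 0; i < N; i += BS) {
            /* The affinity clause is a hint, not a binding guarantee: a
               locality-aware runtime may enqueue this task near the NUMA
               node where A[i:BS] was first touched, while task stealing
               can still move it elsewhere to balance load. */
            #pragma omp task affinity(A[i:BS])
            process_block(&A[i], BS);
        }
    }

Because affinity is only a hint, the combination of locality-aware task distribution and task stealing evaluated in the paper determines how strongly such a hint is honored in practice.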

Notes

  1. Our implementation is based on an LLVM development version for OpenMP 5.0 from September 2017 and is available at https://github.com/jklinkenberg/openmp/tree/task-affinity.

Acknowledgements

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

This work is partially supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project and by the Generalitat de Catalunya (contract 2017-SGR-1414).

Some of the experiments were performed with computing resources granted by JARA-HPC from RWTH Aachen University under project jara0001. Parts of this work were funded by the German Federal Ministry of Education and Research (BMBF) under grant number 01IH16004B (Project Chameleon).

Intel and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

* Other names and brands are the property of their respective owners.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Author information

Corresponding author

Correspondence to Stephen L. Olivier.

Copyright information

© 2018 National Technology & Engineering Solutions of Sandia, LLC.

About this paper

Cite this paper

Klinkenberg, J., et al. (2018). Assessing Task-to-Data Affinity in the LLVM OpenMP Runtime. In: de Supinski, B., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds) Evolving OpenMP for Evolving Architectures. IWOMP 2018. Lecture Notes in Computer Science, vol 11128. Springer, Cham. https://doi.org/10.1007/978-3-319-98521-3_16

  • DOI: https://doi.org/10.1007/978-3-319-98521-3_16

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98520-6

  • Online ISBN: 978-3-319-98521-3

  • eBook Packages: Computer Science, Computer Science (R0)
