autopin – Automated Optimization of Thread-to-Core Pinning on Multicore Systems

Klug, Tobias; Ott, Michael; Weidendorfer, Josef; Trinitis, Carsten

doi:10.1007/978-3-642-19448-1_12

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 6590)

Abstract

In this paper we present a framework that uses hardware performance counters to automatically detect and apply the best binding between the threads of a running parallel application and the processor cores of a shared-memory system. This is especially important for multicore architectures with shared cache levels. We demonstrate that many applications from the SPEC OMP benchmark suite have runtimes that are quite sensitive to the thread/core binding used. In our tests, the proposed framework finds the best binding in nearly all cases. The framework is intended to supplement job scheduling systems for better automatic exploitation of systems with multicore processors, and to make programmers aware of this issue by providing measurement logs.
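
The abstract does not describe how the best binding is found. Purely as an illustration, the following is a minimal Python sketch assuming a simple exhaustive trial over a fixed list of candidate pinnings, where a hypothetical `measure` callback returns a throughput figure (for example, instructions retired per second as derived from hardware performance counters). The helper `select_best_pinning`, the 4-core topology in `L2_DOMAIN`, and the cost model in `simulated_rate` are all assumptions made for this example, not the authors' implementation; a real tool would apply each pinning to the live threads and read counters over a measurement interval, steps that are omitted here.

```python
from typing import Callable, Dict, Sequence, Tuple

# A pinning maps thread i to core pinning[i].
Pinning = Tuple[int, ...]


def select_best_pinning(
    candidates: Sequence[Pinning],
    measure: Callable[[Pinning], float],
) -> Tuple[Pinning, Dict[Pinning, float]]:
    """Measure every candidate pinning once and return the one with the
    highest rate, together with all measured rates (a simple log)."""
    if not candidates:
        raise ValueError("need at least one candidate pinning")
    scores: Dict[Pinning, float] = {p: measure(p) for p in candidates}
    # max() keeps the first candidate on ties, so list order breaks ties.
    best = max(candidates, key=lambda p: scores[p])
    return best, scores


# Hypothetical 4-core topology: cores 0 and 1 share one L2 cache,
# cores 2 and 3 share another.
L2_DOMAIN = {0: 0, 1: 0, 2: 1, 3: 1}


def simulated_rate(pinning: Pinning) -> float:
    """Hypothetical stand-in for a counter-based measurement: each thread
    contributes 1.0, minus 0.4 for every other thread pinned to a core in
    the same L2 domain."""
    rate = 0.0
    for core in pinning:
        sharers = sum(1 for c in pinning if L2_DOMAIN[c] == L2_DOMAIN[core]) - 1
        rate += 1.0 - 0.4 * sharers
    return rate


if __name__ == "__main__":
    candidates = [(0, 1), (0, 2), (2, 3), (1, 3)]
    best, scores = select_best_pinning(candidates, simulated_rate)
    for pinning, rate in scores.items():
        print(pinning, round(rate, 2))
    print("best:", best)
```

In this toy model, spreading the two threads across the two L2 domains avoids the sharing penalty, so `(0, 2)` is selected; `(1, 3)` scores the same but appears later in the candidate list.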

Author information

Authors and Affiliations

Lehrstuhl für Rechnertechnik und Rechnerorganisation / Parallelrechnerarchitektur (LRR/TUM), Technische Universität München, Boltzmannstraße 3, 85748, Garching bei München, Germany
Tobias Klug, Michael Ott, Josef Weidendorfer & Carsten Trinitis

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
Per Stenström

About this chapter

Cite this chapter

Klug, T., Ott, M., Weidendorfer, J., Trinitis, C. (2011). autopin – Automated Optimization of Thread-to-Core Pinning on Multicore Systems. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers III. Lecture Notes in Computer Science, vol 6590. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19448-1_12

DOI: https://doi.org/10.1007/978-3-642-19448-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19447-4
Online ISBN: 978-3-642-19448-1
eBook Packages: Computer Science, Computer Science (R0)