Complexities of Performance Prediction for Bandwidth-Limited Loop Kernels on Multi-Core Architectures

Treibig, Jan; Hager, Georg; Wellein, Gerhard

doi:10.1007/978-3-642-13872-0_1

Abstract

The balance metric is a simple approach to estimate the performance of bandwidth-limited loop kernels. However, applying the method to modern multi-core architectures yields unsatisfactory results. This paper analyzes the influence of cache hierarchy design on performance predictions for bandwidth-limited loop kernels on current mainstream processors. We present a diagnostic model with improved predictive power, correcting the limitations of the simple balance metric. The importance of code execution overhead even in bandwidth-bound situations is emphasized.
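As a rough illustration of the simple balance metric mentioned in the abstract, the sketch below uses one common textbook formulation: machine balance is memory bandwidth divided by peak performance, code balance is data traffic per floating-point operation, and the predicted fraction of peak is the ratio of the two, capped at 1. The chapter's own definitions are not visible in this preview, so the formulation, the function names, and all numbers here are assumptions chosen for illustration, not the authors' model.

```python
def machine_balance(bandwidth_gwords_s: float, peak_gflops: float) -> float:
    """Words the memory system can deliver per flop the core can execute."""
    return bandwidth_gwords_s / peak_gflops


def code_balance(words_transferred: float, flops: float) -> float:
    """Words the loop body needs to move per flop it performs."""
    return words_transferred / flops


def predicted_performance(bandwidth_gwords_s: float, peak_gflops: float,
                          words_transferred: float, flops: float) -> float:
    """Predicted GFlop/s under the simple balance metric (assumed formulation)."""
    b_m = machine_balance(bandwidth_gwords_s, peak_gflops)
    b_c = code_balance(words_transferred, flops)
    # Fraction of peak the loop can reach; capped at 1 when compute-bound.
    fraction_of_peak = min(1.0, b_m / b_c)
    return fraction_of_peak * peak_gflops


# Hypothetical machine: 2 GWords/s memory bandwidth, 10 GFlop/s peak.
# Hypothetical kernel: a[i] = b[i] + c[i] * d[i], counted as 3 loads + 1 store
# (write-allocate traffic ignored) and 2 flops per iteration.
perf = predicted_performance(bandwidth_gwords_s=2.0, peak_gflops=10.0,
                             words_transferred=4, flops=2)
print(f"predicted performance: {perf:.2f} GFlop/s")
```

This sketch treats the machine as a single bandwidth number and ignores the cache hierarchy and the cost of executing the loop instructions, which are the effects the abstract says the simple metric misses on multi-core processors.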

Author information

Authors and Affiliations

Regionales Rechenzentrum Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Martensstr. 1, D-91058, Erlangen, Germany
Jan Treibig, Georg Hager & Gerhard Wellein

Corresponding author

Correspondence to Jan Treibig.

Editor information

Editors and Affiliations

Inst. Aerodynamik und Gasdynamik, Universität Stuttgart, Pfaffenwaldring 21, Stuttgart, 70550, Germany
Siegfried Wagner
Astrophysikalisches Institut Potsdam, An der Sternwarte 16, Potsdam, 14482, Germany
Matthias Steinmetz
Leibniz-Rechenzentrum, Boltzmannstr. 1, Garching b. München, 85748, Germany
Arndt Bode
Leibniz-Rechenzentrum, Boltzmannstr. 1, Garching b. München, 85748, Germany
Markus Michael Müller

About this paper

Cite this paper

Treibig, J., Hager, G., Wellein, G. (2010). Complexities of Performance Prediction for Bandwidth-Limited Loop Kernels on Multi-Core Architectures. In: Wagner, S., Steinmetz, M., Bode, A., Müller, M. (eds) High Performance Computing in Science and Engineering, Garching/Munich 2009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13872-0_1

DOI: https://doi.org/10.1007/978-3-642-13872-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13871-3
Online ISBN: 978-3-642-13872-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
