
Complexities of Performance Prediction for Bandwidth-Limited Loop Kernels on Multi-Core Architectures

  • Conference paper
High Performance Computing in Science and Engineering, Garching/Munich 2009

Abstract

The balance metric is a simple approach to estimate the performance of bandwidth-limited loop kernels. However, applying the method to modern multi-core architectures yields unsatisfactory results. This paper analyzes the influence of cache hierarchy design on performance predictions for bandwidth-limited loop kernels on current mainstream processors. We present a diagnostic model with improved predictive power, correcting the limitations of the simple balance metric. The importance of code execution overhead even in bandwidth-bound situations is emphasized.
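The balance metric can be made concrete with a short sketch. It assumes the usual definitions: code balance B_c is the bytes a loop moves per flop, machine balance B_m = b_s / P_peak, and predicted performance is P = min(P_peak, b_s / B_c). The example below applies this to a double-precision triad a[i] = b[i] + c[i]*d[i], counting 40 bytes per iteration including the write-allocate of a. The machine figures (10 GB/s, 40 GFlop/s), the cycle counts, and all function names are hypothetical placeholders, not measurements or code from the paper. The last function adds in-core execution cycles to the transfer cycles at each cache-level boundary under an explicit no-overlap assumption. It illustrates why cache hierarchy and execution overhead can matter, and it is not a reproduction of the paper's diagnostic model.

```python
"""Balance-metric sketch for a bandwidth-limited loop kernel.

Hypothetical illustration: all machine figures and names are placeholders,
not data or code from the paper.
"""
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Kernel:
    flops_per_iter: float  # floating-point operations per loop iteration
    bytes_per_iter: float  # memory traffic per iteration, incl. write-allocate


def code_balance(k: Kernel) -> float:
    """B_c: bytes the loop must move per flop."""
    return k.bytes_per_iter / k.flops_per_iter


def machine_balance(bandwidth_gbs: float, peak_gflops: float) -> float:
    """B_m: bytes the memory system can deliver per peak flop."""
    return bandwidth_gbs / peak_gflops


def lightspeed(k: Kernel, bandwidth_gbs: float, peak_gflops: float) -> float:
    """l = min(1, B_m / B_c): expected fraction of peak performance."""
    return min(1.0, machine_balance(bandwidth_gbs, peak_gflops) / code_balance(k))


def predicted_gflops(k: Kernel, bandwidth_gbs: float, peak_gflops: float) -> float:
    """P = min(P_peak, b_s / B_c), equivalent to l * P_peak."""
    return min(peak_gflops, bandwidth_gbs / code_balance(k))


def cycles_no_overlap(exec_cycles: float, transfer_cycles: Sequence[float]) -> float:
    """Hypothetical cache-aware refinement: in-core execution cycles plus the
    transfer cycles at every cache-level boundary, assuming none of them overlap."""
    return exec_cycles + sum(transfer_cycles)


if __name__ == "__main__":
    # Double-precision triad a[i] = b[i] + c[i] * d[i]:
    # 2 flops; loads b, c, d (24 B) + store a (8 B) + write-allocate of a (8 B) = 40 B.
    triad = Kernel(flops_per_iter=2, bytes_per_iter=40)
    print(code_balance(triad))                   # 20.0 bytes/flop
    print(predicted_gflops(triad, 10.0, 40.0))   # 0.5 GFlop/s on the hypothetical machine
    print(cycles_no_overlap(4.0, [10.0, 20.0]))  # 34.0 hypothetical cycles per unit of work
```

In the simple model, a kernel with B_c greater than B_m is memory-bound and its predicted performance depends only on b_s / B_c. The additive variant instead lets execution overhead and every cache-level transfer contribute to runtime, which is one way a single-bandwidth estimate can become inaccurate on multi-core chips with deep cache hierarchies.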



Author information

Correspondence to Jan Treibig.


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Treibig, J., Hager, G., Wellein, G. (2010). Complexities of Performance Prediction for Bandwidth-Limited Loop Kernels on Multi-Core Architectures. In: Wagner, S., Steinmetz, M., Bode, A., Müller, M. (eds) High Performance Computing in Science and Engineering, Garching/Munich 2009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13872-0_1
