Complexities of Performance Prediction for Bandwidth-Limited Loop Kernels on Multi-Core Architectures


The balance metric is a simple approach to estimate the performance of bandwidth-limited loop kernels. However, applying the method to modern multi-core architectures yields unsatisfactory results. This paper analyzes the influence of cache hierarchy design on performance predictions for bandwidth-limited loop kernels on current mainstream processors. We present a diagnostic model with improved predictive power, correcting the limitations of the simple balance metric. The importance of code execution overhead even in bandwidth-bound situations is emphasized.
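
As an illustration of the simple balance metric mentioned above, the following is a minimal sketch of the commonly used formulation, not the improved diagnostic model of this paper. It assumes that the achievable fraction of peak performance is the ratio of machine balance (words per flop the memory system can deliver) to code balance (words per flop the loop requires), capped at 1. The function name, the parameter names, and the machine numbers are hypothetical and chosen only for illustration.

```python
def balance_prediction(peak_flops, bandwidth_words, code_balance):
    """Estimate loop performance in flop/s with the simple balance metric.

    peak_flops      -- peak arithmetic performance in flop/s
    bandwidth_words -- sustained memory bandwidth in words/s
    code_balance    -- data traffic of the loop in words per flop
    """
    # Words per flop that the memory system can supply.
    machine_balance = bandwidth_words / peak_flops
    # Fraction of peak that is reachable; never more than 1.
    lightspeed = min(1.0, machine_balance / code_balance)
    return lightspeed * peak_flops


# Hypothetical machine: 10 Gflop/s peak, 10 GB/s bandwidth
# (1.25 Gwords/s with 8-byte words).
# Vector triad a[i] = b[i] + c[i] * d[i]: 2 flops per iteration,
# 3 loads + 1 store = 4 words (write-allocate traffic ignored here),
# so the assumed code balance is 2 words/flop.
predicted = balance_prediction(10e9, 1.25e9, 2.0)
print(f"predicted: {predicted / 1e9:.3f} Gflop/s")
```

The sketch treats memory bandwidth as the only bottleneck. It ignores cache hierarchy effects and in-core execution overhead, which the abstract identifies as the reasons this simple estimate falls short on multi-core processors.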



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Regionales Rechenzentrum Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany