Complexities of Performance Prediction for Bandwidth-Limited Loop Kernels on Multi-Core Architectures

  • Jan Treibig
  • Georg Hager
  • Gerhard Wellein


The balance metric is a simple approach to estimate the performance of bandwidth-limited loop kernels. However, applying the method to modern multi-core architectures yields unsatisfactory results. This paper analyzes the influence of cache hierarchy design on performance predictions for bandwidth-limited loop kernels on current mainstream processors. We present a diagnostic model with improved predictive power, correcting the limitations of the simple balance metric. The importance of code execution overhead even in bandwidth-bound situations is emphasized.
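For orientation, the simple balance metric mentioned above can be sketched as follows. This is a minimal illustration of the metric as it is commonly defined (machine balance divided by code balance, capped at one), not the improved diagnostic model of the paper, which this abstract does not spell out. The function name, parameter names, and the machine numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def balance_prediction(peak_flops, mem_bandwidth_words, code_balance):
    """Estimate loop kernel performance (flop/s) with the simple balance metric.

    peak_flops          -- peak arithmetic performance in flop/s
    mem_bandwidth_words -- sustained memory bandwidth in words/s
    code_balance        -- data traffic of the loop in words per flop
    """
    # Machine balance: how many words the memory system can deliver per flop.
    machine_balance = mem_bandwidth_words / peak_flops
    # Fraction of peak that is achievable; it cannot exceed 1.
    lightspeed = min(1.0, machine_balance / code_balance)
    return lightspeed * peak_flops


# Hypothetical machine: 10 Gflop/s peak, 10 GB/s bandwidth (1.25 Gwords/s
# for 8-byte doubles). Hypothetical kernel: a[i] = b[i] + c[i] * d[i],
# counted as 4 words of traffic per 2 flops, ignoring write-allocate.
triad_balance = 4 / 2
estimate = balance_prediction(10e9, 1.25e9, triad_balance)
```

The estimate assumes that memory bandwidth is the only bottleneck and that data transfers overlap perfectly with code execution, which is the kind of simplification the paper examines for multi-core cache hierarchies.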


Keywords: Performance Prediction, Main Memory, Memory Bandwidth, Cache Line, Algorithmic Balance





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Regionales Rechenzentrum Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
