Introducing a Performance Model for Bandwidth-Limited Loop Kernels

  • Jan Treibig
  • Georg Hager
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6067)

Abstract

We present a diagnostic performance model for bandwidth-limited loop kernels that is founded on an analysis of modern cache-based microarchitectures. The model allows accurate performance prediction and evaluation of existing instruction code, and it provides an in-depth understanding of how performance is composed across the different levels of the memory hierarchy. To demonstrate the capabilities of the model, raw memory load, store, and copy operations and a stream vector triad are analyzed and benchmarked on three modern x86-type quad-core architectures.
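
For orientation, the four kernel types named in the abstract can be sketched in plain C as follows. This is an illustrative sketch only, not the authors' benchmark code: the function names are hypothetical, and the triad is assumed to take the STREAM form a[i] = b[i] + s * c[i], which the abstract does not spell out. The byte counts in the comments cover only the explicit loads and stores of 8-byte doubles and ignore any additional cache-line transfers the hardware may perform.

```c
#include <stddef.h>

/* Load: one 8-byte load per iteration. The running sum keeps the
 * loads from being optimized away. (Hypothetical name.) */
double kernel_load(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* Store: one 8-byte store per iteration. */
void kernel_store(double *a, double s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = s;
}

/* Copy: one load and one store per iteration (16 bytes explicit traffic). */
void kernel_copy(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i];
}

/* Triad, assumed STREAM form: two loads and one store per iteration
 * (24 bytes explicit traffic). */
void kernel_triad(double *a, const double *b, const double *c,
                  double s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];
}
```

In a bandwidth-limited setting, kernels like these would typically be timed over working sets sized to fit each memory hierarchy level in turn; the timing harness is left out here to keep the sketch short.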

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jan Treibig (1)
  • Georg Hager (1)
  1. Regionales Rechenzentrum Erlangen, University Erlangen-Nuernberg
