Abstract
We present a diagnostic performance model for bandwidth-limited loop kernels that is founded on the analysis of modern cache-based microarchitectures. The model allows accurate performance prediction and evaluation for existing instruction codes. It provides an in-depth understanding of how performance is composed at the different levels of the memory hierarchy. To demonstrate the capabilities of the model, the performance of raw memory load, store, and copy operations and of a stream vector triad is analyzed and benchmarked on three modern x86-type quad-core architectures.
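For orientation, the four kernel types named above can be sketched as plain C loops. This is a minimal illustration, not the benchmark code used in the paper: the function names are hypothetical, and the triad is assumed to take the STREAM-style form a[i] = b[i] + s * c[i], which the abstract does not spell out.

```c
#include <stddef.h>

/* Hypothetical kernel names; illustrative only, not the paper's code. */

/* Load: read every element once. Returning the sum keeps the loop
   from being optimized away. */
double kernel_load(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* Store: write every element once, no reads from memory. */
void kernel_store(double *a, size_t n, double value)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = value;
}

/* Copy: one load stream (b) and one store stream (a). */
void kernel_copy(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i];
}

/* Triad (assumed STREAM-style form): two load streams (b, c),
   one store stream (a), and two floating-point operations per iteration. */
void kernel_triad(double *a, const double *b, const double *c,
                  double s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];
}
```

Each loop does little or no arithmetic per element moved, so its runtime is dominated by data transfers through the memory hierarchy. That is what makes kernels of this kind suitable test cases for a bandwidth-oriented model.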
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Treibig, J., Hager, G. (2010). Introducing a Performance Model for Bandwidth-Limited Loop Kernels. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2009. Lecture Notes in Computer Science, vol 6067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14390-8_64
DOI: https://doi.org/10.1007/978-3-642-14390-8_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14389-2
Online ISBN: 978-3-642-14390-8
eBook Packages: Computer Science, Computer Science (R0)