Optimizing performance on modern HPC systems: learning from simple kernel benchmarks

Hager, G.; Zeiser, T.; Treibig, J.; Wellein, G.

doi:10.1007/3-540-31768-6_23


Part of the book series: Notes on Numerical Fluid Mechanics and Multidisciplinary Design (NNFM, volume 91)


Abstract

We discuss basic optimization and parallelization strategies for current cache-based microprocessors (Intel Itanium2, Intel Netburst and AMD64 variants) in single-CPU and shared-memory environments. Using selected kernel benchmarks that represent data-intensive applications, we focus on the attainable effective bandwidths, which are still suboptimal with current compilers. We stress the need for a subtle OpenMP implementation, even for simple benchmark programs, in order to exploit the high aggregate memory bandwidth now available on ccNUMA systems. If the quality of main memory access is the measure, classical vector systems such as the NEC SX6+ are still in a class of their own and are able to sustain the performance level of in-cache operations of modern microprocessors even with arbitrarily large data sets.
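The abstract does not name the kernels used, so the following is only a minimal sketch of the kind of benchmark it describes, assuming a vector triad (a[i] = b[i] + c[i] * d[i]) as a stand-in. It illustrates one reason why even a simple OpenMP loop needs care on ccNUMA systems: assuming the operating system places each memory page near the thread that first writes it (a "first-touch" policy), the arrays should be initialized with the same static loop schedule as the kernel itself. The names triad_t, triad_init, triad_run and triad_traffic_bytes are hypothetical and do not come from the chapter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the four arrays of a vector triad. */
typedef struct {
    double *a, *b, *c, *d;
    long n;
} triad_t;

/* Release all arrays and reset the structure. */
void triad_free(triad_t *t)
{
    free(t->a); free(t->b); free(t->c); free(t->d);
    t->a = t->b = t->c = t->d = NULL;
    t->n = 0;
}

/* Allocate the arrays and initialize them in parallel.
 * Assumption: under a first-touch page placement policy, using the
 * same static schedule here and in triad_run() puts each thread's
 * share of the data into its local memory. Returns 0 on success. */
int triad_init(triad_t *t, long n)
{
    assert(t != NULL && n > 0);
    t->n = n;
    t->a = malloc((size_t)n * sizeof *t->a);
    t->b = malloc((size_t)n * sizeof *t->b);
    t->c = malloc((size_t)n * sizeof *t->c);
    t->d = malloc((size_t)n * sizeof *t->d);
    if (!t->a || !t->b || !t->c || !t->d) {
        triad_free(t);
        return -1;
    }
    double *a = t->a, *b = t->b, *c = t->c, *d = t->d;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; d[i] = 3.0;
    }
    return 0;
}

/* The kernel: three loads and one store per iteration. */
void triad_run(const triad_t *t)
{
    double *a = t->a;
    const double *b = t->b, *c = t->c, *d = t->d;
    const long n = t->n;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        a[i] = b[i] + c[i] * d[i];
}

/* Nominal data traffic of one kernel sweep in bytes
 * (write-allocate transfers are deliberately not counted). */
double triad_traffic_bytes(long n)
{
    return 4.0 * sizeof(double) * (double)n;
}
```

Dividing triad_traffic_bytes(n) by the measured wall-clock time of triad_run gives an effective bandwidth estimate. The pragmas take effect only when the compiler's OpenMP option is enabled (for example -fopenmp with GCC); without it the code still compiles and runs serially.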



Author information

Authors and Affiliations

Regional Computing Centre Erlangen (RRZE), University of Erlangen-Nuremberg, Martensstr. 1, 91058, Erlangen, Germany
G. Hager, T. Zeiser & G. Wellein
Chair of System Simulation (LSS), University of Erlangen-Nuremberg, Cauerstr. 6, 91058, Erlangen, Germany
J. Treibig


Editor information

Editors and Affiliations

Aerodynamic Institute of the RWTH Aachen, Wuellnerstr. zw. 5 u. 7, 52062, Aachen, Germany
Egon Krause
Institute of Computational Technologies of SB RAS, Ac. Lavrentyev Ave. 6, 630090, Novosibirsk, Russia
Yurii Shokin
High Performance Computing Center Stuttgart, University of Stuttgart, Nobelstrasse 19, 70569, Stuttgart, Germany
Michael Resch
High Performance Computing Center Stuttgart, University of Stuttgart, Nobelstrasse 19, 70569, Stuttgart, Germany
Nina Shokina Dr.


About this paper

Cite this paper

Hager, G., Zeiser, T., Treibig, J., Wellein, G. (2006). Optimizing performance on modern HPC systems: learning from simple kernel benchmarks. In: Krause, E., Shokin, Y., Resch, M., Shokina, N. (eds) Computational Science and High Performance Computing II. Notes on Numerical Fluid Mechanics and Multidisciplinary Design, vol 91. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31768-6_23


DOI: https://doi.org/10.1007/3-540-31768-6_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31767-8
Online ISBN: 978-3-540-31768-5
eBook Packages: Engineering, Engineering (R0)
