Data Transfer Requirement Analysis with Bandwidth Curves

Weidendorfer, Josef

doi:10.1007/978-3-642-54420-0_59

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8374)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

The amount of data transfer required by (parallel) applications can significantly affect performance because real systems have limited bandwidth. Within a node, data transfer means memory accesses, and performance analysis usually relies on cache miss counts. However, miss counts cannot explicitly show how heavily the links and buses in the memory hierarchy are utilized. In this paper, we propose an explicit visualization of the bandwidth requirements of applications within memory hierarchies, based on a machine model without bandwidth limits. We show its usefulness for a simple iterative 2D stencil solver, with and without cache optimization applied.
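
The abstract does not spell out the model, so the following Python sketch is only an illustration of the general idea and may differ from the paper's actual definition of a bandwidth curve. It assumes a single fully associative LRU cache, 64-byte lines, 8-byte grid elements, and one memory access per time step (that is, the memory system never stalls the program, which stands in for "no bandwidth limits"). The names `stencil_trace` and `bandwidth_curve`, and the use of column blocking as the cache optimization, are hypothetical choices made for this example.

```python
from collections import OrderedDict

LINE = 64  # assumed cache line size in bytes
ELEM = 8   # assumed size of one grid element (a double) in bytes


def stencil_trace(n, block=None):
    """Yield the byte addresses touched by one 5-point stencil sweep
    over the interior of an n x n grid (read src, write dst).

    If `block` is given, columns are processed in strips of that width
    (a simple form of cache blocking); otherwise rows are swept whole.
    """
    src, dst = 0, n * n * ELEM  # assumed base addresses of the two grids
    block = block or (n - 2)
    for jb in range(1, n - 1, block):
        for i in range(1, n - 1):
            for j in range(jb, min(jb + block, n - 1)):
                for di, dj in ((-1, 0), (0, -1), (0, 1), (1, 0)):
                    yield src + ((i + di) * n + (j + dj)) * ELEM
                yield dst + (i * n + j) * ELEM


def bandwidth_curve(trace, cache_lines, window):
    """Return the bytes fetched from the next memory level per access,
    averaged over consecutive windows of `window` accesses."""
    cache = OrderedDict()  # line number -> True, ordered from LRU to MRU
    curve, missed, count = [], 0, 0
    for addr in trace:
        line = addr // LINE
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            missed += 1                    # miss: one line must be transferred
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
        count += 1
        if count == window:
            curve.append(missed * LINE / window)
            missed, count = 0, 0
    if count:                              # final, partially filled window
        curve.append(missed * LINE / count)
    return curve


if __name__ == "__main__":
    plain = bandwidth_curve(stencil_trace(256), 64, 10000)
    tiled = bandwidth_curve(stencil_trace(256, block=32), 64, 10000)
    print("unblocked, bytes/access per window:", [round(b, 2) for b in plain])
    print("blocked,   bytes/access per window:", [round(b, 2) for b in tiled])
```

With these assumed parameters (a 256 x 256 grid and a 64-line cache), three grid rows no longer fit in the cache, so the unblocked sweep refetches each source row on every pass while the blocked sweep reuses it. Plotting the two lists against the window index gives one curve per variant for the modeled link between the cache and the next memory level.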

Author information

Authors and Affiliations

Technische Universität München, Munich, Germany
Josef Weidendorfer

Editor information

Editors and Affiliations

Rechen- und Kommunikationszentrum, RWTH Aachen, Seffenter Weg 23, 52074, Aachen, Germany
Dieter an Mey
TU Vienna, 1040, Vienna, Austria
Michael Alexander
RWTH Aachen University, Seffenter Weg 23, 52074, Aachen, Germany
Paolo Bientinesi & Carsten Clauss
University Magna Graecia of Catanzaro, 88100, Catanzaro, Italy
Mario Cannataro
Inria Rennes - Bretagne Atlantique, 35042, Rennes, France
Alexandru Costan & Christine Morin
University of Innsbruck, 6020, Innsbruck, Austria
Gabor Kecskemeti
Department of Computer Science, University of Pisa, 56126, Pisa, Italy
Laura Ricci
Universitat Politècnica de València, 46022, València, Spain
Julio Sahuquillo
LLNL, USA
Martin Schulz
Dipartimento di Informatica, Università di Salerno, 84084, Salerno, Italy
Vittorio Scarano
Tennessee Tech University and Oak Ridge National Laboratory, 38505, Cookeville, TN, USA
Stephen L. Scott
Technische Universität München, 80333, Munich, Germany
Josef Weidendorfer

About this paper

Cite this paper

Weidendorfer, J. (2014). Data Transfer Requirement Analysis with Bandwidth Curves. In: an Mey, D., et al. Euro-Par 2013: Parallel Processing Workshops. Euro-Par 2013. Lecture Notes in Computer Science, vol 8374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54420-0_59

DOI: https://doi.org/10.1007/978-3-642-54420-0_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54419-4
Online ISBN: 978-3-642-54420-0
eBook Packages: Computer Science, Computer Science (R0)
