A Study on the Influence of Caching: Sequences of Dense Linear Algebra Kernels

Peise, Elmar; Bientinesi, Paolo

doi:10.1007/978-3-319-17353-5_21


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8969)

Included in the following conference series:

International Conference on High Performance Computing for Computational Science


Abstract

It is well known that caching is critical to attaining high-performance implementations: in many situations, data locality (in space and time) plays a bigger role than optimizing the (number of) floating-point operations. In this paper, we show evidence that, at least for linear algebra algorithms, caching is also a crucial factor for accurate performance modeling and performance prediction.


Notes

1.
With \(n = 1{,}568 = 2^5 \cdot 7^2\), we choose a matrix size that is not a power of \(2\) to avoid performance artifacts due to the specific problem size.
2.
The subscripts R through U are the values of the flag arguments side, uplo, trans, and diag; they distinguish the form of the operation performed by the kernel.
3.
Read from the CPU’s time stamp counter through the assembly instruction rdtsc.
4.
The system fluctuations cause variations of the dgeqrf timings of 0.057% on average. With the exception of the tiny dcopys, these fluctuations are not significant.
5.
By “touching”, we mean a simple read+write access to the data.
6.
The length of the list can be safely restricted to the number of kernel calls per iteration of the blocked algorithm.
7.
For \(n = 2400\), the upper triangular portion of the matrix is about twice as large as the cache size.
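The arithmetic behind notes 1 and 7 can be checked with a short sketch. The helper names `prime_factors`, `is_power_of_two`, and `upper_triangle_bytes` are hypothetical and introduced only for illustration. The footprint calculation assumes 8-byte double-precision entries (suggested by the `d` prefix of kernels such as dgeqrf) and counts the diagonal as part of the triangle; neither assumption is stated in the notes themselves.

```python
def prime_factors(n):
    """Return the prime factorization of n as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_power_of_two(n):
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def upper_triangle_bytes(n, bytes_per_element=8):
    """Storage touched by the upper triangle of an n x n matrix.

    Assumption: the diagonal is included and each entry is an
    8-byte double unless bytes_per_element says otherwise.
    """
    return n * (n + 1) // 2 * bytes_per_element


if __name__ == "__main__":
    # Note 1: 1568 = 2^5 * 7^2, which is not a power of two.
    print(prime_factors(1568), is_power_of_two(1568))
    # Note 7: footprint of the upper triangle for n = 2400, in MiB.
    print(upper_triangle_bytes(2400) / 2**20)
```

Under these assumptions the upper triangle for \(n = 2400\) occupies roughly 22 MiB, so "about twice the cache size" would correspond to a cache on the order of 11 to 12 MB; the actual cache size of the test machine is not given in this excerpt.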

References

Peise, E., Bientinesi, P.: Performance modeling for dense linear algebra. In: Proceedings of the 3rd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS12), November 2012
Whaley, R.: Empirically tuning LAPACK’s blocking factor for increased performance. In: 2008 International Multiconference on Computer Science and Information Technology, IMCSIT 2008, pp. 303–310, October 2008
Lam, M.D., Rothberg, E.E., Wolf, M.E.: The cache performance and optimizations of blocked algorithms. In: Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IV, pp. 63–74. ACM, New York (1991)
Iakymchuk, R., Bientinesi, P.: Modeling performance through memory-stalls. ACM SIGMETRICS Perform. Eval. Rev. 40(2), 86–91 (2012)
OpenBLAS: http://www.openblas.net/

Acknowledgments

Financial support from the Deutsche Forschungsgemeinschaft (DFG) through grant GSC 111 and the Deutsche Telekom Stiftung is gratefully acknowledged.

Author information

Authors and Affiliations

AICES, RWTH Aachen, Aachen, Germany
Elmar Peise & Paolo Bientinesi

Authors

Elmar Peise
Paolo Bientinesi

Corresponding author

Correspondence to Elmar Peise.

Editor information

Editors and Affiliations

IRIT, ENSEEIHT, Toulouse Cedex, France
Michel Daydé
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Osni Marques
Information Technology Center, The University of Tokyo, Tokyo, Japan
Kengo Nakajima

About this paper

Cite this paper

Peise, E., Bientinesi, P. (2015). A Study on the Influence of Caching: Sequences of Dense Linear Algebra Kernels. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_21

DOI: https://doi.org/10.1007/978-3-319-17353-5_21
Published: 18 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17352-8
Online ISBN: 978-3-319-17353-5
eBook Packages: Computer Science, Computer Science (R0)
