Abstract
In this paper, we describe source code transformations based on software pipelining, loop unrolling, and loop fusion for the sparse matrix-vector multiplication and for the Conjugate Gradient algorithm. These transformations enable data prefetching and the overlapping of load instructions with FPU arithmetic instructions, and they improve temporal cache locality. We develop a probabilistic model for estimating the number of cache misses for three types of data caches: direct-mapped, and s-way set-associative with random and with LRU replacement strategies. Using hardware cache-monitoring tools, we compare the predicted numbers of cache misses with the measured numbers on the Intel x86 architecture with L1 and L2 caches. The accuracy of our analytical model is around 97%; the estimation errors are due to minor simplifying assumptions in the model.
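As an illustration of the loop-unrolling transformation mentioned above, the following C sketch computes y = Ax for a matrix stored in the compressed sparse row (CSR) format. The CSR storage scheme, the function name `spmv_csr_unrolled`, and the unroll factor of 4 are assumptions for illustration, not the authors' code. Unrolling exposes several independent loads and multiplications per iteration, and splitting the sum across two accumulators shortens the floating-point dependence chain, which gives the compiler and the processor more room to overlap loads with FPU arithmetic.

```c
#include <stddef.h>

/* Hypothetical kernel: y = A*x for an n-row matrix in CSR format
   (assumed storage): row_ptr[n+1], col_idx[nnz], val[nnz]. */
void spmv_csr_unrolled(size_t n, const size_t *row_ptr, const size_t *col_idx,
                       const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        size_t k = row_ptr[i];
        const size_t end = row_ptr[i + 1];
        /* Two independent partial sums, so consecutive multiply-adds
           do not all wait on a single accumulator. */
        double s0 = 0.0, s1 = 0.0;
        /* Main loop unrolled by 4: loads for four nonzeros are issued
           before the arithmetic that consumes them. */
        for (; k + 4 <= end; k += 4) {
            double a0 = val[k],     a1 = val[k + 1];
            double a2 = val[k + 2], a3 = val[k + 3];
            double x0 = x[col_idx[k]],     x1 = x[col_idx[k + 1]];
            double x2 = x[col_idx[k + 2]], x3 = x[col_idx[k + 3]];
            s0 += a0 * x0 + a2 * x2;
            s1 += a1 * x1 + a3 * x3;
        }
        /* Remainder loop for rows whose length is not a multiple of 4. */
        for (; k < end; k++)
            s0 += val[k] * x[col_idx[k]];
        y[i] = s0 + s1;
    }
}
```

Because the sum is reassociated, the result may differ in the last bits from a straightforward left-to-right loop.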
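Loop fusion in the Conjugate Gradient algorithm can be sketched as follows. In each CG iteration, the product q = Ap is followed by the dot product p^T q used for the step length α = ρ / (p^T q). Computing both in one pass over the rows lets p[i] and q[i] be consumed while they are still in registers or cache, instead of being reloaded by a second loop. The fused kernel, its name `spmv_dot_fused`, and the CSR storage of a square n×n matrix are assumptions for illustration; they are not taken from the paper.

```c
#include <stddef.h>

/* Hypothetical fused kernel: computes q = A*p for a square n x n CSR
   matrix and returns p^T q, replacing an SpMV loop followed by a
   separate dot-product loop that would re-read p and q. */
double spmv_dot_fused(size_t n, const size_t *row_ptr, const size_t *col_idx,
                      const double *val, const double *p, double *q)
{
    double pq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            s += val[k] * p[col_idx[k]];
        q[i] = s;
        /* Dot-product term uses q[i] while it is still in a register. */
        pq += p[i] * s;
    }
    return pq;
}
```

In a CG loop, the returned value would be the denominator of α, so the separate dot-product pass over p and q disappears.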
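To make the cache organizations named above concrete, a trace-driven simulator can count misses for a given address stream. The sketch below is not the paper's probabilistic model; it is an exact simulation that could serve as a cross-check for such a model. It covers an s-way set-associative cache with LRU replacement (s = 1 gives a direct-mapped cache); random replacement is not implemented. The names `cache_t`, `cache_init`, `cache_free`, and `cache_access` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simulator of an s-way set-associative LRU cache;
   ways == 1 models a direct-mapped cache. */
typedef struct {
    size_t sets, ways, line_bytes;
    uint64_t *tags;    /* sets * ways stored tags */
    uint64_t *stamps;  /* last-use time per way; 0 marks an invalid way */
    uint64_t clock;
    uint64_t misses;
} cache_t;

int cache_init(cache_t *c, size_t sets, size_t ways, size_t line_bytes)
{
    c->sets = sets;
    c->ways = ways;
    c->line_bytes = line_bytes;
    c->tags = calloc(sets * ways, sizeof *c->tags);
    c->stamps = calloc(sets * ways, sizeof *c->stamps);
    c->clock = 0;
    c->misses = 0;
    return (c->tags && c->stamps) ? 0 : -1;
}

void cache_free(cache_t *c)
{
    free(c->tags);
    free(c->stamps);
}

/* Simulates one access to byte address addr; returns 1 on a miss, 0 on a hit. */
int cache_access(cache_t *c, uint64_t addr)
{
    uint64_t line = addr / c->line_bytes;
    size_t set = (size_t)(line % c->sets);
    uint64_t tag = line / c->sets;
    uint64_t *t = c->tags + set * c->ways;
    uint64_t *st = c->stamps + set * c->ways;
    size_t victim = 0;

    c->clock++;
    for (size_t w = 0; w < c->ways; w++) {
        if (st[w] != 0 && t[w] == tag) {   /* hit: refresh LRU stamp */
            st[w] = c->clock;
            return 0;
        }
        if (st[w] < st[victim])            /* track least recently used way */
            victim = w;                    /* (invalid ways have stamp 0) */
    }
    t[victim] = tag;                       /* miss: replace the LRU way */
    st[victim] = c->clock;
    c->misses++;
    return 1;
}
```

Feeding the simulator the address stream of an SpMV kernel (val, col_idx, x, y) gives exact miss counts for one configuration, which can then be compared with an analytical estimate.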
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tvrdík, P., Šimeček, I. (2004). Analytical Modeling of Optimized Sparse Linear Code. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2003. Lecture Notes in Computer Science, vol 3019. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24669-5_27
DOI: https://doi.org/10.1007/978-3-540-24669-5_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21946-0
Online ISBN: 978-3-540-24669-5
eBook Packages: Springer Book Archive