A Cache-Oblivious Sparse Matrix–Vector Multiplication Scheme Based on the Hilbert Curve

Conference paper
Part of the Mathematics in Industry book series (MATHINDUSTRY, volume 17)

Abstract

Sparse matrix–vector (SpMV) multiplication is an important kernel in many applications. When the sparse matrix is unstructured, however, standard SpMV implementations are typically inefficient in terms of cache usage, sometimes attaining only a fraction of peak performance. Cache-aware algorithms take the specifics of the cache architecture as parameters to derive an efficient SpMV multiplication. In contrast, cache-oblivious algorithms strive to obtain efficiency regardless of cache specifics. In earlier work in this latter area, Haase et al. (2007) use the Hilbert curve to order the nonzeroes of the sparse matrix; they obtain speedup mainly when multiplying against multiple (up to eight) right-hand sides simultaneously. We improve on this by introducing a new data structure, called Bi-directional Incremental Compressed Row Storage (BICRS). Using this data structure to store the nonzeroes in Hilbert order, speedups of up to a factor of two are attained for the SpMV multiplication y = Ax on sufficiently large, unstructured matrices.
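To illustrate the idea of a Hilbert-ordered SpMV traversal (and not the authors' BICRS data structure itself), the following C++ sketch stores the nonzeroes as plain (row, column, value) triplets, sorts them by the Hilbert index of their coordinates, and performs y = Ax in a single pass over the sorted triplets. The Hilbert-index routine is the standard bitwise (x, y)-to-curve-index conversion; the index space is assumed padded to a power of two. All names (Triplet, hilbert_d, spmv_hilbert) are illustrative, not from the paper.

#include <algorithm>
#include <cstdint>
#include <vector>

// One nonzero A(i,j) = v, stored as a coordinate triplet.
struct Triplet {
    std::uint32_t i, j;
    double v;
};

// Rotate/reflect a quadrant so that the lower-order bits are
// interpreted in the correct orientation (standard Hilbert-curve
// bit manipulation).
static void rot(std::uint32_t n, std::uint32_t& x, std::uint32_t& y,
                std::uint32_t rx, std::uint32_t ry) {
    if (ry == 0) {
        if (rx == 1) {
            x = n - 1 - x;
            y = n - 1 - y;
        }
        std::swap(x, y);
    }
}

// Map (x,y) in an n-by-n grid (n a power of two) to its
// one-dimensional position along the Hilbert curve.
static std::uint64_t hilbert_d(std::uint32_t n, std::uint32_t x,
                               std::uint32_t y) {
    std::uint64_t d = 0;
    for (std::uint32_t s = n / 2; s > 0; s /= 2) {
        const std::uint32_t rx = (x & s) ? 1 : 0;
        const std::uint32_t ry = (y & s) ? 1 : 0;
        d += static_cast<std::uint64_t>(s) * s * ((3 * rx) ^ ry);
        rot(n, x, y, rx, ry);
    }
    return d;
}

// y += A x, visiting the nonzeroes of A in Hilbert order. Sorting is
// done once; the multiplication itself is a single pass, so
// consecutive nonzeroes access nearby entries of both x and y.
// (In practice one would precompute the Hilbert keys instead of
// recomputing them inside the comparator.)
void spmv_hilbert(std::vector<Triplet>& nz, std::uint32_t n,
                  const std::vector<double>& x, std::vector<double>& y) {
    std::sort(nz.begin(), nz.end(),
              [n](const Triplet& a, const Triplet& b) {
                  return hilbert_d(n, a.i, a.j) < hilbert_d(n, b.i, b.j);
              });
    for (const Triplet& t : nz)
        y[t.i] += t.v * x[t.j];
}

Here n must be a power of two at least max(rows, columns); padding the index space adds no nonzeroes. The explicit triplets above double the index storage compared to compressed row storage; the paper's BICRS structure instead encodes the jumps between consecutive nonzeroes as compressed index increments, which is what makes the Hilbert ordering pay off for a single right-hand side.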

Keywords

Sparse Matrix · Column Index · Hilbert Curve · Cache Architecture · Link Matrix

References

  1. Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., van der Vorst, H. (eds.): Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia, PA (2000)
  2. Bender, M.A., Brodal, G.S., Fagerberg, R., Jacob, R., Vicari, E.: Optimal sparse matrix dense vector multiplication in the I/O-model. In: Proceedings 19th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 61–70. ACM Press, New York (2007)
  3. Dennis, J.M., Jessup, E.R.: Applying automated memory analysis to improve iterative algorithms. SIAM J. Sci. Comput. 29(5), 2210–2223 (2007)
  4. Goto, K., van de Geijn, R.: On reducing TLB misses in matrix multiplication. Technical Report TR-2002-55, University of Texas at Austin, Department of Computer Sciences (2002)
  5. Haase, G., Liebmann, M., Plank, G.: A Hilbert-order multiplication scheme for unstructured sparse matrices. Int. J. Parallel, Emergent Distr. Syst. 22(4), 213–220 (2007)
  6. Im, E.J., Yelick, K.A.: Optimizing sparse matrix computations for register reuse in SPARSITY. In: Proceedings International Conference on Computational Science, Part I, Lecture Notes in Computer Science, vol. 2073, pp. 127–136. Springer, Berlin (2001)
  7. Koster, J.: Parallel templates for numerical linear algebra, a high-performance computation library. Master's thesis, Utrecht University, Department of Mathematics (2002)
  8. Lorton, K.P., Wise, D.S.: Analyzing block locality in Morton-order and Morton-hybrid matrices. SIGARCH Comput. Archit. News 35(4), 6–12 (2007)
  9. Morton, G.M.: A computer oriented geodetic data base and a new technique in file sequencing. Technical report, IBM, Ottawa, Canada (1966)
  10. Nishtala, R., Vuduc, R.W., Demmel, J.W., Yelick, K.A.: When cache blocking of sparse matrix vector multiply works and why. Appl. Algebra Engrg. Comm. Comput. 18(3), 297–311 (2007)
  11. Pinar, A., Heath, M.T.: Improving performance of sparse matrix-vector multiplication. In: Proceedings Supercomputing 1999, p. 30. ACM Press, New York (1999)
  12. Toledo, S.: Improving the memory-system performance of sparse-matrix vector multiplication. IBM J. Res. Dev. 41(6), 711–725 (1997)
  13. Vuduc, R., Demmel, J.W., Yelick, K.A.: OSKI: A library of automatically tuned sparse matrix kernels. J. Phys. Conf. Series 16, 521–530 (2005)
  14. Vuduc, R.W., Moon, H.J.: Fast sparse matrix-vector multiplication by exploiting variable block structure. In: High Performance Computing and Communications 2005, Lecture Notes in Computer Science, vol. 3726, pp. 807–816. Springer, Berlin (2005)
  15. Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated empirical optimizations of software and the ATLAS project. Parallel Comput. 27(1–2), 3–35 (2001)
  16. Yzelman, A.N., Bisseling, R.H.: Cache-oblivious sparse matrix–vector multiplication by using sparse matrix partitioning methods. SIAM J. Sci. Comput. 31(4), 3128–3154 (2009)
  17. Yzelman, A.N., Bisseling, R.H.: Two-dimensional cache-oblivious sparse matrix–vector multiplication. Parallel Comput. 37(12), 806–819 (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Utrecht University, Utrecht, The Netherlands