Applicable Algebra in Engineering, Communication and Computing, Volume 18, Issue 3, pp 297–311

When cache blocking of sparse matrix vector multiply works and why

  • Rajesh Nishtala
  • Richard W. Vuduc
  • James W. Demmel
  • Katherine A. Yelick

DOI: 10.1007/s00200-007-0038-9

Cite this article as:
Nishtala, R., Vuduc, R.W., Demmel, J.W. et al. AAECC (2007) 18: 297. doi:10.1007/s00200-007-0038-9

Abstract

We present new performance models and more compact data structures for cache blocking when applied to sparse matrix-vector multiply (SpM × V). We extend our prior models by relaxing the assumption that the vectors fit in cache and find that the new models are accurate enough to predict optimum block sizes. In addition, we determine criteria that predict when cache blocking improves performance. We conclude with architectural suggestions that would make memory systems execute SpM × V faster.

Keywords

  • Performance optimization
  • Sparse matrix multiplication
  • Memory hierarchies
  • Performance modeling

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Rajesh Nishtala (1)
  • Richard W. Vuduc (1)
  • James W. Demmel (1)
  • Katherine A. Yelick (1)
  1. Computer Science Division, University of California at Berkeley, Berkeley, USA