When cache blocking of sparse matrix vector multiply works and why

  • Rajesh NishtalaEmail author
  • Richard W. Vuduc
  • James W. Demmel
  • Katherine A. Yelick


We present new performance models and more compact data structures for cache blocking when applied to sparse matrix-vector multiply (SpM × V). We extend our prior models by relaxing the assumption that the vectors fit in cache and find that the new models are accurate enough to predict optimum block sizes. In addition, we determine criteria that predict when cache blocking improves performance. We conclude with architectural suggestions that would make memory systems execute SpM × V faster.


Performance optimization Sparse matrix multiplication Memory hierarchies Performance modeling 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Bilmes, J., Asanović, K., Demmel, J., Lam, D., Chin, C.W. PHiPAC: A portable, high- performance, ANSI C coding methodology and its application to matrix multiply, University of Tennessee, LAPACK Working Note 111 (1996)Google Scholar
  2. 2.
    Browne, S., Dongarra, J., Garner, N., London, K., Mucci, P.: A scalable cross-platform infrastructure for application performance tuning using hardware counters. In: Proceedings of supercomputing, November (2000)Google Scholar
  3. 3.
    Fraguela, B.B., Doallo, R., Zapata, E.L.: Memory hierarchy performance prediction for sparse blocked algorithms. Parallel Proc Lett, 9(3) (1999)Google Scholar
  4. 4.
    Gropp, W.D., Kasushik, D.K., Keyes, D.E., Smith, B.F.: Towards realistic bounds for implicit CFD codes. In: Proceedings of parallel computational fluid dynamics, pp. 241–248 (1999)Google Scholar
  5. 5.
    Heber, G., Dolgert, A.J., Alt, M., Mazurkiewicz, K.A., Stringer, L.: Fracture mechanics on the Intel Itanium architecture: a case study. In: Workshop on EPIC architectures and compiler technology (ACM MICRO 34), Austin, TX (2001)Google Scholar
  6. 6.
    Heras, D.B., Perez, V.B., Dominguez, J.C.C., Rivera, F.F.: Modeling and improving locality for irregular problems: sparse matrix-vector product on cache memories as a case study. In: HPCN Europe, pp. 201–210 (1999)Google Scholar
  7. 7.
    Im, E.-J.: Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May (2000)Google Scholar
  8. 8.
    Nishtala, R., Vuduc, R.W., Demmel, J.W., Yelick, K.A.: Performance modeling and analysis of cache blocking in sparse matrix vector multiply. Technical report (UCB/CSD-04-1335), University of California, Berkeley, EECS Dept. (2004)Google Scholar
  9. 9.
    Saad, Y. SPARSKIT: A basic toolkit for sparse matrix computations (1994) www.cs.umn. edu/Research/arpa/SPARSKIT/sparskit.htmlGoogle Scholar
  10. 10.
    Saavedra-Barrera, R.H.: CPU performance evaluation and execution time prediction using narrow spectrum benchmarking. PhD thesis, University of California, Berkeley, February (1992)Google Scholar
  11. 11.
    Snavely, A., Carrington, L., Wolter, N.: Modeling application performance by convolving machine signatures with application profiles (2001)Google Scholar
  12. 12.
    Temam, O., Jalby, W.: Characterizing the behavior of sparse algorithms on caches. In: Proceedings of supercomputing ’92 (1992)Google Scholar
  13. 13.
    Vuduc, R.W. OSKI: Optimized Sparse Kernel Interface (2005) http://bebop.cs.berkeley. edu/oski/Google Scholar
  14. 14.
    Vuduc, R., Demmel, J.W., Yelick, K.A., Kamil, S., Nishtala, R., Lee, B.: Performance optimizations and bounds for sparse matrix-vector multiply. In: Proceedings of supercomputing, Baltimore, MD, USA, November (2002)Google Scholar
  15. 15.
    Vuduc, R.W.: Automatic performance tuning of sparse matrix kernels. PhD thesis, University of California, Berkeley (2003)Google Scholar
  16. 16.
    Whaley, C., Dongarra, J.: Automatically tuned linear algebra software. In: Proceedings of supercomputer (1998)Google Scholar

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Rajesh Nishtala
    • 1
    Email author
  • Richard W. Vuduc
    • 1
  • James W. Demmel
    • 1
  • Katherine A. Yelick
    • 1
  1. 1.Computer Science DivisionUniversity of California at BerkeleyBerkeleyUSA

Personalised recommendations