VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors


Sparse matrix–vector multiplication (SpMV) is an indispensable kernel in numerous applications, but its performance is limited by frequent memory accesses. Modern processors exploit data-level parallelism through single-instruction multiple-data (SIMD) units to improve performance. To take full advantage of SIMD acceleration, this paper proposes a new storage format, the Variable Blocked-\(\sigma\)-SIMD Format (VBSF), which mitigates the irregularity of traditional sparse matrix storage formats. VBSF combines adjacent nonzero elements into variable-size blocks so that SpMV can be computed with SIMD vector units. We compare VBSF-based SpMV against traditional storage formats on a benchmark suite of 15 matrices across three computing platforms with different SIMD widths (FT2000, Intel Xeon E5, and Intel Silver). For the matrices in the benchmark suite, VBSF achieves substantial performance improvements on all three platforms and proves to have better storage efficiency than the other storage formats.
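The core idea described above — grouping adjacent nonzero elements into blocks so that each block maps onto a SIMD vector operation — can be illustrated with a minimal sketch. This is not the paper's actual VBSF layout (which uses variable-size blocks chosen per region); it is a simplified fixed-width variant, with `csr_to_blocked` and `blocked_spmv` as hypothetical helper names, that shows why blocking makes SpMV vector-friendly: every block becomes one contiguous multiply–accumulate, padded with zeros where a row's nonzeros do not fill the vector width.

```python
import numpy as np

def csr_to_blocked(indptr, indices, data, w=4):
    """Group each row's nonzeros into width-w blocks, zero-padding the
    last block of a row, so every block can be processed with a single
    SIMD fused multiply-add. (Illustrative only: the actual VBSF uses
    variable-size blocks rather than a fixed width w.)"""
    blk_row, blk_cols, blk_vals = [], [], []
    for r in range(len(indptr) - 1):
        for s in range(indptr[r], indptr[r + 1], w):
            e = min(s + w, indptr[r + 1])          # end of this block
            pad = w - (e - s)                       # zeros needed to fill it
            blk_vals.append(np.concatenate([data[s:e], np.zeros(pad)]))
            # padded values are 0, so the padded column index is harmless
            blk_cols.append(np.concatenate([indices[s:e],
                                            np.zeros(pad, dtype=int)]))
            blk_row.append(r)
    return blk_row, blk_cols, blk_vals

def blocked_spmv(n_rows, blk_row, blk_cols, blk_vals, x):
    """y = A*x, one vectorizable block per iteration."""
    y = np.zeros(n_rows)
    for r, cols, vals in zip(blk_row, blk_cols, blk_vals):
        y[r] += np.dot(vals, x[cols])  # gather + one vector dot product
    return y
```

In a real SIMD implementation each `np.dot` over a width-`w` block would be a single vector FMA instruction; the zero padding trades a little extra storage for fully regular, branch-free inner loops.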






This research work was supported in part by the National Key Research and Development Program of China (2017YFB0202104).

Author information



Corresponding author

Correspondence to Jie Liu.



About this article


Cite this article

Li, Y., Xie, P., Chen, X. et al. VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors. J Supercomput 76, 2063–2081 (2020). https://doi.org/10.1007/s11227-019-02835-4



Keywords

  • Blocking sparse matrix storage format
  • Single-instruction multiple data (SIMD)
  • Performance optimization
  • Sparse matrix–vector multiplication (SpMV)