VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors

  • Yishui Li
  • Peizhen Xie
  • Xinhai Chen
  • Jie Liu (corresponding author)
  • Bo Yang
  • Shengguo Li
  • Chunye Gong
  • Xinbiao Gan
  • Han Xu


Sparse matrix–vector multiplication (SpMV) is an indispensable kernel in numerous applications, but its performance is limited by frequent irregular memory accesses. Modern processors exploit data-level parallelism through single-instruction multiple-data (SIMD) units. To take full advantage of SIMD acceleration, this paper proposes a new storage format, the Variable Blocked-\(\sigma\)-SIMD Format (VBSF), which regularizes the irregular structure of traditional sparse-matrix storage formats by combining adjacent nonzero elements into variable-size blocks so that SpMV can be computed with SIMD vector units. We compare VBSF-based SpMV against traditional storage formats on a benchmark suite of 15 matrices across three computing platforms with different SIMD widths (FT2000, Intel Xeon E5 and Intel Silver). On this suite, VBSF achieves substantial performance improvements on all three platforms and offers better storage efficiency than the other formats.
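The blocking idea described in the abstract can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the paper's actual VBSF layout: the real format's variable block-size selection and metadata are not given here, and the helper names (`csr_to_blocks`, `blocked_spmv`) are hypothetical. It groups the adjacent nonzeros of each CSR row into fixed-width, zero-padded blocks, so that each block maps onto one SIMD vector operation on real hardware.

```python
# Sketch of blocking adjacent nonzeros for SIMD-style SpMV (illustrative only).

def csr_to_blocks(rowptr, colidx, vals, width=4):
    """Pad each row's nonzeros to a multiple of `width` and cut into blocks."""
    blocks = []  # list of (row, cols, vals) triples, each of length `width`
    for r in range(len(rowptr) - 1):
        cols = colidx[rowptr[r]:rowptr[r + 1]]
        v = vals[rowptr[r]:rowptr[r + 1]]
        for b in range(0, len(cols), width):
            c = cols[b:b + width]
            w = v[b:b + width]
            pad = width - len(c)
            # Padded lanes point at column 0 with value 0.0, so they add nothing.
            blocks.append((r, c + [0] * pad, w + [0.0] * pad))
    return blocks

def blocked_spmv(blocks, x, nrows):
    """y = A @ x, one block at a time (one vector FMA per block on SIMD units)."""
    y = [0.0] * nrows
    for r, cols, vals in blocks:
        y[r] += sum(v * x[c] for v, c in zip(vals, cols))
    return y
```

For a 3×3 matrix in CSR form, `blocked_spmv(csr_to_blocks([0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0]), [1.0, 1.0, 1.0], 3)` returns the same result as an ordinary CSR SpMV; the point of the padded layout is that the inner loop has a fixed trip count and contiguous operands, which is what lets vector units process a whole block per instruction.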


Keywords: Blocking sparse matrix storage format · Single-instruction multiple data (SIMD) · Performance optimization · Sparse matrix–vector multiplication (SpMV)



This research work was supported in part by the National Key Research and Development Program of China (2017YFB0202104).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

All authors are with the Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China.
