Skip to main content

An effective SPMV based on block strategy and hybrid compression on GPU


Due to the non-uniformity of the sparse matrix, the calculation of SPMV (sparse matrix vector multiplication) will lead to redundancy in calculation, redundancy in storage, unbalanced load and low GPU utilization. In this study, a new matrix compression method based on CSR and COO is proposed for the above analysis: PBC algorithm. This method considers the load balancing condition in the calculation process of SPMV, and blocks are divided according to the strategy of row main order to ensure the minimum standard deviation between each block, aiming to satisfy the maximum similarity in the number of nonzero elements between each block. This paper preprocesses the original matrix based on block splitting algorithm to meet the conditions of load balancing for each block stored in the form of CSR and COO. Finally, the experimental results show that the time of SPMV preprocessing is within the acceptable range of the algorithm. Compared with the serial code without CSR optimization, the parallel method in this paper has an acceleration ratio of 178x. In addition, compared with the serial code for CSR optimization, the parallel method in this paper has an acceleration ratio of 6x. And a representative matrix compression method is also selected for performing comparative analysis. The experimental results show that the PBC algorithm has a good efficiency improvement compared with the comparison algorithm.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8


  1. 1.

    Ernesto D, Pablo E (2018) Solving sparse triangular linear systems in modern GPUs: a synchronization-free algorithm. In: 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 1:196–203

  2. 2.

    Ahmadi A, Manganiello F, Khademi A et al (2021) A parallel Jacobi-embedded Gauss-Seidel mMethod. IEEE Trans Parallel Distrib Syst 76:8883–8900

    Google Scholar 

  3. 3.

    Barreda M, Dolz MF et al (2020) Performance modeling of the sparse matrix-vector product via convolutional neural networks. J Supercomput 76:8883–8900

    Article  Google Scholar 

  4. 4.

    Benatia A, Ji, WX, Wang, YZ (2016) Sparse matrix format selection with multiclass SVM for SPMV on GPU. In: 45th International Conference on Parallel Processing (ICPP), Comm. ACM 38(4):393–422

  5. 5.

    Nathan B, Michael G (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the ACM/IEEE Conference on High Performance Computing, SC 2009, pp. 1–11

  6. 6.

    Francisco V, Ortega G, José-Jesús F (2010) Improving the performance of the sparse matrix vector product with GPUs. In: 10th IEEE International Conference on Computer and Information Technology, CIT Bradford, West Yorkshire, UK, pp. 1146–1151

  7. 7.

    Dominik G, Anton L (2011) Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation. In: Proceedings of 4th Workshop on General Purpose Processing on Graphics Processing Units, GPGPU, pp. 1–8

  8. 8.

    Juan C, Francisco F, Marcos F et al (2012) Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs. Microprocess Microsyst 36(2):65–77

    Article  Google Scholar 

  9. 9.

    Yzelman AJN, Roose D (2013) High-level strategies for parallel shared-memory sparse matrix-vector multiplication. IEEE Tras Parallel Distrib Syst 25(1):116–125

    Article  Google Scholar 

  10. 10.

    Ashari A, Sedaghati N, Eisenlohr J et al (2014) An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs. In: Proceedings of the 28th ACM international conference on Supercomputing. ACM, pp. 273–282

  11. 11.

    Yang W, Li K, Mo Z et al (2015) Performance optimization using partitioned SPMV on GPUs and multicore CPUs. IEEE Trans Comput 64(9):2623–2636

    MathSciNet  Article  Google Scholar 

  12. 12.

    Cheng K, Tian J, Ma RL (2018) Study on efficient storage format of sparse matrix based on GPU. Comput Eng 491(08):60–66

    Google Scholar 

  13. 13.

    Buatois L, Caumon G, Levy B (2009) Concurrent number cruncher: a GPU implementation of a general sparse linear solver. Int J Parallel Emerg Distrib Syst 24(3):205–223

    MathSciNet  Article  Google Scholar 

  14. 14.

    Oberhuber T, Suzuki A, Vacata J (2011) New row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA. Acta Tech 56(4):447–466

    MathSciNet  Google Scholar 

  15. 15.

    Belgin M, Back G, Ribbens CJ (2009) Pattern-based sparse matrix representation for memory-efficient SMVM kernels. In: Proceedings of the 23rd International Conference on Supercomputing, ACM, pp 100–109

  16. 16.

    Williams S, Oliker L, Vuduc R (2009) Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Comput 35(3):178–194

    Article  Google Scholar 

  17. 17.

    Monakov A, Lokhmotov A, Avetisyan A (2010) Automatically tuning sparse matrix-vector multiplication for GPU architectures. In: International Conference on High-Performance Embedded Architectures and Compilers. DBLP, pp 111–125

  18. 18.

    Choi JW, Singh A (2010) Model-driven autotuning of sparse matrix vector multiply on GPUs. In: PPoPP ’10: Proceedings of the 15th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, pp 115–126

  19. 19.

    Yzelman AN, Bisseling RH (2011) Two-dimensional cache-oblivious sparse matrix vector multiplication. Parallel Comput 37(12):806–819

    Article  Google Scholar 

  20. 20.

    Abubaker NFT, Kadir A, Cevdet A (2019) Spatiotemporal graph and hypergraph partitioning models for sparse matrix-vector multiplication on many-core architectures. IEEE Trans Parallel Distrib Syst 30(2):445–458

    Article  Google Scholar 

  21. 21.

    Yang C, Aydin B, Owens J (2018) Design principles for sparse matrix multiplication on the GPU. In: Euro-Par, pp 1–16

  22. 22.

    Yan SG, Li C, Zhang YQ (2014). yaSPMV: yet another SPMV framework on GPUs. In: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, pp 107–118

  23. 23.

    Liu W, Vinter B (2015) CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication. In: The 29th ACM International Conference on Supercomputing (ICS ’15). ACM, ACM, pp. 1–12

  24. 24.

    Tan GM, Liu JH, Li JJ (2018) Design and implementation of adaptive SPMV library for multicore and many-core architecture. ACM Trans Math Softw 44(4):1–25

    MathSciNet  Article  Google Scholar 

  25. 25.

    Yang W, Li K, Li K (2018) A parallel computing method using blocked format with optimal partitioning for SPMV on GPU. J Comput Syst Sci 92:152–170

    MathSciNet  Article  Google Scholar 

  26. 26.

    Benatia A, Ji W, Wang Y et al (2019) BestSF: A sparse meta-format for optimizing SPMV on GPU. ACM Trans Arch Code Optim 15(3):1–27

    Google Scholar 

  27. 27.

    Yang WD, Li KL, Liu YZ et al (2014) Optimization of Quasi - diagonal matrix-vector multiplication on GPU[J]. Int J High Perform Comput Appl 28(2):183–195

    Article  Google Scholar 

  28. 28.

    Fukaya T, Ishida K, Miura A et al (2021) Accelerating the SpMV kernel on standard CPUs by exploiting the partially diagonal structures[J]. Preprints

  29. 29.

    He G, Chen Q, Gao J (2021) A new diagonal storage for efficient implementation of sparse matrix-vector multiplication on graphics processing unit. Concurr Comput Pract Exp 2021(4):1–15

    Article  Google Scholar 

  30. 30.

    Gao J, Xia Y, Yin R et al (2021) Adaptive diagonal sparse matrix-vector multiplication on GPU[J]. J Parallel Distrib Comput 157(11):1–53

    Google Scholar 

Download references


This work was partially supported by NSFC (National Natural Science Foundation of China) Number No. 61672181.

Author information



Corresponding author

Correspondence to Yuhua Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Cui, H., Wang, N., Wang, Y. et al. An effective SPMV based on block strategy and hybrid compression on GPU. J Supercomput (2021).

Download citation


  • SPMV
  • GPU
  • Unbalanced load
  • PBC algorithm
  • CSR
  • Parallel efficiency