Because the nonzero elements of a sparse matrix are distributed non-uniformly, sparse matrix-vector multiplication (SPMV) suffers from redundant computation, redundant storage, load imbalance, and low GPU utilization. To address these problems, this study proposes a new matrix compression method based on CSR and COO: the PBC algorithm. The method takes the load-balancing condition of the SPMV computation into account. Blocks are divided in row-major order so that the standard deviation of the number of nonzero elements across blocks is minimized, which makes the number of nonzero elements in each block as similar as possible. The original matrix is preprocessed with this block-splitting algorithm so that each block, stored in CSR and COO form, satisfies the load-balancing condition. The experimental results show that the SPMV preprocessing time is within an acceptable range for the algorithm. Compared with serial code without CSR optimization, the proposed parallel method achieves a speedup of 178x; compared with serial code with CSR optimization, it achieves a speedup of 6x. A representative matrix compression method is also selected for comparative analysis, and the experimental results show that the PBC algorithm is more efficient than this comparison method.
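To make the load-balancing idea concrete, the following is a minimal sketch in plain Python, not the PBC algorithm itself. It assumes a simple heuristic: contiguous row blocks are cut wherever the CSR row-pointer prefix sum is closest to an equal share of the nonzeros. This heuristic is not guaranteed to minimize the standard deviation, and all function names (`dense_to_csr`, `split_rows_by_nnz`, `block_nnz`, `spmv_csr_blocked`) are hypothetical. The loop over blocks runs serially here; on a GPU each block would presumably be assigned to a separate group of threads.

```python
from bisect import bisect_left
from statistics import pstdev


def dense_to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # prefix sum of nonzeros per row
    return values, col_idx, row_ptr


def split_rows_by_nnz(row_ptr, num_blocks):
    """Cut rows into contiguous blocks with roughly equal nonzero counts.

    Assumed heuristic (not taken from the paper): for the k-th cut, pick the
    row boundary whose prefix nonzero count is nearest to k/num_blocks of
    the total.
    """
    n_rows = len(row_ptr) - 1
    total = row_ptr[-1]
    bounds = [0]
    for k in range(1, num_blocks):
        target = total * k / num_blocks
        r = bisect_left(row_ptr, target)
        # Step back one boundary if it is strictly closer to the target.
        if r > 0 and (target - row_ptr[r - 1]) < (row_ptr[r] - target):
            r -= 1
        bounds.append(max(r, bounds[-1]))  # keep boundaries non-decreasing
    bounds.append(n_rows)
    return bounds


def block_nnz(row_ptr, bounds):
    """Number of nonzeros in each row block."""
    return [row_ptr[e] - row_ptr[s] for s, e in zip(bounds, bounds[1:])]


def spmv_csr_blocked(values, col_idx, row_ptr, bounds, x):
    """Compute y = A @ x block by block over the CSR arrays."""
    y = [0.0] * (len(row_ptr) - 1)
    for start, end in zip(bounds, bounds[1:]):  # one unit of parallel work
        for i in range(start, end):
            acc = 0.0
            for p in range(row_ptr[i], row_ptr[i + 1]):
                acc += values[p] * x[col_idx[p]]
            y[i] = acc
    return y


if __name__ == "__main__":
    # Illustrative matrix: one dense row followed by sparse rows.
    A = [
        [1, 2, 3, 4],
        [0, 5, 0, 0],
        [0, 0, 6, 0],
        [7, 0, 0, 8],
    ]
    values, col_idx, row_ptr = dense_to_csr(A)
    bounds = split_rows_by_nnz(row_ptr, 2)
    counts = block_nnz(row_ptr, bounds)
    print("row boundaries:", bounds)
    print("nonzeros per block:", counts, "std dev:", pstdev(counts))
    print("y:", spmv_csr_blocked(values, col_idx, row_ptr, bounds, [1, 1, 1, 1]))
```

For this example matrix, splitting the four rows evenly (two rows per block) would give blocks with 5 and 3 nonzeros, whereas cutting on the nonzero prefix sum gives 4 and 4. This difference in per-block work is the kind of imbalance the abstract describes.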
This work was partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61672181.
About this article
Cite this article
Cui, H., Wang, N., Wang, Y. et al. An effective SPMV based on block strategy and hybrid compression on GPU. J Supercomput (2021). https://doi.org/10.1007/s11227-021-04123-6
Keywords
- Unbalanced load
- PBC algorithm
- Parallel efficiency