
VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors

The Journal of Supercomputing

Abstract

Sparse matrix–vector multiplication (SpMV) is an indispensable kernel in numerous applications, but its performance is limited by frequent, irregular memory accesses. Modern processors exploit data-level parallelism through single instruction, multiple data (SIMD) units to improve performance. To take full advantage of SIMD acceleration, this paper proposes a new storage format, the Variable Blocked-\(\sigma\)-SIMD Format (VBSF), which regularizes the irregular structure of traditional sparse matrix storage formats. VBSF combines adjacent nonzero elements into variable-size blocks so that SpMV can be computed with SIMD vector units. We compare VBSF-based SpMV against traditional storage formats on a benchmark suite of 15 matrices across three computing platforms with different SIMD widths (FT2000, Intel Xeon E5 and Intel Silver). On this benchmark suite, VBSF achieves substantial performance improvements on all three platforms and proves to have better storage efficiency than the other storage formats.
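As a concrete illustration of the idea, the sketch below shows how grouping adjacent nonzeros into blocks lets each block map onto a single SIMD vector operation. It is a minimal sketch, not the paper's VBSF implementation: it assumes a simplified CSR-like layout with fixed blocks of four doubles, zero-padded at the end of each row, rather than VBSF's variable-size blocks, and the names blocked_spmv_t and spmv_blocked_avx2 are illustrative. Compile with AVX2 and FMA enabled (e.g. gcc -O2 -mavx2 -mfma).

/* Minimal sketch: blocked SpMV with AVX2. Each row's nonzeros are grouped
 * into blocks of 4 contiguous values, zero-padded at the row tail, so every
 * block maps onto one 256-bit vector multiply-add. VBSF itself uses
 * variable-size blocks; this fixed-width variant only illustrates the idea. */
#include <immintrin.h>
#include <stddef.h>

typedef struct {
    size_t  nrows;
    size_t *block_ptr;  /* blocks of row i: block_ptr[i] .. block_ptr[i+1]-1 */
    double *vals;       /* nonzero values, 4 per block, zero-padded          */
    int    *cols;       /* column index per stored value (0 for padding)     */
} blocked_spmv_t;       /* hypothetical type, not the paper's exact layout   */

/* y = A * x, one vector accumulator per row */
void spmv_blocked_avx2(const blocked_spmv_t *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->nrows; ++i) {
        __m256d acc = _mm256_setzero_pd();
        for (size_t b = A->block_ptr[i]; b < A->block_ptr[i + 1]; ++b) {
            const double *v = A->vals + 4 * b;
            const int    *c = A->cols + 4 * b;
            /* gather the four x entries this block multiplies against;
             * padded slots multiply 0.0 by x[0], contributing nothing */
            __m256d xv = _mm256_set_pd(x[c[3]], x[c[2]], x[c[1]], x[c[0]]);
            acc = _mm256_fmadd_pd(_mm256_loadu_pd(v), xv, acc);
        }
        /* horizontal sum of the four partial products gives row i of y */
        double t[4];
        _mm256_storeu_pd(t, acc);
        y[i] = t[0] + t[1] + t[2] + t[3];
    }
}

Storing a fixed number of values per block keeps the inner loop branch-free at the cost of padding; VBSF's variable-size blocks aim to reduce that padding overhead while preserving the SIMD-friendly layout.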



Acknowledgements

This research work was supported in part by the National Key Research and Development Program of China (2017YFB0202104).

Author information

Correspondence to Jie Liu.



About this article


Cite this article

Li, Y., Xie, P., Chen, X. et al. VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors. J Supercomput 76, 2063–2081 (2020). https://doi.org/10.1007/s11227-019-02835-4

