Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform

Xu, Shiming; Xue, Wei; Lin, Hai Xiang

doi:10.1007/s11227-011-0626-0

Open access
Published: 07 June 2011

Volume 63, pages 710–721, (2013)

The Journal of Supercomputing

Abstract

In this article, we discuss the performance modeling and optimization of Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPUs using CUDA. SpMV has a very low computation-to-data ratio, and its performance is mainly bound by memory bandwidth. We propose optimizations of SpMV based on ELLPACK from two aspects: (1) enhanced performance for dense vector accesses by reducing cache misses, and (2) reduced matrix data traffic through index reduction. With matrix bandwidth reduction techniques, both cache usage enhancement and index compression can be enabled. For GPUs with better cache support, we propose a differentiated memory access scheme to avoid contamination of caches by matrix data. Performance evaluation shows that the combined speedups of the proposed optimizations are 16% (single-precision) and 12.6% (double-precision) on the GT-200 GPU, and 19% (single-precision) and 15% (double-precision) on the GF-100 GPU.
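
To make the ELLPACK layout and the idea of index reduction concrete, the following is a minimal CPU-side sketch in Python. It is not the authors' CUDA implementation. The function names (`to_ellpack`, `compress_indices`, `spmv_ell`), the choice of 16-bit offsets relative to the row number, and the restriction to square matrices are all assumptions made for illustration. The sketch only shows why a small matrix bandwidth allows column indices to be stored in fewer bytes.

```python
from array import array


def to_ellpack(dense):
    """Convert a square dense matrix (list of lists) to ELLPACK.

    Every row is padded to the length of the longest row. Padding slots
    hold value 0.0 and point at the diagonal, so they contribute nothing.
    """
    n = len(dense)
    width = max((sum(1 for v in row if v != 0) for row in dense), default=0)
    vals = [[0.0] * width for _ in range(n)]
    cols = [[i] * width for i in range(n)]
    for i, row in enumerate(dense):
        k = 0
        for j, v in enumerate(row):
            if v != 0:
                vals[i][k] = float(v)
                cols[i][k] = j
                k += 1
    return vals, cols


def compress_indices(cols):
    """Hypothetical index reduction: store each column index as a signed
    16-bit offset from its row number instead of a full-width integer.

    array('h') raises OverflowError if an offset does not fit, which
    happens when the matrix bandwidth is too large for this encoding.
    """
    return [array("h", (j - i for j in row)) for i, row in enumerate(cols)]


def spmv_ell(vals, offsets, x):
    """Compute y = A x from ELLPACK values and compressed column offsets."""
    y = []
    for i, (vrow, orow) in enumerate(zip(vals, offsets)):
        # Recover the column index as row number plus stored offset.
        y.append(sum(v * x[i + d] for v, d in zip(vrow, orow)))
    return y


# Usage: a small tridiagonal matrix, whose bandwidth is 1.
A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]
x = [1.0, 2.0, 3.0, 4.0]
vals, cols = to_ellpack(A)
offsets = compress_indices(cols)
y = spmv_ell(vals, offsets, x)
```

In this sketch the offsets are small only because the nonzeros lie close to the diagonal, which is the property that a bandwidth-reducing reordering aims to produce. The same locality also means that neighbouring rows read neighbouring entries of `x`.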

Author information

Authors and Affiliations

Mekelweg 4, 2628 CD, Delft, The Netherlands
Shiming Xu
Tsinghua University, RM. 8-210, East Main Bldg., 100084, Beijing, China
Wei Xue
Mekelweg 4, 2628 CD, Delft, The Netherlands
Hai Xiang Lin

Corresponding author

Correspondence to Shiming Xu.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Xu, S., Xue, W. & Lin, H.X. Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform. J Supercomput 63, 710–721 (2013). https://doi.org/10.1007/s11227-011-0626-0

Issue Date: March 2013
DOI: https://doi.org/10.1007/s11227-011-0626-0
