
Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7975)

Abstract

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific and engineering computing. This paper presents optimization techniques for SpMV in the Compressed Row Storage (CRS) format on NVIDIA Kepler architecture GPUs using CUDA. Our implementation is based on an existing method proposed for the Fermi architecture, an earlier GPU generation, and takes advantage of several new features of the Kepler architecture. On a Tesla K20 Kepler architecture GPU, for double-precision operations, our implementation is on average approximately 1.29 times faster than the Fermi-optimized implementation across 200 different types of matrices. As a result, our implementation outperforms the CRS-format SpMV routine of the NVIDIA cuSPARSE library in CUDA 5.0 on 174 of the 200 matrices, with an average speedup over the cuSPARSE routine of approximately 1.45 across all 200 matrices.



References

  1. Baskaran, M.M., Bordawekar, R.: Optimizing Sparse Matrix-Vector Multiplication on GPUs. IBM Research Report RC24704 (2009)


  2. Bell, N., Garland, M.: Efficient Sparse Matrix-Vector Multiplication on CUDA. NVIDIA Technical Report NVR-2008-004 (2008)


  3. NVIDIA Corporation: Whitepaper: NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110 (2012), http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf

  4. Davis, J.D., Chung, E.S.: SpMV: A Memory-Bound Application on the GPU Stuck Between a Rock and a Hard Place. Microsoft Technical Report MSR-TR-2012-95 (2012)


  5. Davis, T., Hu, Y.: The University of Florida Sparse Matrix Collection, http://www.cise.ufl.edu/research/sparse/matrices/

  6. El Zein, A.H., Rendell, A.P.: Generating Optimal CUDA Sparse Matrix Vector Product Implementations for Evolving GPU Hardware. Concurrency and Computation: Practice and Experience 24, 3–13 (2012)


  7. Feng, X., Jin, H., Zheng, R., Hu, K., Zeng, J., Shao, Z.: Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs. In: Proc. IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS 2011), pp. 165–172 (2011)


  8. Guo, P., Wang, L.: Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs. In: Proc. International Conference on Computational and Information Sciences (ICCIS 2010), pp. 1154–1157 (2010)


  9. Kubota, Y., Takahashi, D.: Optimization of Sparse Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU. In: Murgante, B., Gervasi, O., Iglesias, A., Taniar, D., Apduhan, B.O. (eds.) ICCSA 2011, Part II. LNCS, vol. 6783, pp. 547–561. Springer, Heidelberg (2011)


  10. Matam, K., Kothapalli, K.: Accelerating Sparse Matrix Vector Multiplication in Iterative Methods Using GPU. In: Proc. International Conference on Parallel Processing (ICPP 2011), pp. 612–621 (2011)


  11. NVIDIA Corporation: cuSPARSE Library (included in CUDA Toolkit), https://developer.nvidia.com/cusparse

  12. Reguly, I., Giles, M.: Efficient Sparse Matrix-Vector Multiplication on Cache-Based GPUs. In: Proc. Innovative Parallel Computing: Foundations and Applications of GPU, Manycore, and Heterogeneous Systems (InPar 2012), pp. 1–12 (2012)


  13. Xu, W., Zhang, H., Jiao, S., Wang, D., Song, F., Liu, Z.: Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU. In: Proc. 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2012), pp. 231–235 (2012)


  14. Yoshizawa, H., Takahashi, D.: Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS format on GPUs. In: Proc. 15th IEEE International Conference on Computational Science and Engineering (CSE 2012), pp. 130–136 (2012)





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mukunoki, D., Takahashi, D. (2013). Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39640-3_15

  • DOI: https://doi.org/10.1007/978-3-642-39640-3_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39639-7

  • Online ISBN: 978-3-642-39640-3

  • eBook Packages: Computer Science (R0)
