
Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures

  • Conference paper
High Performance Embedded Architectures and Compilers (HiPEAC 2010)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5952)

Abstract

Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware offering multiple levels of parallelism and a deep memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to parallelize efficiently on GPUs because of their irregular memory-reference patterns. In this paper we present a new storage format for sparse matrices that better exploits locality, has a low memory footprint, and enables automatic specialization for various matrices and future devices via parameter tuning. Experimental evaluation demonstrates significant speedups compared to previously published results.
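The irregular memory references mentioned in the abstract can be seen in a minimal sketch of sparse matrix-vector multiplication in the conventional CSR (compressed sparse row) baseline format. Note that this illustrates the standard CSR layout only, not the new storage format proposed in the paper:

```python
def spmv_csr(row_ptr, col, val, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR arrays.

    The load x[col[j]] is indirect and data-dependent: column indices
    vary from row to row, so neighbouring GPU threads processing
    adjacent rows would issue scattered (uncoalesced) memory reads.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[j] * x[col[j]]  # irregular gather from x
    return y


# Example: the 3x3 matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]            # start of each row in col/val
col = [0, 2, 1, 0, 2]             # column index of each nonzero
val = [4.0, 1.0, 2.0, 3.0, 5.0]   # value of each nonzero
x = [1.0, 1.0, 1.0]
print(spmv_csr(row_ptr, col, val, x))  # [5.0, 2.0, 8.0]
```

Formats such as ELLPACK (and the variant developed in this paper) restructure these arrays so that threads read contiguous memory, at the cost of some padding.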





Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Monakov, A., Lokhmotov, A., Avetisyan, A. (2010). Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_10


  • DOI: https://doi.org/10.1007/978-3-642-11515-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11514-1

  • Online ISBN: 978-3-642-11515-8

  • eBook Packages: Computer Science (R0)
