Abstract
Graphics processors are increasingly used in scientific applications because of their high computational power, which derives from hardware with multiple levels of parallelism and a deep memory hierarchy. Sparse matrix computations arise frequently in scientific applications, for example when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to parallelize efficiently on GPUs because of their irregular memory-access patterns. In this paper we present a new storage format for sparse matrices that better exploits locality, has a low memory footprint, and enables automatic specialization for various matrices and future devices through parameter tuning. Experimental evaluation demonstrates significant speedups compared to previously published results.
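As background for the storage-format discussion, the ELLPACK (ITPACK) family of formats pads every row to a common width so that memory accesses become regular and coalescable on GPUs; the sketch below is a minimal, hypothetical CPU illustration of an ELLPACK-style sparse matrix-vector product, not the paper's exact format (the `ellpack_from_csr` and `spmv_ellpack` names are ours):

```python
# Illustrative ELLPACK-style SpMV sketch (hypothetical; not the paper's exact format).
# ELLPACK pads every row to the same width K, trading extra storage for regular,
# coalescable accesses; padding overhead is what tuned variants aim to reduce.

def ellpack_from_csr(rows, ncols):
    """rows: list of lists of (col, val) pairs -> padded cols/vals arrays and width K."""
    k = max((len(r) for r in rows), default=0)  # row width K = longest row
    cols = [[c for c, _ in r] + [0] * (k - len(r)) for r in rows]
    vals = [[v for _, v in r] + [0.0] * (k - len(r)) for r in rows]
    return cols, vals, k

def spmv_ellpack(cols, vals, k, x):
    """y = A @ x with A in ELLPACK form; padded entries contribute zero."""
    return [sum(vals[i][j] * x[cols[i][j]] for j in range(k))
            for i in range(len(vals))]

# 3x3 example: A = [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
rows = [[(0, 4.0), (2, 1.0)], [(1, 2.0)], [(0, 3.0), (2, 5.0)]]
cols, vals, k = ellpack_from_csr(rows, 3)
y = spmv_ellpack(cols, vals, k, [1.0, 1.0, 1.0])  # -> [5.0, 2.0, 8.0]
```

Padding to the global maximum row length wastes memory on matrices with a few long rows; choosing the width per group of rows (as tuned variants do) keeps the regularity while reducing that overhead.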
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Monakov, A., Lokhmotov, A., Avetisyan, A. (2010). Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_10
DOI: https://doi.org/10.1007/978-3-642-11515-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11514-1
Online ISBN: 978-3-642-11515-8
eBook Packages: Computer Science (R0)