Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs

Aliaga, José Ignacio; Anzt, Hartwig; Quintana-Ortí, Enrique S.; Tomás, Andrés E.; Tsai, Yuhsiang M.

doi:10.1007/978-3-030-71593-9_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12480)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

We contribute to the optimization of the sparse matrix-vector product on graphics processing units by introducing a variant of the coordinate sparse matrix layout that compresses the integer representation of the matrix indices. In addition, we employ a look-up table to avoid the storage of repeated numerical values in the sparse matrix, yielding a more compact data representation that is easier to maintain in the cache. Our evaluation on the two most recent generations of NVIDIA GPUs, the V100 and the A100 architectures, shows considerable performance improvements over the kernels for the sparse matrix-vector product in cuSPARSE (CUDA 11.0.167).
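
To make the two ideas in the abstract concrete, the following is a minimal sequential sketch in Python, not the authors' GPU kernel or their actual encoding, which this page does not describe. It assumes a plain coordinate (COO) layout in which each index array is stored with the narrowest unsigned integer type that fits, and each nonzero stores a position in a table of distinct values instead of the value itself. The names `compress_coo`, `narrow`, and `spmv` are hypothetical and chosen only for illustration.

```python
from array import array


def narrow(seq):
    """Store non-negative integers using the narrowest unsigned typecode that fits."""
    top = max(seq, default=0)
    for code in ("B", "H", "I", "Q"):
        if top < 2 ** (8 * array(code).itemsize):
            return array(code, seq)
    raise OverflowError("index does not fit in 64 bits")


def compress_coo(rows, cols, vals):
    """Return (rows, cols, val_idx, table) for a COO matrix.

    `table` holds each distinct value once; `val_idx[k]` is the position
    in `table` of the k-th nonzero. This is an illustrative assumption,
    not the layout defined in the paper.
    """
    table = []      # distinct values, in order of first appearance
    position = {}   # value -> position in table
    val_idx = []
    for v in vals:
        if v not in position:
            position[v] = len(table)
            table.append(v)
        val_idx.append(position[v])
    return narrow(rows), narrow(cols), narrow(val_idx), array("d", table)


def spmv(n_rows, rows, cols, val_idx, table, x):
    """Compute y = A x, reading each nonzero's value through the table."""
    y = [0.0] * n_rows
    for r, c, k in zip(rows, cols, val_idx):
        y[r] += table[k] * x[c]
    return y


# 3x3 example: [[2, 0, 1], [0, 2, 0], [1, 0, 2]] holds only two distinct values.
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 2.0, 1.0, 2.0]
r, c, k, table = compress_coo(rows, cols, vals)
y = spmv(3, r, c, k, table, [1.0, 2.0, 3.0])
```

In this toy case the five nonzeros share a two-entry table and all indices fit in one byte each. A real GPU implementation would also have to balance the nonzeros across threads, which this sketch does not attempt.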

Acknowledgements

This work was partially sponsored by the EU H2020 project 732631 OPRECOMP and project TIN2017-82972-R of the Spanish MINECO. Hartwig Anzt and Yuhsiang M. Tsai were supported by the “Impuls und Vernetzungsfond” of the Helmholtz Association under grant VH-NG-1241 and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The authors would like to thank the Steinbuch Centre for Computing (SCC) of the Karlsruhe Institute of Technology for providing access to an NVIDIA A100 GPU.

Author information

Authors and Affiliations

Dpto. de Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón de la Plana, Spain
José Ignacio Aliaga & Andrés E. Tomás
Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany
Hartwig Anzt & Yuhsiang M. Tsai
Innovative Computing Lab, University of Tennessee, Knoxville, USA
Hartwig Anzt
DISCA, Universitat Politècnica de València, Valencia, Spain
Enrique S. Quintana-Ortí
Dpto. de Informática, Universitat de València, Valencia, Spain
Andrés E. Tomás

Corresponding author

Correspondence to Enrique S. Quintana-Ortí.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Bartosz Balis
CiTIUS, Santiago de Compostela, Spain
Dora B. Heras
ICAR-CNR, Naples, Italy
Laura Antonelli
University of Stirling, Stirling, UK
Andrea Bracciali
Friedrich-Alexander-Universität, Erlangen, Germany
Thomas Gruber
Konkuk University, Seoul, Korea (Republic of)
Jin Hyun-Wook
Otto von Guericke University Magdeburg, Magdeburg, Germany
Michael Kuhn
Tennessee Tech University, Cookeville, TN, USA
Stephen L. Scott
Koç University, Istanbul, Turkey
Didem Unat
Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski

About this paper

Cite this paper

Aliaga, J.I., Anzt, H., Quintana-Ortí, E.S., Tomás, A.E., Tsai, Y.M. (2021). Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs. In: Balis, B., et al. Euro-Par 2020: Parallel Processing Workshops. Euro-Par 2020. Lecture Notes in Computer Science, vol 12480. Springer, Cham. https://doi.org/10.1007/978-3-030-71593-9_7

DOI: https://doi.org/10.1007/978-3-030-71593-9_7
Published: 14 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71592-2
Online ISBN: 978-3-030-71593-9
eBook Packages: Computer Science, Computer Science (R0)
