Towards Reducing Communications in Sparse Matrix Kernels

  • Conference paper
  • In: Cloud Computing, Big Data & Emerging Topics (JCC-BD&ET 2023)

Abstract

The widespread adoption of many-core devices such as GPUs, together with their enormous computational power, motivates the study of sparse matrix operations on this hardware. The essential sparse kernels in scientific computing, such as the sparse matrix-vector multiplication (SpMV), usually have many different high-performance GPU implementations. Sparse matrix problems are typically memory-bound, a characteristic that is particularly limiting on massively parallel processors. This work revisits the main ideas for reducing the volume of data required by sparse storage formats and advances the understanding of several compression techniques. In particular, we study the use of index compression combined with sparse matrix reordering techniques. A systematic experimental evaluation on a large set of real-world matrices confirms that this approach is promising, achieving meaningful reductions in data storage.
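To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of pairing a bandwidth-reducing reordering with index compression: SciPy's reverse Cuthill-McKee permutation clusters nonzeros near the diagonal, so delta-encoded CSR column indices need fewer bits. The test matrix and the per-row uniform bit-width accounting in the `index_bits` helper are illustrative assumptions, not the scheme evaluated in the paper.

```python
# Minimal sketch (not the authors' code): combine a bandwidth-reducing
# reordering (reverse Cuthill-McKee) with delta encoding of CSR column
# indices, and compare the bits needed to store the indices before and
# after. The bit-width accounting is a deliberately simplified model.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def index_bits(csr):
    """Bits to store the column indices if each row's indices are
    delta-encoded with a uniform bit width (simplified model)."""
    total = 0
    for row in range(csr.shape[0]):
        cols = csr.indices[csr.indptr[row]:csr.indptr[row + 1]]
        if cols.size == 0:
            continue
        # First index is stored absolutely; the rest as gaps to the previous.
        deltas = np.diff(cols.astype(np.int64), prepend=0)
        total += int(np.ceil(np.log2(deltas.max() + 2))) * cols.size
    return total

# Random symmetric matrix standing in for a SuiteSparse test problem.
a = sp.random(2000, 2000, density=0.002, format="csr", random_state=0)
a = (a + a.T).tocsr()
a.sort_indices()

perm = reverse_cuthill_mckee(a, symmetric_mode=True)
a_rcm = a[perm, :][:, perm].tocsr()
a_rcm.sort_indices()

print("index bits, original ordering:", index_bits(a))
print("index bits, after RCM:        ", index_bits(a_rcm))
```

On banded or mesh-like problems the gap between the two counts is usually far larger than on unstructured random matrices, which is the intuition behind combining compression with reordering.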

Notes

  1. The SuiteSparse Matrix Collection: https://sparse.tamu.edu/.


Acknowledgments

We acknowledge the support of the ANII MPG Independent Research Group "Efficient Heterogeneous Computing" at UdelaR, a partner group of the Max Planck Institute in Magdeburg. This work is partially funded by the UDELAR CSIC-INI project CompactDisp: Formatos dispersos eficientes para arquitecturas de hardware modernas (efficient sparse formats for modern hardware architectures). We also thank PEDECIBA Informática and the University of the Republic, Uruguay.

Author information

Correspondence to Ernesto Dufrechou.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Freire, M., Marichal, R., Dufrechou, E., Ezzatti, P. (2023). Towards Reducing Communications in Sparse Matrix Kernels. In: Naiouf, M., Rucci, E., Chichizola, F., De Giusti, L. (eds) Cloud Computing, Big Data & Emerging Topics. JCC-BD&ET 2023. Communications in Computer and Information Science, vol 1828. Springer, Cham. https://doi.org/10.1007/978-3-031-40942-4_2

  • DOI: https://doi.org/10.1007/978-3-031-40942-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40941-7

  • Online ISBN: 978-3-031-40942-4

  • eBook Packages: Computer Science; Computer Science (R0)
