Abstract
The ubiquity of many-core devices such as GPUs, together with their enormous computational power, motivates the study of sparse matrix operations on this hardware. The essential sparse kernels in scientific computing, such as sparse matrix-vector multiplication (SpMV), usually have many different high-performance GPU implementations. Sparse matrix problems typically involve memory-bound operations, and this characteristic is particularly limiting on massively parallel processors. This work revisits the main ideas behind reducing the volume of data required by sparse storage formats and advances the understanding of some compression techniques. In particular, we study the use of index compression combined with sparse matrix reordering techniques. A systematic experimental evaluation on a large set of real-world matrices confirms that this approach is promising, achieving meaningful reductions in data storage.
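The core idea behind index compression can be sketched in a few lines. The following is a minimal illustration, not the paper's actual scheme: it delta-encodes the column indices of a CSR matrix row by row and counts the bits needed, versus storing every index at full width. The matrix and the per-element variable bit widths are illustrative assumptions; the point is that after a bandwidth-reducing reordering (e.g. Cuthill-McKee), nonzeros cluster near the diagonal, so the gaps between consecutive column indices are small and compress well.

```python
# Hedged sketch: delta-encoding CSR column indices to shrink index storage.
# The matrix below and the variable-width bit accounting are illustrative.

def bits_needed(v):
    """Bits to represent a non-negative integer (at least 1)."""
    return max(1, v.bit_length())

# Small CSR example: a 4x8 matrix described by its row pointers and
# column indices (values are irrelevant for index storage).
row_ptr = [0, 3, 5, 8, 10]
col_idx = [0, 1, 4, 1, 2, 2, 3, 6, 5, 7]

# Absolute indexing: every index needs enough bits for the largest column.
abs_bits = bits_needed(max(col_idx)) * len(col_idx)

# Delta encoding: store each row's first index, then the gaps to the next.
# Low-bandwidth matrices (e.g. after Cuthill-McKee reordering) have small
# gaps, so the deltas need far fewer bits than absolute indices.
delta_bits = 0
for r in range(len(row_ptr) - 1):
    row = col_idx[row_ptr[r]:row_ptr[r + 1]]
    deltas = [row[0]] + [b - a for a, b in zip(row, row[1:])]
    delta_bits += sum(bits_needed(d) for d in deltas)

print(abs_bits, delta_bits)  # → 30 16: deltas use roughly half the bits
```

Real formats typically round the delta width up to a fixed size per row or block so the indices stay decodable in parallel, trading some compression for GPU-friendly access.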
Acknowledgments
We acknowledge support of the ANII MPG Independent Research Group: Efficient Heterogeneous Computing at UdelaR, a partner group of the Max Planck Institute in Magdeburg. This work is partially funded by the UDELAR CSIC-INI project CompactDisp: Formatos dispersos eficientes para arquitecturas de hardware modernas (efficient sparse formats for modern hardware architectures). We also thank PEDECIBA Informática and the University of the Republic, Uruguay.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Freire, M., Marichal, R., Dufrechou, E., Ezzatti, P. (2023). Towards Reducing Communications in Sparse Matrix Kernels. In: Naiouf, M., Rucci, E., Chichizola, F., De Giusti, L. (eds) Cloud Computing, Big Data & Emerging Topics. JCC-BD&ET 2023. Communications in Computer and Information Science, vol 1828. Springer, Cham. https://doi.org/10.1007/978-3-031-40942-4_2
Print ISBN: 978-3-031-40941-7
Online ISBN: 978-3-031-40942-4