Skip to main content
Log in

DIESEL: A novel deep learning-based tool for SpMV computations and solving sparse linear equation systems

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

Sparse linear algebra is central to many areas of engineering, science, and business. The community has done considerable work on proposing new methods for sparse matrix-vector multiplication (SpMV) computations and iterative sparse solvers on graphical processing units (GPUs). Due to vast variations in matrix features, no single method performs well across all sparse matrices. A few tools on automatic prediction of best-performing SpMV kernels have emerged recently and require many more efforts to fully utilize their potential. The utilization of a GPU by the existing SpMV kernels is far from its full capacity. Moreover, the development and performance analysis of SpMV techniques on GPUs have not been studied in sufficient depth. This paper proposes DIESEL, a deep learning-based tool that predicts and executes the best performing SpMV kernel for a given matrix using a feature set carefully devised by us through rigorous empirical and mathematical instruments. The dataset comprises 1056 matrices from 26 different real-life application domains including computational fluid dynamics, materials, electromagnetics, economics, and more. We propose a range of new metrics and methods for performance analysis, visualization, and comparison of SpMV tools. DIESEL provides better performance with its accuracy \(88.2\%\), workload accuracy \(91.96\%\), and average relative loss \(4.4\%\), compared to \(85.9\%\), \(85.31\%\), and \(7.65\%\) by the next best performing artificial intelligence (AI)-based SpMV tool. The extensive results and analyses presented in this paper provide several key insights into the performance of the SpMV tools and how these relate to the matrix datasets and the performance metrics, allowing the community to further improve and compare basic and AI-based SpMV tools in the future.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21

Similar content being viewed by others

References

  1. AlAhmadi S, Muhammed T, Mehmood R, Albeshri A (2020) Performance characteristics for sparse matrix-vector multiplication on GPUs. Springer International Publishing, Cham, pp 409–426. https://doi.org/10.1007/978-3-030-13705-2_17

  2. Alyahya H, Mehmood R, Katib I (2020) Parallel iterative solution of large sparse linear equation systems on the intel MIC architecture. Springer International Publishing, Cham, pp 377–407. https://doi.org/10.1007/978-3-030-13705-2_16

  3. Asanovic K, Bodik R, Catanzaro BC, Gebis JJ, Husbands P, Keutzer K, Patterson DA, Plishker WL, Shalf J, Williams SW, Yelick KA (2006) The landscape of parallel computing research: a view from Berkeley. Tech. Rep. UCB/EECS-2006-183, EECS Department, University of California, Berkeley, http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html

  4. Baskaran MM, Bordawekar R (2009) Optimizing sparse matrix-vector multiplication on GPUs. Tech. Rep. RC24704 (W0812-047), IBM Research

  5. Bell N, Garland M (2008) Efficient sparse matrix-vector multiplication on CUDA. Tech. rep., Nvidia Technical Report NVR-2008-004, Nvidia Corporation

  6. Benatia A, Ji W, Wang Y, Shi F (2016) Sparse matrix format selection with multiclass SVM for SpMV on GPU. In: 2016 45th International Conference on Parallel Processing (ICPP), pp 496–505. https://doi.org/10.1109/ICPP.2016.64

  7. Benatia A, Ji W, Wang Y, Shi F (2018) Bestsf: a sparse meta-format for optimizing SpMV on GPU. ACM Trans Archit Code Optim 15(3). https://doi.org/10.1145/3226228

  8. Bengio Y (2009) Learning deep architectures for ai. Found Trends Mach Learn 2(1):1–127. https://doi.org/10.1561/2200000006

    Article  MathSciNet  MATH  Google Scholar 

  9. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828. https://doi.org/10.1109/TPAMI.2013.50

    Article  Google Scholar 

  10. Bernaschi M, Bisson M, Fantozzi C, Janna C (2016) A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing units. SIAM J Sci Comput 38(1):C53–C72. https://doi.org/10.1137/15M1027826

    Article  MathSciNet  MATH  Google Scholar 

  11. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  12. Choi JW, Singh A, Vuduc RW (2010) Model-driven autotuning of sparse matrix-vector multiply on GPUs. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, New York, NY, USA, PPoPP ’10, pp 115 – 126. https://doi.org/10.1145/1693453.1693471

  13. Davis TA, Hu Y (2011) The university of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1:1–1:25. https://doi.org/10.1145/2049662.2049663

  14. Dhar S, Guo J, Liu J, Tripathi S, Kurup U, Shah M (2020) On-device machine learning: an algorithms and learning theory perspective. 1911.00623

  15. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639):115–118. http://dx.doi.org/10.1038/nature21056, letter

  16. Filippone S, Cardellini V, Barbieri D, Fanfarillo A (2017) Sparse matrix-vector multiplication on GPGPUs. ACM Trans Math Softw 43(4):1–49. https://doi.org/10.1145/3017994

    Article  MathSciNet  MATH  Google Scholar 

  17. Golub GH, Van Loan CF (2012) Matrix computations, vol 3. JHU Press

  18. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, http://www.deeplearningbook.org

  19. Grimes RG, Kincaid DR, Young DM (1979) ITPACK 2.0 user’s guide. Center for Numerical Analysis, The University of Texas at Austin

  20. Grossman M, Thiele C, Araya-Polo M, Frank F, Alpak FO, Sarkar V (2016) A survey of sparse matrix-vector multiplication performance on large matrices. ArXiv abs/1608.00636

  21. Guo P, Wang L, Chen P (2014) A performance modeling and optimization analysis tool for sparse matrix-vector multiplication on GPUs. IEEE Trans Parallel Distrib Syst 25(5):1112–1123. https://doi.org/10.1109/TPDS.2013.123

    Article  Google Scholar 

  22. Janna C, Ferronato M, Gambolati G (2015) The use of supernodes in factored sparse approximate inverse preconditioning. SIAM J Sci Comput 37(1):C72–C94. https://doi.org/10.1137/140956026

    Article  MathSciNet  MATH  Google Scholar 

  23. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. CoRR abs/1412.6980,

  24. Kirk DB, Wen-Mei WH (2016) Programming massively parallel processors: a hands-on approach. Morgan kaufmann

  25. Kreutzer M, Hager G, Wellein G, Fehske H, Basermann A, Bishop AR (2012) Sparse matrix-vector multiplication on GPGPU clusters: a new storage format and a scalable implementation. In: Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International, IEEE, pp 1696–1702

  26. Li K, Yang W, Li K (2015) Performance analysis and optimization for SpMV on GPU using probabilistic modeling. IEEE Trans Parallel Distrib Syst 26(1):196–205. https://doi.org/10.1109/TPDS.2014.2308221

    Article  Google Scholar 

  27. Li R, Saad Y (2013) GPU-accelerated preconditioned iterative linear solvers. J Supercomput 63(2):443–466. https://doi.org/10.1007/s11227-012-0825-3

    Article  Google Scholar 

  28. van der Maaten L, Hinton G (2012) Visualizing non-metric similarities in multiple maps. Mach Learn 87(1):33–55. https://doi.org/10.1007/s10994-011-5273-4

    Article  MathSciNet  MATH  Google Scholar 

  29. Maaten Lvd, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605

  30. Mehmood R, Crowcroft J (2005) Parallel iterative solution method for large sparse linear equation systems. University of Cambridge, Computer Laboratory

  31. Mohammed T (2017) A novel deep learning based iterative solver for large sparse linear equation systems. Master’s thesis, King Abdulaziz University. https://kaupp.sa/Details/Thesis/133000

  32. Muhammed T, Mehmood R, Albeshri A, Katib I (2019) SURAA: a novel method and tool for loadbalanced and coalesced SpMV computations on GPUs. Appl Sci 9(5):947. https://doi.org/10.3390/app9050947

    Article  Google Scholar 

  33. Nisa I, Siegel C, Rajam AS, Vishnu A, Sadayappan P (2018) Effective machine learning based format selection and performance modeling for SpMV on GPUs. In: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 1056–1065. https://doi.org/10.1109/IPDPSW.2018.00164

  34. Saad Y, van der Vorst HA (2000) Iterative solution of linear systems in the 20th century. J Comput Appl Math 123(1–2):1–33. https://doi.org/10.1016/S0377-0427(00)00412-X, http://www.sciencedirect.com/science/article/pii/ S037704270000412X, numerical Analysis 2000. Vol. III: Linear Algebra

  35. Sedaghati N, Mu T, Pouchet LN, Parthasarathy S, Sadayappan P (2015) Automatic selection of sparse matrix representation on GPUs. In: Proceedings of the 29th ACM on International Conference on Supercomputing, ACM, New York, NY, USA, ICS ’15, pp 99–108. https://doi.org/10.1145/2751205.2751244

  36. Tan G, Liu J, Li J (2018) Design and implementation of adaptive SpMV library for multicore and many-core architecture. ACM Trans Math Softw 44(4). https://doi.org/10.1145/3218823

  37. Usman S, Mehmood R, Katib I, Albeshri A (2019a) ZAKI+: a machine learning based process mapping tool for SpMV computations on distributed memory architectures. IEEE Access 7:81279–81296. https://doi.org/10.1109/ACCESS.2019.2923565

    Article  Google Scholar 

  38. Usman S, Mehmood R, Katib I, Albeshri A, Altowaijri S (2019b) ZAKI: a smart method and tool for automatic performance optimization of parallel SpMV computations on distributed memory machines. Mobile Netw Appl

  39. Usman S, Mehmood R, Katib I (2020) Big data and HPC convergence for smart infrastructures: a review and proposed architecture. Springer International Publishing, Cham, pp 561–586. https://doi.org/10.1007/978-3-030-13705-2_23

  40. Verschoor M, Jalba AC (2012) Analysis and performance estimation of the Conjugate Gradient method on multiple GPUs. Parallel Comput 38(10–11):552–575. https://doi.org/10.1016/j.parco.2012.07.002, http://www.sciencedirect.com/science/article/pii/ S0167819112000609

  41. Zardoshti P, Khunjush F, Sarbazi-Azad H (2015) Adaptive sparse matrix representation for efficient matrix–vector multiplication. J Supercomput pp 1–21

Download references

Acknowledgements

This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under Grant Number RG-6-611-40. The authors, therefore, acknowledge with thanks the DSR for their technical and financial support. This work is supported by the HPC Center at King Abdulaziz University, Jeddah, Saudi Arabia. The experiments reported in this paper were performed on the Aziz supercomputer at KAU. We are thankful to the anonymous reviewers whose comments have greatly improved the quality of this paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rashid Mehmood.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mohammed, T., Albeshri, A., Katib, I. et al. DIESEL: A novel deep learning-based tool for SpMV computations and solving sparse linear equation systems. J Supercomput 77, 6313–6355 (2021). https://doi.org/10.1007/s11227-020-03489-3

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-020-03489-3

Keywords

Navigation