Abstract
Graph neural networks (GNNs) are emerging as a powerful technique for modeling graph structures. Because real-world graph data are sparse, GNN performance is limited by the extensive sparse matrix multiplication (SpMM) operations involved in computation. While the best sparse matrix storage format varies across input data, existing deep learning frameworks employ a single, static storage format, leaving much room for improvement. This paper investigates how the choice of sparse matrix storage format affects GNN performance. We observe that choosing a suitable storage format can significantly improve GNN training performance, but the right format depends on the input workload and can change as the GNN iterates over the input graph. We then develop a predictive model that dynamically chooses the sparse matrix storage format a GNN layer should use, based on the input matrices. Our model is first trained offline on sample matrices, and the trained model can then be applied to any input matrix and any GNN kernel with SpMM computation. We implement our approach on top of PyTorch and apply it to five representative GNN models running on a multi-core CPU using real-life and synthetic datasets. Experimental results show that our approach gives an average speedup of 1.17x (up to 3x) on GNN running time.
Z. Wang—This project was supported in part by an Alibaba Innovative Research Programme.
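The workflow the abstract describes, namely labeling training matrices offline by timing SpMM in each candidate format, extracting cheap sparsity features, and fitting a classifier that picks a format at runtime, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature set, the candidate formats, and the decision-tree model are assumptions for the sketch (the paper targets PyTorch kernels).

```python
# Sketch of input-adaptive sparse-format selection for SpMM.
# Assumed candidate formats and features; illustrative only.
import time
import numpy as np
import scipy.sparse as sp
from sklearn.tree import DecisionTreeClassifier

FORMATS = ["csr", "csc", "coo"]  # candidate storage formats (assumed set)

def matrix_features(m: sp.spmatrix) -> list:
    """Cheap sparsity statistics used as model input."""
    m = m.tocsr()
    rows, cols = m.shape
    nnz_per_row = np.diff(m.indptr)
    return [
        rows, cols,
        m.nnz / (rows * cols),   # density
        nnz_per_row.mean(),      # average row length
        nnz_per_row.std(),       # row-length spread signals irregularity
    ]

def best_format(m: sp.spmatrix, dense: np.ndarray) -> str:
    """Offline labelling: time SpMM in each format, keep the fastest."""
    timings = {}
    for f in FORMATS:
        mm = m.asformat(f)
        t0 = time.perf_counter()
        _ = mm @ dense          # SpMM: sparse adjacency times dense features
        timings[f] = time.perf_counter() - t0
    return min(timings, key=timings.get)

# Offline training on sample matrices. At runtime, a GNN layer would call
# clf.predict([matrix_features(A)]) and convert A to the predicted format
# before its SpMM.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(20):
    a = sp.random(200, 200, density=rng.uniform(0.01, 0.2), random_state=rng)
    X.append(matrix_features(a))
    y.append(best_format(a, rng.standard_normal((200, 16))))
clf = DecisionTreeClassifier(max_depth=4).fit(X, y)
```

The key design point is that feature extraction must be far cheaper than the SpMM it optimizes; the statistics above are O(nnz) single passes, so the prediction cost is amortized when the same graph is reused across training iterations.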
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Qiu, S., You, L., Wang, Z. (2022). Optimizing Sparse Matrix Multiplications for Graph Neural Networks. In: Li, X., Chandrasekaran, S. (eds) Languages and Compilers for Parallel Computing. LCPC 2021. Lecture Notes in Computer Science, vol 13181. Springer, Cham. https://doi.org/10.1007/978-3-030-99372-6_7
Print ISBN: 978-3-030-99371-9
Online ISBN: 978-3-030-99372-6