Backpropagation Computation for Training Graph Attention Networks

Journal of Signal Processing Systems

Abstract

Graph Neural Networks (GNNs) are a class of deep learning models that have found use in a variety of problems, including the modeling of drug interactions, time-series analysis, and traffic prediction. They represent the problem using non-Euclidean graphs, allowing for a high degree of versatility, and are able to learn complex relationships by iteratively aggregating contextual information from increasingly distant neighbors. Inspired by the power of attention in transformers, Graph Attention Networks (GATs) incorporate an attention mechanism on top of graph aggregation and are considered the state of the art due to their superior performance. To learn the best parameters for a given graph problem, GATs use traditional backpropagation to compute weight updates. To the best of our knowledge, these updates are calculated in software, and closed-form equations describing their calculation for GATs are not well known. This paper derives closed-form equations for backpropagation in GATs using matrix notation. These equations can form the basis for the design of hardware accelerators for training GATs.
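For context, the per-node GATv1 layer of Veličković et al. (2018), stated here in its standard form rather than in the matrix notation developed in the paper, computes

\[
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\!\left[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j\right]\right), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}, \qquad
\mathbf{h}_i^{\prime} = \sigma\!\Bigl(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\Bigr),
\]

where \(\mathcal{N}(i)\) is the neighborhood of node \(i\), \(\mathbf{W}\) is the shared weight matrix, and \(\mathbf{a}\) is the attention vector. Training therefore requires propagating the loss gradient through the attention-weighted sum, the softmax, and the LeakyReLU to reach \(\mathbf{W}\) and \(\mathbf{a}\); these are the gradients for which the paper derives closed-form matrix expressions.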

Data Availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

Acknowledgements

The authors thank Nanda Unnikrishnan for numerous useful discussions.

Funding

This paper was supported in part by the National Science Foundation under grant number CCF-1954749.

Author information

Corresponding authors

Correspondence to Joe Gould or Keshab K. Parhi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Supplementary Figures

Figure 4

Forward pass of a GAT layer on an 8-node graph, with a 5-feature-wide input and a 4-feature-wide attention-head output. Differences between GATv1 and GATv2 are separated by a vertical line, such that GATv1’s operations appear on the left and GATv2’s on the right. The matrices are colored by shape according to the legends in Fig. 2. Subfigure a shows the input to the layer, with each node having an associated feature vector. Subfigure b shows the combination and edge-coefficient calculation steps, corresponding to Eqs. (2)–(5), with each node having an associated source and destination coefficient. Subfigure c shows the pre-softmax attention coefficient calculation from Eqs. (8) and (9). Subfigure d shows the row-wise softmax operation on the matrix of these coefficients, from Eq. (10). Finally, subfigure e shows the aggregation step, weighted with the attention coefficients from Eqs. (11) and (12) (for brevity, we omit the near-identical GATv2 equation, which uses \(\mathbf {X^{k,l}_\text {src}}\) instead of \(\mathbf {X^{k,l}}\)). The figure shows only a single attention head; the output of subfigure e would be concatenated with the outputs of the other attention heads.
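The five steps in subfigures a–e can be expressed compactly in code. The following is a minimal NumPy sketch of a single GATv1 attention head, written to mirror the caption above; the dense boolean adjacency matrix (assumed to include self-loops), the variable names W, a_src, and a_dst, and the ELU output activation are illustrative assumptions and do not reproduce the paper's matrix notation or equation numbering. GATv2 would differ in subfigures b–c, applying the LeakyReLU before the dot product with the attention vector.

    import numpy as np

    def leaky_relu(x, slope=0.2):
        return np.where(x > 0, x, slope * x)

    def gatv1_head_forward(X, adj, W, a_src, a_dst):
        # X:            (n, f_in)     input node features                 -- subfigure a
        # adj:          (n, n)        boolean adjacency; every node is
        #                             assumed to have at least one neighbor
        # W:            (f_in, f_out) shared linear weights
        # a_src, a_dst: (f_out,)      attention vectors
        H = X @ W                                        # linear combination             -- subfigure b
        c_src = H @ a_src                                # per-node source coefficient
        c_dst = H @ a_dst                                # per-node destination coefficient
        E = leaky_relu(c_src[:, None] + c_dst[None, :])  # pre-softmax coefficients       -- subfigure c
        E = np.where(adj, E, -np.inf)                    # keep only existing edges
        E = E - E.max(axis=1, keepdims=True)             # stabilize the softmax
        expE = np.exp(E)
        A = expE / expE.sum(axis=1, keepdims=True)       # row-wise softmax               -- subfigure d
        Z = A @ H                                        # attention-weighted aggregation -- subfigure e
        out = np.where(Z > 0, Z, np.exp(Z) - 1)          # ELU output activation (illustrative choice)
        cache = (X, adj, W, a_src, a_dst, H, c_src, c_dst, A)
        return out, cache

Each additional attention head repeats this computation with its own W, a_src, and a_dst, and the head outputs are concatenated along the feature dimension, as noted at the end of the caption.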

Figure 5

Backward pass for the forward pass example in Fig. 4. Differences between GATv1 and GATv2 are separated with a vertical line, such that GATv1’s operations appear on the left and GATv2’s on the right, except for the final step in weight and feature gradient calculation. The matrices are colored by shape according to the legends in Fig. 3. Subfigure a shows the calculation of the gradients with respect to the input of the layer activation function from Eq. (15). Subfigure b shows the gradient with respect to the aggregation step due to the attention coefficients, corresponding to Eq. (19). Subfigure c follows the gradient to the input of the softmax, covering Eq. (21). Subfigure d shows the computation of the gradient with respect to attention weights from Eqs. (26) and (30). In subfigure e, the \(\mathbf {{C^\prime }^{k,l}}\) matrix is shown following Eqs. (23) and (27) and used to find \(\mathbf {\Delta ^{k,l}}\) as in Eqs. (24) and (28). Subfigure f shows the calculation of the \(\mathbf {\Sigma ^{k,l}}\) matrices following Eqs. (25) and (29), and the multiplication with the attention weights in GATv1’s case. Finally, the gradient with respect to the weights and input features is shown for GATv1 in subfigure g from Eqs. (32) and (35) and GATv2 in subfigure h from Eqs. (32) and (39).
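As a companion to the caption above, the following NumPy sketch propagates an upstream gradient dZ (the gradient of the loss with respect to the aggregation output Z = A @ H, i.e., after the activation-gradient step of subfigure a) back through the GATv1 head sketched after Fig. 4. It is a hand-derived illustration under the same assumptions as that sketch; the correspondence to the subfigures is approximate, and the code does not reproduce the paper's intermediate matrices such as \(\mathbf {\Delta ^{k,l}}\) or \(\mathbf {\Sigma ^{k,l}}\).

    # Continues the forward sketch after Fig. 4 (NumPy imported there).
    def gatv1_head_backward(dZ, cache):
        # dZ: (n, f_out) gradient of the loss w.r.t. Z = A @ H
        X, adj, W, a_src, a_dst, H, c_src, c_dst, A = cache

        # Aggregation Z = A @ H: split the gradient between A and H.           -- cf. subfigure b
        dA = dZ @ H.T
        dH = A.T @ dZ

        # Row-wise softmax backward: dE_ij = A_ij * (dA_ij - sum_k A_ik*dA_ik). -- cf. subfigure c
        dE = A * (dA - (A * dA).sum(axis=1, keepdims=True))

        # LeakyReLU backward on the pre-softmax scores (slope 0.2).
        S = c_src[:, None] + c_dst[None, :]
        dS = dE * np.where(S > 0, 1.0, 0.2)
        dS = np.where(adj, dS, 0.0)   # non-edges carry no gradient (A is already zero there)

        # S_ij = c_src_i + c_dst_j: reduce over the matching axis.
        dc_src = dS.sum(axis=1)
        dc_dst = dS.sum(axis=0)

        # c_src = H @ a_src and c_dst = H @ a_dst: attention-weight gradients.  -- cf. subfigure d
        da_src = H.T @ dc_src
        da_dst = H.T @ dc_dst
        dH += np.outer(dc_src, a_src) + np.outer(dc_dst, a_dst)

        # H = X @ W: weight and input-feature gradients.                        -- cf. subfigure g
        dW = X.T @ dH
        dX = dH @ W.T
        return dX, dW, da_src, da_dst

A convenient sanity check for hand-derived gradients like these is to compare them against finite differences or an automatic-differentiation framework on a small random graph.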

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gould, J., Parhi, K.K. Backpropagation Computation for Training Graph Attention Networks. J Sign Process Syst 96, 1–14 (2024). https://doi.org/10.1007/s11265-023-01897-1
