Abstract
Graph Neural Networks (GNNs) are a class of deep learning models that have been applied to a variety of problems, including drug-interaction modeling, time-series analysis, and traffic prediction. They represent problems using non-Euclidean graphs, giving them a high degree of versatility, and they learn complex relationships by iteratively aggregating contextual information from increasingly distant neighbors. Inspired by the success of attention mechanisms in transformers, Graph Attention Networks (GATs) incorporate attention on top of graph aggregation and are considered the state of the art due to their superior performance. To learn the best parameters for a given graph problem, GATs use traditional backpropagation to compute weight updates. To the best of our knowledge, these updates are calculated in software, and closed-form equations describing their calculation for GATs are not well known. This paper derives closed-form equations for backpropagation in GATs using matrix notation. These equations can form the basis for the design of hardware accelerators for training GATs.
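For background, the forward pass whose gradients the paper derives is the standard GAT layer of Veličković et al.; its well-known equations (stated here for reference, not as the paper's derivation) compute attention coefficients between a node and each of its neighbors before aggregating:

```latex
% Standard GAT layer (Veličković et al., 2018), shown for reference.
% h_i is the feature vector of node i, W a shared weight matrix,
% a the attention vector, N(i) the neighborhood of node i.
\begin{align}
  e_{ij} &= \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}
            \left[\,\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_j\,\right]\right)
            && \text{(raw attention score)} \\
  \alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}
            && \text{(softmax over neighbors)} \\
  \mathbf{h}_i' &= \sigma\!\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,
            \mathbf{W}\mathbf{h}_j\Big)
            && \text{(attention-weighted aggregation)}
\end{align}
```

Backpropagation through this layer requires differentiating the loss with respect to both \(\mathbf{W}\) and \(\mathbf{a}\), propagating through the softmax and the LeakyReLU; the paper expresses these gradients in closed-form matrix notation.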
Data Availability
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
Acknowledgements
The authors thank Nanda Unnikrishnan for numerous useful discussions.
Funding
This paper was supported in part by the National Science Foundation under grant number CCF-1954749.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix. Supplementary Figures
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gould, J., Parhi, K.K. Backpropagation Computation for Training Graph Attention Networks. J Sign Process Syst 96, 1–14 (2024). https://doi.org/10.1007/s11265-023-01897-1