Attention and Edge Memory Convolution for Bioactivity Prediction
We present augmentations to Message Passing Neural Network (MPNN) architectures from the literature and benchmark their performance on a wide range of chemically and pharmaceutically relevant datasets. We analyse the effect of the activation function used for regularisation, propose a new graph attention mechanism, and implement a new edge-based memory system intended to maximise the effectiveness of hidden state usage by directing and isolating information flow around the graph. We compare our results to the graph-based results of the MolNet benchmarking paper [14], and also investigate how method performance varies with dataset preprocessing.
Keywords: Graph convolution · Cheminformatics · Deep learning
Many fields and research areas have benefited greatly from the rise of deep learning over the past decade. AI has notably risen in popularity in the pharmaceutical industry, for activities such as bioactivity and physical-chemical property prediction, de novo design, synthesis prediction and image analysis, to name a few. The rapid growth of accessible computing power thanks to GPU-accelerated computing, and the ever-increasing quantity of available chemical and biochemical data, have led to a natural desire to apply data-hungry machine learning techniques such as deep learning to exploit this information to the greatest possible extent.
1.1 Graph Convolution
Message-Passing Neural Networks. Graph Neural Networks were introduced in 2005 by Gori et al. [9], and in 2013 the first Graph Convolutional Network scheme based on spectral graph theory was published by Bruna et al. [2]. Our work focuses on the framework presented by Google, the Message Passing Neural Network [7], which was developed to generalise and represent a selection of previously published graph-based techniques [2, 4, 5, 10, 11, 12, 13]. We analyse the performance effects of activation-function (SELU)-based normalisation and of chemically based dataset preprocessing, and propose two novel architectures as extensions to the MPNN framework:
Attention MPNN (AMPNN) in which attention is performed over hidden state vector elements, dependent on edge type, allowing weighted summation in the message-passing function.
An Edge-Memory network, in which hidden states belong to directed edges and can only propagate in a single direction, designed to naturally allow for asymmetric bias and to maximise useful hidden memory information when propagating a node’s neighbourhood.
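To make the two ideas above more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes one possible reading of each architecture: in `attention_message`, every edge type has its own (hypothetical) embedding and attention matrices, and a softmax over a node's neighbours is taken separately for each hidden-state element; in `edge_memory_step`, hidden states are keyed by directed edges, and an edge only receives information from edges flowing into its source atom, excluding its own reverse edge. That exclusion rule, the function names, and the data layout are all illustrative assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def attention_message(node, hidden, neighbours, embed, attend):
    """Attention-weighted message for one node (illustrative only).

    hidden:     {node: hidden state vector}
    neighbours: {node: [(neighbour, edge_type), ...]}
    embed:      {edge_type: matrix} mapping a neighbour state to a value vector
    attend:     {edge_type: matrix} mapping a neighbour state to per-element scores
    """
    incident = neighbours.get(node, [])
    if not incident:
        return [0.0] * len(hidden[node])
    values, scores = [], []
    for other, edge_type in incident:
        values.append(matvec(embed[edge_type], hidden[other]))
        scores.append(matvec(attend[edge_type], hidden[other]))
    message = []
    for i in range(len(values[0])):
        # Softmax across neighbours, computed separately for hidden element i.
        top = max(s[i] for s in scores)
        exps = [math.exp(s[i] - top) for s in scores]
        total = sum(exps)
        message.append(sum(e / total * v[i] for e, v in zip(exps, values)))
    return message


def edge_memory_step(edge_hidden, update):
    """One propagation step over directed-edge hidden states (illustrative only).

    edge_hidden: {(source, target): hidden state vector}
    update:      function(old_state, incoming_sum) -> new state
    """
    new_hidden = {}
    for (src, dst), state in edge_hidden.items():
        incoming = [0.0] * len(state)
        for (other, mid), other_state in edge_hidden.items():
            # Accept only edges flowing into src, never the reverse edge dst -> src.
            if mid == src and other != dst:
                incoming = [a + b for a, b in zip(incoming, other_state)]
        new_hidden[(src, dst)] = update(state, incoming)
    return new_hidden
```

With all attention scores equal, `attention_message` reduces to a plain average of the embedded neighbour states; learned scores let each hidden element weight neighbours differently. In `edge_memory_step`, keeping a separate state per direction is what allows information on the edge from atom a to atom b to differ from that on the edge from b to a.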
Low-Level Features from Graph Structure. Unlike traditional cheminformatic approaches to machine learning tasks, which rely on feature engineering, graphs are one of the lowest-level representations of chemical structures, from which many features can be calculated directly. By using a chemical structure directly as the starting point for deep learning, feature engineering can be avoided, and no prior assumptions about task-specific knowledge need to be made. Instead, task-specific features are learned within the network and derived directly from the chemical structure. This allows for a potentially very powerful general-purpose approach to chemical task modelling. It also presents an interesting approach to the secure sharing of chemical data: trained models for activity prediction can be disseminated in lieu of the chemical data itself, without the risk of reverse-engineering IP-sensitive structural information from, e.g., chemical fingerprints [1, 6], and models can be trained jointly without the need to pre-negotiate engineered features relevant to the task.
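As a small illustration of the claim that features can be calculated directly from a graph, the sketch below encodes acetic acid (SMILES `CC(=O)O`) by hand as a heavy-atom graph and derives a few simple descriptors from it. The data layout and the function names `adjacency` and `graph_features` are assumptions made for this example; a real pipeline would typically build the graph with a cheminformatics toolkit, and a graph network would learn its features rather than count them.

```python
from collections import Counter

# Acetic acid, CC(=O)O, written by hand as a heavy-atom graph (hydrogens implicit).
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, "single"), (1, 2, "double"), (1, 3, "single")]


def adjacency(n_atoms, bonds):
    """Build an undirected adjacency list: atom index -> [(neighbour, bond type)]."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b, kind in bonds:
        adj[a].append((b, kind))
        adj[b].append((a, kind))
    return adj


def graph_features(atoms, bonds):
    """Derive a few simple descriptors directly from the graph."""
    adj = adjacency(len(atoms), bonds)
    # A carbonyl group is counted as any double bond joining a carbon and an oxygen.
    carbonyls = sum(
        1
        for a, b, kind in bonds
        if kind == "double" and {atoms[a], atoms[b]} == {"C", "O"}
    )
    return {
        "element_counts": dict(Counter(atoms)),
        "degrees": [len(adj[i]) for i in range(len(atoms))],
        "carbonyl_groups": carbonyls,
    }


features = graph_features(atoms, bonds)
print(features)
```

The same atom and bond lists are all a message-passing network needs as input, which is why no hand-engineered descriptor set has to be agreed on beforehand.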
We evaluate our networks on a selection of benchmarking datasets, referred to as: HIV (42k compounds, classification, single-task); MUV (93k compounds, classification, 17 tasks); Tox21 (8k compounds, classification, 12 tasks); ESOL (1k compounds, regression, single-task); QM8 (22k compounds, regression, 12 tasks); SIDER (1.4k compounds, classification, 27 tasks); LIPO (4k compounds, regression, single-task); and BBBP (2k compounds, classification, single-task). Datasets were split and tested according to the previous MolNet benchmarking [14], and hyperparameter optimisation was performed using parallel Bayesian optimisation with Local Penalisation [8].
The project leading to this article received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 676434, “Big Data in Chemistry” (“BIGCHEM”, http://bigchem.eu). The article reflects only the authors’ view, and neither the European Commission nor the Research Executive Agency are responsible for any use that may be made of the information it contains.
- 2. Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. arXiv:1312.6203 [cs], December 2013. http://arxiv.org/abs/1312.6203
- 3. Chen, H., Engkvist, O., Wang, Y., Olivecrona, M., Blaschke, T.: The rise of deep learning in drug discovery. Drug Discov. Today 23(6), 1241–1250 (2018). https://doi.org/10.1016/j.drudis.2018.01.039, http://www.sciencedirect.com/science/article/pii/S1359644617303598
- 4. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 3844–3852. Curran Associates Inc. (2016). http://papers.nips.cc/paper/6081-convolutional-neural-networks-on-graphs-with-fast-localized-spectral-filtering.pdf
- 5. Duvenaud, D.K., et al.: Convolutional networks on graphs for learning molecular fingerprints. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 2224–2232. Curran Associates Inc. (2015). http://papers.nips.cc/paper/5954-convolutional-networks-on-graphs-for-learning-molecular-fingerprints.pdf
- 7. Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E.: Neural message passing for quantum chemistry. arXiv:1704.01212 [cs], April 2017. http://arxiv.org/abs/1704.01212
- 8. González, J., Dai, Z., Hennig, P., Lawrence, N.D.: Batch Bayesian optimization via local penalization. arXiv:1505.08052 [stat], May 2015. http://arxiv.org/abs/1505.08052
- 9. Gori, M., Monfardini, G., Scarselli, F.: A new model for learning in graph domains. In: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks 2005, vol. 2, pp. 729–734, July 2005. https://doi.org/10.1109/IJCNN.2005.1555942
- 11. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv:1609.02907 [cs, stat], September 2016. http://arxiv.org/abs/1609.02907
- 12. Li, Y., Tarlow, D., Brockschmidt, M., Zemel, R.: Gated graph sequence neural networks. arXiv:1511.05493 [cs, stat], November 2015. http://arxiv.org/abs/1511.05493
- 14. Wu, Z., et al.: MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9(2), 513–530 (2018). https://doi.org/10.1039/C7SC02664A, https://pubs.rsc.org/en/content/articlelanding/2018/sc/c7sc02664a
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.