Attention and Edge Memory Convolution for Bioactivity Prediction

Abstract. We present some augmentations to literature Message Passing Neural Network (MPNN) architectures and benchmark their performance on a wide range of chemically and pharmaceutically relevant datasets. We analyse the effects of activation function for regularisation, we propose a new graph attention mechanism, and we implement a new edge-based memory system that should maximise the effectiveness of hidden state usage by directing and isolating information flow around the graph. We compare our results to the graph-based results of the MolNet [14] benchmarking paper, and also investigate how dataset preprocessing affects method performance.


Introduction
Many fields and research areas have benefited greatly over the past decade from the rise of deep learning [3]. AI has notably risen in popularity in the pharmaceutical industry, for activities such as bioactivity and physical-chemical property prediction, de novo design, synthesis prediction and image analysis, to name a few. The rapid growth of accessible computing power thanks to graphically accelerated computing, and the ever-increasing quantity of available chemical and biochemical data, have led to a natural desire for data-hungry machine learning techniques such as deep learning to exploit this information to the greatest possible extent.

Graph Convolution
In Graph Convolutional Networks (GCNs), information propagates through a given graph much like how convolutional neural networks (CNNs) treat grid data (e.g. image data, text strings etc.). In contrast to image data, however, graphs have irregular local connectivity, are not necessarily shift-invariant, and are not from a Euclidean domain (Fig. 1). CNNs owe much of their powerful performance to the regularity, shift-invariance and Euclidean structure of grid data; to match this on graphs, the absence of these properties must be surmounted, in a manner that is invariant to the ordered representation of the graph.

Funding from the EU H2020 MSC Grant 676434 "BigChem".

Fig. 1. Convolutional Neural Networks operating on e.g. image data (left) have a regular Euclidean representation, with a fixed dimensionality of neighbours to each data point (vertical, horizontal, and channel-depth). With graph-based data, however, each node can have a variable number of neighbours (irregular), and the graph can be traversed in any order (isomorphic representation) without an easily describable canonical representation.

Message-Passing Neural Networks.
Graph Neural Networks were introduced in 2005 by Gori et al. [9], and in 2013 the first Graph Convolutional Network schema based on spectral graph theory was published by Bruna et al. [2]. Our work is focussed on the framework presented by Google, the Message Passing Neural Network [7], which was developed to generalise, and be able to represent, a selection of previously published graph-based techniques [2,4,5,10,11,12,13]. We analyse the performance effects of activation function (SELU)-based normalisation and of chemically-based dataset preprocessing, and propose two novel architectures as extensions to the MPNN framework:

An Attention MPNN (AMPNN), in which attention is performed over hidden state vector elements, dependent on edge type, allowing weighted summation in the message-passing function.
An Edge-Memory network, in which hidden states belong to directed edges and can only propagate in a single direction, designed to naturally allow for asymmetric bias and to maximise useful hidden memory information when propagating a node's neighbourhood.
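
To make the idea of an attention-weighted message function concrete, the following is a minimal sketch in plain Python. It is not the architecture evaluated in this work: the function `attention_message`, the per-edge-type elementwise weights `message_w` and `attention_w`, and the use of a softmax across neighbours are all illustrative assumptions, standing in for the learned networks a real model would use.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_message(neighbour_states, edge_types, message_w, attention_w):
    """Aggregate neighbour hidden states into a single message vector.

    neighbour_states: one hidden-state vector (list of floats) per neighbour
    edge_types: edge-type key for each neighbour, e.g. "single" or "double"
    message_w, attention_w: hypothetical elementwise weights per edge type
        (a simplification; a trained model would use learned networks)
    """
    dim = len(neighbour_states[0])
    pairs = list(zip(neighbour_states, edge_types))
    # Edge-type-dependent transform of each neighbour state.
    transformed = [[message_w[t][k] * h[k] for k in range(dim)] for h, t in pairs]
    # Edge-type-dependent attention score for each vector element.
    scores = [[attention_w[t][k] * h[k] for k in range(dim)] for h, t in pairs]
    message = []
    for k in range(dim):
        # Normalise across neighbours, separately for every vector element.
        alphas = softmax([s[k] for s in scores])
        message.append(sum(a * m[k] for a, m in zip(alphas, transformed)))
    return message

# Two neighbours joined by different (assumed) bond types.
states = [[1.0, 0.0], [0.0, 2.0]]
types = ["single", "double"]
msg_w = {"single": [1.0, 1.0], "double": [1.0, 1.0]}
att_w = {"single": [0.0, 0.0], "double": [0.0, 0.0]}
uniform = attention_message(states, types, msg_w, att_w)
```

With all attention weights set to zero, every neighbour receives the same coefficient and the message reduces to an elementwise mean of the neighbour states; non-zero attention weights shift each vector element towards the neighbours that score highest for that element.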

Low-Level Features from Graph Structure.
Unlike traditional cheminformatic approaches to Machine Learning tasks, which use feature engineering, graphs are one of the lowest-level representations of chemical structures, from which many features can be calculated directly. By directly using a chemical structure as the starting point for deep learning, feature engineering can be avoided, and prior assumptions about task-specific knowledge need not be made. Instead, task-specific features are learned within the network and derived from the chemical structure directly. This allows for a potentially very powerful general-purpose approach to chemical task modelling, and also presents an interesting approach to the secure sharing of chemical data: the dissemination of trained models for activity prediction in lieu of the chemical data itself, without the risk of reverse-engineering IP-sensitive structural information from e.g. chemical fingerprints [1,6], and with the ability to jointly train models without the need to pre-negotiate engineered features relevant to the task.
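
As a small illustration of how features follow directly from a graph representation, the sketch below encodes the heavy atoms of acetic acid as atom and bond lists and derives a few simple descriptors from them. The data layout and the helper names `build_adjacency` and `simple_descriptors` are assumptions made for this example only; they are not the featurisation used in this work.

```python
from collections import Counter

# Acetic acid heavy atoms, CH3-C(=O)-OH, with hydrogens left implicit.
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, "single"), (1, 2, "double"), (1, 3, "single")]

def build_adjacency(n_atoms, bonds):
    """Return {atom index: [(neighbour index, bond type), ...]}."""
    adjacency = {i: [] for i in range(n_atoms)}
    for a, b, bond_type in bonds:
        # Bonds are undirected, so each one is stored in both directions.
        adjacency[a].append((b, bond_type))
        adjacency[b].append((a, bond_type))
    return adjacency

def simple_descriptors(atoms, bonds):
    """Compute a few descriptors directly from the molecular graph."""
    adjacency = build_adjacency(len(atoms), bonds)
    return {
        "element_counts": dict(Counter(atoms)),
        "degrees": [len(adjacency[i]) for i in range(len(atoms))],
        "bond_type_counts": dict(Counter(t for _, _, t in bonds)),
    }

descriptors = simple_descriptors(atoms, bonds)
```

Hand-written descriptors like these are exactly what a graph network is meant to make unnecessary: the same atom and bond lists can be passed to the network, which then learns task-specific features itself.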

Results
We present results for models trained on benchmarking datasets both as presented verbatim, referred to as Original Dataset, and with custom preprocessing (Charge-Parent Missing-Data, CPMD). Results are in general on par with the state of the art, exceeding it in classification performance on the MUV dataset and obtaining lower error on the smaller LIPO regression set. The charge-parent aspect of the preprocessing was found to have a negligible effect (no performance difference between e.g. SIDER models or single-task models, with no missing data values), but the introduction of missing data values and a suitable masking loss function was found to have a strong positive effect on performance on highly sparse sets (MUV), more than tripling performance relative to MolNet for the Attention, Edge and SELU networks, bringing them on par with SVM and, with the Edge-based approach, beating SVM.

The charge-parent aspect of the preprocessing was done to investigate how robust the model is to ionic complexes, such as those shown in Fig. 4. As the network does not model ionic bonds, it was unknown whether disjoint graphs would interfere with the message propagation, and whether ions such as sodium would act as noise in the training. However, given the lack of performance difference between the two sets when all data is present, it can be assumed that the readout function safely bridges these gaps and does not interfere with the models' performance (Figs. 2 and 3).
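
The masking idea can be sketched as a loss that skips missing labels. The example below is a minimal masked binary cross-entropy in plain Python, in which a missing label is represented by `None`; the function name `masked_bce`, the `None` convention and the clipping constant `eps` are assumptions for illustration, and the loss actually used in this work may differ in form.

```python
import math

def masked_bce(predictions, labels, eps=1e-7):
    """Mean binary cross-entropy over the labels that are present.

    predictions: predicted probabilities in [0, 1]
    labels: 0, 1, or None where the measurement is missing
    eps: clipping constant that keeps log() away from zero
    """
    total, count = 0.0, 0
    for p, y in zip(predictions, labels):
        if y is None:
            # Missing entries contribute neither loss nor gradient.
            continue
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    # Return zero loss when every label in the batch is missing.
    return total / count if count else 0.0

# One compound measured on three hypothetical tasks, one of them missing.
loss = masked_bce([0.5, 0.9, 0.1], [1, None, 0])
```

Averaging over the labels that are present, rather than over all entries, keeps the loss scale comparable between densely and sparsely annotated compounds, which matters on sets where most task entries are empty.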