Classification of drugs based on mechanism of action using machine learning techniques

The mechanism of action is an important aspect of drug development. It can help scientists in the process of drug discovery. This paper provides machine learning models to predict the mechanism of action of a drug. The models used are Binary Relevance K Nearest Neighbors (Type A and Type B), Multi-label K-Nearest Neighbors and a custom neural network. These models are evaluated using the mean column-wise log loss. The custom neural network performed best, with a log loss of 0.01706. This neural network is integrated into a web application built with the Flask framework. A user can upload a custom testing features dataset, which contains gene expression and cell viability levels. The web application outputs the top classes of drugs, along with a scatter plot for each drug.


Introduction
The term "mechanism of action" (MoA) refers to how a medicine or other substance causes an effect in the body. A drug's mechanism of action, for example, could be how it affects a specific target in a cell, such as an enzyme, or how it impacts a cell function, such as cell proliferation. Knowing a drug's mechanism of action can provide information about the drug's safety and how it affects the body [1].
The majority of medications work by interacting with proteins in the host or pathogen. Drug targets include a variety of proteins, and the term receptor is used only when the interaction results in a signal transmission cascade. A receptor is a molecule or polymeric structure on the surface of or inside a cell that recognizes and binds an endogenous substance. A substance that elicits a detectable physiological or pharmacological response characteristic of the receptor is said to be an agonist. Some medications are unable to initiate any action on their own after binding to a receptor site, but they can prevent the action of agonists. These are called antagonists [2].
Understanding a biologically active compound's mechanism of action entails not only identifying the target but also investigating the biological chemistry that occurs before or after target binding. Many genes are involved in a drug's mechanism of action, and so have an impact on sensitivity. The mechanism of action of a small molecule encompasses both the intracellular target(s) and the activities that occur before and after target engagement [3].
Because of the intricate interactions between tuberculosis drugs and Mycobacterium tuberculosis, tuberculosis treatment requires an adequate understanding of MoA, which is critical for the successful delivery of drug candidates. The authors of [4] outlined several methods for investigating the MoA of tuberculosis drugs and provided guidance for future tuberculosis drug development. In the context of Mycobacterium tuberculosis pathogenesis, they evaluated the strengths and limitations of diverse platforms for elucidating the MoA of tuberculosis drugs [4].
Bioinformatics is a science that uses several layers of information, such as image-based data, pathways and gene expression, to aid in the understanding of mechanisms of action. In order to comprehend MoA, it is necessary to analyse the complicated responses of the human biological system to drug treatments. The impact of bioinformatics on drug discovery and several bioinformatic methods for understanding mechanisms of action are discussed in [5].
This paper discusses various machine learning models and their prediction scores. Further, a web application is developed using the Flask web framework. The machine learning model with the best score is used in this web application, which can be helpful for scientists in the drug discovery process.
The rest of this paper is organized as follows. In Sect. 2, a literature survey of mechanism of action is described. In Sect. 3, the methodology is shown. In Sect. 4, the pre-processing of the dataset is presented. In Sect. 5, the evaluation of the machine learning model is elucidated. In Sect. 6, various machine learning models and their results are presented. In Sect. 7, the architecture of the web application is shown. In Sect. 8, snapshots of the running web application are presented. Finally, Sect. 9 concludes this paper.

Literature survey of mechanism of action
The mechanisms of action of some drugs are still unknown, while those of several others have already been discovered. For example, aspirin's mechanism of action involves irreversible inhibition of the enzyme cyclooxygenase, which reduces inflammation and pain by decreasing the formation of thromboxanes and prostaglandins.
Different drugs can have different mechanisms of action. A literature survey of the mechanisms of action of various drugs is given in Table 1.
Table 1
In human prostate cancer cells with various levels of treatment resistance, the efficacy and mechanism of action of the marine alkaloid 3,10-dibromofascaplysin were examined. All of the cell lines tested showed anticancer activity.
The mechanism of action of aspirin [7]: The enzyme cyclooxygenase is inhibited by aspirin. Aspirin-like medications hindered the generation of physiologically significant prostaglandins by blocking this step in the prostaglandin production process. This gave a unified explanation for the therapeutic activity of aspirin-like medicines and their common adverse effects.
Research on the Mechanism of Action of a Citrinin and Anti-Citrinin Antibody Based on Mimotope X27 [8]: A mimotope is a powerful recognition receptor that can be utilised to investigate antigen and antibody mechanisms of action. The mimotope method was used to develop a binding model between citrinin and antibody. The authors discussed a method for increasing the sensitivity of detection of citrinin in immunoassays.
The mechanism of action of ramoplanin and enduracidin [9]: The authors tested whether ramoplanin and enduracidin had an innate preference for one step over the other using inhibitory kinetics and binding assays. They discovered that, as compared to the MurG stage, both ramoplanin and enduracidin hindered the transglycosylation step of peptidoglycan production.
Mechanism of action of antipruritic drugs [10,18]: The authors found that antipruritic drugs acted centrally via a sedative characteristic, whereas H1 receptor antagonists have a peripheral antipruritic effect only when itching is caused by histamine release.
Mechanism of Action of Atypical Antipsychotic Drugs in Mood Disorders [11,19,20]: The neural mechanisms of existing atypical antipsychotics and prospective antipsychotics were discussed, as well as how they relate to their efficacy in mood disorders such as anxiety and depression.

Methodology
Serialization and de-serialization are used to avoid retraining the model on the training dataset every time a fresh testing dataset is submitted to the web application. As shown in Fig. 1, in the first step, the model is trained using the training dataset. This model is saved into a file in the HDF5 format [12]. HDF5 stands for Hierarchical Data Format 5. This process is called serialization, and it is done using the built-in "save" method of the Keras model. The model's architecture, weights, training setup (loss and optimizer), and optimizer state are all contained in this serialized file.
Later, this file is supplied to the Flask web application and loaded there. This process is called de-serialization, and it is done using the Keras built-in function "load_model". In this way, whenever a new testing dataset is uploaded to the web application, the model does not have to be trained again; the web application loads the pre-trained model from the serialized file instead.
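The save-then-load cycle described above can be sketched as follows. This is a minimal illustration, assuming TensorFlow's bundled Keras is installed; the tiny stand-in model, the random data and the file name moa_model.h5 are placeholders chosen for the example and are not taken from the paper's code.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Tiny stand-in model; the layer sizes are placeholders, not the paper's network.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(2, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Random placeholder data, only so that the model has been trained once.
x = np.random.rand(16, 4).astype("float32")
y = np.random.randint(0, 2, size=(16, 2)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

# Serialization: write the trained model to an HDF5 file (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "moa_model.h5")
model.save(path)

# De-serialization: restore the model without training it again.
restored = keras.models.load_model(path)
```

In a web application, only the load_model call would run at start-up, so each uploaded testing dataset is scored by the restored model directly.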

Data pre-processing
A detailed analysis of this dataset and the methodology used has been provided in a previous paper [13]. The dataset is taken from Kaggle [14]. As shown in Fig. 2, the given dataset is first divided into the training dataset and the testing dataset. The training dataset and the testing dataset are each further divided into a features dataset and a target dataset.
Both the training features dataset and the training target dataset consist of 23,814 training samples. Likewise, both the testing features dataset and the testing target dataset consist of 3982 testing samples. In the data pre-processing stage, the categorical values of the attributes are mapped to numerical values as shown in Table 2.
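The mapping of categorical values to numbers can be sketched as below. The column names (cp_type, cp_dose, g-0), the category labels and the numeric codes are assumptions made purely for illustration; the mapping actually used is the one listed in Table 2.

```python
# Hypothetical category-to-number mapping; the real one is given in Table 2.
CATEGORY_MAPS = {
    "cp_type": {"trt_cp": 0, "ctl_vehicle": 1},
    "cp_dose": {"D1": 0, "D2": 1},
}


def encode_row(row, category_maps=CATEGORY_MAPS):
    """Return a copy of `row` with categorical values replaced by numbers."""
    encoded = dict(row)
    for column, mapping in category_maps.items():
        if column in encoded:
            encoded[column] = mapping[encoded[column]]
    return encoded


# One made-up sample: two categorical attributes and one numeric feature.
sample = {"sig_id": "id_example", "cp_type": "trt_cp", "cp_dose": "D2", "g-0": 1.06}
encoded_sample = encode_row(sample)
```

Numeric features pass through unchanged, and the original row is left untouched because the function works on a copy.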

Evaluation of the machine learning model
The machine learning models are evaluated by applying the log loss function to each drug-MoA annotation pair, and the mean column-wise log loss is used as the score. For every sample ID ("sig_id"), the probability that the sample has a positive response must be predicted for each MoA target. A positive response means that the drug belongs to a particular class of drug (i.e. target). A lower log loss (i.e. score) indicates better predictions.
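The formula for the evaluation of the machine learning model is shown in Eq. (1) [15].
$$\text{score} = -\frac{1}{M}\sum_{m=1}^{M}\frac{1}{N}\sum_{i=1}^{N}\left[\, y_{i,m}\log\left(\hat{y}_{i,m}\right) + \left(1-y_{i,m}\right)\log\left(1-\hat{y}_{i,m}\right) \right] \tag{1}$$
where $N$ is the number of sig_id observations in the test data ($i = 1, 2, \ldots, N$); $M$ is the number of scored MoA targets ($m = 1, 2, \ldots, M$); $\hat{y}_{i,m}$ is the predicted probability of a positive MoA response for a sample ID (sig_id); $y_{i,m}$ is the ground truth, 1 for a positive response and 0 otherwise; and $\log()$ is the natural (base $e$) logarithm.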
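As a worked illustration, the mean column-wise log loss can be computed as in the sketch below, which uses only the Python standard library. Clipping the predicted probabilities with a small eps is an assumption added here to avoid log(0); it is not stated in the text.

```python
import math


def mean_columnwise_log_loss(y_true, y_pred, eps=1e-15):
    """Mean column-wise log loss for N samples (rows) and M targets (columns)."""
    n = len(y_true)       # N: number of samples
    m = len(y_true[0])    # M: number of scored targets
    total = 0.0
    for col in range(m):
        col_sum = 0.0
        for row in range(n):
            # Clip the prediction so that log() never receives 0 (assumption).
            p = min(max(y_pred[row][col], eps), 1 - eps)
            y = y_true[row][col]
            col_sum += y * math.log(p) + (1 - y) * math.log(1 - p)
        total += -col_sum / n   # log loss of this target column
    return total / m            # mean over all target columns


# Two samples, two targets, every prediction at 0.5: the score equals ln 2.
score = mean_columnwise_log_loss([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]])
```

A model that always predicts 0.5 therefore scores about 0.693, which gives a sense of scale for the much lower scores reported for the models in this paper.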

Models and results
In the dataset, each drug can have several mechanisms of action (MoAs). Thus, this machine learning problem is a multi-label classification problem. The machine learning models tested in this paper are BRkNN (Binary Relevance K Nearest Neighbors), ML-KNN (Multi-label K-Nearest Neighbors) and a custom neural network.

BRkNN (binary relevance K nearest neighbors)
BRkNN is a variant of the k-Nearest Neighbors (kNN) method that is essentially equivalent to combining Binary Relevance (BR) with the kNN algorithm. BRkNN extends the kNN method to make independent predictions for each label [16]. Based on how each label's confidence score is used, BRkNN comes in two types: BRkNN-a and BRkNN-b.

BRkNN-a (type A)
BRkNN-a checks whether BRkNN would return the empty set, which happens when no label appears in at least half of the k nearest neighbors. If so, it outputs the label with the highest confidence instead [16]. For this model, Fig. 3 and Table 3 show the graph and prediction scores, respectively.
In the graph shown in Fig. 3, the X-axis represents the number of neighbors, and the Y-axis represents the public and private dataset scores. The private dataset score improves from 3 neighbors up to 5 neighbors. The public dataset score improves from 3 neighbors up to 10 neighbors. Afterwards, both scores decline as the number of neighbors increases.

BRkNN-b (type B)
BRkNN-b first computes "s", the average size of the label sets of the k nearest neighbors, and then outputs the [s] labels with the highest confidence, where [s] is the integer nearest to "s" [16]. For this model, Fig. 4 and Table 4 show the graph and prediction scores, respectively. In the graph shown in Fig. 4, the X-axis represents the number of neighbors and the Y-axis represents the public and private dataset scores. Both the private dataset score and the public dataset score improve from 3 neighbors up to 2000 neighbors. With 5000 neighbors the score is worse, and beyond that both scores remain constant. As seen from Table 4, the maximum difference between the public dataset score and the private dataset score is 0.38482, which occurs when the number of neighbors is 30.
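The two BRkNN variants can be sketched in plain Python as below. This is an illustrative reading of the description in [16], not the implementation used for the experiments: the Euclidean distance, the function names, the rounding of halves upwards and the toy data are all assumptions made for the example.

```python
import math


def k_nearest_label_sets(train_x, train_y, query, k):
    """Label vectors of the k training points closest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return [train_y[i] for i in order[:k]]


def label_confidences(neighbor_labels):
    """Confidence of a label = fraction of the neighbors that carry it."""
    k = len(neighbor_labels)
    return [sum(column) / k for column in zip(*neighbor_labels)]


def brknn_a(confidences):
    """Plain BRkNN output; if that is empty, the single most confident label."""
    predicted = [j for j, c in enumerate(confidences) if c >= 0.5]
    if not predicted:
        predicted = [max(range(len(confidences)), key=confidences.__getitem__)]
    return predicted


def brknn_b(confidences, neighbor_labels):
    """The [s] most confident labels, s being the average neighbor label-set size."""
    s = sum(sum(labels) for labels in neighbor_labels) / len(neighbor_labels)
    n_labels = int(s + 0.5)  # nearest integer, halves rounded up (assumption)
    ranked = sorted(range(len(confidences)), key=lambda j: -confidences[j])
    return sorted(ranked[:n_labels])


# Toy data: six one-dimensional samples with four binary labels each.
train_x = [[0.1], [0.2], [0.3], [0.4], [0.5], [9.0]]
train_y = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 0, 0],
           [1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 1, 1]]

neighbors = k_nearest_label_sets(train_x, train_y, [0.0], k=5)
confidences = label_confidences(neighbors)      # [0.8, 0.4, 0.2, 0.2]
type_a = brknn_a(confidences)                   # [0]
type_b = brknn_b(confidences, neighbors)        # [0, 1]
```

In this toy case only label 0 appears in at least half of the neighbors, so type A returns one label, while type B returns two because the neighbors carry 1.6 labels on average.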

ML-KNN (multi-label K-nearest neighbors)
The ML-KNN technique is based on the well-known k-Nearest Neighbor (kNN) algorithm. First, the k nearest neighbors in the training set are selected for each test instance. The maximum a posteriori (MAP) principle is then used to determine the label set of the test instance from the number of neighbors that carry each label [17]. For this model, Fig. 5 and Table 5 show the graph and prediction scores, respectively. In the graph shown in Fig. 5, the X-axis represents the number of neighbors and the Y-axis represents the public and private dataset scores. Both the private dataset score and the public dataset score improve from 3 neighbors up to 20 neighbors. Afterwards, both scores decline as the number of neighbors increases.
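The MAP step of ML-KNN can be illustrated with the compact sketch below. It follows the general scheme of [17] (smoothed label priors plus likelihoods of the neighbor counts), but the function names, the smoothing constant s = 1, the Euclidean distance and the toy data are assumptions made for the example, not details of the experiments in this paper.

```python
import math


def nearest(train_x, query, k, skip=None):
    """Indices of the k training points closest to `query` (Euclidean)."""
    candidates = [i for i in range(len(train_x)) if i != skip]
    candidates.sort(key=lambda i: math.dist(train_x[i], query))
    return candidates[:k]


def fit_mlknn(train_x, train_y, k, s=1.0):
    """Estimate, per label, the prior and the likelihoods of each neighbor count."""
    n, n_labels = len(train_x), len(train_y[0])
    neighbor_ids = [nearest(train_x, train_x[i], k, skip=i) for i in range(n)]
    model = []
    for label in range(n_labels):
        prior = (s + sum(row[label] for row in train_y)) / (2 * s + n)
        with_label = [0] * (k + 1)      # instances that have the label ...
        without_label = [0] * (k + 1)   # ... or lack it, indexed by neighbor count
        for i in range(n):
            c = sum(train_y[j][label] for j in neighbor_ids[i])
            if train_y[i][label]:
                with_label[c] += 1
            else:
                without_label[c] += 1
        like1 = [(s + v) / (s * (k + 1) + sum(with_label)) for v in with_label]
        like0 = [(s + v) / (s * (k + 1) + sum(without_label)) for v in without_label]
        model.append((prior, like1, like0))
    return model


def predict_mlknn(model, train_x, train_y, query, k):
    """MAP decision per label: compare the posteriors of 'has label' and 'has not'."""
    ids = nearest(train_x, query, k)
    result = []
    for label, (prior, like1, like0) in enumerate(model):
        c = sum(train_y[j][label] for j in ids)
        result.append(int(prior * like1[c] > (1 - prior) * like0[c]))
    return result


# Toy data: one cluster carries label 0, the other carries label 1.
train_x = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
train_y = [[1, 0] for _ in range(4)] + [[0, 1] for _ in range(4)]
k = 2
model = fit_mlknn(train_x, train_y, k)
prediction = predict_mlknn(model, train_x, train_y, [1.5], k)   # [1, 0]
```

Unlike BRkNN, the decision for each label depends on statistics gathered over the whole training set, not only on the votes of the k neighbors of the test instance.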

Custom neural network
A neural network is created using Keras [18][19][20]. Keras is a Python-based deep learning API that runs on top of TensorFlow. Since there are 875 input features in the dataset, the input layer has 875 units. Similarly, since there are 206 output targets, the output layer has 206 units. Both dropout layer 1 and dropout layer 2 have a dropout rate of 0.5. The model is compiled using the binary cross-entropy loss function and the Adam optimizer. The neural network implementation code can be found on GitHub [21]. Figure 6 shows the layers of the neural network. Table 6 describes each of the layers used. Table 7 shows the activation functions used for the dense layers and the output layer. Figures 7 and 8 show the graphs of the sigmoid and ReLU activation functions, respectively. Figure 9 shows the accuracy graph. Table 8 shows the prediction scores for this model. In the graph shown in Fig. 9, the X-axis represents the epochs and the Y-axis represents the public and private dataset scores. Both the private dataset score and the public dataset score improve from 15 epochs up to 75 epochs. Afterwards, both scores decline as the number of epochs increases.
Since the private dataset score is used for the final leaderboard, the best private dataset score obtained with each model is considered. The best result of each model is summarized in Table 9. As seen from Table 9, the custom neural network with 75 epochs and a batch size of 100 performs best.
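A network with the stated input size, output size, dropout rate, loss and optimizer could be defined roughly as follows. This is only a sketch, assuming TensorFlow's bundled Keras is installed: the number and width of the hidden layers (HIDDEN_UNITS), the position of the dropout layers, and the use of ReLU in the dense layers with a sigmoid output are assumptions, since the actual layer configuration is given in Fig. 6 and Tables 6 and 7.

```python
from tensorflow import keras

N_FEATURES = 875            # input features, as stated in the text
N_TARGETS = 206             # output targets, as stated in the text
HIDDEN_UNITS = (512, 256)   # hypothetical hidden-layer widths


def build_model():
    """Feed-forward multi-label classifier with two dropout layers."""
    model = keras.Sequential([
        keras.Input(shape=(N_FEATURES,)),
        keras.layers.Dense(HIDDEN_UNITS[0], activation="relu"),
        keras.layers.Dropout(0.5),   # dropout layer 1
        keras.layers.Dense(HIDDEN_UNITS[1], activation="relu"),
        keras.layers.Dropout(0.5),   # dropout layer 2
        # One sigmoid unit per target, so each MoA gets its own probability.
        keras.layers.Dense(N_TARGETS, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


model = build_model()
# Training with the best-performing setting reported in the text would be:
# model.fit(train_features, train_targets, epochs=75, batch_size=100)
```

The commented fit call uses placeholder variable names for the training features and targets.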

Architecture of the web application
A web application is developed to visualize the mechanism of action of each drug. The source code for this web application can be found on GitHub [21]. The web application is developed using the Flask framework [22]. It also uses the Jinja templating engine [23]. Figure 10 shows the architecture of the web application. The input is a CSV file, which contains the gene expression and cell viability values of each drug. The output is the top classes of drugs (those with the highest probability). The number of top MoAs to be displayed in the output can be set using the variable NUMBER_OF_TOP_MOA.
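Selecting the top classes from the predicted probabilities can be sketched as below. Only the variable name NUMBER_OF_TOP_MOA comes from the text; the helper top_moa, the placeholder class names class_3 and class_4, and the probability values are hypothetical.

```python
NUMBER_OF_TOP_MOA = 2  # variable named in the text; the value here is arbitrary


def top_moa(class_names, probabilities, n=NUMBER_OF_TOP_MOA):
    """Return the n (class name, probability) pairs with the highest probability."""
    ranked = sorted(zip(class_names, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Made-up probabilities for one drug; class_3 and class_4 are placeholders.
class_names = ["5-alpha_reductase_inhibitor", "11-beta-hsd1_inhibitor",
               "class_3", "class_4"]
probabilities = [0.02, 0.61, 0.30, 0.07]
best = top_moa(class_names, probabilities)
```

Raising NUMBER_OF_TOP_MOA simply lengthens the ranked list shown for each drug.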

Running the web application
Table 10 shows the testing dataset that is uploaded to the web application (testing_dataset.csv). For each drug in the testing dataset, the top MoAs are displayed, along with a scatter plot. In order to draw the scatter plot, an ID is given to each class of drug. The first class of drug (i.e. 5-alpha_reductase_inhibitor) is given ID 1, the second class of drug (i.e. 11-beta-hsd1_inhibitor) is given ID 2, and so on. This list of IDs is stored in a Python list in the Flask application. The X-axis of the scatter plot represents the ID of the class of drug, and the Y-axis represents the probability of the drug belonging to that class. The homepage of the web application is shown in Fig. 11. The top classes of drugs are shown in Figs. 12 and 13. The scatter plots are shown in Figs. 14 and 15.
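The coordinates of such a scatter plot can be sketched as follows. The helper name scatter_points and the probability values are hypothetical; the only detail taken from the text is that class IDs start at 1 and follow the order of the classes.

```python
def scatter_points(probabilities):
    """Pair each class ID (1-based, in class order) with its predicted probability."""
    return list(enumerate(probabilities, start=1))


# Made-up probabilities for the first three classes of one drug.
points = scatter_points([0.02, 0.61, 0.30])
xs = [class_id for class_id, _ in points]        # X-axis: class IDs
ys = [probability for _, probability in points]  # Y-axis: probabilities
```

The xs and ys lists can then be handed to any plotting library to draw the scatter plot.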

Conclusion
Knowing a drug's mechanism of action can help scientists accelerate the drug discovery process. This paper discussed various machine learning models to predict the mechanism of action of a drug. Also, a Flask-based web application is introduced, through which a user can input a custom testing features dataset containing gene expression and cell viability levels. The output produced is the top classes of drugs, along with their scatter plots. This can help scientists to predict the mechanism of action and can help them in the discovery of new drugs. Figure 16 and Table 11 show a summary of the log loss of all the machine learning models used in this paper. Among these models, the custom neural network performed the best. In the future, if a better-performing machine learning model is found, it can be integrated into the web application provided in this paper. Also, for a better visualization of the scatter plots, the web application can be extended to use interactive scatter plots using plotly, bokeh or any other plotting package.

Author contributions
HLG, CHA wrote the main manuscript text. PGR, SBR designed the model and applied suitable ML algorithms to draw the results. FF validated the data sets and results. HLG, CHA proofread the article before submitting. All authors read and approved the final manuscript.

Competing interests
The authors declare no competing interests.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.