Background

Despite impressive advances in the life sciences, technology and genomics over the past years, bringing a new drug to patients still takes about 15 years and costs 800 million to one billion dollars [1,2,3]. The traditional drug research and development (R&D) process requires testing for side effects and safety through cellular model systems, extensive animal models and clinical trials. The average cost of new drug discovery has increased significantly and more than 90% of drug candidates fail during development, which makes pharmaceutical R&D extremely expensive, time-consuming and risky [3, 4]. This in turn leads to a small number of new drugs on the market and to high prices. Drug repositioning (or drug repurposing), which identifies new clinical indications for approved drugs, has become an important strategy to maximize the potential use of existing drugs and to increase the number of new drugs [5, 6]. Compared with the traditional drug R&D process, drug repositioning has two major advantages. First, the safety of approved drugs has already been verified in rigorous clinical trials: repositioning candidates have passed the tests required in de novo drug R&D, so their safety profile is already established. Second, drug repositioning shortens the process of drug discovery and preparation, which saves time and money.

In recent years, online public databases covering pharmacochemical properties, chemical structures of drug molecules, drug–drug interactions, disease–disease interactions, related genomic sequences and side effects have promoted the study of drug–disease interactions and drug repositioning [7]. Examples include KEGG [8], OMIM [9], CMap [10], DrugBank [11], STITCH [12] and ChEMBL [13]. The goal of drug repositioning is to find potential indications for existing approved drugs and to apply the newly identified candidates to the clinical treatment of diseases other than the originally targeted one. By integrating data from these various sources, many machine learning methods have been developed to date [14,15,16,17,18,19,20,21,22,23,24,25].

For instance, Chiang et al. built a ‘guilt-by-association’ network-based model to predict potential drug–disease associations. The method assumes that if two diseases have similar treatment profiles, then a drug used for only one of them may also be used for the other, which suggests a new use for that drug. However, this approach favors older drugs with multiple uses and diseases with many different treatments [26]. Gottlieb et al. [27] presented PREDICT, a method for large-scale prediction of drug indications that uses comprehensive drug–drug and disease–disease similarity measures to obtain discriminative features. Napolitano et al. [28] proposed a multi-class Support Vector Machine (SVM) classifier to predict novel drug–disease interactions and defined drug similarities using combined drug datasets. Moreover, several network-based methods have been put forward in recent years [29, 30]. Wu et al. [31] introduced a weighted drug–disease heterogeneous network to predict new uses of drugs by clustering, based on experimentally verified drug–target interactions and gene–disease relationships. Wang et al. [32] constructed a heterogeneous network that integrates drug targets, diseases and drugs into a unified framework and ranks candidate drugs for each disease with an iterative approach. Martinez et al. [33] proposed DrugNet, a network-based prioritization method that performs drug–disease and disease–drug prioritization by integrating many types of data from complex networks of interconnected drugs, proteins and diseases.

More recently, some recommendation-system-based methods have been developed for computational drug discovery [34, 35]. Luo et al. [5] presented the MBiRW model, which applies comprehensive similarity measures and a bi-random walk algorithm to identify new interactions for known drugs. Subsequently, Nagaraj et al. [4] developed DrugPredict, a drug discovery strategy that combines a computational model with biological testing in cell lines to rapidly identify novel drug candidates for epithelial ovarian cancer. Their work exploited repositioning opportunities offered by the vast amount of data on disease genomics, phenomics, treatments and genetic pathways [4]. Matrix factorization methods, e.g. kernel Bayesian matrix factorization and collaborative matrix factorization, have also been used to identify novel drug–disease interactions. These methods take one input matrix and produce two related matrices whose product approximates the input. Most existing methods rely on the properties of drugs or diseases to derive the drug and disease similarity measures. However, known drug–disease interactions, which previous studies have not fully used, also carry valuable information that can be exploited to improve the similarity measures.

In this study, we propose a deep learning model for potential Drug–Disease Interactions Prediction, named DDIPred. It applies a gated recurrent neural network to predict new indications of existing drugs using comprehensive similarity measures and Gaussian interaction profile kernel features. The workflow of this study is shown in Fig. 1. More specifically, the similarity measures are calculated from drug chemical structures, disease phenotypes and known drug–disease interactions. Furthermore, the Gaussian interaction profile (GIP) kernel is applied to derive effective features of drugs and diseases from known drug–disease interactions. Truncated singular value decomposition (TSVD) is then used to reduce the dimensionality of the two combined feature sets [17]. Finally, we feed these discriminative features into a deep gated recurrent unit (GRU) model to learn and predict novel drug–disease interactions, i.e. potential new uses of existing drugs. The performance of the proposed model is evaluated on two gold standard datasets under tenfold cross-validation, and case studies are carried out to further verify its predictive ability. Experimental results demonstrate that the proposed model is well suited to discovering potential new uses of drugs.

Fig. 1 The workflow of DDIPred

Materials and methodology

In this section, we first introduce the datasets used in this study. Then, based on the basic hypothesis that similar drugs have similar indications, we propose a novel deep learning approach that integrates comprehensive similarity measures and the Gaussian interaction profile kernel with a GRU model to predict potential drug–disease interactions. We present the details of the similarity measures, the Gaussian interaction profile kernel and the implementation of the GRU model. We also describe the comparison models, experimental methods and evaluation criteria.

Benchmark datasets

To evaluate the performance of our model, we selected two widely used benchmark datasets, Fdataset and Cdataset. The gold standard Fdataset is obtained from Gottlieb et al. [27] and is compiled from multiple data sources. It contains 1933 known associations between 593 drugs from DrugBank [36] and 313 diseases registered in OMIM (Online Mendelian Inheritance in Man) [9]. The second benchmark dataset, Cdataset, was first presented by Luo et al. [5]. It contains 2532 drug–disease associations between 663 drugs and 409 diseases. Each dataset consists of three matrices: the drug–drug similarity matrix \(S_{D} \in R^{m \times m}\), the disease–disease similarity matrix \(S_{d} \in R^{n \times n}\) and the drug–disease interaction matrix \(I \in R^{m \times n}\). \(S_{D}\) and \(S_{d}\) are symmetric, and each row (or column) holds the similarities between one drug and all other drugs, or between one disease and all other diseases, respectively. The details of the similarity calculation are given in the next section. The m rows of \(I\) correspond to the m drugs and the n columns to the n diseases: when drug \(D_{i}\) is associated with disease \(d_{j}\), the element \(I\left( {i,j} \right)\) is set to 1, and otherwise to 0. The interacting drug–disease pairs are used as positive samples, and the same number of pairs without a known interaction are randomly selected as negative samples. The details of the two datasets are shown in Table 1.
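The construction of the interaction matrix and the balanced sampling of negatives can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the toy pair list and the fixed random seed are assumptions made for the example.

```python
import random

def build_interaction_matrix(pairs, n_drugs, n_diseases):
    """Return an n_drugs x n_diseases 0/1 matrix with 1 for each known pair."""
    matrix = [[0] * n_diseases for _ in range(n_drugs)]
    for i, j in pairs:
        matrix[i][j] = 1
    return matrix

def sample_balanced_pairs(matrix, seed=0):
    """Return (positives, negatives) with as many random negatives as positives."""
    positives, unknown = [], []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            (positives if value == 1 else unknown).append((i, j))
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    negatives = rng.sample(unknown, len(positives))
    return positives, negatives

# Toy example (hypothetical indices): 3 drugs, 4 diseases, 3 known associations.
known = [(0, 1), (1, 3), (2, 0)]
I = build_interaction_matrix(known, n_drugs=3, n_diseases=4)
positives, negatives = sample_balanced_pairs(I)
```

Note that pairs without a known interaction are treated as negatives only by convention; some of them may be true but undiscovered associations.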

Table 1 The details of the two drug–disease associations benchmark datasets

Similarity measures

As described above, drug similarity is calculated from chemical structure information, which is one of the drug-related properties [5]. More concretely, the similarity between two drugs is computed from their 2D chemical fingerprints using the Chemistry Development Kit [37], based on the Simplified Molecular Input Line Entry System (SMILES) [38] strings of all drugs downloaded from DrugBank. Moreover, the correlation between the similarity of two drugs and their shared diseases is analyzed, and similarity values that are not discriminative are pushed close to 0. The similarities are adjusted using the logistic function, which has previously been used to adjust disease–gene association similarities [39]. The function is defined as follows:

$$L\left( {\text{x}} \right) = \frac{1}{{1 + e^{{\left( {ax + b} \right)}} }}$$
(1)

where x represents the similarity value, and a and b are adjusting parameters. The drugs are then clustered based on known drug–disease associations using a graph clustering method, ClusterONE [40], which has been employed to detect valuable modules for drug repositioning [5, 31, 41]. ClusterONE defines the cohesiveness of a cluster M as follows:

$$f\left( M \right) = \frac{{C_{in} \left( M \right)}}{{(C_{in} \left( M \right) + C_{bound} \left( M \right) + P\left( M \right))}}$$
(2)

where \(C_{in} \left( M \right)\) is the total weight of the edges within a set of vertices M, \(C_{bound} \left( M \right)\) is the total weight of the edges connecting this set to the rest of the graph, and P(M) is the penalty term [5].
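Equations (1) and (2) can be illustrated with a short sketch. The parameter values a = -10 and b = 5, the toy graph and the function names are assumptions chosen for the example; they are not the values used in the study. With a negative a, the logistic function pushes low similarities towards 0 and high similarities towards 1.

```python
import math

def logistic_adjust(x, a, b):
    """Eq. (1): L(x) = 1 / (1 + exp(a*x + b))."""
    return 1.0 / (1.0 + math.exp(a * x + b))

def cohesiveness(weights, cluster, penalty=0.0):
    """Eq. (2): f(M) = C_in / (C_in + C_bound + P).

    weights maps an undirected edge (u, v) to its weight;
    cluster is the set of vertices M.
    """
    c_in = c_bound = 0.0
    for (u, v), w in weights.items():
        inside = (u in cluster) + (v in cluster)
        if inside == 2:      # both endpoints in M
            c_in += w
        elif inside == 1:    # edge crosses the boundary of M
            c_bound += w
    return c_in / (c_in + c_bound + penalty)

# Hypothetical parameters: a < 0 makes L(x) increase with x.
a, b = -10.0, 5.0
low = logistic_adjust(0.1, a, b)
high = logistic_adjust(0.9, a, b)

# Toy weighted graph of four drugs; candidate cluster M = {A, B, C}.
edges = {("A", "B"): 1.0, ("B", "C"): 0.5, ("A", "C"): 0.5, ("C", "D"): 1.0}
f_m = cohesiveness(edges, {"A", "B", "C"})
```

In the toy graph the internal weight is 2.0 and the boundary weight is 1.0, so with a zero penalty the cohesiveness is 2/3.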

Gaussian interaction profile kernel

For diseases, we adopted the Gaussian interaction profile kernel [42] to obtain a representation of disease–disease associations [43]. It is based on the assumption that diseases with a similar pattern of interactions with drugs are likely to show similar interaction behavior with new drugs [42]. A similar assumption can be applied to drugs. Let (\(D_{i}\), \(D_{j}\)) denote two different drugs and (\(d_{i}\), \(d_{j}\)) two different diseases. The Gaussian interaction profile kernel similarity KG for diseases is calculated as follows (the drug kernel is defined analogously):

$$KG_{disease} \left( {d_{i} , d_{j} } \right) = {\text{exp}}\left( { - \alpha_{d} \left\| {y_{{d_{i} }} - y_{{d_{j} }} } \right\|^{2} } \right)$$
(3)
$$\alpha_{d} = \frac{{{\alpha_{d}}^{{\prime}} }}{{\left( {\frac{1}{{n_{d} }}\mathop \sum \nolimits_{i = 1}^{{n_{d} }} \left| {y_{{d_{i} }} } \right|^{2} } \right)}}$$
(4)

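Here, \(y_{{d_{i} }}\) denotes the interaction profile of disease \(d_{i}\), i.e. the binary vector of its known interactions with all drugs (the corresponding column of \(I\)), and \(n_{d}\) is the number of diseases. Following [42], \({\alpha_{d}}^{{\prime}}\) is set to 0.5 for simplicity. The matrix decomposition algorithm TSVD was then applied to reduce the dimension of these features.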
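A minimal NumPy sketch of the disease kernel in Eqs. (3) and (4) is given below, where the interaction profile of a disease is taken to be its column of the interaction matrix. The toy matrix, the function names and the use of a plain SVD as a stand-in for a dedicated truncated SVD routine are assumptions for illustration; the reduction is applied here to the kernel matrix alone, whereas the study reduces the combined features.

```python
import numpy as np

def gip_kernel(profiles, alpha_prime=0.5):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    K[i, j] = exp(-alpha * ||y_i - y_j||^2), with
    alpha = alpha_prime / mean_i ||y_i||^2.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    alpha = alpha_prime / sq_norms.mean()
    # Squared Euclidean distance between every pair of rows.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-alpha * np.maximum(sq_dists, 0.0))

def truncated_svd(features, k):
    """Keep the k leading singular components: returns U_k * S_k."""
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :k] * s[:k]

# Toy interaction matrix (3 drugs x 4 diseases); disease profiles are its columns.
I = np.array([[1, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1]])
K_disease = gip_kernel(I.T)               # 4 x 4 disease kernel
reduced = truncated_svd(K_disease, k=2)   # 4 x 2 reduced features
```

The drug kernel follows from the same function applied to the rows of the interaction matrix, i.e. `gip_kernel(I)`.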

Implementation of gated recurrent units neural network

To overcome several known defects of the standard recurrent neural network (RNN) model, a series of improved models has been proposed in the deep learning field. Among them, long short-term memory (LSTM) [44, 45] and similar variants are among the best-performing models and are widely used in many fields [46,47,48]. The main reason for their effectiveness is the introduction of gating mechanisms. The gated recurrent unit (GRU), proposed by Cho et al. [49], has only a reset gate and an update gate, and exposes its full memory content at each time step. We follow a calculation process similar to that in [50].

The update gate \(z_{t}\) is calculated by:

$$z_{t} = sigmoid\left( {W_{z} i_{t} + U_{z} h_{t - 1} - b_{z} } \right)$$
(5)

Here, \(i_{t}\) is the input vector of the GRU, \(h_{t - 1}\) is the previous output of the model, and \(W_{z}\), \(U_{z}\) and \(b_{z}\) are the forward matrix, recurrent matrix and bias of the update gate, respectively. The reset gate is computed in a similar way:

$$r_{t} = sigmoid\left( {W_{r} i_{t} + U_{r} h_{t - 1} - b_{r} } \right)$$
(6)

where the parameters are defined analogously to those of the update gate. The candidate memory state \(c_{t}\) is computed by:

$$c_{t} = \sigma_{h} \left( {W_{h} i_{t} + U_{h} \left( {r_{t} *h_{t - 1} } \right) - b_{h} } \right)$$
(7)

where \(\sigma_{h}\) is the tanh function and \(*\) means an element-wise multiplication. Finally, the memory state \(h_{t}\) of the GRU model is defined as:

$$h_{t} = \left( {1 - z_{t} } \right)h_{t - 1} + z_{t} c_{t}$$
(8)
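A single GRU time step following Eqs. (5) to (8), including the sign convention used for the biases, can be sketched in NumPy as below. The dimensions, the random weights and the parameter dictionary are hypothetical and serve only to make the example runnable; a trained model would learn these values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(i_t, h_prev, params):
    """One GRU time step: update gate, reset gate, candidate state, new state."""
    W_z, U_z, b_z = params["z"]
    W_r, U_r, b_r = params["r"]
    W_h, U_h, b_h = params["h"]
    z_t = sigmoid(W_z @ i_t + U_z @ h_prev - b_z)           # update gate, Eq. (5)
    r_t = sigmoid(W_r @ i_t + U_r @ h_prev - b_r)           # reset gate, Eq. (6)
    c_t = np.tanh(W_h @ i_t + U_h @ (r_t * h_prev) - b_h)   # candidate state, Eq. (7)
    return (1.0 - z_t) * h_prev + z_t * c_t                 # new state, Eq. (8)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
params = {
    gate: (rng.normal(size=(hidden_dim, input_dim)),
           rng.normal(size=(hidden_dim, hidden_dim)),
           np.zeros(hidden_dim))
    for gate in ("z", "r", "h")
}
h_t = gru_step(rng.normal(size=input_dim), np.zeros(hidden_dim), params)
```

Because the new state is a convex combination of the previous state and a tanh output, each component of the state stays within [-1, 1] when the initial state does.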

In practice, the GRU model is implemented with the Keras framework [51]. Considering the limited scale of the problem, we set the number of hidden neurons in the GRU input layer to 128 and add a Dense (fully connected) layer after its output as the classifier to produce the final prediction probabilities. The sigmoid function is employed as the activation function and is defined as follows:

$$\upsigma = {\text{sigmoid}}\left( x \right) = \frac{1}{{\left( {1 + e^{ - x} } \right)}}$$
(9)

Before the activation layer, we applied Dropout to reduce overfitting and enhance the robustness of the model [52]. The dropout rate was set to 0.25. Because the loss function has a significant influence on the performance of a machine learning model, binary cross-entropy, which corresponds to the sigmoid activation function, was used as the loss function. It is defined as:

$$L\left( {{\text{t}},{\text{p}}} \right) = - \left( {\left( {1 - {\text{t}}} \right) \times \log \left( {1 - {\text{p}}} \right) + {\text{t}} \times {\text{log}}\left( {\text{p}} \right)} \right)$$
(10)

where p and t denote the predicted output and the true label, respectively. We used the Adam optimizer to update the weights of the model. Adam combines the advantages of RMSProp and AdaGrad and is popular in this field [53].
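As a small worked example, the standard binary cross-entropy for a single sample, -(t log p + (1 - t) log(1 - p)), can be computed as follows. The clipping constant `eps` and the example probabilities are assumptions added for numerical safety and illustration.

```python
import math

def binary_cross_entropy(t, p, eps=1e-12):
    """Loss for one sample with true label t (0 or 1) and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite at p = 0 or p = 1
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))

confident_right = binary_cross_entropy(1, 0.9)  # true pair scored 0.9
confident_wrong = binary_cross_entropy(1, 0.1)  # true pair scored 0.1
```

A true interaction scored at 0.9 gives a loss of about 0.105, whereas the same pair scored at 0.1 gives about 2.303, so confident mistakes are penalized much more heavily.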

Performance evaluation metrics

To evaluate our model comprehensively, we follow widely used evaluation indicators and strategies [54, 55]. Tenfold cross-validation was applied to evaluate the performance of DDIPred. In each validation, all data are randomly divided into ten equal parts; nine parts are used as training data and the remaining part as test data. To guarantee an unbiased comparison, we confirmed that there is no overlap between training and test data. The final result is the mean over the ten folds, reported with the standard deviation. We use widely adopted evaluation criteria, including accuracy (Acc), true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV) and Matthews Correlation Coefficient (MCC), defined as:

$${\text{Acc}} = \frac{TN + TP}{{TN + TP + FN + FP}}$$
(11)
$${\text{TPR}} = \frac{TP}{{TP + FN}}$$
(12)
$${\text{TNR }} = \frac{TN}{{TN + FP}}$$
(13)
$${\text{PPV}} = \frac{TP}{{TP + FP}}$$
(14)
$${\text{MCC}} = \frac{TP \times TN - FP \times FN}{{\sqrt {\left( {TP + FP} \right)\left( {TP + FN} \right)\left( {TN + FP} \right)\left( {TN + FN} \right)} }}$$
(15)

where TN, TP, FN and FP denote the numbers of true negatives, true positives, false negatives and false positives, respectively. The Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) are also adopted to evaluate the performance. In addition, given the nature of the research task, the top-N ranked predictions are the most valuable for drug development or disease treatment research, so we also test the model by counting the true drug–disease interactions retrieved among them.
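The five criteria in Eqs. (11) to (15) follow directly from the confusion-matrix counts, as the sketch below shows. The counts are hypothetical values for a single balanced test fold and are not results from the study.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Acc, TPR, TNR, PPV and MCC from confusion-matrix counts.

    Assumes each class and each predicted label occurs at least once,
    so that the ratio denominators are non-zero.
    """
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "MCC": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }

# Hypothetical counts for one balanced test fold of 200 pairs.
metrics = classification_metrics(tp=80, tn=85, fp=15, fn=20)
```

For these counts the accuracy is 0.825, the TPR is 0.80, the TNR is 0.85, the PPV is about 0.842 and the MCC is about 0.651.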

Results and discussion

In this study, we propose a deep learning model to predict potential drug–disease interactions, which can advance the discovery of new uses of existing drugs or new treatments for diseases. In this section, we systematically evaluate the performance of the model. First, we evaluate the prediction capability of DDIPred on two benchmark datasets. We then compare it with other state-of-the-art models under the same experimental conditions. Finally, we present case studies to verify the practicability of the proposed method.

Drug–disease interactions prediction capability evaluation

First, the drug–disease interaction prediction capability of DDIPred is evaluated on the two benchmark datasets, Fdataset and Cdataset. The details of the tenfold cross-validation are listed in Tables 2 and 3 for Cdataset and Fdataset, respectively. The averages over the ten folds are taken as the final results, as shown in Fig. 2.

Table 2 The tenfold cross-validation details on Cdataset
Table 3 The tenfold cross-validation details on Fdataset
Fig. 2 The performance of DDIPred on two benchmark datasets

As shown in Table 2, the mean accuracy of tenfold cross-validation on Cdataset is 81.48% (standard deviation 1.48%), the mean TPR is 80.59% (2.86%), the mean TNR is 83.01% (2.71%), the mean PPV is 80.03% (2.88%) and the mean MCC is 63.06% (2.99%). These cross-validation results show that our model has clear predictive ability for associations between drugs and diseases.

The tenfold cross-validation performance of DDIPred on Fdataset is shown in Table 3. The mean accuracy on Fdataset is 77.83% (standard deviation 2.43%), the mean TPR is 77.13% (4.37%), the mean TNR is 79.22% (3.48%), the mean PPV is 76.57% (4.06%) and the mean MCC is 55.80% (4.93%). The performance of DDIPred on this dataset is slightly weaker than on Cdataset, but the results remain acceptable, which indicates that the model is competent for the drug–disease association prediction task.

Comparison with other state-of-the-art methods

We further compared the proposed model with other state-of-the-art methods on the same datasets under the same experimental conditions, including previous studies and the widely used Support Vector Machine (SVM) model. The comparison results are reported in Tables 4 and 5 and Fig. 3.

Table 4 Comparison of the AUC of previous studies and DDIPred on datasets
Table 5 Comparing the tenfold cross-validation performance of DDIPred and SVM on two gold standard datasets
Fig. 3 The performance of DDIPred and the comparison method on two benchmark datasets: (a) the ROC and AUC of DDIPred on Cdataset; (b) the ROC and AUC of SVM on Cdataset; (c) the ROC and AUC of DDIPred on Fdataset; (d) the ROC and AUC of SVM on Fdataset

We compared the AUC of our model with that of previous studies, namely DrugNet [33] and HGBI [32]. Because different studies use different evaluation indicators, we compared only the AUC value reported in each study, which is the indicator they have in common and reflects the overall performance of a model. As shown in Table 4 and Fig. 3, DrugNet obtained an AUC of 0.804 on Cdataset and 0.778 on Fdataset. HGBI performed better than DrugNet, with AUCs of 0.858 and 0.829 on Cdataset and Fdataset, respectively. DDIPred achieved AUCs of 0.871 and 0.838 on Cdataset and Fdataset, the best result on both datasets.

Furthermore, we compared our model with SVM, a widely used machine learning model that often serves as a baseline and usually performs well in various fields. The feature input, tenfold cross-validation splits, evaluation metrics and other experimental conditions are exactly the same for DDIPred and the SVM model. The parameters of SVM are determined by grid search. The results are shown in Table 5. DDIPred outperforms SVM on all indicators.

Case studies

To further examine the capability of the proposed model in predicting new associations between drugs and diseases, we carried out case studies. For each case, the feature of the tested drug (or disease) was combined with the feature of every disease (or drug) to form the test data. These data were then fed into the trained model to obtain prediction scores, and all candidates were ranked by score. Zoledronic acid (DrugBank Accession Number: DB00399) and Dexamethasone (DrugBank Accession Number: DB01234) were selected as cases. Zoledronic acid is usually used to treat pain from bone metastases and hypercalcemia of malignancy. It can also help to prevent skeletal fractures in patients with multiple myeloma and prostate cancer. Dexamethasone has anti-inflammatory, immunosuppressive, anti-toxin, antipyretic and other effects, and has a considerable impact on metabolism. The prediction results are shown in Tables 6 and 7. Our model retrieved the diseases most relevant to the target drugs, including both confirmed indications and new potential candidate diseases.
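The ranking step of a case study can be sketched as follows: the feature of the tested drug is concatenated with the feature of each candidate disease, every pair is scored, and the candidates are sorted by score. The scoring function here is a hypothetical stand-in for the trained model, and the disease names and feature values are invented for illustration.

```python
def rank_candidates(drug_feature, disease_features, score_fn, top_n=5):
    """Score every (drug, disease) pair and return the top_n diseases by score."""
    scored = [
        (name, score_fn(drug_feature + feature))  # concatenated pair feature
        for name, feature in disease_features.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

def toy_score(pair_feature):
    """Hypothetical scorer: the mean feature value (not the trained GRU model)."""
    return sum(pair_feature) / len(pair_feature)

drug = [0.2, 0.8]
diseases = {
    "disease_a": [0.9, 0.7],
    "disease_b": [0.1, 0.1],
    "disease_c": [0.5, 0.4],
}
ranking = rank_candidates(drug, diseases, toy_score, top_n=2)
```

With the toy scorer, `disease_a` ranks first and `disease_c` second. In a real case study the scorer would be the prediction function of the trained model.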

Table 6 Predicted diseases most relevant to Zoledronic acid
Table 7 Predicted diseases most relevant to Dexamethasone

Conclusion

In this work, we proposed a novel deep learning model, DDIPred, which uses comprehensive similarity measures, the Gaussian interaction profile kernel and gated recurrent neural networks to predict potential drug–disease associations. It may help to find new indications for existing drugs and accelerate the process of drug research and development. The similarity measure matrix is used to derive discriminative features for drugs from their chemical fingerprints, while the Gaussian interaction profile kernel is employed to obtain effective features for diseases from known drug–disease interactions. We then implemented a deep learning GRU model for the prediction task. Our model achieved AUCs of 0.871 and 0.838 on Cdataset and Fdataset, respectively, and outperformed the compared state-of-the-art models on many indicators. Case studies further verified the predictive ability of the model. The experimental results indicate that the proposed method is a powerful tool for predicting new indications for drugs or new treatments for diseases, and can serve as a useful guide for drug repositioning and drug discovery.