Background

Drug-drug interactions (DDI) is defined as that a drug affects the efficacy of another drug when multi-drugs are adopted in the treatment of a disease [1]. DDIs can lead to change systemic exposure and result in variations in drug responses, which can improve the drugs efficiency and the life quality of illnesses [2]. However, DDIs also can cause serious adverse effects, drug withdrawal from the market and even the patient morality [3, 4]. Meanwhile, with the medical technology development and personalized medical requirements, more and more patients were simultaneously treated by multi-drugs and between 2009 and 2012, 38.1% of U.S. adults aging 18-44 years used three or more prescription drugs during a 30 day time period [57]. Therefore, identifying the potential DDIs has become a major issue in drug development and practice process.

With the high-throughput sequencing technology development, many databases related to drugs have been constructed. DrugBank database can provide drug targets, drug enzymes, drug transporters and DDIs, which are widely used in studying drug-target associations and drug repositioning [810]. PubChem Compound database contains the chemical substructures and their biological test results [11]. In addition, SIDER and OFFSIDES databases include drug side effects and "off-label" side effects, respectively [12, 13]. KEGG database contains drug pathways and chemical substructures [14]. TWOSIDES database contains the DDIs based on the adverse event reports in the AERS (adverse effect reactions) [13, 15].

The above mentioned databases related to drugs were extracted from the published literature and reports with experimental validation, and could provide the basis to the development of computational methods to predict new DDIs. Recently, many computational methods have been proposed to predict potential DDIs based on the assumption that similar drugs tend to interact with similar other drugs. These approaches usually used the biological network data, chemical substructure data and phenotypic data. Based on MACCS substructures of drugs, Vilar et al. developed a similarity-based model to predict new DDIs [16]. Liu et al. proposed a model to predict potential DDIs via random forest-based classification model, which also adopted a feature selection technique over the chemical substructures, protein-protein interactions between targets of drugs and target enrichment of KEGG pathways [17]. Cheng et al. proposed a method to infer novel DDIs via machine learning classifiers, whose major feature is integrating drug chemical, phenotypic and genomic properties [18]. IPFs (interaction profile fingerprints) method was proposed to predict hidden DDIs [19]. Logistic regression model was used to predict new DDIs by Takeda et al., which analyzed the effects of 2D structural similarities of drugs on DDI prediction with other pharmacokinetics (PK) and pharmacodynamics (PD) knowledge [20]. Via constructing the drug similarity based on their 2D and 3D molecular substructures, targets, side effects and known DDIs, Vilar et al. further proposed a method to predict new DDIs on a large scale data, where the key feature is capturing the characteristics of drugs by 3D substructures when 2D substructures are missing [21]. Herrero-Zazo et al. provided a computational method to predict DDIs by different types of DDIs and their mechanisms [22]. By integrating similarities from drug molecular and pharmacological phenotypes, Li et al. used a Bayesian network to provide large-scale exploration and analysis of drug combinations [23]. Through calculating the functional similarity from drug carriers, drug transporters, drug enzymes and drug targets, Ferdousi et al. developed an approach to discover new DDIs [24]. Based on the Probabilistic Soft Logic method, a computational framework was developed to discover new DDIs by integrating the multiple drug similarities and known DDIs [25]. The label propagation approach was also developed to discover new DDIs, which used drug chemical structures, side effects and off side effects [26]. In order to predict drug adverse drug reactions (ADRs), a systems pharmacology model called MEF (multiple evidence fusion) has been developed by integrating known DDIs and other similarities of drugs [27]. Based on the assumption that synergistic effects of drugs are usually similar, Network-based Laplacian regularized Least Square Synergistic (NLLSS) method was developed to predict novel DDIs [28]. Via calculating the similarities of chemical, biological, phenotypic and known DDIs of drugs, Zhang et al. proposed three ensemble methods to predict novel DDIs, which included a weight average ensemble method and two classifier ensemble methods (L1 and L2) [29].

In addition, many other approaches used quantitative structure-activity relationship (QSAR) model, clinical data and data mining to study DDIs. Matthews et al. developed 14 QSAR models to predict the cardiac adverse effects for generic pharmaceutical substances [30]. Zakharov et al. developed QSAR models to predict the likelihood of DDIs for any pair of drugs by radial basis functions with self-consistent regression (RBF-SCR) and random forest (RF) [31]. Cami et al. proposed a Predictive Pharmacointeraction Networks (PPINs) to predict novel DDIs by exploiting the known DDIs and other intrinsic and taxonomic properties of drugs and AEs [32]. Huang et al. developed a method to predict DDIs using protein-protein interaction network and clinical side effects [33]. Based on information of drug metabolism, text-mining and reasoning methods were developed to infer new DDIs [34]. Iyer et al. used the textual portion Electronic health records (EHRs) to directly discover new DDIs [35]. Banda et al. also adopted a data mining method to predict new DDIs from the EHRs [36]. Based on the k-nearest neighbor algorithm, Chen et al. proposed a model to predict DDIs which integrated nine predictors by majority voting [37]. Furthermore, the drug response prediction and drug-target interaction prediction are also the important research topics about drugs. By integrating genomic/pharmaceutical data, protein-protein interaction network, and prior knowledge of drug-target interactions with the techniques of network propagation, Wang et al. have developed a dNetFS method to prioritize genetic and gene expression features of cancer cell lines that predict drug response [38]. Based on the massively collected drug-kinase interactions and drug sensitivity datasets, Liu et al. employed a sparse linear model to infer essential kinases governing the cellular responses to drug treatments in cancer cells [39].Based on the sequence information of both targets and drugs, DeepDTA is used to predict drug-target interaction binding affinities, which is a deep-learning based model (convolutional neural networks) [40].

Although the above DDI prediction methods have achieved some good prediction results of novel DDIs and provided useful information for drug development and practice process. However, these methods did not pay enough attention to new drugs which do not have any DDIs with other drugs or cannot predict novel DDIs for new drugs because known DDIs are missing.

In this study, we develop a computational method (called DDIGIP) to predict novel DDIs based on drug Gaussian interaction profile (GIP) kernel similarity and regularized least squares (RLS) classifier. We calculate the GIP similarity of drugs by known DDIs, and then adopt the RLS method to compute the related scores of any drug pairs. In addition, when predicting DDIs for new drugs, we use the KNN method to compute the initial relational scores by similarity calculated from some important chemical, biological and phenotypic information of drugs. The drug chemical structures, drug-target interactions, drug enzymes, drug transports, drug pathways, drug indications, drug side effects and drug off side effects are all used to calculate similarity of drugs. 5-fold cross validation (5CV), 10-fold cross validation (10CV) and de novo drug validation are used to systemically assess prediction performance of DDIGIP, compared with other methods. In 5-fold cross validation, the area under the ROC curve (AUC) value of DDIGIP is 0.9600 which is slightly better than the state-of-the-art method L1 classifier ensemble (L1E) method results of 0.9570. In addition, the experimental results of 10-fold cross validation also demonstrate that DDIGIP outperforms the L1E method. In de novo drug validation, DDIGIP achieves the AUC of 0.9262, which is also better than the weighted average ensemble (WAE) method result of 0.9073. Case studies further validate the prediction ability of DDIGIP method.

Materials

In this study, the benchmark dataset of DDIs composes of 548 drugs and 48,584 DDIs. This dataset is obtained from the TWOSIDES database. In addition, because we need to calculate the relational scores of new drugs, we also download other chemical, biological and phenotypic data from other databases to compute the similarity of drugs. The chemical data are PubChem substructures which are downloaded from the PubChem Compound database. Biological data include drug targets, drug transports, drug enzymes and drug pathways, the first three types are obtained from the DrugBank database and the last one is from the KEGG database. Furthermore, the phenotypic data composes of drug indications, drug side effects and drug off side effects. The SIDER database provided the drug indications and drug side effects, and OFFSIDES provided the drug off side effects.

Previous studies also provided the download links for these datasets [29]. Table 1 shows the relevant information which includes data type, data source and dimensionality.

Table 1 The description of benchmark dataset

Methods

GIP kernel similarity of drugs

The GIP kernel similarity has widely been used in other prediction issues of similar areas and achieved effective prediction performances [4146]. RLS-Kron is provided to predict drug-target interactions based on RLS classifier of Kronecker product kernel and GIP kernel similarities of drugs and targets [41]. SDTRLS is provided to predict drug-target interactions based on integration similarity of drug GIP kernel similarity and chemical substructure similarity by the SNF method[42, 47]. LDAP is used to predict lncRNA-disease associations by using a bagging SVM classifier based on lncRNA and disease similarities which include GIP kernel similarity [43]. DNRLMF-MDA is an miRNA-disease associations prediction method based on dynamic neighborhood regularized logistic matrix factorization, which also uses the GIP kernel similarity.

We compute the GIP similarity of drugs via known DDIs in this study. We denote D={d1,d2,......,dN} as the set of N drugs. The known DDIs can be represented by an adjacency matrix YNN. The value of yij is 1 if di and dj have a known interaction, and 0 otherwise. The GIP kernel similarity between drugs di and dj can be calculated as follows:

$$\begin{array}{@{}rcl@{}} {G_{sim}\left(d_{i},d_{j}\right)} = exp\left(-\gamma_{d} {||yd_{i}-yd_{j}||}^{2}\right) \end{array} $$
(1)
$$\begin{array}{@{}rcl@{}} \gamma_{d} = \gamma{^,_{d}}/\left(\frac{1}{N}\sum\limits_{i=1}^{N}{||yd_{i}||}^{2}\right) \end{array} $$
(2)

where γd is the regularization parameter of kernel bandwidth and γd, is set to be 1 according to previous studies [42, 44], ydi={yi1,yi2,......,yiN} is the interaction profile of drug di.

RLS classifier and prediction dDIs

The (kernel) RLS classifier is based on the assumption that similar principal (adjuvant) drugs are tended to interact with the same adjuvant (principal) drug and has been widely used in other areas [42, 48, 49]. After calculating the GIP kernel similarity Gsim, we adopt the RLS classifier to compute the interaction probability scores of drug pairs as follows:

$$\begin{array}{@{}rcl@{}} \hat Y{_{p}} = G_{sim}{(G_{sim}+\sigma I)}^{-1}Y \end{array} $$
(3)
$$\begin{array}{@{}rcl@{}} Y_{P} = \frac{\hat Y{_{p}}+\hat Y{_{p}^{T}}}{2} \end{array} $$
(4)

where σ is the regularization parameter and set to be 1 according to previous study [41]. Furthermore, the Gsim and I are the GIP similarity matrix and the identity matrix, respectively. The Yp is the final prediction result matrix, which is symmetric. The interacted probabilities of drug pairs are ranked in descending order. A candidate drug pair with the rank 1 is of the most possible drug pair.

KNN for new drugs

New drugs have no any known interaction with other drugs, which makes prediction DDIs for these drugs is impossible by existing methods. Therefore, we adopt the KNN method to calculate their initial relational scores based on the integrated feature similarity of chemical structure, biological and phenotypic information.

In order to calculate the integrated feature similarity SsimNN, we adopt the Pearson correlation coefficient to compute similarity based on the binary vectors of drug chemical substructures, drug targets, drug transporters, drug enzymes, drug pathways, drug indications, drug side effects and drug off side effects. We can see from Table 1 that the total dimensionality of a binary vector of any drug is 21,351, whose value is 1 when the related feature is present and otherwise is 0. Specifically, the similarity of drug pair di and dj is calculated as follows:

$$\begin{array}{@{}rcl@{}} {S{_{sim}^{i,j}}} = \frac{Cov\left(v_{d_{i}},v_{d_{j}}\right)}{{D\left(v_{d_{i}}\right)}{D\left(v_{d_{j}}\right)}} = \frac{E\left(\left(v_{d_{i}}-Ev_{d_{i}}\right)\left(v_{d_{j}}-Ev_{d_{j}}\right)\right)}{{D\left(v_{d_{i}}\right)}{D(v_{d_{j}})}} \end{array} $$
(5)

where \(v_{d_{i}}\) and \(v_{d_{j}}\) are the feature vectors of drugs di and dj, respectively. Cov is the covariance. E and D are the mathematical expectation and standard deviation, respectively.

After obtaining the integrated feature similarity Ssim, we calculate the initial scores of new drugs by the KNN method. Specifically, the interaction scores YKNN(di,dj) between new drug di and another drug dj can be calculated as follows:

$$ Y_{KNN}\left(d_{i},d_{j}\right) = \frac{\sum S{^{(i,l)}_{sim}}y_{lj}}{\sum S{^{(i,l)}_{sim}}}, d_{l} \in K_{set} $$
(6)

where \(S{^{(i,l)}_{sim}}\) is the (i,l)-th element of the integrated similarity matrix and ylj is the (l,j)-th element of known DDIs matrix YNN. Kset represents the set of top K nearnest neighbors according to the Ssim matrix. In this study, we set the value of K by de novo drug validation.

Algorithm 1 is the description of our DDIGIP method. As the 0 vectors in the DDIs adjacency matrix Y correspond to unknown cases, we firstly compute the initial relational interaction scores for new drugs via the KNN method which uses the feature similarity Ssim of drugs by integrating chemical, biological and phenotypic data. The feature similarity Ssim is calculated by Pearson correlation coefficient. After computing the GIP similarity Gd of drugs, we take the RLS classifier to calculate the interaction scores of drug pairs. The final prediction result matrix is Yp. Figure 1 demonstrates the work flow of DDIGIP.

Fig. 1
figure 1

The work flow of DDIGIP

RESULTS aND dISCUSSIONS

Benchmark evaluation and evaluation indices

5CV and 10CV are widely used to evaluate the performance for predicting drug-drug interactions [28, 29], drug-target interactions [42, 50], drug-disease interactions [5153], lncRNA-disease associations [43, 54], miRNA-disease associations [44, 55] and so on. In this study, we evaluate the predictive performance of DDIGIP using 5CV and 10CV. In 5CV, all known DDIs are divided into 5 folds, and each fold, in turn, was left out as the test set while the rest 4 folds as the training set. In 10CV, we also divide known DDIs into 10 folds, and each fold is treated as test set in turn, while the remaining 9-folds are as the training set. We adopt 10 repeats in 5CV and 10CV. Furthermore, the actual generalization ability of predicting potential DDIs for new drugs is also an important aspect to assess the prediction performance. We thus conduct de novo drug validation to evaluate the predictive performance of DDIGIP. In de novo drug validation, we take known DDIs of each drug, in turn, and the rest DDIs of other drugs as the training set.

From a prediction method, each drug pair obtains a prediction score. Then each known interaction between two drugs in the test is ranked relative to the candidate interactions (all unknown interactions). On a specified rank threshold, TPR (true positive rate) is the fraction of known interactions that are correctly predicted, and FPR (false positive rate) is the fraction of unknown interactions that are predicted to be true interactions. The receiver operating characteristic curve (ROC) can be drawn based on various TPR and FPR values with various rank thresholds. Then we also use the area under the receiver operating curve (AUC) to measure the prediction performance of DDIGIP and other methods. The higher its AUC value is, the better prediction performance a method achieves.

Comparison with previous methods

In this study, we compare our method with other four methods: weighted average ensemble (WAE) method, L1 classifier ensemble (L1E) method, L2 classifier ensemble (L2E) method [29] and label propagation (LP) method [26], with the same validation method in the benchmark dataset.

5CV

Table 2 shows that the prediction performances of five methods in 5CV. Based on the AUC values of these methods, DDIGIP is slightly better than other methods. It shows that the GIP similarity is reasonable to use known DDIs because DDIGIP only uses known DDIs in 5CV. In addition, three integrating methods (WAE, L1E, L2E) were also achieved the good results because they integrated the neighbor recommender method, random walk method and matrix perturbation method.

Table 2 The prediction performances in 5CV,10CV and denovo validation, the best results are in the bold face

10CV

Table 2 also shows the prediction performances of five methods in 10CV. DDIGIP also achieved the best prediction result and its AUC value is 0.9636 which is larger than other methods WAE: 0.9530, L1E:0.9599, L2E:0.9594 and LP (max): 0.9378, respectively. By comparing the prediction performances of DDIGIP in 5CV and 10CV, DDIGIP is more effective to predict DDIs in 10CV than in 5CV. It proves that DDIGIP has better prediction ability when there are many known DDIs.

Denovo drug validation

In de novo drug validation, we compare DDIGIP with LP and WAE. We do not perform the de novo drug validation on other existing methods because of their prediction limit for new drugs. Similar to previous studies, we also obtain the weights of integrated methods (neighbor recommender method and random walk method) with drug chemical data, biological data and phenotypic data. Table 2 shows that DDIGIP also obtains the best prediction performance in terms of AUC (0.9262), compared with other methods (WAE: 0.9073, LP (max):0.8997). It also further indicates that the GIP similarity is effective to use known DDIs.

Computation time comparison

The computation time is also an important aspect to assess the performance of computational methods. In this study, we also compare the average computation time of five methods in 5CV. Figure 2 shows that the runtime of DDIGIP is less than those of other methods. In addition, since WAE, L1E and L2E are the integration method, their computation times are longer than those of LP and DDIGIP. We can see from Fig. 2 that DDIGIP runs the fastest and its computation time is 6.61 seconds in 5CV.

Fig. 2
figure 2

The average computation times of five methods in 5-fold cross validation

Parameter analysis for K

In order to verify the robustness of DDIGIP, we analyze the parameters K that is the number of the nearest neighbors in de novo drug validation. The optimal parameter value of K is selected by the grid search. Figure 3 shows the AUC values of DDIGIP under variation of K ranging from 1 to 15 in de novo validation. We can see from Fig. 3 that the prediction performance has the ascending trend when K ranges from 1 to 7, while has the descending trend when K ranges from 11 to 15. In addition, DDIGIP has a relatively stable prediction performance and achieves the best prediction result (AUC:0.9262) when K is 9. It indicates that a reasonable value of K can improve the prediction performance of DDIGIP.

Fig. 3
figure 3

The AUC of DDIGIP under different settings of K in de novo drug validation, the sign represents the default value

Case studies

To illustrate the prediction performance of DDIGIP method, we conduct two types of case studies. The one includes the top 20 predicted DDIs under all known DDIs, in which the benchmark dataset is obtained from the TWOSIDES database while the confirmed database is DrugBanK database. Another includes top 20 the new DDIs in de novo validation of drug Ranolazine (DB00243) whose confirmed database composes of TWOSIDES database and DrugBanK database.

We can see from Table 3 that 9 out of top 20 DDIs predicted by DDIGIP are validated in DrugBank. The verification success rate is 45%. Zafirlukast (DB00549) is an oral leukotriene receptor antagonist (LTRA) drug usually used in the maintenance treatment of asthma, its metabolism can be decreased by Rabeprazole (DB01129) [56, 57]. Atazanavir (DB01072) is an antiretroviral drug of the protease inhibitor (PI) class, which is used to treat infection of human immunodeficiency virus (HIV) and its metabolism can be decreased when combining with Amlodipine (DB00381) [8, 58]. In addition, Pantoprazole (DB00213) also decreases the metabolism of Methadone (DB00333)[59]. The risk or severity of adverse effects can be increased when Atenolol (DB00335) is combined with Nadolol (DB01203), Clotrimazole (DB00257) is combined with Pregabalin (DB00230) or Enalapril (DB00584) is combined with Perindopril (DB00790) [9, 10, 60, 61]. The hypotensive activities of Nadolol (DB01203) can be increased by Propranolol (DB00571) [62]. The absorption of Cefpodoxime (DB01416) can be decreased when combining with Ranitidine (DB00863) [63]. Acebutolol (DB01193) also increases the serum concentration of Metoprolol (DB00264) [64].

Table 3 Top 20 new DDIs predicted by DDIGIP method

Ranolazine is an antianginal medication used in the treatment of chronic angina [10]. Table 4 shows that top 20 predicted DDIs of Ranolazine are validated in TWOSIDES database or DrugBanK database. In addition, 12 out of top 20 DDIs are simultaneously confirmed by TWOSIDES database and DrugBanK database, while the rest are confirmed by one of them. For example, the metabolism of Levothyroxine (DB00451) and Zolpidem (DB00425) can be decreased when combining with Ranolazine [15, 56]. Clopidogrel is an antiplatelet agent structurally and pharmacologically similar to ticlopidine, which is used to inhibit blood clots in a variety of conditions such as peripheral vascular disease, coronary artery disease, and cerebrovascular disease [8]. The serum concentration of Clopidogrel (DB00758) can be increased when combining with Ranolazine [15]. Similarly, the serum concentration of Simvastatin (DB00641), Acetylsalicylic (DB00945) or Metformin (DB00331) also can be increased when combining with Ranolazine [56, 65]. In addition, when Ranolazine is combined with Omeprazole (DB00338) or Acetaminophen (DB00316), its serum concentration also can be increased [15, 66].

Table 4 The validation result of top 20 new DDIs of drug Ranolazine (DB00243) predicted by DDIGIP method in de novo validation

Conclusion

In this study, we have proposed a computational method, called DDIGIP, for DDIs prediction. The GIP similarity of drugs is calculated by the known DDIs, which makes full use of known DDIs. To our knowledge, in the previous studies the RLS-Kron method is used to predict interaction of bipartite networks, such as drug-target interaction networks, drug-disease interaction network and so on. Experiments are conducted using two different types of cross validations: 5-fold cross validation and 10-fold cross validation. The prediction ability of DDIGIP has been illustrated by comparing it with four other competing state-of-the-art methods.

Furthermore, based on Pearson correlation coefficient, we obtain a comprehensive feature similarity of drugs by integrating the chemical, biological and phenotypic data into a high dimension binary vector. In order to more effectively predict DDIs for new drugs, we also conduct de novo drug validation. We add a preprocessing step, KNN, to compute the initial relational scores according to the feature similarity of drugs. Because the vector 0 in the matrix corresponding to unknown cases or missing values rather than confirmed non-interactions, the preprocessing can improve the prediction performance.

Despite the advantages of DDIGIP as discussed above, it still has some limitations. The more effective method should be developed to integrate known DDIs with other chemical, biological and phenotypic data. In addition, other new prediction methods such as matrix completion [67], deep learning [68] and interpretable boosting model [69] could be considered. Finally, in this study, the benchmark dataset of DDIs only includes the positive samples and is an imbalanced dataset, we will also consider some other methods (SVM [70],LibD3C [71],extreme learning machine [72] and collaborative metric learning [73]) to predict DDIs when we obtain reliable negative samples in the future. We expect to develop a more effective method to predict DDIs by overcoming these limitations in the future.