SYNDEEP: a deep learning approach for the prediction of cancer drugs synergy

Torkamannia, Anna; Omidi, Yadollah; Ferdousi, Reza

doi:10.1038/s41598-023-33271-3

Article
Open access
Published: 15 April 2023

Volume 13, article number 6184, (2023)

Scientific Reports

Anna Torkamannia¹,
Yadollah Omidi² &
Reza Ferdousi¹

Abstract

Drug combinations can be the prime strategy for increasing the initial treatment options in cancer therapy. However, identifying the combinations through experimental approaches is very laborious and costly. Notably, in vitro and/or in vivo examination of all the possible combinations might not be plausible. This study presented a novel computational approach to predicting synergistic drug combinations. Specifically, the deep neural network-based binary classification was utilized to develop the model. Various physicochemical, genomic, protein–protein interaction and protein-metabolite interaction information were used to predict the synergy effects of the combinations of different drugs. The performance of the constructed model was compared with shallow neural network (SNN), k-nearest neighbors (KNN), random forest (RF), support vector machines (SVMs), and gradient boosting classifiers (GBC). Based on our findings, the proposed deep neural network model was found to be capable of predicting synergistic drug combinations with high accuracy. The prediction accuracy and AUC metrics for this model were 92.21% and 97.32% in tenfold cross-validation. According to the results, the integration of different types of physicochemical and genomics features leads to more accurate prediction of synergy in cancer drugs.

Introduction

Cancer is one of the most detrimental diseases with high mortality worldwide and is considered a challenging barrier to increasing life expectancy¹. The currently used treatments fail to satisfactorily cure the disease, in large part due to the emergence of drug resistance and severe side effects^2,3. In cancer therapy, the foremost goal is usually to eradicate malignant cells using anticancer cytotoxic drugs through the induction of apoptosis and cell death in the diseased cells and tissues. However, cancer cells can develop escape mechanisms, initiate bypasses in the networks, and establish alternative pathways for further proliferation and recurrence^4,5. Remarkably, combinational pharmacotherapy might be the prime strategy that can intensify the therapeutic impacts of anticancer drugs and overcome the drug resistance mechanisms of cancer cells^6,7. Such an approach can impose synergistic effects, potentially reducing the dose required in monotherapy, limiting drug resistance, and avoiding toxicity while increasing efficacy^8,9. Hence, the discovery of drug combinations with synergistic effects is essential in treating cancer.

Combinational pharmacotherapy should be provided and designed by in vitro and in vivo experiments based on the US Food and Drug Administration (FDA), the European Medicines Agency, and the World Health Organization guidelines¹⁰. It should be noted that predicting the possible drug combinations with synergistic effects solely via in vitro and/or in vivo experimentation is an extremely laborious task with no/trivial outcomes¹¹. Besides, predicting the synergistic drug combination with clinical experiments is inefficient, time-consuming, and cost-intensive^7,12. Therefore, in silico approaches can be reliable tools to facilitate and prioritize identifying synergistic drug candidates for experimental strategies¹³.

In silico approaches have made it possible to explore a wide variety of synergistic gaps across the diversity of drug chemical structures and genomic data from cancer cell lines^9,14. Such approaches have paved the way for clinical trial experiments by providing accurate and adequate predictions¹⁵. Accordingly, over recent years, computational methods have increasingly been used to predict drug combinations with synergistic effects and have provided satisfactory outcomes. Along this line, Jiang et al. developed a computational model to predict synergistic drug combinations on 39 cell lines using a graph convolutional network¹⁶. They used a multimodal network comprising the drug-drug synergy network, drug-target interaction network, and protein–protein interaction network. In their analysis, the AUC was higher than 80%. The AuDNNsynergy method, proposed by Zhang et al.¹⁷, identified synergistic effects based on chemical structure and genomic data. In this approach, the omics data enhanced prediction accuracy, and a Pearson correlation of 0.74 was obtained. DeepSynergy was developed to detect the effects of drug combinations based on a deep neural network¹⁸. The method used the chemical descriptors of drug pairs and gene expression profiles. DeepSynergy demonstrated the best performance compared to other state-of-the-art methods. Yang and colleagues proposed a model based on the functional similarity of target proteins to prioritize and stratify synergistic drug combinations¹⁹. This approach capitalized on the protein targets to predict synergy effects on breast cancer cell lines and experimentally validated the BRAF/insulin receptor combination in 48 colorectal cancer cell lines.

In the current study, a novel deep neural network model, named SYNDEEP, was proposed to predict synergistic drug combinations based on cancer cell lines and drug information. The model used physicochemical, genomic, protein–protein interaction, and protein-metabolite interaction features. A feature network was then constructed from eight feature groups, and a feature vector was built according to the structure of this network. Finally, the feature vectors were fed into the deep neural network to obtain the synergy predictions.

Results

This section summarizes the results of the synergy prediction model on the NCI-ALMANAC dataset. A total of 74 drugs were used in this study. The final experimental dataset consists of 22,228 drug-pair combinations of 74 unique drugs against 60 cell lines. The final feature set for a single drug comprised 777 features, and the total number of features for a pair of drugs was 1614, including the similarity measurements and cell lines. In addition, different hyperparameter settings were tested for each algorithm to find the optimal configuration. The final hyperparameters were as follows: SNN: hidden layer = 1, dropout rate = 0.8, epochs = 300, learning rate = 10⁻²; KNN: n = 4, metric = 'Minkowski', p = 2; SVM: degree = 3, kernel = 'linear', cache size = 200; RF: max depth = 4, n_estimators = 10; GBCs: max depth = 3, n_estimators = 100.

Performance of deep neural network with different feature groups

SYNDEEP was implemented with different feature groups to evaluate performance. Furthermore, to confirm that the deep neural network is robust to training, we performed tenfold cross-validation (CV) for all six feature groups. Table 1 summarizes the performance results of the six groups. Four groups achieved accuracy over 90%, and the two remaining groups achieved an accuracy of 89%. The best score was achieved by DC_networkII, with the highest accuracy of 92.21%, followed by DC_networkI with 92.16%. Including the protein–protein interaction and protein-metabolite interaction similarity scores in DC_networkI to generate DC_networkII slightly increased the accuracy.

Table 1 The overall results of the performance on six groups of features.

It can be seen from the results that by adding gene mutation and gene expression information to the DT_CL group, accuracy slightly dropped. However, accuracy increased significantly when combined with other feature groups. To further investigate the effect of gene mutation and gene expression on synergy prediction, we left out their information. Substantial differences were not observed in the forecast's performance by excluding the information (ACCU. = 92.21%). In drug investigations, gene mutations and gene expression are highly predictive²⁰. In this sense, adding their information seems crucial to predicting the synergy effect.

The DC_networkII feature group was selected as the complete feature set. To better investigate the model’s prediction ability, we obtained the values of sensitivity, specificity, precision, F1-score, Matthews correlation coefficient (MCC), AUC, and Cohen’s kappa. Together, these metrics give a comprehensive picture of model performance: the F1-score, MCC, and AUC were 92.15%, 84.41% and 97.32%, respectively. In addition, the Kappa value, which represents the model's inter-rater reliability, was 84.37%.

Figure 1 shows the area under the receiver operating characteristic curve (ROC-AUC) and the accuracy for each fold. As can be seen from the graphs, the curves obtained for the ten folds cover most of the coordinate space. The result of the tenfold cross-validation indicates that SYNDEEP provided a more reliable performance on the entire feature set.

Performance comparison of the various machine learning methods

We report the primary evaluation criteria for SYNDEEP, SNN, KNN, SVM, RF, and GBCs in Table 2. The table shows that SYNDEEP achieved the highest results across all evaluation criteria compared to the other methods. The three models with the highest accuracy were (i) SYNDEEP, (ii) the GBCs, and (iii) the KNN. SYNDEEP's accuracy was 92.21%, which is approximately 3.8% and 4.23% higher than that of the GBCs model and the KNN, respectively. Among the models examined, the RF achieved the poorest accuracy (74.85%). SYNDEEP also achieved a remarkable Matthews Correlation Coefficient (MCC. = 84.41%), which represents the correlation between predictions and labels. Notably, SYNDEEP obtained a high Kappa score (Kappa. = 84.37%), while the RF had the lowest score (Kappa. = 49.69%).

Table 2 The overall results of the different state-of-art methods.

The ROC curve was used as another evaluation measure. The ROC curve plots the true-positive rate (TPR) versus the false-positive rate (FPR). The area under the ROC curve (AUC) was calculated to reflect predictive accuracy. Figure 2 shows the visualization of the area under the ROC curve for SYNDEEP and the other classifiers to evaluate the performance of binary predictions. The AUCs for SYNDEEP, GBCs, and SNN were 0.97, 0.94 and 0.93, respectively. This shows SYNDEEP's potential for synergy prediction in drug combinations.
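As an illustration of the AUC described above, the following minimal sketch computes it as the probability that a randomly chosen synergistic pair receives a higher score than a randomly chosen non-synergistic pair, which is equivalent to the area under the TPR-versus-FPR curve. The function name `roc_auc` and the toy labels and scores are hypothetical; this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = synergistic).
    scores: iterable of predicted scores, same length as labels.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (illustrative only, not from the paper).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based implementation would be preferable for a dataset of this size.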

We conducted McNemar's test to assess the classifiers' performance. The null hypothesis of this test states that the probability of the synergistic effect being correctly identified is equal to the probability of the non-synergy effect being correctly identified, and that the probability of the synergy effect being incorrectly classified is equal to the probability of the non-synergy effect being incorrectly classified. A p-value < 0.05 is considered significant and leads to rejection of the null hypothesis. For SYNDEEP, we obtained X² = 154.0 with a p-value of 0.0, which is below the significance threshold (p-value = 0.05, degree of freedom = 1) and leads to rejection of the null hypothesis; we can conclude that SYNDEEP’s performance is reliable. The McNemar’s test statistics for GBC, RF, SVM, SNN, and KNN were 248.0, 363.0, 344.0, 295.0, and 154.0, respectively. Furthermore, RF and KNN obtained a p-value < 0.05, while GBC, SVM, and SNN gave a non-significant p-value > 0.05, so the null hypothesis could not be rejected for them. Overall, SYNDEEP achieved substantial performance compared with the other methods in predicting drug synergy.
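For orientation, the sketch below shows the textbook form of McNemar's test with a continuity correction, computed from the two discordant counts of a paired 2 × 2 table. It is an assumption that this standard form matches the exact variant used in the study; the function name `mcnemar_test` and the example counts are hypothetical. With one degree of freedom, the chi-squared survival function reduces to `erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math


def mcnemar_test(b, c):
    """Continuity-corrected McNemar statistic and its p-value.

    b, c: the two discordant counts of a paired 2x2 table (cases where
    the two paired outcomes disagree, in one direction and the other).
    Returns (statistic, p_value) for a chi-squared test with 1 degree
    of freedom.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-squared with 1 dof: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical discordant counts (not taken from the paper).
stat, p = mcnemar_test(10, 25)
reject_null = p < 0.05
```

Under this formula, a statistic as large as 154.0 corresponds to a vanishingly small p-value, which prints as 0.0 once rounded.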

Discussion

Cancer remains the primary cause of morbidity and mortality worldwide, despite the pharmaceutical and clinical research in cancer treatment²¹. Furthermore, the mono-therapeutic strategies are inefficient in cancer treatment because the targets are single proteins or pathways, and drug resistance occurs in these strategies^22,23. Accordingly, combining pharmacotherapy with synergistic effects is a promising approach^6,7. However, it is infeasible to identify and examine the possible combinations of anticancer drugs²⁴. Therefore, in silico approaches are beneficial to overcome the limitations.

Hence, we have developed a novel deep neural network model, SYNDEEP, that accurately predicts the synergistic effect of drug combinations for cancer cell lines. This study takes a data-driven approach to predicting the synergy effects of drug combinations.

Previous studies used a small subset of genomic and drug-related data to predict synergy^{16,17,18,19,25,26}. Some studies, such as Jiang’s model¹⁶, have considered drug-drug synergy, drug-target interaction, and protein–protein interaction networks based on 39 cell lines. Moreover, the DeepSynergy¹⁸ model has been defined based on chemical descriptors and genomic features on 39 cell lines. AuDNNsynergy¹⁷ is another drug synergy model that uses multi-omic and chemical data for synergy prediction. The most striking result among previous studies emerged from the studies that utilized genomic and drug information.

To the best of our knowledge, there is no study in the literature that utilized the comprehensive feature set of genomic, physicochemical, and drug information to predict synergy. Utilizing genomic, PPI, PMI, and physicochemical data is important for overcoming drug resistance. The genomic data would be practical for predicting the drug combination synergy^27,28,29. In this study, a novel comprehensive feature set of genomic, physicochemical, and drug information (i.e. drug-target, protein–protein interaction, protein-metabolite interaction, gene mutation, gene expression, differential methylation, chemical structure, and cell lines) was constructed. Therefore, we used the comprehensive feature set of various data types to optimize the model’s performance.

Modifying the depth and width of the model architecture is a pivotal factor in enhancing the performance of deep neural networks³⁰. Generally, exploiting this property of deep neural networks leads to highly impactful architectures for diverse tasks^30,31. Several previous studies have utilized network-based and machine-learning methods, such as random forests and extremely randomized trees^{14,25,32,33,34}. In contrast, few studies have used deep neural networks, and those focused on a small range of features^{18,35,36,37,38,39,40}. Thus, we have used comprehensive biological features with a deep neural network. This strategy obtained satisfactory results, as shown in Table 2.

The Kappa value of SYNDEEP was 84.37%. The Kappa value demonstrated that classes have independent distributions. Few related studies in the literature have calculated the kappa coefficient^16,17,18, while most studies only reported the accuracy/AUC value^14,34,37,38. The kappa values of DeepSynergy, AuDNNsynergy, and Jiang’s model were 0.51, 0.51, and 0.584, respectively. The kappa values in these previous studies were thus just over 0.50^16,17,18, which may represent fair to reasonable agreement beyond chance.

Previous studies used different datasets to predict the drug combination effects based on the deep neural network^17,39,40,41. The NCI-ALMANAC and Merck datasets are the two large-scale pan-cancer datasets employed in most drug synergy prediction studies. For example, the performance of the DeepSynergy, AuDNNsynergy, TranSynergy³⁹, and Jiang’s models has been evaluated on the Merck dataset. These studies differed in the type of information they used (e.g. multi-omics, phenotypic and biophysical features). For example, some studies utilized drug pathway information, gene expression information, and/or microRNA information. Several previous studies reported the accuracy value, and some of them also stated the MSE value^{17,18,39,40,41}. The accuracy values of DeepSynergy, AuDNNsynergy, and Jiang’s models were 0.92, 0.93 and 0.919, respectively.

In this study, SYNDEEP's performance was tested on the NCI-ALMANAC dataset. Among the previous studies, SYNPRED⁴⁰ and Xia et al.³⁵ also applied deep neural networks to the NCI-ALMANAC dataset. The accuracy of SYNPRED was 0.85, while Xia’s model³⁵ reported the R², Pearson correlation, and Spearman correlation values.

In this study, the accuracy of the novel deep neural network was 92.21%. AuDNNsynergy reported an accuracy of 0.93, but its kappa value was 0.51. The results of the current study highlight the strength of SYNDEEP, namely its large volume of features and its network of features. Hence, the proposed model achieved high kappa and accuracy values (Kappa. = 84.37%, ACCU. = 92.21%).

Evidence from in silico methods indicates that combining biological, chemical, and phenotypic properties plays a substantial role in modeling and prediction^42,43. The performance of the feature sets showed that combining the genomic and physicochemical features significantly improved performance.

The network-based approaches evaluate the interactions among the various agents. Agents in the prediction of drug effects have different natures. Several studies have used network-based analysis to predict the drug combinations effect^{19,25,26,34,37}.

Previous studies prove that integrating the genomic data, drug targeting networks, chemical structure, cell lines, genomic profiling data, gene mutation, gene expression, and protein interaction based on the network approach has a significant role in synergy prediction^{19,25,26,34,37}.

Therefore, we have investigated the effect of drug interactions among the diverse feature agents (i.e. drug-target, protein–protein interaction, metabolite-protein interaction, gene expressions, gene mutations, chemical structures, cell lines, and differential methylation). However, the previous studies investigated and analyzed the network of interaction while not examining the network by deep neural network^{19,25,26,34,37}.

Here, we have used a network of features comprised of different relationships based on the nature of the information for the first time. As shown in Table 2, the features network has achieved high performance and improved the deep neural network model.

Table 2 shows that the deep neural network, using the network of features, achieved superior outcomes compared to the other state-of-the-art methods.

Deep neural networks have significant capabilities in complex dimensional spaces by various structures^44,45. Therefore, deep neural network algorithms have been proven to be a more capable system and have a better accuracy rate than other algorithms in classifying synergistic effects in drug combinations.

Conclusion

The current study proposes SYNDEEP for the prediction of synergy in drug combinations. It is well known that the critical factor in predicting drug effects is extracting practical features, so the main strength of this study was the network of extensive features. SYNDEEP integrated various types of physicochemical and genomic information to generate a comprehensive feature set for successfully and robustly predicting the effects of drug combinations. SYNDEEP obtained 92.21% prediction accuracy using tenfold cross-validation on the NCI-ALMANAC dataset. In the experiment, we compared SYNDEEP with the SNN, KNN, SVM, RF, and GBC methods, and SYNDEEP achieved superior performance. The result indicated that the deep neural network is a competent learning technique for determining synergy. For better and more accurate predictions in the future, the information on cell lines and drug targets needs to be updated and complemented. Although in silico approaches provide substantial insights into in vivo and in vitro experiments and accelerate the procedures, well-designed experimental investigations are required to confirm the results of in silico computational analyses.

Materials and methods

The first step of the proposed deep neural network was data preparation which consisted of data acquisition, features extraction, and network of features construction. The next step was prediction model construction which comprised the synergy prediction and model evaluation. Figure 3 illustrates a schematic representation of how the model uses the distinct features to predict synergistic drug combinations. The network of features is recruited as input for a deep neural network to predict the synergistic effect. The network has been constructed from the sequence of physicochemical, genomic, protein–protein interaction, and protein-metabolite interaction data. The data has been produced from different databases based on drug-pairs information. The following figure illustrates the overall steps of the study.

Data acquisition

In this study, the NCI-ALMANAC dataset was used⁴⁶. This dataset is the most well-known dataset of anticancer drug combination effects, covering combinations of drug pairs against the NCI-60 cell lines at different concentrations. The drugs in the dataset include FDA-approved oncology drugs. The NCI-60 panel comprises 60 human tumor cell lines derived from nine tumor types. NCI-ALMANAC introduced the ComboScore, a modified version of the Bliss independence score, to quantify the combination benefit of pairs of drugs. The ComboScore classifies combination activity into three classes: positive, negative, and zero values. Combinations of drug pairs with positive values are considered synergistic, whereas negative values indicate antagonistic combinations and zero values correspond to additive combinations.

ComboScore values from 1 to 200 were defined as synergistic, and values from −1 to −228 as antagonistic. The scores of the drug combinations were converted to 1 and 0 to reduce the computational complexity: 1 represents a synergistic drug-pair combination, and 0 represents an antagonistic drug-pair combination. After sorting the dataset, the antagonistic combinations outnumbered the synergistic combinations. To avoid an imbalanced dataset, we selected an equal number of drug pairs from each class, based on the number of synergistic combinations.
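The labeling and balancing step can be sketched as follows. The record layout, the function name `binarize_and_balance`, the random down-sampling, and the decision to drop zero-score (additive) combinations are all assumptions made for illustration; the text above does not specify how the equal-sized subsets were drawn.

```python
import random


def binarize_and_balance(records, seed=0):
    """Convert ComboScores to binary labels and balance the two classes.

    records: list of (drug_a, drug_b, cell_line, combo_score) tuples.
    Returns a shuffled list of (drug_a, drug_b, cell_line, label) tuples
    with label 1 (synergistic, score > 0) or 0 (antagonistic, score < 0).
    Assumption: zero scores (additive) are left out of the binary task.
    """
    synergistic = [r[:3] + (1,) for r in records if r[3] > 0]
    antagonistic = [r[:3] + (0,) for r in records if r[3] < 0]

    # Assumption: the larger class is randomly down-sampled to the size
    # of the smaller one.
    rng = random.Random(seed)
    n = min(len(synergistic), len(antagonistic))
    balanced = rng.sample(synergistic, n) + rng.sample(antagonistic, n)
    rng.shuffle(balanced)
    return balanced


# Toy records with placeholder names (not taken from NCI-ALMANAC).
toy_records = [
    ("drug_1", "drug_2", "cell_line_1", 12.0),
    ("drug_1", "drug_3", "cell_line_1", -30.0),
    ("drug_2", "drug_3", "cell_line_1", -5.0),
    ("drug_1", "drug_2", "cell_line_2", 0.0),
    ("drug_2", "drug_3", "cell_line_2", 40.0),
    ("drug_1", "drug_3", "cell_line_2", -80.0),
]
balanced_records = binarize_and_balance(toy_records, seed=1)
```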

In the present study, different feature groups have been extracted for each drug in the dataset. The extracted feature groups are as follows:

Extraction of the drug-target interactions

All indexed drugs and related information in DrugBank (version 5.1.8) were downloaded⁴⁷. In the next step, the FDA-approved anticancer drugs were extracted from the DrugBank XML file. The list of drug-target interactions (DT) for each anticancer drug was then derived from its protein targets. For each drug, the pairs (D_i, DT_j) were created, where D_i is a drug in the approved drug list and DT_j is one of its protein targets.

Extraction of the protein–protein interactions

The critical resource for protein–protein interaction data is the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins)⁴⁸ database, which is a reliable tool for providing the properties of proteins. As the first step, the protein sequences and accessions were retrieved from STRING (version 11.5). Then, the protein–protein interactions (PPI) were extracted for every protein related to the DTs. Next, the pairs (DT_i, PPI_j) were created, where DT_i ∈ {protein target reported for drug_i} and PPI_j ∈ {PPI stated for protein target_j}.

Extraction of the genomic features (gene expression, mutations, and differential methylation)

The Catalogue Of Somatic Mutations In Cancer (COSMIC)⁴⁹ database provides valuable information on genomic sequences and microarray expression data, with extensive data for individual genes and cell lines. COSMIC (version 95-24) provides individual files for gene expression (GE), mutation (GM), and differential methylation (DM), and all relevant files were downloaded. However, the vast majority of the information was not relevant to predicting synergy, which might result in a bias. Hence, we filtered the data based on the cell lines and protein targets. As a result, we obtained the pairs (D_i, GE_j), (D_i, GM_k), and (D_i, DM_n), where D_i is a drug in the list of drugs, and GE_j, GM_k and DM_n are the extracted gene expression, mutation, and differential methylation features.

Extraction of the protein-metabolite interaction

The Human Metabolome Database (HMDB)⁵⁰ was used to construct the protein-metabolite interaction (PMI) data in this study. The metabolite and protein data lists were downloaded from HMDB version 4.0. The list of the corresponding metabolites for every protein in the DTs was extracted. Then, the pairs (DT_i, M_j) were formed, where DT_i ∈ {protein target reported for drug_i} and M_j ∈ {metabolite reported for protein_j}.

Extraction of the chemical structure

We used the Morgan fingerprint counts, total polar surface area, molecular weight, logP, aliphatic and aromatic rings, and H-bond donors and acceptors as chemical structure features (CF). The Morgan fingerprint counts represent the number of times a particular substructure occurs in the molecule. To integrate the features into the deep neural network model, we used the RDKit library. As a result, chemical features for each drug were obtained.

Dimension reduction based on the similarity measure

In total, there were 19,888 features for one drug, excluding cell lines. The full feature set for a drug pair consisted of 39,776 features with cell lines. Such a large feature set poses a high-dimensionality problem when constructing the synergy prediction model. Among the feature groups, PPI and PMI contributed the highest-dimensional features. Hence, the Russell-Rao similarity measure⁵¹ was applied to the PPI and PMI features of each pair of drugs to overcome this issue, and the resulting PPI and PMI similarity values were included in the vector instead of the full feature sets. The Russell-Rao similarity measure is defined as follows:

$${\mathrm{S}}_{\mathrm{Russell}-\mathrm{Rao}}=\frac{x}{d}$$

where x denotes the number of positive matches, that is, the number of features for which both vector one and vector two have the value one, and d denotes the total number of features (the length of the vectors).
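A minimal sketch of this measure for two binary feature vectors is given below. It assumes that d is the common length of the two vectors, as in the standard Russell-Rao definition; the function name `russell_rao` and the toy vectors are hypothetical.

```python
def russell_rao(u, v):
    """Russell-Rao similarity between two equal-length binary vectors.

    Returns x / d, where x is the number of positions at which both
    vectors are 1 and d is the vector length.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if not u:
        raise ValueError("vectors must not be empty")
    positive_matches = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    return positive_matches / len(u)


# Toy binary profiles for two drugs (illustrative only).
profile_drug_1 = [1, 1, 0, 0, 1]
profile_drug_2 = [1, 0, 0, 1, 1]
similarity = russell_rao(profile_drug_1, profile_drug_2)  # 2 / 5 = 0.4
```

Replacing two long binary profiles with this single scalar is what reduces the PPI and PMI groups from thousands of columns to one similarity value per drug pair.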

Feature network construction

The network feature in SYNDEEP (Fig. 3) was a heterogeneous network consisting of eight components: DT, PPI, PMI, GM, GE, DM, CF, and cell line (CL). In the previous section, it was described how to build each group of features.

In this study, different types of features were utilized; hence, we constructed an undirected network U = (V, E), where V is a set of N nodes, $V = {V}^{DT} \cup {V}^{PPI} \cup {V}^{PMI} \cup {V}^{GM} \cup {V}^{GE} \cup {V}^{DM} \cup {V}^{CF} \cup {V}^{CL}$, composed of cell lines and seven sets of feature components, and E is a set of M edges, such as protein–protein interactions and drug-target interactions. The N nodes have node feature vectors ${a}_{1},{a}_{2},{a}_{3},\dots ,{a}_{N} \in {\mathbb{R}}^{d}$, where d is the dimension of the feature vector. As for the edges, $({v}_{i},{v}_{j})$ represents the link between nodes ${v}_{i}$ and ${v}_{j}$. All feature vectors were combined to generate a feature matrix. As shown in Fig. 4, the feature matrix F was defined as:

$$F=\begin{cases}
{F}^{D-DT}\in {\mathbb{R}}^{{N}_{{V}^{D}}\times {N}_{{V}^{DT}}} & \text{if } {v}_{i}\in {V}^{D},\ {v}_{j}\in {V}^{DT}\\
{F}^{DT-PPI}\in {\mathbb{R}}^{{N}_{{V}^{DT}}\times {N}_{{V}^{PPI}}} & \text{if } {v}_{i}\in {V}^{DT},\ {v}_{j}\in {V}^{PPI}\\
{F}^{DT-PMI}\in {\mathbb{R}}^{{N}_{{V}^{DT}}\times {N}_{{V}^{PMI}}} & \text{if } {v}_{i}\in {V}^{DT},\ {v}_{j}\in {V}^{PMI}\\
{F}^{DT-GM}\in {\mathbb{R}}^{{N}_{{V}^{DT}}\times {N}_{{V}^{GM}}} & \text{if } {v}_{i}\in {V}^{DT},\ {v}_{j}\in {V}^{GM}\\
{F}^{DT-GE}\in {\mathbb{R}}^{{N}_{{V}^{DT}}\times {N}_{{V}^{GE}}} & \text{if } {v}_{i}\in {V}^{DT},\ {v}_{j}\in {V}^{GE}\\
{F}^{DT-DM}\in {\mathbb{R}}^{{N}_{{V}^{DT}}\times {N}_{{V}^{DM}}} & \text{if } {v}_{i}\in {V}^{DT},\ {v}_{j}\in {V}^{DM}\\
{F}^{D-CF}\in {\mathbb{R}}^{{N}_{{V}^{D}}\times {N}_{{V}^{CF}}} & \text{if } {v}_{i}\in {V}^{D},\ {v}_{j}\in {V}^{CF}\\
{F}^{D-CL}\in {\mathbb{R}}^{{N}_{{V}^{D}}\times {N}_{{V}^{CL}}} & \text{if } {v}_{i}\in {V}^{D},\ {v}_{j}\in {V}^{CL}
\end{cases}$$

where D is the set of cancer-related drugs. Hence, F_ij ∈ F^D−DT = 1 if a D node v_i and a DT node v_j are related as drug and drug target, and F_ij = 0 otherwise, meaning that there is no interaction between the drug and the drug target. F^DT−PPI ∈ [0, n] holds the similarity values between pairs of PPI nodes, in which n represents the non-zero values. F^DT−PMI ∈ [0, n] holds the non-zero similarity values between pairs of PMI nodes. F_ij ∈ F^DT−GM = 1 indicates that the i-th DT is related to the j-th GM through the drug target and gene mutation, and F_ij = 0 otherwise. F_ij ∈ F^DT−GE = 1 indicates that the i-th DT is associated with the j-th GE, and F_ij = 0 otherwise. F_ij ∈ F^DT−DM = 1 if an interaction between a DT node and a DM node has been observed, and F_ij = 0 otherwise. To represent the chemical features of drugs, F^D−CF ∈ [0, n] is used, where D−CF denotes the chemical features of drugs and n indicates the non-zero feature values. F_ij ∈ F^D−CL = 1 if a D node v_i and a CL node v_j are related according to the NCI-ALMANAC data, and F_ij = 0 otherwise.
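The binary blocks of F, such as F^D−DT, can be illustrated with a small sketch that turns a list of observed pairs into a 0/1 matrix. The helper name `binary_relation_matrix` and the placeholder drug and target identifiers are hypothetical and only serve the example.

```python
def binary_relation_matrix(row_nodes, col_nodes, pairs):
    """Build a 0/1 matrix with one row per row node and one column per
    column node; entry (i, j) is 1 if (row_nodes[i], col_nodes[j]) is in
    the list of observed pairs, and 0 otherwise.
    """
    row_index = {name: i for i, name in enumerate(row_nodes)}
    col_index = {name: j for j, name in enumerate(col_nodes)}
    matrix = [[0] * len(col_nodes) for _ in row_nodes]
    for row_name, col_name in pairs:
        matrix[row_index[row_name]][col_index[col_name]] = 1
    return matrix


# Placeholder identifiers (not real drugs or targets).
drugs = ["drug_1", "drug_2"]
targets = ["target_a", "target_b", "target_c"]
drug_target_pairs = [("drug_1", "target_a"), ("drug_2", "target_c")]

f_d_dt = binary_relation_matrix(drugs, targets, drug_target_pairs)
# f_d_dt == [[1, 0, 0], [0, 0, 1]]
```

The same helper would apply to the other binary blocks (DT−GM, DT−GE, DT−DM, D−CL), while the PPI, PMI, and CF blocks hold real-valued entries instead.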

To utilize the topological structural relations among the cell lines, and seven sets of features embedded in the network U, we formulated the multi-structure of topology as single input feature vectors (Fig. 5). As an input feature vector, the vector was divided into eight different groups: seven sets of features, and cell line related features. As mentioned above, the value of each group consists of two sets: (i) zero, and (ii) non-zero values.

Feature groups construction

Various feature groups were established to assess the benefit of the features for classifying drug synergy, along with the network of features. Six groups of features were constructed, where the first group consisted of drug targets and cell lines, and the last group was the network of features comprising all feature groups. Table 3 details the groups.

Table 3 Feature group description.

Construction of deep neural network model

A multi-layer perceptron (MLP) was selected to build the model for synergy prediction. The shape of the MLP model was conic. Different hyperparameter settings were considered (i.e. number of layers, number of neurons, and learning rate). Different numbers of layers (3, 4, 5, 6, 7) were tested. Various numbers of neurons (1024, 512, 256, 128, 64, 32) and learning rates (10⁻¹, 10⁻², 10⁻⁵, 10⁻⁸, 10⁻⁷⁵) were examined. After comparing the outcomes of the different deep networks, the best result for binary classification of drug synergy was observed with the five-layer network. The selected model had an input layer, 3 hidden layers, and an output layer (Fig. 6). Two activation functions were used in the model: the ReLu and the sigmoid functions.

$$ReLu\left(x\right)=\left\{\begin{array}{ll}x & \text{if } x\ge 0\\ 0 & \text{otherwise}\end{array}\right.$$

The ReLu (Rectified Linear Unit, or rectifier) activation function is one of the most common activation functions in deep learning. It is used in the hidden layers to detect patterns, and it improves the performance of the model by mitigating the vanishing gradient problem. The activation function of the last layer was the sigmoid function:

$$sigmoid\left(x\right)=\frac{1}{1+{e}^{-x}}$$

To optimize the model, we used the binary cross-entropy function as the loss function:

$$L= -\frac{1}{N} \sum_{i=1}^{N}\left[{y}_{i}\cdot \mathrm{log}\left({s}_{i}\right)+\left(1-{y}_{i}\right)\cdot \mathrm{log}\left(1-{s}_{i}\right)\right]$$

where y_i is the actual synergy label of each drug pair, s_i is the predicted synergy score of each combination, and N is the number of drug combinations.
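The forward pass and loss described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the published Keras implementation: the toy layer sizes, the uniform weight initialization, the clipping constant `eps`, and all function names are assumptions. Only the overall shape (input layer, three ReLu hidden layers, one sigmoid output) and the binary cross-entropy formula follow the text.

```python
import math
import random
from typing import Callable, List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def relu(x: float) -> float:
    """ReLu(x) = x if x >= 0, else 0."""
    return x if x >= 0 else 0.0


def sigmoid(x: float) -> float:
    """Numerically stable form of 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs: List[float], layer: Layer, act: Callable[[float], float]) -> List[float]:
    """Fully connected layer: act(W . inputs + b) for each output neuron."""
    weights, biases = layer
    return [act(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Assumed initialization: small uniform weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: List[float], layers: List[Layer]) -> float:
    """ReLu on every hidden layer, sigmoid on the single output neuron."""
    *hidden, output = layers
    for layer in hidden:
        x = dense(x, layer, relu)
    return dense(x, output, sigmoid)[0]


def binary_cross_entropy(labels: List[int], scores: List[float], eps: float = 1e-12) -> float:
    """L = -(1/N) * sum(y_i*log(s_i) + (1 - y_i)*log(1 - s_i)), with clipped scores."""
    total = 0.0
    for y, s in zip(labels, scores):
        s = min(max(s, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return -total / len(labels)


rng = random.Random(0)
sizes = [16, 8, 4, 2, 1]  # toy conic shape: input, three hidden layers, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
score = forward([rng.random() for _ in range(sizes[0])], layers)
```

The layer widths shrink toward the output, which is one way to read the "conic" shape; the neuron counts actually tested in the study are much larger than in this toy configuration.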

Applying other machine learning methods

To evaluate the performance of the predictive model, we compared the deep neural network model with other machine learning algorithms: shallow neural network (SNN), k-nearest neighbor (KNN), support vector machines (SVMs), random forest (RF), and gradient boosting classifiers (GBCs). Because the performance of these algorithms depends largely on their hyperparameters, each method was examined with different hyperparameter settings and tuned to its optimal state.

Evaluation criteria of presented models

Tenfold cross-validation and several popular evaluation criteria were used in the experiments to evaluate the models’ performance. Tenfold cross-validation randomly splits the whole dataset into ten independent, equally sized subsets. In each iteration, one fold is used for testing and the remaining nine folds are used for training. The procedure is repeated ten times so that each fold is tested exactly once. To evaluate the performance of the prediction model, widely used evaluation criteria, including accuracy (Accu.), sensitivity (Sen.), specificity (Spec.), precision (Prec.), F-Score (F_score), Matthews Correlation Coefficient (MCC.), and Cohen's kappa coefficient (Kappa), were calculated as follows.

$$Accu. = \frac{TP+TN}{TP+TN+FP+FN}$$

$$Sen. = \frac{TP}{TP+FN}$$

$$Spec. = \frac{TN}{TN+FP}$$

$$Prec. = \frac{TP}{TP+FP}$$

$${F}_{score}=2\times \frac{Sen. \times Prec. }{Sen. +Prec. }$$

$$MCC. = \frac{TP\times TN-FP\times FN}{\sqrt{\left(TP+FP\right)\left(TP+FN\right)\left(TN+FP\right)\left(TN+FN\right)}}$$

$$Kappa. = \frac{Accu. -{P}_{c}}{1-{P}_{c}}$$

where TP, FP, TN, and FN denote true positive, false positive, true negative, and false negative, respectively, and P_c is the probability of agreement expected by chance. The current study used Cohen’s kappa to evaluate the degree of agreement between the observed accuracy and the expected accuracy, and thereby the performance of the prediction model.
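The criteria above can be computed directly from the four confusion-matrix counts, as in the following sketch. The function name and the example counts are hypothetical. The text does not give a formula for P_c, so the sketch assumes the usual Cohen's kappa definition based on the row and column marginals, and it assumes all denominators are non-zero.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the evaluation criteria from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and predicted).
    """
    n = tp + fp + tn + fn
    accu = (tp + tn) / n
    sen = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    f_score = 2 * sen * prec / (sen + prec)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Assumed definition of chance agreement P_c: product of the marginals,
    # summed over the positive and negative classes.
    p_c = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accu - p_c) / (1 - p_c)
    return {
        "accuracy": accu,
        "sensitivity": sen,
        "specificity": spec,
        "precision": prec,
        "f_score": f_score,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

In a tenfold setting, one option is to compute these values per fold and average them; another is to pool the counts over all folds first. The text does not say which was used.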

In addition, we performed McNemar’s Test to compare SYNDEEP results with proposed state-of-the-art computational models. McNemar’s Test is defined as follows:

$${X}^{2} = \frac{{(\left|B-C\right|-1)}^{2}}{B+C}$$

where B is the number of drug combinations that were detected correctly as non-synergy and incorrectly detected as synergy effects, while C is the number of drug combinations that were detected correctly as synergy and incorrectly as non-synergy effects.
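A minimal sketch of the continuity-corrected statistic follows. It assumes the common two-classifier reading of McNemar's test, in which B and C count the drug combinations on which exactly one of the two compared models is correct. The function names and toy labels are hypothetical, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> Tuple[int, int]:
    """Count cases where exactly one of the two models is correct.

    b: model A correct and model B wrong; c: model A wrong and model B correct.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    return b, c


def mcnemar_statistic(b: int, c: int) -> float:
    """Continuity-corrected statistic: (|B - C| - 1)^2 / (B + C)."""
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    return (abs(b - c) - 1) ** 2 / (b + c)


def chi2_1dof_p_value(x2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2.0))


# Toy labels, for illustration only.
b, c = discordant_counts([1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0])
x2 = mcnemar_statistic(15, 5)
p_value = chi2_1dof_p_value(x2)
```

Only the discordant pairs enter the statistic, so two models with identical accuracy can still differ significantly if they fail on different drug combinations.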

Computational equipment

In this study, the deep neural network model was implemented in Python 3.7. We used the Keras library for the deep learning methods and the Scikit-learn library for the other machine learning algorithms. The experiments were developed in the Google Colab environment.

Data availability

Dataset: NCI-ALMANAC Data Resource (https://dtp.cancer.gov/ncialmanac). Drug-target Interactions Data: Drug-target interactions (https://go.drugbank.com/releases/latest). Protein–Protein Interactions Data: PPI (https://string-db.org/). Genomic Features Data: gene expression, mutations, and differential methylation (https://cancer.sanger.ac.uk/cosmic). Protein-Metabolite Interaction Data: protein-metabolite interaction (https://www.mhmdb.co.uk/). Scripts: The source code of SYNDEEP is available in (https://github.com/annatorkamannia/SYNDEEP).

Acknowledgements

This research is part of an MSc thesis approved and funded by Tabriz University of Medical Sciences.

Author information

Authors and Affiliations

Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, 51656/65811, Iran
Anna Torkamannia & Reza Ferdousi
Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Fort Lauderdale, FL, 33328, USA
Yadollah Omidi

Contributions

All authors contributed to writing the manuscript, building datasets, implementing neural networks and machine learning algorithms, analyzing data and predictors, and reviewing the manuscript.

Corresponding author

Correspondence to Reza Ferdousi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Torkamannia, A., Omidi, Y. & Ferdousi, R. SYNDEEP: a deep learning approach for the prediction of cancer drugs synergy. Sci Rep 13, 6184 (2023). https://doi.org/10.1038/s41598-023-33271-3

Received: 08 January 2023
Accepted: 11 April 2023
Published: 15 April 2023
DOI: https://doi.org/10.1038/s41598-023-33271-3
Springer Nature Limited
