Introduction

In complex diseases, the effectiveness of targeting multiple biochemical strategies within cells using a single drug focused on a single target is limited. To overcome this inefficiency, drug combination therapy emerges as a promising treatment approach that combines multiple drugs to achieve superior therapeutic effects beyond what individual drugs can achieve alone. Another advantage is the reduction in side effects by minimizing the required dosage for each drug. The success of drug combination therapy has been evident for several decades, particularly in addressing the common occurrence of drug resistance in cancer. Consequently, identifying optimal drugs combination is a critical task with far-reaching implications for translational, clinical, and financial research. The synergy score plays a crucial role in determining the effectiveness of drug combinations.

Testing the synergy of drug combinations through experiments becomes impractical when dealing with a large number of combinations in high-throughput screens. Experimental approaches are not only risky but also costly and time-consuming, demanding substantial human resources, research experience, and technical expertise. Consequently, deep learning models have emerged as valuable tools in biomedical applications, offering the ability to simulate and analyze biomedical data in a more efficient and scalable manner. So, several approaches have been suggested for constructing a simulation model capable of predicting the target of drug combinations. These techniques rely on various features, including chemical drug features, structural network interactions, and cell line omics data.

Among these techniques, DeepSynergy1 was the pioneer in applying deep learning models, surpassing other machine learning algorithms in performance. Subsequently, the MatchMaker technique2 improved the structure of the deep learning network. TranSynergy3 proposed a novel representation of drugs based on selected genes for drug target genes, employing a transformer network. AuDNNsynergy4 leveraged full omics data, including gene expression, copy number, and genetic mutation data, to represent the cell line. DeepDDS5 utilized a graph attention network to extract drug features. Lastly, PRODeepSyn6 investigated the impact of gene interactions in cancer cell lines. Despite the diverse perspectives employed by these techniques, the prediction of drug combination synergy remains a challenging problem.

In this paper, we present an integrated multi-task model that enables simultaneous prediction of both the synergy score and synergy class label of drug combinations. Our approach incorporates various techniques to effectively represent the drugs and the cell line, enabling comprehensive analysis.

To represent the drugs, the SMILE notation is utilized, which allows to extract two distinct features. Firstly, the Mordred package is employed to extract structural chemical SMILE features. Secondly, an improved multi-view graph embedding technique is proposed to extract a feature vector of drugs which represents as a graph.

On the other hand, the cell line is represented using gene expression data obtained from 875 genes, providing a comprehensive depiction of the cell. Additionally, we integrate drug-drug interaction information to study its influence on cancer cell lines. This is achieved by pretraining a drug-drug interaction network with well-known data. The two drugs are then fed into this network to extract drug interaction features.

Next, the cancer cell line and the drug interaction features are fed into an attention mechanism, which produces enhanced features for the cancer cell line. Subsequently, the drug and cell line features are concatenated into a single feature vector, which is further processed by an attention model. The attention model generates two weighted feature vectors that serve as inputs for predicting the synergy score and synergy class label.

To facilitate knowledge transfer between the two tasks, the cross-stitch algorithm is learned to predict their relationship. Finally, two fully connected subnetworks are employed to generate the output for the synergy score and class label, respectively.

Overall, our proposed multi-task model presents a comprehensive and effective framework for predicting drug combination synergies and outperforms other compared methods.

The subsequent sections of this paper are organized as follows: “Preliminary” presents a comprehensive overview of the fundamental knowledge and methodologies employed in the MutliSyn, specifically focusing on attention and cross-stitch algorithms. “MutliSyn” presents a comprehensive description of the MutliSyn, outlining its key components and functionalities. “Experimental results” covers aspects such as the dataset used, evaluation metrics, model parameters, and experimental results. Lastly, “Conclusion” concludes with a summary of the MutliSyn.

Preliminary

This section begins by introducing the necessary background knowledge on attention and cross-stitch mechanisms separately.

Attention mechanism

The attention model7 is a powerful framework employed in multi-task models. It utilizes attention gates to analyze the complex relationships among vectors, guiding the learning process for different tasks. By incorporating attention methods, the model dynamically assigns varying weights or importance to specific values within a vector set. This enables the model to prioritize and highlight the most informative vectors while disregarding less significant ones. Additionally, the attention model is capable of identifying significant patterns and relationships by selectively attending to vectors relevant to a particular task.

In this research, a multi-head attention approach is employed, enabling separate weighting of extracted features for each task. This mechanism allows the model to effectively navigate through the intricate relationships and dependencies within the data, leading to improved outcomes in multi-task learning and facilitating an exploration of the relationships between tasks. Figure 1 shows the mechanism of multi-head attention.

Figure 1
figure 1

The attention mechanism process that first project the input by using dense layer into query, key, value. Then a dot product is executed between the query and key, accompanied by a softmax activation to generate attention weights. These attention weights are then applied in a dot product operation.

The multi-attention mechanism involves three primary inputs: query (\(q\)), key (\(k\)), and value (\(v\)) vectors. The attention model maps the input layer \({I}_{0}\) to \({I}_{q}\), \({I}_{k}\), \({I}_{v}\) using separate dense projection layers, as described by Eqs. (1), (2), and (3):

$${I}_{q}=f\left(wa*{I}_{0}+b\right)$$
(1)
$${I}_{k}=f\left(wa*{I}_{0}+b\right)$$
(2)
$${I}_{v}=f\left(wa*{I}_{0}+b\right)$$
(3)

Here, \(f\) denotes the activation function ‘relu’ while \(wa\) and \(b\) denote attention weight and bias vector, respectively.

Subsequently, dot-product attention is employed to these vectors as shown in Eq. (4).

$$s=softmax({I}_{q}{*I}_{k})$$
(4)

Then, the output is summarized using Eq. (5):

$${I}_{f}^{1}=\sum s*{I}_{v}$$
(5)

The aforementioned steps are iterated \(h\) parallel heads times and the resulting vectors are concatenated together to obtain the final vector according to Eq. (6).

$${I}_{f}=concat({I}_{f}^{1},{I}_{f}^{2},\dots \dots ,{I}_{f}^{h}).$$
(6)

In this paper, to capture a broader range of information and mitigate potential overfitting issues, the output of the multi-head attention is merged with the input of the multi-head by applying the concatenation operation as specified in Eq. (7).

$${I}_{final}=concat({I}_{f},{I}_{0}).$$
(7)

Cross-stitch mechanism

In multi-task models, accurately determining the relationships between tasks is paramount for optimizing performance. By establishing and leveraging these relationships, the model can facilitate the transfer of valuable knowledge from one task to another, thereby enhancing learning and generalization capabilities. This enables more efficient utilization of the available data and facilitates exploration of the interconnections between tasks. As a result, the model can achieve improved performance and accelerated learning in multi-task settings. In contrast, an erroneous configuration of task relationships can have detrimental effects, such as impeding knowledge transfer and diminishing prediction performance. Therefore, this paper employs the cross-stitch algorithm8 to explore the relationships between multi-tasks. By leveraging the capabilities of the cross-stitch subnetwork, the model can effectively uncover and establish the appropriate relationships between the tasks, promoting effective knowledge transfer and enhancing prediction performance.

The cross stitch relies on the cross-stitch block to decide how much sharing is needed. The cross-stitch operation is defined in Eq. (8).

$$\left[{\overline{t} }_{1} {\overline{t} }_{2}\right]=\left[\begin{array}{cc}{r}_{11}& {r}_{12}\\ {r}_{21}& {r}_{22}\end{array}\right] [{t}_{1} {t}_{2}]$$
(8)

Here, \({t}_{1}, {t}_{2}\) represent the input tasks representation, while \({\overline{t} }_{1}, {\overline{t} }_{2}\) denote the output task relationships representation for the respective tasks. The values of \({r}_{ij}\) indicate the learned relationships between tasks \(i\) and \(j\).

The cross-stitch layer can be summarized as shown in Eq. (9).

$$( {\overline{t} }_{1}, {\overline{t} }_{2})=cross\_stitch({t}_{1}, {t}_{2})$$
(9)

In this paper, output vectors from cross stitch are separately passed through fully connected layers to generate new tasks representations, \({t}_{11}, {and t}_{22}\), respectively.

Following that, another cross-stitch layer is applied, as depicted in Eq. (10).

$$( {\overline{t} }_{11}, {\overline{t} }_{22})=cross\_stitch({t}_{11}, {t}_{22})$$
(10)

Finally, the input and output vectors of the cross-stitch subnetwork are concatenated, as indicated in Eqs. (11) and (12). Figure 2 shows the layers of cross-stitch applied in this paper.

Figure 2
figure 2

The cross-stitch mechanism process which involves learning relation weights between two tasks. Subsequently, a dot product process is performed between these weights and the corresponding inputs, followed by a dense layer operation. Finally, the output of the dense layer undergoes a dot product with additional cross-stitch weights to yield the learned relation outputs.

$${t}_{final1}=concat({\overline{t} }_{11},{t}_{1})$$
(11)
$${t}_{final2}=concat({\overline{t} }_{22},{t}_{2})$$
(12)

MutliSyn

The proposed multi-task model focuses on predicting the synergistic effects of drug combinations. The model generates both a synergy score and a synergy class label indicating whether the combination is synergistic or antagonistic. As shown in Fig. 3, the model can be divided into three main parts. The first part deals with the features of the individual drugs, the second part handles the features of the cancer cell lines, and the third part combines the drug and cell line features to produce the synergy score and class label simultaneously. The three subsequent subsections will discuss these three parts respectively in detail, highlighting their respective roles and functionalities.

Figure 3
figure 3

The structure of the proposed MultSyn model. The first part learned the drug features which the upper part extracts the chemical features from smiles and the lower part extract the drug graph features from molecular structure. The theses two features are concatenated to output the drug features. the second part, learned the cell line feature from the RNA-expression and the impact of drug-drug interaction features. In the third part the drugs feature and cell line feature are concatenated are fed to an attention and cross-stitch mechanism to optimized the target tasks.

Drug feature representation

Two methods are employed to extract drug features: chemical structure-based extraction and graph embedding-based extraction.

In the chemical structure-based method, the SMILE representations of drugs are obtained from the PubChem website. These representations are then transformed into molecular feature vectors using the chemical informatics package from DeepChem9, specifically utilizing the "Mordred" features10. This process yields an array consisting of 1613 numeric features across 43 different categories, providing a one-dimensional molecular description for each drug. Non-numerical attributes and features with zero variance are removed through pre-processing, resulting in 394 descriptive features for each drug. The resulting features are then normalized using the tanh-norm method.

Next, the features are fed into a subnetwork that includes three fully connected layers, interconnected by an activation function and a dropout layer. The Rectified Linear Unit (ReLU) activation function is applied after each fully connected layer. A dropout rate of 0.2 is applied after the first and second fully connected layers, while the final fully connected layer does not employ dropout. To prevent overfitting, each fully connected layer undergoes a regularization technique applied to the weight and output of the layer.

In the graph embedding-based method, the SMILE representations of drugs are transformed into graphs, where each atom represents a node and the chemical bonds between atoms represent edges. Specifically, we improved a multi-view graph embedding technique to extract four views of the graphs. For each view, we initialize the output vector as a fixed-length vector filled with zeros to ensure consistency across views. Then, the vector is modified based on the output of each view. Finally, the four view vectors are concatenated into a single vector. Figure 4 shows an example for the four views. These views are discussed below in more detail:

  1. I.

    The first perspective is first proposed in this paper is focused on the labeling of nodes in a graph. Initially, the unique nodes in the graph are identified and assigned a distinct numerical value in order to differentiate them. Then, the view is proposed as a vector that represents the corresponding numeric value of each node in the graph.

  2. II.

    In the second view, the information from labels associated with each edge in the graph is utilized. First, the unique nodes in graph is utilized. Then, all paths between each node and others, including self-paths (loops) are considered. The occurrence number of each path is calculated, resulting in the final vector that contains the occurrence numbers of all paths. The application of this view to extract features is done as proposed in11.

  3. III.

    The third view focuses on extracting the density of the neighborhood of each atom by examining the shortest path length between atoms. All possible paths between all nodes are considered and the length for each path is calculated. The occurrence number of each unique path length is output in the final view vector. This view was also proposed in11.

  4. IV.

    In the fourth view, the information from labels in all possible paths between all nodes in the graph is utilized. Similar to the second view, we consider all possible paths, but instead of counting the occurrence number of each unique length, the occurrence number of each unique label path is counted. The resulting view vector is constructed based on this information. This view feature extraction was proposed in12.

Figure 4
figure 4

An example of multi-view graph embedding. Each section executes each view of the above graph results in the four views of graph.

The four views, comprising the multi-view graph embedding, are concatenated into a single vector, which is then normalized using the tanh-norm method. This resultant vector is then fed into a fully connected subnetwork like the structure fully connected subnetwork. The output vector from the fully connected structure subnetwork is concatenated with the output vector from the fully connected multi-view graph network. The resulting vector represents the final drug features.

Cell line features representation

The extraction of cell line features can be divided into two sections. The first section deals with the initial representation of cell line features, while the second section focuses on how the interaction features of drugs can affect the cell line features.

In the first section, the cell line features are represented using gene expression dataset, which typically consists of over 50,000 gene features per cell line. To address the challenge of high dimensionality, we leverage the LINCS project13. This project identifies a subset of crucial genes, known as the 'Landmark gene set', that capture approximately 80% of cell line characteristics based on connectivity map data. These genes are consisting of 1000 carefully selected genes.

To obtain the initial cell line vector, the genes that intersect between the gene expression data and the Landmark gene set are selected. This results in the selection of 875 genes that effectively represent the cell line. These genes are then normalized using the tanh-norm method and fed into a fully connected subnetwork to extract the cell line features.

In the second section, the aim is to explore how drug interaction features affect cell line features through a drug-drug interaction (DDI) network. Initially, drugs are represented using a chemical structure-based extraction method. Two parallel fully connected subnetworks are then trained, each dedicated to one drug. The outputs from these subnetworks are concatenated into a single vector, which is subsequently fed into another fully connected subnetwork to generate the final class label. The DDI network is trained using the DrugBank drug-drug interaction dataset14,15. This dataset comprises 1706 unique drugs, resulting in 191,808 drug interaction pairs across 86 interaction class types. The dataset is split into a 9:1 ratio for training and validation data. Table 1 presents the validation metrics for the drug-drug interaction dataset.

Table 1 The evaluation metrics of DDI model.

From the learned DDI network, the drug interaction features are extracted from the penultimate layer of the fully connected subnetwork. These features are then normalized using the tanh-norm method and fed into a fully connected network to obtain the final representation of drug interaction features.

To integrate the cell line features and drug interaction features, an attention mechanism discussed in the previous section (“Attention mechanism”) is applied. The cell line features serve as the query for the attention mechanism, while the drug interaction features act as the key and value inputs. The output of the attention mechanism represents the updated cell line features, which are influenced by the weights assigned to the drug interaction features.

Multi-output tasks

This section focuses on combining the drug and cell line features to generate the synergy score and synergy class label simultaneously.

To begin, the drug features and cell line features are concatenated and passed through an attention mechanism. As discussed previously, the attention mechanism enables different weighting of the combined features for each task. Consequently, the output of the attention mechanism produces two weighted feature representations, one for each task. These outputs are then concatenated with the respective input attentions for each task.

The resulting combined features are subsequently inputted into a cross-stitch mechanism described in previous “Cross-stitch mechanism”. This network learns the relationship between the synergy score and synergy class label tasks. The cross-stitch network produces two outputs, representing the feature representations for the two task relationships. Additionally, the outputs of the cross-stitch network are concatenated with the inputs of the cross-stitch network for each task.

Finally, the two sets of feature representations are passed through separate fully connected networks, with one network dedicated to each task. These networks output the synergy score and synergy class label, respectively.

Experimental results

In this section, we evaluate the performance of the MutliSyn using one of the O'Neil challenging datasets16. We begin by providing an overview of the dataset and discuss the evaluation metrics in the first and second subsections respectively. Next, we present the model parameter setting and discuss the experimental results of the MutliSyn on the target dataset in the third and fourth subsections respectively.

Dataset characteristics

The drug combination dataset used in this study is derived from the O'Neil dataset, a large-scale published cancer screening dataset, and serves as a benchmark set. The dataset consists of information on drug combinations, including the names of the two drugs being combined and the specific cancer cell line targeted for treatment. It comprises 38 unique drugs that are combined to form a total of 23,052 drug combinations. These combinations are tested on 39 different cancer cell lines, covering seven cancer tissue types: skin, ovary, lung, large intestine, breast, prostate, and pleura.

To determine the impact of a pharmacological combination (synergistic or antagonistic), the Loewe Additivity score17 is calculated. This score is derived from a 4 × 4 dose–response matrix and assumes no interaction between a drug and itself. The range of synergy scores in the O'Neil dataset spans from −326.464 to 179.1233. It's worth noting that the dataset may contain multiple evaluations of the same drug combination in the initial data. To address this, the average of the replicate scores is used as the target synergy score for each unique drug pair and cell line combination resulting in 22,737 samples.

For the classification task, drug synergy is treated as a binary classification problem. Drug combinations with a synergy score greater than 30 are classified as synergistic, while combinations with a score less than 0 are considered antagonistic. Combinations with scores between 0 and 30 are excluded from the training set as they are considered additive combinations, resulting in a balanced sample distribution for the classification task. However, in this paper, removing samples in the classification task also leads to the removal of the same samples in the regression task. To mitigate this issue, a three-class labeling approach is used for the classification model, where synergy scores above 30 are assigned to the synergetic class, scores below 0 are assigned to the antagonistic class, and the remaining scores fall into the additive class. This introduces an imbalance in the sample distribution across the three classes, which may impact the classification training. The results of both the synergetic and antagonistic classes are reported and compared with other related works.

To ensure unbiased evaluation, the dataset is randomly split into five cross-fold validations using four distinct methods. The first method guarantees that each drug-drug combination pair appears in only one-fold. The second method ensures that each cell line is exclusively assigned to one-fold. The third and fourth methods ensure that each fold contains unique first drugs and second drugs, respectively. Additionally, the dataset is concatenated to create a reversed order for input drug pairs.

During each cross-fold validation, one-fold serves as the testing dataset, while the remaining four folds constitute the training dataset for the model. Subsequently, the mean synergy score and class labels prediction scores are computed across all five training runs and reported as the final results.

Evaluation metrics

In evaluating the MutliSyn, various regression metrics are utilized. The first metric is the mean squared error (\({\text{MSE}}\)), which quantifies the squared difference between the predicted and actual scores. Additionally, the root mean squared error (\({\text{RMSE}}\)) is computed, representing the square root of the \({\text{MSE}}\). Furthermore, the 95% confidence interval for the MSE is calculated and proposed. Another important metric is the Pearson correlation coefficient (\({{\text{CC}}}_{{\text{P}}}\)), which evaluates the consistency between the predicted scores and the actual scores. Given the adoption of a five folds cross-validation approach, the mean and standard deviation of each evaluation metric are computed across the five folds to ensure the robustness and reliability of the results.

When evaluating the classification task, several metrics are employed to assess the performance of the MutliSyn. Firstly, accuracy is utilized to measure the proportion of correct predictions made by the model. However, due to the presence of imbalanced data in the test dataset, where negative samples dominate, accuracy alone may not provide a comprehensive understanding of the classifier's performance. Hence, precision is employed to evaluate how accurately the model predicts the synergetic class. Additionally, Cohen's Kappa is employed, which compares the classifier's performance to that of a classifier that randomly guesses based on the class frequencies, providing a measure of how much better the model is performing.

Moreover, two essential metrics are employed, particularly effective for imbalanced classification tasks with limited samples in the minority class. These metrics are the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC). The ROC-AUC measures the classifier's ability to distinguish between positive and negative samples across various threshold settings, while the PR-AUC focuses on the precision-recall trade-off, which is especially important when dealing with imbalanced data.

Global model setting

To fully define the MutliSyn, several global parameters are specified in Table 2. The hidden units for the fully connected subnetwork handling drug features are set to [258, 128] for the multi-view graph embedding and [1024, 512,256] for the chemical structure-based features. The hidden units for the cell line fully connected subnetwork are defined as [512, 265, 128]. For the prediction subnetwork, the hidden units are set as [128, 64] for both tasks. Additionally, the attention mechanism employed in the model has its output size set to match the input size, and the number of attention heads is set to 4.

Table 2 Hyperparameter settings of MutliSyn model.

During training, the model adopts a learning rate of 0.00001, a batch size of 64, and runs for 1000 iterations. A dropout rate of 0.2 is also applied. The model optimization is performed using the AdamW optimizer18, which is a variant of the Adam optimizer incorporating weight decay into the optimization process. Weight decay is a regularization technique that penalizes large weights during training, leading to simpler and more generalizable models. This regularization helps prevent overfitting and enhances the model's ability to predict new data. The weight decay value employed in this paper is 0.025.

Results and discussion

Table 3 provides a summary of the experimental results, comparing the MutliSyn with others on the regression task with leaving each drug-drug combination pair appears in one-fold. The table shows the performance of the models in predicting synergy scores using various regression evaluation metrics. Notably, the MutliSyn outperforms the other methods, achieving the lowest \({\text{MSE}}\) and \({\text{RMSE}}\), and the highest \({{\text{CC}}}_{{\text{P}}}\).

Table 3 Comparison of synergy score prediction results with other methods with leaving drugs combination out.

The MutliSyn achieves an \({\text{MSE}}\) of 219.14, accompanied by a confidence interval ranging from 170.00 to 268.29. In comparison to PRODeepSyn, AudnnSynergy, and DeepSynergy, the MutliSyn demonstrates superior performance by achieving lower \({\text{MSE}}\) values, with improvements of −10.35, −21.98, and −36.35, respectively. These results indicate that the MutliSyn consistently delivers more accurate predictions, with significantly reduced prediction errors compared to the existing models.

Additionally, the MutliSyn exhibits an enhancement of 1.0% in terms of \({{\text{CC}}}_{{\text{P}}}\) compared to PRODeepSyn, 2.0% compared to AudnnSynergy, and 3.0% compared to DeepSynergy. This improvement emphasizes the effectiveness of the MutliSyn in accurately predicting synergy scores, effectively capturing the intricate relationship between drug combinations and their synergistic effects.

MutliSyn exhibits substantial enhancements across all prediction metrics for drug synergy when compared to machine learning algorithms, specifically Elastic-Net and XGBoost.

In Table 4, the prediction of synergy score depends on leaving cell line out. MutliSyn exhibits the \({\text{MSE}}\) and \({\text{RMSE}}\) among all methods, with values of 405.74 ± 104.32 and 19.96 ± 2.73, respectively. These metrics indicate the model's exceptional predictive accuracy and precision. Furthermore, MutliSyn demonstrates acceptable \({{\text{CC}}}_{{\text{P}}}\) of 0.50 ± 0.07, showcasing its robust ability to capture underlying patterns in the data and align well with ground truth labels. Notably, while PRODeepSyn achieves a slightly higher \({{\text{CC}}}_{{\text{P}}}\), the MutliSyn closely follows, positioning it as a competitive and promising model for synergy prediction. This places MutliSyn as the second-highest performer.

Table 4 Comparison of synergy score prediction results with other methods with leaving cell line out.

In Tables 5 and 6, MutliSyn demonstrates outstanding performance, yielding the lowest \({\text{MSE}}\) and \({\text{RMSE}}\) values and highest \({{\text{CC}}}_{{\text{P}}}\) in comparison to other models, including PRODeepSyn, AudnnSynergy, DeepSynergy, XGBoost, and Elastic-Net.

Table 5 Comparison of synergy score prediction results with other methods with leaving first drug out.
Table 6 Comparison of synergy score prediction results with other methods with leaving second drug out.

When considering the classification of synergistic drug combinations, Table 7 presents the performance model for predicting the synergy class label in comparison to other methods with drug-drug combination pair method. Although the MutliSyn does not achieve the highest accuracy score among the other methods, this is due to the fact that it does not eliminate the additive class, as discussed in “Dataset characteristics”. Furthermore, accuracy alone cannot be considered a fair metric for imbalanced classification predictions, as previously mentioned.

Table 7 Comparison of synergy class labels prediction results with other methods with leaving drug combination pair out.

However, when evaluating other metrics such as Precision, ROC-AUC, and PR-AUC, the MutliSyn outperforms all related methods. While MutliSyn may not achieve the highest Kappa metric, it consistently demonstrates substantial agreement. Consequently, the MutliSyn exhibits exceptional performance in the task of synergistic drug combination classification with new drug-drug combination pairs.

Table 8 shows the classification performance of MutliSyn compared to other models with leaving cell line out method. MutliSyn achieves a high accuracy, Precision, and PR-AUC compared to other models, including PRODeepSyn, AudnnSynergy, DeepSynergy, XGBoost, and Elastic-Net. However, the variations observed in Kappa and ROC-AUC metrics for MutliSyn across five folds highlight potential challenges in maintaining consistent performance, particularly in the context of imbalanced classes. Specially, MultiSyn has been trained with additive class as discussed in “Dataset characteristics” which impacts the classification results.

Table 8 Comparison of synergy class labels prediction results with other methods with leaving cancer cell line out.

In Tables 9 and 10, MutliSyn achieves the highest ROC-AUC, PR-AUC, and Precision among the compared models, indicating its overall robustness. While MutliSyn still maintains good accuracy with the second highest accuracy value and an acceptable Kappa.

Table 9 Comparison of synergy class labels prediction results with other methods with leaving first drug out.
Table 10 Comparison of synergy class labels prediction results with other methods with leaving second drug out.

Furthermore, to visualize the results of the MutliSyn in comparison to other methods, Fig. 5 presents the Precision values for each cancer cell line, comparing the MutliSyn to PRODeepSyn. As depicted in Fig. 5, the MutliSyn demonstrates a significant improvement in precision across all cancer cell lines.

Figure 5
figure 5

Precision Score Performance comparison between the proposed model and PRODeepSyn.

Moreover, Fig. 6 displays a comparison between the MutliSyn and AudnnSynergy in terms of \({\text{MSE}}\) across all cancer cell lines. The MutliSyn achieves a reduction in \({\text{MSE}}\) for 21 out of 39 cancer cell lines, while it performs almost similarly to AudnnSynergy for 10 out of 39 cell lines. However, AudnnSynergy outperforms the MutliSyn in terms of \({\text{MSE}}\) for 8 out of 39 cell lines. Notably, the greatest improvement observed with the MutliSyn, compared to AudnnSynergy, is in the 'NCH23' cell line, resulting in a reduction of almost 270 points in MSE. Conversely, AudnnSynergy exhibits the highest improvement compared to the MutliSyn in the ‘UWB1289’ cell line, with a reduction of almost 89 points in MSE. AudnnSynergy outperforms our model significantly, particularly for both 'UWB1289' and 'UWB1289BRCA1.' This is noteworthy because our model employed the same gene expression data for 'UWB1289BRCA1' as for 'UWB1289,' recognizing the latter as a variant of the former.

Figure 6
figure 6

MSE Score Performance comparison between the proposed model and AudnnSynergy.

To delve deeper into the analysis, we conducted an ablation study on various MutliSyn model configurations by selectively removing specific components of its architecture. The study focuses on the regression values of synergy scores for drug-drug combination pairs. The results of this ablation study are presented in Table 11. In the first section, we remove the cross-stitch mechanism from the MutliSyn model, connecting the output of the concatenated attention mechanism directly to the prediction network. Moving to the second section, referred to as feature-attention, we eliminate the attention mechanism responsible for customizing the concatenated drugs and cell line features. Instead, the concatenated features are linked to two fully connected layers, serving as inputs to the cross-stitch mechanism. The third section, interaction-attention, involves the removal of the attention mechanism between cell line features and DDI features. Consequently, DDI features are excluded, and cell line features are directly fed into the concatenated features stage. Lastly, the unenhanced graph views involve utilizing only views numbered {2, 3, 4}, excluding view number {1} proposed in that paper.

Table 11 The comparison results of the MultiSyn ablation study at synergy score.

Finally, Table 12 displays the highest predicted synergy scores for drugs combination not presented in O’Neil dataset, along with the corresponding actual synergy scores for the MutliSyn*.

Table 12 The top predicted synergistic drugs combination for each cancer cell.

Based on the analysis conducted, the MutliSyn demonstrates efficacy in predicting target synergy scores and classes for drug combinations. It showcases minimal prediction errors, a strong correlation between actual and predicted scores, and outperforms the other compared methods. Moreover, the model utilizes a multi-task deep learning approach to simultaneously predict outputs, which contributes to its enhanced performance.

Conclusion

In this paper, a novel multi-task deep learning model has been proposed for the simultaneous prediction of both the synergy score and synergy class label for drug combinations. The model incorporates two different representations of drug features: chemical structure-based features and multi-view graph features. These features are combined into a single feature vector to represent the drug features. Additionally, the gene expression of the cancer cell line is utilized as an initial feature of the cell line. Furthermore, drug-drug interaction features have been extracted and integrated into an attention mechanism along with the cell line features, enabling the model to learn the impact of drug-drug features on the cell line and generate new cell line features. The drug and cell line features are concatenated and processed through an attention mechanism, which optimizes the concatenated features into two distinct representations for the two target output tasks. These task representations are then inputted into a cross-stitch algorithm to facilitate knowledge transfer between the tasks. Finally, each task representation has passed through a fully connected subnetwork to generate the desired target outputs.

The MutliSyn has been evaluated using the O'Neil cancer dataset, which contains information on drug combinations and cell lines. The results of the synergy score prediction indicate that the model achieves low MSE and RMSE, as well as high \({{\text{CC}}}_{{\text{P}}}\). Moreover, for the synergy class label prediction, the model demonstrates high Precision, ROC-AUC, Kappa, and PR-AUC values when compared to other deep learning models.