A Review on Graph Neural Networks for Predicting Synergistic Drug Combinations

Combinational therapies with synergistic effects provide a powerful treatment strategy for tackling complex diseases, particularly malignancies. Discovering these synergistic combinations, often involving various compounds and structures, necessitates exploring a vast array of compound pairings. However, practical constraints such as cost, feasibility, and complexity hinder exhaustive in vivo and in vitro experimentation. In recent years, machine learning methods have made significant inroads in pharmacology. Among these, Graph Neural Networks (GNNs) have gained increasing attention in drug discovery due to their ability to represent complex molecular structures as networks, capture vital structural information, and seamlessly handle diverse data types. This review aims to provide a comprehensive overview of various GNN models developed for predicting effective drug combinations, examining the limitations and strengths of different models, and comparing their predictive performance. Additionally, we discuss the databases used for drug synergism prediction and the extraction of drug-related information as predictive features. By summarizing the state-of-the-art GNN-driven drug combination prediction, this review aims to offer valuable insights into the promising field of computational pharmacotherapy.


Introduction
Combination therapy, a treatment modality that combines two or more therapeutic agents, has increasingly become the preferred approach for many human diseases, especially those caused by alterations in multiple genes or pathways, such as cancer. The integration of anti-cancer drugs enhances efficacy compared to using a single therapy, as it targets different key pathways in a synergistic or additive manner. By combining drugs with distinct mechanisms of action, therapeutic effectiveness can be enhanced, allowing for lower-dose prescriptions and reducing the potential risks of side effects and toxicity. Clinical evidence consistently demonstrates the utility of combining different therapeutics to improve treatment efficacy in various cancer types, such as breast cancer [1], lung cancer [2], and ovarian cancer [3], among others.
vector or matrix data, GNNs excel at capturing intricate relationships and dependencies between entities within a graph. At the core of GNNs is the fundamental concept of learning representations for each node by aggregating information from its neighboring nodes. These representations are then leveraged to perform prediction and classification tasks. By effectively encapsulating both local and global contexts within a graph, GNNs enable the modeling of complex interactions and dependencies, making them highly versatile across a wide range of applications [11]. In the following, we outline core mechanisms and commonly used architectures of GNNs.

GNN core mechanisms
This section provides a brief overview of the fundamental components that enable GNNs to process graph-structured data. These core mechanisms are essential for understanding how GNNs capture relationships and dependencies within graphs and form the foundation for various GNN architectures and algorithms.
Message passing function is a key mechanism in GNNs that updates node embeddings during each iteration. It involves two main steps: aggregating messages and updating node embeddings. During message passing, each node gathers information from its neighboring nodes and combines it into a message. This message represents essential details from nearby nodes and edges. Subsequently, it is used to update the node's embedding, represented as $h_u^{(k)}$ for each node $u$ in the graph. At each iteration $k$, the embedding $h_u^{(k)}$ is updated based on information gathered from the node's neighborhood $N(u)$. This update can be represented as

$$h_u^{(k+1)} = \mathrm{UPDATE}^{(k)}\left(h_u^{(k)}, \mathrm{AGGREGATE}^{(k)}\left(\{h_v^{(k)}, \forall v \in N(u)\}\right)\right) = \mathrm{UPDATE}^{(k)}\left(h_u^{(k)}, m_{N(u)}^{(k)}\right),$$

where UPDATE and AGGREGATE are arbitrary differentiable functions (typically neural networks). The message $m_{N(u)}^{(k)}$ is the aggregated information from $u$'s graph neighborhood. The update refines the node's representation by integrating the aggregated message with the node's previous embedding. The aggregation step is responsible for merging information from neighboring nodes, while the update step steadily improves node embeddings across layers. This iterative process enables GNNs to effectively capture intricate relationships and dependencies within the graph [26].
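To make the aggregate-then-update cycle concrete, the following is a minimal sketch of one message-passing round in plain Python. The function names (`message_passing_step`, `sum_aggregate`, `mean_update`) and the toy update rule are illustrative assumptions, not taken from any of the reviewed models; in a real GNN both steps would carry trainable parameters.

```python
def message_passing_step(h, neighbors, update, aggregate):
    """One round of message passing over all nodes.

    h         : dict mapping node -> embedding (list of floats)
    neighbors : dict mapping node -> list of neighboring nodes
    update    : function (previous embedding, message) -> new embedding
    aggregate : function (list of neighbor embeddings, dim) -> message
    """
    new_h = {}
    for u, h_u in h.items():
        # Aggregate: merge the neighbors' current embeddings into one message.
        m_u = aggregate([h[v] for v in neighbors[u]], len(h_u))
        # Update: combine the node's previous embedding with the message.
        new_h[u] = update(h_u, m_u)
    return new_h


def sum_aggregate(neighbor_embeddings, dim):
    # Element-wise sum; an isolated node receives a zero message.
    total = [0.0] * dim
    for h_v in neighbor_embeddings:
        total = [t + x for t, x in zip(total, h_v)]
    return total


def mean_update(h_u, m_u):
    # Toy update (hypothetical): average of previous embedding and message.
    return [0.5 * (a + b) for a, b in zip(h_u, m_u)]


# Path graph a - b - c with 2-dimensional embeddings.
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h1 = message_passing_step(h0, nbrs, mean_update, sum_aggregate)
```

Stacking several such rounds lets information travel further: after k rounds, a node's embedding depends on nodes up to k hops away.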
Aggregation function is responsible for combining information from a node's neighboring nodes to produce a single vector representation. Traditional aggregation methods, such as summing or averaging over the neighbor embeddings, may not fully capture the complexity of the graph structure and relationships between nodes. However, more sophisticated aggregation techniques can be employed to improve GNN performance. One approach to defining an aggregation function is through the concept of permutation-invariant neural networks. This approach treats the neighbor embeddings $\{h_v, \forall v \in N(u)\}$ as an unordered set and aims to map this set to a single vector representation $m_{N(u)}$. A universal set function approximator, as shown by Zaheer et al. [27], is an aggregation function that can approximate any permutation-invariant function mapping a set of embeddings to a single embedding. This can be represented as

$$m_{N(u)} = \mathrm{MLP}_\theta\left(\sum_{v \in N(u)} \mathrm{MLP}_\phi(h_v)\right),$$

where $\mathrm{MLP}_\theta$ and $\mathrm{MLP}_\phi$ are multi-layer perceptrons parameterized by trainable parameters $\theta$ and $\phi$, respectively. Node-level aggregation is a common approach that combines information from neighboring nodes to compute representations for individual nodes. This method treats nodes as unstructured entities and does not explicitly consider the graph structure during aggregation. On the other hand, graph-level aggregation takes into account the local structural information during the aggregation process. It goes beyond simple node-level aggregation and considers the relationships and connectivity between nodes to perform higher-order graph aggregation. This results in a more comprehensive and structured representation of the graph [26,28,29].
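The set-based aggregator can be illustrated with a small sketch: each neighbor embedding is transformed by an inner network, the results are summed, and an outer network maps the sum to the message. The weights below (`W_PHI`, `W_THETA`, and their biases) are fixed, hypothetical values standing in for trained parameters, and each "MLP" is reduced to a single linear layer with ReLU for brevity.

```python
def relu(x):
    return [max(0.0, v) for v in x]


def linear(W, b, x):
    # y = W x + b, with W given as a list of rows.
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


# Hypothetical fixed weights standing in for trained parameters.
W_PHI, B_PHI = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W_THETA, B_THETA = [[1.0, 1.0]], [0.0]


def mlp_phi(h_v):
    # Inner network applied to every neighbor embedding separately.
    return relu(linear(W_PHI, B_PHI, h_v))


def mlp_theta(s):
    # Outer network applied once to the summed representation.
    return relu(linear(W_THETA, B_THETA, s))


def set_aggregate(neighbor_embeddings):
    transformed = [mlp_phi(h_v) for h_v in neighbor_embeddings]
    # The sum is what makes the result independent of neighbor order.
    summed = [sum(col) for col in zip(*transformed)]
    return mlp_theta(summed)


neighbors = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
message = set_aggregate(neighbors)
message_reordered = set_aggregate(list(reversed(neighbors)))
```

Because the only interaction between neighbors is a sum, reordering the neighbor list leaves the message unchanged, which is the permutation invariance the text describes.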
Node or graph representations in GNNs are learned by aggregating information from neighboring nodes. Each node in the graph updates its representation by incorporating information from its neighbors, allowing it to capture both local and global dependencies. The idea behind node representations in GNNs is to capture the influence and interactions between nodes, enabling more accurate predictions and classifications. For example, in a social network, node representations can capture the preferences, behaviors, and connections of individual users, which can be useful for tasks like predicting user interests or detecting communities.
Graph representations, on the other hand, summarize the overall properties and structural patterns of the entire graph. They aim to capture the global characteristics and relationships between nodes, providing a holistic view of the graph. Graph representations consider the collective information from individual node representations and encode it into a single high-dimensional vector. This representation can be used to perform tasks such as graph classification, where the goal is to classify the entire graph based on its structural properties. For instance, in a citation network, graph representations can capture the citation patterns, relationships between papers, and thematic similarities, aiding in tasks like topic modeling or paper recommendation. The underlying idea behind node and graph representations in GNNs is to leverage the power of neural networks to learn expressive and informative features from graph-structured data [30].
Attention mechanism in GNNs is a technique that allows the network to assign different importance weights to nodes or edges within a graph during the aggregation step. This mechanism enables the network to focus on the most relevant information and adaptively weigh the influence of different components of the graph. The basic idea is to compute an attention weight for each neighbor, which is used to weigh that neighbor's contribution during aggregation. These attention weights are learned by the network and can be based on various factors such as the similarity, relevance, or importance of the neighbors with respect to the node being processed. In GNNs, attention can be applied in various ways, but a common approach is to compute the attention weights as a function of the node features and/or edge connections. This is typically done using trainable parameters such as weight matrices and attention vectors. The attention weights are then used to compute a weighted sum or aggregation of the neighbor embeddings, where the weights determine the contribution of each neighbor to the final aggregated representation [26,31].
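As an illustration of attention-weighted aggregation, the sketch below scores each neighbor with an attention vector applied to the concatenated node and neighbor features, normalizes the scores with a softmax, and returns the weighted sum. It is a simplified, GAT-style example under stated assumptions: the shared weight matrix is omitted, and the attention vector `a` is a hypothetical fixed value rather than a trained parameter.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_aggregate(h_i, neighbor_embeddings, a):
    """Return (attention weights, weighted sum of neighbor embeddings)."""
    # Unnormalized score per neighbor: a . [h_i || h_j], then LeakyReLU.
    scores = [
        leaky_relu(sum(w * x for w, x in zip(a, h_i + h_j)))
        for h_j in neighbor_embeddings
    ]
    # Softmax over the neighborhood (shifted by the max for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the neighbor embeddings.
    dim = len(neighbor_embeddings[0])
    out = [
        sum(w * h_j[d] for w, h_j in zip(weights, neighbor_embeddings))
        for d in range(dim)
    ]
    return weights, out


h_center = [1.0, 0.0]
h_neighbors = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, 0.5, 1.0, -1.0]  # hypothetical attention vector (length 2 * dim)
weights, aggregated = attention_aggregate(h_center, h_neighbors, a)
```

With these values the first neighbor receives the larger weight, so it dominates the aggregated representation.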

GNN architectures
Notable advancements of graph neural networks in recent years have resulted in the development of various architectures that tackle different aspects of graph-structured data. Below, we provide a summary of some main GNN architectures that have found applications in drug combination prediction.

Graph Convolutional Networks (GCNs) are a variant of Convolutional Neural Networks (CNNs) designed to operate on graph-structured data. GCNs leverage both the node features and the graph structure to learn a latent representation that captures the underlying relationships and dependencies within the graph. In GCNs, the input consists of a node feature matrix $X$, which contains the features of each node, and an adjacency matrix $A$, which encodes the relationships or similarities between pairs of nodes. The goal is to learn a latent representation $Z$ that preserves important information from both the node features and the graph structure. The key idea behind GCNs is to propagate and aggregate information from neighboring nodes to update the node representations. By considering the local neighborhood information of each node, GCNs capture both the node features and the relationships among nodes [32,33].
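The propagation idea can be sketched with a single graph-convolution layer. The example below assumes one widely used formulation, in which self-loops are added to the adjacency matrix, the result is symmetrically normalized by node degree, and the layer computes ReLU of (normalized adjacency × features × weights). The source does not specify which propagation rule the reviewed models use, and the weight matrix `W` here is a hypothetical fixed value.

```python
import math


def matmul(P, Q):
    # Plain matrix product for lists of rows.
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def gcn_layer(A, X, W):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    n = len(A)
    # Add self-loops so each node keeps its own features.
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    # Symmetric normalization: A_hat[i][j] / sqrt(deg_i * deg_j).
    A_norm = [
        [A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    Z = matmul(matmul(A_norm, X), W)
    return [[max(0.0, z) for z in row] for row in Z]


# Two connected nodes with one feature each.
A = [[0, 1], [1, 0]]
X = [[1.0], [3.0]]
W = [[1.0]]  # hypothetical weight matrix (1 input feature -> 1 output feature)
Z = gcn_layer(A, X, W)
```

In this toy graph each node ends up with the average of its own feature and its neighbor's, which shows how the layer mixes feature and structure information.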
Graph AutoEncoders (GAEs) are unsupervised learning frameworks used to learn low-dimensional representations of graph-structured data. The core idea behind GAEs is to encode the graph information into a compact representation and then reconstruct the original graph from this representation. In GAEs, a GCN is typically used as an encoder to transform the input graph into a latent representation. The encoder takes into account the node features and the adjacency matrix of the graph to learn informative node representations. The latent representation, denoted as $Z$, is obtained from the GCN encoder. The goal of GAEs is to capture the inherent relationships and dependencies within the graph, allowing for meaningful analysis and prediction tasks [33,34].
Graph Attention Networks (GATs) are neural networks designed to operate on graph-structured data by leveraging the concept of attention. GATs assign different weights, called attention coefficients, to the neighboring nodes when aggregating information for a central node. In GATs, each node's features undergo a linear transformation and are mapped to a learnable vector using a single-layer neural network called the mapping function. The attention coefficient $\alpha_{ij}$ represents the influence of node $j$ on node $i$ and is calculated based on the transformed node representations. The final output embedding of the central node is obtained by taking a weighted summation of the representations of its neighboring nodes, with the weights determined by the attention coefficients. This allows the GAT to focus on the most relevant and informative neighboring nodes for each central node [35,36].
Graph SAmple and aggreGatE (GraphSAGE) is a general framework for inductive node embedding, which aims to learn representations for nodes in a graph. Unlike traditional approaches that rely solely on the graph's structure, GraphSAGE uses both the topological structure and the node features to generate embeddings that generalize to unseen nodes. The core idea behind GraphSAGE is to learn aggregator functions instead of training an individual embedding vector for each node. In this way, GraphSAGE can effectively capture and utilize the collective knowledge of a node's local neighborhood [37].
Overall, these models collectively contribute to the advancement of graph-based learning in pharmacology by providing insights into complex drug networks and facilitating the discovery of effective drug combinations [21,38,39].
Graph Regularization is a technique used in optimization problems to impose desired properties on solutions with respect to a graph structure. Graph regularization is closely related to GNNs because both approaches deal with graph-structured data and aim to capture relationships and interactions between objects represented by nodes in the graph. GNNs use message passing and aggregation mechanisms to update node embeddings based on their graph neighborhoods, while graph regularization incorporates graph information into optimization problems to guide the solutions towards desired properties. Both methods leverage the inherent graph structure to improve the handling of complex relationships and interactions within the data. For instance, if the signal defined on the graph is known to be sparse (i.e., to have few non-zero values), a regularization term that encourages sparsity can be added to the objective [40].

Drug combination synergy prediction
A drug combination, as defined by the FDA [41], involves the combination of two or more regulated components, such as drugs, devices, or biologics. These components are physically or chemically mixed to create a single entity. A synergistic drug combination occurs when drugs administered together produce a therapeutic effect that is stronger than the sum of their individual effects. On the other hand, an additive drug combination occurs when the combined effect of the drugs is equal to the sum of their individual effects. In this case, there is no enhancement or reduction in the overall effect when the drugs are used together. Conversely, an antagonistic drug combination exists when the combined effect of the drugs is lower than the sum of their individual effects. This happens when the drugs interfere with or counteract each other, resulting in a lower overall effect [42]. In experiments conducted on cancer cell lines, researchers use in vitro methods such as cell culture to assess the impact of different drug combinations on key aspects such as restraining tumor growth, promoting cancer cell apoptosis, and preventing metastasis [43]. In these studies, cancer cells are exposed to different concentrations of drug combinations and the collective effects they produce are analyzed. When drugs interact synergistically, they exhibit a more pronounced inhibition of cancer cell proliferation or a heightened rate of cell death compared to their individual effects. Conversely, an antagonistic combination can diminish efficacy and potentially undermine the desired therapeutic outcome [44]. In the following, we explore quantitative measures of drug combination synergy.

Metrics of synergism in drug combinations
Determining whether a combination of compounds exhibits an interaction effect involves comparing the observed effects with what would be expected based on a non-interactive (additive) effect. To evaluate the effects of drug combinations and synergy, various metrics are employed. These metrics, such as Loewe, Bliss, ZIP, and HSA, provide measurements that help assess the combined effects of compounds. By utilizing these frequently used measurements, researchers can determine if the observed effects of a compound combination surpass what would be expected from an additive effect alone. This allows for a comprehensive evaluation of the potential interaction and synergy between different compounds in order to optimize drug combinations for enhanced therapeutic outcomes. Described below are some commonly used metrics that facilitate this evaluation process [45].
Loewe: Loewe additivity, defined by Loewe in 1926, is based on the principle of sham combination, which assumes no interaction effect when a compound is combined with itself [46]. It is a dose-effect-based concept that is widely used in pharmacology and toxicology. In pharmacology, a dose-response curve is a graphical representation of the relationship between the dose of a drug or compound and its effect. The curve shows how the effect changes as the dose increases. Loewe additivity assumes that the dose-response curves for two compounds are parallel, meaning that they have the same shape and slope. This allows an additive reference effect to be calculated from the individual dose-response curves [47]. In other words, for $n$ compounds the additive case satisfies

$$\sum_{i \in [1,n]} \frac{x_i}{X_i} = 1,$$

where $X_i$ is the dose of compound $i$ that produces the effect when administered alone and $x_i$ is its dose in a combination producing the same effect. The combination thus involves fractions of the individual doses that achieve the effect separately; when these fractions are added together, they sum to one and result in the same effect [48]. To make this concept clearer, imagine two substances, compound A and compound B. Each of these compounds, when administered alone at a specific dose ($X_1$ and $X_2$, respectively), produces the desired effect. If a fraction $x_1/X_1$ of dose $X_1$ of compound A is combined with a fraction $x_2/X_2$ of dose $X_2$ of compound B such that the two fractions sum to 1, the combination should yield the same effect as dose $X_1$ of compound A alone or dose $X_2$ of compound B alone. When the fractions $x_1/X_1$ and $x_2/X_2$ instead satisfy

$$\frac{x_1}{X_1} + \frac{x_2}{X_2} < 1,$$

we consider the effect synergistic. This means that the combined effect of these compounds is greater than what would be expected if their effects were simply additive.
In general, synergy is apparent when $\sum_{i \in [1,n]} x_i/X_i < 1$, that is, when the combination reaches the effect at lower doses than additivity predicts. Conversely, if $\sum_{i \in [1,n]} x_i/X_i > 1$, the interaction is considered antagonistic, indicating an overall effect less than expected.
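The Loewe criterion reduces to a short calculation: divide each compound's dose in the combination (x_i) by the dose that achieves the same effect alone (X_i), sum the fractions, and compare the sum with 1. The sketch below illustrates this; the function names and the tolerance `tol` are assumptions made for the example, and the doses are invented for illustration rather than taken from any dataset.

```python
def loewe_sum(combo_doses, single_doses):
    """Sum of x_i / X_i over all compounds in the combination.

    combo_doses  : doses x_i used in the combination
    single_doses : doses X_i that reach the same effect when given alone
    """
    if len(combo_doses) != len(single_doses):
        raise ValueError("one single-agent dose is needed per compound")
    return sum(x / X for x, X in zip(combo_doses, single_doses))


def classify_loewe(value, tol=1e-9):
    # tol is a hypothetical tolerance for treating the sum as exactly 1.
    if value < 1.0 - tol:
        return "synergistic"
    if value > 1.0 + tol:
        return "antagonistic"
    return "additive"


# Invented doses: alone, A needs 10 units and B needs 20 units.
single = (10.0, 20.0)
low = loewe_sum((2.5, 5.0), single)    # 0.25 + 0.25
mid = loewe_sum((5.0, 10.0), single)   # 0.50 + 0.50
high = loewe_sum((7.5, 15.0), single)  # 0.75 + 0.75
labels = [classify_loewe(v) for v in (low, mid, high)]
```

In practice the single-agent doses X_i are read off fitted dose-response curves, so the classification inherits any uncertainty in those fits.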

Bliss:
The Bliss score refers to the concept of Bliss independence [49], which is used in pharmacology to assess whether the combined effects of multiple compounds are additive, synergistic, or antagonistic. The main assumption of the Bliss independence criterion is that two or more substances act independently of one another. Loewe additivity is suitable when the drugs have shared targets, while Bliss independence is more appropriate when each drug targets a distinct pathway [50]. The Bliss independence criterion is mathematically expressed through the Bliss equation. For a simplified example with two compounds (A and B), the Bliss equation is

$$E_{AB} = E_A + E_B (1 - E_A),$$

where $E_A$ and $E_B$ represent the effect of drug A at dose $a$ and drug B at dose $b$, respectively, expressed as fractions between 0 and 1, and $E_{AB}$ represents the combined effect of drugs A and B at doses $a$ and $b$. If the observed combined effect matches the value calculated from the equation, the compounds are acting independently. When drug A and drug B are used together, the effect of drug B is scaled by the proportion $(1 - E_A)$ that is "spared" by drug A. Summing these two terms ($E_A$ and $(1 - E_A) E_B$) gives the expected combined effect $E_{AB}$.
If the observed combined effect is greater than $E_A + E_B (1 - E_A)$, it signifies synergism. If the two values are equal, it suggests additivity; otherwise, it implies antagonism [51].
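A short worked example may help here. Writing the single-agent effects as fractions between 0 and 1 (E_A for drug A, E_B for drug B), the expected combined effect under independence is E_A + E_B(1 − E_A), and the difference between the observed and expected effect indicates the direction of the interaction. The function names and the numbers below are illustrative assumptions, not values from any study.

```python
def bliss_expected(e_a, e_b):
    """Expected combined effect under Bliss independence.

    e_a, e_b : single-agent effects as fractions in [0, 1]
    """
    return e_a + e_b * (1.0 - e_a)


def bliss_excess(e_ab_observed, e_a, e_b):
    # Positive: synergism; zero: additivity; negative: antagonism.
    return e_ab_observed - bliss_expected(e_a, e_b)


# Invented example: each drug alone inhibits 50% of cell growth.
expected = bliss_expected(0.5, 0.5)   # 0.5 + 0.5 * 0.5
excess = bliss_excess(0.9, 0.5, 0.5)  # observed 90% inhibition
```

Here the expected effect is 75% inhibition, so an observed 90% lies above the independence expectation and would be read as synergistic.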

ZIP:
The Zero Interaction Potency (ZIP) score is a valuable tool used to assess the synergistic or antagonistic effects of drug combinations. It combines the strengths of the Loewe and Bliss models, allowing for a systematic evaluation of various patterns of drug interaction. The ZIP score provides a numerical value ranging from -1 to 1, indicating the degree of synergy or antagonism observed in a drug combination. It is derived from the concept of zero interaction [52], which assumes that the potency of a drug's dose-response curve remains unaltered when combined with another drug [53].

HSA:
The Highest Single Agent (HSA) model, also known as Gaddum's non-interaction model [54], provides a simple approach to estimate the expected combination effect of multiple drugs. According to this model, the combination is scored by taking the difference between the combined response of the drugs, $E_{A,B,C,\ldots,N}$, and the maximum response observed among the individual drugs, $\max(E_A, E_B, E_C, \ldots, E_N)$. In other words, the HSA model assumes that the expected combination effect is equal to the highest response achieved by any single drug at the corresponding concentrations. The HSA model offers a straightforward way to estimate the expected outcome of drug combinations and serves as a baseline for assessing whether observed effects deviate from this non-interaction expectation [55].
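The HSA comparison is a one-line calculation, sketched below with invented numbers: the observed combined response is compared against the best response of any single drug at the corresponding concentration. The function name `hsa_excess` is an assumption made for this example.

```python
def hsa_excess(combined_effect, single_effects):
    """Observed combined effect minus the highest single-agent effect.

    A positive value means the combination outperforms its best component.
    """
    return combined_effect - max(single_effects)


# Invented example: the combination inhibits 75%, the drugs alone 25% and 50%.
gain = hsa_excess(0.75, [0.25, 0.5])
no_gain = hsa_excess(0.5, [0.25, 0.5])
```

Because the reference is only the best single agent, HSA is a lenient baseline: a combination can exceed it without exceeding the Bliss or Loewe expectations.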

Supervised drug synergy prediction in cancer
Supervised anticancer drug synergy prediction, driven by machine learning and artificial intelligence, typically involves training models on two distinct types of datasets: 1) in vitro experiments conducted on various cell lines, evaluating the synergistic effects of different drug combinations at varying concentrations using diverse reference models (e.g., Loewe, Bliss, ZIP, HSA), and 2) clinical trial studies of drug combinations in patient populations, comprising information on clinical response, treatment outcomes, and adverse effects.
In the clinical trial dataset, the prediction task is often treated as a classification problem, where the goal is to determine positive versus negative clinical outcomes for specific drug combinations. On the other hand, the in vitro experiments yield continuous measures of synergism, which can be approached as either a regression problem or, after categorizing the synergy measures, a classification problem.
In classification tasks, the synergy values of drugs on cell lines are grouped into either two categories (synergistic versus non-synergistic) or three categories (synergistic, additive, and antagonistic) by applying predefined thresholds to split the data. Nevertheless, establishing the most suitable threshold poses challenges and tends to differ for various synergy measures. For instance, DeepDDS [10] uses a threshold of zero to binarize the Loewe measure, with a score greater than or less than 0 indicating a synergistic or non-synergistic effect, respectively. Zhang et al. [8], on the other hand, categorized Loewe and ZIP synergy scores using quartiles. The highest quartile represents synergistic effects, and the lowest quartile represents antagonistic effects. Additionally, some other studies [7,56] consider Loewe synergy scores above 10 as synergistic and scores below 0 as antagonistic. Further, we will explore the challenges and significance associated with selecting an appropriate threshold.
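The three labeling schemes mentioned above can be contrasted in a few lines. The sketch below is illustrative only: the function names, the label "intermediate" for scores that fall between thresholds, and the example scores are assumptions, since the text does not state how the cited studies treat intermediate values.

```python
import statistics


def binarize(score, threshold=0.0):
    # Single threshold: above it is synergistic, otherwise non-synergistic.
    return "synergistic" if score > threshold else "non-synergistic"


def three_class(score, upper=10.0, lower=0.0):
    # Two thresholds; the handling of scores in between is an assumption.
    if score > upper:
        return "synergistic"
    if score < lower:
        return "antagonistic"
    return "intermediate"


def quartile_labels(scores):
    # Top quartile -> synergistic, bottom quartile -> antagonistic.
    q1, _, q3 = statistics.quantiles(scores, n=4)
    labels = []
    for s in scores:
        if s >= q3:
            labels.append("synergistic")
        elif s <= q1:
            labels.append("antagonistic")
        else:
            labels.append("intermediate")
    return labels


# Invented synergy scores for eight drug pair / cell line triplets.
scores = [-12.0, -3.0, 0.5, 2.0, 4.0, 8.0, 11.0, 25.0]
by_zero = [binarize(s) for s in scores]
by_fixed = [three_class(s) for s in scores]
by_quartile = quartile_labels(scores)
```

The same score can receive different labels under different schemes (a score of 4.0 is synergistic under the zero threshold but intermediate under the other two), which is one reason results reported with different thresholds are hard to compare directly.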
Once the class labels have been determined, various machine learning algorithms, such as random forests [57], support vector machines [58], or neural networks [58], can be employed to perform the classification task. These models learn patterns and relationships from features encompassing diverse information about drugs and cell lines to make predictions.
Regression, on the other hand, focuses on predicting a quantitative measure of synergy for each drug combination on a particular cell line. Instead of discrete class labels, regression models estimate the degree or magnitude of synergy, providing continuous output values. Regression techniques, including linear regression [59], gradient boosting, or deep learning approaches [58], have been used to predict the synergy level of drug combinations.
Some of the commonly employed models for feature extraction from the key factors that contribute to predicting drug synergy will be discussed below.

Feature extraction
Before addressing the role of methods such as GAT, MLP, and GCN in extracting valuable insights from the network to predict drug synergism, we introduce two concepts: the MLP and the heterogeneous graph. An MLP, or Multilayer Perceptron, is a type of artificial neural network that consists of several layers of interconnected nodes or neurons. A heterogeneous graph is a graph that contains more than one type of node or more than one type of edge.
Feature learning is a crucial aspect of drug synergy prediction, which involves leveraging various computational techniques and domain knowledge. A heterogeneous graph serves as an intricate network whose nodes represent distinct drug-related entities, such as drug-target proteins, drug transporters, and drug side effects, interconnected by edges representing their relationships. It has frequently been shown that incorporating diverse types of features that capture complementary information about drugs improves the prediction performance of drug discovery applications [60]. Therefore, it is important to incorporate an array of features associated with drugs and cell lines, including chemical structures, target proteins, gene expression data, and mutational profiles.
Methods such as the GAT, MLP, and GCN are adept at extracting valuable insights from these heterogeneous networks and embedding the characteristics of cell lines and drugs. By integrating various features, synergistic prediction models can capture the multifaceted aspects of drugs and cell lines, thereby advancing drug research and development through improved predictive accuracy [8,10,23,60,61].
Additionally, transformers and pre-trained models [62,63], as well as propagative network architectures [64], are commonly used for feature extraction. It is important to note that the size of the dataset and careful adjustment of model parameters are crucial when using these methods.

Review of GNN methods for predicting drug synergy
We conducted a comprehensive search of PubMed, Google Scholar, and Web of Science until July 2023, using the keywords 'graph', 'drug combination', and 'synergy', and screened retrieved articles with respect to their relevance to drug synergy prediction in cancers using GNNs. Overall, we identified 25 relevant articles within the timeframe of February 2020 to July 2023. We observed a sharp upward trend in the development of GNNs for drug synergy prediction (Figure 1a). Moreover, we collected machine learning studies related to drug synergy prediction from 2010 to 2023, not restricted solely to GNNs (supplementary Table 1). Interestingly, we observed that since the inception of GNNs in this field, their development for drug synergy prediction is approaching (and may soon surpass) the combined progress of all other machine learning methods, as depicted in Figure 1b. These trends underscore the significance and timeliness of our review in providing insights into this evolving landscape.

Drug combination prediction based on in vitro experiments
In this section, we examine research works centered on drug combination prediction using in vitro synergy experiments. Table 1 offers a comprehensive summary of these studies, outlining their individual merits and limitations. We organize these studies into two main sub-sections: classification and regression, as detailed in Table 1.

Classification methods
As detailed in Table 1, 16 studies have used classification to predict drug synergism. Of these, 5 studies have also developed regression-based models, which are covered in the next section. We grouped the remaining 11 studies by their underlying GNN architecture, namely GAT, GCN, and GAE, and summarize them below:

GAT-based methods
In four different models, researchers have used the GAT to extract important features. GAT's attention mechanisms allow it to focus on relevant parts of a graph and increase model performance. For instance, in the case of DeepDDS [10], GAT is used to gather important information from the structure of drugs. Similarly, SDCNet [82] applies GAT to extract relevant details from the drug-cell line network, creating important drug features. Additionally, Zhang et al. [8] and Hu et al. [62] take a knowledge graph (KG) approach, creating graphs and using Graph Attention Networks to gather valuable insights from these graphs. Here, we describe these models in more detail to provide a clear understanding of their methods.
However, new drug combinations could pose accuracy challenges.

GCN-based methods
GCN is used as a mechanism to extract meaningful features from complex relations in networks. Among the reviewed studies, GCNs are applied to drug-protein interaction networks or molecular structures [61,83,84]. In various models, including the model proposed by Hu et al. [7] and the MPFFPSDC model [83], the GCN encoder's pivotal role is to contextualize drug structures within networks. This enables the transformation of drug structures into embeddings in new spaces. In these models, the GCN is employed to extract higher-order neighbor feature representations for atoms in drug molecular structures. In the GraphSynergy model [84], a GCN is specially tailored to capture the connections between drugs and disease modules within the network. MOOMIN [24] learns drug representations by encoding properties of compounds and protein sequences into vertex features.
The framework of the DTSyn model consists of two paths: a fine-grained block and a coarse-grained block.
To use GCN, input chemical features are processed through GCN blocks and combined with gene

GAE and GCN encoder-based methods
The GCN encoder is designed to learn node embeddings that capture both the structure of the graph and the attributes associated with its nodes. It excels at uncovering relationships between nodes and using these relationships for predictive tasks by integrating feature information and graph structure [85]. On the other hand, the GAE acts as an autoencoder with the goal of creating a more condensed representation of the graph while also reconstructing the original adjacency matrix from the embeddings [86]. This reconstruction helps in inferring missing connections and gaining a comprehensive understanding of the graph's connections.
In the GAECDS model [87], GAE encodes drug combination information using the adjacency matrix and drug features. The encoded latent features are then used to reconstruct the drug synergy graph and uncover novel relationships. The model consists of three key parts: a GAE, an MLP, and a CNN. The GAE encodes drug synergy graphs and decodes them to find new relationships, an MLP generates cell line features, and a CNN predicts drug synergy by combining drug and cell line features. Jiang et al. [85] applied a GCN encoder to process diverse networks encompassing drug-drug synergy, drug-target interactions, and protein-protein interactions. This encoder transforms drug nodes into new-space embeddings. The model examines 39 heterogeneous networks, generating embeddings via GCN encoding. Finally, using these embeddings and predictive models, drug synergy is forecasted.
One classification study based on graph regularization was proposed by Lv et al. [64]. They collected antibiotic combinations and target information from the literature and described drug actions through network propagation and network proximity. The study focused on pairwise antibiotic combinations and quantified interactions based on the α-score. The model's goal was to predict synergistic antibiotic combinations by considering pharmacological similarity between drugs. The affinity matrix W was constructed to differentiate between pharmacologically similar drugs (potentially synergistic) and pharmacologically identical drugs (additive effect).
Table 1. Summary of drug combination prediction studies using GNNs. * Treatment and diagnosis: treatment refers to the medical procedures, therapies, or interventions provided to patients in order to manage, alleviate, or cure their health conditions; diagnosis refers to the process of identifying and determining the specific health condition or disease that a patient is experiencing [80].
** Single-agent activity: an individual drug or therapeutic compound that is used independently to treat a specific condition or disease [74].

Regression methods
Various approaches address drug synergy prediction as a regression problem. They fall into GAT-, GCN-, and GAE-based categories, each improving performance with a distinct architecture.

GAT-based methods
GAT is an attention mechanism that lets a model focus on important interactions and features, contributing to accurate synergy score predictions. In the model of Numcharoenpinij et al. [91], GAT is employed in a GNN built on the Message-Passing Neural Network (MPNN) framework; the attention weights guide the aggregation process, allowing the model to focus on critical interactions. In Muthene [90], GAT creates meta-path-specific embeddings for end/central nodes by assigning attention-based weights to neighbor features. In the CGMS model [88], GAT is used within the Heterogeneous Graph Attention Network (HAN), which has three layers and employs a self-attention mechanism to capture important information and produce cell line embeddings. Each model uses GAT differently within its architecture, which highlights its versatility.
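How attention weights guide neighbor aggregation can be sketched in a few lines of NumPy. This is a single-head layer in the style of the original GAT formulation, given here as an assumption for illustration; the function name `gat_attention`, the toy graph, and the random parameters are hypothetical and do not reproduce any of the reviewed models.

```python
import numpy as np

def gat_attention(adj, features, weights, attn_vec, slope=0.2):
    """Single-head graph attention: returns attention weights and aggregated features."""
    h = features @ weights                        # projected node features, shape (n, d)
    d = h.shape[1]
    src = h @ attn_vec[:d]                        # contribution of the receiving node i
    dst = h @ attn_vec[d:]                        # contribution of the neighbor j
    scores = src[:, None] + dst[None, :]          # e_ij = a^T [h_i || h_j]
    scores = np.where(scores > 0, scores, slope * scores)   # LeakyReLU
    mask = (adj + np.eye(adj.shape[0])) > 0       # attend to neighbors and self only
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    exp_scores = np.exp(scores)                   # exp(-inf) = 0 for non-neighbors
    alpha = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    return alpha, alpha @ h                       # weighted neighbor aggregation

# Hypothetical toy inputs: 4 nodes, 3 input features, 2 output features.
rng = np.random.default_rng(1)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = rng.normal(size=(4, 3))
weights = rng.normal(size=(3, 2))
attn_vec = rng.normal(size=4)                     # length 2 * output dimension

alpha, updated = gat_attention(adj, features, weights, attn_vec)
```

Each row of `alpha` sums to one over a node's neighborhood, so a large entry marks a neighbor that dominates that node's updated representation.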

GCN-based methods
GCNs excel at capturing complex interactions in graph data, which are common in drug synergy prediction models. In these models, GCNs are used to process the molecular structures of drugs or knowledge graphs (KGs) containing various entities and relationships. This ability to capture rich information from different networks makes GCNs a good choice for modeling complex biological relationships [8,61,63]. The PRODeepSyn model [61] leverages a GCN to construct gene hidden states based on the PPI network.
Zhang et al. [56] emphasize the pivotal role of GCNs in extracting valuable information from the constructed KG. In the TranSynergy model [63], the GCN is used to extract important features from the drug's molecular graph structure. In the HypergraphSynergy model [23], a GCN embeds drugs and cell lines: after a hypergraph is formed from drugs and cell lines, the GCN learns and records the node embeddings.
To predict drug synergy, PRODeepSyn first forms drug features from molecular fingerprints and descriptors. For cell line features, it combines gene expression, gene mutation, and interactions among gene products. A GCN is applied to create gene hidden states from the PPI network, taking protein interactions into account. These states are used to estimate each gene's evident state from omics data. Although these models are promising for predicting drug synergy, they have limitations that require further research, which we discuss below.

Drug combination prediction based on clinical studies.
Five methods utilized clinical studies to construct datasets of synergistic drug combinations, all employing GCNs as their primary neural network approach (Table 1).
The MK-GNN model [80] is a deep learning approach designed to predict effective drug combinations for patient treatment. It uses multi-head attention to learn patient features from diagnosis and treatment procedure sequences. Additionally, it incorporates prior medical knowledge derived from electronic health record data, considering the relationship between diagnoses and medications. The model also employs a GCN to learn medication representation vectors, capturing drug knowledge from a formulated drug network. However, the model's generalization is limited because drug combination recommendations vary among doctors and regions. To address this, future research aims to study feature invariance in drug combinations and improve the algorithm's applicability in real clinical settings.
Chen et al. [70] proposed a computational pipeline called DCMGCN for predicting drug combinations. The pipeline integrates diverse drug-related information to learn low-dimensional representations of drugs from attributes and similarity networks. The authors found that the drug-drug network was heterophilous and sparse, which could limit the effectiveness of the GCN. To address this, they introduced two modifications to the GCN. The drug representations were then optimized using the modified GCN (MGCN) to predict drug combinations. By integrating various data types, including clinical data, DCMGCN becomes a powerful tool for drug discovery and repositioning, with potential for further extension by incorporating more heterogeneous information and experimental validation.
The ComboNet model [74] is designed to jointly learn drug-target interactions and drug-drug synergy. It comprises two components: a drug-target interaction (DTI) module and a target-disease association module. This architecture enables the model to use data on drug-target interactions, single-agent antiviral activity, and available drug-drug combination datasets. The DTI network in ComboNet predicts likely targets for drugs, while the target-disease association network models how biological targets and structural features of molecules relate to antiviral activity and synergy. The model's strength lies in considering single-agent activity, which improves drug combination predictions against SARS-CoV-2. However, a limitation is the scarcity of training data for accurate drug synergy prediction.
The MG-DDIS model [81] is an end-to-end multi-task learning framework based on a GCN for predicting DDIs and synergistic drug combinations. To capture important information from the molecular structures, the model applies the R-radius subgraph method, producing a series of subgraphs for each drug. These subgraphs are then fed into the GCN encoder to learn a latent representation of drugs. The model is trained with a multi-task approach to predict DDIs and synergistic drug combinations simultaneously. Despite its success, the model is limited by the possibility that adverse reactions arise from factors unrelated to synergy, such as individual drug sensitivity and the independent toxic properties of certain drugs. Across these models, the appeal of GCNs lies in their ability to capture complex relationships and structural information from different types of data, such as molecular graphs, networks, and clinical information.
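The idea of an R-radius subgraph can be sketched as a breadth-first search that collects every atom within a fixed number of bonds of a center atom. This is a generic illustration under that assumption, not the MG-DDIS implementation; the function name and the three-atom toy molecule are hypothetical.

```python
from collections import deque

def r_radius_subgraph(bonds, center, radius):
    """Return the sorted atoms lying within `radius` bonds of `center`."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if distance[atom] == radius:
            continue                              # do not expand beyond the radius
        for neighbor in bonds[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return sorted(distance)

# Hypothetical molecule: heavy atoms of ethanol, C0 - C1 - O2.
bonds = {0: [1], 1: [0, 2], 2: [1]}

# One subgraph per atom yields a series of subgraphs for the molecule.
subgraphs = {atom: r_radius_subgraph(bonds, atom, radius=1) for atom in bonds}
```

With radius 1, the terminal carbon sees only itself and its neighbor, while the central carbon sees the whole molecule; larger radii trade locality for context.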

Evaluation of GNNs on in vitro datasets
In this section, we discuss the findings presented in Table 3, which summarizes the results of various drug combination prediction studies. By reviewing and analyzing these results, we aim to gain insight into the challenges and advances in this field.
Both the DeepDDS [10] and Hu et al. [62] models employ GAT-based classification approaches.
In leave-one-cell-line-out evaluation, SDCNet excels. This is because SDCNet processes features individually for each cell line, considering the interaction type of the compounds (synergistic or antagonistic). Overall, SDCNet's success is attributed to its specialized feature processing for different cell lines. The DeepDDS study [10] highlights an intriguing observation about model performance: when the model's complexity increases and the features become excessively high-dimensional, performance can actually suffer. A comparison between DeepDDS and TranSynergy underscores this point. In TranSynergy, features are not only high-dimensional but also embedded using a transformer. On the Merck dataset, DeepDDS outperforms TranSynergy, which suggests that overly complex models and extensive feature dimensions do not always yield better results. Among other GAT-based models, KGANSynergy [8], which extracts drug and cell line features from a knowledge graph using attention, performs better than GraphSynergy [84].
The MPFFPSDC model [83], which is based on the GCN approach, outperforms DeepDDS on the Merck dataset, although the two models achieve almost the same results. The small advantage could be due to differences in how features are integrated for classification, since both models use almost the same methods to extract features from drugs and cell lines.
DTSyn [7] extracts drug features using cell line data and the known labels of the training data. However, it is less accurate than other machine learning models at predicting synergy scores for drugs it has not seen before. The MOOMIN model [24] lacks a defined threshold for categorizing drug synergism. It employs a random walk on a drug-target network and a GCN to embed drugs and capture structural features. However, its performance is comparatively weaker than that of other GCN-based models because it lacks cell line features and comprehensive drug-related information. GAECDS, a GCN-based model, classifies drug combination data from DrugComb using a threshold of 0. While a fixed threshold can introduce noise, GAECDS outperforms both DeepDDS and GraphSynergy, which use the same dataset and threshold. This improved performance might stem from GAECDS's use of a GAE on the drug-drug synergy network, which effectively separates drug combinations in a new data space.
Using an attention-based approach and meta-paths on a heterogeneous graph of drug and cell line connections, the CGMS model [88] outperformed PRODeepSyn [61], TranSynergy, and DeepDDS. This suggests that CGMS predicts drug synergy effectively, surpassing existing methods. Numcharoenpinij et al. introduced a GAT-based regression approach that uses autoencoders to capture key features of cell lines.
This method demonstrated higher accuracy than other models, although the specific metric type of the dataset was not specified. Notably, the GAT-based approach outperformed DeepDDS, exhibiting lower error. The Muthene model uses adverse and therapeutic effect data as synergy information for drug combinations, which has led it to outperform models such as CGMS with lower errors in predicting drug synergy.
MGAE-DC [9] is a GAE-based model that has shown lower error rates in regression than PRODeepSyn, HypergraphSynergy, and DeepDDS. However, in classification mode, its results are comparable to those of PRODeepSyn. This may be due to an imbalance in the data. The embeddings produced by GAE and GCN encoders appear to work similarly. Zagidullin's model addresses the optimal selection of drug fingerprints for predicting drug synergy; it achieved the lowest error on DrugComb data with 1024-bit E3FP fingerprints generated from SMILES strings.
As discussed earlier, four studies on clinical data were analyzed for synergy prediction with graph-based models, and their classification results are shown in Table 3.
Table 3. Performance evaluation of drug combination studies using GNNs. * α-score: for each drug pair, a drug interaction score (α-score) quantifying the concavity of the isophenotypic curve was computed [93]. ┼ Youden's J statistic is a commonly used criterion for determining the optimal threshold in various scenarios, including the categorization of synergy scores [94].
Color legend of classification metrics: 0

Conclusions
In conclusion, our study addressed computational models for predicting drug synergy through graph learning networks. We explored commonly used datasets and related metric types, covered various GNN models, and categorized and clarified existing research in this field. By carefully examining the limitations and strengths of current models, our analysis provides insights for future researchers in this evolving field.
As mentioned, one critical step in the analysis is determining the threshold for classifying drug synergy scores as synergistic or antagonistic (or non-synergistic), which can significantly affect the results. When the distribution of synergy scores in the dataset is imbalanced, a single threshold might not capture the complexity of the data. Using multiple thresholds can help address this issue and provide a more accurate representation of the underlying biology by accounting for different levels of synergy and antagonism. In KGANSynergy [8], synergy scores represent the degree of synergy between drug pairs and cell lines, and the goal is to categorize these scores into synergistic and antagonistic groups based on a threshold value. Using Youden's J statistic to determine the synergy threshold helps ensure that the classification decision accurately represents the degree of synergy between drug pairs and cell lines. It considers both positive and negative cases and aims to balance sensitivity and specificity.
When preparing data for analyzing drug synergy in cell lines, a significant challenge arises with datasets that contain multiple synergy scores for the same sample. For instance, in the DeepDDS model by Wang et al., when a sample had multiple synergy scores under the same metric, the authors computed the average of the scores and treated it as the sample's synergy score. However, this approach can introduce issues during model training and evaluation, and attention to this matter is vital to ensure the reliability and accuracy of the analysis outcomes.
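The threshold choice discussed above can be made concrete with a small sketch of Youden's J statistic, J = sensitivity + specificity - 1, maximized over candidate cut-offs. The synergy scores and reference labels below are invented for illustration, and the function name is hypothetical; this is not the procedure of any specific reviewed study.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is called synergistic when its score is >= threshold.
    `labels` holds reference classes: 1 = synergistic, 0 = not synergistic.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:                            # keep the first best cut-off
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# Hypothetical synergy scores with hypothetical reference labels.
scores = [-12.0, -5.0, -1.0, 2.0, 4.0, 9.0, 15.0, 30.0]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, j_stat = youden_threshold(scores, labels)
```

On this toy data the selected cut-off is 4.0 with J = 0.8: every synergistic sample is retained and only one non-synergistic sample is misclassified. A fixed cut-off of 0 would misclassify two.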
Analysis of these studies emphasizes the value of integrating diverse cell line characteristics, such as gene expression and tissue type, with drug characteristics, such as target proteins and side effects. This combination improves the accuracy of drug synergy prediction. Striking a balance is important: while enriched features are beneficial, overly complex models can degrade results. GAT shines in scenarios involving complex graphs such as knowledge graphs, where it efficiently collects multivariate information from the graph structure and makes predictions more robust. Understanding the complexity of the data and using appropriate techniques, such as attention mechanisms, are critical to successful prediction. Moreover, GAE is very useful for extracting embeddings from various graphs: it considers node types and edge connections and generates corresponding embeddings that improve drug synergy predictions.
An important factor in choosing the right model is understanding the type and structure of the network being investigated, which requires knowing the basic characteristics inherent in the network. An illustrative example is identifying "cliques" in the graph [95]. A clique is a set of closely interacting nodes, which can represent entities such as drugs, genes, or proteins.
Additionally, identifying "dominant sets" can uncover subsets of nodes that have significant influence on other nodes within the graph. Applying these techniques gives a deeper understanding of the intrinsic properties of the network. With this insight, graph learning models can be tailored to the specific characteristics of the network and used more effectively to extract meaningful insights.
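Sets of closely interacting nodes of this kind correspond to what graph theory calls maximal cliques, which can be enumerated with the basic Bron-Kerbosch recursion. The sketch below is a plain-Python illustration on a small invented network; the node names are hypothetical, and for large biological networks a pivoting variant or a graph library would be a more practical choice.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch).

    `adjacency` maps each node to the set of its neighbors.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))       # nothing can extend this clique
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}      # node has been fully explored
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return sorted(cliques)

# Hypothetical network mixing drug, gene, and protein nodes.
network = {
    "drugA": {"drugB", "drugC"},
    "drugB": {"drugA", "drugC"},
    "drugC": {"drugA", "drugB", "geneX"},
    "geneX": {"drugC", "protY"},
    "protY": {"geneX"},
}

cliques = maximal_cliques(network)
```

Here the three mutually connected drugs form one clique, while the remaining edges form two-node cliques; clique sizes and overlaps give a quick picture of how densely the network is organized.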
Given the lack of features for cell lines, analyzing pathways related to cell lines appears to be a way to extract effective features. Instead of directly using raw gene expression values, features can be extracted from the activity levels of specific pathways. These features can depict the functional behavior of the cell line in the context of cancer-related pathways [96]. In recent years, the AlphaFold model has also been widely used to predict protein structures. Using AlphaFold [97] to predict the structures of proteins that are not otherwise available and, finally, their permutation can be a way to predict the synergism of drugs on cell lines.
Predicting synergistic drug combinations through machine learning techniques relies on the availability of a gold standard training dataset. Typically, such datasets fall into one of two categories: 1) those encompassing drug pairs and their corresponding synergy metrics derived from various cell lines (in vitro screening experiments) or 2) datasets derived from clinical trials, where drug combinations are associated with positive or negative clinical outcomes. Table 2 provides a comprehensive overview of the diverse datasets used in both in vitro screening and clinical studies, offering insights into the number of drugs, cell lines, drug pairs, samples, and pertinent references.

Figure 2 visually illustrates the use of in vitro screening datasets by different GNN models, taking into account the dataset size employed by each study. It is worth noting that various studies may have applied filtering strategies or other data preprocessing techniques, so that only specific subsets of a dataset were used. Notably, Figure 2 highlights the frequent usage of the Merck dataset [65] and the DrugComb database [66]. The latter, in particular, consolidates multiple drug synergy datasets, substantially expanding the training set and consequently establishing itself as the commonly favored dataset for synergy prediction modelling.

Figure 1. The growth of studies related to Graph Neural Network (GNN) models for predicting drug combinations. (a) The cumulative count of published studies over time. The counts are segmented within each half-year period starting from the first study's publication in 2020 until the end of Q2, 2023. (b) The rising use of GNNs in drug combination prediction, as compared to alternative computational methods.

Figure 2. Dataset sizes across different drug combination studies, limited to studies using in vitro datasets as datasets using clinical records are not consistent across different studies.
The DeepDDS model employs two different types of GNNs, GAT and GCN, which are assessed for extracting features from the molecular graphs of drugs. The genomic characteristics of cancer cells are encoded using an MLP. The resulting embeddings are combined to create the final feature representation for each combination of drugs and cell line. These features then pass through fully connected layers to classify drug pairs as either synergistic or antagonistic.
Hu et al. proposed a model using a heterogeneous graph with drug, protein, and disease nodes. It employs GNNs for message passing, refining node embeddings through layers of attention-based mechanisms. This improves the quality of the embeddings, which are later combined for synergy prediction through an MLP module. The model predicts drug combination effects effectively by leveraging GNNs and pre-trained models.
KGANSynergy has three main steps: KG hierarchical propagation, a KG attention layer, and prediction. The model explores relationships between drugs, cell lines, proteins, and tissues. The attention layer updates entity representations using neural network-based attention, and the prediction layer calculates synergy scores.
SDCNet uses a GCN to predict cell line-specific drug combinations (SDCs) without requiring cell line data. It models synergy as one graph per cell line and treats the cell lines as relations. An R-GCN captures combination traits within each relation as well as invariant patterns. SDCNet, an encoder-decoder network, learns drug embeddings and forecasts SDCs across cell lines. This method balances cell-specific and invariant features.
embeddings. These features are then fed into the fine-grained Transformer encoder block, which learns chemical substructures and gene interactions. Finally, by aggregating features and using an MLP, it predicts synergy.
Bao et al. in 2023 proposed MPFFPSDC, a model for predicting drug synergy. It employs GCNs and an MLP to extract features from drug graphs and cell lines. The model aggregates these features to classify drug pair synergy using a classifier module. The MOOMIN model considers the cell type when creating drug combination representations, which leads to a scoring function that predicts synergy for new drug pairs. Yang et al. proposed GraphSynergy for predicting effective drug combinations in cancer using the Protein-Protein Interaction (PPI) network. It uses a GCN to capture drug-disease connections, an attention mechanism to highlight key proteins, and two scores to evaluate therapy and toxicity.
Numcharoenpinij et al. incorporate genetic data from the Cancer Cell Line Encyclopedia (CCLE), including gene expression, copy number variation, and somatic mutation. To reduce dimensionality while retaining crucial details, they employ three kinds of autoencoders: deep, sparse, and deep sparse. For drug information, two representations are used: Extended Connectivity Fingerprints (ECFPs) and molecular graphs. Their architecture encompasses DNNs and autoencoders for processing genetic and drug data. To predict synergy, they employ a GNN framework based on the MPNN concept.
Muthene predicts drug combination effectiveness by identifying shared mechanistic traits between adverse events (AEs) and therapeutic effects (TEs). It tackles both tasks using meta-path schemas, capturing drug-target interactions and mechanisms of action (MoAs). The model generates drug embeddings from meta-paths and chemical features, predicting AE probabilities and therapeutic synergy. However, it cannot forecast synergy for new drugs or unseen cell lines.
The CGMS model predicts anti-cancer synergistic drug combinations using a complete graph. This graph integrates cell lines and drugs through different meta-paths, representing drug-cell line interactions and drug-drug interactions. Employing the HAN, the model generates whole-graph embeddings hierarchically, capturing important graph information.
Finally, PRODeepSyn forecasts synergy scores using a DNN that takes both drug features and cell line embeddings as inputs. The KGE-DC model uses a KG containing drugs, targets, enzymes, and transporters to predict synergy. GCNs extract features from the KG, improving information extraction. Drug embeddings and cell line gene expressions are integrated, and a neural network predicts synergy scores. Liu et al. use a drug synergy hypergraph with drugs and cell lines as nodes and hyperedges for synergistic relationships. A GCN learns embeddings for drugs and cell lines, capturing hypergraph features. These learned features represent drugs. Gene expression features of cell lines are captured via a network. Finally, matrices of drug and cell line features enter the hypergraph network for predicting drug synergy scores. The TranSynergy model employs a transformer to analyze drug and cell line data while integrating drug target profiles for comprehensive features. It enhances cell line representations using gene expression data. Additionally, a GCN is used to extract drug features from drug structures.

GAE-based methods
GAE acts as a transformative tool in the two regression models investigated. In the MGAE-DC model [9], a GAE encodes drug combinations to learn drug embeddings. In Zagidullin et al. [92], a GAE transforms molecular structures into fingerprints. By treating synergistic, additive, and antagonistic combinations as distinct input channels, MGAE-DC improves the ability of the drug embeddings to differentiate between synergy and non-synergy. This improved detection is achieved via a GAE. Using concatenated embeddings, drug fingerprints, and cell line features, the prediction module predicts synergy scores. Zagidullin et al. proposed an approach in which genetic and drug data are used to predict drug combination synergy scores. Genetic data describe the cancer cell lines, while drug data include molecular structures. The model employs a GAE to encode drug structures, yielding synergy predictions. While this work focused solely on comparing fingerprint types, future research could explore combining molecular structures or investigating other molecular features.
Karimi et al. [79] developed a deep generative model for drug combination design, which incorporates graph-structured domain knowledge and a reinforcement learning-based (RL-based) chemical graph-set designer. The underlying idea is to design an RL-based drug combination generator that can satisfy certain requirements for effective drug combination therapy. GCNs are applied to these graphs to extract features from each drug's molecular structure. The generator agent decides how to connect different subgraphs within each drug graph, guided by the learned features and the multi-objective rewards. These rewards include a chemical validity reward, the GS-WGAN adversarial reward, and the network principle-based reward. The generator aims to optimize its actions based on these rewards, resulting in drug combinations that meet the defined criteria.
If the combined experiments are carried out in concentrations with dose   , we have according to Loewe
Evaluated on the AstraZeneca dataset, the GAT-based classification models DeepDDS [10] and Hu et al. [62] differ notably in their results. Hu's model achieves an AUC of 0.84, while DeepDDS achieves an AUC of 0.66. Additionally, Hu's model obtains a higher AUPR score. When the two models are compared using the same cross-validation folds and the same training dataset, Hu's model still outperforms DeepDDS. This might be attributed to Hu's more comprehensive feature extraction process. Hu's model incorporates diverse features from drugs, cell lines, and diseases, using pre-trained models in a heterogeneous graph to provide node features. In contrast, DeepDDS focuses on drug features extracted solely through GAT and GCN from the drug's structure. This suggests that Hu's approach, which incorporates a wider range of features and relationships, yields better predictive performance than DeepDDS's more focused feature extraction. Hu's model was also compared to the TranSynergy [63] model using 10-fold cross-validation on the DrugComb dataset, where it outperformed TranSynergy in predicting drug combination synergism. This superiority is attributed to Hu's model's use of comprehensive drug and cell line features, which improved its ability to identify synergistic effects compared to the TranSynergy model that solely relied on drug target
AUPR, and F1-score compared to DeepDDS and Jiang's model. Because the data are imbalanced, accurately detecting positive cases (synergistic combinations) matters more in drug synergy prediction than detecting negative cases, so metrics such as AUPR and F1-score are used to evaluate the models fairly. These measures emphasize positive samples, which makes them suitable for imbalanced datasets. If the SDCNet model is trained with appropriate data, it can effectively predict drug synergy. Notably, the DeepDDS model outperforms SDCNet in leave-one-drug-out evaluation, likely because DeepDDS's performance isn't heavily reliant on the training data.
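Why positive-focused metrics matter on imbalanced data can be shown with a toy example in which two classifiers share the same accuracy but differ sharply in F1-score. The labels and predictions below are invented for illustration, and the helper names are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (synergistic) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical imbalanced test set: 2 synergistic pairs out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                        # never predicts synergy
one_hit = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]          # one true hit, one false alarm

acc_a = accuracy(y_true, always_negative)
acc_b = accuracy(y_true, one_hit)
f1_a = precision_recall_f1(y_true, always_negative)[2]
f1_b = precision_recall_f1(y_true, one_hit)[2]
```

Both classifiers are correct on 8 of 10 samples, yet the first never finds a synergistic pair and scores an F1 of 0, while the second scores 0.5. Accuracy alone hides this difference.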