1 Introduction

Combination therapy, a treatment modality that combines two or more therapeutic agents, has increasingly become the preferred approach for many human diseases, especially those caused by alterations in multiple genes or pathways, such as cancer. The integration of anti-cancer drugs enhances efficacy compared to using a single therapy, as it targets different key pathways in a synergistic or additive manner. By combining drugs with distinct mechanisms of action, therapeutic effectiveness can be enhanced, allowing for lower-dose prescriptions, and reducing the potential risks of side effects and toxicity. Clinical evidence consistently demonstrates the utility of combining different therapeutics to improve treatment efficacy in various cancer types, such as breast cancer (Fisusi and Akala 2019), lung cancer (Molina-Arcas et al. 2019), and ovarian cancer (Lui et al. 2020), among others.

However, the search for effective combinations is hindered by the sheer number of potential drug pairs, leading to a combinatorial explosion (Azad et al. 2021b; Gilvary et al. 2019). It is infeasible to experimentally screen the enormous search space of all possible drug combinations. Consequently, the development of computational models to identify potential anti-cancer synergistic drug combinations efficiently and accurately has garnered significant attention from both the scientific community and the pharmaceutical industry. With the increasing availability of large-scale high-throughput screening datasets for identifying synergistic drug combinations, a growing number of artificial intelligence (AI) methods are being employed for in silico predictions of efficacious drug combinations (Hosseini and Zhou 2023; Hu et al. 2022; Zhang et al. 2023; Zhang and Tu 2023; Wang et al. 2022a).

Among different AI models, GNNs have emerged as a powerful class of artificial neural networks designed to process and learn from data structured as graphs. Graphs consist of nodes (vertices) connected by edges (links or relationships), and they are widely used to represent complex relationships and interactions between different entities. Due to their versatility, GNNs have found applications in various fields, including computer vision, natural language processing, social network analysis, bioinformatics, and drug discovery, among others (Zhou et al. 2020).

The increasing importance and application of AI and machine learning in drug discovery have prompted different review articles outlining various data sets, machine learning algorithms, and deep learning models developed to predict synergistic drug combinations in cancer (Torkamannia et al. 2022; Wu et al. 2022; Pearson et al. 2023; Kumar and Dogra 2022). For instance, Torkamannia et al. (Torkamannia et al. 2022) comprehensively reviewed a wide array of drug development data sources, encompassing biological datasets like molecular omics data, drug target information, and molecular interactions, as well as datasets containing high-throughput in vitro screening of drug combinations. Additionally, they presented an overview of the literature on computational methods designed for drug synergy prediction, broadly categorized into deep learning (DL), traditional machine learning (ML), and network-based methods.

Around the same time, Wu et al. (2022)- performed a similar review of machine learning methods used in drug combination prediction across algorithmic categories of systems biology or network-based methods, kinetic models, mathematical models, stochastic search algorithms, classic machine learning, and deep learning methods. They summarized 29 studies, providing details of their respective algorithms, drug combination datasets, input data types, and the availability of program code.

Further, Kumar et al. (2022) conducted a review focused on deep learning-based techniques for the prediction of synergistic drug combinations in cancer. They performed a comparative analysis of prediction techniques based on various performance measures. Additionally, they covered the theoretical aspects of drug synergy and scoring models at length with their mathematical formulations. However, all these reviews were conducted before the surge of GNN techniques in drug discovery. Therefore, they neither adequately cover GNN-related drug combination prediction studies nor represent recent advancements in GNN algorithms.

The rise in the use of GNNs in drug discovery is due to their ability to handle and interpret complex data, such as molecular graphs and biological networks (Bongini et al. 2021; Zhao et al. 2021). GNN-based models have demonstrated high performance and have yielded promising results in various aspects of drug discovery, including virtual screening, molecular property prediction, protein–ligand binding prediction, and drug repurposing (Son and Kim 2021; Wang et al. 2022c; Krasoulis et al. 2022). While GNNs have shown promise in capturing relationships and interactions in various domains (Nguyen et al. 2021; Cai et al. 2021a; Zhao et al. 2021), their specific effectiveness in the context of drug synergy prediction is still an active area of exploration. Considering the increasing use of GNNs in drug synergy prediction, their proven efficacy compared to the widely-used high-performing methods (such as MatchMaker (Kuru et al. 2021), DeepSynergy (Preuer et al. 2018), DTF (Sun et al. 2020b), among others) (Hu et al. 2022; Liu et al. 2022; Rozemberczki et al. 2022; Zhang et al. 2023), and the growing importance of drug combination discovery in both research and industry (Alves et al. 2022), there is a pressing need for a comprehensive review study that focuses on the latest advances in the field and provide insights into the future directions.

This study aims to bridge this gap by providing an in-depth review of the advancements, challenges, and potential of GNN-based approaches in drug synergy prediction. By examining existing literature, we discuss the strengths and limitations. Our review assesses the effectiveness of GNN-based methods, highlighting their reported performance and competitiveness compared to the state-of-the-art machine learning models. Furthermore, we emphasize the importance of systematic comparisons to guide researchers and industry professionals in selecting appropriate methods. This review serves as a valuable resource for those seeking a deeper understanding of the role and capabilities of GNNs in identifying synergistic drug combinations.

2 A brief overview of GNNs

GNNs have emerged as a powerful category of neural networks specifically designed to process data organized in graph structures. Unlike traditional neural networks which are primarily tailored for processing vector or matrix data, GNNs excel at capturing intricate relationships and dependencies between entities within a graph. At the core of GNNs is the fundamental concept of learning representations for each node by aggregating information from its neighboring nodes. These representations are then leveraged to perform prediction and classification tasks. By effectively encapsulating both local and global contexts within a graph, GNNs enable the modeling of complex interactions and dependencies, making them highly versatile across a wide range of applications (Zhou et al. 2020). In the following, we outline core mechanisms and commonly used architectures of GNNs.

2.1 GNN core mechanisms

The section provides a brief overview of the fundamental components that enable GNNs to process graph-structured data. These core mechanisms are essential for understanding how GNNs capture relationships and dependencies within graphs and form the foundation for various GNN architectures and algorithms.

Message passing function is a key mechanism in GNNs that updates node embeddings through an iterative process. Each iteration involves two main steps: aggregating messages and updating node embeddings. During message passing, each node gathers information from its neighboring nodes and combines it into a message. This message carries information from nearby nodes and edges. It is used to update the embedding (i.e., low dimensional numerical representation) of each node u at iteration k denoted as \(h_{u}^{k}\) based on the embeddings of nodes in its neighborhood \(N\left( u \right)\). This update can be represented as

$$h_{u}^{k + 1} = {\text{update}}\left( {h_{u}^{k} , {\text{aggregate}}\left( {h_{v}^{k} , \forall v \in N\left( u \right) } \right)} \right) = {\text{update}}\left( {h_{u}^{k} , m_{N\left( u \right)}^{k} } \right)$$

where \({\text{update}}\left( . \right)\) and \({\text{aggregate}}\left( . \right)\) are arbitrary differentiable functions (typically neural networks). The message \(m_{N\left( u \right)}\), the output of the \({\text{aggregate }}\) function, encompasses the information gathered from \(u\)’s neighbors in the graph. The aggregation step is responsible for merging information from neighboring nodes, while the update step iteratively improves node embeddings across layers. This iterative process enables GNNs to capture intricate relationships and dependencies within the graph (Hamilton, 2020).

Aggregation function is responsible for combining information from a node’s neighboring nodes to produce a single vector representation. Traditional aggregation methods, such as summing or averaging over the neighbor embeddings, may fall short in capturing the intricate nature of the graph structure and relationships between nodes. However, employing more advanced aggregation techniques can enhance the performance of GNNs. One approach to defining an aggregation function is through the concept of permutation invariant neural networks. This approach treats the set of neighbor embeddings \(h_{v} , \forall v \in N\left( u \right)\) as an unordered set that is invariant to permutations and maps this set to a single vector representation \(m_{N\left( u \right)}\). A universal set function approximator, as shown by Zaheer et al. (2017), is an aggregation function that can approximate any permutation-invariant function, mapping a set of embeddings to a single embedding. This can be represented as

$$m_{N\left( u \right)} = {\text{MLP}}_{\theta } \left( {\mathop \sum \limits_{v \in N\left( u \right)} {\text{MLP}}_{\phi } \left( {h_{v} } \right)} \right)$$

where \({\text{MLP}}_{\theta }\) and \({\text{MLP}}_{\phi }\) are multi-layer perceptron parameterized by trainable parameters \(\theta\) and \(\phi\), respectively. Node-level aggregation is a common approach that combines information from neighboring nodes to compute representations for individual nodes. This method treats nodes as unstructured entities and does not explicitly consider the graph structure during aggregation. On the other hand, graph-level aggregation takes into account the local structural information during the aggregation process. It goes beyond simple node-level aggregation and considers the relationships and connectivity between nodes to perform higher-order graph aggregation. This results in a more comprehensive and structured representation of the graph (Yang et al. 2022; Cai et al. 2021b, Hamilton, 2020).

Node or graph representations in GNNs involve learning through the aggregation of information from neighboring nodes. This enables each node to update its representation, capturing both local and global dependencies. For example, the two-dimensional structure of any chemical compound can be represented as a graph, and its node representation can be articulated through a detailed description of chemical properties and atomic bonding characteristics. Additionally, employing this representation for drug candidates and comparing them with existing drugs can be instrumental in identifying potential drugs with a high probability of success. The essence of node and graph representations in GNNs lies in leveraging neural networks to learn expressive features from graph-structured data (Khoshraftar and An 2022).

Attention mechanism in GNNs is a technique that allows the network to assign different importance weights to nodes or edges within a graph during the aggregation step. This mechanism enables the network to focus on the most relevant information and adaptively weigh the influence of different components of the graph. The basic idea behind the attention mechanism is to compute attention weights for each neighbor in the graph, which are used to weigh their contributions during the aggregation process. These attention weights are learned by the network and can be based on various factors such as the similarity, relevance, or importance of the neighbors. The attention weights reflect the importance or significance of each neighbor with respect to the current node being processed. In GNNs, attention can be applied in various ways, but a common approach is to compute the attention weights as a function of the node features and/or edge connections. This is typically done using trainable parameters such as weight matrices and attention vectors. The attention weights are then used to compute a weighted sum or aggregation of the neighbor embeddings, where the weights determine the contribution of each neighbor to the final aggregated representation (Zhang and Xie 2020, Hamilton, 2020).

2.2 GNN architectures

Notable advancements of graph neural networks in recent years have resulted in the development of various architectures that tackle different aspects of graph-structured data. Below, we provide a summary of some main GNN architectures that have found applications in drug combination prediction.

Graph Convolutional Networks (GCNs) are a variant of Convolutional Neural Networks (CNNs) designed to operate on graph-structured data. GCNs leverage both the node features and the graph structure to learn a latent representation that captures the underlying relationships and dependencies within the graph. In GCNs, the input consists of a node feature matrix \(X\), which contains the features of each node, and an adjacency matrix \(A\), which encodes the relationships or similarities between pairs of nodes. The goal is to learn a latent representation \(Z\) that preserves important information from both the node features and the graph structure. The key idea behind GCNs is to propagate and aggregate information from neighboring nodes to update the node representations. By considering the local neighborhood information of each node, GCNs capture both the node features and the relationships among nodes (Hell et al. 2020; Liang et al. 2021).

Graph AutoEncoders (GAEs) are unsupervised learning frameworks used to learn low-dimensional representations of graph-structured data. The core idea behind GAEs is to encode the graph information into a compact representation and then reconstruct the original graph from this representation. In GAEs, a GCN is typically used as an encoder to transform the input graph into a latent representation. The encoder takes into account the node features and the adjacency matrix of the graph to learn informative node representations. The latent representation, denoted as \(Z\), is obtained from the GCN encoder. The goal of GAEs is to capture the inherent relationships and dependencies within the graph, allowing for meaningful analysis and prediction tasks (Lin et al. 2023; Liang et al. 2021).

Graph Attention Networks (GATs) are neural networks designed to operate on graph-structured data by leveraging the concept of attention. GATs assign different weights, called attention coefficients, to the neighboring nodes during the process of central node information aggregation. In GATs, each node in the graph undergoes linear transformations and is mapped to a learnable vector using a single-layer neural network called the mapping function \(f_{a}\). The attention coefficient \(\alpha_{ij}\) represents the influence of node j on node i and is calculated based on the transformed node representations. The final output embedding of the central node is obtained by taking a weighted summation of the representations of its neighboring nodes. The weights for the summation are determined by the attention coefficients. This allows the GAT to focus on the most relevant and informative neighboring nodes for each central node (Veličković et al. 2017; Shao et al. 2022).

Graph SAmple and aggreGatE (GraphSAGE) is a general framework for inductive node embedding, which aims to learn representations for nodes in a graph. Unlike traditional approaches that rely solely on the graph's structure, GraphSAGE uses both the topological structure and the node features to generate embeddings that generalize to unseen nodes. The core idea behind GraphSAGE is to leverage aggregator functions instead of training individual embedding vectors for each node. In this way, GraphSAGE can effectively capture and utilize the collective knowledge of a node's local neighborhood (Hamilton et al. 2017).

Overall, these models collectively contribute to the advancement of graph-based learning in pharmacology by providing insights into complex drug networks and facilitating the discovery of effective drug combinations (Jiang et al. 2021; Sun et al. 2020a; Nguyen et al. 2021).

Graph Regularization is a technique used in optimization problems to impose desired properties on solutions with respect to a graph structure. Graph regularization is closely related to GNNs because both approaches deal with graph-structured data and aim to capture relationships and interactions between objects represented by nodes in the graph. GNNs use message passing and aggregation mechanisms to update node embeddings based on their graph neighborhoods, while graph regularization incorporates graph information into optimization problems to guide the solutions towards desired properties. Both methods leverage the inherent graph structure to improve the handling of complex relationships and interactions within the data. For instance, if we have knowledge that the signal should have sparsity (i.e., few non-zero values), we can introduce a regularization term that encourages sparsity in the solution (Lee, 2021).

3 Drug combination synergy prediction

The schematic view of drug combination synergy prediction is depicted in Fig. 1. A drug combination, as defined by the FDA (Food and Administration 2018), involves the combination of two or more regulated components, such as drugs, devices, or biologics. These components are physically or chemically mixed to create a single entity. When multiple drugs are administered simultaneously, a synergistic drug combination occurs, resulting in a stronger therapeutic effect that surpasses the mere sum of their individual effects. In simpler terms, the combined impact of these drugs exceeds what would be expected by merely adding up their individual effects. On the other hand, an additive drug combination occurs when the combined effect of the drugs is equal to the sum of their individual effects. In this case, there is no enhancement or reduction in the overall effect when the drugs are used together. Conversely, an antagonistic drug combination exists when the combined effect of the drugs is lower than the sum of their individual effects. This happens when the drugs interfere with or counteract each other, resulting in a lower overall effect (García-Fuente et al. 2018). In experiments conducted on cancer cell lines, researchers utilize the in vitro method such as cell culture to assess the impacts of different combinations of drugs on key aspects such as restraining tumor growth, promoting cancer cell apoptosis, and preventing metastasis. (Mokhtari et al. 2017). In these studies, cancer cells are exposed to different concentrations of drug combinations and the collective effects they produce were analyzed. When drugs synergistically interact, they exhibit a more pronounced inhibition of cancer cell proliferation, or a heightened rate of cell death compared to their individual effects. Conversely, an antagonistic combination can diminish efficacy and potentially undermine the desired therapeutic outcome (Kucuksayan et al. 2021). In the following, we will explore quantitative measures of drug combination synergy.

Fig. 1
figure 1

Schematic view of relevant data and the generic pipeline of synergistic drug combination prediction using GNNs. a Datasets of drug combination synergism can be categorized into in vitro screening data and clinical trials studies (read more in Sect. 4.1). b Diverse types of data relevant to drugs, cell lines, diseases, and patients are often retrieved from multiple datasets, and whenever relevant, complex biological or chemical relationships are represented as graphs (read more in Sect. 3.3). c Various types of GNNs are then used to extract numerical features for graph representation (read more in Sect. 3.3). d The corresponding features, along with label data, are then used to predict the synergism of drug combinations as a classification or regression task (read more in Sect. 4.2) and assessed using diverse evaluation metrics (read more in Sect. 5). Acronyms: AUC: Area under the ROC curve, ACC: Accuracy, AUPR: Area under the precision-recall curve, F1: F1-score, RMSE: Root mean square error, MSE: Mean square error

3.1 Metrics of synergism in drug combinations

Determining whether a combination of compounds exhibits an interaction effect involves comparing the observed effects with what would be expected based on a non-interactive (additive) effect. To evaluate the effects of drug combinations and synergy, various metrics are employed. These metrics provide measurements that help assess the combined effects of compounds. By utilizing these frequently used measurements, researchers can determine if the observed effects of a compound combination surpass what would be expected from an additive effect alone. This allows for a comprehensive evaluation of the potential interaction and synergy between different compounds in order to optimize drug combinations for enhanced therapeutic outcomes. Described below are some commonly used metrics that facilitate this evaluation process (Ianevski et al. 2020).

Loewe Additivity, defined by Loewe in 1926, is based on the principle of sham combination which assumes no interaction effect when a compound is combined with itself (Loewe, 1953). It is a dose–effect-based concept that is widely used in pharmacology and toxicology. In pharmacology, a dose–response curve is a graphical representation of the relationship between the dose of a drug or compound and its effect. The curve shows how the effect changes as the dose increases. Loewe additivity assumes that the dose–response curves for two compounds are parallel, meaning that they have the same shape and slope. This allows for the calculation of an additive effect, which is simply the sum of the individual effects at a given dose. If the combined experiments are carried out in concentrations with dose \(x_{i}\), we have according to Loewe Additivity principle:

$$E\left( {x_{1} ,x_{2} \ldots ,x_{n} } \right) = E\left( {X_{1} } \right) = E\left( {X_{2} } \right) = E\left( {X_{n} } \right),{\text{ such}}\;{\text{that}}\;\mathop \sum \limits_{{i \in \left[ {1,n} \right]}} \frac{{x_{i} }}{{X_{i} }} = 1$$

\(X_{1} , \ldots ,X_{n}\) represent the doses applied to the drugs within individual experiments, while E stands for the resultant effect (Goldoni and Johansson 2007). The mentioned combination involves fractions of individual doses that achieve the effect separately. When these fractions are added together, they sum up to one and result in the same effect (Lederer et al. 2019). To make this concept clearer imagine two substances: compound A and compound B. Each of these compounds, when administered alone at specific doses, i.e., \(X_{1}\) and \(X_{2}\) respectively, produces a desired effect. The idea is that if you take a fraction (\(\frac{{x_{1} }}{{X_{1} }}\)) of dose \(X_{1}\) from compound A and a fraction \(\left( {\frac{{x_{2} }}{{X_{2} }}} \right)\) of dose \(X_{2}\) from compound B, such that their sum equals 1, then this combination should yield the same effect as taking dose \(X_{1}\) from compound B and dose \(X_{2}\) from compound A. Subsequently, when we have two compounds and their fractions \(\left( {\frac{{x_{1} }}{{X_{1} }}} \right)\) and \(\left( {\frac{{x_{2} }}{{X_{2} }}} \right)\) satisfy the condition of \(\frac{{x_{1} }}{{X_{1} }} + \frac{{x_{2} }}{{X_{2} }} < 1\) (i.e., \(\mathop \sum \limits_{{i \in \left[ {1,n} \right]}} \frac{{x_{i} }}{{X_{i} }} < 1\) in multi-compound combination experiments) then we consider the effect synergistic meaning that the combined effect of these compounds is greater than what would be expected if their effects were simply additive. Conversely, if \(\frac{{x_{1} }}{{X_{1} }} + \frac{{x_{2} }}{{X_{2} }} > 1\) (i.e., \(\mathop \sum \limits_{{i \in \left[ {1,n} \right]}} \frac{{x_{i} }}{{X_{i} }} > 1)\), the interaction is considered antagonistic, indicating an overall effect less than expected.

Bliss Independence score refers to the concept of Bliss independence (Baeder et al. 2016), which is a concept used in pharmacology to assess whether the combined effects of multiple compounds are additive, synergistic, or antagonistic. The main assumption of the Bliss independence criterion is that two or more substances act independently from one another. The Bliss independence criterion is mathematically expressed through the Bliss equation.

Loewe additivity is suitable when the drugs have shared targets, while Bliss independence is more appropriate when each drug targets a distinct pathway (Liu et al. 2018). For a simplified example with two compounds (A and B), the Bliss equation is:

\(E_{AB} = E_{A} + E_{B} - E_{A} \times E_{B}\),

Where \(E_{A}\) and \(E_{B}\) represents the effect of drug A at dose \(x\) and drug B at dose \(y\), respectively. \(E_{AB}\) represents the combined effect of drugs A and B at doses \(x\) and \(y\). If the combined effect \(E_{AB}\) matches the calculated value from the equation, the compounds are acting independently. When drug A and drug B are combined, the effect of drug B is modified by the proportion \(\left( {1 - E_{A} } \right)\) that is “spared” by drug A. By summing up these two terms, i.e., \(E_{A}\) and \(E_{B} \left( {1{ }{-}{ }E_{A} } \right)\), we get the expected combined effect \(E_{AB} .\)

$$\frac{{E_{A} + E_{B} \left( {1 - E_{A} } \right)}}{{E_{AB} }}\left\{ {\begin{array}{*{20}l} { < 1,} \hfill & {synergism} \hfill \\ { = 1,} \hfill & {additive} \hfill \\ { > 1,} \hfill & {,antagonism} \hfill \\ \end{array} } \right.$$

The above relation indicates that if the value of \(E_{AB}\) is greater than \(E_{A} + E_{B} \left( {1 - E_{A} } \right)\), then it signifies synergism. If the two values are equal, it suggests additivity, and otherwise, it implies antagonism (Duarte and Vale 2022).

Zero Interaction Potency (ZIP) score is a valuable tool used to assess the synergistic or antagonistic effects of drug combinations. It combines the strengths of the Loewe and Bliss models, allowing for a systematic evaluation of various patterns of drug interaction. The ZIP score provides a numerical value ranging from − 1 to 1, indicating the degree of synergy or antagonism observed in a drug combination. it is derived from the concept of zero interaction (Sühnel 1992), which assumes that the potency of a drug’s dose–response curve remains unaltered when combined with another drug (Yadav et al. 2015).

Highest Single Agent (HSA) model, also known as Gaddum's non-interaction model (Berenbaum, 1989), provides a simple approach to estimate the expected combination effect of multiple drugs. According to this model, the expected combination effect is determined by taking the difference between the combined response of the drugs \(E_{{\left( {A,B,C, \ldots ,N} \right)}}\) and the maximum response observed among the individual drugs \(\max \left( {E_{A} ,E_{B} ,E_{C} , \ldots ,E_{N} } \right)\). In other words, the HSA model assumes that the combination effect is equal to the highest response achieved by any single drug at the corresponding concentrations. The HSA model offers a straightforward way to estimate the expected outcome of drug combinations and serves as a baseline for assessing whether observed effects deviate from the additive expectation (Lehár et al. 2007).

Each of these metrics has distinctive strengths and limitations, as outlined in previous reviews (Duarte and Vale 2022). The choice of a specific metric is influenced by various factors, including the experimental design, biological context, and data availability, as elaborated further in the Discussion section. Nonetheless, The field of drug combination synergy analysis is dynamic, with ongoing metric development reflecting deeper insights into drug interactions (Liu et al. 2018; Lederer et al. 2019).

3.2 Supervised drug synergy prediction in cancer

Supervised anticancer drug synergy prediction, driven by machine learning and artificial intelligence, typically involves training models on two distinct types of datasets: (1) in vitro experiments conducted on various cell lines, evaluating the synergistic effects of different drug combinations at varying concentrations using diverse reference models (e.g., Loewe, Bliss, ZIP, HAS), and (2) clinical trial studies of drug combinations in patient populations, comprising information on clinical response, treatment outcomes, and adverse effects.

In the clinical trial dataset, the prediction task is often treated as a classification problem, where the goal is to determine positive versus negative clinical outcomes for specific drug combinations. On the other hand, the in vitro experiments yield continuous measures of synergism, which can be approached as either a regression or a classification problem after categorizing the synergy measures.

In classification tasks, the synergistic values of drugs on cell lines are grouped into either two categories (synergistic versus non-synergistic) or three categories (synergistic, additive, and antagonistic) by applying predefined thresholds to split the data. Nevertheless, establishing the most suitable threshold poses challenges and tends to differ for various synergy measures. For instance, GraphSynergy (Yang et al. 2021) uses the threshold of zero to binarize Loewe measure with a score greater than or less than 0 indicating a synergistic or non-synergistic effect, respectively. Zhang et al. (Zhang et al. 2023), on the other hand, categorized Loewe and ZIP synergistic scores using quartiles. The highest quartile represents synergistic effects, and the lowest quartile represents antagonistic effects. Additionally, some other studies (Wang et al. 2022a; Zhang et al. 2022) consider Loewe synergy scores above 10 as synergistic and scores below 0 as antagonistic.

Nonetheless, once the class labels have been determined, various machine learning algorithms, such as random forests (Singh et al. 2018), support vector machines (Preuer et al. 2018), or neural networks (Preuer et al. 2018), can be employed to perform the classification task. These models learn patterns and relationships from features encompassing diverse information about drugs and cell lines to make predictions.

Regression, on the other hand, focuses on predicting a quantitative measure of synergy for each drug combination on a particular cell line. Instead of discrete class labels, regression models estimate the degree or magnitude of synergy, providing continuous output values. Regression techniques, including linear regression (Kuru et al. 2021), gradient boosting, or deep learning approaches (Preuer et al. 2018), have been used to predict the synergy level of drug combinations.

Some of the commonly employed models for feature extraction from the key factors that contribute to predicting drug synergy will be discussed below.

3.3 Feature extraction

Feature extraction provides the predictive variables for a machine learning-based model, constituting a crucial step for addressing multifactorial complex problems, such as drug synergy prediction. The process of feature extraction is preceded by the collection and representation of biological information, as illustrated in Fig. 2 and elaborated below:

  • Biological data collection Drug synergy prediction algorithms frequently leverage open-access bioinformatics databases to acquire pertinent biological, chemical, and clinical information related to drugs, cell lines, and diseases or patients. This encompasses details about drug chemical structures, drug protein targets, mechanisms of action, cell line gene expression profiles, molecular interactions (e.g., protein–protein interactions, pathways), human gene-disease associations, and functional genomics, among others. Comprehensive discussions about this information and their respective databases can be found in previous reviews (Torkamannia et al. 2022; Kumar and Dogra 2022; Chen et al. 2015), and interested readers are referred to those sources. It has been consistently demonstrated that incorporating diverse types of features, capturing complementary information about drugs, enhances the prediction performance of drug discovery applications (Wang et al. 2022a, b; Azad et al. 2021a; Liu et al. 2022; Zhang et al. 2023).

  • Feature representation: Following the extraction and, when necessary, preprocessing of biological data (e.g., data normalization in gene expression profiles or extraction of embeddings from protein sequences), this information is represented either as numerical vectors or graphs. Examples of vector-based encoding include molecular fingerprints, gene expression profiles, one-hot encoding of disease-related genes, protein sequences, among others (Gan et al. 2023; Liu et al. 2022). In the context of GNN models, the use of network-based representations has gained popularity. Various types of biological information, such as drug target interactions, protein–protein interactions, molecular pathways, gene co-expression, and synergistic drug pairs associated with a particular cell line, have been represented as graphs. Some studies have even combined different types of interactions and diverse node types into heterogeneous networks for subsequent feature extraction (Zhang et al. 2023; Zhang and Tu 2022; Yu et al. 2021).

  • Feature extraction In general, feature extraction can be categorized into two major approaches: feature transformation and feature selection (Vijayan et al. 2022). The latter involves selecting a subset of features as variables for a predictive model, such as choosing gene expression values related to cancer or identifying mutational signatures in the context of a specific disease. However, the majority of techniques rely on feature transformation, wherein statistical methods (e.g., different dimensionality reduction methods (Koch et al. 2021) or singular value decomposition methods (Chen et al. 2022)) or representation learning algorithms (especially neural networks) are used to extract latent features representing complex relationships in data (Gunawan et al. 2023). In the context of GNN models, multiple graph-based neural networks (e.g., GCN, GAE, and GAT), have been employed to learn low-dimensional representations from network-based data. These features are then utilized as variables for a predictive model (Liu et al. 2022) or to prioritize drug synergies based on a ranking mechanism (Jin et al. 2021).

Fig. 2
figure 2

Feature extraction procedure comprising biological data collection (and pre-processing), feature representation and extraction of contributing (latent) features. Acronyms include MoA: mechanism of action, EHR: electronic health records, MLP: multi-layer perceptron, GCN: graph convolutional network, GAT: graph attention, GAE: graph autoencoders

4 Review of GNN methods for predicting drug synergy

We conducted a comprehensive search of PubMed, Google Scholar, and Web of Science until July 2023, using the keywords ‘graph’, ‘drug combination’, and ‘synergy’ and screened retrieved articles with respect to their relevance to drug synergy predictions in cancers using GNNs. Overall, we identified 25 relevant articles within the timeframe of February 2020–July 2023. We observed a sharp upward trend in the development of GNNs for drug synergy prediction (Fig. 3a). Moreover, we collected machine learning studies related to drug synergy prediction from 2010 to 2023, not restricted solely to GNNs (Supplementary Table 1). Interestingly, we observed that since the inception of GNNs in this field, their development for drug synergy prediction is becoming on par (and potentially even surpassing in the near future) the combined progress of all other machine learning methods, as depicted in Fig. 3b. These trends underscore the significance and timeliness of our review in providing insights into this evolving landscape.

Fig. 3
figure 3

The growth of studies related to Graph Neural Network (GNN) models for predicting drug combinations. a) The cumulative count of published studies over time. The counts are segmented within each half-year period starting from the first study’s publication in 2020 until the end of Q2, 2023. b) the rising use of GNNs in drug combination prediction, as compared to alternative computational methods

4.1 Datasets

Predicting synergistic drug combinations through machine learning techniques relies on the availability of a gold standard training dataset. Typically, such datasets fall into one of two categories: (1) those encompassing drug pairs and their corresponding synergy metrics derived from various cell lines (in vitro screening experiments) or (2) datasets derived from clinical trials, where drug combinations are associated with positive or negative clinical outcomes. Table 1 provides a comprehensive overview of the diverse datasets used in both in vitro screening and clinical studies, offering insights into the number of drugs, cell lines, drug pairs, samples, and pertinent references.

Table 1 Summary of drug synergy combination datasets utilized by studies reviewed in this article

Figure 4 visually illustrates the utilization of in vitro screening datasets by different GNN models, taking into account the dataset size employed by each study. It is worth noting that various studies may have applied filtering strategies or other data preprocessing techniques, resulting in the utilization of specific subsets of the dataset. Notably, Fig. 4 highlights the frequent usage of Merck dataset (O'Neil et al. 2016) and DrugComb database (Zagidullin et al. 2019). The latter, in particular, consolidates multiple drug synergy datasets, substantially expanding the training set and consequently establishing itself as the commonly favored dataset for synergy prediction modelling. On the other hand, datasets such as FORCINA which only focuses on one specific cell line are less frequently utilized in computational models for predicting drug synergy with GNNs.

Fig. 4
figure 4

Dataset sizes across different drug combination studies, limited to studies using in vitro screening datasets as datasets using clinical records are not consistent across different studies

4.2 Drug combination prediction based on in vitro experiments

In this section, we offer an examination of research works centered on drug combination prediction using in vitro synergy experiments. Table 2 offers a comprehensive summary of these studies, outlining their individual merits and limitations. We organize these studies into two main sub-sections: classification and regression, as detailed in Table 2.

Table 2 Summary of drug combination prediction studies using GNNs, comprising information about the type of the prediction model (regression vs classification), drug combination datasets (in vitro vs clinical), GNN models, input features (more details in Supplementary Table 2), strength and limitations, and links to the codes when available

4.2.1 Classification methods

As detailed in Table 2, 16 studies have used classification to predict drug synergism. Out of them, 5 studies have also developed regression-based models which were covered in the next section. We grouped the remaining 11 studies based on their underlying GNN architecture namely GAT, GCN and GAE and summarized below:

4.2.1.1 GAT-based methods

In four different models, researchers have used the GAT to extract important features. GAT's attention mechanisms allow it to focus on relevant parts of a graph and increase model performance. For instance, in the case of DeepDDS (Wang et al. 2022a), GAT is used to gather important information from the structure of drugs. Additionally, Zhang et al.(Zhang et al. 2023) and Hu et al. (Hu et al. 2023) take a knowledge graph (KG) approach, creating graphs and using Graph Attention Networks to gather valuable insights from these graphs. Here, we'll delve into these models in more detail to provide a clear understanding of their methods.

The DeepDDS model employs two different types of GNNs: GAT and GCN. These GNNs are assessed to extract features from the molecular graphs of drugs. The genomic characteristics of cancer cells are encoded using a MLP. These resulting embeddings are combined to create the ultimate feature representation for each combination of drug and cell line. These features then go through fully connected layers to classify drug pairs as either synergistic or antagonistic. Hu et al. proposed a model using a diverse graph with drug, protein, and disease nodes. It employs GNNs for message spreading, refining node embeddings through layers of attention-based mechanisms. This enhances the embeddings' quality, later combined for synergy prediction through MLP module. The model predicts drug combination effects effectively by leveraging GNNs and pre-trained models. KGANSynergy has three main steps: KG hierarchical propagation, KG attention layer, and prediction. The model explores relationships between drugs, cell lines, proteins, and tissues. The attention layer updates entity representations using neural network-based attention, and the prediction layer calculates synergy scores. SDCNet uses GCN for predicting specific drug synergy without requiring cell line data. It models synergy as graphs per cell line, treating them as relations. R-GCN captures combo traits within each relation and invariant patterns. SDCNet, an encoder-decoder network, learns drug embeddings and forecasts SDCs across cell lines. This method balances cell-specific and invariant features. However, new drug combos could pose accuracy challenges.

4.2.1.2 GCN-based methods

GCN is used as a mechanism to extract meaningful features from complex relations in networks. Among the reviewed studies, the use of GCN works in drug-protein interaction networks or molecular structure (Bao et al. 2023; Yang et al. 2021; Wang et al. 2022b). In various models, including those proposed by Hu et al. model (Hu et al. 2022), and the MPFFPSDC model (Bao et al. 2023), the GCN encoder's pivotal role is in contextualizing drug structures within networks. This enables the transformation of drug structures into embeddings in new spaces. In these models, the GCN is employed to extract higher-order neighbor feature representations for atoms in drug molecular structures. In the GraphSynergy model (Yang et al. 2021), a GCN is employed, specially tailored to understand the connections between drugs and disease modules within this network. MOOMIN (Rozemberczki et al. 2022) learns drug representations by encoding properties of compounds and sequence proteins into vertex features. Similarly, SDCNet (Zhang et al. 2022a) applies GCN with attention layers to get relevant details from the drug-cell line network, creating important features.

The framework of the DTSyn model consists of two paths: a fine-grained block and a coarse-grained block. To use GCN, input chemical features are processed through GCN blocks and combined with gene embeddings. These features are then fed into the fine-grained Transformer encoder block, which learns chemical substructures and gene interactions. Finally, by aggregating features and using MLP, it predicts synergy. Bao et al. in 2023 proposed MPFFPSDC, a model for predicting drug synergy. It employs GCNs and an MLP to extract features from drug graphs and cell lines. The model aggregates these features to classify drug pair synergy using a classifier module. In the MOOMIN model, they consider the cell type when creating drug combination representations. This leads to a scoring function that predicts synergy for new drug pairs. Yang et al. propose GraphSynergy for predicting effective drug combinations in cancer using the Protein–Protein Interaction (PPI) network. It uses GCN to grasp drug-disease connections, attention highlights key proteins, and two scores evaluate therapy and toxicity.

4.2.1.3 GAE and GCN encoder-based methods

The GCN Encoder is designed to learn node embeddings that capture both the structure of the graph and the attributes associated with its nodes. It excels at uncovering relationships between nodes and using these relationships for predictive tasks by integrating feature information and graph structure (Jiang et al. 2020). On the other hand, the GAE acts as an autoencoder with the goal of creating a more condensed representation of the graph while also reconstructing the original adjacency matrix from the embeddings (Kipf and Welling 2016). This reconstruction helps in inferring missing connections and gaining a comprehensive understanding of the graph's connections.

In the GAECDS model (Li et al. 2023a), GAE encodes drug combination information using the adjacency matrix and drug features. The encoded latent features are then used to reconstruct the drug synergy graph and uncover novel relationships. Jiang et al. (2020) applied a GCN encoder to process diverse networks encompassing drug-drug synergy, drug-target interactions, and protein–protein interactions. This encoder transforms drug nodes into new-space embeddings. The model examines 39 heterogeneous networks, generating embeddings via GCN encoding. Finally, using these embeddings and predictive models, drug synergy is forecasted. The GAECDS model consists of three key parts: a GAE, an MLP, and a CNN. The GAE encodes drug synergy graphs and decodes them to find new relationships. An MLP generates cell line features, while a CNN predicts drug synergy by combining drug and cell line features.

One of the classification studies based on Graph Regularization is which was proposed by Lv et al. (2022). They collected antibiotic combinations and target information from the literature and described drug actions through network propagation and network proximity. The study focused on pairwise antibiotic combinations and quantified interactions based on the α-score. The model's goal was to predict synergistic antibiotic combinations by considering pharmacological similarity between drugs. The affinity matrix W was constructed to differentiate between pharmacologically similar drugs (potentially synergistic) and pharmacologically identical drugs (additive effect).

4.2.2 Regression methods

Various approaches address drug synergy prediction using regression methods. Categories include GAT, GCN, and GAE models, each enhancing performance with distinct models.

4.2.2.1 GAT-based methods

GAT is a mechanism that operates within the models to focus on important interactions and features, contributing to accurate synergy score predictions. In Model Numcharoenpinij et al. (2022), GAT is employed in the GNN model based on the Message-Passing Neural Network (MPNN) framework. These weights guide the aggregation process, allowing the model to focus on critical interactions. In the Muthene (Yue et al. 2023), GAT creates meta-path-specific embeddings for end/central nodes by assigning weights to neighbor features based on attention mechanisms. In CGMS model (Wang et al. 2023), GAT is used within the Heterogeneous Graph Attention Network (HAN). HAN has three layers and employs a self-attention mechanism to capture important information and produce cell line embeddings. Each model leverages GAT uniquely within its architecture, highlighting its versatility in different scenarios.

Numcharoenpinij et al. incorporate genetic data from the Cancer Cell Line Encyclopedia (CCLE), including gene expression, copy number variation, and somatic mutation. To reduce dimensionality while retaining crucial details, they employ autoencoders: deep, sparse, and deep sparse. For drug information, two representations—Extended Connectivity Fingerprints (ECFPs) and molecular graphs—are utilized. Their model's architecture encompasses DNNs and Autoencoders for genetic and drug data processing. To predict synergy, they employ a GNN framework, utilizing the concept of an MPNN. Muthene predicts drug combination effectiveness by identifying shared mechanistic traits between adverse events (AEs) and therapeutic effects (TEs). It tackles both tasks using meta-path schemas, capturing drug-target interactions and mechanisms of action (MoAs). The model generates drug embeddings from meta-paths and chemical features, predicting AE probabilities and therapeutic synergy. However, it can't forecast synergy for new drugs or unseen cell lines. The CGMS model predicts anti-cancer synergistic drug combinations using a complete graph. This graph integrates cell lines and drugs through different meta-paths, representing drug-cell line interactions and drug-drug interactions. Employing the HAN, the model generates whole-graph embeddings hierarchically, capturing important graph information.

4.2.2.2 GCN-based methods

GCNs excel at capturing complex interactions in graphical data, which are common in drug synergistic prediction models. In these models, GCNs are used to process molecular structures of drugs or knowledge graphs containing various entities and relationships. This ability to capture rich information from different networks makes GCNs a good choice for modeling complex biological relationships (Wang et al. 2022b; Zhang et al. 2023; Liu and Xie 2021). The PRODeepSyn (Wang et al. 2022b) model leverages the GCN to construct gene hidden states based on the PPI network. Zhang et al. (Zhang and Tu 2022) emphasize the pivotal role of GCNs in extracting valuable information from the constructed KG. In the TranSynergy model (Liu and Xie 2021), the GCN is utilized to extract important features from the drug's molecular graph structure. In the HypergraphSynergy model (Liu et al. 2022), GCN embeds drugs and cell lines; After forming a hypergraph based on drugs and cell lines, it learns and finally records the embedding of nodes.

To predict drug synergy, PRODeepSyn initially forms drug features using molecular fingerprints and descriptors. For cell line features, it combines gene expression, gene mutation, and interactions among gene products. GCN is applied to create gene hidden states from the PPI network, considering protein interactions. These states estimate the gene's evident state using omics data. Finally, PRODeepSyn forecasts synergy scores using a DNN, utilizing both drug features and cell line embeddings as inputs. The KGE-DC model utilizes a KG containing drugs, targets, enzymes, and transporters to predict synergy. GCNs extract features from the KG, improving information extraction. Drug embeddings and cell line gene expressions are integrated, and a neural network predicts synergy scores. Liu et al. utilize a drug synergistic hypergraph with drugs and cell lines as nodes and hyperedges for synergistic relationships. GCN learns embeddings for drugs and cell lines, capturing hypergraph features. These learned features represent drugs. Gene expression features of cell lines are captured via a network. Finally, matrices of drug and cell line features enter the hypergraph network for predicting drug synergy scores. The TranSynergy model employs a transformer to analyze drug and cell line data while integrating drug target profiles for comprehensive features. It enhances cell line representations using gene expression data. Additionally, a GCN is utilized to extract drug features from drug structures.

4.2.2.3 GAE -based methods

GAE acts as a transformative tool in the two investigated regression models. In MGAE-DC model (Zhang and Tu 2023), GAE encodes drug combinations, learning drug embeddings. In Zagidullin et al. (Zagidullin et al. 2021) GAE transforms molecular structures into fingerprints. By considering synergistic, additive, and antagonistic combinations as distinct input channels, MAGE-DC enhances drug embeddings' ability to differentiate between synergy and non-synergy. This improved detection is achieved via a GAE. Using concatenated embeddings, drug fingerprints, and cell line features, the prediction module synergistic scores. Zagidullin et al. proposed an approach where genetic and drug data are used to predict drug combination synergy scores. Genetic data informs about cancer cell lines, while drug data include molecular structures. The model employs GAE to encode drug structures, yielding synergy predictions. While this work focused solely on comparing fingerprint types, future research could explore combining molecular structure or investigating other molecular features.

Although these models are promising in predicting drug synergy, there are limitations that require further research and improvement for better performance, which we will discuss below.

4.3 Drug combination prediction based on clinical studies

Five methods utilized clinical studies to construct datasets of synergistic drug combinations, all employing GCNs as their primary neural network approach for the classification task (Table 2).

4.3.1 Classification methods

4.3.1.1 GCN-based methods

The MK-GNN model (Gao et al. 2023) is a deep learning approach designed to predict effective drug combinations for patient treatment. It utilizes multi-head attention to learn patient features from diagnosis and treatment procedure sequences. Additionally, it incorporates prior medical knowledge derived from electronic health record data, considering the relationship between diagnoses and medications. The model also employs a GCN to learn medication representation vectors, capturing drug knowledge from a formulated drug network. However, the model's generalization is limited due to variations in drug combination recommendations among different doctors and regions. To address this, future research aims to study feature invariance in drug combinations and enhance the algorithm's applicability in real clinical settings. Chen et al. (2022) proposed a novel computational pipeline called DCMGCN for predicting drug combinations. The pipeline integrates diverse drug-related information to learn low-dimensional representations of drugs from attributes and similarity networks. They identified that the drug-drug network had heterophily and sparseness, which could limit the effectiveness of the GCN. To address this, they introduced two modifications to GCN. The drug representations were then optimized using the modified GCN (MGCN) to predict drug combinations. By integrating various data types, including clinical data, DCMGCN becomes a powerful tool for drug discovery and repositioning, with potential for further extension by incorporating more heterogeneous information and experimental validation. ComboNet model (Jin et al. 2021) is designed to jointly learn drug-target interactions and drug-drug synergy. It comprises two components: a drug-target interaction module and a target-disease association module. This architecture enables the model to utilize data on drug-target interactions, single-agent antiviral activity, and available drug-drug combination datasets. The DTI network in ComboNet predicts likely targets for drugs, while the target-disease association network models how biological targets and structural features of molecules are related to antiviral activity and synergy. The model's strength lies in considering single-agent activity, which enhances the effectiveness of drug combination predictions against SarsCov-2. However, a limitation is the scarcity of training data for accurate drug synergy prediction.

MG-DDIS model (Deng et al. 2021) is an end-to-end multi-task learning framework based on a GCN for predicting DDIs and synergistic drug combinations. The model to capture important information from the molecular structures, the R-radius subgraph method is applied, producing a series of subgraphs for each drug. These subgraphs are then fed into the GCN encoder to learn a latent representation of drugs. The model is trained using a multi-task approach to simultaneously predict DDIs and synergistic drug combinations. Despite its success, the model's limitations include the possibility of adverse reactions arising from various factors unrelated to synergy, such as individual drug sensitivity and independent toxic properties of certain drugs.

4.3.1.2 GAE -based methods

Karimi et al. (Karimi et al. 2020) introduced a novel deep generative model for drug combination design, named as the Hierarchical Variational Graph Autoencoder (HVGAE), which leverages graph-structured domain knowledge and reinforcement learning-based chemical graph-set designer. In HVGAE, GAE has been utilized in learning and encoding features from graph-structured data at two levels: (1) Gene–Gene Embedding where GAE is applied to the gene–gene network, represented as a graph, to extract features related to gene interactions, and (2) Disease-Disease Embedding where GAE operates on the disease-disease network, building on the gene representations learned in the first level. Simultaneously, GCNs are applied to process the graph structures representing gene–gene interactions. The HVGAE framework integrates these dual levels of GAE and the insights from GCNs into an end-to-end representation learning process. The learned features serve as a foundation for subsequent drug combination design. The model's core objective is to design a reinforcement learning-based (RL-based) drug combination generator, operating within a chemistry- and system-aware environment.

Across these models, the interesting aspect of using GCN lies in its ability to capture complex relationships and structural information from different types of data, such as molecular graphs, networks, and clinical information.

5 Evaluation of GNNs on in vitro datasets

In this section, we discuss the findings presented in Table 3, which includes the results of various drug combination prediction studies. By reviewing and analyzing these results, we aim to gain valuable insights into the challenges in studies and advances in this field.

Table 3 Performance evaluation of drug combinations studies using GNNs

Both the DeepDDS (Wang et al. 2022a) and Hu et al. (2023) models employ GAT-based classification approaches and evaluate their performance on the AstraZeneca dataset. However, there is a notable difference in their results. Hu's model achieves an AUC of 0.84, while DeepDDS achieves a comparative AUC of 0.66 Additionally, Hu's model obtains a higher AUPR score. When comparing these models with the same cross fold and same train dataset, Hu’s model still outperforms DeepDDS. This might be attributed to Hu's more comprehensive feature extraction process. Hu's model incorporates diverse features from drugs, cell lines, and diseases, utilizing pre-trained models in a heterogeneous graph as a node’s features. In contrast, DeepDDS focuses on drug features extracted solely through GAT and GCN from the drug's structure. This highlights that this approach, incorporating a wider range of features and relationships, yields better predictive performance in comparison to DeepDDS's more focused feature extraction. In other words, Hu et al. obtained initial embeddings for different types of entities (heterogeneous entities such as drugs, proteins, and diseases) using separate MLPs. After obtaining these initial embeddings, they further enhanced and refined these embeddings using GATs.

The Hu’s model was compared to the TranSynergy (Liu and Xie 2021) model using a tenfold cross-validation on the DrugComb dataset. The Hu model outperformed the TranSynergy model in predicting drug combination synergism. This superiority is attributed to the Hu model's utilization of comprehensive drug and cell line features, which enhanced its ability to identify synergistic effects compared to the TranSynergy model that solely relied on drug target proteins. The advantage of Hu's model is that it enables the model to capture and propagate information through the graph effectively. However, TranSynergy lacks the capacity to capture the same level of information regarding relationships between entities.

SDCNet (Zhang et al. 2022a), a GAT-based model, is compared with DeepDDS and Jiang's model (Jiang et al. 2020) on various datasets (ALMANAC, Merck, Cloud, FORCINA) using different metrics (Loewe, Bliss, HSA, Zip). SDCNet benefits from drug features derived from the training dataset and cell line-based information. Unlike traditional GCNs, R-GCN employs distinct aggregation mechanisms tailored to different types of relationships. As a result, the SDC-Net model succeeds in obtaining more informative representations for drugs specific to each cell line. This advantage leads to more informative features for classification. Despite dataset imbalances, SDCNet achieves superior AUC, AUPR, and F1-score compared to DeepDDS and Jiang's model. Since in the prediction of drug synergy, the accurate detection of positive cases (synergistic combinations) is more important than the detection of negative cases due to data imbalance, criteria such as AUPR and F1-score are used to evaluate the models fairly. These measures take into account the importance of positive samples and make them suitable for unbalanced data sets. If the SDCNet model is trained with appropriate data, it can effectively predict drug synergy. Notably, the DeepDDS model outperforms SDCNet in leave-one-drug-out evaluation, likely because DeepDDS's performance isn't heavily reliant on the training data. Conversely, in leave-one-cell-line-out evaluation, SDCNet excels. This is because SDCNet processes features individually for each cell line, considering the interaction type of medicinal compounds (synergistic or antagonistic). Overall, SDCNet's success is attributed to its specialized feature processing for different cell lines. The study (Wang et al. 2022a) highlights an intriguing observation regarding the DeepDDS model's performance. It reveals that when the model's complexity increases and features become excessively dimensional, its performance can actually suffer. A comparison between DeepDDS and TranSynergy underscores this point. In TranSynergy, features are not only high-dimensional but also embedded using a transformer. On the Merck dataset, the DeepDDS model outperforms TranSynergy, emphasizing that overly complex models and extensive feature dimensions might not always yield improved results. Among other GAT-based models, the KGANSynergy (Zhang et al. 2023) model which extracts the features of drugs and cell lines using the knowledge graph and based on attention, and compared to the GraphSynergy model (Yang et al. 2021), it has been able to perform better.

The MPFFPSDC model (Bao et al. 2023), which is based on the GCN approach, outperforms DeepDDS on the Merck dataset. While both models achieve almost the same results, MPFFPSDC demonstrates superior performance. This could be due to variations in how features are integrated for classification. Despite this difference, both models follow almost the same methods to extract features from drugs and cell lines. DTSyn (Hu et al. 2022) extracts drug features using cell line data and known train's data labels. However, it's less accurate than other machine learning models for predicting drug synergy scores of drugs that it has not seen so far. MOOMIN's model (Rozemberczki et al. 2022) lacks a defined threshold for categorizing drug synergism. It employs random walk on a drug-target network and GCN to embed drugs and capture structural features. However, its performance is comparatively weaker due to the absence of cell line features and comprehensive drug-related information, unlike other GCN-based models. GAECDS, a GCN-based model, classifies drug compound data from DrugComb using a threshold of 0. While using a fixed threshold can introduce noise, GAECDS outperforms both the DeepDDS model and the GraphSynergy model, both of which also use the same dataset and threshold. This improved performance might stem from GAECDS's use of GAE on the drug-drug synergy network, which effectively distinguishes drug combinations in a new data space.

Using an attention-based approach and meta-path on a diverse graph of drug and cell line connections, the CGMS model (Wang et al. 2023) outperformed PRODeepSyn (Wang et al. 2022b), TranSynergy, and DeepDDS. This suggests CGMS effectively predicted drug synergy, surpassing existing methods. Numcharoenpinij et al. introduced a GAT-based regression approach in their model, which utilizes autoencoders to capture key features of cell lines. This method demonstrated higher accuracy compared to other models, although the specific metric type of the dataset was not specified. Notably, the GAT-based approach outperformed DeepDDS, exhibiting lower error. Using adverse and therapeutic effect data as synergistic information for drug combinations has led the Muthene model to outperform other models like CGMS. This unique approach has resulted in lower errors in predicting drug synergy. Muthene benefits from including adverse and therapeutic effects, enhancing its accuracy compared to CGMS and similar models.

MGAE-DC (Zhang and Tu 2023) is a GAE-based model that has shown lower error rates in regression than PRODeepSyn, HypergraphSynergy, and DeepDDS. However, in classification mode, its results are comparable to those of the PRODeepSyn model. This may be due to an imbalance in the data. The embedding of GAE and GCN encoders appears to work similarly. The Zagidullin’s model is related to the optimal selection of drug fingerprints for predicting drug synergy, which achieved the lowest error for predicting synergy on DrugComb data with E3FP 1024 bits long fingerprints generated from SMILES strings.

As discussed earlier, five studies on clinical data were analyzed for synergistic prediction with graph-based models and their classification results are shown in Table 3.

6 Discussion

6.1 GNN for drug synergy prediction: strengths and limitations

GNNs possess a distinct advantage over other machine learning techniques due to their ability to capture intricate relationships within biological networks. This advantage is particularly relevant in drug discovery, where compounds can be intuitively represented as graphs via molecular graph representation—i.e., molecules can be decomposed into individual atoms, with the bonds between them forming a graph structure. Unlike other machine learning algorithms that often require predefined chemical descriptors or molecular fingerprints, GNNs can extract features directly from the graph representation of drugs (Zhang et al. 2022b). Consequently, GNN models are commonly employed in drug synergy prediction models to obtain drug representations as features for a classification or regression model (Bao et al. 2023; Wang et al. 2022a; Hu et al. 2022). Moreover, GNNs can be utilized to learn complex structural relationships in diverse biological systems, such as protein–protein interactions and drug-target interactions. For example, KGANSynergy (Zhang et al. 2023) integrated multiple types of associations (drug–protein, cell line–protein, cell line–tissue, and protein–protein associations) into a knowledge graph to learn representations of cell lines and drugs using GNN models.

However, GNNs also have limitations in the context of drug synergy prediction and in general. GNNs can be computationally intensive, and their complex architecture often demands a substantial amount of data for effective training. On the other hand, the space of possible drug combinations is vast, and only a small fraction of potential drug pairs has been experimentally tested. This sparsity in the synergy space can make it difficult for GNNs to make accurate predictions, especially for untested combinations. With limited data and complex models, there is a high risk of overfitting, where the model learns to perform well on the training data but fails to generalize to new drug combinations. Careful regularization and validation strategies are needed to address this issue.

Moreover, interpretability is a major concern, as GNNs are frequently seen as black-box models, rendering it difficult to discern the reasoning behind their predictions (Zhu et al. 2022). These models often lack the mechanistic insights necessary to explain the underlying reasons for the synergistic or antagonistic effects observed in specific drug or compound combinations.

Particular GNN architectures may hinder the expressivity of GNNs, which pertains to their ability to represent and differentiate diverse graph structures. Despite their popularity, some widely used GNNs, including GCNs, exhibit a theoretical limitation in expressivity. This lack of expressivity can lead to these models underfitting the training data, resulting in suboptimal performance, particularly when confronted with complex relationships within graph data (Xu et al. 2018).

Additionally, the presence of heterophily, where interconnected nodes exhibit diverse attributes or labels, presents a challenge for GNNs (Luan et al. 2022). GNNs may face difficulties when applied to heterophilic graphs in the context of link prediction (Zhou et al. 2022). An example of a heterophilic graph in drug synergy prediction is one that incorporates links representing synergistic interactions between drug pairs and various cell lines. Such a graph poses a challenge for link prediction, demanding for the adoption or development of GNNs specially tailored to operate on graphs with diverse node types and attributes such as Heterogeneous Graph Attention Networks (Wang et al. 2019).

Overall, while GNNs hold promise in the field of drug synergy prediction, they come with various specific or general limitations. These limitations call for active exploration of techniques to enhance the strengths of GNNs to improve the accuracy, generalizability, and interpretability of drug synergy predictions.

6.2 The choice of metrics of synergism: no standard reference model

To optimize drug combinations, efficient identification of synergistic effects is crucial. Classical reference models often fall into two categories of effect-based (e.g., Bliss Independence and HSA) and dose–effect-based (e.g., Loewe Additivity and ZIP). Effect-based models assess interaction effects based on individual drug effects, often measured by cell responses like cell death, viability, and growth rate. Dose–effect-based models, introduced more recently, offer enhanced definitions of synergy, additivity, and antagonism for drugs with nonlinear dose–effect curves, surpassing the limitations of reference models.

While Loewe additivity and Bliss independence are commonly employed in synergism research, there is still no consensus due to the limitations inherent in all reference models, as discussed in prior reviews (Duarte and Vale 2022). Furthermore, the suitability of different metrics varies depending on the specific biological context. For instance, Loewe additivity is well-suited when drugs target the same pathways, while the Bliss independence principle is more relevance in cases where drugs are mutually nonexclusive, each targeting distinct signaling pathways (Liu et al. 2018). Consequently, the development of new synergism metrics remains an active research area (Wooten et al. 2021).

Developing GNN models with metric-agnostic capabilities has the potential to improve their suitability for experimental testing. For instance, SDCNet (Zhang et al. 2022a) was assessed across various metrics and datasets, mitigating the performance biases associated with specific synergism reference models. DeepDDS (Wang et al. 2022a), on the other hand, aggregated various synergy scores associated with identical drug pair-cell line combinations through averaging through averaging. This strategy has the potential to improve annotation accuracy while reducing reliance on a specific synergy metric for defining synergism.

Nonetheless, it is important to note that synergism alone may not suffice for evaluating the clinical promise of drug combinations. Complementary measures, such as the therapeutic index, which assesses the relative toxicity of anticancer treatments in normal tissues, should also be considered (Ocana et al. 2012).

6.3 Synergy score thresholding and class imbalance mitigation

The process of thresholding synergism metrics to categorize numerical values into either synergistic, additive, and antagonistic (or non-synergistic) annotations plays an important role in the development of classification models. This step significantly impacts class balance, data distribution, sample sizes, and the accuracy of categorization of numeric scores to categories. Different synergy metrics exhibit diverse distributions, and thresholding should be executed while considering the underlying data distribution, in conjunction with biological relevance and clinical evidence when available. Current studies often underestimate the importance of thresholding, leading to the selection of seemingly arbitrary thresholds or the adoption of thresholds used in previous studies without adequate justification. Therefore, there is a need for a benchmarking study to systematically evaluate the impact of threshold selections on the performance of GNN models.

Thresholding strategies have a direct impact on class imbalance. For example, one can designate the upper quartile and lower quartile scores as synergistic and antagonistic, respectively, in order to attain a balanced dataset and improve the accuracy of categorization (i.e., reduce the risk of false positive or false negative annotations resulting from more relaxed thresholding). However, this approach comes at the cost of reducing the size of the training set. Alternatively, techniques like oversampling or undersampling, such as SMOTE (Chawla et al. 2002), can be employed to balance the dataset. Class imbalance also affects the performance metrics on the test set. To address this, Jiang et al. (2020) selected 10% of the positive data as test samples and balanced this by randomly choosing an equal number of negative samples for their evaluation. However, it is important to note that due to the inherent imbalance in drug synergism (i.e., synergistic drug combinations are often rare when compared to non-synergistic ones), creating a perfectly balanced test dataset may not provide an accurate assessment of the model’s effectiveness.

The choice of performance evaluation measures is also important in the presence of class imbalance. AUC (area under the ROC curve) and AUPR (area under the precision-recall curve) are two commonly used metrics for evaluating drug synergy performance. AUC-ROC measures the trade-off between sensitivity and specificity, making it suitable when the cost of false positives and false negatives is roughly equal or when positive and negative instances are approximately balanced. In contrast, AUC-PR is more appropriate when the cost of false positives and false negatives is highly asymmetric or when the positive class is rare (Sofaer et al. 2019). Li et al. (2023b) demonstrated that even when negative data outnumbered positive data threefold (using the DrugComb dataset), the model’s AUC remained unaffected. Hence, the use of AUPR is more recommended.

7 Conclusions

In this study, we present a comprehensive review of GNN-based approaches for predicting synergistic drug combinations. We have curated a total of 25 GNN-based models developed up to the date of our literature search (July 2023). We assessed these models, considering various aspects, including their underlying GNN architectures, the nature of the prediction problem (classification vs. regression), the types of datasets employed (in vitro or clinical), the features incorporated, and the synergy metrics applied. Furthermore, we summarized the strengths, and limitations of each study. Additionally, we conducted a comparative assessment of the prediction performance of GNNs in in vitro studies, stratified by the respective datasets, synergism metrics, thresholding approaches, and validation strategies employed. This comprehensive study provides an overview of the current state of the field, offering insights into the progress and challenges in the field of synergistic drug combination prediction using GNNs and beyond.

7.1 Limitations and future directions

While we have presented performance evaluation metrics of GNN models on different datasets, direct comparisons are challenging due to variations in data preprocessing methods such as thresholding, among other factors. A benchmarking study that controls confounding conditions is essential to facilitate direct model performance comparisons on identical datasets, enabling the identification of state-of-the-art algorithms. Additionally, this study does not aim to provide either qualitative or quantitative assessments of GNNs in comparison to other drug synergy prediction algorithms, given the limited space available. However, a future study comparing high-performing GNNs with other top-performing drug synergy prediction algorithms could offer valuable insights.

The field of GNNs for drug synergy prediction offers room for improvement from various perspectives. In addition to the suggestions outlined in the Discussion section, deeper integration of biological networks and graph theory concepts could enhance performance. Exploring techniques such as identifying maximum cliques with GNNs to unveil relationships between entities (Min et al. 2022) and utilizing minimum dominating sets for more accurate link predictions among neighbors (Rani 2012) offers promising directions for future research. These strategies have demonstrated potential in recent drug synergy prediction studies (Zhang et al. 2023; Zhang and Tu 2022).

Enhancing the biological relevance of predictions can be achieved through the incorporation of knowledge graphs reflecting associations in biological systems, such as protein–protein interactions. By integrating these graphs with complex protein features, predictions can be improved. Leveraging cutting-edge protein structure prediction (For proteins whose structure may not be available) algorithms like Alphafold (Jumper et al. 2021), can significantly improve the accuracy of protein representations for integration into GNN models.

In summary, the field of drug synergy prediction using GNN and emerging techniques continues to evolve. Ongoing research, including benchmarking studies, enhancing biological relevance, and exploring novel strategies, offers prospects for more accurate and clinically translatable predictions. Despite the general interest within the research community and pharmaceutical industry regarding the use of GNNs in drug discovery, the practical implementation of these methods in pre-clinical and clinical settings is still in its early stages due to the recent emergence of this technology in this field. As the field continues to evolve, future research and advancements will likely contribute to a more comprehensive understanding of the advantages and applications of GNN in drug combination prediction across diverse disease areas.