Introduction

Discovery of a new drug, especially for cancer can be a very time-consuming and costly process. It normally takes between 10 to 15 years to develop a new drug [1] and can cost upward of tens of billion dollars [2,3,4]. On the other hand, the success rate of developing a new drug for cancer is very low [5], and the number of new FDA-approved drugs has been declining since past 25 years [6].

Drug repositioning and repurposing are effective alternative strategies to find new uses of existing drugs. Both drug repositioning and repurposing processes consist of using an existing drug for treatment of a disease other than its primary or initial purpose. If the drug is already FDA-approved, the process is called drug repurposing, while if the drug is in trial or experimental phase, the process is called drug repositioning. Since in this work we use the same methodology for all existing drugs, irrespective of their FDA-approval status, for simplicity, we refer to the methodology as “drug repurposing” in a general sense.

In case of repurposed drugs (and to a lesser degree for repositioned drugs, depending on their trial or experimental stage), the overall cost and time associated with using it for treatment of other diseases is significantly lower than developing a new drug [4].

In order to repurpose an existing drug for a new disease, the main challenge is to identify new relationships between drugs and diseases. To overcome this challenge, a variety of approaches have been introduced including computational, biological and experimental approaches, as well as hybrid schemes that combine both computational and biological techniques. Computational approaches for drug repurposing bear much lower cost and other barriers in comparison to biological experimental approaches, which makes it a more appealing strategy and a very good starting point for further clinical trials and biological validations [7].

The majority of existing computational methods for drug repurposing are based on the comparison between gene expression response of various cell lines before and after treatment or a combination of several types of data corresponding to various aspects of disease-drug relationships [8,9,10,11]. For example, Lotfi et al. grouped drug repurposing methods based on their principle source of biological data and core methodology, including gene regulatory networks, metabolic networks and molecular interaction networks [9], while Zou et al. categorized drug repurposing methods into two groups of data-driven and hypothesis-driven approaches [11]. Xue et al., on the other hand, focused on the underlying methodology used in drug repurposing, when it regards to categorizing those methods [12]. Luo et al. used Singular Value Thresholding (SVT) to predict scores for unknown drug-disease pairs based on known relationship between drugs and diseases [13]. Zhang et al. utilized a drug similarity network, a disease similarity network, and known drug-disease associations to explore the potential associations among unrelated pairs of drugs and diseases [14].

Generally speaking, we can group drug repurposing approaches into three distinct groups: text-mining approaches [15,16,17,18,19,20,21], semantics-based approaches [22,23,24], and finally network-based approaches [25,26,27,28,29,30,31,32,33,34,35,36]. The latter takes into the account the relationship and interactions between genes in their corresponding pathways. For example Bourdakou et al. used statistical co-expression networks to highlight and prioritize genes for breast cancer subtypes and leveraging them for drug repurposing [36]. One of the biggest difference between the proposed framework and the previous network-based methods is the ability of the proposed framework to identify not only single drugs, but also pairs of combined drugs (and theoretically unlimited number of drug combinations) for a given disease with a reasonable computational overhead, which enables it to find combinations of drugs that could far outreach the therapeutic effects of single drugs for a given breast cancer subtype (or any other disease in general).

This paper introduces a novel network-based approach to identify drugs with the highest repurposability with respect to each of ten breast cancer subtypes. This goal is achieved by first finding driver genes responsible for each subtype using genomic and transcriptomic data, which are then used along with pathway data in order to find those drugs that have the highest repurposing scores for each of ten breast cancer subtypes. The results show that the proposed method is able to identify potential effective known and experimental drugs developed for other diseases to be repurposed for various breast cancer subtypes. Indeed, further wet lab analysis is needed to determine the therapeutic level of identified drugs on each breast cancer subtype. For reference, what we refer to here as ten breast cancer subtypes are ten distinctive sub-groups identified in [37].

Moreover, we used the proposed method to identify single and pairs of drugs for Triple Negative (TN) breast cancer tumors. Between 10% to 15% of breast cancer cases are considered as TN, where they lack any hormone epidermal growth factor receptor 2 (HER-2), estrogen receptors (ER), and progesterone receptors (PR) in the tumor [38]. Thus, the traditional targeted (often hormone) therapy that targets one of these hormones are ineffective in these cases. This lack of targeted therapies has intensified the interest in this group of patients. Our results show that the proposed method were able to computationally identify single and paired repurposed drugs that could have therapeutic effect on this this group.

Materials and methods

For drug expression data, we used level-5 data of the LINCS dataset (from Gene Expression Omnibus with the reference number GSE70138), which consists of z-score values of more than 118,000 drug/ concentration/ treatment_time for more than 12,000 genes. In order to make the process more computationally manageable, we used only the lowest and highest dosages of each drug (generally 0.04 and 10.0 μmol, correspondingly) and a default 24-h time-point frame for the analysis, in case of having more than one time-point frame. If a drug does not have a 24-h time-point frame, we use the default time-point frame indicated in LINCS database.

Obtaining candidate genes

In the first step, we use CNA, CNV and GE data to find the most informative genes, separately for each subtype. To do so, we first use CNA/CNV information to find those genes that have very high genotypic aberration in each subtype based on their GISTIC score [39]. GISTIC identifies significant aberrations using two steps. In the first step, it calculates the G-score statistic, which involves both the frequency of occurrence and the amplitude of the aberration. In the second step, it assesses the significance of each aberration using Fisher’s Exact test [40]. these two steps take place in Fig. 1a. To make sure that we only target aberrations in the copy number and not common variations across different populations, we use the HapMap database [41] (shown in Fig. 1b. HapMap is a catalog of common genetic variants that occur in human. We only consider those genes for a significant test that have CNA but no CNV. We also use gene expression data to identify the top differentially expressed genes for each subtype. For this, we used Chi2 [42] to rank genes based on their ability to separate each subtype from the remaining subtypes. At the end, after obtaining the top genes using CNA/CNV and GE data separately, if CNA/CNV analysis determined N genes as significant in terms of their genomic aberrations, we select the top N genes from GE data; then out of these two gene sets, we take the intersection as candidate genes. These candidate genes are those genes that have both significant differences in terms of gene expression and copy number aberrations.

Fig. 1
figure 1

Schematic view of the proposed framework for identification of best repurposing drugs for each breast cancer subtype. METABRIC dataset is used to obtain copy number aberration and gene expression data for breast cancer subtypes. HapMap data is used to obtain copy number variation information. Lincs dataset is used to obtain the effect of different drug compounds on gene expression of cancer samples. KEGG dataset is used to create a universal pathway network [44]

Obtaining gene scores

The measurement is a normalized z-score value for each replicate of a given gene treated with the same perturbation agent (i.e., perturbagen: either drugs or small molecule compounds or others) based on 95% confidence level [43]. Thus, for each pair of gene and drug agent, we consider a value between − 10 and 10. A value close to zero shows that the expression of a given gene will not be affected by the drug agent. In comparison, based on the concepts of gene expression inhibition and induction, a negative or positive score shows that the expression of the given gene decreases or increases, respectively, because of the effect of the drug agent.

We have used the METABRIC dataset [37], which contains the copy number values and gene expression levels of 2000 primary breast tumors with long-term clinical follow-up. It can be accessed from the European Genome-Phenome Archive using the accession number EGAS00000000083. In [37], the copy number aberrations and copy number variations generated using Affymetrix SNP 6.0 arrays and gene expression data were obtained using Illumina HT 12 technology. The dataset contains two sets of data, validation set and discovery set. Due to the lack of class labels in the validation set, in this paper we only use the discovery set, which contains 997 samples from ten subtypes of breast cancer. Each sample contains expression data for 48,803 probe IDs. The expression of all probes corresponding to the same gene have been merged based on the median expression of those probes, which maps all the probes to 24,351 unigenes. We calculate the same normalized z-score values for each of the ten breast cancer subtypes in the METABRIC dataset, (which can be accessed from European Genome-Phenome Archive with the study id EGAS00000000083), in such a way that the normalized z-score of each gene is a value between -10 and 10. A value close to zero shows that the expression of a given gene will not be affected by the disease, while a negative or positive score shows that the expression of the given gene decreases or increases, respectively, because of the effect of the disease.

Creating unified global human pathway

In the next step, we use the KEGG Pathway database to find all possible paths between genes [44]. A biological pathway is a series of actions among molecules in a cell that leads to a certain product or a change in the cell. It can trigger the assembly of new molecules, such as a fat or protein, turn genes on and off, or spur a cell to move [45]. The version of the KEGG Pathway database we used contains 265 human pathways. So, by taking union of all genes and also all existing direct relations between each pair of genes, we create a unified global human pathway (UGHP). The UGHP contains interaction between 4985 genes in all 265 human pathways in KEGG as a matrix, where \(UGHP_{ij}\) represents signaling interaction type between gene i and gene j. The values of the matrix could be -1, 1 or 0 representing activation, suppression, or no direct signal from gene i to gene j.

Calculating drug-disease repurposing score

At this point, for each drug \(D_i\) and breast cancer subtype \(S_j\), we perform the following steps:

  1. 1

    Select the top 50 affected genes by the drug \(D_i\) from the LINCS dataset by ranking the genes based on their absolute z-score values and call them drug genes. Note that at the end of the process, the pipeline focuses only on negative correlations between drug and disease.

  2. 2

    Use the candidate genes corresponding to subtype \(S_j\) that we identified in using copy number alteration (CNA), copy number variation (CNV) and gene expression (GE) data [46]. We call these candidate genes disease genes.

  3. 3

    Map back these drug and disease genes to UGHP to create a drug-disease network, \(D_i S_j\), which contains the shortest paths between each pair of drug-disease genes (shown in Fig. 1c). Thus, the maximum number of nodes in \(D_i S_j\) network, N, is given as follows:

    $$\begin{aligned} N = G_{dr}+G_{di}+G_i \end{aligned}$$
    (1)

    where \(G_{dr}\) is the number of drug genes, \(G_{di}\) is the number of disease genes, and \(G_i\) is the number of intermediate genes in the shortest path between each pair of drug and disease genes.

  4. 4

    Since for each gene in this drug-disease network we have two z-score values (one for the effect of drug and one for the effect of disease), we construct two arrays, one consists of drug z-score values while the other consists of disease z-score values with identical gene order.

  5. 5

    We compute Pearson correlation [47, 48] between the above arrays using the following formula (shown in Fig. 1d):

    $$\begin{aligned} r = \frac{ \sum _{i=1}^{n}(x_i-{\bar{x}})(y_i-{\bar{y}}) }{ \sqrt{\sum _{i=1}^{n}(x_i-{\bar{x}})^2}\sqrt{\sum _{i=1}^{n}(y_i-{\bar{y}})^2}} \end{aligned}$$
    (2)

    where \(x_i\) and \({\overline{x}}\) are the z-score value of gene i and average of all z-score values for the drug group in drug-disease network, and \(y_i\) and \({\overline{y}}\) are the z-score value of gene i and average of all z-score values for the disease group in the drug-disease network, respectively.

Figure 1 depicts the proposed framework for the identification of best repurposing drugs for each breast cancer subtype.

Obtaining a positive correlation between a given drug genes and the disease genes means that the drug and the disease have similar effect on the genes in the drug-disease network. In contrast, obtaining a negative correlation implies that the drug’s effect on the genes in the drug-disease network is opposite to the effect of the disease on the genes in that network.

Obtaining a negative correlation is a favorable case in this context, because we are looking for drugs that could have a potential reverse effect on the genes affected by the disease.

Identifying combinations of drugs for repurposing

In the previous section, we solely focused on effects of each individual perturbation agent on each subtype of breast cancer. In this section, we test the hypothesis that combination of two or more drugs might be more effective in terms of reversing the effect of the disease, i.e., by generating a more negative correlation with the disease than each drug independently. For simplicity, in this step, we assume that the z-score value of a given pair of drugs to be additive with respect to the z-score value of each of those drugs independently. In other words, if we assume that the z-score value of drug \(D_i\) on gene G is \(X_i\), and the z-score value of drug \(D_j\) on the same gene G is \(X_j\), we can then assume that the z-score value for the given pair of drugs [\(D_i\),\(D_j\)] is X, where \(X=X_i + X_j\).

Thus, in order to find the best repurposed pair of drugs for a given subtype of breast cancer, first, we calculate the combined z-score value of all genes for every pair of drugs, and then we pick the top genes with the highest absolute value of their combined z-score. Figure 2 depicts the proposed framework for identification of the best pair of repurposed drugs for each breast cancer subtype.

Fig. 2
figure 2

Schematic view of the proposed framework for identification of best pair of repurposing drugs for each breast cancer subtype. METABRIC dataset is used to obtain copy number aberration and gene expression data for breast cancer subtypes. HapMap data is used to obtain copy number variation information. Lincs dataset is used to obtain the effect of different drug compounds on gene expression of cancer samples. KEGG dataset is used to create a universal pathway network [44]. And finally, DrugBank’s drug interference checker is used to check any possible interference between each pair of drugs

Calculating drug-disease repurposing score for a pair of drugs

For pair of drug \(D_i\) and \(D_j\) and breast cancer subtype \(S_k\), we perform the following steps:

  1. 1

    Calculate the combined z-score value of all genes for pair of drugs \(D_{ij}\), and selecting the top 50 genes with the highest absolute value of their combined z-score values as paired drug genes.

  2. 2

    Use the candidate genes corresponding to subtype \(S_k\) that we identified in using copy number alteration (CNA), copy number variation (CNV) and gene expression (GE) data [46]. We call these candidate genes disease genes.

  3. 3

    Map back these drug and disease genes to UGHP to create a drug-disease network, \(D_{ij} S_k\), which contains the shortest paths between each pair of drug-disease genes (shown in Fig. 2c). Thus, the maximum number of nodes in \(D_{ij} S_k\) network, N, is given as follows:

    $$\begin{aligned} N = G_{dr}+G_{di}+G_{ij} \end{aligned}$$
    (3)

    where \(G_{dr}\) is the number of drug genes, \(G_{di}\) is the number of disease genes, and \(G_{ij}\) is the number of intermediate genes in the shortest path between each pair of combined drug and disease gene.

  4. 4

    Since for each gene in this drug-disease network we have two z-score values (one for the effect of drug and one for the effect of disease), we construct two arrays, one consists of drug z-score values while the other consists of disease z-score values with identical gene order.

  5. 5

    We compute Pearson correlation [47, 48] between the above arrays using the following formula (shown in Fig. 2d):

    $$\begin{aligned} r = \frac{ \sum _{i=1}^{n}(x_i-{\bar{x}})(y_i-{\bar{y}}) }{ \sqrt{\sum _{i=1}^{n}(x_i-{\bar{x}})^2}\sqrt{\sum _{i=1}^{n}(y_i-{\bar{y}})^2}} \end{aligned}$$
    (4)

    where \(x_i\) and \({\overline{x}}\) are the z-score value of gene i and average of all z-score values for the drug group in drug-disease network, and \(y_i\) and \({\overline{y}}\) are the z-score value of gene i and average of all z-score values for the disease group in the drug-disease network, respectively.

  6. 6

    Finally, we perform a post-verification analysis on drug interference for the identified pairs of drugs using DrugBank’s Interaction Checker tool, in order to confirm if there is any known interference between any of those identified pairs. (shown in Fig. 2e).

    A drug-drug interference is a situation in which one drug affects the activity of another. Drugs may interact with each other to cause side effects that are unexpected or unintended. If any pair of drugs have known drug-drug interference, we remove them from the analysis. For example, the combination of Tadalafil and palbociclib generated a negative correlation of -0.65 with subtype 3, which put them in the top 10 list of paired-drugs for this subtype. But given the fact that they have a known moderate interaction with each other, this pair has been removed from the analysis [49]. The reason for doing a post-verification analysis instead of checking it as a pre-process step, is that the post-verification approach gives us the flexibility of updating the results with newly discovered drug interference in the future without a need to rerun the analysis. Also, using a post-verification approach gives us the ability to deal with interferences between a given pair of drugs at different levels depending on the type or level of interference.

Extension to triple-negative breast cancer tumors

In this section, we leverage the proposed pipeline for the ten breast cancer subtypes considered in the previous section, to identify potential repurposable drugs specifically for triple negative beast cancer tumors. In order to do so for identifying candidate genes, we treat triple negative samples in METABRIC as one group and all the remaining samples as another group. By running the pipeline introduced in [46], we identify the most discriminative genes in terms of gene expression and copy number aberration between TN and non-TN groups. Then, using the pipelines depicted in Figs.  1 and 2, we identify the top repurposing single and paired drugs for the TNBC subtype.

Results and discussion

For reference, Table 1 shows the list of drugs that have been approved by FDA to date for breast cancer treatment [50]. The results show that the proposed model is able to identify highly negative correlated drugs corresponding to each of ten breast cancer subtypes, both when used in single drug mode or for identifying pairs of drugs. Some of the well-known and widely used breast cancer drugs have been identified among the top drugs, which again shows that the proposed approach was able to pick up current drugs with high accuracy. For example, Goserelin (Zoladex) is a well known and FDA approved hormone therapy drug for treatment of BC that showed up in top ten drugs for subtypes 2 and 8. Also, Palbociclib (Ibrance) is another well known and FDA approved chemo therapy drug for treatment of BC that showed up in top ten drugs for subtype 4. Moreover, Ruxolitinib, which showed up in top ten drugs for 9 out of 10 subtypes (Table 12) has been under several trials and studies regarding its potential inhibiting effects on BC [51, 52].

Table 1 List of FDA-approved drugs for breast cancer treatment

Single drug repurposing

Figure 3 shows the distribution of drug repurposing scores across the ten breast cancer subtypes. There are a few interesting observations. First, the response level of different BC subtypes to tested drugs are different. While the distribution of correlation scores among the tested drugs versus some of the subtypes such as subtypes 1, 4 and 6 are relatively narrow (which implies relatively lower response level of the aforementioned subtypes to the tested drugs), in some other subtypes, such as subtypes 2 and 8, we observe a wider distribution of these scores. This shows that effects of tested drugs could be widely different across subtypes. The second observation is regarding the median of these scores. As shown in the figure, in all subtypes, we observe a slight distribution bias toward negative repurposing scores, which implies that the tested drugs tend to exhibit more of a therapeutic effect than adverse effect.

Fig. 3
figure 3

Distribution of drug-disease correlation for 3742 drugs across 10 breast cancer subtypes

Tables 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11 show the top 20 inhibiting drugs corresponding to each of the ten subtypes. These drugs fall into three categories. Experimental drugs are those that are at the pre-clinical or at an animal testing stage. Investigational drugs are those that are in stage I, II or III of human clinical trials. Finally, Approved drugs are those drugs that have already been approved by FDA to be used for treatment of various diseases. Drugs that are FDA approved to be used for BC treatment (i.e. those listed in Table 1) have been highlighted in bold. Also, reference column lists any publication that suggested usage of that drug for BC treatment.

Table 2 Top 20 drugs corresponding to subtype 1
Table 3 Top 20 drugs corresponding to subtype 2
Table 4 Top 20 drugs corresponding to subtype 3
Table 5 Top 20 drugs corresponding to subtype 4
Table 6 Top 20 drugs corresponding to subtype 5
Table 7 Top 20 drugs corresponding to subtype 6
Table 8 Top 20 drugs corresponding to subtype 7
Table 9 Top 20 drugs corresponding to subtype 8
Table 10 Top 20 drugs corresponding to subtype 9
Table 11 Top 20 drugs corresponding to subtype 10

Some of the drugs in these lists are well-known and have been used extensively for either breast cancer or other types of cancer. For example, Raloxifene is among the top ten drugs in most, if not all, of the ten subtypes. It was originally approved by FDA in 1997 for the management and prevention of osteoporosis in postmenopausal women and reduction in risk for invasive breast cancer. However, recent studies have shown that this drug might be effective for breast cancer treatments [53, 54]. Also, Ruxolitinib, which is among the top three drugs for all but subtype 4, was approved by the FDA for the treatment of patients with intermediate or high-risk myelofibrosis [55], though it is currently used in multiple clinical trials in patients with metastatic breast cancer as well [51, 56].

The findings discussed above show that the proposed method is able to correctly identify Raloxifene and Ruxolitinib drugs as very good candidates for most of the BC subtypes. We also observe investigational and experimental drugs in the list for each of the subtypes that could have therapeutic effects on each BC subtype. For example, PHA-793887 is a potent inhibitor of multiple cyclin-dependent kinases such as CDK2, CDK5 and CDK7, and has been shown to possess the ability to affect the differentiation of melanoma cells. [57, 58]. This drug is currently in a clinical trial phase [59].

In another comparison, Table 12 shows the top 30 drugs ranked by their median score across all ten subtypes. As shown in the table, some drugs such as Palbociclib and PHA-793887 demonstrate potential effectiveness across all of the subtypes by being ranked among the top drugs. In contrast, some others such as Silmitasertib and Proglumide demonstrate potential effectiveness in some of the subtypes, while being less effective in others.

Table 12 Rank comparison among the top 30 drugs across all 10 breast cancer subtypes

Also, Figs. 4 and 5 show the perturbation scores and drug-disease network of one of the top identified drugs, Ruxolitinib, for Subtype 1. Ruxolitinib, as mentioned earlier in this paper, is a small-molecule kinase inhibitor that is selective for the Janus Associated Kinases (JAK) 1 and 2, which are responsible for the mediation of cytokine and growth factor signaling, which, in turn, affects the immune function and hematopoiesis [60].

Fig. 4
figure 4

Perturbation scores across all genes involved in drug-disease network of top repurposed drug (Ruxolitinib) corresponding to subtype 1. Red bars depict the scores of subtype 1, while green bars depict the scores for the repurposed drug

Fig. 5
figure 5

Unified Global Human Pathway (UGHP) subnetwork corresponding to the top repurposed drug (Ruxolitinib) and subtype 1. Blue nodes depict Drug related genes, while red nodes depict Subtype 1 related candidate genes involved in this drug-disease pathway

Paired drug repurposing

Figures 6, 7, 8, 9, 10, 11, 12, 13, 14 and 15 depict the top pairs of drugs with the highest anti-correlation scores with Subtypes 1-10 of breast cancer. Here, we show the pairs of drugs that have a better anti-correlation score than the best single drug for each of these subtypes. Also, we limit the number of pairs to a maximum of 100 top pairs of such drugs, if there is more than 100 pairs with better score than the best single drug.

Fig. 6
figure 6

Top pairs of drugs with highest anti-correlation corresponding to subtype 1

Fig. 7
figure 7

Top pairs of drugs with highest anti-correlation corresponding to subtype 2

Fig. 8
figure 8

Top pairs of drugs with highest anti-correlation corresponding to subtype 3

Fig. 9
figure 9

Top pairs of drugs with highest anti-correlation corresponding to subtype 4

Fig. 10
figure 10

Top pairs of drugs with highest anti-correlation corresponding to subtype 5

Fig. 11
figure 11

Top pairs of drugs with highest anti-correlation corresponding to subtype 6

Fig. 12
figure 12

Top pairs of drugs with highest anti-correlation corresponding to subtype 7

Fig. 13
figure 13

Top pairs of drugs with highest anti-correlation corresponding to subtype 8

Fig. 14
figure 14

Top pairs of drugs with highest anti-correlation corresponding to subtype 9

Fig. 15
figure 15

Top pairs of drugs with highest anti-correlation corresponding to subtype 10

Moreover, Tables 13, 14, 15, 16, 17, 18, 19, 20, 21 and 22 show top ten pairs of drugs for each of the ten breast cancer subtypes. Observing these tables, we infer that many of the top ranked pairs of drugs contain at least one individual top ranked drug, though there are some notable exceptions. For example, drugs TG-100801 and Phensuximide are not even among the top 50 repurposed drugs corresponding to Subtype 2 when administered independently with mere correlation scores of − 0.57 and − 0.55 to subtype 2, respectively. However, when administered together, the correlation between that pair and subtype 2 grows to a noticeable − 0.97 range, which places the pair in the second spot among the top repurposed pairs for that subtype. We observe a similar catalyzing effect in combination of Pregnenolone and Bromocriptine with respect to subtype 4, and combination of Amikacin and Tadalafil with respect to subtype 5. Also, Fig. 16 depicts drug-disease network (DDN) of two perturbation agents (Tadalafil and PF-04620110) and subtype 1 of breast cancer. Blue nodes depict drug related genes, while red nodes depict candidate genes related to subtype 1. Also solid arrows depict activating relationship between involved genes, while dotted arrow depicts a suppressing relationship.

Table 13 Top 10 pairs of drugs with their correlation with subtype 1 of breast cancer, considering both combined and individually.
Table 14 Top 10 pairs of drugs with their correlation with subtype 2 of breast cancer, considering both combined and individually
Table 15 Top 10 pairs of drugs with their correlation with subtype 3 of breast cancer, considering both combined and individually
Table 16 Top 10 pairs of drugs with their correlation with subtype 4 of breast cancer, considering both combined and individually
Table 17 Top 10 pairs of drugs with their correlation with subtype 5 of breast cancer, considering both combined and individually
Table 18 Top 10 pairs of drugs with their correlation with subtype 6 of breast cancer, considering both combined and individually
Table 19 Top 10 pairs of drugs with their correlation with subtype 7 of breast cancer, considering both combined and individually
Table 20 Top 10 pairs of drugs with their correlation with subtype 8 of breast cancer, considering both combined and individually
Table 21 Top 10 pairs of drugs with their correlation with subtype 9 of breast cancer, considering both combined and individually
Table 22 Top 10 pairs of drugs with their correlation with subtype 10 of breast cancer, considering both combined and individually

Goserelin-Acetate, which is sold under brand name Zoladex among others, is as a sex hormone suppression drug approved by FDA intended for use in the treatment of breast and prostate cancer [61]. As shown in Tables 3,14, Goserelin-Acetate as a single drug produces an correlation score of -0.8 with respect to subtype 2 of BC, which places it in the top spot among the single drugs for this subtype. However, if combined with either Norethindrone, Phensuximide or Nimesulide, the correlation score decreases to almost -0.97. This means that combining either of the aforementioned drugs with Goserelin-Acetate can result in a more effective therapeutic drug for this BC subtype.

Triple negative breast cancer subtype

Table 23 shows the identified driver genes corresponding to the TN group. Moreover, Table 24 shows the top 20 single repurposed drugs for the triple negative breast cancer (TNBC) subtype. As shown in the table, Ruxolitinib is by far the most negatively correlated drug for TNBC subtype and can be investigated further for its effectiveness on this particular type of breast cancer [51, 52, 62].

Table 23 Identified driver genes associated with triple negative breast cancer subtype
Table 24 Top 20 drugs along with their ranking corresponding to Triple negative breast cancer subtype
Fig. 16
figure 16

Drug-disease network (DDN) of Tadalafil and PF-04620110 drugs with subtype 1. Blue nodes depict drug related genes, while red nodes depict candidate genes related to subtype 1

Moreover, Table 25 and Fig. 17 show the top 10 pairs of repurposed drugs and their corresponding scores with respect to TNBC subtype. We observe that despite being the tenth repurposed single drug, Bromocriptine is managed to become one of the most effective repurposed drugs when paired with Isradipin, Emtricitabine and Etofylline-Clofibrate. Although Bromocriptine has been suggested in earlier studies as a potential repurposed drug in cancer therapy [63], these new combinations have not seem to be evaluated so far for breast cancer treatment, which can be investigated further both computationally and clinically. Another interesting observation from Tables 2425 is that there are only four pairs of drugs with anti-correlation scores better than Ruxolitinib as the best single drug identified for TNBC subtype.

Fig. 17
figure 17

Top pairs of drugs with highest anti-correlation corresponding to TNBC subtype

Table 25 Top 10 pairs of drugs with their correlation with respect to TN breast cancer subtype, both when they are combined and individually

Based on these observations, we infer that from both single drug and paired drug experiments, there are some promising drugs that can be repurposed either individually or in combination with another drug (as a pair) with potential therapeutic effects for each of the ten breast cancer subtypes. Some of these drugs such as Ruxolitinib have a high anti-correlation score for most of the subtypes, while some of the drugs such as Maraviroc [64] seem to be more effective on a particular subtype rather than on others. The fact that the top single drug, Ruxolitinib, is currently in multiple clinical trials in patients with metastatic breast cancer [51, 56] shows that the proposed method is able to computationally predict the potential therapeutic effect of this drug on multiple breast cancer subtypes, as well as on TNBC subtype. Indeed, further wet lab analysis is needed to determine the therapeutic level of identified drugs on each breast cancer subtype.

Conclusion and future work

The proposed computational drug repurposing method is a network-based integration approach that combines transcriptomic, genomics and pathway data in order to find computationally promising repurposed single or paired drugs for each breast cancer subtype as well as TNBC subtype. Some of the top identified drugs are either known (breast) cancer drugs or in different trial phases to be repurposed for breast or other types of cancer, while some of the identified single or paired drugs have not been used for breast cancer treatment yet, which provides opportunity for further clinical experiments and trials. Using genomic and transcriptomic data as well as both copy number variations and copy number aberrations would help the initial process to identify the driver genes more effectively and hence finding the final set of repourposed drugs that can be highly effective for treatment of other types of cancer.

The proposed framework has the potential of identifying a combination of more than two drugs, which could help identifying new and enhanced sets of drugs for various types of cancer as well as other types of diseases. Also as a possible future work, the framework can be extended to leverage more complex and nonlinear combination of drugs in order to find the most suitable sets of drugs for each disease. Moreover, since for the simplicity of the process we excluded pairs of drugs with known drug-drug interference, potential drug-drug interactions have not been considered yet. So, another possible future work could be extending the framework to include such pairs of drugs with known interactions and their effect on the disease.