1 Introduction

Breast cancer (BC) is the most prevalent cancer among women in developed and developing countries [1]. Due to its incidence and mortality, it is considered a serious public health problem worldwide [2, 3]. This type of cancer is a complex and heterogeneous disease with several distinct histopathological and molecular subtypes. Among them, about 60 to 70% of all diagnosed BC are positive for hormone receptor expression (HR+), specifically estrogen receptor (ER+) and/or progesterone receptor (PR+).

The stimulation of both receptors by its ligand molecules—estrogen and progesterone—induces mainly transcriptional changes, which trigger biological processes such as cell survival, proliferation, and differentiation. Therefore, in breast cancer, its constant stimulation by sexual hormones could contribute to tumor increasing and disease progression [4, 5], justifying anti-estrogen therapies as an adjuvant treatment for this type of neoplasia [6].

Tamoxifen (TMX) is one of the most used types of adjuvant treatment in ER+ breast cancer. Administered as a prodrug, TMX must be metabolized in the liver to form active metabolites that will perform their estrogen antagonist functions and thus referred to as a selective estrogen receptor modulator (SERM) which blocks the binding of estrogen to the estrogen receptor and consequently inhibit cell proliferation and survival [7,8,9,10,11]. For this reason, TMX is the standard hormone therapy for HR+ breast cancer patients in the premenopausal stage [12,13,14].

Whereas this hormone adjuvant therapy has considerably improved survival in HR molecularly subtyped BC patients in recent decades, the development of pharmacological resistance has been an increasing challenge for oncology, once the lack of responsiveness to treatment is a critical and limiting factor for therapeutic efficacy [15, 16]. Around 30 to 50% of HR+ BC patients, regardless of the expression level of these receptors, presented intrinsic or acquired resistance to TMX [17, 18]. Consequently, the five-year survival rate after resistance occurrence was less than 20% [19, 20].

Several clinical works have been published in the literature bringing information about a new specific marker involved with this frame of resistance, associating its molecular profile with a better or poor outcome in BC patients. Therefore, a better comprehension of the interrelationship among these identified molecular markers associated with TMX resistance becomes increasingly relevant and necessary to strengthen the understanding of what molecular mechanisms and cell processes are changed in tumor cells when they acquire hormone resistance in an attempt to overcome it and improving therapeutic effectiveness.

A major meta-analysis carried out by Wirapati et al. analyzed the gene expression in breast cancer patients and associated it with important biological processes that could be present in all molecular subtypes [21]. Another meta-analysis by Mihály et al. investigated new independent biomarkers that could be related to the tamoxifen response in breast cancer patients [22].

This work presents a systematic review of recent papers in the literature, focusing only on translational studies with a minimum 3-year follow-up. It highlights how individual biomarkers related to TMX resistance impact the survival rate of ER+ breast cancer patients. Then, it employs a clustering technique to find groups of biomarkers whose biological processes show a strong interaction. Based on this clustering analysis, this work evaluates how these biological processes influence the overall survival rate in ER+ breast cancer patients, indicating which processes could be predictive of clinical results in response to treatment with TMX. To the best of the authors’ knowledge, this is the first systematic review that presents such a clustering technique for the conjoint evaluation of multiple biomarkers in the survival rate of ER+ breast cancer patients.

2 Methodology

2.1 Identifying relevant studies

The manuscript selection was performed according to the PRISMA Extension for Scoping Reviews (PRISMA-ScR) methodology: Checklist and Explanation [23]. The literature review was performed to identify clinical studies assessing the correlation between the expression profile of a molecular biomarker and the long-term outcome of HR+ BC patients in response to TMX treatment. The initial inclusion criteria were original manuscripts, published from 2013/01/01 to 2018/06/30, full texts in any language, obtained from EMBASE or PUBMED databases. For manuscript search, three primary keyword descriptors were applied: “breast cancer”, “hormone therapy” and “resistance”, or their variations listed in Supplementary material 1. Case studies, reviews, and comments were not considered.

2.2 Selection of eligible studies

The manuscripts found in the initial search were analyzed in three stages: first, a title reading was performed, selecting only those presenting all primary keyword descriptors or their variations. Duplicates were removed. The second and third stages consisted of abstract and full-text reading, respectively. These stages ensure that the selected manuscripts respected the inclusion and exclusion criteria, according to Table 1. The essential items of the REMARK protocol [24] were considered in this review. Two independent reviewers performed the manuscript search and selection processes, while a third reviewer solved incongruities. Only manuscripts that assessed molecular biomarkers as a prognostic factor and not as a risk factor were considered. Studies only evaluating patient expression databases, in vivo or in vitro experimentation were also excluded.

Table 1 Study inclusion and exclusion criteria

2.3 Data extraction

After the full-text review stage of the selected manuscripts included in this scoping review, the following information was plotted in Table 2:

  • Population: number of participants and follow-up of only those HR+BC patients treated with Tamoxifen.

  • Marker profile: the marker symbol used by each selected manuscript; the genetic alteration (deletion, mutation, or polymorphisms) when altered; the type of molecule assessed (DNA, RNA, or protein); the molecular status of the analyzed marker in relation to activity, expression, subcellular localization, or post-translational modification, which could alter its functionality.

  • Clinical outcomes and statistical analysis: clinical profile of tested BC patients (survival or recurrence) associated with the alteration of analyzed molecular markers; the statistical analysis of the association between the molecular status of markers and clinical outcome, prioritizing multivariate analysis; the significance level of statistical analysis when available. In order to standardize the collected information, it was plotted molecular profile of markers, which was associated with a poor clinical outcome, except for those presenting polymorphisms or specific subcellular localization.

Table 2 Selected publications accessing the influence of molecular markers on clinical outcomes of Tamoxifen-resistant breast cancer patients

2.4 In silico analysis

2.4.1 Design of networks

After a full-length analysis of manuscripts and data extraction, the official symbol of all selected molecular markers was confirmed in the HUGO Gene Nomenclature Committee (HGNC) website for in silico analysis (Supplementary Table 1). Those presenting statically significant associations (p < 0.05) with some clinical outcomes in HR+ BC patients were assessed together using the ©STRING CONSORTIUM 2019 (version 11.0) web tool. The minimum interaction score adopted in this analysis is 0.700 (high confidence) for each connection (edges), including all active interaction sources used by the tool, excluding text mining. The resulted network was presented with an interaction score of each connection between two markers for further modular analysis of networks. Posteriorly, the unconnected markers were separately analyzed. Only those markers whose molecular profile of expression/activity direct influence the clinical outcomes of HR+ BC patients were considered, excluding micro RNAs. For polymorphisms, only the affected biomarker was included for network design, not considering the polymorphic variant itself.

2.4.2 Modular analysis of network

From the STRING resulting network, it was used ClusterONE plugin [25] of Cytoscape software. The cluster establishment was assessed considering the following criteria: at least three proteins compounding a cluster, density and interaction quality values above 0.5, and significant p-value (p < 0.05). Molecular markers that were not included in any cluster were called unclustered markers.

2.4.3 Gene ontology analysis

Once the clusters within a molecular marker network were identified, it was performed the gene ontology (GO) analysis using the Biological Network Gene Ontology (BiNGO) plugin (version 3.0.3) in Cytoscape software to assess the possible biological processes that could be involved with the specific set of markers grouped in each cluster. Statistically significant clusters were assessed, considering only their members or added to their direct neighbor markers (networked markers directly linked with clustered markers). Statistical significance was measured using a hypergeometric test with the multiple testing correction of Benjamini & Hochberg False Discovery Rate (FDR). Only biological processes with a p-value < 0.05 were considered significant. Based on biological knowledge about BC resistance, the five biological processes often involved with this event were selected to be illustrated in the results and further discussed.

2.4.4 The predictive power of different sets of TMX-resistance markers

In order to assess the predictive power of each set of markers among those significantly associated with any clinical outcome in TMX-treated HR+ BC patients, the Kaplan Meier-plotter web software [26] was used. It was assessed a general analysis for all selected markers, those connected and unconnected markers from STRING analysis, and those clusters obtained from ClusterOne plugin (with or without their direct neighbor markers). The data extracted from the original papers showing high expression/activity of molecular markers associated with poor outcome were directly related in the KM-plotter analysis settings, inverting those presenting low or absence expression/activity associated with poor prognosis status (Supplementary Table 1). Equal weights were attributed to all tested markers. The Kaplan–Meier graphs represent overall survival (OS) with a follow-up threshold of 240 months, splitting patients by median, using all probe sets per gene, without any restriction of breast cancer subtypes or selected cohorts studies as dataset source.

3 Results

It was identified 2.487 articles in EMBASE and PUBMED databases from the keyword descriptors and their variations. The flow diagram of literature analysis, manuscript selection, and in silico analysis is illustrated in Fig. 1.

Fig. 1
figure 1

PRISMA diagram of selection of studies and in silico analysis. The upper section corresponds to the step-by-step manuscript selection process according to the established inclusion and exclusion criteria. The Lower section includes the sequence of in silico analysis performed with all selected biomarkers to search a better and integrated understanding of the influence of each protein in TMX resistance

The application of the eligibility criteria resulted in the exclusion of 2.443 papers that were not adequate to one or more selection factors or did not provide the needed information. The main excluding factors were: (1) in vitro experimentation; (2) in vivo assays; (3) clinical studies assessing less than 35 patients (4) the absence of prognostic factor assessing in clinical studies; (5) the absence of hormone therapy with TMX; (6) manuscripts using of transcription or genomic sequencing data deposited in public databases; (7) works testing other types of hormone therapy, differently of TMX; (8) and review manuscripts. Thereby, 45 manuscripts were selected from the eligibility analysis, analyzing 60 different molecular targets, as shown in Table 2.

The total number of analyzed patients in each selected study has varied between 37 [29] and 1082 [62]. The follow-up of most studies was 5 years or more, and 6 other studies presented a follow-up of 3 years [36, 45, 58, 59, 61, 68] or less [57]. The main reported alterations in assessed molecular targets were changes in their expression level or activity. However, some structural modifications at the DNA, RNA, or protein level were also found, such as polymorphisms, post-translational modifications (phosphorylation), and mutations (deletion or splicing variant).

Fifteen of the 60 selected markers were excluded from the in silico analysis (Fig. 1). Thereby, only 45 of 60 selected markers were eligible for in silico analysis. STRING analysis resulted in a functional association among 23 of 45 tested markers, forming two networks with 5 and 18 nodes, respectively (Fig. 2a).

Fig. 2
figure 2

In silico analysis. A 3 clusters were formed with strongly linked molecular markers, cluster 1(red), cluster 2 (green), and cluster 3 (blue). The markers in gray were not included in the clusters, remaining just interconnected proteins among clusters inside the network. B unconnected markers. A and B still reveal the functional status of each molecular marker. 12 markers (dashed borderline in the molecular markers) decrease in functional activity associated to poor outcomes in BC patients, the other 33 markers presented an elevated functional activity associated to worst outcomes (full borderline in the molecular markers), accordingly information of their original manuscripts. C Gene ontology analysis for biological processes: CLUSTER 1 presented significant involvement with molecular events related to cell proliferation, CLUSTER 2 with the modulation of proliferative mechanisms, and CLUSTER 3 with the mammary gland development and its hormone stimulation. D Genetic ontology analysis of clusters 2 and 3, taking into account not only its constituent markers but with its direct neighbors

Furthermore, to find densely associated subgroups of molecular markers inside the obtained networks, ClusterONE analysis revealed the presence of 3 clusters with strongly linked molecular markers (red, green, and blue markers). Five proteins (AURKA, MK167, AURKB, FOXM1, RRM2) constituted the CLUSTER 1 (red markers) that presented a density value = 1,000 and interaction quality = 1000, displaying a high interactive power among these markers, represented by a p-value = 0.002. The CLUSTER 2 (green markers) was configured with 7 proteins (STAT3, PTPN2, FYN, SRC, PAK2, FRS2, EGFR), presenting a density value = 0.571 and quality = 0.600 and a p-value = 0.022. The CLUSTER 3 (blue markers) was established with 5 proteins (ESR1, GATA3, PGR, NCOR2, EBAG9), presenting density = 0.600, interaction quality = 0.545 and p-value = 0.055, considered a borderline, accordingly the established criteria.

Not all networked molecular markers were included in the clusters, remaining just interconnected proteins among clusters inside the network (gray markers). This subgroup was called unclustered markers and was constituted by six proteins (SOX2, POU5F1, BCL2, IGF1R, TGFBR2, YAP1). Among these, SOX2 and POU5F1 were directly linked only with CLUSTER 2, and TGFBR2 and YAP1 only with CLUSTER 3. On the other hand, BCL2 and IGF1R seem to establish a link between CLUSTER 2 and 3, acting as an intermediate node between them. The other remaining 22 markers presented no interaction among them (Fig. 2b).

Figure 2a and 2b reveal the functional status of each molecular marker that is associated with a worsening outcome in patients, according to the information of their original manuscripts. From 45 tested markers, 12 of them presented a decrease in functional activity associated with poor outcomes in BC patients (dashed borderline in the molecular markers), which may be due to an inactivating polymorphism, a gene deletion, a gene/protein expression decrease, or still an activity reduction. Further, the other 33 markers presented an elevated functional activity associated with the worst outcomes due to an activating polymorphism, elevated gene/protein expression, or increasing functional activity.

In the gene ontology analysis (Fig. 2c), CLUSTER 1 presented significant involvement with molecular events related to cell proliferation, strongly suggesting that this subset of molecular markers could directly influence the mitotic mechanism regulation. The analysis of CLUSTER 2 presented significant implications in biological processes directly connected with signaling pathways involved with the modulation of proliferative mechanisms. Lastly, CLUSTER 3 showed a significant relationship with processes related to the development of the mammary gland and its hormone stimulation, suggesting a possible involvement with mechanisms of differentiation and signaling pathways of sexual hormone response in the breast tissue.

CLUSTER 2, together with its direct neighbors (BCL2, SOX, POU5F1, GATA3, YAP1. ESR1, PGR, IGF1R) (Fig. 2d), has shown significant implication with biological processes as “gland morphogenesis,” “enzyme-linked receptor pathway,” “transmembrane RTK pathway,” “cell differentiation” and “protein autophosphorylation.” The same approach was followed for CLUSTER 3 and its direct neighbors (BCL2, STAT3, SRC, IGF1R), resulting in significant involvement with cellular mechanisms such as “gland morphogenesis,” “response to estrogen stimulus,” “anti-apoptosis,” “negative regulation of DNA-dependent transcription” and “cell proliferation.” It was found that some cell processes had changed between CLUSTER 2 and 3 when direct neighbors were added in analysis, highlighting their proximity inside the network and the possible involvement of both clusters in similar or the same cellular processes.

In order to complement our data, a more in-depth analysis was carried out to test if these clusters would have a predictive power of overall survival in BC patients when treated with TMX (Fig. 3).

Fig. 3
figure 3

Predictive analysis of overall survival for each set of markers in BC patients using the Kaplan–Meier plotter tool. A Analysis of all 45 selected markers. B Only the networked markers (23 markers). C Unconnected markers (22 markers). D Analysis of clusters 1, 2, and 3. E Cluster 2 and 3 with their direct neighbors. Significant p-values (< 0.05) are highlighted in bold

For this analysis, Kaplan Meier-plotter web software [26] was used, assessing the overall survival (OS) with a follow-up of 240 months. A first analysis of all 45 selected markers shown hazard ratio (HR) 1.51 (95% CI 1.1–2.08) and a p-value = 0.0095 (Fig. 3a). A significant predictive power was found when only those networked markers were analyzed, with an HR 1.44 (95% CI 1.05–1.97), p-value = 0.023 (Fig. 3b). As expected, the joint analysis of non-connected markers was not significant in predicting overall survival (Fig. 3c).

Furthermore, it was assessed if the specific molecular clusters would present predictive power for overall survival. Figure 3d shows Kaplan–Meier for each analyzed cluster, and, among them, only CLUSTER 1 has presented significant predictive capability, displaying HR = 2.0 (95% CI 1.6–2.49), p-value < 0.00001. Conversely, knowing the proximity and possible functional overlap between CLUSTER 2 and 3, it was assessed the predictive power of specific clusters together with their direct neighbors (Fig. 3e). Only CLUSTER 3 with its direct neighbors displayed significant results, with HR 1.4 (95% CI 1.02–1.91), p-value = 0.037.

4 Discussion

This review found evidence in the literature about possible molecular targets capable of individually predicting TMX resistance in patients with BC, demonstrating a strong association between their molecular status and specific clinical outcome, such as tumor survival and/or recurrence. Therefore, an in silico analysis was performed to aid in data analysis. An interrelation between some molecular targets was found, forming three distinct clusters, showing significant involvement in biological processes. Our method, based on a dense analysis from literature datasets, showed that there might be groups of cellular processes that could be more involved with resistance to TMX in HR+BC patients. It is worth highlighting that the same weight was assigned for all analyzed markers in the KM-plot analysis since this information is not commonly evaluated in the study models of the original data sources. Therefore, it might represent a limitation in the present analysis since each marker can have different influence degrees within a signaling pathway to control a specific biological process.

The markers MKI67, AURKA, AURKB, FOXM1 in CLUSTER 1 presented significant involvement with molecular events related to cell proliferation (Fig. 2c), as observed by their original manuscripts [32, 35, 39, 44, 64]. However, this cellular process is highly complex, involving several components that could regulate it positively or negatively, and it has been widely studied as it is crucial for resistant subpopulation rising, not only to TMX but also to other drugs [72,73,74,75].

Analysis of overall survival of CLUSTER 1 showed a significant association with a worse outcome of the TMX-treated BC patients (Fig. 3d) when strongly expressed or activated. Bergamaschi et al. showed that the analysis of a gene signature related to mitosis control, orchestrated by marker 14-3-3ζ, was able to control the resistance profile to endocrine therapy in BC, influencing the expression of markers involved with CLUSTER 1, such as FOXM1 and AURKB [76]. Perhaps, if the Thistle JE. et al., who analyzed the involvement of 14-3-3ζ with a worse outcome in TMX+ treated ER+ patients, had presented statistical correlation values, this marker would be highly related to CLUSTER1 and could result in an increase of predictive power for this set of genes for TMX therapy response [62].

The uncontrolled cell cycle is considered a hallmark of cancer [77], as shown in the meta-analysis that evaluated three key biological processes for the development of breast cancer (cell proliferation, ER, and HER2 signaling). This analysis highlighted cell proliferation as an important prognostic role. Further, the results showed that the subtypes ER+/HER2- with low proliferation showed a higher DRFS, whereas the ER−/HER2− and HER2+ showed a higher level of cell proliferation and worse outcomes. However, there was no analysis of the markers with regard to the different hormonal treatments [21].

Besides, there are studies providing information about possible combined effects among these markers associated with a worse prognosis in patients with other BC subtypes [78, 79]. However, no studies simultaneously evaluate the expression profile of five markers composing CLUSTER 1 in tumor cells, which is a promising molecular target to be better explored in future research about this predictive power in response to TMX treatment.

No significant correlation was found with the clinical outcomes for CLUSTERS 2 and 3 (considering only the constituent markers) (Fig. 3d). However, individually, each molecular marker has presented an association with worse outcomes in their original studies. This controversial effect of joint analysis demonstrates the importance of simultaneously analyzing as many markers as possible and jointly correlating with clinical outcomes. The more extensive and more complete the molecular profile of tumor cells is, the closer to reality this analysis will be, increasing the representativeness of the experimental data.

Reinforcing this concept, when only constituting markers of CLUSTER 3 were analyzed, no association was found with some clinical outcome. However, when other close markers were included in this analysis, a significant association with overall survival was observed (Fig. 3e). The combination of CLUSTER 3 and its direct neighbors have shown involvement with cellular processes linked to mammary gland development and differentiation, in addition to sexual hormone response and gene transcription regulation (Fig. 2d). The latter has been associated with Decitabine (DAC) chemoresistance, also in BC [80]. However, some of these cellular processes are quite generic and could be related to several molecular markers. Therefore, more evidence about this subset of markers is required to increase the number of molecular components integrating this cluster, restrict the involvement with generic cellular processes, and improve its predictive power concerning clinical outcomes.

The investigation of gene signatures relating a specific cell profile with specific clinical outcomes has gained traction, not only for BC but also for other tumor types. Toshimitsu et al. revealed a specific molecular profile of cisplatin resistance in esophageal cancer based on in vitro studies using drug-resistant cell lineage [81]. Other works have discovered gene signatures related to BC resistance for the most varied treatments, such as neoadjuvant therapy [82], inhibitory molecules [83], and hormone therapy [84], among others. These studies make use of multiple strategies, such as transcriptome sequencing and microarray, to simultaneously investigate a range of markers in the same tumor sample and compare it with the resistant condition.

The study used a gene expression assay to quantify the likelihood of recurrence in patients with positive breast cancer, negative nodule, and ER+ treated with TMX. The study managed to identify 16 genes that were related to cancer recurrence that belonged to either of the following biological processes: proliferation (such as Ki-67 and cyclin B1), estrogen receptor (including genes such as ER and PR), HER2 (including HER2 and GRB7), and cell invasion (including genes like Stromelysin 3 and Cathepsin L2) [85]. A follow-up study was developed to investigate the ESR1 marker, which has been reported as a strong predictor of resistance to TMX in ER-positive patients with low levels of ESR1 [86].

Mihály et al. performed a meta-analysis looking for studies that evaluated expression-based markers to provide independent information regarding TMX treatment. Out of 68 biomarkers with probable connection to the TMX resistance, three (PGR, MAPT, and SLC7A5) were the most promising observed in patients treated with TMX. However, only independent markers were evaluated without making a joint analysis to investigate their main biological processes. According to our review, we can also observe that many markers remain under study [22].

In conclusion, this systematic review is the first of its kind to adopt a clustering technique to evaluate the independent data on the literature regarding TMX resistance in ER+ breast cancer patients. This analysis revealed three networks of biomarkers involved in biological processes: cell cycle, signal transduction of proliferative stimuli, and hormone response involved in morphogenesis and differentiation of mammary gland. As expected, CLUSTER 1 corroborated that the cell proliferation pathway is strongly linked to the prediction of TMX response. Further, CLUSTER 2 and 3 indicate that there are other resistance pathways that must be thoroughly investigated as currently available data are not sufficient to reach a statistically significant conclusion. When the direct cluster neighbors are considered, CLUSTER 3 showed a strong link between TMX resistance and the development of the mammary gland and its hormone stimulation. Thus, our data found are hypothesis generators, which suggest promising mechanisms and biomarkers for predicting resistance to TMX and contributing to future studies and personalized medicine.