Hub metastatic gene signature and risk score of breast cancer patients with small tumor sizes using WGCNA

Chang, Yu-Tien; Hong, Zhi-Jie; Tsai, Hsueh-Han; Feng, An-Chieh; Huang, Tzu-Ya; Yu, Jyh-Cherng; Hsu, Kuo-Feng; Huang, Chi-Cheng; Lin, Wei-Zhi; Chu, Chi-Ming; Liang, Chia-Ming; Liao, Guo-Shiou

doi:10.1007/s12282-024-01627-w


Original Article
Open access
Published: 27 August 2024


Breast Cancer


Yu-Tien Chang¹,
Zhi-Jie Hong²,
Hsueh-Han Tsai²,
An-Chieh Feng²,
Tzu-Ya Huang¹,
Jyh-Cherng Yu²,
Kuo-Feng Hsu²,
Chi-Cheng Huang³,⁵,⁶,
Wei-Zhi Lin⁴,
Chi-Ming Chu¹,
Chia-Ming Liang² &
Guo-Shiou Liao² (ORCID: orcid.org/0000-0003-1082-679X)


Abstract

Background

Breast cancer (BC) is the most common cancer in women and accounts for approximately 15% of all cancer deaths among women globally. The mechanism by which BC patients with small tumors develop distant metastasis (DM) remains elusive in clinical practice.

Methods

We integrated the gene expression data of BCs from ten RNAseq datasets in the Gene Expression Omnibus (GEO) database to create a genetic prediction model for distant metastasis-free survival (DMFS) in BC patients with small tumor sizes (≤ 2 cm) using weighted gene co-expression network analysis (WGCNA) and LASSO Cox regression.

Results

ABHD11, DDX39A, G3BP2, GOLM1, IL1R1, MMP11, PIK3R1, SNRPB2, and VAV3 were hub metastatic genes identified by WGCNA and used to create a risk score using multivariable Cox regression. At the cut-point value of the median risk score, the high-risk score (≥ median risk score) group had a higher risk of DM than the low-risk score group in the training cohort [hazard ratio (HR) 4.51, p < 0.0001] and in the validation cohort (HR 5.48, p = 0.003). The nomogram prediction model of 3-, 5-, and 7-year DMFS shows good prediction results with C-indices of 0.72–0.76. The enriched pathways were immune regulation and cell–cell signaling. EGFR serves as the hub gene for the protein–protein interaction network of PIK3R1, IL1R1, MMP11, GOLM1, and VAV3.

Conclusion

The prognostic gene signature was predictive of DMFS in BC patients with small tumor sizes. The protein–protein interaction network of PIK3R1, IL1R1, MMP11, GOLM1, and VAV3, connected by EGFR, merits further experiments to elucidate the underlying mechanisms.

Graphical abstract


Introduction

Breast cancer is still the most common cancer in women. In 2020, 2.3 million women were diagnosed with breast cancer, with 685,000 reported deaths globally [1]. The heterogeneity of breast cancer is attributed to differences in the genomic, epigenetic, transcriptomic, and proteomic characteristics of the cancer cells. These factors have an impact on tumor properties such as proliferation, apoptosis, metastasis, and therapeutic response [2]. In clinical diagnosis, the histological status of biomarkers such as estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and Ki67 is critical for prognosis estimation. The two most important prognostic factors are tumor size and lymph-node metastasis status. Larger tumors and positive lymph-node metastases are often associated with poor outcomes in breast cancer. However, some patients with small tumors have severe lymph-node metastasis, resulting in poor prognosis [3, 4]. Moreover, in extensive node-positive breast cancers, very small tumor size may be a surrogate for biologically aggressive disease [5]. Among patients with the same lymph-node metastasis status, those with small tumors have a higher breast cancer-specific mortality rate [6]. However, the underlying mechanism remains elusive.

Since biomarker status is essential to prognosis estimation, and tumor size and lymph-node metastasis are the two main factors for survival, it is critical to first identify the missing link by analyzing genomic characteristics. However, the regulatory relationships between genes are highly complex. Moreover, each regulatory network contains highly connected genes called hub genes. Therefore, we employed weighted gene co-expression network analysis (WGCNA), a bioinformatics approach. With WGCNA, we can condense thousands of genes into several highly correlated gene modules, construct a scale-free network through weighting and co-expression, and explore the module structure, the relationships between genes and modules (module membership), the relationships among modules (eigengene network methodology), and the association of modules with clinical features [6].

In this study, we aimed to identify candidate genes affecting distant metastasis in breast cancer patients with small tumors through WGCNA and a LASSO Cox proportional hazards model, and to establish a genetic risk score for identifying high-risk patients and improving treatment strategies to prevent tumor progression.

Materials and methods

Immunohistochemical subtyping

We assessed the number of positively stained cells using immunohistochemistry (IHC). A sample was considered positive if the percentage of ER- or PR-stained cells was greater than 1%. The criteria for defining HER2 positivity were IHC 3+ or fluorescence in situ hybridization (FISH) positivity.

RNAseq data

We integrated nine mRNA datasets of breast cancers as a training cohort, GSE6532, GSE6532, GSE9195, GSE11121, GSE16446, GSE25066, GSE45255, GSE58984, and GSE158309 (Table S1), downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo). These datasets comprise 2013 patients in total. The selection criteria were primary breast tumors from human tissue, female patients, and available data on distant metastasis-free survival (DMFS), tumor size, lymph-node status, and at least one of the clinical features grade, ER, PR, and HER2. After excluding patients with tumor size > 2 cm or with missing distant metastasis status or time, 598 patients were included in the prognostic biomarker analysis: 474 without distant metastases and 124 with distant metastases. All analyses in this study were performed in R (version 4.2.2; https://www.r-project.org/). The training cohort integrated from the nine GSE datasets was normalized to remove batch effects using the “removeBatchEffect” function of the linear models for microarray and RNA-seq data (limma) R package. GSE20685, which contained 17 patients with distant metastasis and 84 patients without distant metastasis, was used to validate the prediction model.

Prognostic prediction model

As shown in the study workflow (Fig. 1), we first randomly split the data into a training set (80%) and a testing set (20%), and repeated this resampling 100 times. Next, we used random survival forests (rsf) to identify important genes for distant metastasis. Then, we counted how many times each gene was selected and calculated its variable importance. We plotted gene count against variable importance across the 100 rounds (Fig. 2A). There was a positive linear relationship between gene count and variable importance when the gene count was over five. That is, genes selected more than five times in the 100 resampling rounds were more likely to have higher importance for distant metastasis prediction.
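The resampling-and-tally step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code: `tally_selected_genes` and `toy_selector` are hypothetical names, and the toy selector only stands in for a random survival forest, which is not implemented here.

```python
import random
from collections import Counter


def tally_selected_genes(sample_ids, select_genes, n_rounds=100,
                         train_frac=0.8, seed=0):
    """Count how often each gene is selected across repeated 80/20 splits.

    select_genes is any callable that takes the training sample IDs and
    returns the genes it considers important (an RSF in the article).
    """
    rng = random.Random(seed)
    counts = Counter()
    n_train = int(round(train_frac * len(sample_ids)))
    for _ in range(n_rounds):
        shuffled = list(sample_ids)
        rng.shuffle(shuffled)
        train_ids = shuffled[:n_train]  # the remaining 20% would be the testing set
        counts.update(set(select_genes(train_ids)))
    return counts


def toy_selector(train_ids):
    """Hypothetical stand-in for an RSF: always picks GENE_A, sometimes GENE_B."""
    selected = ["GENE_A"]
    if sum(train_ids) % 2 == 0:
        selected.append("GENE_B")
    return selected


counts = tally_selected_genes(list(range(50)), toy_selector)
# Keep genes selected more than five times, mirroring the threshold in the text.
frequent_genes = sorted(g for g, c in counts.items() if c > 5)
```

Any real selector can be passed in place of `toy_selector` as long as it returns an iterable of gene names.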

The concordance index, or C-index, is a generalization of the area under the ROC curve (AUC) [7] and is used to evaluate the predictions made by a model. Across the 100 rounds, the C-indices of the training and testing sets were 0.88 ± 0.01 and 0.81 ± 0.03, respectively (Fig. 2B). The average prediction efficacy was good, which indicated that the selected genes were potential biomarkers.
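To make the C-index concrete, the sketch below computes a pairwise concordance for right-censored data in plain Python. It uses Harrell's simple pairwise form as an assumption; the article cites Uno's C-statistic [7], so the authors' exact estimator may differ. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-type C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an event (events[i] == 1)
    and a shorter follow-up time than subject j. The pair is concordant when
    the subject with the earlier event has the higher risk score; ties in
    risk score count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third one censored at 6 years.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risk)  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.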

LASSO Cox regression was performed using the “glmnet” R package to obtain the optimal gene combination. The complexity of the LASSO regression is controlled by the penalty parameter λ. The λ selection diagram is shown in Fig. 2C, D. The model constructed with λ_1se was the simplest and used a small number of genes, while the model constructed with λ_min had higher accuracy and used a larger number of genes. We selected λ_min to build the model for accuracy.
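The difference between the two λ choices can be illustrated with a short Python sketch. It assumes the usual cross-validation convention: λ_min minimizes the mean cross-validated error, and λ_1se is the largest λ whose error lies within one standard error of that minimum. The function name and the error curve are hypothetical illustrative values, not results from the study.

```python
def choose_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation error curve."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Largest penalty (sparsest model) whose error is still within one SE.
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


# Hypothetical error curve: larger lambda means fewer genes kept.
lambdas = [0.20, 0.10, 0.05, 0.02, 0.01]
cv_mean = [12.9, 12.3, 12.0, 11.9, 12.1]
cv_se = [0.3, 0.3, 0.3, 0.3, 0.3]
lambda_min, lambda_1se = choose_lambdas(lambdas, cv_mean, cv_se)
```

Because λ_1se is never smaller than λ_min, it applies a stronger penalty and keeps fewer genes, which matches the trade-off described in the text.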

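Finally, the prognostic risk score was built using Cox regression according to the expression levels of the prognostic RNAs and their prognostic coefficients, with the calculation formula as follows:

risk score = Σ_i (β_i × Exp_i), where β_i is the Cox regression coefficient of prognostic gene i and Exp_i is its expression level.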

Cell line expression data

Gene expression data of the prognostic genes in breast cancer cell lines (n = 60) were derived from the Cancer Cell Line Encyclopedia (CCLE) and the Dependency Map (DepMap) 22Q2 release. They were used to plot a heatmap with the R package “pheatmap” and to investigate prognostic gene expression in BC cell lines with different characteristics.

Weighted gene co-expression network analysis

We selected genes with adjusted p values (false discovery rate) < 0.05 in the univariable Cox regression. The remaining 6786 genes were used to construct the co-expression network and compute module–trait relationships using weighted gene co-expression network analysis (WGCNA). First, we calculated co-expression similarity and the adjacency using the soft threshold power β. Second, we used the “TOMsimilarity” function to compute the topological overlap matrix, converted it to a distance matrix, and built a hierarchical clustering tree to identify modules. Then, the tree was cut into clusters with the “cutreeDynamic” function. After merging related modules with the “mergeCloseModules” function, we recalculated the eigengenes of the resulting modules and calculated the module–trait correlations to identify clinically significant modules. Finally, functional annotation analysis of the modules was performed using the “userListEnrichment” function of the WGCNA package.
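The first two steps (soft-thresholded adjacency and topological overlap) can be sketched in plain Python. This is a simplified illustration, not the WGCNA package itself: it assumes an unsigned network, where the adjacency is |Pearson correlation|^β, and the standard unsigned topological overlap measure. The network type is an assumption because the text does not state it, and the function names and toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)|^beta with a zero diagonal."""
    genes = list(expr)
    n = len(genes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i][j] = abs(pearson(expr[genes[i]], expr[genes[j]])) ** beta
    return genes, adj


def tom_dissimilarity(adj):
    """Distance 1 - TOM, with TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            diss[i][j] = 1.0 - tom
    return diss


# Toy expression profiles: g1 and g2 are strongly co-expressed, g3 is not.
expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10.5],
    "g3": [5, 1, 4, 2, 3],
}
genes, adj = adjacency(expr, beta=6)
diss = tom_dissimilarity(adj)
```

Raising correlations to the power β suppresses weak correlations, so only the strongly co-expressed pair ends up close together in the distance matrix that is passed to hierarchical clustering.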

Nomogram and protein–protein interaction networks

A nomogram was constructed using the “rms” package to further visualize the predictive value of the prognostic genes. The C-index was calculated to evaluate the discriminative ability of the nomogram, and calibration curves were drawn to show the consistency between the predicted 3-year, 5-year, and 7-year endpoint events and the observed outcomes.

The STRING database collects, scores, and integrates information on protein–protein interactions from various public databases. For the functional annotation of biomarkers, the prognostic genes were used to construct protein–protein interaction networks using STRING (https://string-db.org/).

Results

Patients and demographics

We summarized the essential characteristics of breast cancer patients with tumor size ≤ 2 cm in the training cohort by distant metastasis-free survival (DMFS) time, age, tumor grade, lymph-node status, biomarker status (ER, PR, and HER2), and the Prediction Analysis of Microarray 50 (PAM50) results (Table 1). There were 474 patients without distant metastasis and 124 patients with distant metastasis. All patients were female, and the average age was 58 years. The average follow-up time was 10 years. ER, PR, and PAM50 subtype were significantly associated with DMFS, while the other characteristics were not.

Table 1 The characteristics of breast cancer patients with tumor size ≤ 2 cm


Construction and validation of the prognostic signature

After feature selection using rsf, 46 genes were entered into the LASSO Cox regression model to reduce the number of genes. Then, 16 genes were selected and entered into the stepwise Cox regression model. Finally, an optimal gene set of nine genes, Abhydrolase domain-containing protein 11 (ABHD11), DExD-box helicase 39A (DDX39A), G3BP Stress Granule Assembly Factor 2 (G3BP2), Golgi membrane protein 1 (GOLM1), Interleukin 1 Receptor Type 1 (IL1R1), Matrix metallopeptidase 11 (MMP11), Phosphoinositide-3-Kinase Regulatory Subunit 1 (PIK3R1), small nuclear ribonucleoprotein polypeptide B2 (SNRPB2), and vav guanine nucleotide exchange factor 3 (VAV3), was selected to build the prediction model using multivariable Cox regression. GOLM1 and VAV3 were suppressor genes of distant metastasis, while the others were oncogenes (Table 2).

Table 2 Prognostic genes for predicting the distant metastasis recurrence


Gene expression of the nine prognostic genes in tissue samples and various BC cell lines is shown in Fig. 3. All prognostic genes were differentially expressed between primary tissues with distant metastasis (DM) and without DM (Fig. 3A). In the gene expression heatmap of 60 BC cell lines, the prognostic genes clustered into two gene sets: gene set 1 (GOLM1, ABHD11, G3BP2, DDX39A, and SNRPB2) and gene set 2 (VAV3, PIK3R1, MMP11, and IL1R1). In general, gene set 1 showed relatively higher expression than gene set 2 across all cell lines (Fig. 3B).

We performed univariable and multivariable Cox regression analyses using the training cohort to determine whether the nine prognostic genes can serve as independent prognostic factors. As revealed by the multivariable Cox regression analysis, all genes except ABHD11 (DDX39A, G3BP2, GOLM1, IL1R1, MMP11, PIK3R1, SNRPB2, and VAV3) were independent prognostic factors (Table 2). We still retained ABHD11 in the prediction model because it gave better overall prediction efficacy. Based on the coefficients of the Cox regression model and the gene expression values of the nine genes, the risk score for each patient was calculated as follows: risk score = (0.38) × ABHD11 + (−0.38) × DDX39A + (−0.41) × G3BP2 + (0.22) × GOLM1 + (0.42) × IL1R1 + (0.5) × MMP11 + (0.29) × PIK3R1 + (0.45) × SNRPB2 + (0.3) × VAV3.

At the cut-point value of the median risk score, all patients in integrated data were classified into a high-risk group (n = 299; ≥ median risk score) and a low‑risk group (n = 299; < median risk score).
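The risk score and the median split can be sketched in a few lines of Python. The coefficients are copied from the formula reported in the text; the function names and the example inputs are hypothetical, and real use would require expression values normalized in the same way as the training cohort.

```python
from statistics import median

# Cox regression coefficients as reported in the risk score formula.
COEFFICIENTS = {
    "ABHD11": 0.38, "DDX39A": -0.38, "G3BP2": -0.41,
    "GOLM1": 0.22, "IL1R1": 0.42, "MMP11": 0.5,
    "PIK3R1": 0.29, "SNRPB2": 0.45, "VAV3": 0.3,
}


def risk_score(expression):
    """Weighted sum of the nine gene expression values for one patient."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(scores):
    """Label each score 'high' (>= cohort median) or 'low' (< cohort median)."""
    cut = median(scores)
    return ["high" if s >= cut else "low" for s in scores]


# Hypothetical patient with every gene expressed at 1.0: the score is then
# simply the sum of the coefficients.
unit_patient = {gene: 1.0 for gene in COEFFICIENTS}
unit_score = risk_score(unit_patient)

# Hypothetical cohort of four risk scores, split at their median (0.7).
groups = stratify([0.1, 0.5, 0.9, 1.3])
```

With an even number of patients and no ties at the median, this rule yields two groups of equal size, as in the 299/299 split of the training cohort.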

DMFS time was significantly longer in the low-risk group than in the high-risk group (p < 0.001; Table 3, Fig. 4A). The risk stratification capability of the risk score was validated using the independent dataset GSE20685. Similarly, in the validation data, the risk score could discriminate between high-risk and low-risk groups (HR 5.48, p = 0.003; Fig. 4B). Next, we entered the risk score and the clinical variables age, lymph-node status, ER, PR, grade, and PAM50 subtype into a stepwise Cox regression. HER2 was not included because too much data were missing. The risk score and ER status were kept in the final model. The comparison of ROC curves showed that the AUC was 0.79 for the risk score-ER model and 0.77 for the risk score model (Fig. 4B). The minor difference indicated the good prediction capability of the risk score alone. We further evaluated the relationship between risk scores and clinical characteristics (Table 3). Risk scores were significantly associated with DMFS, grade, ER, PR, and PAM50 subtype.

Table 3 The association of genetic risk score and clinical characteristics using training cohort and univariable Cox regression


Nomogram model construction and visualization

The nomogram model was used to visualize the prognostic genes in 3-, 5-, and 7-year DMFS prediction (Fig. 4C, D). The C-index of the nomogram was 0.76 (95% CI 0.72–0.80, p < 0.001) in the training cohort and 0.72 (95% CI 0.56–0.88, p = 0.017) in the validation cohort. A calibration diagram was also used to verify the prediction ability of the risk score for 3-year, 5-year, and 7-year DMFS (Fig. 4E–G). In the calibration plots, the colored solid line was the prediction for DMFS, and the diagonal dotted line was the actual DMFS. The closer the solid line was to the dotted line, the better the prediction ability. The calibration curves of the 3-year, 5-year, and 7-year DMFS showed good agreement between predicted and observed DMFS. These results suggested that our nomogram had good prognostic significance.

Weighted gene co-expression network analysis

A WGCNA network was constructed using the training cohort. With WGCNA applying scale-free topology criterion, the soft threshold power of β was 6 when scale‑free topology model fit R² was maximized (0.9), and the mean connectivity for the network was 6. A total of six modules were identified (module size ≥ 100 and cut height ≥ 0.2) in the network (turquoise, blue, brown, yellow, green, and grey; Fig. 5A). The number of genes comprising each module is shown in Fig. 5B.

Except for the grey module, all the modules were positively associated with distant metastasis and negatively associated with lymph-node status, age, grading, ER, PR, and PAM50 subtypes (Fig. 5C). According to the module annotation results, the five modules were associated with cell–cell signaling (ABHD11 and PIK3R1), cellular metabolic process (G3BP2, GOLM1, and SNRPB2), cell cycle phase (DDX39A), cell adhesion (IL1R1 and MMP11), and immune response. VAV3 was clustered in the grey module (MEgrey) (Fig. 5D).

Functional analysis

The extended protein–protein (PP) interaction network of the nine prognostic genes is shown in Fig. 6A. Of note, IL1R1, MMP11, GOLM1, VAV3, and PIK3R1 were connected by the hub gene EGFR. DDX39A, SNRPB2, and G3BP2 were in the same PP interaction network. These interactions were consistent with the clustering result of the heatmap (Fig. 3B). The functional analysis with gProfiler demonstrated that G3BP2 and PIK3R1 were related to signaling receptor complex adaptor activity. Additionally, PIK3R1 and VAV3 were related to host-defense mechanisms; IL1R1 and PIK3R1 were related to immune functions; and G3BP2, PIK3R1, and SNRPB2 were co-regulated by hsa-miR-302a-5p (Fig. 6B).

Discussion

Traditionally, the size of breast cancer at diagnosis is seen as a key determinant of clinical outcome. However, some aggressive subtypes challenge this notion despite being small (≤ 1 cm) [6]. In certain subtypes, tumor size, lymph-node status, and prognosis may be uncoupled due to a disproportionate number of metastatic cancer cells relative to tumor size [5]. Therefore, understanding the underlying mechanisms of distant metastasis in small tumors is an important clinical issue for appropriate treatment decisions. We used the machine learning method random survival forest together with WGCNA to identify nine prognostic genes (ABHD11, DDX39A, G3BP2, GOLM1, IL1R1, MMP11, PIK3R1, SNRPB2, and VAV3) that were predictive of DMFS. According to the WGCNA results, their functions were related to “cell adhesion” (IL1R1 and MMP11), “cell–cell signaling” (ABHD11 and PIK3R1), “cellular metabolic process” (G3BP2, GOLM1, and SNRPB2), “cell cycle phase” (DDX39A), and “not specific” (VAV3). Patients with higher risk scores had a three-to-fourfold increased risk of developing distant metastasis. When ER status was added, the risk score had good prediction efficacy, with AUCs of 0.75 and 0.79. Furthermore, the risk score reflected clinical characteristics such as younger age, poor grade, ER-negative and PR-negative status, and a higher proportion of luminal A/B subtypes.

In this study, we employed nomogram models to simplify and visualize the prognostic genes in the prediction of DMFS, as shown in Fig. 4C, D. Given the normalized expression of the nine genes, we can determine the score for each gene by referring to the “Points” scale and sum these scores to derive the “Total Points.” For example, using the nomogram model in Fig. 4D, a hypothetical patient with the following values: DDX39A (12.2 mapping to 100 points), VAV3 (11 mapping to 10 points), GOLM1 (5 mapping to 0 points), MMP11 (11.5 mapping to 20 points), G3BP2 (12.5 mapping to 99 points), PIK3R1 (10.8 mapping to 10 points), ABHD11 (7.5 mapping to 0 points), IL1R1 (8 mapping to 0 points), and SNRPB2 (12.2 mapping to 0 points) would accumulate a total of 239 (100 + 10 + 0 + 20 + 99 + 10 + 0 + 0 + 0) points. These total points correspond to an estimated 3-year survival probability of approximately 73%, a 5-year survival probability of around 59%, and a 7-year survival probability of about 57%.
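The arithmetic of this worked example can be checked with a short Python snippet. The per-gene points are the values quoted in the text for the hypothetical patient; the mapping from total points to survival probability is read off the nomogram axes and is not reproduced here.

```python
# Points read from the nomogram "Points" scale for the hypothetical patient.
nomogram_points = {
    "DDX39A": 100, "VAV3": 10, "GOLM1": 0, "MMP11": 20, "G3BP2": 99,
    "PIK3R1": 10, "ABHD11": 0, "IL1R1": 0, "SNRPB2": 0,
}

# The "Total Points" value is the plain sum of the per-gene points.
total_points = sum(nomogram_points.values())
```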

Previous molecular studies identified genetic markers involved in the prognosis of small breast tumors. For instance, the expression of stromal type IV collagen, an extracellular matrix protein, in small invasive breast cancers has been linked to a higher risk of developing distant metastasis and poorer survival outcomes. It is possible that stromal type IV collagen promotes metastasis formation by supporting cancer cell survival and tumor progression, and high levels of type IV collagen in the metastases appear to be beneficial for metastatic growth [8]. The expression of P-cadherin has been found to be highly predictive of a poor prognosis in small, node-negative breast cancers. P-cadherin has an important role in maintaining the structural integrity of the epithelium [9]. These studies highlighted the dysregulation of the extracellular matrix in the progression of small breast tumors, consistent with our findings on IL1R1 and MMP11, which are involved in “cell adhesion”. In addition, a significant correlation between IL1R1 and MMP11 expression has been reported, and both are involved in the breakdown of the extracellular matrix, tissue remodeling, and metastasis [10]. Breast tumors infiltrated by MMP-11+ mononuclear inflammatory cells are more likely to metastasize, have high levels of interleukin (IL)-1, IL-5, IL-6, IL-17, interferon (IFN), and NFκB, and have an increased CD68+/(CD3+CD20+) cell ratio at the invasive front. These factors are implicated in the crosstalk between tumors and their inflammatory microenvironment [11]. MMP11 expression in mononuclear inflammatory cells was associated with shorter relapse-free survival and overall survival [12].

Early in tumorigenesis, IL-1R1 signaling suppresses mammary tumor cell proliferation and inhibits breast cancer outgrowth and pulmonary metastasis. In breast cancer, IL-1-mediated IL-1R1 signaling is tumor-suppressive [13]. Patients treated with anti-estrogen therapy have increased IL1R1 expression, which predicts treatment failure [14]. In addition, inhibition of IL-1 signaling with an anti-IL1β antibody or an IL1R antagonist inhibits bone metastasis in pre-clinical models of breast cancer [15]. Colorectal cancer patients who did not respond to cetuximab blockade had higher levels of IL1R1 than responsive patients, and high levels of IL1R1 are predictive of survival [16].

In this study, we identified a PIK3R1-IL1R1-MMP11-GOLM1-VAV3 protein–protein interaction network connected by Epidermal Growth Factor Receptor (EGFR). EGFR and its downstream pathway regulate epithelial–mesenchymal transition, migration, and tumor invasion, and high EGFR expression is an independent predictor of poor prognosis in inflammatory breast cancer. Targeting EGFR enhances the chemosensitivity of tumor cells by rewiring apoptotic signaling networks in triple-negative breast cancer [17]. Therefore, this genetic network may play a crucial role in triggering metastasis of small breast tumors.

GOLM1 has been identified as a potential target for cancer therapy because it is overexpressed in many solid tumors, promotes tumor growth and metastasis, and leads to poor survival [18]. GOLM1 could promote breast cancer cell aggressiveness by regulating matrix metalloproteinase-13 (MMP13) [19]. Knocking down GOLM1 expression further increased the treatment effect of epigallocatechin gallate (a natural migration-inhibiting substance) in breast cancer cells [18]. Moreover, GOLM1 also acts as a positive regulator of Programmed Cell Death Ligand 1 (PD-L1) expression via the EGFR/Signal Transducer and Activator Of Transcription 3 (STAT3) signaling pathway in human hepatocellular carcinoma [20].

VAV3, a guanine nucleotide exchange factor (GEF) for Rho family GTPases, belongs to the VAV protein family [21]. It is a downstream signal transducer of EGFR/HER2, binds to several partners, including PI3K, modulates cell morphology, and induces cell transformation [22]. High nuclear VAV3 expression in tumor cells was associated with poorer endocrine therapy response [23]. VAV3 forms a complex with ERα, and together they enhance the ERα-mediated signaling axis, participating in breast cancer development and/or progression [24]. The depletion of VAV3 reduced the viability of cell models of acquired endocrine therapy resistance [23].

In this study, we revealed that G3BP2, PIK3R1, and SNRPB2 were co-regulated by hsa-miR-302a-5p. The miR-302 family exerts antitumor effects in several cancers [25]. MiR-302a, -b, -c, and -d were found to cooperatively inhibit BCRP expression to increase the drug sensitivity of breast cancer cells [26]. Dysregulation of the phosphoinositide 3-kinase (PI3K) pathway contributes to the development and progression of tumors. PIK3R1 underexpression is an independent prognostic marker in breast cancers [27]. Silencing PIK3R1 enhanced the sensitivity of breast cancer cell lines to rapamycin [28], implicating a negative role of PIK3R1 in PI3K pathway activation. Both PIK3R1 and EGFR were involved in anti-cancer drug effects. Schisandrin A (SchA), an anti-cancer agent, significantly down-regulated EGFR, PIK3R1, and MMP9 but up-regulated cleaved-caspase 3, thus inhibiting the migration and promoting the apoptosis of MDA-MB-231 cells [29].

G3BP2 (G3BP Stress Granule Assembly Factor 2) regulates breast tumor initiation by stabilizing squamous cell carcinoma antigen recognized by T cells 3 (SART3) mRNA. The loss of G3BP2 inhibits breast tumor initiation, possibly leading to improved cancer treatments [30]. Inactivation of the cell-cycle checkpoint regulator MK2 or of G3BP2 sensitizes cisplatin-resistant TNBC cell lines to cisplatin [31]. Suppression of G3BP2 inhibits the immune checkpoint molecule PD-L1 through mRNA degradation [32].

SNRPB2 (Small Nuclear Ribonucleoprotein Polypeptide B2) encodes the protein U2 small nuclear ribonucleoprotein B, a component of the spliceosome. While primarily studied in hepatocellular carcinoma [33], SNRPB2 is a novel gene of interest in breast cancer. Many oncogenic insults deregulate RNA splicing, often leading to hypersensitivity of tumors to spliceosome-targeted therapies (STTs). Mis-spliced RNA is a molecular trigger for tumor killing through viral mimicry. STTs cause widespread cytoplasmic accumulation of mis-spliced mRNAs, many forming double-stranded structures in MYC-driven triple-negative breast cancer [34].

DDX39 encodes the protein DExD-box helicase 39A, which unwinds double-stranded RNA in an ATP-dependent manner. It is involved in transcription, splicing, ribosome biogenesis, RNA export, RNA editing, RNA decay, translation, and the protection and maintenance of telomeres. Increased DDX39 mRNA expression was associated with poor outcomes in ER-positive breast cancers. Inhibiting DDX39 could enhance the sensitivity of MCF-7 cells to doxorubicin [35]. In hepatocellular carcinoma (HCC), DDX39 knockdown inhibited HCC cell migration, invasion, growth, and metastasis by activating the Wnt/β-catenin pathway [36].

ABHD11 (Abhydrolase Domain Containing 11) is a protein-coding gene. ABHD11 antisense RNA 1 (ABHD11-AS1) is highly expressed in many cancers. Several studies have highlighted the clinical importance of ABHD11-AS1 in cancer prognosis, diagnosis, stage prediction, and treatment response. ABHD11-AS1 has been shown to promote cancer by sponging various microRNAs (miRNAs) and by altering signaling pathways such as PI3K/Akt, epigenetic mechanisms, and N6-methyladenosine (m6A) RNA modification [37]. ABHD11 and Esterase D could predict the development of distant metastases and the presence of aggressive lung adenocarcinomas [38].

In the present study, although the prognostic genes were validated in an independent dataset, further testing on clinical data is warranted. HER2 status, Ki67, and lymphovascular invasion were not included in the analysis because too much data were missing. These variables should be included in future work. In addition, the current mRNA expression data require experimental studies to validate the findings and elucidate the mechanisms.

Conclusions

In sum, we utilized machine learning and WGCNA on large-scale data from the GEO database to identify a prognostic gene set of ABHD11, DDX39A, G3BP2, GOLM1, IL1R1, MMP11, PIK3R1, SNRPB2, and VAV3 involved in distant metastasis in BC patients with small tumor size. These genes are reportedly linked to metastasis and treatment resistance. Compared with a low genetic risk score, a high genetic risk score based on the nine genes predicted poor DMFS. Of note, the protein–protein interaction network of PIK3R1, IL1R1, MMP11, GOLM1, and VAV3, connected by EGFR, merits further work to understand the mechanism and develop an ideal treatment strategy for invasive breast cancers of small tumor size.

Availability of data and materials

All data can be downloaded from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/).

Abbreviations

BC: Breast cancer
GEO: Gene Expression Omnibus
DMFS: Distant metastasis-free survival
DM: Distant metastasis
HR: Hazard ratio
WGCNA: Weighted gene co-expression network analysis
CCLE: Cancer Cell Line Encyclopedia
DepMap: Dependency Map

References

WHO. Breast cancer. 2023 [cited 2023 June 5]. https://www.who.int/news-room/fact-sheets/detail/breast-cancer.
Guo L, et al. Breast cancer heterogeneity and its implication in personalized precision therapy. Exp Hematol Oncol. 2023;12(1):3.
Article PubMed PubMed Central Google Scholar
Gajdos C, Tartter PI, Bleiweiss IJ. Lymphatic invasion, tumor size, and age are independent predictors of axillary lymph node metastases in women with T1 breast cancers. Ann Surg. 1999;230(5):692–6.
Article PubMed PubMed Central CAS Google Scholar
Soerjomataram I, et al. An overview of prognostic factors for long-term survivors of breast cancer. Breast Cancer Res Treat. 2008;107(3):309–30.
Article PubMed CAS Google Scholar
Azmi AS, Bao B, Sarkar FH. Exosomes in cancer development, metastasis, and drug resistance: a comprehensive review. Cancer Metastasis Rev. 2013;32(3–4):623–42.
Article PubMed CAS Google Scholar
Wo JY, et al. Effect of very small tumor size on cancer-specific mortality in node-positive breast cancer. J Clin Oncol. 2011;29(19):2619–27.
Article PubMed PubMed Central Google Scholar
Uno H, et al. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30(10):1105–17.
Article PubMed PubMed Central Google Scholar
Jansson M, et al. Prognostic value of stromal type IV collagen expression in small invasive breast cancers. Front Mol Biosci. 2022;9: 904526.
Article PubMed PubMed Central CAS Google Scholar
Arnes JB, et al. Placental cadherin and the basal epithelial phenotype of BRCA1-related breast cancer. Clin Cancer Res. 2005;11(11):4003–11.
Article PubMed CAS Google Scholar
Malgulwar PB, et al. Transcriptional co-expression regulatory network analysis for Snail and Slug identifies IL1R1, an inflammatory cytokine receptor, to be preferentially expressed in ST-EPN-RELA and PF-EPN-A molecular subgroups of intracranial ependymomas. Oncotarget. 2018;9(84):35480–92.
Article PubMed PubMed Central Google Scholar
Eiro N, et al. Cytokines related to MMP-11 expression by inflammatory cells and breast cancer metastasis. Oncoimmunology. 2013;2(5):e24010.
Eiro N, et al. MMP11 expression in intratumoral inflammatory cells in breast cancer. Histopathology. 2019;75(6):916–30.
Dagenais M, et al. The Interleukin (IL)-1R1 pathway is a critical negative regulator of PyMT-mediated mammary tumorigenesis and pulmonary metastasis. Oncoimmunology. 2017;6(3):e1287247.
Sarmiento-Castro A, et al. Increased expression of interleukin-1 receptor characterizes anti-estrogen-resistant ALDH(+) breast cancer stem cells. Stem Cell Rep. 2020;15(2):307–16.
Ridker PM, et al. Effect of interleukin-1beta inhibition with canakinumab on incident lung cancer in patients with atherosclerosis: exploratory results from a randomised, double-blind, placebo-controlled trial. Lancet. 2017;390(10105):1833–42.
Gelfo V, et al. A novel role for the interleukin-1 receptor axis in resistance to anti-EGFR therapy. Cancers (Basel). 2018;10(10):355.
Masuda H, et al. Role of epidermal growth factor receptor in breast cancer. Breast Cancer Res Treat. 2012;136(2):331–45.
Xie L, et al. Suppression of GOLM1 by EGCG through HGF/HGFR/AKT/GSK-3beta/beta-catenin/c-Myc signaling pathway inhibits cell migration of MDA-MB-231. Food Chem Toxicol. 2021;157:112574.
Zhang R, et al. Golgi membrane protein 1 (GOLM1) promotes growth and metastasis of breast cancer cells via regulating matrix metalloproteinase-13 (MMP13). Med Sci Monit. 2019;25:847–55.
Yan J, et al. GOLM1 upregulates expression of PD-L1 through EGFR/STAT3 pathway in hepatocellular carcinoma. Am J Cancer Res. 2020;10(11):3705–20.
Rao S, et al. A novel nuclear role for the Vav3 nucleotide exchange factor in androgen receptor coactivation in prostate cancer. Oncogene. 2012;31(6):716–27.
Zeng L, et al. Vav3 mediates receptor protein tyrosine kinase signaling, regulates GTPase activity, modulates cell morphology, and induces cell transformation. Mol Cell Biol. 2000;20(24):9212–24.
Aguilar H, et al. VAV3 mediates resistance to breast cancer endocrine therapy. Breast Cancer Res. 2014;16(3):R53.
Lee K, et al. Vav3 oncogene activates estrogen receptor and its overexpression may be involved in human breast cancer. BMC Cancer. 2008;8:158.
Chen W, et al. MiR-302a-5p suppresses cell proliferation and invasion in non-small cell lung carcinoma by targeting ITGA6. Am J Transl Res. 2019;11(7):4348–57.
Wang Y, et al. miR-302a/b/c/d cooperatively inhibit BCRP expression to increase drug sensitivity in breast cancer cells. Gynecol Oncol. 2016;141(3):592–601.
Cizkova M, et al. PIK3R1 underexpression is an independent prognostic marker in breast cancer. BMC Cancer. 2013;13:545.
Ou O, et al. Loss-of-function RNAi screens in breast cancer cells identify AURKB, PLK1, PIK3R1, MAPK12, PRKD2, and PTK6 as sensitizing targets of rapamycin activity. Cancer Lett. 2014;354(2):336–47.
Chen L, et al. Bio-informatics and in vitro experiments reveal the mechanism of schisandrin A against MDA-MB-231 cells. Bioengineered. 2021;12(1):7678–93.
Gupta N, et al. Stress granule-associated protein G3BP2 regulates breast tumor initiation. Proc Natl Acad Sci USA. 2017;114(5):1033–8.
Heijink AM, et al. Modeling of cisplatin-induced signaling dynamics in triple-negative breast cancer cells reveals mediators of sensitivity. Cell Rep. 2019;28(9):2345–2357.e5.
Zhang Y, Yue C, Krichevsky AM, Garkavtsev I. Repression of the stress granule protein G3BP2 inhibits immune checkpoint molecule PD-L1. Mol Oncol. 2021. https://doi.org/10.1002/1878-0261.12915.
Guo J, Li L, Wang H. Unveiling the hidden role of SNRPB2 in HCC: a promising target for therapy. 2024. PREPRINT (Version 1) available at Research Square https://doi.org/10.21203/rs.3.rs-3909546/v1.
Bowling EA, et al. Spliceosome-targeted therapies trigger an antiviral immune response in triple-negative breast cancer. Cell. 2021;184(2):384–403.e21.
Wang X, et al. DEAD-box RNA helicase 39 promotes invasiveness and chemoresistance of ER-positive breast cancer. J Cancer. 2020;11(7):1846–58.
Zhang T, et al. DDX39 promotes hepatocellular carcinoma growth and metastasis through activating Wnt/beta-catenin pathway. Cell Death Dis. 2018;9(6):675.
Golla U, et al. ABHD11-AS1: an emerging long non-coding RNA (lncRNA) with clinical significance in human malignancies. Noncoding RNA. 2022;8(2):21.
Wiedl T, et al. Activity-based proteomics: identification of ABHD11 and ESD activities as potential biomarkers for human lung adenocarcinoma. J Proteomics. 2011;74(10):1884–94.

Acknowledgements

The authors thank Tri-Service General Hospital (801GB112190) for financial support.

Funding

This study was supported by funding from Tri-Service General Hospital (801GB112190) awarded to Guo-Shiou Liao and Yu-Tien Chang.

Author information

Authors and Affiliations

School of Public Health, National Defense Medical Center, Taipei City, Taiwan
Yu-Tien Chang, Tzu-Ya Huang & Chi-Ming Chu
Division of General Surgery, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, No. 325, Sec. 2, Chenggong Rd., Neihu Dist., Taipei City, 114202, Taiwan
Zhi-Jie Hong, Hsueh-Han Tsai, An-Chieh Feng, Jyh-Cherng Yu, Kuo-Feng Hsu, Chia-Ming Liang & Guo-Shiou Liao
Department of Surgery, Taipei Veterans General Hospital, No.201, Sec. 2, Shipai Rd., Beitou District, Taipei City, 11217, Taiwan
Chi-Cheng Huang
AIoT Center, Tri-Service General Hospital, Taipei City, Taiwan
Wei-Zhi Lin
Comprehensive Breast Health Center, Taipei Veterans General Hospital, No. 201, Sec. 2, Shipai Rd., Beitou District, Taipei City, 11217, Taiwan
Chi-Cheng Huang
Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, No. 17, Xuzhou Rd., Taipei City, 100, Taiwan
Chi-Cheng Huang

Contributions

G.S.L. contributed to conception and design of the study, analysis and interpretation of data, manuscript writing, and manuscript review. Z.J.H. contributed to manuscript writing, manuscript editing, and manuscript review. H.H.T. contributed to manuscript writing, manuscript editing, and manuscript review. A.C.F. contributed to manuscript writing, manuscript editing, and manuscript review. T.Y.H. contributed to collection and assembly of data, analysis and interpretation of data, and manuscript review. J.C.Y. contributed to manuscript writing, manuscript editing, and manuscript review. K.F.H. contributed to manuscript review. C.C.H. contributed to manuscript review. W.Z.L. contributed to manuscript review. C.M.C. contributed to manuscript review. C.M.L. contributed to manuscript review. Y.T.C. contributed to conception and design of the study, collection and assembly of data, analysis and interpretation of data, manuscript writing, and manuscript review. All authors approved the manuscript.

Corresponding author

Correspondence to Guo-Shiou Liao.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 18 KB)

Rights and permissions

This article is published under an open access license. Please check the 'Copyright Information' section either on this page or in the PDF for details of this license and what re-use is permitted. If your intended use exceeds what is permitted by the license or if you are unable to locate the license and re-use information, please contact the Rights and Permissions team.

About this article

Cite this article

Chang, YT., Hong, ZJ., Tsai, HH. et al. Hub metastatic gene signature and risk score of breast cancer patients with small tumor sizes using WGCNA. Breast Cancer (2024). https://doi.org/10.1007/s12282-024-01627-w

Received: 13 September 2023
Accepted: 09 August 2024
Published: 27 August 2024
DOI: https://doi.org/10.1007/s12282-024-01627-w
