A novel epigenetic signature for overall survival prediction in patients with breast cancer
Breast cancer is the most common malignancy in female patients worldwide. Because of its heterogeneity in terms of prognosis and therapeutic response, biomarkers with the potential to predict survival or to assist in making treatment decisions in breast cancer patients are essential for individualised therapy. Epigenetic alterations in the genome of cancer cells, such as changes in DNA methylation patterns, could be novel markers with an important role in the initiation and progression of breast cancer.
DNA methylation and RNA-seq datasets from The Cancer Genome Atlas (TCGA) were analysed using the Least Absolute Shrinkage and Selection Operator (LASSO) Cox model. Using gene ontology (GO) and single sample gene set enrichment analysis (ssGSEA), an epigenetic signature associated with the survival of breast cancer patients was constructed that yields the best discrimination between tumour and normal breast tissue. A predictive nomogram was built as the optimal strategy to distinguish between high- and low-risk cases.
The combination of mRNA expression and DNA methylation datasets yielded a 13-gene epigenetic signature that identified a subset of breast cancer patients with low overall survival. This high-risk group of tumour cases was marked by upregulation of known cancer-related pathways (e.g. mTOR signalling). Subgroup analysis indicated that this epigenetic signature could also distinguish high- and low-risk patients within different molecular or histological tumour subtypes (defined by HER2, EGFR or ER expression, or by tumour grade). The 13-gene signature was confirmed in four external breast cancer cohorts from the Gene Expression Omnibus (GEO).
An epigenetic signature was discovered that effectively stratifies breast cancer patients into low- and high-risk groups. Since its efficiency appears independent of other known classifiers (such as staging, histology, metastasis status and receptor status), it has high potential to further improve individualised therapy in breast cancer.
Keywords: Breast cancer, Mammary carcinoma, Epigenetics, Molecular marker, Response, Prognosis, Molecular signature, Individualized therapy
LASSO: Least Absolute Shrinkage and Selection Operator
ssGSEA: single sample gene set enrichment analysis
GEO: Gene Expression Omnibus
DEGs: differentially expressed genes
DMGs: differentially methylated genes
WGCNA: Weighted Correlation Network Analysis
ROC: receiver operating characteristic
HER-2: epidermal growth factor receptor 2
EGFR: epidermal growth factor receptor 1
Breast cancer is the most common tumour in women, but it is a heterogeneous disease in terms of clinical prognosis and therapeutic response. Part of the clinical heterogeneity can be linked to distinct molecular subtypes defined by gene expression profiles [1, 2]. Depending on the mutational and growth factor receptor status, targeted chemotherapy has recently helped to improve overall survival. DNA mutations and copy number changes are robust markers for molecular subtypes and show little variation throughout therapy. However, their predictive value for progression and response may be limited. RNA expression patterns usually exhibit much larger variations between individual patients and can be directly related to the activity of important pathways in malignant cells. On the other hand, RNA expression values also show relatively rapid and stochastic variations that could hamper the identification of relevant pathways. Epigenetic changes in DNA methylation are semi-stable and less variable, but still show large variations related to the activity of cellular pathways. Thus, combining epigenetic status and transcriptome should be helpful for predicting tumour progression. Moreover, changes in DNA methylation provide tumour cells with a high level of plasticity to quickly adapt to changes in physiology, metabolic restrictions or cytotoxic stress during therapy [3, 4, 5]. It is therefore reasonable to analyse the DNA methylation pattern in tumour cells in order to find novel predictors for the survival or response of breast cancer patients [6, 7].
The availability of high-throughput genomic assays such as DNA methylation-seq, ATAC-seq and RNA-seq has opened the possibility of a comprehensive characterisation of all molecular alterations of cancer cells and, hence, of finding novel biomarkers with clinical and therapeutic value [1, 8, 9, 10]. To overcome the limited statistical power of single biomarkers, entire molecular signatures derived from high-content genome screens seem to offer better predictive values. Some studies have already demonstrated the power of whole transcriptome (RNA-seq) datasets, alone or in combination with DNA methylation datasets, to build gene-based or CpG site-based signatures [6, 11]. In the present study, we merged DNA methylation and RNA-seq datasets of breast cancer patients from The Cancer Genome Atlas (TCGA) in order to develop a novel epigenetic signature capable of predicting overall survival. The proposed epigenetic signature was validated in 4 external datasets from the GEO database (617 cases in total).
Sample selection and data processing
Detailed information for each of the GEO cohorts and for the different breast cancer subtypes of the TCGA cohort is given, together with the calculated hazard ratios.
Differentially expressed genes (DEGs) analysis and differentially methylated genes (DMGs) analysis
The limma package was used to perform DEG analysis. An empirical Bayesian approach was applied to estimate the gene expression changes using moderated t-tests. DEGs were defined as genes with an adjusted p value of less than 0.05 and an absolute fold change greater than 2. DMGs were defined as genes with an adjusted p value of less than 0.05 and an absolute β value difference (from the HumanMethylation450 BeadChip) higher than 0.25. We identified 306 genes with overlapping changes in both the DEG and DMG sets.
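The thresholds above can be illustrated with a small filter. This is only a sketch and not the authors' limma pipeline (their code is referred to as Additional file 2): the gene records, field names and values are hypothetical, the fold-change cutoff is assumed to mean more than 2-fold up or down (|log2 FC| > 1), and the concordance rule (up-regulated with hypomethylated, down-regulated with hypermethylated) follows the description given in the results.

```python
import math

# Hypothetical per-gene summary statistics (not data from the study).
# fold_change: tumour / normal expression ratio
# delta_beta:  tumour minus normal methylation beta value
genes = [
    {"gene": "G1", "fold_change": 4.0, "expr_padj": 0.001, "delta_beta": -0.30, "meth_padj": 0.010},
    {"gene": "G2", "fold_change": 0.2, "expr_padj": 0.020, "delta_beta": 0.40, "meth_padj": 0.001},
    {"gene": "G3", "fold_change": 3.0, "expr_padj": 0.030, "delta_beta": 0.35, "meth_padj": 0.002},
    {"gene": "G4", "fold_change": 1.2, "expr_padj": 0.001, "delta_beta": -0.50, "meth_padj": 0.001},
    {"gene": "G5", "fold_change": 5.0, "expr_padj": 0.200, "delta_beta": -0.45, "meth_padj": 0.001},
]


def is_deg(g, padj_cut=0.05, fc_cut=2.0):
    """Differentially expressed: adjusted p < 0.05 and more than 2-fold change (assumed reading)."""
    return g["expr_padj"] < padj_cut and abs(math.log2(g["fold_change"])) > math.log2(fc_cut)


def is_dmg(g, padj_cut=0.05, beta_cut=0.25):
    """Differentially methylated: adjusted p < 0.05 and |delta beta| > 0.25."""
    return g["meth_padj"] < padj_cut and abs(g["delta_beta"]) > beta_cut


def is_concordant(g):
    """Up-regulated with hypomethylated, or down-regulated with hypermethylated."""
    up_regulated = g["fold_change"] > 1
    hypomethylated = g["delta_beta"] < 0
    return up_regulated == hypomethylated


overlap = [g["gene"] for g in genes if is_deg(g) and is_dmg(g) and is_concordant(g)]
print(overlap)
```

In this toy table only G1 and G2 pass all three checks; G3 changes in the discordant direction, G4 fails the fold-change cutoff and G5 fails the adjusted p value cutoff for expression.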
LASSO regularisation and signature construction
Risk score = (0.321 * expression level of PCDHGA12) + (0.204 * expression level of HIF3A) + (0.061 * expression level of EZR) + (0.056 * expression level of PCDHGA3) + (0.044 * expression level of TPD52) + (− 0.011 * expression level of STAC2) + (− 0.012 * expression level of C2orf40) + (− 0.019 * expression level of KRT19) + (− 0.050 * expression level of NDRG2) + (− 0.054 * expression level of KCNH8) + (− 0.151 * expression level of CCND2) + (− 0.170 * expression level of SIAH2) + (− 0.186 * expression level of ITPRIPL1).
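As a worked illustration, the formula above can be evaluated as a weighted sum. The coefficients are copied from the formula; the function names, the example expression values and the median cutoff used to label patients as high or low risk are assumptions made for this sketch, since the cutoff is not stated here.

```python
from statistics import median

# Coefficients copied from the risk-score formula above.
COEFFICIENTS = {
    "PCDHGA12": 0.321, "HIF3A": 0.204, "EZR": 0.061, "PCDHGA3": 0.056,
    "TPD52": 0.044, "STAC2": -0.011, "C2orf40": -0.012, "KRT19": -0.019,
    "NDRG2": -0.050, "KCNH8": -0.054, "CCND2": -0.151, "SIAH2": -0.170,
    "ITPRIPL1": -0.186,
}


def risk_score(expression):
    """Weighted sum of the expression levels of the 13 signature genes."""
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(scores):
    """Label patients 'high' or 'low' risk relative to the cohort median (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical profile: every signature gene expressed at level 1.0.
flat_profile = {gene: 1.0 for gene in COEFFICIENTS}
print(round(risk_score(flat_profile), 3))

# Hypothetical cohort of three patients with precomputed scores.
print(stratify({"P1": 0.5, "P2": -0.2, "P3": 0.1}))
```

Genes with positive coefficients raise the score as their expression increases, whereas genes with negative coefficients lower it.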
WGCNA for the transcriptome of breast tumour
A gene co-expression network was built by Weighted Correlation Network Analysis (WGCNA) [10, 17]. Raising the co-expression similarity to a power β defined a weighted network adjacency. By evaluating the correlations between the risk score of patients with breast cancer and the module memberships, it was possible to identify highly correlated modules. The hub genes in the blue module (defined as genes with a gene significance greater than 0.4) were selected for further analysis. Gene ontology (GO) and KEGG analyses were performed with clusterProfiler and Metascape (metascape.org), respectively.
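The adjacency step can be sketched as follows. This is not the WGCNA R package itself: it assumes an unsigned network in which the co-expression similarity is the absolute Pearson correlation, and both the soft-threshold power β = 6 and the expression values are hypothetical choices for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def weighted_adjacency(expression, beta=6):
    """a_ij = |cor(x_i, x_j)| ** beta for every ordered pair of distinct genes."""
    names = list(expression)
    return {
        (g, h): abs(pearson(expression[g], expression[h])) ** beta
        for g in names
        for h in names
        if g != h
    }


# Hypothetical expression of three genes across four samples.
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [1.0, 3.0, 2.0, 4.0],
}
adjacency = weighted_adjacency(expression)
for pair in [("A", "B"), ("A", "C")]:
    print(pair, round(adjacency[pair], 4))
```

Raising the similarity to a power keeps strong correlations close to 1 while pushing moderate ones towards 0, which is why the weighted network emphasises tightly co-expressed genes.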
The correlation between mRNA expression level and DNA methylation level was analysed for every gene using the Spearman correlation coefficient. In Fig. 5, every dot represents the mean value of 80 samples in tumour tissues and the mean value of 10 samples in normal tissues. The p value for the difference in gene expression among the 4 molecular subtypes was calculated by ANOVA. The p values and hazard ratios (HR) of the survival analysis were calculated by Cox regression. The code for analysing DEGs and risk scores is provided in Additional file 2.
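For readers unfamiliar with the Spearman coefficient, the following sketch computes it as the Pearson correlation of average ranks. The methylation and expression values are hypothetical, and the helper names are illustrative rather than taken from the authors' code.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gene: expression rises as methylation (beta value) falls.
methylation = [0.82, 0.75, 0.60, 0.41, 0.30, 0.22]
expression = [1.1, 1.9, 2.4, 3.8, 5.2, 6.0]
rho = spearman(methylation, expression)
print(round(rho, 3))
```

A strongly negative coefficient, as in this toy example, is the pattern expected when methylation is associated with transcriptional silencing.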
TCGA breast cancer RNA-seq datasets were integrated with DNA methylation datasets according to the flowchart (Additional file 3: Figure S1). This identified 306 genes that form an overlapping cluster (up-regulated genes overlapping with hypomethylated genes, and down-regulated genes overlapping with hypermethylated genes, between tumour and normal tissues). Of these 306 genes, 95 showed a significant correlation between mRNA expression and DNA methylation values. LASSO Cox regression analysis built the prediction model with a 13-gene epigenetic signature as the best predictor of overall survival of breast cancer patients. ssGSEA was applied to identify the association between the epigenetic signature and cancer-related hallmarks (e.g. MTORC1 signalling, G2M checkpoint). ssGSEA, WGCNA and downstream GO and KEGG analyses indicated that cell division, the cell cycle and related terms were closely linked to the signature. The nomogram, which included the 13-gene epigenetic model and other clinicopathological factors, exhibited high accuracy.
Identification of differentially expressed genes and differentially methylated genes between tumour and normal tissues
LASSO Cox regression identifying a 13-gene epigenetic signature
The 95 genes from the above analysis were used to construct a gene-expression profile, and the LASSO Cox model was applied to this profile to build the prognostic signature. Cross-validation was carried out in 5 rounds to prevent overfitting (internal training sets and internal validation sets were constructed randomly) (Fig. 1d).
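The random construction of internal training and validation sets can be sketched as below. The sketch assumes that the "5 rounds" correspond to a 5-fold split in which every patient is used for validation exactly once; the patient identifiers, the seed and the function name are hypothetical, and the penalised Cox fit itself is not reproduced here.

```python
import random


def k_fold_splits(sample_ids, k=5, seed=0):
    """Yield (training, validation) lists for k random, non-overlapping folds."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)          # reproducible random assignment
    folds = [ids[i::k] for i in range(k)]     # k folds of near-equal size
    for i in range(k):
        validation = folds[i]
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation


# Hypothetical cohort of 20 patients.
patients = [f"P{i:02d}" for i in range(1, 21)]
splits = list(k_fold_splits(patients, k=5, seed=42))
for round_no, (training, validation) in enumerate(splits, start=1):
    print(round_no, len(training), len(validation))
```

In each round a model would be fitted on the training list and scored on the held-out validation list, so that the penalty strength is chosen on data the model has not seen.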
The most powerful features (ITPRIPL1, SIAH2, KCNH8, KRT19, NDRG2, STAC2, TPD52, EZR, PCDHGA12, HIF3A, PCDHGA3, C2orf40, CCND2) were identified by the regularisation process of LASSO Cox regression (Fig. 1e).
The ROC plots for distinguishing tumour from normal tissues by the expression level and methylation level of the 13 genes are shown in Additional file 4: Figure S2 and Additional file 5: Figure S3. The 13 genes differentiated efficiently between tumour and normal tissues in terms of both gene expression level and DNA methylation level.
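The discrimination summarised by a ROC plot can be expressed as the area under the curve (AUC), which equals the probability that a randomly chosen tumour sample scores higher than a randomly chosen normal sample. The sketch below computes this directly by pairwise comparison; the values are hypothetical and the function name is illustrative.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical expression of one gene in tumour and normal samples.
tumour = [5.1, 6.3, 4.8, 7.0]
normal = [3.2, 4.9, 2.7]
print(round(auc(tumour, normal), 3))
```

An AUC of 0.5 corresponds to no discrimination; for a marker that is lower in tumour tissue (for example a hypomethylated site) the value falls below 0.5 and the classes can simply be swapped.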
Overall survival prediction based on the epigenetic signature
WGCNA on the transcriptome of breast cancer patients
DNA methylation pattern, gene expression level in tumour and normal tissues and association of OS and RFS for the 13 genes
Subgroup analysis on the 13-gene epigenetic signature
As shown in Additional file 9: Figure S7A–F, the prognostic epigenetic signature serves as a promising biomarker for predicting survival in different breast cancer subgroups, including Luminal A (p = 0.03), Luminal B (p = 0.026), HER2-enriched (p = 0.012), triple negative (p = 0.004), stage I-II (p < 0.001) and stage III-IV (p < 0.001) patients.
Validation of the 13-gene epigenetic signature by independent breast cancer datasets
Construction of a nomogram
Most of the established clinical markers for therapy response and survival of breast tumours are based on clinical traits with limited accuracy and specificity. Cellular markers of tumour biology, such as IHC positivity for estrogen receptor (ER), progesterone receptor (PR), epidermal growth factor receptor 2 (HER-2), cytokeratin 5/6, epidermal growth factor receptor 1 (EGFR) and cell proliferation (Ki67), are currently the gold standard for therapy stratification, but they require considerable laboratory work and are prone to subjective bias. Nowadays, high-throughput data give a comprehensive insight into the genomic, genetic and epigenetic changes in patients [1, 20]. High-throughput profiles help to identify possible biomarkers for predicting the survival of patients and their response to therapy. Tumour tissues have a distinct DNA methylation landscape compared to adjacent normal tissues. Hypermethylation of promoter CpG islands is often associated with transcriptional silencing of the corresponding genes in breast cancer. This difference in DNA methylation status makes it a potential tool in breast cancer detection and diagnosis. Here, we explore the utility of DNA methylation status and gene expression level in the prediction of survival of breast cancer patients. By integrating the DNA methylation profiles and gene expression profiles of breast tumour tissues and normal tissues, we built a 13-gene epigenetic signature. In this way, the CpG methylation status could be predicted with high confidence by measuring the mRNA expression of 13 genes. On the one hand, this avoids the need for laborious direct measurement of DNA methylation patterns; on the other, it provides a robust set of biomarkers.
The subgroup analysis indicated that the epigenetic signature could stratify patients into high- and low-risk groups across different grades and molecular subtypes. The epigenetic prognostic model was combined with grade and molecular subtype to build a nomogram for predicting the survival probability of patients with breast tumours. The prediction efficiency was confirmed by the calibration plot. Thus, the nomogram may help clinicians towards better treatment and precision medicine for patients with breast tumours. The cellular pathway most clearly associated with the 13-gene epigenetic signature is mTORC1 signalling. mTOR signalling integrates both intracellular and extracellular signals and works as a central pathway in tumour progression and malignancy. Dysregulation of the PI3K/PTEN/Akt/mTORC1 pathway by gene mutations occurs in > 70% of breast tumours. In ER+ breast cancers, PI3K/PTEN/Akt/mTORC1 pathway activation results in both estrogen-dependent and estrogen-independent ER activity and in loss of response to hormonal therapies. PI3K/PTEN/Akt/mTORC1 pathway activation also results in resistance to HER2 inhibitors in HER2+ breast cancer, and mTORC1 pathway inhibition helps to overcome resistance to anti-HER2 based molecular therapies. Thus, hormonal therapy combined with mTORC1 blockade is a promising approach for the treatment of breast tumours. The epigenetic signature showed its most significant correlation with the mTORC1 signalling pathway, which may provide a new strategy for the treatment of breast cancer.
A problem with the molecular profiling of tumours as shown here is that it might be prone to intra-tumour heterogeneity. Whereas histo-morphological methods such as IHC can immediately show the spatial pattern of marker expression across the tumour specimen, including focal subclones, such intra-tumour heterogeneity might be overlooked with molecular profiles. A future direction of expression and methylation profiling for tumour classification would therefore require a single-cell based approach.
In summary, the novel 13-gene epigenetic signature serves as a promising prognostic model to predict the survival of patients with breast cancer, which may help the development of personalised and precision medicine in the breast cancer field.
We would like to thank Prof. Dr. Michael Atkinson for the helpful discussions and suggestions.
XB and MR conceived and designed the experiments. XB and YW analysed the data. XB and YW wrote the paper. NA and MR reviewed the draft. All authors read and approved the final manuscript.
We sincerely thank the China Scholarship Council (CSC), Grant: 201608210186 for supporting the research and work of Xuanwen Bao.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
2. Bao X, Shi R, Zhang K, Xin S, Li X, Zhao Y, Wang Y. Immune landscape of invasive ductal carcinoma tumor microenvironment identifies a prognostic and immunotherapeutically relevant gene signature. Front Oncol. 2019;9.
9. Bao M, Shi R, Zhang K, Zhao Y, Wang Y, Bao X. Development of a membrane lipid metabolism-based signature to predict overall survival for personalized medicine in ccRCC patients. EPMA J. 2019.
10. Wang Y, Xin S, Zhang K, Shi R, Bao X. Low GAS5 levels as a predictor of poor survival in patients with lower-grade gliomas. J Oncol. 2019;2019:1–15.
13. Friedman J, Hastie T, Tibshirani R. glmnet: Lasso and elastic-net regularized generalized linear models. R package version. 2009;1(4).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.