Background

Esophageal carcinoma (EC) is one of the most common malignant tumors in the digestive system. It occurs mostly in the esophageal epithelium, and there are no typical clinical symptoms in the early stage of the patient. Therefore, more than 80% of EC patients have progressed to the advanced stage when they are diagnosed, which affects the prognosis of patients [1]. Esophageal cancer has two major histological subtypes, esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma (ESCC). Among them, ESCC is the predominant subtype and accounts for 80% of all patients [2]. For the moment, the mechanism of ESCC is still not fully characterized, and the early symptoms of patients are atypical, which brings great difficulties for clinical diagnosis and therapy [3]. Similar to other malignancies, the progression of ESCC is also a complex process involving multiple factors and multiple gene mutations. Studies have shown that the changes in the molecular level of ESCC tissues are earlier than the clinical features. Therefore, early diagnosis and intervention are significant for reducing the incidence of ESCC [4].

Epigenetic changes are identified as significant contributors to cancer progression [5]. The abnormal DNF methylation is one of the most important and common epigenetic modifications, and plays key roles in regulating genome function [6]. Selective hypermethylation or hypomethylation of genes to regulate the expression of genes and form specific tissue types during development are considered to be a hallmark in developing many carcinomas [7]. In recent years, studies on methylation and tumors have gradually drawn more attention. For instance, Roy et al. analyzed the lymph node metastasis in esophageal squamous cell carcinoma and built a comprehensive methylation signature for predicting the prognosis of patients [8]. Genes including NNK, MSH3 and P16, which were stated to be methylated, associated with tumors progression [9,10,11]. Identification of abnormal methylated genes can explore the redundancy and instability of the esophageal carcinoma genome and provide the basis for risk prediction and targeted therapy.

The wide DNA methylation arrays and advent of deep RNA-Seq approach has significantly contributed to study the interaction between methylation and gene expression during tissue carcinogenesis and development. An integrative analysis of mRNA expression and DNA methylation studied by Kim et al. stated out the function of epigenetic changes on malignant mesothelioma cell [12]. Furthermore, in order to identify the mechanism contributed to oncogenesis, Olivier Gevaert et al. developed a novel computational algorithm called Methylmix to study abnormal methylated genes and predict transcription [13]. As a well-known cancer genome database, The Cancer Genome Atlas (TCGA) [14] provides a great genomic data with patients information, which can translate molecular information into potential clinical information. In this study, ESCC-related expressed and abnormally methylated genes were recognized based on TCGA database, and the related differential genes and expression of abnormally methylated genes in ESCC patients were clarified. We analyzed RNA-Seq transcriptomes and DNA methylation data of ESCC samples from 99 cases in TCGA. Five candidate genes (ABCD1, SLC5A10, SPIN3, ZNF69, ZNF608) were identified from 26 driven genes (p < 0.05), which could be served as independent prognostic biomarker. Additionally, ABCD1, CCDC8, FBXO17 were identified to be meaningfully correlated with prognosis by further integrative survival analysis. Besides, we found the significant correlation between methylated sites with gene expression.

Methods

Data acquisition and preprocessing

In this study, all data were obtained from TCGA data portal accessed on 20181108 (https://portal.gdc.cancer.gov/). Of them, the DNA methylation data was using the Illumina Infinium HumanMethylation450 platform, and beta values, ranged from 0 to 1, was quantified to indicate the levels of DNA methylation. The DNA methylation data included 3 normal samples, 96 ESCC samples. And we used transcriptome profiling data without isoform expression and miRNA expression quantification, for analyzing the gene expression of ESCC. Then, R software and packages were utilized to analyze and normalize the downloaded data to obtain differentially expressed genes (DEGs) and differentially methylated genes (DMGs). Furthermore, a total of 96 ESCC suffers had recorded clinical data and were used in further survival analysis (Additional file 1: Table S1). The data from TCGA is open-ended and publicly available.

Integrative analysis

The DEGs and DMGs were integrated for performing an analysis via the R package MethylMix [15]. MethylMix is a program used for automatically analyzing the correlation between methylation events and gene expression [13]. Three datasets are required as input for analysis: normal DNA methylation data, cancer DNA methylation data and matched gene expression data. Then, the Methylmix identify cancer specific hyper and hypo methylated genes, which named transcriptionally predictive genes, and compute the correlation between methylated genes and related genes. A Wilcoxon rank sum test was adopted in this algorithm. And the final output of MethylMix is genes that are both transcriptionally predictive and differentially methylated states. Additionally, the differential methylation (DM) value where a negative DM value signifies hypomethylation and a positive DM value signifies hypermethylation can be used in subsequent analysis.

Methylation-driven genes functional enrichment and pathway analysis

Gene ontology (GO) analysis was conducted on identified methylation-driven genes with methylation/expression using the DAVID database. DAVID provides integrative and systematic annotation tools for unraveling biological meaning of genes. Gene ontology (GO) analysis includes the molecular function, biological process and cellular component [16]. And we used Goplot to visualize the result.

Pathway analysis was conducted for the methylation-driven genes with ConsensusPathDB [17], which is a functional molecular interaction database, integrating information on genetic interacting signaling, protein interacting, drug-target interactions, metabolism and gene regulation in humans. Over-representation analysis was based on neighbourhood entity sets or biochemical pathways, and the pathway analysis was performed on the basis of imputed gene list. Lists of hypomethylated genes and hypermethylated genes were analyzed together. We used p value cutoff of 0.05 and minimum overlap as default settings.

Survival analysis

Kaplan–Meier curves were used to identify the relationship between methylation-driven genes and the survival in ESCC. The independent prognostic possibility of methylation-driven genes was screened via the survival R package. The p value was obtained using the long-rank test and p < 0.05 were considered statistically significant.

To further investigate the key genes from methylation-driven genes, we combined abnormal methylation genes with the corresponding gene expression data, and the joint survival analysis was performed via the survival R package. In addition, since the key genes were obtained from the above, we merged relevant sites of methylation and corresponding gene expression data, for identifying the correlation between gene expression and key gene methylation sites.

Results

Identification of methylation-driven genes in ESCC

To study methylation-driven genes, a total of 3 normal samples and 96 sample of methylated from TCGA were included in our study. First, we used LIMMA software package for DMGs filtration (p < 0.05, |logFC ≥ 1|, and hypermethylation of 447 genes and hypomethylation of 520 genes were identified (Fig. 1). Second, edgR R package was used for identifying the DEGs in ESCC, and DEGs and DMGs were merged. Third, according to the Methylmix R package, we recombined the DEGs and DMGs and divided them into methylated cancer set, methylated normal set and gene cancer set. P < 0.05 and cor < − 0.3 were adopted for screening methylation-driven genes. Last, 26 genes were screened and we used R software to visualized the mixture model and the correlation between genes expression and degree of methylation (Table 1). Among them, 4 genes were shown in Figs. 2 and 3, while the rest were shown in Additional file 2: Figure S1 and Additional file 3: Figure S2. Furthermore, there were no significant differences between these four genes (p > 0.05).

Fig. 1
figure 1

Thermal map of ESCC-related abnormal methylation genes. The color from bule to red illustrates a trend from low methylation to high methylation. ESCC esophageal squamous cell carcinoma

Table 1 DNA methylation-driven genes in ESCC
Fig. 2
figure 2

Methylmix model of DNA methylation-driven genes. The distribution maps show the methylation states of methylated genes. The histogram represents the distribution of methylation in tumor samples. The horizontal black bar demonstrates the distribution of methylation in the normal samples. The distribution of methylation degree can be clearly seen from the figure (ad)

Fig. 3
figure 3

Correlation between DNA methylation and gene expression (ad)

Functional enrichment and pathway analysis of methylation-driven genes

To further investigate the function of methylation-driven genes in ESCC, we used GO enrichment analysis in DAVID. Methylation-driven genes were enriched in molecular function (MF) of DNA binding and transcription factor activity. As for cell component (CC), these genes showed enrichment in nonmotile primary cilium. Besides, biological process (BP) indicated enrichment predominantly at regulation of RNA metabolic process (Fig. 4a).

Fig. 4
figure 4

Methylation-driven genes functional enrichment and pathway analysis. a Gene ontology analysis of methylation-driven genes of ESCC. b Pathway analysis of methylation-driven genes by using ConsensuspathDB

Pathway enrichment analysis revealed that methylation-driven genes were significantly linked to vitamin D receptor pathway, development and heterogeneity of the ILC family, adipogenesis and G alpha (i) signaling events (Fig. 4b).

Prognostic assessment of methylation-driven genes in ESCC

The prognostic value of 26 methylation-driven genes was assessed by Survival R package, and we found five genes (ABCD1, SLC5A10, SPIN3, ZNF69, and ZNF608) were independent prognostic indicators for ESCC (Fig. 5). However, to further investigate the correlation between genes methylation and expression, we combined these data to study the influence on patients’ survival. Using p < 0.05 as a significant standard for integrative survival, the gene expression and methylation levels of the prognostic genes ABCD1, CCDC8, FBXO17 were meaningfully correlated with prognosis (Fig. 6). Also, the prognosis-related genes methylation sites based on corresponding data in TCGA were identified, and the correlation between genes expression and sites were analyzed (Table 2). The gene expression of ABCD1, CCDC8 and FBXO17 were identified to be correlated with the methylation level of multiple sites, and all of them showed negative correlations (Figs. 7, 8).

Fig. 5
figure 5

Kaplan–Meier survival curves of five independent prognostic factors (ae)

Fig. 6
figure 6

Kaplan–Meier survival curves for the integrative analysis of ABCD1, CCDC8 and FBXO17 (ac)

Table 2 Correlation between gene expression and methylation sites
Fig. 7
figure 7

The correlation between methylated sites and gene expression. ABCD1 methylated sites with matched gene expression ad. CCDC8 methylated sites with matched gene expression eh

Fig. 8
figure 8

The correlation between methylated sites and gene expression. CCDC8 methylated sites with matched gene expression a, b. FBXO17 methylated sites with matched gene expression (ce)

Discussion

Esophageal carcinoma is one of the most common malignant tumors of the digestive system with high mortality and poor prognosis [18]. Esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma (ESCC) are the major histological subtype of esophageal cancer. Alcohol consumption and tobacco smoking are two main risk factors in ESCC [19, 20], while obesity, diet and gastroesophageal reflux disease (GERD) were considered as risk indicators for EAC [21, 22]. Despite chemoradiotherapy or surgery, the prognosis of esophageal carcinoma remains poor with the overall survival [23]. The mechanism of ESCC is still unclear. Therefore, a further study of ESCC and subsequent therapeutic advances are urgently needed. Both epigenetic and genetic aberrations have been identified to ESCC generation and progression. With the rapid development of gene analysis technology, we can further study the molecular characteristics of ESCC, which provides valuable evidence for prognosis and therapeutic molecular targets.

Recently, the study on the relationship between epigenetics and tumorigenesis is always one of hotspots in the molecular biology. Epigenetics are different in that nucleotide sequences have not changed and play roles via DNA methylation, chromosome remodeling and histone deacetylation. Many studies have shown that DNA methylation is correlated with human ESCC. Aberrant DNA methylation of genes can be served as noninvasive biomarkers for the diagnosis and detection of cancer [24, 25]. Therefore, to investigate the epigenetic changes and the molecular mechanisms of ESCC progression that determines promising biomarkers, early diagnosis, treatment of ESCC is significant. The stability and independence of aberrant methylated DNA analysis make it a feasible approach for prognostic biomarkers [26]. Several reports have shown that the aberrant methylation of DNA affects genes involved in DNA damage, cell cycle, Wnt, NF-κB signaling pathways, including MGMT [27], P16 [11], DACH1 [28] and ZNF382 [29]. Also, other studies have shown that methylated FHIT is correlated with poor prognosis in early ESCC [30]. Therefore, bioinformatics analysis of the molecular functional enrichment and prognostic value of aberrant methylation DNA can offer clinicians with promising tools to predict prognosis and treat patients.

In our study, we investigated aberrant methylated genes between normal samples and ESCC patients to identify the biomarkers of prognosis related to methylation-driven genes. A model-based tool (methylmix) was used to identify those genes with abnormal methylation and correlation with gene expression, and 26 methylation-driven genes were found [13]. To study the functional roles of these ESCC methylation-driven genes, gene ontology (GO) and pathway analysis were performed. As was revealed by DAVID database, methylation-driven genes in ESCC were enriched in molecular function (MF) of DNA binding and transcription factor activity. As for cell component (CC), these genes showed enrichment in nonmotile primary cilium. Besides, biological process (BP) indicated enrichment predominantly at regulation of RNA metabolic process. These functional items not only showed the interaction of genes at the functional level but also revealed the aberration of genes function may result from abnormally methylated DNA in different samples.

In order to study further of the relationship between methylation-driven genes and patients, survival R package was utilized to analyze the correlation between abnormal DNA methylation and patients survival. Five candidate genes (ABCD1, SLC5A10, SPIN3, ZNF69, ZNF608) were identified from 26 driven genes (p < 0.05), and they might be served as independent prognostic factor for ESCC. However, it was still not comprehensive for just analyzing aberrant methylation data with patients’ survival. Thus, we moved on to combine abnormal methylation genes and the corresponding gene expression data with patients’ survival for integrative survival analysis. In the result, ABCD1, CCDC8, FBXO17 were identified to be meaningfully correlated with prognosis. Previous studies have suggested CCDC8 (coiled-coil domain containing 8) was frequently epigenetically dysregulated in renal cell carcinoma and in breast carcinomas that metastasis to the brain [31, 32]. Also, FBXO17 (F-box protein 17) have been identified to be hypermethylated in salivary gland adenoid cystic tumor [33]. For these specific genes, we further studied the correlation between expression level with methylation level of the sites, and we found multiple sites were negatively correlated with the gene expression level. The result may due to aberrant methylation of the sites leading to the dysregulation of the expression, which affects the generation and progression of cancers and the prognosis of patients.

Growing evidence demonstrated that the aberrant DNA methylation was associated tumors generation and progression via the bioinformatics analysis. For instance, Gao et al. found a prognostic risk model for evaluating the prognosis of LUSC patients, and they studied the abnormal methylated sites of key genes which had poor prognosis with patients [34]. Also, Fan et al. used GEO database to study aberrant methylation genes as biomarkers for hepatocellular cancer [35]. At present, the abnormally methylated genes in ESCC still have not been studied. Compared to previous studies, we used methylmix as a technology, which provided a more comprehensive analysis for screening methylation-driven genes in ESCC. For transferring the result to practical application, we studied the methylation driven-genes which were significant with patients’ survival. Furthermore, the correlation between abnormally methylated sites and gene expression was analyzed for providing a more precise target for further experimental validation. Although we have made comprehensive study correlated with epigenetics changes and ESCC, the experiments are still significant to testify its specificity and sensitivity.

Conclusion

In summary, we found DNA methylation-driven genes involved in ESCC generation and progression by using methylmix technology. On this basis, we further studied the driven-genes related to patients’ survival. In the result, ABCD1, SLC5A10, SPIN3, ZNF69, ZNF608 were identified and can be served as independent prognostic factors for ESCC. ABCD1, CCDC8 and FBXO17 were screened out by the integrative survival analysis, and multiple methylated sites were correlated with gene expression. Those aberrant methylated genes may contribute to reveal the mechanisms of ESCC generation and progression and can be served as promising biomarkers for diagnosis, treatment and prognosis. Further characterization of the DNA methylated changes can help to figure out the mechanisms and design improved existing treatment.