Background

Lung cancer is the leading cause of cancer-related death worldwide [1]. Lung adenocarcinoma (LUAD) accounts for 45–55% of non-small-cell lung cancer (NSCLC), with a 5-year overall survival rate of less than 15% [2]. Due to the heterogeneity of lung adenocarcinoma, it is still a great challenge to develop successful individual-based treatment [3]. Therefore, there is an urgently need to identify effective and promising biomarkers in predicting the prognosis of LUAD.

Genetic aberrant expression is crucial for cancer etiology, and the joint effect of both genetic and epigenetic changes facilitates the development of human cancer [4,5,6]. DNA methylation acts as the key element in epigenetic modifications and plays a significant role in the regulation of cellular functions and carcinogenesis [7,8,9]. Epigenetic modification, especially DNA methylation, can provide a novel horizon to explore new biomarkers in predicting the prognosis of cancer [10,11,12,13,14]. A large number of studies have demonstrated that DNA methylation can regulate the expression of lncRNA, and this phenomenon may be associated with the prognosis of lung cancer [15]. For instance, the lncRNA AFAP1-AS1 acts as an oncogene in NSCLC, while its expression is tightly regulated by DNA methylation, which might provide prognostic and diagnostic values for NSCLC patients [16].

MethylMix, an algorithm implemented in the R programming environment, identifies disease-specific hyper- and hypomethylated genes [17]. Currently, few studies on methylation-driven genes have been reported [18]. Recently, a study based on using MethylMix to explore methylation-driven genes for predicting the prognosis of LUAD was reported; however, they only obtained information about methylation-driven mRNA [19]. In this study, DNA methylation and RNA-Seq data were extracted from The Cancer Genome Atlas (TCGA) database, and we used the MethylMix R package to obtain LUAD-specific methylation-driven lncRNA sequences. Furthermore, a Cox survival predictive model with 6 lncRNAs was constructed to predict the diagnosis and prognosis of LUAD. Finally, the combined effect of DNA methylation and gene expression survival analysis was examined, which might provide a novel insight to explore methylation-driven lncRNA and mRNA for predicting the prognosis of LUAD.

Methods

Data retrieving and analyzing

Methylation and RNA-Seq expression data were retrieved from LUADs from the TCGA database. The methylation data were downloaded from 475 cancer tissues and 32 noncancer tissues from the Illumina Human methylation 450k platform. The RNA-Seq data (level 3), including mRNA and lncRNA expression, were retrieved from 535 cancer tissues and 59 noncancer tissues from the IlluminaHiSeq_RNASeq platform. First, on the basis of “limma R” packages in R with absolute fold change (log 2) > 0 and adjusting the false discovery rate (FDR) to a P value < 0.05, we obtained aberrant methylated genes. The methylation difference of the mean of the 3000 bp (base pair) sites upstream of the gene was analyzed to identify the differential level of methylation in the gene promoter [19,20,21,22]. We analyzed the differential level of methylation in the promoter of genes by using the limma R package [23]. Then, based on the “edge R” packages in R with absolute fold change (log 2) > 3 and adjusting the false discovery rate (FDR) to a P value < 0.01 to correct the statistical significance of multiple experiments, we retrieved differentially expressed mRNA and lncRNA. MethylMix is a kind of R statistical package for integrating DNA methylation data and RNA expression data to identify methylation-driven genes in kinds of cancers [17, 24,25,26]. Filtering or eliminating missing value genes is the preprocessing common step when running MethylMix R software [17]. In this present manuscript, we filtered or eliminated missing value genes and intersected DNA methylation data with RNA expression data for matching. Finally, there were a total of 465 LUAD samples with matched DNA methylation and RNA expression and 32 non-LUAD samples with DNA methylation data for entering the MethylMix R package. Then, calculated the correlation between DNA methylation level and RNA expression to find significantly negatively related genes, a beta mixture model was constructed for the degree of methylation of samples, Wilcoxon rank test was used to calculate differential methylation in LUAD and adjacent non-LUAD samples. Finally, the methylation-driven mRNA and lncRNA was obtained. Since the data were directly obtained from the TCGA database, no approval was required from the local ethics committee.

Functional enrichment analysis of methylation-driven mRNA in LUAD

To determine the function represented in the methylation-driven mRNA, we used the Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/) to perform a functional and enrichment analysis on the methylation-driven mRNA by using GO and ConsensusPathDB analysis. In the GO analysis, a P value of less than 0.05 was considered statistically significant. Furthermore, the GOCircle and GOChord plotting functions of the GOplot R package were used to allow data from expression analysis and data from functional annotation enrichment analysis. ConsensusPathDB (http://cpdb.molgen.mpg.de/) is an online software program that includes binary and complex signaling, gene regulatory and drug-target interactions, and biochemical pathways. P < 0.05 was considered statistically significant.

Construction of a differentially methylated, lncRNA-related predictive model in LUAD

We identified the differentially methylated lncRNA associated with overall survival with P < 0.05 to act as prognostic methylation lncRNA candidates for multivariate Cox regression analysis. On the basis of the median risk score, LUAD patients were divided into two cohorts, high-risk cohorts and low-risk cohorts. Receiver operating curves were used to test the effect of the lncRNA signature (high risk vs low risk) on overall survival. We analyzed the receiver operating curve by calculating the area under the curve (AUC) under the binomial exact confidence interval to reveal prognostic biomarkers for predicting survival in LUAD.

Combined methylation and gene expression survival analysis

To further explore the effect of methylation level and gene expression level of the same methylation-driven key gene on LUAD patient prognosis, we performed a combined methylation and gene expression survival analysis to identify potential methylation-driven mRNAs and methylation-driven lncRNAs for predicting the prognosis of LUAD patients. Therefore, a Kaplan–Meier curve was performed. P < 0.05 was regarded as statistically significant.

Results

Identification of methylation-driven mRNA and lncRNA in LUAD

A total of 99 mRNAs and 17 lncRNAs were identified to be associated with DNA methylation using MethylMix criteria. The methylation-driven mRNAs and IncRNAs are shown in Tables 1 and 2, respectively. We constructed a mixed model and performed a Wilcoxon rank test for determining differential methylation (logFC > 0, P < 0.05, Cor < − 0.3). Figure 1 shows that two methylation-driven mRNAs (ZNF454 and ZNF471) (Fig. 1f, g) and two methylation-driven lncRNAs (TUSC8 and LINC00676) (Fig. 1h, i) have significant negative correlations between methylation and gene expression levels. In Fig. 1, the distribution of the methylation degree shows that ZNF454 (Fig. 1b), ZNF471 (Fig. 1c), and LINC00676 (Fig. 1e) are hypermethylated in LUAD patients and hypomethylated in the normal group, while TUSC8 (Fig. 1d) is hypomethylated in LUAD patients and hypermethylated in the normal group. A heat map of methylation-driven mRNAs and lncRNAs is shown in Fig. 2a, b. A flow diagram of the exploration of methylation-driven mRNA and lncRNA in LUAD is shown in Fig. 1a.

Table 1 Methylation-driven mRNAs
Table 2 Methylation-driven lncRNAs
Fig. 1
figure 1

Identification of top hypermethylated and hypomethylated mRNAs and lncRNAs in LUAD. a A flow diagram of the exploration of methylation-driven mRNA and lncRNA in LUAD. be The methylation degree when comparing cancer patients to normal patients in LUAD. The red curve indicates the methylation degree from the cancer group, the green curve indicates the methylation degree from the normal group, and the black line above the figure is the distribution of methylation levels in normal patients. fi The correlation between methylation and gene expression in methylation-driven mRNAs and lncRNAs

Fig. 2
figure 2

Heat map of methylation-driven mRNAs and lncRNAs in LUAD. a The hierarchical clustering heat map of LUAD-specific methylation-driven mRNAs. b The hierarchical clustering heat map of LUAD-specific methylation-driven lncRNAs. In the figure, red represents highly methylated genes and green represents low methylated genes between LUAD and adjacent tissues

Enrichment analysis of methylation-driven mRNAs in LUAD

Gene ontology analysis showed that there were 5 GO terms (regulation of transcription, DNA-templated; transcription factor activity, sequence-specific DNA binding; nucleic acid binding; transcription, DNA-templated; metal ion binding) with significant differences (P < 0.05), and the highest GO biological process was “GO:0006355 regulation of transcription, DNA-templated” (Fig. 3a, c). The GOChord plot shows the top 30 methylation-driven mRNAs with their related GO terms (Fig. 3b). Figure 4 shows that 11 pathways (Generic Transcription Pathway, Benzene metabolism, RNA Polymerase II Transcription, Gene expression (Transcription), Platinum Pathway, Pharmacokinetics/Pharmacodynamics, Phase II—Conjugation of compounds, Sulfation Biotransformation Reaction, Estrogen metabolism, Glutathione-mediated detoxification, Cytosolic sulfonation of small molecules, Drug metabolism—other enzymes—Homo sapiens (human)) were considered statistically significant (P < 0.05). Furthermore, pathway analysis showed that the methylation-driven mRNAs were most enriched in the Generic Transcription Pathway, RNA Polymerase II Transcription and Gene expression (Transcription) pathways (P < 0.01) (Fig. 4). The pathway analysis is shown in Table 3.

Fig. 3
figure 3

Functional enrichment analysis of methylation-driven mRNAs in LUAD. a The outer circle represents the expression (logFC) of methylation-driven mRNAs in each enriched GO (gene ontology) term: red dots on each GO term indicate upregulated methylation-driven mRNAs and blue dots indicate downregulated methylation-driven mRNAs. The inner circle indicates the significance of GO terms (log10-adjusted P values). b The circle indicates the correlation between the top 30 methylation-driven mRNAs and their gene ontology terms. c The distribution of the methylation-driven mRNAs in significant GO terms

Fig. 4
figure 4

Pathway analysis of methylation-driven mRNAs in LUAD. The red circles indicate the number of methylation-driven mRNAs on each pathway. The line between the two red circles represents the ratio of methylation-driven mRNAs present in the common genes of the two pathways; the thicker the line, the more common methylation-driven mRNAs are represented

Table 3 Pathway analysis

Construction of a predictive model of six differentially methylated lncRNAs in LUAD

Univariate Cox regression analysis was performed first to identify the prognosis associated with differentially methylated genes in LUAD, incorporating 10 methylation genes that were conspicuously associated with overall survival (P < 0.05). Next, multivariate Cox regression was used and showed that six lncRNAs were eventually selected to construct a predictive model. We used the linear combination of the expression of the 6 lncRNAs to construct the predictive model. The relative coefficients weighted in the multivariate Cox regression were as follows: survival risk score = (3.0040 × expression value of FOXE1 + 1.0226 × expression value of HOXB13-AS1_2 + 1.0540 × expression value of VMO1 + 1.0050 × expression value of HIST1H3F + (− 3.0925) × expression value of AJ003147.8 + 1.4791 × expression value of ASXL3). The multivariate Cox analysis is shown in Table 4.

Table 4 Multivariate Cox regression analysis of 6 lncRNAs associated with overall survival in LUAD patients

Risk groupings and ROC curve analysis

As shown in the heat map, the expression of six prognostic methylation genes was profiled (Fig. 5a). Based on the median risk scores, a total of 449 samples of complete survival information were divided into a high-risk group (n = 224) and a low-risk group (n = 225). We used the Kaplan–Meier curve with a log-rank statistical examination to perform survival analysis. As shown in Fig. 5b, patients in the low-risk group had conspicuously better overall survival than those in the high-risk group (Fig. 5b). The receiver operating characteristic (ROC) curve was analyzed to test the influence on the 6-lncRNA signature associated with overall survival in LUAD (Fig. 5c).

Fig. 5
figure 5

Prognostic value of 6-methylation lncRNAs in LUAD. a A risk heat map established from 6 lncRNAs from 449 LUAD patients. b Kaplan–Meier curve analysis for OS (overall survival) of LUAD patients using the 6 lncRNA signatures. c ROC curve analysis of the prognostic 6-lncRNA signature

Combined methylation and gene expression survival analysis in LUAD

The combined Kaplan–Meier curve analysis revealed that the combination of methylation and expression of lncRNAs AC023824.1, AF186192.1, LINC01354 and WASIR2 had a conspicuous correlation with the prognosis of LUAD patients (Fig. 6a–d). The hypermethylation and low-expression survival rate of AC023824.1 was high, while the hypermethylation and low-expression survival rate of AF186192.1, LINC01354 and WASIR2 were low. The combined Kaplan–Meier curve analysis showed that the combination of methylation and mRNA expression of mRNAs CCDC181, EFS, F2RL1, GKN2, ITPRIPL1, KLHDC9, MPV17L, and S1PR1 were associated with overall survival of LUAD (P < 0.05) (Fig. 6e–l). The hypermethylation and low-expression survival rate of F2RL1 was high. However, the hypermethylation and low-expression survival rates of EFS, CCDC181, GKN2, ITPRIPL1, KLHDC9, MPV17L, and S1PR1 were low (Fig. 6).

Fig. 6
figure 6

The combined methylation and gene expression data survival analysis in LUAD. ad Kaplan–Meier curve analysis of four lncRNAs in LUAD. el Kaplan–Meier curve analysis of eight mRNAs in LUAD

Discussion

In recent years, with the increasing numbers of advanced diagnoses and poor prognoses in lung adenocarcinoma, it is pivotal to find more effective prognostic biomarkers to predict survival in LUAD. LncRNA-related studies have attracted the attention of various cancer fields. Accumulating studies show that cancer-related lncRNAs may serve as diagnostic or predictive biomarkers of cancer and have a significant effect on the therapeutic treatment of cancer [27]. Emerging evidence shows that studies on the molecular mechanisms and prognostic biomarkers of LUAD associated with methylation-driven lncRNA and mRNA are still lacking.

In recent years, epigenetic alterations in DNA methylation, noncoding RNA expression, chromatin modeling and post-transcriptional regulators have been found to play significant roles in the regulation and development of lung cancer pathogenesis [28,29,30,31,32]. Some studies have shown that epigenetic changes in DNA methylation cause changes in the expression of lncRNA, which might provide a novel insight to explore new biomarkers for predicting the prognosis of human cancer [33,34,35,36]. For instance, analysis of microarray data on gene expression and methylation showed that the expressions of lncRNAs LOC146880 and ENST00000439577 were regulated by DNA methylation, which might provide a new horizon to predict the diagnosis and prognosis of NSCLC [37]. Lu et al. indicated that MEG3 is significantly downregulated in NSCLC tissues that could be affected by DNA methylation [38]. Previous studies have shown that survival-associated, methylation-driven lncRNAs might serve as novel prognostic biomarkers for predicting the prognosis of LUAD.

Recent studies have shown that the roles of lncRNA in tumorigenesis and metastasis can indicate that lncRNA may function as a novel biomarker for the diagnosis and prognosis of cancer [39,40,41,42,43,44]. LncRNA TUBA4B has been reported to serve as a new predictor for prognosis and modulate cell viability in non-small-cell lung cancer [45]. lncRNA AFAP1-AS1 may act as an oncogenic to facilitate the migration of non-small-cell lung cancer (NSCLC) [16]. Long noncoding RNA ANRIL acts as an oncogene by silencing KLF2 and P21 expression to promote the development of NSCLC [46]. LncRNA PANDAR acts as a cancer suppressor gene by regulating Bcl-2 to affect cell apoptosis in NSCLC [47].

In the present study, we retrieved methylation and lncRNA and mRNA expression from the TCGA database by using bioinformatics analysis and obtained methylation-driven lncRNAs and mRNAs to predict the prognosis of LUAD. First, we obtained differentially expressed methylation and lncRNA and mRNA using the MethylMix R package to obtain methylation-driven lncRNA and mRNA. Functional enrichment analysis was performed to analyze the methylation-driven mRNA to identify its biological functions in the regulation and development of LUAD. Furthermore, univariate and multivariate Cox regressions were performed to construct a predictive model for predicting the prognosis of LUAD. Finally, a combined methylation and lncRNA expression survival analysis was carried out, which might provide novel insight to predict the diagnosis and prognosis of LUAD.

In the present study, we combined methylation and lncRNA and mRNA expression data with survival analysis to identify 4 lncRNAs and 8 mRNAs that act as independent prognostic factors for predicting the diagnosis and prognosis of LUAD. LINC01354 acts as a ceRNA to predict the early diagnosis and prognosis of colorectal cancer [48]. The combined survival analysis showed that the low expression of AC023824.1 with hypermethylation, compared to the high expression of AC023824.1 with hypomethylation, had a higher survival rate. (P = 0). The combined hypermethylation and low-expression survival rate of AF186192.1 was lower than the hypomethylation and high-expression survival rate of AF186192.1. (P = 0.01). The low expression and hypermethylation survival rate of LINC01354 was low. (P = 0.038). The high expression and hypomethylation survival rate of WASIR2 was high. The survival analysis showed that lncRNA AF186192.1, LINC01354 and WASIR2 might act as cancer suppressor genes regulated by DNA methylation to play significant roles in predicting the prognosis of LUAD. LncRNA AC023824.1 might act as an oncogene regulated by DNA methylation to have a pivotal effect on predicting the prognosis of LUAD. The survival rate of hypermethylation and low expression of CCDC181, EFS, GKN2, ITPRIPL1, KLHDC9, MPV17L, and S1PR1 were low. However, the hypermethylation and low-expression of the survival rate of F2RL1 was high. F2RL1 might act as an oncogene for predicting the prognosis of LUAD. Previous studies have shown that CCDC181, KLHDC9, and S1PR1 act as methylation-driven genes to reveal prognostic biomarkers in LUAD [19]. GKN2 may contribute to the homeostasis of gastric epithelial cells by inhibiting GKN1 activity [49]. F2RL1 may act as novel acute myeloid leukemia subsets that are meaning for treatment guidance [50]. MPV17L acts as a unique interacting protein and regulator of HtrA2 protease, mediating antioxidant and antiapoptotic functions in mitochondria [51]. Compared with previous studies, our study first obtained differentially expressed mRNA and lncRNA using the edge R package and aberrant methylated genes using the limma R package, and then we filtered low expression genes and intersected mRNA and lncRNA expression data with DNA methylation data to obtain methylation-driven genes by using the MethylMix R package. The MethylMix (https://bioconductor.riken.jp/packages/3.1/bioc/html/MethylMix.html) is an algorithm implemented to integrate DNA methylation with RNA expression to identify methylation-driven genes in cancers [52]. In summary, MethylMix provides a tool that contributes to the analysis of methylation-driven lncRNAs and mRNAs in cancer studies from TCGA [17, 18]. However, the MethylMix focuses on identifying cis-regulatory effects of DNA methylation on gene expression and does not currently model trans-regulatory effects [18]. Further studies are needed to solve the multiple testing challenge on identifying trans-regulatory effects of DNA methylation on gene expression. Our study may provide a novel method for determining disease-specific prognostic biomarkers in LUAD and may play a significant role in predicting the diagnosis and prognosis of LUAD.

Our study subjects were retrieved from the TCGA database, which is a significant tool for analyzing prognostic biomarkers. It is not known whether our results are applicable to other groups. The predictive prognostic lncRNA and mRNA signature needs to be verified by molecular biologic experiments on clinical samples in future studies. Eventually, large-scale samples and experimental studies could validate the biological function of prognostic biomarkers in LUAD.

Conclusion

In conclusion, our study identified methylation-driven mRNA and lncRNA by using bioinformatics analysis from the TCGA database. A Cox predictive model was performed to identify independent prognostic factors. Methylation and gene expression data combined with survival analysis was used to identify LUAD-specific, methylation-driven lncRNAs and mRNAs for predicting the diagnosis and prognosis of LUAD.