Introduction

The World Health Organization (WHO) notes that lung cancer is one of the most prevalent malignancies in developing nations, where its prevalence is also on the rise as a result of things like air pollution and smoking [1]. The National Cancer Institute (NCI) reports new cases and mortality rates per 100,000 people: The annual incidence of new lung and bronchial cancers is 50.0 cases per 100,000 people. The annual male and female mortality rate is 35.0 per 100,000,Based on 2017–2019 data, approximately 6.1% of men and women will be diagnosed with lung and bronchial cancer at some point in their lifetime [2]. Lung cancer is mainly divided into non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC accounts for approximately 85% of lung cancers, with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) being the most common subtypes [3]. People who die of lung cancer are often diagnosed at an advanced stage. Therefore, an early lung cancer diagnosis is essential, requiring sensitive and specific biomarkers for early diagnosis.

DNA methylation is one of the earliest discovered and widely studied epigenetic regulatory mechanisms [4]. DNA methylation refers to the formation of 5- methylcytosine (5mC) by transferring the methyl group to the 5-carbon position of the cytosine ring with s- adenosylmethionine (SAM) as the methyl donor [5]. Abnormal DNA methylation is closely related to the occurrence and development of cancer [6], and changes in DNA methylation have been observed in various types of cancer [7]. Hypomethylation of oncogenes and hypermethylation of tumor suppressor genes are pathogenic mechanisms in most tumors [8]. The former is through genome-wide hypomethylation, which plays a stabilizing role in the structure of heterochromatin, and the latter is through the hypermethylation of promoter genes to inhibit the expression of oncogenes [9]. Aberrant DNA methylation contributes to the development and progression of lung cancer [10]. Abnormal DNA methylation patterns were observed in lung cancer cells compared with normal lung tissue [11].

In this study, we obtained the transcriptional data and clinical data of LUAD patients from TCGA and GEO, and conducted a comprehensive analysis, and finally determined a set of DNA methylation features of the CFTR and PKIA genes. We performed a receiver operating characteristic curve to verify the ability of the identified DNA methylation profiles to predict the survival of LUAD. In addition, we carried out immunohistochemical chemistry, and the results showed that these genes were different from normal people and tumor patients.

Materials and methods

Data collection

DNA methylation data was downloaded from TCGA database((https://portal.gdc.cancer.gov/)).DNA methylation profile was measured experimentally using the Illumina Infinium HumanMethylation 450 platform. The level of DNA methylation is express as a beta value. The transcriptome profiling data and clinical information also were downloaded from TCGA and GEO database (https://www.ncbi.nlm.nih.gov/geo/). The DNA methylation data contains 466 samples which include 29 normal samples 437 tumor samples and from TCGA. Gene expression data were downloaded from TCGA and GEO (GSE30219). TCGA included 54 normal and 501 tumor samples, and GEO included 14 normal and 293 tumor samples. Data analysis and collation were carried out using R software(version 4.2.1)and perl software(v5.30.0).

Screening of differentially expressed genes(DEGs)

The transcriptome analysis data and clinical information from TCGA and GEO were transformed into a gene expression matrix using Perl software, and R was used to intersect the two expression matrices of TCGA and GEO to obtain the intersection gene expression level. Batch corrections were performed on both datasets simultaneously to obtain normalized expression levels. Using the “Limma” and “Pheatmap” package in R software, log2 conversion and DEGs screening of TCGA normalize data were performed. |log2FC|>0.585 and adjust P-value < 0.05 were regarded as the statistical significance threshold level of DEGs samples.

Differential methylation analysis

Use the “Limma” package of R software to read differential gene expression files and methylation data files, extract the expression and methylation data of normal samples and tumor samples, and intersect the two to obtain four files: Expression files and methylation files for normal and tumor samples. Methylation-driven genes need to meet the following conditions: (1) The gene expression level is different between the normal group and the tumor group. (2) The gene methylation level is different and negatively correlated between the normal group and the tumor group. Use the “MethylMix” package of R software to screen methylation-driven genes, Cox regression analysis was used to study the correlation between gene methylation and expression, and 24 genes were obtained.

Construction of prognostic risk model

To correlate the expression of DNA methylation regulators with overall survival (OS), we performed a univariate Cox regression analysis. In the LASSO Cox regression algorithm, the optimal penalty parameter λ and the corresponding coefficient criterion are determined according to the minimum criterion through cross-validation. Patients were divided into the low methylation risk group and a high-risk group according to the average risk score [12]. We used survival curves to compare overall survival (OS) between the two groups. Risk curves were plotted to show the relationship between risk assessment, survival status, and the two gene methylation levels. To investigate whether risk score could be used as an independent factor, we performed univariate and multivariate Cox regression analysis using the “survival” R-package.

Establishment and evaluation of Nomogram for survival rate prediction

To accurately predict OS at 1, 3, and 5 years, clinical information with independent predictors was combined with risk scores to create prognostic nomograms. Harrell’s Concordance Index (C-Index) was used to assess forecast accuracy. The c-index ranges from 0.5 (no predictive power) to 1 (perfect prediction). Assess the performance of the nomogram using the calibration plot. Each patient was assigned an overall nomogram score, the nomoscore, and the quantiles of the nomo score were used as cut-off points to classify patients into three risk groups. KM curves further check the performance of the nomogram.

Gene set enrichment analysis

Functional enrichment of gene expression data was interpreted using gene set enrichment analysis (GSEA). The method analyzes the genome to determine whether they are statistically significantly different under the two biological conditions. Within the “Molecular Signatures Database”(http://www.gsea-msigdb.org/gsea/msigdb/index.jsp) of c2.cp.kegg. symbols and c5.go.symbols by GSEA with R software, underlying mechanisms were studied. The random sample permutation number was set as 500, and the significance threshold p < 0.050.

Drug sensitivity prediction

The GDSC(https://www.cancerrxgene.org/) database, the largest public resource of molecular indicators of cancer drug sensitivity and treatment response, for differential analysis of drug sensitivity [13]. The drug susceptibility values (IC50) of 198 drugs were downloaded from the GDSC database. Drug response prediction was evaluated using the ‘oncoPredict’, ‘limma’ and ‘paralle’ packages in R, and the accuracy of the prediction was estimated by cross-validation.

Immunohistochemistry

Sections were deparaffinized, followed by antigen retrieval in citrate buffer and blocking with 3% hydrogen peroxide. Subsequently, block sections with 3% fetal calf serum for 60 min, and then use antibodies (rabbit polyclonal anti-CFTR antibody, 1:100; Fine Test, FNab01623, China. Rabbit polyclonal anti-PKIA antibody 1:100; CUSABIO, CSB-PA030217, China) or fetal bovine serum (negative control) were incubated overnight at 4 °C. The next day, the secondary antibody [1:400, HRP-conjugated Goat Anti-Rabbit IgG, Sangon Biotech, NO. D110058), China] was incubated for 30 min at 37 °C. After washing, immunoreactivity was visualized by incubating tissue sections with a DAB staining kit (GTVisionTM+Detection System/Mo&Rb, GK600710) and counterstaining with hematoxylin. Immunoreactive cells are stained tan.

Results

Identification of DEGs in LUAD

Transcriptome profiling was obtained from TCGA for lung cancer tissues (n = 501) and non-tumor tissues (n = 54) (supplement 1), GEO database expression files (GSE30219) included 14 normal samples and 293 tumor samples (supplement 2, 3). The intersection of genes is taken, and the expression levels of the intersection genes in the two databases are obtained respectively, and batch correction is performed. TCGA-corrected expression files were analyzed using FDR < 0.05 and |log2 FC| > 0.585 as thresholds. A total of 4471 genes were screened, of which 1900 genes were up-regulated and 2571 genes were down-regulated for subsequent analysis (supplement 4). The identified DEGs were visualized using a heatmap and a volcano map(supplement Fig. 1sA, B), where the heatmap shows the most significantly upregulated and downregulated 50 genes.

Identification of DNA methylation-driven genes in LUAD

To identify DNA methylation driver genes in LUAD, gene expression and DNA methylation data of 2571 DEGs from 466 clinical samples from TCGA (437 LUAD samples and 29 non-tumor samples) (supplement 5) were included in the methylation analysis. The inclusion criteria are: (1) There is a difference in gene expression between the normal group and the tumor group; (2) The gene methylation level is different in the normal group; (3) There is a negative correlation between the gene methylation level and gene expression, 24 methylation-driven genes were screened out(Figs. 1 and supplement Fig. ">2s)( Table 1). Then, we plotted the expression data heatmap and methylation heatmap of these 24 genes(supplement Fig. 3sA, B).

Fig. 1
figure 1

Correlation between methylation and expression in LUAD. The x-axis is an expression, the y‐axis is methylation

Table 1 24 methylation-driven gene expression levels
Fig. 2
figure 2

Construction of prognostic risk profiles for patients with two survival-related genes LUAD. (A) Forest map of screening prognostic genes. (B,C) Using LASSO Cox regression, two m5C RNA methylation regulators were selected for risk coefficient calculation. (D,E) Survival analysis of TCGA and GEO in high-risk and low-risk population. (F,G) Time -ROC curve analysis of 1-,3-,and 5-year risk characteristics of TCGA and GEO.

Construction and validation of prognostic models of DNA methylation-driven genes

We analyzed the prognosis of 24 genes and observed the effect of their expression on the prognosis of LUAD patients. Hazard ratio(HR) > 1 represents high-risk genes, HR < 1 represents low-risk genes, and found that CFTR and PKIA genes (P < 0.05) have prognostic significance in lung cancer(Fig. 2A). Using the methylation regulators of CFTR and PKIA, we performed a LASSO Cox regression algorithm and constructed a prognostic model based on these two regulators (Fig. 2B,C). Finally, we identified the factors that will be used to calculate the risk score after cross-validation. The coefficients of CFTR and PKIA are − 0.22127 and 0.17919, respectively (Table 2).The risk score for each patient with TNBC was calculated using the following formula: risk score = − 0.22127 × CFTR + 0.17919 × PKIA. According to the median risk score, all patients with LUAD were divided into a low-risk group and a high-risk group. In both the TCGA and GEO databases, the OS of patients in the high-risk group was lower than that in the low-risk group (Fig. 2D, E). To evaluate the specificity and sensitivity of risk characteristics in predicting the prognosis of LUAD, we performed a time-dependent receiver operating characteristic curve (ROC) analysis in the TCGA database and the GEO data set, and the areas under the ROC curve (AUC) of the 1, 3, and 5-year analyses were 0.618,0.596,0.607,0.663,0.627,0.644,respectively (Fig. 2F,G),suggesting that our risk characteristics can be used to predict the prognosis of LUAD.

Table 2 Risk scores for CFTR and PKIA

Relationship between clinical features and prognostic risk score

TCGA and GEO risk score plots, survival time and status plots are shown in Fig. 3A, B, D, and E. In addition, these 2 independent risk genes were displayed in a heat map to show the difference in expression levels between high-risk and low-risk groups (Fig. 3C,F). Then we performed univariate and multivariate Cox regression analysis, riskscore was significantly correlated with OS(p < 0.001). At the same time, univariate and multivariate Cox regression analysis also showed that histological stage was also associated with OS (p < 0.001) (Fig. 3G,H). These results suggest that risk models based on these two methylation regulators can serve as independent prognostic factors for LUAC patients.

Fig. 3
figure 3

(A,D) Risk score distribution of TCGA and GEO. (B,E) Survival status scatter plots of TCGA and GEO. (C,F)Heat maps show differences in CFTR and PKIA expression between the high-and low-risk groups of TCGA and GEO. (G) The p-value, HR value, and 95% confidence interval of the DNA methylation regulators analyzed using univariate Cox regression analysis (H) Multivariate Cox regression analysis

Establishment of a prognostic nomogram for prediction of LUAD OS

Survival analyses of model gene expression and methylation levels showed that CFTR and PKIA were associated with survival in patients with LUAD (Fig. 4A-D).In order to manage the clinical prognosis of lung cancer patients and provide clinicians with a quantitative method to predict the probability of one-,three-and five-year survival time of individuals, we established a prognostic nomogram that integrates clinical pathology independent risk factors with a prognostic model (Fig. 4F). Based on the above predictive model, the calibration curve of the nomogram shows that the predicted OS rates at 1, 3, and 5 years are in good consistency with the final results (Fig. 4E).

Fig. 4
figure 4

(A)Survival analysis of CFTR expression levels. (B) Survival analysis of PKIA expression levels. (C)Survival analysis of CFTR methylation levels. (D)Survival analysis of PKIA methylation levels. (E)A correction curve for assessing the consistency between the prediction of the prognostic model and the actual survival rate. (F)Survival probability nomogram of all independent predictive variables based on TCGA data set

Biological mechanisms of model gene function

To investigate potential CFTR- and PKIA-related functions and signaling pathways, we performed GSEA. After GO and KEGG analysis, GSEA (supplement Fig. 4sA-D) found that new model genes were overexpressed in high-risk and low-risk groups, and regulated multiple biological processes, such as antigen processing and presentation, cell cycle, DNA replication, aldosterone regulates sodium reabsorption, juvenile diabetes, and tyrosine metabolism.

Chemotherapy strategy of LUAD high and low-risk group

The Oncopredict software package was used to predict the drug sensitivity scores of the two groups with high and low LUAD scores. Based on the GSDC database, the correlation between the expressions of CFTR and PKIA and the sensitivity to antitumor drugs was analyzed. The results showed that the lower the drug susceptibility score, the more sensitive to drug treatment. The results show that our signature genes are associated with the response to 35 anti-tumor drugs and have potential anti-tumor value (Fig. 5).

Fig. 5
figure 5

Boxplot of susceptibility to commonly used chemodrugs evaluated between high-risk and low-risk groups by analysis of cell line data from the GDSC database

Expression of CFTR and PKIA in LUAD tissues

The expressions of CFTR and PKIA were detected by immunohistochemical (IHC) staining using lung tissue and LUAD tissue from our hospital. The results showed that CFTR and PKIA were strongly expressed in normal lung tissue and weakly expressed in LUAD tissue(Fig. 6). These results were consistent with the results of the dataset, indicating that CFTR and PKIA played an important role in the progression of lung cancer.

Fig. 6
figure 6

Immunohistochemistry: (A) CFTR is strongly expressed in adjacent normal lung tissue, and decreased expressed in LUAD (magnification ×20, magnification ×40) (B) PKIA is strongly expressed in adjacent normal lung tissue, and in decreased expressed in LUAD (magnification × 20, magnification × 40)

Discussion

Lung cancer is the main cause of cancer-related mortality worldwide [14]. Despite the progress in diagnosis and treatment, the five-year survival rate of lung cancer is still very low. With the advent of new sequencing technologies, genome-wide DNA methylation analysis has become possible. In the study of cancer, it is found that apparent genetic changes also play an important role in the process of tumors [15]. Aberrant DNA methylation usually occurs in the early stages of cancer, making DNA methylation biomarkers a good marker for early cancer detection [16]. DNA methylation is also associated with lung cancer prognosis and is essential for its growth and metastasis [17].In this article, we discuss the role of CFTR and PKIA methylation in lung cancer, as revealed by data analysis from TCGA and GEO. We found that the methylation of CFTR and PKIA genes correlated with the prognosis of lung cancer. They are not only differentially methylated and expressed in LUAD tumor tissues, but also correlated with the prognosis of patients. Survival curves showed that there were significant differences in survival curves between the high-risk group and the low-risk group.

CFTR is a gene that encodes a chloride channel that is critical for regulating the transport of ions and fluids across epithelial tissues [18].CFTR protein deficiency leads to excessive inflammation in the lungs, pancreas and intestines [19, 20]. Mutations in the CFTR gene are closely related to cystic fibrosis, and different types of CFTR mutations can lead to CFTR protein deficiency and functional impairment, and the severity of lung diseases varies significantly among different individuals [21]. Hypermethylation of the CFTR promoter has also been observed in head and neck cancer and non-small cell lung cancer, bladder cancer, liver cancer and breast cancer [22]. Previous articles found that hypermethylation of the CFTR promoter region can lead to a decrease in the expression of CFTR in lung cancer cells, and hypermethylation of the CFTR gene in NSCLC tissue samples was associated with a significantly reduced survival rate [23, 24], which is consistent with our conclusions.

PKIA, also known as cAMP-dependent protein kinase inhibitor α, has the function of interacting with cAMP-dependent protein kinase and inhibiting its activity [25]. PKIA is elevated in prostate cancer and associated with reduced progression-free survival, and its depletion leads to reduced tumor growth and migration, and increased susceptibility to anoikis [26]. Satarupa et al. pointed out that PIKA is an important biomarker for cervical cancer staging [27]. Another study showed that the PIKA-mRNA signature acts as a prognostic signature in thyroid cancer and is also associated with the infiltration of immune cell subtypes [28]. Although there is no definite result to prove whether there is a relationship between PKA and lung cancer, previous studies suggest that cAMP signaling pathway may play an important role in the occurrence of lung cancer.

DNA methylation is an important epigenetic mechanism, which is significantly related to lung cancer and plays an important role in the process of carcinogenesis [29, 30]. We identified two lung cancer-specific methylation markers, CFTR and PKIA. The methylation of these genes can be used as potential biomarkers for the diagnosis and prognosis of lung cancer. However, there are still certain limitations, and further biological verification and clinical trials are needed to evaluate the clinical significance of these methylation markers. The results of this study provide relevant ideas for elucidating the development and prognosis of lung cancer and contribute to the early detection and treatment of cancer.

However, the study in the article and its results are only a preliminary study, and the present study has some limitations. First, this study used a sample of LUAD patients from publicly available databases for analysis, and these samples may differ, leading to inconsistent results. Second, although we built and validated prediction models in the TCGA database and the GEO database, further external validation is needed to verify the reliability of these results. In addition, our findings are based on bioinformatics analysis and further clinical trials are needed to validate the prognostic predictive value of these potential biomarkers in real patients. Finally, although our findings suggest that the methylation status of CFTR and PKIA genes is associated with prognosis in LUAD, we need further larger cohort studies to validate these findings. studies to gain insight into the exact mechanisms and mode of action of these genes in LUAD onset and progression.

Conclusions

Lung cancer is a common malignant tumor with a high mortality rate. Abnormal DNA methylation is closely related to the occurrence and development of cancer. Our study shows that the methylation status of CFTR and PKIA can be used as potential prognostic biomarkers and therapeutic targets in lung cancer, and their prognostic value needs to be further explored to improve the survival prediction of patients.