Background

Cancers have become the second life-threatening malignancies, which contribute to almost 18.1 million people occurred and 9.6 million death globally in 2018 [1]. Lack of efficient diagnosis indicators at an early stage and high rate of postoperative recurrence contribute to poor clinical prognosis and high mortality [2, 3]. Growing evidence demonstrated that genomic instability [4, 5], oncogene activation, aberrant methylation modifications, alterations in epigenetic changes [6,7,8], aberrant expression of microRNAs [9], and alterations of signaling pathways are crucial factors and contribute to cancer pathogenesis [10,11,12]. Methylation is an essential epigenetic modification and is closely related to the pathogenesis of cancers [13,14,15,16,17]. The 5-methylcytosine (m5C), N6-methyladenine (m6A), and N1-methyladenosine (m1A) have become the most common types of epigenetic modifications in eukaryotes [13]. Emerging evidence has demonstrated that m5C modification has the potential to serve as novel epigenetic markers with remarkable biological significance in biological processes [18,19,20].

m5C modification distributes in different types of RNAs and DNAs [21,22,23]. m5C modifications can even modify the destiny of cancer cells [24]. m5C regulators contain writers, erasers, and readers, which function as common epigenetic modification and contribute to pre-mRNA splicing, gene expression, gene silencing, nuclear export, genomic maintenance, and translation initiation modifications [25, 26]. m5C could therefore be used as a biomarker for disease progression, including various types of cancers [27]. m5C maintains open and closed chromatin states to control gene expression, genome editing, organismal development, and cellular differentiation [23]. In this context, writers act within a methyltransferase complex to methylate targets, and erasers remove m5C methylation, while readers recognize and bind to m5C-methylated RNA and implement corresponding functions [25, 28, 29]. The anomalous interplay between writers and erasers, arising from alterations to their expression, has been linked to cancer pathogenesis and progression [23, 30]. However, pan-cancer effects of changes to m5C regulatory gene expression have not been fully defined. Next-generation sequencing (NGS) provides us effective tools to comprehensively view the m5C distribution landscape throughout the global transcriptome [27].

In this study, we identified the potential prognostic value of m5C regulators and provided a comprehensive understanding of m5C modifications in pan-cancers, which will help to find novel opportunities for cancer early detection, treatment, and prevention.

Materials and methods

Study workflow

We downloaded fragments of kilobase transcripts based on fragments per kilobase of transcript per million (FPKM) gene expression from The Cancer Genome Atlas (TCGA, https://www.cancer.gov/) dataset among 33 different cancer types. m5C regulator patterns were investigated in 5480 samples among 33 different cancer types, including somatic mutations, copy number variations (CNVs), gene expression, and RNA-seq data (Fig. 1B).

Fig. 1
figure 1

m5C regulators and the function in cancers. A. The distribution and the function of m5C regulatory writers, eraser, and reader. B The workflow scheme for this study

Genomic data collection of m5C regulators

Thirteen m5C regulators were identified from published papers. Information of m5C regulators was collected from Gene Cards (www.genecards.org). Ensemble gene IDs and HUGO Gene Nomenclature Committee symbols were assigned to each m5C regulator-associated gene.

Whole genomics data analysis of 33 pan-cancers

The integrated OMICS datasets based on the TCGA database of 33 pan-cancers were applied in this study. We collected the mutation annotation format profile from TCGA, which contains over 10,000 cancer patient’s information. The level 3 data of copy number alterations profiles in TCGA was acquired for secondary analysis. In addition, we downloaded pan-cancers RNAseq data of genomic variations profiles and corresponding clinical information from the Genomic Data Commons Data Portal using the R package “TCGAbiolinks.”

Differently expression genes (DEGs) identification

The DESeq package in R language was utilized to validate the DEGs between 33 pan-cancer samples and adjacent samples. Genes with a mean value > 0 were included in the screening of DEGs. To establish proper DEGs among 33 cancers, we settled the adjusted P value less than 0.01 and |log2 fold change (log2Fc) | no less than two as the statistical threshold value for differentially expressed genes. The results were screened as significant DEGs and methylated sites.

Functional annotations and pathway enrichment analysis

To evaluate the biological functions of each m5C modification-related gene, we transformed the RNA-seq data of all samples into transcripts per million (TPM) values. The methods have been described in a previous study [31]. The insufficient, duplicated, and zero expression genes will be eliminated. Furthermore, the gene set variation analysis (GSVA) was applied to determine transcriptomic activities and explore the biological processes of m5C regulators. To further explore m5C regulators related inhibition and activation factors, we performed the Pearson correlation coefficient (PCC) and defined the absolute value of the PCC greater than 0.5 and p value of less than 0.01 as the screen cut-off. The results could be recognized as significantly correlated m5C regulators.

The internships between m5C regulators

To visualize the intercorrelations among m5C regulators, we adopted the “CORPRRAP” R package (https://github.com/taiyun/corrplot). Besides, the STRING database was also applied for the exploration and analysis of these associations between m5C regulators and 325 related genes [32]. The 325 genes were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.kegg.jp/ or http://www.genome.jp/kegg/). The correlations between m5C regulators and 325 genes were visualized through Cytoscape (https://cytoscape.org/).

Clinical characteristic of m5C regulators

To explore the m5C regulators’ related clinical characteristics, we classified genes into high and low expression groups based on genes’ median expression. Correlations between outcomes of the two groups were then analyzed through a log-rank test via R software (https://cran.r-project.org/web/packages/survival/index.html). The log-rank test was performed to weigh the overall survival rates that differ between the high and low expression groups. The CRAN Package survival (https://cran.r-project.org/web/packages/survival/index.htm) was performed, and we defined the p value of less than 0.05 as significant difference.

The roles of m5C regulators in cell growth

The CRISPR-CAS9 gene scale screening of cell lines from 33 cancer types was collected from previous study [32]. We calculated the proportion of every regulator as an essential gene in the cell lines.

Results

Results m5C regulators identification and its genomic extensive genetic changes

In this study, we identified 13 m5C regulators, as shown in Fig. 1A, eleven writers (NSUN1-7, DNMT1, DNMT2, DNMT3A, and DNMT3B), one eraser (TET2), and one reader (ALYREF). This study validated the frequency of m5C regulator patterns among 33 cancers by integrating somatic mutations and CNVs data. Table 1 illustrated detailed information. The results indicated that the overall average mutation frequency of regulatory factors is low, ranging from 0 to 9% (Fig. 2A). The uterine corpus endometrial carcinoma (UCEC) is characterized as a high tumor mutation burden [33]. The UCEC showed significantly higher mutation frequency. Horizontal analysis indicated that TET2, DNMT3B, DNMT3A, and DNMT1 demonstrated much higher mutation frequency among 33 cancers. Furthermore, we uncovered the CNV mutation frequency of m5C regulators was common. Regulators such as DNMT3B, ALYREF, and NSUN5 displayed extensive CNVs. On the contrary, TET2 and NSUN4 showed significant lack of m5C modification related CNV mutations among pan-cancers (Fig. 2B, and Table 2).

Table 1 The 33 cancer types in TCGA pan-cancer project
Fig. 2
figure 2

Mutation and CNV of m5C regulators across pan-cancer. A The mutant frequency of m5C regulators across 33 cancer types. B CNV analysis of m5C regulators across cancer types. The upper part of each grid shows the deletion frequency, and the bottom part shows the amplification frequency

Table 2 The mutation frequency of m5C regulators across 33 cancer types (Top 5)

To further investigate whether the genomic mutations affect m5C regulators expression, we intensively detected the m5C regulators’ gene expression disturbances in thirty-three pan-cancers and five standard control samples. The result implied that the CNV alterations (amplification and deletion) might profoundly affect the m5C regulator’s expression (Fig. 3A). The m5C regulators with CNV amplification showed significantly increased expression in pan-cancers (such as DNMT3B), and m5C regulators with CNV deletion exhibited remarkably decreased expression, like TET2. In addition, we comparatively analyzed the m5C regulators’ expression levels in cancers and corresponding normal tissues and found out that DNMT3B was significantly overexpressed in thirty-three tumor or cancer tissues compared with adjacent normal tissues (Fig. 3B). These results uncovered that the m5C regulators among various cancers showed significant heterogeneity in gene expression and genetics. Collectively, our results demonstrated that aberrant m5C regulations were crucial for carcinogenesis and progression, which provided a clue for further functional detection.

Fig. 3
figure 3

The association between CNV and the gene expression of m5C regulatory genes. A Alterations to m5C regulatory gene expression in 24 cancer types. The heat map demonstrates fold change, with red representing upregulated genes and blue representing downregulated genes. B Box plots exhibit the expression distribution of DNMT3B across tumor and normal samples in 24 cancer types

m5C regulators related pan-carcinogenic pathways

To comprehensively explore the molecular mechanism m5C regulators involved in cancers, we evaluated the correlations between m5C regulators’ proteins expression and KEGG enrichment analysis-related activities. The results indicated that m5C regulator proteins have a close relationship with tumor-related pathways’ activation and inactivation (Fig. 4A, and Table 3). ALYREF, DNMT1, and TET2 were involved in the cell cycle, DNA replication, and prostate cancer-related pathways. Notably, ALYREF was involved in multiple pathways, including cell cycle, DNA replication, and prostate cancer-related pathways. DNMT1 was involved in drug metabolism, lipid metabolism, and nucleic acid biosynthesis signaling pathways [34]. ALYREF, DNMT1, NSUN5, NSUN1, and TET2 showed active involvement in KEGG enrichment pathways (Fig. 4B). In addition, genes will not function alone [35]. Growing evidence indicated that genes always co-effect with multiple genes and always have multiple functions [35, 36]. We further explored the internal connections between m5C regulators gene expression. Results indicated that the readers, writers, and erasers also have high correlations with each other. The eraser TET2 was significant correlated with the writer NSUN3. Writers such as ALYREF and NSUN5 also showed obvious correlations (R = 0.55, P < 0.01) (Fig. 4C). Additionally, to visualize the interactions between m5C regulators, we utilized the protein–protein interaction (PPI) analysis in m5C regulators related proteins. The results showed that writers, readers, and erase were particularly frequent (Fig. 4D). These results indicated that interactions among m5C regulators play crucial roles in the development and progression of cancers.

Fig. 4
figure 4

m5C regulators are associated with the activation and inhibition of cancer pathways. A Network landscape demonstrating the correlation between m5C regulators and cancer pathways. Red represents a positive correlation, and blue represents a negative correlation. The size of the nodes corresponds to the number of links. B The number of pathways correlated with individual m5C regulators. The upper panel represents positively correlated pathways, and the bottom panel represents negatively correlated pathways. C The correlation among the expression of m5C regulators. D The PPI network of m5C regulators

Table 3 The CNV-Gain and CNV-loss frequency of m5C regulators across 33 cancer types (Top 5)

Clinical significance of m5C regulators in pan cancers

To evaluate the clinical prognosis of m5C regulators, we calculated the overall survival (OS), overall median progression-free interval (PFI), disease-specific survival (DSS), and disease-free interval (DFI) of m5C regulators. The OS analysis implied a significant correlation between m5C regulators and thirty-three pan-cancers. The heat map demonstrated that m5C regulators were significantly correlated with survival of patients, including OS, PFI, DSS, and DFI. In detail, the OS in adrenocortical carcinoma (ACC), kidney chromophobe (KICH), kidney renal clear cell carcinoma (KIRC), brain lower grade glioma (LGG), and liver hepatocellular carcinoma (LIHC) showed significant correlations with m5C regulators (Fig. 5A). The highly expressed DNMT3A, DNMT3B, DNMT1, and ALYREF were significantly related to poor prognosis. Collectively, the DNMT3A, DNMT3B, DNMT1, and ALYREF might function as poor prognosis predictors in cancer progression. Moreover, we evaluated the prediction of PFI at fixed time points in patients with thirty-two solid tumors. PFIs of DNMT3B and DNMT1 showed significantly higher hazard ratio values, indicating that they have the potential to be utilized as unfavorable prognosis prediction factors. The PFI in ACC, KICH, KIRC, LGG, and uveal melanoma (UVM) showed significant correlation ships with most m5C regulators (Fig. 5B). Similarly, The DSS and DFI analyses indicated that ACC and LGG showed remarkably correlations with most m5C regulators (Fig. 5C, D). These results indicated that m5C regulators have crucial prognostic prediction values in a variety of cancer types.

Fig. 5
figure 5

Summary of the relationship between m5C regulators expression and patient’s survival. A Overall survival (OS) of m5C regulators across 33 cancer types. B Progression-free interval (PFI) of m5C regulators across 32 solid cancer types. C Disease-specific survival (DSS) of m5C regulators across 32 solid cancer types. D Disease-free interval (DFI) of m5C regulators across 28 solid cancer types. Red represents a higher m5C regulator expression associated with poor survival, and blue represents an association with better survival

Effect of m5C regulators in LIHC and cholangiocarcinoma (CHOL)

Studies have validated that m5C-related genes alterations have a close relationship with advanced tumor progression and advanced tumor stages [37], based on the above pieces of evidence that most m5C regulators are associated with patients’ OS in LIHC and CHOL (Fig. 5A). Based on the overall expression patterns of m5C regulators, all patients in these cancer groups were categorized into two subgroups. The first subgroup consisted of 112 patients indicating high expression of m5C regulators (Reg-high), and the second subgroup consisted of 295 patients with low m5C regulators expression (Reg-low) (Fig. 6A). Compared with the Reg-high subgroup, the survival probability of patients in the Reg-low subgroup was significantly better (P < 0.001) (Fig. 6B). These results indicated that m5C regulators have a potential function as prognostic indicators in hepatocellular carcinoma and cholangiocarcinoma.

Fig. 6
figure 6

Effect of m5C regulators on patients with hepatocellular carcinoma and cholangiocarcinoma. A Heat map showing clustering for CHOL and LIHC patients based m5C regulator expression. Yellow represents Reg-low subgroup (N = 295), and green represents Reg-high subgroup (N = 112). B Kaplan–Meier survival plot of patients grouped by global m5C regulator expression pattern (P < 0.001)

Discussion

Epigenetic variation is often related to human disease, especially cancers [12, 38,39,40]. Remarkably progression has been made of various epigenetic-targeted therapies that have a broad application of malignancies and have exhibited detection and therapeutic potential for solid tumors in preclinical and clinical trials [41,42,43,44,45]. Aberrant methylation regulators process both in DNA and RNA play a critical role in epigenetic regulators, which are significantly associated with tumorigenesis [17, 46,47,48,49,50,51].

Original reports described that NSUN2 participates in catalyzing biological reactions of m5C formation in RNAs and regulating cell cycle [52], linked to stem cell differentiation and involved in progression [53]. It is also reported that m5C regulators such as NSUN2 and binding partner ALYREF participant in promoting mRNA export coordinately [54], and NSUN6, in complex with a full-length tRNA substrate targeting cytosine accessible to the enzyme for methylation [14, 55, 56]. The NSUN3 is required for the deposition of m5C at the anticodon loop in the mitochondria encoded transfer RNA methionine [57]. The NSUN5 demonstrates suppression characteristics in vivo glioma models [58]. NSUN5 gene mutation leads to an un-methylated condition at the C3782 position of 28S rRNA, which leads to a total depletion of protein synthesis and inducing an adaptive translational program under stress collectively [59], as is illustrated that the m5C regulators may influence a wide variety of biological functions and metabolism.

In the present study, we applied certain methodological particularities to build a model and evaluated a catalog of genomic characteristics of tumors associated with m5C regulators. We obtained a total of 13 m5C regulators. The mutations and CNVs of m5C regulators are linked to several tumor developments. All cancers carry somatic mutations [60]. UCEC exhibited a significantly higher number of mutations across pan-cancers, analogously TET2, DNMT3B, DNMT3A, and DNMT1 have placed a moderate burden in m5C regulator genes. DNMT3B gene mutation was generally higher expression level among various cancers. Here, we sequenced the m5C regulator genomes of pan-cancer and providing the first comprehensive remarkable insights into the forces that have shaped various cancer genomes. CNVs play an important role in tumor genesis and progression [61], including amplification and deletion of oncogenes, which may significantly increase the risk of cancer [62]. In this research, the DNMT3B, ALYREF, and NSUN5 showed extensive CNV amplification. In contrast, CNVs such as TET2 and NSUN4 are generally deletion. These results indicated that CNV and the associated gene signatures are useful for early cancer detection and diagnosis, targeted therapeutics, and prediction of prognosis.

The genomic and transcriptomic parameters of various cancers are associated with m5C regulators gene expression and activity of the KEGG pathways [63]. We also investigate the gene expression perturbations of m5C regulators through 33 cancer types with parallel normal controls. The expression of ALYREF, DNMT1, NSUN2, and TET2 are more positively correlated with the majority of pathways, such as the cell cycle, DNA replication, spliceosome, and nucleotide excision repair pathways. The DNMT1 expression is related to the activation of multiple metabolic pathways, including drug metabolism, lipid metabolism, and nucleic acid biosynthesis signaling pathways. The m5C regulators’ pathways are significantly essential for a wide range of biological processes. m5C regulators were also validated involved in malignant activities [64]. Recent studies have demonstrated that the m5C modification in pyruvate kinase muscle isozyme M2 was involved in bladder cancer proliferation and migration. M5C regulator Aly/REF export factor regulated pyruvate kinase muscle isozyme M2 promote the glucose metabolism of bladder cancer [64]. At the same time, the precise molecular modification mechanisms and cellular processes among pan-cancer need further study and deeper exploration for a better prognosis.

For a deeper exploration of the relationship between m5C regulators and their clinical outcomes, we describe a comprehensive landscape of m5C regulator pathways activities across different cancer types and identify cancer characteristics in relation to clinical outcome. Collectively, we provide robust evidence for the close relationship between cancer-associated clinical relevance and m5C regulators. To determine the effect of methylation-based molecular for earlier detection diagnostics in patients with several types of cancer, we systematically analyzed the m5C regulators’ pathway activities with the functional and clinically complication for estimating tumor development and progression with potential prognostic value.

Conclusion

The m5C regulators were differently expressed and showed significantly different CNVs in pan-cancers, which also involved multiple oncogene pathways. In addition, m5C regulators also exhibited prognosis prediction value in pan-cancers. Therefore, our study provides a better understanding of the biology of m5C regulators in pan-cancers, indicating that m5C RNA methylation regulators have the potential to become novel biomarkers and therapeutic targets for various tumors.