Background

Endometrial cancer (EC), also known as uterine corpus endometrial carcinoma (UCEC), is an epithelial malignant tumour that originates in the endometrium. With rising obesity and an aging population, the incidence and mortality rates of EC are increasing in developed countries [1]. According to the latest statistics of the American Cancer Society [2], over 61,000 new cases of EC were estimated to be diagnosed in 2017. At present, advanced-stage EC still accounts for 20–30% of cases, and disease relapse is associated with a poor prognosis.

Currently, there are no known reliable diagnostic and prognostic biomarkers for EC. Cancer antigen 125 (CA125), the biomarker most frequently used for ovarian cancer, has some diagnostic/prognostic value in EC [3]. However, the CA125 level is elevated in a number of physiological and pathological conditions, such as age [4, 5], pregnancy [6], and menstruation [4, 6], and in gynaecological and non-gynaecological disorders, such as endometriosis [6], benign ovarian cysts [6], pelvic inflammatory disease [6], peritonitis [6], pancreatitis [6], and pneumonia [6]. Human epididymis protein 4 (HE4) also has some diagnostic/prognostic value in EC [7]. Like CA125, the HE4 level is elevated in many physiological and non-gynaecological conditions, such as age [8], menopausal status [8], Body Mass Index [8], smoking status [8], creatine levels [8], pulmonary adenocarcinoma [9], chronic kidney disease [7], renal failure [10], and kidney fibrosis [11].

Because these factors reduce the clinical value of the existing biomarkers for assessing the progression and prognosis of EC, it is crucial to discover new reliable biomarkers and to unravel the molecular mechanisms underlying EC progression.

Results

Identification of DEGs and DEMs

A total of 1961 DEGs and 149 DEMs were identified from GSE17025 and GSE25405, respectively, and 2339 DEGs and 205 DEMs were identified from the mRNA and miRNA data of uterine corpus endometrial carcinoma in TCGA (named TCGA-UCEC and TCGA-UCEC_miRNA, respectively). Using Venny 2.1.0 (http://bioinfogp.cnb.csic.es/tools/venny/index.html) [12], 520 common DEGs and 30 common DEMs were screened out (Fig. 1a, Fig. 1b). In EC tissues compared with NE tissues, there were 212 upregulated and 308 downregulated genes, as well as 15 upregulated and 15 downregulated miRNAs (Table 1, Table 2).

Fig. 1
a: Venn diagram of the differentially expressed genes among these three datasets. b: Venn diagram of the differentially expressed miRNAs between two datasets. TCGA-UCEC: the mRNA data of uterine corpus endometrial carcinoma in the Cancer Genome Atlas, TCGA-UCEC_miRNA: the miRNA data of uterine corpus endometrial carcinoma in the Cancer Genome Atlas, TG-miRNA: the target gene of differentially expressed miRNA

Table 1 Top 10 DEGs in EC tissues compared with NE tissues according to the data from TCGA database
Table 2 Top 10 DEMs in EC tissues compared with NE tissues according to the data from GEO database

Functional and pathway enrichment analysis

The functional and pathway enrichment analyses of DEGs were conducted with DAVID. The upregulated genes were mainly enriched in the biological processes of cell cycle, cell division, and DNA replication, while the downregulated genes were mainly enriched in skeletal system development, vasculature development, and cell adhesion (Table 3). Moreover, three KEGG pathways were enriched among the upregulated genes: cell cycle, oocyte maturation, and oocyte meiosis (Table 3). No KEGG pathways were enriched among the downregulated genes.

Table 3 Top 10 GO terms of biological processes and significant KEGG pathways of upregulated and downregulated DEGs for EC tissues compared with NE tissues

Construction of PPI network and module analysis

A PPI network consisting of 287 nodes and 1840 edges was constructed from the 212 upregulated and 308 downregulated genes (Fig. 2). Next, 82 genes were screened out as hub genes, using a degree of interaction ≥ 10 as the threshold [13]; there were close correlations among the hub genes (Fig. 3, Additional file 1). After analysing the network with the MCODE tool in Cytoscape, an important module was obtained, which included 50 nodes and 1082 edges (Fig. 4). Functional enrichment analysis of biological processes showed that the genes in this module were enriched in cell cycle, cell division, and DNA replication (Table 4). KEGG analysis showed enrichment in three pathways: cell cycle, oocyte meiosis, and oocyte maturation (Table 4).

Fig. 2
Protein-protein interaction network of the differentially expressed genes in endometrial cancer tissues compared with normal endometrium tissues. Green and red nodes represent upregulated and downregulated genes, respectively. The edges/lines stand for the regulatory association between nodes

Fig. 3
Protein-protein interaction network of hub genes of the differentially expressed genes in endometrial cancer tissues compared with normal endometrium tissues. Green and red nodes represent upregulated and downregulated genes, respectively. The edges/lines stand for the regulatory association between nodes

Fig. 4
The important module identified with MCODE, visualized in Cytoscape. The edges/lines stand for the interaction relationships between nodes

Table 4 Top 10 GO terms of biological processes and significant KEGG pathways of the DEGs in module

Analysis of miRNA-mRNA regulatory network

Thirty commonly identified DEMs were screened out from GSE25405 and TCGA-UCEC_miRNA, including 15 upregulated and 15 downregulated miRNAs (Table 2). A total of 6865 TG-miRNAs were retrieved from the miRecords database, of which 199 were among the 520 common DEGs (Fig. 1a). These 199 DEGs and the 30 commonly identified DEMs were used to construct a miRNA-mRNA network. In patients with EC, 160 DEM-DEG pairs with inversely associated expression were confirmed using the starBase v2.0 project, involving 22 DEMs and 71 overlapping DEGs (Fig. 5, Additional file 2). In the network, hsa-miR-200b, hsa-miR-200c, hsa-miR-429, hsa-miR-424, hsa-miR-195, hsa-miR-653, and hsa-miR-141 showed a higher degree of interaction (≥ 5, Table 5).

Fig. 5
The miRNA-mRNA regulatory network. Green and red nodes stand for upregulation and downregulation, respectively. The ellipses represent genes and the triangles represent miRNAs

Table 5 Top 7 miRNAs with the highest degree of interaction in the miRNA-mRNA interactions network (Degree of interaction ≥5)

Survival analysis

The prognostic value of the 82 hub genes was assessed with OncoLnc. We found that high mRNA expression of BUB1, TOP2A, CDCA8, TTK, ASPM, UBE2C, BIRC5, HJURP, CENPA, MCM10, FOXM1, SPAG5, EXO1, ESPL1, OIP5, MCM4, CDC25C, DEPDC1, KIF18B, ERCC6L, CKAP2L, ATAD2, TK1, CCNF, E2F1, and CCNE1, as well as low mRNA expression of MYC, was associated with significantly worse overall survival in EC patients (data not shown). Interestingly, CCNE1 was also identified as a target gene of hsa-miR-195 and hsa-miR-424, both of which were identified in our DEM analysis (Fig. 6, Fig. 7).

Fig. 6
Overall survival of endometrial cancer patients according to CCNE1 expression (log-rank p-value = 0.000157). The patients with EC were divided into two groups (high vs. low) based on the median expression level of CCNE1

Fig. 7
Correlation between the expression of CCNE1 and hsa-miR-195-5p (hsa-miR-195) in 538 patients with endometrial cancer. The correlation coefficient of −0.355 (p-value = 1.93e-17) indicated that CCNE1 and hsa-miR-195 expression levels were negatively correlated; data source: starBase v3.0 project

Discussion

Although clinicians have made significant progress in treating EC with surgery and chemotherapy in recent years, the incidence and mortality rates of EC are still rising [14]. A better understanding of the etiology and underlying mechanisms of EC progression is needed to improve the prognosis of EC.

In this study, by integrating the GSE17025 and TCGA-UCEC datasets, 520 common DEGs were screened out in EC tissues compared with NE tissues. These 520 common DEGs comprised 212 upregulated genes and 308 downregulated genes. The upregulated DEGs were mainly enriched in pathways such as cell cycle, cell division, and DNA replication, whereas the downregulated DEGs were enriched in skeletal system development, vasculature development, and cell adhesion. Furthermore, a PPI network was built for the 82 hub genes. Survival analysis of these 82 hub genes showed that 26 upregulated genes and one downregulated gene were associated with poor prognosis in patients with EC. Similarly, 30 common DEMs were identified from the GSE25405 and TCGA-UCEC_miRNA datasets. After integrating the 6865 TG-miRNAs with the 520 common DEGs, 71 overlapping DEGs were found to be closely correlated with 22 common DEMs in EC (Fig. 5, Additional file 2). Moreover, high mRNA expression of CCNE1 (one of the 82 hub genes, which was correlated with hsa-miR-195 and hsa-miR-424) was significantly correlated with worse overall survival in EC patients.

miRNAs are endogenous small non-coding RNAs, which can inhibit gene expression by mRNA degradation/destabilization or through impairing translation [15, 16]. The abnormal expression of miRNAs occurs in a variety of tumours and is often associated with altered tumour characteristics, such as changes in tumour cell survival, proliferation, and invasion [17].

In this study, 30 common DEMs were identified between EC and NE tissues, including hsa-miR-200b, hsa-miR-200c, hsa-miR-429, hsa-miR-141, hsa-miR-424, hsa-miR-195, and hsa-miR-653. The microRNA-200 (miR-200) family consists of miR-200a, miR-200b, miR-200c, miR-429, and miR-141, which all have the same seed sequence and homologous targets. The expression of hsa-miR-200b is upregulated in many malignant tumours [18,19,20], and its role in inhibiting mesenchymal characteristics and metastasis has been revealed in prostate cancer, gastric carcinoma, and hepatocellular carcinoma, where it acts by regulating ZEB1 expression, by directly targeting ZEB2, or via the Rho/ROCK signalling pathway [21,22,23]. Our results suggested that hsa-miR-200b was also upregulated in EC, which was consistent with a previous study [24]. Hsa-miR-200c has been widely investigated during the last few years. Numerous studies have demonstrated an association between aberrant expression of miR-200c and the prognosis of various human malignancies, such as breast cancer [18, 25, 26], prostate cancer [27], ovarian cancer [28], and endometrial cancer [29]. Some of these studies verified an anti-oncogenic role of miR-200c in certain cancer types, indicating a potential correlation between elevated miR-200c expression and superior prognosis [26, 28, 29]. In contrast, other studies have suggested that miR-200c serves as an oncogene [18, 25, 27]. Nevertheless, these findings suggest that miR-200c is a potential biomarker for cancer prognosis. Our results also suggested that hsa-miR-200c was upregulated, which was consistent with a previous study [29]. Recent reports have shown that hsa-miR-429 expression is frequently upregulated in several cancers and may function as an oncogene [30, 31], including in endometrial carcinoma [30], as observed in this study. One study showed that upregulation of hsa-miR-429 is associated with decreased overall survival in serous ovarian cancer [32]; in contrast, other studies have shown that hsa-miR-429 was downregulated in some malignant tumours and had a tumour-suppressor function [33, 34]. These results indicate that hsa-miR-429 plays different (even opposite) roles in tumorigenesis and cancer progression in different tumours. Hsa-miR-141 is also an important member of the miR-200 family, and several previous studies have shown that hsa-miR-141 is involved in cancer prognosis [35,36,37].

Some previous studies reported that hsa-miR-424 was downregulated and could have a tumour-suppressor role in some cancers [38,39,40]. In line with these observations, our present study also showed that hsa-miR-424 was downregulated [40]. Hsa-miR-195 is a member of the miR-15a, -15b, -16, -195, -424, and -497 family, which is involved in the occurrence and development of many malignant tumours and in the regulation of malignant biological behaviours [40,41,42,43]. In our study, hsa-miR-195 showed lower expression in EC tissues than in NE tissues, which was consistent with a previous study [42]. So far, there are only a few reports on the role of hsa-miR-653 in the malignant biological behaviour of tumours.

Based on our findings, we speculate that hsa-miR-200b, hsa-miR-200c, hsa-miR-429, hsa-miR-141, hsa-miR-424, hsa-miR-195, and hsa-miR-653 may play important roles in the biological behaviour of EC through multiple pathways.

CCNE1, that is Cyclin E1, belongs to the cyclin family which, through association with cyclin-dependent kinase 2, controls cell cycle progression from G1 to S phase [44]. Previous studies have shown that the upregulation of CCNE1 could contribute to cancer development or tumorigenesis in many cancers [45,46,47,48,49,50], and CCNE1 could serve as a reliable independent prognostic marker [49, 50]. miRNAs from multiple families have been identified to target CCNE1 in a number of malignant tumours, such as hepatocellular carcinoma [51], osteosarcoma [52], cervical cancer [53], and bladder cancer [54]. In the present study, survival analysis of the hub genes related to DEMs showed that high expression of CCNE1 could indicate poor prognosis in EC patients.

This study has some limitations. For example, only about 1/4 to 1/5 of the miRNAs overlapped between GSE25405 and TCGA-miRNA, and some of the findings need further experimental validation in future studies.

Several factors may explain the low proportion of overlapping miRNAs. Firstly, the ethnic origins of the microarray and RNA-seq samples differed: the GSE25405 samples came from Asian patients, whereas the TCGA-miRNA samples came mainly from European Americans and African Americans. Secondly, the sample sizes differed: GSE25405 included 48 samples (41 endometrial cancer tissue samples and 7 normal endometrial tissue samples), whereas the TCGA dataset, after screening and processing of the relevant data, included 572 samples (539 tissue samples from endometrial cancer patients and 33 normal controls). Last but not least, RNA-seq and microarrays differ in detection sensitivity. For genes with higher abundance, the two platforms may give similar results; for genes with lower abundance, however, RNA-seq can capture the relevant information more effectively.

As for experimental validation, we believe that the outcomes of the present study provide a credible basis for future research. For example, verifying the expression of selected miRNAs (such as miR-195 and miR-424) in endometrial cancer cell lines and tissue samples by PCR, and in animal models, may shed light on the role of these miRNAs in the malignant biological processes of endometrial cancer. Further, verifying the differential expression of miRNAs in a large number of clinical samples and analysing its correlation with clinical parameters (such as tumour clinical stage, pathological stage classification, recurrence, metastasis, and prognosis) will help to determine the diagnostic and prognostic value of these miRNAs in endometrial cancer patients. Similarly, one or more hub genes could be selected to verify their mRNA and protein expression in endometrial cancer cell lines and tissue samples. The effects of knocking out, overexpressing, or mutating these genes on the biological processes of endometrial cancer cell lines (such as tumour cell proliferation, transformation, migration and invasion, blood vessel formation, and energy metabolism) could then be studied, together with their involvement in molecular mechanisms and signal transduction. In addition, a subcutaneous transplanted tumour model could be established, in which the target gene is introduced into the animal to observe its effect on tumour growth in vivo and to further analyse its molecular mechanism or signalling, providing potential targets for tumour gene therapy. Lastly, the differential expression of these genes could be verified in a large number of clinical samples and correlated with clinical parameters to determine their diagnostic and prognostic value in endometrial cancer patients.

Next, our clinical research team will select some of these miRNAs and, through clinical experiments, verify their relationship with their target genes and their value in the diagnosis and prognosis of endometrial cancer patients.

Conclusions

Based on bioinformatics analyses of EC-related microarray data in the GEO database and clinical data related to EC in TCGA database, we identified 27 hub genes (BUB1, TOP2A, CDCA8, TTK, ASPM, UBE2C, BIRC5, HJURP, CENPA, MCM10, FOXM1, SPAG5, EXO1, ESPL1, OIP5, MCM4, CDC25C, DEPDC1, KIF18B, ERCC6L, CKAP2L, ATAD2, TK1, CCNF, E2F1, CCNE1, and MYC) that were associated with poor prognosis in EC patients. Further, seven miRNAs (hsa-miR-200b, hsa-miR-200c, hsa-miR-429, hsa-miR-141, hsa-miR-424, hsa-miR-195, and hsa-miR-653) were observed to participate in biological behaviour of EC. Further research is warranted to confirm the clinical implications of our findings.

Methods

Microarray expression data

The mRNA and miRNA expression data of the GSE17025 and GSE25405 datasets were respectively downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The mRNA dataset GSE17025 contained the data from 103 samples, including 91 EC tissue samples and 12 normal endometrium (NE) samples. mRNA expression profiles in this dataset were measured using the GPL570 [HG.U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array platform [55]. The miRNA dataset GSE25405 contained 41 EC tissue samples and 7 NE tissue samples. In this dataset, the miRNA expression profile was detected using the GPL7731 Agilent-019118 Human miRNA Microarray 2.0 G4470B platform.

The RNA-seq data

The mRNA-seq and miRNA-seq data of patients with UCEC were downloaded from TCGA (www.cancergenome.nih.gov) using the SangerBox tool (https://shengxin.ren/softs/Sanger_V1.0.8.zip; accessed June 20, 2019). The mRNA-seq dataset contained 544 EC tissue samples and 35 NE tissue samples, and the miRNA-seq dataset contained 539 EC tissue samples and 33 NE tissue samples.

Identification of DEGs and DEMs

The limma package (version 3.36.5) in R/Bioconductor was used to identify differentially expressed genes (DEGs) and differentially expressed miRNAs (DEMs) between EC and NE tissue samples [56]. The adjusted P-value (adj.P-value) was obtained by correcting the P-value with the Benjamini-Hochberg method, and adj.P-value < 0.05 together with |log2 fold change (FC)| > 1 was set as the threshold [57]. The original probe-level data in the Series Matrix Files were mapped to gene symbols based on the platform annotation files. When multiple probes corresponded to the same gene, the probe with the minimum adj.P-value was selected.
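The thresholding step described above can be sketched in a few lines of Python. This is only an illustration and not the limma workflow itself: it assumes that per-probe log2 fold changes and raw P-values have already been computed, and the function names, the input tuple layout, and the order of probe selection and filtering are hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    stats: List[Tuple[str, str, float, float]],
    adj_p_cutoff: float = 0.05,
    lfc_cutoff: float = 1.0,
) -> Dict[str, Tuple[float, float]]:
    """Filter (probe, gene, log2FC, raw_p) rows into gene -> (log2FC, adj_p).

    Assumption for this sketch: the probe with the smallest adjusted
    p-value represents its gene, and thresholds are applied afterwards.
    """
    adj = benjamini_hochberg([row[3] for row in stats])
    best: Dict[str, Tuple[float, float]] = {}
    for (_probe, gene, lfc, _raw_p), q in zip(stats, adj):
        if gene not in best or q < best[gene][1]:
            best[gene] = (lfc, q)
    return {
        gene: (lfc, q)
        for gene, (lfc, q) in best.items()
        if q < adj_p_cutoff and abs(lfc) > lfc_cutoff
    }


# Toy input with made-up probe and gene names.
example = [
    ("p1", "GENE_A", 2.0, 0.001),
    ("p2", "GENE_A", 0.5, 0.04),
    ("p3", "GENE_B", -1.5, 0.01),
    ("p4", "GENE_C", 0.3, 0.5),
]
degs = select_degs(example)
```

In the toy input, GENE_A and GENE_B pass both thresholds, while GENE_C fails on both the fold change and the adjusted P-value.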

Functional and pathway enrichment analysis

The Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.ncifcrf.gov) allows users to perform functional analysis of gene lists [58]. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted with DAVID. FDR < 0.05 was considered statistically significant.
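To illustrate the idea behind term enrichment, the following sketch computes a plain one-sided hypergeometric P-value for the overlap between a gene list and an annotation term. It is not DAVID's implementation, whose scoring and multiple-testing correction may differ in detail; the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(
    n_background: int, n_term: int, n_list: int, n_overlap: int
) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background set
    n_term:       background genes annotated with the term
    n_list:       genes in the submitted list (e.g. upregulated DEGs)
    n_overlap:    list genes annotated with the term
    """
    upper = min(n_term, n_list)
    total = comb(n_background, n_list)
    # Sum the probabilities of observing n_overlap or more annotated genes.
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

The resulting P-values for all tested terms would then be corrected for multiple testing before applying the FDR < 0.05 cut-off.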

Construction of PPI network and module analysis

The PPI network of DEGs was constructed using the STRING database (version 11.0, https://string-db.org/) and visualized using Cytoscape (version 3.7.1) [59, 60]. The confidence score threshold was set at ≥ 0.7. Module analyses were conducted using the MCODE plugin in Cytoscape with degree cut-off = 2, node score cut-off = 0.2, max depth = 100, and k-core = 2 [61]. The functional enrichment analyses for the DEGs in the modules were conducted with DAVID.

Prediction of the target gene of miRNA

The target genes of miRNAs (TG-miRNAs) were predicted with miRecords (http://c1.accurascience.com/miRecords/), which integrates 11 different miRNA target prediction databases [62]. A gene was accepted as a TG-miRNA only when at least four of these prediction databases predicted it to be a target.
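The consensus rule above can be expressed as a simple vote count. The sketch below assumes the per-database predictions are available as sets of (miRNA, gene) pairs; the function name, the input layout, and the placeholder database and gene names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (miRNA, target gene)


def consensus_targets(
    predictions: Dict[str, Iterable[Pair]], min_tools: int = 4
) -> Set[Pair]:
    """Keep miRNA-gene pairs predicted by at least `min_tools` databases."""
    support: Dict[Pair, Set[str]] = defaultdict(set)
    for tool, pairs in predictions.items():
        for pair in pairs:
            # A set per pair means each database is counted at most once.
            support[pair].add(tool)
    return {pair for pair, tools in support.items() if len(tools) >= min_tools}


# Toy input: placeholder database names, not the real miRecords components.
toy = {
    "db_1": [("hsa-miR-195", "CCNE1"), ("hsa-miR-195", "GENE_X")],
    "db_2": [("hsa-miR-195", "CCNE1")],
    "db_3": [("hsa-miR-195", "CCNE1"), ("hsa-miR-195", "GENE_X")],
    "db_4": [("hsa-miR-195", "CCNE1")],
}
accepted = consensus_targets(toy)
```

Here only the pair supported by all four placeholder databases is retained; the pair supported by two databases is discarded.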

Construction of the miRNA-mRNA regulatory network

The genes in the intersection of TG-miRNAs and DEGs were considered potentially valuable differentially expressed target genes. Pearson correlation analysis in starBase (http://starbase.sysu.edu.cn/) was then used to verify the association between these target genes and the DEMs in patients with EC [63]. The significantly associated differentially expressed target genes and their corresponding miRNAs were used to construct a miRNA-mRNA regulatory network in Cytoscape. A miRNA node with a degree of interaction ≥ 5 was defined as a hub miRNA.
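The network construction can be sketched as three steps: intersect predicted targets with the DEGs, keep miRNA-gene pairs whose expression is negatively correlated, and count each miRNA's degree. The code below is a simplified illustration under several assumptions: it takes a negative Pearson coefficient as the criterion for a reverse association, it omits the significance test that starBase reports, and all function names and inputs are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (miRNA, target gene)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def build_edges(
    target_pairs: Iterable[Pair],
    degs: Set[str],
    mirna_expr: Dict[str, Sequence[float]],
    gene_expr: Dict[str, Sequence[float]],
) -> List[Pair]:
    """Keep pairs whose gene is a DEG and whose expression is anti-correlated."""
    edges = []
    for mirna, gene in sorted(set(target_pairs)):
        if gene not in degs:
            continue  # predicted target that is not differentially expressed
        if pearson_r(mirna_expr[mirna], gene_expr[gene]) < 0:
            edges.append((mirna, gene))
    return edges


def hub_mirnas(edges: Iterable[Pair], min_degree: int = 5) -> Set[str]:
    """miRNAs connected to at least `min_degree` target genes."""
    degree = Counter(mirna for mirna, _gene in edges)
    return {mirna for mirna, d in degree.items() if d >= min_degree}


# Toy data with made-up identifiers and expression values.
mirna_expr = {"miR-x": [1, 2, 3, 4]}
gene_expr = {"G1": [4, 3, 2, 1], "G2": [1, 2, 3, 5], "G3": [8, 6, 5, 1]}
pairs = [("miR-x", "G1"), ("miR-x", "G2"), ("miR-x", "G3"), ("miR-x", "G4")]
edges = build_edges(pairs, {"G1", "G2", "G3"}, mirna_expr, gene_expr)
```

In the toy data, G2 is dropped because it is positively correlated with the miRNA, and G4 is dropped because it is not in the DEG set.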

Survival analysis of hub genes

The overall survival of patients with EC with regard to hub genes was calculated using Kaplan-Meier analysis in OncoLnc (http://www.oncolnc.org/) [64]. The patients were divided into two groups (high vs. low) according to the median values of mRNA expression of the hub gene. The log-rank test was used to examine the significance of the difference between two groups.
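The median split and the two-group log-rank test can be illustrated with the standard-library sketch below. It is not OncoLnc's code: the handling of patients exactly at the median, the function names, and the input layout are assumptions, and the P-value uses the chi-square distribution with one degree of freedom.

```python
from math import erfc, sqrt
from statistics import median
from typing import Dict, List, Sequence, Tuple


def median_split(expression: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split patient IDs into (high, low) groups at the median expression.

    Assumption: patients exactly at the median go to the low group.
    """
    cut = median(expression.values())
    high = [p for p, v in expression.items() if v > cut]
    low = [p for p, v in expression.items() if v <= cut]
    return high, low


def logrank_test(
    times_a: Sequence[float],
    events_a: Sequence[int],
    times_b: Sequence[float],
    events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test; events are 1 (death) or 0 (censored).

    Returns (chi-square statistic, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Deaths observed exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

A typical use would call `median_split` on the expression of one hub gene, collect the survival times and event indicators of each group, and pass them to `logrank_test`.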