Background

Endometrial cancer is the second most common gynecologic malignancy worldwide [1]. In China, it is also the second most common cancer of the female genital tract [2]. Uterine serous carcinoma (USC, also called uterine serous papillary carcinoma) was first described by Hendrickson in 1982 [3]. It is a type of endometrial cancer whose clinicopathological and molecular features differ from those of endometrioid carcinoma (EEC). Unlike EEC, USC tends to develop in elderly women with low body weight and arises in a background of atrophic endometrium [4]. Microscopically, USC typically forms complex papillary structures with almost uniformly high-grade polymorphic nuclei, in contrast to the glandular/cribriform pattern with mildly to moderately atypical nuclei seen in EEC [5, 6]. Of USC tumors, 80–90% harbor TP53 mutations while retaining wild-type PTEN but losing ER/PR expression [5,6,7,8]. USC accounts for almost 10% of endometrial cancers but is disproportionately responsible for poor outcomes, contributing up to 40% of endometrial cancer-related deaths [4]. The estimated 5-year disease-specific survival for USC is 18–27%, compared with 80–90% for EEC. When compared stage by stage, USC has worse 5-year disease-specific survival than grade 3 EEC in both early stage (stage I/II, 74% vs. 85%, p < 0.0001) and late stage (stage III/IV, 33% vs. 54%, p < 0.0001) [9]. USC is characteristically aggressive, readily invading the lymph-vascular space and undergoing abdominal dissemination in the early stages, even in the absence of myometrial invasion [10,11,12,13,14,15]. A high proportion of USC cases present with extrauterine symptoms and an adnexal, peritoneal or upper abdominal mass at diagnosis [16,17,18]. Therefore, the clinicopathological parameters that predict the prognosis of EEC, such as tumor size, myometrial invasion, lymph-vascular space invasion and lymph node metastasis, are not reliable indicators of USC prognosis [4, 19, 20]. To the best of our knowledge, a robust system for predicting USC outcomes and recurrence is currently unavailable.

Advancements in molecular biological techniques and RNA-sequencing technology have made it easier to identify genes that are associated with cancer initiation and progression [21]. Single- and multiple-gene signatures that predict cancer outcomes better than conventional clinicopathological features have been developed [22,23,24,25,26,27,28,29]. While similar signatures have been developed for EEC [30,31,32], to the best of our knowledge, few are available for USC.

Here, we carried out a genome-wide search for dysregulated genes in datasets from TCGA (The Cancer Genome Atlas) and GTEx (Genotype-Tissue Expression) and uncovered a 4-gene prognostic signature for USC. As an independent indicator of USC prognosis, this signature performs better than conventional prognostic factors.

Methods

Processing of TCGA-USC and GTEx datasets

The level 3 USC RNA-Seq dataset (HTSeq read quantification, FPKM), along with the associated clinical information, was downloaded from the TCGA database. Normal uterus GTEx data were downloaded from the UCSC Xena project (http://xena.ucsc.edu/) in October 2019. In the TCGA dataset, cases with follow-up or overall survival (OS) of less than 30 days were excluded from the study.

Identification of dysregulated genes and functional enrichment analysis

Limma, an “R” Bioconductor package, was used to identify genes that are dysregulated in USC tissues relative to normal uterine tissue by applying a threshold of |log2FC| > 2 and FDR < 0.01. GO (Gene Ontology) term analysis and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis were conducted using the ClusterProfiler package in “R”. A P value < 0.05 was considered indicative of significantly enriched functional annotations.
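The thresholding step described above can be illustrated with a minimal Python sketch. It is not the Limma workflow itself: it assumes that per-gene log2 fold changes and raw P values are already available, and that the FDR refers to Benjamini-Hochberg adjustment (an assumption, since the adjustment method is not named in the text). The gene names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_dysregulated(
    stats: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 2.0,
    fdr_cutoff: float = 0.01,
) -> List[str]:
    """Keep genes with |log2FC| > lfc_cutoff and adjusted p < fdr_cutoff.

    stats maps gene name -> (log2 fold change, raw p-value).
    """
    genes = list(stats)
    fdr = benjamini_hochberg([stats[g][1] for g in genes])
    return [
        g for g, q in zip(genes, fdr)
        if abs(stats[g][0]) > lfc_cutoff and q < fdr_cutoff
    ]


# Hypothetical toy input, for illustration only.
toy_stats = {
    "GENE_A": (3.1, 0.0001),
    "GENE_B": (-2.6, 0.002),
    "GENE_C": (0.4, 0.0005),
    "GENE_D": (2.4, 0.2),
}
selected = select_dysregulated(toy_stats)
```

In this toy input, GENE_C fails the fold-change cutoff and GENE_D fails the FDR cutoff, so only GENE_A and GENE_B are retained.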

Construction and evaluation of the prognostic model

Half of the USC cases were randomly assigned to the training set. Cases with complete records on clinicopathological features, including OS, age, invasion, node, and stage, were assigned to the testing set. In the training set, dysregulated genes with prognostic potential were identified by univariable Cox regression analysis using the Survival package in “R”. A P value < 0.05 was considered significant. To identify the most important prognostic genes, least absolute shrinkage and selection operator (LASSO) regression was performed in “R” using the Glmnet package. The prognostic signature for predicting OS was developed through multivariable Cox regression analysis using the “R” Survival package. The prognostic signature was then used to calculate each patient’s risk score, and the cases were divided into high-risk and low-risk groups based on the median score. Kaplan-Meier survival analysis was done using the “R” Survival package to plot the survival curves for the 2 risk groups. Receiver operating characteristic (ROC) curve analysis was done using the “R” Survival ROC package to test the 4-gene signature’s accuracy in predicting OS for the high- and low-risk USC cases. To validate the effectiveness of the signature, the OS risk score for each patient in the testing set was calculated using the signature, followed by Kaplan-Meier curve analysis and ROC estimation as was done in the training set. To evaluate the superiority of the 4-gene signature as a prognostic indicator, ROC curve analysis was also done on other clinicopathological features, including age at diagnosis, myometrial invasion, node metastasis and stage. The process outlined above was used to test the signature’s effectiveness at predicting recurrence-free survival (RFS).
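For readers unfamiliar with the Kaplan-Meier step, the following is a minimal Python sketch of the product-limit estimator that underlies the survival curves. It is an illustration only, not the “R” Survival package routine used in the study; the function name and the toy follow-up times are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate as a list of (event time, survival probability).

    events[i] is 1 if the event (e.g. death) was observed at times[i],
    and 0 if the case was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all cases that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored cases leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators.
toy_curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

Running the estimator separately on the high-risk and low-risk groups gives the two curves that a log-rank test would then compare.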

Results

TCGA-USC patient characteristics

A dataset of 110 USC samples and 35 adjacent normal uterus tissue samples was downloaded from TCGA. The training and testing sets consisted of data from 56 and 74 USC cases, respectively. The clinicopathological features of the 2 groups and the whole dataset did not differ significantly (P value > 0.05). These features are summarized in Table 1.

Table 1 Clinicopathological characteristics of USC patients in this study

Identification of dysregulated genes in USC and functional enrichment analysis

To ensure that our analysis compared equivalent numbers of USC and non-USC cases, we downloaded a dataset of normal uterus tissue samples from GTEx (n = 78), which, along with the 35 in the TCGA dataset, brought the total number of normal uterine cases to 113. Using the Limma package in “R” with a cutoff threshold of |log2FC| > 2 and FDR < 0.01, 1385 genes were identified as being dysregulated in USC tissue vs. the normal controls (Fig. 1a). Functional enrichment analysis revealed that the dysregulated genes are significantly associated with 717 GO terms and 21 KEGG pathways. The most significantly enriched GO terms were extracellular matrix, mitosis, and cell adhesion, processes that might promote cancer progression (Fig. 1b). The most significantly enriched pathways are involved in cell adhesion, cell cycle, PI3K-Akt signaling, microRNAs in cancer, transcriptional misregulation, and melanoma and bladder cancer (Fig. 1c).

Fig. 1

Dysregulated genes in the USC and functional enrichment analysis. a Volcano plot shows 1385 up- and down-regulated genes between 110 USCs and 113 normal uterus tissue with the threshold of |log2FC| > 2 and FDR < 0.01. b Top 30 Gene ontology (GO) biological processes of dysregulated genes. c Top 20 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of dysregulated genes

Prognostic signature construction and evaluation in the training set

To identify dysregulated genes that may be associated with OS, we performed univariable Cox regression analysis and uncovered 29 genes that significantly correlated with OS (Table S1). To narrow these down to the most important prognostic genes, we used LASSO regression analysis, which identified 5 dysregulated genes as potential critical indicators of USC survival (Fig. 2a). Next, multivariable Cox regression analysis reduced these to a signature of 4 genes, KRT23, CXCL1, SOX9 and ABCA10 (Fig. 2b), that effectively predicts OS (Table 2). Among these, KRT23, CXCL1, and SOX9 exhibited positive regression coefficients, indicating that higher expression is associated with a higher risk of mortality, whereas ABCA10 showed a negative regression coefficient, implying that higher expression is associated with a lower mortality risk. Next, we constructed the following risk prediction formula based on the 4 prognostic genes and used it to calculate each patient’s risk score in the training set: risk score = (0.5424 × expression level of KRT23) + (0.2398 × expression level of CXCL1) + (0.5398 × expression level of SOX9) - (1.7023 × expression level of ABCA10). This formula was used to calculate an individual risk score for each of the 56 USC cases in the training set. The cases were then ranked by risk score and assigned to the high-risk or low-risk group based on whether their score was higher or lower than the median risk score (Fig. 2c). The relationship between risk scores and survival time is shown in Fig. 2d. Visualization of the expression of the 4 genes in a heatmap revealed that the expression level of the 3 high-risk genes increased with rising risk scores, while the low-risk gene showed the opposite trend (Fig. 2e). Kaplan-Meier analysis revealed that patients in the high-risk group experienced worse outcomes relative to the low-risk group (P value = 0.003317, Fig. 2f). In AUC (area under the ROC curve) analysis, the 4-gene prognostic signature scored 0.855, indicating superior performance over the conventional prognostic factors of age, myometrial invasion, node metastasis and disease stage (0.213, 0.796, 0.728 and 0.564, respectively; Fig. 2g).
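The risk score formula and the median split can be written out as a short Python sketch. The coefficients are those reported in the text; the function names, the toy expression values, and the choice to place scores exactly equal to the median in the low-risk group are assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List

# Regression coefficients of the 4-gene signature as reported in the text.
COEFFICIENTS = {
    "KRT23": 0.5424,
    "CXCL1": 0.2398,
    "SOX9": 0.5398,
    "ABCA10": -1.7023,
}


def risk_score(expression: Dict[str, float]) -> float:
    """Weighted sum of the 4 gene expression levels."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def assign_groups(scores: List[float]) -> List[str]:
    """Label each score 'high' if above the cohort median, else 'low'.

    Assumption: scores equal to the median are labeled 'low'.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical expression values for two patients, for illustration only.
patients = [
    {"KRT23": 2.0, "CXCL1": 1.5, "SOX9": 3.0, "ABCA10": 0.2},
    {"KRT23": 0.5, "CXCL1": 0.4, "SOX9": 1.0, "ABCA10": 1.1},
]
scores = [risk_score(p) for p in patients]
groups = assign_groups(scores)
```

Because ABCA10 carries a negative coefficient, higher ABCA10 expression lowers the score, which is why it behaves as the low-risk gene in the heatmap.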

Fig. 2

Identification of the 4-gene signature and prediction of overall survival (OS) in the training set. a LASSO regression defines 5 critical survival prognostic genes. b Expression differences of the 4 genes, identified by multivariable Cox regression analysis, between 110 USC patients and 113 normal endometrium tissues are analyzed by Wilcoxon test. P < 0.05 is considered statistically significant. c-e The distribution of risk score, OS, and survival status and the expression patterns of the 4 genes for the 56 patients in the training set. f Kaplan–Meier analysis to compare OS between patients in the high- and low-risk groups in the training set. g ROC analysis of the 4-gene signature and other clinicopathological parameters (age, invasion, node metastasis and stage) for predicting OS in the training set

Table 2 Four signature genes constructed in this model

Validation of the 4-gene signature in the testing set

To assess the robustness of the 4-gene prognostic signature, risk scores for the 74 USC cases in the testing set were calculated and ranked as described for the training set (Fig. 3a). The relationship between risk scores and survival is shown in Fig. 3b. This analysis revealed that the expression of the 3 high-risk genes increased with rising risk scores, while the low-risk gene exhibited the opposite trend (Fig. 3c). Kaplan-Meier analysis indicated that the high-risk group experienced worse outcomes relative to the low-risk group (P value = 0.0004387, Fig. 3d). AUC analysis gave the 4-gene signature a score of 0.811, which was higher than the scores for conventional prognostic indicators (0.430, 0.752, 0.808 and 0.688 for age, myometrial invasion, node metastasis and stage, respectively; Fig. 3e), consistent with observations made in the training set.
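As a rough guide to how the AUC values above can be read, the following Python sketch computes a plain rank-based AUC for a binary outcome. This is a simplification and not the time-dependent ROC method of the “R” Survival ROC package used in the study: it ignores follow-up time and censoring. The function name and toy data are hypothetical.

```python
from typing import List


def auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random positive case scores above a random
    negative case, with ties counted as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and outcomes (1 = died, 0 = alive).
toy_auc = auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
```

An AUC of 0.5 corresponds to random ranking, and a value below 0.5 (as reported here for age) means the marker ranks cases in the opposite direction to the outcome.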

Fig. 3

Validation of the 4-gene signature in predicting OS in the testing set and in patients at different stages. a-c The distribution of risk score, OS, and survival status and the expression patterns of the 4 genes for the 74 patients in the testing set. d Kaplan–Meier analysis compares OS between patients in the high- and low-risk groups in the testing set. e ROC analysis of the 4-gene signature and other clinicopathological parameters (age, invasion, node metastasis and stage) for predicting OS in the testing set. f Kaplan–Meier analysis compares OS between patients in the high- and low-risk groups among early stage (I + II) patients. g Kaplan–Meier analysis compares OS between patients in the high- and low-risk groups among late stage (III + IV) patients

Independent prognostic value of the 4-gene signature

To evaluate whether the 4-gene signature is prognostic independently of conventional prognostic indicators, we performed univariate and multivariate Cox regression analysis on testing set cases with complete clinical features. This analysis revealed that our prognostic signature and tumor stage are both independent predictors of OS (Table 3). Next, we tested whether the 4-gene signature could predict OS at different disease stages. To this end, we stratified the cases by stage into early (stage I + II) and late (stage III + IV) groups. Patients in the high-risk group exhibited lower OS than those in the low-risk group in both early and late stage (P value = 0.003306 and P value = 0.02755, respectively, Fig. 3f-g). These results suggest that the 4-gene signature performs particularly well in early stage disease, highlighting its potential clinical application.

Table 3 Univariate and multivariate Cox regression analysis of OS in USC patients in the testing set (n = 74)

Evaluation of the 4-gene signature in predicting RFS

To evaluate whether the 4-gene signature could predict recurrence-free survival (RFS) in USC, TCGA-USC cases with RFS data were analyzed. Cases with RFS of < 30 days were excluded, leaving 95 cases for further analysis. Each patient’s risk score was calculated and ranked as described for the OS analysis (Fig. 4a). The risk scores and recurrence times are shown in Fig. 4b. This analysis revealed that expression of the 4 genes increased with rising risk scores (Fig. 4c). Kaplan-Meier analysis revealed that the high-risk group had a higher recurrence rate relative to the low-risk group (P value = 0.01198, Fig. 4d). AUC analysis gave the prognostic signature a score of 0.737 for RFS prediction, which was higher than the scores for conventional indicators (0.151, 0.595, 0.551 and 0.632 for age, myometrial invasion, node metastasis and stage, respectively, Fig. 4e). Univariate and multivariate Cox regression analysis identified the prognostic signature and stage as independent prognostic factors for RFS, consistent with the OS analysis (Table 4). Analysis of the effectiveness of the 4-gene signature in predicting RFS at different disease stages revealed that patients in the low-risk and high-risk groups had significantly different RFS in late stage (P value = 0.003489, Fig. 4f). However, there was no significant difference between the two risk groups in early stage (Fig. S1).

Fig. 4

Validation of the 4-gene signature in predicting recurrence-free survival (RFS) in the TCGA dataset. a-c The distribution of risk score, RFS, and survival status and the expression patterns of the 4 genes for the 95 patients with RFS data. d Kaplan–Meier analysis compares RFS between patients in the high- and low-risk groups. e ROC analysis of the 4-gene signature and other clinicopathological parameters (age, invasion, node metastasis and stage) for predicting RFS. f Kaplan–Meier analysis compares RFS between patients in the high- and low-risk groups among late stage (III + IV) patients

Table 4 Univariate and multivariate Cox regression analysis of RFS in USC patients in the TCGA cohort (n = 95)

Discussion

Here, we analyzed USC datasets from TCGA and GTEx and uncovered 1385 genes that are dysregulated in USC tissues relative to normal endometrial tissue. KEGG pathway analysis revealed that these genes mainly belong to cancer-associated pathways, including melanoma and bladder cancer, as well as to pathways associated with cell adhesion, cell cycle, PI3K-Akt signaling, cancer-linked microRNAs and transcriptional misregulation. Disruption of cell adhesion may explain why USC tends to disseminate early, spreading to the fallopian tubes or invading the lymph-vascular space. The tumor suppressor TP53 is the most frequently mutated gene in USC [33]. USC’s high proliferative rate and dysregulated cell cycle control may contribute to its high relapse and mortality rates among endometrial cancers. The PI3K/AKT/mTOR signaling pathway is the most frequently dysregulated pathway in EEC [33]. In USC, PIK3CA mutation occurs in about 30% of cases [11, 33, 34], which is consistent with the involvement of the PI3K-Akt signaling pathway seen in our analysis. Inhibition of PI3K/AKT/mTOR signaling strongly suppresses EEC progression [35,36,37], and clinical trials targeting PI3K/AKT/mTOR signaling in solid tumors have shown promise [38]. However, the benefit of this approach in endometrial cancers is controversial due to the complexity of pharmacological action and toxicity [39]. Further studies are needed to better target PI3K/AKT/mTOR signaling in endometrial cancer.

The TCGA database, which offers a collection of complete transcriptomic data and associated clinical information, is publicly available for data mining [40]. To identify important dysregulated genes associated with USC outcomes, we used LASSO and Cox regression analysis. LASSO is widely applied in modeling high-dimensional data because it reduces the risk of overfitting and improves prediction accuracy [41]. Our analysis generated a 4-gene signature for predicting USC OS by calculating each patient’s risk score. We found that patients with high scores exhibit poor outcomes relative to those with low scores, an observation that held in both the training and testing sets. ROC curve analysis revealed this signature’s superiority over conventional prognostic parameters (age, myometrial invasion, node metastasis, and stage) in the training and testing sets. Our data show that both the 4-gene signature and disease stage are independent prognostic indicators of OS. For most malignant solid tumors, patients with late-stage disease have an unfavorable prognosis. However, for USC, early-stage disease does not necessarily correlate with good prognosis due to the tumor’s propensity for shedding, spreading and invading the lymph-vascular space even when the lesion is confined to the endometrium or polyps. Management of patients with early stage USC is controversial [4, 42, 43]. Our signature identified high-risk patients in the early stage USC group who had much poorer OS relative to low-risk patients in the same group. Our data show that this signature performed better in the early stage group than in the late stage group, highlighting its potential value in guiding the management of early stage USC.

The average recurrence rate for stage IA USC after chemotherapy, radiotherapy or surgery is 8.7, 25 and 12.4%, respectively. For stage IB/IC, the corresponding recurrence rates are 10.8, 36.6 and 37.3%, respectively [11]. Our 4-gene signature predicts a higher recurrence risk in the high-risk group relative to the low-risk group. Consistent with our OS analysis, ROC curve analysis showed that this 4-gene signature was more effective than conventional indicators at predicting RFS. Both the signature and disease stage were independent prognostic factors for RFS. Our data show that the 4-gene signature is effective at RFS prediction in late stage disease but showed no difference between the high- and low-risk groups in early stage. This may be due to the small number of recurrent cases (8 out of 45) in early stage in the TCGA cohort.

The 4 genes in the signature have been associated with various cancers. KRT23 has been implicated as an oncogene in liver cancer [44] and colorectal cancer [45]. CXCL1 is overexpressed in EEC tissue relative to normal endometrium and promotes tumorigenesis by enhancing neutrophil chemotaxis [46]. Snail induces ovarian epithelial-mesenchymal transition via CXCL1 and CXCL2, representing an immunological therapeutic target [47]. SOX9 overexpression in uterine epithelium may induce endometrial hyperplastic lesions [48] and promote endometrial cancer cell proliferation [49]. ABCA10 has been proposed as a prognostic marker in ovarian carcinoma [50]. Germline single nucleotide polymorphisms in ABCA10 may affect overall survival in follicular lymphoma [51]. So far, none of the 4 genes has been associated with USC, though CXCL1 and SOX9 are associated with EEC progression.

Conclusion

Here, we performed an analysis of USC genome-wide expression profiles in TCGA and GTEx datasets. We have identified genes that are dysregulated in USC and explored their molecular functions and pathways. More importantly, we have developed and validated a 4-gene signature that robustly predicts USC OS and RFS. This signature is an independent prognostic indicator that is superior to conventional indicators of USC prognosis, especially when predicting OS in early stage USC. Our findings highlight the potential of this signature as a guide for personalized USC treatment. However, more independent cohorts are needed to validate the signature and to elucidate the molecular mechanisms of these predictive genes in USC.