Introduction

Cervical cancer is the fourth most commonly diagnosed cancer among women worldwide, with approximately 604,000 new cases and 342,000 deaths in 2020 (Ward et al. 2020). Over the past few decades, the incidence and mortality of cervical cancer have steadily declined in high-income countries with the introduction of cervical cancer screening and HPV vaccines (Siegel et al. 2021). However, the incidence of invasive adenocarcinoma, the second most common histologic type of invasive cervical carcinoma, has increased over the past two decades (Islami et al. 2019). More attention should therefore be paid to the prevention and treatment of cervical adenocarcinoma. Increasing evidence shows that the genomic alterations, biological behavior, treatment outcomes, and prognostic factors of cervical adenocarcinoma (CA) differ from those of squamous cell carcinoma (SCC) (Ni et al. 2021). More recently, Levinson et al. reported that tumor size was the strongest risk factor for recurrence of cervical adenocarcinoma, whereas depth of invasion was the strongest risk factor for recurrence of squamous cell carcinoma (Bhatla et al. 2019). It is therefore necessary to explore the prognostic factors of CA and to develop a predictive model that can estimate prognosis and help optimize treatment strategies. In this study, we aimed to develop nomograms for relapse-free survival (RFS) and cancer-specific survival (CSS), using our institutional database together with the Surveillance, Epidemiology, and End Results (SEER) database, to evaluate the prognosis of patients with CA more accurately.

Methods

Patient data

All consecutive patients diagnosed with cervical adenocarcinoma (CA) at the First Affiliated Hospital of Wenzhou Medical University, China, between December 2008 and September 2018 were eligible for this study. The study complied with the Declaration of Helsinki and followed the ethical principles of the First Affiliated Hospital of Wenzhou Medical University. All included patients provided written informed consent. Clinical and pathologic information was obtained from patient files and pathology reports. Only patients with stage I–II (FIGO 2009) CA were included.

Predictors

The following predictors were selected for model development: age, grade, FIGO 2009 stage, surgical approach (laparotomy or laparoscopy), tumor size, differentiation (low, medium, or high), lymph metastasis (yes or no), number of positive lymph nodes (LNs), lymph-vascular space invasion (LVSI) (yes or no), infiltration depth, resection margin (positive or negative), radiation (yes or no), chemotherapy (yes or no), D-dimer, platelet count, total cholesterol (TC), triglyceride (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), glucose (GLU), hemoglobin (HB), red blood cell count (RBC), postoperative red blood cell count (RBC after), and postoperative hemoglobin (HB after). The preoperative blood tests were completed within 1 week before the operation; RBC after and HB after were measured on the day after the operation.

Handling of missing data

The original database contained 146 cases, of which 15 were lost to follow-up, leaving 131 cases for analysis. Missing values in both categorical and continuous variables were imputed five times using a random forest method. The model results from the imputed datasets were combined according to Rubin’s rules, and the pooled C-index and its 95% CI were calculated with the mice::pool function.
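As a rough illustration of the pooling step, the sketch below applies Rubin's rules to a set of per-imputation estimates. It is written in Python rather than the R workflow used in the study, the function name pool_rubin is hypothetical, and the five estimates and variances are placeholder numbers, not study results.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, sqrt(total)

# Placeholder log hazard ratios and variances from five imputed datasets (illustrative only).
log_hr = [1.00, 1.05, 0.95, 1.02, 0.98]
var = [0.24, 0.25, 0.23, 0.24, 0.24]
est, se = pool_rubin(log_hr, var)
```

The pooled standard error is always at least as large as the average within-imputation standard error, because the between-imputation term adds the uncertainty due to the missing data.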

Transformation of the predictors

To facilitate the model’s use and interpretation in practice, continuous variables such as age and tumor size were transformed into categorical variables. The optimal cutoff point for age was selected using the log-rank test, and age at diagnosis was accordingly categorized as ≤ 55 or > 55 years. Following the FIGO staging system, tumor size (defined as the maximum horizontal spread or surface diameter measured on ultrasound) was divided into two groups: ≤ 40 and > 40 mm.

Predictor selection

The primary endpoint of this study was disease recurrence. Univariate Cox regression analysis was used to identify prognostic factors associated with recurrence, and a p value < 0.05 was considered statistically significant. Eight variables were thereby carried forward to subsequent analysis: age, stage, tumor size, lymph metastasis, number of positive LNs, infiltration depth, positive resection margin, and radiation. Taking into account the results of the univariate Cox analysis, the clinical relevance of the variables, and the sample size, several multivariable models were established, and the model with the highest C-index was selected as the final prediction model. Consequently, five variables (age, stage, tumor size, lymph metastasis, and depth of invasion) were included in the final model.
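Because candidate models are ranked by C-index, it may help to see what that statistic measures. The following is a minimal sketch of Harrell's concordance index for right-censored data, assuming a simplified treatment in which tied event times are ignored; the function name and the four-patient toy data are illustrative and are not taken from the study.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has the shorter time and had an event.
    It is concordant when subject i also has the higher predicted risk.
    Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / comparable

# Toy data: follow-up in months, event indicator (1 = recurrence), model risk score.
times = [5, 10, 15, 20]
events = [1, 0, 1, 0]
risk = [0.9, 0.4, 0.3, 0.6]
c_index = harrell_c(times, events, risk)   # 3 of 4 comparable pairs are concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk.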

For the survival model, the endpoint was death specifically attributed to CA. Survival time was calculated from the time of diagnosis until death attributed to CA or last follow-up. Using the same statistical approach as above, four variables (age, stage, tumor size, and number of positive LNs) were included in the survival model.

Model development and internal validation

To visualize the predictive models, two nomograms were constructed to predict 2- and 5-year relapse-free survival (RFS) and cancer-specific survival (CSS). The nomograms were then internally validated and calibrated by bootstrap resampling (B = 1000), with performance assessed by the C-index and calibration curves. The survival prediction model was also validated externally in the SEER database.
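The text does not spell out how the bootstrap adjustment is computed. Assuming the standard optimism correction (an assumption, not confirmed by the source), the adjusted C-index could be written as below, where C_app is the apparent C-index of the model fitted to the original data D, and M_b is the model refitted on the b-th bootstrap sample D_b. All symbols are introduced here for illustration only.

```latex
C_{\mathrm{adj}} = C_{\mathrm{app}} - \frac{1}{B} \sum_{b=1}^{B} \left[ C(M_b, D_b) - C(M_b, D) \right], \qquad B = 1000
```

The bracketed term estimates how much each refitted model overstates its performance on the data it was trained on; averaging over the bootstrap samples gives the optimism that is subtracted from the apparent value.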

SEER data extraction and external validation of survival model

Cases of CA were identified using the following International Classification of Diseases for Oncology, third edition (ICD-O-3) histology codes: 8140/3, 8144/3, 8147/3, 8200/3, 8210/3, 8241/3, 8244/3, 8255/3, 8260–8263/3, 8310/3, 8313/3, 8323/3, 8380/3, 8382/3, 8384/3, 8430/3, 8441/3, 8460/3, 8461/3, 8480–8482/3, and 8490/3 (Fritz et al. 2000; Lu and Chen 2014). We collected data on confirmed CA cases from the SEER registry from 2004 to 2015 (n = 1679) for external validation. The seven variables collected from the database were age, stage, tumor size, number of positive LNs, survival time, cause of death, and vital status. Patients with stage III or IV disease, no follow-up data, no lymph node examination results, or missing values for modeling variables were excluded.
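The selection rules above can be sketched as a simple record filter. This is only an illustration: the field names (histology, stage, survival_months, and so on) are hypothetical and do not correspond to actual SEER column names, and records are assumed to be plain dictionaries.

```python
# ICD-O-3 histology codes listed in the text (the /3 behavior suffix is omitted).
CA_HISTOLOGY = (
    {8140, 8144, 8147, 8200, 8210, 8241, 8244, 8255}
    | set(range(8260, 8264))                       # 8260-8263
    | {8310, 8313, 8323, 8380, 8382, 8384, 8430, 8441, 8460, 8461}
    | set(range(8480, 8483))                       # 8480-8482
    | {8490}
)

# Hypothetical field names for the four modeling variables.
MODEL_VARS = ("age", "stage", "tumor_size", "positive_lns")

def eligible(rec):
    """Return True if a record passes the inclusion and exclusion rules described above."""
    if rec.get("histology") not in CA_HISTOLOGY:
        return False
    if rec.get("stage") not in ("I", "II"):        # drops stage III/IV and missing stage
        return False
    if rec.get("survival_months") is None:         # no follow-up data
        return False
    # A missing positive_lns value stands in for "no lymph node examination results".
    return all(rec.get(v) is not None for v in MODEL_VARS)

example = {"histology": 8140, "stage": "I", "survival_months": 60,
           "age": 50, "tumor_size": 30, "positive_lns": 0}
keep = eligible(example)
```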

Statistical analysis

Continuous variables are described as mean ± standard deviation (SD) or median with interquartile range (IQR), depending on whether they were normally distributed. Categorical variables are shown as numbers and percentages for each group. Cox proportional hazards regression was used to construct the predictive models, which were presented as static nomograms and dynamic web-based nomograms. The nomogram for the recurrence model was internally validated by bootstrap resampling. The performance of the survival nomogram was assessed by resampling for internal validation and on the SEER cohort for external validation. All statistical analyses were performed using R software (version 3.6.3). p < 0.05 was considered statistically significant.

Results

Clinical characteristics

After excluding 15 patients who were lost to follow-up, a total of 131 patients diagnosed with CA at the First Affiliated Hospital of Wenzhou Medical University between December 2008 and September 2018 were enrolled. The patients’ baseline characteristics are shown in Tables 1 and 2. The mean age of the entire cohort was 49.8 ± 10.1 years. Of the patients, 95 (72.5%) were ≤ 55 years old and 36 (27.5%) were > 55 years old; 80 (61.1%) had tumors ≤ 40 mm and 15 (11.5%) had tumors > 40 mm. There were 97 (74%) patients with stage I and 34 (26%) with stage II disease. Lymph metastasis was present in 22 (16.8%) patients and absent in 109 (83.2%); the depth of invasion was ≤ 2/3 in 81 (61.8%) patients and > 2/3 in 49 (37.4%).

Table 1 Clinical characteristics of the recurrence group and the nonrecurrent group from raw data
Table 2 Clinical characteristics of the survival group and the dead group from raw data

Cox regression analysis of disease recurrence

Among the 131 patients, the median follow-up was 43 months, and disease recurrence occurred in 19 (14.5%) patients. We first analyzed the tumor characteristics associated with RFS. In univariable analysis, age > 55, FIGO stage II, tumor size > 40 mm, lymph metastasis, number of positive LNs, depth of invasion > 2/3, positive resection margin, and radiation were risk factors for RFS (p < 0.05) (Table 3). In multivariable analysis, age > 55 (HR 2.74, 95% CI 1.05–7.15, p = 0.040), stage II (HR 2.76, 95% CI 1.1–6.93, p = 0.031), and larger tumor size (HR 7.02, 95% CI 2.61–18.94, p < 0.001) were identified as independent predictors of RFS in CA patients. Depth of invasion (HR 1.22, 95% CI 0.43–3.44, p = 0.704) and lymph metastasis (HR 2.25, 95% CI 0.88–5.76, p = 0.090) were not statistically significant. The results are shown in the forest plot (Fig. 1).
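A reported hazard ratio, confidence interval, and p value can be cross-checked against one another. The sketch below assumes a Wald-type interval that is symmetric on the log scale (an assumption; the source does not state how the intervals were computed) and recovers the implied standard error and two-sided p value for the age > 55 estimate. The function name is hypothetical.

```python
from math import erf, log, sqrt

def wald_p_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover the SE of log(HR), the z statistic, and the two-sided p value from a 95% CI."""
    se = (log(upper) - log(lower)) / (2 * z_crit)   # CI width on the log scale is 2 * z_crit * SE
    z = log(hr) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF at |z|
    p = 2 * (1 - normal_cdf)
    return se, z, p

# Age > 55 in the multivariable recurrence model: HR 2.74, 95% CI 1.05-7.15.
se, z, p = wald_p_from_ci(2.74, 1.05, 7.15)
```

Under this assumption the recovered p value comes out close to the reported 0.040, which suggests the three reported numbers are mutually consistent; small differences are expected from rounding of the published values.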

Table 3 Univariable Cox regression of recurrence model
Fig. 1 Forest plot shows the multivariate Cox regression model that predicts recurrence of CA

Nomograms and internal validation of recurrence model

Subsequently, we constructed models based on the factors screened above. Because of the strong correlation between lymph metastasis and the number of positive LNs, only one of these two variables was used in model development. To refine the clinical application of the model, we compared the predictive performance of models with different variables (Table S1) and selected the model with the highest C-index. The C-index of the final model was 0.818 (95% CI 0.708–0.928). Thus, five variables (age, FIGO stage, tumor size, lymph metastasis, and invasion depth) were used to construct the static and web-based dynamic nomograms of the recurrence model. The probabilities of 2- and 5-year RFS are shown in the nomogram (Fig. 2). In a sensitivity analysis on the complete data, the recurrence model achieved similar discrimination, with a C-index of 0.85 (95% CI 0.73–0.96) (Table S2).

Fig. 2 Nomogram for predicting the 2- and 5-year probability of RFS. Draw a vertical line from each variable to the corresponding points scale to obtain its points. The points are then summed and a line is drawn downward from the total points line to obtain the probability of 2- and 5-year RFS

To verify the accuracy of the model, internal validation was performed and calibration curves were drawn. After internal validation by bootstrap resampling, the optimism-adjusted c-statistics for 2 and 5 years were 75.41% and 74.49%, respectively, indicating that the predictive model has sufficient discriminatory power. The calibration curves showed good agreement between the nomogram predictions and the observed outcomes (Fig. 3).

Fig. 3 Calibration curves for the 2- and 5-year recurrence rates predicted by the nomogram. The gray line represents the ideal fit. The nomogram-predicted probability of recurrence is plotted on the x-axis, and the actual recurrence rate is plotted on the y-axis. The dashed and solid lines represent the performance of the nomogram at 2 and 5 years, respectively. The closer these lines are to the gray line, the higher the prediction accuracy

Cox regression analysis of survival

Among all patients, the median follow-up was 43 months, and 13 (9.9%) patients died. We first analyzed the tumor characteristics associated with CSS. In univariable analysis, age > 55, FIGO stage II, tumor size > 40 mm, lymph metastasis, and the number of positive LNs were risk factors for CSS (p < 0.05) (Table 4). In multivariable analysis, age > 55 (HR 7.02, 95% CI 1.87–26.33, p = 0.004), larger tumor size (HR 9.26, 95% CI 2.450–35.01, p = 0.001), and the number of positive LNs (HR 1.44, 95% CI 1.133–1.83, p = 0.003) were associated with poor prognosis, whereas stage II (HR 2.34, 95% CI 0.69–7.93, p = 0.172) did not reach statistical significance. The results are shown in the forest plot (Fig. 4).

Table 4 Univariable Cox regression of survival model
Fig. 4 Forest plot shows the multivariate Cox regression model that predicts CSD in the survival model. CSD cancer-specific death

Nomograms and validation of survival model

We then developed a nomogram for the survival model based on the above analysis. To refine the clinical application of the model, the predictive performance of models with different variables was compared (Table S3). Variable selection took into account the results of the univariable analysis, the correlation between variables, and their clinical significance. The model with the highest C-index was selected as the final model, with a C-index of 0.896 (95% CI 0.806–0.986). Four variables (age, FIGO stage, tumor size, and the number of positive LNs) were used to construct the static and web-based dynamic nomograms of the survival model. The nomogram shows the probability of 2- and 5-year CSS (Fig. 5). In a sensitivity analysis on the complete data, the survival model achieved similar discrimination, with a C-index of 0.92 (95% CI 0.81–1.00) (Table S4).

Fig. 5 Nomogram for predicting the 2- and 5-year probability of CSS. CSS cancer-specific survival

We then performed internal validation by bootstrap resampling and drew calibration curves (Fig. 6). The optimism-adjusted c-statistics were 83.22% and 83.31% for 2-year and 5-year CSS, respectively, indicating that the predictive model has sufficient discriminatory power. Additionally, we performed an external validation using data from 1679 patients in the SEER database. Compared with our cohort, a larger proportion of patients in the SEER cohort were aged ≤ 55 years (80.8%, n = 1356) and had stage I disease (93.2%, n = 1564). There was no significant difference in tumor size between the two cohorts. Clinical characteristics of the SEER cohort and the original cohort are shown in Table S5. In external validation, the C-index of the nomogram for predicting 2- and 5-year CSS was 0.69 and 0.71, respectively.

Fig. 6 Calibration curves for the 2- and 5-year CSS predicted by the nomogram. The gray line represents the ideal fit. The nomogram-predicted CSS is plotted on the x-axis, and the actual CSS is plotted on the y-axis. The dashed and solid lines represent the performance of the nomogram at 2 and 5 years, respectively

The two nomograms can be used to predict the RFS and CSS of patients with CA, respectively. For instance, a 50-year-old patient with a primary tumor of 45 mm (100 points), stage II disease (52 points), no lymph metastasis (0 points), and invasion depth ≤ 2/3 has a total of 152 points. The corresponding 2- and 5-year RFS probabilities are 66% and 51%, respectively. To simplify application of the models, we have developed two web-based calculators (https://yfycrc.shinyapps.io/recurrence_rate/; https://yfycrc.shinyapps.io/survival/).
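The point total in this worked example can be reproduced with a small lookup, sketched below. Only the point values quoted in the text are used. The zero points for age ≤ 55 and invasion depth ≤ 2/3 are an assumption implied by the reported total of 152 rather than stated values, and the table layout and function name are hypothetical. Converting the total into a survival probability still requires reading the nomogram axis or using the web calculator.

```python
def total_points(profile, table):
    """Sum the nomogram points for one patient's category in each variable."""
    return sum(table[variable][level] for variable, level in profile.items())

# Points quoted in the worked example; categories not mentioned in the text are left out.
POINTS = {
    "age": {"<=55": 0},                 # assumed 0, implied by the total of 152
    "tumor_size": {">40mm": 100},
    "stage": {"II": 52},
    "lymph_metastasis": {"no": 0},
    "invasion_depth": {"<=2/3": 0},     # assumed 0, implied by the total of 152
}

patient = {
    "age": "<=55",
    "tumor_size": ">40mm",
    "stage": "II",
    "lymph_metastasis": "no",
    "invasion_depth": "<=2/3",
}
total = total_points(patient, POINTS)
```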

Discussion

The incidence of cervical adenocarcinoma has increased over the past two decades (Siegel et al. 2021). A large body of evidence suggests that the overall prognosis of cervical adenocarcinoma is worse than that of cervical squamous cell carcinoma (Lee et al. 2011; Rose et al. 2014). Therefore, identifying prognostic factors and developing predictive models are important for optimizing treatment planning and guidance for CA patients.

A prognostic nomogram is a predictive model that has been widely used in recent years to estimate the prognosis of cancer (El Sharouni et al. 2021). Such models have been used to predict the prognosis of cervical cancer (Wang et al. 2018; Xie et al. 2020; Zhou et al. 2015). Shim et al. constructed a nomogram to predict 5-year overall survival (OS) in patients with cervical cancer, with a C-index of 0.69 (Shim et al. 2013). Lee et al. analyzed 1702 patients with stage IB–IIA cervical cancer who underwent adjuvant radiotherapy after radical hysterectomy and constructed a nomogram to predict 5-year OS, with a C-index of 0.69 (Lee et al. 2013). However, few studies have focused on CA. Recently, Ni et al. constructed nomograms predicting 2- and 5-year CSS in patients with cervical adenocarcinoma using SEER data, with adjusted C-statistics of 0.90 and 0.89, respectively (Ni et al. 2021). Their study used only a public database (SEER) with few variables and modeled survival only, without predicting recurrence.

In the current study, age, stage, tumor size, lymph metastasis, and depth of invasion were identified as recurrence-related factors for CA, which is consistent with the results of other studies (Lee et al. 2017; Levinson et al. 2021; Yoneoka et al. 2021). In the nomogram, tumor size contributed the most to RFS, followed by stage and age. Lymph metastasis and infiltration depth were also retained in the model, although neither reached statistical significance in the multivariable analysis. In a study of SCC by Levinson et al., lymphovascular space invasion, tumor size, and depth of invasion were found to be associated with recurrence (Levinson et al. 2021). Among these factors, depth of invasion had the greatest impact on prognosis, which differs from our findings in CA.

We then explored the prognostic factors associated with survival in CA. Histological type, age, FIGO stage, tumor size, stromal invasion, lymphatic-vascular space infiltration (LVSI), parametrial involvement, and concurrent chemotherapy have been identified and included in survival prediction models in previous cervical cancer studies (Lee et al. 2013; Polterauer et al. 2012; Shim et al. 2013; Zhou et al. 2015). In our study, four factors (age, stage, tumor size, and the number of positive LNs) were identified as prognostic factors for patient survival and were incorporated into the model, which is consistent with the results of other studies (Gadducci et al. 2019; Khalil et al. 2015; Park et al. 2010; Stolnicu et al. 2019). In the current nomogram, the number of positive LNs contributed the most to prognosis, followed by tumor size and age. Tumor stage was also retained in the model, although it did not reach statistical significance in the multivariable analysis and is related to tumor size. Zhou et al. found that in patients with stage I–IIB ECA, tumor diameter (≥ 4 cm) and the number of positive lymph nodes were independent prognostic factors for relapse-free survival (RFS), whereas the number of positive pelvic lymph nodes and age at operation were independent prognostic factors for OS (Zhou et al. 2018).

We validated the survival model both internally and externally. Because the SEER database does not record recurrence, it could not be used for external validation of the recurrence model. Both models exhibited satisfactory performance with accurate discrimination. In these models, each prognostic factor is quantified and visualized in static nomograms that can predict 2-year and 5-year RFS and CSS for individual CA patients. Two web-based calculators were also developed (https://yfycrc.shinyapps.io/recurrence_rate/; https://yfycrc.shinyapps.io/survival/). After the appropriate variables are entered, the patient's RFS or CSS and its 95% CI can be obtained. Based on these two predictive models, physicians can determine individual risk, predict outcomes, and select appropriate therapies for patients with CA.

This study has several limitations. First, the model was established through retrospective analysis, which may introduce bias because of the lack of random assignment and the presence of missing values. Second, because all patients were from an East Asian population, the corresponding ethnic susceptibility is unknown, and our results should be extrapolated to other populations with caution. Third, the prediction model for tumor recurrence was validated only internally, so additional external validation using cohorts from different hospitals or regions is needed. Fourth, because of the limited data, we did not divide the data into training and test sets, since splitting would have reduced the data available for modeling and weakened the reliability of both model development and validation. With a larger sample size in the future, more thorough internal validation can be carried out.

In conclusion, we developed and validated nomogram models to predict 2-year and 5-year RFS and CSS in patients with early-stage CA. These models will help to assess the prognosis of patients with CA more accurately in clinical practice.