Development and validation of nomograms to predict recurrence and survival in patients with early-stage cervical adenocarcinoma

Purpose Cervical adenocarcinoma is one of the most common types of cervical cancer and its incidence is increasing. The biological behavior and treatment outcomes of cervical adenocarcinoma (CA) differ from those of squamous cell carcinoma (SCC). We sought to develop models to predict recurrence and cancer-specific survival (CSS) in CA patients. Methods A total of 131 patients were included in model development and internal validation, and patients from the SEER database (N = 1679) were used for external validation. Multivariable Cox proportional hazards regression analysis was used to select predictors of relapse-free survival (RFS) and CSS and to construct the models, which were presented as two nomograms. Internal validation of the nomograms was performed using the bootstrap resampling method. Results Age, FIGO (International Federation of Gynecology and Obstetrics) stage, tumor size, lymph metastasis and depth of invasion were identified as independent prognostic factors for RFS, while age, FIGO stage, tumor size and number of positive lymph nodes (LNs) were identified as independent prognostic factors for CSS. The recurrence nomogram predicted 2- and 5-year RFS with optimism-adjusted c-statistics of 75.41% and 74.49%, respectively. The survival nomogram predicted 2- and 5-year CSS with optimism-adjusted c-statistics of 83.22% and 83.31% after internal validation, and c-statistics of 68.6% and 71.33% after external validation. Conclusions We developed and validated two nomograms, available as static nomograms and online calculators, that can help clinicians quantify the risk of relapse and death for patients with early-stage CA. Supplementary Information The online version contains supplementary material available at 10.1007/s00432-023-05068-4.


Introduction
Cervical cancer is the fourth most commonly diagnosed cancer among women worldwide, with approximately 604,000 new cases and 342,000 deaths in 2020 (Ward et al. 2020). Over the past few decades, the incidence and mortality rates of cervical cancer have steadily declined in high-income countries with the development of cervical cancer screening and HPV vaccines (Siegel et al. 2021). However, invasive adenocarcinoma, the second most common histologic type of invasive cervical carcinoma, has shown an increasing trend in incidence over the past two decades (Islami et al. 2019). Therefore, more attention should be paid to the prevention and treatment of cervical adenocarcinoma. Increasing evidence has shown that the genomic alterations, biological behavior, treatment outcomes, and prognostic factors of cervical adenocarcinoma (CA) differ from those of squamous cell carcinoma (SCC) (Ni et al. 2021). More recently, Levinson et al. reported that tumor size was the highest risk factor for recurrence of cervical adenocarcinoma, while the depth of invasion was the highest risk factor for recurrence of squamous cell carcinoma (Bhatla et al. 2019). Therefore, it is necessary to explore the prognostic factors of CA and to develop a predictive model for estimating the prognosis of CA and optimizing treatment strategies. In this study, we aimed to develop nomograms for relapse-free survival (RFS) and cancer-specific survival (CSS) using our institutional database as well as the Surveillance, Epidemiology, and End Results (SEER) database to evaluate the prognosis of patients with cervical adenocarcinoma more accurately.

Patient data
All consecutive patients diagnosed with cervical adenocarcinoma (CA) at the First Affiliated Hospital of Wenzhou Medical University, China between December 2008 and September 2018 were eligible for this study. This study complied with the Declaration of Helsinki. All patients provided written informed consent to be included. This research followed the ethical principles of the First Affiliated Hospital of Wenzhou Medical University. Clinical and pathologic information was obtained from patient files and pathology reports. Only patients with stage I-II (FIGO 2009) CA were included.

Predictors
The following predictors were selected for model development: age, grade, FIGO 2009 stage, surgical approach (laparotomy or laparoscopy), tumor size, differentiation (low, medium, and high differentiation), lymph metastasis (yes or no), number of positive lymph nodes (LNs), lymph-vascular space invasion (LVSI) (yes or no), infiltration depth, resection margin (positive or negative), radiation (yes or no), chemotherapy (yes or no), D-dimer, platelet count, total cholesterol (TC), triglyceride (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), glucose (GLU), hemoglobin (HB), red blood cell count (RBC), red blood cell count after the operation (RBC after) and hemoglobin after the operation (HB after). The preoperative blood tests were completed within 1 week before the operation; RBC after and HB after were measured on the day after the operation.

Handling of missing data
The original database contained 146 cases, of which 15 were lost to follow-up, leaving 131 cases for analysis. Missing values in both categorical and continuous variables were imputed five times using the random forest method. The model results from the imputed datasets were combined according to Rubin's rules, and the pooled C-index and its 95% CI were calculated using the mice::pool function.
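The pooling step can be illustrated with a minimal Python sketch of Rubin's rules for a single scalar estimate. The study itself used the mice package in R; the function name pool_rubin and the five estimates and variances below are hypothetical values chosen only for illustration, not study results.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed datasets (Rubin's rules).

    estimates: per-imputation point estimates (e.g. log hazard ratios)
    variances: per-imputation squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)            # average of the m estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical values for five imputations (illustration only).
est = [0.50, 0.60, 0.55, 0.45, 0.65]
var = [0.04, 0.05, 0.04, 0.06, 0.05]
pooled, total = pool_rubin(est, var)
print(f"pooled estimate = {pooled:.3f}, pooled SE = {math.sqrt(total):.3f}")
```

A confidence interval would normally be built from the pooled standard error with a t reference distribution; that step is omitted from this sketch.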

Transformation of the predictors
To facilitate the model's use and interpretation in practice, continuous variables such as age and tumor size were transformed into categorical variables. The optimal cutoff point for age was selected using the log-rank test, and age at diagnosis was accordingly categorized as ≤ 55 or > 55 years. According to the FIGO staging system, tumor size (defined as the maximum measurement of horizontal diffusion or surface diameter in the ultrasound field) was divided into two groups: ≤ 40 and > 40 mm.

Predictor selection
The primary endpoint of this study was recurrence. Univariate Cox regression analysis was used to identify prognostic factors associated with recurrence, and a p value < 0.05 was considered statistically significant. Eight variables were thereby carried forward to subsequent analysis: age, stage, tumor size, lymph metastasis, number of positive LNs, infiltration depth, positive resection margin, and radiation. Considering the results of the univariate Cox analysis, the clinical relevance of the variables, and the sample size, several multivariable models were established. The model with the highest C-index was selected as the final prediction model. Consequently, five variables (age, stage, tumor size, lymph metastasis and depth of invasion) were included in the final model.
For the survival model, the endpoint was death specifically attributed to CA. Survival time was calculated from the time of diagnosis until death attributed to CA or last follow-up. Using the same statistical approach as above, four variables (age, stage, tumor size and number of positive LNs) were included in the survival model.

Model development and internal validation
To visualize the predictive models, two nomograms for predicting the 2- and 5-year relapse-free survival (RFS) and cancer-specific survival (CSS) of patients were constructed. The nomograms were then internally validated and calibrated using bootstrap resampling (B = 1000), with performance assessed by the C-index and calibration curves. The survival prediction model was also validated externally in the SEER database.
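For readers unfamiliar with the C-index, the following minimal Python sketch computes Harrell's concordance index from follow-up times, event indicators, and model risk scores. It is an illustration only: the function name harrell_c_index and the four-patient data are hypothetical, pairs with tied times are simply skipped, and the bootstrap optimism correction used in the study is not shown.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair is comparable when the patient with the shorter follow-up time
    had the event. Higher risk scores should go with shorter times.
    Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: months of follow-up, event flag (1 = event), risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [4.0, 3.0, 2.0, 1.0]
print(harrell_c_index(times, events, scores))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why the study reports the statistic either as a proportion or as a percentage.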

Statistical analysis
Continuous variables are described as mean ± standard deviation (SD) or median with interquartile range (IQR), depending on whether they are normally distributed. Categorical variables are shown as numbers and percentages for each group. Cox proportional hazards regression analysis was used to construct predictive models that were presented as static nomograms and dynamic web-based nomograms. The nomogram for the recurrence model was internally validated with a bootstrap resampling method. The prediction performance of the survival nomogram was assessed by resampling techniques for internal validation and on the external validation cohort from the SEER database. All statistical analyses were performed using R software (version 3.6.3). p < 0.05 was considered statistically significant.

Nomograms and internal validation of recurrence model
Subsequently, we constructed models based on the independent factors screened above. Because of the strong correlation between lymph metastasis and the number of positive LNs, only one of these two variables was used in model development. To refine the clinical application of the model, we compared the predictive performance of models with different variables (Table S1). The model with the highest C-index was selected as the final model, with a C-index of 0.818 (95% CI 0.708-0.928). Thus, five variables (age, FIGO stage, tumor size, lymph metastasis, and invasion depth) were used to construct the static nomograms and web-based dynamic nomograms of the recurrence model. The probabilities of 2- and 5-year RFS are shown in the nomogram (Fig. 2). We conducted a sensitivity analysis on the complete data of the recurrence model, and the model achieved similar discrimination, with a C-index of 0.85 (95% CI 0.73-0.96) (Table S2).
To verify the accuracy of the model, internal validation was performed and calibration curves were drawn. The optimism-adjusted c-statistics for 2 and 5 years were 75.41% and 74.49% after internal validation by bootstrap resampling, indicating that the predictive model has sufficient discriminatory power, and the calibration curves showed good agreement between the nomogram's predictions and observations (Fig. 3).

Cox regression analysis of survival
Among all patients, the median follow-up was 43 months, and 13 (9.9%) patients died. The results of the Cox regression analysis are shown in the forest plot (Fig. 4).

Nomograms and validation of survival model
We then developed a survival model nomogram based on the above analysis. To refine the clinical application of the model, the predictive performance of models with different variables was compared (Table S3). Variable selection was based on a combined consideration of the univariate analysis results (to find variables that affect prognosis), the correlation between variables, and clinical significance. The model with the highest C-index was selected as the final model, with a C-index of 0.896 (95% CI 0.806-0.986). Four variables (age, FIGO stage, tumor size and the number of positive LNs) were used to construct the static nomograms and web-based dynamic nomograms of the survival model.
The nomogram shows the probability of 2- and 5-year CSS (Fig. 5). We conducted a sensitivity analysis on the complete data of the survival model, and the model achieved similar discrimination, with a C-index of 0.92 (95% CI 0.81-1.00) (Table S4).
We then performed internal validation using the bootstrap resampling method and drew calibration curves (Fig. 6). The optimism-adjusted c-statistics were 83.22% and 83.31% for the 2-year and 5-year CSS, respectively, indicating that the predictive model has sufficient discriminatory power. Additionally, we performed an external validation using data from 1679 patients in the SEER database. Compared with our data, more patients in the SEER database were no older than 55 years (80.8%, n = 1356), and more patients were in stage I (93.2%, n = 1564). There was no significant difference in tumor size between the two cohorts. Clinical characteristics of the SEER cohort and the original data cohort are shown in Table S5. After external validation, the C-index of the nomogram predicting 2- and 5-year CSS was 0.69 and 0.71, respectively.
The two nomograms can be used to predict the RFS and CSS of patients with CA, respectively. For instance, a 50-year-old patient with a primary tumor of 45 mm in size (100 points), stage II (52 points), no lymph metastasis (0 points), and invasion depth ≤ 2/3 had a total of 152 points. Correspondingly, the 2- and 5-year RFS probabilities were 66% and 51%, respectively. To simplify the application of the models, we have developed two web-based calculators (https://yfycrc.shinyapps.io/recurrence_rate/; https://yfycrc.shinyapps.io/survival/).
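The arithmetic behind the worked example can be sketched in a few lines of Python. Only the point values quoted in the text (100, 52 and 0) come from the source; the zero points for age ≤ 55 and invasion depth ≤ 2/3 are assumptions inferred from the stated total of 152, and the table name POINTS and function total_points are hypothetical. Point values for the other predictor levels are not given in the text, so the sketch raises an error for them rather than guessing.

```python
# Points per predictor level for the recurrence nomogram.
# Values marked "assumed" are inferred from the stated total of 152 points.
POINTS = {
    ("tumor_size", ">40 mm"): 100,      # stated in the text
    ("stage", "II"): 52,                # stated in the text
    ("lymph_metastasis", "no"): 0,      # stated in the text
    ("age", "<=55"): 0,                 # assumed
    ("invasion_depth", "<=2/3"): 0,     # assumed
}


def total_points(patient):
    """Sum nomogram points over the patient's predictor levels."""
    total = 0
    for predictor, level in patient.items():
        key = (predictor, level)
        if key not in POINTS:
            raise KeyError(f"no point value available for {predictor}={level}")
        total += POINTS[key]
    return total


example_patient = {
    "age": "<=55",                # 50 years old
    "tumor_size": ">40 mm",       # 45 mm
    "stage": "II",
    "lymph_metastasis": "no",
    "invasion_depth": "<=2/3",
}
print(total_points(example_patient))  # 152
```

The total is then read against the nomogram's probability scales, which for 152 points give the 2- and 5-year RFS probabilities of 66% and 51% quoted above.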

Discussion
The incidence of cervical adenocarcinoma has increased over the past two decades (Siegel et al. 2021). A large body of evidence suggests that the overall prognosis of cervical adenocarcinoma is worse than that of cervical squamous cell carcinoma (Lee et al. 2011; Rose et al. 2014). Therefore, the identification of prognostic factors and the development of predictive models are important for optimizing treatment planning and guidance for CA patients.
A prognostic nomogram is a predictive model that has been widely used in recent years to estimate the prognosis of cancer (El Sharouni et al. 2021). This type of model has been used to individualize prognosis in cervical cancer (Wang et al. 2018; Xie et al. 2020; Zhou et al. 2015). Shim et al. constructed a nomogram to predict 5-year OS of patients with cervical cancer with a C-index of 0.69 (Shim et al. 2013). Lee et al. analyzed 1702 patients with stage IB-IIA cervical cancer who underwent adjuvant radiotherapy after radical hysterectomy and constructed a nomogram to predict 5-year OS with a C-index of 0.69 (Lee et al. 2013). However, few studies have focused on CA. Recently, Ni et al. constructed nomograms predicting 2- and 5-year CSS in patients with cervical adenocarcinoma using SEER data, with adjusted C-statistics of 0.90 and 0.89, respectively (Ni et al. 2021). Their study used only a public database (SEER) with few variables and developed a prediction model for survival only, without predicting recurrence. In our current study, age, stage, tumor size, lymph metastasis and depth of invasion were identified as recurrence-related factors for CA, which is consistent with the results of other studies (Lee et al. 2017; Levinson et al. 2021; Yoneoka et al. 2021). In the nomogram, tumor size contributed the most to RFS, followed by stage and age. Lymph metastasis and infiltration depth were also established as independent prognostic factors. In a study of SCC, lymphovascular space invasion, tumor size and depth of invasion were found to be associated with recurrence (Levinson et al. 2021). Among these factors, the depth of invasion had the greatest impact on prognosis, which differs from our findings in CA.
We then explored the prognostic factors associated with CA survival and found that age, FIGO stage, tumor size and number of positive LNs were independent predictors of survival in CA. Histological type, age, FIGO stage, tumor size, stromal invasion, lymphatic-vascular space infiltration (LVSI), parametrial involvement, and concurrent chemotherapy have been identified and included in survival prediction models in previous cervical cancer studies (Lee et al. 2013; Polterauer et al. 2012; Shim et al. 2013; Zhou et al. 2015). In our study, four factors (age, stage, tumor size, and the number of positive LNs) were identified as independent factors for patient survival and were incorporated into the model, which is consistent with the results of other studies (Gadducci et al. 2019; Khalil et al. 2015; Park et al. 2010; Stolnicu et al. 2019). In the current nomogram, the number of positive LNs contributed the most to prognosis, followed by tumor size and age. Tumor stage was established as another independent prognostic factor, although it is also related to tumor size. Zhou et al. found that in patients with stage I-IIB ECA, tumor diameter (≥ 4 cm) and the number of positive lymph nodes were independent prognostic factors of relapse-free survival (RFS), while the number of positive pelvic lymph nodes and age at surgery were independent prognostic factors of OS (Zhou et al. 2018).

Fig. 3 Calibration curves for the 2- and 5-year recurrence rates from the nomogram. The gray line represents the ideal fit. The nomogram-predicted probability of recurrence is plotted on the x-axis, and the actual recurrence rate is plotted on the y-axis. The dashed and solid lines represent the performance of the present nomogram at 2 years and 5 years, respectively. The closer the two lines, the higher the prediction accuracy

Fig. 4 Forest plot shows the multivariate Cox regression model that predicts CSD in the survival model. CSD cancer-specific death

We validated the survival model both internally and externally. Since the SEER database does not record recurrence, it cannot be used for external validation of our recurrence model. Both models exhibited satisfactory performance with accurate discrimination. In these models, each prognostic factor is quantified and visualized by static nomograms that can individually predict 2-year and 5-year RFS and CSS in CA patients. Two web-based calculators were also developed (https://yfycrc.shinyapps.io/recurrence_rate/; https://yfycrc.shinyapps.io/survival/). After the appropriate variables are entered, the patient's RFS or CSS and 95% CI can be obtained. Based on these two predictive models, physicians can determine individual risk, predict outcomes, and select appropriate therapies for patients with CA.
This study has some limitations. First, we established the models through retrospective analysis, which may introduce bias because of the lack of random assignment and the presence of some missing values. Second, because all patients were from an East Asian population, the corresponding ethnic susceptibility is unknown; our results should be extrapolated to other populations with caution. Third, the prediction model for tumor recurrence was validated only internally, so additional external validation using cohorts from different hospitals or regions is needed. Fourth, because of the limited data, we did not divide the data into a training set and a test set, since splitting would have reduced the data available for modeling and weakened the reliability of both model development and validation. In the future, with a larger sample size, more thorough internal validation can be carried out.
In conclusion, we developed and validated nomogram models to predict 2-year and 5-year RFS and CSS in patients with early-stage CA. These models may help clinicians assess the prognosis of patients with CA more accurately in clinical practice.

Fig. 1 Forest plot shows the multivariate Cox regression model that predicts recurrence of CA

Fig. 5 Nomogram for predicting the 2- and 5-year probability of CSS. CSS cancer-specific survival

Xintao Wang and Wenpei Shi contributed equally to this work.

Table 1
Clinical characteristics of the recurrence group and the non-recurrence group from raw data

Table 3
Univariable Cox regression of recurrence model