Background

Cervical cancer is the third most common cancer in women. In 2021, 341,831 women died from the disease worldwide [1, 2]. Despite screening programs and vaccination, a substantial number of patients still present with locally advanced cervical cancer (LACC), i.e. International Federation of Gynaecology and Obstetrics (FIGO) 2009 stages IB2 and IIB–IVA or FIGO 2018 stages IB3–IVA. Patients with LACC are treated with (chemo)radiation, and the radiotherapy treatment plan is generally based on physical examination (under anaesthesia) and imaging.

Staging follows the FIGO staging system, which until a few years ago was based mainly on clinical parameters [3]. Imaging was only incorporated in the 2018 FIGO staging update [4, 5].

2-deoxy-2-[18F]fluoro-D-glucose positron emission tomography computed tomography ([18F]FDG-PET/CT) is essential in staging LACC, as it performs better than magnetic resonance imaging (MRI) or computed tomography (CT) in detecting lymph node and distant metastases [6]. Patients with [18F]FDG-positive lymph nodes have a worse prognosis than patients with the same clinical FIGO 2009 stage and [18F]FDG-negative nodes [7]. However, data on how the prognostic ability of FIGO 2018 changes when additional imaging is taken into account are limited [8].

Considering the role of [18F]FDG-PET/CT in LACC, Kidd et al. developed a prediction model based on lymph node status, the maximum standardized uptake value (SUVmax) of the primary tumor, and the metabolic tumor volume (MTV) measured on the PET images [9]. From these inputs, the model predicts recurrence-free survival (RFS), disease-specific survival (DSS) and overall survival (OS), each expressed as a surviving fraction. Prediction models are often used in oncology. Ideally, a robust prediction model should help solve a clear clinical problem, be tested on several independent datasets, have mature follow-up data and be evaluated for statistical robustness [10]. However, prediction models are seldom tested in an external patient cohort and rarely specify the treatment regimen or the cutoff values that should guide a specific treatment choice.

In the treatment of LACC, detection of lymph node metastases is essential for treatment planning, as adjusting the treatment plan improves survival [11, 12]. Whether to extend the radiotherapy volume to the para-aortic region is a crucial decision: the 5-year survival rate is approximately 60% in patients with pelvic node metastases only and drops to 37% in patients with para-aortic metastases [13]. This decision is challenging, as nodes may be (A) false negative (micro-metastases not visible on imaging) [13,14,15,16] or (B) false positive, i.e. reactive nodes (due to tumor necrosis) [17, 18]. Both overtreatment and undertreatment have serious consequences: irradiating a reactive node causes substantial toxicity without survival benefit, whereas not adequately treating a tumor-positive node could impair survival. Therefore, a risk stratification tool that identifies low and high risk patients could support the decision for or against a more toxic treatment regimen, such as para-aortic radiotherapy.

The purpose of our study was to determine risk profiles in locally advanced uterine cervical cancer based on pre-treatment PET-derived characteristics. First, we evaluated an existing PET-based prediction model [9] in our population. Next, to study whether the prediction model was valid in current practice, patients were retrospectively restaged according to the FIGO 2018 staging system. Low and high risk groups were defined based on the calculated survival cutoff values.

Materials and methods

Patients

Consecutive patients with suspected LACC and intended curative chemoradiation between 1 January 2013 and 31 November 2018 at the Amsterdam University Medical Center (UMC) were included in this retrospective study. All included patients underwent [18F]FDG-PET/CT imaging for staging and radiation treatment planning. Exclusion criteria were unexpected distant metastases, an insufficient tumor-to-background ratio for clear delineation of the tumor volume, and not having received treatment. Because of the retrospective design of the study, the medical ethical committee of the Amsterdam UMC waived the need for informed consent.

Before starting (chemo)radiotherapy, all patients underwent physical examination under anesthesia including cystoscopy, MRI, and [18F]FDG-PET/CT imaging. Tumor stage was clinically determined according to the FIGO 2009 staging system [3]. For this study, all patients were retrospectively restaged according to the FIGO 2018 staging system [4, 5].

[18F]FDG-PET/CT imaging

[18F]FDG-PET/CT scans were performed within three weeks after diagnosis, in either regular or radiation treatment position, after fasting for at least six hours. Patients were orally prehydrated during the 24 h before and on the day of the investigation (including 0.7 L of diluted oral contrast: Joxithalamate 5% [Telebrix Gastro], Guerbet, Villepinte, France). Patients were injected with an intravenous bolus of 180–300 MBq [18F]FDG, dosed according to body mass index (BMI). First, an abdominal CT scan was performed with a full bladder, followed by a PET acquisition for radiotherapy planning purposes. After voiding, a second CT scan (low-dose or diagnostic) was acquired from the thigh to the skull base with administration of an i.v. contrast agent (Iopromide [Ultravist 300], Bayer Pharma AG, Berlin, Germany), followed by the second PET acquisition. This second PET/CT scan was used for staging. When a low-dose CT was performed, the PET/CT images were visually compared with a recent diagnostic CT (performed less than three weeks before the PET/CT).

The CT and [18F]FDG-PET images were fused and viewed (maximum intensity projection, coronal, sagittal and transversal reconstructions) using a Hermes Hybrid viewer (Nuclear Diagnostics AB, Stockholm, Sweden). The CT part of the investigation was additionally viewed on a picture archiving and communication system (PACS, Agfa Enterprise Imaging, Agfa Healthcare System, Mortsel, Belgium).

Treatment and follow up

All patients were scheduled for curative radiation therapy. External beam radiotherapy (EBRT) was delivered to the pelvis (total dose 46–50.4 Gy, 1.8–2.0 Gy per fraction) combined with weekly cisplatin (40 mg/m2) or, as a substitute for concurrent chemotherapy, hyperthermia [19, 20]. Radiotherapy was extended to the para-aortic region if PET/CT showed suspicious nodes at or above the level of the common iliac vessels. Fletcher brachytherapy to the primary tumor followed in two pulsed dose rate (PDR) applications of 14 Gy each, delivered in hourly pulses (total dose 28 Gy). Additional EBRT boosts to parametria and lymph nodes that were suspicious on imaging were given either as an integrated boost during pelvic EBRT or sequentially after brachytherapy, up to a biologically equivalent total dose of 60 Gy. All patients received EBRT with an organ-sparing technique (intensity modulated radiotherapy [IMRT] or volumetric modulated arc therapy [VMAT]).

Patients with [18F]FDG-positive supraclavicular nodes received platinum-based palliative chemotherapy followed by radiotherapy to the pelvis (typically 30 Gy in 3 Gy fractions); a few patients chose radiotherapy only. Standard follow-up consisted of alternating visits to the gynecologist and radiation oncologist every 3 months for the first two years, every 6 months in the third and fourth years, and once in the fifth year. During follow-up, imaging was only performed if recurrent disease was suspected, according to local protocols.

Outcome measures

RFS was defined as the time from the date of pathological diagnosis to the date of any recurrence, determined by physical examination, imaging or pathological confirmation. DSS was defined as the time from the date of diagnosis to the date of death caused by cervical cancer. OS was defined as the time from the date of diagnosis to the date of death, irrespective of the cause. Survival data were collected from the electronic patient records; patients without an event were censored at the date of their last visit.
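To make the censoring rule concrete, the sketch below converts dates into a (time, event) pair and computes a Kaplan-Meier survival curve from such pairs. It is a minimal illustration under stated assumptions, not the study's R code: the helper names are hypothetical, and expressing time in months (365.25/12 days) is our choice, as the time unit is not stated here. The same helper can serve RFS, DSS or OS by passing the event date that matches each definition.

```python
from datetime import date
from typing import List, Optional, Sequence, Tuple

DAYS_PER_MONTH = 365.25 / 12  # assumed conversion to months; not stated in the study


def time_to_event(diagnosis: date, event_date: Optional[date],
                  last_visit: date) -> Tuple[float, bool]:
    """Return (months, event) for one patient and one outcome.

    If no event occurred, the patient is censored at the date of the last visit.
    """
    if event_date is not None:
        return (event_date - diagnosis).days / DAYS_PER_MONTH, True
    return (last_visit - diagnosis).days / DAYS_PER_MONTH, False


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimator: list of (event time, survival just after that time)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all patients whose event or censoring happens at time t.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            steps.append((t, survival))
        # Patients with an event or censored at t leave the risk set afterwards.
        at_risk -= n_leaving
    return steps
```

Censored patients lower the number at risk without producing a step in the curve, which is how a patient lost to follow-up without an event still contributes information up to the last visit.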

Data extraction

[18F]FDG-PET/CT investigations at diagnosis were analyzed by one of two experienced nuclear medicine physicians (JA and BE, with 18 and 15 years of experience in PET/CT reading, respectively). In case of uncertainty, consensus was reached between the two readers.

Metabolic tumor volume and SUVmax

Metabolic tumor volume and SUVmax were determined as described earlier [9, 21]. Briefly, the bladder activity was first masked with an automated volume of interest (VOI) created by volume rendering, which was manually adjusted if necessary. Then, an automated VOI including the tumor was created using a fixed 30% threshold with a lower SUVmax limit of 4.0 and adjusted for best fit (mostly a 20–40% threshold). The automated region was manually corrected to prevent erroneous inclusion of adjacent tissue, such as the ureter, ovary, bowel or an unmasked thin rim of the bladder. Necrotic parts of the tumor were excluded from the VOI.
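The thresholding step can be sketched as follows. This is a simplified illustration, not the Hermes workflow used in the study: it takes the SUV values of voxels in an already bladder-masked region, keeps voxels at or above a fraction of SUVmax, and multiplies the voxel count by the voxel volume. Reading the 4.0 limit as an absolute floor on the threshold is our assumption, as are the function and parameter names; connectivity and manual correction are ignored.

```python
from typing import Sequence, Tuple


def threshold_mtv(suv: Sequence[float], voxel_ml: float, fraction: float = 0.30,
                  suv_floor: float = 4.0) -> Tuple[float, float, float]:
    """Return (SUVmax, threshold, MTV in cm3) for the voxels of a bladder-masked region.

    Voxels at or above max(fraction * SUVmax, suv_floor) count toward the MTV.
    Connectivity and manual correction are ignored in this simplification.
    """
    if not suv:
        raise ValueError("region contains no voxels")
    suv_max = max(suv)
    # Assumed reading of the 4.0 limit: the threshold never drops below it.
    threshold = max(fraction * suv_max, suv_floor)
    n_voxels = sum(1 for v in suv if v >= threshold)
    return suv_max, threshold, n_voxels * voxel_ml  # 1 mL equals 1 cm3
```

For example, `threshold_mtv([15.6, 12.0, 8.0, 5.0, 4.5, 3.0, 1.2], voxel_ml=0.064)` applies a threshold of 4.68 and keeps four voxels. Passing `fraction=0.2` or `fraction=0.4` mimics the best-fit adjustment range mentioned above.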

Interobserver variability was determined in a subset of thirty scans by comparing the mean difference ± SD in tumor volume and SUVmax between the two observers. A difference of less than 30% in metabolic tumor volume was accepted, as described before [22]; no difference in SUVmax was accepted.
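A minimal sketch of this comparison is shown below, assuming the relative difference is expressed as a percentage of the two observers' mean volume; the study does not state the denominator, and the function names are hypothetical. It returns the mean and SD of the per-scan differences and the number of scans exceeding the 30% limit.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def relative_difference_pct(a: float, b: float) -> float:
    """Absolute difference as a percentage of the two observers' mean (assumed definition)."""
    return abs(a - b) / ((a + b) / 2) * 100


def interobserver_summary(obs1: Sequence[float], obs2: Sequence[float],
                          limit_pct: float = 30.0) -> Tuple[float, float, int]:
    """Return mean and SD of the relative MTV differences and the count above the limit."""
    if len(obs1) != len(obs2) or len(obs1) < 2:
        raise ValueError("need paired readings for at least two scans")
    diffs = [relative_difference_pct(a, b) for a, b in zip(obs1, obs2)]
    n_exceeding = sum(1 for d in diffs if d > limit_pct)
    return mean(diffs), stdev(diffs), n_exceeding
```

Using the mean of both readings as the denominator keeps the measure symmetric, so it does not matter which observer is listed first.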

After the interobserver variability analysis, the whole cohort (including the subset of thirty scans) was analyzed by one of the two observers; in case of uncertainty (e.g. low SUVmax or necrosis), consensus was reached between the two readers.

The PET-derived tumor volume, the SUVmax within the VOI and the threshold used were recorded on the case report form (CRF).

Lymph node status

Lymph node status was collected from the [18F]FDG-PET/CT report in the electronic patient chart. As described before [9], a lymph node was considered positive if (a) its short-axis diameter exceeded 1.0 cm and its [18F]FDG uptake exceeded that of the adjacent vessel or surrounding tissue, or (b) its short-axis diameter was less than 1.0 cm but its uptake was very intense (more than twice that of the adjacent vessel or surrounding tissue). As patients with positive nodes at the level of the common iliac vessels received para-aortic irradiation, common iliac nodes were considered para-aortic in our cohort.
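The node rule and the "highest positive level" used as model input can be written as a small decision function. This is a sketch under assumptions: the station labels, the ranking of levels and the function names are hypothetical, and a node of exactly 1.0 cm, which the rule above leaves unspecified, is treated as small here.

```python
from typing import Iterable, Tuple

# Hypothetical ranking used to find the highest positive nodal level.
LEVEL_RANK = {"none": 0, "pelvic": 1, "para-aortic": 2, "supraclavicular": 3}


def is_node_positive(short_axis_cm: float, node_uptake: float, reference_uptake: float) -> bool:
    """Apply the size/uptake rule; reference_uptake is the adjacent vessel or surrounding tissue."""
    if short_axis_cm > 1.0:
        return node_uptake > reference_uptake
    # Node of 1.0 cm or smaller (exactly 1.0 cm treated as small: an assumption).
    return node_uptake > 2 * reference_uptake


def highest_positive_level(nodes: Iterable[Tuple[str, float, float, float]]) -> str:
    """nodes: (station, short_axis_cm, node_uptake, reference_uptake) per node."""
    highest = "none"
    for station, size_cm, uptake, reference in nodes:
        # Common iliac nodes count as para-aortic, as in the study cohort.
        level = "para-aortic" if station == "common iliac" else station
        if is_node_positive(size_cm, uptake, reference) and LEVEL_RANK[level] > LEVEL_RANK[highest]:
            highest = level
    return highest
```

For example, a 1.5 cm pelvic node with moderate uptake plus a 0.7 cm common iliac node with very intense uptake yields `"para-aortic"`.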

Prediction model validation

First, we externally validated an existing [18F]FDG-PET/CT-based prediction model to assess whether it was applicable to our patient cohort treated with contemporary radiotherapy techniques.

The validation of the [18F]FDG-PET/CT-based prediction model by Kidd et al. [9] was performed as follows. The original Cox proportional hazards equations, containing lymph node status (the level of the highest [18F]FDG-positive lymph node) and the SUVmax and MTV of the primary tumor, together with the baseline hazards at 12, 36 and 64 months, were obtained on request and used to calculate the estimated survival. The regression parameters, baseline hazard estimates and prediction models of the Kidd model for 1-, 3- and 5-year recurrence-free survival, disease-specific survival and overall survival are included in the electronic supplementary material.
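The calculation follows the standard Cox proportional hazards form, S(t | x) = exp(-H0(t) · exp(β·x)), where H0(t) is the baseline cumulative hazard at time t and β·x the linear predictor. The sketch below assumes the supplied baseline hazards are cumulative. The coefficients are hypothetical placeholders, not the Kidd et al. values, and the covariate names and coding are our assumptions; the real parameters are in the supplementary material.

```python
import math
from typing import Dict, Mapping

# HYPOTHETICAL coefficients for illustration only; the actual Kidd et al.
# parameters and covariate coding are in the electronic supplementary material.
HYPOTHETICAL_BETA: Dict[str, float] = {
    "node_pelvic": 0.5,
    "node_para_aortic": 1.0,
    "node_supraclavicular": 1.5,
    "suvmax": 0.02,
    "mtv_cm3": 0.005,
}


def linear_predictor(x: Mapping[str, float], beta: Mapping[str, float]) -> float:
    """beta . x; covariates absent from x are taken as 0 (e.g. no positive node)."""
    return sum(beta[name] * value for name, value in x.items())


def predicted_survival(x: Mapping[str, float], beta: Mapping[str, float],
                       baseline_cum_hazard: float) -> float:
    """Cox model estimate S(t | x) = exp(-H0(t) * exp(beta . x)) at one time point t."""
    return math.exp(-baseline_cum_hazard * math.exp(linear_predictor(x, beta)))
```

Usage, with an equally hypothetical baseline cumulative hazard at 36 months: `predicted_survival({"node_para_aortic": 1, "suvmax": 15.6, "mtv_cm3": 47.1}, HYPOTHETICAL_BETA, baseline_cum_hazard=0.10)`. Positive coefficients raise the hazard, so a para-aortic node or a larger MTV lowers the predicted survival.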

Next, estimated and observed survival were compared: receiver operating characteristic (ROC) analysis was performed for 1-, 3- and 5-year RFS, DSS and OS, and areas under the curve (AUC) with 95% confidence intervals (CI) were calculated. An AUC close to or above 0.7 was considered to indicate sufficient prediction accuracy, as described before [23, 24].
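One way to compare predicted and observed survival at a fixed horizon is sketched below: each patient's outcome is dichotomized at the horizon, and the AUC is computed as the Mann-Whitney probability that a patient with an event had a lower predicted survival than a patient without one. Excluding patients censored before the horizon is our assumption; the study does not state how they were handled. The confidence interval (e.g. via the DeLong method or bootstrapping) is not computed here.

```python
from typing import Optional, Sequence


def outcome_at_horizon(time: float, event: bool, horizon: float) -> Optional[int]:
    """1 = event by the horizon, 0 = known event-free at the horizon,
    None = censored before the horizon (excluded; an assumption)."""
    if event and time <= horizon:
        return 1
    if time >= horizon:
        return 0
    return None


def auc(pred_survival: Sequence[float], outcome: Sequence[Optional[int]]) -> float:
    """Mann-Whitney AUC: chance that a patient with an event had a lower predicted
    survival than a patient without one; ties count one half."""
    cases = [p for p, y in zip(pred_survival, outcome) if y == 1]
    controls = [p for p, y in zip(pred_survival, outcome) if y == 0]
    if not cases or not controls:
        raise ValueError("need patients with and without an event")
    score = 0.0
    for c in cases:
        for k in controls:
            if c < k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))
```

An AUC of 0.5 means the predicted survival fractions rank patients no better than chance, and 1.0 means every patient with an event had a lower predicted survival than every patient without one.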

Risk stratification: low and high risk groups

Of the nine outcome measures studied, we chose 3-year RFS to define “low” and “high” risk groups for treatment decisions such as para-aortic irradiation, as most recurrences occur within 3 years after diagnosis [25]. In addition, we studied 5-year OS as a commonly used oncological outcome measure.

The optimal cutoff value was determined with the Youden index (J) of the ROC curve [26]. J = sensitivity + specificity − 1 combines sensitivity and specificity in a single measure, and the decision threshold is the value at which J is maximized [27]. This threshold, expressed as the chance of an event, was translated into a survival chance by subtracting it from 1. Patients with a predicted survival above this cutoff were considered low risk, and those below it high risk. For each model, a Kaplan-Meier analysis was performed, and differences between low and high risk patients were tested with the log-rank test. Statistical analysis was performed with RStudio (www.r-project.org, version 1.2.335).
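The cutoff search can be sketched directly on the predicted survival fraction, so the result is already a survival chance. In this minimal illustration a patient is classified high risk when predicted survival is at or below the candidate cutoff; sensitivity is the share of patients with an event classified high risk and specificity the share of event-free patients classified low risk. Using the observed predicted values as candidate cutoffs and keeping the first maximum on ties are our choices; software packages may place cutoffs at midpoints instead.

```python
from typing import Sequence, Tuple


def youden_cutoff(pred_survival: Sequence[float], event: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    High risk: predicted survival <= cutoff.
    event: 1 = event before the horizon, 0 = event-free at the horizon.
    """
    n_events = sum(event)
    n_free = len(event) - n_events
    if n_events == 0 or n_free == 0:
        raise ValueError("need both patients with and without an event")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(pred_survival)):
        high = [p <= cutoff for p in pred_survival]
        sens = sum(h and e == 1 for h, e in zip(high, event)) / n_events
        spec = sum((not h) and e == 0 for h, e in zip(high, event)) / n_free
        j = sens + spec - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

With perfectly separated toy data, `youden_cutoff([0.2, 0.35, 0.45, 0.5, 0.7, 0.9], [1, 1, 1, 0, 0, 0])` returns `(0.45, 1.0)`.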

After the cutoff values were determined, we studied the distribution of patients by FIGO 2009 and FIGO 2018 stage in the low and high risk groups.

Results

Patients and tumor characteristics

From an initial cohort of 202 patients with LACC who underwent [18F]FDG-PET/CT, 19 patients were excluded: 6 had distant metastases; 8 had a different tumor type (vaginal carcinoma, sigmoid adenocarcinoma, melanoma, uterine endometrioid adenocarcinoma or uterine carcinosarcoma); 4 had an insufficient tumor-to-background ratio for clear delineation of the tumor volume; and one patient refused treatment. This resulted in a study cohort of 183 patients.

Patient characteristics and tumor types of our cohort and of the cohort analyzed by Kidd et al., together with the retrospectively allocated FIGO 2018 stages, are summarized in Table 1. Our cohort was comparable to the original cohort, except that we did not include patients with FIGO 2009 stage IB1, as we intended to use the prediction model in LACC. Patients with supraclavicular metastases only were classified as stage IVB in our cohort.

Table 1 Patient characteristics and tumor pathology (n = 183) [*average]

Interobserver variability

After automated and adjusted delineation, the mean difference in MTV between the two observers was 15.9 ± 14.2%. No difference in SUVmax was noted. In 3 of the 30 patients, the difference in tumor volume was more than 30%; in these cases, the tumor was either necrotic or had low metabolic activity. We therefore concluded that consensus between the two observers should be reached for necrotic tumors or tumors with low metabolic activity, in which no automated VOI could be created with the set threshold. This was necessary in 19/183 patients (10%).

Prediction model evaluation

In our cohort, the median MTV was 47.1 cm3 [3.0–351.9] and the median SUVmax 15.6 [3.7–60.7]; in the cohort of Kidd et al., the average MTV was 66.4 cm3 [3–535.7] and the average SUVmax 12.4 [2.1–50.4]. In our cohort, the highest [18F]FDG-positive lymph node was pelvic in 37.2%, para-aortic in 17.5% and supraclavicular in 4.4% of the patients, compared with 53%, 18% and 4%, respectively, in the cohort of Kidd et al. No [18F]FDG-positive nodes were seen in 40.9% of our patients, compared with 47% in the cohort of Kidd et al. Data are shown in Table 2.

In our cohort, the observed RFS at 1, 3 and 5 years was 0.87, 0.79 and 0.73, DSS 0.92, 0.85 and 0.76, and OS 0.91, 0.81 and 0.68, respectively.

Table 2 PET-derived parameters and treatment (n = 183) [*average]

Comparison of the observed survival with the model-based survival estimates yielded AUCs between 0.7 and 0.807 for all outcomes except 5-year RFS (0.684) and 5-year OS (0.650); detailed data including the confidence intervals are shown in Table 3.

Considering that the majority of the patients had squamous cell carcinoma (84.1%), we also calculated the AUC for this group only. The AUC was somewhat higher for patients with squamous cell carcinoma compared to the whole cohort for all studied survival categories, see Table 3.

Table 3 Area under the curve and 95% confidence intervals for the estimated versus observed survival in the whole patient group (n = 183) and in patients with squamous cell carcinoma (n = 154)

Risk stratification

The optimal cutoff separating low and high risk patients was a predicted survival chance of 0.43 for 3-year RFS (Fig. 1a) and 0.47 for 5-year OS (Fig. 1b).

Fig. 1

(a) Kaplan-Meier curves of patients with low and high risk for 3-year RFS. (b) Kaplan-Meier curves of patients with low and high risk for 5-year OS

When patients were divided into low and high risk groups, all patients in the high risk group had [18F]FDG-positive para-aortic or supraclavicular nodes, and none of the patients in the low risk group showed [18F]FDG-positive nodes outside the pelvis. The distribution of FIGO 2009 stages was similar, with all stages represented in both risk groups (Table 4). After restaging according to FIGO 2018, the distribution changed. As expected, all patients with [18F]FDG-positive para-aortic nodes (stage IIIC2) fell into the high risk group. In stages IB2–IIIA and IIIB, the majority of patients fell into the low risk group for both outcomes. For patients with stage IIIC1 disease, however, the FIGO 2018 stage could not properly differentiate between low and high risk patients (Table 4; Fig. 2). These observations were valid for all studied outcome measures (data not shown).

Table 4 Characteristics of high and low risk patients for 3-year recurrence free survival and 5-year overall survival
Fig. 2

Distribution of patients with FIGO 2009 and 2018 stages of LACC in the low and high risk groups determined by the Kidd prediction model for 3-year RFS and 5-year OS. Abbreviations: RFS, recurrence-free survival; OS, overall survival; LACC, locally advanced cervical cancer

Discussion

To define low and high risk groups for recurrence in LACC, we tested the accuracy of a previously described PET-based prediction model in our retrospective cohort of 183 consecutive patients with LACC. The observed survival was in line with previously reported survival in LACC [28]. When the estimated survival was compared with the observed survival, the predictive accuracy was sufficient to predict 1-, 3- and 5-year RFS, DSS and OS, with AUC values between 0.650 and 0.807. In patients with squamous cell carcinoma only, the AUC values were slightly higher, ranging from 0.680 to 0.840. These results indicate that the Kidd prediction model is applicable to our patient population with LACC.

Lora et al. previously externally validated the Kidd prediction model [29] and concluded that the 3-year OS and the 1-year DSS predictions were statistically valid, with AUCs of 0.69 and 0.64, respectively. Although the observed survival in the cohort of Lora et al. was similar to ours, the prediction model performed better in our cohort. One explanation is that the validation by Lora et al. was based on the visual nomogram rather than on the Cox equations of the prediction model. A second explanation is that in the cohort of Lora et al., all patients with bulky disease or FIGO 2009 stage ≥ IIB received para-aortic irradiation irrespective of the imaging results, whereas in our cohort, as in that of Kidd et al., para-aortic radiotherapy was only applied in case of suspicious (e.g. [18F]FDG-positive) para-aortic nodes.

One of the challenging decisions in LACC is whether to extend the radiation volume to the para-aortic region. Considering that most recurrences occur within three years, we used the 3-year RFS to evaluate whether the Kidd prediction model could aid the decision to extend the radiotherapy field. Our results suggest that a cutoff of 0.43 for the predicted 3-year RFS could differentiate between high and low risk patients. Looking at the risk profiles, all patients with para-aortic nodes (FIGO 2018 stage IIIC2) fell into the high risk category. This high prognostic risk suggests that in these patients the chance of false-positive nodes on imaging is small, and that the decision should favor irradiation while accepting its toxicity. This is in line with our earlier finding that the certainty of an [18F]FDG-positive node increases with prevalence [30], and with the earlier report that patients with [18F]FDG-positive nodes have a worse prognosis than patients with negative nodes and the same FIGO 2009 stage [7]. Another application of the prediction model could be to select high risk patients for additional treatment strategies. In this setting, patients with a calculated 5-year OS chance below the cutoff of 0.47 should be considered high risk and might benefit from, for example, imaging during follow-up or experimental immunotherapy or targeted therapy.

Based on FIGO 2009 stage, one would expect the high risk group to contain more patients with higher stages. However, all FIGO 2009 stages were represented in both risk categories. This means that FIGO 2009 stage alone is not sufficient to stratify the risk of patients with LACC, which has already been acknowledged by adding imaging to FIGO 2018 staging. After retrospective restaging according to FIGO 2018, the distribution became more skewed, with the risk profile more in line with the FIGO stage. For patients with stage IIIC1 disease, however, FIGO 2018 stage could not properly differentiate between low and high risk. Especially in these patients, treatment choice could be optimized by using the Kidd prediction model. As FIGO 2018 leaves the choice of imaging modality open, our results suggest that [18F]FDG-PET/CT should be performed in LACC, as PET-based parameters are a useful addition to risk stratification.

Our data add value, as prediction models should be tested on several independent datasets [10]. In addition, Lora et al. [29] included patients treated in 1999–2014 and Kidd et al. [9] patients treated in 1998–2008, whereas the patients in our cohort were treated between 2013 and 2018. Our patients received treatment with more contemporary radiotherapy techniques (e.g. IMRT and VMAT), allowing smaller planning target volume margins and simultaneous integrated protection. Furthermore, we showed that the model is applicable within the FIGO 2018 staging system.

Our study has limitations. First, it is retrospective. Second, our cohort is relatively small, with fewer than the recommended 100 events [31] and small groups per stage. However, the sample size is comparable to that of previously reported external validations of prediction models in LACC. Third, we included all tumor types, although other types, e.g. adenocarcinoma, may have different outcomes than squamous cell carcinoma (SCC) in LACC [32]. Indeed, the prediction model performed slightly better in SCC than in the whole group; nevertheless, the model was robust enough across all tumor types of LACC. Finally, we included patients with supraclavicular nodes (stage IVB). This is debatable, as these patients have the worst prognosis in LACC, and their inclusion might reduce the robustness of the prediction model. However, the original model also includes the presence of supraclavicular node metastases.

Conclusion

In this study, we show that an existing prediction model is applicable to our patient population with LACC and that it can identify low and high risk patients. Our data suggest that patients with [18F]FDG-positive para-aortic or supraclavicular nodes (FIGO 2018 stage IIIC2 and IVB) belong to the high risk group. This implies that when such nodes are present, treatment should favor irradiation, and biopsies for pathological confirmation may be considered superfluous in high risk patients. In our population, FIGO 2018 staging was not sufficient for risk stratification, particularly in stage IIIC1. The Kidd prediction model could be a useful addition to clinical decision making in these patients. Combining PET-based prediction models with clinical parameters or other imaging data could further aid decision making in the future.