Interobserver agreement and prognostic impact for MRI–based 2018 FIGO staging parameters in uterine cervical cancer

Objectives To evaluate the interobserver agreement for MRI–based 2018 International Federation of Gynecology and Obstetrics (FIGO) staging parameters in patients with cervical cancer and assess the prognostic value of these MRI parameters in relation to other clinicopathological markers. Methods This retrospective study included 416 women with histologically confirmed cervical cancer who underwent pretreatment pelvic MRI from May 2002 to December 2017. Three radiologists independently recorded MRI–derived staging parameters incorporated in the 2018 FIGO staging system. Kappa coefficients (κ) for interobserver agreement were calculated. The predictive and prognostic values of the MRI parameters were explored using ROC analyses and Kaplan–Meier with log-rank tests, and analyzed in relation to clinicopathological patient characteristics. Results Overall agreement was substantial for the staging parameters: tumor size > 2 cm (κ = 0.80), tumor size > 4 cm (κ = 0.76), tumor size categories (≤ 2 cm; > 2 and ≤ 4 cm; > 4 cm) (κ = 0.78), parametrial invasion (κ = 0.63), vaginal invasion (κ = 0.61), and enlarged lymph nodes (κ = 0.63). Higher MRI–derived tumor size category (≤ 2 cm; > 2 and ≤ 4 cm; > 4 cm) was associated with a stepwise reduction in survival (p ≤ 0.001 for all). Tumor size > 4 cm and parametrial invasion at MRI were associated with aggressive clinicopathological features, and the incorporation of these MRI–based staging parameters improved risk stratification when compared to corresponding clinical assessments alone. Conclusion The interobserver agreement for central MRI–derived 2018 FIGO staging parameters was substantial. MRI improved the identification of patients with aggressive clinicopathological features and poor survival, demonstrating the potential impact of MRI enabling better prognostication and treatment tailoring in cervical cancer. 
Key Points • The overall interobserver agreement was substantial (κ values 0.61–0.80) for central MRI staging parameters in the 2018 FIGO system. • Higher MRI–derived tumor size category was linked to a stepwise reduction in survival (p ≤ 0.001 for all). • MRI–derived tumor size > 4 cm and parametrial invasion were associated with aggressive clinicopathological features, and the incorporation of these MRI–derived staging parameters improved risk stratification when compared to clinical assessments alone. Supplementary Information The online version contains supplementary material available at 10.1007/s00330-022-08666-x.


Introduction
Uterine cervical cancer is the fourth most common cancer among women worldwide, and one of the leading causes of cancer-related deaths, especially in low- and middle-income countries [1]. Cervical cancer is staged according to the International Federation of Gynecology and Obstetrics (FIGO) system [2]. The previous 2009 FIGO classification was primarily based on clinical examinations with limited incorporation of information from additional diagnostic procedures [3]. Thus, cross-sectional imaging findings, though commonly used to guide treatment decisions in high-resource settings, were not included in the staging [4,5]. Recognizing this disparity, the recently revised 2018 FIGO system formally incorporates results from available diagnostic imaging and pathology assessments into stage assignment [2]. The 2018 FIGO system subdivides stage IB into IB1-3 based on tumor size, and assigns lymph node metastases to stage IIIC. Better risk stratification between 2018 FIGO stages than between 2009 FIGO stages has been reported [6–8], and large tumor size, parametrial invasion, and nodal involvement are uniformly reported to predict poor outcome in cervical cancer [6–13].
This study aimed to evaluate the interobserver agreement for MRI-based 2018 FIGO staging parameters at pretreatment MRI in a large cervical cancer patient cohort, and assess the potential prognostic value of these MRI parameters in relation to clinical 2009 FIGO stage and clinicopathological markers.

Patients and study setting
This retrospective study on prospectively collected data was approved by the Regional Committee for Medical Research Ethics (2015/2333/REK vest) with written informed consent at primary diagnosis from all patients.
From May 2002 to December 2017, pelvic MRI was performed as part of clinical routine at primary diagnostic workup in 420 women with histologically confirmed cervical cancer. Four patients had incomplete MRI (n = 2) or missing follow-up data (n = 2), leaving 416 patients eligible for study inclusion. All patients were diagnosed and treated at Haukeland University Hospital. Clinical data (e.g., clinical tumor size and 2009 FIGO stage) were registered. Patients originally staged according to the 1994 FIGO system were later restaged based on the 2009 FIGO staging criteria. Histopathological variables and follow-up data were collected from the medical records. Progression was defined as local recurrence/progression in the pelvis or new metastases in the abdomen or at distant sites, confirmed by clinical examination with biopsy, or by imaging (computed tomography (CT), MRI, and/or 18F-fluorodeoxyglucose positron emission tomography with CT (FDG-PET/CT)). Patients presenting with new imaging findings regarded as highly likely to represent progression (e.g., growth of known tumor mass or new lesions/new FDG-PET positive lesions in patients without previous history of other malignancies as potential origins of metastases) were categorized as recurrence (even without histological verification). Imaging findings regarded as unsure or possible (but not indicative of) progression, in patients without a positive biopsy, were categorized as no recurrence. Date of last follow-up was September 2021. During the follow-up, 89 patients experienced progression with a median (mean) [interquartile range, IQR] time to progression of 11 (16) [7–24]

MRI protocol
Pelvic MRI was acquired on scanners from different manufacturers (GE Healthcare, Siemens Healthineers, Philips Healthcare), comprising 1.5-T (329/416 patients) or 3.0-T (87/416 patients) systems, at five hospitals in Western Norway. The imaging protocols and scanning parameters varied across scanners and institutions, reflecting current guidelines and local preferences. All examinations were, however, dedicated pelvic protocols largely in accordance with European Society of Urogenital Radiology (ESUR) guidelines for MRI staging of cervical cancer [14]. As a minimum, the protocols included axial and/or axial oblique (perpendicular to the long axis of the uterine cervix), sagittal, and coronal and/or coronal oblique (parallel to the long axis of the cervix) T2-weighted (T2W) sequences in addition to an axial T1-weighted (T1W) sequence of the pelvis. In total, 66% (273/416) of the examinations included a pelvic diffusion-weighted imaging (DWI) sequence. During the study period, contrast-enhanced T1W series were not routinely included in the MRI protocols and thus were only performed in 10% (40/416) of the patients. A detailed overview of MRI acquisition parameters in a subset of the patients (n = 123) is given in Suppl. Table 1.

Image analysis
The MRI examinations were de-identified and reviewed independently by three radiologists blinded to clinical and histopathologic information. Reader 1 (N.L.), 2 (K.W.L.), and 3 (I.J.M.) were consultants from the same institution with 5, 10, and 20 years of experience, respectively, with pelvic MRI. The readers reported MRI findings relevant for 2018 FIGO staging [25] in a standardized form including both continuous and categorical variables. Maximum tumor diameters and depth of parametrial invasion were measured regardless of plane on T2W images (Fig. 1) and later categorized (≤ 2 cm; > 2 and ≤ 4 cm; > 4 cm; absence/presence of parametrial invasion). Patients with no visible tumor were recorded with maximum tumor size ≤ 2 cm. Regardless of tumor visibility on MRI, all images were analyzed systematically for relevant findings (e.g., enlarged lymph nodes). Imaging findings suggesting vaginal invasion (upper two-thirds or lower third), bladder/rectum or pelvic sidewall invasion, hydroureter (indicative of hydronephrosis), and enlarged pelvic/paraaortic lymph nodes suspicious of metastases were assessed on T2W images, supported by T1W and DWI sequences when available (Fig. 1). Diagnostic criteria of parametrial invasion were full-thickness cervical stroma invasion co-occurring with spiculated or nodular tumor-to-parametrium interface and/or encasement of parametrial vessels. Vaginal invasion was defined as tumor disruption of the vaginal wall, and bladder/rectum involvement was diagnosed when the bladder or rectal wall was interrupted with tumor nodules in the mucosa. Pelvic sidewall invasion was defined as tumor extending into the iliac vessels, internal obturator, piriformis, or levator ani muscles. Pelvic/paraaortic lymph nodes were considered suspicious of metastases if they had > 1 cm short-axis diameter [26].
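The categorization rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and category labels are hypothetical and not taken from the study, and a missing diameter (`None`) is used here to stand for "no visible tumor", which the text records as ≤ 2 cm.

```python
def size_category(max_diameter_cm):
    """Map a maximum tumor diameter (cm) to the three size categories in the text.

    Hypothetical helper: None stands for 'no visible tumor', which the
    text records as maximum tumor size <= 2 cm.
    """
    if max_diameter_cm is None or max_diameter_cm <= 2:
        return "<=2 cm"
    if max_diameter_cm <= 4:
        return ">2 and <=4 cm"
    return ">4 cm"


def node_suspicious(short_axis_cm):
    """Flag a lymph node as suspicious if its short-axis diameter exceeds 1 cm."""
    return short_axis_cm > 1.0


if __name__ == "__main__":
    for diameter in (None, 1.5, 4.0, 4.6):
        print(diameter, "->", size_category(diameter))
    print("1.2 cm node suspicious:", node_suspicious(1.2))
```

Note that the boundaries are inclusive on the lower category (a 4.0 cm tumor falls in "> 2 and ≤ 4 cm"), matching the category definitions quoted in the text.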
To establish the overall imaging findings based on the recordings by all three readers, "consensus reading" variables were generated using the median values recorded for the continuous variables and the category recorded by the majority for the dichotomous variables.
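As a minimal sketch of how such consensus variables could be derived from three readers' recordings, the snippet below takes the median of a continuous measurement and the majority call for a dichotomous finding. The function names and the example values are hypothetical and serve only to illustrate the rule stated above.

```python
from statistics import median


def consensus_continuous(reader_values):
    """Consensus for a continuous variable: the median across readers."""
    return median(reader_values)


def consensus_dichotomous(reader_calls):
    """Consensus for a yes/no variable: the call made by the majority of readers.

    Assumes an odd number of readers (three in the text), so ties cannot occur.
    """
    positives = sum(1 for call in reader_calls if call)
    return positives > len(reader_calls) / 2


if __name__ == "__main__":
    # Hypothetical recordings for one patient (tumor size in cm, parametrial invasion)
    print(consensus_continuous([3.8, 4.2, 4.5]))        # median of the three sizes
    print(consensus_dichotomous([True, False, True]))   # two of three readers positive
```

With three readers the median is simply the middle value, so a single outlying measurement does not shift the consensus size.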
To ensure a common understanding of the image reading criteria applied, the readers and an expert in gynecologic cancer imaging (I.S.H.) independently filled in the registration form for five randomly selected pilot cases prior to the review of the entire patient cohort. Disagreements in interpretation were discussed to reach a consensus.
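For readers unfamiliar with kappa statistics, the following sketch shows one way agreement among several independent readers can be quantified. It is an assumption-laden illustration: the text does not state which kappa variant was used, so Fleiss' kappa is shown here as one common choice for three raters, and the verbal labels follow the widely used Landis and Koch cut-offs, which match the wording "fair", "moderate", and "substantial" used in this paper. The data are invented.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-patient lists (one categorical call per reader).

    Every patient must be rated by the same number of readers.
    Undefined (division by zero) if all calls fall in a single category.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for calls in ratings:
        counts = Counter(calls)
        category_totals.update(counts)
        # Proportion of agreeing reader pairs for this patient
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    observed = agreement_sum / n_subjects
    expected = sum(
        (total / (n_subjects * n_raters)) ** 2 for total in category_totals.values()
    )
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Verbal label for a kappa value (Landis and Koch cut-offs, assumed here)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical calls (1 = finding present) from three readers in four patients
    toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
    k = fleiss_kappa(toy)
    print(round(k, 3), agreement_label(k))
```

Under these cut-offs, every κ between 0.61 and 0.80 is labeled "substantial".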
To compare the diagnostic performance of the different imaging parameters for prediction of disease-specific death at 5 years after primary diagnosis, time-dependent receiver operating characteristic (ROC) analyses were used. The prognostic value of the different imaging parameters was explored using the Cox proportional hazard model and Kaplan-Meier with log-rank tests. Chi-square test was used to analyze the imaging parameters in relation to clinicopathological patient characteristics. Test of equal area under the ROC curves (AUC) among the three readers and the consensus reading, and among the different MRI-derived staging parameters (consensus reading), was performed using 6 and 10 pairwise comparisons of AUCs, respectively. p values were adjusted according to the Holm-Bonferroni method, yielding significance levels less than 0.008 (0.05/6) and 0.005 (0.05/10), respectively. All other p values were considered significant when less than 0.05 (two-sided). The data were analyzed using R 4.0.3 (timeROC package [28], R Core Team 2020 [29]), SPSS 26.0 (IBM Corp.), and STATA 16.1 (StataCorp).

(Table 3). For the remaining staging parameters, agreement was only moderate or fair.
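The Holm-Bonferroni adjustment mentioned for the pairwise AUC comparisons can be sketched as follows. This is a generic illustration of the step-down procedure, not the authors' code; the function name and the example p values are hypothetical. The smallest p value is tested against α/m, which gives the quoted thresholds of 0.05/6 ≈ 0.008 and 0.05/10 = 0.005.

```python
from math import comb


def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list of booleans (same order as the input) indicating which
    hypotheses are rejected at family-wise level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p value (rank = k - 1) is compared with alpha / (m - k + 1)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p values are retained
    return reject


if __name__ == "__main__":
    # Three readers plus the consensus reading give comb(4, 2) pairwise comparisons
    print("pairs among 4 readings:", comb(4, 2))
    print("strictest threshold for 6 tests:", round(0.05 / 6, 4))
    # Hypothetical p values for illustration only
    print(holm_reject([0.001, 0.04, 0.03, 0.2]))
```

Only the first comparison is held to the strictest threshold; later comparisons are tested against progressively larger thresholds, which is what makes Holm's procedure less conservative than a plain Bonferroni correction.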

Imaging parameters and prediction of survival
Time-dependent ROC curves for predicting disease-specific death at 5 years for the different MRI-derived staging parameters (consensus reading) yielded AUCs ranging from 0.61 to     (Fig. 4). The MRI-derived staging parameters large tumor size (in three categories: ≤ 2 cm; > 2 and ≤ 4 cm; > 4 cm), parametrial invasion, vaginal invasion, enlarged lymph nodes suggestive of metastases, and bladder/rectum invasion were associated with reduced disease-specific survival (p < 0.001 for all) (Table 4). However, in a multivariable model including the same imaging variables, only tumor size and bladder/rectum invasion independently predicted poor survival, whereas only tumor size remained significant when adjusting for patient age, histologic type, and primary treatment received (Table 4). When grouping patients according to tumor size categories (≤ 2 cm; > 2 and ≤ 4 cm; > 4 cm), higher tumor size category yielded a stepwise reduction in disease-specific and progression-free survival (p ≤ 0.001 for all) (Fig. 2b and Suppl. Figure 1b).
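To make the survival comparisons above more concrete, the sketch below implements the Kaplan-Meier product-limit estimator for a single group. It is a didactic illustration only: the study itself reports using R, SPSS, and STATA, the function name is hypothetical, and the follow-up times and event indicators are invented toy data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g., disease-specific death) occurred, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data: months of follow-up and event indicator for five hypothetical patients
    for t, s in kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0]):
        print(t, round(s, 3))
```

Computing one such curve per tumor size category and comparing them with a log-rank test is the kind of analysis behind the stepwise survival differences reported above.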

MRI-derived assessments of tumor size > 4 cm and parametrial invasion refine prognostication
Patients with MRI-derived tumor size > 4 cm or parametrial invasion were more frequently diagnosed with squamous histology (Table 5). In patients with recordings on clinical tumor size (≤/> 4 cm) (n = 230), 75% (172/230) had the same tumor size category on MRI (Table 5). Furthermore, patients with clinical tumor size ≤ 4 cm but MRI-based tumor size > 4 cm had lower disease-specific and progression-free survival than patients with both clinical and MRI-derived tumor size ≤ 4 cm (p < 0.001) (Fig. 2c and Suppl. Figure 1c).

Discussion
Since 2018, staging information from diagnostic imaging has been formally incorporated in the FIGO system for cervical cancer, and routinely guides choice of treatment. We observed substantial interobserver agreement for most MRI-derived staging parameters, supporting the robustness of MRI staging in the 2018 FIGO system. Large MRI-measured tumor size, using the 2018 FIGO size categories, was associated with a stepwise reduction in disease-specific and progression-free survival, confirming the strong prognostic impact of tumor size. Thus, this study demonstrates that the substantial interobserver agreement of MRI at primary diagnostic work-up in cervical cancer can translate into better prognostication, which is promising for the role of MRI in treatment tailoring. Subjectivity in image interpretation may lead to variability that affects overall test reproducibility [30], and the interobserver agreement for important imaging findings is critical for the validity of an imaging method [31]. To our knowledge, this is the largest and most comprehensive study on interobserver agreement for pelvic MRI staging parameters in cervical cancer to date. Interestingly, maximum tumor size was the parameter yielding the highest interobserver agreement (overall κ = 0.76–0.80 for different size categories), which is higher than that reported (κ = 0.46) in a previous smaller (n = 152) MRI study [21]. For parametrial invasion, vaginal invasion, and enlarged lymph nodes, we also identified substantial interobserver agreement (overall κ = 0.63, κ = 0.61, and κ = 0.63, respectively), which is within the wide range previously reported (κ = 0.36–0.90) [19,22–24].
Of note, former studies assessing interobserver reproducibility for MRI-based cervical cancer staging parameters have used surgicopathological findings as reference standard, thus    [19,21–24]. Hence, the lower prevalence of positive staging parameters for advanced FIGO stages in these studies makes these interobserver agreement metrics not necessarily comparable with those of the present study. Nevertheless, it seems reasonable to conclude that the interobserver agreement for central MRI staging parameters in this study is within the higher range of that previously reported. Importantly, clinical staging by pelvic examination under anesthesia reportedly yields lower agreement for assessing tumor size (κ = 0.42), parametrial invasion (κ = 0.31–0.43), and vaginal invasion (κ = 0.47–0.57) [32], thus indicating that MRI staging is more reproducible than clinical staging. The inclusion of nodal status in the 2018 FIGO update reflects the importance of lymph node metastases as a pivotal prognostic factor and determinant of treatment algorithm in cervical cancer [6,15,33]. Notably, in the present study, MRI-assessed enlarged lymph nodes suggestive of metastases predicted reduced disease-specific survival in univariable analysis (p < 0.001) but showed only a nonsignificant trend in the same direction in multivariable analysis (p = 0.12). MRI has known limitations in diagnostic accuracy for diagnosing lymph node metastases and is reportedly being surpassed by FDG-PET/CT [34]. Limitations in accuracy of MRI may be due to the size criterion employed for pathologic lymph nodes, which by definition misses the smaller lymph node metastases, and to challenges in distinguishing metastatic enlarged nodes from hyperplastic enlarged nodes [14,35]. Thus, the lack of an independent prognostic impact of enlarged lymph nodes in this study may be explained by limited accuracy of MRI for lymph node staging.
Importantly, although pathology is regarded as the reference standard for diagnosing lymph node metastases, the FIGO 2018 system allows the use of imaging, as it is non-invasive and easier to perform than surgical lymph node sampling [14,25].
The updated 2018 FIGO stage IB comprises three subgroups (IB1-3) for tumor size ≤ 2 cm, > 2 and ≤ 4 cm, and > 4 cm, respectively [25]. Interestingly, we found that a higher tumor size category was linked to a stepwise reduction in disease-specific survival (p ≤ 0.001 for all) and that tumor size > 4 cm yielded the highest AUC (AUC = 0.77) among all MRI staging parameters for the prediction of disease-specific death. These findings are consistent with the growing body of literature uniformly reporting a strong association between large tumor size and poor prognosis in cervical cancer [6,8–13].
Notably, tumor size > 4 cm and parametrial invasion at MRI were associated with aggressive clinicopathological features, and patients clinically staged as negative for these findings but with positive MRI findings had significantly reduced survival. Furthermore, incorporating MRI-derived information on tumor size (≤/> 4 cm) and parametrial invasion into the clinical 2009 FIGO staging would have resulted in an upstaging of 32% (50/155) and 24% (70/296) of the patients, respectively. Importantly, the evaluation of tumor size [20] and parametrial invasion [16] by MRI reportedly yields higher agreement with pathology than clinical assessment in cervical cancer, supporting that staging by MRI produces more accurate stage designation than clinical staging.
This study has some limitations. First, the MRI examinations were performed during 2002-2017 using various scanners and protocols, which may have affected our results. However, the demonstrated robustness of MRI for staging and prognostication despite these technical variations makes it more likely that our findings are generalizable, and that this study set-up more accurately mimics the value of MRI in a standard setting. Second, the study of interobserver reliability could have been more extensive and ideally included more readers with variable levels of expertise from different institutions. Third, we did not assess intraobserver variability, which is normally lower than the interobserver variability. Lastly, since a large proportion of the patients did not undergo surgery, our study is based on the assessment of agreement without a surgicopathological reference standard, and is hence not indicative of diagnostic accuracy.
In summary, substantial interobserver agreement of MRI-based 2018 FIGO staging parameters supports the robustness of MRI staging in the 2018 FIGO system. Furthermore, the inclusion of MRI staging parameters into stage assignment yields refined risk stratification compared with former clinical 2009 FIGO staging. This study thus demonstrates the potential impact of MRI enabling better prognostication and treatment tailoring in cervical cancer.