Non-contrast MRI can accurately characterize adnexal masses: a retrospective study

Objective: To determine the accuracy of interpretation of a non-contrast MRI protocol in characterizing adnexal masses.
Methods and materials: Two hundred ninety-one patients (350 adnexal masses) who underwent gynecological MRI at our institution between the 1st of January 2008 and the 31st of December 2018 were reviewed. A random subset (102 patients with 121 masses) was chosen to evaluate the reproducibility and repeatability of readers’ assessments. Readers evaluated non-contrast MRI scans retrospectively, assigned a 5-point score for the risk of malignancy and gave a specific diagnosis. The reference standard for the diagnosis was histopathology or at least one-year imaging follow-up. Diagnostic accuracy of the non-contrast MRI score was calculated. Inter- and intra-reader agreement was analyzed with Cohen’s kappa statistics.
Results: There were 53/350 (15.1%) malignant lesions in the whole cohort and 20/121 (16.5%) malignant lesions in the random subset. Good agreement between readers was found for the non-contrast MRI score (κ = 0.73, 95% confidence interval [CI] 0.58–0.86) whilst the intra-reader agreement was excellent (κ = 0.81, 95% CI 0.70–0.88). The non-contrast MRI score value of ≥ 4 was associated with malignancy with a sensitivity of 84.9%, a specificity of 95.9%, an accuracy of 94.2% and a positive likelihood ratio of 21 (area under the receiver operating curve 0.93, 95% CI 0.90–0.96).
Conclusion: Adnexal mass characterization on MRI without the administration of contrast medium has a high accuracy and excellent inter- and intra-reader agreement. Our results suggest that non-contrast studies may offer a reasonable diagnostic alternative when the administration of intravenous contrast medium is not possible.
Key Points
• A non-contrast pelvic MRI protocol may allow the characterization of adnexal masses with high accuracy.
• The non-contrast MRI score may be used in clinical practice for differentiating benign from malignant adnexal lesions when the lack of intravenous contrast medium precludes analysis with the O–RADS MRI score.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00330-021-07737-9.


Introduction
The accurate characterization of adnexal masses is critical to guide appropriate patient management. Ultrasound (US) is the primary imaging modality in women with a clinically suspected adnexal mass, with an accuracy of 82-92% [1]. However, approximately 5-20% of adnexal masses remain uncharacterized following US [2,3]. For these indeterminate masses, although short-term follow-up is an option, MRI is the imaging modality of choice for rapid characterization [4][5][6]. Previous papers with differing MRI protocols reported excellent accuracies, ranging from 88 to 93%, for the diagnosis of malignancy [7][8][9].
In 2013, Thomassin-Naggara et al published the ADNEX MR scoring system for risk stratification in adnexal masses [10]. This score proposes a uniform dynamic contrast-enhanced (DCE) MRI protocol and standardized interpretation to be used across centers. Following its publication, the ADNEX scoring system was tested on 1340 women in a prospective multicenter clinical study and integrated into the O-RADS MRI scoring system [11]. The O-RADS MRI score was found to have a sensitivity of 93% and specificity of 91% for stratifying the risk of malignancy in adnexal masses [11]. It is the most comprehensive guidance in the current literature for the characterization of adnexal masses on MRI.
The O-RADS MRI score relies on intravenous gadolinium-based contrast agents (GBCAs) for the assessment of the dynamic enhancement curve [11]. A non-contrast MRI study, therefore, yields an O-RADS MRI score of 0 (incomplete study) [12]. Although the assessment of an adnexal mass using the O-RADS MRI score is recommended by the American College of Radiology [12], there may be situations where avoiding contrast-enhanced MRI is preferable due to logistical and patient factors. Acquisition of the DCE protocol proposed by Thomassin-Naggara et al [10,11] significantly extends the MRI examination, which may be a challenge in certain patients or clinical scenarios. Although Pereira et al recently proposed a simplified dynamic MRI protocol including 5 post-contrast phases with 30-s delays [13,14], off-line post-processing of dynamic imaging still contributes significantly to the workload. In addition, there are significant concerns regarding the administration of GBCAs in relation to the development of nephrogenic systemic fibrosis and the potential impact of long-term gadolinium retention in a range of tissues and organs [15]. Recently, the Royal College of Radiologists UK published their position statement and recommendations emphasizing that GBCAs should only be used when essential diagnostic information cannot be obtained with unenhanced scans [16,17].
Therefore, the aim of our study was to evaluate the performance of a non-contrast gynecological MRI protocol in the characterization of adnexal masses and analyze the reproducibility and repeatability.

Patients and study setting
This single-institution retrospective study was approved by the institutional review board, with the need for informed consent for data analysis waived. Reports of all consecutive gynecological MRI scans from the 1st of January 2008 to the 31st of December 2018 (n = 8242) were reviewed. Inclusion criteria were as follows: (1) adult female patients with gynecological MRI performed for adnexal mass characterization or follow-up as recorded on Picture Archiving and Communications System using standard imaging sequences of the female pelvis (Supplementary Table S1), (2) confirmed histopathological diagnosis or at least one-year stability on imaging follow-up. The study flow is summarized in Fig. 1. The final cohort included 291 patients with 350 adnexal masses. From this cohort, 102 patients with 121 adnexal masses stratified for their malignancy status were chosen randomly, to evaluate reproducibility and repeatability of radiologists' assessments. The patients' electronic health records were reviewed and CA125 levels were noted, if available. All patients were diagnosed, treated or followed up in the same gynecology oncology department which is a specialist cancer center for gynecological malignancies.

MRI protocol
MRI examinations were performed on 1.5-T (MR HDx, MR450 Discovery, MR 450 W Optima) and 3-T (MR750 Discovery) MRI systems (all GE Healthcare) using 8-32 channel phased array body coils. Unless contraindicated, hyoscine butylbromide (Buscopan®, Sanofi), 20 mg, was administered i.v. prior to imaging to reduce peristaltic movement. The MRI protocol for characterization of adnexal masses included sagittal, axial and coronal T2-weighted fast spin-echo sequences and axial T1-weighted gradient-echo sequences with and without fat suppression (LAVA-Flex implementation of the Dixon method), followed by diffusion-weighted imaging (DWI) with b values of 0 and 800-1000 s/mm² (Supplementary Table S1). Apparent diffusion coefficient (ADC) maps were calculated. The MRI protocol used in this study included only non-contrast sequences; post-contrast imaging was not available and complementary use of gadolinium was not assessed.
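The text does not state how the ADC maps were generated. As a hedged illustration of how an ADC value follows from the two b values of such a protocol, the sketch below applies the standard mono-exponential model, ADC = ln(S0/Sb)/b, to a single voxel. The function name `adc_from_two_b_values` and the example signal values are hypothetical and are not taken from the study.

```python
import math


def adc_from_two_b_values(s0: float, sb: float, b: float) -> float:
    """Return the ADC (mm^2/s) of one voxel under a mono-exponential model.

    s0: signal at b = 0
    sb: signal at the high b value
    b:  high b value in s/mm^2 (800-1000 in the protocol described above)
    """
    if s0 <= 0 or sb <= 0 or b <= 0:
        raise ValueError("signals and b value must be positive")
    # Mono-exponential decay: sb = s0 * exp(-b * ADC)  =>  ADC = ln(s0 / sb) / b
    return math.log(s0 / sb) / b


# Hypothetical voxel whose signal halves between b = 0 and b = 800 s/mm^2.
adc = adc_from_two_b_values(s0=1000.0, sb=500.0, b=800.0)
print(f"ADC = {adc:.2e} mm^2/s")
```

With only two b values the estimate reduces to this closed form; no curve fitting is involved.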

Image interpretation and analysis
Two consultant radiologists, with 6 and 8 years of experience in gynecological imaging, took part in image interpretation. Reader 1 assessed the whole cohort (n = 291 patients) whilst readers 1 and 2 assessed the random subset (n = 102 patients) independently following a 2-month interval to avoid recall bias. The cohort sizes were determined by power analysis to ensure a sufficient number of patients to assess reproducibility. When multiple masses were present, each lesion was described separately. The readers were blinded to all information except age and CA125 levels.
Morphological features and DWI signal intensity (SI) of the adnexal mass and important accompanying features (ascites, lymphadenopathy or peritoneal implants) were re-evaluated. Morphological MRI features were the presence of a simple cystic mass, purely endometriotic mass, fatty mass, solid mass, multiple septations, thick or irregular septations and solid tissue in the mass. Lymphadenopathy was defined as the enlargement of lymph nodes in the short axis more than 8 mm in the pelvis and 10 mm in the para-aortic region. Solid tissue suspicious for malignancy was defined as tissue within the adnexal mass displaying intermediate SI on T2-weighted images, low SI on T1-weighted images with corresponding restricted diffusion. True diffusion restriction was defined qualitatively as high SI on high b value DWI images and low SI on ADC map. SI of the solid tissue on T2-weighted images and ADC map was defined relative to skeletal muscles whilst on DWI, it was compared to cerebrospinal fluid.
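The qualitative definitions above can be written down as simple predicates. The following is a minimal sketch only: the readers made these judgments visually, and the string labels ("low", "intermediate", "high"), the region names and all function names are a hypothetical encoding introduced here for illustration.

```python
# Short-axis thresholds in mm, as defined in the text above.
LYMPH_NODE_THRESHOLD_MM = {"pelvic": 8.0, "para-aortic": 10.0}


def is_lymphadenopathy(short_axis_mm: float, region: str) -> bool:
    """Node counts as enlarged if its short axis exceeds the regional threshold."""
    if region not in LYMPH_NODE_THRESHOLD_MM:
        raise ValueError(f"unknown region: {region!r}")
    return short_axis_mm > LYMPH_NODE_THRESHOLD_MM[region]


def is_true_diffusion_restriction(high_b_dwi_si: str, adc_si: str) -> bool:
    """High SI on high b value DWI together with low SI on the ADC map."""
    return high_b_dwi_si == "high" and adc_si == "low"


def is_suspicious_solid_tissue(t2_si: str, t1_si: str,
                               high_b_dwi_si: str, adc_si: str) -> bool:
    """Intermediate T2 SI, low T1 SI and corresponding restricted diffusion."""
    return (t2_si == "intermediate"
            and t1_si == "low"
            and is_true_diffusion_restriction(high_b_dwi_si, adc_si))


# Hypothetical findings for one lesion.
print(is_lymphadenopathy(9.0, "pelvic"))                               # 9 mm > 8 mm
print(is_suspicious_solid_tissue("intermediate", "low", "high", "low"))
```

Requiring both a high DWI signal and a low ADC signal keeps T2 shine-through (high on both) from being counted as restriction.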
Readers assigned a score to each adnexal mass using the proposed non-contrast MRI score (Fig. 2). The ADNEX MR [10] and O-RADS [11] scores informed this 5-point scale. We proposed the following non-contrast MRI score (Table 1): 1 = no adnexal mass present; 2 = benign/likely benign and a specific diagnosis was assigned (e.g. endometrioma, dermoid or ovarian fibroma); 3 = indeterminate; 4 = suspicious for malignancy and 5 = highly suspicious for malignancy, i.e. at least one other feature (such as peritoneal implant, ascites or lymphadenopathy) in addition to score 4.
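For readers who wish to record the score in a structured form, the five categories and the step from score 4 to score 5 can be sketched as below. This is an illustrative encoding under stated assumptions, not the authors' implementation: the enum and function names are hypothetical, the decision between scores 2, 3 and 4 (Fig. 2, Table 1) is left to the reader's morphological assessment and is not modeled, and the upgrade rule is a literal reading of the definition of score 5.

```python
from enum import IntEnum


class NonContrastMRIScore(IntEnum):
    NO_MASS = 1
    BENIGN_OR_LIKELY_BENIGN = 2
    INDETERMINATE = 3
    SUSPICIOUS = 4
    HIGHLY_SUSPICIOUS = 5


def upgrade_for_accompanying_features(
    score: NonContrastMRIScore,
    ascites: bool = False,
    lymphadenopathy: bool = False,
    peritoneal_implants: bool = False,
) -> NonContrastMRIScore:
    """Score 5 = score 4 plus at least one accompanying feature.

    Assumption: only a score of 4 is upgraded; other scores are returned
    unchanged, following the wording "in addition to score 4".
    """
    has_feature = ascites or lymphadenopathy or peritoneal_implants
    if score == NonContrastMRIScore.SUSPICIOUS and has_feature:
        return NonContrastMRIScore.HIGHLY_SUSPICIOUS
    return score


# Hypothetical lesion: suspicious mass with ascites.
final = upgrade_for_accompanying_features(NonContrastMRIScore.SUSPICIOUS, ascites=True)
print(final.name, int(final))
```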
In addition to scoring each adnexal mass, readers assigned a specific diagnosis according to their individual evaluation. Agreement between the final and reader-assigned diagnosis for the whole cohort and random subset was analyzed.

Reference standard
Histopathological diagnosis or imaging follow-up for at least 1 year served as the reference standard. Final diagnoses at histopathology were categorized into normal ovary, benign, borderline or malignant disease.

Statistical analysis
Categorical variables were expressed as absolute numbers and percentages. Continuous variables were described either as median and interquartile range or mean and standard deviation, according to their distribution. The t test or the Mann-Whitney test for continuous variables and the chi-squared or Fisher's exact test for categorical variables were used to compare MRI features. Receiver operating characteristic (ROC) curves, the areas under the curves (AUC) and all conventional measures for diagnostic test accuracy were calculated to assess the non-contrast MRI score's prediction of the reference standard. The final diagnosis was grouped as a binary variable and borderline disease was included in the malignant group.
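The conventional accuracy measures named above can be sketched as follows, using the cut-off of score ≥ 4 applied in this study. The software used by the authors is not stated, so this is only an illustration: the function names are hypothetical, the ten lesions are invented toy data rather than study data, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation, which may differ from the authors' method.

```python
from typing import Dict, Sequence


def diagnostic_accuracy(scores: Sequence[int], malignant: Sequence[bool],
                        cutoff: int = 4) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and PLR for score >= cutoff."""
    if len(scores) != len(malignant):
        raise ValueError("scores and malignant must have the same length")
    tp = fp = tn = fn = 0
    for score, is_malignant in zip(scores, malignant):
        test_positive = score >= cutoff
        if test_positive and is_malignant:
            tp += 1
        elif test_positive:
            fp += 1
        elif is_malignant:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign lesion")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # PLR is unbounded when there are no false positives.
    plr = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(scores),
        "plr": plr,
    }


def auc_from_ordinal_scores(scores: Sequence[int], malignant: Sequence[bool]) -> float:
    """AUC as the probability that a malignant lesion outscores a benign one
    (ties count one half)."""
    pos = [s for s, m in zip(scores, malignant) if m]
    neg = [s for s, m in zip(scores, malignant) if not m]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign lesion")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (not from the study); borderline disease would be coded as True.
toy_scores = [2, 2, 2, 2, 3, 4, 4, 5, 2, 4]
toy_malignant = [False, False, False, False, False, True, True, True, True, False]
metrics = diagnostic_accuracy(toy_scores, toy_malignant)
auc = auc_from_ordinal_scores(toy_scores, toy_malignant)
print(metrics, auc)
```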
According to a pre-defined cut-off value of score 4, the non-contrast MRI score was dichotomized (score ≥ 4 as malignant) to evaluate sensitivity, specificity, accuracy and positive likelihood ratio (PLR). To evaluate inter-reader and intra-reader agreement, Cohen's kappa (κ) coefficients and weighted κ coefficients were computed for ordinal variables. Where appropriate, 95% confidence intervals (CI) were calculated. As described above, a random sample of 102 patients was used to assess the agreement between two raters. Power analysis was

Study population and adnexal mass characteristics
Overall, 1158 patients underwent gynecological MRI between 2008 and 2018 for characterization or follow-up of an adnexal lesion in our institution. A total of 291 patients were included in the study after exclusions shown in Fig. 1. Patient and lesion characteristics are listed in Table 2.
In total, 250/291 (85.9%) patients underwent surgery; the remainder were followed up with US and/or MRI over a median period of 21 months (interquartile range 15-28 months) (Table 2).
Amongst the benign masses (n = 297), 243 (81.8%) were confirmed surgically whilst 54 (18.2%) were defined as benign on the basis of stability (n = 26 masses) or resolution (total resolution in 15, decrease in size in 13 masses) on imaging follow-up.

Assessment of individual MRI features
For score 2 (benign/likely benign) lesions, 184/281 (65.4%) were purely endometriotic, fatty or simple cystic masses. The distribution of individual MRI features amongst lesions is given in Table 4.

Failure analysis
The failure analysis of the cases which were incorrectly classified on imaging by either reader revealed that false-negative diagnoses occurred exclusively in lesions with mucinous and serous cystic fluid and multiple septations. False-positive diagnoses occurred primarily in purely solid masses. The association of highly discriminatory imaging features with the clinical parameters and the reference standard diagnosis of malignancy is illustrated in Supplementary Fig S2.

Comparison between final diagnosis and specific reader diagnosis
For the whole cohort and random subset, agreement between the final diagnosis and specific diagnosis of the readers was excellent (91.4-95.9%) (Table 5). For the whole cohort, 320/350 (91.4%) masses were given the correct specific diagnosis. The readers were not able to give a specific diagnosis in 2 lesions (1 follicular lymphoma, 1 benign germ cell tumor without fat component). The inter-reader and intra-reader agreement for the specific diagnosis of the adnexal mass was excellent at 96.7% (κ = 0.88, 95% CI 0.76-0.97). Only 4/121 masses were given different diagnoses in the random subset (2 borderline tumors, 1 benign stromal tumor and 1 benign germ cell tumor).
[Table footnotes: TOA, tubo-ovarian abscess. *Unless otherwise specified, data are numbers of masses, with percentages in parentheses. # Benign non-ovarian cyst refers to paraovarian cyst or peritoneal inclusion cyst. § Two adnexal masses that were scored as primary adnexal masses proved to be metastases after histopathological assessment.]
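The κ statistics reported in this section correct the raw agreement rate for agreement expected by chance. A minimal sketch of Cohen's κ for two readers follows, with an optional weighted variant for ordinal scores. The weighting scheme used in the study is not stated, so the linear weights here are an assumption, and the function name and the eight paired ratings are hypothetical toy data.

```python
from typing import Hashable, Sequence


def cohen_kappa(reader_a: Sequence[Hashable], reader_b: Sequence[Hashable],
                categories: Sequence[Hashable], weighted: bool = False) -> float:
    """Cohen's kappa; with weighted=True, linear disagreement weights are used
    (assumption) and `categories` must be listed in ordinal order."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(reader_a)

    # Cross-tabulate the paired ratings.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(reader_a, reader_b):
        table[index[a]][index[b]] += 1
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    observed = sum(disagreement(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k)) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both readers use one category")
    return 1.0 - observed / expected


# Toy paired scores (not from the study).
first_read = [2, 2, 3, 4, 5, 2, 4, 3]
second_read = [2, 2, 3, 4, 4, 2, 5, 2]
kappa = cohen_kappa(first_read, second_read, categories=[2, 3, 4, 5])
kappa_w = cohen_kappa(first_read, second_read, categories=[2, 3, 4, 5], weighted=True)
print(round(kappa, 3), round(kappa_w, 3))
```

In this toy example the weighted value is higher than the unweighted one because every disagreement is between adjacent score categories, which linear weights penalize only partially.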

Discussion
This study evaluated the accuracy of characterizing adnexal masses using a non-contrast MRI protocol. Benign adnexal masses constituted most cases (84.4%), which reflects our referral population for MRI characterization. In this group, our results indicate that this protocol can correctly classify adnexal masses into benign or malignant with high accuracy (94.2%) and high PLR (21) for malignancy. Additionally, there was excellent inter- and intra-reader agreement with high reproducibility and repeatability of the score. In our study, amongst adnexal masses with a score of 2, 80.7% underwent surgery, of which 97.8% were confirmed to be benign. All patients with score 3, 4 and 5 underwent surgery, with corresponding malignancy rates of 25%, 71% and 86%, respectively. These results may therefore help to provide appropriate management stratification using morphological imaging assessment of the adnexal mass by experienced radiologists in a tertiary center.
[Table 4 footnotes: In the definition of the non-contrast MRI score, scores 2, 3, 4 and 5 refer to benign/likely benign, indeterminate, suspicious and highly suspicious masses, respectively. *Unless otherwise specified, data are numbers of lesions with the relevant MRI feature, with percentages in parentheses. # Thick or irregular septa were not evaluated for score 2 lesions. a DWI signal of the solid tissue is evaluated on high b value (800-1000 s/mm²) DWI images.]
[Table 5. Comparison of inter-reader and intra-reader agreements for the specific diagnosis of the readers and agreement between the final diagnosis and specific reader diagnosis. CI, confidence intervals. *Reader 1 assessed the random subset at two different time points, which are written as reader 1 first and reader 1 second. § The reader 1 first - reader 2 comparison refers to inter-reader agreement. # The reader 1 first - reader 1 second comparison refers to intra-reader agreement.]
Continuous efforts have aimed to standardize pre-operative assessment of adnexal masses in women for the last 20 years [18,19]. Although several attempts at integrating clinical criteria, biochemical parameters and model-based US evaluation for the stratification of ovarian lesions according to malignancy have been made [20][21][22], only one standardized MRI scoring system for adnexal masses has been developed so far [10,23]. This scoring system, the ADNEX MR score [10], has high sensitivity (93.5%) and specificity (96.6%), which is supported by external validation studies [13,24,25]. The subsequently developed O-RADS MRI score with multicenter prospective data also has excellent sensitivity, specificity, accuracy and PLR for malignancy (93%, 91%, 92% and 10.9 for experienced readers, respectively) using the same MRI protocol (except temporal resolution, 2.4 s vs. 15 s) and technique in image and data interpretation [11].
The O-RADS study therefore provides the benchmark for adnexal mass characterization [11]. The O-RADS scoring system relies upon the addition of intravenous contrast medium to assess the enhancement of the lesion using dynamic curve analysis. We recognize that there may be circumstances in which the addition of intravenous contrast medium is not possible and therefore we set out to address whether an adnexal mass could be accurately characterized in a protocol without the addition of intravenous contrast medium. We have proposed a simple 5-point scoring system. This non-contrast MRI score is intended to be a practical qualitative score using morphological assessment and basic comparison of tumoral signal intensities on T2, DWI and ADC map with reference to standard tissues. A proper definition of solid tissue is the principal element of this score for suspecting malignancy, which could be clinically relevant. We achieved 84.9% sensitivity, 95.9% specificity, 94% accuracy and PLR of 21 without performing DCE-MRI. Despite the assumption that if the ADNEX MR score is utilized without dynamic contrast-enhancement, the specificity for malignancy would fall below 90% [3], we found a high specificity with our MRI protocol and image interpretation based on morphology and qualitative DWI interpretation. This also emphasizes the need for high-quality DWI in this setting. Moreover, according to PLR of malignancy, score 3 (indeterminate) in the non-contrast MRI score fell between the low- and intermediate-risk categories in O-RADS, whilst scores 4 (suspicious) and 5 (highly suspicious) corresponded to the O-RADS high-risk category. This result demonstrates that the non-contrast MRI score may be able to interrogate further the suspicion of malignancy, which may help management stratification.
There were five missed malignancies in the score 2 group in our study. These included two borderline lesions (one mucinous cystadenoma, one borderline serous cystadenofibroma), one case with microscopic serous carcinoma foci within a serous cystadenofibroma and two cases of clear cell foci arising within endometriomas. These cases demonstrate the difficulties of imaging interpretation of small foci of malignancy arising within benign lesions and are a reflection of known imaging interpretation pitfalls in complex cases such as endometriosis. It is not possible for us to speculate whether the addition of intravenous contrast medium would have allowed visualization of these foci of malignancy. However, it remains an important message that imaging interpretation in borderline malignancy and endometriosis can be challenging and extra caution is required. Although we did not aim to differentiate between borderline and malignant tumors, a more detailed analysis of the signal could potentially show differences between those tumors, which could be scientifically relevant as well as of translational clinical benefit. In the current literature, few studies address this topic, and those that do focus mostly on DWI parameters [26][27][28][29].
A non-contrast protocol, with a shorter acquisition time and a reduced workload for radiology departments, is highly desirable in high-volume referral hospitals and in patients where a shorter protocol is more likely to yield a diagnostic study. Our proposed non-contrast MRI protocol (average acquisition times 18 min and 12 min for 1.5 T and 3.0 T, respectively) may enable diagnostic centers to save time, which may be critical in some instances, especially at high magnetic field strengths. Furthermore, the use of GBCAs is not without risk and it increases the total cost of the study [30]. These issues also make a non-contrast MRI protocol preferable. Accordingly, our results support that in instances when the administration of intravenous contrast medium is not possible, not preferred or should be avoided, a non-contrast MRI protocol and scoring system can safely classify adnexal masses into benign or malignant with high reproducibility and repeatability.
Our study reached excellent inter- and intra-reader agreement for the specific diagnosis of adnexal masses. The study of Thomassin-Naggara et al showed that individual MRI features of adnexal masses show variable inter-reader agreement, which was lowest in the assessment of grouped and thickened septa [10]. Similarly, our study found the lowest inter- and intra-reader agreement for thick or irregular septa, reflecting the difficulties in septa evaluation. Our results also demonstrated that septa evaluation, namely of multiple septa and thick or irregular septa, gave lower intra-reader than inter-reader agreement, again reflecting that this is a difficult area of evaluation. Although kappa values for intra-reader agreement were over 0.70 for those features, our results show that evaluation of these features may sometimes be challenging. We believe that this is an important limitation to recognize, as these features can be difficult to use as markers of malignancy: benign lesions may have multiple septa, yet on the basis of the overall morphology and signal characteristics the radiologist can still reach the correct diagnosis, for example of a cystadenofibroma. Discrepancies gain importance when individual MRI features are used in structured reporting or as decision criteria of a scoring system. In our opinion, looking at overall morphology to make a diagnosis is more valuable; this was the case in our study, where the agreement rates were over 95%. When the readers assessed the case as a whole and gave a specific diagnosis, the inter-reader and intra-reader agreements were very high (kappa 0.88). We believe that this result supports the radiologist's evaluation of the lesion as a whole rather than excessive reliance on single factors.
Our study has several limitations. Firstly, it was a retrospective single-center study. Secondly, long-term follow-up (i.e. more than 2 years) was not available for patients who did not undergo surgery. However, median imaging follow-up was 21 months and lesions were considered benign if they resolved, decreased in size or stayed stable. Thirdly, the distribution of types of masses differed slightly from previous scoring studies [10,11]. Although we had fewer malignant lesions in comparison to the ADNEX [10] and O-RADS [11] studies (15.1% vs. 18.8% vs. 18.4%, respectively), the percentage of borderline cases, which can create challenges, was similar (4% vs. 3.6% vs. 3%, respectively). Still, our proposed score achieved high accuracy (94% vs. 96% vs. 92%, respectively). Fourthly, the assessment was done by readers with experience in gynecologic oncologic imaging, which may limit global standardization of this score. Nevertheless, our results suggest that given the appropriate training and support, qualitative radiology reporting and assessment of adnexal masses could potentially be taught to radiologists reporting these studies. Lastly, an external validation of our proposed non-contrast scoring was not performed, and such validation would be crucial to support future adoption of this score into wider clinical practice.
In conclusion, our study shows that non-contrast MRI has high accuracy and excellent inter- and intra-reader agreement for characterization of adnexal masses. This suggests that a morphological and qualitative DWI assessment by radiologists with experience in gynecological imaging can be an alternative to safely guide patient management when intravenous contrast medium and a dynamic curve assessment for the formal O-RADS score cannot be provided.