Introduction

Ovarian cancer is a significant health concern worldwide, accounting for 4.4% of all cancer-related mortality in women [1, 2]. This high mortality can be attributed to late-stage diagnosis, limited treatment options, the absence of a recommended screening strategy, and disease recurrence, which is seen in 80% of advanced cancers [3]. Traditional prognostic factors, such as histological subtype and tumour stage, have provided insight into patient outcomes; however, there is a growing need for more accurate and personalised prognostic tools to guide treatment decisions and improve overall patient management [4,5,6].

Radiomics, a rapidly evolving field within medical imaging, holds great promise in enhancing the prognostic evaluation of ovarian cancer [7]. Following acquisition of images, raw imaging data must be pre-processed via manual, semi-automated, or fully automated machine learning methods to facilitate segmentation of the regions of interest (ROI) [8,9,10]. Radiomic features can then be extracted from the ROIs using various feature extraction software, such as PyRadiomics, Computational Environment for Radiological Research (CERR), and Image Biomarker Explorer (IBEX) [11,12,13,14]. Specific features can be broadly subdivided into textural, morphological, and functional radiomics [15]. By extracting and analysing quantitative features from medical images, radiomics offers the potential to uncover hidden patterns and relationships that can serve as predictive markers [16]. In the context of ovarian cancer, radiomics can provide valuable insights into tumour heterogeneity, microenvironment characteristics, and treatment response [17, 18]. The integration of radiomics-based predictive models into clinical practice has the potential to facilitate individualised treatment strategies and improve patient outcomes [19].
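To make the feature-extraction step concrete, the following is a minimal sketch of how a few simple quantitative features could be computed from the voxels inside a segmented ROI. It is an illustration only and does not reproduce the PyRadiomics, CERR, or IBEX implementations; the function name `first_order_features`, the `bin_width` parameter, and the toy image and mask are all hypothetical.

```python
import math
from collections import Counter


def first_order_features(image, mask, bin_width=25):
    """Toy features from the voxels of a 2D image where mask is non-zero.

    image and mask are equally sized lists of rows. This is an
    illustrative sketch, not the algorithm of any radiomics package.
    """
    voxels = [
        value
        for image_row, mask_row in zip(image, mask)
        for value, inside in zip(image_row, mask_row)
        if inside
    ]
    if not voxels:
        raise ValueError("ROI mask selects no voxels")

    n = len(voxels)
    mean = sum(voxels) / n
    variance = sum((v - mean) ** 2 for v in voxels) / n

    # Discretise intensities into fixed-width bins, then compute the
    # Shannon entropy of the resulting histogram.
    counts = Counter(int(v // bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "n_voxels": n,        # crude size (morphological) descriptor
        "mean": mean,         # average intensity in the ROI
        "variance": variance, # spread of intensities
        "entropy": entropy,   # histogram heterogeneity
    }


# Hypothetical 2 x 2 image with the whole image taken as the ROI.
toy_image = [[10, 10], [60, 60]]
toy_mask = [[1, 1], [1, 1]]
features = first_order_features(toy_image, toy_mask)
```

In practice, dedicated packages compute many more features, including texture matrices and shape descriptors, and apply their own discretisation and resampling settings.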

Whilst several studies have investigated the role of radiomics in predicting disease recurrence in various cancer subtypes with promising performance, its specific application in ovarian cancer is an area of ongoing research [20,21,22,23]. The aim of our systematic review is to comprehensively assess the existing literature regarding the role of radiomics as a predictor of disease recurrence in ovarian cancer. Moreover, we aim to explore the potential clinical implications and future directions of radiomics in the management of ovarian cancer, ultimately laying the groundwork for future research towards the development of more effective and personalised treatment strategies.

Methods

Study design and reporting guidelines

This study is a systematic review of retrospective studies and follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines [24].

Search strategy

The following databases were searched as part of the systematic review in August 2023: Medline, EMBASE, and Web of Science. The systematic search process, with detailed search terms, is outlined in supplementary material S1. The last date of search was 11th August 2023. The grey literature (conference abstracts and dissertations) was also searched to identify other suitable publications.

Eligibility criteria

Studies were assessed for eligibility based on the following criteria. Studies investigating the use of radiomics to predict post-operative recurrence in patients with ovarian cancer undergoing primary debulking surgery were included in our analysis. Radiomic features extracted from computed tomography (CT) and magnetic resonance imaging (MRI) were included. Case reports, case series and conference abstracts were excluded.

Study selection, data extraction & critical appraisal

A database was created using the reference managing software EndNote X9™. Two researchers (NOS and HCT) reviewed outputs from the searches independently of each other.

Initially, duplicates were removed. Study titles were then screened and assessed for potential relevance. The abstracts of potentially relevant studies were then read and assessed for eligibility, based on the inclusion/exclusion criteria detailed above. Rejected studies were grouped together in the database by their reason for exclusion. The full texts of studies whose abstracts were deemed eligible were then assessed using the same criteria.

In order to extract and store data efficiently, the Cochrane Collaboration screening and data extraction tool, Covidence, was used [25]. Data were collected by two reviewers (NOS and HCT) independently, using the following headings: study details, study design, population, intervention, comparison groups and outcomes. Conflicts on study selection and data extraction between the two reviewers (NOS and HCT) were resolved following an open discussion and a final decision by the senior author (MK).

A critical appraisal of the methodological quality and risk of bias of the included studies was performed by two reviewers independently. Quality assessment of the included studies was performed according to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool and the Radiomics Quality Score (RQS) [26, 27]. A description of these tools, including methodological domains, is provided in the supplementary material.

Systematic review registration

Our systematic review was registered on PROSPERO in July 2023 (ID: CRD42023446290) [28].

Statistical analysis

Due to heterogeneity in primary outcome, a meta-analysis of included studies was not performed. Data have been presented qualitatively throughout the results.

Results

Search results

The literature search described above yielded a total of 295 results (supplementary material S1). Following the removal of 103 duplicates, 192 studies were screened. After the initial screen, 86 abstracts were reviewed and assessed for eligibility, of which 15 were selected for full-text review. From these fifteen full texts, a total of six studies met the inclusion criteria and were included in our analysis.
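The study-selection counts can be cross-checked with simple arithmetic. The short sketch below uses only the figures reported in the paragraph above; the per-stage exclusion counts are derived here by subtraction and are not stated in the text, and the variable names are illustrative.

```python
# Counts as reported in the search results paragraph.
identified = 295
duplicates_removed = 103
abstracts_reviewed = 86
full_texts_reviewed = 15
included = 6

# Derived by subtraction. Only `screened` is stated in the text;
# the three exclusion counts are inferred, not reported.
screened = identified - duplicates_removed
excluded_on_title = screened - abstracts_reviewed
excluded_on_abstract = abstracts_reviewed - full_texts_reviewed
excluded_on_full_text = full_texts_reviewed - included

flow = [
    ("Records identified", identified),
    ("Records screened after duplicate removal", screened),
    ("Abstracts reviewed", abstracts_reviewed),
    ("Full texts reviewed", full_texts_reviewed),
    ("Studies included", included),
]
for stage, count in flow:
    print(f"{stage}: {count}")
```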

Methodological characteristics and quality of studies

All six of the included studies were retrospective in nature [29,30,31,32,33,34]. Table 1 summarises the methodological characteristics of the included studies. Study quality, as assessed using the RQS and QUADAS-2 tools, was generally satisfactory. All studies were deemed low risk of bias as assessed by the QUADAS-2 tool, whilst 83% (n = 5/6) of included studies received an RQS score > 30%. A detailed explanation of the tools and a breakdown of the results can be found in the supplementary material (S2–S5).

Table 1 Study characteristics

Participant characteristics

The total number of participants from the six included studies was 952. Five studies included both training and validation sets, whereas the remaining study included only a training cohort. Overall, 548 patients constituted the training sets and 404 constituted the validation sets across included studies. Of the five studies incorporating validation sets, three were internal [31, 33, 34] and two were external [29, 32]. Basic participant characteristics are outlined in Table 2.

Table 2 Participant characteristics

Acquisition parameters

Magnetic resonance imaging (MRI) and computed tomography (CT) were the imaging modalities employed by three studies each. Full acquisition parameters are illustrated in Table 3.

Table 3 Scanning parameters

Development of signatures

Whilst precise radiomic feature extraction methods varied across included studies, a broadly similar workflow was followed throughout. Experienced radiologists performed manual segmentation of regions of interest (ROI) using ITK-SNAP software in all studies. Features were subsequently extracted from these regions of interest using radiomics software. The specific software utilised in each study is shown in Table 4. Intra- and inter-observer variability were assessed using the intraclass correlation coefficient (ICC) in three studies [22, 29, 31]. Acceptable ICC values ranged from 0.75 to 0.85. Four studies reported imaging normalisation methods to account for variation in acquisition parameters [29, 31, 32, 34]. Feature reduction and selection methods are shown in Table 5.

Table 4 Software and performance
Table 5 Feature processing and nomogram construction
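As an illustration of how observer variability can be used to screen features, the sketch below computes an ICC for each feature from repeated segmentations and retains features at or above a threshold. The included studies do not all state which ICC form they used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is assumed here. The function names, the 0.75 threshold default, and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings holds one row per lesion and one column per reader.
    Assumes at least two lesions, two readers, and non-constant data.
    """
    n = len(ratings)     # number of lesions
    k = len(ratings[0])  # number of readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def stable_features(feature_table, threshold=0.75):
    """Return names of features whose ICC meets the threshold."""
    return [
        name
        for name, ratings in feature_table.items()
        if icc_2_1(ratings) >= threshold
    ]


# Hypothetical values: three lesions, each segmented by two readers.
feature_table = {
    "feature_a": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # readers agree
    "feature_b": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]],  # readers disagree
}
kept = stable_features(feature_table)
```

Only features that survive this filter would then proceed to feature reduction and selection.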

Performance of signatures

Model performance was estimated using the receiver operating characteristic (ROC) curve and summarised as the area under the curve (AUC), which varied between studies. Table 4 illustrates the performance of each model in predicting the primary outcome. All included studies had at least satisfactory performance in predicting post-operative recurrence. Three studies compared the performance of the radiomics or combined model with that of the clinical model alone. All three reported a substantial improvement in performance on incorporation of radiomics features into the developed nomogram [29,30,31].
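For readers less familiar with the metric, the AUC can be interpreted as the probability that a randomly chosen patient who recurred receives a higher model score than a randomly chosen patient who did not. The sketch below computes it directly from that definition by comparing every recurrence/non-recurrence pair. The function name `roc_auc` and the labels and scores are hypothetical and are not taken from any included study.

```python
def roc_auc(labels, scores):
    """AUC as the proportion of (recurrence, no recurrence) pairs in which
    the patient with recurrence has the higher score; ties count as half.

    labels: 1 for recurrence, 0 for no recurrence.
    scores: model output, higher meaning higher predicted risk.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one patient in each class")

    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical cohort of four patients.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of 4 pairs are concordant
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups. The pairwise loop is quadratic in cohort size, which is adequate for illustration but slower than rank-based implementations for large cohorts.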

Discussion

Our review demonstrates the development and performance of radiomics-based nomograms to predict post-operative recurrence in patients with ovarian cancer from raw radiological imaging. All nomograms developed within included studies predicted their primary outcome with reasonable accuracy (AUC range 0.77–0.89 in validation sets). When assessing the quality of the included studies using the QUADAS-2 and RQS tools, we found that, overall, the methodological quality and risk of bias were satisfactory. All studies were deemed low risk of bias according to the QUADAS-2 tool and a majority of studies received an RQS score > 30%, indicating acceptable reporting and methodological quality.

Although these findings show promise, heterogeneity amongst studies poses challenges for replication of results. To enable controlled external validation across different institutions, standardisation is necessary, including the use of open-source scans, segmentations, and code [26]. This standardisation will facilitate the eventual application of radiomics within the clinical setting [27, 35]. Our findings also suggest that the quality of the included studies in our review compares favourably to other similar radiomics research, providing confidence in the reliability of the reported results [36,37,38].

In the broader context of radiomics research, there are several limitations that need to be acknowledged. One major limitation is the reliability and reproducibility of radiomic features, as they can be influenced by various factors, such as image acquisition protocols and segmentation variability [35]. Standardisation of imaging protocols and rigorous quality control measures are crucial to mitigate these limitations and ensure the reliability of radiomic features. Another challenge is the heterogeneity of radiomics workflows and feature extraction methods across different studies, making direct comparisons and meta-analyses challenging [16]. Efforts towards standardisation and the development of guidelines for radiomics research are essential to address these issues.

Regarding our own study, we acknowledge some limitations. Firstly, the limited number of included studies and their retrospective nature might introduce selection bias and impact the generalisability of the findings. Additionally, the variations in study designs, imaging modalities, and analysis techniques amongst the included studies may have influenced the overall results and their interpretation. We also acknowledge the potential for selection bias arising from the fact that all included studies in our systematic review originate from China. Whilst this may raise concerns about generalisability, it is essential to consider the available evidence within the context of the current literature landscape. Despite the geographical concentration of studies, the consistent performance of radiomics-based signatures across different cohorts within these studies suggests promising predictive ability. However, caution should be exercised in extrapolating these findings to diverse patient populations until further validation studies from other regions are conducted. In addition, negative results may be difficult to publish, which contributes to potential publication bias. Future research efforts should aim to explore the applicability and generalisability of radiomics in predicting disease recurrence in ovarian cancer across various geographical settings.

The potential implications of a radiomics-based signature in clinical practice for predicting post-operative recurrence in ovarian cancer are significant [39]. Such a signature could provide clinicians with a reliable tool to assess individual patient risk and tailor treatment strategies accordingly [40]. The incorporation of radiomic signatures into a nomogram, combined with traditional clinical parameters, has the potential to enhance the accuracy of risk prediction and facilitate personalised treatment decision-making [41]. However, the implementation of radiomics-based nomograms in routine clinical practice may face challenges, including the need for standardised imaging protocols, robust feature extraction and selection methods, and prospective validation in large-scale multicentre studies [42,43,44].

Conclusion

Our review provides evidence supporting the potential of radiomics as a predictor of disease recurrence in ovarian cancer. The included studies consistently demonstrated the ability of radiomics-based models to predict post-operative recurrence, indicating the potential value of radiomics-based nomograms in improving risk stratification and guiding personalised treatment decisions. However, further research is warranted to validate their real-world benefit in terms of decision-making and patient selection to improve overall outcomes.