Background

High-grade serous ovarian cancer (HGSOC) is the most common type of ovarian cancer and the most lethal gynaecologic malignancy, with 4.32 per 100,000 women predicted to die from ovarian cancer in the European Union in 2022 [1]. The disease typically presents at an advanced stage with ascites and extra-ovarian spread (i.e., peritoneal carcinomatosis) with multiple implants within the abdomen. Up-front treatment options comprise primary debulking surgery followed by platinum-based chemotherapy, or neoadjuvant chemotherapy (NACT) with subsequent interval debulking surgery, depending on disease extent and location. Beyond chemotherapy, poly(ADP-ribose) polymerase and/or vascular endothelial growth factor inhibitors are used to treat ovarian cancer. Patient stratification and selection for the different clinical pathways are not entirely standardised across centres and depend strongly on surgical expertise, training and facilities. Despite maximal surgical effort and molecular-driven maintenance therapy, many patients with ovarian cancer experience recurrence and eventually develop chemotherapy-resistant disease. Poor prognosis is underpinned by the loss of DNA repair mechanisms, resulting in high genomic intra-tumoural heterogeneity, early clonal evolution and rapid onset of chemoresistance [2, 3].

Currently, the evaluation of disease extent and response assessment in patients with HGSOC are based on the subjective analysis of cross-sectional imaging, i.e., computed tomography (CT) and/or magnetic resonance imaging (MRI). CT of the abdomen, pelvis, and often also the chest is a key component in the multidisciplinary discussion leading up to a recommendation of either NACT and interval debulking surgery or primary debulking surgery. CT-based criteria suggesting low chances of optimal cytoreduction and therefore favouring NACT over primary debulking surgery are bulky or multifocal disease along the large or small bowel and the mesentery, bulky disease at the porta hepatis, along the liver surface including the gallbladder fossa, and nonresectable diaphragmatic or thoracic disease [4]. However, these criteria are subjective and not homogeneously applied across centres. Response evaluation criteria in solid tumors (RECIST) 1.1 is the mainstay of response assessment on CT scans of patients undergoing NACT [5]. Despite efforts to develop objective response criteria such as RECIST 1.1, interobserver variability, and subjective selection of target lesions remain major challenges that may affect response classifications and frequently require centralised image interpretation for drug registration trials [6]. There is an urgent need to obtain more robust imaging biomarkers for a better understanding of the initial presentation and successive monitoring of HGSOC during treatment that can be used to more effectively tailor therapy and ultimately improve outcomes. Radiomics has emerged as a tool for large-scale quantitative feature extraction from standard diagnostic imaging and holds great potential for developing image-derived biomarkers [7]. 
Recent advances in computational power have facilitated the use of deep learning (DL) and convolutional neural networks to automate image analysis further for tasks such as lesion classification, prognostication and response prediction.

To date, the genomic heterogeneity that characterises ovarian cancer, and HGSOC in particular, can only be captured spatially by sampling multiple instead of single disease sites, and temporally by sampling the tumour at different time points during treatment, which is hardly acceptable to patients. However, studies that harness artificial intelligence (AI) for integrating molecular omics with quantitative and standardised image analysis hold the potential for unravelling imaging signs of molecular heterogeneity that can then be used for the development of improved predictive and prognostic biomarkers. Cancer cells are deregulated at multiple levels, including deoxyribonucleic acid (DNA), ribonucleic acid (RNA), proteins and metabolites; therefore, integrated multiomics data analysis is essential to fully understand the complexity of cancer and to capture features relevant to prognostication and prediction-making. Although it has been shown that these approaches yield higher predictive performance when compared to studies focusing on single omics [8], only a minority of studies use imaging and radiomics integrated into a wider multiomics approach. With this review, we set out to provide an overview of research that has taken on the challenge of integrating at least two data types into AI-based prediction or classification models for ovarian cancer.

Search strategy and eligibility criteria

Between October 2022 and January 2023, PubMed/MEDLINE, IEEE Xplore Digital Library, and Google Scholar were searched for studies that developed an AI algorithm for classification or prediction-making in ovarian cancer patients using multiple data types including radiological imaging. A controlled vocabulary supplemented with keywords such as radiomics, AI, imaging, multiomics, and data integration for ovarian cancer was used for the search. We limited the results to journal articles and conference proceedings. Conference abstracts, editorials, and letters to the editor were excluded from our search. Only English-language articles were considered.

AI: machine learning and deep learning

AI is a field of computer science where computers mimic human intelligence and attempt to perform certain tasks that normally require human cognition, such as problem-solving and decision-making [9]. The two main fields of AI, machine learning (ML) and deep learning (DL), have shown higher performance than traditional approaches in the molecular characterisation of cancer, prognostication, diagnosis, patient classification, and prediction-making in various cancer types, including ovarian cancer [10, 11].

In the traditional programming paradigm, manually written programs use given input data to produce the desired output. ML instead uses algorithms that automatically and iteratively learn from data to perform a certain task, thus giving computers the ability to learn without being explicitly programmed. DL is a subset of ML based on artificial neural networks, which are inspired by the human brain and try to mimic its learning process. In ML, feature extraction (the process of transforming raw data into relevant features) is handled manually, while the feature extraction process is fully automated in DL.

Of the 34 studies reviewed, 88% used ML, whereas only four studies used DL alone [8, 12,13,14] and two studies [15, 16] combined ML with DL techniques (Table 2). Three of the four studies that used only DL [8, 13, 14] combined genomics, epigenomics and transcriptomics. So far, only ML techniques have been proposed for predicting complete surgical cytoreduction and residual disease [17,18,19], while both ML and DL methods have been used for other predictive and diagnostic applications such as prediction of survival, recurrence, and response to neoadjuvant chemotherapy, and differentiation between malignant and benign tumours and between cancer subtypes.

AI and radiomics

Among the most frequently used ML algorithms for classification tasks in ovarian cancer are support vector machines, multilayer perceptron networks, decision trees, random forests, extreme gradient boosting (XGBoost), and logistic regression. These techniques make processing numerical features (such as clinical and demographic data or blood test results) relatively straightforward. However, medical images are multidimensional data and must be converted into numerical features to be used as input to such ML algorithms. Radiomics is a method for the computerised extraction of quantitative imaging features that allows the use of imaging data for machine learning; radiomics features are numerical descriptors of the shape, intensity, and texture of a structure such as a tumour on imaging [20]. Encoding the information captured on multidimensional images into radiomics feature vectors allows information from radiological data to be combined with other modalities (clinical data, genomics, etc.). Many radiomics features are highly correlated; feature selection methods are therefore crucial, as they select the most relevant features and eliminate irrelevant and redundant ones, improving the performance of the ML model and reducing the computational cost of modelling. Of all studies reviewed, 82% used feature selection as part of their models (Table 2).
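The redundancy problem described above can be illustrated with a minimal sketch of one simple filter: dropping any feature that is strongly correlated with a feature already kept. This is only one of many possible feature selection strategies and is not taken from any of the reviewed studies; the feature names, the toy values, and the 0.9 threshold are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long value lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if |r| < threshold against every kept feature.

    `features` maps a feature name to one value per patient. Features are
    visited in insertion order, so the first member of a correlated group wins.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical radiomics table for five patients (values are made up).
radiomics = {
    "volume_ml": [10.0, 25.0, 40.0, 55.0, 70.0],
    "surface_cm2": [22.0, 51.0, 79.0, 112.0, 139.0],  # almost proportional to volume
    "entropy": [4.1, 3.2, 4.8, 3.9, 4.4],
}

selected = drop_correlated(radiomics, threshold=0.9)
print(selected)
```

In this toy table the surface feature is removed because it carries almost the same information as volume, while entropy is retained. In practice such a filter would be fitted on training data only, so that the selection does not leak information from the test set.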

One of the bottlenecks of many radiomics studies is that radiomics feature extraction requires manual segmentation of the regions of interest, which is time-consuming and suffers from inter-observer variability. In contrast, DL algorithms attempt to identify complex associations of original features using combinations of different deep neural layers and generate new features that can improve the performance of a particular classification task compared to the original features [14]. Convolutional neural networks are a common DL method and a type of network that is particularly appropriate for computer vision and image analysis for purposes such as image classification and automated feature extraction [21]. It is important to note that DL approaches require larger medical datasets than standard handcrafted radiomics and ML methods to fit the training model efficiently and produce the desired results.

Integration of different data types

The most frequently used data in the reviewed ovarian cancer studies are clinical and demographic data. These data, even if not universally considered as proper “omics data”, are widely available, easy to collect, and not expensive and could be integrated into clinical care more easily than predictive models based on far more expensive DNA or RNA sequencing, DNA methylation, or proteomic data.

Interestingly, 50% of the studies reviewed here benefitted from high-throughput information from radiomic data, mainly obtained from CT (27%) followed by MRI (21%) and less frequently from ultrasound (US) (3%) (Table 1). All of these studies combined radiomic signatures with clinical data, and most of them also with serum biomarkers, whereas only very few took into account genomics [22,23,24] or histopathology [16, 25].

Table 1 Categorisation of the reviewed studies based on their used data type and target of the study

In almost all studies reviewed, combining multiple data types improved the overall predictive performance compared with a single data type. The only exception was Wang et al. [12], whose DL model based on CT data alone achieved higher prognostic performance in validation cohorts, in terms of accuracy and area under the curve (AUC), than the combined model (clinical information and DL features). Overall, these results support the importance of integrating multiple data types to enrich the data space used by AI techniques (Fig. 1).

Fig. 1 Multiomics studies in ovarian cancer to date use (circles from left to right) clinical data, serum biomarkers like cancer antigen 125 (CA-125), imaging including computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound (US), genomics, epigenomics, transcriptomics, proteomics, and pathology data for the development of artificial intelligence-based cancer subtyping, lesion classifiers, and models for predicting patient outcome including response to chemotherapy, complete surgical cytoreduction, and survival (bottom panel)

We found seven studies [17, 18, 26,27,28,29,30] that combined clinical characteristics and serum biomarkers for different research tasks (Tables 1 and 2). These studies used a variety of clinical features, including age, year of diagnosis and surgery, performance status, histological type, tumour grade and stage, timing of surgery, presence of ascites, site of bulky disease at surgery, size of the largest tumour deposit, tumour location, tumour diameter, outcome of surgery, and residual tumour size after initial surgery. In addition, various serum biomarkers such as cancer antigen 125 (CA-125), cancer antigen 15-3, alpha-fetoprotein, carcinoembryonic antigen, and carbohydrate antigen 19-9 were investigated. In these seven studies, different ML algorithms were used for various objectives, including differentiation between benign and malignant tumours [26, 27, 30], prediction of complete surgical cytoreduction and survival [17, 18], determination of clinical stage, histotype, residual tumour burden, and histopathological cancer type [28, 30], and prediction of critical care unit admission [29]. In three reports [26, 27, 30], the logistic regression, multilayer perceptron, and XGBoost algorithms achieved an accuracy of 0.97, 0.98, and 0.96, respectively. In two reports [27, 30], no feature selection method was used. Two studies [17, 18] reported an accuracy of up to 0.87 and 0.73, respectively, for predicting complete surgical cytoreduction using XGBoost and artificial neural network algorithms. In two studies [28, 30], higher accuracy was obtained in distinguishing benign from malignant lesions (0.97 and 0.96, respectively) compared with clinical stage determination (0.76 and 0.68, respectively). Except for one study [27], all studies used data sets of more than 290 patients.

Table 2 Summary of the data set sizes, models, features selection methods and performance metrics for the same publications as shown in Table 1 (combining at least two data types for model development)

Imaging and other omics

To date, twenty studies have combined imaging data with other data types [12, 15, 16, 19, 22,23,24,25, 31,32,33,34,35,36,37,38,39,40,41,42] (Tables 1 and 2). Fourteen of these studies combined imaging data with clinical features and serum biomarkers [12, 15, 19, 31,32,33,34,35,36, 38,39,40,41,42]; two studies combined imaging data with clinical data and transcriptomics (gene expression; RNA) [22, 23]; two studies combined imaging data with clinical data, serum biomarkers, and histopathology data [16, 25]; and two studies combined imaging data with clinical data, serum biomarkers, and genomics (DNA and circulating tumour DNA) [24, 37]. CT [12, 15, 16, 22,23,24, 34,35,36] and MRI [19, 25, 38,39,40,41,42] were the most commonly used imaging modalities, while US and positron emission tomography (PET)/CT were used in only three studies [31,32,33]. Radiomics was used in all studies with imaging data, except for three studies that instead used colour Doppler US and morphologic descriptors [31], semantic CT features [34], or DL features extracted from images [12].

All studies used ML algorithms, except for one that used only DL algorithms [12]. Two studies used both ML and DL models on image data [15, 16].

The targets of the reviewed studies combining imaging data and other omics were distinguishing between benign, malignant, and borderline tumours [31, 32, 41, 42]; prediction of survival [22, 33, 34, 36]; prediction of recurrence and platinum resistance [12, 15, 22, 24, 25, 35, 39]; prediction of response to neoadjuvant chemotherapy [16, 37]; prediction of hypoxia [23]; prediction of peritoneal metastases [38, 40]; and prediction of complete surgical cytoreduction [19]. Integration of clinical variables with radiomics improved the performance of BRCA mutation prediction and achieved an AUC of 0.74 compared to only 0.62 on training data and 0.59 on test data for models based on radiomics alone [15]. The highest AUC (0.81) was also reported using a radiomics-clinical nomogram in [25] compared with radiomics or a clinical model alone. In [32], the AUC for a US-based radiomics model combined with clinical features reached 0.91, compared to 0.88 for the radiomics model alone. The best performance (C-index 0.70, 95% confidence interval 0.66−0.74) was achieved by integrating clinical and PET radiomics features in [33] compared to using radiomics or clinical features alone. A higher AUC was also reported in [36] using a combined model (0.77) compared to using only radiomics (0.72) or clinical data (0.69). In [38], the authors reported a higher performance using a radiomic-pelvic fluid-CA-125 model (AUC 0.94) compared with a model using radiomics alone (AUC 0.92). A higher AUC was also achieved using a combined model (0.78) compared to the clinical model (0.67) and multiradiomics model (0.74) [39]. In one study [40], a combined model based on radiomics and clinicopathological risk factors led to an absolute increase of 5% in AUC compared with clinical data or radiomics only. A combined model based on radiomics and clinical features achieved an absolute increase in AUC of 10% and 15%, respectively, when compared to a clinical and radiomics model alone [41].
In addition, combining radiomics signatures with other types of data has also shown promise, e.g., a radiomics-histopathological model [16], a radiomics-clinical-genomic model [22], a radiogenomic model [23], a radiomics-clinicopathological-genomic model [24], and a radiomics-clinical-radiological characteristics model [42] showed higher performance compared to using radiomics, clinical, genomic, and histological models alone (details of model performance can be found in Table 2). However, there is still scope for optimising the different building blocks of radiomics pipelines to improve the AUC also of integrative multiomics predictors.
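One common way of building such combined models is feature-level (early) fusion, in which the feature blocks of each patient are standardised and concatenated into a single vector before a classifier is trained. The sketch below shows only this fusion step. It is an illustration under stated assumptions and does not describe how any particular reviewed study built its model; the block names, feature names, and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(column):
    """Standardise one feature column to zero mean and unit variance."""
    mu, sd = mean(column), pstdev(column)
    return [0.0 if sd == 0 else (v - mu) / sd for v in column]


def fuse(blocks):
    """Early fusion: concatenate standardised features from several data types.

    `blocks` maps a data type (e.g. "radiomics") to a dict of
    feature name -> one value per patient, with patients in the same order
    in every block. Returns the fused feature names and one row per patient.
    """
    names, columns = [], []
    for block, feats in blocks.items():
        for feat, values in feats.items():
            names.append(f"{block}:{feat}")
            columns.append(zscore(values))
    rows = [list(row) for row in zip(*columns)]
    return names, rows


# Hypothetical data for three patients (values are made up).
blocks = {
    "radiomics": {"entropy": [4.1, 3.2, 4.8]},
    "clinical": {"age": [54.0, 61.0, 70.0], "ca125": [35.0, 480.0, 1200.0]},
}

names, rows = fuse(blocks)
print(names)
print(rows[0])
```

Each row of `rows` can then be passed to any of the ML algorithms listed earlier. Standardising before concatenation keeps a feature with a large numerical range, such as a serum marker level, from dominating features on smaller scales.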

Genomics, epigenomics, transcriptomics, and other omics

We found seven more studies that integrated genomics and epigenomics (DNA and DNA methylation), transcriptomics (gene expression, RNA), and other omics data [8, 13, 14, 43,44,45,46] (Tables 1 and 2). Three studies combined genomics, epigenomics and transcriptomics for benign versus malignant differentiation [8], survival prediction [13], and subtyping [14]. Two studies combined genomics, epigenomics, and transcriptomics with clinical data to predict survival [43] and to predict response to neoadjuvant chemotherapy [45]. In addition, one study combined transcriptomics with clinical data to predict response to neoadjuvant chemotherapy [44]. In another study [46], the authors combined genomics, transcriptomics, pathology, and proteomics data to predict survival in ovarian cancer.

Of these seven studies, four [43,44,45,46] used ML and three [8, 13, 14] used DL. In [8], the authors reported that a combination of copy number variation, mRNA, and methylation data (AUC 0.96) outperformed using copy number variation (AUC 0.54), mRNA (AUC 0.94), and methylation (AUC 0.75) data alone. A multiomics model also achieved a higher performance (C-Index 0.571 ± 0.036, mean ± standard deviation) compared to models using single omics in [13]. In [44], an integrated model based on clinical information and gene expression data increased the accuracy by more than 19% and 45% compared to a model using only gene expression data on two different datasets. An AUC over 0.95 was achieved by integrating clinical and genomic variables [45]. Some authors [46] achieved the best accuracy using a multiomics model combining histopathological image features and genomics (AUC 0.91) compared with the model using only histopathological image features (AUC 0.70). In addition, an integrated multiomics model based on DNA methylation, copy number alteration, and RNA achieved a higher AUC (0.70) than when using methylation data (AUC 0.53), copy number alteration (AUC 0.64), and RNA data alone (AUC 0.66).
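Because several of the survival models above are reported with a C-index rather than an AUC, a minimal sketch of Harrell's concordance index may help in reading those numbers. The function below is a plain illustrative implementation that ignores tied survival times, not the code used in any cited study, and the follow-up times, event flags, and risk scores are made up.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before the follow-up time of patient j.
    The pair is concordant when the model gave patient i the higher risk;
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: follow-up in months, 1 = death observed, 0 = censored.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.3, 0.2, 0.4]  # higher score = higher predicted risk

c_index = concordance_index(times, events, risks)
print(c_index)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a value such as 0.571 indicates only modest discrimination even when it is the best among the compared models.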

AI and multiomics-based heterogeneity analysis

Genomic studies of multiple samples from single patients have allowed detailed intra-patient inter-site heterogeneity studies and revealed the diverse patterns of clonal spread of HGSOC which are thought to shape the local tumour immune-microenvironment, to affect sensitivity to treatment and, therefore, to be prognostically relevant [2, 47, 48]. CT of the abdomen and pelvis is central to the clinical pathway in patients with HGSOC and provides a snapshot of the multisite disease burden in patients with advanced disease.

Radiomics allows non-invasive quantification of this inter-site heterogeneity, and the integration of radiomics with molecular omics provides a unique opportunity for decoding the link between heterogeneity on imaging and at the molecular and cellular level [49]. Vargas et al. [50] developed CT-radiomics-based spatial heterogeneity metrics across multiple metastatic lesions and integrated these imaging-based heterogeneity metrics with clinical variables and genomics to predict survival and platinum resistance [22]. Of note, this integrated multiomics predictor outperformed other models based on fewer data types and also a multiomic model that included radiomics but did not take into account inter-site intra-patient heterogeneity. Besides radiomics, multiparametric MRI and 18F-fluoro-deoxy-glucose-PET have also been shown to hold great potential for an improved understanding of inter-site heterogeneity, as clusters based on imaging-derived diffusivity, vascularity and metabolic parameters were associated with patterns of hypoxia on immunohistochemistry and distinct genetic alterations [51].
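To make the idea of an imaging-based inter-site heterogeneity metric concrete, the sketch below computes the mean pairwise distance between the radiomics feature vectors of all lesions in one patient. This is a deliberately simple stand-in chosen for illustration; it is not the metric developed by Vargas et al. [50], and the lesion vectors are hypothetical and assumed to be already standardised.

```python
from itertools import combinations
from math import dist


def inter_site_dispersion(lesions):
    """Mean pairwise Euclidean distance between lesion feature vectors.

    `lesions` is a list of equally long feature vectors, one per disease
    site in a single patient. Returns 0.0 when fewer than two sites exist.
    """
    pairs = list(combinations(lesions, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical patients, each lesion described by two standardised features.
patient_a = [(0.0, 0.0), (3.0, 4.0)]              # two dissimilar sites
patient_b = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]  # three identical sites

dispersion_a = inter_site_dispersion(patient_a)
dispersion_b = inter_site_dispersion(patient_b)
print(dispersion_a, dispersion_b)
```

A per-patient value of this kind could then enter an integrated model as one additional feature alongside clinical and genomic variables, which is the general pattern described above.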

Discussion

Over the past two decades, fast and affordable sequencing has revolutionised genomic, epigenomic, and transcriptomic research, introducing unprecedented innovations in all the disciplines of cancer care ranging from gynaecological oncology to radiation therapy. Improvements in computational power have similarly changed the landscape of imaging research with exponentially increasing publications on radiomics and AI [7, 52]. The literature reviewed here demonstrates the vast potential multiomics data integration holds for improving patient care and outcome. Integration of radiomics and clinical information consistently outperformed models using radiomics or clinical models alone [15, 19, 25, 32, 33, 35, 36, 38,39,40,41]. In addition, the integration of radiomics with other types of data such as histopathological, genomic, and clinicopathological data [16, 22,23,24] improved performance, illustrating the added value of combining radiomics features in the developed models. Furthermore, in studies that integrated genomics and epigenomics, transcriptomics, and other omics data [8, 13, 14, 43,44,45,46], integration of multiomics data improved the results compared to single omics and an AUC of up to 0.95 was achieved for the test data sets [45].

Although genomic features such as homologous recombination deficiency have shown significant therapeutic implications in ovarian cancer, their assessment is not yet integrated into clinical practice, can be challenging and expensive, and is still not reimbursed in many countries. To date, no imaging-based classifiers or predictors of outcome are used routinely in clinics. Their implementation could be facilitated with easily usable, economically sustainable and effective methods. Such tools hold the promise of overcoming many of the above-mentioned issues, but limitations in the design and execution of some of the literature on radiomics and AI are at least partially contributing to the current situation. The lack of independent external datasets to evaluate AI models in a large part of the studies reviewed here is representative of a shortcoming of many radiomics studies. To overcome this limitation, the use of independent data sets is highly recommended, ideally from different institutions with different patient demographics and socioeconomics, and with imaging studies acquired using different scanners and vendors. Secure sharing of pseudonymised or anonymised data sets as well as AI models between research institutions, for example in a federated setup, is one way of ensuring the publication and distribution of highly generalisable models and could increase the chances of timely integration into clinical workflows, thanks to the robust replicability of these experiments.
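As an illustration of what a federated setup can look like, the sketch below shows the aggregation step of federated averaging, a widely used rule in which each institution trains locally and shares only model weights, which a coordinator then averages in proportion to local sample size. The text above does not prescribe a particular aggregation rule, so this choice is an assumption, and the site sizes and weight values are made up.

```python
def federated_average(site_updates):
    """Sample-size-weighted average of model weights from several sites.

    `site_updates` is a list of (n_patients, weights) tuples, where
    `weights` is a flat list of model parameters of identical length at
    every site. No patient-level data is exchanged, only the weights.
    """
    if not site_updates:
        raise ValueError("no site updates supplied")
    dim = len(site_updates[0][1])
    if any(len(w) != dim for _, w in site_updates):
        raise ValueError("all sites must share the same model shape")
    total = sum(n for n, _ in site_updates)
    return [sum(n * w[i] for n, w in site_updates) / total for i in range(dim)]


# Hypothetical round with two institutions and a two-parameter model.
updates = [
    (100, [1.0, 2.0]),  # site A: 100 patients
    (300, [3.0, 6.0]),  # site B: 300 patients
]

global_weights = federated_average(updates)
print(global_weights)
```

In a full training loop the averaged weights would be sent back to every site for another round of local training. The evaluation on data from institutions that did not take part in training would remain a separate step.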

The majority of studies also lack detailed descriptions of the software and source code to enable the independent reproduction of results, an issue also encountered in other areas of oncological imaging and AI [53]. In addition, a variety of different metrics are used in AI research to evaluate and indicate diagnostic and predictive power, for example, accuracy, AUC, sensitivity, specificity, F-score, C-index, recall, and Dice similarity coefficient, limiting the comparative interpretability of the existing studies (Table 2). The use of common and standardised metrics would facilitate quantitative comparisons of models across different cohorts and institutions, significantly increasing the clinical impact of these decision-support tools.
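The difficulty of comparing studies that report different metrics can be seen by computing several of them for the same set of predictions. The following is a minimal sketch with made-up labels and scores and an assumed decision threshold of 0.5; the helper names are hypothetical and no external library is used.

```python
def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity (recall), specificity and F1 at a fixed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def auc(y_true, scores):
    """Threshold-free AUC: the fraction of (positive, negative) pairs in which
    the positive case receives the higher score (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test set: 1 = malignant, 0 = benign, with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

metrics = threshold_metrics(y_true, scores)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

For these toy predictions the threshold-dependent metrics all equal 0.75 while the AUC is 0.9375, so a study reporting only accuracy and another reporting only AUC could describe the same model quite differently. For binary labels the Dice similarity coefficient is computed by the same formula as the F1 score.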

Future AI research should therefore report a sufficient set of common metrics. The overwhelming majority of studies reviewed here showed a benefit of multiomics data integration over limiting classification or prediction model building to one data type only. Future work in this field should therefore focus on data integration.

The association of genomic intrapatient heterogeneity in ovarian cancer and prognosis has been established and imaging is a well-suited tool for non-invasively assessing heterogeneity between tumour sites and tracking it over time. However, the integration of heterogeneity studies including multiomics is only in its infancy but has already shown advantages over less comprehensive analyses in terms of predictive power [22]. Besides the well-established whole genome sequencing of tumour DNA, more innovative approaches like the assessment of circulating tumour DNA and proteomics are more rarely encountered but merit attention [37, 54]. CA-125 has a well-established role in the diagnosis and management of ovarian cancer but lacks both sensitivity and specificity and circulating tumour DNA holds the potential to overcome this limitation [55]. Radiomics-based habitats have been used for targeting tissue sampling under ultrasound guidance, a technique that holds the potential to allow the integration of molecular tissue studies and radiomics with the advantage of limiting the exposure of the patient to ionising radiation and improving the design of clinical trials and their end-points [56, 57].

Conclusions

This review examined AI tools for integrating multiomics data for tasks such as adnexal lesion classification and outcome prediction in ovarian cancer. The current literature indicates that AI-based tools based on multiomics data integration are more than the sum of their parts and, in almost all studies, outperform models built on single-omic data sets. Clinical data, serum markers and imaging data (predominantly using handcrafted radiomics) were the data most frequently paired up, followed by genomics and transcriptomics.

The latter two were only infrequently combined with imaging data, highlighting a current gap in the available literature. Only rarely have the AI methods been described in enough detail to allow the reproduction of the results. Also, sharing data and analysis algorithms is uncommon, thus hampering independent validation of results, a prerequisite for AI tools to be considered for clinical use. Heterogeneity at the genomic and tumour-microenvironment level represents a hallmark of ovarian cancer, likely contributing to its poor prognosis. Pivotal studies have shown that imaging holds the key to describing and understanding heterogeneity, not only at the spatial but also at the temporal level, with multiple scans being routinely performed throughout our patients’ path of care, allowing the setup of innovative therapeutic solutions and potentially improving treatment outcomes.