Introduction

Medical imaging plays a major role in the clinical decision-making process in oncology. In the past, that role was limited to diagnosis and staging. However, in recent years, imaging markers derived from routine clinical images have increasingly been researched to provide insight into the tumor in a non-invasive manner.

Imaging features may be either qualitative, sometimes referred to as semantic [1,2,3], where a reader, often an experienced radiologist, will assign a score to certain parameters stemming from expertise-based observation, or quantitative, for example, tumor dimensions, attenuation, or radiomics where the values are derived directly from the image. In radiomics, features are extracted mathematically with the aid of specialized computer algorithms. Radiomic features reflect a wide array of parameters in the image and have been shown to capture distinct imaging phenotypes beyond what is discernible to the naked eye [2].

Radiomics is the rapidly growing field of radiological research where routine patient images/scans are converted into mineable quantitative data [4] that can then be leveraged to decode the tumor phenotype for applications ranging from improved diagnostics to prognostication to therapeutic response prediction [5]. Radiomics refers to the general field in which patient scans are converted into quantitative data while radiogenomics is a specific application where imaging features, radiomic or otherwise, are linked to genomic profiles [6]. For the purpose of this review, we will focus on the above definition.

Interestingly, the term “radiogenomics” initially belonged to the realm of radiation oncology where it reflected the prediction of radiotherapy-induced toxicity based on the genetic profile of the tumor [7]. In recent years, radiogenomics has come to carry the connotation of linking radiomic features to even broader biological parameters beyond genomics such as proteomics and metabolomics [8]. While this is most likely the ultimate destiny of the field, it has yet to become the standard definition.

The radiomics pipeline

The workflow where quantitative features are extracted from radiological images and linked to specific outcomes has often been termed in literature as the “radiomics pipeline” [9,10,11]. This workflow is composed of a number of broad steps, beginning with image acquisition. Based on the device [12], image reconstruction algorithm [13], and protocol settings [14], the value of the extracted radiomic features can differ greatly. Following the image acquisition step, two broad approaches emerge, depending on how radiomic features are derived and at what step artificial intelligence is applied, if at all: classical radiomics [15, 16], with or without machine learning [17,18,19], and deep learning radiomics [20, 21].

Classical/conventional radiomics

Considering that radiomics is a novel field, the term “classical radiomics” may be something of a misnomer. Alternate descriptions such as “conventional radiomics” have been put forth [22]; however, the majority of literature still refers to this method as “classical.” In this approach, regions of interest are delineated, either manually or automatically, and handcrafted features are extracted using specialized algorithms (see Fig. 1a). Handcrafted imaging features attempt to describe lesions by capturing intuitive parameters reflecting the shape, morphology, and texture [23]. This dependence on delineation and handcrafted features is characteristic of conventional radiomics.
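As a rough illustration of what handcrafted feature extraction involves, the sketch below computes a few simple descriptors (volume, mean, standard deviation, and histogram entropy) from a delineated region of interest. It assumes the image and the binary mask are NumPy arrays of the same shape. The function name `handcrafted_features`, the `voxel_volume_mm3` parameter, the 16-bin histogram, and the synthetic data are all hypothetical choices made for illustration and are not taken from any of the cited studies.

```python
import numpy as np

def handcrafted_features(image, mask, voxel_volume_mm3=1.0):
    """Compute a few illustrative handcrafted features inside a binary ROI mask."""
    roi = image[mask.astype(bool)]          # intensities of the delineated voxels
    if roi.size == 0:
        raise ValueError("mask selects no voxels")
    # Histogram of ROI intensities (16 bins is an arbitrary choice) for entropy.
    counts, _ = np.histogram(roi, bins=16)
    p = counts[counts > 0] / roi.size
    return {
        "volume_mm3": roi.size * voxel_volume_mm3,   # simple shape descriptor
        "mean": float(roi.mean()),                   # first-order statistics
        "std": float(roi.std()),
        "entropy": float(-(p * np.log2(p)).sum()),   # histogram-based heterogeneity
    }

# Synthetic volume and a cubic "delineation"; not patient data.
rng = np.random.default_rng(0)
image = rng.normal(100.0, 20.0, size=(8, 8, 8))
mask = np.zeros((8, 8, 8), dtype=bool)
mask[2:6, 2:6, 2:6] = True
features = handcrafted_features(image, mask, voxel_volume_mm3=0.5)
```

Because every value depends on the mask, a different delineation of the same lesion would yield different features, which is why this approach is tied so closely to the delineation step.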

Fig. 1

Outline of the two kinds of radiomics pipeline. a The classical/conventional radiomics model where, after image acquisition, areas of interest are delineated and handcrafted features are extracted. Subsequently, models are built around these predefined features using either statistical or machine learning methodologies. b The deep learning radiomics pipeline where, after image acquisition, neural networks automatically perform feature extraction, selection, and classification

The image-derived features are then processed, using either statistical models or machine learning, to link them to specific outcomes. Statistical models, such as univariate and multivariate analysis, are largely used to determine mathematical relationships between variables and outcomes, whereas machine learning is primarily focused on the construction of systems capable of learning from the data without explicitly programmed instructions. Machine learning models are widely considered superior for predictive purposes: they rely on fewer mathematical assumptions, can be built from more comprehensive datasets, and require minimal human involvement.

Deep learning radiomics

Newer deep learning radiomic workflows (Fig. 1b) can now process the image, automatically extract features, and perform classification without the need for a detailed delineation, if any delineation is needed at all [24]. This approach is founded on the concept that deep learning classifiers should not only be used for data mining but also for data generation [25]. Convolutional neural networks (CNNs), a class of deep learning networks, have risen to prominence in the field of medical image analysis due to their independence from humans in feature design. The number of quantitative features extracted within a CNN is several orders of magnitude greater than that of the older handcrafted feature extraction algorithms used in classical/conventional radiomics. Another significant advantage is that within the same convolutional neural network, feature extraction, selection, and classification occur across different layers.

Value of the genetic profile in cancer

Genomic instability and mutations are a hallmark of cancer and the accumulation of genetic and epigenetic mutations results in unchecked cell proliferation [26]. Gene profiles are often used to predict survival, as a prognostic biomarker, or response to treatment, as a predictive biomarker, helping to guide clinical decisions, particularly treatment selection [27].

Genetic mutations have been shown to be predictive of response/resistance to or recurrence after chemotherapy in breast cancer [28], hepatocellular carcinoma [29], and ovarian cancer [30] among others. In addition to traditional chemoradiotherapy, targeted molecular therapy has become a mainstay in the management of a wide range of tumors. Targeted therapy harnesses tumor-specific biology to inhibit the action of particular enzymes, to target tumor-associated proteins or mutated receptors, or to leverage other oncogenic molecular vulnerabilities. Detailed knowledge of the genetic composition, for instance driver mutations and resistance signatures, can provide much-needed guidance on the selection of an optimal therapy.

In the rapidly evolving field of cancer immunology and immunotherapy, high tumor mutational load, a measure of the number of mutations within the tumor genome, has been associated with increased potential response to immunotherapy, especially immune checkpoint blockade [31, 32]. This association has been explained by the fact that an increased mutational burden results in a wider variety of neoantigens, to which the immune system has not been exposed, being expressed by the tumor [33, 34]—making them more recognizable by the immune system as foreign.

With the increased use of precision medicine and the establishment of guidelines around biomarkers, knowledge of the tumor genetic profile is advantageous to clinicians. Normally, insight into the tumor genome requires biopsies, an invasive procedure that may increase patient morbidity. The field of science that has risen to non-invasively identify specific imaging features (or signatures) that can predict tumor genomic alterations is termed “radiogenomics.”

Review of radiogenomic studies

Between 2017 and 2018 alone, there have been more than five hundred radiomics publications listed in the PubMed/MEDLINE database. Radiogenomics forms a rapidly growing subset of this research activity and as such, a general overview of the accumulating literature is needed. This review aims to provide a broad outline of the radiogenomic literature. Considering that cancer mutations are very often shared among different abdominal and extra-abdominal tumors, we have opted to study the various projects that have aimed to correlate imaging features with specific genetic signatures across different tumor types. (Literature search methodology in “Appendix 1” and Supplemental Fig. 1).

Brain

The overwhelming bulk of literature within radiogenomics has concerned the brain (Table 1). As early as 2008, Diehn et al. combined glioblastoma multiforme (GBM) neuroimaging with microarray DNA data in order to non-invasively map gene expression within the tumor [35]. Colen et al. subsequently used the scans of 82 treatment-naïve TCGA (The Cancer Genome Atlas) GBM patients to successfully correlate semantic imaging features to gene and microRNA expression—specifically those associated with edema-genesis, cell migration, and increased inflammatory response [36]. Alongside semantic and radiomic imaging features, even volumetrics—the analysis of volumes within the tumor compartments—have been associated with PERIOSTIN expression, a gene linked with decreased survival, shorter time to recurrence (p < 0.001), and the mesenchymal GBM subtype (p < 0.0001) [37].

Table 1 General overview of radiogenomic literature on brain tumors

Radiogenomics research in the brain was initially focused on the use of imaging features for molecular subtype prediction. Molecular subtypes are based on genome-wide profiling and large-scale genomic analysis and can be used to provide diagnostic, prognostic, and therapeutic options [38]. MRI-derived texture features were shown to non-invasively predict whether a tumor would belong to one of four distinct molecular subtypes: classical (AUC = 0.72), neural (AUC = 0.75), proneural (AUC = 0.82), or mesenchymal (AUC = 0.70) [39]. Tumor shape [40, 41] and tumor-associated edema on T2-FLAIR (AUC = 0.61) [42] have also been reported as possible imaging discriminators of molecular subtype in GBM.
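Since most results in this review are reported as an area under the ROC curve (AUC), a brief sketch of how that number can be computed may help with interpretation. The AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the scores and labels below are hypothetical and do not come from any cited study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical model scores; label 1 = subtype present, 0 = absent.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)   # 5 of 6 pairs are ranked correctly
```

On this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported value of 0.61 for edema on T2-FLAIR indicates only a modest discriminator.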

With the promising results of radiomics in molecular subtype prediction, more specific biological associations were pursued. The attention of the field turned towards the prediction of gene mutations that were already being used as biomarkers in daily clinical practice.

A number of prominent somatic mutations (namely TP53, RB1, NF1, EGFR, and PDGFRA) were found to be associated with volumetric parameters on T1 contrast and T2 FLAIR MR images [43]. The first purely quantitative analysis integrated GBM “multi-omics” data, containing multiple “-omes” such as the genome, transcriptome, and proteome, from the TCGA and their corresponding images from The Cancer Imaging Archive (TCIA) [44, 45]. Zinn et al. identified “radiomic profiles” that helped to discriminate the mutational landscape (i.e., TP53, PTEN, and EGFR) on a dataset of 29 TCGA patients [46]. Hu et al. performed an exceptional study where multiparametric MRI features extracted specifically from biopsy sites were used to predict various driver mutations (namely EGFR, PDGFRA, PTEN, CDKN2A, RB1, and TP53) via random forest [47]. This approach cleverly mitigated the impact of tumor heterogeneity and was one of only two radiogenomic studies to do so. Radiomics features were also identified that directly correlated with the overall mutational burden/load [48].

Different groups quickly moved away from broad genetic analyses to the characterization of imaging phenotypes for specific mutations in GBM. Isocitrate dehydrogenase-1 (IDH) mutation is commonly used in the clinic to stratify patients, often in conjunction with other co-mutations [49] and it received the most attention in brain radiogenomic research. IDH-mutated gliomas occurred most frequently in the rostral extension of the lateral ventricles of the frontal lobe [50] and were linked to tumor size [51], local pattern of intensities [52], PET features [53, 54], angular standard deviation (tumor boundary irregularity) [41], mean diffusional kurtosis [55], and apparent diffusion coefficient (ADC) [56] as well as part of “radiomic signatures” in artificial intelligence models [20, 57,58,59,60,61,62]. Similarly, 1p/19q co-deletion [62,63,64,65], a widely used prognostic biomarker for brain tumors [66], and EGFR mutation [67,68,69,70] were thoroughly covered by different teams, having been linked to a wide array of MRI features. Additional imaging markers were also found that were predictive for the expression of clinically relevant genes such as BCAT [71], PDGFR2a [72], ATRX [73], hypoxia-associated genes [74], Ki67 [75], CD49d [76], and CD3RNA expression [77]. Going beyond genomic parameters, Lehrer et al. performed a “radioproteomics” study where semantic MRI features were correlated with proteins expressed in lower grade glioma [78].

Interestingly, attempts were made to subgroup these tumors with radiomics alone, irrespective of their mutational landscape and clinical features [79], strongly suggesting that much more information than anticipated could be extracted from quantitative features. In support of this point, MGMT methylation status, another prognostic biomarker [66], was predicted by AI models even when clinical data were excluded and only radiomics were used [80].

Other brain tumors have also been studied where, especially in meningioma, radiomic features were shown to be capable of good patient stratification, based on grade (AUC = 0.86) [81], phenotype [81], or risk of relapse (AUC = 0.72, p = 0.28) [82]. Semantic features were also correlated with the molecular subtype in medulloblastoma [83] and to the genomic profile of neuroblastoma [84].

As the site that was most extensively studied, brain radiogenomic literature reflects what is seen across a wide variety of tumor types, namely that the field is still in its early stages. Radiogenomic features have been linked to genomics-derived molecular subtypes, gene groups, and specific somatic mutations. Most studies, however, are performed on limited, usually single-center, patient cohorts in an exploratory, proof-of-concept manner. Agreement between studies on independent radiogenomic features remains to be achieved before mainstream clinical application can be pursued.

Lung

In non-small cell lung cancer (NSCLC), genetic mutational status plays a pivotal role in the clinical decision-making process—especially since targeted agents received FDA approval in NSCLC [85]. Around 10–15% of NSCLC patients have an activating EGFR mutation [86], where receptor blockers or tyrosine kinase inhibitors (TKIs) would be valid treatment options. Moreover, a quarter of lung adenocarcinoma patients have a KRAS mutation [87], which would confer resistance to the aforementioned treatments (see Fig. 2). KRAS and EGFR mutations are mutually exclusive [88] and the presence of one or the other strongly influences the choice of treatment. Furthermore, up to 7% of NSCLC patients harbor an ALK rearrangement, potentially making them candidates for crizotinib, a multi-targeted tyrosine kinase inhibitor with potent activity against ALK [89].

Fig. 2

An illustration of the role of genetic mutational status in treatment decisions. a The RAS pathway connects EGFR to cell proliferation and survival (via transcription of effector genes). b Anti-EGFR based treatments block signal transduction from the receptor and hence counter the effects of the pathway. c Activating mutations in KRAS allow the cell to constantly exert the effect of this pathway irrespective of whether the ligand has bound to the receptor. Such mutations could mediate resistance to anti-EGFR therapies

Given the fact that genomic analysis is becoming more routine in this cohort of patients, there has been significant research interest in seeing whether radiomics could predict the genetic status of the tumor, especially for KRAS, EGFR, and ALK (Table 2). Gevaert et al. started the flurry of radiogenomic lung research by correlating PET/CT features to metagenes, aggregated patterns of gene expression, with encouraging preliminary results [90]. However, the first article that most resembled a conventional radiogenomics study by linking specific imaging features to specific mutations was the work of Halpenny et al., investigating the predictive value of CT radiomics for ALK rearrangements [91]. Early on, different projects selected clinically relevant mutations, like ALK [92], EGFR [93, 94], and KRAS [88], with various technical methodologies (purely correlative studies vs. predictive models [95]).

Table 2 General overview of radiogenomic literature on lung cancer

As a predictive marker, KRAS has been linked to round shape [96], nodules in non-tumor lobes [96], multiple small nodules [97], as well as general radiomic profiles [98, 99]. Interestingly, some studies reported the ability to predict EGFR, but not KRAS [100, 101]. ALK rearrangement was linked to pleural effusion [96] and lobulated margin [102]. HER2, a gene often amplified during acquired resistance to EGFR-targeting therapy, was also studied [103]. Halpenny et al. also studied qualitative features in the prediction of a BRAF mutation in lung cancer—citing pleural metastases as a key difference (p = 0.045) [104].

By far, the most researched genetic mutation in NSCLC was EGFR (both exon 19 and exon 21 mutations), having been linked to contrast [105], Laws-Energy [94], median Hounsfield Unit (HU) [88], SUVmax [106], pleural retraction [96], small size [96, 97], spiculation [97], irregular nodules [107], poorly defined margins [107], ground glass [107, 108], emphysema [109], locoregional infiltration [109], “normalized inverse difference moment” [101], as well as combined radiomic profiles [98,99,100, 110,111,112,113].

Interestingly, the semantic features “small size” and “ground glass appearance” were the only two imaging markers that were reproducibly shown to be associated with the EGFR mutation [96, 97, 107, 108]. Quantitative features are often referred to as being more objective; however, depending on the image acquisition and feature extraction methods, the values can vary greatly [12,13,14]. In addition to large multicenter studies, the methods used at different points of the radiomics pipeline need to be standardized, otherwise a direct comparison may not be possible. Achieving direct comparison is especially helpful in the case of other, less prevalent, tumors that would share the same driver mutation. A model trained on more easily acquired data from lung cancer may in the future be generalized to a different, less prevalent tumor.

In addition to mutational status prediction, radiomic as well as genomic parameters have been combined in prognostication models [114]. Emaminejad et al. reported that while radiomics (AUC = 0.78) and genomics (AUC = 0.78) models were capable of predicting survival, accuracy significantly improved (AUC = 0.84) when both data were integrated [115]. The future of data-driven diagnostics will be the integration of complementary data generated from different sources (the “-omics”) to improve the performance of predictive models by unlocking further hidden information.

As in brain radiogenomic literature, lung studies seldom agreed on features, either semantic or radiomic, that were predictive for specific genetic mutations. This lack of reproducibility/generalizability is one of the largest hurdles for radiogenomics to overcome in order to be implemented into the clinical workflow. Until recently, most studies relied on monocentric patient cohorts using basic statistical methodologies to identify associations between certain features and the gene/molecule of interest. In order to identify truly reproducible radiogenomic features or signatures, large heterogeneous patient cohorts would need to be collected from multiple centers, ideally prospectively, and would need to be analyzed using modern methods capable of addressing data heterogeneity.

Breast and ovaries

Since the 1970s, mammograms have been used to assess breast cancer risk, through qualitative [116] and quantitative analysis of the breast parenchyma [117]. However, Li et al. were among the very first to apply texture analysis to discriminate between high-risk BRCA-mutated and low-risk wild-type patients [118]. AI methods were subsequently leveraged to improve the performance of these radiogenomic predictive models, first with Bayesian artificial neural networks, a framework for different machine learning algorithms [119, 120], and then with convolutional neural networks (AUC = 0.86) [121].

While radiogenomic research in the breast had initially started with mammograms, the interest of researchers in this field also spanned MRIs (Table 3). Yamamoto et al. studied 353 patients with breast cancer and obtained their gene expression profiles (MRI n = 10) [122]. Given that the ACR does not recommend breast MRIs as screening for the general population, this low sample size is understandable [123]. Nonetheless, 21 imaging traits were found that were globally correlated (p < 0.05) with 71% of the genes measured. In the MOSCATO-01 trial, voxel intensity was significantly associated with HER2 amplification on a patient group (n = 20) composed of gastrointestinal tumors, HNSCC, lung cancer, breast cancer, and urogenital cancer [124]. Considering that a number of mutations are relevant across multiple tumor types, for instance BRCA and HER2 in both breast cancer and ovarian cancer, further validation studies should be performed to test whether these imaging markers are tumor agnostic.

Table 3 General overview of radiogenomic literature on breast cancer and ovarian cancer

Quantitative imaging features were not only linked to specific mutations but also genomics-derived molecular subtypes [125, 126]. Wang et al. were able to further broaden the molecular subtypes that were predicted by imaging features: luminal A (p = 0.00473), HER2-enriched (p = 0.00277), and basal-like (p = 0.0117) [127]. Machine learning models capable of discerning luminal A molecular subtype (AUC = 0.697), triple-negative breast cancer (AUC = 0.654), estrogen receptor (ER) mutational status (AUC = 0.649), and progesterone receptor (PR) status (AUC = 0.622) were subsequently built [128]. Imaging markers were also directly correlated with survival in breast cancer [129]. With increasingly thorough analyses [130], novel imaging markers may be found that can be readily used in the clinic alongside existing risk-stratification scoring systems [131].

Molecular subtyping has also gained importance in ovarian cancer. Due to the significant risk of recurrence in high-grade serous ovarian carcinoma, prognostic biomarkers are of critical importance for clinicians [132]. A prognostic model called the “Classification of Ovarian Cancer” (CLOVAR) was developed by combining subtype and survival gene expression signatures in order to better profile patients into risk groups [133]. In one of the few radiogenomic multicenter studies, semantic features from a cohort of 92 high-grade serous ovarian carcinoma patients were found to be associated with subtypes of the CLOVAR system as well as time to progression [134]. In epithelial ovarian cancer, where 5-year survival is approximately 35–40%, a similar need for patient stratification exists [135]. Lu et al. engaged in a very thorough and methodical study where machine learning was used to derive a score they termed as the “Radiomic Prognostic Vector” (RPV) [136]. The RPV was shown to identify patients with less than 24 months of survival. This is also one of the very few predictive radiogenomic models that has been validated on independent multicenter cohorts and has received further complementary genetic and transcriptomic analysis to rationalize the predictions.

As mentioned in the previous section, when different data sources are combined, further information may potentially be unlocked. This was also observed when imaging features were combined with the recurrence scores of the OncotypeDX and PAM50 gene panels: the predictive performance of models improved significantly [137]. Integration of data from multiple domains may well be the necessary next step to bring the performance of these predictive models up to a clinically feasible level.

Liver

Liver cancer is the second leading cause of cancer mortality worldwide [138], with 80–90% of primary liver cancer being hepatocellular carcinoma (HCC). Radiogenomics in HCC developed rather curiously in the sense that two parallel research lines arose simultaneously (Table 4)—one searching for predictive imaging markers (therapeutic response) versus another searching for prognostic imaging markers (survival and recurrence rate).

Table 4 General overview of radiogenomic literature on liver tumors and colorectal cancer

Doxorubicin is one of the earliest developed anthracyclines and while systemic administration of the agent has shown limited benefit, 30–70% of HCC patients respond to transarterial chemoembolization (TACE) with doxorubicin [139]. Early radiogenomic studies successfully identified six predefined semantic imaging phenotypes on standard CE-CT images that were associated with doxorubicin response gene expression patterns [140]. West et al. subsequently performed the first quantitative radiogenomic study in liver cancer where they acquired CT images from 27 treatment-naïve HCC TCGA patients and correlated radiomic features with specific genes known to confer doxorubicin chemoresistance (namely TP53, TOP2A, CTNNB1, CDKN2A, and AKT1, AUC = 0.72–0.86) [141].

The other major focus in HCC radiogenomic literature was the development of a proxy imaging marker for Microvascular Invasion (MVI), an independent predictor of early recurrence and poor postoperative overall survival [142] identifiable only by histology of excised tissue. Having a non-invasive means of determining preoperative MVI would provide significant clinical benefit to these patients. Previous attempts were made to directly correlate conventional imaging features with histopathological MVI [143,144,145]. However, the most researched imaging marker was “Radiogenomic Venous Invasion” (RVI), a contrast-enhanced CT marker linked to a 91-gene HCC venous invasion gene expression signature that was prognostic for survival following surgical resection or liver transplantation [146]. Banerjee et al. validated the RVI for its correlation with MVI (accuracy = 89%, sensitivity = 76%, specificity = 94%) and moreover for its prognostic value (OS (p < 0.001), 3-year recurrence-free survival (p = 0.001) [146]). It should be noted that liver cirrhosis may impact the ability of these radiogenomic markers to predict MVI or even outcome [147]. Considering that the overwhelming majority of HCC patients have chronic liver disease [148], with 80–90% of cases suffering from liver cirrhosis [149], this is a sizable limitation. Further efforts would be needed in hepatic radiogenomic studies to identify cirrhosis-independent predictors.

In cholangiocarcinoma, FGFR2, a potentially suitable target for molecular therapy, was also studied from a radiogenomics perspective, albeit with conflicting results. An initial study (n = 66) using semantic features was unable to find any significant association to the genetic mutation [150]; however, a second smaller study (n = 33) using machine learning successfully predicted FGFR2 mutation in 90% of the cases (sensitivity = 87%, specificity = 94%) despite having only half the sample size [151]. This makes a compelling argument in favor of using machine learning to build better-performing predictive models.
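The accuracy, sensitivity, and specificity figures quoted in this section are all derived from the same four counts of a confusion matrix. The following minimal sketch shows the relationship. The function name `diagnostic_metrics` and the counts used are hypothetical and chosen only to make the arithmetic easy to follow; they are not the counts from any cited study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,    # fraction of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # fraction of mutated cases detected
        "specificity": tn / (tn + fp),    # fraction of wild-type cases correctly ruled out
    }

# Hypothetical counts: 50 mutated and 50 wild-type cases.
toy_metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
# sensitivity = 40/50, specificity = 45/50, accuracy = 85/100
```

Accuracy depends on the ratio of mutated to wild-type cases in the cohort, so sensitivity and specificity are usually the more informative pair when comparing studies with different mutation prevalence.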

Kidneys

As a result of the widespread use of abdominal imaging, renal cell carcinoma (RCC) is increasingly being detected at earlier stages where surgery is a therapeutic option [152]. Prognostic biomarkers are important for treating clinicians to help mitigate the risk of postoperative recurrence. In the clinic, von Hippel–Lindau (VHL) mutational status is often used as both a prognostic and predictive biomarker for RCC [153], given its role in hypoxia signaling [154]. VHL mutations were found to be significantly associated with well-defined tumor margins (p = 0.013), nodular tumor enhancement (p = 0.021), and gross appearance of intratumoral vascularity (p = 0.018) [155]. In addition to these features (Table 5), machine learning classifiers, trained on local datasets and validated on TCGA patients, were built that could identify not only mutations in VHL (accuracy = 0.75) but also in BAP1 (accuracy = 0.83) and PBRM1 (accuracy = 0.83) [156].

Table 5 General overview of radiogenomic literature on prostate cancer and renal cell carcinoma

BAP1 mutation, particularly with a concomitant loss of PBRM1 [157], has been shown to be a significant negative prognostic parameter for RCC patients [158] and as such, has received attention in radiogenomic research alongside VHL [159]. BAP1-mutated RCCs tended to display CT renal vein invasion (p = 0.046) [155], ill-defined tumor margins (p = 0.002) [160], and a higher Fuhrman (pathological) grade (p = 0.026) [161]. BAP1 also featured in the work of Vikram et al. where they extracted quantitative features from 78 clear cell RCC from the TCGA and used machine learning methods to predict the mutational status (AUC = 0.78, pre-contrast) [162]. Beyond individual genes, CT radiomic features were also correlated, at the epigenetic level, with DNA methylation in RCC [163].

Colorectal carcinoma

As in lung cancer, the RAS gene family functions as a group of molecular switches controlling transcription factors and cell cycle proteins. Deregulation of RAS signaling results in increased cell proliferation, angiogenesis, and heightened metastatic potential. In colorectal cancer (CRC), 30–50% of cases are KRAS mutated [164] while 3–5% are NRAS mutated [165]—they are largely considered to be mutually exclusive [166, 167]. As in NSCLC, an activating RAS mutation is considered indicative of EGFR antibody resistance and hence serves as a predictive biomarker in the clinic [168, 169] (see Fig. 2).

Different imaging modalities have been tested to provide predictive imaging markers for KRAS mutational status. On CT, conventional radiomics has shown that KRAS mutations are associated with skewness (p = 0.02) [170]. More robust machine learning classifiers have also identified radiomic signatures capable of predicting KRAS (AUC = 0.829), NRAS (AUC = 0.686), and BRAF mutations (AUC = 0.857) in a cohort of 117 CRC patients [171]. Magnetic resonance imaging has also been analyzed where it was observed that, in rectal cancer, KRAS mutations were associated with N stage, axial tumor length, and a polypoid pattern [172]. DCE MR parameters, however, were not associated with either KRAS mutation or microsatellite instability, an important prognostic and predictive biomarker for colorectal cancer [173]. F18 FDG PET/CT imaging has also been tested in the search for colorectal radiogenomic markers albeit with mixed results. On one hand, first-order features such as maximum SUV [174], mean SUV, SUV standard deviation, and SUV coefficient of variation [175], have been significantly associated with KRAS mutation. On the other hand, a growing body of literature is reporting that uptake may not be reflective of KRAS mutational status [176, 177].
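The first-order features named above (maximum, mean, standard deviation, coefficient of variation, and skewness) are plain summary statistics of the voxel values inside the lesion. The sketch below shows one way to compute them with NumPy. The function name `first_order_features` and the input values are hypothetical, and the population (rather than sample) standard deviation is an assumption, since conventions differ between software packages.

```python
import numpy as np

def first_order_features(values):
    """Summary statistics of the voxel values (e.g., SUV) inside a lesion."""
    v = np.asarray(values, dtype=float)
    if v.size == 0:
        raise ValueError("no voxel values supplied")
    mean = v.mean()
    std = v.std()                                   # population standard deviation
    # Skewness: third central moment divided by the cube of the standard deviation.
    skew = ((v - mean) ** 3).mean() / std ** 3 if std > 0 else 0.0
    return {
        "max": float(v.max()),
        "mean": float(mean),
        "std": float(std),
        "cov": float(std / mean) if mean != 0 else float("nan"),
        "skewness": float(skew),
    }

# Hypothetical uptake values with one hot voxel, giving a right-skewed distribution.
toy_first_order = first_order_features([1.0, 2.0, 3.0, 4.0, 10.0])
```

A positive skewness, as in this toy input, indicates a small number of voxels with values far above the mean.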

EGFR, KRAS, and BRAF mutations are clinically relevant in a number of carcinomas; however, due to lack of standardization, direct comparison of radiomic features linked to the same mutation across different tumor types is not meaningful. Tumor agnostic radiogenomic features need to be found using large multicenter datasets and harnessing advanced classification methods.

Outside the domain of image analysis, Pershad et al. used a purely text-based approach to process the descriptive words used by radiologists in their reports in order to predict KRAS mutational status [178]. The trained classifier determined the word frequency within both mutant and wild-type radiology reports. Tumors with a KRAS mutation were described more often as “innumerable,” “confluent,” and “numerous,” whereas wild-type tumors were more often described as “few,” “discrete,” and “[no] recurrent.”
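The idea of comparing word frequencies between two groups of reports can be sketched as follows. This is a simplified frequency comparison under stated assumptions, not the classifier used in the cited study. The function names `word_frequencies` and `contrast_terms` and the example report snippets are hypothetical.

```python
import re
from collections import Counter

def word_frequencies(reports):
    """Relative frequency of each lowercase word across a list of report texts."""
    counts = Counter()
    for text in reports:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def contrast_terms(mutant_reports, wildtype_reports, top_n=3):
    """Words most over-represented in each group, by difference in frequency."""
    mut = word_frequencies(mutant_reports)
    wt = word_frequencies(wildtype_reports)
    diff = {w: mut.get(w, 0.0) - wt.get(w, 0.0) for w in set(mut) | set(wt)}
    mutant_terms = sorted(diff, key=lambda w: (-diff[w], w))[:top_n]
    wildtype_terms = sorted(diff, key=lambda w: (diff[w], w))[:top_n]
    return mutant_terms, wildtype_terms

# Invented report snippets for illustration only.
mutant = ["innumerable confluent lesions", "numerous confluent lesions"]
wildtype = ["few discrete lesions", "discrete lesions"]
mutant_terms, wildtype_terms = contrast_terms(mutant, wildtype, top_n=1)
```

Words shared equally by both groups, such as "lesions" here, score near zero and drop out of the ranking.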

Prostate

Radiogenomics carries great potential in the setting of prostate cancer given that clinical outcome is closely linked to one prominent tumor suppressor gene, PTEN (Table 5). Loss of PTEN in prostate cancer has been correlated with a clinically aggressive phenotype and increased mortality [179]. While multiparametric MR scans failed to yield any predictive/correlated features [180], contrast uptake on DCE-MRI (p < 0.01) and T2-weighted signal-intensity skewness (p < 0.1) were correlated with PTEN expression [181].

Stoyanova et al. took a unique approach and performed radiogenomic analysis on patients with prostate cancer who had undergone MR-guided biopsies. The biopsy site was identified on the scan and radiomic features were extracted only from the specific area of interest, allowing for a more accurate radiomic–biological correlation to be made. Radiomic features associated with prognostic biomarkers were identified with this approach [182, 183].

Limitations and challenges facing radiogenomics

While the field of radiogenomics holds great promise, there are a number of limitations that the field, as a whole, needs to overcome [184].

Radiomics pipeline

Up until fairly recently, the radiomic pipeline consisted of image acquisition, delineation of regions of interest, extraction of handcrafted features, and correlation with simple statistics. Every step in the pipeline is prone to error, and these errors can accumulate in a phenomenon known as “propagation of error.” Manual delineations require a significant time commitment from trained radiologists, often serving as a rate-limiting step for the number of scans/patients included in a study. Furthermore, inter-observer variability of manual segmentations is hard to control, which impacts the features derived from the image [185]. Beyond the annotation/delineation step, handcrafted feature extraction introduces an element of bias, since the features are built on predefined mathematical formulas meant to capture specific morphological and textural phenotypes. These features can also change depending on the extraction technique and software used [186]. Most handcrafted features were designed in the realm of computer vision for general image decoding tasks and were not intended to address clinical research questions.

At the model building step of the pipeline, more robust classification methods need to be leveraged. Simple correlative studies often are not reproducible, leading to conflicting results from different datasets meant to predict the same mutation in the same disease. With such simple statistical methodologies, basic radiomic studies run the risk of overfitting (i.e., being protocol-specific, scanner-specific, or hospital-specific or all of the above) or underfitting (i.e., exploring only linear models).

Reproducibility of radiomic features

In order for radiomics to gain clinical application, the identified imaging biomarkers need to be independent, informative, and reproducible. Typically, robust biomarkers are derived from large heterogeneous datasets with a methodical testing process and external validation. The literature has shown that radiomic features can be influenced by many parameters such as scanning equipment [187, 188], image pre-processing [189], acquisition protocols [190, 191], image reconstruction algorithms [13, 192, 193], and delineation [194, 195], among others. These changes can subsequently impact the radiomic predictive models. In lung cancer, the diagnostic performance of a radiomic signature for solitary pulmonary nodules fluctuates based on slice thickness, contrast enhancement, and the convolutional kernel [196]. Changes in pixel resolution and contrast injection rates decreased the proportion of reproducible features in liver cancer [197]. Reproducibility testing is becoming more routine in the radiomic biomarker identification process [187, 198, 199, 200, 201, 202, 203]. Associations such as the Quantitative Imaging Biomarker Alliance have begun to advocate for the technical standardization of patient scans to help ensure inter- and intra-machine reproducibility of imaging features [204, 205].
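One way reproducibility between two acquisitions can be quantified is Lin's concordance correlation coefficient (CCC). The sketch below is illustrative only: the CCC is one of several agreement measures and is not necessarily the one used in the studies cited above, and the feature values are hypothetical.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (1/n) variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Penalizes both poor correlation and systematic offset between scans.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical values of one radiomic feature in four patients,
# measured on a test scan and a retest scan with a constant offset.
test_scan = [1.0, 2.0, 3.0, 4.0]
retest_scan = [2.0, 3.0, 4.0, 5.0]
ccc = concordance_correlation(test_scan, retest_scan)
```

Although the two toy series are perfectly correlated, the constant offset lowers the CCC below 1, which is why agreement measures of this kind are more informative than correlation alone when screening features for reproducibility.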

Tumor heterogeneity

From the biological perspective, radiogenomic studies rely on information derived from a biopsy to serve as a gold standard. In addition to being invasive, biopsies suffer from a bias where it is assumed that the sample reflects the entire tumor burden. Tumor heterogeneity poses a serious challenge to this approach and the clinical decisions based on it. Radiogenomics aims to bypass both the issue of invasiveness and the sampling bias by using non-invasive radiological images to analyze the full tumor burden. If predictive radiogenomic models are built using biopsies where the sampling bias is passed on, one could argue that the second objective has yet to be achieved. This is a critical challenge for radiogenomics as the reliability of predictive models is dictated by the credibility of ground truth biological data. The next generation of radiogenomic research has become more acutely aware of this: Hu et al. and Stoyanova et al. extracted radiomic features only from the biopsy sites in brain and prostate tumors, respectively [47, 182, 183]. Better radiomic–biological correlations are built with such an approach, which would be the next step forward in identifying robust radiogenomic features.

Need for multidisciplinarity

The difference in approach to radiomics between clinical and technical researchers should also be taken into consideration. Groups that are more clinically oriented tend to look at problems from a very clinical perspective, often at the expense of methodology. This would explain the continued presence of simple correlative radiogenomic studies despite numerous methodological disadvantages, such as multiple comparison errors, difficulty in making decisions based on a single feature, and lack of robustness. Conversely, highly technical groups often tend to be driven by the development of novel methodologies, whereas the clinical relevance is of secondary importance. Due to limited direct translational potential, technical studies often suffer from being limited to a niche. Better inter-disciplinary collaboration will help bring about technically novel research that has a greater clinical impact, to the benefit of everyone involved.
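The multiple comparison problem mentioned above arises when many radiomic features are each tested against the same mutation. As a hedged illustration, the sketch below applies the Benjamini-Hochberg false discovery rate procedure, one standard correction among several; the p-values are hypothetical and the function name is an assumption.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five radiomic features tested against one mutation.
p_values = [0.04, 0.001, 0.80, 0.015, 0.30]
significant = benjamini_hochberg(p_values)
```

Three of the five toy features have p < 0.05 when considered individually, but only two survive the correction, which illustrates how uncorrected correlative studies can overstate the number of associated features.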

Future directions

The technical challenges listed in the previous section can be mitigated, to varying degrees, with the implementation of novel artificial intelligence techniques.

Deep learning radiomics

As highlighted earlier, the radiomics pipeline currently exists in a loosely dichotomous state between the two different approaches in radiomics. Classical/conventional radiomics relies on tumor delineations followed by the extraction of predefined handcrafted imaging features. Early radiomics research has been hindered by a lack of reproducibility partly due to different extraction algorithms being used to derive the features. Deep learning neural networks, specifically CNNs, have enjoyed significantly increased interest in medical imaging analysis. CNNs are capable of “learning” important features from within the image directly without needing to manually define them. The added advantage of CNNs is that the classification process, where the extracted features are linked to each other as well as the desired outcomes, is fully automated.

Deep learning is not without its own set of challenges. As the entire process of feature extraction and classification takes place within the same network, a great number of samples are needed to build robust models with satisfactory performance. For purposes like facial recognition, deep learning models are ordinarily trained on datasets numbering in the tens of thousands. However, in oncological imaging, the availability of standardized imaging data suitable for AI purposes remains rather limited. In this review, the overwhelming majority of radiogenomic studies had low sample sizes (n < 100). Data augmentation strategies that are optimized for medical images need to be developed to overcome this limitation [206].
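As a minimal sketch of what data augmentation means in practice, the code below generates the eight rotation and flip variants of a small 2D image patch. These generic geometric transforms are an assumption chosen for simplicity; they are not the medically optimized strategies called for above, and some of them may not be anatomically plausible for a given modality.

```python
def rotate90(image):
    """Rotate a 2D list-of-lists image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in image]


def augment(image):
    """Return the 8 rotation/flip variants of a 2D image, including the original."""
    variants = []
    current = image
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# Hypothetical 2x2 image patch.
patch = [[1, 2], [3, 4]]
augmented = augment(patch)
```

Each original sample yields eight training samples here, which multiplies the dataset size without acquiring new scans, although the variants are not independent observations.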

Another concern raised by researchers using this newer generation radiomics pipeline is the fear of an “artificial intelligence black box” where even neural network architects are unsure how the most relevant features are selected and the predictions are made. Regardless of accuracy, clinicians are unlikely to embrace a technology that requires blind faith.

Various approaches have been proposed to shed light on this black box. Saliency mapping has gained attention in AI research, where the impact of each individual pixel on the prediction is measured and “areas of importance” for the prediction of the neural network are visualized [207]. This effectively generates a heat-map highlighting where in the image the model bases its prediction. An alternative approach put forth was the decoupling of the segmentation and classification tasks into two different neural networks [208]. This would allow the clinician to inspect and confirm the automatically segmented area prior to the feature extraction and subsequent diagnosis/prediction.
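The idea of measuring each pixel's impact on the prediction can be sketched with occlusion sensitivity, a simple perturbation-based variant of saliency mapping (gradient-based methods are also common). The `toy_model` function below is a hypothetical stand-in for a trained network, and the image values are invented.

```python
def occlusion_saliency(image, predict, baseline=0.0):
    """Heat-map of how much the prediction changes when each pixel is occluded."""
    reference = predict(image)
    saliency = []
    for r, row in enumerate(image):
        saliency_row = []
        for c, _ in enumerate(row):
            # Copy the image and replace a single pixel with the baseline value.
            occluded = [list(x) for x in image]
            occluded[r][c] = baseline
            saliency_row.append(abs(reference - predict(occluded)))
        saliency.append(saliency_row)
    return saliency


def toy_model(image):
    """Hypothetical model: the score depends only on two pixels in the middle row."""
    return 0.5 * image[1][1] + 0.25 * image[1][2]


image = [[1.0, 1.0, 1.0], [1.0, 4.0, 2.0], [1.0, 1.0, 1.0]]
heat_map = occlusion_saliency(image, toy_model)
```

The resulting heat-map is nonzero only at the two pixels the toy model actually uses, which is the kind of visual check a clinician could use to judge whether a prediction is based on a plausible image region.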

Automated segmentation

Tumor delineation is considered a bottleneck in the radiomics pipeline [209], as it can only be reliably performed by experienced radiologists, who are already burdened with clinical duties. AI can address this problem in two ways: automatic segmentation and deep learning. Automatic segmentation uses the algorithm’s ability to detect patterns to delineate structures, pathological or otherwise. These automatically segmented areas would then have their features extracted and would flow into the next step of the radiomics pipeline. This approach has the added advantage of minimizing inter- and intra-observer variability, an area where obtaining agreement on the delineations is notoriously difficult [210].
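Agreement between two delineations, whether from two readers or from a reader and an algorithm, is often summarized with the Dice similarity coefficient. The sketch below assumes binary 2D masks stored as nested lists; the masks are hypothetical and the function name is an illustration, not a reference to any specific software.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary 2D masks of equal shape."""
    # Collect the coordinates of the foreground pixels in each mask.
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    if not a and not b:
        return 1.0  # Two empty masks are treated as perfect agreement.
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical delineations of the same lesion by two readers.
reader_1 = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
reader_2 = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
dice = dice_coefficient(reader_1, reader_2)
```

A value of 1 indicates identical delineations and 0 indicates no overlap; in the toy example the readers disagree on a single pixel, giving a Dice coefficient of 6/7.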

Another way the delineation bottleneck can be bypassed is with the use of convolutional neural networks in image processing. The network can either process the entire image to detect “global” imaging markers on the whole image or, if a region of interest is needed, the lesion is simply “pointed out” as opposed to being fully delineated. Within a convolutional neural network, feature extraction/selection happens automatically, on a much grander scale than any number of handcrafted features, and enjoys consistent optimization with each iteration. The malleability of these networks allows for the construction of more complex, robust, and generalizable architectures which can correct for known biases.

Experimental radiology

While the overwhelming majority of radiogenomic research is performed on routine clinical scans, there is a limit on the type of analyses that can be performed on patient-derived tissue samples and patient scans. As in the case of fundamental oncology, preclinical experiments may also be advantageous considering that the whole tumor burden can be modulated, excised, and studied [211]. There is still a debate on the usefulness of preclinical models in the drug development process, especially considering the differences in physiology, tumor heterogeneity, and disease progression. However, preclinical tumor models may prove to be a valuable development platform for imaging protocols dedicated to specific biological situations, for example, oxygenation that might not be readily induced in a human subject.

As our understanding of imaging markers improves, preclinical experiments can help identify better radiological–biological correlates to help provide non-invasive insight into the tumor biology. Once the inner workings of the neural networks are better elucidated, we may transcend the need to constantly link the imaging marker back to a biological parameter.

Integration systems in healthcare

Radiomic predictive models have the potential to help clinical teams make more precise diagnoses, more rapidly, with less time spent on each scan. In addition to patient stratification, radiogenomics could potentially have a major impact on selection of candidates for targeted therapies where expression of the target molecule can be measured non-invasively for the entire tumor burden, without having to rely on a single biopsy to represent all the cancer lesions within a patient. Before this can happen though, a number of challenges and growth opportunities need to be addressed.

As mentioned in the brain section, most of the research performed in radiogenomics was exploratory or proof-of-concept in nature using simple methods on single-center data. Large-scale multicenter projects are needed to identify reproducible radiomic features that might be not only scanner/protocol/hospital independent but ultimately tumor agnostic. Artificial intelligence provides a unique opportunity to not only extract more meaningful features and create better predictive models but also to integrate multiple data sources. A number of projects have begun to see the value of combining complementary information, such as radiomics and genomics, to boost predictive performance [114, 115, 137]. Within such an integration system, a future could be envisioned where computational biologists would work in close collaboration with radiologists to select the most suitable patient image, validate the region of interest generated by the AI prior to prediction, and lead discussions in the multidisciplinary tumor boards.

Conclusions

In this review, we endeavored to encapsulate all the major radiogenomic research and cover the challenges that lie ahead along with what we believe to be the future prospects. While the field of radiogenomics does have its roots in research performed decades ago, it is still very much in its infancy. Many challenges still need to be addressed; however, significant progress has already been made across a number of tumor types. With the rise of artificial intelligence in medicine, especially deep learning, more complex models combining multiple data sources could overcome many of the challenges that stand between radiogenomics and clinical implementation. Radiologists need to embrace this new technology and adapt to the changes that it will bring to the clinical workflow.