Introduction

The term “radiomics” describes the high-throughput extraction, analysis and interpretation of large numbers of features from medical images. It is assumed that medical images contain more information than the human eye alone can appreciate, and that additional data extracted through automated or semi-automated analysis may complement the standard descriptive data or metrics available to the radiologist or nuclear medicine physician. It is further assumed that these additional quantitative imaging parameters are related to the tumour molecular phenotype or genotype, a relationship that bioinformatic approaches may uncover [1–3]. The additional data are frequently “free”, in that their extraction requires only computational post-processing of data already acquired under clinical imaging protocols, without the need for more complicated acquisition protocols or for additional examinations or visits.

The underlying hypothesis of radiomics is that this additional information from medical images can be used alone or in combination with other “omics” data (e.g. genomics, metabolomics, proteomics) to improve tumour phenotypic characterisation, prediction of treatment response and prognostication, to the extent that each patient’s treatment may be individualised. For example, Segal et al. extracted 28 features from contrast-enhanced CT scans of hepatocellular carcinoma that could be used to reconstruct 78 % of the gene expression profiles associated with proliferation, hepatic synthetic function and prognosis [4].

The interest in radiomics has been heightened by the knowledge that genetic heterogeneity exists within individual tumours and between tumours, both in the same patient and between patients, and that the genetic profile may change over time, for example as a consequence of therapy [5, 6]. At a simpler level, malignant tumours are recognised to show heterogeneity of molecular and cellular features, including cellular density and proliferation, necrosis, fibrosis, metabolism, hypoxia, angiogenesis and receptor expression, factors that have been independently associated with poor treatment response and more aggressive tumour behaviour. There is early evidence that some of these adverse biological features may be reflected in medical images. For example, features extracted from CT scans of patients with non-small cell lung cancer (NSCLC) have been correlated with histological features of angiogenesis and hypoxia [7]. In a murine head and neck cancer model, the distribution of 18F-fluorodeoxyglucose (18F-FDG) uptake on PET images has also been shown to be associated with the underlying histological tumour cell density (areas containing more tumour cells were compared with areas showing more stromal tissue and necrosis) [8]. It is postulated that the tumour phenotype (of the whole tumour or of subregions) could be characterised and associated with underlying regional genetic changes within or between tumours [6].

The major advantage of imaging in revealing underlying phenotypic and genetic information is that it allows the whole tumour, and any metastases, to be sampled non-invasively and repeatedly; in other words, it overcomes the invasiveness and sampling errors associated with the examination of biopsy material. Imaging therefore appears well placed to be used in future to characterise tumours, select patients for optimum therapy and monitor genetic and biological changes that may inform the best management of an individual patient. PET on its own is unlikely to meet all of these requirements, but with the increasing use of novel tracers that probe different aspects of tumour biology, together with the routine use of multimodality imaging including PET/CT and, more recently, PET/MRI, it is likely to play a large part.

Radiomics and PET

The overall process of radiomics in PET is not significantly different from that of other imaging modalities. The image data have to be acquired and then reconstructed; this is followed by tumour segmentation and feature extraction and finally the application of informatics analyses and data mining, without necessarily having a priori hypotheses.

To some extent, we are already beginning to extract additional data from standard PET images to supplement routine clinical parameters such as the standardised uptake value (SUV). Indeed, there are many ways of measuring SUVs or applying corrections to them, and published recommendations for standardisation in clinical trials and clinical practice already exist [9–11]. Examples of additional quantitative features now widely reported in the literature include the metabolically active tumour volume (MTV) and total lesion glycolysis (TLG = SUVmean × MTV). Both of these parameters have yielded better discriminatory, predictive or prognostic information than SUV alone [12–16], and the published PERCIST guidelines recommended that they be measured as secondary or exploratory endpoints in future trials so that their role can subsequently be evaluated [10].
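As an illustration of how these quantities relate, the following minimal Python sketch computes a body-weight SUV and, for a given tumour mask, MTV and TLG. It assumes decay-corrected activity concentrations in Bq/mL, a tissue density of 1 g/mL and a known voxel volume; the function names and the toy volume are hypothetical, and clinical software may apply further corrections (e.g. lean body mass normalisation) that are not shown here.

```python
import numpy as np


def suv_bw(activity_bq_per_ml, injected_dose_bq, body_weight_kg):
    """Body-weight-normalised SUV.

    Assumes decay-corrected activity concentration (Bq/mL), injected dose
    decay-corrected to scan time (Bq) and a tissue density of 1 g/mL.
    """
    dose_per_gram = injected_dose_bq / (body_weight_kg * 1000.0)
    return np.asarray(activity_bq_per_ml, dtype=float) / dose_per_gram


def mtv_and_tlg(suv, mask, voxel_volume_ml):
    """MTV (mL) and TLG (SUVmean x MTV) for the voxels inside a boolean mask."""
    voxels = np.asarray(suv, dtype=float)[np.asarray(mask, dtype=bool)]
    mtv = voxels.size * voxel_volume_ml
    suv_mean = float(voxels.mean()) if voxels.size else 0.0
    return mtv, suv_mean * mtv


# Hypothetical example: 10 voxels at SUV 5 in an otherwise empty grid of
# 4 x 4 x 4 mm voxels (0.064 mL each); the 2.5 mask threshold is arbitrary.
suv = np.zeros((5, 5, 5))
suv[2, 2, :5] = 5.0
suv[2, 3, :5] = 5.0
mtv, tlg = mtv_and_tlg(suv, suv >= 2.5, voxel_volume_ml=0.064)
print(f"MTV = {mtv:.2f} mL, TLG = {tlg:.2f}")
```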

A large number of alternative parameters that can be extracted from PET images have also been proposed; these describe the shape, size and texture or heterogeneity of a tumour (Table 1) [17–19]. In particular, there has been recent interest in measuring heterogeneity within medical images and, following work with morphological imaging such as CT and MRI [7, 20–22], a growing number of reports show that measuring texture or heterogeneity within PET images may give additional information on the tumour phenotype beyond simple SUV-based measurements [17, 23–29]. Heterogeneity may be measured temporally, by interrogating changes in signal over time, or spatially, by measuring the relationships between voxels in a static image. Spatial heterogeneity has received the most interest.

Table 1 Common imaging heterogeneity parameters

The methods for measuring spatial heterogeneity in images can be divided into global, regional and local parameters describing the relationships between voxel intensities. The most commonly used statistical methods comprise first-order (one voxel), second-order (two voxels) and high-order (three or more voxels) parameters, but other approaches exist, such as model-based methods (e.g. fractal analysis) and transform-based methods [17–19]. First-order methods are the simplest and describe global properties of a tumour or region of interest (ROI) derived from the intensity histogram, e.g. the mean, minimum, maximum, range, standard deviation, skewness (asymmetry of the histogram), kurtosis (flatness of the histogram), uniformity (regularity) and entropy (randomness) of voxel intensities. First-order parameters do not convey spatial information because they are calculated from individual voxel values, ignoring the spatial relationships between voxels. By contrast, second- and high-order features describe properties of the intensities of two or more voxels at defined positions relative to each other and therefore retain spatial information.
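A minimal Python sketch of the first-order features listed above follows, assuming NumPy and an arbitrary choice of 64 histogram bins for uniformity and entropy. The function name is hypothetical, and published implementations differ in binning and in whether kurtosis is reported with or without subtracting 3.

```python
import numpy as np


def first_order_features(values, n_bins=64):
    """Histogram-based (first-order) features of the voxel intensities in an ROI.

    Only the distribution of values is used, so permuting the voxels leaves
    every feature unchanged. n_bins=64 is an illustrative assumption.
    """
    v = np.asarray(values, dtype=float).ravel()
    mean, sd = v.mean(), v.std()
    z = (v - mean) / sd if sd > 0 else np.zeros_like(v)
    counts, _ = np.histogram(v, bins=n_bins)
    p = counts[counts > 0] / v.size  # probabilities of the occupied bins
    return {
        "mean": mean,
        "min": v.min(),
        "max": v.max(),
        "range": v.max() - v.min(),
        "sd": sd,
        "skewness": float(np.mean(z**3)),
        "kurtosis": float(np.mean(z**4)),  # Pearson definition: 3 for a normal distribution
        "uniformity": float(np.sum(p**2)),  # also called energy
        "entropy": float(-np.sum(p * np.log2(p))),  # in bits
    }


# Hypothetical ROI values drawn from a skewed distribution
rng = np.random.default_rng(1)
roi = rng.gamma(2.0, size=500)
print(first_order_features(roi))
```

Because spatial arrangement is ignored, shuffling `roi` gives the same features, which is the limitation of first-order parameters described above.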

Second-order parameters describe local textural features and can be calculated using grey-level co-occurrence matrices (GLCMs), also known as spatial grey-level dependence matrices [30]. A GLCM records how often a voxel of intensity i occurs in a defined spatial relationship (distance and direction) to a voxel of intensity j. The most commonly used second-order parameters based on GLCMs include entropy (randomness of the matrix), uniformity (orderliness), contrast (local variation), homogeneity, dissimilarity (average intensity difference between paired voxels) and correlation (linear dependence between grey levels).
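The GLCM construction and the features listed above can be sketched in Python as follows. This is a simplified 2D illustration for a single offset, assuming the image has already been quantised to integer grey levels 0 to n_levels − 1; PET implementations typically work in 3D and average over 13 directions, and feature definitions (e.g. the form of homogeneity, or correlation for a constant image) vary between software packages. The function names are hypothetical.

```python
import numpy as np


def glcm(q, n_levels, offset=(0, 1), symmetric=True):
    """Normalised 2D grey-level co-occurrence matrix for one (row, col) offset.

    q holds integer grey levels 0..n_levels-1; only voxel pairs lying fully
    inside the image are counted.
    """
    q = np.asarray(q)
    dy, dx = offset
    h, w = q.shape
    ref = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    nbr = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((n_levels, n_levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)  # count each pair (i, j)
    if symmetric:  # also count (j, i)
        m = m + m.T
    return m / m.sum()


def glcm_features(p):
    """Common second-order features of a normalised GLCM p."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    cov = ((i - mu_i) * (j - mu_j) * p).sum()
    return {
        "entropy": float(-(nz * np.log2(nz)).sum()),
        "uniformity": float((p ** 2).sum()),  # energy / angular second moment
        "contrast": float(((i - j) ** 2 * p).sum()),
        "homogeneity": float((p / (1.0 + np.abs(i - j))).sum()),  # inverse-difference form
        "dissimilarity": float((np.abs(i - j) * p).sum()),
        # correlation is undefined for a constant image; 0 is an assumed convention
        "correlation": float(cov / (sd_i * sd_j)) if sd_i > 0 and sd_j > 0 else 0.0,
    }


checker = np.indices((8, 8)).sum(axis=0) % 2  # alternating 0/1 pattern
print(glcm_features(glcm(checker, n_levels=2)))
```

For the checkerboard, every horizontal neighbour pair differs, so contrast and dissimilarity are maximal and correlation is −1; a uniform image gives zero contrast and a homogeneity of 1.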

High-order parameters can be calculated using neighbourhood grey-tone difference matrices (NGTDMs) [31]. Textural features calculated from NGTDMs relate to the difference between the intensity of each voxel and the average intensity of its neighbouring voxels (in 3D, including voxels in adjacent image planes), and are thought to closely resemble human perception of the image. Examples include coarseness, contrast and busyness. Coarseness describes the granularity of an image and is considered one of the most fundamental texture properties. Contrast relates to the dynamic range of intensity levels in an image and the degree of local intensity variation, while busyness relates to the rate of intensity change within an image [31]. Regional parameters describe run lengths of consecutive voxels, or zones of connected voxels, with similar intensities: fine textures produce many short runs or small zones, whereas coarse textures produce long runs or large zones. Examples include short run (or small zone) emphasis, long run (or large zone) emphasis and run length (or size zone) variability [32, 33].
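The NGTDM features can be sketched in Python following the widely used definitions of Amadasun and King, simplified here to 2D with a square neighbourhood of half-width d. This is an illustrative assumption rather than the implementation used in the cited studies: 3D PET implementations use a 26-voxel neighbourhood, and the handling of degenerate cases (a single grey level, a zero busyness denominator) is an assumed convention. The function name is hypothetical.

```python
import numpy as np


def ngtdm_features(q, d=1, eps=1e-6):
    """Coarseness, contrast and busyness from a 2D NGTDM.

    q: 2D array of grey levels. Only pixels at least d from the border are
    used, so every neighbourhood of (2d+1)^2 - 1 pixels is complete.
    """
    q = np.asarray(q, dtype=float)
    h, w = q.shape
    centre = q[d:h - d, d:w - d]
    total = np.zeros_like(centre)
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            if dy or dx:  # skip the centre pixel itself
                total += q[d + dy:h - d + dy, d + dx:w - d + dx]
    neigh_mean = total / ((2 * d + 1) ** 2 - 1)

    levels = np.unique(centre)  # grey levels actually present
    n = centre.size
    p = np.array([(centre == g).sum() / n for g in levels])  # level probabilities
    s = np.array([np.abs(g - neigh_mean[centre == g]).sum() for g in levels])  # NGTDM s(i)
    ps = float((p * s).sum())

    coarseness = 1.0 / (eps + ps)
    ng = levels.size
    if ng < 2:  # contrast and busyness are undefined for a single grey level
        return {"coarseness": coarseness, "contrast": 0.0, "busyness": 0.0}
    g_i, g_j = np.meshgrid(levels, levels, indexing="ij")
    p_i, p_j = np.meshgrid(p, p, indexing="ij")
    contrast = (p_i * p_j * (g_i - g_j) ** 2).sum() / (ng * (ng - 1)) * s.sum() / n
    denom = np.abs(g_i * p_i - g_j * p_j).sum()
    busyness = ps / denom if denom > 0 else 0.0
    return {"coarseness": coarseness, "contrast": float(contrast), "busyness": float(busyness)}


checker = 1 + np.indices((10, 10)).sum(axis=0) % 2  # fine, busy texture
halves = np.ones((10, 10))
halves[:, 5:] = 2  # coarse texture: two large uniform blocks
print(ngtdm_features(checker))
print(ngtdm_features(halves))
```

As expected from the definitions, the two-block image is coarser and less busy than the checkerboard, although both contain the same two grey levels in equal proportions.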

Technical factors

Textural features can vary with image acquisition and reconstruction parameters in PET. Indeed, different textural features have been found to show different degrees of variability when the acquisition mode (2D vs. 3D), matrix size (128 × 128 vs. 256 × 256), reconstruction algorithm and post-reconstruction filter are varied [34]. For example, features with <5 % variability included entropy and uniformity (first order), maximum correlation coefficient (second order) and low grey-level sum emphasis (high order), whereas most of the calculated features (40 of 50) showed >30 % variability; these included coarseness, contrast and busyness (high order). A similar study examined the effects of smoothing, ROI segmentation threshold and bin width on the precision of a number of first-, second- and high-order features [35]. It concluded that changes in smoothing and segmentation threshold had relatively small effects and that the GLCM and grey-level run length (GLRL) methods were the most robust. By contrast, bin width had larger effects on precision, and it was suggested that a normalisation process might reduce this effect.
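Because bin width matters, it may help to see the two common discretisation schemes side by side. The Python sketch below (hypothetical function names, simulated ROI values) quantises the same voxels either into a fixed number of bins spanning the ROI range or with a fixed SUV bin width anchored at zero, and shows that a simple histogram entropy changes with the choice. It is an illustration only, not the normalisation procedure proposed in [35].

```python
import numpy as np


def quantise_fixed_bins(values, n_bins):
    """Grey levels 1..n_bins from n_bins equal-width bins spanning the ROI min to max."""
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        return np.ones(v.shape, dtype=int)
    levels = np.floor((v - lo) / (hi - lo) * n_bins).astype(int) + 1
    return np.minimum(levels, n_bins)  # the ROI maximum would otherwise land in bin n_bins + 1


def quantise_fixed_width(values, bin_width):
    """Grey levels from a fixed bin width (e.g. in SUV units); anchoring at 0 is an assumption."""
    v = np.asarray(values, dtype=float)
    return np.floor(v / bin_width).astype(int) + 1


def histogram_entropy(levels):
    """Entropy (bits) of the grey-level histogram."""
    _, counts = np.unique(levels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Hypothetical ROI: identical voxels, different discretisations, different entropy values.
rng = np.random.default_rng(0)
roi_suv = np.clip(rng.normal(8.0, 2.0, size=1000), 0.0, None)
for n in (8, 16, 32, 64):
    print(n, "bins:", round(histogram_entropy(quantise_fixed_bins(roi_suv, n)), 2))
for width in (0.25, 0.5, 1.0):
    print("bin width", width, ":", round(histogram_entropy(quantise_fixed_width(roi_suv, width)), 2))
```

With a fixed number of bins the grey levels are relative to each ROI's own range, whereas a fixed bin width keeps a common SUV scale across patients; which is preferable depends on the study design.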

Image noise can also degrade the discriminatory ability of texture analysis, but the effect depends on the type of image [36]. For images with gradually changing intensities, discriminatory performance is already poor at low noise levels, whereas images with larger intensity differences remain robust until higher noise levels are reached.

Most clinical studies that measure tumour heterogeneity with PET or other imaging modalities use operator-defined or threshold-defined ROIs for tumour segmentation. However, it has been argued that a fuzzy locally adaptive Bayesian (FLAB) segmentation approach defines the tumour volume more accurately, particularly for heterogeneous tumours, and could improve prognostic evaluation [37–39].
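For reference, a fixed-threshold ROI of the kind used in many clinical studies can be sketched in Python as the set of voxels at or above a fraction of SUVmax that are face-connected to the hottest voxel. The 40 % default is only an illustrative assumption (published thresholds vary), the function name is hypothetical, and this is not the FLAB method.

```python
from collections import deque

import numpy as np

# Face-connected (6-neighbour) offsets in a 3D volume
NEIGHBOURS_6 = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def threshold_segment(suv, fraction=0.4):
    """Region grown from the hottest voxel over face-connected voxels with SUV >= fraction * SUVmax.

    suv must be a 3D array; disconnected hot spots above the threshold are excluded.
    """
    suv = np.asarray(suv, dtype=float)
    seed = tuple(int(c) for c in np.unravel_index(np.argmax(suv), suv.shape))
    above = suv >= fraction * suv[seed]
    mask = np.zeros(suv.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:  # breadth-first region growing
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBOURS_6:
            nb = (z + dz, y + dy, x + dx)
            inside = all(0 <= c < n for c, n in zip(nb, suv.shape))
            if inside and above[nb] and not mask[nb]:
                mask[nb] = True
                queue.append(nb)
    return mask


# Hypothetical volume: a 3 x 3 x 3 lesion plus a separate, disconnected hot voxel
vol = np.zeros((10, 10, 10))
vol[2:5, 2:5, 2:5] = 10.0
vol[8, 8, 8] = 6.0
roi = threshold_segment(vol, fraction=0.4)
print(int(roi.sum()), "voxels in ROI")
```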

Whilst it is attractive to hypothesise that image heterogeneity reflects underlying tumour biology, there is no evidence of a direct relationship between 18F-FDG PET images and histological features in humans. Given the microscopic nature of tumour biology and the tumour microenvironment (e.g. angiogenesis, hypoxia, proliferation and necrosis), the macroscopic scale of the PET image (e.g. 0.5 cm voxels) is unlikely to allow a direct reflection of tumour biology, although it has been hypothesised that 18F-FDG heterogeneity in NSCLC may reflect the distribution of hypoxic cells with higher expression of glucose transporters [40]. Exploring heterogeneity within tumours also has the potential to reveal underlying aspects of tumour physiology and biology. For example, when the kinetic parameters of 18F-FDG uptake were measured within a number of tumours (predominantly NSCLC) segmented into quartiles, they varied across quartiles, leading to the hypothesis that the tumour glucose phosphorylation rate is independent of 18F-FDG delivery and transport [41].

Fig. 1

ROC curves for baseline 18F-FDG PET primary tumour coarseness, contrast, busyness and complexity for identification of responders vs. non-responders by RECIST at 12 weeks in patients treated with chemoradiotherapy for NSCLC. [This research was originally published in JNM [28]. © by the Society of Nuclear Medicine and Molecular Imaging, Inc.]

Measuring heterogeneity within images requires an ROI containing a reasonable number of voxels, so that the textural features can be estimated. It has been postulated that, below a certain volume, some heterogeneity parameters may be surrogates for size. Using probability theory, Brooks and Grigsby calculated that measurements of local entropy in cervical tumours with volumes <45 cm³ can be very sensitive to size and may reflect size rather than underlying heterogeneity [42]. The calculated minimum volume may vary with the tumour dataset, scanner resolution, voxel size and the dynamic range of intensities in tumour voxels, but it is nevertheless recommended that this calculation be made so that tumours below the minimum volume can be excluded from analyses. This dependence on size also means that it may not be possible to analyse heterogeneity within different parts of a tumour or within small structures such as lymph nodes. For example, differences in entropy between the edge and the core of lung cancers on CT have been reported to correlate inversely with survival [6], but this type of analysis may not be possible with lower-resolution PET data.
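The size effect can be illustrated with a small simulation. The Python sketch below is a generic demonstration, not Brooks and Grigsby's calculation: every sample is drawn from the same uniform distribution over 64 grey levels, so the true entropy is fixed at 6 bits, yet the estimated histogram entropy is biased downwards when few voxels are available. It also converts a volume into a voxel count, assuming 4 mm isotropic voxels (an illustrative choice). The function names are hypothetical.

```python
import numpy as np


def voxels_in_volume(volume_cm3, voxel_mm=(4.0, 4.0, 4.0)):
    """Approximate number of voxels of the given size (mm) contained in a volume (cm^3)."""
    voxel_cm3 = float(np.prod(voxel_mm)) / 1000.0
    return volume_cm3 / voxel_cm3


def mean_histogram_entropy(n_voxels, n_levels=64, n_trials=200, seed=0):
    """Mean plug-in entropy (bits) of n_voxels drawn from one fixed distribution.

    Every sample comes from the same uniform distribution over n_levels grey
    levels (true entropy log2(n_levels)), so any trend with n_voxels is a
    size effect, not a change in heterogeneity.
    """
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_trials):
        _, counts = np.unique(rng.integers(0, n_levels, size=n_voxels), return_counts=True)
        p = counts / n_voxels
        values.append(-(p * np.log2(p)).sum())
    return float(np.mean(values))


print(round(voxels_in_volume(45.0)), "voxels of 4 mm in 45 cm^3")
for n in (20, 100, 700, 5000):
    print(n, "voxels:", round(mean_histogram_entropy(n), 2), "bits (true value 6)")
```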

A detailed study of the robustness of a number of first-, second- and high-order features in 18F-FDG PET images of three tumour types (NSCLC, metastatic colorectal cancer and breast cancer) revealed further potential limitations [43]. A number of features were correlated with each other, with tumour volume or with standard indices such as SUV. As expected, some textural features were more sensitive than SUVmax to the segmentation method, but less sensitive than SUVmean. It was concluded that none of the first-order features and only half of the other textural features were robust to the tumour segmentation method. It was also recommended that at least 32 grey levels be used for resampling to avoid spurious correlations with SUV.
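A simple screen for the redundancy described above is to compute rank correlations between each candidate feature and a reference index such as tumour volume or SUV. The Python sketch below uses hypothetical feature names and values and an arbitrary |ρ| > 0.8 cut-off; it is not the analysis performed in [43].

```python
import numpy as np


def spearman_rho(x, y):
    """Spearman rank correlation; ties are not averaged (adequate for continuous features)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def flag_correlated(features, reference, threshold=0.8):
    """Names of features whose |Spearman rho| with a reference index exceeds threshold."""
    return [name for name, values in features.items()
            if abs(spearman_rho(values, reference)) > threshold]


# Hypothetical cohort of 20 lesions: which candidate features merely track volume?
volume = np.arange(1.0, 21.0)
features = {
    "feature_a": np.sqrt(volume),  # monotonic in volume
    "feature_b": (np.arange(20) * 7 % 20).astype(float),  # shuffled, weakly related
    "feature_c": -volume,  # inversely monotonic
}
print(flag_correlated(features, volume))
```

A flagged feature is not necessarily useless, but its apparent predictive value should be checked against that of the reference index it tracks.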

Another important factor when using any imaging parameter for response assessment is the reproducibility of the measurements. Among first-order parameters, good test–retest reproducibility and reliability were found for the area under the SUV–volume histogram, whereas skewness and kurtosis were less robust [44]. In a study of 18F-FDG PET scans repeated within 4 days in patients with oesophageal cancer, local heterogeneity parameters including entropy and homogeneity showed better reproducibility than SUV measurements (2 vs. 5 %), and a number of regional heterogeneity features showed reproducibility similar to that of SUVs [45]. The same group examined the robustness of 18F-FDG PET textural features in oesophageal cancer to the partial volume effect and the segmentation method, and their ability to predict treatment response. They found that local features (e.g. entropy, homogeneity, dissimilarity) and regional features (e.g. zone percentage) were robust and retained their power to discriminate response [46]. A further study has also shown moderately good test–retest and inter-observer stability of first-order, intensity–volume histogram, geometric and textural features in 18F-FDG PET [47]. It is not yet known how sensitive textural features are to differences in the interval between injection and scanning. The accuracy and precision of texture analysis may also depend on the individual acquisition and reconstruction protocols and on image quality parameters such as noise, resolution and motion artefacts. These technical factors are important considerations when designing studies, particularly multicentre studies in which the methodology has to be carefully evaluated and standardised.

Clinical applications

Characterisation and segmentation

High-order textural features including coarseness, contrast and busyness have been used to differentiate primary and nodal head and neck tumours from normal tissues [48]; in a subsequent study by the same group, ROIs were segmented using textural features, an approach that showed potential for improving the accuracy of radiotherapy planning in head and neck tumours [49]. Model-based fractal methods have also been investigated for characterising pulmonary nodules on 18F-FDG PET images: fractal indices, as well as SUVmax, differed significantly between benign and malignant nodules, but SUVmax was significantly correlated with tumour size whereas the fractal indices were not [27]. In some tumour types, such as peripheral nerve sheath tumours, characterisation of benign versus malignant lesions has been improved by qualitative scoring of 18F-FDG heterogeneity [50].
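Box counting is one common way of estimating a fractal dimension. The Python sketch below applies it to a 2D binary mask (for example a thresholded uptake region); the estimator actually used in [27] is not described here, so treat this as an illustrative assumption, and the function name as hypothetical.

```python
import numpy as np


def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16)):
    """Box-counting estimate of the fractal dimension of a non-empty 2D binary mask.

    For each box size s, count the s x s boxes containing any foreground pixel;
    the dimension is the slope of log(count) against log(1/s).
    """
    mask = np.asarray(mask, dtype=bool)
    counts = []
    for s in sizes:
        h = (mask.shape[0] // s) * s  # crop so the box grid divides evenly
        w = (mask.shape[1] // s) * s
        boxes = mask[:h, :w].reshape(h // s, s, w // s, s).any(axis=(1, 3))
        counts.append(boxes.sum())
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes, dtype=float)), np.log(counts), 1)
    return float(slope)


filled = np.ones((64, 64), dtype=bool)  # a solid region: dimension close to 2
line = np.zeros((64, 64), dtype=bool)
line[32, :] = True  # a straight line: dimension close to 1
print(box_counting_dimension(filled), box_counting_dimension(line))
```

Irregular, ragged boundaries give intermediate values, which is why fractal indices are thought to capture shape and texture information that is not simply tumour size.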

Prediction and prognosis

A number of studies have demonstrated the predictive and prognostic ability of textural parameters derived from 18F-FDG PET images and have shown them to be superior to standard parameters such as SUV, in sarcoma [23], head and neck cancer [24], oesophageal cancer [17, 51] and lung cancer [25–28] (Fig. 1). Conversely, textural features were not able to predict nodal status or outcome in cervical cancer [52, 53]. Initial work using 18F-fluorothymidine PET in breast cancer has shown good test–retest repeatability of some features. In addition, tumours with fewer highly proliferating cells tended to respond less well to chemotherapy, and heterogeneity decreased in most responders at 1 week [54].
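The discriminative performance shown by ROC curves such as those in Fig. 1 is usually summarised as the area under the curve (AUC), which equals the Mann–Whitney probability that a randomly chosen responder has a higher feature value than a randomly chosen non-responder. A minimal Python sketch with hypothetical values follows; the function name is hypothetical, and if lower values indicate response, the roles of the two groups (or the sign of the feature) should be swapped.

```python
import numpy as np


def roc_auc(responders, non_responders):
    """ROC AUC as the Mann-Whitney probability that a responder's feature value
    exceeds a non-responder's (ties count one half)."""
    pos = np.asarray(responders, dtype=float)[:, None]
    neg = np.asarray(non_responders, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())


# Hypothetical baseline feature values in two groups
print(roc_auc([0.021, 0.034, 0.029, 0.040], [0.012, 0.025, 0.018]))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the groups.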

Radiogenomics and PET

Whereas increasing numbers of studies have investigated radiogenomics with CT and MRI in oncology, relatively few have done so with PET. Nair et al. extracted 14 SUV-based uptake features from 18F-FDG PET scans of patients with resected NSCLC and showed associations with distinct genes and gene signatures; they also showed a multivariate 18F-FDG uptake feature to be prognostic [55]. These authors concluded that the prognostic models they evaluated could increase understanding of 18F-FDG as a biomarker at the genomic level. A radiogenomics strategy linking gene expression with 180 image features from PET and CT images has also been tested in patients with NSCLC: a number of metagenes could be predicted from CT features, and CT features and PET SUV could in turn be predicted from the metagenes [56].

Conclusion and future directions

Radiomics, both in medical imaging generally and in PET, is at an early stage, and technical issues still need to be further developed, evaluated and addressed. However, if the challenges of technical validation can be overcome, radiomic data, alone or in combination with other omics data, may lead to a personalised approach to cancer management in the future. With the growth of hybrid imaging such as SPECT/CT, PET/CT and, more recently, PET/MRI, the number of potentially useful image parameters and metrics is likely to increase. Given the wealth of information that is available but unused in medical images, a radiomics approach in which additional data are extracted automatically and then subjected to bioinformatic analysis may be a way forward to achieve better tumour segmentation, definition, characterisation, prediction and prognostication. Combined with molecular analysis of tumours, which is now becoming routine for treatment stratification in some cases, this approach may contribute to improvements in cancer care.