Background

Structural and functional imaging provide data that are integral for diagnosis, treatment response evaluation, and surveillance in patients with head and neck squamous cell carcinoma (HNSCC). The large amount of volumetric bioimaging information amassed in institutional archives constitutes an extensive database amenable to high-throughput, quantitative image analysis. Radiomics refers to the automated extraction of high-dimensional sets of quantitative descriptors (“radiomic features”) from medical images (e.g., CT, MRI, or PET) for the development of novel diagnostic and prognostic biomarkers. Machine learning (ML) and other artificial intelligence (AI) methods are well suited to analyzing such high-dimensional radiomics data. Radiomics provides fast, low-cost, non-invasive, yet comprehensive tissue and organ characterization, as features are extracted directly from (pre-processed) standard-of-care medical images. The generated features offer information complementary to traditional clinical predictors in numerous applications, which may help advance cancer care towards personalized precision medicine. Numerous recent radiomics studies have focused on the classification, characterization, prognostication, and treatment guidance of HNSCC.

Paired with key clinical predictors, radiomic analysis can capture a large variety of HNSCC properties [1], enabling predictive models to reflect more accurately the spatial, metabolic, and morphological heterogeneity of primary tumor lesions and metastatic lymph nodes. This review provides an overview of recently published HNSCC radiomics studies focusing on (molecular) characterization, classification, prognostication, and treatment guidance. The general principle of radio(geno)mic analysis and the typical radiomics workflow are introduced. We also discuss the applications of advanced machine learning for radiomics-based modelling. Finally, we summarize future challenges, barriers, and limitations of individual radiomic applications, as well as of the field of head and neck radiomics in general.

Radiomics

Over the last decade, advancements in high-throughput computing and machine learning algorithms have led to the emergence of the “-omics” concept, referring to the collective characterization and quantification of pools of biologic information, such as genomics, proteomics, or metabolomics. Radiomics refers to the automated extraction of mathematically defined, numerical descriptors (“radiomics features”) from 2-dimensional or, more commonly, 3-dimensional medical images and the subsequent application of data mining and analysis techniques. Over the past few years, there has been increasing interest in applying radiomics in patients with HNSCC for the prediction of molecular biomarkers, prognosis, and treatment response.

Radiomics features commonly describe shape, intensity (histogram), and texture characteristics. These features can be extracted from different imaging modalities, such as CT, MRI, or metabolic imaging like 18F-fludeoxyglucose positron emission tomography (FDG-PET). The underlying hypothesis of the emerging field of radiomics is that certain characteristics of medical images, which are not reliably assessed by human visual inspection, can provide medically meaningful information for diagnosis, prognosis, and treatment guidance [2]. Prior studies showed that radiomics features reflect biological characteristics of the tissue such as cellularity, heterogeneity, and necrosis [3], and frequently correlate with diagnostic and outcome variables [2]. Furthermore, certain features can reflect molecular and genetic characteristics of malignant tissue. The subfield of radiogenomics focuses on identifying and exploiting relationships between quantitative bioimaging features and genomic characteristics of the tumor [4]. Notably, radiomics analysis captures information from the whole volume of interest (VOI) and may therefore act as a quantitative descriptor of tumor spatial heterogeneity, whereas the diagnostic validity of localized tools like tissue sampling may be degraded in heterogeneous tumors [3, 5].

Radiomics workflow

Although not part of the radiomics workflow in a narrower sense, image acquisition is often considered the first step in radiomics analysis. The robustness and reproducibility of radiomics features against variation in scan acquisition protocols have been extensively investigated across imaging modalities and in various settings [6], including test-retest assessments [7,8,9] and studies evaluating the impact of scanner types/manufacturers using phantoms [10, 11], reconstruction algorithms/slice thickness [12, 13], and motion artifacts [14]. Traverso et al. [6] conducted a systematic review of 41 studies investigating the reproducibility and stability of radiomics features in phantoms and different cancers, including lung cancer, HNSCC, and esophageal cancer, and found that only three of these studies investigated radiomics reproducibility in HNSCC. Bagher-Ebadian et al. [15] investigated the impact of smoothing and noise on CT and cone beam CT textural features and reported general feature robustness against low-power Gaussian noise and low-pass filtering, whereas a high-pass filter significantly impacted textural features. Bogowicz et al. [16] focused on feature stability with respect to CT perfusion calculation factors. Finally, Lu et al. [17] studied the effect of seven different segmentation methods and five fixed-bin SUV discretization schemes on PET radiomic features, reporting that 50% and 23% of 88 tested features were robust to FDG-PET segmentation and discretization, respectively (with robustness defined as an intraclass correlation coefficient ≥ 0.8). While there is as yet no consensus regarding stable radiomic feature sets, it is crucial to assess the stability of radiomic features in each study, especially to support the generalization of findings and future comparisons.
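Robustness in the studies above was judged with an intraclass correlation coefficient (ICC). As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-measurement form ICC(2,1) of Shrout and Fleiss (the cited studies may have used a different ICC variant), a feature measured on the same lesions under several segmentation or discretization settings can be screened against the 0.8 threshold as follows. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` has shape (n_subjects, k_conditions), e.g. one radiomic feature
    measured on n lesions under k segmentation methods.
    """
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)  # per-lesion means
    col_means = ratings.mean(axis=0)  # per-condition means

    ss_rows = k * np.sum((row_means - grand_mean) ** 2)
    ss_cols = n * np.sum((col_means - grand_mean) ** 2)
    ss_total = np.sum((ratings - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return float((ms_rows - ms_error)
                 / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n))


def robust_feature_indices(feature_stack: np.ndarray, threshold: float = 0.8) -> list:
    """feature_stack: (n_features, n_subjects, k_conditions); keep ICC >= threshold."""
    return [i for i, f in enumerate(feature_stack) if icc_2_1(f) >= threshold]


# Synthetic demo: feature 0 barely changes across 3 segmentations, feature 1 is noise.
rng = np.random.default_rng(0)
true_values = rng.normal(size=30)
stable = true_values[:, None] + 0.05 * rng.normal(size=(30, 3))
unstable = rng.normal(size=(30, 3))
stack = np.stack([stable, unstable])
print(robust_feature_indices(stack))
```

Keeping only the indices returned by such a filter is one simple way to turn a stability assessment into a dimensionality reduction step.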

The next step in the radiomics workflow is the delineation (“segmentation”) of the target area or volume in medical images, resulting in image sub-sections referred to as regions of interest (ROI) and volumes of interest (VOI) in 2- and 3-dimensional images, respectively. Both manual and (semi-)automated segmentation have been applied in recent radiomics studies, each with inherent advantages and drawbacks. Manual segmentation is affected by observer variability; several studies investigated the inter- and intra-rater reproducibility of CT and PET radiomic features extracted from repeated manual segmentations of lung cancer lesions [8, 18, 19]. Lu et al. [17] assessed feature stability across manual and various automated segmentation techniques applied to oropharyngeal cancer lesions on PET scans and showed that 50% of features extracted from 18F-FDG-PET achieved an intraclass correlation coefficient ≥ 0.8, which was considered sufficiently reproducible. Across all studies, individual radiomic features exhibited varying degrees of robustness against observer variability, suggesting that stability measures may be used for feature dimensionality reduction. A multitude of (semi-)automated segmentation methods have been proposed or adapted for radiomics purposes [20]. An in-depth discussion of the various algorithmic approaches is beyond the scope of this review; however, reproducibility and observer variability are generally a lesser concern with (semi-)automated approaches. On the other hand, fully automated segmentation can only be “as good as” the expert-generated ground truth data used for its development and may be impaired by artifacts, the presence of multiple pathologic findings, and other abnormalities not considered during development. The resulting segmentation inaccuracies will affect the quality and usefulness of extracted radiomic features, warranting thorough human validation.
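Agreement between two delineations of the same lesion (two raters, or a manual versus an automated method) is often summarized with an overlap measure before features are compared. The sketch below assumes binary NumPy masks on the same voxel grid and uses the Dice similarity coefficient, a common choice that the studies cited here did not necessarily use. The function name and masks are illustrative.

```python
import numpy as np


def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); 1.0 means identical masks, 0.0 no overlap."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)


# Two hypothetical 4x4x4-voxel segmentations of the same lesion, shifted by one voxel.
rater_1 = np.zeros((10, 10, 10), dtype=bool)
rater_2 = np.zeros((10, 10, 10), dtype=bool)
rater_1[2:6, 2:6, 2:6] = True
rater_2[3:7, 2:6, 2:6] = True
print(dice_coefficient(rater_1, rater_2))  # 48 shared voxels of 64 + 64 -> 0.75
```

A high Dice value does not by itself guarantee stable texture features, which is why feature-level reproducibility (e.g., ICC) is usually assessed as well.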

Image pre-processing is usually applied as the next step after segmentation. Resampling voxels to uniform sizes is often necessary because the available imaging data are heterogeneous, originating from different scanners and reconstruction protocols. Additionally, resampling to isotropic voxels (i.e., voxels with identical edge lengths) should be considered, as it is a prerequisite for rotationally invariant texture features [21]. CT imaging uses a calibrated grey scale (Hounsfield units express radiodensity relative to water and air), whereas other imaging modalities require grey scale homogenization to make radiomic features comparable across patients. For example, PET scanners measure radioactivity concentrations [MBq/mL], which directly depend on the amount of injected radiotracer and patient weight [22]. To compensate for this variability, the standardized uptake value (SUV) is calculated for each voxel as a relative measure of radiotracer uptake in clinical practice as well as in radiomics studies [17, 19, 23,24,25]. MRI grey scales are expressed in arbitrary units specific to the hardware and reconstruction method used; heterogeneous acquisition parameters within an MRI dataset therefore necessitate intensity normalization before radiomic feature extraction [26,27,28]. Notably, in addition to the original image, radiomic features are often extracted from transformed or filtered images. A multitude of studies applied wavelet decompositions to extract texture features from different frequency bands of the original image [7, 18, 24, 25, 29, 30]. Smoothing filters (e.g., Gaussian filters) or combined filters (e.g., Gaussian smoothing followed by a Laplacian filter for edge enhancement) have also been implemented in some studies [20, 30,31,32].
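Three of the pre-processing steps described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions: PET activity is already decay-corrected to injection time and tissue density is taken as 1 g/mL for body-weight SUV; MRI intensities are z-score normalized within the VOI (one of several possible normalization schemes); and resampling uses nearest-neighbour lookup, whereas radiomics software typically offers higher-order interpolation for images. All function names are hypothetical.

```python
import numpy as np


def suv_body_weight(activity_kbq_per_ml, injected_dose_mbq, body_weight_kg):
    """SUV_bw = tissue activity concentration / (injected dose / body weight).

    Assumes decay-corrected activity and ~1 g/mL tissue density, so SUV is unitless.
    """
    dose_kbq = injected_dose_mbq * 1000.0
    weight_g = body_weight_kg * 1000.0
    return np.asarray(activity_kbq_per_ml, dtype=float) * weight_g / dose_kbq


def zscore_in_mask(image, mask):
    """Normalize arbitrary-unit MRI intensities using VOI statistics."""
    roi = image[mask.astype(bool)]
    return (image - roi.mean()) / roi.std()


def resample_nearest(volume, spacing, new_spacing=1.0):
    """Nearest-neighbour resampling of a 3-D array to isotropic `new_spacing` (mm)."""
    spacing = np.asarray(spacing, dtype=float)
    old_shape = np.array(volume.shape)
    new_shape = np.maximum(np.round(old_shape * spacing / new_spacing).astype(int), 1)
    index_arrays = []
    for axis in range(volume.ndim):
        # physical centre of each new voxel, mapped back to the enclosing old voxel
        centres = (np.arange(new_shape[axis]) + 0.5) * new_spacing
        old_idx = np.floor(centres / spacing[axis]).astype(int)
        index_arrays.append(np.clip(old_idx, 0, old_shape[axis] - 1))
    return volume[np.ix_(*index_arrays)]


# Uniform tracer distribution gives SUV = 1: 350 MBq spread over 70 kg -> 5 kBq/mL.
print(suv_body_weight(5.0, 350.0, 70.0))

# Anisotropic 1 x 1 x 3 mm volume resampled to 1 mm isotropic voxels.
volume = np.arange(32, dtype=float).reshape(4, 4, 2)
iso = resample_nearest(volume, spacing=(1.0, 1.0, 3.0), new_spacing=1.0)
print(iso.shape)  # (4, 4, 6)

# Hypothetical MRI volume in arbitrary units, normalized with VOI statistics.
mri = np.random.default_rng(0).normal(300.0, 40.0, size=(8, 8, 8))
voi = np.zeros_like(mri, dtype=bool)
voi[2:6, 2:6, 2:6] = True
normalized = zscore_in_mask(mri, voi)
```

Nearest-neighbour lookup is the usual choice for masks because it keeps labels binary; for the image itself it is only the simplest option.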

Radiomic feature extraction is the last step of common radiomics pipelines. Zwanenburg et al. published “The image biomarker standardisation initiative” (IBSI), the most recent attempt to standardize image pre-processing and radiomics feature sets across the field [21]. In brief, IBSI defines 11 feature families that assess geometric aspects of the ROI/VOI shape, quantify the grey scale intensity (distribution), and describe lesion texture. Feature extraction is usually performed by dedicated software in a fully automated fashion. Recent studies extracted their feature sets from original images and several derivatives thereof, generated by filtering, resampling, and transformation. This approach commonly yields feature vectors on the order of hundreds to several thousand features per segmented ROI/VOI.
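To make the feature families more concrete, the sketch below computes a handful of first-order (intensity histogram) features from a VOI after fixed-bin-width discretization. The definitions follow common textbook forms (population moments, Shannon entropy in bits); IBSI specifies its own exact definitions and discretization conventions, so this is an illustration, not an IBSI-compliant implementation. The 25 HU bin width, the voxel-count volume (a simplification of mesh-based shape volume), and the function names are assumptions.

```python
import numpy as np


def discretize_fixed_bin_width(values, bin_width):
    """Map intensities to integer bins 1, 2, ... of width `bin_width`, from the ROI minimum."""
    return np.floor((values - values.min()) / bin_width).astype(int) + 1


def first_order_features(image, mask, bin_width=25.0, voxel_volume_mm3=1.0):
    roi = image[mask.astype(bool)].astype(float)
    mean = roi.mean()
    centred = roi - mean
    variance = np.mean(centred ** 2)
    std = np.sqrt(variance)
    bins = discretize_fixed_bin_width(roi, bin_width)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()  # discretized intensity histogram as probabilities
    return {
        "mean": float(mean),
        "variance": float(variance),
        "skewness": float(np.mean(centred ** 3) / std ** 3) if std > 0 else 0.0,
        "excess_kurtosis": float(np.mean(centred ** 4) / variance ** 2 - 3.0) if std > 0 else 0.0,
        "energy": float(np.sum(roi ** 2)),
        "entropy_bits": float(-np.sum(p * np.log2(p))),
        "uniformity": float(np.sum(p ** 2)),
        # simplification: voxel counting instead of mesh-based shape volume
        "volume_mm3": roi.size * voxel_volume_mm3,
    }


# Hypothetical CT VOI: half the voxels at 0 HU, half at 100 HU.
image = np.zeros((4, 4, 4))
image[:2] = 100.0
mask = np.ones_like(image, dtype=bool)
features = first_order_features(image, mask, bin_width=25.0)
print(features)
```

In practice, the same computation would be repeated on each filtered or transformed derivative of the image, which is how feature vectors grow into the hundreds or thousands.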

Both open-source and in-house-developed feature libraries and radiomics extraction software have been used in recent radiomics studies. Two commonly used open-source solutions for radiomics feature extraction are the “Imaging Biomarker Explorer (IBEX)” [33] and “PyRadiomics” [34]. Both are adaptable, configurable platforms for image pre-processing and feature extraction and have been applied in recent HNSCC radiomics studies (e.g., IBEX in [31, 35] and PyRadiomics in [36, 37]). The considerable methodological variability across HNSCC radiomics studies underscores the need for evidence-based consensus radiomics pipelines to improve reproducibility and generalizability. Fig. 1 summarizes the essential steps in common radiomics pipelines.

Fig. 1 Typical radiomics workflow pipeline

Machine learning analysis of radiomics features

Radiomics pipelines extract high-dimensional, quantitative feature sets from medical images [2]. This bioimage-based information is most helpful when combined with clinical variables, serum markers, and other conventional prognostic biomarkers, creating the need for efficient analysis and development of predictive models based on high-dimensional data. Machine learning (ML) methods have proven to be statistically powerful tools for taking on such challenges [2, 38].

ML refers to a family of statistical algorithms that derive their functionality from labeled or unlabeled training data rather than from predefined sets of rules and functions [38]. This property is well suited to radiomics research, where large numbers of bioimaging features are extracted to predict molecular biomarkers, histopathological characteristics, clinical outcome, or treatment response [2].

To limit overfitting and improve generalizability, ML studies ideally use separate datasets for training, validation, and independent/external testing [38]. The training and validation datasets are used to iteratively fit the ML model (training data), assess its performance (validation data), and optimize model settings (“hyperparameter tuning”) [38]. Alternatively, cross-validation may be applied to fit, assess, and tune the model by randomly partitioning the data into subsets and performing iterative rounds of training and validation [20]. The independent/external test cohort is kept fully isolated from the model development process and is used to test the final ML model and confirm its performance and generalizability [20, 38].
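The data-partitioning scheme described above can be sketched as follows: a locked test set is set aside once, and k-fold cross-validation splits are generated only within the remaining development data. This is a simplified illustration with hypothetical helper names; it uses purely random splits, whereas real studies often stratify folds by outcome and use a cohort from another institution as the external test set.

```python
import numpy as np


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Randomly set aside a test set that is never touched during development."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]  # (development indices, test indices)


def kfold_splits(dev_indices, k=5, seed=0):
    """Yield (train, validation) index pairs; each development case is validated once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(dev_indices), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


dev, test = holdout_split(100, test_fraction=0.2, seed=0)
splits = list(kfold_splits(dev, k=5, seed=0))
for fold, (train, val) in enumerate(splits):
    # model fitting / hyperparameter tuning on `train`, scoring on `val` would go here
    print(fold, len(train), len(val))
# Only the final, tuned model is evaluated once on `test`.
```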

Typically, radiomics studies combine a dimensionality reduction strategy with an ML classification or regression algorithm [20]. Dimensionality reduction usually aims to exclude redundant and unstable features and to rank-order the remaining features by their predictive association with the target outcome. ML algorithms then combine the most predictive features into a predictive model [20], which is applied to the validation set to assess its performance [38]. The process is repeated iteratively, with hyperparameters adjusted throughout [38].
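A common two-stage reduction of the kind described above removes highly inter-correlated (redundant) features and then ranks the survivors by their univariate association with the outcome. The sketch below assumes a binary outcome, uses |Pearson r| for both steps (for a binary outcome this equals the point-biserial correlation), and picks an illustrative 0.9 redundancy cut-off; published pipelines use many other filters and rankers. To avoid information leakage, such steps should be fitted on training data only.

```python
import numpy as np


def drop_redundant(X, threshold=0.9):
    """Greedily keep a feature only if |r| with every already-kept feature <= threshold.

    Assumes no constant columns (their correlation is undefined).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return kept


def rank_by_association(X, y, candidates):
    """Order candidate features by |Pearson r| with the outcome, strongest first."""
    score = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in candidates}
    return sorted(candidates, key=lambda j: score[j], reverse=True)


rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200).astype(float)  # hypothetical binary outcome
informative = y + 0.3 * rng.normal(size=200)
redundant_copy = informative + 0.01 * rng.normal(size=200)
noise = rng.normal(size=200)
X = np.column_stack([informative, redundant_copy, noise])

kept = drop_redundant(X)                  # the near-duplicate column is removed
ranked = rank_by_association(X, y, kept)  # informative feature ranked first
print(kept, ranked)
```

The top-ranked features would then be passed to the classifier or regression model, whose hyperparameters are tuned on the validation data.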

In a study exploring the importance of feature selection in radiomics analysis, Parmar et al. [29] compared combinations of 13 feature selection methods and 11 ML classifiers for predicting overall survival from 440 radiomic features extracted from 231 HNSCC primary tumor lesions on contrast-enhanced CT images. Using multifactor analysis of variance (ANOVA) of the receiver operating characteristic (ROC) area under the curve (AUC), they assessed the effect of three ML framework variables (feature selection method, classification method, and number of selected features). The choice of classifier accounted for 29.02% of the total variance in predictive performance, the feature selection method for 14.02%, and the interaction between classifier and feature selection for 16.59%. These findings highlight the importance of choosing an appropriate combination of feature selection method and ML model, for example by testing combinations of algorithms that performed well in prior studies.
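The variance partitioning reported by Parmar et al. can be illustrated with a balanced two-factor layout. In the sketch below, a hypothetical AUC array is indexed by feature selection method, classifier, and replicate (e.g., repeated runs), and each factor's share is its sum of squares divided by the total sum of squares. This is a simplification (the original analysis used three factors), and the AUC values are random placeholders, not study results.

```python
import numpy as np


def variance_shares(auc):
    """Percent of total sum of squares per factor in a balanced two-way layout.

    auc has shape (n_selectors, n_classifiers, n_replicates).
    """
    a, b, r = auc.shape
    grand = auc.mean()
    mean_sel = auc.mean(axis=(1, 2))  # per feature selection method
    mean_clf = auc.mean(axis=(0, 2))  # per classifier
    cell = auc.mean(axis=2)           # per (selector, classifier) combination

    ss_total = np.sum((auc - grand) ** 2)
    ss_sel = b * r * np.sum((mean_sel - grand) ** 2)
    ss_clf = a * r * np.sum((mean_clf - grand) ** 2)
    ss_inter = r * np.sum((cell - mean_sel[:, None] - mean_clf[None, :] + grand) ** 2)
    ss_resid = np.sum((auc - cell[:, :, None]) ** 2)
    parts = {"feature_selection": ss_sel, "classifier": ss_clf,
             "interaction": ss_inter, "residual": ss_resid}
    return {name: 100.0 * ss / ss_total for name, ss in parts.items()}


# Placeholder AUCs: 13 selectors x 11 classifiers x 5 replicate runs (random values).
rng = np.random.default_rng(0)
auc = 0.55 + 0.1 * rng.random((13, 11, 5))
shares = variance_shares(auc)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Because the design is balanced, the four shares add up to 100%, which makes the relative influence of each pipeline choice directly comparable.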

In head and neck cancer radiomics, classification and (survival) regression models are frequently applied to predict molecular markers, identify genomic signatures, differentiate suspected tissue diagnostically, prognosticate survival, and predict treatment response. The growing number of large, publicly available datasets and open-source machine learning algorithms has paved the way for novel multivariate diagnostic and prognostic biomarkers that integrate quantitative radiomics features and clinical variables for risk stratification, outcome prediction, and precision treatment planning in HNSCC.

Radiomics signatures of HNSCC molecular markers

Multiple recent radiomics studies reported associations of bioimaging features with various molecular HNSCC traits, such as human papillomavirus (HPV) status, somatic mutations, methylation and gene expression subtypes, and PD-L1 expression levels. Among all investigated HNSCC molecular traits, HPV status has been evaluated most extensively.

Human papillomavirus status

The incidence of HPV-associated oropharyngeal SCC (OPSCC) has been rising in recent decades [39, 40]. In North America, the prevalence of HPV-associated forms among OPSCC increased from 50.7% before 2000 to 69.7% in the period from 2005 through 2010 [41]. HPV positivity is a strong, independent prognostic factor for favorable outcome and overall survival (OS) in patients with OPSCC [42, 43]. HPV-associated HNSCC is characterized by distinct tumor morphology (smaller primary tumors, marked cervical adenopathy at presentation), younger age at presentation, and favorable response to radiation therapy [43]. Consequently, the latest editions of the American Joint Committee on Cancer (AJCC) and Union for International Cancer Control (UICC) staging manuals classify HPV-mediated OPSCC as a distinct tumor entity with staging rules different from those of the HPV-negative form [44, 45]. In addition, recent studies suggest that HPV association may similarly impact OS in non-oropharyngeal HNSCC [46, 47].

Since 2015, multiple studies have demonstrated the association of radiomic features with HPV status in HNSCC: While Buch et al. [48] and Fujita et al. [49] examined the association of individual texture features with HPV status, other groups have designed machine learning classification models for HPV prediction in HNSCC. Table 1 summarizes prior work in this field.

Table 1 Prediction of HPV status based on radiomics features of HNSCC tumors

Of note, some studies did not report details of the HPV test used for ground truth labeling [48, 49, 53, 54, 57], and some used p16 immunohistochemistry as a surrogate test and thus predicted p16 status [59]. Many studies evaluated their models’ generalizability in independent confirmation cohorts and reported performance similar to that in the training datasets [50,51,52, 54, 56]. While the majority of studies to date have applied CT-based radiomics for HPV classification, Vallieres et al. [60] reported preliminary results based on radiomics features from FDG-PET scans in 67 patients with HNSCC. In addition, quantitative diffusion MRI studies have shown differences in apparent diffusion coefficient values between HPV-positive and HPV-negative OPSCC [61,62,63]; however, no MR-based radiomics signature for prediction of HPV status has been reported yet.

A potential application of radiomics-based biomarkers of HPV status would be to aid pathologists when standard p16 immunohistochemical staining is equivocal, or to supplement immunohistochemical tests in patients requiring second-line testing. For routine clinical HPV testing, the 2018 guideline from the College of American Pathologists recommends p16 immunohistochemistry as a surrogate marker for HPV association on samples from the primary tumor or from cervical level II or III nodal metastases. However, it recommends HPV-specific testing, such as in situ hybridization for HPV DNA, for certain p16-positive cervical nodes or multisite primary tumors [59]. In such cases, radiomics-based biomarkers may serve as an inexpensive alternative confirmatory test for HPV status.

In addition, radiomics signatures for HPV classification may serve as prognostic biomarkers in patients with OPSCC. Leijenaar et al. [52] used contrast-enhanced CT radiomic features from OPSCC primary tumors (628 subjects for training and 150 for validation) to devise a radiomic biomarker for HPV status. Using Kaplan-Meier survival analysis, they showed that both p16 status (as a surrogate for HPV) and the radiomics-based classifier could separate low- from high-risk patients. Future studies will likely explore other imaging modalities such as MRI or FDG-PET, as well as state-of-the-art ML classifiers, to enhance classification performance. Radiomics may also help detect HPV association of metastatic nodes in carcinoma of unknown primary, which may direct the search for the tumor origin to the oropharynx.
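Kaplan-Meier analysis, as used to compare risk groups above, estimates survival as a running product over event times, S(t) = Π (1 − d_i / n_i), where d_i is the number of events and n_i the number of patients still at risk at time t_i. A minimal pure-Python sketch follows; the follow-up data and the group labels (standing in for a radiomics-predicted HPV status) are hypothetical, and a full analysis would add confidence bands and a log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up durations; events: 1 = event (e.g. death), 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censored cases both leave the risk set
    return curve


# Hypothetical follow-up (months) stratified by a predicted HPV label.
follow_up = {
    "predicted HPV+": ([12, 30, 41, 55, 60, 60], [0, 1, 0, 0, 0, 0]),
    "predicted HPV-": ([5, 9, 14, 22, 30, 48], [1, 1, 0, 1, 1, 0]),
}
for group, (t, e) in follow_up.items():
    print(group, kaplan_meier(t, e))

print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```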

Radiomics biomarkers of HNSCC molecular subtypes beyond HPV status

Several recent studies have proposed novel radiomics biomarkers for prediction of HNSCC molecular features and subtypes, aside from HPV status.

Zwirner et al. [64] hypothesized that frequently mutated HNSCC driver genes may correlate with radiomics features known to quantify intra-tumor heterogeneity, and therefore focused their analysis on three radiomics features initially described by Aerts et al. [18]. In this prospective study, 20 patients with locally advanced SCC of the oral cavity, oropharynx, or hypopharynx underwent next-generation tumor sequencing and radiomics analysis of the corresponding non-contrast radiotherapy planning CTs [64]. The presence of mutations in known driver genes (TP53, FAT1, and KMT2D) was correlated with each of the three selected radiomics features, revealing a significant association of all three features with FAT1 mutation [64]. The authors suggested that these findings are likely related to lower heterogeneity in FAT1-mutated HNSCC tumors.

Huang et al. [51] studied a series of molecular HNSCC “phenotypes”: five DNA methylation subtypes, four previously identified (transcriptomics-based) HNSCC gene expression subtypes [65], and five common somatic gene mutations. DNA methylation aberrations were explored using the MethylMix algorithm [66], followed by consensus clustering for subtyping. In contrast to Zwirner et al. [64], Huang et al. used a large radiomics feature set comprising 540 individual features extracted from pre-treatment contrast-enhanced CT scans of 113 patients [51]. Feature selection and LASSO-penalized logistic regression were applied in nested cross-validation. Multi-class classification was performed using a “one-vs-all” approach (i.e., binary classifiers were trained to distinguish each class from all others). The machine learning classifiers yielded moderate to good predictive performance in identifying the HNSCC molecular phenotypes, even exceeding models based on clinical variables only.
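The “one-vs-all” strategy and the L1 (LASSO) penalty can be combined in a few lines. The sketch below fits one L1-penalized logistic regression per class with proximal gradient descent (a gradient step followed by soft-thresholding) and assigns each sample to the class with the highest predicted probability. It is an illustrative stand-in, not the pipeline of Huang et al.: the data are synthetic, features would normally be standardized first, and the penalty `lam` would be tuned by nested cross-validation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def fit_l1_logistic(X, y, lam=0.01, lr=0.1, n_iter=2000):
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent (ISTA).

    The intercept b is not penalized.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        residual = sigmoid(X @ w + b) - y
        w = w - lr * (X.T @ residual) / len(y)
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-thresholding
        b -= lr * residual.mean()
    return w, b


def fit_one_vs_rest(X, labels, **kwargs):
    """Train one binary classifier per class: that class (1) vs all others (0)."""
    classes = np.unique(labels)
    models = [fit_l1_logistic(X, (labels == c).astype(float), **kwargs) for c in classes]
    return classes, models


def predict_one_vs_rest(X, classes, models):
    """Assign each sample to the class whose binary model gives the highest probability."""
    probs = np.column_stack([sigmoid(X @ w + b) for w, b in models])
    return classes[np.argmax(probs, axis=1)]


# Synthetic 3-class data: 2 informative features plus 3 pure-noise features.
rng = np.random.default_rng(1)
centres = np.array([[3.0, 0.0], [-1.5, 2.6], [-1.5, -2.6]])
labels = np.repeat([0, 1, 2], 50)
X = np.hstack([centres[labels] + 0.5 * rng.normal(size=(150, 2)),
               rng.normal(size=(150, 3))])
classes, models = fit_one_vs_rest(X, labels, lam=0.02)
accuracy = np.mean(predict_one_vs_rest(X, classes, models) == labels)
print(f"training accuracy: {accuracy:.2f}")
print("weights on noise features:", [np.round(w[2:], 3) for w, _ in models])
```

The soft-thresholding step is what drives uninformative weights toward exactly zero, which is why LASSO doubles as a feature selection method in high-dimensional radiomics data.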

In a cohort of 126 HNSCC patients, Zhu et al. [57] examined the correlation of radiomic features extracted from contrast-enhanced CT images with whole-genome multiomics data (microRNA expression, somatic mutations, transcriptional activity of pathways, copy number variations, and promoter region DNA methylation changes of pathways). They identified over 5000 significant associations, suggesting widespread links between genomic markers and radiomic features from various feature families. Additionally, Zhu et al. trained random forest classifiers in 5-fold cross-validation to predict HPV status (Table 1) and disruptive TP53 mutation status, with the most predictive model yielding an AUC of 0.641 (averaged across 30 cross-validation repetitions).

In 2016, nivolumab and pembrolizumab were FDA-approved for the treatment of recurrent or metastatic squamous cell carcinoma of the head and neck with disease progression on or after platinum-based therapy [67]. Expression of programmed death-ligand 1 (PD-L1) is the single factor most strongly correlated with response to PD-1 blockers such as nivolumab or pembrolizumab [68]. Since overall response rates to these agents are low, ranging from 13 to 18% [69, 70], immunohistochemical quantification of PD-L1 expression has been applied to identify patients more likely to respond [71, 72]. Extracting textural features from the PET component of staging FDG-PET/CT scans, Chen et al. [73] reported significant associations of several radiomics features with PD-L1 expression in 53 patients with oropharyngeal and hypopharyngeal SCC. Multivariate logistic regression analysis identified one FDG-PET radiomics feature as an independent predictor of PD-L1 expression (PD-L1 staining cutoff of 5%) [73].

Thus far, exploratory studies show associations of CT- and FDG-PET-derived radiomic features with genomic, transcriptomic, and proteomic characteristics of HNSCC, suggesting that future “multiomic” investigations of HNSCC should incorporate radiomics-based biomarkers. Additional imaging modalities and molecular targets remain to be explored in future investigations.

Prediction of recurrence, treatment response, and survival in HNSCC

Despite major efforts in treatment and drug development, the prognosis of HNSCC is generally poor, with five-year survival rates in Europe ranging from 25% for hypopharyngeal cancer to 59% for cancers of the larynx [74,75,76]. Additionally, the majority of patients with HNSCC present with advanced-stage disease [76, 77].

More accurate risk stratification, treatment response prediction, and prognostication may help clinicians select treatment options, guide treatment intensity, and ultimately tailor personalized cancer care for their patients. This prospect has made outcome prediction based on bioimaging features the most popular field within head and neck radiomics.

Table 2 summarizes recent studies focusing on prediction of survival, locoregional recurrence, distant metastasis, progression, or treatment failure, as well as several composite outcome endpoints. One study used radiomics to predict early response to induction chemotherapy [27], and another predicted response to chemoradiotherapy [28], both in nasopharyngeal carcinoma. Oropharyngeal SCC, laryngeal SCC, hypopharyngeal SCC, nasopharyngeal cancer, and combined HNSCC cohorts were investigated from 2013 through 2019, with cohort sizes varying markedly: while exploratory studies used as few as 30 cases [92], others gathered expansive datasets. For example, Zhai et al. [93] used 240 and 204 contrast-enhanced CTs for model training and testing, respectively, and reported significantly better prognostic performance for disease-free survival in HNSCC with a combined model (radiomics + clinical predictors) than with a clinical-variables-only model. Leijenaar et al. [87] used 542 oropharyngeal SCC cases from Canada to externally validate a radiomics signature previously devised by Aerts et al. [18] on 422 contrast-enhanced CTs of non-small cell lung cancer; the signature showed significant prognostic differentiation in Kaplan-Meier overall survival analysis in all sub-cohorts. A similarly large dataset of pre-treatment contrast-enhanced CT scans (465 oropharyngeal SCC cases) was analyzed by the Head and Neck Quantitative Imaging Working Group of M.D. Anderson Cancer Center [31], whose proposed two-feature signature robustly discriminated between high- and low-recurrence-probability groups. Individual radiomics features, radiomic signatures/scores (e.g., (linear) combinations of several features [18, 27, 31, 94]), and ML-generated models [29, 30, 95] showed significant predictive value in a multitude of HNSCC settings, including various HNSCC sub-entities and outcomes (Table 2).

Table 2 Prediction of locoregional recurrence, treatment response, and survival

The complementary value of radiomics analysis in addition to conventional “clinical” predictors has been emphasized by several groups [18, 93, 96]. However, using a multi-institutional, multinational dataset of 726 pre-treatment contrast-enhanced CT scans and 686 FDG-PET scans, Ger et al. [35] were unable to improve HNSCC overall survival prediction with multivariate Cox proportional hazards models that incorporated only two (CT-based) and one (FDG-PET-based) radiomic features in separate analyses. These findings suggest that more complex analysis strategies may help improve predictive performance. Leger et al. [30] applied 11 ML algorithms combined with 12 feature selection methods in a proof-of-technology study and identified several promising combinations that may be applied in future time-to-event modelling. Combining large HNSCC cohorts with advanced ML analysis may eventually enable radiomics to improve prognostic models more consistently.

Contrast-enhanced and non-contrast CT, (contrast-enhanced) T1- and T2-weighted MRI sequences, and FDG-PET imaging have all been used for radiomics-based outcome prediction (Table 2), as have some less common imaging techniques including diffusion-weighted MRI [28], 18F-fluorothymidine PET [92], and perfusion CT [80]. The studies listed in Table 2 applied different analytical strategies, such as single features, feature combinations (“signatures”, “scores”), or more complex combined models; this analytical heterogeneity limits direct comparison of studies [97] and cannot be fully reflected in Table 2. The majority of studies, however, applied multivariate Cox proportional hazards models, the results of which are summarized in the table. The performance of radiomics, clinical, or combined models in predicting the respective outcome(s) is expressed as the Cox-model hazard ratio, while the concordance index (C-index) reflects the models’ overall discriminative ability in survival prediction.
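Harrell's concordance index can be computed directly from its definition: among all comparable patient pairs (the patient with the shorter follow-up had an event), it is the fraction in which the model assigns the higher risk to the patient who experienced the event earlier; 0.5 corresponds to random ordering and 1.0 to perfect ranking. A minimal O(n²) sketch follows with hypothetical data. The risk scores could be, for example, a Cox linear predictor, in which a coefficient β corresponds to a hazard ratio of exp(β) per unit increase of the feature. Pairs with tied follow-up times are simply skipped here, which is one of several conventions.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk patient fails first.

    A pair (i, j) is comparable when times[i] < times[j] and patient i had an event.
    Ties in risk score count as half-concordant; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical survival times (months), event flags, and model risk scores.
times = [6, 14, 20, 31, 45]
events = [1, 1, 0, 1, 0]
risk = [2.1, 1.4, 1.6, 0.3, 0.2]
print(harrell_c_index(times, events, risk))  # 7 of 8 comparable pairs -> 0.875
```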

Detection of extra-nodal extension of metastasis

Extra-nodal extension (ENE) of metastasis in cervical lymph nodes is a poor prognostic factor and is associated with a higher risk of recurrent disease [98,99,100,101]. Thus, the presence of ENE warrants the addition of chemotherapy to adjuvant irradiation [98,99,100,101], which results in tri-modality treatment with increased toxicity and patient morbidity [102, 103]. Reliable detection of ENE prior to therapy could help guide treatment choices, reduce morbidity, and avoid surgery in patients likely to require adjuvant chemoradiation. In clinical practice, ENE is ascertained by pathology review after neck dissection, whereas radiographic identification remains challenging [104, 105]. Kann et al. developed [106] and validated [107] quantitative imaging tools for pre-operative detection of ENE: the group segmented 653 nodes in total (380 without metastasis, 153 metastatic without ENE, and 120 metastatic with ENE) on contrast-enhanced CT scans and extracted 99 radiomic features [106]. Random forest ML classifiers trained on these features yielded an AUC (95% confidence interval) of 0.88 (0.81–0.95) for the detection of ENE and 0.91 (0.86–0.97) for nodal metastasis detection in an independent test set of 131 lymph nodes, whereas a deep neural network, the methodological focus of the study, yielded AUCs of 0.91 (0.85–0.97) and 0.91 (0.86–0.96) for ENE and metastasis detection, respectively [106]. The deep neural network model generalized well to an external test set, outperforming radiologists in ENE classification [107].
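AUC values such as those above can be computed as the Mann-Whitney probability that a randomly chosen positive node scores higher than a randomly chosen negative one. The sketch below adds a percentile bootstrap 95% confidence interval; this is one common way to obtain such an interval and not necessarily the method used by Kann et al. Labels, scores, and function names are synthetic or hypothetical.

```python
import numpy as np


def auc_mann_whitney(y_true, scores):
    """P(score of random positive > score of random negative); ties count one half."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n = y_true.size
    boot = []
    while len(boot) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                      # one class missing: AUC undefined
        boot.append(auc_mann_whitney(y_true[idx], scores[idx]))
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc_mann_whitney(y_true, scores), (float(lo), float(hi))


# Synthetic nodes: label 1 = ENE present; scores are noisy model outputs.
rng = np.random.default_rng(7)
labels = rng.integers(0, 2, size=100)
scores = labels + rng.normal(size=100)
point, (lo, hi) = bootstrap_auc_ci(labels, scores, n_boot=1000)
print(f"AUC {point:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With small test sets such as a few hundred nodes, the resulting intervals are wide, which is worth keeping in mind when comparing AUCs that differ by only a few hundredths.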

Of note, there was no significant difference in ENE detection performance between the deep neural network (exploratory radiomics) and (preset, conventional) radiomic analysis [106, 107]. These studies highlight the potential of quantitative imaging for augmenting radiologist performance and guiding HNSCC treatment.

Predicting post chemoradiotherapy complications

Radiotherapy combined with chemotherapy (chemoradiotherapy, CRT) is the mainstay of treatment for many patients with HNSCC [101]. However, patients not uncommonly suffer from treatment-related side effects such as xerostomia, trismus, hearing loss, mucositis, and dermatitis. Identifying patients at risk of developing specific side effects may help oncologists plan personalized treatment strategies and adopt preventive measures to improve therapy tolerance. Several groups have devised radiomics biomarkers to predict the occurrence or severity of treatment-related toxicities based on bioimaging features of organs at risk.

Xerostomia

Radiation-induced xerostomia is a common side effect of radiation therapy for HNSCC and remains a challenge in long-term patient management [108, 109]. The dose-dependent increase in xerostomia risk after irradiation of the salivary glands is well established [109]. Four separate groups designed radiomics-based models to predict post-radiation xerostomia in patients with HNSCC treated with or without concurrent chemotherapy (Table 3). Imaging features were extracted from the salivary glands, either from the parotid gland(s) alone or from both the parotid and submandibular glands. A heterogeneous set of xerostomia endpoints was investigated: Sheikh et al. [110] predicted a binary xerostomia endpoint at 3 months after radiotherapy; Liu et al. [111] applied regression analysis to predict acute xerostomia; and van Dijk et al. [112,113,114] used three different imaging modalities (CT, MRI, FDG-PET) for binary classification of long-term xerostomia. Furthermore, the xerostomia assessment methods varied: Liu et al. used objective measurements of saliva amount over 5 min [111], whereas other groups used patient-reported questionnaires [112,113,114]. While these results appear promising, their clinical application is limited by the lack of external validation and by heterogeneity in image processing, statistical analysis, and treatment outcome measures.

Table 3 Prediction of post-radiation xerostomia based on salivary gland radiomics features

Trismus

Trismus in HNSCC patients may result from inclusion of the masticatory muscles in radiotherapy treatment fields, from surgery, or from tumor invasion of the masticatory structures or of the nerves innervating the masticatory muscles [117, 118]. Defining trismus as ≥ Grade 1 according to CTCAE v4.0 (Common Terminology Criteria for Adverse Events Version 4.0 [115]) criteria at 1 year after completion of intensity-modulated radiotherapy (IMRT), Thor et al. [119] compared 24 imaging features from four masticatory muscles on contrast-enhanced post-treatment T1-weighted MRI scans between 10 patients with radiation-induced trismus and 10 control subjects. Among the radiomics predictors, the best discriminative ability was observed for the Haralick Correlation feature of the gray-level co-occurrence matrix (GLCM) in the medial pterygoid muscle VOI (logistic regression p = 0.12, AUC = 0.78). Although this result was not statistically significant, it may indicate the potential of radiomics biomarkers for predicting post-radiation trismus. Studies in larger cohorts are needed to devise a radiomics signature predictive of post-radiotherapy trismus.
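
To make the best-performing feature in Thor et al. more concrete, the sketch below computes a normalized GLCM for one 2D slice and one pixel offset, and then the Haralick correlation, defined as the sum over i, j of (i - mu_i)(j - mu_j) p(i, j) divided by sigma_i sigma_j. It is a minimal pure-Python illustration, assuming the ROI has already been discretized into integer gray levels; the function names and the example ROI are hypothetical. Dedicated tools typically work in 3D and aggregate over several directions, so values from this sketch are not directly comparable to published feature values.

```python
import math


def glcm(image, levels, offset=(0, 1), symmetric=True):
    """Normalized gray-level co-occurrence matrix for a single pixel offset.

    image:  2D list of integer gray levels in range(levels), already discretized
    offset: (row_step, col_step); (0, 1) pairs each pixel with its right neighbour
    """
    dr, dc = offset
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:  # neighbour must lie inside the ROI
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # count the pair in both directions
                    counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def haralick_correlation(p):
    """Correlation = sum_ij (i - mu_i)(j - mu_j) p(i, j) / (sigma_i * sigma_j)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, j, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for i, j, v in cells)
    if var_i == 0 or var_j == 0:
        # Flat region: correlation is undefined; returning 1.0 is one convention.
        return 1.0
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return cov / math.sqrt(var_i * var_j)


# Hypothetical 4x4 ROI already discretized to 4 gray levels.
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
print(haralick_correlation(glcm(roi, levels=4)))
```

Values close to +1 indicate that neighbouring voxels have linearly related gray levels, while values close to -1 indicate alternating patterns along the chosen offset.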

Hearing loss

Abdollahi et al. [120] explored the potential of cochlear radiomics for predicting chemoradiotherapy-induced hearing loss. Using radiomics features extracted from the cochlea on pre-treatment CT scans, they evaluated 47 patients with cancers of various sites (brain, nasopharynx, parotid, other) treated with 3-dimensional conformal radiation therapy, 23 of whom also received cisplatin chemotherapy. They showed that combining radiomic features with clinical and dosimetric variables may help predict radiotherapy-induced sensorineural hearing loss.
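
Combining radiomic, clinical and dosimetric variables in one model raises a practical issue: the groups live on very different scales (unitless texture values, age in years, dose in Gy). A common approach is to concatenate the groups and z-score each feature. The sketch below is a hypothetical illustration, not the method of Abdollahi et al.: the group names, feature choices and values are invented. It estimates the scaling parameters on the training cohort only and reuses them for new patients, which avoids leaking test-set information into the model.

```python
import statistics

# Hypothetical feature groups; the names and ordering are illustrative only.
GROUPS = ("radiomic", "clinical", "dosimetric")


def combine(record, groups=GROUPS):
    """Concatenate one patient's feature groups into a single feature vector."""
    return [float(v) for g in groups for v in record[g]]


def fit_standardizer(rows):
    """Per-feature (mean, stdev) estimated on the training cohort only."""
    # A constant feature has stdev 0; fall back to 1.0 to avoid division by zero.
    return [(statistics.fmean(col), statistics.pstdev(col) or 1.0)
            for col in zip(*rows)]


def standardize(row, params):
    """Z-score a combined vector using training-set parameters."""
    return [(x - mean) / sd for x, (mean, sd) in zip(row, params)]


# Hypothetical cohort: two cochlear texture features, age (years), mean cochlear dose (Gy).
train = [
    {"radiomic": [0.8, 12.0], "clinical": [61], "dosimetric": [32.5]},
    {"radiomic": [0.5, 15.0], "clinical": [55], "dosimetric": [26.0]},
    {"radiomic": [0.6, 10.0], "clinical": [70], "dosimetric": [40.1]},
]
train_rows = [combine(r) for r in train]
params = fit_standardizer(train_rows)
X_train = [standardize(r, params) for r in train_rows]

# A new patient is scaled with the training parameters; nothing is refitted.
new_patient = {"radiomic": [0.7, 11.0], "clinical": [64], "dosimetric": [30.0]}
x_new = standardize(combine(new_patient), params)
```

The standardized vectors can then be passed to any classifier, for example a logistic regression model.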

Future directions, challenges and barriers

The next leap forward in radiomic analysis likely lies in developing decision-support and prognostic tools for day-to-day clinical use. However, several key barriers and challenges in the field of quantitative imaging must first be addressed:

Although exploratory radiomics studies have consistently reported promising results, independent large-scale validation is lagging behind [121, 122]. A recent publication by Kim et al. [122] reviewed the design characteristics of 516 studies applying AI algorithms to the diagnostic analysis of medical images. Only 31 of these studies (6%) validated their proposed models in external test cohorts, i.e. cohorts from institutions other than the one providing the training data, or cohorts from the same institution but from a different time period than the training data. Moreover, the use of homogeneous, single-institution or even single-scanner training data may limit the generalizability of radiomics-based models [121]. These limitations highlight the importance of multi-institutional, multi-national medical imaging archives for developing radiomics tools for future clinical use. Data sharing may help mitigate the shortage of diverse imaging data [2], and platforms such as “The Cancer Imaging Archive” (TCIA) were created to this end. TCIA publicly hosts de-identified imaging collections with corresponding clinical data and provides digital infrastructure for data sharing [123]. As of December 2019, nine head and neck cancer collections comprising CT, MRI and FDG-PET imaging data were available [123].
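
The two kinds of external test cohort described by Kim et al. can be made explicit in code. The sketch below is a hypothetical helper, assuming each patient record carries an institution label and a scan date (the field names are invented). It holds out either whole institutions (geographic validation) or all scans acquired on or after a cutoff date (temporal validation). A random split of single-institution data, by contrast, only yields an internal validation.

```python
from datetime import date


def split_external(records, test_institutions=frozenset(), cutoff=None):
    """Split patient records into a development set and an external test set.

    A record goes to the test set if it comes from a held-out institution
    (geographic validation) or was scanned on/after `cutoff` (temporal validation).
    """
    dev, test = [], []
    for rec in records:
        other_site = rec["institution"] in test_institutions
        later_period = cutoff is not None and rec["scan_date"] >= cutoff
        (test if other_site or later_period else dev).append(rec)
    return dev, test


# Hypothetical records; field names are illustrative.
records = [
    {"id": 1, "institution": "A", "scan_date": date(2014, 3, 1)},
    {"id": 2, "institution": "A", "scan_date": date(2018, 6, 1)},
    {"id": 3, "institution": "B", "scan_date": date(2015, 1, 1)},
]
dev, test = split_external(records, test_institutions={"B"})
print([r["id"] for r in dev], [r["id"] for r in test])  # prints [1, 2] [3]
```

Passing `cutoff=date(2017, 1, 1)` instead would hold out patient 2 as a temporal test cohort from the same institution.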

Further challenges lie in the implementation of the radiomics pipeline (including image acquisition) as outlined in this article. Forghani et al. [1] described sources of variation impairing generalizability and reproducibility of radiomics studies, including:

  • Scan acquisition parameters

  • Variability in post-contrast images, such as the degree of enhancement achieved, which depends on the timing of contrast agent administration, the patient’s circulatory dynamics, and the anatomical location of the VOI

  • Wear and tear of scanners

  • Differences in scanner manufacturer, model, and type

  • Reconstruction parameters

  • VOI/ROI segmentation

  • Radiomics feature set / feature extraction

Preprocessing steps such as resampling and filtering (Fig. 1) may help mitigate some of this variation. However, as the field moves towards clinical application of AI-driven image analysis, standardization of reconstruction and acquisition parameters across providers, as well as of scanner components across manufacturers, should be pursued.
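
Resampling to a common voxel spacing is one such step. For example, CT scans reconstructed with different slice thicknesses can be brought to a shared spacing before feature extraction. The sketch below is a simplified, hypothetical illustration: it resamples only along the slice (z) direction with linear interpolation and assumes slice k lies at z = k * old_spacing. Real pipelines resample in 3D from the physical coordinates stored in the image headers and often use other interpolators, such as B-splines.

```python
def resample_slices(volume, old_spacing, new_spacing):
    """Linearly resample a stack of 2D slices to a new slice spacing (mm).

    volume: list of 2D slices (lists of lists); slice k lies at z = k * old_spacing
    Returns slices at z = 0, new_spacing, 2 * new_spacing, ... within the original extent.
    """
    if new_spacing <= 0:
        raise ValueError("new_spacing must be positive")
    n = len(volume)
    extent = (n - 1) * old_spacing
    out = []
    k = 0
    while k * new_spacing <= extent + 1e-9:  # small tolerance for float round-off
        z = k * new_spacing / old_spacing    # position in units of original slices
        lo = min(int(z), n - 1)
        hi = min(lo + 1, n - 1)
        w = z - lo                           # interpolation weight of the upper slice
        out.append([[(1 - w) * a + w * b for a, b in zip(row_lo, row_hi)]
                    for row_lo, row_hi in zip(volume[lo], volume[hi])])
        k += 1
    return out


# Hypothetical 1x1-pixel "volume" with 2 mm slices, resampled to 1 mm.
volume = [[[0.0]], [[10.0]], [[20.0]]]
print([s[0][0] for s in resample_slices(volume, old_spacing=2.0, new_spacing=1.0)])
```

Linear interpolation is used here only for simplicity. The choice of interpolator itself influences texture features, which is one reason why reporting preprocessing settings matters for reproducibility.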

Inter- and intra-observer variability in VOI/ROI delineation could be addressed by using semi-automated or automated segmentation tools. In addition, there have been efforts to standardize radiomics features, the most widely recognized being the Image Biomarker Standardisation Initiative (IBSI) [21]. Moreover, open-source feature libraries and radiomics extraction software packages such as “PyRadiomics” [34] or the “Imaging Biomarker Explorer” [33] allow reproducible feature extraction as well as straightforward reporting of radiomics feature definitions, and are increasingly adopted in recent publications.
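
Agreement between two delineations of the same lesion, for example by two observers or by a manual and an automated tool, is commonly quantified with the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch, assuming each delineation is given as a set of (z, y, x) voxel indices; this representation and the example contours are hypothetical.

```python
def dice(voxels_a, voxels_b):
    """Dice similarity coefficient between two segmentations given as voxel sets."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # both empty: treated as perfect agreement (a convention)
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical example: two observers contour the same lesion on one slice.
observer_1 = {(0, y, x) for y in range(2, 6) for x in range(2, 6)}  # 16 voxels
observer_2 = {(0, y, x) for y in range(3, 7) for x in range(2, 6)}  # 16 voxels, shifted one row
print(dice(observer_1, observer_2))  # prints 0.75
```

A Dice value of 1.0 indicates identical masks and 0.0 indicates no overlap. Radiomics studies often complement it by testing whether the extracted features themselves remain stable across delineations.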

Conclusions

Precision prognostication and treatment personalization are considered the next major evolution in cancer care, and the “-omics” concept has been postulated as a key enabler thereof. Numerous studies have established radiomics as a powerful addition to the “-omics” toolbox, and ongoing research continues to provide incremental improvements. Radiomics has substantially reshaped the landscape of quantitative imaging research: in the future, fast, low-cost and comprehensive tumor and tissue characterization facilitated by radiomic analysis may constitute a compelling augmentation of, or even an alternative to, traditional clinical testing and prognostication, provided that adequate performance and stability are attained. Over the past 6 years, many studies have reported potential applications of radiomics analysis for molecular classification, prognostic characterization, and treatment response prediction in patients with HNSCC. While recent exploratory studies in HNSCC radiomics yield promising results, independent large-scale validation is lagging behind because access to multi-institutional, multi-national imaging data is restricted. Standardization of radiomics pipelines, image acquisition protocols, and outcome targets can pave the way towards radiomics tools for day-to-day clinical use, and ultimately towards superior outcomes and reduced treatment-related toxicities in head and neck cancer.