Introduction

About 50% of patients treated with radiochemotherapy (RCT) for locally advanced human papilloma virus–negative head-and-neck cancer (HNC) experience locoregional treatment failure [1, 2]. As salvage treatment options are limited, locoregional failure leads to severe symptoms and ultimately to death in most patients. Thus, overcoming treatment resistance by optimized RCT represents an important area of research. Preclinical and clinical data demonstrate that tumor hypoxia and other microenvironmental factors contribute significantly to tumor radiation resistance [3,4,5,6]. Different quantitative imaging biomarkers (QIBs) related to tumor hypoxia and the tumor microenvironment have shown potential for outcome prediction, early response assessment, and radiotherapy (RT) personalization, e.g., by means of risk-adapted radiation dose modulation [7,8,9,10,11,12].

Hypoxia imaging using positron emission tomography (PET) with specific radiotracers such as [18F]-Fluoromisonidazole (FMISO) has proven prognostic power for predicting outcome after RCT in HNC [7, 13,14,15]. Similarly, functional magnetic resonance imaging (MRI) techniques, such as diffusion-weighted (DW) imaging, which assesses tumor cellularity, or dynamic contrast-enhanced (DCE) imaging, which allows analysis of tissue vascularity and vessel permeability, have been correlated with tumor response after RCT in HNC and other solid tumors [8, 9, 16, 17]. Some studies correlated the spatial distributions of multiple QIBs and suggested that they carry complementary biological information [18,19,20]. However, the optimal QIB, or imaging profile combining multiple QIBs, for predicting outcome after RCT in HNC is unknown. Most results were derived from small observational clinical cohorts, and none of the previous studies was able to relate relevant QIBs to radiation resistance on a biological or pre-clinical level.

Future clinical use of QIBs to personalize radiation dose and overcome treatment resistance requires a widely available, robust, affordable, and simple method to generate QIBs, enabling multicenter trials and easy access for patients. In contrast to molecular profiling [21, 22], liquid biopsy [23, 24], histopathology [25, 26], or combination with immunotherapy [27, 28], QIBs offer spatial tumor characterization [29] and thus provide optimal conditions for focal personalized interventions such as dose painting, including dose escalation and dose de-escalation [13, 30, 31].

The aim of this preclinical study was to develop and train a multi-scale model, starting from a broad and unbiased basis, for the prediction of high-risk subvolumes (HRS) in HNC that are linked to increased radiation resistance, derived from hypoxia PET, DW-MRI, and DCE-MRI. Xenograft tumors from different human HNC cell lines with variable, known radiation sensitivities were imaged with multi-parametric small animal PET/MRI and evaluated by novel machine learning (ML) methods to identify HRS in multi-dimensional imaging space. The hypothesis investigated in this study was therefore that novel ML approaches can discover new QIBs or imaging profiles that define HRS in a pre-clinical scenario, which may be used for future personalized radiotherapy (RT) interventions in a clinical setting.

Material and methods

Study design, animals, and tumor models

A total of 68 mice with implanted human HNC cell lines of different, known radiation sensitivities were examined with simultaneous functional PET/MRI before and after 2 weeks of fractionated RT. Details on animals, implanted cell lines, imaging data, and time points are summarized in Table 1. The animal facilities and all experiments were approved according to our institutional guidelines and the German animal welfare regulations (animal allowance no. 35/9185.81-2/R4/16). Two to 5 days before tumor cell injection, 4- to 6-week-old immunodeficient female nude mice (NMRI nu/nu, Charles River Laboratories) received a 4-Gy total body irradiation (6 MV photons, Elekta SL15, Crawley, UK) to further suppress the residual immune system. Eight well-established human head and neck squamous cell carcinoma (HNSCC) cell lines (UTSCC-45, XF354, UTSCC-14, UTSCC-8, FaDu, UTSCC-5, CAL-33, SAS) with known in vivo radiation sensitivities [32, 33] were grown in cell culture (cf. Table 1). Exponentially growing cells of the third passage were trypsinised, and a single-cell suspension of approx. 500,000 cells in 50 μl phosphate-buffered saline was prepared and injected subcutaneously into the right hind leg of the animal. Animals were checked regularly for weight loss, abnormal behavior, or other signs of distress. Tumor diameter was measured twice weekly. After reaching the target size of 7–10 mm in diameter, tumors were examined using multi-modal small animal PET/MRI before and after 2 weeks of fractionated RT.

Table 1 Preclinical data. Details on animals and head-and-neck cancer cell lines, including the mean tumor control dose 50% (TCD50) with 95% confidence interval (CI) according to [33], radiation sensitivities grouped into high (H), medium (M), medium/low (ML), and low (L), as well as the number of complete imaging data sets and of data sets with hypoxia positron emission tomography (PET), diffusion-weighted MR imaging (DWI), and dynamic contrast-enhanced (DCE) MRI before the start of radiotherapy (RT) and after 14 days

Multi-modal imaging and radiotherapy

All animals were imaged with combined PET/MRI using a small animal 7-T MRI system with a dedicated PET insert [29, 34, 35]. Animals were anesthetized with a mixture of isoflurane (1.5–2.0%; Abbott, Wiesbaden, Germany) and air (flow rate 1.0–1.5 l/min) with continuous monitoring of the breathing rate and were placed on a warming pad to maintain constant body temperature during imaging. The imaging protocol consisted of simultaneous dynamic FMISO PET, anatomical T2-weighted MRI (T2w-MRI), DW-MRI, and DCE-MRI, with T2w- and DW-MRI in a gated acquisition technique with respiratory triggering (cf. Fig. 1).

Fig. 1

Multi-dimensional pre-clinical imaging data. Example of pre-clinical imaging data consisting of (A) anatomical T2-weighted MRI, (B) FMISO PET, (C) contrast-enhanced T1-weighted MRI, and (D) apparent diffusion coefficients (ADC) derived from diffusion-weighted (DW) MRI

Dynamic PET was acquired in list mode for 90 min post injection (p.i.) of approximately 10 MBq FMISO in 200 μl of physiological sodium chloride solution (0.9%) into the animal’s tail vein. PET data were reconstructed into a total of 65 time frames (36 × 10 s, 18 × 60 s, 11 × 360 s) using 2D-OSEM (4 iterations, 8 subsets). DW-MRI was performed with an echo planar imaging sequence with nine equidistant b-values (b = 0–800 s/mm²). DCE-MRI was acquired for a total duration of 13.5 min, starting 1 min before injection of the contrast agent (Gadovist®, Bayer Vital GmbH, Germany), with a temporal resolution of 5.4 s. Details of the pre-clinical image acquisition protocol are given in Table 2.
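As a quick consistency check of the stated acquisition timing, the short NumPy sketch below (our own arithmetic, not part of the published pipeline) derives the PET frame mid-times and the DCE frame count. Under the framing given above, the 65 PET frames span 90 min, the second-to-last frame is centered at 81 min p.i. (the "approx. 80 min" frame used later for static maps), and 13.5 min at 5.4 s per frame gives 150 DCE frames, of which 11 lie entirely before the injection at 1 min.

```python
import numpy as np

# PET framing as stated: 36 x 10 s, 18 x 60 s, 11 x 360 s
pet_durations = np.concatenate([np.full(36, 10.0), np.full(18, 60.0), np.full(11, 360.0)])
pet_starts = np.concatenate([[0.0], np.cumsum(pet_durations)[:-1]])
pet_mid_min = (pet_starts + pet_durations / 2) / 60.0   # frame mid-times in minutes p.i.

n_pet_frames = pet_durations.size                        # number of reconstructed frames
pet_total_min = pet_durations.sum() / 60.0               # total acquisition time
second_last_mid_min = pet_mid_min[-2]                    # frame used for static uptake maps

# DCE-MRI: 13.5 min at 5.4 s per frame, contrast injection 1 min after start
dce_dt_s = 5.4
n_dce_frames = round(13.5 * 60 / dce_dt_s)
n_baseline = int(60 // dce_dt_s)                         # frames ending before the injection
```

The 11 baseline frames obtained this way match the number of pre-contrast frames averaged for \({S}_{0}\) in the DCE analysis.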

Table 2 Details of the pre-clinical PET/MR imaging protocol

Ten fractions of 2 Gy, one fraction per day, were applied over 2 weeks using a dedicated small animal image-guided RT platform (SAIGRT, Dresden, Germany) [36]. For irradiation, the animals were immobilized using plastic tubes fixed on a precisely movable carbon table; the tumor-bearing leg was positioned using a foot holder. Positioning accuracy with respect to the radiation field was checked with portal X-ray imaging (80 kV, 0.8 mA). All irradiations were performed using isocentric opposed fields with dedicated circular collimators (8–14 mm diameter) chosen according to tumor volume. Radiation dose and corresponding irradiation time were calculated as a function of tumor size.

ML-based identification of radioresistant clusters

Image pre-processing

During data preprocessing, the tumor region as well as a representative muscle region were defined manually on the T2w-MRI data by an experienced radiation oncologist (SB) using the open-source software 3D Slicer. The tumor region was contoured on all image slices to encompass the whole lesion, excluding skin and bony structures. Resulting tumor volumes are summarized in Table 1. Muscle tissue was carefully contoured in the ipsilateral leg, excluding bones and blood vessels. All quantitative MRI data were resampled to the PET image grid for subsequent processing and analysis. To correct for potential movements of the animal between acquisitions, local rigid registrations between the respective images were performed using the open-source toolkit elastix (registration parameters are given in Supplementary Table S1). The registration results were carefully checked visually by an imaging scientist (SL) and a radiation oncologist (SB) and manually adjusted if necessary.

Extraction of quantitative parameter maps

Maps of apparent diffusion coefficient (ADC) values were derived from DW-MR images using a mono-exponential fit over all b-values with in-house software developed in Python (SciPy 0.19.1).
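A voxel-wise mono-exponential ADC fit could look like the sketch below. It assumes the model S(b) = S0·exp(−b·ADC), fitted with `scipy.optimize.curve_fit` and initialised from a log-linear least-squares fit; the function names are hypothetical and this is not the authors' in-house software, whose fitting details are not described.

```python
import numpy as np
from scipy.optimize import curve_fit

def monoexp(b, s0, adc):
    """Mono-exponential diffusion signal model S(b) = S0 * exp(-b * ADC)."""
    return s0 * np.exp(-b * adc)

def fit_adc(b_values, signal):
    """Fit S0 and ADC (mm^2/s) for one voxel; b-values given in s/mm^2."""
    b = np.asarray(b_values, dtype=float)
    s = np.asarray(signal, dtype=float)
    # Log-linear fit provides a stable starting point for the nonlinear fit
    slope, intercept = np.polyfit(b, np.log(np.clip(s, 1e-12, None)), 1)
    p0 = (np.exp(intercept), max(-slope, 1e-6))
    (s0, adc), _ = curve_fit(monoexp, b, s, p0=p0)
    return s0, adc

# Nine equidistant b-values from 0 to 800 s/mm^2, as in the protocol
b_values = np.linspace(0, 800, 9)
signal = monoexp(b_values, 1000.0, 0.8e-3)   # synthetic, noise-free voxel
s0_hat, adc_hat = fit_adc(b_values, signal)
```

For a full map, `fit_adc` would be applied to every tumor voxel; a pure log-linear fit is a cheaper alternative but weights low-signal, high-b data differently.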

FMISO PET data were first transformed into static uptake parameter maps by generating a tumor-to-muscle ratio (TMR) map, i.e., voxel activity concentrations normalized to the mean muscle uptake, from the second-to-last FMISO PET frame (approx. 80 min p.i.) to avoid potential artifacts caused by the subsequent MRI contrast agent injection. To further extract quantitative parameter maps related to tumor hypoxia from dynamic FMISO PET signals, FMISO activity concentrations were converted into maps of standardized uptake value (SUV) by normalization to body weight and injected activity. A principal component analysis (PCA) was then performed on the uncentered data to extract a reduced set of quantitative parameter maps. Based on the variance explained by the individual principal components (PCs), the projection coefficients of the first two PCs (FMISO_c1, FMISO_c2) were found sufficient to describe the measured tracer dynamics and were kept for further analyses (Fig. 2).
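The static and dynamic FMISO processing steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: SUV uses body weight with a tissue density of 1 g/ml, activities are in consistent units, the uncentered PCA is computed via an SVD of the voxel-by-frame matrix without subtracting a mean curve, and the sign convention for the PCs (positive-sum curves) is our own choice. All function names are hypothetical.

```python
import numpy as np

def tmr_map(static_activity, muscle_mask):
    """Tumor-to-muscle ratio: activity divided by mean activity in the muscle ROI."""
    return static_activity / static_activity[muscle_mask].mean()

def suv_map(activity_bq_per_ml, injected_activity_bq, body_weight_g):
    """Body-weight SUV, assuming a tissue density of 1 g/ml."""
    return activity_bq_per_ml / (injected_activity_bq / body_weight_g)

def uncentered_pca(tacs, n_components=2):
    """Uncentered PCA of time-activity curves (rows: voxels, columns: frames).

    No mean curve is subtracted before the SVD. Returns the projection
    coefficients (e.g. FMISO_c1, FMISO_c2), the PC time curves, and the
    fraction of total (uncentered) signal energy explained by each PC.
    """
    _, sing, vt = np.linalg.svd(tacs, full_matrices=False)
    components = vt[:n_components].copy()
    # SVD signs are arbitrary; flip so that each PC curve has a positive sum
    signs = np.where(components.sum(axis=1) < 0, -1.0, 1.0)
    components *= signs[:, None]
    coefficients = tacs @ components.T
    explained = sing**2 / np.sum(sing**2)
    return coefficients, components, explained[:n_components]

# Synthetic example: 500 voxels x 65 frames built from two hypothetical curve shapes
rng = np.random.default_rng(0)
t_min = np.linspace(0.1, 90, 65)
uptake = 1 - np.exp(-t_min / 20)     # hypothetical slow uptake shape
washout = np.exp(-t_min / 5)         # hypothetical early perfusion/washout shape
tacs = rng.uniform(0.5, 2, (500, 1)) * uptake + rng.uniform(0, 0.5, (500, 1)) * washout
coeffs, pcs, explained = uncentered_pca(tacs)
```

Because the synthetic curves are built from two shapes, two PCs reconstruct them exactly; with real data, the explained-variance plot (Fig. 2, upper row) is what justifies keeping two components.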

Fig. 2

Principal component analysis (PCA) of dynamic imaging data for FMISO PET (A) and DCE-MRI (B). Upper row: variance of the data explained by the first five principal components (PCs). Middle row: time-dependent curves of principal components 1 and 2. Lower row: example image voxel with raw FMISO PET and DCE-MRI data and the time curves reconstructed from PC 1 only or from PCs 1 and 2

Similarly, for DCE-MRI, measured signal intensities \({S}_{{t}_{i}}\) were converted to relative signal increase

$$\Delta {S}_{{t}_{i}}=\frac{{S}_{{t}_{i}}-{S}_{0}}{{S}_{0}}$$

with \(i\in\left\{1, \cdots , 150\right\}\) indexing the time frames \({t}_{i}\), and \({S}_{0}\) the baseline signal intensity averaged over the \(11\) frames acquired prior to contrast agent injection. Quantitative parameter maps were then derived from the \(\Delta S\) data using PCA, yielding two final parameter maps containing the projection coefficients of the first two PCs, DCE_c1 and DCE_c2 (Fig. 2).
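The DCE conversion can be sketched in a few lines, assuming a voxel-by-frame signal array whose first 11 frames are pre-contrast. We assume, by analogy with the FMISO analysis, that the PCA is uncentered; the text does not state this explicitly for DCE. Function names are hypothetical.

```python
import numpy as np

def relative_signal_increase(signal, n_baseline=11):
    """Relative signal increase (S_t - S_0) / S_0 for each voxel.

    signal: array (n_voxels, n_frames); the first n_baseline frames are
    assumed to be pre-contrast and are averaged to obtain S_0.
    """
    signal = np.asarray(signal, dtype=float)
    s0 = signal[:, :n_baseline].mean(axis=1, keepdims=True)
    return (signal - s0) / s0

def first_two_pc_coefficients(delta_s):
    """Projection coefficients on the first two uncentered PCs (signs arbitrary)."""
    _, _, vt = np.linalg.svd(delta_s, full_matrices=False)
    return delta_s @ vt[:2].T   # columns: DCE_c1, DCE_c2

# Synthetic example: 2 voxels x 150 frames, enhancement from the 12th frame on
signal = np.full((2, 150), 100.0)
signal[:, 11:] = 150.0
delta_s = relative_signal_increase(signal)
dce_maps = first_two_pc_coefficients(delta_s)
```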

Model training for identification of radioresistant clusters

We propose a novel method for unbiased identification of tumor clusters defining HRS from multi-parametric quantitative imaging. This method is based on two hypotheses: first, that recurrence after RT originates from such HRS inside the macroscopic tumor, which fail to be controlled by a standard radiation dose and fractionation due to their biological and physiological properties; and second, that a larger HRS translates into a higher level of radiation resistance. We therefore implemented a method that automatically extracts tumor clusters with similar biological and physiological properties, as derived from the joint information of quantitative functional imaging maps, and scores their ability to stratify tumor cell lines according to radiation sensitivity. In this way, relevant image parameters fulfilling the hypotheses listed above were learned.

A schematic overview of the machine learning approach to identify the most relevant parameters in n-dimensional imaging space is provided in Fig. 3. Only the imaging data cohort \({C}_{all}\) (n = 42), for which all five quantitative parameter maps (ADC, FMISO_c1, FMISO_c2, DCE_c1, DCE_c2) were available at the first imaging time point, was included in this analysis (cf. Table 1). First, all tumor voxels of the training cohort \({C}_{all}\) were pooled in common parameter spaces. One- to five-dimensional (1D to 5D) image parameter spaces were built, with each dimension spanned by one of the five quantitative parameters extracted from functional imaging. Samples in parameter space (tumor voxels) were \(z\)-normalized. During parameter space scanning, each 1D to 5D parameter space was scanned for connected clusters of a fixed number \({N}_{HRS}\) of voxels with similar parameter values. Based on [33], \({N}_{HRS}\) was chosen such that the fraction of tumor voxels belonging to HRS was 15.0%, 7.5%, and 0% for tumor cell lines of low, medium, and high radiation sensitivity, respectively.
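The preparation of the search spaces can be sketched as below. The enumeration shows that five maps yield 31 distinct 1D to 5D subspaces. The rule for \({N}_{HRS}\) is our assumption about how "chosen such that" was implemented: the sum over tumors of the class-specific target fraction times each tumor's voxel count. All names and the synthetic data are hypothetical.

```python
import numpy as np
from itertools import combinations

PARAMS = ("ADC", "FMISO_c1", "FMISO_c2", "DCE_c1", "DCE_c2")

# All 1D to 5D parameter subspaces spanned by the five quantitative maps
subspaces = [c for n in range(1, 6) for c in combinations(PARAMS, n)]

def z_normalize(samples):
    """z-normalize each parameter (column) over all pooled tumor voxels."""
    return (samples - samples.mean(axis=0)) / samples.std(axis=0)

# Target HRS fractions per radiosensitivity class, as stated in the text
HRS_FRACTION = {"L": 0.15, "M": 0.075, "H": 0.0}

def n_hrs(voxels_per_tumor, tumor_class):
    """Cluster size N_HRS (assumed rule): expected HRS voxels summed over tumors."""
    total = sum(HRS_FRACTION[c] * n for n, c in zip(voxels_per_tumor, tumor_class))
    return int(round(total))

# Usage with synthetic pooled voxels (1000 voxels x 5 parameters)
rng = np.random.default_rng(0)
pooled = rng.normal([800e-6, 1.5, 0.2, 0.3, 0.1], [200e-6, 0.5, 0.1, 0.1, 0.05], (1000, 5))
z = z_normalize(pooled)
n_cluster = n_hrs([100, 200, 400], ["H", "M", "L"])
```

z-normalization matters here because the subsequent Euclidean nearest-neighbor search would otherwise be dominated by whichever map has the largest numeric range (ADC in mm²/s versus dimensionless PC coefficients).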

Fig. 3

Preclinical model development. Schematic representation of the machine learning model to identify clusters in multi-dimensional imaging space linked to radiation sensitivity: (I) Randomly select a cluster center in n-dimensional imaging parameter space; each point in this 2D parameter plot represents the parameter values of one tumor voxel in the cohort. (II) Identify the corresponding cluster using K-nearest neighbor (KNN) clustering. (III) Derive the fractional volume of this cluster in individual xenografts. (IV) Assess the stratification potential S with respect to radiation resistance groups using Cohen’s d-score

Parameter space scanning was performed by repeating the following steps Nit = 5000 times: (1) randomly select one sample as cluster center Xcluster; (2) assign its \({N}_{HRS}\) nearest neighbors to the cluster (KNN clustering), using the Euclidean distance from Xcluster in parameter space as proximity measure; (3) derive, for each individual tumor, the fraction fcluster of its voxels that belong to this cluster; (4) quantify the stratification potential of fcluster using a stratification score S.
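The four scanning steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: we assume `scipy.spatial.cKDTree` as the Euclidean KNN backend, per-voxel tumor indices, and H/M/L labels per tumor with at least two tumors per group; clusters whose pooled standard deviation is zero are skipped. Function names and the synthetic cohort are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def cohens_d(a, b):
    """Cohen's d of group b versus group a with pooled SD (needs >= 2 tumors per group)."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return np.nan if pooled == 0 else (np.mean(b) - np.mean(a)) / pooled

def scan_parameter_space(samples, tumor_id, tumor_class, n_hrs, n_it=5000, seed=0):
    """Randomized parameter space scanning with KNN clusters (steps 1-4).

    samples:     (n_voxels, n_dims) parameter values, ideally z-normalized
    tumor_id:    (n_voxels,) integer index of the tumor each voxel belongs to
    tumor_class: (n_tumors,) array of "H", "M", "L" labels
    """
    rng = np.random.default_rng(seed)
    tree = cKDTree(samples)
    voxels_per_tumor = np.bincount(tumor_id)
    best_score, best_center = -np.inf, None
    for _ in range(n_it):
        center = samples[rng.integers(len(samples))]          # (1) random cluster center
        _, idx = tree.query(center, k=n_hrs)                  # (2) N_HRS nearest neighbors
        counts = np.bincount(tumor_id[np.atleast_1d(idx)], minlength=voxels_per_tumor.size)
        f_cluster = counts / voxels_per_tumor                 # (3) per-tumor cluster fraction
        score = 0.5 * (cohens_d(f_cluster[tumor_class == "H"], f_cluster[tumor_class == "M"])
                       + cohens_d(f_cluster[tumor_class == "M"], f_cluster[tumor_class == "L"]))
        if score > best_score:                                 # (4) keep best S; NaN never wins
            best_score, best_center = score, center
    return best_score, best_center

# Synthetic cohort: 12 tumors x 100 voxels in 2D, with a planted cluster near (5, 5)
# whose size grows from the H to the L group (hypothetical data)
rng = np.random.default_rng(1)
tumor_class = np.array(["H"] * 4 + ["M"] * 4 + ["L"] * 4)
planted = [0, 1, 0, 1, 6, 8, 7, 7, 14, 16, 15, 15]
blocks = []
for k in planted:
    pts = rng.normal(0.0, 1.0, (100, 2))
    pts[:k] = rng.normal(5.0, 0.1, (k, 2))
    blocks.append(pts)
samples = np.vstack(blocks)
tumor_id = np.repeat(np.arange(len(planted)), 100)
best_s, best_x = scan_parameter_space(samples, tumor_id, tumor_class,
                                      n_hrs=sum(planted), n_it=300, seed=2)
```

In this toy cohort the scan recovers the planted region because only a cluster centered there produces HRS fractions that increase consistently from group H to group L.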

Quantification of stratification potential

For a robust, score-based assessment of the stratification potential of each tested parameter combination, cell lines were grouped into classes of distinct radiation sensitivity based on previously published tumor control doses (TCD50, Table 1) [32, 33]. Cell lines with overlapping confidence intervals were considered not distinguishable with respect to radiosensitivity and were therefore grouped into the same class. In this way, three distinct classes of cellular radiation sensitivity were identified: high (H) sensitivity (UTSCC-45, XF354, UTSCC-14, UTSCC-8), medium (M) sensitivity (FaDu), and low (L) sensitivity (UTSCC-5, SAS). UTSCC-5 could not be successfully implanted into animals. Imaging data of the cell line CAL-33 could not be reproducibly analyzed due to significant differences in image quality; furthermore, no reliable radiosensitivity class could be assigned given the wide reported range of its TCD50. Therefore, CAL-33 was excluded from the analysis.
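The grouping rule can be illustrated by merging overlapping confidence intervals. The sketch assumes that overlaps chain transitively (A overlaps B and B overlaps C puts all three in one class), which the text does not specify, and it uses made-up CI values rather than the TCD50 data of Table 1.

```python
def group_by_overlapping_ci(cis):
    """Group cell lines whose TCD50 confidence intervals overlap (transitively).

    cis: dict mapping cell line name -> (lower, upper) CI bounds in Gy.
    Returns groups ordered by increasing lower bound, i.e. from the most
    radiosensitive (lowest TCD50) to the least radiosensitive class.
    """
    groups, current, current_upper = [], [], float("-inf")
    for name, (lo, hi) in sorted(cis.items(), key=lambda item: item[1][0]):
        if current and lo > current_upper:   # no overlap with the running group
            groups.append(current)
            current, current_upper = [], float("-inf")
        current.append(name)
        current_upper = max(current_upper, hi)
    if current:
        groups.append(current)
    return groups

# Hypothetical CIs (Gy), for illustration only
HYPOTHETICAL_CIS = {"line_A": (30.0, 40.0), "line_B": (35.0, 45.0), "line_C": (50.0, 60.0),
                    "line_D": (70.0, 90.0), "line_E": (85.0, 100.0)}
groups = group_by_overlapping_ci(HYPOTHETICAL_CIS)
```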

The stratification potential, i.e., the capability to separate groups H-M and M-L, respectively, for any investigated parameter combination was quantified by Cohen’s d as effect size measure

$$S_{ij}=\frac{\mu_j-\mu_i}{\sigma_{ij}}\quad\text{with}\quad\left(i,j\right)\in\left\{\left(H,M\right),\left(M,L\right)\right\}.$$

Here, \({\mu }_{i}\) and \({\mu }_{j}\) are the mean HRS fractions of groups \(i\) and \(j\) for a given parameter combination, whereas \({\sigma }_{ij}\) is the pooled standard deviation of groups \(i\) and \(j\), defined as

$$\sigma_{ij}=\sqrt{\frac{\left(n_i-1\right)\cdot\sigma_i^2+\left(n_j-1\right)\cdot\sigma_j^2}{n_i+n_j-2}}$$

with \({\sigma }_{i}^{2}\) and \({\sigma }_{j}^{2}\) being the group variances and \({n}_{i}\) and \({n}_{j}\) the numbers of observations in groups \(i\) and \(j\), respectively. The final score was defined as the arithmetic mean

$$S=\frac{{S}_{HM}+{S}_{ML}}{2} .$$
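The score definitions above translate directly into code. The sketch uses sample variances (ddof = 1) for \({\sigma }_{i}^{2}\) and \({\sigma }_{j}^{2}\), which is the usual convention for the pooled standard deviation; the per-tumor HRS fractions are hypothetical illustration values.

```python
import numpy as np

def pooled_sd(x_i, x_j):
    """Pooled standard deviation sigma_ij of two groups (sample variances)."""
    n_i, n_j = len(x_i), len(x_j)
    var_i, var_j = np.var(x_i, ddof=1), np.var(x_j, ddof=1)
    return np.sqrt(((n_i - 1) * var_i + (n_j - 1) * var_j) / (n_i + n_j - 2))

def cohens_d(x_i, x_j):
    """S_ij = (mu_j - mu_i) / sigma_ij."""
    return (np.mean(x_j) - np.mean(x_i)) / pooled_sd(x_i, x_j)

def stratification_score(f_h, f_m, f_l):
    """Final score S = (S_HM + S_ML) / 2 from per-tumor HRS fractions."""
    return 0.5 * (cohens_d(f_h, f_m) + cohens_d(f_m, f_l))

# Hypothetical HRS fractions per tumor (illustrative values only)
f_h = [0.00, 0.02, 0.01]
f_m = [0.06, 0.08, 0.07]
f_l = [0.14, 0.16, 0.15]
S = stratification_score(f_h, f_m, f_l)
```

Here every group has a standard deviation of 0.01, so S_HM = 6 and S_ML = 8, giving S = 7. Averaging the two pairwise effect sizes rewards clusters that separate both neighboring groups, not just the extremes.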

Selection of optimal HRS clusters in 1D to 5D imaging space

For each n-dimensional image parameter space, the cluster yielding the highest stratification score \({S}_{HRS, nD}\) and its corresponding cluster center XHRS,nD were identified and used to compare the performance of different parameter spaces. Furthermore, the differences in \({f}_{HRS, nD}\) between radiosensitivity groups H and M, and between groups M and L, were tested for significance using a Wilcoxon rank sum test. P < 0.05 was considered statistically significant.
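In SciPy, the Wilcoxon rank sum test is available as `scipy.stats.ranksums` (normal approximation) and, in the equivalent Mann-Whitney U formulation, as `scipy.stats.mannwhitneyu`. Which implementation the authors used is not stated; the group values below are hypothetical.

```python
from scipy.stats import mannwhitneyu, ranksums

# Hypothetical per-tumor HRS fractions for two radiosensitivity groups
f_hrs_h = [0.00, 0.01, 0.02, 0.00, 0.01]
f_hrs_m = [0.05, 0.07, 0.06, 0.09]

# Wilcoxon rank sum test (normal approximation, no tie correction)
z_stat, p_ranksums = ranksums(f_hrs_h, f_hrs_m)

# Mann-Whitney U formulation; with ties present, recent SciPy versions use a
# normal approximation with tie and continuity correction
u_stat, p_mwu = mannwhitneyu(f_hrs_h, f_hrs_m, alternative="two-sided")

significant = p_ranksums < 0.05
```

With small groups, as here, the two variants can give noticeably different p-values, so reporting which variant was used helps reproducibility.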

Assessment of robustness

To evaluate the robustness of the identified stratification scores \({S}_{HRS}\) and their cluster centers XHRS, an internal bootstrap validation was performed for each parameter space. Each bootstrap cohort was drawn with replacement from the original training cohort Call, using a total number of Nbs = 50 bootstrap cohorts. Robustness was then quantified by deriving bootstrap-based 95% confidence intervals (CIs) for \({S}_{HRS}\) and XHRS, respectively. For an additional assessment of the robustness of XHRS, the distribution of identified scores after parameter space scanning was visualized as multiple 2D projections.
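A percentile bootstrap over tumors can be sketched as follows. We assume resampling at the tumor level and a callable that recomputes the statistic (e.g. re-running the parameter space scan and returning \({S}_{HRS}\)) on each bootstrap cohort; this interface is hypothetical, and a cheap mean stands in for the expensive scan in the usage example.

```python
import numpy as np

def bootstrap_ci(statistic, n_tumors, n_bs=50, alpha=0.05, seed=0):
    """Percentile bootstrap CI over tumors resampled with replacement.

    statistic: callable receiving an array of resampled tumor indices and
    returning a scalar (hypothetical interface, e.g. S_HRS of a re-run scan).
    """
    rng = np.random.default_rng(seed)
    values = np.array([statistic(rng.integers(0, n_tumors, size=n_tumors))
                       for _ in range(n_bs)])
    lower, upper = np.percentile(values, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper, values

# Usage with a stand-in statistic (mean of hypothetical per-tumor values)
per_tumor = np.random.default_rng(1).normal(1.0, 0.2, size=42)
lo, hi, boot_values = bootstrap_ci(lambda idx: per_tumor[idx].mean(), n_tumors=per_tumor.size)
```

With only 50 replicates, the 2.5th and 97.5th percentiles rest on very few values; and a resampled cohort can, by chance, contain fewer than two tumors of one class, in which case Cohen's d is undefined and a class-stratified bootstrap would be the safer design.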

Extended cohort

To test the best ML models identified during training, model verification was performed using an extended cohort Cmax. This cohort comprised all animal data available for the respective parameter combination, including incomplete data sets not used during training (cf. numbers of imaging data sets given in brackets in Table 1).

Classical imaging parameters and multiple time points

For comparison, classical ADC-related imaging parameters, i.e., the mean, minimum, and maximum ADC value in a tumor, were derived. In addition, ADCvalley was derived as the minimum ADC averaged over a connected image region of seven voxels, providing a measure related to the minimum ADC that is robust against artifacts originating from partial volume effects at the tumor edges. Similarly, the maximum and peak FMISO tumor-to-muscle ratio (\({TMR}_{max/peak}\)) as well as the mean, maximum, and peak FMISO SUV (the peak being the average over seven voxels around the maximum) were calculated from the late PET frame acquired 80 min p.i. for each tumor and correlated with cell-line-specific radiosensitivities. The full analysis pipeline described above was also carried out for imaging data acquired after 2 weeks of fractionated RT (w2).
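Following the description in the Table 4 caption (an average of seven voxels centered around the minimum or maximum value), valley and peak values can be sketched as below. We assume that the seven voxels are the extreme voxel plus its six face-connected neighbors, and that neighbors outside the image or the tumor mask are skipped; both choices are assumptions, and the function name is hypothetical.

```python
import numpy as np

# Center voxel plus its six face-connected neighbors (assumed neighborhood)
OFFSETS = np.array([(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0),
                    (0, -1, 0), (0, 0, 1), (0, 0, -1)])

def seven_voxel_mean(image, mask, find_min=True):
    """Average over the extreme voxel in the mask and its 6 neighbors.

    find_min=True gives an ADC_valley-like value; False gives a peak value.
    Neighbors outside the volume or outside the tumor mask are skipped.
    """
    masked = np.where(mask, image, np.inf if find_min else -np.inf)
    flat_idx = np.argmin(masked) if find_min else np.argmax(masked)
    center = np.array(np.unravel_index(flat_idx, image.shape))
    values = []
    for off in OFFSETS:
        p = center + off
        if np.all(p >= 0) and np.all(p < image.shape) and mask[tuple(p)]:
            values.append(image[tuple(p)])
    return float(np.mean(values))

# Synthetic 3x3x3 example with one low-ADC voxel in the center
image = np.ones((3, 3, 3))
image[1, 1, 1] = 0.0
mask = np.ones_like(image, dtype=bool)
adc_valley = seven_voxel_mean(image, mask, find_min=True)
adc_peak = seven_voxel_mean(image, mask, find_min=False)
```

The averaging damps the influence of a single outlier voxel: the isolated zero here yields a valley value of 6/7 rather than 0.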

Results

During model training in 1D to 5D search space on the baseline imaging data, we identified distinct clusters in 1D to 3D imaging parameter space that significantly stratified the xenograft tumors according to their radiation resistance. Using 1D parameters only, the best stratifying cluster was obtained for ADC derived from DW-MRI, with an effect size [95% CI] of \({\mathrm{S}}_{\mathrm{HRS},1\mathrm{D}}=1.69 \left[1.46-3.20\right]\), \(p=0.002\). Interestingly, the center (interval) of the cluster was located at \({\mathrm X}_{1\mathrm D,\mathrm{ADC}}=420\left[384;457\right]\cdot10^{-6}\text{mm}^2\text{/s}\), which corresponded to the left flank of the ADC histogram generated from all animals. In comparison, we found significantly increased stratification potential for a 2D cluster defined by ADC and FMISO_c1 (\({\mathrm{S}}_{\mathrm{HRS},2\mathrm{D}}=2.68 \left[2.41-4.12\right]\), \(p=0.01\)). Details on cluster centers and intervals in terms of imaging parameter values are given in Table 3. The best stratifying cluster in n-dimensional imaging space was spanned by the 3D quantitative maps of ADC, FMISO_c1, and FMISO_c2, yielding an effect size of \({\mathrm{S}}_{\mathrm{HRS},3\mathrm{D}}=2.99 \left[2.50-4.44\right]\), with \(p<0.0001\). Further increasing the dimensionality of the parameter space yielded further improvement of \({\mathrm{S}}_{\mathrm{HRS}}\), which, however, was not significant relative to \({\mathrm{S}}_{\mathrm{HRS},3\mathrm{D}}\) (\(p>0.05\), Mann-Whitney U test based on a bootstrap analysis). The best scoring models in 1D to 5D imaging space are summarized in detail in Table 3. A visualization of the n-dimensional search space is presented in Fig. 4, whereas Fig. 5 shows the corresponding stratification potential for the selected 1D, 2D, and 3D clusters. Figure 6 presents an example of one preclinical tumor (SAS) with annotations of the 1D and 3D HRS.

Table 3 Best scoring parameter combinations. Stratification potential of multi-dimensional imaging clusters at baseline before RT for all 1D to 5D combinations. Best stratifying combinations are printed in bold
Fig. 4

Visualization of stratification scores in 1D to 3D parameter space. Stratification scores S for the best-scoring 1D, 2D, and 3D imaging parameter spaces. 3D parameter space is shown as corresponding 2D projections for better visualization

Fig. 5

Stratification potential of 1D to 3D clusters. Box plots showing significant stratification of data cohort \({C}_{all}\) by the best scoring 1D, 2D, and 3D imaging clusters according to high (H), medium (M), and low (L) radiation sensitivity, including Cohen’s d-score S for each cluster. The HRS fraction is the HRS volume of each sample normalized to the whole tumor volume

Fig. 6

Visualization of 1D and 3D HRS clusters in addition to ADCvalley. Example of 1D (blue) and 3D (purple) HRS annotations inside a SAS tumor in axial (A), sagittal (B), and coronal (C) views, overlaid on the anatomical T2w-MRI. The crosshair marks the center of the region with the lowest ADCvalley. Gross tumor volume (GTV) delineation is shown in green. The voxel structure of the contours results from resampling all functional data and the GTV delineation to the PET image grid, which had the lowest resolution

Correlation of cell-line-specific radiation sensitivities with the classical imaging parameters in the tumor region yielded significant stratification potential only for \({ADC}_{valley}\) (\(p=0.006\)), cf. Figure 7 and Table 4.

Fig. 7

Verification of stratification potential for extended cohorts \({C}_{max}\) of 1D to 3D clusters and ADCvalley. Box plots showing significant stratification of data from the extended cohorts \({C}_{max}\) by the best scoring 1D (N = 51), 2D (N = 45), and 3D (N = 45) imaging clusters according to high (H), medium (M), and low (L) radiation sensitivity, in comparison to ADCvalley (N = 51), including Cohen’s d-score S for each cluster. The HRS fraction is the HRS volume of each sample normalized to the whole tumor volume

Table 4 Classical imaging parameters. Stratification potential of imaging parameters at baseline before RT related to FMISO TMR and SUV values as well as ADC. Peak and valley parameters are calculated as the average value of seven voxels centered around the maximum or minimum value in the tumor, respectively

Figure 7 shows the validation results for the best 1D, 2D, and 3D models identified during training, in addition to the only significant classical parameter, \({ADC}_{valley}\). Stratification results of the different models in the extended cohorts \({C}_{max}\) are similar to those obtained in the training cohort \({C}_{all}\), indicating high robustness of the method.

Following the same methodology, 1D to 5D parameter space scanning was performed for imaging data obtained after 2 weeks of fractionated RT. Here, only a 1D cluster defined by the FMISO_c1 map measured in w2 yielded significant stratification potential \({\mathrm{S}}_{\mathrm{HRS},1\mathrm{D},\mathrm{ w}2}=1.12 \left[0.90-3.69\right]\), \(p=0.041\). Results of n-dimensional model training in w2 of RT are detailed in Table 5.

Table 5 Results of ML cluster analysis after two weeks of radiotherapy. Best stratifying multi-dimensional imaging clusters after two weeks (w2) of fractionated RT. Best stratifying combinations are printed in bold

Discussion

In this study, we report pre-clinical training of a multi-dimensional PET/MRI-based QIB to detect HRS in HNC as a potential target for future focal dose escalation. Our findings suggest that, when a single (1D) quantitative functional imaging map is used as input, an HRS defined by a cluster of ADC values derived from DW-MRI links spatial maps of cellularity to individual radiation resistance. The highest stratification potential with respect to cell-line-specific radiation resistance was found for a 3D QIB created from ADC and two PCs of dynamic FMISO PET. Further increasing the dimensionality did not significantly increase stratification potential, which may be due to redundancies in the n-dimensional functional imaging data. Thus, we identified a QIB profile from PET/MRI using a novel machine learning approach in a pre-clinical setting. Starting from a wide search with as few assumptions as possible and using the main quantitative imaging techniques that are clinically available today, we were able to identify the most promising multi-parametric QIB for potential use in future RT individualization.

The proposed method relies on the identification of a radioresistant cluster in parameter space only. Consequently, we do not per se assume a spatially connected HRS inside the tumor. If the HRS is spatially connected, it may be used for future local radiotherapy interventions such as dose painting. If, in contrast, HRS voxels were scattered throughout the tumor, this might indicate a generally more radioresistant tumor, and dose painting strategies may result in radiation dose escalation of the whole tumor. However, scattered HRS voxels throughout the GTV might also be caused by noise and potentially weak robustness of the model, which should be clarified in future validation studies in preclinical and ultimately also clinical settings.

Due to their limited size and heterogeneity, direct application of the ML models to identify spatially connected HRS regions in patients may not be possible. In this study, eight different cell lines with distinct radiation resistance levels were used, meaning that each small animal tumor must be understood as a model for one voxel of a patient tumor. Consequently, the final model may not necessarily yield connected HRS areas and will require retraining and validation in patients.

Earlier studies identified ADC as a potential prognostic QIB in HNC [8, 16], whereas other studies reported conflicting results [37]. The discrepancy between earlier results may be due to over-simplified imaging measures, such as the mean ADC averaged over the whole tumor, in contrast to the sub-volume approach based on clusters in multi-dimensional QIB space proposed in this study. Among the classical or global imaging parameters investigated in this study, \({ADC}_{valley}\) also appeared to be associated with radiation sensitivity. A potential explanation is that \({ADC}_{valley}\) is a mean value calculated from seven voxels around the minimum \(ADC\) in a tumor and may thus be correlated with the 1D cluster identified during ML training on voxel level.

However, when using joint information from ADC maps derived from DW-MRI combined with two PCs of dynamic FMISO PET, significantly better stratification was obtained than with ADC alone. This gain comes at the expense of acquiring dynamic hypoxia PET in addition to DW-MRI, which considerably increases the complexity of patient examination and image acquisition. So far, only small hypoxia PET patient data sets have been reported due to the complexity of acquisition, which requires experimental tracer production, extended scan times, and non-standard data analysis strategies that make a broad roll-out of this technology unrealistic [5]. Nevertheless, these findings corroborate earlier results reported by our group and others that dynamic hypoxia PET is prognostic with respect to RCT outcome [7, 12, 15]. Assuming that repeated functional imaging will further enhance the power of image-based adaptive RT interventions, the fact that dynamic hypoxia PET is more complex, more costly, and less broadly available than DW-MRI becomes even more relevant. Thus, from a pragmatic point of view, DW-MRI appears promising for a wider clinical roll-out with a change of practice, even if it is less predictive than the 3D HRS combining DW-MRI and FMISO PET.

Analysis of the preclinical imaging data acquired after 2 weeks of fractionated RT revealed no stratification of radiation resistance groups for most cluster combinations. Hypoxia PET alone yielded borderline significant stratification power at this early time point during RT. This is in line with clinical findings on the prognostic potential of FMISO PET in the second week of RT [14, 38]. However, in this study, the model for w2 was newly trained without any inference from the models obtained for pre-treatment data.

Our ML approach to identify multi-dimensional clusters of radiation resistance is based on several assumptions. First, radiation resistance levels were based on data from earlier pre-clinical studies [32, 33], which showed significant variation in radiation resistance between experiments. Second, small animal functional imaging is extremely challenging, requires anesthetized animals, and thus deviates from the standard clinical situation. In addition, we assumed a relative HRS size varying between 0 and 20% depending on the radioresistance levels of the respective cell lines. A further drawback of our method is that parameter space scanning was performed directly on image voxel data, which is more prone to noise and registration inaccuracies than volume-averaged methods. An alternative would be to combine single voxels into small homogeneous subregions (supervoxels) prior to parameter space scanning, e.g., by means of simple linear iterative clustering [19].

In this study, we used a data-driven approach, namely PCA, to extract a reduced number of QIB maps from dynamic functional imaging. The use of PCA for dynamic data has been shown to be promising in other clinical and pre-clinical studies [39], potentially providing more robust results than the classical use of compartment models for such data [7, 9].

A previous study proposed deriving high-risk tumor subvolumes from joint functional imaging information by clustering patient imaging data [19]. However, that method does not directly use the size of an HRS for patient stratification but applies different intermediate steps to determine heuristic stratification parameters. In contrast, our method uses the relative HRS size, which is directly linked to cell-line-specific hypoxia levels that are only available in a translational approach. This prior represents a major limitation of our study, as no tumor-specific hypoxia or radiation resistance levels were measured. This underlines the necessity of independent validation studies, ideally in patients, to confirm the hypotheses identified in this experiment.

Potential uncertainties of the method, which uses multi-dimensional functional imaging data on voxel level, originate from the manual contouring of tumor regions used as input for the analysis as well as from the co-registration of the functional imaging data sets, which is of crucial importance for the integrity of the data set in higher dimensions. Robustness of the proposed HRS method was evaluated in different ways. The density of visualized scores in parameter space (Fig. 4) shows a smooth distribution with a single, compact region of high scores \({S}_{HRS}\), indicating robust learning of the cluster center \({X}_{HRS}\), which is further supported by the internal bootstrap validation using the training cohort \({C}_{all}\). Furthermore, robustness of the model was evaluated using an extended cohort \({C}_{max}\) including additional tumors that were not part of the initial training cohort \({C}_{all}\). Even though this evaluation indicated stability of the model parameters, it cannot be considered a fully independent validation, as \({C}_{max}\) contains only a small number of additional data sets compared with \({C}_{all}\). A potential alternative for tumor stratification based on joint QIB maps might be an end-to-end learning approach using, for example, convolutional neural networks (CNNs), which have been shown to achieve high performance in image processing and classification tasks [40]. We did not investigate such an approach, since only a low number of tumors with the full multi-dimensional imaging parameter space was available in this study (n = 42). Therefore, an approach was developed that complements a data-driven learning method with hypotheses about the existence and size of an HRS related to known radioresistance levels. The final model can easily be interpreted, in the sense that learned HRS are fully determined by the associated QIB ranges. In contrast, interpreting a CNN-based end-to-end model might be challenging.

In Fig. 5, cell line UTSCC-45 shows distinctly different HRS fractions compared with all other cell lines of the group with high radiation sensitivity (group H). Interestingly, this cell line differs from the other investigated cell lines in its positive human papilloma virus (HPV) status. The associated genetic difference may cause a shift in radiosensitivity compared with HPV-negative cancer cell lines that does not seem to be detectable by quantitative imaging [41]. Therefore, ADC/FMISO-based HRS radiation dose escalation does not seem to be an option for low-risk HPV-positive oropharyngeal HNC, and future interventional trials should be limited to patients with high-risk profiles (HPV-negative, or HPV-positive with > 20 pack-years smoking history) [42].

As tumor hypoxia and cellularity may change during RCT, individualized RT approaches adapted to the current level of resistance will only be possible if HRS can be identified shortly before treatment. Recently developed hybrid MR-Linacs may allow functional MRI acquisitions before and during RT and thus open up unique possibilities for MR-specific QIB-adaptive RT [43]. Recent results on phantom and early clinical data demonstrated that quantitative imaging is feasible on hybrid MR-Linac systems [44, 45], which is a major prerequisite for biologically adapted RT dose painting based on ADC clusters. More complex multi-parametric QIBs involving different imaging modalities may need to be acquired on dedicated PET/MRI scanners and used for offline response-adaptive RT. Nevertheless, before QIB-based RT dose painting can be applied in clinical RT practice, technical and clinical validation is required, including test–retest studies and comparison with diagnostic scanners to ensure repeatability and reproducibility [43, 46].

In conclusion, this study used a novel ML approach combined with hypothesis-driven methods, in which n-dimensional imaging spaces spanned by hypoxia imaging using dynamic FMISO PET, DW-MRI, and DCE-MRI were scanned to learn characteristic patterns of radiation resistance. We present the pre-clinical description of an HRS defined by a 3D cluster of ADC, FMISO_c1, and FMISO_c2, which identifies spatially resolved tumor subvolumes that exhibit increased radiation resistance and are thereby presumably the origin of local tumor recurrence. These results warrant validation and translation to a clinical setting before the benefits of PET/MRI-derived, QIB-based RT adaptation can be tested in a clinical trial.