Introduction

Head and neck squamous cell cancer (HNSCC) is the seventh commonest cancer (Ferlay et al. 2010). Concomitant chemo-radiotherapy (CRT) is the standard of care for advanced disease at most head and neck tumour sites, but treatment fails at loco-regional sites in over 30% of stage III or IV tumours (Goodwin 2000).

Conventional CT and MRI evaluation is challenging in the presence of post-treatment tissue distortion (Hermans et al. 2000; Arga et al. 2006; King et al. 2013a, b). Metabolic imaging with 18F-fluorodeoxyglucose (18F-FDG) PET-CT (Sheikhbahaei et al. 2015) may overcome these limitations and is widely used to achieve earlier detection of residual disease. Quantitative post-treatment 18F-FDG PET-CT SUVmax (Moeller et al. 2009; Chan et al. 2012; Sherriff et al. 2012; Castelli et al. 2016; Kim et al. 2016; Matoba et al. 2017) has been shown to predict treatment failure and survival outcomes.

Quantitative diffusion-weighted MRI (DW-MRI) may provide alternative post-CRT imaging variables for the prediction of treatment success. Cellular tumour impedes diffusion of water molecules, resulting in lower ADC values (Chawla et al. 2009), and it has been hypothesised that a reduction in cellularity and progressive necrosis with successful treatment leads to a greater rise in ADC values. A number of studies have evaluated post-treatment tumour ADCmean as a biomarker (Kim et al. 2009; King et al. 2010; Vandecaveye et al. 2012; Schouten et al. 2014; Marzi et al. 2017; Brenet et al. 2020) with increased absolute ADCmean, or a greater percentage interval increase in ADCmean from pre-treatment values, being associated with disease control (King et al. 2010; Vandecaveye et al. 2012; Brenet et al. 2020). Since DW-MRI probes a different biological process to 18F-FDG PET-CT, the two modalities may be complementary in stratifying the risk of residual or recurrent disease (Preda et al. 2016).

Human papillomavirus oropharyngeal cancer (HPV OPC) status is a potential confounding factor in these studies. HPV OPC has unique histopathological characteristics (Chernock et al. 2009) and differing tumour metabolism (Krupar et al. 2014) which influence ADC measures (Chan et al. 2016; Driessen et al. 2016) and post-treatment SUVmax (Moeller et al. 2009; Zhang et al. 2010; Castelli et al. 2016; Vainshtein et al. 2014; Helsen et al. 2018), whilst resulting in improved clinical outcomes (Chaturvedi et al. 2011). However, HPV-OPC status is rarely documented in studies of post-treatment quantitative DW-MRI (Marzi et al. 2017) or post-treatment 18F-FDG PET-CT in predicting HNSCC outcomes (Sherriff et al. 2012; Kim et al. 2016; Matoba et al. 2017). HPV OPC has not previously been considered as a covariate in studies of post-treatment ADC values and their prognostic significance in HNSCC.

In this study, we first aimed to determine whether 6- and 12-week post-CRT ADCmean values (absolute values and percentage interval increase in values) and 12-week post-CRT SUVmax were able to predict disease-free survival (DFS) in stage III/IV HNSCC. Second, we explored whether this prediction was influenced by HPV OPC status, and whether these quantitative post-CRT DW-MRI and 18F-FDG PET-CT variables differed between HPV-OPC and other HNSCC, after stratifying for DFS status.

Methods

Participants

Participants were recruited to a prospective single-centre cohort observational study between May 2014 and July 2017 (http://www.controlled-trials.com/ISRCTN58327080). Research Ethics Committee approval (REC reference 13/LO/1876) and informed consent were obtained.

Participants were eligible if: (1) there was a histologically confirmed stage III or IV primary HNSCC without distant metastatic disease, (2) there was a 1 cm2 area of measurable primary tumour and/or nodal tumour on the basis of standard clinico-radiological staging, and (3) curative CRT was planned. Exclusion criteria included prior CRT, Eastern Cooperative Oncology Group (ECOG) performance status > 2, inability to provide informed consent, known allergy to gadolinium-based contrast medium and eGFR < 30 ml/min.

Sample size was calculated to demonstrate a difference in the percentage change in ADC values between participants with and without DFS at 2 years. It was assumed that 70% of the participants would be disease-free at 2 years (Goodwin 2000) with the standard deviation of the percentage change in ADC values being 20% (Kim et al. 2009; King et al. 2010; Marzi et al. 2017; Schouten et al. 2014; Vandecaveye et al. 2012). A sample size of 70 was projected to show a 15% difference between those with and without 2-year DFS assuming 5% significance level and 80% power.
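The projection above can be approximated with the standard normal-approximation formula for comparing two means with unequal group sizes. The following is a minimal sketch and not the authors' actual calculation, since the software and exact method are not stated; the function name and the 70:30 allocation (taken from the assumed 2-year DFS rate) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def total_sample_size(delta, sd, frac_group1, alpha=0.05, power=0.80):
    """Total n for a two-sided, two-sample comparison of means.

    Uses the normal approximation with unequal allocation:
    n = (z_alpha + z_beta)^2 * (sd / delta)^2 * (1/p1 + 1/p2).
    """
    if not 0 < frac_group1 < 1:
        raise ValueError("frac_group1 must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    allocation = 1 / frac_group1 + 1 / (1 - frac_group1)
    n = (z_alpha + z_beta) ** 2 * (sd / delta) ** 2 * allocation
    return math.ceil(n)


# Assumed inputs from the text: 15% difference, SD 20%, 70% disease-free at 2 years
n_total = total_sample_size(delta=15, sd=20, frac_group1=0.70)
print(n_total)
```

Under these assumptions the approximation gives a total of 67, which is in the region of the 70 participants projected; a calculation based on the t-distribution would give a slightly larger figure.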

HPV status, biopsies and treatment

HPV status was analysed for all OPC as per standard of care. Non-OPC HNSCC was not routinely tested for HPV status, according to international guidelines (Fakhry et al. 2018). HPV status was evaluated with p16 testing using an immunostain or high-risk HPV DNA testing using in situ hybridisation. HPV status was analysed for 49/49 oropharyngeal and 2/16 other HNSCC. Diagnostic biopsies were obtained from the primary tumour (n = 56), lymph node (n = 7) or both (n = 2).

Imaging

Patients underwent (1) MRI before treatment and at 6- and 12-weeks post-CRT as per the study protocol and (2) 18F-FDG PET-CT imaging at 12-weeks post-CRT as per institutional practice.

MRI: protocol and technique

Patients underwent standard institutional head and neck soft tissue protocol MRI on a 1.5 T scanner (Siemens Magnetom Aera) using a surface phased array neck coil. An additional research echo-planar diffusion-weighted sequence was acquired in the axial plane with the following b-values: 0, 50, 100, 800 and 1500 s/mm2 (supplementary table).

MRI: processing and analysis

The ROIs were placed by a radiologist (3 years’ experience) under the supervision of another radiologist (21 years’ experience). The first 24 participants were also independently analysed by a further radiologist (5 years’ experience) to assess for inter-observer agreement. ROIs were placed individually within the primary tumour and/or the largest pathological lymph node (Figs. 1, 2) using OsiriX v8.0.2, open-source Mac-based medical image processing software. ROIs were placed on the pre-treatment, 6-week (ADCmean6) and 12-week (ADCmean12) post-treatment MRI studies using the DWI b = 800 s/mm2 map, but with access to other MRI sequences. When a focus of increased DWI signal was not evident on post-treatment images, a standardised 6 mm diameter ROI was placed at its original location. An ADC map was generated from the b 100 and b 800 s/mm2 images. A ROI was also placed within the cervical spinal cord on the ADC map as a reference.
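The source states only that the ADC map was generated from the b = 100 and b = 800 s/mm2 images, without naming the fitting model. A minimal sketch of the conventional two-point mono-exponential estimate, assumed here for illustration, is given below; the function names and the example signal values are hypothetical.

```python
import math


def adc_two_point(s_low, s_high, b_low=100.0, b_high=800.0):
    """Mono-exponential ADC (mm2/s) from signals at two b-values (s/mm2).

    Assumes S(b) = S0 * exp(-b * ADC), so ADC = ln(S_low / S_high) / (b_high - b_low).
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signal intensities must be positive")
    return math.log(s_low / s_high) / (b_high - b_low)


def roi_adc_mean(signals_low, signals_high):
    """Mean ADC over the voxels of an ROI, in units of 10^-6 mm2/s."""
    adcs = [adc_two_point(lo, hi) for lo, hi in zip(signals_low, signals_high)]
    return 1e6 * sum(adcs) / len(adcs)


# Hypothetical single voxel whose true ADC is 1000 x 10^-6 mm2/s
s100 = 500.0
s800 = s100 * math.exp(-700 * 1000e-6)
example = roi_adc_mean([s100], [s800])
print(round(example, 3))
```

Omitting the b = 0 image from such a fit is commonly done to reduce the contribution of perfusion to the estimate, although the source does not state this as its rationale.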

Fig. 1

An HPV-negative participant with a partially necrotic left level 2 lymph node. a T1w post gadolinium axial image pre-treatment demonstrates the lymph node (arrow). b b = 800 s/mm2 map from DW-MRI pre-treatment indicating the lymph node ROI as the increased DWI signal whilst avoiding the necrotic area. c T1w post gadolinium axial image at 12 weeks post-treatment demonstrates the lymph node to be of reduced size (arrow). d b = 800 s/mm2 map from DW-MRI at 12 weeks post-treatment indicating the lymph node ROI as the increased DWI signal. e 18F-FDG PET-CT study at 12 weeks post-treatment demonstrating the 6 mm VOI at the site of mild 18F-FDG uptake in the lymph node

Fig. 2

An HPV-positive participant with a left palatine tonsillar tumour. a T1w post gadolinium axial image pre-treatment demonstrates the left palatine tonsillar tumour (arrow). b b = 800 s/mm2 map from DW-MRI pre-treatment indicating the primary tumour ROI as the increased DWI signal. c T1w post gadolinium axial image at 12 weeks post-treatment demonstrates the primary tumour to be of reduced size (arrow). d b = 800 s/mm2 map from DW-MRI at 12 weeks post-treatment indicating the primary tumour standardised 6 mm ROI since there is no increased DWI signal relative to adjacent oropharyngeal tissue. e 18F-FDG PET-CT study at 12 weeks post-treatment demonstrating the 6 mm VOI at the primary tumour. Since there is no 18F-FDG uptake to target, it is placed with guidance from the MRI study

18F-FDG PET-CT: protocol and technique

18F-FDG PET-CT was performed as per standard clinical practice. Patients were fasted for at least 6 h prior to administration of 350–400 MBq 18F-FDG. PET-CT scans were acquired 90 min after injection from the upper thigh to the base of the skull with additional local views of the head and neck performed on one of two PET-CT scanners (Siemens mCT Flow VST or GE Discovery DST 710) (supplementary table).

18F-FDG PET-CT: processing and analysis

A 6 mm diameter volume of interest (VOI) was placed by a radiologist (3 years’ experience), under the supervision of another radiologist (16 years’ experience). VOIs were placed at the site of most intense FDG uptake within the primary lesion and/or the largest lymph node, which were matched to the ROI placed for the MRI analysis (Figs. 1, 2). If there was reduced uptake on the post-treatment images relative to background, a 6 mm VOI was placed at the same site as the post-treatment MRI ROI. If necrosis was identified within a lesion, the area of necrosis was excluded. The SUVmax was calculated with semi-automated software on a Hermes workstation (Hermes Gold 3, Stockholm).

Treatment and treatment outcome

Intensity-modulated radiotherapy (IMRT) was delivered as 70 Gy in 35 fractions (2 Gy per fraction delivered once daily, 5 days a week). Concomitant intravenous cisplatin at a dose of 35 mg/m2 every 7 days, starting on day 1 of radiotherapy, was used for all patients with adequate GFR and no contraindications to cisplatin (n = 47) with carboplatin being used if measured GFR < 50 or if a patient had a history of hearing impairment (n = 16). Two patients received radiotherapy alone. The time from the completion of treatment to disease progression was recorded for those participants without DFS and time from the completion of treatment to the latest follow-up was recorded for those with DFS. The 2-year DFS was recorded for all participants. A 12-week PET-CT study was standard of care with clinical assessment at 1- and 2-years post-CRT. Treatment failure was determined by cytological or histological confirmation or serial progression on imaging.

Statistical analysis

Analysis was performed using Stata (version 15.1) with a p value of < 0.05 being considered statistically significant.

The percentage interval changes in ADCmean from pre-treatment to 6 weeks, from pre-treatment to 12 weeks and from 6 to 12 weeks (%ADCmean0–6, %ADCmean0–12 and %ADCmean6–12, respectively) were calculated.
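The formula used for the percentage interval change is not given in the text. A minimal sketch, assuming the conventional definition of change relative to the earlier time point, is shown below; the function names, dictionary keys and example values are hypothetical.

```python
def percent_change(earlier, later):
    """Percentage change of `later` relative to `earlier` (assumed definition)."""
    if earlier <= 0:
        raise ValueError("the earlier ADCmean value must be positive")
    return 100.0 * (later - earlier) / earlier


def adc_interval_changes(adc_pre, adc_6wk, adc_12wk):
    """The three interval changes named in the text, as percentages."""
    return {
        "pct_0_6": percent_change(adc_pre, adc_6wk),    # %ADCmean0-6
        "pct_0_12": percent_change(adc_pre, adc_12wk),  # %ADCmean0-12
        "pct_6_12": percent_change(adc_6wk, adc_12wk),  # %ADCmean6-12
    }


# Hypothetical ADCmean values in units of 10^-6 mm2/s
changes = adc_interval_changes(adc_pre=1000.0, adc_6wk=1500.0, adc_12wk=1800.0)
print(changes)
```

Because each change is a ratio of two ADC values, the result is a unitless percentage.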

The %ADCmean0–6, %ADCmean0–12, %ADCmean6–12, ADCmean6, ADCmean12 and SUVmax12, at primary tumour and nodal locations, were compared with survival outcomes using two different methods. First, they were compared between participants with and without 2-year DFS using the independent t-test, if variables were normally distributed, and the Mann–Whitney test if they were not normally distributed. Second, the association between the imaging variables and DFS outcome was evaluated using Cox regression analysis, with participants who remained disease-free censored at the time of the last follow-up. These comparisons with DFS were performed for all participants and subsequently for the HPV OPC and other HNSCC subgroups alone.

No multiple testing correction for these pre-designed “planned comparisons” was deemed appropriate in this exploratory study.

Receiver operating characteristic (ROC) analysis was used to identify the area under the curve (AUC), optimal threshold and sensitivity/specificity/positive predictive value (PPV)/negative predictive value (NPV) for any parameters predictive of 2-year DFS. The optimal threshold was chosen as the point which maximised the combination of sensitivity and specificity. Hazard ratios were also calculated for variables predictive of overall DFS from the Cox regression analysis. These represent the relative change in the hazard (risk) of disease progression at any time for a specified increase in the given variable.
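The threshold selection described above can be sketched as follows. This is an illustration only: the analysis was performed in Stata, the criterion is assumed here to be the unweighted sum of sensitivity and specificity (Youden's index), and the function name and example data are hypothetical.

```python
def roc_summary(values, progressed):
    """AUC plus the cut-off maximising sensitivity + specificity.

    A case is called positive when its value is >= the cut-off, i.e. higher
    values are assumed to indicate disease progression.
    """
    pos = [v for v, p in zip(values, progressed) if p]
    neg = [v for v, p in zip(values, progressed) if not p]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without progression")

    # AUC as the probability that a progressed case outranks a disease-free one
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    best = None
    for cut in sorted(set(values)):
        tp = sum(v >= cut for v in pos)
        fp = sum(v >= cut for v in neg)
        fn, tn = len(pos) - tp, len(neg) - fp
        sens, spec = tp / len(pos), tn / len(neg)
        if best is None or sens + spec > best["sensitivity"] + best["specificity"]:
            best = {
                "threshold": cut,
                "sensitivity": sens,
                "specificity": spec,
                "ppv": tp / (tp + fp) if tp + fp else None,
                "npv": tn / (tn + fn) if tn + fn else None,
            }
    return auc, best


# Hypothetical ADCmean values (x 10^-6 mm2/s) and progression flags, for illustration only
adc = [1000, 1100, 1200, 1300, 1500, 1600]
progressed = [0, 0, 0, 1, 0, 1]
auc, best = roc_summary(adc, progressed)
print(auc, best)
```

PPV and NPV depend on the proportion of participants with progression, so a threshold with good sensitivity and specificity can still have a low PPV when events are few.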

The variables in HPV OPC and other HNSCC subgroups were compared with each other after stratifying for presence or absence of DFS. Continuous variables were compared using the independent t-test if normally distributed, and the Mann–Whitney test if not normally distributed.

The primary tumour and lymph node %ADCmean0–12, %ADCmean6–12 and ADCmean12 were correlated with SUVmax12 using Pearson's correlation coefficient.
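For reference, Pearson's correlation coefficient can be computed as in the minimal sketch below. The function name and the paired values are hypothetical, and the associated p value, which the study reports, is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired 12-week values: ADCmean12 (x 10^-6 mm2/s) and SUVmax12
adc_12 = [1200.0, 1400.0, 1600.0, 1800.0]
suv_12 = [4.0, 3.0, 3.5, 2.0]
r = pearson_r(adc_12, suv_12)
print(round(r, 3))
```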

Intra-class correlation coefficients (ICCs) were calculated to assess inter-observer agreement.

Results

Participants and descriptive statistics

The participant CONSORT flow diagram is shown in Fig. 3.

Fig. 3

Participant CONSORT flow diagram

There were 70 subjects enrolled out of 101 eligible, but five were subsequently withdrawn. Of the 65 participants (53 male, 12 female, mean age 59.9 ± 7.9 years), there were 11 with stage III disease (17%) and 54 with stage IV (83%). Participant characteristics including primary site, nodal staging and HPV status are summarised in Table 1. There were 46/65 patients with HPV-OPC.

Table 1 Site, subsite, TN stage and HPV status for the 65 participants

The number of primary tumours and lymph nodes analysed, the ICCs for the sample of ROIs performed by two observers, and the cervical cord ROI ADCmean values at the different DW-MRI time points are indicated in Table 2.

Table 2 Number of participants, number of primary tumours and lymph nodes analysed, inter-observer agreement, and cervical cord ROI ADCmean values at the different DW-MRI time points

The median follow-up was 4.1 [3.05, 5.0] years post treatment. Ten participants had progressive disease within two years of completing CRT: isolated nodal recurrence (n = 4); nodal, primary and distant metastatic recurrence (n = 1); isolated primary recurrence (n = 1); and distant metastatic recurrence (n = 4). The median time to recurrence was 0.51 [0.30, 0.72] years post-treatment. There were no other cases of progressive disease within the duration of the study follow-up.

Comparison of post-CRT ADCmean variables and SUVmax with 2-year and overall DFS

Table 3 demonstrates the comparison of post-CRT percentage interval changes in ADCmean, absolute ADCmean values and SUVmax, with DFS outcomes for all participants, the HPV-OPC subgroup and the other HNSCC subgroup. A box plot (Fig. 4) illustrates the lymph node and primary tumour absolute ADCmean values at pre-treatment, 6 weeks post-CRT and 12 weeks post-CRT in participants with and without 2-year DFS.

Table 3 Comparison of post CRT 18F-FDG PET-CT (SUVmax 12) and DW-MRI parameters (%ADCmean 0–6, %ADCmean 0–12, ADCmean 6 and ADCmean 12) between participants with and without 2-year and overall DFS: all participants, HPV OPC and other HNSCC

Fig. 4

Box plot illustrating the lymph node and primary tumour absolute ADCmean values at pre-treatment, 6 weeks post-CRT and 12 weeks post-CRT in participants with and without 2-year DFS

The lymph node absolute ADCmean at 6 weeks (p = 0.02) and primary tumour absolute ADCmean at 12 weeks (p = 0.03) were predictive of 2-year DFS for all participants, with higher values being associated with an increased risk of disease progression within 2 years. The lymph node absolute ADCmean at 6 weeks predicted 2-year DFS with an AUC of 0.77, an optimum threshold of 1405 × 10–6 mm2/s and sensitivity/specificity/PPV/NPV of 83%/80%/39%/97%, respectively. The primary tumour absolute ADCmean at 12 weeks predicted 2-year DFS with an AUC of 0.70, an optimum threshold of 1840 × 10–6 mm2/s and sensitivity/specificity/PPV/NPV of 83%/57%/22%/96%, respectively. Application of these thresholds predicted 5 of the 6 patients with disease progression at 2 years.

The lymph node absolute ADCmean at 6 weeks (p = 0.03) and primary tumour absolute ADCmean at 12 weeks (p = 0.03) were also predictive of overall DFS for all participants according to Cox regression analysis. A 100 × 10–6 mm2/s higher lymph node ADCmean at 6 weeks was associated with the risk of disease progression increasing by 61% (4–149%; 95% CI), whilst a 100 × 10–6 mm2/s higher primary tumour absolute ADCmean at 12 weeks was associated with the risk of disease progression increasing by 38% (3–184%; 95% CI) at any time. Kaplan–Meier plots illustrate the impact of lymph node absolute ADCmean at 6 weeks and primary tumour absolute ADCmean at 12 weeks on the DFS (Fig. 5).
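The 61% figure for the lymph node corresponds to a hazard ratio of 1.61 per 100 × 10–6 mm2/s. A minimal sketch of how such a ratio relates to a percentage change in hazard, and how it rescales to a different increment under the usual log-linear Cox model, is given below; the function names and the 200-unit increment are hypothetical and are not taken from the study.

```python
import math


def percent_change_in_hazard(hazard_ratio):
    """Express a hazard ratio as a percentage change in hazard."""
    return 100.0 * (hazard_ratio - 1.0)


def rescale_hazard_ratio(hazard_ratio, from_step, to_step):
    """Hazard ratio for a `to_step` increase given the ratio for a `from_step` increase.

    Assumes a log-linear Cox model, in which log(HR) scales with the increment.
    """
    if hazard_ratio <= 0 or from_step == 0:
        raise ValueError("hazard ratio must be positive and from_step non-zero")
    beta = math.log(hazard_ratio) / from_step  # log-hazard per unit of ADCmean
    return math.exp(beta * to_step)


hr_per_100 = 1.61  # reported for lymph node ADCmean at 6 weeks
pct = percent_change_in_hazard(hr_per_100)
hr_per_200 = rescale_hazard_ratio(hr_per_100, from_step=100, to_step=200)
print(round(pct, 1), round(hr_per_200, 3))
```

Under this model a doubling of the increment squares the hazard ratio rather than doubling the percentage increase.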

Fig. 5

Kaplan–Meier plots illustrate the impact of lymph node absolute ADCmean at 6 weeks (a) and primary tumour absolute ADCmean at 12 weeks (b) on DFS. For the purposes of illustration of the results, the patients were split into two equal-sized groups by the median ADC value for each parameter

None of the percentage interval changes in ADCmean or absolute ADCmean values were able to predict DFS in the HPV OPC subgroup (Table 3).

The lymph node absolute ADCmean at 6 weeks was associated with 2-year DFS (p = 0.03) and the primary tumour absolute ADCmean at 12 weeks was significantly associated with both 2-year DFS and overall DFS (p = 0.01; p = 0.04) in the other HNSCC subgroup. However, some caution should be exercised in interpreting these results due to the small number of participants in this subgroup (Table 3).

The 12-week post-CRT SUVmax did not predict DFS for either HPV-OPC or other HNSCC.

Comparison of post-CRT ADCmean variables and SUVmax between HPV-OPC and other HNSCC when stratified by DFS

Table 4 demonstrates the comparison of post-CRT percentage interval changes in ADCmean, absolute ADCmean values and SUVmax, between HPV-OPC and other HNSCC, in participants with DFS.

Table 4 Comparison of post CRT 18F-FDG PET-CT (SUVmax 12) and DW-MRI parameters (%ADCmean 0–6, %ADCmean 0–12, ADCmean 6 and ADCmean 12) between HPV OPC and other HNSCC participants with DFS

In participants with DFS, the percentage interval changes in ADC values at the primary tumour site were significantly higher in HPV OPC than in other HNSCC, both at 6 weeks (%ADCmean0–6; p = 0.01) and at 12 weeks (%ADCmean0–12; p = 0.005).

There was no significant difference between HPV OPC and other HNSCC subgroups for the primary tumour absolute ADCmean, any of the lymph node ADCmean variables or the 12-week SUVmax in participants with 2-year DFS.

Due to the small sample of participants without DFS, a formal statistical comparison was not performed between HPV OPC and other HNSCC variables, although primary tumour percentage interval changes in ADCmean values were again noted to be higher in HPV OPC. For instance, at 12 weeks, the percentage interval change in ADCmean for HPV-OPC (141 ± 83%; n = 3) was higher than that for other HNSCC (70 ± 39%; n = 4).

Correlation between 12 week ADCmean variables and 12 week SUVmax

The correlation between %ADCmean0–12, %ADCmean6–12 and ADCmean12 and SUVmax12 at the primary tumour and lymph node sites is shown in Table 5. There was no significant correlation between SUVmax12 and ADCmean12 or its interval change with p = 0.50–0.82 at the lymph node site and p = 0.52–0.71 at the tumour site.

Table 5 Correlation between 12 week SUVmax and 12 week ADCmean parameters

Discussion

The absolute 6 week lymph node and 12 week primary tumour ADCmean values were able to predict 2-year DFS and overall DFS for the whole cohort, but not for the HPV-OPC subgroup. The percentage changes in primary tumour ADCmean from pre-treatment to 6- and 12-week post-CRT were unable to predict DFS, and were significantly higher in successfully treated HPV-OPC primary tumours compared to successfully treated HNSCC at other sites. The 12-week post-CRT SUVmax did not predict DFS overall or for either subgroup and was not influenced by HPV-OPC status.

Almost 90% of HNSCC recurrences following CRT develop within 2 years (Chang et al. 2017). Timely intervention is required in order that progressive loco-regional disease can be cured with salvage surgery. Metabolic imaging with 18F-FDG PET-CT has evolved as a tool for the post-treatment evaluation of HNSCC but is generally delayed for at least 12 weeks due to the potential for false-positive results arising from early post-treatment inflammatory changes (Mehanna et al. 2016). Although qualitative interpretative criteria are most frequently applied (Koksel et al. 2019; Krabbe et al. 2009; Marcus et al. 2014; Porceddu et al. 2011; Sjövall et al. 2016; Zhong et al. 2020), quantitative analysis of SUVmax with 18F-FDG PET-CT has been shown to have prognostic significance (Moeller et al. 2009; Chan et al. 2012; Sherriff et al. 2012; Castelli et al. 2016; Kim et al. 2016; Matoba et al. 2017) and multi-objective radiomics models have also been applied in this setting (Zhang et al. 2018). However, we were unable to demonstrate the value of 12-week post-CRT SUVmax in predicting DFS in our predominantly HPV-OPC cohort.

Quantitative DW-MRI has been proposed as an alternative prognostic biomarker for the assessment of early HNSCC treatment response to CRT. ADCmean values or their interval changes from pre-treatment baseline studies have been evaluated from both intra-treatment (Kim et al. 2009; King et al. 2010; Berrak et al. 2011; Vandecaveye et al. 2012; King et al. 2013a, b; Matoba et al. 2014; Schouten et al. 2014; Ding et al. 2015; Galbán et al. 2015; Wong et al. 2016; Marzi et al. 2017; Paudyal et al. 2017) and post-treatment (King et al. 2010; Vandecaveye et al. 2012; Kim et al. 2014; Schouten et al. 2014; Marzi et al. 2017; Brenet et al. 2020) DW-MRI studies, in an attempt to predict CRT outcomes. The majority of studies have found that increased absolute ADCmean values or a greater rise in ADCmean from pre-treatment to intra-treatment values (Kim et al. 2009; Berrak et al. 2011; King et al. 2013a, b; Matoba et al. 2014; Ding et al. 2015; Marzi et al. 2017; Cao et al. 2019) are predictive of treatment success; however, this is not a universal finding (Galbán et al. 2015; Wong et al. 2016; Paudyal et al. 2017). There are few studies which have applied this to the post-treatment setting (King et al. 2010; Vandecaveye et al. 2012; Brenet et al. 2020). Our finding of decreased absolute ADCmean in patients with successful treatment differs from that observed in a previous post-treatment study (King et al. 2010), in which an increased ADCmean in primary tumours or lymph nodes (n = 20) was able to predict 6-month outcomes. However, King et al. only sampled visible residual tumour whilst our approach was to place standardised ROIs at the tumour location when solid tissue with increased DWI signal was not visible. This was the case for all primary tumours and the majority (32/52) of lymph nodes with definable residual disease on DWI at the relevant time points.
It is speculated that the favourable outcomes with decreased absolute ADCmean in our cohort actually reflect a post-treatment fibrotic response, since densely packed benign collagen may result in decreased ADC (Ailianou et al. 2018). Whilst other researchers have found interval changes in ADCmean post treatment to have prognostic potential (Vandecaveye et al. 2012; Brenet et al. 2020), their methodologies and study populations differed from ours.

In previous studies, ADC measurements have been performed at various intervals between 3 and 12 weeks after completion of CRT. The potential to diagnose the residual post-CRT tumour and perform salvage surgery earlier than a 12-week 18F-FDG PET-CT would be advantageous since surgery is less compromised by fibrosis and there is less possibility of the tumour being irresectable or spreading to distant sites. This was the rationale for including a 6-week time point in our study design. The interval percentage changes in ADCmean from 6 to 12 weeks were smaller than those from pre-treatment to 6 weeks, and the potential ability to predict outcome with 6-week post-CRT absolute ADCmean concurs with a previous study (King et al. 2010). It is of interest that the absolute ADCmean value was predictive of 2-year and overall DFS at 6 weeks post CRT for the lymph node, whereas it was predictive at the later 12-week time point for the primary tumour. It may be speculated that there is a later differential post CRT increase in ADCmean at the primary tumour site compared with lymph nodes in successfully treated patients.

HPV-OPC is increasing in incidence and now accounts for 70–80% of OPC in the United States and Western Europe (Chaturvedi et al. 2011). HPV-OPC is a clinically, epidemiologically and histologically distinct form of HNSCC; it is more radiosensitive and has a better outcome irrespective of treatment choice. It exhibits particular histopathological features such as indistinct cell borders and comedo-necrosis (El-Mofty and Lu 2003) and is characterized by an increased glucose and respiratory metabolism (Krupar et al. 2014). The potential influence of the HPV-OPC status on the prognostic values of intra or post-treatment ADC and SUVmax has only been addressed in a limited number of studies (Moeller et al. 2009; Ding et al. 2015; Castelli et al. 2016; Wong et al. 2016; Marzi et al. 2017; Paudyal et al. 2017; Cao et al. 2019).

The pre-treatment SUVmax (Kendi et al. 2015; Tahari et al. 2014) in HPV-OPC differs from that in other HNSCC. Although there are variable results (Koshkareva et al. 2014; Mowery et al. 2020), a number of studies have shown that post-treatment SUVmax is a less accurate predictor of outcome in HPV-OPC than other HNSCC (Moeller et al. 2009; Vainshtein et al. 2014; Castelli et al. 2016; Helsen et al. 2018). It has been speculated that the greater radio-sensitivity of HPV-OPC results in a delayed repopulation by resistant cells, and a lower sensitivity to early post-treatment detection, such that a longer interval to the surveillance 18F-FDG PET-CT may prove more appropriate in HPV-OPC. It has been shown that a 16-week post CRT 18F-FDG PET-CT demonstrates superior diagnostic accuracy for residual HPV-OPC nodal tumour when compared to 12-week 18F-FDG PET-CT (Liu et al. 2019). In addition, the increased cytotoxic T-cell-based immune response reported in HPV-OPC may result in spurious 18F-FDG uptake and reduced specificity of 18F-FDG PET-CT.

It is also recognised that pre-treatment ADCmean values are lower (Chan et al. 2016; Driessen et al. 2016) and possibly more variable in HPV-OPC (Wong et al. 2016). In our study, post-CRT ADCmean interval changes were greater in HPV-OPC than other HNSCC, but the difference was only statistically significant for the primary tumour site in those with DFS. The percentage interval changes were not predictive of DFS overall or within HPV-OPC subgroups. It could therefore be argued that, without multivariate analysis to account for HPV-OPC status, the larger interval changes in treatment responders reported in previous studies (Kim et al. 2009; Berrak et al. 2011; Vandecaveye et al. 2012; King et al. 2013a, b; Matoba et al. 2014; Marzi et al. 2017; Brenet et al. 2020) may be related to the predominance of prognostically favourable HPV-OPC.

Despite the small sample size, the absolute ADCmean values were shown to predict 2-year and overall DFS in other HNSCC participants at both lymph node and primary tumour sites. A potential application in this group of patients is of importance since they have a poorer prognosis and will benefit most from earlier diagnosis of residual tumour. There are a few potential reasons for the failure of HPV-OPC post-CRT ADCmean values to predict outcomes. First, the greater radiosensitivity of HPV-OPC and cystic nature of lymph nodes result in smaller tumour residua which are more difficult to reliably analyse. Second, a wider variation in pre-treatment ADCmean values has been observed in HPV-OPC (Wong et al. 2016), which may influence the ability to predict outcomes on the basis of interval change.

Previous direct comparisons of 18F-FDG PET-CT and quantitative DWI-MRI for their ability to predict treatment outcomes are confined to the pre-treatment and intra-treatment settings (Choi et al. 2011; Martins et al. 2015; Preda et al. 2016) or to the presence of symptomatic recurrence (Becker et al. 2018). To the best of our knowledge, this is the first prospective study comparing the ability of the two modalities to predict outcomes in the early post-treatment setting. Whilst a previous study showed synergy between the two modalities in stratifying the risks of therapeutic failure from pre-treatment imaging (Preda et al. 2016), this could not be reproduced in our cohort. Nonetheless, this possibility should be further explored in a larger high risk or HPV-OPC negative population.

The authors acknowledge a number of limitations in the design of this study. Firstly, the small number of both other HNSCC participants and those with treatment failure resulted in the study being sub-optimally powered for subgroup analysis. Previous publications indicated that at least 30% of HNSCC would fail treatment at loco-regional sites (Goodwin 2000) and the study was initially powered on this basis. Whilst the sample size was comparable to other similar studies, our prospectively accrued cohort comprised an unexpectedly high proportion of HPV-OPC participants (46/65) with improved outcomes. Similarly, the sample size was specifically calculated to demonstrate a difference in ADCmean interval change between participants with and without 2-year DFS, so it was potentially underpowered to reveal a variation in 12-week post-CRT SUVmax. We propose that larger cohorts are required for further validation of our results. Secondly, it should be noted that almost all primary tumours and the majority of lymph nodes did not demonstrate residual focal signal abnormality (> 5 mm) on DW-MRI at follow-up (4% primary tumours, 45% nodes). It has been recommended that a 5 mm lesion is required for reliable assessment of ADC in the head and neck (Thoeny et al. 2012). When there was no overt residual post-treatment tumour on DW-MRI, a standardised ROI was placed according to the site of the pre-treatment lesion as has been described at other tumour sites (Kuang et al. 2011). Thirdly, it was decided a priori not to correct for multiple comparisons since these were selected “planned comparisons” as part of the experimental design and not a data-driven search. It was not clear how many factors to adjust for, and any adjusted p value would be difficult to compute.
It should, however, be appreciated that there is an inherent trade-off between protecting against Type I errors and Type II errors in such an exploratory study and that a lower pre-specified significance level may not have demonstrated a predictive value of the absolute 6-week lymph node and 12-week primary tumour ADCmean values. Fourthly, it is appreciated that alternative qualitative approaches using a standardised comparison with adjacent tissues may overcome the potential for false-positive 18F-FDG PET results due to radiation-related inflammation. These approaches have been widely applied to the post CRT evaluation of HNSCC and studies have demonstrated their ability to predict disease outcome (Koksel et al. 2019; Krabbe et al. 2009; Marcus et al. 2014; Porceddu et al. 2011; Sjövall et al. 2016; Zhong et al. 2020). Whilst quantitative 12-week post-CRT SUVmax was not associated with DFS in this cohort, our results cannot be directly compared with those of qualitative interpretative 18F-FDG PET criteria since they did not incorporate a comparison with the 18F-FDG PET uptake in other tissues. Finally, the inter-observer agreement statistics would have been optimally obtained from the whole cohort; however, the sample analysed by two observers was noted to be representative in terms of tumour site and HPV status.

In conclusion, primary tumour and nodal absolute post-CRT ADCmean measurements may predict 2-year and overall DFS in HNSCC but this does not apply to the HPV-OPC subgroup. Following successful CRT for HNSCC, percentage interval changes in ADCmean at the primary tumour site are seen to differ between HPV-OPC and other HNSCC. Therefore, knowledge of HPV-OPC status is crucial to the clinical utilisation of post-CRT DWI-MRI for the prediction of outcomes.