Introduction

Oropharyngeal squamous cell carcinoma (OPSCC) is subdivided into human papillomavirus (HPV)-associated and non-HPV-associated cancers. Immunohistochemical (IHC) overexpression of p16 is a surrogate marker for HPV and is used in the current TNM classification [1] and the World Health Organization (WHO) classification of head and neck tumors [2]. HPV-associated OPSCC usually responds well to (chemo)radiation ([C]RT), and treatment deintensification trials are underway to reduce treatment-related adverse effects [3, 4]. In a small subset of patients, however, the disease recurs, making salvage surgery more demanding.

Prior to treatment, patients undergo multiplanar radiological imaging. Some radiological features have prognostic impact, such as the volume of the primary tumor and metastasis, cystic or matted lymph node morphology, and extranodal extension (ENE) [5,6,7,8]. Many reports link a relatively high apparent diffusion coefficient (ADC) of the primary tumor in magnetic resonance imaging (MRI) with lower radiosensitivity [7, 9,10,11,12,13,14,15,16], although some conflicting evidence has emerged [17,18,19]. These studies include cancers from multiple head and neck sites, however, and all but one overlook HPV association [17]. This confounds the results because HPV-positive tumors tend to have lower ADC [20,21,22,23,24] and respond better to (C)RT.

Our aim was to investigate the prognostic effect of clinical and radiological variables, including ADC, in pretreatment MRI in an OPSCC population treated with (C)RT with curative intent. These features were then compared with the tumor p16 status. We hypothesized that ADC would serve to estimate treatment response and prognosis after (C)RT and improve management guidelines. This might support the decision to exclude patients who seem to have a worse prognosis from the deintensification protocols, and lead to offering them more extensive treatment.

Material and Methods

Study Design and Patient Selection

We included OPSCC patients treated at the Helsinki University Hospital Head and Neck Center, diagnosed between January 2013 and December 2017. Our multidisciplinary tumor board reviewed the diagnostics, staging, and treatment plan for all patients. The follow-up data were collected in December 2018. Inclusion criteria were histologically proven OPSCC, available pretreatment diagnostic MRI, and (C)RT with curative intent. We excluded patients with previous head and neck cancer (HNC) or distant metastasis at presentation (Fig. 1). The final study cohort consisted of 67 patients, of whom 55 had p16-positive disease, 9 had p16-negative disease, and 3 had no p16 result available. We chose 2013 as the starting year because that was when the diffusion-weighted imaging (DWI) sequence was added to our institute’s tumor imaging protocol. The protocol remained largely unchanged during the study period, and DWI was available for 63 patients (for 52 of the 55 p16-positive patients).

Fig. 1 Patient exclusion chart

Treatment and Follow-up

In a small subset of patients, the tumor was biopsied via tonsillectomy after MRI. We included 8 of these patients, who had undergone a primary tonsillectomy and had metastatic lymph nodes (N+ disease), as they received the same locoregional treatment as the others and represent a typical patient population receiving definitive (C)RT. Patients who underwent tonsillectomy and had no neck metastases (N0 disease) were excluded as they had no macroscopic tumor left prior to the RT. All patients were treated with intensity-modulated RT (IMRT) at doses ranging from 56 to 70 Gy; 66 patients received conventionally fractionated treatment with 2 Gy daily fractions and 1 received a simultaneous integrated boost (SIB). RT was delivered to the primary tumor and cervical lymph node areas. Chemotherapy consisted of weekly cisplatin 40 mg/m² (maximum of 6 doses).

After treatment, patients underwent regular follow-up appointments and positron emission tomography with computed tomography (PET-CT) at 3 months. Additional diagnostic and therapeutic procedures were planned when residual or recurrent disease was suspected. Locoregional recurrence (LRR) was defined either as residual disease (persistent disease in follow-up PET-CT and first clinical examination with no remission in between) or as later recurrent disease (reappearance of the tumor at the primary site or of metastases after a disease-free period). Biopsy or clear imaging evidence with progression confirmed the recurrence. Residual disease was found in 7 patients (10.4%) and later recurrent disease in 6 patients (9.0%). Median time to recurrence was 98 days. Follow-up time was calculated from the last day of treatment to the last follow-up visit or death, whichever occurred first. Salvage surgery comprised surgery of the primary tumor area, neck dissection, or both, depending on the site of the recurrence.

Pretreatment MRI and Analysis

Our standard tumor imaging protocol consisted of a localizer sequence, axial turbo spin echo (TSE) T2, axial fat saturated T2, axial TSE T1, axial and coronal fat saturated T1 with gadolinium, and axial echo-planar DWI with b-values of 0, 500, and 1000 s/mm². ADC maps were generated from DWI sequences. The patients were imaged with six different MRI scanners: two were 1.5 T Siemens Avanto fit (Siemens Healthcare, Erlangen, Germany), two were 1.5 T Siemens Avanto (Siemens Healthcare), one was 1.5 T GE Signa HDxt (GE Healthcare, Chicago, IL, USA), and one was 3 T Siemens Verio (Siemens Healthcare). Dedicated head and neck coils were applied. Two experienced head and neck radiologists (H. S. and R. L.) analyzed the images, blinded to tumor p16 status and treatment outcome.
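
As an illustration of how an ADC value can be derived from the three b-values listed above, the following minimal Python sketch assumes a mono-exponential signal model, S(b) = S0 · exp(−b · ADC), fitted by log-linear least squares. The source does not state which fitting method the scanners used to generate the ADC maps, so the model, the function name fit_adc, and the synthetic signal values are assumptions for illustration only.

```python
import math

def fit_adc(b_values, signals):
    """Estimate ADC (mm^2/s) for one voxel.

    Assumes S(b) = S0 * exp(-b * ADC), so ln(S) is linear in b and
    ADC is the negative least-squares slope of ln(S) against b.
    """
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    return -sxy / sxx

# Synthetic voxel (hypothetical signal values) generated with the ADC
# quoted in the Fig. 2 caption, 1.097e-3 mm^2/s.
b_values = [0, 500, 1000]  # s/mm^2, as in the imaging protocol
signals = [1000.0 * math.exp(-b * 1.097e-3) for b in b_values]
print(fit_adc(b_values, signals))
```

With noise-free synthetic data the fit recovers the ADC used to generate the signals; with real data the estimate depends on noise and on which b-values are included.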

Clinical and Radiological Variables

All patient data were gathered from hospital records (Tables 1 and 2). Tumors primarily staged by the 7th edition of the UICC system were retrospectively restaged according to the UICC 8th edition [1]. The p16 IHC served for determining the tumor HPV association. The size criterion for a metastatic node was a minimum axial diameter of 10 mm, or 11 mm for the digastric node [25]. Heterogeneous nodes were counted as metastatic. Cystic metastases were defined as having a thin (< 2 mm) enhancing capsule and internal homogeneous fluid content [26], or as an intranodal cystic space with more than 70% of the border smoothly delineated [27]. Otherwise, a fluid-containing metastatic node was defined as necrotic. Lymph node location was documented according to the American Academy of Otolaryngology–Head and Neck Surgery (AAO–HNS) 2002 classification [28].

Table 1 Patient demographics, treatment data and treatment results, stratified by p16
Table 2 Radiological variables, stratified by p16, and interobserver correlations

The volumes of the primary tumor and of the largest single metastasis or matted metastatic lymph node mass were measured by manually drawing the region of interest (ROI) in all MRI slices containing the tumor. Volume was measured primarily from the ADC maps, which usually differentiate well between the tumor and surrounding edema. The T2 and T1 gadolinium-enhanced fat saturated images were used as a reference. If ADC maps were unavailable or the tumor was not well delineated, T1 gadolinium-enhanced fat saturated images were used instead. Prior to volume measurement, MR images were exported from our picture archiving and communication system (Agfa Impax 6.7, Agfa Healthcare, Mortsel, Belgium) and anonymized. Volume measurement was performed with third-party 3D software (3D Slicer version 4.10.1, www.slicer.org).
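
The slice-by-slice ROI approach described above can be sketched as a voxel count multiplied by the voxel volume. This is a minimal illustration under stated assumptions, not the 3D Slicer implementation: the function name roi_volume_cm3, the nested-list mask layout, and the voxel spacing in the example are all hypothetical.

```python
def roi_volume_cm3(mask, spacing_mm):
    """Volume of a segmented region in cm^3.

    mask: list of slices, each a list of rows of 0/1 values,
          where 1 marks a voxel inside the manually drawn ROI.
    spacing_mm: (dx, dy, dz) voxel size in mm; dz is assumed to
          include any gap between slices.
    """
    dx, dy, dz = spacing_mm
    voxel_mm3 = dx * dy * dz
    n_voxels = sum(v for slice_ in mask for row in slice_ for v in row)
    return n_voxels * voxel_mm3 / 1000.0  # 1 cm^3 = 1000 mm^3

# Hypothetical example: a 10 x 10 voxel ROI drawn on two slices,
# with 1 x 1 mm in-plane resolution and 5 mm slice spacing.
one_slice = [[1] * 10 for _ in range(10)]
print(roi_volume_cm3([one_slice, one_slice], (1.0, 1.0, 5.0)))
```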

DWI Analysis

The ADCmean of the primary tumor and metastatic lymph node was measured by manually drawing an ROI on the single slice most central to the tumor (Fig. 2). The T2 and T1 fat saturated gadolinium-enhanced images served as a reference to exclude necrotic areas. ADCmin was measured with the smallest available 0.24 cm² ROI from the most hypointense part of the tumor.

Fig. 2 Measuring apparent diffusion coefficient (ADC) region of interest (ROI). a Axial diffusion-weighted imaging, b corresponding ADC map and c axial T1 fat saturated gadolinium-enhanced images show a tonsillar tumor infiltrating the tongue on the right. The ROI was drawn on the ADC map along the tumor borders, excluding the surrounding inflammation and necrotic areas, on the most central slice of the tumor. The ADC was 1.097 × 10⁻³ mm²/s

ADC values differ between MRI systems, and the ADC of the cervical spinal cord may be used to compare these differences [29]. To estimate the variability of ADC values across the different scanners, we measured ADC at the level of the cervical spinal cord in 3 central slices and calculated a mean value (ADCmyelum) for each examination.

Statistical Analysis

The main clinical outcome measures were disease-free survival (DFS) and LRR rate, as we found them to be most representative of the treatment effect in our study setting. Time to event or censoring was counted from the end date of (C)RT. We first fitted univariable Cox proportional hazards models to identify potential prognostic factors and then adjusted the results for tumor p16 status.
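
To make the univariable Cox step concrete, the sketch below fits a single-covariate model h(t | x) = h0(t) · exp(β · x) by Newton-Raphson on the partial likelihood and reports HR = exp(β) with a Wald 95% confidence interval. It is a teaching sketch only: the study used SPSS, whose internals are not described in the source, and the function name cox_univariable, the Breslow-style handling of risk sets, and the toy data are assumptions. Adjusting for p16 would require a two-covariate version, which is not shown.

```python
import math

def cox_univariable(times, events, x, max_iter=50, tol=1e-10):
    """Fit beta in h(t|x) = h0(t) * exp(beta * x) for one covariate.

    times: follow-up times; events: 1 = event, 0 = censored.
    Fails with a division by zero if x does not vary within any
    risk set (no information about beta).
    """
    n = len(times)
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: subjects still under observation at times[i]
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            grad += x[i] - s1 / s0               # score contribution
            info += s2 / s0 - (s1 / s0) ** 2     # information contribution
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return beta, math.exp(beta), ci

# Hypothetical toy data: x = 1 marks a risk factor, all events observed.
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]
beta, hr, ci = cox_univariable(times, events, x)
print(hr, ci)
```

In this toy data set the subjects with x = 1 tend to have earlier events, so the fitted HR is above 1; the wide confidence interval reflects the very small sample.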

Differences in variable values between p16-positive and p16-negative tumors were compared with the χ² test or Fisher’s exact test for categorical variables and with the nonparametric Mann-Whitney U test for ordinal or continuous variables, which in our study were all non-normally distributed. Interobserver agreement was calculated with Cohen’s kappa for categorical radiological variables and with the intraclass correlation coefficient (ICC) for continuous variables. Values under 0.4 indicated poor agreement, values between 0.4 and 0.75 indicated moderate agreement, and values of 0.75 and over indicated excellent agreement [30]. Variables with poor interobserver agreement were not analyzed further.
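
The agreement measure and the thresholds above can be sketched as follows. Cohen's kappa is computed from its standard definition, κ = (p_o − p_e) / (1 − p_e); the function names, the rater labels in the example, and the choice to treat a value of exactly 0.4 as moderate are assumptions made for illustration.

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(rater1)
    labels = set(rater1) | set(rater2)
    # Observed agreement: share of cases with identical labels
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label frequencies
    p_exp = sum((rater1.count(l) / n) * (rater2.count(l) / n) for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

def agreement_level(value):
    """Classify kappa or ICC with the cut-offs stated in the text [30]."""
    if value < 0.4:
        return "poor"
    if value < 0.75:
        return "moderate"
    return "excellent"

# Hypothetical ratings (1 = cystic node, 0 = not cystic) for 10 nodes.
r1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
r2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(r1, r2)
print(kappa, agreement_level(kappa))
```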

The differences between MRI systems were evaluated by comparing the mean ADCmyelum values of the examinations from each MRI system with the nonparametric Kruskal-Wallis test. A p-value of < 0.05 was considered significant.
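
For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic from pooled ranks, with the usual correction for ties. The function name and the example values are hypothetical, and the p-value (obtained by comparing H with a χ² distribution with k − 1 degrees of freedom for k groups) is deliberately left out to keep the example dependency-free.

```python
def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups.

    Undefined (division by zero) if every pooled value is identical.
    """
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    ranks = {}      # value -> average rank among the pooled data
    tie_term = 0    # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sums = [sum(ranks[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))

# Hypothetical spinal cord ADC values (x 10^-3 mm^2/s) from three scanners.
scanner_a = [0.91, 0.95, 0.89]
scanner_b = [0.99, 1.02, 0.97]
scanner_c = [1.08, 1.05, 1.10]
print(kruskal_wallis_h([scanner_a, scanner_b, scanner_c]))
```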

Data were analyzed with SPSS Statistics 25 software (SPSS Inc., Chicago, IL, USA). Survival curves were drawn using GraphPad Prism (version 9.0 for Windows, GraphPad Software, La Jolla, CA, USA, www.graphpad.com).
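
The source states only that survival curves were drawn in GraphPad Prism. Assuming they are Kaplan-Meier estimates, which is the usual choice for such curves, the product-limit calculation can be sketched as below. The function name, the toy follow-up data, and the use of a fixed ADC cut-off to form two groups are illustrative assumptions; the cut-off value is the one quoted in the Fig. 3 caption.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at event times.

    times: follow-up in days; events: 1 = event, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve

# Hypothetical patients: (ADCmean x 10^-3 mm^2/s, days, event flag).
patients = [(0.90, 120, 1), (0.75, 400, 0), (0.88, 300, 1),
            (0.70, 650, 0), (0.95, 500, 0), (0.80, 200, 1)]
cutoff = 0.836  # cut-off quoted in the Fig. 3 caption
high = [(d, e) for adc, d, e in patients if adc >= cutoff]
low = [(d, e) for adc, d, e in patients if adc < cutoff]
print(kaplan_meier([d for d, _ in high], [e for _, e in high]))
print(kaplan_meier([d for d, _ in low], [e for _, e in low]))
```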

Results

Patients

Table 1 presents patient demographics, treatment outcome, and survival data, stratified by tumor p16 status. In patients with p16-negative disease, both overall survival (OS) and DFS were noticeably inferior. Of the 13 patients with LRR, further treatment included surgery with curative intent in 5 patients, palliative radiotherapy in 1 patient, palliative chemotherapy in 2 patients, palliative CRT in 2 patients, and palliative symptomatic treatment in 2 patients. One patient died of hemorrhage at the time of diagnosis of the recurrent tumor.

Thirteen patients experienced a treatment break in RT ranging from 2 to 18 days (median 6 days). In total, 39 patients received chemotherapy as planned. Three patients received a chemotherapy agent other than cisplatin.

Prognostic Value of Clinical and Radiological Variables for DFS (Table 3)

In univariable analysis, patients with a p16-negative tumor had a 7.7-fold risk of disease recurrence or death compared with patients with a p16-positive tumor. Smoking at diagnosis, number of pack years, interruptions in RT, and incomplete chemotherapy were also significantly associated with worse DFS. After adjusting these variables for p16, only interruption of RT remained significant in DFS analysis.

Table 3 Crude and p16-adjusted hazard ratios (HR) for disease-free survival in Cox proportional hazards regression model

In univariable analysis, metastasis volume, muscle invasion, and depth of muscle invasion indicated significantly worse DFS. Higher primary tumor ADCmean and ADCmin were associated with worse DFS. After adjusting these variables for p16, only metastasis volume remained significant in DFS analysis. Larger metastasis volume, but not primary tumor volume, was associated with the occurrence of distant metastasis, regardless of p16 status. The HR was 1.059 (95% CI 1.011–1.110, p = 0.015). Survival curves stratified by ADC are presented in Fig. 3a, b.
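
An HR close to 1 for a continuous covariate can look small because it refers to a one-unit increase. Under the proportional hazards model the HR for a k-unit increase is HR raised to the power k, and the confidence limits scale the same way. The sketch below applies this to the HR reported above; the unit of the covariate is not stated in this passage, so the 10-unit increment is a purely hypothetical choice, as is the function name.

```python
def rescale_hr(hr, ci_low, ci_high, units):
    """HR and 95% CI for a `units`-sized increase in a continuous
    covariate, given the values for a one-unit increase.

    Follows from HR = exp(beta): exp(beta * units) = HR ** units.
    """
    return hr ** units, ci_low ** units, ci_high ** units

# Reported per-unit values from the text; 10 units is an assumed increment.
hr10, low10, high10 = rescale_hr(1.059, 1.011, 1.110, 10)
print(round(hr10, 2), round(low10, 2), round(high10, 2))
```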

Fig. 3 a Survival curves for all patients (p16-positive and p16-negative tumors). Disease-free survival (DFS) in two groups of primary tumor apparent diffusion coefficient (ADCmean): at or higher than 0.836 × 10⁻³ mm²/s, and lower than 0.836 × 10⁻³ mm²/s, p = 0.001. b Survival curves for patients with p16-positive and p16-negative tumors. In p16-positive patients, DFS is shown in two groups of primary tumor ADCmean: at or higher than 0.772 × 10⁻³ mm²/s, and lower than 0.772 × 10⁻³ mm²/s, p = 0.604. The difference in mean ADCmean between p16-positive and p16-negative tumors was statistically significant (p < 0.001)

Prognostic Value of Clinical and Radiological Variables in LRR (Table 4)

In univariable analysis, LRR rate was associated with p16 and with higher T-stage, stage, and grade. After adjustment for p16, none of these clinical variables remained significant in LRR rate analysis.

Table 4 Crude and p16-adjusted hazard ratios (HR) for locoregional recurrence rate in Cox proportional hazards regression model

Univariable analysis showed prognostic value for primary tumor volume, transverse diameter, anteroposterior diameter, muscle invasion, and invasion depth. After adjustment for p16, primary tumor volume remained significantly associated with the LRR rate. This was a two-class variable, in which tumors were divided at the median into ≤ 7 cm³ and > 7 cm³. Primary tumor transverse diameter was also significantly associated with worse prognosis, whereas ADC values showed no significance. In p16-positive tumors, ADC values were significantly higher in grade 1–2 tumors (mean 0.883) compared to grade 3 tumors (0.736; p = 0.003). In these patients, tumor grade had no effect on treatment outcome.

Differences in ADC Measurements Between Different MRI Systems

ADCmyelum values differed significantly between the separate 1.5 T systems and also between the 1.5 T and 3 T systems (p = 0.007). After Bonferroni correction for multiple tests, the differences were no longer statistically significant.
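
The Bonferroni correction mentioned above multiplies each p-value by the number of tests performed (capped at 1), which is why differences can lose significance when many scanner pairs are compared. The sketch below illustrates this. The source does not report how many comparisons were made or their individual p-values, so the pairwise count for six scanners and the p-values shown are assumptions for illustration.

```python
import math

def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Six scanners give C(6, 2) = 15 possible pairwise comparisons
# (assuming all pairs were tested, which the source does not state).
n_pairs = math.comb(6, 2)
print(n_pairs)

# Hypothetical raw p-values for three of those comparisons.
raw = [0.01, 0.04, 0.5]
print(bonferroni(raw))
```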

Interobserver Correlations

For the primary tumor, the intraclass correlation coefficient (ICC) was 0.783 for ADCmean and 0.797 for ADCmin; for the metastasis, the corresponding values were 0.913 and 0.820. We observed relatively inferior interobserver agreement, measured with Cohen’s kappa (κ), in evaluating necrotic and cystic nodes (κ = 0.499 and κ = 0.516, respectively, Table 2). Fig. 4 demonstrates examples of cystic and necrotic metastases.

Fig. 4 Examples of cystic and necrotic metastases. a T2 fat saturated image of a metastasis that was clearly cystic, with a thin capsule and homogeneous fluid, with only a minor wall irregularity. b Example of a necrotic node in a T2 fat saturated image, with a mainly irregular margin between fluid and solid parts. c The third example is of a metastasis that the radiologists graded differently. The margin appears smooth in the T2 fat saturated image, but d the T1 gadolinium-enhanced image reveals more irregularity, leading one of the radiologists to grade the metastasis as necrotic. All three tumors were p16-positive

Discussion

To our knowledge, this is the first study analyzing prognostic imaging factors in OPSCC that includes a considerable number of p16-positive OPSCC patients. Our main finding was that the results do not support the hypothesis that ADC is an independent prognostic factor. Higher ADCmean and ADCmin were associated with lower DFS in univariable analysis but not after adjustment for p16 status. This finding may be explained by the fact that ADC correlates with tumor p16 status [21, 22], which in itself is a strong predictor of disease recurrence and survival [31, 32].

Most studies evaluating the role of pretreatment ADC in predicting the results of (C)RT in different head and neck sites linked high ADC to worse prognosis [7, 9,10,11,12,13,14,15,16], while some showed no connection [17, 19, 33]. Interestingly, one study associated low ADC with worse 2-year DFS, but over half of the patients were surgically treated [18]. Since previous studies have not differentiated results by site, it is hard to tell whether ADC has a real prognostic effect in cancers of other head and neck anatomic sites or whether the results are confounded by the HPV-associated effects on ADC and survival in OPSCC.

In our study, within the p16-positive subgroup, ADC values were significantly higher in grade 2 tumors compared to grade 3 tumors. Similar observations have been previously reported in meningiomas [34] and gliomas [35], and a nonsignificant correlation was found in oral and oropharyngeal cancers [36]. A study comparing histology to ADC values in laryngeal and hypopharyngeal cancers discovered that ADC correlated inversely with cell density, nuclear area, and nuclear-cytoplasmic ratio. ADC also correlated positively with the percentage area of stroma [37]. HPV-associated tumors are typically nonkeratinizing or only partially keratinizing, may have areas of central necrosis and cystic changes, and host tumor-infiltrating lymphocytes. They often lack a strong stromal desmoplastic reaction and have a lower stromal volume [38]. Many of these characteristics can contribute to both lower ADC and better response to RT. HPV-associated cancer also has intrinsic genetic mechanisms, which might explain the radiation sensitivity [39]. Microscopic necrosis could also explain higher ADC, with hypoxia initiating cellular survival mechanisms that make the tumor less sensitive to RT, but so far hypoxia-related markers have not been found to be related to higher ADC [37, 40].

As presented in Table 2, even with a small number of patients, ADC values in p16-positive primary tumors and metastases were significantly lower than in p16-negative tumors, which is in line with a previous systematic review and meta-analysis [22]. The p16-positive tumors were smaller and more often exophytic than not, similar to previous studies [41, 42]. Our cohort lacked any significant differences in the rates of cystic, necrotic, and solid metastases in relation to p16 status. This is in contrast to previous studies reporting a larger number of cystic metastases in p16-positive patients [26, 41, 42], although the limited number of p16-negative tumors might have influenced this assessment. Although cystic and necrotic nodes are clearly defined, the radiologists in our study often found the evaluation challenging, which was reflected in the suboptimal interobserver correlation. In contrast, other lymph node characteristics, such as ENE and matted nodes, showed better interobserver correlations.

We drew the ROI on the ADC maps with a free-hand method on the most central slice. We consider that this best reflects the tools available in everyday clinical practice and is easy to perform. The interobserver correlations for the ADC values were excellent. The ADC results in our study, however, may consequently differ from those of studies that used automatic or semi-automatic segmentation. The different MRI systems used in this study introduce variability in ADC values. These differences were observable in our measurements but were not statistically significant after Bonferroni correction for multiple tests.

We found a correlation between larger metastasis volume and worse DFS, regardless of p16 association. This is similar to a Finnish study comprising 91 OPSCC patients that found nodal volume to correlate with DFS and locoregional control both in p16-negative and p16-positive disease [8]. Davis et al. [43] also found that in HPV-positive OPSCC, smaller nodal volume led to better DFS. In our study, the LRR rate was higher in patients with a primary tumor volume over 7 cm³. The aforementioned reports failed to show similar results [8, 43]. In tonsillar tumors with unknown HPV association, the connection disappeared after taking the T-stage into consideration [44]. Carpén et al. [8], however, observed a connection between larger primary tumor volume and worse OS in p16-negative disease. Interestingly, metastasis volume, but not primary tumor volume, correlated with survival in our study. This might be connected to our finding that larger neck metastasis volume, but not larger primary tumor volume, correlated significantly with the occurrence of later distant metastasis. A recent study showed that disease recurred only in OPSCC patients with persisting high-risk HPV positivity after treatment [45]. In our hospital, we do not routinely determine the HPV status after treatment, although this might be worth considering in the future.

An unplanned break in radiation treatment led to worse DFS. The break duration was between 2 and 18 days, with most patients having mid-treatment breaks of under 10 days. In head and neck cancers, treatment breaks are known to decrease survival and local control [46, 47], as tumor cell repopulation is faster during the break [48]. Incomplete radiotherapy seems to affect particularly the more radiosensitive HPV-associated oropharyngeal cancers [49].

In Finland, surgery is usually the treatment of choice for patients with p16-negative tumors; however, (C)RT may be chosen, for example, if surgery is expected to cause significant morbidity. We decided to include all patients treated with (C)RT with curative intent, as our aim was neither a comparison of treatment modalities nor a comparison of prognosis between the p16-negative and p16-positive groups, and the final analysis accounted for the p16 status. This led to a small number of patients with p16-negative OPSCC in our cohort, and their recurrence rate and survival are probably not representative of the whole population with p16-negative tumors. Because of the small number of p16-negative OPSCC patients and the generally good prognosis among the p16-positive OPSCC patients, the event count remains fairly small, and our survival analysis results must be interpreted with caution. To preserve statistical power, we adjusted our survival analyses for p16 status only, as it is the most important factor affecting prognosis in OPSCC [31, 32]. One limitation of our study is the use of ADC and p16 in the same Cox proportional hazards model, as they have a strong statistical correlation; however, the observation that ADC had no prognostic impact in the p16-positive group supports our main finding that ADC is merely an indication of HPV status.

Conclusion

Our study showed that in patients with OPSCC and known p16 status, pretreatment tumor ADC values provide no additional benefit in evaluating prognosis after definitive (C)RT with curative intent. Our results indicate that ADC values should not be used to select patients for de-escalation trials or lighter treatment.