Local recurrence of squamous cell carcinoma of the head and neck after radio(chemo)therapy: Diagnostic performance of FDG-PET/MRI with diffusion-weighted sequences

Purpose: To determine the diagnostic performance of FDG-PET/MRI with diffusion-weighted imaging (FDG-PET/DWIMRI) for detection and local staging of head and neck squamous cell carcinoma (HNSCC) after radio(chemo)therapy.
Materials and methods: This was a prospective study that included 74 consecutive patients with previous radio(chemo)therapy for HNSCC in whom tumour recurrence or radiation-induced complications were suspected clinically. The patients underwent hybrid PET/MRI examinations with morphological MRI, DWI and FDG-PET. Experienced readers blinded to clinical/histopathological data evaluated images according to established diagnostic criteria, taking into account the complementarity of multiparametric information. The standard of reference was histopathology with whole-organ sections and follow-up ≥24 months. Statistical analysis considered data clustering.
Results: The proof of diagnosis was histology in 46/74 (62.2%) patients and follow-up (mean ± SD = 34 ± 8 months) in 28/74 (37.8%). Thirty-eight patients had 43 HNSCCs and 46 patients (10 with and 36 without tumours) had 62 benign lesions/complications. Sensitivity, specificity, and positive and negative predictive values of PET/DWIMRI were 97.4%, 91.7%, 92.5% and 97.1% per patient, and 93.0%, 93.5%, 90.9% and 95.1% per lesion, respectively. Agreement between imaging-based and pathological T-stage was excellent (kappa = 0.84, p < 0.001).
Conclusion: FDG-PET/DWIMRI yields excellent results for detection and T-classification of HNSCC after radio(chemo)therapy.
Key points
• FDG-PET/DWIMRI yields excellent results for the detection of post-radio(chemo)therapy HNSCC recurrence.
• A prospective single-centre study showed excellent agreement between imaging-based and pathological T-stage.
• 97.5% of positive concordant MRI, DWI and FDG-PET results correspond to recurrence.
• 87% of discordant MRI, DWI and FDG-PET results correspond to benign lesions.
• Multiparametric FDG-PET/DWIMRI facilitates planning of salvage surgery in the irradiated neck.


Introduction
Patients with squamous cell carcinoma of the head and neck (HNSCC) are treated with radio(chemo)therapy, surgery or a combination thereof [1,2]. It has been suggested that up to 50% of patients with HNSCC will experience disease relapse during their lifetime, with locoregional recurrence being more common than distant metastases or second primary tumours. Local recurrence constitutes an important prognostic factor and influences the 5-year survival rate and quality of life [2,3]. Early diagnosis of local recurrence and precise depiction of tumour extent are important since surgical salvage increases overall survival [4].
Locally recurrent HNSCC is often more difficult to detect than primary SCC. Endoscopy may fail in the presence of submucosal recurrence, and findings at cross-sectional imaging may be confusing since radio(chemo)therapy may induce morphological, functional and metabolic changes that are difficult to interpret [1,5,6]. Nonetheless, previous studies have suggested that both magnetic resonance imaging (MRI) with diffusion-weighted imaging (DWI) sequences and FDG positron emission tomography/computed tomography (PET/CT) can substantially improve the detection of recurrent HNSCC [5-9]. As the combined use of PET/CT and MRI with DWI can add diagnostic certainty in difficult post-treatment situations [1,9], hybrid PET/MRI systems have raised high hopes in the field of oncological imaging due to their potential to obtain morphological, functional and metabolic information in a single examination [10-20]. Previous studies have addressed important technical questions related to PET/MRI feasibility in the head and neck (HN), quantification of FDG uptake with PET/MRI as compared to PET/CT, and workflow issues [12,13,16-19]. Since hybrid PET/MRI systems are expensive and examination protocols may involve long scanning times, the usefulness of PET/MRI in clinical routine needs to be determined based on added clinical value [20]. This added clinical value, however, still awaits validation in several oncological applications, including HN tumours [10,15].
The purpose of this prospective study was to assess the diagnostic performance of FDG-PET/MRI with DWI for the detection and T-classification of HNSCC recurrence in a series of consecutive patients treated with radio(chemo)therapy.

Patient population
This prospective clinical study was approved by the institutional ethics committee and performed in accordance with the guidelines of the Helsinki II declaration. Written informed consent was obtained from all subjects. Eligible patients were identified at the Clinic for Otorhinolaryngology, Head and Neck Surgery of the University Hospital of Geneva. Over 36 months, hybrid PET/MRI examinations were performed in a consecutive series of 76 adult patients previously treated with curative radio(chemo)therapy ± surgery (interval between end of treatment and imaging: mean ± SD = 15.2 ± 12.8 months; median [quartiles] = 12 months [5-22.5]). Indications for PET/MRI were persisting or newly developed symptoms after radio(chemo)therapy (pain, reflex otalgia, hoarseness, dysphagia). Exclusion criteria were standard MRI contraindications. None of the potentially eligible patients refused to participate. Two PET/MRI examinations were excluded from the study because of poor image quality (n = 1) or absent follow-up (n = 1). Therefore, a total of 74 PET/MRI examinations formed the basis of this series. Most patients (50/74, 67.6%) were male (mean age ± SD 62.1 ± 12.6 years). A small proportion of this cohort (15/74 patients) was included in a study comparing image quality and whole-body FDG uptake detectability with PET/MRI versus PET/CT [16].

Image evaluation, diagnostic criteria and measurements
Two board-certified radiologists with substantial experience in HN MRI and PET/CT (>15 years) and a board-certified nuclear medicine physician with substantial experience in PET/CT and HN MRI (>10 years) evaluated the images independently and were blinded to all clinical/histopathological data. Discrepant evaluations were resolved by consensus. Findings were recorded on pre-defined evaluation sheets using a five-point scale for receiver operating characteristic (ROC) analysis as follows: 1, definitely negative for recurrence; 2, probably negative; 3, indeterminate, therefore suspicious/possibly positive; 4, probably positive; and 5, definitely positive.
The three readers evaluated morphological MRI first, then DWI and PET. All images (MRI, DWI and PET) were assessed according to the diagnostic criteria established in the literature, taking into consideration diagnostic pitfalls related to radiation-induced changes [1,9]. Internationally established qualitative and quantitative criteria were applied [1, 5-9, 11, 27, 28]. Tumours involving the upper aerodigestive tract, the neopharynx (after total laryngectomy) or flaps in the oral cavity/pharynx were considered local recurrence [27]. On MRI, recurrent tumours were diagnosed in the presence of well-defined or ill-defined mass-like lesions with intermediate T2 signal ('evil grey'), moderate contrast enhancement and restricted diffusivity (high signal on b1000, low signal on ADC) [1, 5-7, 27, 28]. Lesions with high signal on T2, strong contrast enhancement and high signal on both b1000 and ADC were interpreted as suggesting post-treatment inflammatory oedema. Mature scar tissue/longstanding fibrosis was diagnosed in the presence of an elongated lesion with very low signal on T2, no/minor contrast enhancement, and low signal on b1000 and ADC [1]. DWI sequences with localised artefacts confined to slices outside the lesion to be measured were regarded as being of acceptable quality, and ADC measurements were carried out. Qualitative DWI assessment (visual assessment of b1000 and ADC) and quantitative assessment with an ADC threshold were obtained for all lesions. The ADC threshold was calculated after radiological-histological correlation had been completed, based on ROC analysis of prospectively measured ADCs [22]. Focal FDG uptake (visual tracer accumulation exceeding the adjacent background activity) was rated as PET positive, taking into account physiological FDG accumulation and pitfalls in the HN, such as muscular, salivary gland or physiological Waldeyer's ring uptake, or post-treatment inflammatory changes [1,8,9,20,29-31].
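The ADC values referred to above are derived from the b-value series of the DWI sequence. As a minimal sketch, assuming a simple two-point, mono-exponential model with b = 0 and b = 1000 s/mm² (the text names only b1000; the lower b-value and the helper name `adc_two_point` are illustrative assumptions, not the study's documented protocol), the per-voxel calculation is ADC = ln(S_low / S_high) / (b_high - b_low):

```python
import math


def adc_two_point(s_low: float, s_high: float,
                  b_low: float = 0.0, b_high: float = 1000.0) -> float:
    """Hypothetical helper: two-point ADC estimate in mm^2/s.

    Assumes mono-exponential signal decay S(b) = S(b_low) * exp(-(b - b_low) * ADC),
    with b-values in s/mm^2. The default b-values are assumptions.
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signal intensities must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must exceed b_low")
    return math.log(s_low / s_high) / (b_high - b_low)


# Synthetic voxel whose signal decays with a true ADC of 1.2 x 10^-3 mm^2/s.
s_b0 = 1000.0
s_b1000 = s_b0 * math.exp(-1000.0 * 1.2e-3)
adc = adc_two_point(s_b0, s_b1000)
print(f"ADC = {adc * 1e3:.3f} x 10^-3 mm^2/s")  # prints "ADC = 1.200 x 10^-3 mm^2/s"
```

Low values (restricted diffusion) favour tumour, whereas the high values seen in oedema favour benign tissue. When more than two b-values are acquired, ADC maps are commonly computed by a log-linear least-squares fit rather than from two points.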
Qualitative and quantitative PET assessment (with a standardised uptake value (SUV) threshold) was obtained. The SUV threshold was calculated in the same way as the ADC threshold.
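For reference, a minimal sketch of the conventional body-weight SUV on which such thresholds rest: tissue activity concentration divided by injected activity per unit body weight. The helper name, the optional decay correction to scan time and the F-18 half-life constant are assumptions for illustration; the text does not state how SUVs were normalised on the PET/MRI system used.

```python
import math
from typing import Optional

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def suv_body_weight(tissue_kbq_per_ml: float, injected_mbq: float, weight_kg: float,
                    minutes_since_injection: Optional[float] = None) -> float:
    """Hypothetical helper: body-weight SUV.

    kBq/mL divided by (MBq/kg) equals g/mL, i.e. dimensionless if tissue density
    is taken as 1 g/mL. If minutes_since_injection is given, the injected
    activity is decay-corrected to scan time (an assumed convention).
    """
    activity = injected_mbq
    if minutes_since_injection is not None:
        activity *= math.exp(-math.log(2) * minutes_since_injection / F18_HALF_LIFE_MIN)
    return tissue_kbq_per_ml * weight_kg / activity


# Illustrative numbers only: 5 kBq/mL in a 70-kg patient injected with 350 MBq.
print(suv_body_weight(5.0, 350.0, 70.0))  # prints 1.0
print(round(suv_body_weight(5.0, 350.0, 70.0, minutes_since_injection=60.0), 2))
```

Each input here (body weight, injected activity, uptake time) is a potential source of the biological and technical SUV variability noted in the Discussion.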
Benign post-treatment lesions and complications (oedema, scar/fibrosis, soft tissue necrosis and osteonecrosis, ulceration, denervation atrophy) were diagnosed on combined PET/DWIMRI using established criteria [1]. As FDG uptake can be variable in post-radiotherapy changes/complications, increased focal FDG uptake was not necessarily regarded as indicating recurrence, and MRI and DWI characteristics were taken into consideration for the combined PET/DWIMRI interpretation [1].
Measurements comprised diameters of tumours and benign lesions, mean/minimum ADC values (ADCmean/ADCmin), and mean/maximum standardised uptake values (SUVmean/SUVmax). Tumour ADCs were measured with small elliptical regions of interest (ROIs) placed over several tumour sections on b1000 images and copied onto the corresponding ADC maps, while carefully avoiding areas of apparent necrosis [1,27,31]. Average ADCmean/ADCmin values were then calculated for each measured tumour. Analogously, SUV measurements were performed with ROIs placed on anatomically matched areas [16,31].
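The per-tumour averaging described above can be sketched as follows, assuming that each ROI is available as a list of voxel ADC values already drawn to exclude necrosis, and that ADCmin is taken per ROI before averaging (the text says only that average ADCmean/ADCmin values were calculated; the helper name is hypothetical):

```python
from statistics import mean
from typing import Sequence, Tuple


def tumour_adc_summary(rois: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Hypothetical helper: average ADCmean and ADCmin over several ROIs of one tumour.

    Each ROI is a sequence of voxel ADC values (here in units of 10^-3 mm^2/s)
    sampled from one tumour section on the ADC map.
    """
    if not rois or any(len(roi) == 0 for roi in rois):
        raise ValueError("need at least one non-empty ROI")
    adc_mean = mean(mean(roi) for roi in rois)  # average of per-ROI means
    adc_min = mean(min(roi) for roi in rois)    # average of per-ROI minima
    return adc_mean, adc_min


# Two illustrative ROIs on adjacent sections of the same tumour.
print(tumour_adc_summary([[1.0, 1.2], [1.1, 1.3]]))
```

An alternative reading would take the single lowest voxel across all ROIs as ADCmin; which convention was used is not specified.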

Standard of reference and correlation with imaging findings
The standard of reference consisted of histology and follow-up ≥24 months after PET/DWIMRI. Histology was obtained within 2 weeks (1) in lesions with a rating ≥3 on PET/DWIMRI, (2) in endoscopically suspicious lesions or (3) whenever there was a discrepancy between clinical/endoscopic examination and imaging. Histology included endoscopic biopsy and salvage surgery. Histological analysis of the resected tumours was based on serial whole-organ sections as described in the literature [28] and served as the gold standard for the assessment of the pathological T-stage (pT) according to the UICC [32]. Two experienced pathologists (>12 years) interpreted histology prospectively and blinded to imaging findings.
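The three indications for histology amount to a simple 'or' rule. A hedged sketch, with hypothetical record and field names:

```python
from dataclasses import dataclass


@dataclass
class LesionWorkup:
    """Hypothetical record of the findings that could trigger histology."""
    pet_dwimri_rating: int              # 1-5 score from the combined PET/DWIMRI reading
    endoscopy_suspicious: bool          # suspicious finding at endoscopy
    clinical_imaging_discrepancy: bool  # clinical/endoscopic examination disagrees with imaging


def histology_indicated(lesion: LesionWorkup) -> bool:
    """True if any of the three indications listed in the text applies."""
    return (lesion.pet_dwimri_rating >= 3
            or lesion.endoscopy_suspicious
            or lesion.clinical_imaging_discrepancy)


print(histology_indicated(LesionWorkup(3, False, False)))  # prints True
print(histology_indicated(LesionWorkup(2, False, False)))  # prints False
```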
Patients with negative examinations or negative histology were followed for ≥24 months to determine whether negative readings corresponded to true-negative assessments and to detect false-negative evaluations. Follow-up consisted of clinical evaluation and fiberoptic endoscopy every month during the first year, every 2 months in the second year, every 3 months in the third year and every 6 months in the fourth year, supplemented by cross-sectional imaging. If follow-up was negative during the entire period, negative assessments were considered true negatives. If recurrence was proven ≤3 months after PET/MRI, negative assessments were considered false negatives. If recurrence was proven >3 months after a negative PET/MRI, the case was re-evaluated at the interdisciplinary HN tumour board to distinguish between a false-negative evaluation and a metachronous tumour unrelated to the initial PET/DWIMRI.
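The follow-up rules for negative readings can be written as a small decision function. This is a sketch with a hypothetical helper name; the "insufficient follow-up" branch is an assumption covering cases the text does not describe (such cases were excluded from the series).

```python
from typing import Optional


def classify_negative_reading(months_to_recurrence: Optional[float],
                              months_followed: float) -> str:
    """Hypothetical helper mirroring the follow-up rules for a negative PET/DWIMRI reading.

    months_to_recurrence: interval from PET/MRI to proven recurrence, or None if none.
    months_followed: duration of follow-up without proven recurrence.
    """
    if months_to_recurrence is None:
        # Assumption: >=24 months of negative follow-up is required; shorter
        # follow-up is flagged rather than counted as a true negative.
        return "true negative" if months_followed >= 24 else "insufficient follow-up"
    if months_to_recurrence <= 3:
        return "false negative"
    # Beyond 3 months the tumour board decides: false negative or metachronous tumour.
    return "tumour board review"


print(classify_negative_reading(None, 30))  # prints "true negative"
print(classify_negative_reading(2.5, 2.5))  # prints "false negative"
```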
After image analysis was completed, follow-up, histopathological and imaging findings were correlated. Correlation between imaging and whole-organ surgical specimens was made on a slice-by-slice basis.

Statistical analysis
Statistical analysis was carried out by an experienced biomedical statistician (>15 years). Diameters, ADCmean/ADCmin and SUVmean/SUVmax of benign lesions and tumour recurrence were compared using a linear mixed-effects regression model with a random intercept to account for data clustering. The diagnostic performance of combined PET/DWIMRI was assessed globally by calculating the area under the ROC curve (AUC) and specifically at a cut-off of 3 (sensitivity, specificity, predictive values, accuracy). Statistical comparisons considered paired clustered data [33,34]. An optimal cut-off value for ADCmean/SUVmean was calculated by minimising the distance between the corresponding point on the ROC curve and the upper left corner of the graph [35]. Multivariable logistic regression analysis (with mixed effects to account for clustering) was performed to assess the association between histology and ADCmean/SUVmean binarised according to the optimal cut-off values. Cohen's kappa coefficient was used to assess the concordance between PET/DWIMRI and the pathological T-classification (pT) [36]. All statistical analyses were conducted with R 3.3.1 (R Foundation for Statistical Computing, Vienna, Austria), and statistical tests were two-sided with a significance level of 0.05.
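The core quantities named above can be sketched in plain Python. This is a simplified illustration, not the study's R code: it ignores the data clustering the study accounted for, the helper names are hypothetical, and it uses unweighted Cohen's kappa since the text does not say whether a weighted variant was used. The usage example plugs in per-lesion counts implied by the Results (3 missed tumours among 43, 4 false positives among 62 benign lesions):

```python
import math
from typing import List, Sequence


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, predictive values and accuracy from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical AUC: P(score_pos > score_neg) + 0.5 * P(tie). Ignores clustering."""
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def closest_to_corner_cutoff(values: Sequence[float], positive: Sequence[bool],
                             low_is_positive: bool = False) -> float:
    """Cut-off whose ROC point is nearest to the upper left corner (FPR 0, sensitivity 1)."""
    n_pos = sum(positive)
    n_neg = len(positive) - n_pos
    best_cut, best_dist = None, math.inf
    for cut in sorted(set(values)):
        calls = [v <= cut if low_is_positive else v >= cut for v in values]
        tp = sum(c and y for c, y in zip(calls, positive))
        fp = sum(c and not y for c, y in zip(calls, positive))
        dist = math.hypot(fp / n_neg, 1 - tp / n_pos)
        if dist < best_dist:
            best_cut, best_dist = cut, dist
    return best_cut


def cohen_kappa(table: List[List[int]]) -> float:
    """Unweighted Cohen's kappa for a square agreement table (rows: method A, cols: method B)."""
    n = sum(map(sum, table))
    p_obs = sum(table[i][i] for i in range(len(table))) / n
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    p_exp = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Per-lesion counts implied by the Results: 40/43 tumours detected, 58/62 benign lesions negative.
m = diagnostic_metrics(tp=40, fp=4, fn=3, tn=58)
print({k: round(v, 3) for k, v in m.items()})
# sensitivity 0.93, specificity 0.935, ppv 0.909, npv 0.951, accuracy 0.933
```

The first four values reproduce the per-lesion figures given in the abstract. For ADC, `low_is_positive=True` is the appropriate setting, because restricted diffusion (low ADC) suggests tumour.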
Comparison of AUCs of PET/DWIMRI with ADCmean/SUVmean threshold versus PET/DWIMRI without threshold (qualitative assessment) revealed no statistically significant difference (p > 0.05). Comparisons took data clustering into consideration.

Correlation between imaging findings and the standard of reference
Thirty-two of the 43 recurrent HNSCCs (32/43, 74%) had no obvious mucosal abnormality at endoscopy, and PET/DWIMRI was essential in guiding the surgeon to the most appropriate biopsy site. Correlation with whole-organ pathology revealed that true-positive PET/DWIMRI assessments corresponded histologically to tumours located mainly beneath intact mucosa, with multicentric foci dispersed over large anatomical areas (Figs. 4 and 5). In 9/43 (20.9%) tumours, histology revealed microscopic perineural spread and intravascular tumour thrombi (Fig. 5). False-negative assessments (Table 2) were caused by oropharyngeal pTis (n = 1), laryngeal pT1 (n = 1) and oral cavity pT2 (n = 1) tumours; the mean diameter of missed tumours was 11 ± 9 mm. In retrospect, the readers were able to identify one of the three missed tumours. During the prospective readings, nine false-positive FDG-PET assessments were avoided owing to absent restriction on DWI (n = 3) (Fig. 6) or absence of suspicious features on both DWI and MRI (n = 6). Likewise, two false-positive DWI assessments were avoided owing to absent focal FDG uptake, whereas eight false-positive MRI interpretations were avoided owing to non-suspicious PET and ADC (n = 6) or absent focal uptake only (n = 1). In three patients, however, combined PET/DWIMRI yielded four false-positive assessments (Table 2), caused by granulation tissue/ulcer (n = 2), infection/abscess (n = 1) or lymphoid hyperplasia (n = 1) mimicking recurrence on all three modalities. In retrospect, these four false-positive assessments could not have been avoided.

Concordance analysis of MRI, DWI and PET readings
Morphological MRI, DWI and PET interpretations were considered concordant if the results on all three modalities were positive (rating ≥3) or if all results were negative (rating <3). Results were considered discordant if two modalities were positive and one was negative, or if one modality was positive and two were negative. The kappa coefficient for the concordance between the three methods was 0.71, indicating substantial agreement according to Landis and Koch [36]. Fourteen discordant lesions were positive on one modality only; 12 of these 14 lesions (85.7%) had a negative gold standard. Nine discordant lesions had two positive results and one negative result; eight of these nine lesions (88.9%) had a negative gold standard. Results were, therefore, more frequently concordant in recurrent tumours than in benign radiation-induced lesions (p = 0.0018).
Figure (ROC curve, lesion-per-lesion evaluation): AUC (95% CI) of PET/DWIMRI = 0.939 (0.887-0.990); the ROC curve for lesions was calculated taking data clustering into consideration.
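The concordance categories defined above, and a multi-rater kappa for three binary readings, can be sketched as follows. The text does not state which kappa variant was used for the three modalities; Fleiss' kappa is shown here as one common choice, and the function names are hypothetical:

```python
from typing import Sequence, Tuple


def concordance_pattern(mri: int, dwi: int, pet: int, cutoff: int = 3) -> str:
    """Classify one lesion's three five-point ratings (positive = rating >= cutoff)."""
    n_positive = sum(r >= cutoff for r in (mri, dwi, pet))
    if n_positive == 3:
        return "concordant positive"
    if n_positive == 0:
        return "concordant negative"
    return f"discordant ({n_positive} of 3 positive)"


def fleiss_kappa_binary(calls: Sequence[Tuple[int, ...]]) -> float:
    """Fleiss' kappa for binary (0/1) calls; every lesion rated by the same number of modalities."""
    n_items = len(calls)
    m = len(calls[0])
    agreement = 0.0
    total_positive = 0
    for row in calls:
        k = sum(row)
        total_positive += k
        # Proportion of agreeing modality pairs for this lesion.
        agreement += (k * (k - 1) + (m - k) * (m - k - 1)) / (m * (m - 1))
    p_bar = agreement / n_items
    p_pos = total_positive / (n_items * m)
    p_exp = p_pos ** 2 + (1 - p_pos) ** 2
    return (p_bar - p_exp) / (1 - p_exp)


print(concordance_pattern(4, 5, 3))  # prints "concordant positive"
print(concordance_pattern(4, 2, 1))  # prints "discordant (1 of 3 positive)"
```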

Discussion
In this prospective study, PET/DWIMRI had an excellent overall diagnostic performance and enabled accurate T-classification of recurrent tumours, thereby facilitating salvage surgery. Locally recurrent tumours had significantly lower ADCmean/ADCmin and significantly higher SUVmean/SUVmax than benign post-treatment lesions/complications, in agreement with the literature [1, 5-7, 27, 31]. As tumour differentiation influences ADC values (poorly differentiated HNSCCs have lower ADCs), the low prevalence of well-differentiated tumours in this series may limit the overall validity of ADC measurements. Nevertheless, our ADCmean values were similar to those reported by others at 1.5 T and 3 T MRI [1,5-7,31]. Although AUCs for ADCmean and SUVmean/SUVmax were similar, there was no statistically significant correlation between ADC and SUV; our results therefore further support evidence that these biomarkers are independent parameters in HNSCC [31,35,37]. Multivariable logistic regression analysis also showed that each binarised criterion (ADCmean ≤1.208, SUVmean ≥3.361) significantly improved tumour detection when added to the other criterion, thus additionally supporting the concept of ADC/SUV complementarity. The combination of SUV and ADC has also recently been used to stratify patients into risk groups, with high SUVmax combined with high ADCmin being associated with the worst prognosis [38].
Figure legend (excerpt): … Submucosal oedema with very high T2 signal (green asterisk). Normal right submandibular gland (blue asterisk). ADC map (B): restricted diffusion suggesting recurrence (white asterisk, ADCmean = 1.127 × 10⁻³ mm²/s). High ADC signal surrounding the tumour (green asterisks, ADCmean = 1.789-1.965 × 10⁻³ mm²/s) due to oedema. Left and right submandibular glands (pink and blue asterisks). (C) PET/MRI (PET fused with T1) suggests recurrence (increased FDG uptake, arrows, SUVmean = 7.688; SUVmax = 12.11). (D) Whole-organ axial section from surgical specimen (same orientation) confirms recurrence (white asterisk) invading the above-described structures. Submandibular gland (pink asterisk). Tumour margins contoured by pathologist (white line). Green asterisks: inflammatory oedema. T-stage on PET/DWIMRI was T4a. Pathological stage was pT4a.
We found no significant difference between visual assessment and quantitative assessment with ADC/SUV thresholds, even though our thresholds were very similar to published thresholds based on DWIMRI and PET/CT. This finding may be of interest for clinical routine because SUV quantification with PET/MRI remains an unresolved issue, with recent publications reporting underestimation of SUVs with PET/MRI versus PET/CT [10,11,15,16,20,26]. Moreover, SUV measurements may also be influenced by biological factors (blood glucose level, body size, breathing) and technological characteristics (scanner model, reconstruction parameters, dose calibration) [39]. Visual analysis of focal FDG uptake without a threshold may thus be sufficient for the diagnosis of local recurrence in clinical routine. Visual assessment of FDG uptake without semiquantitative measurements/thresholds is also used in many institutions for PET/CT scan interpretation [8,29,30,40-42].
Meta-analyses evaluating PET and PET/CT for the follow-up of HNSCC found that the pooled sensitivity and NPV of PET and PET/CT for detecting residual/recurrent HNSCC at the primary site were very high, whereas the PPV was only in the range of 58.6-75% [29,40-42]. In contrast to the PET/CT literature, data on the capability of DWIMRI to detect post-radio(chemo)therapy HNSCC recurrence are very sparse [5-7,43]. Nevertheless, it has been suggested that DWIMRI has high sensitivity/specificity but variable PPV/NPV [5-7,43]. A recent study by Queiroz et al., including patients with primary/recurrent HNSCC and other histological types, reported that adding DWI information to PET/MRI may diminish specificity and overall diagnostic accuracy; the authors therefore concluded that DWI did not improve the diagnostic performance of PET/MRI [44]. We cannot confirm this observation. On the contrary, the excellent PET/DWIMRI performance for all analysed parameters in our study (Table 2), including AUCs (Fig. 2), supports the view that DWIMRI and PET provide complementary information in symptomatic irradiated HNSCC patients. DWI helped to avoid false-positive findings caused by FDG uptake and nonspecific MRI morphology. It is well known that FDG-PET can lead to false-positive evaluations after radiotherapy, as inflammatory cells contribute substantially to FDG uptake [1,9,29,30]. Possible explanations for the discrepancy between our results and those of Queiroz et al. [44] include different histology, post-treatment imaging only versus pre- and post-treatment imaging, evaluation of local recurrence only versus evaluation of tumours, lymph nodes and metastases, different DWI parameters, and the use of different diagnostic criteria (multiparametric complementarity in this study versus 'MRI, PET or DWI positivity').
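One reason why PPV varies between series is that predictive values depend on the prevalence of recurrence in the examined population, not only on sensitivity and specificity. A minimal Bayes-rule sketch; the sensitivity and specificity used are hypothetical and are not taken from the cited meta-analyses:

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied in a population with the given prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test: 95% sensitive, 80% specific.
for prevalence in (0.1, 0.3, 0.5):
    ppv, npv = predictive_values(0.95, 0.80, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these inputs PPV changes markedly with prevalence while NPV stays high, so PPVs from series with different case mixes are not directly comparable.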
While combining criteria with an 'or' conjunction may increase sensitivity, it cannot yield higher specificity than either criterion alone, as shown by our analysis of ADCmean/SUVmean thresholds.
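The effect of the conjunction can be illustrated with the two binarised criteria reported in the Discussion (ADCmean ≤1.208, assumed here to be in units of 10⁻³ mm²/s, and SUVmean ≥3.361). The lesions below are invented for illustration and are not study data. Every lesion called positive under 'and' is also positive under 'or', so switching to 'or' can only gain sensitivity and lose specificity. Note that the study itself evaluated the binarised criteria within a mixed-effects logistic model, not with these simple boolean rules.

```python
from typing import List, Tuple

ADC_CUT = 1.208  # ADCmean threshold from the text (assumed units: 10^-3 mm^2/s)
SUV_CUT = 3.361  # SUVmean threshold from the text

Lesion = Tuple[float, float, bool]  # (ADCmean, SUVmean, recurrence on gold standard)


def call_positive(adc_mean: float, suv_mean: float, rule: str) -> bool:
    """Combine the two binarised criteria with an 'or' or an 'and' conjunction."""
    adc_pos = adc_mean <= ADC_CUT
    suv_pos = suv_mean >= SUV_CUT
    if rule == "or":
        return adc_pos or suv_pos
    if rule == "and":
        return adc_pos and suv_pos
    raise ValueError("rule must be 'or' or 'and'")


def sensitivity_specificity(lesions: List[Lesion], rule: str) -> Tuple[float, float]:
    tp = sum(1 for a, s, y in lesions if y and call_positive(a, s, rule))
    tn = sum(1 for a, s, y in lesions if not y and not call_positive(a, s, rule))
    n_pos = sum(1 for _, _, y in lesions if y)
    return tp / n_pos, tn / (len(lesions) - n_pos)


# Invented example lesions, not study data.
lesions: List[Lesion] = [
    (1.00, 6.0, True), (1.10, 2.5, True), (1.50, 5.0, True),
    (1.90, 2.0, False), (1.10, 2.2, False), (1.70, 4.0, False), (2.10, 1.5, False),
]
for rule in ("or", "and"):
    sens, spec = sensitivity_specificity(lesions, rule)
    print(f"{rule}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# or: sensitivity 1.00, specificity 0.50
# and: sensitivity 0.33, specificity 1.00
```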
Prospective interpretation of multiparametric data is challenging, in particular if morphological MRI, DWI and PET findings are discrepant (21.9% in this series). There is currently no consensus on how to deal with such discrepant information, i.e. whether to rely preferentially on morphological MRI, DWI or PET. This diagnostic uncertainty can lead to unnecessary biopsies in irradiated tissues, with the risk of precipitating infection. Our study may show a way to manage such discrepancies: positive concordant results with all techniques added diagnostic certainty and corresponded to recurrent tumours, whereas negative concordant results and discordant results most often corresponded to benign lesions. This approach could also be applied to indeterminate/suspicious FDG-PET/CT readings, in which case a high ADC on DWIMRI would favour a wait-and-see policy over biopsy. Larger patient series are, however, necessary to substantiate this approach.
Recurrent HNSCCs tend to occur submucosally with multicentric tumour foci dispersed over large anatomical areas, a growth pattern that differs from the rather concentric growth of primary HNSCCs [45]. Our study confirms this reported growth pattern and suggests that this distinct histological feature accounts for the ill-defined tumour appearance at imaging. In our study, PET/DWIMRI had a high staging accuracy, with excellent agreement between imaging-based and pT stage. This contrasts with previous reports suggesting that MRI or CT may grossly underestimate local recurrence, leading to inadequate surgery in up to 50% of patients [46]. Possible explanations for the high diagnostic performance and high T-staging accuracy of PET/DWIMRI include the routine use of high-resolution images, clearly defined diagnostic criteria with precise analysis of signal intensity and enhancement patterns, increased diagnostic confidence due to multiparametric information, and evaluation by experienced readers. The fact that only experienced readers interpreted the images may constitute a limitation of this study; a multi-observer study may be necessary to evaluate whether our results are also reproducible by readers who are less familiar with HN imaging.
The purpose of this study was to evaluate the diagnosis of local HNSCC recurrence. Detection of nodal recurrence requires detailed systematic correlation with neck dissection specimens on a level-by-level and node-by-node basis.
Although this is also important from a clinical point of view, the related questions are beyond the scope of this report and require a separate analysis.
In summary, PET/DWIMRI has an excellent diagnostic performance for the detection of HNSCC recurrence after radio(chemo)therapy, with excellent agreement between imaging-based and pathological T-stage, provided appropriate diagnostic criteria are applied. Our results show that positive concordant results with MRI, DWI and PET correspond to locally recurrent HNSCC, negative concordant results correspond to absent recurrence, and discordant results mostly correspond to benign post-radio(chemo)therapy lesions/complications.
Statistics and biometry One of the authors (Christophe Combescure, PhD in biomedical statistics and lecturer) has significant statistical expertise. He is a biomedical statistician who has been employed by the Center for Clinical Research of the University of Geneva for 9 years. He is the author of over 100 original articles in peer-reviewed journals, including articles on statistical methods, and teaches biomedical statistics at the University of Geneva at pre- and postgraduate levels.
Table. T-classification on PET/DWIMRI (rows) versus pathological T-stage** (columns)
PET/DWIMRI    pT0   pT1*   pT2   pT3   pT4a   pT4b   Total
T0             58     2      1     0      0      0      61
T1*             3     7      0     0      0      0
…
Cohen's kappa = 0.84, p < 0.0001
* The only pTis lesion in this series was counted together with the T1 lesions.
** Histology was the standard of reference in all 43 HNSCCs (salvage surgery in 37 tumours and excisional or diagnostic biopsy in six tumours). In 23 benign lesions, histology was the standard of reference (pT0), whereas for the remaining 39 benign lesions without histological proof, a negative follow-up (mean ± SD = 34 ± 8 months) was considered as indicating T0.
Informed consent Written informed consent was obtained from all subjects (patients) in this study.
Ethical approval Institutional Review Board and Ethics Committee approval was obtained.
Study subjects or cohorts overlap Some study subjects (15/74) have been previously reported in a study that evaluated PET/MRI image quality and detectability of whole-body FDG uptake with PET/MRI versus PET/CT and analysed SUV quantification [16].

Methodology • prospective
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.