What is the prognostic impact of FDG PET in locally advanced head and neck squamous cell carcinoma treated with concomitant chemo-radiotherapy? A systematic review and meta-analysis

Purpose Evidence is conflicting on the prognostic value of 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) in head and neck squamous cell carcinoma. The aim of our study was to determine the impact of semiquantitative and qualitative metabolic parameters on the outcome in patients managed with standard treatment for locally advanced disease.
Methods A systematic review of the literature was conducted. A meta-analysis was performed of studies providing estimates of relative risk (RR) for the association between semiquantitative metabolic parameters and efficacy outcome measures.
Results The analysis included 25 studies, for a total of 2,223 subjects. The most frequent primary tumour site was the oropharynx (1,150/2,223 patients, 51.7%). According to the available data, the majority of patients had stage III/IV disease (1,709/1,799, 94.9%; no information available in four studies) and were treated with standard concurrent chemoradiotherapy (1,562/2,009 patients, 77.7%; only one study without available information). A total of 11, 8 and 4 independent studies provided RR estimates for the association between baseline FDG PET metrics and overall survival (OS), progression-free survival (PFS) and locoregional control (LRC), respectively. High pretreatment metabolic tumour volume (MTV) was significantly associated with a worse OS (summary RR 1.86, 95% CI 1.08–3.21), PFS (summary RR 1.81, 95% CI 1.14–2.89) and LRC (summary RR 3.49, 95% CI 1.65–7.35). Given the large heterogeneity (I² > 50%) affecting the summary measures, no cumulative threshold for an unfavourable prognosis could be defined. No statistically significant association was found between SUVmax and any of the outcome measures.
Conclusion FDG PET has prognostic relevance in the context of locally advanced head and neck squamous cell carcinoma. Pretreatment MTV is the only metabolic variable with a significant impact on patient outcome. Because of the heterogeneity and the lack of standardized methodology, no definitive conclusions on optimal cut-off values can be drawn.
Electronic supplementary material The online version of this article (10.1007/s00259-018-4065-5) contains supplementary material, which is available to authorized users.


Introduction
Head and neck cancer is the sixth most common malignant tumour, with increasing incidence worldwide [1]. In over 95% of cases, the disease arises from the epithelial layer of the mucosa lining the upper aerodigestive tract. Due to the absence of anatomical barriers, the abundant lymphatic drainage of the neck and the usually infiltrative pattern of growth of head and neck squamous cell carcinoma (HNSCC), in about 60% of patients the diagnosis is made at an advanced locoregional stage. In order to maximize the likelihood of disease cure, multimodality treatment is usually needed. Therapeutic management is often challenging: both primary radical surgery and concurrent chemoradiotherapy are burdened with a high rate of posttreatment complications, acute and long-term toxicities [2] and a marked detrimental effect on quality of life. Notwithstanding the refinement of treatment strategies that has taken place in the last 20 years, the prognosis of HNSCC remains severe, with a cumulative 5-year overall survival (OS) rate of 45-55% [3] in patients with locally advanced disease. The prevalent pattern of failure in the overall population is locoregional: about 50% of first events of relapse occur at the primary tumour site and/or in the neck, and the vast majority of these (about 90%) occur within the first 2 years after treatment.
Although the patient's outlook can be substantially influenced by clinical factors that vary widely among the different subsites of disease, a series of common features contributes to the severe prognosis of locally advanced HNSCC; these include the suboptimal efficacy of the standard "one size fits all" multimodal approach, the large proportion of frail patients who are noncompliant with intensive therapy, and the absence of biomarkers. In this regard, the only notable exception is the human papillomavirus (HPV). In the last 15 years, a major epidemiological shift has taken place in western countries due to the rising incidence of HPV-associated oropharyngeal cancer [4], reducing the dominance of the classical phenotype of HNSCC resulting from alcohol and tobacco-induced field cancerization. A positive HPV status was recognized as an independent favourable prognostic factor in a series of correlative prospective studies and in an unplanned secondary analysis of the randomized phase 3 RTOG 0129 trial [5]. Overall, HPV positivity is associated with a reduction in the risk of death and disease progression of about 60%.
Although major progress has been achieved in unravelling key molecular pathways involved in HNSCC pathogenesis [6], at present no biomarkers are available in clinical practice apart from HPV status. Prognostic information is therefore critically lacking in the management of patients affected by HNSCC. Next to individual genomic profiling, an alternative strategy which has been explored in recent years is to integrate molecular imaging into precision oncology care, exploiting the potential of imaging as a biomarker. The possibility of linking the information obtained from medical images with personalized treatment forms the core of "theragnostics", a term that has been used particularly in the context of radiation therapy [7]. In a hallmark review published in 2000, Ling et al. [8] suggested that the evolution of molecular imaging could facilitate the development of customized dose delivery in the era of intensity-modulated radiotherapy (IMRT). As foreseen by Ling and colleagues, in the last 15 years molecular imaging has been increasingly implemented in the management of HNSCC, in particular 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET). The fundamental prerequisite is the ability to image physiopathological processes occurring within a tumour or its microenvironment. The use of FDG allows the characterization of the metabolic activity of a defined tumour burden. In HNSCC, available evidence supports the role of FDG PET in primary target definition for radiotherapy planning [9], staging [10] and posttreatment response assessment [11]. However, its potential impact on patient outcomes is an unresolved issue. The aim of this work was to define the relevance of semiquantitative and qualitative FDG PET features as prognostic biomarkers in the curative setting of locally advanced head and neck cancer.

Materials and methods
In accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [12], a systematic review of the literature was conducted. Relevant articles were identified in two databases (MEDLINE and Embase) over a 10-year period (1 January 2007 to 28 February 2017) using the appropriate terminology as described in Appendix 1 of the Supplementary material. Conference proceedings of main international conferences (ASCO, ASTRO, ESMO, ESTRO, ECCO) were also searched. The reference lists of the articles reviewed as full texts were also searched manually. The literature search strategy was based on the PICO methodology [13], as discussed in the following sections.

Population
The target population of our analysis consisted of adult patients (>18 years of age) treated with curatively intended radiotherapy, concurrent chemoradiotherapy or radiotherapy combined with targeted therapy for locally advanced HNSCC. Primary surgery and induction chemotherapy were not allowed. In view of the known heterogeneity among different head and neck subsites, we sought to assess whether the impact of metabolic parameters could be observed in specific disease entities or in HNSCC taken as a whole. In addition, information on the radiotherapy technique used and the schedule of systemic therapy administered was collected when available. To provide evidence-based support for the analysis, the published literature was categorized according to the type of study design: all designs were eligible except case series with fewer than 20 patients, literature reviews and consensus statements. Only studies in the English language were included.

Interventions
To be included in the analysis, studies had to provide adequate information on FDG PET metrics (semiquantitative parameters and/or qualitative scores). Studies focusing on tracers other than FDG and on integrated PET/MRI were excluded. Since the main aim of this review was to investigate the potential impact of specific metabolic data on HNSCC prognosis, the following parameters were considered as main interventions: standardized uptake values (SUVmax, SUVmean, SUVpeak), metabolic tumour volume (MTV) and total lesion glycolysis (TLG). These parameters were defined according to reference guidelines [14], as follows:
- SUV (body-weighted): the concentration of FDG in a given region of interest (ROI) or volume of interest (VOI; expressed in kilobecquerels per millilitre) divided by the ratio between administered activity (corrected for radioactive decay at the time of scanning) and the body weight of the patient
- SUVmax: the highest SUV of pixels (or voxels) in a given ROI (or VOI)
- SUVmean: the mean SUV of pixels (or voxels) in a given ROI (or VOI)
- SUVpeak: SUVmean within a 1-cm³ spherical VOI centred on the voxels with the highest uptake
- MTV: the VOI segmented using a fixed threshold (usually 41% or 50%) of FDG-avid lesions
- TLG: the product of the VOI average SUV (SUVmean) and the corresponding MTV
Standardized qualitative interpretations of FDG PET scans were also considered interventions, if rigorously defined. In addition, the included studies were further analysed according to the timing of the FDG PET scans, whether performed before, during or after treatment.
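As an illustration of how the semiquantitative definitions above relate to each other, the following minimal Python sketch computes body-weighted SUV, SUVmax, SUVmean, MTV and TLG from a list of voxel values. It is not the method of any included study: the function names, the flat list of voxel SUVs, and the assumption that the fixed threshold is a fraction of SUVmax (0.41 by default) are hypothetical choices made for the example, and SUVpeak is omitted because it requires the spatial layout of the voxels.

```python
def suv_body_weight(conc_kbq_per_ml, injected_kbq, weight_g):
    """Body-weighted SUV: tissue concentration divided by
    (decay-corrected administered activity / body weight)."""
    return conc_kbq_per_ml / (injected_kbq / weight_g)


def pet_metrics(voxel_suvs, voxel_volume_ml, rel_threshold=0.41):
    """Derive SUVmax, SUVmean, MTV and TLG from voxel SUVs.

    Assumption for this sketch: the lesion is segmented by keeping
    every voxel whose SUV is at least rel_threshold * SUVmax.
    """
    suv_max = max(voxel_suvs)
    cutoff = rel_threshold * suv_max
    segmented = [s for s in voxel_suvs if s >= cutoff]
    mtv_ml = len(segmented) * voxel_volume_ml      # metabolic tumour volume
    suv_mean = sum(segmented) / len(segmented)     # mean SUV inside the VOI
    tlg = suv_mean * mtv_ml                        # total lesion glycolysis
    return {"suv_max": suv_max, "suv_mean": suv_mean,
            "mtv_ml": mtv_ml, "tlg": tlg}


if __name__ == "__main__":
    # Hypothetical values: five voxels of 0.5 ml each.
    print(pet_metrics([10.0, 8.0, 5.0, 4.0, 2.0], voxel_volume_ml=0.5))
```

Because MTV and TLG both depend on the chosen threshold, changing rel_threshold from 0.41 to 0.50 changes the segmented volume, which is one source of the between-study variability in cut-off values.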

Comparators
When available, clinical factors other than the metabolic FDG PET parameters discussed above were defined as "comparators" if analysed as potential prognostic biomarkers.

Outcomes
Ultimately, we sought to assess whether intrinsic features on FDG PET retain prognostic significance in terms of outcome. Therefore, we searched for a potential correlation between the interventions (as described above) and locoregional control (LRC), progression-free survival (PFS) and OS at a minimum follow-up of 1 year. These outcome measures were defined as follows:
- LRC: the time from randomization (or study initiation) to local and/or regional disease progression
- PFS: the time from randomization (or study initiation) to disease progression or death
- OS: the time from randomization (or study initiation) to death from any cause
Studies in which the main outcome measure was not consistent with the definition of the prespecified efficacy endpoints were excluded. Studies performed to assess the diagnostic accuracy of FDG PET as well as "in-silico" radiotherapy planning analyses were also excluded.

Statistical analysis
Baseline demographics, patient and disease characteristics, treatment features and outcome data were collected by three authors (P.B., A.M., E.O.), verified by two reviewers (I.D., S.C.) and summarized using descriptive statistics. From all studies included in the literature review, we extracted the most adjusted estimate of relative risk (RR), including odds ratio and hazard ratio (HR), for the association between each of the metabolic parameters (SUVmax, SUVmean, SUVpeak, MTV and TLG) and each of the patient outcomes (OS, PFS and LRC). When there were two or more independent RR estimates, these were transformed into logRR and the corresponding variance using the formula of Greenland [15] and pooled using random effects models to obtain a summary RR (SRR) and corresponding 95% confidence intervals (CI). We assessed the heterogeneity between studies using the I² statistic, which is interpreted as the percentage of the variability that is attributable to true heterogeneity rather than chance. Larger values of I² denote greater between-estimate heterogeneity; values of I² below 50% are considered acceptable. We did not perform subgroup analysis and meta-regression because of the limited sample size. Finally, we evaluated the presence of publication bias using the funnel plot of Begg and Mazumdar [16] and the regression test of Egger et al. [17]. The meta-analysis was conducted using the metan command in Stata version 14 (Stata Corp, College Station, TX).
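The pooling step described above can be sketched in a few lines of Python. This is an illustrative re-implementation under stated assumptions, not the Stata metan code used for the analysis: it assumes that the variance of each logRR is recovered from the width of the reported 95% CI, and that the random effects model uses the DerSimonian-Laird estimator of between-study variance. The function names and the input format are hypothetical.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% confidence interval


def log_rr_and_variance(rr, ci_low, ci_high):
    """Return log(RR) and its variance, assuming the variance can be
    recovered from the width of the reported 95% CI on the log scale."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(rr), se ** 2


def random_effects_pool(estimates):
    """Pool (rr, ci_low, ci_high) tuples with a DerSimonian-Laird
    random effects model; returns the summary RR, its 95% CI and I^2 (%)."""
    ys, vs = zip(*(log_rr_and_variance(*e) for e in estimates))
    w = [1.0 / v for v in vs]                       # fixed-effect weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]           # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {
        "srr": math.exp(y_re),
        "ci": (math.exp(y_re - Z95 * se_re), math.exp(y_re + Z95 * se_re)),
        "i2": i2,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical study estimates, not data from the review.
    print(random_effects_pool([(1.5, 1.0, 2.25), (2.4, 1.2, 4.8), (1.1, 0.7, 1.73)]))
```

When I² is large, tau2 inflates every study's variance, so the weights become more similar across studies and the summary CI widens.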

Data collection and analysis
Two authors (P.B., A.M.) independently examined the titles and abstracts of each search record, and retrieved the full text articles for potentially eligible studies. The full texts were further examined according to the inclusion criteria. Discrepancies were resolved by consensus. Data were extracted by the two authors using a data collection form. Overall, of 180 studies identified using the predefined search criteria, 81 were screened by assessment of the abstracts (Fig. 1). Of these screened studies, 42 were evaluated for eligibility, and 25 satisfied the inclusion criteria and were therefore analysed fully. The full reference lists of the eligible studies and the reasons for exclusion are available in Appendix 2 of the Supplementary material. In terms of study design, most included studies (21/25, 84%) were retrospective. Two papers were initially retrieved in abstract form [33,41] and updated as soon as the full versions became available [34,42]. One study [29] had limited data on the disease and treatment characteristics collected in most patients, but provided adequate information on FDG PET variables and outcomes.

Treatment-related features
Most patients (1,467/1,544, 95%) were treated with IMRT, while 77 (5%) received 3D-conformal radiotherapy (3DCRT). No information on the radiotherapy technique used was available in seven studies (Table 4). The most commonly adopted radiotherapy regimen consisted of conventional fractionation of 1.8 or 2 Gy per fraction for a total dose of 66-72 Gy (22/23 papers; no available data in two studies). Concurrent chemoradiotherapy was the most frequent treatment schedule in our analysis, being used in 1,562/2,009 patients (77.7%; no available information in only one study). Standard three-weekly 100 mg/m² cisplatin was the chosen regimen in almost half of the included studies (11/24). Finally, a very small group of patients received induction chemotherapy before radiotherapy (181/2,223, 8.1%) in seven studies. Since these studies were not excluded by our search entry criteria, they were retained in the analysis. The timing of FDG PET differed among the studies included in the analysis (Table 5). A single baseline assessment time-point was present in almost half of the studies (12/25), while a combination of pretreatment, interim (during treatment) and posttreatment scans was described in four (pretreatment plus interim), seven (pretreatment plus posttreatment) and two (pretreatment plus interim plus posttreatment) studies. Among those studies providing data on more than a single time-point, a time-weighted analysis exploring changes over time ("delta") of specific metabolic semiquantitative or qualitative features was additionally reported in seven. As a single variable, MTV and SUVmax were the main metabolic parameters addressed in nine and seven studies, respectively. A qualitative analysis was used in three studies [29,31,42]. Zschaeck et al. [44] determined SUVmean in irradiated normal mucosa tissue to explore the impact of off-target hypermetabolism and its change over time.
Only a limited number of alternative prognostic biomarkers (comparators) were reported in parallel with the metabolic evaluation (nine studies). The median overall follow-up time for all studies was 23.6 months (range 15-55.8 months). In terms of threshold or cut-off values to discriminate worse from better outcomes, a large variability was observed for each intervention. Finally, a large heterogeneity characterized the prognostic information which could be extracted from each paper.

Discussion
In the era of precision oncology, the lack of prognostic biomarkers has hindered the evolution of standard-of-care management in HNSCC. Apart from HPV status, no molecular stratification is currently available for use in daily practice. In the last two decades, steady technological progress has highlighted the potential of imaging as a comprehensive tumour biomarker [47]. In the field of functional imaging, FDG PET is the most widespread, easily accessible modality that is able to provide surrogate metabolic information on tumour burden. The aim of our work was to define whether distinct FDG PET features can be intrinsically associated with prognostic relevance in the context of nonmetastatic HNSCC. We acknowledge several limitations which have to be taken into account when interpreting the data presented. First, most studies included in our systematic review were retrospective. Although a strict search methodology was followed, their potential heterogeneity in terms of patient selection, treatment administration and outcome measures may have affected the consistency of our analysis. Second, the technical variability in the performance of FDG PET scans is also a factor that cannot be ignored with a retrospective study design; only a prospective design can ensure that consensus acquisition recommendations [14] are rigorously adopted. Third, among the included studies the methods used to calculate the FDG PET metrics were not consistent. Heterogeneity in their definition has to be taken into account particularly for SUVmax and MTV, for which several threshold values were shown to be significant in discriminating patients with different outcomes. Renewed interest in the role of FDG PET in the management of HNSCC was recently prompted by the publication of the PET-NECK trial [11].
The findings of this large prospective, multicentre phase 3 trial are practice-changing, since the study provided definitive evidence in favour of a response evaluation centred on the high negative predictive value (NPV) of a 12-week posttreatment FDG PET scan. However, the study had two main limitations that prevented the clarification of other relevant issues on the role of FDG PET in the management of HNSCC. First, none of the 564 patients enrolled in the trial underwent a baseline FDG PET scan; a qualitative comparison between pretreatment and posttreatment scans was therefore not performed. Second, FDG PET semiquantitative metrics could not be evaluated due to nonuniform calibration among the different scanners. From this perspective, the PET-NECK trial did not add any new data to the available low-level body of evidence on the prognostic role of specific FDG PET semiquantitative and qualitative features in HNSCC. Although many investigators have focused on this topic in the last 15 years [48], the literature is characterized by inconclusive and heterogeneous findings [49].
A crucial aspect that needs to be underlined again is the strict dependence of FDG PET information on the image acquisition modality, which in turn may be influenced by a series of factors, ranging from the technical parameters of the scanner to the timing of the scan with respect to treatment. As also demonstrated in our descriptive analysis (Table 5), there is significant variability in the correlation between semiquantitative metrics and outcome measures in HNSCC. We have already pointed out that in the posttreatment scenario a negative PET scan at 12 weeks after chemoradiation is a prognostic biomarker of long-term complete remission based on level 1 evidence. However, standardized interpretation of response to treatment is lacking.
In this context, the Hopkins criteria are the only proposed scoring system for qualitative interpretation of FDG PET in HNSCC. Marcus et al. [29] showed that a five-point scale based on prespecified qualitative descriptors is accurate in discriminating complete from incomplete responses. The application of the Hopkins criteria resulted in a high NPV of 91.1% with an overall diagnostic accuracy of 86.9%. Notably, the results of the ECLYPS study [42] prospectively confirmed the reliability of the Hopkins criteria applied 12 weeks after the end of treatment, with an overall NPV of 92.1% and a very low number of equivocal reports. As accurately described by Garibaldi et al. [50] in a recent systematic review, the potential prognostic and predictive relevance of an interim FDG PET scan (scan acquired during treatment) is a controversial matter. At present, no firm conclusions can be drawn as to the ideal metabolic parameter to analyse early in treatment, the most informative threshold value, or the best time to re-scan the patient.
Taken together, the available evidence indicates that FDG PET in patients with HNSCC provides prognostic information through standardized qualitative assessment at a minimum of 12 weeks after chemoradiation, but has no proven added value during its delivery. It is therefore a rational approach to investigate before treatment whether baseline semiquantitative metrics are intrinsically able to characterize the outcome in patients with locally advanced disease. Conflicting evidence is available from the literature. Pak et al. [51] performed a systematic review and meta-analysis of 13 studies (1,180 patients) to assess the prognostic role of MTV and TLG before treatment. The authors found that high values of both volumetric parameters correlated significantly with a worse outcome. The pooled HRs for OS were 3.51 (95% CI 2.62-4.72, p < 0.00001) and 3.14 (95% CI 2.24-4.40, p < 0.00001) for MTV and TLG, respectively. However, the generalizability of these results is open to question. First, loose criteria were followed in the literature search strategy and inclusion of articles. Second, for both parameters no threshold values portending a worse outcome were defined, thus preventing further analysis of the data.
In a prospective study in 77 patients affected by stage II-IV HNSCC, Schinagl et al. [52] consistently applied five different segmentation methods for coregistered CT and FDG PET scans   to 75% corresponding to an increase in SUVmax from 9.6 to 16.8) for a worse prognosis. The link between FDG avidity and tumour volume has been further explored by different groups focusing on MTV. In this regard, the correlative, prospective imaging study of the randomized phase 3 RTOG 0522 trial [40] is noteworthy. Of the whole sample of 940 patients enrolled, 74 from 19 different centres provided both pretreatment and posttreatment FDG PET scans, as mandated upon inclusion. A prespecified acquisition imaging protocol was followed in all patients. Excellent centralized interobserver agreement (intraclass correlation coefficient ≥0.80) on semiquantitative metrics was reported. Based on voxels with a minimum of 40% SUVmax, baseline primary MTV above the median was the strongest prognosticator of worse LRC (HR 4.01, 95% CI 1.28-12.52, p = 0.02). Other retrospective studies [23,27,37] have underlined the prognostic value of baseline MTV, reporting different cut-off values as most significant for a worse outcome (combined primary and nodal MTV >40 ml, >20 ml and >18 ml correlating with worse disease-free survival (DFS) [23], LRC and OS [27], and disease-specific survival [37], respectively). The prognostic value of MTV analysed as a continuous variable has also been reported.
In a single-centre retrospective analysis in 83 patients, Tang et al. [53] found that an increase in primary baseline MTV of 17 ml (from the 25th to the 75th percentile) was associated with a doubling of the risk of disease progression (p = 0.0002) and of death (p = 0.0048). Of note, combined primary and nodal MTV (as a continuous variable) was also associated with a shorter PFS (HR 4.23, p < 0.0001; CI not reported) and OS (HR 3.21, p < 0.0029; CI not reported) in the subgroup of 64 patients with p16-positive oropharyngeal cancer. In a larger cohort of 122 patients with oropharyngeal cancer, Castelli et al. [45] assessed whether the use of different absolute and relative thresholds of SUVmax results in different discriminatory power of MTV. Using a 51% relative SUVmax threshold, combined primary and nodal MTV was the only significant factor in a multivariate analysis predicting OS (HR 1.43 per 10 ml, CI 1.23-1.65, p < 0.001) and DFS (HR 1.43 per 10 ml, CI 1.23-1.65, p = 0.03). The optimal cut-off value for MTV defined at the 51% threshold was 22.7 ml, which was able to discriminate 2-year DFS with rates of 63.3% versus 32.9% and LRC with rates of 68% versus 35.3%.
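Hazard ratios reported for a continuous variable, such as "HR 1.43 per 10 ml", can only be compared across studies after rescaling to a common volume increment. The short Python sketch below shows that rescaling under the assumption of a log-linear effect of MTV on the hazard, as in a standard Cox model with a continuous covariate; the function name is hypothetical and the example is purely arithmetic, not a reanalysis of any study.

```python
def rescale_hazard_ratio(hr, per_units, new_units):
    """Rescale an HR reported per `per_units` of a continuous covariate
    to the HR per `new_units`, assuming a log-linear effect
    (log HR proportional to the covariate increment)."""
    return hr ** (new_units / per_units)


if __name__ == "__main__":
    # HR 1.43 per 10 ml expressed per 20 ml and per 1 ml of MTV.
    print(round(rescale_hazard_ratio(1.43, 10, 20), 4))
    print(round(rescale_hazard_ratio(1.43, 10, 1), 4))
```

The same relation can be read in reverse: a doubling of risk over a 17 ml increase corresponds to an HR of 2 ** (10 / 17) per 10 ml under the same log-linear assumption.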
The absence of a consensus methodology on VOI delineation is clearly a limitation when comparing different datasets on the prognostic relevance of MTV, since no single optimal cut-off value is recognized. In line with previous experience, our data reinforce the prognostic role of pretreatment MTV as the most informative semiquantitative metabolic feature. In line with our search inclusion criteria, the patient population analysed was extremely homogeneous (about 95% of the whole sample size) in terms of disease stage, radiotherapy technique used and schedule of concomitant chemoradiotherapy. With all due limitations, our analysis provides further evidence on the predominant impact of pretreatment MTV on HNSCC outcome compared with all other available FDG PET metrics. Further consideration of its role also as a predictive biomarker may be generated by pattern-of-failure data correlating baseline FDG PET and radiation dose distribution in HNSCC. Due et al. [54] performed a retrospective analysis in 304 HNSCC patients with the aim of correlating the pattern of disease failure with FDG uptake on pretreatment PET scans. By performing a deformable registration of CT scans acquired at the time of recurrence with the planning PET/CT scan, the authors showed that 96% of relapses (95% CI 86-99%) occurred in the high-dose region. In addition, they found that recurrence density was higher in the central part of the target volume (p < 0.0001), with a significant correlation with increasing FDG avidity (p = 0.036). In a smaller cohort of 44 patients enrolled in a prospective phase 2 trial, Leclerc et al. [55] showed that all ten recurrences arose in areas receiving >95% of the dose determined on PET-based plans. A similar finding was reported by Mohamed et al. [56], who hypothesized that a 1-cm margin in addition to the 50% SUVmax isocontour on pretreatment FDG PET scans would cover the majority of type A recurrences (according to the authors' definition, those that arise in the central high-dose area).
It has to be underlined once again that the main limitations of FDG in HNSCC include its suboptimal specificity and the large variability in segmentation methods. It could be hypothesized that hypoxia PET [57] and diffusion-weighted magnetic resonance imaging [58] may be more refined imaging biomarkers in the field of HNSCC. However, conclusive results on their prognostic impact are still awaited, mainly due to the lack of reproducibility and cost issues preventing their adoption on a large scale. In our opinion FDG PET will remain the most widespread functional imaging modality used in clinical practice for many years to come.

Conclusion
The absence of prognostic biomarkers is a critical limitation in the management of locally advanced HNSCC. With all due limitations, our analysis showed that MTV defined from pretreatment FDG PET scans has the strongest impact on patient outcome after standard concurrent chemoradiotherapy. Prospective studies to corroborate this finding through standardized FDG PET acquisition and segmentation methods are warranted.