Superior prognostic utility of gross and metabolic tumor volume compared to standardized uptake value using PET/CT in head and neck squamous cell carcinoma patients treated with intensity-modulated radiotherapy

Objective To compare the prognostic utility of the 2-[18F] fluoro-2-deoxy-d-glucose (FDG) maximum standardized uptake value (SUVmax), primary gross tumor volume (GTV), and FDG metabolic tumor volume (MTV) for disease control and survival in patients with head and neck squamous cell carcinoma (HNSCC) undergoing intensity-modulated radiotherapy (IMRT). Methods Between 2007 and 2011, 41 HNSCC patients who underwent a staging positron emission tomography with computed tomography and definitive IMRT were identified. Local (LC), nodal (NC), distant (DC), and overall (OC) control, overall survival (OS), and disease-free survival (DFS) were assessed using the Kaplan–Meier product-limit method. Results With a median follow-up of 24.2 months (range 2.7–56.3 months) local, nodal, and distant recurrences were recorded in 10, 5, and 7 patients, respectively. The median SUVmax, GTV, and MTV were 15.8, 22.2 cc, and 7.2 cc, respectively. SUVmax did not correlate with LC (p = 0.229) and OS (p = 0.661) when analyzed by median threshold. Patients with smaller GTVs (<22.2 cc) demonstrated improved 2-year actuarial LC rates of 100 versus 56.4 % (p = 0.001) and OS rates of 94.4 versus 65.9 % (p = 0.045). Similarly, a smaller MTV (<7.2 cc) correlated with improved 2-year actuarial LC rates of 100 versus 54.2 % (p < 0.001) and OS rates of 94.7 versus 64.2 % (p = 0.04). Smaller GTV and MTV correlated with improved NC, DC, OC, and DFS, as well. Conclusion GTV and MTV demonstrate superior prognostic utility as compared to SUVmax, with larger tumor volumes correlating with inferior local control and overall survival in HNSCC patients treated with definitive IMRT.


Introduction
Radiation therapy (RT) is the mainstay of treatment for early and locally advanced head and neck squamous cell carcinoma (HNSCC). Improvements in identification of the tumor volume of head and neck tumors using imaging such as 2-[18F] fluoro-2-deoxy-D-glucose (FDG) positron emission tomography with computed tomography (PET/CT) have facilitated radiation treatment planning through improved target volume delineation and more accurate target localization, which is critical for intensity-modulated radiotherapy (IMRT). In IMRT, sharp dose gradients between the high-dose region targeted at the tumor and adjacent low-dose normal tissue regions improve the therapeutic ratio between tumor control and radiation-related toxicity, although this relies on accurate identification of the tumor extent. Yet despite advanced IMRT techniques and integration of advanced radiological imaging such as PET/CT, locoregional failure still occurs in 30-50 % of locally advanced HNSCC, largely within the high-dose region [1,2]. Such variable treatment responses argue for the need to characterize metrics that allow the a priori identification of patients at high risk of treatment failure and death.
Currently the American Joint Committee on Cancer (AJCC) staging, which utilizes a uni-dimensional tumor size, local anatomic invasion, nodal involvement, and presence of metastatic disease, is the most widely accepted and applied prognostic system in cancer [3]. Yet much attention has been called to its weaknesses, specifically in its ability to identify HNSCC patients at high risk of recurrence [4-6]. PET/CT has been increasingly integrated into diagnostic staging and radiation planning for HNSCC [7,8], and has been demonstrated to be an accurate and sensitive imaging modality for the post-treatment evaluation of patients with HNSCC compared to clinical exam and CT alone [9,10]. More recently, PET/CT variables including maximum standardized uptake value (SUVmax) and metabolic tumor volume (MTV) have emerged as potential radiological biomarkers in patients with HNSCC [11-17].
Volumetric indices have been proposed to risk stratify patients. Studies have reported that the primary gross tumor volume (GTV) correlates with outcomes and survival in patients with HNSCC undergoing curative surgery [18], radiation [19-21], or combined chemoradiation treatments in various head and neck cancer sites [4-6, 22, 23]. Given the interest in PET-based imaging, the MTV has recently been explored as a combined volumetric and metabolic radiological biomarker. Studies have reported the predictive power of MTV in patients with head and neck cancer undergoing chemoradiation [12,13].
The relative importance of SUVmax, GTV, and MTV of the primary tumor in the risk stratification of HNSCC patients has not been determined. This retrospective study sought to compare the prognostic utility of SUVmax, GTV, and MTV with respect to disease outcome and survival in patients with HNSCC undergoing IMRT with or without concomitant chemotherapy.

Patient selection
The study was conducted as a retrospective review approved by the institutional review board (IRB). Informed consent was waived by the IRB. Fifty-one newly diagnosed HNSCC patients treated between January 2007 and March 2011 underwent IMRT (with or without chemotherapy) and had PET/CT imaging obtained prior to start of IMRT. Forty-one patients met the inclusion criteria. In total 10 patients were excluded: 6 had synchronous or metachronous malignancies within 3 years prior to HNSCC diagnosis, 2 had unknown primary, 1 had sino-nasal cancer, and 1 died shortly after treatment from non-cancer related causes (sepsis). All patients were staged according to the 2002 AJCC classification [24].

PET/CT protocol
All PET/CT studies were performed on a GE Discovery STE 16 (General Electric, Milwaukee) PET/CT scanner. Patients were scanned skull base to mid-thigh in treatment position on a flat table. Patients were injected with an average of 13.6 ± 3.3 mCi of 18F-FDG and incubated for an average period of 63.0 ± 5.9 min. The amount of injected radioactivity was routinely measured by quantification of the radioactivity of the syringe before and after injection. All patients were scanned using a dedicated head and neck protocol. Head and neck images were acquired with the arms down and body images were obtained with arms up from clavicle to mid-thigh. Body images were obtained first, followed by head and neck images and then low-dose deep inspiration images of the chest.
The dedicated head and neck PET scans were done using 2D imaging with emission scans lasting between 5 and 6 min, and a field of view (FOV) of 30 cm. The matrix size was 128 × 128, and slice thickness was 3.3 mm. The CT images were obtained with a matrix of 512 × 512. Beam collimation was 10 mm with a pitch of 0.984. Table speed was 9.84 mm/rotation and the slice thickness was 0.625 mm. A tube voltage of 120 kV and a tube current-time product of 440 mAs were used. Intravenous (IV) contrast was administered by power injection (General Electric, Milwaukee) of 60 ml of Optiray IV (Tyco Healthcare/Mallinckrodt, Hazelwood) after a 40 s delay for the head and neck images. A second bolus of 110 ml of IV contrast was given for the body section of the study. CT images were reconstructed to the PET slice thickness to match the PET and to create fused images. In addition, CT images were reconstructed at 1.25 mm with 1.25 mm spacing in soft tissue and bone algorithms for review.
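As a quick consistency check on the CT parameters above, helical pitch is conventionally defined as table travel per gantry rotation divided by total beam collimation. The sketch below applies that definition to the reported values; the function name is hypothetical and used only for illustration.

```python
def helical_pitch(table_feed_mm_per_rotation: float, collimation_mm: float) -> float:
    """Pitch = table travel per rotation / total beam collimation."""
    if collimation_mm <= 0:
        raise ValueError("collimation must be positive")
    return table_feed_mm_per_rotation / collimation_mm


# Values taken from the protocol description: 9.84 mm/rotation, 10 mm collimation.
pitch = helical_pitch(9.84, 10.0)
print(round(pitch, 3))
```

The result agrees with the stated pitch of 0.984.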

PET/CT image analysis
All PET/CT studies were electronically retrieved from archives and reviewed on a GE Advantage Workstation by a single, board-certified radiologist with neuroradiology and nuclear medicine fellowship training. PET, CT, and fused PET/CT images were displayed in axial, coronal, and sagittal planes. For the purposes of this study, the relevant imaging parameter measurements were the primary tumor SUVmax and MTV segmented from PET. MTV was defined as the tumor volume with FDG uptake segmented by a gradient-based method. The commercially available MIMvista software analysis suite (MIM Software Inc., Cleveland, OH) includes a contouring suite for radiation therapy planning and a PET/CT fusion suite. Once the primary tumor (target) was segmented, SUVmax and MTV were automatically calculated by the MIMvista software. The previously described gradient and threshold segmentation methods of volume measurement available in MIMvista software rely on an operator-defined starting point near the center of the lesion [25,26]. As the operator drags the cursor out from the center of the lesion, six axes extend out, providing visual feedback for the starting point of gradient segmentation. Spatial gradients are calculated along each axis interactively, and the length of an axis is restricted when a large gradient is detected along that axis. The six axes define an ellipsoid that is then used as an initial bounding region for gradient detection. The MTV and SUVmax within the bounding region are automatically calculated.
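Once a lesion has been segmented, the two PET metrics reduce to simple arithmetic: SUVmax is the largest SUV among the segmented voxels, and MTV is the number of segmented voxels multiplied by the voxel volume. The sketch below illustrates only this final step and is not the MIMvista gradient algorithm. The function names and the toy data are hypothetical, and the voxel size is an assumption derived from the head and neck PET parameters reported in the protocol (30 cm FOV, 128 × 128 matrix, 3.3 mm slices).

```python
def voxel_volume_cc(fov_mm: float, matrix: int, slice_mm: float) -> float:
    """Volume of one voxel in cc, assuming square in-plane pixels."""
    pixel_mm = fov_mm / matrix
    return pixel_mm * pixel_mm * slice_mm / 1000.0  # mm^3 -> cc


def suvmax_and_mtv(suv, mask, voxel_cc):
    """Return (SUVmax, MTV in cc) for voxels where mask is truthy.

    suv and mask are equally shaped nested lists indexed [slice][row][col].
    """
    inside = [
        value
        for suv_slice, mask_slice in zip(suv, mask)
        for suv_row, mask_row in zip(suv_slice, mask_slice)
        for value, selected in zip(suv_row, mask_row)
        if selected
    ]
    if not inside:
        raise ValueError("segmentation mask is empty")
    return max(inside), len(inside) * voxel_cc


# Assumed voxel size from the reported acquisition parameters.
voxel_cc = voxel_volume_cc(fov_mm=300.0, matrix=128, slice_mm=3.3)

# Hypothetical 1-slice, 2 x 2 toy image and segmentation mask.
toy_suv = [[[1.0, 5.0], [8.5, 2.0]]]
toy_mask = [[[0, 1], [1, 0]]]
suv_max, mtv_cc = suvmax_and_mtv(toy_suv, toy_mask, voxel_cc)
```

With these assumptions each voxel is roughly 0.018 cc, so the median MTV of 7.2 cc would correspond to about 400 segmented voxels.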

IMRT treatment planning
Patients underwent CT simulation (Brilliance CT Big Bore, Philips Medical Systems, Cleveland, OH) in the supine position, immobilized with a custom thermoplastic mask. The radiation planning CT acquisition encompassed the vertex of the scalp to at least 5 cm below the clavicle using 2-3 mm slice thickness. Treatment planning was performed using the Philips Pinnacle³ software suite (version 6.0 to 8.0m, Philips Medical Systems, Fitchburg, WI). GTVs were contoured incorporating diagnostic CT, PET, and/or MR images. To aid GTV contouring, PET/CT images were fused using the Philips Pinnacle³ software suite prior to 2008 or MIMVista version 5.1.2 (MIMVista Corp., Cleveland, OH) after 2008. Structures on the planning CT contoured by the physician included GTV, clinical target volume (CTV), planning target volume (PTV), and organs at risk, including critical normal tissue organs adjacent to the target volumes. GTVs were manually contoured for IMRT by a single board-certified radiation oncologist and the volumes were then calculated by the software when generating dose volume histograms. No autosegmentation was used to create GTVs. Volumetric expansions from GTV to CTV were 7-15 mm (respecting normal tissue planes) followed by a 3-5 mm expansion to PTV. IMRT plans were designed with seven to ten 6 MV photon beams, using an inverse optimization algorithm with normalization such that 95 % of PTV was covered with the prescription dose (66-70 Gy), with the goal of no more than 1 % of PTV receiving less than 93 % of prescription dose, and no more than 1 % or 1 cc of the tissue outside the PTV receiving more than 110 % of prescription dose. Elective nodal areas and regions at risk for subclinical disease were treated to 54-60 Gy using a dose painting technique.
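The plan goals above can be expressed as three dose-volume checks. The following is a minimal sketch, assuming each structure is represented by a list of per-voxel doses with equal voxel volumes; the function names and sample doses are hypothetical, and only the percentage form of the hot-spot criterion is checked (the 1 cc alternative is omitted).

```python
def fraction(doses, predicate):
    """Fraction of voxels whose dose satisfies predicate."""
    return sum(1 for dose in doses if predicate(dose)) / len(doses)


def plan_meets_criteria(ptv_doses, outside_doses, prescription_gy):
    """Evaluate the three stated plan goals on per-voxel doses (Gy)."""
    covered = fraction(ptv_doses, lambda d: d >= prescription_gy)
    cold = fraction(ptv_doses, lambda d: d < 0.93 * prescription_gy)
    hot = fraction(outside_doses, lambda d: d > 1.10 * prescription_gy)
    return {
        "ptv_coverage_ok": covered >= 0.95,  # 95 % of PTV at prescription dose
        "ptv_cold_ok": cold <= 0.01,         # <= 1 % of PTV below 93 %
        "outside_hot_ok": hot <= 0.01,       # <= 1 % outside PTV above 110 %
    }


# Hypothetical dose samples for a 69.96 Gy prescription.
ptv = [70.0] * 96 + [68.0] * 4
outside = [40.0] * 200 + [80.0]
result = plan_meets_criteria(ptv, outside, prescription_gy=69.96)
```

In a planning system these fractions would be read from the dose volume histogram rather than recomputed from voxel lists.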

Treatment
All patients were treated with definitive IMRT. The GTV was treated to a median dose of 69.96 Gy (range 66.0-69.96 Gy), over a median of 33 fractions (range 32-33), and a median of 48 days (range 39-72 days). Concurrent chemotherapy was given to 36 (87.8 %) patients: 23 received cisplatin, 8 received carboplatin, and 5 received cetuximab. Of these 36 patients, 15 also received induction chemotherapy.

Follow-up
Patients were followed after the conclusion of treatment, continuing until analysis or patient death. PET/CT was used to assess clinical response in addition to clinical exam at 3 months as part of standard treatment care. Disease recurrence was defined as the first site of failure including local failure, nodal failure or distant failure. All failures were confirmed by biopsy.

Statistical analysis
The statistical endpoints analyzed in this study were local control (LC), nodal control (NC), distant control (DC), overall control (OC), overall survival (OS), and disease-free survival (DFS), measured from the end of IMRT to the date of event, censoring patients at last follow-up or death. For OC, the event was the occurrence of first local, nodal, or distant relapse. For OS, the event was death due to any cause; for DFS, the event was death or disease relapse at any time after the end of IMRT.
The Kaplan-Meier product-limit method was used to estimate the probabilities of tumor control and survival rates at 2 years irrespective of follow-up length [27]. The comparison of survival rates among the groups was done using the two-tailed log rank test. A probability value of less than 0.05 was considered statistically significant. All other statistical computations were performed on the SAS 9.1 system (SAS Institute, Cary, NC).
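For readers unfamiliar with the product-limit method, the estimate at time t is the product, over all event times up to t, of (1 - d/n), where d is the number of events at that time and n is the number of patients still at risk. The sketch below is an illustrative pure-Python version, not the SAS procedure used in the study; the function names and follow-up data are hypothetical, and the log rank test is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times:  follow-up times (e.g. months from end of IMRT)
    events: 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def survival_at(curve, t):
    """Step-function lookup of the estimate at time t."""
    estimate = 1.0
    for time, survival in curve:
        if time > t:
            break
        estimate = survival
    return estimate


# Hypothetical follow-up data in months; 1 = event, 0 = censored.
months = [3, 5, 5, 8, 12, 30]
status = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(months, status)
two_year_rate = survival_at(curve, 24)
```

Because censored patients leave the risk set without triggering a step, the 2-year estimate can be read off the curve regardless of how long each patient was followed.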

Patients and tumor characteristics
Non-white patients comprised 61 % of the patient cohort, with 71 % of patients presenting with stage IV disease. The overall median follow-up was 24.2 months (range 2.7-56.3 months) and 27.1 months (range 4.0-56.3 months) among surviving patients. Complete patient and tumor characteristics are described in Table 1. For the patient cohort, the median SUVmax of the primary tumor was 15.8 (range 4.5-33.8), the median GTV was 22.2 cc (range 1.5-162.5 cc), and the median MTV was 7.2 cc (range 0.40-43.5 cc). Overall and sub-site specific PET/CT and tumor volume characteristics are described in Table 2.

Disease control and patterns of failure
Local, nodal, and distant recurrences occurred in 10, 5, and 7 patients, respectively, with a median time to recurrence of 2.4 months. The median time to local, nodal, and distant failure was 2.9, 2.2, and 2.2 months, respectively. The estimated 2-year actuarial LC rate was 77.7 %, the NC rate was 87.7 %, and the DC rate was 82.0 %. The estimated actuarial DFS and OS rates at 2 years were 67.6 and 79.8 %, respectively (Table 3).

Correlating T category, AJCC stage with SUV parameters and tumor volume
There was a significant correlation between GTV and MTV, with a Pearson correlation coefficient of 0.53 (p < 0.0004). Smaller GTV (<22.2 cc) was associated with lower MTV (4.0 vs. 18.1 cc, p < 0.001). A significant association was also found between tumor volume measurements and SUV parameters, with larger tumor volume associated with greater SUVmax (Table 4). AJCC stage correlated with SUVmax, but not GTV and MTV, although the number of patients with stage I, II, and III disease was small. Compared to patients with AJCC stage I-III disease, stage IV disease patients had higher values of SUVmax (11.9 vs. 17.7, p = 0.013), larger GTV (19.8 vs. 38.6 cc, p = 0.109), and larger MTV (4.1 vs. 14.2 cc, p = 0.012). A non-significant trend was noted for SUVmax, GTV, and MTV with increasing T stage (Table 4).
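The Pearson coefficient quoted above measures the strength of the linear relationship between paired GTV and MTV values. A minimal sketch of the computation follows; the function name is hypothetical and the paired volumes are invented illustrative values, not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired volumes in cc (illustrative only).
gtv_cc = [5.0, 12.0, 22.2, 40.0, 90.0]
mtv_cc = [1.0, 4.0, 7.2, 9.0, 30.0]
r = pearson_r(gtv_cc, mtv_cc)
```

A coefficient near 1 indicates that the two volumes rise together almost linearly; the study's value of 0.53 indicates a moderate positive association.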

Discussion
This study demonstrates that both GTV and MTV are superior prognostic radiological biomarkers of treatment outcome and survival for HNSCC patients undergoing definitive IMRT as compared to SUVmax. Improved local, nodal, and overall control rates were seen in patients with smaller GTV and MTV. SUVmax was found to correlate significantly with AJCC stage, GTV, and MTV, although in this study it was not found to be prognostic for outcome.
Schwartz et al. [28] evaluated 54 patients with HNSCC undergoing definitive RT, including postoperative patients with or without concurrent chemotherapy, and reported that an SUV of greater than 9, the median, significantly correlated with inferior local control and disease-free survival. On univariate and multivariate analyses these data remained significant or borderline significant. Similarly, Machtay et al. [14] reported in a cohort of 60 HNSCC patients, treated with definitive radiotherapy with or without concurrent chemotherapy, that an SUVmax < 9 (the median SUVmax of the study was 7.2) was associated with improved 2-year DFS of 72 versus 37 % (p = 0.007). Torizuka et al. [15] reported in 50 consecutive HNSCC patients who underwent definitive RT with or without chemotherapy, or surgery with or without postoperative RT, that an SUVmax of ≤7 significantly predicted higher rates of 2-year local control and disease-free survival. When adjusted for age and nodal stage these findings remained significant. However, the median SUVmax for the cohort was 10.53, and they did not identify how an SUVmax of 7 was selected as the optimal cut point. Limitations of comparing SUV as a radiological biomarker between studies include the use of different SUV cutoff values, which may be influenced by multiple factors including patient selection, differences in imaging technique, injected FDG dose, incubation period, protocol, scanner, and reconstruction algorithm variation [29-31].
Our study confirms the findings of Strongin et al. [22], who reported a series of 78 patients with stage III-IV oropharyngeal, laryngeal, or hypopharyngeal cancer, in which patients with locoregional failure had greater tumor volumes than patients free of disease (58 vs. 36.5 cc, p = 0.028), and those with a GTV < 35 cc had significantly improved overall control (71 vs. 41 %), progression-free survival (61 vs. 33 %), and overall survival (84 vs. 41 %) rates. Chen et al. [6] demonstrated that, in patients with nasopharyngeal cancer who underwent definitive RT, a primary tumor volume of greater than 60 cc is superior to the AJCC and the TNM classification system when correlated with survival rates. Similarly, Studer et al. [4] reported that a GTV-based staging system was superior to TNM and
The current study demonstrated that the clinical GTV, based on PET/CT and clinical examination, appears to be prognostic, as it correlates with control and survival in HNSCC patients who were treated with definitive IMRT with and without induction and/or concurrent chemotherapy regimens. Given the expanding interest in metabolic and volumetric-based indices, we evaluated the prognostic utility of MTV, defined as the volume of tumor with FDG avidity. Chung et al. [12] reported on 64 patients with pharyngeal cancer undergoing definitive radiation therapy with or without concomitant chemotherapy. Patients with an MTV greater than 40 cc, a statistically optimized cut point, had significantly worse disease-free survival than those with MTV ≤ 40 cc (HR 3.42, p = 0.04); MTV was defined using a raw SUV cutoff of 2.5 within a radiologist-contoured margin of the primary tumor and areas of nodal disease. La et al. [13] recently demonstrated the predictive value of MTV in patients with head and neck cancer undergoing chemoradiation. MTV was defined by three-dimensional autosegmentation of the volume with 50 % or greater of SUVmax using custom software on pretreatment PET scans [13]. An increase in MTV of 17.4 cc or greater correlated with recurrence or death. MTVs correlated with the GTV with a correlation coefficient of 0.73, but consistently underestimated GTV, a finding confirmed in our study [13]. Similar to our findings, these studies failed to demonstrate a correlation between SUVmax and DFS or OS.
Given the availability of autosegmentation algorithms, MTV has the potential to become a standardized prognostic metric. We suspect that interest in the standardization of MTV will continue to grow as new algorithms are developed, but it is critical to understand the current limitations of MTV as a metric, including the lack of a standardized SUV threshold and the resulting variability in SUV cutoffs, the lack of true correlation with anatomic structures, and the need for validation of autosegmentation software. The impact of these limitations was clearly demonstrated by Ford et al. [32], who reported that a 5 % change in threshold contour can translate into a 200 % increase in contour volume, resulting in a significant dosimetric effect. Furthermore, MTV has significant limitations in defining target volumes for treatment planning, and using the MTV alone for treatment planning purposes could risk marginal treatment failure, as it may underestimate the tumor volume. In contrast, GTV integrates multiple sources of information, including radiological and clinical examination findings.
In summary, GTV and MTV demonstrate superior prognostic utility as compared to SUVmax, as larger tumor volumes are associated with significantly inferior control and survival in HNSCC patients treated with definitive IMRT.