Repeatability of hypoxia PET imaging using [18F]HX4 in lung and head and neck cancer patients: a prospective multicenter trial

Purpose Hypoxia is an important factor influencing tumor progression and treatment efficacy. The aim of this study was to investigate the repeatability of hypoxia PET imaging with [18F]HX4 in patients with head and neck and lung cancer.
Methods Nine patients with lung cancer and ten with head and neck cancer were included in the analysis (NCT01075399). Two sequential pretreatment [18F]HX4 PET/CT scans were acquired within 1 week. The maximal and mean standardized uptake values (SUVmax and SUVmean) were defined and the tumor-to-background ratios (TBR) were calculated. In addition, hypoxic volumes were determined as the volume of the tumor with a TBR >1.2 (HV1.2). Bland-Altman analysis of the uptake parameters was performed and coefficients of repeatability were calculated. To evaluate the spatial repeatability of the uptake, the PET/CT images were registered and a voxel-wise comparison of the uptake was performed, providing a correlation coefficient.
Results All parameters of [18F]HX4 uptake were significantly correlated between scans: SUVmax (r = 0.958, p < 0.001), SUVmean (r = 0.946, p < 0.001), TBRmax (r = 0.962, p < 0.001) and HV1.2 (r = 0.995, p < 0.001). The relative coefficients of repeatability were 15 % (SUVmean), 17 % (SUVmax) and 17 % (TBRmax). Voxel-wise analysis of the spatial uptake pattern within the tumors provided an average correlation of 0.65 ± 0.14.
Conclusion Repeated hypoxia PET scans with [18F]HX4 provide reproducible and spatially stable results in patients with head and neck cancer and patients with lung cancer. [18F]HX4 PET imaging can be used to assess the hypoxic status of tumors and has the potential to aid hypoxia-targeted treatments.
Electronic supplementary material The online version of this article (doi:10.1007/s00259-015-3100-z) contains supplementary material, which is available to authorized users.


Introduction
[18F]HX4 is a new 2-nitroimidazole PET imaging agent for hypoxia, in which structure-activity relationships have been used to optimize pharmacokinetic and clearance properties [1,2]. Tumor hypoxia is a condition in which insufficiently vascularized tumor cells deprived of oxygen not only become more aggressive and malignant, but also more resistant to treatment by radiation and chemotherapy [3][4][5]. The presence of hypoxia is therefore generally considered a poor prognostic disease marker in cancer patients [6]. However, it is difficult to measure oxygen levels reproducibly and noninvasively in a highly heterogeneous tumor environment. Reliable diagnostic methods to detect and quantify tumor hypoxia are therefore needed. It has been hypothesized, and is currently being investigated, that inclusion of hypoxic cell sensitizers during treatment, i.e., the delivery of higher radiotherapy doses to hypoxic regions [7] or the use of hypoxia-targeting therapy [8][9][10][11], might improve the outcome in patients with hypoxic tumors [12]. [18F]HX4 has the potential to serve as a clinically useful diagnostic tool to aid the use of hypoxia-targeting therapies in those patients who will most likely benefit from them [13,14].
This pilot phase 2 study was primarily designed as a test-retest study to investigate the repeatability of [18F]HX4 as a noninvasive PET imaging marker for detection of hypoxic tumor regions. Here we present the results in patients with lung cancer and patients with head and neck (H&N) cancer.

Patients
This multicenter study (NCT01075399) was conducted in accordance with the ethical principles of Good Clinical Practice, according to the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). Both the FDA and the institutional review boards of the participating institutions approved the study protocol and the informed consent form. All participants reviewed and signed the informed consent form before study entry. [18F]HX4 PET/CT images were acquired in 19 patients, 9 with lung cancer and 10 with H&N cancer. The patients underwent two sequential pretreatment [18F]HX4 PET/CT scans within 1 week to assess repeatability. Patient characteristics are presented in Table 1.

Scanners and technical parameters
[18F]HX4 PET/CT scans were performed using high-resolution full-ring PET/CT scanners, including a GE Discovery, a GE Discovery LS, a Philips Gemini, and a Siemens Biograph PET/CT scanner. Images were reconstructed using scanner-specific parameters in accordance with each facility's standard procedure, including at least attenuation and scatter correction. Repeat scans were performed on the same PET/CT scanner using the same protocol and patient positioning, without respiratory gating.

Image evaluation of [18F]HX4
[18F]HX4 PET/CT scans were analyzed using an Inveon Research Workplace (Edition 4.0.0.3; Siemens, Germany). Gross tumor volumes (GTV) of the primary lesion or largest lymph node were defined in cubic centimeters by manual contouring of the tumor on the CT images by one observer (D.C.). These tumor delineations were applied to the PET images and the maximal and mean standardized uptake values (SUVmax, SUVmean) were measured in grams per milliliter. Under the assumption of water density, the SUV is reported as unitless. For each patient, the reference tissue was defined by contouring a volume of interest (VOI; sphere of radius 25 mm) in a large (thigh) muscle on the CT image. From this muscle VOI the SUVmean (M) was determined. Tumor-to-background ratios (TBR) were calculated by dividing tumor SUVmax and tumor SUVmean by muscle SUVmean (M):
$$\mathrm{TBR}_{\max} = \frac{\mathrm{SUV}_{\max}}{M}, \qquad \mathrm{TBR}_{\mathrm{mean}} = \frac{\mathrm{SUV}_{\mathrm{mean}}}{M}$$
Hypoxic volumes (HV) were determined as the volume of the tumor with a TBR above a threshold of 1.2 (HV1.2) or 1.4 (HV1.4). The fraction of HV (FHV, percent) of each tumor was determined by dividing the HV by its respective GTV:
$$\mathrm{FHV} = \frac{\mathrm{HV}}{\mathrm{GTV}} \times 100\,\%$$
To evaluate the repeatability of the heterogeneous uptake pattern, the second [18F]HX4 PET/CT scans were rigidly registered to the first scans and inspected for accurate registration, and a voxel-wise comparison of the SUVs within the GTV was performed.
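The tumor-level parameters described in this section (SUVmax, SUVmean, TBR, HV and FHV) can be illustrated with a minimal Python sketch. It is not the software used in the study: the function name `hypoxia_parameters`, the array layout, and the toy numbers are assumptions made for illustration, and the muscle SUVmean (M) is passed in as a precomputed value.

```python
import numpy as np

def hypoxia_parameters(suv, gtv_mask, muscle_suv_mean, voxel_volume_cm3, threshold=1.2):
    """Tumor-level uptake parameters from a SUV image and a boolean GTV mask.

    Hypothetical helper for illustration; `threshold` is the TBR cut-off
    (1.2 or 1.4 in the text) used to define the hypoxic volume.
    """
    tumor = suv[gtv_mask]                    # SUVs of the voxels inside the GTV
    tbr = tumor / muscle_suv_mean            # voxel-wise tumor-to-background ratio
    gtv_cm3 = tumor.size * voxel_volume_cm3  # GTV as voxel count times voxel volume
    hv_cm3 = np.count_nonzero(tbr > threshold) * voxel_volume_cm3
    return {
        "SUVmax": float(tumor.max()),
        "SUVmean": float(tumor.mean()),
        "TBRmax": float(tumor.max() / muscle_suv_mean),
        "TBRmean": float(tumor.mean() / muscle_suv_mean),
        "GTV_cm3": float(gtv_cm3),
        "HV_cm3": float(hv_cm3),
        "FHV_percent": float(100.0 * hv_cm3 / gtv_cm3),
    }

# Toy example with made-up numbers (not study data).
suv = np.array([[0.8, 1.0], [1.3, 1.5]])
mask = np.ones_like(suv, dtype=bool)
params = hypoxia_parameters(suv, mask, muscle_suv_mean=1.0, voxel_volume_cm3=0.5)
```

In this toy case two of the four voxels exceed a TBR of 1.2, so the hypoxic volume is half of the GTV. Calling the function again with `threshold=1.4` gives the stricter HV1.4 definition.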

Statistics
For all parameters, the mean ± SD is reported. The relationships among GTV-based parameters (SUVmean, SUVmax, TBR, HV, FHV) extracted from repeat [18F]HX4 PET images were analyzed by calculating Pearson correlation coefficients. A p value <0.05 was considered statistically significant. In addition, a Bland-Altman analysis was performed for all parameters, providing the mean difference of each parameter and the absolute and relative coefficients of repeatability (CR = 1.96 × SD), defined as the value below which the absolute difference between two measurements will lie with 95 % probability. To evaluate the voxel-wise analysis, a linear fit of the data was performed, providing the correlation coefficient and slope. A Bland-Altman plot was created providing the difference in uptake for each matching voxel (ΔSUV) with the lower and upper 95 % limits of agreement. In addition, a histogram of SUVs within the GTV was prepared.
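The test-retest statistics described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code. It assumes that the SD in CR = 1.96 × SD is the sample standard deviation of the paired differences, and that the relative CR is the absolute CR divided by the mean of all measurements; the text does not state the normalization, so the latter is an assumption. The function name `bland_altman` and the example values are hypothetical.

```python
import numpy as np

def bland_altman(scan1, scan2):
    """Test-retest summary for one parameter measured in two scans.

    Assumptions (not confirmed by the text): SD is the sample SD of the
    paired differences; relative CR is normalized by the grand mean.
    """
    scan1 = np.asarray(scan1, dtype=float)
    scan2 = np.asarray(scan2, dtype=float)
    diff = scan1 - scan2
    mean_diff = diff.mean()
    cr = 1.96 * diff.std(ddof=1)                 # absolute coefficient of repeatability
    grand_mean = np.mean((scan1 + scan2) / 2.0)  # mean of all paired averages
    return {
        "mean_difference": float(mean_diff),
        "CR_absolute": float(cr),
        "CR_relative_percent": float(100.0 * cr / grand_mean),
        "limits_of_agreement": (float(mean_diff - cr), float(mean_diff + cr)),
        "pearson_r": float(np.corrcoef(scan1, scan2)[0, 1]),
    }

# Made-up SUVmax values for four lesions (not study data).
first = [1.0, 2.0, 3.0, 4.0]
second = [1.1, 1.9, 3.2, 3.8]
stats = bland_altman(first, second)
```

The correlation coefficient and the CR answer different questions: a high Pearson r only shows that the two scans rank lesions similarly, whereas the CR bounds the size of the scan-to-scan difference, which is why both are reported.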

Results
[18F]HX4 PET/CT images from nine patients with lung cancer and ten with H&N cancer were included in the analysis. Two sequential baseline [18F]HX4 PET/CT scans were performed at an average interval of 1.1 days (range 1–2 days) in patients with lung cancer and 2.1 days (range 1–6 days) in patients with H&N cancer. Uptake parameters for both scans are listed in Table 2. The uptake parameters from the first and second scans were highly correlated: r = 0.958 for SUVmax (p < 0.001, Fig. 1), and r = 0.946 for SUVmean (p < 0.001, Supplementary figure). High correlations between scans were also seen within each subgroup of cancer patients: r = 0.972 for SUVmax (p < 0.001) and r = 0.960 for SUVmean (p < 0.001) in those with lung cancer, and r = 0.945 for SUVmax (p < 0.001) and r = 0.952 for SUVmean (p < 0.001) in those with H&N cancer. In the Bland-Altman analysis, SUVmax showed a mean difference of 0.02 with an absolute CR of 0.29 and a repeatability percentage of 17 % (Fig. 1), and SUVmean showed a mean difference of 0.01 with an absolute CR of 0.18 and a repeatability percentage of 15 %.
High correlations were also seen for TBRmax (r = 0.962, p < 0.001; Fig. 1) and TBRmean (r = 0.965, p < 0.001). High correlations were also seen within each subgroup of cancer patients: r = 0.939 for TBRmax (p < 0.001) and r = 0.972 for TBRmean (p < 0.001) in those with lung cancer, and similarly r = 0.972 for TBRmax (p < 0.001) and r = 0.964 for TBRmean (p < 0.001) in those with H&N cancer. In the Bland-Altman analysis, TBRmax showed a mean difference of −0.01 with an absolute CR of 0.30 and a repeatability percentage of 17 % (Fig. 1), and TBRmean showed a mean difference of −0.01 with an absolute CR of 0.11 and a repeatability percentage of 10 %.

HV and FHV analysis
The average tumor volume was 70 cm³ (range 2.6–361 cm³). The average HV1.2 in the first scan was 32 cm³ (range 0–211 cm³) and in the second scan was 34 cm³ (range 0–204 cm³; Table 2). For HV1.2, there was a high correlation between the first and second scans (r = 0.995, p < 0.001; Supplementary figure) which was retained in each subgroup of cancer patients: r = 0.997 (p < 0.001) in those with lung cancer and r = 0.998 (p < 0.001) in those with H&N cancer. In the Bland-Altman analysis, HV1.2 showed a mean difference of −1.55 cm³ with an absolute CR of 13.5 cm³ (Supplementary figure).
Applying the higher threshold of 1.4 times the background, in the first scan the average HV1.4 was 19 cm³ (range 0–175 cm³) and in the second scan was 19 cm³ (range 0–162 cm³; Supplementary table). For HV1.4, there was also a consistently high correlation between the first and second scans (r = 0.982, p < 0.001) which was retained in each subgroup of cancer patients: r = 0.959 (p < 0.001) in those with lung cancer and r = 0.999 (p < 0.001) in those with H&N cancer. In the Bland-Altman analysis, HV1.4 showed a mean difference of 0.08 cm³ with a confidence interval of −17.2 to 17.4 cm³.
There was a wide range of FHV1.2 due to varying levels of hypoxia among the tumors. In the first scan the average FHV1.2 was 20 ± 25 % (range 0–85 %) and in the second scan the average FHV1.2 was 23 ± 26 % (range 0–80 %; Table 2). This was also seen when the higher threshold of 1.4 times the background was applied: in the first scan the average FHV1.4 was 9 ± 18 % (range 0–71 %) and in the second scan the average FHV1.4 was 10 ± 17 % (range 0–63 %; Supplementary table).
For FHV1.2, there was a high correlation between the first and second scans (r = 0.957, p < 0.001) which was retained in each subgroup of cancer patients: r = 0.966 (p < 0.001) in those with lung cancer and r = 0.950 (p < 0.001) in those with H&N cancer. For FHV1.4, there was also a high correlation between the first and second scans (r = 0.975, p < 0.001) which was retained in each subgroup of cancer patients: r = 0.963 (p < 0.001) in those with lung cancer and r = 0.985 (p < 0.001) in those with H&N cancer. In the Bland-Altman analysis, FHV1.2 showed a mean difference of −3.1 % with an absolute CR of 14.9 %, and FHV1.4 showed a mean difference of −0.9 % and an absolute CR of 7.8 %.
Using 1.2 times the background as the threshold to determine FHV, 79 % of the tumors (15/19) were found to have some level of hypoxia but when the higher threshold of 1.4 times the background was applied to determine FHV, only 47 % of the tumors (9/19) were characterized as having hypoxia.

Repeatability of the spatial uptake pattern
An example of voxel-wise image analysis in a patient with head and neck cancer (patient 12) is shown in Fig. 2. Comparison of the heterogeneous uptake within the GTV between the first and second [18F]HX4 PET scans showed a moderate to strong correlation in the majority of patients, with an average correlation coefficient of 0.65 ± 0.14. There were two exceptions (patients 14 and 16) in whom a poor correlation was observed (R = 0.38 and 0.39). The average slope and intercept of the linear fit of the data were 0.56 ± 0.17 and 0.47 ± 0.19, respectively. The Bland-Altman analysis showed an average ΔSUV of 0.02 ± 0.06, with lower and upper limits of agreement of −0.15 ± 0.09 and 0.19 ± 0.08, respectively. Examples of voxel-wise image analysis in patients with lung cancer (patients 1 and 4) are shown in Fig. 3. In addition, the results for each patient are shown in Table 3.
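The voxel-wise comparison summarized above (correlation coefficient, slope and intercept of a linear fit, and Bland-Altman limits for ΔSUV) can be sketched as follows. The sketch assumes the second scan has already been rigidly registered and resampled onto the grid of the first, and it takes ΔSUV as scan 2 minus scan 1, which the text does not specify. The function name `voxelwise_repeatability` and the synthetic data are illustrative assumptions, not the study's pipeline or data.

```python
import numpy as np

def voxelwise_repeatability(suv_scan1, suv_scan2_registered, gtv_mask):
    """Voxel-wise agreement of two co-registered SUV images inside the GTV.

    Assumes both arrays share the same grid; ΔSUV is taken as
    scan 2 minus scan 1 (sign convention assumed for illustration).
    """
    x = suv_scan1[gtv_mask].astype(float)
    y = suv_scan2_registered[gtv_mask].astype(float)
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line y = slope*x + intercept
    r = np.corrcoef(x, y)[0, 1]                 # Pearson correlation over matching voxels
    delta = y - x
    half_width = 1.96 * delta.std(ddof=1)
    return {
        "r": float(r),
        "slope": float(slope),
        "intercept": float(intercept),
        "mean_delta_suv": float(delta.mean()),
        "limits_of_agreement": (float(delta.mean() - half_width),
                                float(delta.mean() + half_width)),
    }

# Synthetic volumes (not study data): scan 2 is a noisy linear function of scan 1.
rng = np.random.default_rng(0)
scan1 = rng.uniform(0.5, 2.0, size=(8, 8, 8))
scan2 = 0.6 * scan1 + 0.5 + rng.normal(0.0, 0.05, size=scan1.shape)
mask = np.ones(scan1.shape, dtype=bool)
result = voxelwise_repeatability(scan1, scan2, mask)
```

A slope below 1 together with a positive intercept, as in the reported averages, indicates that voxels with high uptake in the first scan tend to be somewhat lower in the second and vice versa; noise and residual misregistration are plausible contributors to such a pattern.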

Discussion
The aim of this study was to investigate the repeatability of [18F]HX4 as a noninvasive PET imaging marker for the detection of tumor hypoxia in patients with lung cancer and patients with H&N cancer. Tumor hypoxia is known to be a dynamic process characterized by the presence of acute and chronic hypoxia. Acute hypoxia is usually the result of a blockage or disruption in the perfusion of the tumor, while chronic hypoxia is mainly caused by limitations of oxygen diffusion due to an inefficient blood vessel network which results in larger distances between the blood vessels and tumor tissue. Static PET imaging will show only the hypoxic status at one specific time-point and contain information about both acute and chronic hypoxia. To be able to select patients for treatment with antihypoxia therapy and/or for a hypoxia-based radiotherapy dose redistribution, it is important to gain an insight into the day-to-day variability in tumor hypoxia and its spatial location. Therefore we compared [18F]HX4 … H&N cancer lesions (1.2 ± 0.3).
Fig. 1 Correlation and Bland-Altman plots (including 95 % confidence intervals) of the image parameters SUVmax and TBRmax
There is no standardized method to define tumor hypoxia on PET images. The threshold value for defining tumor hypoxia is dependent on the tracer, tracer pharmacokinetics, and other imaging parameters [16]. In a previous study [16], we showed that PET imaging using a threshold of 1.2 times background at 2 h after injection provides a similar FHV and hypoxic lesion detection rate to imaging using a threshold of 1.4 times background at 4 h after injection. In the current analysis, we included both thresholds to quantify the HV. First we defined the threshold as an uptake above 1.2 times the background level. In this case 89 % (8/9) of the patients with lung cancer and 70 % (7/10) of those with H&N cancer had a hypoxic tumor volume. 
These percentages are in agreement with previously published results showing, for example, hypoxia in 72 % of patients with non-small-cell lung cancer [16] and in 84 % of those with H&N cancer [17]. Increasing the threshold to 1.4 times background level resulted in decreases in the proportions of hypoxic lesions detected to 67 % of lung cancer lesions (6/9) and 30 % of H&N cancer lesions (3/10).
At the tumor level we observed a high correlation for the frequently used parameters to quantify tumor hypoxia (SUVmax, SUVmean, TBR, HV and FHV). This is in agreement with the results of a study by Okamoto et al. [18] who evaluated the reproducibility of the hypoxia PET tracer [18F]FMISO in patients with H&N cancer. They found a high correlation for SUVmax, TBR and HV. However, these results do not agree with the previous results of Nehmeh et al. [19] who found a considerable variability in intratumoral uptake between repeat [18F]FMISO PET scans. The reproducibility of the hypoxia PET tracer [18F]FAZA was evaluated by Busk et al. [20] in a mouse model and showed good reproducibility. In comparison to [18F]FDG PET/CT imaging, our observed repeatability percentages (SUVmax 17 % and SUVmean 15 %) are smaller than the relative differences required to exceed test-retest variability, which should be larger than 25 % for SUVmax and 20 % for SUVmean [21]. Since [18F]HX4 has a lower uptake than [18F]FDG, results from comparisons of the two tracers should be interpreted with caution. However, comparing our relative coefficients of repeatability with the results of the low uptake [18F]FDG measurements (Fig. 1c … can be used to reliably detect and quantify tumor hypoxia. This is essential for the use of hypoxia PET imaging as a predictor of treatment response or for monitoring changes in hypoxia during treatment. The detection of hypoxia using [18F]HX4 PET/CT at the tumor level could therefore be used to identify patients who might benefit from hypoxia-targeted treatment [22].
To evaluate the stability of the heterogeneous uptake pattern of [18F]HX4, a voxel-wise comparison was performed. This analysis showed reproducible results (R > 0.5) in the majority of patients (17 out of 19) with lung cancer or H&N cancer. The observed repeatability is in agreement with previous results of Peeters et al. [23] showing high repeatability of [18F]HX4 uptake in a rat rhabdomyosarcoma model. Repeatability studies using the alternative hypoxia tracer [18F]FMISO have shown contradictory results: Okamoto et al. [18] and Bittner et al. [24] found good repeatability, while Nehmeh et al. [19] observed variability in spatial uptake. For the hypoxia tracer [18F]FAZA, repeated PET/CT imaging was performed during the course of radiotherapy. While Mortensen et al. [25] found a stable location of the HV during treatment, Servagi-Vernat et al. [26] found a spatial shift in the HV. The spatial reproducibility of tumor hypoxia, as measured by a hypoxia PET tracer, is essential for hypoxia PET-based radiotherapy planning. Three-dimensional information on the hypoxic areas within the tumor can be used to tailor radiotherapy treatment to give a higher radiation dose to hypoxic subvolumes [27]. In this study, [18F]HX4 PET/CT imaging was able to identify stable hypoxic areas in the majority of patients. Therefore, this imaging technique could potentially enable the reliable treatment of hypoxic areas with an increased radiotherapy dose. Several studies have already shown that it is feasible to perform radiotherapy dose planning based on hypoxia PET images [12,28,29].
There were some limitations to this study. First, patients with very heterogeneous disease were included. These tumors have a different histology and might therefore express a different phenotype regarding acute versus chronic tumor hypoxia, which could possibly affect the reproducibility of tracer uptake. Nevertheless, even in this heterogeneous population, a high repeatability in [18F]HX4 PET/CT uptake was observed. Second, the study design was multicentric; therefore different PET/CT scanners were used with different physical characteristics and different acquisition protocols. Differences in resolution among the scanners might have led to differences in the tumor hypoxia detection rates. In general, we expect a partial volume effect with all scanners; particularly in small lesions, in lesions with low uptake and in lesions with a small HV, this would cause larger differences in absolute uptake measurements. Also, breathing motion in the patients with lung cancer could have caused blurring of the PET signal. The differences in acquisition protocol, i.e., acquisition time per bed position and uptake period, will lead to differences in the observed signal-to-noise ratios, and TBR and SUV measurements [16,30]. Nevertheless, since we used each patient as his or her own control, the partial volume effect and the effect of different scanners should have had only a minor influence on the repeatability results. Third, the [18F]HX4 PET scans were on average acquired at 99 min after injection, with a maximal difference in the time from injection to acquisition of 27 min. Studies reported after this study was completed have shown that the contrast between tumor and background increases up to 4 h after injection. Therefore, the image contrast might have been suboptimal and the differences in uptake parameters observed might have been due to differences in the time from injection to acquisition [30].
In conclusion, repeated PET imaging with the hypoxia tracer [18F]HX4 provides reliable and reproducible results regarding the (spatial) uptake in patients with head and neck and lung cancer. [18F]HX4 has the potential to quantify hypoxia in tumors and aid hypoxia-targeted treatments.