Background

Non-small cell lung cancer (NSCLC) remains the leading cause of cancer-related mortality worldwide [1]. Tumour-cell hypoxia, a common feature of solid tumours, is a pivotal determinant of the effectiveness of radiation, chemical, and immune therapies and is associated with poor overall survival [2, 3]. The hypoxic microenvironment can be assessed by a variety of approaches, e.g. by measurement of partial pressure of oxygen with polarographic electrodes [4] or by immunohistochemical detection of endogenous and exogenous hypoxia markers [5]. However, such procedures are invasive and potentially hazardous, restricted to accessible lesions, and limited by sampling errors.

18F-fluoromisonidazole (FMISO) positron emission tomography (PET), a non-invasive imaging technique, presents an attractive alternative [6–8]. FMISO is clinically the most extensively investigated hypoxia PET tracer. Several studies in lung cancer patients have suggested stratification strategies based on FMISO uptake and kinetics [9–12]. The hypothesis is that selective dose painting of putative radioresistant hypoxic tumour sub-volumes, as defined by FMISO PET, may improve locoregional control [13]. Numerous efforts also continue to evaluate 18F-fluorodeoxyglucose (FDG) PET for target delineation in radiotherapy [14, 15] and explore the utility of intensity-modulated radiotherapy (IMRT) based on FDG voxel intensities [16, 17]. However, despite a number of ongoing hypoxia-imaging trials, quantitative assessment of intratumour distribution of hypoxia-specific PET tracers has yet to be widely implemented in the clinical decision-making process.

In order to fully exploit the benefits of incorporating tumour hypoxia information as obtained by FMISO PET into patient management, whether as an IMRT target, in patient stratification strategies for hypoxia-targeting regimens, or for monitoring response to therapeutic interventions, it is essential to examine the spatiotemporal reproducibility of FMISO intratumour distribution. To our knowledge, such studies have been performed only in head and neck cancer (HNC) patients, with discordant results [18, 19]. Due to the absence of similar studies in other tumour entities, e.g. lung cancer, it remains unclear to what extent the reproducibility of FMISO will be affected by the lack of rigid immobilization and the presence of respiratory motion. Therefore, the aim of this pilot study was to investigate the reproducibility of FMISO intratumoural distribution in serial baseline FMISO PET scans in a cohort of NSCLC patients.

Methods

Ethics statement

This study was approved by Memorial Sloan Kettering Cancer Center’s Institutional Review Board (IRB 13-186; registered under www.clinicaltrials.gov identifier number NCT02016872), and all subjects signed a written informed consent regarding the examination and use of anonymized data for research and publication purposes. The methods were carried out in accordance with the approved guidelines.

Patient characteristics

The eligibility criteria were as follows: age > 18 years, pathological confirmation of NSCLC, no prior treatment, primary or nodal tumour measuring ≥2 cm on CT, and a Karnofsky performance status of ≥70 %. Exclusion criteria were pregnancy, breast-feeding, and severe diabetes (fasting blood glucose >200 mg/dl). Fifteen patients agreed to participate in the study. Patients were scanned on a flat-top couch insert and immobilized in an alpha cradle (Smithers Medical Products, Inc.). Because the second FMISO PET scan was not acquired for five patients, owing to their inability to continue (n = 3) or technical reasons (n = 2), ten patients in total were included in the reproducibility analysis (Table 1).

Table 1 Patient characteristics

18F-fluorodeoxyglucose PET/CT protocol

Each patient underwent an FDG PET/CT scan for radiotherapy simulation purposes. Patients were injected intravenously with 460 ± 17 MBq (range, 429–477 MBq) of FDG, after a fasting period of ≥6 h. PET scans were acquired for 3 min/bed position, at 100 ± 38 min (range, 60–171 min) post-injection (pi). All PET data were acquired in 3D mode on a General Electric Discovery ST PET/CT (GE Health Care Inc.). A CT acquired in cine mode (140 kVp, 10 mA, 5.0-mm slice thickness, 0.5-s tube rotation) was averaged (CTavg) and used for attenuation correction of PET images. The cine duration was set to match the patient breathing period plus 1 s (~6 s on average). PET emission data were corrected for attenuation, scatter, and random events and reconstructed into a 128 × 128 × 47 matrix (voxel dimensions 5.47 × 5.47 × 3.27 mm). The reconstruction was performed using the GE ordered subset expectation maximization (OSEM) algorithm with standard clinical reconstruction parameters: 2 iterations, 16 subsets, and a 6.0-mm full width at half maximum Gaussian post-filter.

18F-fluoromisonidazole PET/CT protocol

Ten patients underwent two FMISO PET studies each (i.e. FMISO1 and FMISO2). FMISO1 was performed 2.4 ± 1.4 days (range, 1–5 days) after the FDG PET/CT, with FMISO2 being performed 1.7 ± 1.6 days (range, 1–6 days) after FMISO1. Patients received an average FMISO bolus injection of 388 ± 15 MBq (range, 356–407 MBq). Data were acquired for 10 min over one field of view (~15 cm; centred on the lesions) at 163 ± 13 min pi (range, 144–183 min). A low-dose cine CT scan (the same parameters as for the FDG study) was performed and used for attenuation correction and image co-registration. All FMISO PET images were reconstructed using OSEM with the standard clinical parameters.

Image analysis

The FDG and FMISO2 tumour volumes were co-registered to those of FMISO1 by means of their corresponding CTavg image sets, using the GE AW Workstation v4.6 General Co-Registration tool (GE Health Care Inc.). Rigid transformation was used, and the results were visually inspected for potential mismatches. The transformation matrices obtained were then applied to the corresponding PET images. Tumour metabolic target volumes (TV) were delineated on the FDG PET images with the adaptive threshold algorithm in the GE AW Workstation PET VCAR™ (Volume Computer-Assisted Reading) semi-automated software (FDA-approved), which uses the companion CT as a fiducial marker together with a count-based edge recognition algorithm. The corresponding target volumes were subsequently copied onto the two co-registered FMISO image sets.

Tumour uptake in the target volumes on the two FMISO scans was compared on a voxel-by-voxel basis in PMOD v3.604 (PMOD Technologies GmbH). Activity concentration data were converted into standardized uptake values (SUV; normalized to lean body mass). The blood SUV (SUVblood) was measured by (i) segmenting the descending aorta on the CTavg, (ii) copying the volume of interest (VOI) to the corresponding FMISO PET, (iii) eroding the VOI by 1 voxel in 3D, and (iv) measuring the average SUV within the eroded VOI.
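The SUV normalization and the blood-pool measurement described above can be illustrated with a minimal Python sketch. This is not the PMOD implementation: the function names, the representation of a VOI as a set of (x, y, z) voxel indices, and the use of a 6-connected neighbourhood for the one-voxel erosion are all assumptions made for illustration, and the inputs are assumed to be decay-corrected.

```python
from statistics import mean

# 6-connected (face) neighbourhood. The connectivity used for the erosion
# in the study is not stated, so this choice is an assumption.
FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def suv_lbm(activity_bq_per_ml, injected_dose_bq, lean_body_mass_g):
    """SUV normalized to lean body mass (inputs assumed decay-corrected)."""
    return activity_bq_per_ml * lean_body_mass_g / injected_dose_bq


def erode_voi(voxels):
    """Erode a VOI (set of (x, y, z) indices) by one voxel in 3D.

    A voxel survives only if all six face neighbours are also in the VOI.
    """
    return {
        (x, y, z)
        for (x, y, z) in voxels
        if all((x + dx, y + dy, z + dz) in voxels for dx, dy, dz in FACE_NEIGHBOURS)
    }


def blood_suv(suv_by_voxel, aorta_voi):
    """Mean SUV within the aorta VOI after a one-voxel erosion.

    suv_by_voxel maps (x, y, z) -> SUV (hypothetical data layout).
    """
    eroded = erode_voi(aorta_voi)
    return mean(suv_by_voxel[v] for v in eroded)
```

Eroding the VOI before averaging keeps edge voxels, which are most affected by partial-volume effects and small PET/CT mismatches, out of the blood measurement.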

The hypoxic sub-volume (HTV; in cm3) was defined as including voxels within the TV having a tumour-to-blood ratio (TBR) ≥ 1.2 on both FMISO scans [8]. For each lesion, maximum and mean values for voxels within the TV were calculated in units of SUV (SUVmax, SUVmean) and TBR (TBRmax, TBRmean). Reproducibility of FMISO uptake was also assessed in the non-diseased normal lung tissue (by evaluating the mean SUV within a 20-mm-diameter spherical VOI that was placed in the healthy contralateral lung; SUVlung) and in the non-diseased muscle (by evaluating the mean SUV within a VOI manually drawn on the subscapularis muscle on the CTavg and subsequently copied to the corresponding FMISO PET scan; SUVmuscle).
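As a rough illustration of these definitions, the following Python sketch computes the per-lesion indices and the HTV from paired voxel lists. The function names and the flat-list data layout are hypothetical, and the voxel volume is taken from the FDG reconstruction (5.47 × 5.47 × 3.27 mm) on the assumption that the FMISO images share the same grid.

```python
from statistics import mean

# Voxel size from the FDG reconstruction (mm); assumed to apply to the
# FMISO images as well. Converted from mm^3 to cm^3.
VOXEL_VOLUME_CM3 = 5.47 * 5.47 * 3.27 / 1000.0


def to_tbr(suv_values, suv_blood):
    """Convert voxel SUVs to tumour-to-blood ratios."""
    return [s / suv_blood for s in suv_values]


def lesion_indices(suv_values, suv_blood):
    """Maximum and mean uptake within the target volume, in SUV and TBR units."""
    tbr = to_tbr(suv_values, suv_blood)
    return {
        "SUVmax": max(suv_values),
        "SUVmean": mean(suv_values),
        "TBRmax": max(tbr),
        "TBRmean": mean(tbr),
    }


def hypoxic_subvolume_cm3(tbr_scan1, tbr_scan2, threshold=1.2):
    """Volume of voxels with TBR >= threshold on BOTH co-registered scans."""
    n_hypoxic = sum(
        1 for a, b in zip(tbr_scan1, tbr_scan2) if a >= threshold and b >= threshold
    )
    return n_hypoxic * VOXEL_VOLUME_CM3
```

The two TBR lists must be ordered identically, i.e. element i of each list refers to the same voxel after co-registration.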

Statistical analysis

The Pearson correlation coefficient was calculated between the FMISO1 and FMISO2 intratumour distributions (r TV) and between all SUV- and TBR-derived indices. The normality of the distribution of differences in the investigated parameters between the two FMISO studies was verified with a two-sample Kolmogorov-Smirnov test. This was done to validate the applicability of Bland-Altman analysis, which was subsequently performed to calculate the mean differences between voxelwise TBR values and 95 % limits of agreement (LoA) [20]. The latter are defined as the mean difference ± 1.96 × SD of the differences and represent the boundaries within which 95 % of observations are expected to fall. p < 0.05 was assumed to represent statistical significance. To evaluate the geographic stability of hypoxic sub-volumes, the percentage of intratumour voxels that were identified as hypoxic in both FMISO studies was calculated, as based on the TBR ≥ 1.2, ≥1.4, and ≥1.6 thresholds [6–9]. All statistical analyses were carried out in MedCalc v15.6 (MedCalc Software bvba).
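The two core computations, voxelwise Pearson correlation and Bland-Altman limits of agreement on relative differences, can be sketched in a few lines of Python. This is an illustrative sketch rather than the MedCalc procedure: the function names are hypothetical, the sign convention (FMISO2 minus FMISO1) is an assumption, and the relative difference is expressed as a percentage of the mean of the two scans.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between paired voxel values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman_percent(scan1, scan2):
    """Bland-Altman statistics on relative differences (in %).

    Each difference is (scan2 - scan1) divided by the mean of the pair.
    Returns (bias, sd, (lower_loa, upper_loa)), where the limits of
    agreement are bias -/+ 1.96 * sd.
    """
    rel = [100.0 * (b - a) / ((a + b) / 2.0) for a, b in zip(scan1, scan2)]
    bias = mean(rel)
    sd = stdev(rel)  # sample standard deviation of the differences
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)
```

Passing the voxelwise TBR values of a lesion from the two co-registered scans to `pearson_r` gives a quantity analogous to r TV, while `bland_altman_percent` gives the mean relative difference and the 95 % limits of agreement.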

Results

Nineteen FDG-avid lesions were included in the analysis. None of the investigated lesions were located near the edge of the PET field of view. The average lesion volume was 28 cm3 (range, 4–111 cm3). No lesion exhibited uptake on the FMISO scan while being negative on the corresponding FDG scan. Because mismatches between PET and CT scans were identified in two patients (#4 and #9), the co-registrations for these patients were adjusted manually based on the PET images. All differences between the FMISO1 and FMISO2 scans in the SUV- and TBR-derived parameters, imaging time pi, and injected dose were normally distributed, as assessed with the Kolmogorov-Smirnov test (p > 0.05). Tumour volume, imaging time pi, SUVblood, SUVlung, SUVmax, SUVmean, TBRmax, TBRmean, HTV, and r TV are summarized in Table 2. Strong, statistically significant correlations were observed between the first and second FMISO scans for all SUV- and TBR-derived parameters (r ≥ 0.87, p < 0.001) and for HTV (r = 0.99, p < 0.001).

Table 2 Summary of FMISO PET reproducibility analysis in patients with non-small cell lung cancer

Scatter plots for the co-registered FMISO images display intratumour voxels colour-coded according to their hypoxia status as based on the TBR ≥ 1.2 threshold (Fig. 1). The mean r TV was 0.84 ± 0.10 (range, 0.52–0.95), with all lesions except for one having r TV > 0.70. The hypoxic status (i.e. the presence of intratumour voxels with TBR above the pre-defined threshold) remained unchanged in nine out of ten patients between the two scans, regardless of the implemented threshold. Of the voxels identified as hypoxic on one FMISO scan, 77 %, 68 %, and 63 % were confirmed as such on the other FMISO scan (based on TBR ≥ 1.2, ≥1.4, and ≥1.6, respectively).
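The voxel classification and the confirmed-hypoxic percentage can be sketched as follows. This is one plausible reading of the metric, not a confirmed description of the authors' calculation: the percentage is taken here as the number of voxels hypoxic on both scans divided by the number hypoxic on at least one scan. The function names and class labels are hypothetical.

```python
def classify_voxel(tbr1, tbr2, threshold=1.2):
    """Label a voxel from its TBR on the two co-registered scans.

    'hypoxic'   : at or above the threshold on both scans
    'ambiguous' : at or above the threshold on exactly one scan
    'normoxic'  : below the threshold on both scans
    """
    above1, above2 = tbr1 >= threshold, tbr2 >= threshold
    if above1 and above2:
        return "hypoxic"
    if above1 or above2:
        return "ambiguous"
    return "normoxic"


def confirmed_hypoxic_percent(tbr_scan1, tbr_scan2, threshold=1.2):
    """Percentage of voxels hypoxic on either scan that are hypoxic on both.

    Assumed definition (intersection over union); returns NaN when no voxel
    reaches the threshold on either scan.
    """
    labels = [classify_voxel(a, b, threshold) for a, b in zip(tbr_scan1, tbr_scan2)]
    both = labels.count("hypoxic")
    either = both + labels.count("ambiguous")
    return 100.0 * both / either if either else float("nan")
```

Repeating the call with `threshold=1.4` and `threshold=1.6` gives the corresponding values for the stricter thresholds.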

Fig. 1 Reproducibility of FMISO intratumour distribution in patients with NSCLC. Voxelwise scatter plots of tumour-to-blood ratio in FMISO1 (x-axis) vs. FMISO2 (y-axis) are presented for all 19 lesions. Black, blue, and red voxels represent normoxic, hypoxia-ambiguous, and hypoxic tumour sub-volumes, respectively, as based on the TBR ≥ 1.2 threshold (dashed lines). Equality lines (dotted) and r TV are also displayed for all scatter plots. r TV values were significant in all cases

No significant correlation could be established between SUVlung or SUVmuscle and SUVmax, SUVmean, TBRmax, or TBRmean. The muscle-to-blood ratio, defined as SUVmuscle/SUVblood, was 0.97 ± 0.11 across the patients, confirming that FMISO uptake for non-diseased normoxic tissues approaches unity. Representative FMISO PET/CT images from both scans are displayed for two patients (Fig. 2).

Fig. 2 FMISO PET images of two patients with non-small cell lung cancer. From left to right: coronal, axial, and sagittal slices showing the first (upper row) and second (lower row) FMISO PET scans of (a) patient #2 (lesion #2) and (b) patient #5 (lesion #7). PET images are windowed at 0–1.8 (a) and 0–1.4 (b) tumour-to-blood ratio, respectively

Bland-Altman analysis revealed that voxelwise SUV- and TBR-derived indices were reproducible to within 8–13 %, as calculated from 95 % confidence intervals of the mean differences for each lesion (Table 3). Limits of agreement were between 12 % (SUVmuscle) and 27 % (SUVmax). Percentages are reported instead of absolute values, to facilitate direct comparison of results across different indices. Voxelwise Bland-Altman analysis revealed an average relative difference of 0.9 ± 10.8 % between FMISO1 and FMISO2, as calculated from pooled data including all 19 lesions (n = 5320 voxels; Table 4, Fig. 3). The associated limits of agreement indicate that for 95 % of cases, the relative differences will be within 22 %.

Table 3 Bland-Altman analysis results for all image-derived features
Table 4 Bland-Altman analysis results for differences between voxelwise tumour-to-blood ratio values
Fig. 3 Bland-Altman analysis results for pooled data from all 19 lesions. Relative differences in voxelwise TBR values are shown against the average value combined from the two FMISO scans. Mean and both upper and lower limits of agreement (LoA) are displayed as red and blue lines, respectively

Discussion

It is important to determine the reproducibility of image-based prognostic and predictive parameters, including those deduced from nuclear medicine images, which typically exhibit greater statistical variation (i.e. noise) than other modalities. This is especially true for hypoxia-selective radiotracers such as FMISO, given their generally low tumour uptake and tumour-to-background ratios. An evaluation of the reproducibility of FMISO PET in NSCLC is a prerequisite if FMISO images are to be rationally used for stratifying NSCLC patients for hypoxia-targeting regimens, monitoring response to therapeutic interventions, or determining the topology of the hypoxic tumour sub-volumes for dose escalation.

Our data showed strong correlations for both SUV- and TBR-derived metrics between repeated FMISO scans, corroborating results from FDG and 18F-flortanidazole (HX4) PET scans of NSCLC patients [21–23]. TBRmax and TBRmean were as reproducible as SUVmax and SUVmean, despite the fact that the definition of a second region of interest to measure the blood activity introduces an additional source of variability. The classification status (i.e. indicating either the presence or absence of tumour hypoxia) as based on one FMISO scan remained unchanged in the majority (9/10) of patients when reassessment was performed using the other FMISO scan. These results are encouraging in the context of using FMISO PET in stratification of NSCLC patients for hypoxia-targeted treatments. Changes in FMISO uptake were reported to measure the early response to chemoradiotherapy in NSCLC [10]; however, it remains unclear to what extent the spatiotemporal variability of FMISO PET affects the quality of monitoring treatment response.

Data on the reproducibility of FMISO intratumour distributions from serial FMISO PET scans have been presented previously only for HNC patients, in two separate studies by Nehmeh et al. [18] followed by Okamoto et al. [19]. While Nehmeh et al. reported variability in spatial uptake, speculating that the possible differences were observed due to transient hypoxia, Okamoto et al. subsequently showed that FMISO intratumour distributions were highly reproducible. These contradictory observations may be attributable to (i) imaging at different times post-injection (162 ± 21 vs. 262 ± 21 min), (ii) differences between scan times post-injection for the two FMISO studies (13 ± 8 vs. 3 ± 3 %), (iii) different acquisition times and modes (8 min in 2D mode vs. 10 min in 3D), (iv) different PET/CT cameras (GE Discovery LS with an axial field of view of 15.2 cm vs. newer generation Siemens TruePoint Biograph with an axial field of view of 21.6 cm), and (v) different co-registration algorithms used in the studies by Nehmeh et al. and Okamoto et al., respectively.

Okamoto and colleagues further speculated that another potential reason for the discrepancy between the two studies might have been imaging time post-injection and considered imaging at 4+ h to be more suitable, due to slow clearance of FMISO from the blood [19]. While longer waiting periods should in principle increase the contrast (and image noise), our results indicate that for non-small cell lung cancer, similarly high reproducibility can be obtained when imaging as early as 2.5 h post-injection. The mean r TV (0.84 ± 0.10) is comparable to the results from Okamoto et al. (0.89 ± 0.09 [19]), though not with those from Nehmeh et al. (0.60 ± 0.14 [18]). While in the current study patients were imaged at 163 ± 13 min pi, there are several differences in the methodology compared to that by Nehmeh and colleagues [18]: (i) variations in scan times were substantially lower (5 ± 4 %), (ii) data were acquired in 3D mode for 10 min, (iii) image acquisition was performed on a more recent version of the GE PET/CT scanner, and (iv) a different (FDA-approved) image co-registration software was used compared to the previous study which utilized in-house image co-registration software [18]. The quality of co-registrations may have additionally affected the voxelwise correlation (for example, deliberate misregistration by a single voxel in patient #3 resulted in a drop of r TV from 0.72 to 0.17).

More recently, reproducibility of hypoxia imaging using HX4 PET has been investigated by Zegers and colleagues in a multicenter trial in both HNC and NSCLC patients [23]. The authors concluded that HX4 PET imaging is reproducible regarding the spatial uptake in both HNC and NSCLC patients, reporting no major differences in the results between the two cohorts [23]. The mean ΔSUV was 0.02 ± 0.07; high correlations were reported between SUVmax and TBRmax as well as between hypoxic sub-volumes [23]. Our results are also in concordance with this study.

Scatter plots indicate systematic differences in voxelwise uptake between the two FMISO scans, also observed in earlier PET reproducibility studies [18, 19, 23]. Various technical (e.g. incorrect synchronization of time between injection and calibration), biologic (uptake period, presence of acute hypoxia, patient motion, breathing, and comfort), and physical factors (VOI for the calculation of SUVblood) might affect PET quantification [24]. However, the mean difference in voxelwise TBR values from pooled data was 0.9 ± 10.8 %, suggesting no systematic biases. This observation is further supported by the absence of significant correlation between SUV values in normal tissues (contralateral lung and subscapularis muscle) and in the tumour. Approximately 23 % of voxels identified as hypoxic on one FMISO scan did not meet the hypoxia criterion on the other FMISO scan (assuming the TBR ≥ 1.2 threshold). In addition to the aforementioned factors, this could be attributed to relatively low uptake of FMISO that exacerbates the impact of statistical noise, potential mismatch between the PET and the CT images (affecting attenuation correction), CT-CT co-registration of the FMISO1 to FMISO2 image sets, and/or the susceptibility of the lesion to respiratory motion due to its location within the lung. Resampling of FMISO2 resulted on average in <3 % absolute differences in uptake values. When correlation analysis was repeated by co-registering FMISO1 to FMISO2, the change in r TV was <1 % (data not shown). The extent to which the changes in spatial distribution of tumour hypoxia compromise the coverage of hypoxic tumour sub-volumes achievable by IMRT remains to be investigated.

A limitation of the current study is the small sample size. Nevertheless, high reproducibility of FMISO spatiotemporal distribution was confirmed, providing an impetus for the use of FMISO PET imaging in thoracic oncology. Another limitation of this as well as earlier PET reproducibility studies in NSCLC is the absence of respiratory gating [21–23]. While motion correction is not yet widely used clinically [22], respiratory motion may alter the accuracy of quantitative uptake measures due to image blurring [25]. However, similar reproducibility of FMISO was observed for non-small cell lung cancer patients as for patients with head and neck cancer [19], despite the fact that the latter were immobilized during image acquisition, their tumours were not affected by respiratory motion, and their co-registration is expected to be more accurate. Lastly, the clinical significance of the observed variability in FMISO intratumour distribution in the context of patient stratification for hypoxia-targeting therapies, monitoring treatment response, efficacy of biologically conformal radiotherapy, or radiomics warrants further examination in larger datasets.

Conclusions

The results of this pilot study confirm that (i) FMISO intratumour distribution is highly reproducible in NSCLC, facilitating its use in dose escalation of hypoxic tumour sub-volumes, patient stratification strategies, and monitoring treatment response; (ii) high reproducibility can be achieved with shorter imaging times post-injection than those previously suggested, potentially reducing long patient waiting periods; and (iii) the spatiotemporal uptake patterns of FMISO as measured by PET are not expected to be affected by transient hypoxia.