Background

Positron-emission tomography [PET] may be used to delineate the biological target volume for both radiotherapy and response monitoring purposes [14]. The most widely used PET tracer, [18F]-fluoro-2-deoxy-D-glucose [FDG], might improve accuracy of tumor volume definition for radiotherapy by identifying areas within the tumor that are more metabolically active [5]. Tumor volumes can be delineated on either images of glucose metabolic rate or standardized uptake value [SUV] images [6]. SUV is most commonly used for (semi-)quantification of whole-body FDG PET studies and only requires a static scan. Images of glucose metabolic rate can be generated from dynamic scans using a measured or image-derived arterial input function together with Patlak graphical analysis. It is well known that Patlak analysis is quantitatively more accurate than SUV analysis. Patlak analysis, however, requires a dynamic scan and limits data acquisition to a single bed position with an axial coverage of < 20 cm.

As shown previously [6], metabolic volumes, defined using a 50% isocontour method, were smaller when defined on Patlak images than when defined on SUV images. To date, however, no systematic comparison has been performed in which various existing (semi-)automatic tumor delineation methods have been applied to both SUV and Patlak images.

Recently, a number of different (semi-)automatic tumor delineation methods have been validated for SUV images using both simulations [7, 8] and lung tumor FDG scans [9, 10]. Most methods showed good performance as measured maximum diameters derived from these methods corresponded well with pathological measurements [11]. As most of tumor delineation methods that correct for local background are less sensitive to changes in local contrast, these methods might show better correspondence between measured tumor volumes derived from either Patlak or SUV images. Therefore, the purpose of this study was to systematically compare measured metabolic tumor volumes derived from SUV and Patlak images using a variety of (semi-)automatic tumor delineation methods.

Materials and methods

Patient data

Dynamic FDG PET scans from 10 non-small cell lung cancer [NSCLC] (stages IIIB to IV) patients [12] and 8 gastrointestinal [GI] (colorectal carcinoma) cancer patients [13] were included retrospectively. All scans had been acquired prior to therapy. All patients had given written informed consent, and both studies had been approved by the Medical Ethics Review Committee of the VU University Medical Center.

For NSCLC patients (three females, seven males; weight 76 ± 10 kg, range 56 to 94 kg), blood glucose levels were within the normal range (mean 5.5 ± 0.6 mmol·L-1, range 4.4 to 7.0 mmol·L-1). The same was true for blood glucose levels (mean 5.6 ± 0.8 mmol·L-1, range 3.9 to 7.0 mmol·L-1) of patients with advanced GI malignancies (one female, seven males; weight 85 ± 15 kg, range 60 to 110 kg).

PET scanning protocol

All patients fasted for at least 6 h before scanning. Patients were prepared in accordance with recently published guidelines for quantitative PET studies [14]. They were scanned in a supine position and received an intravenous catheter for tracer administration. During dynamic scanning, blood samples for determining plasma glucose levels were collected at fixed times (i.e., at 35, 45, 55 min post injection). All dynamic scans were performed using an ECAT EXACT HR+ scanner (Siemens/CTI, Knoxville, TN, USA) [15], having a 15.5-cm axial field of view. Each scan session started with a 10-min transmission scan using three retractable rotating 68Ge line sources. After completion of the transmission scan, a bolus of FDG was administrated intravenously (388 ± 71 and 459 ± 97 MBq for NSCLC and GI cancer, respectively), at the same time starting a dynamic emission scan in a 2-D acquisition mode. Each dynamic scan consisted of 40 frames with the following lengths, 1 × 30, 6 × 5, 6 × 10, 3 × 20, 5 × 30, 5 × 60, 8 × 150, 6 × 300 s. In addition, a static scan was created by summing the sinograms of the last three frames (i.e., 45 to 60 min post injection).

All data were normalized and corrected for attenuation, random coincidences, scatter radiation, dead time, and decay. Reconstructions were performed using normalization and attenuation-weighted ordered subsets expectation maximization [OSEM] with 2 iterations and 16 subsets, followed by post-smoothing using a 0.5 Hanning filter. This resulted in an image resolution of approximately 6.5 mm full width at half maximum. An image matrix size of 256 × 256 × 63 was used, corresponding to a pixel size of 2.57 × 2.57 × 2.43 mm3.

After reconstruction, the summed image (45 to 60 min post injection) was used to generate a SUV image by normalizing local tissue concentrations to injected dose and body weight. In addition, Patlak analysis, a kinetic linearized model [16] for irreversible tracer uptake, was applied to the interval 10 to 60 min post injection to generate an image of net influx rate [K i] of FDG, which is proportional to the metabolic rate of glucose. Image-derived input functions [IDIF] were used as plasma input curves and obtained as described by Cheebsumon et al. [17]. In short, 3-D volumes of interest [VOI] were drawn manually in three vascular structures (i.e., the left ventricle, aortic arch, and ascending aorta) using an early frame that highlights the blood pool [18]. These VOI were then projected onto all frames yielding arterial whole blood time activity curves (i.e., IDIF). The average input curves from VOI defined in the three vascular structures were used as an input function during Patlak analysis.

Data analysis

For the 10 NSCLC patients, VOI were defined for 54 lesions that were all located in the lung. For the 8 GI cancer patients, VOI were defined for 37 lesions that were located in the liver (n = 23), lung (n = 12), or colon (n = 2). All lesions that could be identified by an expert physician were included in this study. Metabolic tumor volumes were obtained using the following five different types of (semi-)automatic VOI methods:

Fixed threshold of 50% and 70% (VOI50 , VOI70 ). In this method, a fixed threshold (i.e., 50% or 70%) of the maximum voxel value within a tumor is used to delineate the tumor [19].

Adaptive threshold of 41%, 50%, and 70% (VOIA41 , VOIA50 , VOIA70 ). This is similar to the fixed threshold method, except that it adapts the threshold relative to the local average background, thereby correcting for the contrast between the tumor and local background [19].

Contrast-oriented method (VOISchaefer ). This method uses the average of SUV within a 70% threshold of SUVmax isocontour (meanSUV70%) and background activity for various sphere sizes. Regression coefficients are calculated, which represent the relationship between the optimal threshold and image contrast for various sphere sizes [3]. This threshold equation is given by:

Threshol d optimal = A × meanSU V 70 % + B × Background,

where A and B are fitted using phantom studies [3]. When applied to Patlak images, K i rather than SUV is used. In general, different values are applied for sphere diameters smaller and larger than 3 cm in diameter. In the present study, this method was recalibrated, i.e., specific A and B values for the image characteristics used were determined.

Background-subtracted relative-threshold level [RTL] method (VOIRTL ). This method is an iterative method that is based on a convolution of the point-spread function, which takes into account differences between various sphere sizes and the scanner resolution [4].

Gradient-based watershed segmentation method. This method uses two steps before calculating the VOI [2]. First, a gradient image is calculated on which a 'seed' is placed in the tumor (tumor basin) and another in the background (background basin). Next, a watershed [WT] algorithm is used to grow both seeds in the gradient basins, thereby creating boundaries on the gradient edges. In the present study, two different types of gradient basins were used. In the first approach [GradWT1], all voxels on the edge between the tumor and background are assigned to the tumor [8, 10]. In the second approach [GradWT2], an upsampled image is used to ensure less effects of sampling. In addition, a voxel on the edge between the tumor and background is allocated to either the tumor or background based on the smallest difference with that voxel value.

To reduce sensitivity to noise, for all methods, the maximum voxel value was obtained using a cross-shaped pattern. This method searches for the region with the (local) average maximum intensity based on the average of seven neighboring voxels, which was then used as a maximum or 'peak' value. The tumor-to-background ratio was calculated by dividing this maximum value by the background value surrounding the tumor.

Statistical analyses

Both metabolic volumes and differences in measured volumes derived from two image types are reported. The percentage volume difference was defined as Volum e SUV Volum e Patlak  -  1 × 100 % . Note that this value can be negative, indicating an underestimation of the SUV-derived metabolic volume compared with the Patlak-derived volume. In addition, for each delineation method, mean, median, first quartile, third quartile, minimum, and maximum values, including statistical outliers, are reported in box plots. Moreover, visual outliers were identified as VOIs that showed unrealistically large or small volumes when compared visually with the tumor. These outliers were not included in the statistical analysis when calculating p values. A two-tailed Wilcoxon signed-rank test was used to indicate statistically significant differences between measured volumes derived from SUV and Patlak images, where p values less than 0.05 were considered to be significant.

Results

Table 1 shows the number of visual outliers (i.e., those cases where there is an obvious mismatch between derived VOI and tumor boundaries) for all methods applied to both Patlak and SUV images, specifying results for NSCLC and GI cancer separately. In general, most tumor delineation methods provided more outliers when metabolic volumes were derived from SUV images rather than Patlak images. Only gradient-based methods showed more outliers for Patlak-based tumor delineation. VOI70 and VOIA70 provided no outliers for either image or cancer type.

Table 1 Number of visual outliers in SUV- and Patlak-derived measured metabolic volumes for both cancer

In general, measured tumor volumes derived from SUV images were larger than those derived from Patlak images. Example images of the measured tumor volumes derived from SUV and Patlak images are shown in Figure 1. Exceptions were VOIA70 for both types of cancer and the two gradient-based methods for GI cancer (Tables 2 and 3). Large differences (up to 58.7% and 28.1% for NSCLC and GI cancer, respectively) in measured metabolic volume based on the two image types were observed for the various delineation methods (Figure 2A). In the case of NSCLC, the median difference in volume was higher for fixed threshold methods than for adaptive, contrast-oriented, or gradient-based methods. This is further illustrated by Figure 3A, where VOIA50 (i.e., with background correction) shows better correspondence between SUV- and Patlak-based volumes than VOI50 (i.e., without background correction). Only GradWT1 provided no significant difference (p > 0.05) in the metabolic volume derived from SUV and Patlak images, but this may be due to the large spread in differences (Figure 3B). Similar results were found when these differences in volume were compared to SUV (or K i values, Figures 4A, B). In general, we observed that smaller lesions also had the lowest SUV. Consequently, the largest volume differences between SUV and Patlak image-based tumor delineations were seen for lesions having a low SUV and a small metabolic volume.

Figure 1
figure 1

Coronal images of the measured tumor volumes. Coronal images of the measured tumor volumes derived from SUV and Patlak images of one patient with NSCLC, obtained using four different tumor delineation methods (i.e., VOI50, VOIA50, GradWT1, and GradWT2).

Table 2 Mean, median, minimum, and maximum values of metabolic volumes and their median differences for NSCLC
Table 3 Mean, median, minimum, and maximum values of metabolic volumes and median differences for GI cancer
Figure 2
figure 2

Box-and-whisker plots of the percentage differences between measured volumes derived from SUV and Patlak images. Box-and-whisker plots of the percentage differences between measured volumes derived from SUV and Patlak images for different tumor delineation methods in (A) NSCLC and (B) GI cancer, and (C) the pooled data from both studies specified per location, i.e., the liver and the lung. The median is the horizontal line between the lower (first) and upper (third) quartiles. Empty square represents the average value, cross, the minimum and maximum values, and filled left-pointing pointer, the number of statistical outliers. The percentage difference was defined as Volum e SUV Volum e Patlak  -  1 × 100 % .

Figure 3
figure 3

Percentage difference in measured volumes compared to the measured volume. Percentage difference in measured volumes compared to the measured volume derived from SUV and Patlak images for various delineation methods applied to (A, B) NSCLC and (C, D) GI cancer. Note that some data points (for VOI50 and GradWT2) fall outside the range of the figure. The percentage difference was defined as Volum e SUV Volum e Patlak  -  1 × 100 % .

Figure 4
figure 4

Percentage difference in measured volumes compared to SUV. Percentage difference in measured volumes compared to SUV derived from SUV and Patlak images for various delineation methods applied to (A, B) NSCLC and (C, D) GI cancer. Note that some data points (for VOI50 and GradWT2) fall outside the range of the figure. The percentage difference was defined as Volum e SUV Volum e Patlak  -  1 × 100 % .

Similar trends were observed for GI cancer (Figures 2B and 3C, D). Here, GradWT1 (only after removal of visual outliers), GradWT2, and VOIA50 provided no significant differences (p > 0.05) between measured volumes derived from both image types. In addition, similar trends were observed when data from both studies were pooled and presented separately for the specific locations of the tumors (i.e., the liver or the lung, Figure 2C).

Discussion

The main use of FDG is measurement of glucose metabolism. However, FDG PET can also be used to measure the volume with increased metabolism. In a previous report [6], it was shown that tumor delineation using Patlak-derived glucose metabolism images provided smaller volumes and sharper borders than when SUV images were used. This was due to a higher local contrast in Patlak images than in SUV images. Patlak analysis, however, may not always be feasible or optimal because it requires (measured) arterial input data and a dynamic scan, which limits coverage to a single bed position. Therefore, in clinical practice, a static whole-body scan (covering the whole body) might be preferred, in which case data can only be analyzed using a SUV approach. It is well known, however, that SUV may be affected by technical, biological, and physical factors [20] that could hamper tumor delineation using this image type.

In agreement with Visser et al. [6], the present study showed (for two types of cancer) that when a fixed percentage threshold method (i.e., VOI50) was used, significantly larger metabolic volumes were obtained from SUV images than from Patlak images. However, these differences reduced when using methods that correct for local background and/or contrast, and in the case of gradient-based methods (Figure 2, Tables 2 and 3). This confirms that SUV-based tumor delineation is sensitive to signal-to-background ratios. Differences in Patlak- and SUV-derived volumes were larger for NSCLC than for GI cancer, especially in the case of methods that use a fixed percentage threshold without background correction. As local (image) contrast for GI cancer was larger than that for NSCLC (average tumor-to-background ratios 7.4 and 5.3, respectively), this further illustrates the sensitivity to signal-to-background ratio.

Some tumor delineation methods (i.e., VOI50, VOIA41, and GradWT1) provided visually, unrealistically large tumor volumes (in up to 41% of cases) when applied to SUV images, while VOISchaefer did the same (in up to 8% of cases) for both image types (Table 1). In contrast, GradWT2 provided many unrealistically small tumor volumes (in up to 24% of cases) for both image types. This suggests that these methods should be applied carefully and that their performance should be supervised.

Two different implementations of gradient-based methods were evaluated in the present study. In a previous NSCLC study [11], tumor diameters obtained using GradWT2 corresponded better to pathology than those obtained using GradWT1. As shown in Figure 3B, the present study also showed that measured volumes obtained from GradWT2 were smaller than those from GradWT1. However, GradWT1 showed better correspondence between SUV- and Patlak-derived volumes than GradWT2 (1.9% and 18.2%, respectively). In contrast, for GI cancer, GradWT2 showed better correspondence between SUV- and Patlak-derived volumes than GradWT1 (-2.1% and -9.1%, respectively). This suggests that the performance of gradient-based methods may also depend on signal-to-background ratios.

Differences between metabolic volumes obtained from SUV and Patlak images reduced when signal-to-background-corrected delineation methods are used. This finding is in line with previous studies reporting on test-retest variability using various tumor delineation methods [10, 21] that confirmed that VOIA50 seems to be a good possible candidate for response monitoring purposes. However, gradient-based methods have been shown to be good candidates for radiotherapy purposes [10, 22]. Therefore, either signal-to-background-corrected or gradient-based methods may be good candidates for response assessments and radiotherapy purposes.

Limitations

A limitation of this study is the lack of an independent reference standard to define tumor volumes, and consequently, in this paper, we could only study differences in (semi-)automatic tumor delineation method performance when applied onto Patlak versus SUV images. However, the accuracy and precision of several (semi-)automatic tumor delineation methods have been studied previously using simulations [8] and clinical test-retest data [10]. Both articles showed that performance of tumor delineation methods are affected by several factors, such as scanner type, radiotracer, image noise, and tumor characteristics. It is generally accepted that pathology is the gold standard. Therefore, studies are needed and are currently performed that compare the tumor volumes obtained using (semi-)automatic delineation methods with pathology [11].

Although the Patlak analysis was performed on OSEM-reconstructed images in order to reduce the levels of noise, Patlak images still showed a small fraction (< 1%) of voxels that had a negative slope, exclusively seen in non-tumor tissue locations. Correlation-coefficient filtered parametric imaging, as proposed by Zasadny and Wahl [23], could potentially enhance the quality of the Patlak images and could be further investigated to improve the accuracy of automated tumor delineation. Despite the lack of using such a denoising method, our results were in line with those published by Visser et al. [6], and we could identify that tumor delineation methods that correct for local signal-to-background contrast or use gradients showed a better agreement in tumor volume assessment between Patlak and SUV images than those tumor delineation methods that did not.

Conclusion

Large differences may exist in metabolic volumes derived from static (SUV) and dynamic (Patlak) FDG image data. These differences depend strongly on the delineation method used. (Semi-)automatic tumor delineation methods that correct for local signal-to-background contrast or use gradients provide the most consistent results between SUV and Patlak images.