This section presents phantom and human data on a number of PET camera models acquired and/or evaluated based on this guideline, from which the recommended reference values have been derived.
Phantom experiment #1
Methods Phantom experiment #1 was carried out according to this guideline on 13 PET camera models (Aquiduo, Biograph LSO, Discovery ST, Discovery STE, Discovery STEP, SET-3000 B/L, SET-3000 G/X, Biograph mCT, Discovery 600, Discovery 690, GEMINI GXL, GEMINI TF, SET3000 GCT/M) to determine the optimum scanning duration and to investigate the validity of the physical parameters as indicators of the 10-mm hot sphere visualization. The reconstruction conditions routinely used at the PET center housing each PET camera were employed for this experiment. The PET images were visually evaluated by nine physicians and technologists using “Fusion Viewer 2.0” (Nihon Medi-Physics) software to derive visualization scores.
Results and discussion Figure 2 represents the relationship between the average score of visualization for the 10-mm-diameter hot sphere and the scanning duration. As the scanning duration increased, the visualization of each PET camera model improved, although the optimum duration depended on the model.
Figure 3 represents the relationship between the average score of visualization for the 10-mm-diameter hot sphere and the physical parameters. The NECphantom, N10mm and QH,10mm/N10mm were similarly related to the visual score regardless of the camera model, suggesting the validity of those parameters as indicators of the hot sphere detectability. As scanning duration increased, NECphantom increased and N10mm decreased, with both contributing to improving the image quality and lesion detectability. On the other hand, QH,10mm was poorly associated with the visual score, as it approached a constant when a certain level of counts was acquired. It should be noted that N10mm and QH,10mm are affected by the reconstruction condition while NECphantom is not, and that the reconstruction condition was predetermined in the present experiments. Therefore, different results may have been obtained under different reconstruction parameters even with the same PET camera model.
A few PET camera models presented very poor detectability of the 10-mm sphere, possibly due to large N10mm, i.e., poor image uniformity in the background area. This may be improved by installing software that enhances corrections for detector efficiency normalization and for attenuation and scatter. Suppression of N10mm is especially important because detection of a hot sphere requires perception of the hot sphere activity in contrast to the surrounding false positive noise activities.
For Version 2.0 of this guideline, new PET cameras with TOF reconstruction algorithms were also examined.
The median value of the 13 camera models that provided the average visual score of 1.5 in this experiment was adopted as the recommended reference value for each of the three physical indicators: NECphantom > 10.8 (8.7–17.5) and >8.8 (6.9–13.2) (Mcounts), N10mm < 5.6 (4.2–10.6) and <6.3 (5.8–8.1) (%), and QH,10mm/N10mm > 2.8 (2.1–3.2) and >2.2 (2.1–2.8), for 5.30 and 2.65 kBq/ml concentration, respectively (95 % confidence interval in parentheses).
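As an illustration of how these reference values might be applied at a PET center, the following minimal sketch compares measured indicators with the thresholds quoted above. The function name, dictionary keys and example measurements are hypothetical and are not part of the guideline; QH,10mm and N10mm are assumed to be expressed in the same units (%).

```python
# Reference values quoted in the text, keyed by background concentration (kBq/ml).
# Units: NECphantom in Mcounts, N10mm in %, QH,10mm/N10mm dimensionless.
REFERENCE_VALUES = {
    5.30: {"nec_min": 10.8, "n10mm_max": 5.6, "q_over_n_min": 2.8},
    2.65: {"nec_min": 8.8, "n10mm_max": 6.3, "q_over_n_min": 2.2},
}


def check_phantom_indicators(concentration, nec_phantom, n10mm, q_h10mm):
    """Compare measured indicators with the reference values (hypothetical helper).

    concentration: 5.30 or 2.65 (kBq/ml)
    nec_phantom:   NECphantom in Mcounts
    n10mm:         N10mm in %
    q_h10mm:       QH,10mm in % (assumed to share units with N10mm)
    Returns a dict mapping each indicator to True if it meets its reference value.
    """
    ref = REFERENCE_VALUES[concentration]
    return {
        "NECphantom": nec_phantom > ref["nec_min"],
        "N10mm": n10mm < ref["n10mm_max"],
        "QH,10mm/N10mm": (q_h10mm / n10mm) > ref["q_over_n_min"],
    }


# Hypothetical measurements, for illustration only (not data from this study).
result = check_phantom_indicators(5.30, nec_phantom=12.0, n10mm=5.0, q_h10mm=15.0)
print(result)
```

Reporting each indicator separately, rather than a single pass/fail flag, keeps it visible whether a shortfall lies in count statistics (NECphantom) or in reconstruction-dependent noise and contrast (N10mm, QH,10mm/N10mm).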
Simulation of image resolution and Phantom experiment #2
Computer simulation was carried out to determine the relationship between spatial resolution and the recovery coefficient measured under noise-free conditions in Phantom experiment #2. Using a 3D Gaussian filter with FWHM = 10 mm, the recovery coefficients of the spheres under the present experimental conditions turned out to be: RC10mm = 0.38, RC13mm = 0.52, RC17mm = 0.72, RC22mm = 0.88 and RC28mm = 0.97 (Fig. 4). Based on this simulation, RC10mm > 0.38 was adopted as the recommended reference value in this guideline, assuming that a spatial resolution of 10 mm FWHM or better would be necessary for an oncology FDG-PET image with sufficient quality.
All the PET camera models examined in this study met this requirement when appropriate reconstruction parameters were selected.
The so-called Gibbs ringing artifact was frequently observed in images reconstructed with a PSF algorithm, i.e., RC values larger than 1.0 were observed for the 17 and/or 22 mm spheres (object size being 3–4 times as large as the crystal size). PSF reconstruction should be treated with caution in quantitative measurement although it is considered to improve lesion detectability by emphasizing edges.
Accuracy of phantom background SUV (SUVB,ave)
Methods Quantitative capability was examined on the 13 PET camera models mentioned in Sect. “Phantom experiment #1” using a standard body phantom prepared in the procedure of Phantom experiment #1. The phantom was scanned for 10 min starting at the concentration of 5.30 kBq/ml without a scatter phantom. Images were reconstructed with the usual parameters and average SUV in the background area was obtained using Fusion Viewer (Ver.2.0).
Results Table 1 presents SUVB,ave of each camera model. The median of the 13 PET camera models was 1.01 (95 % confidence interval 0.98–1.05).
Table 1 Background SUV (SUVB,ave) for each camera model (theoretical value = 1.00)
SUVB,ave depends on the details of image reconstruction and other correction procedures, injected activity and cross-calibration. The accuracy of SUVB,ave is influenced by the frequency of cross-calibration, clock synchronization, and other equipment maintenance and facility management. In this study, we also examined the data acquisition methods, the state of PET camera maintenance and software updating. The results indicated that SUVB,ave obtained by SET3000 B/L and SET3000 G/X was more than 10 % off the theoretical value (1.0), and visual inspection of these images also showed non-uniform activity in the center of the background area. This may be caused by inappropriate correction methods for normalization, attenuation and scatter, and SUVB,ave and uniformity may be improved by installing recently released software. Indeed, SUVB,ave was improved in GEMINI TF after upgrading the SUV calibration software.
SUVB,ave allows evaluation of the overall accuracy of the quantitative capability based on the PET camera performance, data acquisition quality, and maintenance and management of the facility. Reliability of lesion SUV values may be evaluated with SUVB,ave and RC for the sphere of corresponding size measured in Phantom experiment #2. In clinical settings, however, not only partial volume effect and quantitative capability but also physiological factors such as motion and respiration affect SUV values, and accurate measurement of lesion SUV is nearly impossible.
In summary, measurement of SUV using a phantom containing an area with a theoretical SUV of 1.0 allows evaluation of the quantitative capability of the PET camera together with the calibration system. If the measured value deviates from 1.0, the causes should be investigated and corrective measures taken.
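The check described in this section can be sketched as follows. This is a minimal illustration that treats 1 ml of phantom solution as 1 g, so that the theoretical background SUV is 1.0, and assumes that the measured concentration and the filled activity have been decay-corrected to the same time. The function names are hypothetical, and the 10 % tolerance is only an assumption borrowed from the deviation noted above, not a threshold defined by this guideline.

```python
def background_suv(measured_kbq_per_ml, activity_kbq, volume_ml):
    """SUV of the phantom background, treating 1 ml of phantom as 1 g.

    measured_kbq_per_ml: mean concentration read from the background ROI
    activity_kbq:        activity in the background compartment
    volume_ml:           volume of the background compartment
    Both activities are assumed to be decay-corrected to the same time.
    """
    expected_kbq_per_ml = activity_kbq / volume_ml
    return measured_kbq_per_ml / expected_kbq_per_ml


def suv_deviation_percent(suv, theoretical=1.0):
    """Signed deviation of the measured SUV from the theoretical value, in %."""
    return 100.0 * (suv - theoretical) / theoretical


def needs_investigation(suv, tolerance_percent=10.0):
    """True if the deviation exceeds the tolerance (10 % is an assumed default)."""
    return abs(suv_deviation_percent(suv)) > tolerance_percent


# Hypothetical phantom fill: 53,000 kBq in 10,000 ml gives 5.30 kBq/ml.
suv = background_suv(measured_kbq_per_ml=5.30, activity_kbq=53000.0, volume_ml=10000.0)
print(suv, needs_investigation(suv))
```

A facility may prefer a tighter tolerance; the signed deviation is kept separate so that a systematic over- or underestimation can be tracked across cross-calibrations.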
Effect of phantom size on image quality
Methods Four PET camera models (Discovery STEP, Discovery 600, Biograph LSO, Aquiduo) were tested for the effect of object size on the image quality using two additional larger body phantoms (33 and 36 cm in major axis, corresponding to body weight of 80 and 100 kg, respectively) that were designed to be similar to the standard NEMA IEC body phantom (30 cm in major axis corresponding to 60 kg) except that the background pool activity area is larger. Data were acquired according to Phantom experiment #1 (5.3 and 2.65 kBq/ml) without a scatter phantom, and 60 sets of images were obtained with the usual reconstruction parameters. The images were evaluated by five readers with Fusion Viewer Ver. 2.5. ROIs were placed and physical parameters were computed using “PETquant” (Ver 2.02.02). Phantom experiment #2 was also carried out and RCs were computed. The results were averaged across the four PET cameras.
Results Figure 5 illustrates the relationship between scanning duration and visualization score of the 10-mm sphere in Phantom experiment #1 on the larger phantoms. Longer scanning duration was required to provide detectability of the 10-mm sphere in larger phantoms containing the same radioactivity concentration.
Figure 6 presents the relationship of the cross-sectional area of the phantom against reference values of NECphantom, N10mm, and QH,10mm/N10mm that made the visualization score >1.5. As the phantom became larger, larger NECphantom was required while similar N10mm and QH,10mm/N10mm were sufficient to visualize the 10-mm sphere. This suggests that N10mm and QH,10mm/N10mm remain good indicators of lesion detectability irrespective of the object size. Although NEC is believed to reflect image quality, NEC density, which is used for evaluation of patient scans, may be a better indicator for variable object size.
Figure 7 illustrates the RC curves plotted against sphere diameter measured in Phantom experiment #2, which showed a tendency toward lower RC for larger phantoms.
Image noise inevitably increases due to increased scatter and attenuation in larger phantoms, which hinders detection of hot spheres against the background noise and accounts for the results of the present study. The large phantoms used here contained the same radioactivity concentration as the standard phantom, corresponding to the same injected activity per body weight in patient scans. Therefore, a longer scanning duration should be necessary to obtain the same lesion detectability in larger patients injected with the same activity per body weight. Increasing the injected activity per body weight is an alternative if radiation exposure permits and the dose is available, although it may not increase NEC and image quality as much as prolonging the scanning duration does because of the increased random rate and count loss. TOF reconstruction may be another solution as described in Sect. “Human image quality evaluation”.
Based on the relationship between body weight and cross-sectional area of the Japanese population, the results of the present study allow estimation of the scanning duration necessary to obtain the same lesion detectability in patients of large body weight injected with the same activity per body weight: compared with a 60 kg body weight as reference, a scanning duration 1.3, 1.7 and 2.2 times as long (i.e., correspondingly more NEC) is required for 70, 80 and 90 kg body weight, respectively.
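The scaling factors above could be applied to a protocol as in the following minimal sketch. Only the factors at 60, 70, 80 and 90 kg come from the text. The linear interpolation between those weights, the decision not to shorten scans below 60 kg, and the refusal to extrapolate above 90 kg are assumptions made for illustration, and the function names are hypothetical.

```python
# Factors quoted in the text, relative to a 60 kg reference patient.
WEIGHTS_KG = [60.0, 70.0, 80.0, 90.0]
FACTORS = [1.0, 1.3, 1.7, 2.2]


def duration_factor(weight_kg):
    """Scan-duration multiplier for equal lesion detectability (hypothetical helper).

    Assumptions (not from the guideline): linear interpolation between the
    quoted weights, factor 1.0 at or below 60 kg, no extrapolation above 90 kg.
    """
    if weight_kg <= WEIGHTS_KG[0]:
        return FACTORS[0]
    if weight_kg > WEIGHTS_KG[-1]:
        raise ValueError("no factor is quoted above 90 kg")
    # Walk through adjacent pairs of quoted weights and interpolate within the pair.
    for i in range(len(WEIGHTS_KG) - 1):
        w0, w1 = WEIGHTS_KG[i], WEIGHTS_KG[i + 1]
        f0, f1 = FACTORS[i], FACTORS[i + 1]
        if w0 <= weight_kg <= w1:
            return f0 + (f1 - f0) * (weight_kg - w0) / (w1 - w0)


def scaled_duration_min(reference_min, weight_kg):
    """Scan duration per bed position, scaled from the 60 kg reference duration."""
    return reference_min * duration_factor(weight_kg)


# Hypothetical protocol: 2 min per bed position for a 60 kg patient.
for w in (60, 70, 75, 80, 90):
    print(w, "kg:", round(scaled_duration_min(2.0, w), 2), "min")
```

Raising an error above 90 kg, rather than extrapolating, reflects that the text gives no factor beyond that weight.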
Human image quality evaluation
Methods To examine the image quality of whole-body FDG-PET images currently acquired clinically in Japan and the relationship with the physical parameters, patient images were collected from 10 PET centers using 10 different PET camera models, 28–30 cases from each center. Those images had been acquired as routine diagnostic scans according to the protocol of each PET center without any artifacts or other problems, and interpreted by local PET physicians and reported to the attending physicians. Images with extremely abnormal FDG accumulation were excluded.
The quality of the images was visually evaluated by five JSNM-certified PET physicians on a 5-point scale according to whether the images had sufficient quality to be read and interpreted. An image was given a score of 5 for “very good quality”, 4 for “sufficiently good quality”, 3 for “scarcely sufficient quality”, 2 for “not sufficient quality”, and 1 for “unreadable”. NECpatient, NECdensity and liver SNR were computed as described above and were compared with the visual score as well as the BMI of the patient. The results were also analyzed separately for PET cameras using time-of-flight (TOF) reconstruction and for non-TOF reconstruction.
Results and discussion Figure 8 illustrates the plots of the average visual score against NECpatient, NECdensity and liver SNR. These three parameters are known to be excellent indicators of image quality for whole-body FDG-PET/CT images acquired with a single PET camera model [2]. In the present study, in which image data from 10 PET camera models were merged, the visual score showed a weak but significant correlation with NECpatient (r = 0.376, p < 0.001) and with NECdensity (r = 0.432, p < 0.001). This suggests that NECpatient and NECdensity may still be useful indicators of image quality even across different PET cameras and PET centers. NECdensity is less influenced by body size and arm position, which may explain its higher correlation coefficient. On the other hand, liver SNR was weakly and negatively correlated with the visual score (r = −0.278, p < 0.001). Liver SNR depends on the image reconstruction method and parameters, which vary among PET camera vendors and models but are already optimized to some extent; this may account for the inverse correlation. It should also be noted that liver SNR relies on careful ROI placement.
Since the images were all selected from routine clinical scans, heavier patients had been injected with more activity and/or scanned for a longer duration, so the data set is unlikely to include images of very high or very low quality. This narrow range of quality may be another reason for the weak correlation between the visual score and the physical parameters.
The visual score was significantly higher for PET cameras employing TOF reconstruction (3.87 ± 0.44) than for those employing non-TOF reconstruction (3.46 ± 0.41) (p < 0.001). No significant difference was observed between TOF and non-TOF for NECpatient (22.4 ± 6.36 vs. 23.5 ± 8.16 Mcounts/m) or for NECdensity (0.44 ± 0.13 vs. 0.45 ± 0.22 kcounts/cm3), which is reasonable because NEC is defined on the acquired raw data and is independent of the reconstruction technique. This also supports the hypothesis that TOF is an effective reconstruction technique for improving the image quality obtained from given raw data. Interestingly, liver SNR was lower in TOF than in non-TOF (13.1 ± 2.92 vs. 16.0 ± 5.01, p < 0.001), suggesting that TOF images may provide higher visual quality even with lower liver SNR.
Figure 9a, b plot visual scores against BMI in TOF images and in non-TOF images, respectively. No correlation was observed for TOF images (r = −0.095, p = 0.171), while a significant negative correlation was observed for non-TOF (r = −0.474, p < 0.001). A trend to a lower visual score in patients with larger BMI was pointed out for routine whole-body FDG non-TOF PET/CT scans in our previous study [1], suggesting insufficient adjustment of scanning duration and/or injected activity for large BMI patients, which was also confirmed by the present result in Fig. 9b. However, the lack of this trend for TOF images in Fig. 9a indicates that equivalent visual image quality is obtained for large BMI patients in routine scans and may suggest the effectiveness of TOF in preventing degradation of visual image quality in large BMI patients.
Based on these patient data, the recommended reference values were determined as NECpatient > 13 (Mcounts/m), NECdensity > 0.2 (kcounts/cm3) and liver SNR > 10 for this guideline. It should be noted, however, that these reference values may still depend on the camera model, and that further modification and revision may be necessary to make them reliable criteria for quality control.