Assessment of different quantification metrics of [18F]-NaF PET/CT images of patients with abdominal aortic aneurysm

Background We aim to assess the spill-in effect and the benefit in quantitative accuracy for [18F]-NaF PET/CT imaging of abdominal aortic aneurysms (AAA) using the background correction (BC) technique. Methods Seventy-two datasets of patients diagnosed with AAA were reconstructed with ordered subset expectation maximization algorithm incorporating point spread function (PSF). Spill-in effect was investigated for the entire aneurysm (AAA), and part of the aneurysm excluding the region close to the bone (AAAexc). Quantifications of PSF and PSF+BC images using different thresholds (% of max. SUV in target regions-of-interest) to derive target-to-background (TBR) values (TBRmax, TBR90, TBR70 and TBR50) were compared at 3 and 10 iterations. Results TBR differences were observed between AAA and AAAexc due to spill-in effect from the bone into the aneurysm. TBRmax showed the highest sensitivity to the spill-in effect while TBR50 showed the least. The spill-in effect was reduced at 10 iterations compared to 3 iterations, but at the expense of reduced contrast-to-noise ratio (CNR). TBR50 yielded the best trade-off between increased CNR and reduced spill-in effect. PSF+BC method reduced TBR sensitivity to spill-in effect, especially at 3 iterations, compared to PSF (P-value ≤ 0.05). Conclusion TBR50 is robust metric for reduced spill-in and increased CNR. Electronic supplementary material The online version of this article (10.1007/s12350-020-02220-2) contains supplementary material, which is available to authorized users.


INTRODUCTION
Positron emission tomography/computed tomography (PET/CT) is a hybrid imaging technique that maximizes the information that can be extracted from both anatomical (CT) and functional (PET) images. 1,2 While many radiotracers are used in PET/CT imaging, [ 18 F]-fluorodeoxyglucose ([ 18 F]-FDG) is the most common radiotracer; it can be used for different oncologic and non-oncologic applications. One of the non-oncologic applications of [ 18 F]-FDG-PET/CT is for inflammatory vascular disease, as is the case with abdominal aortic aneurysms (AAAs). 3 [ 18 F]-FDG accumulation in the AAA region is related to an active inflammatory process, which can be defined as leukocyte infiltration in the adventitia, in addition to increased concentrations of circulating C-reactive protein. 4,5 However, the role of [ 18 F]-FDG in predicting the future growth and rupture risk of AAAs remains unclear, as studies have reported conflicting results. 6,7 Moreover, local cellular hypoxia, which affects [ 18 F]-FDG uptake and the contribution of the uptake from metabolically active adjacent structures, may confound the PET signal. 6 Thus, more sufficient evidence is required to support the use of [ 18 F]-FDG to predict future growth or rupture risk.
On the other hand, there is an increasing evidence of the efficacy of the sodium fluoride ([ 18 F]-NaF) radiotracer as a marker of microcalcifications in AAAs, which can be a predictive sign of an increased risk of future rupture. 6,8 Furthermore, [ 18 F]-NaF PET/CT may be able to determine the hotspots of microcalcification. However, it is important to note that a major challenge and an inherent limitation of using [ 18 F]-NaF for AAAs is that [ 18 F]-NaF is taken up by the vertebrae because it is mainly a bone radiotracer. 9,10 The anatomical site of the vertebrae is close to the posterior wall of the aorta, which results in an increased signal from this region because of the spill-in effect, leading to inaccurate quantification results. 11 Thus, to increase the accuracy of the results, it is essential to either correct for the spill-in effect or else to identify the most appropriate quantification metrics which are less affected by the spill-in effect.
Spill-in correction can be applied during or after the standard OSEM reconstruction. 12 It can be performed using different techniques, such as the background correction (BC) method in addition to PSF reconstruction. [13][14][15] Although the PSF reconstruction method alone can correct for the generic partial volume effect, it has not been proven to be effective for the more specific spill-in correction when the region of interest is in close proximity to an active region. [15][16][17] The PSF is modeled as a 3D Gaussian function, and it can be incorporated into the OSEM algorithm 13 where it is used in forward and backward projections. 13,15 The BC method is applied after the background contribution to the PETreconstructed image has been identified, using a segmented CT image as the background mask. 15 After correcting for the spill-in effect, standardized uptake value (SUV) measurements can be derived where the spill-in effect, potentially, leads to a significant overestimation in SUV. 18 This overestimation is partly influenced by the ROI selection criteria as a part of the active region might mistakenly be included with the target region, and past studies have shown that this spillin effect is more prominent in SUV max than the SUV mean. 15,17 Also, the spill-in effect reduces with iteration which comes at the expense of increased noise and reduced contrast. 19 SUV max is the highest voxel value within the region of interest (ROI); therefore, it is not so much affected by the ROI selection, but it is affected by noise and the spill-in effect. [20][21][22] However, SUV mean is the average of all the voxel values in the ROI; thus, while it is affected by the ROI selection, it is less sensitive to noise. [20][21][22] SUV max is the most common parameter used to measure radioactivity in patients, but SUV mean is impractical and unreliable in atherosclerotic plaque quantification because it is affected by the ROI selection. 23 It is also very difficult to define AAAs accurately because they do not have smooth edges 23 ; this leads to an inaccurate SUV mean . Furthermore, because of the limitations of both SUV max and SUV mean , alternative SUV metrics can be derived in addition to SUV max and SUV mean which may be more robust to spill-in effect and noise. The present study proposed that the SUV metrics between SUV max and SUV mean , such as SUV 90, SUV 70 or SUV 50 , could possibly provide a better trade-off. These proposed metrics represent the mean of voxel values of the respective AAA region that are equal to or greater than 90%, 70%, or 50% respectively of SUV max .
However, when assessing vascular regions, the variability of the PET imaging protocols affects the SUV measurements. 23 According to Huet et al., 24 the SUV values are influenced by several factors, such as image reconstruction, the number of iterations used and the post-filtering applied to the reconstructed images. The injected activity, the time between the injection and imaging of the patient and the acquisition duration have low variability, so they do not significantly affect the SUV values. 24 This issue may limit the ability to conduct fair comparisons of the results from different institutions. To address this issue, the target-to-background ratio (TBR) was first introduced for assessing the atherosclerotic plaque. 25 The TBR can be derived from SUV; TBR max , TBR mean , TBR 50 , TBR 70 and TBR 90 are derived from SUV max , SUV mean , SUV 50 , SUV 70 and SUV 90 , respectively. The TBR is used to reduce the variation of the SUV measurements by correcting for the blood uptake. 23 Therefore, TBR represents what the SUV actually represents, which is the measure of the radioactivity of the tracer in the vascular plaque. In the case of [ 18 F]-NaF, the TBR represents plaque microcalcification. To date, no known studies have used TBR 50, TBR 70 or TBR 90 to measure radioactivity in atherosclerotic plaque, and no studies have made direct comparisons between different TBR metrics to determine the TBR metric that is most robust to the spill-in effect under specific circumstances.
Thus, the present study aims to compare a range of TBR metrics, including TBR max , TBR 90 , TBR 70 and TBR 50 , to investigate which TBR might be more robust to the spill-in effect for use in [ 18 F]-NaF AAA PET imaging. This comparison was performed using the standard reconstruction (including PSF modelling), and the correction (PSF?BC) methods at 3 and 10 iterations, and two different ROI delineations, to investigate which TBR metric is less sensitive to the spill-in effect, and for which method and iteration.

Study Datasets
For the present study, the data from 72 patients from the archive of the ''Sodium Fluoride Imaging of Abdominal Aortic Aneurysms (SoFIA 3 )'' study (NCT02229006) 11 were used for this study. All participants were older than 50 years of age and were diagnosed with asymptomatic AAA. The aneurysms were measured using ultrasound undertaken at either the Royal Infirmary of Edinburgh, the Western Infirmary in Glasgow or the Forth Valley Royal Hospital, with an anteroposterior diameter of C 4 cm for all patients whose data were used in the study. The data consist of 61 males and 11 females with age range 72.5 ± 6.9 years, body mass index 27.6 ± 3.5 kg/m 2 and aortic diameter 48.8 ± 7.7 mm. The patients were injected intravenously using the 125 MBq of [ 18 F]-NaF radiotracer. After 60 minutes of waiting uptake time, images were taken using a hybrid PET/CT scanner (Biograph mCT; Siemens Healthcare, Erlangen, Germany). Image acquisition was sequential starting with a low dose of the radiotracer and a 128-detector array CT scan, followed by PET imaging. During PET imaging, to ensure that the entire area of the aneurysm was covered, the acquisition was obtained from the thoracic aorta to the aortic bifurcation. This was achieved by applying three bed positions, each lasting 10 minutes.
Written informed consent was obtained from the participants to use their datasets, and approval was given by the research ethics committee in accordance with the Declaration of Helsinki.

Image Reconstruction and Spill-in Correction
The datasets were reconstructed using the software for tomographic image reconstruction (STIR) 26 with OSEM (21 subsets, 10 iterations). PSF reconstruction was incorporated into the reconstruction as an isotropic 3D Gaussian kernel with 4.4 mm full width at half maximum (FWHM) in both axial and transverse planes. The BC technique was used for spill-in correction. 15 The bone was segmented from the CTAC image and the bone radioactivity was obtained from the reconstructed PET image (i.e., third iteration). This bone contribution was then included as an additive term in the reconstruction, producing a reconstructed image alleviating the contribution from the bone. Further details about the BC technique can be found in the literature. 15,17,27 No post-filtering was applied to any of the reconstructed images.

Datasets Analysis
Datasets were analyzed using ''A Medical Imaging Data Examiner (AMIDE)'' software 28 in several steps. For region of interest (ROI) analysis, two ROIs were drawn on the CT images using the semi-automated ellipsoid method. One of the ROIs was defined as the entire aneurysm area, referred to in the present study as AAA. The other ROI included the entire aneurysm area, but excluded the posterior wall of the aorta that is near the vertebrae, referred to in this study as AAA exc . Following past research which showed that the spill-in effect is pronounced in regions within 2 voxels to the hot region, 15 AAA exc was drawn such that its distance from the bone is approximately 5mm, corresponding to about 2 voxels. The ROIs were then transferred to the reconstructed PET data. The standard clinical iteration is 3 iterations, but the image at 10 iterations was also used in the present study for comparison because a past work was found that the difference in uptake values due to the spill-in effect decreases by increasing the iterations and converges at approximately 10 iterations. 15 Next, semi-quantitative measurements, which are the SUV metrics, were derived from the data including SUV max , SUV 90 , SUV 70 and SUV 50 .
The TBR was calculated for each SUV metric by drawing a background ROI on the inferior vena cava for background blood pool correction. Consequently, the TBR for the two ROIs (AAA and AAA exc ) for each SUV metric, method and iteration was calculated using Eq. (1): where i denotes max, 90, 70, and 50.
The effective spill-in effect from the bone into the aneurysm was quantified by the difference between the TBR AAA and TBR AAA exc (DTBR) given by: The noise properties of the TBR metrics were evaluated using the contrast-to-noise ratio (CNR) given by: Statistical Analysis Statistical analyses were undertaken using IBM SPSS statistics software package, version 23. For all patients, the difference in the TBR metrics (max, 90, 70 and 50) were compared between the two reconstruction methods and iterations using paired t-test. The statistical analyses were performed with a 95% confidence interval (CI), and a P-value of B 0.05 was considered to be statistically significant. Figure 1 shows a sample CTAC and the PETreconstructed images from a patient which indicate a high [ 18 F]-NaF uptake in the aneurysm and the bone. The segmented bone used for the BC is also shown. Note that the bone uptake contribution including the spill-in has been removed in the PSF?BC image. Figure 2 shows the comparisons of the TBR metrics for different ROIs, methods, and iterations. As seen for PSF at 3 iterations (Figure 2b), the different TBR metrics form variations above the black reference line when comparing the values of TBRs for the two ROIs (AAA and AAA exc ). Because the reference line indicates that the difference between the TBR values of the X-axis (i.e., AAA exc ) and Y-axis (i.e., AAA) equals zero, variations above the reference line indicate that, for PSF at 3 iterations, TBR AAA is higher than TBR AAA exc . Moreover, TBR 50 was the closest metrics with the lowest intercept to the reference line, followed by TBR 70 , then TBR 90 and TBR max . Higher uptake values can be seen in the AAA ROI, where the highest values were recorded for TBR max , TBR 90 and TBR 70 . While TBR 50 also had some high values, they were however closer to the reference line. The plot of PSF at 10 iterations (Figure 2a) shows that the variations between the TBR metrics decreased and then became closer to the reference line, with almost similar intercepts. While there are some high uptake values for TBR max , TBR 90 , TBR 70 and TBR 50 at the AAA ROI, TBR 50 had lower values than the other TBRs.

RESULTS
For PSF?BC at 10 iterations (Figure 2c), the TBR metrics became closer to each other; therefore, their divergence was reduced. TBR 70 intercept was the farthest from the reference line. The plot of PSF?BC at 3 iterations (Figure 2d) shows that the TBR metric lines were similar to the results obtained for PSF?BC at 10 iterations; they had intercepts closer to the reference line, and the TBR line extensions were almost identical to the reference line. The dispersion of the TBRs values was slightly higher for the PSF?BC at 3 iterations than at 10 iterations, especially for TBR 70 , TBR 90 and TBR max . However, this dispersion is minimal when compared with PSF at 3 iterations. Furthermore, all values of different TBR metrics for PSF?BC were close to the reference line in comparison to the values for PSF. TBR 50 was the closest to the reference line, while TBR 70 , TBR 90 and TBR max had a slightly higher uptake at the AAA ROI.
It could also be seen that the scattering of the TBR values was greater in the PSF method than the PSF?BC method, especially at 10 iterations, and the highest intercept values were for TBR max , TBR 90 and TBR 70 , while the TBR 50 values were closer to the reference line. However, as seen in Table 1, for the PSF method, the differences between the iterations were not statistically significant for all the TBR values except for TBR 70 with P-value 0.04. In addition, the PSF?BC method had similar statistical results, except for TBR max and TBR 90 with P-values equal to 0.002 and 0.04, respectively.
Additional analyses were conducted to compare the methods and iterations to investigate the differences in the TBR metrics. Figure 3 shows the comparisons of the different methods (PSF vs PSF?BC) at 3 and 10 iterations. At 10 iterations, there was a variation between the TBR metric lines. The TBR lines closest to the reference line were TBR 50 , followed by TBR 70, TBR max and TBR 90 . The closest high values to the reference line were for TBR 50 . However, as shown in Table 1, no statistically significant difference was found between PSF and PSF?BC for all the TBR metrics at 10 iterations.
On the other hand, at 3 iterations, as seen in Figure 3, the variations between the TBR metrics lines was greater than the variations in the lines at 10 iterations, and the lines were far from the reference line.
The TBR values were higher for the PSF method, with an increase in the number of values away from the reference line. As previously shown, the highest values were for TBR 70 , TBR 90 and TBR max , while TBR 50 had few anomalous values in relation to the reference line. As seen in Table 1, the comparison of PSF vs PSF?BC for all the TBR metrics indicates a statistically significant difference between the methods at 3 iterations, with P-values equal to 0.0002, 0.0006, 0.002, and 0.0003 for TBR max , TBR 50 , TBR 70 and TBR 90 , respectively.

DISCUSSION
Despite the increasing evidence of the efficacy of the sodium fluoride ([ 18 F]-NaF) radiotracer as a marker of microcalcifications in AAAs, 6,8 a major confounding issue is the spill-in contamination from the bone (where Fig. 1. CT images and PET-reconstructed images of a patient dataset, showing a high [ 18 F]-NaF uptake in the bone and the aneurysm. The activity contribution from the bone was removed using PSF?BC. The ROIs used to extract the SUVs at the aneurysm are shown on the CTAC image. The outer yellow and inner red ROIs represent AAA and AAA exc , respectively. Following past research, 15 AAA exc was drawn such that its distance from the bone is approximately 4 mm. The blue small spherical region highlights the background ROI used for blood pool correction and the calculation of TBR.  To further evaluate the TBR metrics and their robustness to spill-in effect and noise reduction, the differences in TBR between AAA and AAA exc (i.e., DTBR) was plotted against the CNR as shown in Figure 4. As expected, the difference in TBR due to the different ROI delineation was high at lower iteration but reduces as iteration increases. However, this comes as the expense of reduced CNR. TBR 90 has the highest CNR but the DTBR was high just like TBR max . TBR 50 gave the best trade-off between increased CNR and reduced DTBR the tracer is taken up) to the adjacent aneurysm. 9,10 Our previous study 17 extensively investigated the spill-in effect in [ 18 F]-NaF PET imaging of AAA, and it was shown that the spill-in effect depends on the activity uptake in the bone, proximity of the aneurysm to the bone, as well as ROI delineation criteria. This effect poses a great challenge to the quantification accuracy at the aneurysm site and it may adversely affect AAA disease prediction and patient management. 11,17 As reflected by the SOFIA 3 study, 11 better AAA disease prediction using [ 18 F]-NaF, in addition to clinical risk factors including AAA diameters, would be of great benefit to patients with high-risk aneurysms which size may be smaller than what the current guidelines may suggest (i.e., 55 mm). Thus, to increase the accuracy of the AAA quantification, it is essential to either correct for the spill-in effect or else to identify the most appropriate quantification metrics which are less affected by this effect. This was the main aim of the study. The present study investigated TBR metrics using PSF and PSF?BC methods, 3 and 10 iterations and two semi-automated ROIs (AAA vs AAA exc ) to determine which TBR metric is less sensitive to the spill-in effect coming from the hot region (i.e., bone) adjacent to the aneurysm. TBR mean is impractical for quantifying uptake at the aneurysm due to the heterogenous activity distribution in the aneurysm, and ill-definition of the aneurysm edges. Therefore, the use of the semi-automated method for ROI definition may result in inaccurate TBR mean , thus it was excluded from this study.
By comparing the TBR values in different situations, and observing the results shown in Figure 2, it can be concluded that the more the iterations, the more robust the TBR values, and these values do not appear to be affected by the ROI. Increasing the iterations for the same method reduces the difference in the uptake values of the two ROIs. Because the uptake values consistently increase (for PSF?BC) or decrease (for PSF) while increasing the number of iterations until they reach convergence, 15,29 which may explain the consistency of the TBRs values at 10 iterations. This result is the focus of attention because the difference in how individuals draw the ROI may become less important by increasing the number of iterations. Furthermore, by applying 10 iterations, the TBR results indicated that both methods were similar. However, Akerele et al. 15 noted that, although increasing the number of iterations reduces the impact of the spill-in effect, it also increases noise and decreases the contrast-to-noise ratio.
By comparing the two methods (PSF and PSF?BC), as seen in Figure 3 and Table 1, the TBRs are more consistent and less sensitive to the spill-in effect with the PSF?BC method. Thus, the TBRs in both iterations appear to have converged. This may indicate the importance of applying PSF?BC, because it contributes to minimizing the impact of the spill-in effect. It might be better to use PSF?BC at 3 iterations, because its behavior is very similar at 3 and 10 iterations, rather than increasing the number of iterations for the PSF method, due to the increase in noise. The results for the PSF method conflict with the findings reported in the literature review where PSF, theoretically, can provide an advantage in terms of reducing TBR overestimation due to the spill-in effect. [30][31][32][33] PSF alone is used as a correction for the generic partial volume effect, but it has not been proven to be effective for the more specific spill-in correction, as is the case with AAA assessment. Moreover, Akerele et al. 15 reported that for proximal lesions to an active region, incorporating PSF into the standard OSEM reconstruction has no added advantage compared to using OSEM alone. However, this could be due to the fact that only a simple space invariant PSF was used.
In terms of the robustness of the TBR metrics to noise and spill-in reduction, the graph of DTBR against CNR (Figure 4) shows that for each TBR metrics, the difference in TBR due to the different ROI delineation was high at lower iteration but reduces as iteration increases. However, this comes as the expense of reduced CNR. TBR 90 has the highest CNR but the DTBR was high just like TBR max . TBR 50 gave the best trade-off between increased CNR and reduced DTBR. Overall, TBR 50 appears to be the most robust TBR value as it is less affected by the ROI and the spill-in effect for both PSF methods (with and without correction) and all iterations, followed by TBR 70 ; in contrast, the closer the TBR was to the TBR max , the more it was affected by the spill-in effect.
The present study's TBR findings are consistent with the results reported by Boucek et al. 34 who investigated the accuracy of SUV max in a tumor response assessment and found that the SUV max was influenced by the spill-in effect. Furthermore, Visser et al. 35 stated that the impact of the spill-in effect can be reduced using voxel values equal to or greater than a fixed percentage of the SUV max , which is similar to what was used in the present study: TBR of SUV 50, SUV 70 or SUV 90 . Because the TBR values were derived from the SUV values, these two studies can be considered to have similar results, which, in turn, might support the results of the present study. However, further studies are needed to investigate the TBRs results to obtain fair comparisons, because the SUV metrics differ from one center to another depending on several factors that are difficult to standardized due to the differences between scanners, image reconstruction and data analysis software. 24 It is worth noting that the recommendations in this work are rather task-based. If one is concerned mainly about quantification accuracy, then it is recommended to use more iterations with or without BC. This is because the more the iterations, the better the quantification and the less the spill-in effect. However, if better contrast and lesion detectability is of utmost importance, it is best to use less iteration and then apply the BC method.

STUDY LIMITATIONS AND FUTURE WORK
The results of the present study are subject to some limitations. First, drawing ROIs using the semi-automated method could have affected the measurement of the TBRs; the manual method might be more accurate for determining the size of the aneurysm because the AAA wall is not often well defined. Therefore, an issue that was not addressed in this study was whether or not the semi-automated method differs from the manual method in terms of TBR measurement accuracy. Second, only two iterations (i.e., 3 and 10) were used to extensively investigate the impact of increasing iterations on the spill-in effect rather than evaluating many more iterations. Finally, no known studies have made direct comparisons between different TBR metrics, which prevented the ability to compare the study's TBRs results to other studies. Therefore, it is recommended that future studies be conducted to further explore the current topic.
Also, this study needs to be further validated with larger cohort to potentially distinguish any differences between male and female AAA patients. The main reason that we had more male AAA patients than females in this study is the fact that the male sex is one of the risk factors for AAA. 36,37 So, our study represents a typical AAA cohort with larger number of male (N = 61) than female (N = 11) patients. However, there might be some sex-specific variables such as arteries sizes which might affect the generalization of our results. Furthermore, it is interesting to investigate whether there is a significant difference in aneurysm  TBRmax  TBR90  TBR80  TBR70  TBR50  TBR40  TBR20   Iteration 1 Iteration 10 Fig. 4. The plot of difference in TBR values for AAA and AAA exc (DTBR) against CNR for all the TBR metrics as iteration increases. This is shown for a sample patient reconstructed with PSF. A robust TBR metric will show low DTBR and high CNR.
shape and heterogeneity of [ 18 F]-NaF uptake between male and female patients, which may impact ROI thresholding as proposed in this study. Although the application of the BC technique helped to reduce the spill-in effect, there are several other challenges and biases which could affect the TBR results such as the scanners, image reconstruction algorithms and data analysis software used across clinical centers. 38 So, there is a need to further investigate other metrics that could have more accurate results than SUV metrics or TBR metrics. Advanced metrics, known as radiomics, have emerged and may help overcome the limitations of using SUV and TBR metrics. Radiomics can provide more reliable prognostic information than conventional SUV metrics. 39 Several studies have compared radiomics and SUV in terms of therapy outcome, and the results favored the use of radiomics. [40][41][42][43][44] Therefore, extensive research should be conducted to assess the reliability and robustness of these advanced metrics before they are clinically adopted.

NEW KNOWLEDGE GAINED
In this study, we have shown that the most commonly employed quantification metric of TBR max for clinical assessment in [ 18 F]-NaF PET/CT imaging of AAA is prone to quantification overestimation, partly due to the spill-in effect from the bone into the aneurysm, and also due to differences in ROI delineation criteria. The use of lower TBR thresholds can yield more robust [ 18 F]-NaF quantification that is less sensitive to spill-in effects, with TBR 50 resulting in the least overestimation.

CONCLUSIONS
The quantitative metric of TBR contrast in AAA regions of [ 18 F]-NaF images acquired from human PET/ CT exams appeared to be less sensitive to the spill-in effect when using PSF?BC and/or when increasing the number of OSEM iterations. However, the noise levels increased with the number of OSEM iterations thus reducing CNR and potentially impacting AAA lesions detectability. Therefore, to enhance [ 18 F]-NaF quantification in AAA, we recommend applying the PSF?BC method with few iterations. Moreover, the use of a 50% relative-to-maximum threshold for defining the TBR (TBR 50 ) was found to be most robust metric as it exhibited the lowest sensitivity to the spill-in effect; in contrast, the closer the TBR definition was to the TBR max , the more it was affected by the spill-in effect.