Introduction

Iodine concentration (CI) is a commonly assessed feature in quantitative contrast-enhanced computed tomography (CECT). It has been suggested as an imaging biomarker for evaluating treatment response and tissue vascularity and distinguishing among histological tumor subtypes [1,2,3]. Recent studies have evaluated its accuracy and precision under several imaging conditions [4,5,6,7]. Some studies have shown that bias, a measure of accuracy, increases with phantom size and decreases with radiation dose. Other studies have shown that the repeatability coefficient, a measure of precision, decreases with radiation dose and when iterative reconstruction (IR) algorithms are used, compared to the commonly used filtered backprojection (FBP) algorithm. These rigorous evaluations are not only useful but necessary to fully characterize the properties of CI as a quantitative imaging biomarker.

Accurate and precise CI values are essential to enable reliable and reproducible detection and quantification tasks in clinical and preclinical scenarios. This is the case when small changes or thresholds in CI are used to assess changes during treatment or post-treatment [1, 4], to distinguish between pathological types or risk subgroups [3, 7], to stage or evaluate tumor burden and metastasis [7], or to evaluate tumor heterogeneity or perfusion parameters from CI maps or time-CI curves [1,2,3]. Moreover, the evaluation of accuracy and precision is a key feature of the current guidelines for the translation of quantitative imaging biomarkers into the clinic, as well as for the optimization and standardization of image acquisition and analysis and for understanding their limitations [8,9,10].

CI has been evaluated in the preclinical setting, particularly for the study of animal models of cancer with contrast-enhanced micro-computed tomography (CE micro-CT) to estimate the angiogenic status of tumors or their vascularization [11,12,13,14,15,16,17]. Recently, one study has reported the evaluation of the bias of CI to compare the performance of two imaging detectors under two CE micro-CT imaging protocols [16]. To the best of our knowledge, no other studies have evaluated the accuracy or precision of CI in CE micro-CT. Additionally, current approaches in CE micro-CT involve the use of iterative reconstruction and noise removal algorithms that could have an impact on the accuracy and precision of CI [15, 18,19,20]. The findings in CECT and the current approaches in CE micro-CT highlight the necessity to evaluate the impact of both conventional and novel approaches in CE micro-CT on the accuracy and precision of CI.

The aim of this work was to assess the accuracy and precision of CI quantified in CE micro-CT images reconstructed and post-processed under several conditions: images reconstructed with the micro-CT vendor’s conventional reconstruction algorithm (FBP-based), images reconstructed with an in-house-implemented IR algorithm, and images filtered with a noise removal algorithm. The noise removal algorithm was a 3D bilateral filter (BF) specifically designed for either the FBP or IR algorithm. A phantom study evaluated and compared image quality, and the accuracy and precision of CI quantified under these conditions. An animal study assessed the in vivo performance of CI under the same conditions. All images were acquired with a previously optimized protocol for the assessment of vascular parameters with subtracted CE micro-CT imaging [21], and the IR and BF algorithms had also been previously validated and reported [20, 22].

Materials and Methods

Image Acquisition and Reconstruction

Image acquisition was performed with the micro-CT scanner of the trimodal PET/SPECT/CT Albira ARS preclinical system (Bruker, Spain). Micro-CT images were acquired with 45 kV, 0.8 mA, and 400 projections, according to a previously optimized imaging protocol for subtracted CE micro-CT imaging with this scanner [15]. This protocol involved the acquisition of two images: a pre-contrast image (i.e., the baseline image) and a second image after or during the administration of the contrast agent (i.e., the contrast-enhanced (CE) image). The baseline image was then subtracted from the CE image to yield the subtracted CE image, which was then parameterized to units of CI with a calibration function to yield the subtracted CI image. The total radiation dose to water measured at the isocenter for this imaging protocol was 680 mGy [15].

Image reconstruction was performed with either the conventional vendor’s reconstruction algorithm (FBP-based) or with a simultaneous iterative reconstruction technique (SIRT) algorithm implemented in-house with the ASTRA toolbox in MATLAB R2018b (The MathWorks Inc., Natick, MA, USA). The SIRT algorithm was previously validated and optimized [22]. The number of iterations for the SIRT algorithm was optimized in this study as a trade-off between noise and spatial resolution, as described in the Supplementary Appendix and briefly summarized here. Increasing the number of iterations increased the noise and improved the spatial resolution in SIRT reconstructed images, as shown in Supplementary Figure S1. A range of 85–180 iterations was evaluated, and 180 iterations were selected because this number produced SIRT images with the highest spatial resolution among the evaluated numbers of iterations. An adverse consequence of selecting a high number of iterations, however, is that the SIRT images used in this study have a higher noise content than the FBP images. FBP reconstructed images had a matrix size of 560 × 560 × 516 and a pixel size of 0.125 mm; SIRT reconstructed images had a matrix size of 750 × 750 × 657 and a pixel size of 0.1 mm. Images were calibrated to Hounsfield units (HU) using the average attenuation value of water for each reconstruction algorithm, which was obtained from images of a water phantom; a transverse view of this phantom is shown in Fig. 1.
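The HU calibration described above can be illustrated with a minimal sketch. It assumes the standard definition of the Hounsfield scale with air at zero attenuation; the function name and the sample attenuation values are hypothetical and are not taken from this study.

```python
import numpy as np

def to_hounsfield(mu_image, mu_water, mu_air=0.0):
    """Rescale reconstructed attenuation values to Hounsfield units.

    mu_water is the average attenuation measured in a water phantom for the
    reconstruction algorithm in use; mu_air = 0 is an assumption.
    """
    mu_image = np.asarray(mu_image, dtype=float)
    return 1000.0 * (mu_image - mu_water) / (mu_water - mu_air)

# Hypothetical ROI values from a water phantom (arbitrary attenuation units).
water_roi = np.array([0.0398, 0.0402, 0.0400])
mu_water = water_roi.mean()

# Air maps to -1000 HU and water to 0 HU by construction.
hu_values = to_hounsfield([0.0, mu_water, 0.044], mu_water)
print(hu_values)
```

Because the water value differs between reconstruction algorithms, a separate `mu_water` would be measured for FBP and for SIRT images.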

Fig. 1
(a) Water phantom, (b) acrylic semi-cylinder phantom, and (c) calibrated iodinated phantoms

Noise Removal

A BF is a spatial-domain non-linear function designed to reduce noise while preserving the small structures and edges [23]. Its optimal parameters are related to the noise and spatial resolution of the image to be filtered, which means that they are unique for a given imaging protocol and reconstruction algorithm. In this work, a specific 3D BF was used for micro-CT images reconstructed with SIRT or FBP algorithms. BFs were implemented in MATLAB R2018b (The MathWorks Inc., Natick, MA, USA); the details of their implementation have been described previously [20].
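To make the principle of the BF concrete, the following is a minimal brute-force sketch of a generic 3D bilateral filter with Gaussian spatial and range kernels. It is not the implementation reported in [20]; the parameter names and values (sigma_spatial, sigma_range, radius) are hypothetical and would need to be tuned to the noise and spatial resolution of each reconstruction algorithm.

```python
import numpy as np

def bilateral_filter_3d(volume, sigma_spatial=1.0, sigma_range=50.0, radius=2):
    """Generic 3D bilateral filter (illustrative sketch, not the filter in [20]).

    Each voxel becomes a weighted mean of its neighbours. The weight is the
    product of a spatial Gaussian (distance in voxels) and a range Gaussian
    (difference in voxel value), so averaging is suppressed across edges.
    """
    vol = np.asarray(volume, dtype=float)
    padded = np.pad(vol, radius, mode="edge")
    nz, ny, nx = vol.shape
    weighted_sum = np.zeros_like(vol)
    weight_total = np.zeros_like(vol)
    for dz in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                # Neighbour volume shifted by (dz, dy, dx).
                neighbour = padded[radius + dz:radius + dz + nz,
                                   radius + dy:radius + dy + ny,
                                   radius + dx:radius + dx + nx]
                w_spatial = np.exp(-(dz * dz + dy * dy + dx * dx)
                                   / (2.0 * sigma_spatial ** 2))
                w_range = np.exp(-((neighbour - vol) ** 2)
                                 / (2.0 * sigma_range ** 2))
                weight = w_spatial * w_range
                weighted_sum += weight * neighbour
                weight_total += weight
    # The centre voxel always contributes weight 1, so the total is never zero.
    return weighted_sum / weight_total

# Hypothetical test object: a sharp 1000 HU step with Gaussian noise added.
rng = np.random.default_rng(0)
step = np.zeros((8, 8, 8))
step[:, :, 4:] = 1000.0
noisy = step + rng.normal(0.0, 10.0, step.shape)
filtered = bilateral_filter_3d(noisy, sigma_spatial=1.5, sigma_range=50.0)
```

In this sketch the range kernel is what distinguishes the BF from a plain Gaussian blur: voxels on the other side of the step differ by far more than sigma_range and therefore receive a negligible weight.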

Quantification of Image Quality

Image quality was assessed with the noise power spectrum (NPS), the modulation transfer function (MTF), and the contrast-to-noise ratio (CNR) in four scenarios: FBP reconstructed images, SIRT reconstructed images, filtered FBP images (fFBP), and filtered SIRT images (fSIRT). All images were reconstructed with the FBP and SIRT algorithms. The fFBP and fSIRT images were obtained after applying the corresponding BF to the reconstructed images. The 2D and 1D NPS were evaluated from images of a water phantom, and the 1D MTF was evaluated in the transverse plane from images of an acrylic semi-cylinder phantom, following guidelines for the assessment of image quality in CT scanners [24, 25]. Figure 1 shows the water and acrylic phantoms.
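As an illustration of the NPS assessment, the sketch below uses the commonly applied ensemble-averaged definition of the 2D NPS for square regions of interest (ROIs) taken from a uniform phantom, followed by a radial average to obtain a 1D curve. It is a sketch under these assumptions and not necessarily the procedure followed in [24, 25]; the function names, ROI size, and noise level are hypothetical.

```python
import numpy as np

def nps_2d(rois, pixel_size_mm):
    """Ensemble-averaged 2D NPS from a stack of ROIs (n_rois, ny, nx)."""
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    # Remove the mean of each ROI so that only the noise remains.
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)
    power = np.abs(np.fft.fft2(detrended, axes=(1, 2))) ** 2
    nps = power.mean(axis=0) * pixel_size_mm ** 2 / (nx * ny)
    return np.fft.fftshift(nps)  # zero frequency at the centre

def radial_profile(nps, pixel_size_mm):
    """Radially averaged 1D NPS; assumes square ROIs."""
    ny, nx = nps.shape
    fy = np.fft.fftshift(np.fft.fftfreq(ny, d=pixel_size_mm))
    fx = np.fft.fftshift(np.fft.fftfreq(nx, d=pixel_size_mm))
    radius = np.hypot(*np.meshgrid(fx, fy))
    df = 1.0 / (nx * pixel_size_mm)  # frequency step in 1/mm
    bins = np.round(radius / df).astype(int).ravel()
    sums = np.bincount(bins, weights=nps.ravel())
    counts = np.maximum(np.bincount(bins), 1)
    return np.arange(len(sums)) * df, sums / counts

# Hypothetical white-noise ROIs with the FBP pixel size (0.125 mm).
rng = np.random.default_rng(1)
rois = rng.normal(0.0, 20.0, (32, 64, 64))
nps = nps_2d(rois, 0.125)
freqs, nps_1d = radial_profile(nps, 0.125)
```

A useful sanity check for this definition is that the integral of the 2D NPS over spatial frequency equals the noise variance of the ROIs.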

Calibrated iodinated phantoms were used to assess the CNR [26]. These phantoms consisted of a solid epoxy material with CI values of 0, 0.5, 1.0 and 3.0 mg I/ml, as shown in Fig. 1. One image of each iodinated phantom was acquired in a separate and consecutive manner with the same acquisition parameters, as defined above. An affine registration was performed between each image and the image of the 0 mg I/ml phantom [27], which was considered the baseline image, and then the baseline image was subtracted from the images of the other iodinated phantoms to yield the subtracted CE images.

The contrast was quantified from the subtracted CE images as the difference between each iodinated phantom and the 0 mg I/ml phantom. The noise was evaluated as the standard deviation of the mean value measured in the subtracted CE image of the 0 mg I/ml phantom. The CNR was estimated as the contrast of each iodinated phantom divided by the noise. The NPS, MTF, and CNR were quantified using MATLAB R2018b (The MathWorks Inc., Natick, MA, USA).
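The CNR definition above can be written as a short sketch. The function name and the ROI values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev

def contrast_to_noise(roi_iodine, roi_baseline):
    """CNR from subtracted CE images.

    roi_iodine: voxel values (HU) in the subtracted image of an iodinated phantom.
    roi_baseline: voxel values (HU) in the subtracted image of the 0 mg I/ml phantom.
    """
    contrast = mean(roi_iodine) - mean(roi_baseline)
    noise = stdev(roi_baseline)  # sample standard deviation (assumption)
    return contrast / noise

# Hypothetical ROI values, chosen only to make the arithmetic easy to follow.
baseline_roi = [-2.0, 0.0, 2.0]   # mean 0 HU, standard deviation 2 HU
iodine_roi = [48.0, 50.0, 52.0]   # mean 50 HU
cnr = contrast_to_noise(iodine_roi, baseline_roi)  # 50 / 2 = 25
```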

Accuracy and Precision of CI: Phantom study

The subtracted CE images were converted to CI using an appropriate calibration function for each reconstruction algorithm, to yield the CI images. The calibration functions (CI vs. HU measured in the subtracted CE images of the calibrated iodinated phantoms) were CI = 0.022*CE + 0.298 (R2 = 0.99) and CI = 0.023*CE – 0.095 (R2 = 0.99), for FBP and SIRT images, respectively.
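Applying these calibration functions is a per-voxel linear mapping, sketched below with the coefficients reported above. The function and dictionary names are hypothetical.

```python
# Slope and intercept of CI = slope * CE + intercept, as reported in the text
# (CE in HU from the subtracted CE image, CI in mg I/ml).
CALIBRATION = {
    "FBP": (0.022, 0.298),
    "SIRT": (0.023, -0.095),
}

def hu_to_ci(ce_hu, algorithm):
    """Convert a subtracted CE value (HU) to iodine concentration (mg I/ml)."""
    slope, intercept = CALIBRATION[algorithm]
    return slope * ce_hu + intercept

# Hypothetical enhancement of 100 HU, converted with each calibration.
ci_fbp = hu_to_ci(100.0, "FBP")
ci_sirt = hu_to_ci(100.0, "SIRT")
```

The same function works unchanged on a NumPy array of voxel values, since it only uses multiplication and addition.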

Three repeated measurements of CI were performed for each nominal CI value. A separate set of images of the iodinated phantoms was acquired in a second experiment to assess the precision with a test–retest approach; CI values were measured in this set of images to yield the replicate measurements.

The linear relationship between the measured and the nominal CI values was assessed for the FBP, SIRT, fFBP, and fSIRT images [10]. Plots of the replicate measurements were obtained (measured vs. nominal CI); 2nd and 1st order polynomials were fitted to the data, and linearity was supported when the β2 coefficient of the 2nd order term of the 2nd order fitted polynomial was small (β2 < 0.5), the β1 coefficient of the 1st order term of the 1st order fitted polynomial was close to one (0.95 < β1 < 1.05), and R2 > 0.9. The CI accuracy was assessed with the bias [10], which was determined as the difference between the measured value and the nominal value; the bias was plotted against the nominal value.
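The linearity criteria and the bias can be sketched as follows. The measured values are hypothetical, the function name is illustrative, and the condition on β2 is applied here to its absolute value, which is an interpretation of "small".

```python
import numpy as np

def assess_linearity(nominal, measured):
    """Fit 2nd and 1st order polynomials and apply the linearity criteria."""
    nominal = np.asarray(nominal, dtype=float)
    measured = np.asarray(measured, dtype=float)
    beta2 = np.polyfit(nominal, measured, 2)[0]       # 2nd order coefficient
    beta1, beta0 = np.polyfit(nominal, measured, 1)   # slope and intercept
    predicted = beta1 * nominal + beta0
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    is_linear = bool(abs(beta2) < 0.5 and 0.95 < beta1 < 1.05 and r2 > 0.9)
    return {"beta2": beta2, "beta1": beta1, "r2": r2, "is_linear": is_linear}

# Three replicates per nominal CI value (mg I/ml); measured values are hypothetical.
nominal = np.repeat([0.5, 1.0, 3.0], 3)
measured = np.array([0.55, 0.48, 0.52, 1.04, 0.97, 1.01, 3.02, 2.95, 3.06])

result = assess_linearity(nominal, measured)
bias = measured - nominal  # bias = measured value - nominal value
```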

The precision of CI was estimated from repeatability and reproducibility metrics [10]. Repeatability was assessed with the within-subject standard deviation (wSD = standard deviation of the replicate measurements for each nominal CI value), the within-subject coefficient of variation (wCV = wSD/mean), and the repeatability coefficient (RC = 2.77 × wSD). The reproducibility of CI was evaluated with the correlation coefficient for the following comparisons: FBP vs. SIRT, FBP vs. fFBP, SIRT vs. fSIRT, and fFBP vs. fSIRT.
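The repeatability metrics defined above can be computed per nominal CI value as in the following sketch. The replicate values are hypothetical, and the use of the sample standard deviation is an assumption. The factor 2.77 corresponds to 1.96 × √2, i.e., the 95% limit for the difference between two repeated measurements.

```python
from statistics import mean, stdev

def repeatability(replicates):
    """wSD, wCV, and RC for the replicate measurements of one nominal CI value."""
    wsd = stdev(replicates)  # sample standard deviation (assumption)
    return {
        "wSD": wsd,
        "wCV": wsd / mean(replicates),
        "RC": 2.77 * wsd,
    }

# Hypothetical replicate CI measurements (mg I/ml) for a nominal value of 1.0 mg I/ml.
metrics = repeatability([0.9, 1.0, 1.1])
```

With these hypothetical replicates, wSD is approximately 0.1 mg I/ml, so two repeated measurements would be expected to differ by less than about 0.28 mg I/ml in 95% of cases.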

In vivo Evaluation

All experimental procedures with the animals were reviewed and approved by the Ethics Committee and the Institutional Animal Care and Use Committee of the Instituto Nacional de Cancerología, Mexico, where all the experiments took place; approval number: (018/051/IBI) (CEI/1294/18). The in vivo evaluation was performed on a virgin female Sprague–Dawley rat with chemically-induced mammary cancer. Mammary lesions were chemically induced with dimethylbenz[a]anthracene (DMBA) [28]. The animal was kept in a pathogen-free environment and fed with autoclaved food and water ad libitum. A single intragastric dose of 20 mg/ml DMBA (Sigma) dissolved in 1 ml of sunflower oil was administered to the animal (7-week-old), after a previous intraperitoneal injection of ketamine and xylazine (30 and 6 mg/kg body weight, respectively) [28]. Imaging was performed after tumor detection, which occurred 10 weeks after the inoculation of DMBA. For image acquisition, the animal was anesthetized with isoflurane (3% in 100% oxygen). A baseline image was acquired; then, a CE image was acquired during continuous infusion of a clinical contrast agent (Omnipaque 300, GE Healthcare, Wauwatosa, WI, USA; average dose = 2.4 mg of iodine/g of body weight (b.w.), infusion rate = 0.5 mL/min), via a catheter placed in the right external jugular vein of the animal. No gating (cardiac or respiratory) was used during image acquisition. Images were reconstructed, filtered, registered, and subtracted as described and converted to CI values. Mean CI and its standard deviation were quantified in FBP, SIRT, fFBP, and fSIRT images within spherical volumes of interest (VOIs) with AMIDE software [29] for several tissues. VOIs were placed in the left ventricle (LV, 3 mm diameter), abdominal aorta (0.7 mm diameter), liver (3 mm diameter), tumor (2 mm diameter), and muscle (2 mm diameter). A CNR related to muscle (CNRmuscle) was obtained for each tissue; in this case, the contrast was evaluated as the difference between CI within each tissue and CI within the muscle, and the noise was defined as the standard deviation of the mean value of CI within the muscle.

Statistical Analysis

GraphPad Prism 6 (GraphPad Software, Inc., San Diego, CA, USA) was used to perform all statistical analyses. A Shapiro–Wilk test was used to assess the normality of the data. Data were compared in the following pairs: FBP vs. SIRT, FBP vs. fFBP, SIRT vs. fSIRT, and fFBP vs. fSIRT. Normally distributed data were compared with a one-way analysis of variance (ANOVA) test, followed by Bonferroni’s test for multiple comparisons (namely, CNR, bias, RC, CNRmuscle). Non-normally distributed data were compared with the Friedman test and Dunn’s multiple comparison test (namely, NPS, MTF, CI in the in vivo evaluation). The Pearson correlation coefficient was used to evaluate the reproducibility of CI. An adjusted p-value less than 0.05 was considered statistically significant.

Results

Image Quality

Figure 2 shows the results of the evaluation of image quality for FBP, SIRT, fFBP, and fSIRT images. The 2D NPS in Fig. 2a-d shows isotropically distributed values for FBP and fFBP images, and anisotropically distributed values for SIRT and fSIRT images. This difference reflects the non-linearity of the SIRT algorithm, compared to the linear FBP algorithm. As can be observed in Fig. 2a-d, the BF reduces the amplitude of the noise for both reconstruction algorithms. As shown in Fig. 2e, the noise removal was more marked for the SIRT algorithm compared to FBP, and both algorithms showed a change in texture, since the peak and form of the 1D NPS curve were lost after the filtration. Statistically significant differences were found between the NPS mean values for FBP vs. SIRT (adj. p-value = 0.0025), FBP vs. fFBP (adj. p-value < 0.0001), SIRT vs. fSIRT (adj. p-value < 0.0001), and fFBP vs. fSIRT (adj. p-value = 0.0047), evaluated with the Friedman test. It is known that iterative algorithms, particularly SIRT, yield images with lower noise than images obtained with FBP [30]. However, the higher noise in SIRT images compared to FBP images observed in Fig. 2e is related to the number of iterations chosen in this work for the SIRT algorithm.

Fig. 2
2D Noise power spectrum (NPS) for (a) FBP images: images reconstructed with a commonly used algorithm (filtered backprojection); (b) fFBP images: FBP images after the application of the bilateral filter (BF); (c) SIRT images: images reconstructed with an iterative algorithm (simultaneous iterative reconstruction technique), and (d) fSIRT images: SIRT images after the application of the BF. Image quality was assessed by the (e) 1D NPS, (f) the modulation transfer function (MTF), and (g) the contrast-to-noise ratio (CNR) for several iodine concentrations (CI)

Figure 2f shows the MTF for the scenarios evaluated. Although the values were similar, a statistically significant difference was found for FBP vs. SIRT (adj. p-value = 0.0051), and fFBP vs. fSIRT (adj. p-value < 0.0001), evaluated with the Friedman test. The agreement found between FBP vs. fFBP and SIRT vs. fSIRT, together with the 1D NPS results, demonstrates the adequate functioning of the BF: it reduces image noise while preserving the spatial resolution.

Figure 2g shows the CNR at the evaluated nominal CI values for FBP, SIRT, fFBP, and fSIRT images. No statistically significant differences were found when comparing these results, even though the CNR was higher in the filtered images than in the non-filtered images.

Accuracy and Precision of CI

Figure 3a and b show the replicate CI measurements compared to the nominal CI values for FBP and SIRT images; a higher variability was observed in the replicate CI measurements for the FBP images compared to the SIRT images. A linear relationship (R2 > 0.95) was found between the measured and nominal CI values for the evaluated scenarios, as shown in Fig. 3c, which reflects the similarity between the measured and nominal CI values.

Fig. 3
Comparison of measured and nominal iodine concentration (CI) values from (a) FBP and (b) SIRT images. (c) Evaluation of the relationship between measured and nominal CI values. Assessment of the accuracy and precision of CI with (d) bias, (e) the repeatability coefficient (RC), and (f) the reproducibility of CI quantified in FBP vs. SIRT images (evaluated with the Pearson correlation coefficient)

Figure 3d shows the bias for FBP, SIRT, fFBP, and fSIRT images. Statistically significant differences were found for FBP vs. SIRT (adj. p-value = 0.0308) and SIRT vs. fSIRT (adj. p-value = 0.0308), which suggests that the use of the SIRT algorithm compared to the FBP algorithm has an impact on the accuracy of CI, while the use of the BF only affects the accuracy in SIRT images.

Figure 3e shows the RC for FBP, SIRT, fFBP, and fSIRT images. A statistically significant difference was found for FBP vs. SIRT (adj. p-value < 0.0001) and fFBP vs. fSIRT (adj. p-value < 0.0001), which suggests that the use of the SIRT algorithm compared to the FBP algorithm increases the precision of CI; however, the precision is not affected by using the BF in either the FBP or SIRT images.

Figure 3f shows an example of the evaluation of the correlation between measured CI values in FBP and SIRT images; strong associations were found for all the comparisons (Pearson r > 0.99, p-value < 0.001). These results suggest that CI values are reproducible across the reconstruction and noise removal algorithms in the evaluated CI range.

In vivo Evaluation of CI

Figure 4 shows the coronal views of baseline, CE, and subtracted CI images of an animal model of chemically-induced mammary cancer; a magnification of the tumor region is shown in the insets. As shown in Fig. 4, subtracted CI images enabled a better depiction of the tumor regions with high and low CI values and their vasculature, compared to baseline and CE images for all the evaluated scenarios. Qualitatively, SIRT images showed a higher noise content than FBP images, as well as a slightly better spatial resolution indicated by the better definition of some structures like the ribs of the animal; these results agree with the quantitative evaluation of image quality shown in Fig. 2. The yellow arrows in Fig. 4 indicate the enhancement of the streak artifacts present in the FBP images as a result of the use of the BF. The enhancement of these artifacts is observed as thicker alternating black lines. The blue arrows in Fig. 4 indicate a slightly better definition of the vasculature in SIRT images after the application of the BF (fSIRT image), compared to the SIRT images, despite no significant difference being found in the MTF for these images. A bright artifact can be observed in the boundary between the lungs and the liver of the animal in the subtracted CI image in Fig. 4, due to the misregistration caused by respiratory motion (no gating was used during image acquisition). If respiratory gating were used, it could improve the quality of the subtraction and potentially reduce this artifact [17].

Fig. 4
Coronal views of baseline, contrast-enhanced (CE), and subtracted micro-CT images of an animal model of chemically-induced mammary cancer. No gating was used during image acquisition. Images were reconstructed with an FBP or SIRT algorithm and filtered with a bilateral filter (fFBP and fSIRT, respectively). Subtracted images are shown in units of iodine concentration (CI). The yellow arrows show artifacts in the fFBP images; the blue arrows show tumor vessels. HU: Hounsfield units

Figure 5a shows the quantitative in vivo evaluation of CI in several tissues in FBP, SIRT, fFBP, and fSIRT images. The measured CI values agreed across scenarios, which is consistent with the reproducibility results exemplified in Fig. 3f. Figure 5b shows the CNRmuscle, which was quantified from the contrast observed between each tissue and muscle. Statistically significant differences were found in CNRmuscle quantified in FBP vs. SIRT (adj. p-value = 0.0168), FBP vs. fFBP (adj. p-value = 0.0003), and SIRT vs. fSIRT (adj. p-value = 0.0003), evaluated with one-way ANOVA, which suggests that filtered images, regardless of the reconstruction algorithm, showed a higher image quality. This result highlights the importance of performing the in vivo evaluations since the phantom study showed that the difference in CNR was not statistically significant.

Fig. 5
In vivo quantitative assessment of (a) iodine concentration (CI) and (b) image quality, assessed with the contrast-to-noise ratio related to muscle (CNRmuscle). LV = left ventricle, ns = non-significant, *adj. p-value = 0.0168, ***adj. p-value = 0.0003; evaluated with one-way ANOVA and Bonferroni’s test for multiple comparisons

Discussion

In this work, we have evaluated the effect of reconstruction (by FBP-based vendor’s and in-house SIRT algorithms) and noise removal (by a 3D BF algorithm) on the accuracy and precision of CI quantified with subtracted CE micro-CT. A phantom study evaluated and compared image quality, and the accuracy and precision of CI under four scenarios: FBP, SIRT, fFBP, and fSIRT images. In vivo experiments evaluated image quality and the reproducibility of CI under the same scenarios in an animal model of chemically-induced mammary cancer.

The image quality evaluation in the phantom study showed a significant effect of the SIRT and BF algorithms on the image noise; however, this effect did not introduce significant changes in the CNR. Specifically, the BF reduced the noise and maintained the spatial resolution for the two reconstruction algorithms evaluated, as expected. As expected from the optimization of the number of iterations for the SIRT algorithm, the noise in SIRT images was higher than the noise in FBP images. The CNR was higher in the filtered images compared to the non-filtered images; however, this difference was not significant. One of the main findings of this work was that the use of the SIRT algorithm significantly improved the accuracy and precision of CI by reducing the bias and the RC, respectively, compared to the FBP algorithm. Interestingly, it was also found that the use of the BF maintained the improvement in the precision of CI for the SIRT images, which encourages its use in subtracted CE micro-CT imaging. For the reproducibility, a high association was found among the scenarios evaluated, as demonstrated by the strong correlation coefficients found in this work for the evaluated CI range.

The in vivo study demonstrated the reproducibility of CI values quantified in images of the animal model. The main finding of the in vivo study was that a significant improvement in image quality was observed with the use of the BF, although no significant differences were found in the phantom study. This result demonstrates the relevance of performing preliminary in vivo studies in the validation stage, since they involve biological variables that phantoms usually cannot reproduce.

A direct comparison with previously reported findings is not possible since, to our knowledge, this is the first study that specifically addresses the effects of iterative reconstruction and noise removal algorithms on the accuracy and precision of CI in subtracted CE micro-CT. However, some similarities can be found with studies performed in CECT imaging. In a phantom study, Euler et al. found that the accuracy of CI depended on the scanner type, patient-related factors (such as size), radiation dose, and the reconstruction algorithm [4]. For the reconstruction algorithm, it was observed that the use of iterative algorithms (although different from SIRT) increased the accuracy of CI compared to FBP, in agreement with our results for SIRT. In another phantom study, Chen et al. found that the precision of CI was related to the radiation dose and the reconstruction algorithm [6]; it was observed that the RC decreased when IR algorithms were used compared to FBP, which agrees with our results.

Some authors have explored the effect of noise removal algorithms (different from IR algorithms) on image quality or the quantification of vascular parameters. Davidoiu et al. found that NPS decreased and CNR increased when different noise removal algorithms were applied to FBP-reconstructed micro-CT images of digital and physical phantoms [31]. Although the algorithms evaluated did not include the BF, in general, those findings agree with ours for fFBP images. In another study, Yeung et al. found similar image quality results both in a digital phantom and in an animal model of glioma when another noise removal and an IR algorithm were used [32]. Moreover, they found that the accuracy of perfusion parameters such as blood volume or blood flow increased when the noise removal algorithm was applied to images of a digital phantom, which is in general agreement with our results for CI in fSIRT images. In the clinical scenario, Pisana et al. found a high similarity between ground truth values of blood volume and values obtained from FBP images filtered with a modified BF [33]. Additionally, Pisana et al. found a higher CNR in the filtered images compared to non-filtered images, which is in agreement with our results.

The present work has some limitations. First, the accuracy and precision of CI were evaluated with calibrated phantoms with a limited concentration range (0–3 mg I/ml). This CI range could be appropriate for tumor studies, since previous works have reported values of CI < 5 mg I/ml in several animal cancer models [1, 11, 14, 15]. However, a wider range of CI should be evaluated to extend the understanding of the accuracy and precision of CI in several organs and tissues. Another limitation is that only one size of the calibrated phantom was evaluated, and it has been demonstrated that the accuracy of CI depends on the phantom size [4]. Besides evaluating different-sized phantoms, a non-uniform background and contrast extravasation could be incorporated to simulate the complex structure of the in vivo studies [31]. The discrepancy found between the phantom and the in vivo results for the CNR could be related to this lack of complexity in the phantoms used in this study. Finally, although it has been demonstrated that acquisition parameters such as kilovoltage and radiation dose significantly impact the accuracy and precision of CI, those parameters were not considered in the present study since the imaging protocol had been previously optimized and standardized [21, 22]. This optimization was carried out in a systematic and rigorous manner, and it included the optimization of image quality and radiation dose.

Conclusion

The improvements in accuracy and precision of CI enabled by the SIRT and BF algorithms compared to FBP and non-filtered images encourage their use in subtracted micro-CT imaging.

Conflict of Interest

Lizbeth Ayala-Domínguez, Luis Alberto Medina, and María Ester Brandan report a grant from Consejo Nacional de Ciencia y Tecnología (CONACyT), grant CB-251497, during the conduct of the study. Marcela Lizano and Carmen Aceves declare that there is no conflict of interest.