Background

The need for pediatric computed tomography (CT) examinations is constantly increasing despite the "as low as reasonably achievable" principle and concerns of radiation hazards for children. Pediatric body CT, including in emergency rooms and in tumor patients, is an important imaging test in children. With the development of technology, efforts to reduce the radiation dose have continued steadily, with the development and use of iterative reconstruction (IR) as a typical example.

Over the past decade, IR algorithms have been used to produce high-resolution images by computationally decreasing image noise, resulting in better image quality at a lower radiation dose than single reconstructed filtered back projection (FBP) in adults [1, 2] and children [3,4,5,6]. The recently developed adaptive statistical iterative reconstruction-V (ASIR-V) technique provides a short reconstruction time with better image quality and a lower radiation dose than other IR algorithms [7, 8]. However, ASIR-V still does not overcome excessive image smoothing and unnatural image appearance. Hybrid IR images that blend IR with FBP can reduce this texture problem, although this involves a trade-off between image noise and image texture [9].

Recently, image denoising algorithms using artificial neural networks, termed deep learning reconstruction (DLR), have been applied to CT image reconstruction to overcome the drawbacks of IR while achieving good image quality [10,11,12,13,14,15]. However, only a limited number of studies have evaluated this technique in children, each in a small number of patients and only on abdomen CT images [16,17,18]. The purpose of our study was to compare the objective and subjective image quality of DLR and IR on pediatric abdomen and chest CT images.

Methods

This was a retrospective study approved by the institutional review board at our institution, and the need for informed consent was waived.

Study population

We included all consecutive pediatric patients who underwent chest or abdomen CT at our institution between February 2020 and October 2020 with the same CT system (Revolution CT; GE Healthcare), which has a routine protocol including DLR. We retrospectively reviewed 51 patients. There were 34 boys and 17 girls with a mean age of 11.5 ± 4.6 years (range 1–18 years). Non-enhanced chest CT (n = 16), contrast-enhanced chest CT (n = 12), and contrast-enhanced abdomen CT (n = 23) images were included. Height and weight were recorded at the time of CT examination and BMI was calculated. Patients were divided into three body weight groups: < 20 kg, 20–60 kg, and > 60 kg.

Phantom study

In general, the signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) are used to measure the amount (magnitude) of noise in images. However, the standard deviation (SD) used in the SNR and CNR calculations varies with the position of the region of interest (ROI) in an image of the human body, which is a non-homogeneous medium, and SNR and CNR only evaluate the noise magnitude. The noise power spectrum (NPS) evaluates both the magnitude and the texture of image noise in the spatial frequency domain [19], and it can overcome the drawbacks of SD measurement in SNR and CNR calculation. For NPS analysis, we scanned the uniformity module of the Catphan 500 phantom (Catphan 500, The Phantom Laboratory, NY, USA), and performed three scans at dose levels that included the dose level of the patient images used in this study. We implemented a 3D NPS calculation ourselves, based on the method presented by the American Association of Physicists in Medicine (AAPM) [20], and used Matlab (Version R2017a, The MathWorks, Inc., MA, USA) for this calculation.
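As a rough illustration of the NPS concept, the following Python sketch computes an ensemble-averaged 2D NPS from small ROIs of a uniform image. It is a simplified 2D stand-in and not the 3D Matlab implementation used in this study; the function names, the mean-only detrending, and the naive discrete Fourier transform (chosen to avoid external dependencies) are all assumptions made for the example.

```python
import cmath


def dft_1d(values):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
        for u in range(n)
    ]


def dft_2d(roi):
    """2D DFT: transform every row, then every column."""
    ny, nx = len(roi), len(roi[0])
    rows = [dft_1d(row) for row in roi]
    cols = [dft_1d([rows[y][u] for y in range(ny)]) for u in range(nx)]
    # cols[u][v] holds frequency (u, v); return it indexed as [v][u].
    return [[cols[u][v] for u in range(nx)] for v in range(ny)]


def nps_2d(rois, pixel_mm):
    """Ensemble-averaged 2D NPS (HU^2 mm^2) from equally sized ROIs.

    Assumes square pixels of side pixel_mm and removes only the ROI mean
    (no polynomial detrending), which is a simplification.
    """
    ny, nx = len(rois[0]), len(rois[0][0])
    acc = [[0.0] * nx for _ in range(ny)]
    for roi in rois:
        roi_mean = sum(map(sum, roi)) / (nx * ny)
        spectrum = dft_2d([[p - roi_mean for p in row] for row in roi])
        for v in range(ny):
            for u in range(nx):
                acc[v][u] += abs(spectrum[v][u]) ** 2
    scale = pixel_mm * pixel_mm / (nx * ny * len(rois))
    return [[a * scale for a in row] for row in acc]
```

A convenient sanity check for such a sketch is that the sum of the returned values multiplied by the frequency bin area, 1/(Nx × Ny × pixel_mm²), equals the mean pixel variance of the ROIs.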

Scanning technique and radiation dose measurements

All patients were examined using a 256-slice CT (Revolution CT; GE Healthcare). Peak kilovoltage (kVp) was divided into three groups by weight: 100 kVp for > 40 kg, 80 kVp for 15–40 kg, and 70 kVp for < 15 kg. An automatic dose modulation technique (Smart mA; GE Healthcare) was used with a range of 50–200 mAs. The noise index was 33 for abdomen CT and 22 for chest CT. Other parameters used to generate images were as follows: gantry rotation time, 0.35 s; coverage speed, 226.79 mm/s; pitch, 0.992:1; and slice thickness, 2.5 mm.
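The weight-based tube voltage rule can be written as a small lookup. The sketch below is illustrative only; the function name is hypothetical, and it assumes that weights of exactly 15 kg and 40 kg fall in the 80 kVp group, as the "15–40 kg" range suggests.

```python
def tube_voltage_kvp(weight_kg):
    """Return the tube voltage (kVp) for a given body weight.

    Assumption: the boundaries 15 kg and 40 kg belong to the 80 kVp group.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    if weight_kg < 15:
        return 70
    if weight_kg <= 40:
        return 80
    return 100
```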

Weight-based IV contrast injection was used with settings of 1.5–2.0 ml/kg with a maximum of 100 ml, using 300 mg iodine/ml concentration intravenous contrast iobitridol (Xenetix; Laboratoires Guerbet). The contrast was injected through an upper extremity peripheral intravenous line, followed by a saline chaser of 0.5 ml/kg. Injection speed was adjusted for a total injection time of 15 s or less. For contrast-enhanced abdomen CT, a fixed time interval of 60 s after contrast injection for portal phase without bolus tracking was used. For contrast-enhanced chest CT, a circular ROI was placed at the main pulmonary artery and the CT scan began 4 s after the threshold attenuation of 100 Hounsfield units (HU) was reached.
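The contrast volume arithmetic described above can be sketched as follows, for illustration only. The function and key names are hypothetical, and the sketch assumes that the 15 s limit applies to the contrast bolus alone and not to the saline chaser, which the text does not state.

```python
def contrast_plan(weight_kg, dose_ml_per_kg=2.0, max_injection_s=15.0):
    """Compute contrast volume, saline chaser volume, and minimum flow rate.

    Assumptions: dose_ml_per_kg lies in the stated 1.5-2.0 ml/kg range, and
    the injection-time limit covers the contrast bolus only.
    """
    if not 1.5 <= dose_ml_per_kg <= 2.0:
        raise ValueError("dose_ml_per_kg must be between 1.5 and 2.0")
    contrast_ml = min(dose_ml_per_kg * weight_kg, 100.0)  # capped at 100 ml
    saline_ml = 0.5 * weight_kg
    min_rate_ml_per_s = contrast_ml / max_injection_s
    return {
        "contrast_ml": contrast_ml,
        "saline_ml": saline_ml,
        "min_rate_ml_per_s": min_rate_ml_per_s,
    }
```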

Four axial reconstructions were generated for each patient with a 2.5 mm slice thickness and 2.5 mm slice interval according to the standard algorithm: 50% ASIR-V, 100% ASIR-V, and medium- and high-strength DLR (TrueFidelity; GE Healthcare). We set the blending factors to 50% and 100% according to previous experience [3, 4]. DLR provides three selectable reconstruction strength levels (low, medium, and high) to control the amount of noise reduction with a standard reconstruction kernel. We chose medium and high based on our preliminary experience. TrueFidelity is the first clinically available deep learning-based CT reconstruction technique and is based on a deep neural network trained with low-dose raw CT projection data. The ground truth data used to train the algorithm were filtered back projection CT images resulting from ideal data acquisition conditions, both from phantoms and from patients in a clinical setting. The output is a reconstructed image that appears as if it had been reconstructed from high-dose raw CT data. However, the details of the network architecture and the training process are not publicly available [21].

The CT dose index volume (CTDIvol, mGy) and dose-length product (DLP, mGy × cm) of all patients were recorded for both chest and abdomen CT examinations. CTDIvol was converted to size-specific dose estimates (SSDE) based on the American Association of Physicists in Medicine Report 204 [22]. Patient-specific dimensions were obtained from axial CT images at the carina on chest CT and at the main portal vein on abdomen CT. We used the sum of anteroposterior and lateral dimensions to determine patient effective diameter and conversion factors. The effective dose (ED, mSv) was calculated as ED = DLP × W_T, where W_T is the tissue-weighting factor, which varies with kVp, organ, and age [23]. Because tissue-weighting factors for tube voltages below 80 kVp are unknown, the factor for 80 kVp was adopted for 70 kVp studies.
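The two dose conversions reduce to simple multiplications, sketched below in Python. The function names are hypothetical, and the size conversion factor and weighting factor are deliberately left as inputs: they must be looked up from AAPM Report 204 [22] and the cited tables [23], and no values from those sources are reproduced here.

```python
def size_sum_cm(ap_cm, lateral_cm):
    """Sum of anteroposterior and lateral dimensions, used for the lookup."""
    return ap_cm + lateral_cm


def ssde_mgy(ctdi_vol_mgy, size_conversion_factor):
    """SSDE = CTDIvol x size-dependent conversion factor (looked up externally)."""
    return ctdi_vol_mgy * size_conversion_factor


def effective_dose_msv(dlp_mgy_cm, weighting_factor):
    """ED = DLP x weighting factor (depends on kVp, region, and age)."""
    return dlp_mgy_cm * weighting_factor


# Usage with made-up placeholder factors (NOT taken from Report 204 or [23]):
example_ssde = ssde_mgy(1.3, 1.5)
example_ed = effective_dose_msv(49.0, 0.02)
```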

Quantitative image analysis

Quantitative analysis of axial images was performed by a board-certified radiologist with 9 years of experience. The mean attenuation (HU) and SD were measured by manually placing a round ROI (8–10 mm in diameter) using a picture archiving and communication system (PACS) workstation (Centricity Radiology RA1000; GE Healthcare) in the mediastinal/soft-tissue window setting (window level, 50 HU; window width, 350 HU). On chest CT images, ROIs were placed in the lung and paraspinal muscles at the level of the carina. On abdomen CT images, ROIs were placed in the liver, aorta, and paraspinal muscles at the level of the main portal vein on axial images. To obtain reliable measurements, each ROI was positioned to encompass a homogeneous portion of the structure and did not include surrounding structures or vessels. Image noise was defined as the SD of the pixel values obtained from the paraspinal muscle. The contrast- and signal-to-noise ratios (CNR and SNR) were defined as CNR = |HU_object − HU_muscle| / SD_noise and SNR = HU_object / SD_noise [24]. In addition, we calculated the NPS peak (HU² mm²) and the NPS average spatial frequency (mm⁻¹) from each NPS curve measured using the phantom. The NPS peak reflects the magnitude of the noise, and the NPS average spatial frequency reflects the texture of the noise.
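The definitions above translate directly into code. The following Python sketch is illustrative: the function names are hypothetical, the sample SD is used for the ROI (the PACS workstation may use a different SD convention), and the average spatial frequency is taken here as the NPS-weighted mean frequency, which is a common definition but is an assumption about the exact formula used in this study.

```python
from statistics import mean, stdev


def roi_mean_sd(pixels_hu):
    """Mean attenuation and sample SD of the pixel values in one ROI."""
    return mean(pixels_hu), stdev(pixels_hu)


def cnr(hu_object, hu_muscle, sd_noise):
    """CNR = |HU_object - HU_muscle| / SD_noise."""
    return abs(hu_object - hu_muscle) / sd_noise


def snr(hu_object, sd_noise):
    """SNR = HU_object / SD_noise (the sign follows HU_object)."""
    return hu_object / sd_noise


def nps_peak(nps_values):
    """Largest value of a sampled 1D NPS curve (noise magnitude)."""
    return max(nps_values)


def nps_average_frequency(frequencies, nps_values):
    """NPS-weighted mean spatial frequency (noise texture)."""
    weighted = sum(f * n for f, n in zip(frequencies, nps_values))
    return weighted / sum(nps_values)
```

For example, with hypothetical values of 120 HU in the liver, 60 HU in the paraspinal muscle, and a noise SD of 12 HU, `cnr(120, 60, 12)` gives 5.0 and `snr(120, 12)` gives 10.0.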

Qualitative image analysis

CT images were independently reviewed by two board-certified pediatric radiologists with 17 and 9 years of experience who were blinded to the clinical findings and the CT reconstruction methods. Images were displayed on the PACS in random order, and the two radiologists independently recorded their opinions on overall image quality, noise, and motion or beam hardening artifacts. A four-point scale was used: 4 was superior, 3 was average, 2 was suboptimal, and 1 was unacceptable.

Statistical analysis

All statistical analyses were performed using MedCalc software (version 12.1.0; MedCalc Software). Patient demographic characteristics and dose descriptors (CTDIvol, DLP, SSDE, and ED) are summarized and presented as the mean and SD. Repeated measures ANOVA with pairwise comparisons and Bonferroni correction was performed to compare the reconstructions with respect to attenuation, noise, CNR, and SNR. The Wilcoxon signed-rank test was used to compare qualitative evaluations, and Cohen's kappa was used to assess interobserver agreement. Agreement between reviewers is expressed as κ values: κ values of 0–0.20, 0.21–0.40, 0.41–0.60, 0.61–0.80, and 0.81–1.00 indicated poor, fair, moderate, good, and excellent agreement, respectively. A p-value of less than 0.05 was considered statistically significant.
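To make the agreement measure concrete, the following Python sketch computes an unweighted Cohen's kappa for two readers and maps it to the categories listed above. The analyses in this study were run in MedCalc; the function names here are hypothetical, the use of unweighted rather than weighted kappa is an assumption, and negative κ values are assigned to the "poor" category for simplicity.

```python
from collections import Counter


def cohen_kappa(scores_reader1, scores_reader2):
    """Unweighted Cohen's kappa for two equally long lists of ratings."""
    n = len(scores_reader1)
    if n == 0 or n != len(scores_reader2):
        raise ValueError("score lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(scores_reader1, scores_reader2)) / n
    counts1, counts2 = Counter(scores_reader1), Counter(scores_reader2)
    expected = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use one category")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Map kappa to the agreement categories used in this study."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"
```

With these thresholds, the value κ = 0.944 reported for artifacts is labeled "excellent".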

Results

The mean weight and BMI of the patients were 44.3 ± 18.9 kg and 20.3 ± 5.1 kg/m², respectively. Two patients had metallic hardware within the scanned field of view of the CT images. CTDIvol, DLP, SSDE, and ED of chest CT images were 1.3 ± 0.5 mGy (range, 0.6–2.5 mGy), 49.0 ± 26.3 mGy × cm (range, 16.5–112.9 mGy × cm), 2.0 ± 0.6 mGy (range, 1.2–3.1 mGy), and 2.2 ± 3.2 mSv (range, 0.7–16.1 mSv), respectively. CTDIvol, DLP, SSDE, and ED of abdomen CT images were 1.5 ± 0.6 mGy (range, 0.4–3.2 mGy), 77.9 ± 35.0 mGy × cm (range, 12.6–147.7 mGy × cm), 2.5 ± 0.9 mGy (range, 0.8–4.7 mGy), and 2.0 ± 0.7 mSv (range, 0.7–3.7 mSv), respectively.

Quantitative image assessment

The results of the quantitative image assessment are summarized in Table 1 and Fig. 1. The mean attenuation values between reconstructions were equivalent.

Table 1 Quantitative image analysis of pediatric CT with different reconstruction techniques in comparison with 50% ASIR-V
Fig. 1

Box-and-whisker plots of quantitative pediatric CT image analyses with different reconstruction techniques. When compared with 50% adaptive statistical iterative reconstruction-V (ASIR-V), high strength deep learning reconstruction (DLR-H) was associated with (a) noise reduction, (b) better contrast to noise ratio (CNR), and (c) better signal to noise ratio (SNR). ASIR-V 50, 50% adaptive statistical iterative reconstruction-V; ASIR-V 100, 100% ASIR-V; DLR-M, medium strength deep learning reconstruction; DLR-H, high strength DLR; CE, contrast-enhanced

When compared with 50% ASIR-V, high strength DLR was associated with noise reduction in non-contrast chest CT (33.0%), contrast-enhanced chest CT (39.6%), and contrast-enhanced abdomen CT (38.7%) with increases in CNR at 149.1%, 105.8%, and 53.1%, respectively, and increases in SNR at 148.6%, 106.3%, and 57.4%, respectively (Fig. 2, Additional file 1: Fig. S1–S4).
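The percentages in this section are relative changes against the 50% ASIR-V baseline. A minimal Python sketch of that arithmetic follows; the function names are hypothetical. The group-level percentages above come from group means that are not reproduced in this text, so the usage note relies on the single-patient liver CNR values given in the Fig. 2 caption.

```python
def percent_change(baseline, new_value):
    """Relative change of new_value against baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new_value - baseline) / baseline * 100.0


def percent_reduction(baseline, new_value):
    """Relative reduction against baseline, in percent (positive = lower)."""
    return -percent_change(baseline, new_value)
```

For the patient in Fig. 2, liver CNR rose from 2.18 with 50% ASIR-V to 3.88 with DLR-H, so `percent_change(2.18, 3.88)` gives an increase of roughly 78%.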

Fig. 2

Abdomen CT images with contrast enhancement in a 15-year-old boy who had abdominal pain with a BMI of 19.9 kg/m². a–d Axial contrast-enhanced CT images of the same anatomical location show image quality comparison between (a) standard 50% adaptive statistical iterative reconstruction-V (50% ASIR-V), (b) 100% ASIR-V, (c) medium-strength deep learning image reconstruction (DLR-M), and (d) high-strength deep learning image reconstruction (DLR-H). Contrast to noise ratio (CNR) in the liver was 2.18 in 50% ASIR-V, 2.84 in 100% ASIR-V, 3.03 in DLR-M, and 3.88 in DLR-H

Medium strength DLR also showed decreased noise in abdomen CT, but no significant difference was found in noise in chest CT when compared with 50% ASIR-V. Medium strength DLR showed better CNR and SNR in both non-contrast and contrast-enhanced chest CT; however, there was no significant difference in CNR and SNR in abdomen CT.

When compared with 100% ASIR-V, high strength DLR improved CNR by 24% in chest CT images without contrast enhancement. However, there was no significant improvement in CNR in either chest CT or abdomen CT images with contrast enhancement (Additional file 2: Table S1).

Figure 3 shows the NPS curves according to the clinical dose levels and image reconstruction methods, and the NPS peak and average spatial frequency for each NPS curve are summarized in Table 2. In all image reconstruction methods, the NPS peak decreased as the dose increased (1 to 5 mGy), and the rate of decrease was similar across methods at about 21%. At the same dose level, the NPS peaks of the reconstruction methods decreased in the order of 50% ASIR-V, DLR-M, 100% ASIR-V, and DLR-H, although the peaks of 100% ASIR-V and DLR-M were nearly the same. In all image reconstruction methods, the NPS average spatial frequency showed no significant difference according to the change in dose. However, the DLR methods overall showed higher average spatial frequency values than ASIR-V, and the average spatial frequency was lowest for 100% ASIR-V. Overall, the DLR methods markedly reduced the magnitude of noise while maintaining the texture.

Fig. 3

Noise power spectrum (NPS) results measured using a uniform phantom. a–c Each line represents the standard 50% adaptive statistical iterative reconstruction-V (50% ASIR-V, blue line), 100% ASIR-V (blue dotted line), medium-strength deep learning image reconstruction (DLR-M, red line), and high-strength deep learning image reconstruction (DLR-H, red dotted line) at the dose level of (a) 1 mGy, (b) 3 mGy, and (c) 5 mGy. The NPS peaks of the reconstruction methods decreased in the order of 50% ASIR-V, DLR-M, 100% ASIR-V, and DLR-H

Table 2 Peaks and average spatial frequency of noise power spectrum (NPS) curve

We also analyzed the effects of body weight on noise reduction. In DLR images, paraspinal muscle noise was lower in patients over 20 kg than in patients under 20 kg, both at high strength (noise: 16.9 in the < 20 kg group vs. 13.3 in the 20–60 kg group [p = 0.033] and 12.4 in the > 60 kg group [p = 0.015]) and at medium strength (noise: 23.2 in the < 20 kg group vs. 18.2 in the 20–60 kg group [p = 0.028] and 17.7 in the > 60 kg group [p = 0.014]). However, noise did not differ between body weight groups in ASIR-V images.

Qualitative image assessment

The results of the subjective image quality analyses are summarized in Table 3 and Fig. 4. The subjective scores for overall image quality and noise were also better on DLR images at both medium and high strength than on 50% ASIR-V images (p < 0.001). Interobserver agreement was moderate for overall image quality and good for noise in high strength DLR (p < 0.001). However, agreement was poor for both image quality and noise in medium strength DLR (p < 0.001). There was no significant difference in motion or beam hardening artifacts between reconstruction methods, with excellent interobserver agreement (κ = 0.944, p < 0.001) (Fig. 5).

Table 3 Distribution of subjective image scoring for different reconstruction techniques by two pediatric radiologists
Fig. 4

Qualitative image analysis of chest and abdomen CT from different reconstruction techniques. The four-point scale was used as follows: superior (4), average (3), suboptimal (2), unacceptable (1). Deep learning reconstruction (DLR) showed better overall image quality and noise compared with 50% adaptive statistical iterative reconstruction-V (ASIR-V); however, artifacts were not different between different reconstruction techniques. ASIR-V 50, 50% adaptive statistical iterative reconstruction-V; ASIR-V 100, 100% ASIR-V; DLR-M, medium strength deep learning reconstruction; DLR-H, high strength DLR

Fig. 5

Chest CT images with mediastinal window of a 1-year-old girl who had a cough. a–d Axial contrast-enhanced CT images with (a) standard 50% adaptive statistical iterative reconstruction-V (50% ASIR-V), (b) 100% ASIR-V, (c) medium-strength deep learning image reconstruction (DLR-M), and (d) high-strength deep learning image reconstruction (DLR-H) show no difference in beam hardening artifacts due to dense contrast material in the superior vena cava (arrow) and motion artifacts in the bilateral ribs (arrowheads), resulting in lower reader scores for artifacts. Both readers thought the image was suboptimal. Contrast to noise ratio (CNR) of the lung was 4.8 in 50% ASIR-V, 12.0 in 100% ASIR-V, 13.0 in DLR-M, and 15.0 in DLR-H

Discussion

Our study found that DLR can improve the quantitative and qualitative image quality in pediatric chest and abdomen CT relative to an advanced IR technique, our standard 50% ASIR-V. High-strength DLR showed significant noise reduction with increased CNR and SNR. DLR also scored significantly better for subjective image quality and noise. However, motion or beam hardening artifacts were not decreased with the deep learning method, regardless of strength.

There have been efforts to improve the image quality of low dose CT imaging by decreasing noise and artifacts with various reconstruction methods [25,26,27]. Recently, the DLR algorithm has been developed for CT to remove image noise. The effect of DLR on image quality and its potential to lower patient radiation dose are being investigated. A phantom study demonstrated that DLR had superior noise magnitude, noise texture, and spatial resolution [11]. Another study also showed that DLR improves the image quality through noise reduction and increased CNR without altering the image texture on abdomen CT [12]. It was also demonstrated that subjective diagnostic confidence was increased in all DLR images when compared with ASIR-V with a 30% blending factor, and that higher DLR strength lowers the noise with increased sharpness [13]. The SNR and CNR values of high-strength DLR images were higher than those of ASIR-V with an 80% or 100% blending factor. Similar results were also reported in studies with different vendor systems and algorithms [10, 14, 15].

DLR has been introduced to pediatric patients in a few studies of abdomen CT [16,17,18]. Lim et al. [16] studied a 5-year-old patient’s phantom and pediatric abdomen CT exams using a vendor-neutral DLR technique and demonstrated similar image quality with a hybrid IR technique. Brady et al. [17] used contrast-enhanced abdomen CT with DLR algorithm showing improved object detectability, reduced image noise, and high radiologist preference when compared to conventional IR images. About a 51% dose reduction using DLR was hypothesized based on mathematical extrapolation from this retrospective study. Lee et al. [18] used DLR with low iodine concentration abdominal dual-energy CT and showed decreased noise in DLR images without difference in CNR, overall image quality, and diagnostic quality of lesions. The CTDIvol and total iodine administration were lower in dual energy CT with DLR. Both studies suggested that DLR has the potential to improve image quality and potentially reduce patient radiation dose. However, no study has evaluated the role of DLR in pediatric chest CT and the effect of DLR on image artifacts.

Our study shows similar results in noise reduction and quality improvement. High strength DLR was associated with noise reduction in non-contrast chest CT, contrast-enhanced chest CT, and contrast-enhanced abdomen CT with an increase in both CNR and SNR. The subjective scores for overall image quality and noise were also better on both medium and high strength DLR images than on 50% ASIR-V images. Our study showed no significant difference between reconstructions in the attenuation values of the organs in the pediatric chest and abdomen. This result is comparable with a previous report in an adult population [12]. Therefore, CT images with DLR can be used for attenuation analyses such as emphysema index measurements.

Previous studies have focused on noise reduction and image quality improvement of DLR with little focus on artifacts. DLR scored better on artifacts than 30% ASIR-V images in a previous study [12]. Another study reported no DLR-related image artifacts [14]. A prior study reported more frequent distortion artifacts with DLR [28]. In our study, there was no significant difference in artifacts between reconstruction methods, with excellent interobserver agreement on artifacts. These were mainly beam hardening artifacts from metal or dense contrast media in vessels. TrueFidelity did not significantly reduce motion or beam hardening artifacts in our study. This may be because the algorithm was not trained on these artifacts, and it may suggest that TrueFidelity is weak in this respect. Further training on these artifacts may be required for better image reconstruction. However, unlike the previous study, there were no significant distortion artifacts in our study. The role of DLR may vary depending on the purpose and input data of the DLR technology. It would be preferable for DLR algorithms to be developed as open source so that they can be used on various equipment and undergo further development by other researchers.

Our study has limitations. First, the sample size of our retrospective study was small, and we could not evaluate lesion detectability or diagnostic accuracy. Second, the data are from a single vendor's DLR algorithm. Since it was hard to obtain the projection data from the vendors directly, we could not compare other DLR methods, such as the image-domain-based method. Third, only a minority of the patient population had artifacts. Fourth, because of the retrospective nature of our study, we could not compare images between FBP and DLR. Fifth, our study cannot suggest an estimated radiation dose reduction using DLR. Additional prospective studies with more patients are needed.

Conclusions

Compared with 50% ASIR-V, DLR improved the CT evaluation of pediatric chest and abdomen images with significant noise reduction. However, motion or beam hardening artifacts were not decreased by DLR, regardless of strength.