The effect of deep learning reconstruction on abdominal CT densitometry and image quality: a systematic review and meta-analysis

Objective To determine the difference in CT values and image quality of abdominal CT images reconstructed by filtered back-projection (FBP), hybrid iterative reconstruction (IR), and deep learning reconstruction (DLR). Methods PubMed and Embase were systematically searched for articles regarding CT densitometry in the abdomen and the image reconstruction techniques FBP, hybrid IR, and DLR. Mean differences in CT values between reconstruction techniques were analyzed. A comparison between signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of FBP, hybrid IR, and DLR was made. A comparison of diagnostic confidence between hybrid IR and DLR was made. Results Sixteen articles were included, six being suitable for meta-analysis. In the liver, the mean difference between hybrid IR and DLR was − 0.633 HU (p = 0.483, SD ± 0.902 HU). In the spleen, the mean difference between hybrid IR and DLR was − 0.099 HU (p = 0.925, SD ± 1.061 HU). In the pancreas, the mean difference between hybrid IR and DLR was − 1.372 HU (p = 0.353, SD ± 1.476 HU). In 14 articles, CNR was described. In all cases, DLR showed a significantly higher CNR. In 9 articles, SNR was described. In all cases but one, DLR showed a significantly higher SNR. In all cases, DLR showed a significantly higher diagnostic confidence. Conclusions There were no significant differences in CT values reconstructed by FBP, hybrid IR, and DLR in abdominal organs. This shows that these reconstruction techniques are consistent in reconstructing CT values. DLR images showed a significantly higher SNR and CNR, compared to FBP and hybrid IR. Key Points CT values of abdominal CT images are similar between deep learning reconstruction (DLR), filtered back-projection (FBP), and hybrid iterative reconstruction (IR). DLR results in improved image quality in terms of SNR and CNR compared to FBP and hybrid IR images. DLR can thus be safely implemented in the clinical setting resulting in improved image quality without affecting CT values. Supplementary Information The online version contains supplementary material available at 10.1007/s00330-021-08438-z.


Introduction
Computed tomography (CT) acquires images of tissues inside the human body. Photons, emitted by an X-ray tube, interact with human tissue and either get absorbed due to the photoelectric effect or get scattered due to Compton scattering. A fraction of the initial photon beam leaves the patient's body and is detected on the opposite side of the CT X-ray tube. The ratio between the incident intensity and the emerging intensity is related to the attenuation coefficient, which forms a material-specific property. Hence, by reconstructing the attenuation coefficient for each voxel (a pixel within a 3-dimensional image), imaging and identification of tissues can be carried out in a non-invasive way. However, since attenuation coefficients of various soft tissues have values close to each other, especially in the abdomen, the Hounsfield unit (HU) was introduced, which is a measure relative to the density of water. These CT values, expressed in HU, are calculated by the following formula: To reconstruct attenuation measurements by the CT scanner into HUs and display these in a comprehensible image suitable for clinical diagnosis, a method for iterative data reconstruction was proposed. This technique simulated the CT system, and based upon this model, it iteratively adjusted measured data for various factors influencing the attenuation measurements as e.g., estimated dose or noise effects. However, due to large amounts of data and the absence of sufficient computational power, iterative reconstruction (IR) was not competent for clinical practice [1].
Instead, the less computational demanding, and hence much faster, reconstruction technique filtered back-projection (FBP) was introduced. This reconstruction technique has been the most used technique for decades, until the awareness of radiation-induced health effects, such as the formation of neoplasms, grew among society. Hence, a new reconstruction technique got introduced that allowed for low-dose CT measurements while maintaining image quality in terms of signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) [2]. This reconstruction technique resembles the initially proposed IR method, but is a combination of FBP and full IR, hence called hybrid IR. However, radiologists occasionally assess CT images reconstructed by hybrid IR as being too artificial-looking, because hybrid IR does not model the complete CT system [3][4][5]. Nonetheless, recent developments in computing power and the use of artificial intelligence have made it possible to apply full IR. Full IR is also referred to as model-based IR (MBIR), as it incorporates a model that simulates the CT system, including its photon beam formation principle and several 1000 * tissue − water water . factors that may deform it, due to e.g. the focal spot size or the beam hardening effect. Characteristic for MBIR is the use of backward and forward projection steps during each iteration cycle. This method enables optimizing the true data based on comparisons with estimated artificial data, which is updated for each new cycle. These comparisons are iterated until no corrections need to be made in the true data, or until the maximum number of iterations has been reached [4].
Deep learning reconstruction (DLR) uses this base of MBIR to further improve image quality. Deep learning is a subset of artificial intelligence and uses convolutional neural networks (CNNs) to learn from the input data itself. These CNNs are trained on high-quality labelled CT images and learn from unlabelled data in clinical settings just like radiologists are trained on the job. The CNNs can differentiate even better between noise and signal in comparison to FBP or hybrid IR, and therefore, even more dose reduction is possible while maintaining image quality and detail preservation [6]. DLR techniques are now being introduced into clinical practice, but whether these algorithms are applied correctly still needs to be investigated carefully. Especially for its application in abdominal CT scans-in which low contrast resolution is essential-it is important that CT values are reconstructed correctly. As DLR simulates the complete CT system, contrary to FBP and hybrid IR techniques, one might assume that CT reconstructions made by DLR may result in different CT values for certain tissues than with the use of FBP or hybrid IR. This can lead to serious complications when CT values reconstructed by DLR differ much from FBP or hybrid IR, that established cut-off-values in certain diagnostic methods will not hold anymore for DLR reconstructed CT images resulting in missing diagnoses [7]. Hence, in this systematic review and meta-analysis, we sought to evaluate differences between DLR reconstructed CT values and FBP or hybrid IR reconstructed CT values. Furthermore, we compared the image quality of these three reconstruction techniques.

Methods
This systematic review and meta-analysis were performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [8].

Literature search
PubMed and Embase databases were searched to find studies that describe the performance of one or more CT image reconstruction techniques. The following search terms were used: ("CT Densitometry" OR ("Densitometry" AND "Tomography, X-Ray Computed")) AND "Hounsfield Units" AND ("Filtered Back Projection" OR "Iterative Reconstruction") AND ("Artificial Intelligence" OR "Deep Learning") AND "Abdomen". The complete search string is described in Appendix A. No beginning search date was set; the literature search was updated until May 10, 2021.

Exclusion criteria
The titles and abstracts of all studies were independently and blindly screened by two researchers. If the two researchers did not agree about inclusion or exclusion, a final decision was made by a third researcher. Papers were excluded if the title clearly indicated that CT densitometry was not described.
The exclusion criteria for abstract screening were as follows: (1) FBP, hybrid IR, or DLR were not described; (2) articles about segmentation; (3) not specified to the abdomen; (4) no densitometry described; and (5) full text not available in English or Dutch. The exclusion criteria for full-text eligibility were as follows: (1) no CT values, SNR, or CNR described and (2) no DLR on abdominal organs described.

Quality assessment
All included studies were assessed with a custom-made quality assessment (QA). The aspects of the QA scale were valuable for assessing quality to all authors. A score with a range of 0-18 was assigned to each study. Points were scored for the following aspects: (1) study design; (2) data collection; (3) samples; (4) statistical analysis; (5) funding; (6) material description; (7) DLR vs FBP/IR comparison; and (8) region of interest (ROI) description. This list also contains aspects of other QA scales for quantitative studies [9]. A point was awarded if the answer to the question was "Yes". A list of all questions is given in Appendix B. Those with a score of 6 or lower were classified as low-quality studies, 7-11 as moderate-quality studies, and 12 or more as high-quality studies.

Descriptive analysis
Descriptive data were extracted independently by two authors. An overview of differences in CT values, SNR, and CNR between reconstruction techniques was given. Study-specific variations were reported. Data on subjective image quality in the liver was extracted by one author. The average diagnostic confidence of hybrid IR and DLR of one vendor was assessed by two radiologists on a scale from 1 to 5, where 1 = unacceptable, 2 = suboptimal, 3 = acceptable, 4 = good, and 5 = excellent.

Data extraction
Data of CT values were only extracted for the meta-analysis if the vendor was used in more than one included study. To rule out variations between vendor's reconstruction algorithms, one meta-analysis per vendor was made. The mean and standard deviation of the CT values were extracted for all organs that were examined in multiple included articles. When images were reviewed by more than one radiologist, the data of the most experienced one were extracted. The mean difference was calculated by extracting the CT values reconstructed by DLR from the values reconstructed by hybrid IR.

Statistical analysis
The data were analyzed using random effects (DerSimonian-Laird method) in OpenMeta[Analyst] [10] software version 10.12. Statistical heterogeneity was evaluated by calculating I 2 statistics. Low heterogeneity was defined as I 2 < 50%. The statistical significance was defined as p < 0.05. The results were summarized in forest plots.

Study characteristics
All 16 articles were either retrospective or prospective studies. All included studies compared DLR to another reconstruction technique, either a hybrid IR technique, FBP, or both. In all studies, images were reconstructed with different techniques, while using the same raw image data. Table 1 summarizes the characteristics of each study, such as the number of patients, which reconstruction techniques were compared, and what quantitative outcome measures were described.
Per study the formula of CNR is shown in Table 1. All 6 articles included in the meta-analysis tried to determine image quality of hybrid IR and DLR. Study goals showed similarity and are shown in Table 1.
In all 6 studies, image quality was assessed independently by two radiologists blinded for image reconstruction. Slice thickness varied between 1.25 and 5.0 mm. ROI placement was linked for different reconstruction techniques in three of six articles.

Quality assessment
The results of the quality assessment are summarized in Table 2. The complete overview of the specific points given to the studies can be found in Appendix B. All included studies reported the CT scan protocol. In addition, all studies compared DLR to a hybrid technique. Four studies compared FBP to DLR. A total of 12 studies were rated as high quality and 4 studies as moderate quality.

Descriptive results
Kaga et al. and Park et al. showed a statistically significantly higher mean CT value in liver tissue in DLR than in hybrid IR [16,22]. However, these differences were clinically not relevant with maximum differences of 0.5-0.6 HU. All other studies that were included in the meta-analysis found no significant difference between reconstruction techniques in CT values, measured in the liver, spleen, pancreas, or renal cortex. Kim et al. pointed out that CT   [12,18,24,25]. The difference in mean CT value found in these studies was not significant in abdominal organs. Steuwe et al.described that DLR was significantly different in the comparison between DLR and IR (p = 0.007) in spleen tissue; other abdominal tissues were not significantly different between these reconstruction techniques. The average CT values for liver, spleen, and fat tissue obtained by the three reconstruction techniques (FBP, IR, and DLR) were within ± 0.3 HU (absolute difference) and ± 1.3% (relative difference) [24]. Zeng et al.also showed no statistical differences in mean CT values of liver and fat tissues between low-dose DLIR and low-dose IR (mean CT value difference was 0.3 HU (p = 0.837) and 1.8 HU (p = 0.118), respectively) [26].

Quantitative results
Six articles showed CT values of GE Healthcare's ASiR-V 30-50% (hybrid IR) and TrueFidelity (DLR) reconstruction techniques, so these papers were included in the meta-analysis [13,[15][16][17][18]21]. CT values were analyzed for the liver, spleen, and pancreas. All six studies examined the liver, five included the spleen, and only two studies measured CT values in the pancreas.

Discussion
The results, despite being insignificant, showed a trend of the mean CT values reconstructed by DLR being higher than those reconstructed by hybrid IR. Kim et al. even showed a mean difference of − 5.2 HU, regarding the liver [17]. However, this trend can only be applied to GE Healthcare's reconstruction techniques. More research is needed to determine this for other vendors. DLR resulted in significantly higher CNRs and SNRs compared to hybrid IR and FBP. Results on subjective image quality are in favour of DLR. The higher diagnostic confidence in the liver rated by experienced radiologists is promising. However, more research on diagnostic confidence for other abdominal organs needs to be conducted to provide a complete overview.  Only Noda et al. reported a lower CT value with DLR in the spleen, but differences were also insignificant [21]. During the literature search, one study (Matsukiyo et al. [27]) was excluded because this study assessed the abdominal arteries and no organs. However, hybrid IR and DLR were compared, and this study showed a significant difference in reconstructed CT values of two arteries. Here, the CT values of images reconstructed by DLR were significantly higher. This study used a Canon Medical Systems Corporation CT scanner. Small differences in scan protocol existed between the included studies for our meta-analysis but did not affect CT value comparison as the tube voltages of all studies were similar.
Unfortunately, there were only a few articles included that compared DLR to FBP [18,[23][24][25]. For that reason, no meta-analysis of these reconstruction techniques could be performed. All articles that included FBP described that image quality of DLR was better than FBP and IR. Most articles studied differences in SNR, CNR, and lesion detection between DLR and other reconstruction techniques and concluded that DLR works very well in low-dose CT imaging.
With better image quality and no difference in CT values between other reconstruction techniques, DLR is recommended for clinical diagnosis based on CT densitometry.
An important aspect of DLR is that its performance highly depends on the dataset used for training the algorithm. A heterogeneous dataset representative for all patients is required to prove accuracy for all patient groups. One study was found which was conducted in Europe [24]. Other studies were conducted in East Asia (n = 13) [11-14, 16-22, 25, 26] and North America (n = 2) [15,23]. In East-Asian studies, the average weight of the patient group tends to be lower than in other regions in the world. Due to a higher obesity rate in western cultures, beam hardening can influence the image reconstruction more in these cultures and therefore the accuracy of CT values.
Hybrid IR data were used for our meta-analysis if the hybrid level was between 30 and 50%. This range represents the hybrid IR technique best. A higher level will result in a higher percentage model-based IR technique which was not the intended reconstruction technique.
Differences between the studies included in the metaanalysis existed in methodology of image assessment. Slice thickness could affect SNR, CNR, and diagnostic confidence as increasing slice thickness results in lower noise levels. ROI shape and size were not similar and could also affect SNR, CNR, and diagnostic confidence.
In all included studies [11-16, 18-23, 26] , except for Kim et al., Steuwe et al., and Wang et al. [17,24,25], contrastenhanced CT was used. In contrast-enhanced CT, a difference in CT value per scan or reconstruction technique is less relevant compared to non-contrast CT, because of the small CT value range in non-contrast CT. When this range is small, it is more important to conduct accurate CT values.
PixelShine described by Steuwe et al. [24] is not developed by CT vendors, while all other reconstruction techniques are. Even though PixelShine reduces image noise by using a deep learning algorithm, it is not clear if the image is reconstructed from the raw data or that the algorithm uses FBP or IR reconstructed images as input. On top of that, this reconstruction technique is not yet approved by the Food and Drug Administration (FDA) [6].
No phantom studies were included in this systematic review because the phantoms lacked abdominal characteristics. Phantom studies can, however, be useful in determining the uniformity and validity of CT values of a single reconstruction technique. CT uniformity dictates that for scanning a uniform material, the mean CT value does not depend on the position within the image. This is important when evaluating the accuracy of a reconstruction technique. More research on uniformity and validity is needed for an even better comparison between reconstruction techniques. Phantom studies can also be used to determine whether variable tube currents or tube voltages affect CT values.
In conclusion, this systematic review and meta-analysis identified no significant difference in CT values reconstructed by FBP, hybrid IR, and DLR in abdominal organs. This shows that these reconstruction techniques are consistent in reconstructing CT values and can thus be applied safely in the clinical setting. Also, DLR images showed a significantly higher CNR and SNR than FBP and hybrid IR images. However, there was a limited number of studies that described FBP, and the meta-analysis was only done with studies that described GE's hybrid IR and DLR. Therefore, more research is needed to determine whether the same conclusion is true for the comparison between FBP and DLR and other CT vendors.
Ethical Approval Institutional Review Board approval was not required because this review did not include specimens or involve any treatments or interventions.

Study subjects or cohorts overlap
All the included studies have been previously reported as an original research paper.

• Systematic Review • Meta-analysis
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.