Introduction

Since March 2017, our institution has employed an ultrahigh-resolution computed tomography (UHRCT) scanner to improve the in- and through-plane spatial resolution of CT images. The clinical utility of UHRCT has been shown in CT examinations of the temporal bone, chest, and Adamkiewicz artery as well as in virtual bronchoscopy and coronary angiography [1,2,3,4,5]. However, the greater image noise associated with this method may limit its usefulness in CT that requires low-contrast resolution, such as abdominopelvic CT (APCT) for oncologic follow-up, where UHRCT may improve diagnosis of fine recurrent, disseminated, and metastatic lesions.

To overcome these potential limitations, deep learning reconstruction (DLR) and model-based iterative reconstruction (MBIR) techniques have become clinically available for use in combination with UHRCT [6]. MBIR is reported to improve image quality with respect to noise characteristics, spatial resolution, artifacts, and low-contrast detectability. However, radiologists have been reluctant to adopt it because it produces a coarse texture associated with low-frequency noise, described as an “oil painting” or “plastic-like” appearance, compared with results obtained using hybrid iterative reconstruction (HIR), which is widely applied in clinical settings [6,7,8]. At a routine radiation dose, the quality of abdominal UHRCT images may be better using DLR than either MBIR or HIR [6]. On the other hand, contrast-enhanced APCT (CE-APCT) for oncologic follow-up requires a relatively large amount of contrast material (CM) and a high radiation dose [9,10,11]. Minimizing CM dose may be desirable for oncology patients because they tend to have multiple risk factors for kidney injury [12,13,14], and reducing radiation dose is critical to minimize the potential adverse effects of ionizing radiation because CT examinations usually need to be repeated [15]. In particular, lowering tube voltage may enable reasonable low-radiation-dose (LD) UHRCT scans aided by DLR while preserving contrast enhancement even with a reduced CM dose. We believe, though, that the quality of LD CE-APCT images obtained by UHRCT using DLR or MBIR, and their acceptance by radiologists, have not yet been assessed.

We therefore undertook phantom and clinical pilot studies by UHRCT to compare findings between MBIR and DLR in CE-APCT for oncologic follow-up obtained utilizing a high-resolution and LD (HR & LD) protocol using our routine protocol as reference. We evaluated the image quality and radiologists’ acceptance of MBIR and DLR and attempted to determine the appropriate HR & LD protocol that would yield the least radiation exposure.

Materials and methods

In this study, we mainly aimed to (1) determine, in the phantom study, the HR & LD protocol with MBIR and/or DLR that achieves the lowest radiation dose with low-frequency noise similar to that of the routine protocol and (2) assess, in the clinical pilot study, the validity of this HR & LD protocol based on image quality and radiologists’ acceptance, using the routine protocol as reference.

Our institutional review board approved this clinical study, and we obtained written informed consent from all patients.

Phantom study

Phantoms

We assessed spatial resolution by the task-based modulation transfer function (TTF) using a quality assurance phantom (TOS phantom; Canon Medical Systems, Tochigi, Japan) that included inserts of various materials to provide different levels of image contrast (air, -1000 HU; polypropylene, -105 HU; water, 0 HU; acrylic, 120 HU; Delrin, 340 HU; and Teflon, 940 HU). We focused on the acrylic insert of the lowest positive contrast, almost equivalent to the contrast between soft tissue and fat attenuations, for oncologic follow-up by CE-APCT. To assess image noise characteristics by the noise power spectrum (NPS), we utilized an original abdomen phantom comprising an elliptical cylinder (33-cm longest diameter, 22-cm shortest diameter) made of epoxy-based and polyurethane resin (Kyoto Kagaku, Kyoto, Japan) (Fig. 1) [16].

Fig. 1

Axial CT images of the 2 phantoms used to assess (a) the task-based modulation transfer function (TTF), which included cylinder inserts of various materials that offered different image contrasts (left to right: Delrin, 340 HU; acrylic, 120 HU; air, -1000 HU; polypropylene, -105 HU; and Teflon, 940 HU), and (b) the noise power spectrum (NPS), made primarily of epoxy-based and polyurethane resin

CT image acquisition and reconstruction

We performed helical scanning of the 2 phantoms with a UHRCT scanner (Aquilion Precision, Canon Medical Systems) using automatic exposure control (AEC) and parameters for our routine and HR & LD protocols, which are summarized in Table 1. Specifically, we used 5 dose settings at AEC noise indices of 20, 25, 30, 35, and 40 HU for the HR & LD protocol at 100 kV. To assess radiation exposure, we reviewed the volume CT dose index (CTDIvol) for each protocol recorded as a dose report.

Table 1 Parameters for CT scanning and reconstruction

We reconstructed the phantom images using a standard kernel (FC03) and an HIR algorithm (Adaptive Iterative Dose Reconstruction [AIDR] 3D Standard, Canon Medical Systems) for those acquired with the routine protocol and using an MBIR algorithm (Forward-projected model-based Iterative ReconSTruction [FIRST] Body Standard, Canon Medical Systems) and a DLR algorithm (Advanced intelligent clear-IQ engine [AiCE] Body Standard, Canon Medical Systems) for those acquired with the HR & LD protocol. Table 1 summarizes other reconstruction parameters.

For the DLR, as shown in Fig. 2, the training pairs consisted of standard-dose images reconstructed by HIR as low-quality input data and high-dose images reconstructed by advanced MBIR, with many more iterations than MBIR, as high-quality target data. In the training process, statistical features that differentiate signal from noise and artifacts could be “learned” and then “updated” in the deep convolutional neural network for use in future reconstructions [6, 17]. The manufacturer had performed this training in advance as a black box. Because these ideal MBIR images were used to train the network, DLR yielded image quality comparable or superior to that of MBIR in a shorter processing time [6].

Fig. 2

Flowcharts of the training and reconstruction processes in deep learning reconstruction (DLR). (a) In the training process, standard-dose images reconstructed by hybrid iterative reconstruction (HIR) serve as low-quality input data, and high-dose images reconstructed by advanced model-based iterative reconstruction (MBIR), with many more iterations than MBIR, serve as high-quality target data. Given these training pairs, the deep convolutional neural network (DCNN) is updated to minimize the difference between the DCNN output and the target for future reconstructions. This process has been performed in advance as a black box by the manufacturer. (b) In the reconstruction process, the DCNN is validated for clinical image processing to generate final high-quality images from input images by HIR

Image quality assessment

We analyzed each set of phantom images using the appropriate software (CT measure version 0.98, http://www.jsct-tech.org/; Excel 2016, Microsoft). On an axial image of the acrylic insert in the quality assurance phantom, we radially acquired and averaged profile curves crossing the circular edge to obtain its edge-spread function. We differentiated the edge-spread function to obtain the line-spread function and calculated the TTF by Fourier transformation of the line-spread function. The TTF served to assess the intermediate-contrast in-plane spatial resolution of non-linear algorithms, such as HIR, MBIR, and DLR, at various noise levels. Our method to determine NPS in the epoxy-based resin part is described elsewhere [8, 16]. Using routine images reconstructed by HIR for reference, we then compared TTF and NPS between the HR & LD images reconstructed by MBIR and DLR at the 5 dose settings (noise indices of 20, 25, 30, 35, and 40 HU). Ultimately, we determined the noise index setting for the AEC in the HR & LD protocol by MBIR and/or DLR to minimize the radiation dose while preserving or lessening noise at lower frequencies compared with the noise obtained using the routine protocol.
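The chain from edge-spread function to line-spread function to TTF can be illustrated with a short sketch. This is not the CT measure implementation used in the study. It assumes a one-dimensional, already averaged edge profile, a synthetic Gaussian-blurred edge (the blur width, pixel size, and sample count are hypothetical), and a plain discrete Fourier transform. The function names `edge_to_ttf` and `frequency_at` are likewise illustrative.

```python
import cmath
import math


def edge_to_ttf(esf, pixel_mm):
    """Return (frequencies in cycles/mm, normalized TTF) from an edge-spread function."""
    # Differentiate the ESF with finite differences to obtain the line-spread function.
    lsf = [esf[i + 1] - esf[i] for i in range(len(esf) - 1)]
    n = len(lsf)
    # Magnitude of the discrete Fourier transform up to the Nyquist frequency.
    mags = []
    for k in range(n // 2 + 1):
        total = sum(lsf[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        mags.append(abs(total))
    # Normalize so that the zero-frequency value equals 1.
    ttf = [m / mags[0] for m in mags]
    freqs = [k / (n * pixel_mm) for k in range(n // 2 + 1)]
    return freqs, ttf


def frequency_at(level, freqs, ttf):
    """Linearly interpolate the first frequency at which the TTF falls to `level`."""
    for i in range(1, len(ttf)):
        if ttf[i] <= level < ttf[i - 1]:
            fraction = (ttf[i - 1] - level) / (ttf[i - 1] - ttf[i])
            return freqs[i - 1] + fraction * (freqs[i] - freqs[i - 1])
    return None


# Hypothetical synthetic edge: 120 HU contrast blurred by a Gaussian (sigma 0.5 mm),
# sampled every 0.1 mm. These values are illustrative, not measurements.
PIXEL_MM = 0.1
SIGMA_MM = 0.5
CONTRAST_HU = 120.0
positions = [(i - 64) * PIXEL_MM for i in range(128)]
esf = [CONTRAST_HU * 0.5 * (1.0 + math.erf(x / (SIGMA_MM * math.sqrt(2.0)))) for x in positions]

freqs, ttf = edge_to_ttf(esf, PIXEL_MM)
f50 = frequency_at(0.5, freqs, ttf)
print(f"50% TTF frequency: {f50:.3f} cycles/mm")
```

A summary value such as the 50% TTF frequency is one common way to compare curves between reconstructions; the study itself compares the full curves.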

Clinical pilot study

Subjects

From March 11 through March 29, 2019, we prospectively enrolled 41 consecutive adult patients with mild to moderate renal impairment (i.e., estimated glomerular filtration rate: 30 to 59 mL/min/1.73 m²) from whom we obtained written informed consent and who underwent CE-APCT using an HR & LD protocol with the UHRCT scanner for oncologic follow-up. Exclusion criteria were inadequate CT image acquisition and a history of surgery of the liver and/or intrapelvic organs, either of which precluded our image quality assessment described below. Five patients were excluded because of insufficient scan coverage (n = 3) or a history of total hysterectomy (n = 2). Thus, we finally included 36 consecutive patients (24 men, 12 women; mean age, 75 ± 9 years; range, 48 to 93 years; mean body weight [BW], 57.1 ± 11.7 kg; range, 39 to 86 kg; mean body mass index [BMI], 22.5 ± 3.5 kg/m²; range, 15 to 29 kg/m²) in the present study. Using the routine protocol as reference, we compared the quality of the CT images reconstructed by MBIR and DLR. In the HR & LD protocol, we set the noise index at 35 HU based on the results of the aforementioned phantom study, as described in the “Results” section, and reduced the iodine load by 40% relative to our routine protocol.

CT image acquisition and reconstruction

Patients underwent helical acquisition of CE-APCT with the UHRCT scanner using parameters summarized in Table 1. All patients received non-ionic iodinated CM (Iopamiron 300; Bayer HealthCare, Osaka, Japan) at a concentration of 300 mgI/mL. A total dose of 312 mgI/kg of BW was administered over 45 s via the right antecubital vein using a 22-gauge plastic intravenous catheter with a power injector (Dual Shot-type GX 7; Nemoto Kyorindo, Tokyo, Japan), and scanning began at 120 s following the start of CM administration. To assess radiation exposure, we reviewed the CTDIvol and dose-length product (DLP) recorded as a dose report and then calculated the estimated effective dose as the DLP multiplied by a k factor for the abdomen and pelvis of 0.015 mSv·mGy⁻¹·cm⁻¹ [18] for each patient. Thus, we calculated the mean CTDIvol, DLP, and estimated effective dose for the HR & LD protocol. As in the phantom study, we used both the MBIR and DLR algorithms to reconstruct the CE-APCT images acquired for each patient (Table 1).
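The weight-based CM dosing and the effective-dose estimate are simple arithmetic, sketched below. The iodine dose, CM concentration, injection duration, and k factor are those stated in the text. The 60-kg body weight and the DLP of 240 mGy·cm are hypothetical inputs chosen only for illustration, and the function names are not from the study.

```python
K_ABDOMEN_PELVIS = 0.015  # mSv per (mGy*cm), conversion factor quoted in the text


def effective_dose_msv(dlp_mgy_cm, k=K_ABDOMEN_PELVIS):
    """Estimated effective dose = DLP multiplied by the k factor."""
    return dlp_mgy_cm * k


def contrast_volume_ml(body_weight_kg, iodine_dose_mgi_per_kg=312.0,
                       concentration_mgi_per_ml=300.0):
    """Volume of contrast material needed for a weight-based iodine dose."""
    return body_weight_kg * iodine_dose_mgi_per_kg / concentration_mgi_per_ml


def injection_rate_ml_per_s(volume_ml, duration_s=45.0):
    """Mean injection rate when the whole volume is given over a fixed duration."""
    return volume_ml / duration_s


# Hypothetical patient (60 kg) and hypothetical DLP (240 mGy*cm).
volume = contrast_volume_ml(60.0)
rate = injection_rate_ml_per_s(volume)
dose = effective_dose_msv(240.0)
print(f"CM volume {volume:.1f} mL at {rate:.2f} mL/s; effective dose {dose:.1f} mSv")
```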

Quantitative assessment of image quality

On the CT images reconstructed by both MBIR and DLR and displayed on a commercially available workstation (Ziostation Version 2.4; Ziosoft, Inc., Tokyo, Japan), 3 radiology technologists, by consensus, employed a copy-and-paste function to place 3 circular regions of interest (ROIs) in the hepatic parenchyma, carefully avoiding large vessels, any areas of focal change in attenuation, and prominent artifacts. In a similar manner, they placed a circular ROI in each of the following: the upper abdominal subcutaneous fat at the same level as the hepatic ROIs; the major psoas muscle at the level of the aortic bifurcation, avoiding any macroscopic fat infiltration; the urinary bladder and the prostate (for men) or uterus (for women), avoiding any areas of focal change in attenuation and prominent artifacts; and the lower abdominal subcutaneous fat at the same level as the pelvic ROIs. They then measured the CT number and its standard deviation (SD) within these anatomies in each patient. We calculated the mean SD across all patients as objective noise in the hepatic parenchyma, upper abdominal subcutaneous fat, major psoas muscle, urinary bladder, and lower abdominal subcutaneous fat. We also calculated the contrast-to-noise ratio (CNR) of the liver and pelvis using the following equations: CNR of the liver = (mean CT number of the hepatic parenchyma − CT number of the major psoas muscle) / noise in the upper abdominal subcutaneous fat, and CNR of the pelvis = (CT number of the prostate or uterus − CT number of the urinary bladder) / noise in the lower abdominal subcutaneous fat.
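The two CNR definitions can be written out as a minimal sketch. The formulas follow the equations above, with noise taken as the SD in the subcutaneous fat ROI. All ROI values in the example are hypothetical and do not come from the study, and the function names are illustrative.

```python
from statistics import mean


def cnr(target_hu, background_hu, noise_sd_hu):
    """Contrast-to-noise ratio: attenuation difference divided by the noise SD."""
    if noise_sd_hu <= 0:
        raise ValueError("noise SD must be positive")
    return (target_hu - background_hu) / noise_sd_hu


def liver_cnr(liver_roi_means_hu, psoas_hu, upper_fat_sd_hu):
    """CNR of the liver: mean of the hepatic ROIs against the major psoas muscle."""
    return cnr(mean(liver_roi_means_hu), psoas_hu, upper_fat_sd_hu)


def pelvis_cnr(prostate_or_uterus_hu, bladder_hu, lower_fat_sd_hu):
    """CNR of the pelvis: prostate or uterus against the urinary bladder."""
    return cnr(prostate_or_uterus_hu, bladder_hu, lower_fat_sd_hu)


# Hypothetical ROI measurements in HU (illustrative only).
liver_value = liver_cnr([110.0, 112.0, 108.0], psoas_hu=70.0, upper_fat_sd_hu=8.0)
pelvis_value = pelvis_cnr(80.0, bladder_hu=10.0, lower_fat_sd_hu=7.0)
print(f"CNR liver {liver_value:.1f}, CNR pelvis {pelvis_value:.1f}")
```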

Qualitative assessment of image quality

On the workstation, 2 independent board-certified radiologists with 10 and 11 years’ clinical experience who were blinded to patient demographics and CT parameters used a 5-point scale to grade the quality of CE-APCT images reconstructed by both MBIR and DLR. Five points represented much better quality compared with the reference; 4 points, better quality; 3 points, comparable quality; 2 points, worse quality; and 1 point, much worse quality. The reference consisted of routine CE-APCT images of another 101 consecutive adult patients (52 men, 49 women; mean age, 65 ± 17 years; range, 29 to 92 years; mean BW, 57.3 ± 14.4 kg; range, 32 to 117 kg; mean BMI, 21.9 ± 4.5 kg/m²; range, 12 to 36 kg/m²) with normal renal function (i.e., estimated glomerular filtration rate: ≥ 60 mL/min/1.73 m²) imaged by UHRCT using the routine scan and reconstruction protocol (Table 1) and our routine iodine load (520 mgI/kg of BW) from January 1 through March 1, 2019. Against this reference, the reviewers considered the general acceptability of the image with regard to overall diagnostic confidence and both image appearance and image texture in the liver and intrapelvic organs (prostate or uterus and urinary bladder) as well as image noise, as described by Laurent and colleagues [7]. The HR & LD images reconstructed by both MBIR and DLR were presented in random order on a preset soft tissue window (window width, 370 HU; window level, 40 HU).

Statistical analysis

Results were expressed as mean ± SD for continuous variables. Statistical analysis was performed using commercially available statistical software (SPSS for Windows, Version 23.0, IBM SPSS, Armonk, NY). Objective noise and CNR were compared between MBIR and DLR using the paired t-test, and subjective image quality grades were compared using the Wilcoxon signed-rank test. BW and BMI were compared between the study and reference patient groups using the unpaired t-test. A P value below 0.05 was considered to indicate a significant difference. Inter-reviewer agreement was estimated using weighted kappa statistics.
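For readers unfamiliar with weighted kappa, the following sketch shows how agreement between two reviewers on a 5-point scale can be computed. The analysis in the study was run in SPSS. The text does not state which weighting scheme was used, so the linear weights here are an assumption, and the example scores are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5)):
    """Cohen's kappa with linear disagreement weights (assumed weighting scheme)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Joint proportion table of the two raters' scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at maximum distance
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]
    if disagree_exp == 0.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical scores from two reviewers for five images.
reviewer_1 = [1, 2, 3, 4, 5]
reviewer_2 = [1, 2, 3, 4, 4]
kappa = weighted_kappa(reviewer_1, reviewer_2)
print(f"weighted kappa = {kappa:.2f}")
```

With linear weights, a one-point disagreement is penalized one quarter as much as a four-point disagreement, which suits an ordinal scale better than unweighted kappa.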

Results

Phantom study

In the phantom study, the CTDIvol was 8.7 mGy using the routine protocol. Using the HR & LD protocol, it was 11.3 mGy at a noise index of 20 HU, 9.7 mGy at 25 HU, 7.2 mGy at 30 HU, 5.5 mGy at 35 HU, and 4.4 mGy at 40 HU. As shown in Fig. 3, the phantom study revealed a higher TTF with the HR & LD protocol than with the routine protocol, and TTF was higher by DLR than MBIR at the same dose with the HR & LD protocol. This tendency was more prominent at lower doses. In addition, the HR & LD protocol yielded less low-frequency noise but greater high-frequency and overall noise (i.e., SD value; calculated by area under the NPS curve) by DLR than by MBIR at the same dose (Fig. 4). In particular, compared with the routine protocol, low-frequency noise by DLR was less at a noise index of 20–30 HU, comparable at 35 HU, and higher at 40 HU, whereas that by MBIR was higher at 35–40 HU. We therefore chose DLR at a noise index of 35 HU as the HR & LD protocol for our clinical pilot study to minimize radiation exposure while achieving similar image texture and greater sharpness compared with the routine protocol. Indeed, the CTDIvol at a noise index of 35 HU (5.5 mGy) with the HR & LD protocol was lower than that using the routine protocol (8.7 mGy); TTF increased in the order of the routine protocol, MBIR at a noise index of 35 HU, and DLR at a noise index of 35 HU (Fig. 3c); and low-frequency noise at a noise index of 35 HU by DLR was comparable to that using the routine protocol and less than that by MBIR at the same index (Fig. 4c). Nevertheless, overall noise increased in the order of MBIR at a noise index of 35 HU (6.9 HU), the routine protocol (7.2 HU), and DLR at a noise index of 35 HU (8.4 HU).
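The selection of the noise index can be retraced with a small sketch. The CTDIvol values and the qualitative low-frequency noise ratings for DLR are taken from the paragraph above. Encoding the ratings as strings and the selection rule as "highest noise index whose low-frequency noise is not higher than routine" is an illustrative simplification, not the authors' procedure.

```python
ROUTINE_CTDIVOL_MGY = 8.7
# CTDIvol (mGy) of the HR & LD protocol, keyed by AEC noise index (HU).
HR_LD_CTDIVOL_MGY = {20: 11.3, 25: 9.7, 30: 7.2, 35: 5.5, 40: 4.4}
# Low-frequency noise by DLR relative to the routine protocol, as reported in the text.
DLR_LOW_FREQ_NOISE = {20: "less", 25: "less", 30: "less", 35: "comparable", 40: "higher"}


def percent_change(value, reference):
    """Signed percentage change of value relative to reference."""
    return 100.0 * (value - reference) / reference


def select_noise_index(dose_by_index, noise_rating_by_index):
    """Pick the lowest-dose setting whose low-frequency noise is not higher than routine."""
    acceptable = [ni for ni, rating in noise_rating_by_index.items()
                  if rating in ("less", "comparable")]
    return min(acceptable, key=lambda ni: dose_by_index[ni])


changes = {ni: percent_change(dose, ROUTINE_CTDIVOL_MGY)
           for ni, dose in HR_LD_CTDIVOL_MGY.items()}
chosen = select_noise_index(HR_LD_CTDIVOL_MGY, DLR_LOW_FREQ_NOISE)
print(f"Chosen noise index: {chosen} HU "
      f"({changes[chosen]:+.1f}% CTDIvol versus the routine protocol)")
```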

Fig. 3

Task-based modulation transfer function (TTF) curves for deep learning reconstruction (DLR) (a and c: blue-tone curves) and model-based iterative reconstruction (MBIR) (b and c: green-tone curves) at 5 different dose levels (a and b), particularly including standard deviation (SD) of 35 HU (c), using the high-resolution and low-radiation-dose (HR & LD) protocol, together with those for hybrid iterative reconstruction (HIR) using our routine protocol (a–c: red dotted curves). Spatial resolution is higher for both DLR and MBIR using the HR & LD protocol than that for HIR using the routine protocol. Note the higher spatial resolution for DLR than MBIR at the same radiation dose

Fig. 4

Noise power spectrum (NPS) curves for deep learning reconstruction (DLR) (a and c: blue-tone curves) and model-based iterative reconstruction (MBIR) (b and c: green-tone curves) at 5 different dose levels (a and b), particularly including standard deviation (SD) of 35 HU (c), using the high-resolution and low-radiation-dose (HR & LD) protocol, together with those for hybrid iterative reconstruction (HIR) using our routine protocol (a–c: red dotted curves). Low-frequency noise is less for DLR than for MBIR at the same radiation dose. Note that the low-frequency noise for HIR using the routine protocol is comparable to that for DLR at an SD of 35 HU and less than that for MBIR at an SD of 35 HU (c)

Clinical pilot study

In the clinical pilot study, with the HR & LD protocol at a noise index of 35 HU, MBIR yielded significantly less objective and subjective noise and significantly greater CNR than DLR in all anatomies, but all subjective image qualities except subjective noise were significantly worse by MBIR than by DLR (P < 0.001 for all, Table 2 and Figs. 5 and 6). Both reviewers graded subjective noise as 4 or 5 by MBIR and 3 to 5 by DLR in all patients, and they scored all other subjective image qualities from 1 to 3 by MBIR and 3 to 5 by DLR. Inter-reviewer agreement was excellent (κ = 0.87). Both BW and BMI were comparable between the study and reference patient groups (P = 0.892 and 0.512, respectively). With the HR & LD protocol, the mean CTDIvol was 4.2 ± 1.6 mGy; the DLP, 243.2 ± 106.0 mGy·cm; and the estimated effective dose, 3.6 ± 1.6 mSv.

Table 2 Objective noise, contrast-to-noise ratio, and subjective image quality

Fig. 5

Violin plots with box-and-whisker plots representing the subjective image quality scores by model-based iterative reconstruction (MBIR) and by deep learning reconstruction (DLR) in the clinical pilot study. Five points represent much better quality compared with the routine protocol; 4 points, better quality; 3 points, comparable quality; 2 points, worse quality; and 1 point, much worse quality. The dashed line represents 3 points. With the high-resolution and low-radiation-dose protocol at a noise index of 35 HU, MBIR yields significantly less subjective noise than DLR in both the liver and pelvis, but all subjective image qualities except noise are significantly worse by MBIR than by DLR (P < 0.001 for all). Note that all the scores are 3 to 5 by DLR, representing non-inferior diagnostic efficacy by DLR compared to the routine protocol

Fig. 6

Contrast-enhanced ultrahigh-resolution CT (UHRCT) axial images of the abdomen (a and b) of a 71-year-old man (156 cm, 60 kg, body mass index [BMI]: 24.7 kg/m²) and the pelvis (c and d) of an 86-year-old man (152 cm, 56 kg, BMI: 24.2 kg/m²) acquired with the high-resolution and low-radiation-dose protocol (tube voltage, 100 kV; standard deviation [SD], 35 HU; volume CT dose index, 4.7 mGy for the first subject and 4.0 mGy for the second) and reconstructed by model-based iterative reconstruction (MBIR) (a and c) and deep learning reconstruction (DLR) (b and d). Despite its greater subjective noise, DLR depicts anatomies in the abdomen and pelvis more sharply and naturally than MBIR, without the characteristic oil painting texture of MBIR. Specifically, the delineation of the cystic lesion in the pancreatic body (arrows) and Gerota’s fasciae (arrowheads) is more conspicuous by DLR (b) than by MBIR (a); the boundaries between the prostate (arrows) and the adjacent urine in the urinary bladder and between the seminal vesicles (arrowheads) and the surrounding fat tissue appear clearer by DLR (d) than by MBIR (c)

Discussion

The phantom study, using the HR & LD protocol, demonstrated higher spatial resolution and lower low-frequency noise by DLR than by MBIR at the same dose. Compared with findings using the routine protocol, low-frequency noise was similar by DLR but greater by MBIR at a noise index of 35 HU, whereas overall noise was greater by DLR and less by MBIR. Low-frequency noise was reported to produce coarse image texture described as an “oil painting” or “plastic-like” appearance, which compromised the detection of small lesions [7, 8, 19]. In the clinical pilot study, using this imaging protocol at a noise index of 35 HU, noise characteristics were worse by DLR than by MBIR, but the objective noise was less than 10 HU even by DLR, and both reviewers graded subjective noise as similar or better in all patients compared with findings using the routine protocol. An optimal noise index of 12.5 or 15.0 HU was reported to obtain diagnostically acceptable APCT images at a reasonably reduced radiation dose using conventional multidetector CT (MDCT) scanners and filtered back projection (FBP) [20]. The other image qualities (diagnostic confidence, image appearance, and image texture) were similar or better by DLR but similar or worse by MBIR in all patients compared to findings with the routine protocol, and they were significantly better by DLR than by MBIR. Because patient body size was comparable between the study and reference groups, all the subjective image qualities, and thus diagnostic efficacy, with the HR & LD protocol were considered non-inferior to those with the routine protocol (standard resolution and dose with HIR) only by DLR, not by MBIR.

In a clinical study, Akagi and colleagues [6] observed that DLR improved the quality of abdominal CT images obtained by UHRCT at their routine dose (CTDIvol: 12.6 mGy). Generally, attenuation is greatest as the x-ray beam travels horizontally at the level of the hip joint because the pelvic bones and bilateral femoral heads block photons from reaching the x-ray detectors, which makes photon starvation artifact a major issue to be resolved in low-dose pelvic CT [21]. We applied the HR & LD protocol clinically in APCT by UHRCT for the first time and successfully reduced radiation dose by two-thirds (i.e., CTDIvol: 4.2 mGy) compared with the study by Akagi’s group [6], achieving a value much lower than the diagnostic reference levels for low-dose APCT followed in many countries (CTDIvol: 13 to 18 mGy) [22, 23]. This reduced dose still permitted acquisition of adequate image quality in CE-APCT by UHRCT for oncologic follow-up, though such challenges are perceived as more easily manageable in certain high-contrast examinations, such as CT angiography, CT examination of nephroureterolithiasis, and CT colonography [24, 25]. Previous studies reduced the CTDIvol to approximately 6 mGy in APCT examinations by conventional MDCT (i.e., non-UHRCT) with tube voltage reduction and/or the application of various iterative reconstruction algorithms [21, 26,27,28,29,30,31,32,33,34,35]. Park and colleagues [26] reported that the combined use of automated attenuation-based tube potential selection on third-generation dual-source CT with an iterative reconstruction algorithm maximized the reduction in median CTDIvol. They were able to achieve a median CTDIvol of 4.8 mGy by decreasing tube voltage to 90 kV in the patient subgroup with the smallest body physique, but their result is still higher than the mean CTDIvol in the present study.

In a phantom study utilizing a conventional MDCT scanner, Higaki and colleagues [8] reported less low-frequency noise and thus higher task-based detectability at various task contrast settings by DLR than MBIR at low radiation doses. The exact reason is unknown; however, DLR is robust in low-dose situations because its training includes low-quality datasets to allow the generation of high-quality images from low-quality images with the preservation of signal and spatial resolution [white paper, https://mfl.ssl.cdn.sdlmedia.com/636837173033229994OU.pdf. Accessed 24 Apr 2020]. In our study, noisier UHRCT images probably enhanced these benefits of DLR. In contrast, spatial resolution by MBIR is easily degraded in low-dose and/or low-contrast situations [36]. MBIR is associated with changes in image texture related to the distribution of signal within a narrower bandwidth of frequencies than that with HIR, which accounts for the coarser texture of MBIR images [7]. This texture change by MBIR might degrade subjective image qualities other than subjective noise as judged by radiologists with a clear preference toward HIR. In addition, higher spatial resolution allows sharper delineation of various anatomies and more conspicuous depiction of potential focal lesions by DLR than MBIR. Moreover, MBIR usually requires higher computational power and longer processing time than FBP and HIR [6], and with its shorter and more reasonable processing time, DLR is considered more clinically useful than MBIR [6]. Thus, the use of DLR is regarded as clinically acceptable as an adjunct to CE-APCT by UHRCT with the HR & LD protocol because it yields similar or better subjective image qualities and thus non-inferior diagnostic efficacy compared to those acquired using the routine protocol.

The combination of DLR and the HR & LD protocol may be particularly beneficial for maximally reducing radiation dose and improving diagnosis of fine recurrent, disseminated, and metastatic lesions in CE-APCT by UHRCT for oncologic follow-up.

Our study has several limitations. In the phantom study, we assessed TTF using only a single intermediate contrast, almost equivalent to the image contrast between soft tissue lesions (e.g., peritoneal disseminations, lymph node metastases) and intra- and retroperitoneal fat, instead of multiple contrasts including a low contrast for metastases within solid organs. The clinical study included only a small study population at a single institution, and the smaller BW and BMI of our Japanese patients compared with those of average-sized patients in Western countries may have affected our findings. In addition, we assessed only image quality in CE-APCT and did not examine lesion delineation or diagnostic performance. Further studies to assess the clinical usefulness of our results should include examination of lesion delineation and diagnostic performance in a larger cohort at multiple institutions.

Conclusions

In CE-APCT at a low dose (CTDIvol: approximately 4 mGy) by UHRCT, DLR yields better image qualities, with the exception of noise characteristics, and greater acceptance by radiologists than MBIR. In particular, the lower low-frequency noise by DLR is likely to produce less coarse image texture and better acceptance by radiologists at the low dose than MBIR.