Introduction

The contribution of computed tomography (CT) to the total effective dose due to medical X-ray examinations has been recently reported to be up to 70% [1]. Hence, continuous efforts have been made by manufacturers and users of CT to reduce the dose level per examination with the integration of new technologies (e.g., tube current modulation, iterative reconstruction (IR) algorithms, or more efficient detectors) and the optimization of clinical protocols [2].

An important aspect to take into account when dealing with protocol optimization is the variation of the practice even for a well-defined indication. Hence, diagnostic reference levels (DRLs) were proposed by the International Commission on Radiological Protection (ICRP) in 1996 to reduce the variability of clinical practice by leading users of CT to take actions when the local dose indicator systematically exceeds the national DRL [3]. Two major limitations appear. DRLs are often not related to precise clinical indication, nor to any clinical image quality criteria. The first limitation was partially addressed by recently published national or local DRLs [4,5,6,7,8,9], and at the European level [10]; the second one is still an open question as mentioned by Rehani [11]. Moreover, the technological differences between CT scanners should be taken into account when dealing with clinical protocol optimization. Adjusting the radiation dose level of a clinical protocol using the value of the associated DRL without assessing the image quality is suboptimal [12]. On the one hand, further patient dose optimization could be justified for the most modern CT scanners. On the other hand, it could cause an excessive dose reduction with a loss of diagnostic performance, in particular for older CT scanners. This practice can lead to variations in image quality and patient care, while the goal is the standardization of image quality such that it is just sufficient for the clinical task at the lowest possible dose [13, 14]. Hence, it appears necessary to associate national DRLs for specific clinical tasks with task-based image quality criteria in order to assess a potential dose optimization and avoid excessive patient dose reduction.

Among all existing CT examinations, abdominal CT protocols deliver the highest radiation doses to the patients [1]. Moreover, the optimization process is particularly crucial for abdominal protocols due to the challenges arising from the detection of small low-contrast lesions [15]. An excessive patient dose reduction can highly increase the risk of missing subtle lesions.

The use of basic image quality metrics (standard deviation, contrast, contrast-to-noise ratio, modulation transfer function) is of limited interest because they are not directly related to any clinical requirement [16]. Task-based image quality analysis was initially proposed by Barrett and Myers to quantify CT diagnostic performance [17, 18]. The methodology was recently applied with success to benchmark CT scanners [19] and clinical protocols [20] and to assess the use of IR algorithms [16, 21].

The purpose of this contribution is to assess task-based image quality for two abdominal protocols on various CT scanners and to establish a relationship between DRL values and image quality for the respective clinical tasks.

Materials and methods

Image quality phantoms

An abdominal anthropomorphic phantom (QRM, A PTW COMPANY) was used to assess the image quality of two examination types. The phantom mimics various tissues (muscle, liver, spleen, and vertebrae) (Fig. 1a). Due to the absence of materials with high atomic numbers, the phantom was designed to assess non-contrast CT scans. Its effective diameter of 30 cm simulates the attenuation of a patient with a weight around 75 kg. The phantom contains a hole of 10 cm in diameter into which different modules can be inserted. To mimic the detection of focal liver lesions, a first module containing hypodense low-contrast spheres of different sizes (in particular 8 and 5 mm diameter) with a contrast of 20 HU relative to the background was used (Fig. 1b). These two lesion sizes were considered clinically relevant. Indeed, liver lesions smaller than 5 mm are often benign. Furthermore, it is difficult to accurately characterize smaller lesion sizes in the liver with this type of contrast in CT [22].

A second module containing a high contrast calcic rod of 20 mm in diameter and a contrast of 200 HU was used to quantify the spatial resolution, an important aspect for assessing the detection of renal stones (Fig. 1c).

Fig. 1

a Photo of the QRM phantom. b CT slice of the phantom with the module containing the low-contrast spheres; the 8-mm spheres are all positioned on the first row and the 5-mm spheres on the third row. c CT slice of the phantom with the module containing the high-contrast rod

CT scanners and acquisition/reconstruction parameters

In consultation with a panel of radiologists, two sets of acquisition and reconstruction parameter settings were defined that are typical for examinations of a) focal liver lesions and b) renal stones. Five volume computed tomography dose index (CTDIvol) levels were used for each set (4, 8, 12, 16, and 20 mGy for focal liver lesions and 2, 4, 6, 10, and 15 mGy for renal stones). The current Swiss DRLs (11 mGy for focal liver lesion CT acquisitions and 6 mGy for renal stone CT acquisitions) and the underlying dose distributions [6] were used to determine the five CTDIvol levels, so that they cover the clinically relevant dose range.

The 12 CT scanners involved in this study are listed in Table 1. Three different CT scanners from each of the four major CT manufacturers were included. Thus, the variability of image quality due to scanner-specific technology properties could be adequately studied. In practice, there is no identical set of acquisition and reconstruction parameters that can be used on all CT scanner models. Instead, acquisition and reconstruction parameters were matched as closely as possible (Table 1). Reconstruction algorithms and reconstruction kernels are manufacturer- and model-specific.

Table 1 CT scanners and their acquisition and reconstruction settings

Radiation dose assessment

Before each acquisition session, CTDIw was measured with a 10-cm ionization chamber (PTW TM30009 or Radcal 10X6-3CT) using a 32-cm-diameter CTDI phantom, following the International Electrotechnical Commission (IEC) standard 60601-2-44. The ratio of the measured CTDIw to the displayed CTDIw was used to correct the displayed CTDIvol of the image quality phantom scans. For the 12 CT scanners, the correction factors ranged from 0.847 to 1.057. Furthermore, the actual radiation dose depends on the z-position if the tube current is modulated. All CTDIvol values presented in the results section are corrected and refer to the actual z-position where the image quality was evaluated.

Relative standard uncertainties on the final CTDIvol values were evaluated in detail [23]. It turned out that 2.5% is a good estimate for all CT scanners and all dose levels. The most important uncertainty component was the uncertainty of the CTDIw measurements, more specifically the uncertainty of the chamber calibration factors (relative standard uncertainty of 1.5%, from calibration certificate).
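The dose correction described above amounts to a single scaling of the displayed value. The following is a minimal sketch, not the authors' implementation: the function and argument names are hypothetical, the z-position dependence under tube current modulation is not modeled, and the expanded uncertainty simply applies the coverage factor k = 2 used in the figure captions to the 2.5% relative standard uncertainty.

```python
def corrected_ctdivol(displayed_ctdivol_mgy, measured_ctdiw_mgy, displayed_ctdiw_mgy):
    """Scale the displayed CTDIvol by the ratio measured CTDIw / displayed CTDIw."""
    if displayed_ctdiw_mgy <= 0:
        raise ValueError("displayed CTDIw must be positive")
    correction_factor = measured_ctdiw_mgy / displayed_ctdiw_mgy
    return correction_factor * displayed_ctdivol_mgy


def expanded_uncertainty(value_mgy, relative_standard_uncertainty=0.025, k=2.0):
    """Expanded uncertainty (same unit as value) for coverage factor k."""
    return k * relative_standard_uncertainty * value_mgy


# Illustrative numbers only: a correction factor of 0.847 applied to 11 mGy.
dose = corrected_ctdivol(11.0, measured_ctdiw_mgy=8.47, displayed_ctdiw_mgy=10.0)
print(f"{dose:.2f} mGy +/- {expanded_uncertainty(dose):.2f} mGy (k = 2)")
```

With a 2.5% relative standard uncertainty, the expanded uncertainty at k = 2 is 5% of the corrected CTDIvol.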

Image analysis

Low-contrast detectability

We quantitatively assessed the image quality using a task-based methodology. The clinical tasks were the detection of low-contrast lesions with a size of 5 and 8 mm. The low-contrast module contains four spheres of 8 mm and five spheres of 5 mm in diameter in the exact same slice. As 20 acquisitions were performed for each dose level, we were able to extract at least 80 square regions of interest (ROIs) of 18 × 18 pixels for each lesion size (8 mm and 5 mm in diameter). On the right homogeneous part of the phantom images, 400 ROIs containing only noise were extracted in five slices around the slice of interest (Fig. 1b).
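The ROI extraction can be sketched as follows. This is an illustration under assumptions: the lesion centre coordinates are hypothetical inputs (the text does not give them), and the ROI is placed so that the given centre pixel sits just below and to the right of the geometric centre of the even-sized 18 × 18 window.

```python
import numpy as np


def extract_rois(slices, centers, size=18):
    """Cut one size x size ROI per (row, col) centre from every slice.

    slices  : iterable of 2D arrays (one image per acquisition)
    centers : iterable of (row, col) integer pixel positions (assumed known)
    Returns an array of shape (n_slices * n_centers, size, size).
    """
    half = size // 2
    rois = []
    for image in slices:
        image = np.asarray(image)
        for row, col in centers:
            r0, c0 = row - half, col - half
            if r0 < 0 or c0 < 0 or r0 + size > image.shape[0] or c0 + size > image.shape[1]:
                raise ValueError(f"ROI at ({row}, {col}) falls outside the image")
            rois.append(image[r0:r0 + size, c0:c0 + size])
    return np.stack(rois)


# 20 acquisitions x 4 spheres of 8 mm gives 80 signal ROIs.
acquisitions = [np.zeros((512, 512)) for _ in range(20)]
sphere_centers = [(200, 230), (200, 250), (200, 270), (200, 290)]  # hypothetical
print(extract_rois(acquisitions, sphere_centers).shape)
```

The same function applied to five sphere centres per slice yields 100 ROIs for the 5-mm lesions, which is consistent with the "at least 80" stated above.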

An anthropomorphic mathematical model observer was chosen to quantitatively assess the detectability of low-contrast lesions. Based on Bayesian statistical decision theory, this kind of observer has the ability to mimic human observer responses in the detection of low-contrast structures in an image [24,25,26]. The channelized Hotelling observer (CHO) with 10 dense difference of Gaussian (DDoG) channels was applied, following the methodology proposed by Wunderlich et al. to compute the signal-to-noise ratio (SNR), which expresses the detectability of the lesion [27]. This CHO model observer has previously been applied to the same anthropomorphic phantom [21]. As CHO model observers are more efficient than human observers for simple detection tasks in a uniform background, it is necessary to adjust the detection outcomes of model observers by adding internal noise to the covariance matrix [28]. Internal noise was calibrated with the data from the inter-comparison study of Ba et al. [29]. The area under the receiver operating characteristic curve (AUC) was used as the figure of merit to assess the detectability of low-contrast lesions. A monotonic function can link SNR and AUC [30]. The AUC was computed for each CT, dose level, and lesion size.
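The core of a channelized Hotelling observer can be sketched in a few lines. This is a simplified illustration, not the method used in the study: it uses the plain plug-in estimate rather than the bias-corrected estimator of Wunderlich et al. [27], it adds no internal noise, and the channel matrix is a caller-supplied input (the DDoG channel parameters are not given in the text and are not reproduced here). The SNR-to-AUC mapping shown is the standard relation for equal-variance Gaussian decision variables, which is assumed to be the monotonic function referred to above.

```python
import numpy as np
from math import erf


def cho_snr(signal_rois, noise_rois, channels):
    """Plug-in CHO detectability.

    signal_rois, noise_rois : arrays of shape (n, h, w)
    channels                : array of shape (h * w, n_channels), assumed given
    """
    v_s = np.asarray(signal_rois, float).reshape(len(signal_rois), -1) @ channels
    v_n = np.asarray(noise_rois, float).reshape(len(noise_rois), -1) @ channels
    delta = v_s.mean(axis=0) - v_n.mean(axis=0)          # mean channel response difference
    cov = 0.5 * (np.atleast_2d(np.cov(v_s, rowvar=False))
                 + np.atleast_2d(np.cov(v_n, rowvar=False)))  # pooled channel covariance
    template = np.linalg.solve(cov, delta)               # Hotelling template
    return float(np.sqrt(delta @ template))


def snr_to_auc(snr):
    """AUC = Phi(SNR / sqrt(2)) for equal-variance Gaussian decision variables."""
    return 0.5 * (1.0 + erf(snr / 2.0))
```

Under this relation, an SNR of about 2.33 corresponds to an AUC of 0.95.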

High-contrast detectability

For the detection of renal stones, we also used a task-based methodology. The clinical task was the detection of calcic lesions of 3 and 5 mm with a contrast of 450 HU. Indeed, renal stones of 3 mm and smaller have a high chance of spontaneous passage [31]. We decided to use 3 mm as a cut-off. An anthropomorphic mathematical observer, the non-prewhitening observer with an eye filter (NPWE) expressed in the Fourier domain was used. Developed by Burgess [32], the NPWE computes the SNR of simulated high contrast lesions using the in-plane contrast-dependent spatial resolution (target transfer function (TTF)) from the images of high contrast objects, the noise power spectrum (NPS), and the virtual transfer function of the human eye [33].
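For orientation, the NPWE detectability in its commonly used Fourier-domain form can be evaluated numerically as below. This sketch rests on several assumptions: all quantities are taken to be radially symmetric and sampled on a common radial frequency grid, no internal noise term is included, and the eye filter is a caller-supplied array because its exact form [33] is not given in the text. The function and argument names are hypothetical.

```python
import numpy as np


def _trapezoid(y, x):
    """Trapezoidal integration, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def npwe_snr(freq, signal, ttf, eye, nps):
    """NPWE SNR for radially symmetric inputs.

    SNR^2 = [2*pi * int S^2 TTF^2 E^2 f df]^2 / [2*pi * int S^2 TTF^2 E^4 NPS f df]

    freq   : radial spatial frequencies (e.g. mm^-1), increasing
    signal : Fourier transform of the simulated lesion (includes its contrast)
    ttf    : 1D target transfer function
    eye    : eye filter (assumed supplied by the caller)
    nps    : 1D noise power spectrum
    """
    freq, signal, ttf, eye, nps = (np.asarray(a, dtype=float)
                                   for a in (freq, signal, ttf, eye, nps))
    weighted = (signal * ttf * eye) ** 2
    numerator = (2.0 * np.pi * _trapezoid(weighted * freq, freq)) ** 2
    denominator = 2.0 * np.pi * _trapezoid(weighted * eye ** 2 * nps * freq, freq)
    return float(np.sqrt(numerator / denominator))
```

As a sanity check of the form, with a flat eye filter, an ideal TTF, and white noise, the expression reduces to the square root of the signal energy divided by the noise power density.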

The TTF was computed using the module containing the high-contrast rod. As six acquisitions were performed for each CT scanner and dose level, 78 ROIs of 64 × 64 pixels centered on the rod could be extracted. The 2D TTF was calculated from the edge of the rod following the methodology described by Monnin et al., then radially averaged and normalized at zero frequency to obtain the 1D TTF [34].

Image noise was quantified by computing the NPS [35,36,37]. A total of 90 ROIs of 64 × 64 pixels were extracted from 15 homogeneous slices per acquisition. The 2D NPS was computed on the cropped ROIs and then radially averaged to obtain 1D NPS.
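A common way to compute the NPS from a stack of ROIs is sketched below. It is an illustration under assumptions rather than the authors' exact procedure: each ROI is detrended by subtracting its own mean (the detrending actually used is not stated), the pixel size is an input, and the radial average uses simple equal-width frequency bins up to the Nyquist frequency.

```python
import numpy as np


def nps_2d(rois, pixel_mm):
    """Ensemble-averaged 2D NPS: (dx * dy / (Nx * Ny)) * <|FFT2(ROI - mean)|^2>."""
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)
    power = np.abs(np.fft.fft2(detrended)) ** 2
    return power.mean(axis=0) * pixel_mm ** 2 / (nx * ny)


def radial_average(nps2d, pixel_mm, n_bins=32):
    """Average the 2D NPS over annuli of radial frequency (up to Nyquist)."""
    ny, nx = nps2d.shape
    fx = np.fft.fftfreq(nx, d=pixel_mm)
    fy = np.fft.fftfreq(ny, d=pixel_mm)
    radius = np.hypot(*np.meshgrid(fx, fy)).ravel()
    edges = np.linspace(0.0, 0.5 / pixel_mm, n_bins + 1)
    index = np.digitize(radius, edges) - 1
    keep = (index >= 0) & (index < n_bins)
    sums = np.bincount(index[keep], weights=nps2d.ravel()[keep], minlength=n_bins)
    counts = np.bincount(index[keep], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)


# Synthetic white noise standing in for 90 ROIs of 64 x 64 pixels.
rng = np.random.default_rng(1)
demo = nps_2d(rng.normal(scale=2.0, size=(90, 64, 64)), pixel_mm=1.0)
freqs, nps_1d = radial_average(demo, pixel_mm=1.0)
```

With this normalization, the integral of the 2D NPS over the frequency plane equals the pixel variance, which is a convenient consistency check.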

As the integral of 1D NPS decreases as the slice thickness increases [38], we corrected the SNR of the NPWE model for the 5 CT scanners with a 2.5 mm slice thickness by a factor \( \sqrt{\frac{3}{2.5}} \).
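The origin of this factor can be written out as a short worked step. It assumes that the noise variance (the integral of the NPS) is inversely proportional to the slice thickness \( T \), that the NPS shape and the TTF are otherwise unchanged, and that the correction is applied as a multiplication to bring the 2.5 mm results to the 3 mm reference:

\[ \sigma^2 \propto \frac{1}{T} \;\Rightarrow\; \mathrm{SNR} \propto \sqrt{T}, \qquad \mathrm{SNR}_{3\,\mathrm{mm}} \approx \mathrm{SNR}_{2.5\,\mathrm{mm}} \sqrt{\frac{3}{2.5}} \approx 1.095 \, \mathrm{SNR}_{2.5\,\mathrm{mm}} \]

Under these assumptions the correction raises the SNR measured at 2.5 mm by about 9.5%.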

Statistical analysis

For the CHO model observer, to reduce the positive bias caused by the use of a finite number of images and to compute the exact 95% confidence interval of SNR, the methodology developed by Wunderlich was applied [27]. A linear fit between the logarithm of the SNR and the logarithm of the dose, taking into account the uncertainties, was performed for each CT scanner to calculate SNR and AUC values at a given CTDIvol and vice versa.
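The log-log fit and its inversion can be sketched as follows. This is a minimal illustration with assumptions: only the SNR uncertainties are used as weights (the treatment of the dose uncertainties in the study is not detailed), the SNR-to-AUC relation is the equal-variance Gaussian one, and all function names are hypothetical. The numbers in the usage lines are synthetic, not results from the study.

```python
import numpy as np
from math import exp, log, sqrt
from statistics import NormalDist


def fit_loglog(dose_mgy, snr, snr_sigma):
    """Weighted straight-line fit of ln(SNR) against ln(dose)."""
    dose_mgy, snr, snr_sigma = (np.asarray(a, dtype=float)
                                for a in (dose_mgy, snr, snr_sigma))
    sigma_log = snr_sigma / snr            # first-order uncertainty of ln(SNR)
    slope, intercept = np.polyfit(np.log(dose_mgy), np.log(snr), 1, w=1.0 / sigma_log)
    return float(slope), float(intercept)


def snr_at_dose(dose_mgy, slope, intercept):
    return exp(intercept + slope * log(dose_mgy))


def dose_at_snr(snr, slope, intercept):
    return exp((log(snr) - intercept) / slope)


def auc_from_snr(snr):
    return NormalDist().cdf(snr / sqrt(2.0))


def snr_from_auc(auc):
    return sqrt(2.0) * NormalDist().inv_cdf(auc)


# Synthetic example: SNR growing as the square root of the dose.
doses = np.array([4.0, 8.0, 12.0, 16.0, 20.0])
snrs = 0.8 * np.sqrt(doses)
slope, intercept = fit_loglog(doses, snrs, 0.05 * snrs)
print(auc_from_snr(snr_at_dose(11.0, slope, intercept)))
print(dose_at_snr(snr_from_auc(0.95), slope, intercept))
```

Fitting in log-log space keeps the interpolation linear, so reading a dose from a target AUC is a closed-form inversion rather than a numerical search.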

For the NPWE outcome, the uncertainties were determined using a bootstrap method. Results were computed from 100 bootstrapped samples, each consisting of 50 ROIs, for both the TTF and the NPS calculations.
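A generic version of this resampling scheme is sketched below. The assumptions are that ROIs are drawn with replacement from the available pool and that the spread of the statistic over the resamples is taken as its standard uncertainty; `statistic` stands for any function of an ROI stack (for example a TTF, NPS, or SNR computation) and is a hypothetical placeholder.

```python
import numpy as np


def bootstrap_statistic(rois, statistic, n_boot=100, sample_size=50, seed=0):
    """Return (mean, standard deviation) of `statistic` over bootstrap resamples.

    rois       : array of shape (n_rois, h, w)
    statistic  : callable mapping an ROI stack to a scalar or 1D array
    n_boot     : number of bootstrap resamples (100 in the text)
    sample_size: ROIs drawn per resample (50 in the text)
    """
    rois = np.asarray(rois)
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_boot):
        picks = rng.integers(0, len(rois), size=sample_size)  # with replacement
        values.append(statistic(rois[picks]))
    values = np.asarray(values, dtype=float)
    return values.mean(axis=0), values.std(axis=0, ddof=1)


# Example with a simple statistic: the pixel standard deviation of the stack.
pool = np.random.default_rng(3).normal(size=(90, 64, 64))
mean_sd, spread = bootstrap_statistic(pool, lambda stack: stack.std())
```

Multiplying the resulting spread by 2 gives an expanded uncertainty comparable to the k = 2 bars shown in the figures.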

Results

To ensure the impartiality of this work, the results are reported anonymously throughout the manuscript. It was not the purpose of this work to compare individual CT scanner models but rather to study the size of the variability when using different models. A capital letter (A, B, C, or D) was assigned to each manufacturer, and the numerals 1, 2, and 3 designate the three different CT scanners of each manufacturer.

Fig. 2

Area under the ROC curve as a function of CTDIvol in the slice of interest for the 8-mm lesion size for the 12 CT scanners. The horizontal and vertical uncertainty bars represent the expanded uncertainty (k = 2, 95% level of confidence) for the CTDIvol and AUC, respectively. The solid black line was plotted by joining 5 points representing the mean AUC and the mean CTDIvol over all 12 CTs for each dose level. The gray band was plotted by joining the limits of the 95% confidence intervals of the 5 points

Fig. 3

Area under the ROC curve as a function of CTDIvol in the slice of interest for the 5-mm lesion size for the 12 CT scanners. The horizontal and vertical uncertainty bars represent the expanded uncertainty (k = 2, 95% level of confidence) for the CTDIvol and AUC, respectively. The solid black line was plotted by joining 5 points representing the mean AUC and the mean CTDIvol over all 12 CTs for each dose level. The gray band was plotted by joining the limits of the 95% confidence intervals of the 5 points

Low-contrast detectability

As expected, irrespective of the lesion size, the low contrast detectability increased with the dose level (Figs. 2 and 3).

For the largest lesion size (8 mm), at 11 mGy, corresponding to the Swiss DRL of the investigated liver protocol, the AUC reached a high image quality level with values higher than 0.95 for 10 out of 12 CT scanners (Fig. 2 and Table 2). The use of a dose level below 7 mGy (25th percentile of the DRL distribution) induced a loss of image quality. The percentage of AUC reduction when decreasing the dose level from 11 to 7 mGy varied from 1.7 to 4.3% for the various CT scanners. The variability of image quality between the various CTs was higher at low dose levels: the AUC ranged from 0.90 to 0.96 at 7 mGy (Table 2). A comparable level of image quality was obtained at substantially different CTDIvol values. For example, an AUC of 0.95 was obtained at doses ranging from 5.3 to 13 mGy, as calculated using the best-fit equations.

Table 2 AUC values for the 8 mm (top) and 5 mm (bottom) lesion size calculated using the fit equation for each CT scanner at three dose levels: the 25th percentile of the DRL distribution, the 50th percentile (achievable dose), and the 75th percentile (DRL). The percentage of AUC reduction was then calculated when decreasing the dose level from the 75th percentile to the 25th percentile of the DRL distribution

For the smaller lesion size (5 mm), the AUC results were lower than for the 8-mm lesion size, as expected. The AUC increased with the dose but never reached a high level of image quality for all CTs. Indeed, the mean AUC over all CT scanners was only 0.86 at 11 mGy and reached 0.91 for the highest dose level (Fig. 3). The use of a dose level lower than the DRL induced a higher loss of image quality in comparison with the 8-mm lesion size (Table 2). The percentage of AUC reduction when decreasing the dose level from 11 to 7 mGy varied from 3.6 to 6.8% for the various CTs. The AUC ranged from 0.76 to 0.86 at 7 mGy. An AUC of 0.85 was obtained at doses ranging from 6.0 to 14.3 mGy.

High-contrast detectability

The most challenging high-contrast task was the detection of a 3-mm calcic lesion (Fig. 4). The results for the 5-mm lesion are presented in Fig. 5. For each CT, the detectability increased with the dose. However, even at the lowest dose level (2 mGy), for both lesion sizes, the SNR for all CTs was very high (AUC close to 1.0), indicating that the detection of lesions with such sizes and nominal contrast relative to a homogeneous background was trivial.

Fig. 4

Signal-to-noise ratio (SNR) calculated for the 3-mm lesion size using the NPWE model observer as a function of CTDIvol in the slice of interest for the 12 CT scanners. The vertical and horizontal uncertainty bars represent the expanded uncertainty (k = 2, 95% level of confidence) for the SNR and the CTDIvol, respectively. The solid black line was plotted by joining 5 points representing the mean SNR and the mean CTDIvol over all 12 CTs for each dose level. The gray band was plotted by joining the limits of the 95% confidence intervals of the 5 points

Fig. 5

Signal-to-noise ratio (SNR) calculated for the 5 mm lesion size using the NPWE model observer as a function of CTDIvol in the slice of interest for the 12 CT scanners. The vertical and horizontal uncertainty bars represent the expanded uncertainty (k = 2, 95% level of confidence) for the SNR and the CTDIvol, respectively. The solid black line was plotted by joining 5 points representing the mean SNR and the mean CTDIvol over all 12 CTs for each dose level. The gray band was plotted by joining the limits of the 95% confidence intervals of the 5 points

Discussion

In the framework of patient radiation dose optimization, it is essential to ensure that dose and image quality are balanced so that the diagnostic requirements are fulfilled at the lowest possible dose [13]. The detection of low-contrast lesions in a uniform background is a simple task in comparison with the complexity of a radiological diagnosis for the detection of focal liver lesions. However, even in this simple condition, the task is challenging (Figs. 2 and 3). For the largest lesion size investigated (8 mm), the dose optimization curve reaches a high level of image quality (mean AUC over all CTs higher than 0.95) at approximately 11 mGy (corresponding to the DRL). However, there is a loss of low-contrast detectability for all CTs when using lower dose levels. Our results indicate that one has to be cautious when using doses below the current Swiss DRL (11 mGy) and even more so below the 25th percentile (7 mGy), as discussed in ICRP 135 [12]. For the 5-mm lesion size, the task is even more challenging. The detectability never reached a high level of image quality when increasing the dose from 4 to 20 mGy. Furthermore, the variations in image quality between CT scanners imply differences in the diagnostic information contained in clinical images. Conversely, different doses are needed to achieve the same outcome when dealing with low-contrast detection (see Table 2). This shows the limitation of the DRL concept for optimizing radiation dose without assessing image quality. The high-contrast detection task was chosen to simulate the detection of renal stones. It appears that this task in a homogeneous background is not challenging enough to assess the potential dose optimization. Even for the lowest dose level investigated (2 mGy) and the smallest lesion size (3 mm in diameter), the detectability is very high for all CTs, indicating perfect detection in this simple condition. Nevertheless, differences in the SNR between the CT scanners were observed for all five dose levels (Fig. 4). With these results, it seems reasonable to hypothesize that correct optimization would lead to different doses on different CT scanners for a more realistic, more challenging high-contrast detection task with anatomical background, or for size, shape, and CT number determination.

The results show that it is necessary to link national DRLs for specific clinical tasks with task-based image quality criteria. In the future, an image quality reference level associated with the DRL could be used for specific clinical tasks [39]. A discussion within the radiology community should also be initiated to define the minimum level of image quality required, depending on the clinical indication, for a safe diagnosis. This could avoid excessive patient dose reduction, in particular for the detection of subtle lesions, as reported by several authors in phantoms [40, 41] and also in patient studies [42]. This approach follows ICRP publication 135, which states that the “application of DRL values is not sufficient for optimization of protection. Image quality must be evaluated as well” [12]. The assessment of task-based image quality using mathematical observers is an objective and quantitative approach [17] and the outcomes are linked with human observer performances [26, 39].

The phantom presents some limitations. Firstly, the contrast of the various lesions in the phantom was created using plastic materials of low atomic numbers and cannot perfectly simulate the contrast of lesions in a CT acquisition that uses a contrast agent. Ideally, a phantom with iodine lesions should be used to optimize arterial and venous phases of abdominal protocols. Secondly, the background was homogeneous. We should expect that the use of a realistic anatomical background would be more challenging and the AUC results would be worse [43]. CT scanner–specific settings and properties like collimation, flying focus technique, pitch, tube voltage, rotation time, ATCM settings, reconstruction algorithms, slice thickness, and increment are not identical across scanners. However, these differences cannot be avoided. In particular, the 3-mm slice thickness with an increment of 1.5 mm is not optimal to minimize the partial volume effect for the 5-mm lesion size [44]. Moreover, we did not reposition the phantom between scans, so this partial volume effect was not averaged out. Furthermore, the standard IEC CTDIw measurement method that was used in this study is known to underestimate CTDIw for wide CT beams because the scatter equilibrium is not achieved [45, 46]. However, no correction factor was applied to the IEC measurements because the collimation was smaller than 40 mm for 11 out of 12 CTs [47]. The described differences in CT scanner–specific settings and properties do not allow a completely fair comparison between scanners. However, the goal was not to rate the CT scanners but to study typical CT scanner variability of the image quality at a given dose. Despite the stated limitations, the results show the limitation of the DRL concept. Hence, CT scanner model–specific DRLs could be an option to avoid an unjustified wide dispersion of image quality for well-defined clinical tasks. However, due to the great diversity of CT models and manufacturers on the market, their implementation in clinical routine is difficult. The application of local DRLs to check the clinical practice may be easier to implement using Dose Archiving and Communication Systems (DACS). Ideally, dose optimization should encompass both the DRL process and image quality evaluation using a task-based paradigm. However, the highest priority for the optimization process is to ensure that the image quality is sufficient for the clinical question.

In conclusion, task-based image quality was assessed for various dose levels related to the current DRL values. Assessing image quality metrics related to the clinical question to be answered must be an important part of the optimization process. Comparable image quality for specific clinical questions cannot be reached at the same dose level on all CT scanners. This variability between CTs implies the need for a CT model–specific dose optimization.