Introduction

Computed tomography (CT) colonography is currently recommended as a screening tool for colorectal cancer and advanced adenomas by the American College of Radiology, the American Cancer Society, and the U.S. Multisociety Task Force on Colorectal Cancer [1]. Apart from yield and participation, costs have to be considered when selecting a screening test. Recently, doubts have been raised about the cost-effectiveness of CT colonography compared with established screening techniques [2]. A modelling study estimated that, to be cost-effective, CT colonography should not cost more than 43 % of the cost of colonoscopy [3]. All aspects of CT colonography screening that contribute to the total cost, including reading strategies, should therefore be scrutinised.

A potential approach to reducing costs is to have CT colonography images read by radiographers. Radiographers have been shown to be as accurate as radiologists in reading CT colonograms for intracolonic findings in a single-read strategy [4, 5]. When such a strategy is implemented, the images still require an additional read for extracolonic findings. In the asymptomatic screening population, the incidence of potentially important extracolonic incidental findings is estimated at about 10 %, resulting in a rate of relevant new diagnoses of about 2.5 % [6]. With radiographers reading intracolonic findings in screening, having a radiologist additionally read every CT colonogram for extracolonic findings would be less practical.

Radiographers may also be able to evaluate extracolonic findings after sufficient training. Previous studies have shown that radiographers can accurately read mammograms [7], plain radiographs [8], intravenous pyelograms [9] and double-contrast barium enemas [10]. Much lower accuracy was found in a study in which non-radiologists evaluated paediatric brain CT, but no dedicated training was given in that study [11].

Before radiographers can triage CT colonography examinations for extracolonic findings, they should achieve competence. We evaluated the accuracy and learning curve of formally trained radiographers in triaging extracolonic findings in CT colonography.

Methods

A research grant was received from the Nuts Ohra Foundation (Amsterdam, the Netherlands). The Nuts Ohra Foundation was not involved in designing and conducting this study, did not have access to the data, and was not involved in data analysis or preparation of this manuscript.

We trained consenting radiographers to triage CT colonograms for extracolonic findings using a two-part training program: basic training followed by 200 feedback training cases (see flow chart, Fig. 1). Before and after the feedback training cases, the radiographers triaged the same 40 CT colonograms (initial and final test cases). In this study, the radiographers did not assess the CT colonograms for intracolonic findings.

Fig. 1

Flow chart of the study

Basic training cases

For the first part of the training program (basic training), 100 clinical abdominal CT examinations were used: 40 were enhanced with intravenous contrast medium, 54 were unenhanced and 6 were low-dose CT colonograms. One hundred CT examinations were assumed to be the minimum needed for the radiographers to become acquainted with the most important and/or frequent extracolonic findings (Table 1) [12–14]. All CT examinations had been performed on one of two CT systems: a 4-slice (Mx8000; Philips Healthcare, Best, the Netherlands) or a 64-slice (Brilliance, Philips Healthcare) system. A radiology research fellow (T.N.B.) searched the radiologists’ reports of consecutive clinical CT examinations, ensuring that each important and/or frequent diagnosis was represented at least once. For difficult lesions (e.g. bone metastases, lymph nodes and adrenal lesions), additional examples were selected to supplement the dataset. Examples of lesions not found among the consecutive clinical CT examinations were obtained from radiologists subspecialised in the relevant area. All selected CT examinations that an abdominal radiologist (J.S.) considered representative were included and subsequently anonymised.

Table 1 Extended version of C-RADS classification for radiographers. This version is based on the original classification as proposed by Zalis et al. [20]

Feedback training and test cases

The local medical ethics committee waived informed consent for this specific study because the 200 feedback training cases and 40 test cases were derived from three existing CT colonography study databases containing both symptomatic and asymptomatic individuals [15–17]. The local medical ethics committee had approved these studies, and all included participants had given informed consent. These studies were performed with the same 4-slice and 64-slice CT systems, with which 65 and 175 of the 240 unique CT colonograms were acquired, respectively. All CT colonograms were obtained in both prone and supine positions, using low-dose protocols and without intravenous contrast medium. Participants underwent bowel preparation with iodine contrast, combined with barium in one study.

From the three studies, all CT colonograms with an E3 or E4 finding according to the C-RADS classification were included in the dataset (Table 2) [15–17]. For each CT colonogram with an E3 or E4 finding, two consecutively enrolled CT colonograms without E3 or E4 findings from the same study were added. This resulted in a dataset of 234 CT colonograms, to which 6 consecutive cases without E3 or E4 findings were added to create a dataset of 240 cases. According to the reference standard, the 200 feedback training cases contained 44 E3 and 19 E4 cases (Tables 2 and 3). The 200 feedback cases were preceded and followed by an additional set of 40 cases, which constituted the initial and final tests and contained 11 E3 and 4 E4 cases (Tables 2 and 3). The CT colonograms in the two tests were identical; for the final test, the cases were arranged in a randomised order using the ASELECT function in Excel 2003. Observers were not told that the CT examinations in the two tests were identical, nor that they served as a test. Both feedback training cases and test cases were anonymised using a custom-made computer program that automatically removed all patient data.
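The case-selection rule above (every E3/E4 case plus two consecutively enrolled cases without E3/E4 findings from the same study) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ procedure: the `Case` record and both function names are hypothetical, ‘consecutive’ is assumed to mean the next negative cases enrolled after the index case, and Python’s seeded shuffle stands in for the Excel ASELECT-based randomisation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """Hypothetical record for one enrolled CT colonogram."""
    study: str       # source study identifier
    order: int       # enrolment order within the study
    positive: bool   # True if the reference standard reported an E3 or E4 finding


def assemble_dataset(cases, controls_per_positive=2):
    """Select every E3/E4 case plus the next `controls_per_positive` negative
    cases enrolled after it in the same study (assumed reading of 'consecutive').
    A negative case is never selected twice; fewer controls are taken if the
    study runs out of unused negative cases."""
    by_study = {}
    for case in sorted(cases, key=lambda c: (c.study, c.order)):
        by_study.setdefault(case.study, []).append(case)

    selected, used = [], set()
    for study_cases in by_study.values():
        for i, case in enumerate(study_cases):
            if not case.positive:
                continue
            selected.append(case)
            taken = 0
            for candidate in study_cases[i + 1:]:
                if taken == controls_per_positive:
                    break
                if not candidate.positive and candidate not in used:
                    used.add(candidate)
                    selected.append(candidate)
                    taken += 1
    return selected


def shuffled_order(case_ids, seed):
    """Return a reproducible random presentation order (e.g. for the final test)."""
    order = list(case_ids)
    random.Random(seed).shuffle(order)
    return order
```

With 78 E3/E4 cases (63 in the feedback set plus 15 in the tests), a 1:2 selection yields 78 × 3 = 234 colonograms, and the 6 extra negative cases bring the total to 240.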

Table 2 The original C-RADS classification as proposed by Zalis et al. and the number of CT colonograms classified E1–E4 in the initial and final test cases and the feedback training cases [20]
Table 3 Detailed distribution of actual extracolonic findings and data composition for the 280 cases

Reference standard

All CT colonograms were prospectively read for extracolonic findings by an abdominal radiologist, who was not blinded to the intracolonic findings and had access to the patient’s history. A second abdominal radiologist (C.Y.N.; previous experience of approximately 100,000 abdominal CT examinations and >1,000 CT colonograms) retrospectively read all included anonymised CT colonograms to reduce the chance of perceptual errors and to ensure correct C-RADS classification. After being informed of the possibly (E3) and probably (E4) important findings in the original report, this second abdominal radiologist gave a final judgement, which served as the reference standard in this study.

Observers

Thirteen radiographers were recruited through posters in our radiology department. After receiving further information, ten continued with the training program. Seven worked with CT systems in their daily work. Three radiographers had previous experience in reporting intracolonic findings, but none had reported extracolonic findings before. Detailed information about the radiographers’ experience is given in Table 4. The observers completed the training primarily in their own time.

Table 4 Experience and daily work of the radiographers (R)

Training program—basic training

All ten radiographers completed the basic training program, which consisted of five components: study assignments, instruction manuals, a presentation, hands-on training and observation of expert reading. Radiographers were asked to study two correlative anatomy-imaging course books, containing anatomical illustrations with corresponding axial CT images, and to read two articles on the evaluation of abdominal lymph nodes and adrenal gland masses, as these are relatively frequent findings that require more insight [18, 19]. A course book on pathology of the abdominal organs was provided for reference. An instruction manual guided the observers through the CT examination step by step, indicating for each organ the optimal window setting, which abnormalities to look for and what to measure. Observers also received an extended version of the C-RADS classification for extracolonic findings (Tables 1 and 2) [20].

The 16-h training program was given by an experienced abdominal radiologist and a research fellow (T.N.B.). Observers were familiar with the PACS (Agfa IMPAX client, version 5.3) before the study, but received additional training on certain functions, such as HU measurements. The C-RADS classification system [20] was discussed, followed by six CT examinations with normal anatomy, three of which contained series with intravenous contrast medium to show the anatomy in full detail. Thereafter, the anatomy and all frequent and/or important extracolonic findings were discussed organ by organ, each followed by hands-on training with training cases containing findings in that organ. Individual training on the 15 most difficult cases was given by an experienced abdominal radiologist (J.S.). Clinical background was presented where considered appropriate (e.g. the clinical importance of a cystic pancreatic lesion compared with a simple renal cyst). Imaging findings were presented with an emphasis on differentiating normal findings (E1) and frequent non-relevant extracolonic findings (E2) from other findings. Differential diagnoses of potentially important extracolonic findings were not taught.

Training program—feedback

Eight of the ten radiographers reported all 280 CT colonograms (200 feedback cases and the 40 test cases twice) for extracolonic findings in an electronic case record form. One radiographer could not continue for personal reasons, and one found the program more time-consuming than anticipated and stopped. Radiographers evaluated all examinations for C-RADS E2–E4 findings and performed size and HU measurements when appropriate. They rated their confidence for every extracolonic finding on a ten-point scale and indicated for every case whether they felt confident enough to report it without a second read by a radiologist. A case was considered triaged positive by a radiographer if one or more E3 findings (‘E3 case’) or E4 findings (‘E4 case’) were reported, or if the radiographer indicated that a read by a radiologist was needed. Reading time per case was also recorded.
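The positive-triage rule can be written as a small predicate. The sketch below assumes a hypothetical `Report` record holding the C-RADS category of each finding a radiographer reported; `highest_category` mirrors the case definition used in the statistical analysis (a case takes the category of its highest-classified finding). Treating a case with no reported finding as E1 is an assumption.

```python
from dataclasses import dataclass, field

POSITIVE_CATEGORIES = {"E3", "E4"}  # categories that make a case triage-positive


@dataclass
class Report:
    """Hypothetical per-case report by one radiographer."""
    findings: list = field(default_factory=list)  # C-RADS categories, e.g. ["E2", "E3"]
    radiologist_read_requested: bool = False


def highest_category(findings):
    """Highest C-RADS E-category among the findings; 'E1' if none (assumption)."""
    if not findings:
        return "E1"
    return max(findings, key=lambda cat: int(cat[1:]))


def triaged_positive(report):
    """Positive if any E3/E4 finding is reported or a radiologist read is requested."""
    return report.radiologist_read_requested or any(
        cat in POSITIVE_CATEGORIES for cat in report.findings
    )
```

Under this rule, a case with only E2 findings still counts as positive when the radiographer asks for a radiologist read, which is one route by which false positives can arise.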

For the 200 feedback cases, the reference standard findings were disclosed immediately after the radiographer had submitted their findings for each case, together with any work-up and the final diagnosis for E3 and E4 lesions. For feedback cases, observers could ask an abdominal radiologist for further explanation; such requests were documented. A third experienced abdominal radiologist (J.S.; previous experience of approximately 40,000 abdominal CT examinations) evaluated the CT colonogram whenever a radiographer reported an E3 or E4 finding that did not correspond with the reference standard.

Statistical analysis

The observers’ findings were compared with the reference standard. For the initial and final tests, sensitivity and specificity were calculated for all observers combined and compared between the two tests using McNemar’s test. An E3 or E4 case was defined as a case in which the highest-classified finding according to the reference standard was an E3 or E4 finding, respectively.
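McNemar’s test compares paired proportions using only the discordant pairs, here the reader–case combinations triaged correctly in one test but not in the other. A minimal sketch of the exact (binomial) form follows; the source does not state whether an exact or a chi-squared version was used, and the helper names are hypothetical. Because both tests contain the same cases, c − b equals the change in the number of correct triages (62 − 46 = 16 for E3 cases), but the individual discordant counts are not reported.

```python
from math import comb


def discordant_counts(initial_correct, final_correct):
    """Count discordant pairs for paired booleans (one per reader-case pair).
    b: correct in the initial test only; c: correct in the final test only."""
    b = sum(1 for i, f in zip(initial_correct, final_correct) if i and not f)
    c = sum(1 for i, f in zip(initial_correct, final_correct) if f and not i)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value. Under H0 each discordant pair is equally
    likely to fall in b or c, so min(b, c) follows Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Concordant pairs do not enter the statistic, which is why identical results in both tests (as for E4 cases) give P = 1.00.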

For individual observers and for all observers combined, learning curves for case sensitivity and specificity were constructed using a moving average (window of 60 cases, step of 20 cases). A learning effect for case sensitivity and specificity was assessed by binary logistic regression with case number as the independent variable. Sensitivity and specificity pairs were plotted in receiver operating characteristic (ROC) space, with the sensitivity for E3 and E4 cases combined as the true positive rate and 1 − specificity as the false positive rate.
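A sketch of the moving-average learning curve, assuming each reader’s triage results are stored in reading order as booleans. With a 60-case window advanced in steps of 20, a sequence of 280 readings (initial test, feedback cases, final test) gives (280 − 60)/20 + 1 = 12 windows, which matches the 12 time points in Fig. 3, although the source does not state explicitly that the test readings were included. Each window yields a sensitivity and a false positive rate (1 − specificity), i.e. one point in ROC space. The function names are hypothetical.

```python
def window_starts(n_cases, window=60, step=20):
    """Start indices of all complete windows over a sequence of n_cases."""
    return list(range(0, n_cases - window + 1, step))


def windowed_rates(reference_positive, triaged_positive, window=60, step=20):
    """Per-window (sensitivity, false positive rate) for one reader.

    reference_positive[i]: case i is an E3/E4 case by the reference standard.
    triaged_positive[i]:   the reader triaged case i as positive.
    A rate is None when the window contains no cases of that class."""
    points = []
    for start in window_starts(len(reference_positive), window, step):
        ref = reference_positive[start:start + window]
        call = triaged_positive[start:start + window]
        tp = sum(1 for r, t in zip(ref, call) if r and t)
        fp = sum(1 for r, t in zip(ref, call) if not r and t)
        positives = sum(ref)
        negatives = len(ref) - positives
        sensitivity = tp / positives if positives else None
        false_positive_rate = fp / negatives if negatives else None
        points.append((sensitivity, false_positive_rate))
    return points
```

Combined curves could then be obtained by averaging the per-reader rates window by window.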

Reading time was analysed using a moving average (window of 20 cases, step of 10 cases). The average reading times for the initial and final 40 test cases were compared using a paired t-test, and reader confidence for the initial and final test cases was compared using a Wilcoxon rank-sum test. All analyses were performed using SPSS 16.0.2.1. P values smaller than 0.05 were considered to indicate statistical significance.
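The paired t statistic for reading time is the mean of the within-pair differences divided by its standard error, with n − 1 degrees of freedom. The sketch assumes matched reading times in seconds (for example, the same reader and case in the initial and final test; the pairing unit is not stated in the source). The Python standard library has no t distribution, so the P value is left to a statistics package such as `scipy.stats.ttest_rel`. The helper `to_seconds` (hypothetical) converts the mm:ss reading times reported in the Results.

```python
from math import sqrt
from statistics import mean, stdev


def to_seconds(mm_ss):
    """Convert a 'minutes:seconds' string such as '11:51' to seconds."""
    minutes, seconds = mm_ss.split(":")
    return 60 * int(minutes) + int(seconds)


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched samples.
    Raises ZeroDivisionError if all differences are identical (zero spread)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1
```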

Results

Initial and final tests

In the initial test, the eight radiographers combined had a sensitivity of 52 % (46/88; 95 % CI: 42–63 %) for E3 cases and 69 % (22/32; 95 % CI: 52–86 %) for E4 cases. In the final test, the combined sensitivity was 70 % (62/88; 95 % CI: 61–80 %) for E3 cases and 69 % (22/32; 95 % CI: 52–86 %) for E4 cases: a significant increase of 18 percentage points for E3 cases (P < 0.0001) and no difference for E4 cases (P = 1.00). The combined specificity decreased from 83 % (165/200; 95 % CI: 77–88 %) in the initial test to 70 % (139/200; 95 % CI: 63–76 %) in the final test (P < 0.0001).
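The source does not state how the 95 % confidence intervals were computed. Assuming a normal-approximation (Wald) interval on the pooled reader–case proportions, the sketch below reproduces the E3 and specificity intervals quoted above after rounding to whole percentages. This interval ignores the clustering of readings within readers and cases. The pooled denominators follow from the test composition: 8 readers × 11 E3 cases = 88 and 8 readers × 25 cases without E3/E4 findings = 200. The function names are hypothetical.

```python
from math import sqrt


def wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the normal-approximation (Wald)
    interval, clipped to [0, 1]. Assumed method; not confirmed by the source."""
    p = successes / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def percentage_point_change(before, after, total):
    """Change between two proportions with the same denominator, in percentage points."""
    return 100 * (after - before) / total


e3_initial = wald_ci(46, 88)       # pooled E3 sensitivity, initial test
e3_final = wald_ci(62, 88)         # pooled E3 sensitivity, final test
spec_initial = wald_ci(165, 200)   # pooled specificity, initial test
spec_final = wald_ci(139, 200)     # pooled specificity, final test
e3_gain = percentage_point_change(46, 62, 88)  # (62 - 46) / 88, about 18 points
```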

The radiographers’ average reading time per case decreased from 11 min 51 s to 4 min 13 s (P < 0.0001). The median confidence for all reported lesions increased from 8 out of 10 in the initial test to 9 out of 10 in the final test (P < 0.0001).

Learning curve

The case sensitivity learning curves for E3 and E4 cases for individual radiographers and for the eight radiographers combined are shown in Fig. 2a–d, the specificity learning curves in Fig. 2e, f and the reading time curves in Fig. 2g, h. The ROC plot shows that as the true positive rate (sensitivity) increased, the false positive rate also increased (Fig. 3).

Fig. 2a–h

Learning curves for radiographers. a Average sensitivity for E3 cases and b sensitivity for E3 cases for individual radiographers. c Average sensitivity for E4 cases and d sensitivity for E4 cases for individual radiographers. e Average specificity and f specificity for individual radiographers. g Average reading time and h reading time for individual radiographers

Fig. 3

Receiver operating characteristic (ROC) space showing the radiographers’ average true positive rate (sensitivity for E3 and E4 cases combined) against the average false positive rate (proportion of cases without E3 or E4 findings that were triaged positive). Each point represents one time point on the learning curve, from 1 (first) to 12 (last)

A significant positive learning effect in triaging E3 cases was found for three radiographers, and for E3 and E4 cases combined for two radiographers (Table 5). None of the radiographers showed a significant learning effect in correctly identifying E4 cases alone, or in specificity (Table 5).

Table 5 Binary logistic regression results of individual radiographers (R) for sensitivity (E3, E4 and combined) and specificity on a per-case basis. The regression coefficient B and the results of the Wald test are shown
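Table 5 reports, per radiographer, the regression coefficient B for case number and its Wald test. A minimal sketch of such a single-predictor logistic trend model follows, fitted by Newton-Raphson in plain Python rather than with SPSS. Here x is the case’s position in the reading sequence and y whether the case was triaged correctly (for sensitivity, restricted to E3/E4 cases; for specificity, to cases without E3/E4 findings). The function names are hypothetical, and the fit assumes the outcomes are not perfectly separated by case number, since B would otherwise diverge. A positive B indicates improving performance with experience.

```python
from math import erfc, exp, sqrt


def _score_and_information(x, y, b0, b1):
    """Gradient of the log-likelihood and the 2x2 observed information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def logistic_trend(case_number, correct, max_iter=50, tol=1e-10):
    """Fit logit P(correct) = b0 + B * case_number by Newton-Raphson.

    Returns (B, SE of B, Wald chi-square, two-sided P with 1 df).
    case_number is centred internally; this changes only the intercept."""
    mean_x = sum(case_number) / len(case_number)
    x = [xi - mean_x for xi in case_number]
    y = [1.0 if c else 0.0 for c in correct]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) @ step = gradient for the 2x2 case
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    _, _, h00, h01, h11 = _score_and_information(x, y, b0, b1)
    se_b1 = sqrt(h00 / (h00 * h11 - h01 * h01))  # from the inverse information matrix
    wald = (b1 / se_b1) ** 2
    p_value = erfc(sqrt(wald / 2.0))  # upper tail of chi-square with 1 df
    return b1, se_b1, wald, p_value
```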

In 15 cases, additional findings reported by radiographers changed the case classification from E1 or E2 to E3. None of these cases was a test case. No follow-up was performed as a result of these findings.

The median time needed to report the 200 training cases was 57 days (range 28–115). The eight radiographers combined asked the abdominal radiologist 116 questions about the feedback cases, 112 of which concerned radiographers’ findings that were not in the reference standard.

Discussion

This study shows that a dedicated formal training program can significantly increase radiographers’ sensitivity in triaging E3 cases, but not in triaging E4 cases. At the end of training, the overall triaging sensitivity and specificity were about 70 %, with substantial individual variation. Reader confidence increased significantly and reading time decreased significantly. Specificity decreased after training, although this decrease was not significant for individual observers.

As shown previously, radiographers are capable of reading CT colonography for intracolonic findings [4, 5], but in our study their accuracy after training was not high enough to allow effective triage of extracolonic findings. The learning effect for triaging E3 findings that was still present at the end of training suggests that maximum sensitivity may not yet have been reached with the number of training and feedback cases we offered. Disappointingly, there was no learning effect for E4 cases, and specificity was low. The large number of different lesions, their differential diagnoses and the low conspicuity of some lesions are plausible reasons for the difficulty radiographers had in reporting extracolonic findings. This differs from other learning curve studies, which concerned the evaluation of a single organ with a limited number of differential diagnoses (e.g. intracolonic findings).

A recent study by Liedenbaum et al. showed that, for reporting intracolonic findings, novice readers needed on average 164 CT colonograms with feedback to reach the desired level of competence [5]. In that study, one of three radiographers reached sufficient competence within 164 CT colonography cases. As in our study, there were considerable differences between the individual performances of the radiographers. In our study, some radiographers achieved very good sensitivity in the final test, but their specificity was below average. The increased reader confidence and reduced interpretation time we observed are encouraging, but of lesser importance than accuracy.

Apart from possibly insufficient training, other important factors that may affect radiographers’ ability to read extracolonic findings are aptitude and level of professional education. Based on this study, we cannot determine whether these two factors would preclude radiographer reading of extracolonic findings even with a modified training program.

We invested substantial effort in the training program and ensured that it was not only formal and reproducible but also extensive, covering all frequent and important extracolonic findings. We attempted to tailor the training to the radiographers’ existing knowledge. For an optimal learning effect, radiographers were blinded to the feedback until they had submitted their report of a case. Only highly motivated radiographers participated, because they had to invest their free time. We helped the radiographers classify lesions by providing an extended C-RADS classification. Furthermore, findings reported by radiographers but absent from the reference standard were evaluated by an independent experienced abdominal radiologist.

A number of limitations must be acknowledged. The dataset proved too small, as sensitivity and specificity did not appear to reach a plateau. We estimated the number of cases needed to learn to triage important findings without data to support our assumptions. Our training program may also not have been optimal for this specific group of observers. It was developed on the basis of earlier experience of several groups that were successful in training non-radiologists to read imaging studies [5, 11, 21]. We combined knowledge transfer and hands-on experience in a structured program with direct feedback. Additional findings by the eight radiographers changed the case classification from E1 or E2 to E3 in 15 cases. All of these CT colonograms were feedback training cases, so they did not influence the estimated sensitivity in the test cases. The order of the training cases was not randomised, so as to ensure that lesions were well distributed over the dataset. The prevalence of extracolonic findings in this dataset was higher than in screening, and results may differ in a screening population. Finally, we performed multiple statistical tests, so some of the statistically significant results may have occurred by chance.

Our findings suggest that a radiographer-only strategy is not a viable option for further reducing the costs of screening CT colonography. Based on the triaging results of this study, about 40 % of CT colonograms would still have to be read by a radiologist in a screening setting, and lesions would probably still be missed. Even if radiographers’ performance had been comparable to that of radiologists, the medicolegal implications of a radiographer-only reading strategy might have been a hurdle in many countries.