Introduction

Dynamic contrast-enhanced magnetic resonance imaging (MRI) is increasingly used to evaluate pathological features of the breast. Applications for MRI of the breast include diagnostic and screening indications [1–6]. Image analysis is based on the enhancement pattern of lesions in dynamic breast MRI and on morphological characteristics [7–9]. Using those two criteria for the interpretation of the images, breast MRI has a very high sensitivity, which usually exceeds 90% [10–12], and a negative breast MRI shows a sufficiently high negative predictive value (NPV) (97%) to safely rule out malignancy [13–15]. However, breast MRI has several limitations. The overall reported specificity varies between 67% and 72%, which results in a high number of false-positive results [10, 12, 16]. Furthermore, MRI requires significant time for image acquisition, processing and interpretation [17, 18]. To overcome those limitations, computer-aided detection (CAD) programs for MR imaging of the breast have been developed [18]. In general, CAD software was developed to identify suspect features on the image and bring them to the attention of the radiologist, in order to decrease false-negative readings [19]. However, in breast MRI, most lesions were regarded as having already been detected by the radiologist. Therefore, the primary aim of developing CAD for breast MRI was not to identify lesions, but to assist the radiologist in determining which lesions are benign and which are malignant.

Computer-aided detection systems automate many processing and analysis functions that would normally have to be performed manually by MRI technologists and radiologists. The automated kinetic assessment of CAD generates a colour-coding based on the changes in voxel signal intensity during the enhancement of the breast tissue. This provides an easier way of interpreting the patterns of contrast enhancement (persistent, plateau and washout enhancement) across a series of images, which may help to identify lesions and their likelihood of being malignant.

The implementation of CAD software may improve the accuracy of breast MRI by reducing the number of false-positive diagnoses and may shorten the time needed to interpret breast MRI images [17, 18, 20, 21]. Furthermore, a state-of-the-art CAD system should automatically identify (almost) all non-calcified lesions suspected of malignancy at mammography. This is reflected by a very high sensitivity and NPV for these non-calcified breast lesions.

The purpose of this systematic review and meta-analysis is to assess the accuracy of radiologists in discriminating benign from malignant breast lesions on breast MRI with and without CAD implementation, in terms of sensitivity and specificity.

Materials and methods

Search strategy

A computerised search was performed to identify all relevant studies in Medline and Embase up to 2010. The following search terms were used in Medline: “Diagnosis, Computer-Assisted” [Mesh term] OR “computer-aided-diagnosis” [Text Word] OR “computer-aided-detection” [Text Word] OR “computer-aided” [Text Word] OR “CAD” [Text Word] OR “three-time-point method” [Text Word] AND “Magnetic Resonance Imaging” [Mesh term] OR “Magnetic Resonance Imaging” [Text Word] OR “MRI” [Text Word] OR “MR imaging” [Text Word] AND “Breast Neoplasms” [Mesh term] OR “breast cancer” [Text Word] OR “mamma carcinoma” [Text Word] OR “malignant breast lesions” [Text Word] AND (“Sensitivity and Specificity” [Mesh term] OR “specificity OR sensitivity” [Text Word]). In Embase the same strategy was used. All languages were considered.

Eligibility criteria and study selection

We searched for studies assessing the value of CAD for a radiologist in the discrimination between benign and malignant breast lesions with MRI. Studies were included if the following inclusion criteria were met: (1) all patients had undergone breast MRI; (2) a commercially available CAD system was used; (3) the study population had benign and malignant breast lesions; (4) accuracy, sensitivity, specificity, positive predictive value and/or negative predictive value was/were measured or could be derived, and (5) studies had to be published with original data in peer-reviewed journals. Studies in which an institution-specific CAD system was used were excluded, as well as reviews, editorials and case reports.

Two researchers (MDD, MCJW) independently selected relevant studies based on title and abstract or full article. Any discrepancies concerning the study selection were resolved by discussion of the full article. The complete search yielded 587 studies, of which 519 were excluded based on the title. After removing duplicates (n = 12), 56 studies were screened on title and abstract. Twenty-nine studies did not meet the inclusion criteria (technical article (n = 20), overview (n = 8) and case report (n = 1)). The full article was reviewed for the remaining 27 studies. Seventeen studies were excluded because an institution-specific CAD system was used. Ten studies [20–29] fulfilled our inclusion criteria (Fig. 1). Eight studies were in English [20–24, 26, 27, 29]; the other 2 were in German [25, 28].

Fig. 1

Flow chart of search results, with reasons for exclusion and the total number of studies included
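The selection counts reported above can be checked with simple arithmetic. The following minimal Python sketch subtracts each exclusion step from the initial search yield; the function name `screening_flow` is hypothetical, and the numbers are those stated in the text.

```python
from typing import List, Tuple


def screening_flow(total: int, exclusions: List[Tuple[str, int]]) -> List[int]:
    """Return the number of studies remaining after each exclusion step."""
    remaining = []
    for _label, n_excluded in exclusions:
        total -= n_excluded
        remaining.append(total)
    return remaining


# Counts taken from the study selection paragraph.
steps = [
    ("excluded on title", 519),
    ("duplicates removed", 12),
    ("excluded on title and abstract", 29),
    ("excluded: institution-specific CAD", 17),
]
flow = screening_flow(587, steps)
```

Under these counts, `flow` ends at the 10 studies included in the review, passing through the 56 studies screened on title and abstract and the 27 reviewed in full.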

Data collection and quality assessment

The following study descriptives were extracted: population descriptives (age, number of patients, number of benign and malignant lesions), study design, type of MRI used, type of CAD software used, minimum threshold enhancement used, number of radiologists that assessed the MR images with and without the use of CAD and diagnostic accuracy numbers (true-positives, false-positives, true-negatives and false-negatives).

Study quality was assessed independently by the same two observers using the QUADAS tool [30, 31]; disagreement was resolved by arbitration. This evidence-based tool was developed specifically to assess the quality of diagnostic accuracy studies and includes 14 quality items. Each of the 14 items can be scored as “yes”, “no” or “unclear”. The total score can range from 0 to 14, in which 14 is the maximum attainable score.
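The scoring can be sketched as follows, assuming that each item answered “yes” contributes one point and that “no” and “unclear” contribute none. This is our reading of the 0 to 14 range; the function name `quadas_score` is hypothetical.

```python
from typing import Sequence

ALLOWED_ANSWERS = {"yes", "no", "unclear"}
N_ITEMS = 14  # number of QUADAS quality items


def quadas_score(answers: Sequence[str]) -> int:
    """Count the items scored "yes" (assumed: one point per "yes")."""
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")
    for answer in answers:
        if answer not in ALLOWED_ANSWERS:
            raise ValueError(f"invalid answer: {answer!r}")
    return sum(1 for answer in answers if answer == "yes")
```

A study with 12 items scored “yes”, one “no” and one “unclear” would receive a total score of 12 under this assumption.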

Statistical analysis

The performance of the radiologist in distinguishing breast lesions on MRI with and without the implementation of CAD was assessed. Besides the use of CAD, comparisons were made between radiologists with experience in imaging assessment and residents or radiologists with no or minor experience. The primary outcomes were sensitivity and specificity at tumour level. Pooling of data was performed within the bivariate mixed-effects binary regression modelling framework. Model specification, estimation and prediction were carried out with xtmelogit in STATA. Using the model, summary sensitivity and specificity were calculated, and a summary ROC curve was drawn (with AUC and confidence intervals). A forest plot was generated containing the individual study sensitivities and specificities with 95% confidence intervals (CI) and the pooled sensitivity and specificity estimates.
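As an illustration of how the extracted counts translate into the accuracy measures analysed here, the following minimal Python sketch computes per-study sensitivity and specificity at tumour level from a 2 × 2 table. The class and function names are hypothetical. It does not reproduce the pooled estimates in this review, which come from the bivariate model in STATA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LesionCounts:
    """2 x 2 table at tumour level, with histology as the reference."""
    tp: int  # malignant lesions read as malignant
    fp: int  # benign lesions read as malignant
    tn: int  # benign lesions read as benign
    fn: int  # malignant lesions read as benign


def sensitivity(c: LesionCounts) -> float:
    """Proportion of malignant lesions correctly identified."""
    return c.tp / (c.tp + c.fn)


def specificity(c: LesionCounts) -> float:
    """Proportion of benign lesions correctly identified."""
    return c.tn / (c.tn + c.fp)
```

For example, with invented counts, `LesionCounts(tp=80, fp=10, tn=90, fn=20)` gives a sensitivity of 0.8 and a specificity of 0.9.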

A test for heterogeneity was applied, using the I² statistic [32]. This statistic expresses the percentage of total variation across studies that can be attributed to inter-study heterogeneity, ranging from 0% (no heterogeneity) to 100% (all variance due to heterogeneity). The presence of publication bias was visually assessed by producing a funnel plot. In STATA, linear regression of log odds ratios on the inverse root of effective sample sizes was performed as a test for funnel plot asymmetry. The log odds ratios are defined as the log-transformed diagnostic odds ratios, which are needed for the performance of linear regression. Publication bias was considered present if there was a significant non-zero slope coefficient (p < 0.10), suggesting that only the small studies reporting a high sensitivity with CAD had been published, whereas the small studies reporting a lower sensitivity had not been published. Data were analysed in SPSS 16.0 (SPSS, Chicago, IL, USA), Meta Disc [33] and STATA SE version 11.0 (STATA, College Station, TX, USA).
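The quantities involved can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in the text: I² is derived from Cochran's Q with inverse-variance weights as max(0, (Q − df)/Q) × 100%; the effective sample size is 4·n₁·n₂/(n₁ + n₂) for n₁ malignant and n₂ benign lesions; and the regression is weighted by effective sample size. All function names are hypothetical, no p-value is computed, and the calculations in this review were run in Meta Disc and STATA, not with this code.

```python
import math
from typing import Sequence


def cochran_q(effects: Sequence[float], variances: Sequence[float]) -> float:
    """Cochran's Q with inverse-variance weights (assumed weighting)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def i_squared(q: float, n_studies: int) -> float:
    """I² in percent: share of total variation beyond chance."""
    df = n_studies - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def effective_sample_size(n_malignant: int, n_benign: int) -> float:
    """Effective sample size of a study (assumed definition)."""
    return 4.0 * n_malignant * n_benign / (n_malignant + n_benign)


def log_dor(tp: int, fp: int, fn: int, tn: int) -> float:
    """Log diagnostic odds ratio; zero cells are not handled here."""
    return math.log((tp * tn) / (fp * fn))


def weighted_slope(x: Sequence[float], y: Sequence[float],
                   w: Sequence[float]) -> float:
    """Slope of a weighted least-squares line of y on x."""
    total_w = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    num = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    return num / den
```

In this setting, `x` would hold 1/√ESS for each study, `y` the log diagnostic odds ratios and `w` the effective sample sizes. A slope that differs significantly from zero would indicate funnel plot asymmetry.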

Results

Study descriptives

The 10 studies included a total of 895 patients (range 29–329) with a total of 1264 breast lesions (range 33–469), of which 606 were classified as malignant (range 9–279) and 658 as benign (range 22–190) [20–29].

In 5 studies [23, 24, 26–28] a selection was made of patients with suspect findings based on mammography and ultrasound examinations. In the other 5 studies [20–22, 25, 29] patients with a suspect lesion on MRI were included. One of these 5 studies retrospectively searched the database of an ongoing MRI screening study of patients at high risk of breast cancer for BIRADS 3–5 lesions that were detected with MRI [22], and 2 studies included lesions that were not palpable and were not visible on mammography or ultrasound [20, 21]. In all 10 studies histology was used as the gold standard. In 4 studies a follow-up MRI after 6 or 24 months was performed [23, 25, 28, 29]; in the case of positive findings biopsy provided further histological assessment.

Mean study quality was 12.6, ranging from 10 to 14. Four studies were of maximum quality (Table 1) [20, 21, 26, 27].

Table 1 Study characteristics of the 10 included studies (SD standard deviation, NR not reported, P prospective, R retrospective, c consecutive, TB tumour-based)

CAD systems

In all 10 studies the CAD systems (CADstream, DynaCAD, Fulltime point, 3-Time-Point Method and CAD-Gaea) incorporated precontrast medium (unenhanced) images and 2 (immediate and delayed) or all postcontrast medium (enhanced) images. The CAD systems compared pixel intensity values on the precontrast medium and immediate postcontrast medium series. If a pixel value increased above a user-specified minimum enhancement threshold, such as a 50 or 100% increase in enhancement, the pixel was regarded as meeting threshold enhancement. Once a pixel was identified as enhancing above the established threshold, the CAD systems compared pixel signal intensity values on the immediate and delayed postcontrast medium series to indicate washout enhancement, plateau enhancement or persistent enhancement. A specific colour or colour intensity was assigned to each pixel for different types of tissue enhancement. The end result of all CAD systems was a colour overlay on each MRI slice indicating regions of significant enhancement and providing details about enhancement type and extent.
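The two-step logic described above can be sketched in Python as follows. This is a simplified illustration and not the algorithm of any of the commercial systems. The function name, the ±10% band that separates plateau from washout and persistent enhancement, and the choice to express the delayed change relative to the immediate postcontrast signal are all assumptions; the studies do not report these details.

```python
from typing import Optional


def classify_pixel(pre: float, immediate: float, delayed: float,
                   min_threshold_pct: float = 50.0,
                   plateau_band_pct: float = 10.0) -> Optional[str]:
    """Classify one pixel from three signal intensities.

    Returns None if the pixel does not meet threshold enhancement,
    otherwise "washout", "plateau" or "persistent".
    """
    if pre <= 0 or immediate <= 0:
        return None  # no usable signal to compute a relative change

    # Step 1: initial enhancement relative to the precontrast image.
    initial_rise_pct = (immediate - pre) / pre * 100.0
    if initial_rise_pct < min_threshold_pct:
        return None  # below the user-specified minimum threshold

    # Step 2: delayed change relative to the immediate postcontrast image.
    # The width of the plateau band is an assumed, illustrative value.
    delayed_change_pct = (delayed - immediate) / immediate * 100.0
    if delayed_change_pct > plateau_band_pct:
        return "persistent"
    if delayed_change_pct < -plateau_band_pct:
        return "washout"
    return "plateau"
```

With these assumed settings, a pixel with intensities 100, 200 and 150 would be labelled as washout, whereas a pixel with intensities 100, 140 and 150 would not meet a 50% threshold and would receive no colour.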

CAD threshold enhancement

Six studies [20–24, 27] analysed the presence or absence of “threshold enhancement” at different minimum thresholds. Lehman et al. [20], Williams et al. [21] and Meeuwis et al. [27] used the CAD system CADstream. The sensitivity at the minimum thresholds of 50%, 80% and 100% remained the same. The specificity increased at higher minimum thresholds. The study by Meeuwis et al. [27] showed a higher specificity than the other two studies. In the studies by Baltzer et al. [23, 24] DynaCAD was used. In these studies, the sensitivity decreased and the specificity increased at higher minimum thresholds. CAD-Gaea [22] had the same results as DynaCAD with respect to sensitivity and specificity, although CAD-Gaea had a lower level of specificity. Meeuwis et al. [27], using CADstream, reported the highest sensitivity and specificity (Table 2).

Table 2 The sensitivity and specificity of a CAD system using the presence or absence of lesion enhancement at the user-specified minimum thresholds

Radiologist with or without CAD

In 8 out of 10 studies [22–29] the sensitivity and specificity of the radiologist or resident in assessing MR images with the use of CAD were measured (Table 3). The enhancement thresholds used were set individually according to the radiologist’s preference. Furthermore, in 4 [24, 27–29] out of these 8 studies the sensitivity and specificity of the radiologist or resident were also calculated without the use of CAD. In these four studies the radiologists or residents assessed the MR images by visual evaluation of contrast enhancement or by making a manual curve analysis using the region of interest (ROI) method [24, 27–29]. The pooled sensitivity and specificity of radiologists and residents assessing breast lesions on MRI without the implementation of CAD were 82% (95% CI: 72–90%) and 81% (95% CI: 74–87%), respectively. With CAD implementation they attained higher sensitivity scores (sensitivity: 89%, 95% CI: 83–93%; specificity: 81%, 95% CI: 76–85%) (Fig. 2). The sROC curve showed an AUC of 0.89 (Fig. 3). In 3 studies differentiation was made between radiologists with experience and residents with no or minimal experience [27–29]. The experience of those radiologists varied from 5 to 15 years (or > 500 MRIs). Residents had no more than 6 months (or < 50 MRIs) breast MRI experience. After stratification, the experienced radiologists showed a comparable pooled sensitivity of 89% with (95% CI: 81–94%) and without (sensitivity: 89%; 95% CI: 78–94%) CAD implementation. The pooled specificity of 86% (95% CI: 79–91%) decreased to 82% (95% CI: 76–87%) with CAD. Residents or radiologists with less experience showed a pooled sensitivity of 72% (95% CI: 62–81%) and a pooled specificity of 79% (95% CI: 69–86%) when assessing breast lesions on MRI without CAD. With the use of CAD, their sensitivity increased to 89% (95% CI: 80–94%), whereas their specificity remained comparable (specificity: 78%; 95% CI: 69–84%) (Table 4).

Table 3 The performance of radiologists and residents in breast MRI diagnosis in terms of sensitivity and specificity with and without the use of a CAD system, specified for type of CAD and MRI system, number of lesions, and experience (RAD radiologist, RES resident)
Fig. 2

Funnel plot with log odds ratios on the inverse root of effective sample sizes for visualisation of publication bias

Fig. 3

Summary ROC curve regarding the studies of radiologists and residents using a CAD system

Table 4 Results of pooled sensitivity and specificity (95% CI) of the radiologist in assessing breast lesions on MRI with and without the use of a CAD system in general, stratified for experienced radiologists and residents with little or no experience (random-effects model)

Analyses of heterogeneity

Moderate to substantial heterogeneity was observed among the eight studies exploring the sensitivity and specificity of radiologists assessing MR images with and without the implementation of CAD (sensitivity: no CAD: I²: 78%, p < 0.0001, with CAD: I²: 80%, p < 0.0001; specificity: no CAD: I²: 46%, p = 0.007, with CAD: I²: 55%, p = 0.002). After stratification of radiologists with experience and residents with no or minimal experience, heterogeneity did not change for experienced radiologists (sensitivity: no CAD: I²: 79%, p < 0.0001, with CAD: I²: 83%, p < 0.0001), and residents (sensitivity: no CAD: I²: 79%, p = 0.009; with CAD: I²: 64%, p = 0.02). Concerning specificity, heterogeneity dropped to low to moderate (radiologists without CAD: specificity: I²: 56%, p = 0.04, radiologists with CAD: specificity: I²: 62%, p = 0.01; residents without CAD: specificity: I²: 33%, p = 0.22; residents with CAD: specificity: I²: 24%, p = 0.26). Because of heterogeneity and possible unmeasured variance at the study level, a random-effects model was used to obtain all pooled estimates, as this model interprets the available data with more caution and uses broad confidence intervals.

Assessment of publication bias

The slope coefficient did not differ significantly from zero (p-value = 0.16), indicating that there was no evidence of publication bias (Fig. 4). This suggests that we most likely did not miss studies with a negative outcome.

Fig. 4

Forest plot of pooled sensitivity and specificity of radiologists and residents assessing breast lesions on MRI with the use of a CAD system

Discussion

This meta-analysis shows that the pooled sensitivity and specificity of the experienced radiologist for the assessment of breast lesions with MRI remain comparable with the implementation of CAD. Residents or radiologists with less experience seem to attain a higher sensitivity with CAD implementation, although the difference was not significant.

All selected studies were of high quality, so it is likely that the quality of the studies did not have a significant impact on the results of the meta-analysis. Furthermore, there was no evidence of publication bias and therefore it is not expected that the meta-analysis overestimates the effect of CAD evaluation.

The ten studies used different indications for breast MRI and there was a wide variation in the number and tissue type of lesions selected [20–29]. This resulted in greater heterogeneity. Therefore, we used a random-effects model that interprets the results with more caution. Furthermore, there was an indication of selection bias. In all studies the radiologists only assessed MRIs with lesions (≥ BIRADS 2) and discriminated between benign and malignant. This selection increased the prevalence of breast malignancy in the study population compared with the target population. The lesion selection could have influenced the performance of the radiologist.

Six studies analysed the influence of the presence or absence of “threshold enhancement” at different minimum thresholds. Of those six, the study by Meeuwis et al. [27] resulted in the highest sensitivity and specificity. This result could be explained by the fact that a 3.0T MRI system was applied, which has a better performance than the 1.5T MRI scanner which was used in the other studies.

With CADstream the sensitivity at a higher enhancement threshold remained the same, i.e. the same malignant lesions enhanced at the 50%, 80% and 100% thresholds [20]. The remaining false-negative enhancing malignant lesions showed no enhancement with CADstream due to a noise filtering process leading to failure of automatic analysis of small areas of enhancement [21, 27]. The specificity of CADstream increased at higher enhancement thresholds, which means that at a higher threshold benign lesions did not enhance [20]. Therefore, absence of lesion enhancement at higher thresholds helps to improve the discrimination between benign and malignant lesions. In comparison to the study by Meeuwis et al. [27], the low specificity of the study by Williams et al. [21] is most likely due to the high prevalence (n = 22/71 versus n = 113/154) and the large tissue type variation of benign lesions.

With the DynaCAD software the specificity performance was analogous to that of the CADstream software. The sensitivity of DynaCAD, however, decreased at higher enhancement thresholds, because not all malignant enhancements were visualised, resulting in false-negative lesions [23].

Residents or radiologists with little or no experience achieved a higher sensitivity when they were supported by a CAD system for discrimination between breast lesions on MRI. The change in sensitivity after using CAD was not significant. Nevertheless, a considerable increase could be seen (sensitivity from 72%; 95% CI: 62–81% to 89%; 95% CI: 80–94%). This increase could be a result of the fact that CAD brings more enhancing lesions to the attention of the resident or inexperienced radiologist. Therefore, it seems that they benefit from CAD when assessing breast lesions with MRI. However, more research must be conducted to verify these results.

The performance of the experienced radiologists showed a non-significant decrease in specificity from 86% (95% CI: 79–91%) without CAD to 82% (95% CI: 76–87%) with CAD. An explanation for this observation could be that CAD systems are based only on the enhancement dynamics, without regard to the morphology of the lesion. As a consequence, the use of CAD could lead to a higher number of enhancing lesions, part of which could be classified as benign on the basis of morphology. Experienced radiologists can be misled by the enhancement pattern of CAD, resulting in a decrease in specificity. Therefore, it is important that experienced radiologists are aware of this.

In conclusion, CAD has little influence on the sensitivity and specificity of radiologists experienced in breast MRI diagnosis when assessing MR images. Therefore, breast MRI interpretation by radiologists remains essential. Residents or radiologists with less experience seem to benefit from a CAD system when performing breast MRI evaluation.