Introduction

Women with dense breasts (breast density C and D) constitute about 40% of the female screening population and pose an organizational challenge to any supplemental imaging program. Using supplemental imaging to screen for breast cancer is an active research topic and contrast-enhanced breast MRI is regarded as the most promising modality to compensate for the limited ability of mammography to detect breast cancer in women with extremely dense breast tissue (density D) [1].

Consequently, based on the results of the DENSE trial and the higher breast cancer risk in dense breasts, the EUSOBI has published recommendations in favor of breast MRI screening in women with extremely dense breasts [2, 3]. Contrast-enhanced breast MRI, however, is expensive and its application for screening is limited by the need to inject intravenous (IV) contrast and concerns regarding the perceived risk of adverse effects of gadolinium-based contrast media.

In addition, though not yet proven in a prospective trial, it is conceivable that MRI would improve screening also in women with moderately dense breasts (breast density C). Contrast-enhanced mammography as a competitor for breast MRI also requires IV contrast and increases exposure to ionizing radiation. In addition, the risk profile of iodine-based contrast media is less favorable than that of gadolinium-based contrast media. Contrast agent is not needed for diffusion-weighted imaging (DWI), an alternative MRI technique that visualizes and quantifies the Brownian molecular motion in the extracellular space in vivo. Breast cancer typically shows areas of hindered diffusion that appear hyperintense on diffusion-weighted images and can thus be used for cancer detection. The apparent diffusion coefficient is a DWI metric that can quantitatively distinguish benign from malignant findings. Initial research on DWI, which is acquired faster than a contrast-enhanced breast MRI protocol, has demonstrated favorable diagnostic results very similar to contrast-enhanced breast MRI [4, 5]. However, no prospective study has yet demonstrated the ability of DWI to screen for breast cancer in women with dense breasts. One reason is the lack of technical robustness of the DWI technique, mainly presenting as artifacts due to insufficient fat suppression and image noise [6]. Given basic MRI physics, it seems evident that these issues should be more aggravated in women with fatty breasts, and less severe in MRI screening aiming at women with dense breasts. For the sake of readability, we refer to the amount of fibroglandular tissue as “breast density,” though this is not entirely the same because MRI does not measure density.

To test the hypothesis that image quality of DWI is higher in women with a high amount of fibroglandular breast tissue, we designed a retrospective bicentric study with multiple readers and compared the image quality of DWI and contrast-enhanced breast MRI stratified by the amount of fibroglandular tissue as a measure of breast density.

Methods

Study design

This bicentric visual grading analysis study was performed as a retrospective, cross-sectional, multi-reader diagnostic image quality assessment study at site 1 (University hospital Groningen, the Netherlands) and site 2 (Allgemeines Krankenhaus, Medical University of Vienna, Austria). Both local medical ethical committees waived the need for informed consent due to the retrospective nature of the study. At both sites, a consecutive sample of 200 breast MRIs of female subjects without a history of breast surgery or breast prosthesis was retrospectively included, adding up to 400 included scans. Mean age of included women was 46 years (±13, range 19–83), 42 years at site 1 and 50 years at site 2. Scans of each site were read by two local readers (5th year residents specializing in breast radiology). All readers individually scored 125 scans. Reader 1 read scans 1–125 (site 1), reader 2 read scans 76–200 (site 1), reader 3 read scans 201–325 (site 2), and reader 4 read scans 276–400 (site 2). Thereby, on both sites, 50 cases were double read to check for inter-observer agreement. All readers were blinded to the study aim, hypothesis, original radiology report, and pathology. Indications for scanning were mostly screening in high-risk women. Scans were performed in 2016–2019 for site 1 and 2019–2020 for site 2.

Breast MRI protocol

Both sites used 2 scanner types, randomly assigned by scanner availability at the hospital. Both 1.5-T (Both sites: Avanto_fit) and 3.0-T (Site 1: Skyra, Site 2: Prisma) MRI scanners (Siemens Healthineers) were used. Reading protocol consisted of T2 with fat suppression, DCE-T1 early post-contrast images (1st series after contrast administration at around 70 s), and diffusion-weighted images.

Site 1:

  1. o

    DCE-T1-SPAIR: slice thickness 1.2 mm, FOV 350 mm, TR 5.27 ms (1.5 T)/4.50 ms (3.0 T), TE 2.39 ms (1.5 T)/1.63 ms (3.0 T). DCE-T1-DIXON: slice thickness 1.2 mm, FOV 380 mm, TR 5.41 ms, TE 2.46 and 3.69 ms.

  2. o

    DWI-SS-EPI-SPAIR: slice thickness 4 mm, FOV 350 mm, TR 5500 ms (1.5 T)/6900 ms (3.0 T), TE 64 ms (1.5 T)/65 ms (3.0 T). b-values 0, 50, 200, 500, 800, 1000 s/mm2. For the ADC map, b-values 0 and 1000 s/mm2 were used.

Site 2:

  1. o

    DCE-T1 DIXON: slice thickness 2mm, FOV 360 mm (1.5 T)/340 mm (3.0 T), TR 10 (1.5 T)/3.91 ms (3.0 T), TE 2.39 and 4.77 ms (1.5 T)/1.29 and 2.52 ms (3.0 T).

  2. o

    DWI-SPAIR: slice thickness 3.5mm (1.5 T)/3 mm (3.0 T), FOV 408 mm (1.5 T)/360 mm (3.0 T), TR 5900 (1.5 T)/4000 (3.0 T), TE 105 (1.5 T)/91 ms (3.0 T). b-values 0, 20, 30, 50, 100, 200, 400, 800, and 1500 s/mm2 (3.0 T) and 0 and 800 s/mm2 (1.5 T). For the ADC map, b-values 0 and 800 s/mm2 were used.

At site 1, DCE-T1 SPAIR series were randomly replaced by DCE-T1 DIXON series at 2019 in clinic. Therefore, of the included scans, 24 DCE-T1 scans were performed with DIXON fat suppression (six cases were ACR breast density A, 1 case density B, 14 cases density C, and 3 cases density D).

Visual grading image analysis

Visual grading analysis was performed using ordinal scales [7]. Readers started with scoring the ACR BI-RADS breast density as follows: A = almost entirely fatty, B = scattered areas of fibroglandular tissue, C = heterogeneously dense, D = extremely dense. For the sake of readability, we refer to the amount of fibroglandular tissue as “breast density” throughout the paper though this is not entirely the same and MRI does not measure density. In order to account for confounders, age of the women, background parenchymal enhancement (BPE), and the presence or absence of enhancing lesions (including small spots/foci besides BPE) or cysts were assessed. Enhancing lesions on the DCE-T1 and cysts on the T2 images were scored as one out of 4 options: none/only a few/moderate amount/a lot. BPE was scored based on the ACR BI-RADS lexicon as follows: minimal/mild/moderate/marked. Overall image quality and fat suppression quality of both DCE-T1 and ADC/DWI images were scored on a standard 5-point Likert scale of 1 to 5, in which 1 = a minimal/non-diagnostic quality and 5 = maximal quality. General image quality was based on visual assessment of signal, noise, contrast, sharpness, artifacts (ghosting), and the presence of anatomical distortions.

Statistical analysis

To test for inter-observer agreement, Kappa statistics was performed to check for agreement between readers 1 and 2 and between readers 3 and 4, who double read 50 cases of their research site. Kappa 0 = agreement fully based on chance; kappa 1 = complete inter-observer agreement. Thereafter, data of all 4 readers were combined. Cross tabulations were generated to visualize the differences in image and fat suppression quality of the different categories of fibroglandular tissue composition. Visual grading characteristic (VGC) curves and their area under the VGC curve (AUC) were calculated for fatty breasts (A/B) versus dense breasts (C/D) based on the theory of Bath et al [7]. Area under the VGC curve was also calculated for 1.5 vs. 3.0 T. IBM SPSS Statistics 24 was used for statistical analysis.

Results

Population characteristics

Of the 400 scans, 71 (18%) were scored as breast density A, 125 (31%) as B, 112 (28%) as C, and 92 (23%) as D. At both sites, 100 scans were performed at 1.5 T and 100 scans at 3.0 T. Table 1 shows the homogeneous breast density distribution of the two sites, separately visualized for 1.5- and 3.0-Tesla scanners.

Table 1 Distribution of the amount of fibroglandular tissue on 1.5 and 3.0 T for sites 1 and 2 separately. T Tesla

Inter-observer agreement

There was a good inter-observer agreement of breast density for readers 1 and 2 (site 1) (kappa = 0.595) and for readers 3 and 4 (site 2) (kappa = 0.734). Kappa values were 0.427 and 0.199 for image quality of T1 early post-contrast and 0.423 and 0.480 for quality of fat suppression of DCE-T1, respectively. Kappa values for image quality of DWI were −0.012 (± 0.109) and 0.498 (±0.090) and for fat suppression quality of DWI −0.033 (± 0.059) and 0.232 (±0.085). The negative values can be explained by differences in preferences of observers for using the extreme values on the Likert scale (1 or 5).

Breast density and image quality

In DWI, image quality was higher with ascending breast density, with good scores (score 3–5) for 85.9% of density D and 88.4% of density C cases, compared to 61.6% for density B and 53.5% for density A cases. High scores (score 4) for image quality of DWI were more common in breast density C and D compared to A and B (Table 2). In the comparison between dense breasts (C/D) versus fatty breasts (A/B), the VGC curve for overall image quality of DWI was in favor of dense breasts (C/D), with an area under the VGC curve of 0.659 (p < 0.001).

Table 2 Overall image quality for each amount of fibroglandular tissue category in absolute values (and percentages). DWI diffusion-weighted imaging, DCE dynamic contrast enhanced

Area under the VGC curve for overall image quality T1 early post-contrast was 0.574 (p = 0.011).

Overall image quality of DCE-T1 was good, with comparable percentages of maximal scores (score 5) for breast density C and D (33% and 40.3%, respectively) compared to breast density A and B (22.5 and 24%, respectively) (Table 2). Image quality score 4 for T1-DCE was also comparable for breast density A, B, C, and D, with values of 63.4%; 63.2%; 58.0%; and 51.1% respectively.

Breast density and fat suppression quality

In DWI, quality of fat suppression increased with higher breast density (Fig. 1) with good scores (score 3–5) for 86.9% of density D and 90.2% of density C cases compared to 76.0% for density B and 54.7% for density A cases. A high score of 4 was given to 53.3% of density D, 46.4% of density C, 38.4% of density B, and only 31% of density A breasts. The VGC curve visualizes this (relatively small difference), with an AUC of 0.615 (p < 0.001) for DWI images, favoring dense breasts (category C/D) (Fig. 2).

Fig. 1
figure 1

Stacked bar chart of the quality of fat suppression of diffusion-weighted imaging for each amount of fibroglandular tissue category

Fig. 2
figure 2

Visual grading characteristics (VGC) curve comparing the fat suppression quality of DWI images and T1 early post-contrast images between breast with a low and a high amount of fibroglandular tissue

Quality of fat suppression of T1-DCE was also higher for breast density C/D compared to that for A/B, with, for example, higher percentages for maximal quality (score 5) of 45.7% for density D, 42.9% for density C, 33.6% for density B, and 29.6% for density A. Figure 3 shows this result in a stacked bar chart. The VGC curve showed an AUC of 0.569 (p = 0.016) or fat suppression quality of T1-DCE, favoring dense breasts (Fig. 2).

Fig. 3
figure 3

Stacked bar chart of the quality of fat suppression of DCE-T1 for each amount of fibroglandular tissue category

Comparison of fat suppression types in DCE-T1

In total, 224 DCE-T1 scans were performed with DIXON fat suppression (all scans of site 2 and 24 of site 1) and 176 with SPAIR (site 1). Fat suppression quality was higher for DIXON, with 52.7% of total scans scoring maximal, versus 1.1% for SPAIR. Higher fat suppression quality for DIXON compared to SPAIR was seen in all breast density categories (Fig. 4).

Fig. 4
figure 4

Stacked bar chart showing the quality of fat suppression of DCE-T1 for DIXON and SPAIR fat suppression, displayed per amount of fibroglandular tissue category

Comparison of field strengths

In the comparison of image quality between 1.5 T and 3.0 T, there was a slight bend of the VCG curve towards 3.0 T, with an AUC of 0.561 (p = 0.029) for DCE-T1 and 0.570 (p = 0.015) for DWI. There was no difference for fat suppression quality of DWI (AUC 0.553, p = 0.066) between 1.5 and 3.0 T. Fat suppression quality of DCE-T1 was in favor of 3.0 T (AUC 0.617, p < 0.001).

Discussion

Our study demonstrates that image quality of DWI depends on breast density, specifically on the amount of fibroglandular tissue (FGT). Dense breasts (ACR C/D) had higher scores on both overall image quality and fat suppression quality of DWI, as pointed out in the VGC curves bending towards dense breasts (Fig. 2). These results are important as they explain the heterogeneous results of DWI in terms of diagnostic image quality in literature, which led to the assumption that DWI in general is not robust. In fact, the majority of DWI scans performed in two centers and with four MRI scanners (both 1.5 T and 3.0 T) were of diagnostic quality. This emphasizes the fact that DWI indeed may be tested for screening purposes already. Dense breasts also showed high overall DWI image quality scores compared to fatty breast. On a side note, contrast-enhanced T1-weigthed images did not show a relevant dependence on the amount of FGT.

In regard to the recent developments of tailored breast cancer screening and the advice to apply supplemental screening with MRI in extremely dense breasts, it is important to find a fast and convenient scan protocol [1, 8, 9]. It has been shown that MRI as solitary screening technique in extremely dense breast is cost-effective, especially when applying a 4-year interval. However, some authors argue in favor of shorter, still cost-effective, screening intervals [10,11,12]. Nonetheless, the need for contrast injection constitutes an invasive screening approach. Therefore, a DWI screening protocol that avoids IV contrast would be more feasible and less invasive. The high image and fat suppression quality of DWI in women with dense breasts (which would be the population to screen) is encouraging.

Until now, only the influence of breast density on conventional imaging such as mammography (MG) is studied and awareness of the negative impact of higher breast density on the detection rate is growing [13]. However, in breast MRI, only the independence of contrast-enhanced imaging diagnostic outcomes from breast density has been demonstrated [14]. There is no comparable literature on the influence of breast density on DWI quality or diagnostic performance. Regarding DWI image quality, only data addressing the influence of different b-values, synthetic b-values, or different DWI and fat suppression techniques have been studied previously [15,16,17,18,19].

General image quality, as rated by the 4 readers, is based on signal, noise, contrast, sharpness, artifacts (ghosting), and the presence of anatomical distortions. While in T1-DCE fat suppression may help image interpretation, it is mandatory for valid image interpretation in DWI. This extra step in image acquisition gives rise to more artifacts, distortions, or areas of incomplete fat suppression. These issues depend on the amount of fatty tissue to be suppressed and thus explain our results showing higher image and fat suppression quality in dense breasts.

In T1-DCE imaging, a subgroup analysis compared fat suppression quality between SPAIR and DIXON, showing higher fat suppression quality for DIXON with 52.7% maximal scores (score 5) compared to 1.1% for SPAIR. We therefore recommend DIXON fat suppression for T1-DCE in clinical breast MRI, leading to better image quality independent of B0 inhomogeneity without the need for extra scanning time. This is in concordance with the results of Kalovidouri et al, who studied both the imaging quality and signal to noise ratio and confirmed the superiority of DIXON (at 3.0 T) compared to SPAIR in breast imaging [20]. Clauser et al also concluded that the DIXON technique outperformed a spectral water suppression technique in all evaluated visual grading criteria in breast T1-DCE at 3.0 T [18], in analogy to musculoskeletal tumor imaging [21].

Furthermore, this study showed no influence of the field strength on the image or fat suppression quality. In literature, it is suggested that fat suppression is more difficult at 3.0 T compared to that at 1.5 T [22]. Breast size, which may influence fat suppression, was not available and therefore not taken into consideration.

One of the qualities of the study is the high sample size and the multicentric multi-reader approach. A limitation of a visual grading analysis study is the impossibility to know the exact image characteristics or image disturbances on which the grading score was based for each reader. However, the good inter-observer agreement of DWI imaging showed the consistency of the scoring and its validity. Another limitation is the non-diagnostic approach, with image quality as a surrogate. We did for instance not investigate the dependence of fibroglandular tissue on lesion conspicuity as the visibility of breast lesions using unenhanced DWI-based imaging has been since long established and as there are no physical reasons why FGT should or even could have such an influence. However, these results are sufficient to refute the notion of “DWI is not robust,” by revealing a major influencing factor on DWI quality.

With the help of automated breast density scoring and possible other deep learning applications for excluding normal scans, DWI could be further tested as an additional screening method in dense breasts [23, 24]. Future research is needed to optimize fat suppression protocols for non-dense breasts and develop deep learning algorithms for breast cancer screening with DWI.

Conclusion

Dense breasts show excellent fat suppression and substantially higher image quality in DWI images compared with non-dense breasts. These results support the setup of studies exploring DWI-based MR imaging without IV contrast for additional screening of women with dense breasts.