Introduction

The amount of fibroglandular breast tissue (FGT) is a recognized independent marker for breast cancer risk [1–5]. The American College of Radiology (ACR) [1] advises commenting on FGT, which is assessed by subjective visual estimation, when reporting mammography. However, it has been demonstrated that this assessment is prone to great intra- and inter-observer variability [6, 7]. The revised fifth edition of the ACR BI-RADS atlas has incorporated the recommendation to include such a subjective visual estimation of FGT with magnetic resonance imaging (MRI) [1]. This MRI feature has shown a high correlation with mammographic breast density assessment, and can be used to distinguish more clearly between closely related density categories [8–10]. Given the experience with mammography, it can be assumed that the assessment of FGT with MRI by subjective visual estimation would also be prone to great intra- and inter-observer variability, which might limit its usefulness. However, no such data currently exist. The committee on BI-RADS recognizes that subjective estimates of FGT are imprecise [11, 12]. The investigation of three-dimensional, cross-sectional breast imaging modalities, such as MRI [11], in conjunction with observer-independent automated quantitative measurement systems [8, 13–17] for more reliable measures of the true proportion of FGT, and thus, breast cancer risk, is encouraged. The first automated observer-independent quantitative measurement approaches with MRI have been explored, with promising results [8, 13–16, 18]. Despite being aware of these limitations, for practical reasons, the committee on BI-RADS still recommends subjective visual estimation of FGT with MRI [19].

This study aimed to evaluate the inter- and intra-observer agreement of BI-RADS-based subjective visual estimation of FGT, and to investigate whether FGT assessment could benefit from an automated, observer-independent quantitative MRI measurement by comparing both approaches.

Materials and methods

Study design

Between February 2011 and June 2013, 80 women who were referred to our institution’s Breast Health Care Centre for screening or diagnostic workup of abnormal imaging findings, and who ultimately had normal or benign imaging findings with mammography and ultrasound [BI-RADS 1 (n = 71) and 2 (n = 9)], were recruited for this institutional review board (IRB)-approved prospective study. The use of oral contraceptives, hormone replacement therapy, and other types of anti-hormonal treatments, as well as known contraindications to MRI, were defined as exclusion criteria. All women gave written, informed consent and underwent MRI for FGT assessment. In premenopausal women, all imaging studies were obtained between the 7th and 14th day of the menstrual cycle.

Imaging technique

MRI

All breast MRI examinations were performed in the prone position on a Siemens TimTrio MRI scanner at 3.0 T (Siemens, Erlangen, Germany) with a dedicated four-channel breast coil (In Vivo, Orlando, FL, USA). MR data for FGT quantification were acquired with the following sequence using the Dixon technique: TR/TE1/TE2 6 ms/1.45 ms/2.67 ms; 256 slices; matrix 352 × 352; 1 mm isotropic; flip angle 6°; base resolution 352; phase resolution 100 %; bandwidth 440 Hz/Px; one average; acquisition time 3 min 38 s [17]. No contrast agent was applied.

Image analysis

Subjective visual estimation of FGT with MRI

Four breast radiologists, two inexperienced in MRI FGT assessment (reader one – P.K., and reader two – R.W.) and two experienced (reader three – P.C., and reader four – G.W.), independently performed a subjective visual estimation of FGT with MRI.

To familiarize readers with the MRI assessment of FGT by subjective visual estimation, ten training cases for each of the four density categories, which were not included in the study population, were presented to all readers (Fig. 1). These cases were selected by an experienced reader from previous studies in healthy volunteers, where both MRI and mammography data were available. FGT with MRI was then assessed for each study using both of the acquired Dixon sequences (water-only and fat-only high-contrast images) and classified as: ACR a – almost entirely fatty; ACR b – scattered fibroglandular tissue; ACR c – heterogeneous FGT; or ACR d – extreme FGT.

Fig. 1 Image examples for each of the four MRI FGT categories of the ACR BI-RADS atlas: ACR a – almost entirely fat, ACR b – scattered fibroglandular tissue, ACR c – heterogeneous fibroglandular tissue, and ACR d – extreme fibroglandular tissue, with the corresponding Dixon sequence-based images (upper row: fat only, lower row: glandular tissue only)

All readings were performed on a five-megapixel PACS workstation (IMPAX EE, Agfa HealthCare GmbH, Bonn, Germany). For each reader, all MRI studies were arranged in random order for FGT assessment. After an interval of 2 months, all four readers reassessed all MRI studies. All examinations were again arranged in random order and the previously assigned FGT readings were withheld to avoid any bias.

Automated observer-independent quantitative measurement of FGT with MRI

Automated observer-independent quantitative measurements of FGT were obtained using a previously described MRI measurement system [17]. Percent fibroglandular volume (% FGV), defined as the ratio of the fibroglandular volume to the total breast volume, was calculated fully automatically for every woman. The calculated quantitative MRI FGT values were transformed into an MRI FGT grade analogous to the standard four ACR categories [17]. Values below 7.84 % (mean 5.67 %) were scored as MRI FGT grade a, values from 7.84 to 25.88 % (mean 15.62 %) as MRI FGT grade b, values from 26.25 to 44.15 % (mean 34.42 %) as MRI FGT grade c, and values above 39.86 % (mean 49.74 %) as MRI FGT grade d.
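
The calculation of % FGV and its translation into a four-level grade can be illustrated with a minimal sketch. This is not the measurement system of [17]: the function names are hypothetical, and the cut points are an assumption. The ranges reported above leave a gap between grades b and c (25.88 to 26.25 %) and overlap between grades c and d (39.86 to 44.15 %), so the sketch simply picks 7.84, 26.25, and 39.86 % as boundaries, with each boundary value assigned to the higher grade.

```python
from bisect import bisect_right

# ASSUMPTION: boundaries (% FGV) between grades a|b, b|c and c|d.
# The reported ranges do not define unique cut points, so these
# values are one possible reading, not the published rule.
ASSUMED_CUTS = (7.84, 26.25, 39.86)
GRADES = ("a", "b", "c", "d")


def percent_fgv(fibroglandular_volume, total_breast_volume):
    """Return % FGV: fibroglandular volume over total breast volume, in percent."""
    if total_breast_volume <= 0:
        raise ValueError("total breast volume must be positive")
    return 100.0 * fibroglandular_volume / total_breast_volume


def fgt_grade(pct_fgv, cuts=ASSUMED_CUTS):
    """Map a % FGV value to an MRI FGT grade (a-d) using the assumed cut points."""
    # bisect_right counts how many cut points are <= pct_fgv,
    # which is exactly the index of the grade.
    return GRADES[bisect_right(cuts, pct_fgv)]


# Hypothetical volumes in millilitres, for illustration only.
example_pct = percent_fgv(150.0, 600.0)
example_grade = fgt_grade(example_pct)
```

Passing the cut points as a parameter keeps the mapping easy to adjust once the intended boundaries are known.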

Statistical analysis

Statistical analyses were performed using statistical software (IBM SPSS Statistics Version 22.0). Inter- and intra-observer agreement of FGT assessment with MRI by subjective visual estimation and agreement with automated observer-independent quantitative measurements of FGT with MRI were analyzed using Cohen’s kappa coefficient (k).

To express the differences between subjective evaluation of FGT with MRI for each individual reader and reading, the Wilcoxon signed-rank test and Mantel-Haenszel statistics were used.

The strength of agreement was interpreted from the k values as follows: 0.81 to 0.99, almost perfect agreement; 0.61 to 0.80, substantial agreement; 0.41 to 0.60, moderate agreement; 0.21 to 0.40, fair agreement; 0.01 to 0.20, slight agreement; and values less than or equal to zero, less than chance agreement [20].
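
As an illustration of how a kappa value is obtained and interpreted, the following minimal sketch computes the unweighted Cohen's kappa for two sets of categorical readings and maps it onto the agreement scale above. It is not the SPSS procedure used in this study; the function names and the example readings are hypothetical, and the use of the unweighted form is an assumption.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same category.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Chance agreement: from each rater's marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Translate kappa into the verbal agreement categories listed above."""
    if kappa <= 0.0:
        return "less than chance"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical readings of eight examinations by two readers.
reading_1 = ["a", "a", "b", "b", "c", "c", "d", "d"]
reading_2 = ["a", "a", "b", "b", "c", "d", "d", "d"]
kappa = cohen_kappa(reading_1, reading_2)
label = agreement_label(kappa)
```

In this made-up example the readers agree on seven of eight cases (observed agreement 0.875) against a chance agreement of 0.25, which gives a kappa of about 0.83.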

Results

Results for BI-RADS-based subjective visual estimation of FGT with MRI and the respective MRI density grades derived by automated observer-independent quantitative measurements of FGT with MRI are summarized in Table 1.

Table 1 BI-RADS-based subjective visual estimation of FGT with MRI for each reader and the respective MRI density grades derived from automated observer-independent quantitative measurements of FGT with MRI

Subjective visual estimation of FGT with MRI

Inter-observer agreement

Inter-observer agreement of subjective visual estimation of MRI FGT for all four readers is summarized in Table 2. In the first reading round, inter-observer agreement for subjective visual estimation of FGT with MRI in inexperienced readers was moderate (R1mr1 – R2mr1; k = 0.435). A substantial agreement (R3mr1 – R4mr1; k = 0.798) was observed in experienced readers.

Table 2 Inter-observer agreement between the first and second reading for the subjective visual estimation of FGT with MRI

In the second reading, inter-observer agreement for both the inexperienced and experienced readers improved substantially (range: k = 0.727 to k = 0.830).

Intra-observer agreement

Intra-observer agreement is summarized in Table 3. Experienced readers achieved better results than inexperienced readers. Intra-observer agreement for the inexperienced readers was moderate (range: k = 0.594 to k = 0.679). Intra-observer agreement for the experienced readers was almost perfect (range: k = 0.847 to k = 0.882).

Table 3 Intra-observer agreement between the first and second reading of FGT with MRI by subjective visual estimation

Automated observer-independent quantitative measurement of FGT with MRI

Automated observer-independent quantitative measurements of FGT with MRI were successfully performed for every examination using the previously described technique [17]. Automated observer-independent quantitative measurements of FGT with MRI ranged from 1.3 % to 76.1 % (mean 20.6 %). The translation of the calculated percentages to one of the four MRI density grade categories is summarized in Table 1.

Comparison of subjective visual estimation and automated observer-independent quantitative measurements of FGT with MRI

Results for the agreement between subjective visual estimation and the quantitative measurements of FGT with MRI are summarized in Table 4.

Table 4 Agreement between the first and second reading of FGT with MRI by subjective visual estimation and automated observer-independent quantitative measurements of FGT with MRI

There was only fair to moderate agreement between subjective visual estimation and the quantitative measurements of FGT with MRI, ranging from k = 0.209 to 0.497.

Compared to subjective visual estimation, automated observer-independent quantitative measurement of FGT classified fewer breasts as dense (n = 27, categories C and D) than non-dense (n = 53, categories A and B) (Table 1).

Discussion

Our results demonstrate that inter- and intra-observer agreement of subjective visual estimation of FGT was moderate in inexperienced readers. Experienced readers achieved better results, with a substantial inter-observer agreement and an almost perfect intra-observer agreement, which implies that practice and experience can reduce observer-dependency. Nevertheless, an automated observer-independent quantitative system, which allows reproducible measurements, seems to be better suited for a reliable and standardized assessment of FGT with MRI.

The results of our study show that, analogous to FGT assessment with mammography, FGT assessment with MRI by subjective visual estimation is observer-dependent. There was only moderate inter- and intra-observer agreement in inexperienced readers. In the second reading round, inexperienced readers achieved better results, improving to a substantial inter-observer agreement. Experienced readers, in general, achieved better results, with a substantial inter-observer agreement and an almost perfect intra-observer agreement. However, even experienced readers improved their inter-observer agreement between the two reading rounds, from k = 0.798 to k = 0.830. These results indicate that it is necessary to familiarize readers with this new MRI BI-RADS feature, and that further practice, especially for inexperienced readers, is warranted to keep inter- and intra-observer variability to a minimum. Nevertheless, it remains doubtful whether such subjective visual estimation of FGT, because it is so dependent on practice and experience, should be used for risk evaluation, management, and the assessment of preventive breast cancer measures in women.

The limitations of subjective visual estimates of FGT have also been recognized by the committee on BI-RADS, and the investigation of observer-independent automated quantitative measurement systems [8, 13–17] for more reliable measures of the true proportion of FGT has been encouraged. Automated observer-independent quantitative measurement approaches have been developed and tested for both mammography and MRI [8, 13–16, 18, 21].

Wengert et al. introduced and validated a measurement system for MRI, which was used in this study. This measurement system for MRI allows an automated, observer-independent, robust, reproducible, volumetric, quantitative FGT assessment through different levels of breast composition [17].

To our knowledge, there is currently no study that has compared subjective visual estimation of MRI FGT to an automated observer-independent quantitative MRI measurement system. The results of our study demonstrate that there are distinct differences in subjective visual and automated observer-independent quantitative MRI FGT estimation, with only fair to moderate agreement (k = 0.209–0.497).

Compared to subjective visual estimation, automated observer-independent quantitative measurements of FGT with MRI classify fewer breasts as dense (categories c and d) than non-dense (categories a and b). These findings are in good agreement with previously published results, which compared automated observer-independent quantitative measurements of FGT with MRI to subjective mammographic FGT estimation. Khazen et al. found that mammography overestimated breast density twofold compared to MRI [13]. Based on a twofold error between mammography and MRI breast density assessment in patients with dense breasts, Lee et al. [22] concluded that mammography has a limited capacity for breast density estimation due to the two-dimensional character of the modality. Thompson et al. showed that breast density assessment with interactive tissue segmentation on precontrast T1-weighted MRI yielded consistently lower results than semi-automated quantitative breast density assessment with mammography [15]. It can be expected that automated observer-independent quantitative measurements of FGT with MRI will provide the necessary standardization for a reliable measurement, as well as tracking of alterations in FGT over time. Together with clinical parameters and risk factors, this might facilitate a more accurate individual breast cancer risk stratification, management, and assessment of preventive breast cancer measures in women.

A limitation of the current study is the relatively small number of participants, as well as the small number of volunteers in whom the software used was initially validated. However, this is the first study to address inter- and intra-observer agreement of subjective visual estimation of FGT with MRI, as recommended by BI-RADS, and the findings are consistent with previous experience with mammography and initial results for automated quantitative MRI measurements of FGT [10, 17, 23]. Another limitation is that it is uncertain which quantitative measurement approach of FGT comes closest to reflecting the histopathological composition of breast tissue, and therefore, a direct correlation of the amount of FGT in MRI as well as mammography with histopathology is difficult in the clinical setting.

In conclusion, subjective visual estimation of FGT with MRI shows moderate intra- and inter-observer agreement, which can be improved by practice and experience. Therefore, automated observer-independent quantitative measurements of FGT with MRI seem to be more appropriate to enable a standardized risk evaluation, management, and the assessment of preventive breast cancer measures in women.