Introduction

Digital breast tomosynthesis (DBT) has been investigated as an alternative to digital mammography (DM) in breast cancer screening, and several prospective trials have shown increased cancer detection rates (CDR) with DBT [1]. However, the impact of DBT on recall rates has varied across studies and screening settings [1]. Women with high breast density appear to benefit particularly from DBT, because it reduces the overlapping tissue effect seen with DM [2]. Women with high breast density also have a higher risk of breast cancer, of missed cancers, and of false positive (FP) findings with DM than women with low breast density [3,4,5].

Radiologists commonly classify breast density into four categories according to the Breast Imaging Reporting and Data System (BI-RADS) [6]. However, this categorization is subject to both intra- and interobserver variability [7, 8]. In previous studies investigating DBT, breast density was often dichotomized into dense and non-dense [9]. A more detailed assessment of breast density might better capture the risk of developing breast cancer and address the reduced sensitivity of cancer detection caused by the overlapping tissue effect [10]. Several automated software algorithms for quantitative breast density assessment have been developed, primarily to reduce observer variability [11]. One such tool is the Laboratory for Individualized Breast Radiodensity Assessment (LIBRA) [12, 13].

Screening with DBT improves CDR compared with DM in women with dense breasts [9]. However, more detailed density sub-analyses of prospective trials, using either BI-RADS density classification or automated software assessment, have shown inconsistent CDR and recall rates across density subgroups, and most data come from American rather than European populations [9, 14,15,16]. Accordingly, more data, especially from Europe, are needed. Younger women also generally have higher breast density, and the sensitivity of DM for breast cancer detection is lower in this population than in older women [4, 17]. Further, density sub-analyses from previous prospective DBT screening trials have not included women aged 40–49 years [14,15,16].

The prospective Malmö Breast Tomosynthesis Screening Trial compared one-view wide-angle DBT alone with two-view DM and included women aged 40–74 years [18]. The purpose of this study is to evaluate which breast density subgroups, as assessed by automated software, have the greatest benefit from digital breast tomosynthesis compared with digital mammography in the Malmö Breast Tomosynthesis Screening Trial, with a separate evaluation for women aged 40–49 years.

Materials and methods

Study participants

The prospective Malmö Breast Tomosynthesis Screening Trial was conducted between January 27, 2010 and February 13, 2015 at Skåne University Hospital in Malmö, Sweden. This secondary analysis was pre-specified and received ethical approval from the local ethics committee at Lund University (Dnr 2009/770; trial protocol at https://www.ClinicalTrials.gov: NCT01091545). A random sample of 21,691 women aged 40–74 years was selected from the Malmö screening registry and invited to participate in the trial; women who provided written informed consent were enrolled (Fig. 1). Exclusion criteria were pregnancy and inability to speak Swedish or English. One-view (mediolateral oblique) wide-angle DBT and two-view (mediolateral oblique and craniocaudal) DM images were acquired at the same screening occasion with a Mammomat Inspiration system (Siemens Healthineers, Erlangen, Germany). The authors had full control of the data and all information submitted for publication, and none were employed by Siemens Healthineers. Seven radiologists (among them SZ) with 2 to 40 years of breast imaging experience participated in the screen reading. Five of the readers read more than 5000 screening examinations per year. All images were read in two separate reading arms, the DM reading arm and the DBT reading arm, with double reading in each arm and consensus meetings before recall. Participants could be recalled from one or both reading arms (Fig. 1) [18, 19]. Within the trial, the first reader in the DM reading arm categorized breast density for all participating women according to the BI-RADS 4th edition density categories [6]. The study sample has been investigated in several previous publications (Additional file 1), but screening performance had not been investigated by automatically assessed breast density. For this study, breast density was retrospectively assessed with the automated software LIBRA (Fig. 2).
Breast area and absolute dense area (DA) were measured in each processed DM view, giving four analyzed images per woman (two in women with one breast), whose values were combined into a mean per woman. The mean breast percent density (PD) was calculated by dividing DA by breast area. Final exclusion criteria were failure of LIBRA to perform an analysis and the presence of breast implants.
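The per-woman aggregation described above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the `ViewMeasurement` record and `woman_level_density` function are hypothetical names, the per-view areas are assumed to come from LIBRA's output, and PD is assumed to be the mean DA divided by the mean breast area (the text does not state whether per-view PD values were averaged instead).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ViewMeasurement:
    """Area-based output for one processed DM view (hypothetical record layout)."""
    view: str               # e.g. "LCC", "LMLO", "RCC", "RMLO"
    breast_area_cm2: float  # total segmented breast area in this view
    dense_area_cm2: float   # absolute dense area (DA) in this view


def woman_level_density(views: list[ViewMeasurement]) -> tuple[float, float]:
    """Return (mean DA in cm2, PD in %) for one woman.

    Assumption: PD = mean DA / mean breast area * 100. Averaging per-view
    PD values instead would give a slightly different number.
    """
    if not views:
        # No analyzable view: the woman would be excluded from the study.
        raise ValueError("no analyzable views")
    mean_da = mean(v.dense_area_cm2 for v in views)
    mean_area = mean(v.breast_area_cm2 for v in views)
    return mean_da, 100.0 * mean_da / mean_area
```

For example, four views with dense areas of 30, 36, 28, and 26 cm² and breast areas of 150, 160, 140, and 150 cm² (hypothetical values) give a mean DA of 30 cm² and a PD of 20%. A woman with one breast simply contributes two views.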

Fig. 1

Flowchart of Malmö Breast Tomosynthesis Screening Trial participants and reading arms. DM digital mammography; DBT digital breast tomosynthesis; BI-RADS Breast Imaging Reporting and Data System 4th ed.; LIBRA Laboratory for Individualized Breast Radiodensity Assessment

Fig. 2

Participant images with density assessment. Images from the Laboratory for Individualized Breast Radiodensity Assessment (LIBRA) of a 47-year-old woman without cancer who participated in the Malmö Breast Tomosynthesis Screening Trial. The woman was not recalled from screening. LIBRA assessment showed breast density in the fourth quintile of both breast percent density and absolute dense area. Left images show the craniocaudal (upper) and mediolateral oblique (lower) views from digital mammography without density assessment. Right images show the same projections with density assessment. The total breast areas are marked in red and the dense areas in green

Definitions

Previous screening was defined as participation in the regional screening program in Skåne, Sweden, in 2005 or later. Menopausal status was defined by age at DBT screening as premenopausal (< 55 years) or postmenopausal (≥ 55 years) [20].

Study outcomes

Outcomes, calculated per woman, were sensitivity, specificity, and CDR for breast cancer per 1,000 women screened, as well as FP rate, recall rate, biopsy rate, positive predictive value for recall, and positive predictive value for biopsy. A subgroup analysis was conducted for women aged 40–49 years.
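As a hedged sketch of how these per-woman outcomes relate to one another, the function below computes them from counts for a single reading arm. The field and function names are hypothetical, and the denominators are assumptions based on common screening conventions rather than definitions stated in the text: sensitivity uses all women with cancer (screen-detected plus interval cancers), the FP, recall, and biopsy rates use all screened women, and the positive predictive value for biopsy uses women who underwent biopsy.

```python
from dataclasses import dataclass


@dataclass
class ReadingArmCounts:
    """Per-woman counts for one reading arm (DBT or DM); hypothetical layout."""
    screened: int         # women screened
    cancers: int          # women with cancer: screen-detected + interval cancers
    detected: int         # women with cancer detected after recall from this arm
    recalled: int         # women recalled from this arm
    biopsied: int         # women biopsied after recall from this arm
    biopsied_cancer: int  # biopsied women diagnosed with cancer


def screening_outcomes(counts: ReadingArmCounts) -> dict[str, float]:
    """Compute the study outcomes under the assumed denominators."""
    false_pos = counts.recalled - counts.detected  # recalled, no cancer found
    non_cancers = counts.screened - counts.cancers
    return {
        "sensitivity": counts.detected / counts.cancers,
        "specificity": (non_cancers - false_pos) / non_cancers,
        "cdr_per_1000": 1000 * counts.detected / counts.screened,
        "fp_rate": false_pos / counts.screened,
        "recall_rate": counts.recalled / counts.screened,
        "biopsy_rate": counts.biopsied / counts.screened,
        "ppv_recall": counts.detected / counts.recalled,
        "ppv_biopsy": counts.biopsied_cancer / counts.biopsied,
    }
```

With hypothetical counts of 1,000 women screened, 10 cancers, 8 detected, 40 recalled, 20 biopsied, and 8 cancers among the biopsied, this gives a sensitivity of 80%, a CDR of 8 per 1,000, a recall rate of 4%, and a positive predictive value for recall of 20%.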

Statistics

The study participants were divided into quintiles of PD and of DA by increasing density. The outcomes of the DBT reading arm were compared with those of the DM reading arm within each breast density quintile. The density subgroups were not pre-specified in the study protocol. The sensitivity and specificity of DBT and DM were compared within each quintile and in the overall study sample with McNemar’s test in SPSS Statistics for Windows (version 26, 2019, IBM Corp., Armonk, NY, USA). Logistic regression analyses, also performed in SPSS, assessed the association of PD or DA quintiles with cancer detected with DBT, cancer detected with DM, FP with DBT, and FP with DM, adjusting for menopausal status and previous screening, and yielded odds ratios (OR) with 95% confidence intervals (CI). DBT and DM outcomes were calculated for each quintile using Epitools (Sergeant, ESG, 2018, Ausvet; available at: http://epitools.ausvet.com.au) and are presented with 95% CI. An exploratory test was performed to identify the quintiles with the largest difference in CDR between DBT and DM. Analyses were also performed per BI-RADS density category. For women aged 40–49, sensitivity, specificity, and CDR were presented descriptively in new PD and DA quintiles defined within this age group (quintiles40−49) and in BI-RADS density categories. An alpha value of 0.05 was considered significant. For McNemar’s test, a Bonferroni correction was applied for six tests (five quintiles and overall) or, for BI-RADS density, five tests (four categories and overall), giving corrected alpha levels of 0.0083 and 0.01, respectively.
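McNemar’s test suits this design because every woman was read in both arms, so only the discordant women (detected or recalled in one arm but not the other) carry information about a difference between DBT and DM. The sketch below uses the exact binomial form of the test together with the Bonferroni thresholds described above. Whether SPSS reported the exact or a chi-square version is not stated, so that choice is an assumption, and the function names are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pairs.

    b: women detected (or recalled) with DBT but not with DM
    c: women detected (or recalled) with DM but not with DBT
    Concordant women do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 each discordant woman is equally likely to fall in b or c,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


alpha_quintiles = bonferroni_alpha(0.05, 6)  # 5 quintiles + overall, about 0.0083
alpha_birads = bonferroni_alpha(0.05, 5)     # 4 BI-RADS categories + overall, 0.01
```

With hypothetical discordant counts of 15 cancers detected only with DBT and 1 only with DM, `mcnemar_exact(15, 1)` returns 34/65536 (about 0.00052), which is below the corrected threshold of about 0.0083.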

Results

Participant characteristics

This study included 14,730 women after exclusions (95 due to breast implants and 23 due to missing LIBRA values) (Fig. 1); the median age at inclusion was 58 years (interquartile range 16 years). Further descriptive data are presented in Table 1. One woman who later presented with interval cancer had been recalled at the screening examination, but no cancer was found at follow-up. This woman is included both as an FP and as a participant with interval cancer.

Table 1 Descriptive data of the study population

Breast percent density and absolute dense area

The median PD and DA were 21.6% and 33.2 cm², respectively. Each quintile contained 2945–2947 women. Two women at the cut-off value between quintiles 3 and 4 had the same DA value. Descriptive data for all quintiles are presented in Table 2.
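The text does not describe how quintile cut-offs or ties were handled. One plausible rule, sketched below as an assumption, takes the empirical 20th, 40th, 60th, and 80th percentiles as cut points and assigns values equal to a cut point to the lower quintile. Tied values then always share a quintile, which is one way group sizes can deviate slightly from an even split (14,730 / 5 = 2,946). The function name is hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles


def assign_quintiles(values: list[float]) -> list[int]:
    """Label each value 1-5 by increasing density.

    Cut points are the empirical 20th/40th/60th/80th percentiles
    (statistics.quantiles, default 'exclusive' method). A value equal to a
    cut point goes to the lower quintile, so tied values always share a
    quintile and group sizes may differ slightly.
    """
    cuts = quantiles(values, n=5)  # four cut points
    return [bisect_left(cuts, v) + 1 for v in values]
```

For ten distinct values, each quintile receives two; when many values are tied at a cut point, the whole tied group lands in the same quintile rather than being split to force equal sizes.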

Table 2 a Descriptive statistics of breast percent density quintiles b Descriptive statistics of absolute dense area quintiles

Sensitivity

Sensitivity was higher for DBT than for DM in all PD quintiles, significantly so in the highest quintile (81.1% (95% CI 65.8–90.5) vs 43.2% (95% CI 28.7–59.1), p < 0.001; Fig. 3 and Table 3). The DA quintiles showed similar results, with significant differences in quintile 4 (76.7% (95% CI 62.7–86.8) vs 51.2% (95% CI 36.8–65.4), p = 0.003) and quintile 5 (83.3% (95% CI 68.1–92.1) vs 47.2% (95% CI 32.0–63.0), p = 0.002). The largest absolute difference in sensitivity between DBT and DM was in quintile 5 for both PD (37.9 percentage points (95% CI 15.8–60.0)) and DA (36.1 percentage points (95% CI 14.1–58.1)).
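The Epitools calculator for a single proportion offers several interval methods, including the Wilson score interval. As a check on the intervals reported above for the highest PD quintile, the sketch below assumes that this quintile contained 37 women with cancer, of whom 30 were detected with DBT and 16 with DM. These counts are back-calculated from the reported percentages and are an assumption, not values copied from Table 3; `wilson_ci` is a hypothetical helper name.

```python
from math import sqrt


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Assumed counts for PD quintile 5 (back-calculated, see lead-in).
for arm, detected in (("DBT", 30), ("DM", 16)):
    lo, hi = wilson_ci(detected, 37)
    print(f"{arm}: {100 * detected / 37:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f})")
```

Under these assumed counts, the loop prints 81.1% (95% CI 65.8-90.5) for DBT and 43.2% (95% CI 28.7-59.1) for DM, which matches the reported values. Unlike the simple Wald interval, the Wilson interval stays within 0–100% and behaves well with the small numbers of cancers per quintile.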

Fig. 3

a–d Graphs of sensitivity and specificity. Graphs of (a and b) sensitivity (sens) and (c and d) specificity (spec) of breast percent density (PD) and absolute dense area (DA) in all quintiles for digital breast tomosynthesis (DBT) and digital mammography (DM), with 95% confidence intervals as vertical lines. Dotted lines mark overall sensitivity and specificity for DBT and DM

Table 3 Sensitivity of digital breast tomosynthesis and digital mammography in all quintiles

Specificity

Specificity was lower for DBT than for DM in all PD quintiles, significantly so in quintile 3 (97.1% (95% CI 96.4–97.6) vs 98.1% (95% CI 97.6–98.1), p = 0.001) and quintile 5 (95.5% (95% CI 94.7–96.2) vs 97.2% (95% CI 96.6–97.8), p < 0.001; Additional file 2: Table S1). The DA quintiles showed similar results, with significantly lower specificity for DBT than for DM in quintile 3 (97.4% (95% CI 96.8–97.9) vs 98.4% (95% CI 97.8–98.8), p = 0.003), quintile 4 (96.6% (95% CI 95.9–97.2) vs 98.1% (95% CI 97.5–98.5), p < 0.001), and quintile 5 (95.6% (95% CI 94.8–96.3) vs 96.9% (95% CI 96.2–97.5), p < 0.001). The largest absolute difference in specificity between DBT and DM was seen in PD quintile 5 (1.7 percentage points (95% CI 0.8–2.7)) and DA quintile 4 (1.4 percentage points (95% CI 0.6–2.3)).

Logistic regression

In the logistic regression models, after adjustment for menopausal status and previous screening, higher PD and DA quintiles were associated with cancer detected with DBT (OR 1.24 (95% CI 1.09–1.42, p = 0.001) and OR 1.28 (95% CI 1.12–1.46, p < 0.001), respectively). No such association was seen for cancer detected with DM for either PD or DA (Table 4). Higher PD and DA quintiles were also associated with FP for both DBT (OR 1.27 (95% CI 1.17–1.38, p < 0.001) and OR 1.23 (95% CI 1.14–1.33, p < 0.001), respectively) and DM (OR 1.24 (95% CI 1.13–1.37, p < 0.001) and OR 1.20 (95% CI 1.10–1.32, p < 0.001), respectively). Previous screening did not significantly affect cancer detection or FP after adjustment. Postmenopausal women had a higher OR for cancer detection and a lower OR for FP with both DBT and DM after adjustment for previous screening and density by PD or DA.
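Because a single OR is reported per quintile, the models appear to enter quintile as one ordinal covariate (1 to 5). Assuming that log-linear trend, which the text does not state explicitly, the per-step OR compounds over the four steps from quintile 1 to quintile 5. For cancer detected with DBT by PD, this gives the following illustration (not an estimate reported by the study):

```latex
\mathrm{OR}_{Q5\,\mathrm{vs}\,Q1}
  = \left(\mathrm{OR}_{\text{per quintile}}\right)^{5-1}
  = 1.24^{4} \approx 2.36,
\qquad
95\%\ \mathrm{CI} \approx \left(1.09^{4},\ 1.42^{4}\right) = (1.41,\ 4.07)
```

Raising the CI limits to the same power is valid under this assumption because it multiplies the log-OR interval by 4, a monotone transformation.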

Table 4 Multivariable logistic regression for detected breast cancers and false positive recall

Cancer detection rate and false positives

CDR was higher with DBT than with DM in all five quintiles of both PD and DA. However, the CI for the difference included zero in all quintiles except the highest PD quintile (Fig. 4 and Additional file 2: Table S2). The largest differences between DBT and DM were found in the highest PD and DA quintiles, with 4.8 (95% CI 0.3–9.3) and 4.4 (95% CI −0.1 to 9.0) additional cancers detected per 1,000 women screened, respectively. FP rates were also higher for DBT than for DM in all PD and DA quintiles, although the CI for the difference included zero for PD quintiles 1 and 4 and DA quintiles 1 and 2.

Fig. 4

a and b Graphs of differences in cancer detection and false positives. Graph of differences in cancer detection rate (CDR) per 1000 women screened and false positives (FP) in percentage points between digital breast tomosynthesis and digital mammography for all (a) breast percent density (PD) and (b) absolute dense area (DA) quintiles. Dotted lines mark overall difference in CDR and FP

Recall, biopsy rate, positive predictive value for recall, and positive predictive value for biopsy

Recall rates were highest for both DBT and DM in the highest PD and DA quintiles (Fig. 5). Biopsy rates were higher for DBT than for DM in all PD and DA quintiles, although the CI for the difference included zero for PD quintiles 1–4 and DA quintiles 1–3 (Additional file 2: Table S3). The positive predictive values for recall and biopsy were similar for DBT and DM across all PD and DA quintiles.

Fig. 5

Bar charts of recall rate, biopsy rate and positive predictive values. Bar charts of (A and B) recall rate, (C and D) biopsy rate, (E and F) positive predictive value of recall (PPV-1), and (G and H) positive predictive value of biopsy (PPV-3) of breast percent density (PD) and absolute dense area (DA) in all quintiles for digital breast tomosynthesis (DBT) and digital mammography (DM), with 95% confidence intervals (CI) as vertical lines. The differences (Δ) between DBT and DM are presented in percentage points with 95% CI in parentheses

Exploratory analysis

An exploratory test assessed which quintiles had the largest gain in CDR with DBT. For PD, the largest gain was in quintile 5 alone, so no further testing was done. For DA, the largest gains in CDR occurred in quintiles 4 and 5. When these two quintiles were analyzed together, DBT detected 4.1 (95% CI 0.7–7.4) additional cancers per 1,000 women screened compared with DM. The corresponding incremental FP rate with DBT compared with DM in DA quintiles 4 and 5 was 1.4 percentage points (95% CI 0.7–2.0).

Women 40–49 years old

For women aged 40–49 years, the median PD and DA were 35.8% and 43.9 cm², respectively. Additional file 2: Table S4 provides descriptive data for the PD and DA quintiles40−49 in this subgroup. Sensitivity, specificity, and CDR for DBT and DM in all PD and DA quintiles40−49 and in BI-RADS density categories are available in Table 5 and Additional file 2: Table S5. For women aged 40–49, DBT had higher overall sensitivity (82.1% (95% CI 64.4–92.1) vs 53.6% (95% CI 35.8–70.5), p = 0.02) and lower overall specificity (95.8% (95% CI 95.2–96.4) vs 97.0% (95% CI 96.4–97.4), p < 0.001) than DM. As in the full study sample, DBT showed higher sensitivity and CDR but somewhat lower specificity than DM across most quintiles40−49.

Table 5 Sensitivity, specificity, and cancer detection rate among women 40–49 years old

Outcome by BI-RADS density category

For completeness and reference, outcome data by BI-RADS density category are presented in Additional file 2: Tables S6–S9. Parts of these data, including FP, CDR, and BI-RADS density distribution, have been published previously [18, 21,22,23].

Discussion

The diagnostic accuracy of digital breast tomosynthesis (DBT) compared with digital mammography (DM) in breast cancer screening may vary by breast density subgroup. This study therefore evaluated the diagnostic accuracy of DBT and DM in the Malmö Breast Tomosynthesis Screening Trial by breast density subgroup, assessed with the automated software Laboratory for Individualized Breast Radiodensity Assessment (LIBRA). The largest difference in cancer detection rate (CDR) between screening with DBT and DM was found among women in the highest breast density quintile. For the 20% of women with the highest breast percent density (PD), sensitivity increased from 43.2% with DM to 81.1% with DBT (p < 0.001), corresponding to 4.8 (95% CI 0.3–9.3) additional women with breast cancer detected per 1000 screened. The largest difference in specificity between DM and DBT, lower for DBT, was also seen in women in the highest PD quintile; however, specificity with DBT remained high (95.5%). Among women aged 40–49, the sensitivity of DBT was higher than that of DM in most density categories for both PD and absolute dense area (DA).

In the USA, DBT has been widely used in screening for several years, especially among women with dense breasts. However, in 2021, the European Commission Initiative on Breast Cancer published a conditional recommendation for DBT in screening women with dense breasts, albeit with “very low certainty of the evidence” [24]. Both European [24] and American [25] recommendations dichotomized breast density. Two studies that performed more detailed density sub-analyses with automated breast density assessment in prospective trials, the Oslo Tomosynthesis Screening Trial [14] and the Tomosynthesis trial in Bergen [15], did not find a significantly higher CDR for DBT compared with DM in women with the densest breasts. However, in the Oslo Tomosynthesis Screening Trial, the increase in CDR with DBT compared with DM in the densest group (21.7% (95% CI 3.0–41.9), p = 0.06) was of similar magnitude to that in the subgroup with the second highest breast density (22.6% (95% CI 12.9–32.9), p < 0.001) [14]. In the Tomosynthesis trial in Bergen, no difference in CDR between DBT and DM was found in any density subgroup [15]. These discrepancies with the present study may stem from the smaller sample sizes of the densest subgroups in both the Oslo Tomosynthesis Screening Trial and the Tomosynthesis trial in Bergen. In addition, the Tomosynthesis trial in Bergen found no overall difference in CDR [26], in contrast to several other European trials [1]. A detailed density sub-analysis of the prospective Tomosynthesis plus Synthesized Mammography trial, which used BI-RADS density categorization, found a significantly higher CDR with DBT compared with DM for women with the highest breast density (OR 3.8 (95% CI 1.5–11.1)), in agreement with the present study’s findings [16].
Neither the Tomosynthesis trial in Bergen nor the Oslo Tomosynthesis Screening Trial found any significant difference in FP between DBT and DM among women with the highest breast density [14, 15]. These discrepancies with the present study could again be due to the smaller sample sizes of the densest subgroups and, in the Oslo Tomosynthesis Screening Trial, to the FP rate being derived before the consensus meeting.

Automated breast density assessment is reproducible. LIBRA can assess breast density in both raw and processed images [12], which is useful because clinical archives commonly store only processed images [27]. Whether PD or DA should be used for breast density assessment is still debated [28], although PD has been suggested to correlate more strongly with breast cancer risk [29]. The results for PD and DA were similar in the current study, but in the exploratory analysis, DA identified a larger group that benefited from DBT in terms of increased CDR. Still, this study was not designed to compare the two breast density measures.

This study has limitations. The original trial was not powered for the subgroup division and post hoc analyses, although significant differences were still found in the higher breast density subgroups. DM’s FP rate in this trial may also have been underestimated, because DBT images were available at the consensus meeting, which favored DM. The LIBRA assessments were not manually reviewed, although LIBRA has previously been validated for Siemens images, with a strong correlation with radiologists’ density assessments (r = 0.89) [20]. Images for which LIBRA failed, owing to poor breast positioning, were excluded from the study; however, the number of failed readings was low (n = 23). Further, LIBRA measures area-based density on DM images, whereas a stronger association with breast cancer has previously been shown for volumetric measurements from DBT [30]. Finally, the subgroup of women aged 40–49 was small, so these results should be interpreted with caution.

The findings of this study add important knowledge to the scarce evidence on DBT screening in women with the densest breasts, showing the greatest impact in the highest breast density subgroup. To evaluate the full value of DBT in screening programs, future evaluations should assess breast density beyond a binary categorization.

Conclusion

In conclusion, women with high mammographic density, as assessed with automated density software, benefited most from digital breast tomosynthesis screening compared with digital mammography, as it improved cancer detection for 20–40% of the screening population at the cost of a small decrease in specificity. These results may inform the use of digital breast tomosynthesis in future individualized screening programs stratified by, for instance, breast density.