Study selection
A total of 766 records were identified in PubMed. Of these, 508 records were excluded after title and abstract screening, and 258 records were assessed in full text for eligibility, of which 246 were excluded. Twelve studies [5, 20,21,22,23,24,25,26,27,28,29,30], representing the results of 10 populations, were included in at least one meta-analysis of CDR [5, 20,21,22,23,24,25,26,27,28], recall rate [20, 21, 23,24,25,26,27, 29], ICR [21, 30], biopsy rate [20, 23,24,25, 27], PPV-1 [20, 21, 23,24,25,26,27, 29], PPV-2 [20, 23], or PPV-3 [20, 23,24,25, 27]. The selection process is shown in Fig. 1.
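The screening counts can be checked for internal consistency by simple subtraction. The minimal Python sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Record counts as reported in the study-selection paragraph
identified = 766
excluded_after_title_abstract = 508
excluded_after_full_text = 246

# Each stage's count is the previous stage minus that stage's exclusions
assessed_in_full_text = identified - excluded_after_title_abstract
included_studies = assessed_in_full_text - excluded_after_full_text

print(f"assessed in full text: {assessed_in_full_text}, included: {included_studies}")
```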
Study characteristics
Table 1 summarises the characteristics of studies comparing DBT plus s2D versus DM alone. The twelve studies [5, 20,21,22,23,24,25,26,27,28,29,30] represent results of 10 unique study populations comprising 414,281 women. Two studies were conducted in the USA (79,209 women) [20, 23], one in Australia (10,146 women) [26], and nine in Europe, the latter representing results of 7 populations (324,926 women) [5, 21, 22, 24, 25, 27,28,29,30]. Two of the European studies reported outcomes from Trento, Italy (Bernardi et al. 2020 [21] and Bernardi et al. 2016 (STORM-2) [5]). Women previously enrolled in STORM/STORM-2 were excluded from the analysis in Bernardi et al. (2020).
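The regional figures above add up to the reported totals. A minimal Python sketch of that check, using only the numbers stated in the paragraph (the dictionary layout is illustrative):

```python
# Women and studies per region, as reported in the study-characteristics paragraph
women = {"USA": 79_209, "Australia": 10_146, "Europe": 324_926}
studies = {"USA": 2, "Australia": 1, "Europe": 9}

total_women = sum(women.values())
total_studies = sum(studies.values())
print(total_women, total_studies)
```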
Table 1 Summary of study characteristics of included studies comparing DBT plus s2D versus DM alone
Quality assessment
Figure 2 shows the RoB and applicability assessment. All studies were rated as having a high RoB in the domain ‘flow and timing’, as not every woman received the same reference test after screening. A low RoB in this domain would require that all women, including those with unremarkable screening findings, subsequently undergo histopathological assessment for verification. Since this is ethically not acceptable, studies rated as having a high RoB only in the domain ‘flow and timing’ were assessed as having an overall low RoB (ESM, S3).
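The overall rating rule described above can be written as a small decision function. This is a sketch under assumptions: the four domain names follow the QUADAS-2 layout, which this section does not name explicitly, and the function only separates "low" from "not low" because the grading of other combinations is not described here (see ESM, S3).

```python
# Assumed domain names (QUADAS-2 layout); only 'flow and timing' is named in the text
DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")
EXEMPT_DOMAIN = "flow and timing"

def overall_rob(ratings: dict) -> str:
    """Overall RoB under the rule described above.

    The 'flow and timing' domain (rated high for every study) is not used to
    downgrade. Returns 'low' only if all other domains are rated 'low';
    otherwise 'not low' (further grading is not modelled here)."""
    missing = set(DOMAINS) - set(ratings)
    if missing:
        raise ValueError(f"missing domain ratings: {sorted(missing)}")
    others = [ratings[d] for d in DOMAINS if d != EXEMPT_DOMAIN]
    return "low" if all(r == "low" for r in others) else "not low"

# Hypothetical ratings for two studies
study_a = {"patient selection": "low", "index test": "low",
           "reference standard": "low", "flow and timing": "high"}
study_b = {**study_a, "index test": "high"}
print(overall_rob(study_a), overall_rob(study_b))
```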
Synthesis of results
CDR, recall rate, and PPV-1
The CDR is the number of cancers detected per 1,000 women screened (or per 1,000 examinations). Ten studies [5, 20,21,22,23,24,25,26,27,28] were included in the meta-analysis of CDR. Two studies [29, 30] were not included to avoid double counting of women. Using REM, the CDR was estimated to be significantly higher with DBT plus s2D than with DM alone (RR: 1.35, 95% CI: 1.20–1.52, p < 0.01, I²: 58%; Fig. 3). Sensitivity analyses demonstrated that the statistical significance was robust (ESM, S2).
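The pooled RRs in this section come from a random-effects model (REM), but the estimator is not named here. The sketch below assumes the widely used DerSimonian-Laird method with a normal-approximation 95% CI on the log scale and reports I² from Cochran's Q. The per-study inputs are hypothetical, not data from the included studies.

```python
import math

def dersimonian_laird(log_rr, var, z=1.96):
    """Pool per-study log risk ratios with the DerSimonian-Laird random-effects model.

    log_rr: per-study ln(RR); var: their within-study variances.
    Returns (pooled RR, lower 95% limit, upper 95% limit, I^2 in %)."""
    k = len(log_rr)
    w = [1.0 / v for v in var]                       # inverse-variance (fixed-effect) weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, log_rr)) / sw
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, log_rr))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in var]           # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return math.exp(y_re), math.exp(y_re - z * se_re), math.exp(y_re + z * se_re), i2

# Hypothetical per-study CDR risk ratios (DBT plus s2D vs DM) and variances of ln(RR)
rrs = [1.15, 1.60, 1.30, 1.45]
variances = [0.010, 0.020, 0.015, 0.030]
rr, lo, hi, i2 = dersimonian_laird([math.log(r) for r in rrs], variances)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f}), I2 {i2:.0f}%")
```

When all studies share the same ln(RR), Q is essentially zero, so the between-study variance and I² both collapse to zero and the pooled RR equals the common value.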
The recall rate is the number of women recalled per 100 women screened (or per 100 examinations). Eight studies [20, 21, 23,24,25,26,27, 29] were included in the meta-analysis of recall rates (Fig. 3). The study of Bernardi et al. 2016 [5] was excluded since only false-positive recalls were reported. In Skaane (2019) (Oslo Tomosynthesis Screening Trial (OTST)) [28], recall data were not available for the DBT plus s2D group (Arm D); recalls were reported only in total for women screened with DBT plus DM/s2D (Arm C + D). Previously published OTST studies were excluded from the analysis, since they did not report recall rates in women screened with DBT plus s2D compared to DM alone [31,32,33]. Using REM, recall rates were significantly lower with DBT plus s2D than with DM alone (RR: 0.79, 95% CI: 0.64–0.98, p: 0.03, I²: 97%). However, statistical significance was not robust when single studies were left out (ESM, S2).
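A leave-one-out sensitivity analysis re-pools the estimate with each study omitted in turn and checks whether the 95% CI still excludes RR = 1. The sketch below assumes DerSimonian-Laird pooling (the estimator is not named in this section) and re-implements it compactly so the block runs on its own; the inputs are hypothetical.

```python
import math

def pool_re(log_rr, var):
    """DerSimonian-Laird pooled ln(RR) and its standard error (assumed estimator)."""
    w = [1.0 / v for v in var]
    sw = sum(w)
    y_fe = sum(a * b for a, b in zip(w, log_rr)) / sw
    q = sum(a * (b - y_fe) ** 2 for a, b in zip(w, log_rr))
    c = sw - sum(a * a for a in w) / sw
    tau2 = max(0.0, (q - (len(w) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in var]
    return sum(a * b for a, b in zip(w_re, log_rr)) / sum(w_re), math.sqrt(1.0 / sum(w_re))

def leave_one_out(log_rr, var, z=1.96):
    """Re-pool with each study omitted in turn.

    Returns (omitted index, RR, lower, upper, significant) tuples, where
    'significant' means the 95% CI excludes RR = 1."""
    out = []
    for i in range(len(log_rr)):
        y, se = pool_re(log_rr[:i] + log_rr[i + 1:], var[:i] + var[i + 1:])
        lo, hi = math.exp(y - z * se), math.exp(y + z * se)
        out.append((i, math.exp(y), lo, hi, hi < 1.0 or lo > 1.0))
    return out

# Hypothetical recall-rate RRs (DBT plus s2D vs DM) and variances of ln(RR)
log_rr = [math.log(r) for r in (0.55, 0.85, 0.95, 0.90, 1.05)]
var = [0.002, 0.010, 0.012, 0.008, 0.015]
for i, rr, lo, hi, sig in leave_one_out(log_rr, var):
    print(f"without study {i}: RR {rr:.2f} ({lo:.2f}-{hi:.2f}) significant={sig}")
```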
PPV-1 is the proportion of recalled women in whom cancer was detected. Eight studies [20, 21, 23,24,25,26,27, 29] were included in the meta-analysis of PPV-1 (Fig. 4). Two studies [21, 26] did not report PPV-1 separately; therefore, we calculated it from the reported data. Using REM, the proportion of recalls leading to a cancer diagnosis was significantly higher with DBT plus s2D than with DM alone (RR: 1.69, 95% CI: 1.45–1.96, p < 0.01, I²: 73%). Sensitivity analyses showed that the statistical significance of the results was robust (ESM, S2).
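Under the definitions used in this section, PPV-1 follows either from counts (cancers per 100 recalls) or from a study's CDR and recall rate. The sketch below shows both routes; the counts are hypothetical, and this is not necessarily how PPV-1 was derived for [21, 26].

```python
def ppv1_percent(cancers_detected: int, women_recalled: int) -> float:
    """PPV-1 in %: cancers detected per 100 women recalled."""
    return 100.0 * cancers_detected / women_recalled

def ppv1_from_rates(cdr_per_1000: float, recall_rate_percent: float) -> float:
    """PPV-1 in % from the two screening rates of the same study.

    100 * (CDR / 1000) / (recall rate / 100) simplifies to 10 * CDR / recall rate."""
    return 10.0 * cdr_per_1000 / recall_rate_percent

# Hypothetical study: 10,000 women screened, 300 recalled, 60 cancers detected
screened, recalled, cancers = 10_000, 300, 60
cdr = 1000.0 * cancers / screened          # 6.0 per 1,000
recall_rate = 100.0 * recalled / screened  # 3.0 per 100
print(ppv1_percent(cancers, recalled), ppv1_from_rates(cdr, recall_rate))  # 20.0 20.0
```

Within a single study the same identity holds for the ratios (RR of PPV-1 equals RR of CDR divided by RR of recall rate); pooled RRs from separate random-effects models need not satisfy it exactly.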
Data from a screening programme in Australia [26] were included in the meta-analyses of CDR, recall rate, and PPV-1. Houssami et al. [26] performed sensitivity analyses in which screens of women who reported symptoms at screening were excluded. Symptomatic women are also likely to participate in breast cancer screening programmes; however, the results of the other screening programmes were not stratified by symptom status at screening. Therefore, we included data from all women to enhance comparability between studies, but performed sensitivity analyses including data from asymptomatic women only. These sensitivity analyses showed no differences in the risk ratios for CDR, recall rate, and PPV-1 (ESM, S2).
Biopsy rate, PPV-2, and PPV-3
The biopsy rate indicates how many biopsies were performed per 1,000 women screened (or per 1,000 examinations). Five studies [20, 23,24,25, 27] were included in the meta-analysis of biopsy rates. In two studies [24, 27], biopsy rates were calculated using the reported PPV-3 percentage or CDR. No statistically significant difference in biopsy rates was observed between women screened with DBT plus s2D and those screened with DM alone. The pooled RR using REM (RR: 0.87, 95% CI: 0.70–1.09, p: 0.22, I²: 91%) suggests a possibly lower biopsy rate with DBT plus s2D than with DM alone, but the confidence interval includes 1 (Fig. 5). Sensitivity analyses demonstrated that the results were robust with regard to statistical significance (ESM, S2).
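When a study reports CDR and PPV-3 but not the biopsy rate, the rate is implied by the definitions: PPV-3 is cancers per 100 biopsies performed, so dividing the CDR by PPV-3 (as a fraction) gives biopsies per 1,000 screens. This is a sketch of one way to do it, assuming both inputs come from the same study population; the values are hypothetical.

```python
def biopsy_rate_per_1000(cdr_per_1000: float, ppv3_percent: float) -> float:
    """Biopsies performed per 1,000 screens implied by CDR and PPV-3.

    PPV-3 (%) = 100 * cancers / biopsies and CDR = 1000 * cancers / screens,
    so biopsies per 1,000 screens = 100 * CDR / PPV-3."""
    if ppv3_percent <= 0:
        raise ValueError("PPV-3 must be positive")
    return 100.0 * cdr_per_1000 / ppv3_percent

def biopsies_from_cancers(cancers: int, ppv3_percent: float) -> float:
    """Number of biopsies implied by the cancer count and PPV-3 (%)."""
    return 100.0 * cancers / ppv3_percent

# Hypothetical study: CDR 6.0 per 1,000 and PPV-3 of 40%
print(biopsy_rate_per_1000(6.0, 40.0))  # 15.0 biopsies per 1,000 screens
print(biopsies_from_cancers(60, 40.0))  # 150.0 biopsies
```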
PPV-2 and PPV-3 indicate the number of cancers detected per 100 biopsies recommended and per 100 biopsies performed, respectively. Because some studies reported only PPV-2 or only PPV-3, we analysed both. Five studies [20, 23,24,25, 27] were included in the meta-analysis of PPV-3. Using REM, cancer detection among women who underwent biopsy was significantly higher after screening with DBT plus s2D than after screening with DM alone (RR: 1.36, 95% CI: 1.17–1.58, p < 0.01, I²: 67%; Fig. 5). Sensitivity analyses demonstrated that the results were robust with regard to statistical significance (ESM, S2). PPV-2 (Fig. 5) was reported in two studies [20, 23]. Using REM, PPV-2, i.e. cancer detection among women with a recommended biopsy, was significantly higher after screening with DBT plus s2D than after DM alone (RR: 1.57, 95% CI: 1.08–2.28, p: 0.02, I²: 82%).
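A reported p-value can be roughly checked against its RR and 95% CI. The sketch assumes Wald-type intervals on the log scale with the normal quantile 1.96; intervals from t-based methods (e.g. Hartung-Knapp) would not fit this assumption, and small discrepancies are expected because the published CIs are rounded.

```python
import math

def p_from_rr_ci(rr: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Two-sided p-value for H0: RR = 1, recovered from a 95% CI.

    Assumes the CI was built as exp(ln RR +/- z_crit * SE) (normal approximation)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of ln(RR)
    z = math.log(rr) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

print(round(p_from_rr_ci(1.57, 1.08, 2.28), 2))  # PPV-2: 0.02
print(round(p_from_rr_ci(0.87, 0.70, 1.09), 2))  # biopsy rate: 0.22
```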
ICR
The ICR indicates the number of interval cancers per 1,000 women screened (or per 1,000 examinations). Two European studies [21, 30] reported interval cancers in women screened with DBT plus s2D compared to women screened with DM alone (Fig. 6). No statistically significant difference in ICR was observed, either for the pooled estimate using REM (RR: 1.03, 95% CI: 0.66–1.63, p: 0.88, I²: 70%) or for the ICRs reported by the individual studies.
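For a single study, the ICR per 1,000 women and the RR with its 95% CI can be computed from the event counts; ln(RR) and its squared standard error are the per-study inputs that a random-effects pooling uses. The sketch assumes the Katz log method and applies no zero-cell correction, which matters because interval cancers are rare. The counts are hypothetical.

```python
import math

def rate_per_1000(events: int, screened: int) -> float:
    """Events (e.g. interval cancers) per 1,000 women screened."""
    return 1000.0 * events / screened

def risk_ratio_ci(a: int, n1: int, c: int, n0: int, z: float = 1.96):
    """RR of group 1 vs group 0 with a 95% CI on the log scale (Katz method).

    a, c: events; n1, n0: women screened. Requires a, c > 0."""
    if a == 0 or c == 0:
        raise ValueError("zero events: a continuity correction would be needed")
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # SE of ln(RR)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

# Hypothetical counts: 20 interval cancers in 10,000 DBT plus s2D screens,
# 25 in 12,000 DM screens
print(rate_per_1000(20, 10_000), rate_per_1000(25, 12_000))
print(risk_ratio_ci(20, 10_000, 25, 12_000))
```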