Digital breast tomosynthesis for breast cancer screening and diagnosis in women with dense breasts – a systematic review and meta-analysis
This study aimed to systematically review and to meta-analyse the accuracy of digital breast tomosynthesis (DBT) versus digital mammography (DM) in women with mammographically dense breasts in screening and diagnosis.
Two independent reviewers identified screening or diagnostic studies reporting at least one of four outcomes (cancer detection rate [CDR], recall rate, sensitivity and specificity) for DBT and DM in women with mammographically dense breasts. Study quality was assessed using QUADAS-2. Meta-analysis of CDR and recall rate used a random effects model. A summary ROC curve summarized sensitivity and specificity.
Sixteen studies were included (five diagnostic; eleven screening). In diagnosis, DBT increased sensitivity (84%–90%) versus DM alone (69%–86%) but not specificity. DBT improved CDR versus DM alone (RR: 1.16, 95% CI 1.02–1.31). In screening, DBT + DM increased CDR versus DM alone (RR: 1.33, 95% CI 1.20–1.47 for retrospective studies; RR: 1.52, 95% CI 1.08–2.11 for prospective studies). Recall rate was significantly reduced by DBT + DM in retrospective studies (RR: 0.72, 95% CI 0.64–0.80) but not in two prospective studies (RR: 1.12, 95% CI 0.76–1.63).
In women with mammographically dense breasts, DBT+/−DM increased CDR significantly (versus DM) in screening and diagnosis. In diagnosis, DBT+/−DM increased sensitivity but not specificity. The effect of DBT + DM on recall rate in screening dense breasts varied between studies.
Keywords: Breast neoplasm; Digital mammography; Digital breast tomosynthesis; Review; Meta-analysis; Breast density
BI-RADS: Breast Imaging Reporting and Data System
CDR: Cancer detection rate
DBT: Digital breast tomosynthesis
Breast cancer (BC) is the most common cancer in women and the leading cause of cancer death among women in Europe. Many countries have adopted population-wide BC screening, initially with film-screen and subsequently with digital mammography (DM), aiming to lower mortality from BC by earlier detection of the disease [2, 3]. However, DM has moderate sensitivity, for which estimates vary from 67.3% to 93.3%. High breast tissue density, defined as more than 50% density on mammography (categories 3 and 4 in the BI-RADS 4th edition, or categories c and d in the 5th edition [5, 6]), reduces the sensitivity of mammography due to its masking effect, and may increase false-positives due to superimposition of dense parenchyma. It is estimated that about half of all women taking part in screening have dense breasts, although the proportion differs between age groups [7, 8]. Breast density is also considered an independent risk factor for BC.
Digital Breast Tomosynthesis (DBT) enables pseudo-3D imaging of the breast, resulting in better discrimination of tissue structures and potentially improved visualisation of cancer [10, 11]. Hence, DBT has the potential to improve both sensitivity and specificity of imaging in BC screening, leading to more detected cancers with fewer false-positives. However, acquiring both DM and DBT increases the radiation dose to the breast. Improved screening accuracy using DBT has been shown in several prospective and retrospective studies, in screening populations and in studies using BC-enriched mammogram series [12, 13]. Very few reviews have examined the role of DBT in women with dense breasts, and these were either concise reports or did not use systematic methodology [14, 15]. Therefore, in this work we aimed to systematically review the literature on the accuracy of DBT compared to DM in women with dense breasts. A secondary objective was to perform a meta-analysis of four outcomes (cancer detection rate [CDR], recall rate, sensitivity and specificity) of DBT compared to DM in women with dense breasts.
A systematic review and meta-analysis were performed by two independent reviewers (XAP and GHdB or AT), following a predetermined review protocol based on the PRISMA guidelines (http://www.prisma-statement.org/) (Additional file 1). Discordance throughout the process was discussed between the two reviewers and if consensus was not reached then a third reviewer (GHdB or NH) was consulted.
We searched for studies that included women older than 18 years, who underwent breast imaging using DBT and DM and were classified as having dense breasts on mammography. Studies comparing DBT to DM and reporting at least one accuracy measure were considered. Prospective as well as retrospective comparative studies could be included.
Data sources and searches
Eligible studies were: studies that compared the accuracy of DBT and DM in a screening setting or diagnostic setting; reported data on at least one of 4 outcomes (CDR, recall rate, sensitivity and specificity) for both DBT and DM (where data were reported or could be calculated); included at least 100 women with dense breasts who were asymptomatic (screening setting) or recalled after screening (diagnostic setting); and where ‘dense breast’ was defined as more than 50% density [BI-RADS 3 and 4 (4th edition) or BI-RADS c and d (5th edition)]. Only English publications were considered. Studies which did not contain original data, or simulation studies, were excluded. If multiple publications were based on the same study population, the most extensive study in terms of data reported was chosen.
Articles identified from the search were loaded into RefWorks (2016, ProQuest LLC) and duplicates were removed. Titles/abstracts, followed by full text, were reviewed based on predefined criteria and a final set of eligible studies was selected.
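As an illustration of what "could be calculated" means for the four outcomes, the following minimal sketch derives CDR, recall rate, sensitivity and specificity from per-modality counts. The class name, field names and all numbers are hypothetical and are not taken from any included study; individual studies may define recall and test positivity differently.

```python
from dataclasses import dataclass


@dataclass
class ModalityCounts:
    """Counts for one imaging modality in one study (hypothetical layout)."""
    screened: int        # women examined
    recalled: int        # women recalled for further assessment
    true_positive: int   # cancers detected by the test
    false_negative: int  # cancers missed by the test
    false_positive: int  # positive tests in women without cancer
    true_negative: int   # negative tests in women without cancer


def cancer_detection_rate(c: ModalityCounts, per: int = 1000) -> float:
    """Cancers detected per `per` examinations."""
    return per * c.true_positive / c.screened


def recall_rate(c: ModalityCounts) -> float:
    """Proportion of examined women who were recalled."""
    return c.recalled / c.screened


def sensitivity(c: ModalityCounts) -> float:
    """Proportion of women with cancer who tested positive."""
    return c.true_positive / (c.true_positive + c.false_negative)


def specificity(c: ModalityCounts) -> float:
    """Proportion of women without cancer who tested negative."""
    return c.true_negative / (c.true_negative + c.false_positive)


# Hypothetical example values, chosen only so that the counts add up.
example = ModalityCounts(
    screened=10_000, recalled=400,
    true_positive=50, false_negative=10,
    false_positive=350, true_negative=9_590,
)
print(cancer_detection_rate(example), recall_rate(example))
print(sensitivity(example), specificity(example))
```

Sensitivity and specificity need the full two-by-two table, including follow-up to identify missed cancers, which is one reason screening studies often report only CDR and recall rate.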
Data collection process
A predefined form was developed and used to extract information from included studies: type of study (prospective or retrospective), study setting (screening or diagnostic), number of women with dense breasts, inclusion and exclusion criteria, age of women with dense breasts (age of whole study population if not specified for dense breasts), number of screening rounds if applicable, length of follow-up, method of reporting breast density, number of BCs, reading protocol (single or double reading), definitions for recall and for positive test, DBT manufacturer, number of DBT views (one or two), utilisation of additional modalities (DM or none), and reported outcomes for DBT and for DM in women with dense breasts (CDR, recall rate, sensitivity and/or specificity).
Risk of bias and quality appraisal
The quality of included studies was assessed using the QUADAS-2 tool, which was modified to ensure assessment was appropriate for the breast screening or diagnostic context. The domains considered were: patient selection, index test, reference standard, flow and timing, and applicability. This was performed by two reviewers independently and final quality assessment was based on consensus.
Meta-analysis was performed to estimate the relative risk of cancer detection and of recall for DBT versus DM using a random effects model in RevMan 5.3. This analysis was performed separately for screening and diagnostic studies, and also separately for studies comparing two groups of women (unpaired data) and those comparing detection within one group of women (paired data). Subgroup analyses were carried out to examine the effect of the following covariates: modality (whether stand-alone DBT, or DBT with DM), outcome definitions, and reading protocol (single or double-reading). A summary ROC was produced for DBT and DM for sensitivity and specificity where studies reported these outcomes. For computation, it was assumed that all screens were independent, even if there were multiple screens for some patients in some studies. Heterogeneity across studies was quantified with the I² statistic for CDR and recall rate.
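To make the pooling step concrete, here is a minimal sketch of a random effects meta-analysis of relative risks with an I² estimate. It uses the inverse-variance DerSimonian-Laird estimator for two independent groups, which is an assumption made for illustration: it is not necessarily the exact weighting RevMan 5.3 applied in this review, and it does not model paired data. The function names and the study counts are hypothetical.

```python
import math


def log_rr(events_a, total_a, events_b, total_b):
    """Log relative risk (group A vs group B) and its approximate variance."""
    rr = (events_a / total_a) / (events_b / total_b)
    var = 1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    return math.log(rr), var


def pool_random_effects(studies):
    """DerSimonian-Laird pooling of log relative risks.

    `studies` is a list of (events_a, total_a, events_b, total_b) tuples.
    Returns the pooled RR, its 95% CI, tau^2 and I^2 (in percent).
    """
    effects = [log_rr(*s) for s in studies]
    y = [e for e, _ in effects]
    v = [var for _, var in effects]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (tau^2) and I^2, both truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical counts: (cancers with DBT + DM, screens, cancers with DM, screens).
hypothetical_studies = [
    (60, 10_000, 45, 10_000),
    (30, 5_000, 24, 5_200),
    (80, 12_000, 58, 11_500),
]
result = pool_random_effects(hypothetical_studies)
print(result)
```

Pooling on the log scale keeps the confidence interval asymmetric around the RR, as in the intervals reported in this review. Treating every screen as independent, as assumed above, is what allows the simple variance formula in `log_rr`.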
A total of 608 unique studies were eligible for title and abstract screening, and 63 studies were assessed at full-text reading. Sixteen studies [12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30] met our predefined inclusion criteria and were included in the evidence synthesis. The meta-analysis was performed separately for the 5 diagnostic studies and for the 11 screening studies; the latter were examined separately for the 8 screening studies that used two independent study groups and the 3 screening studies that used one study group. Details of study inclusion, with reasons for exclusion, are given in the flow-chart (Fig. 1).
Overview of included studies
Characteristics of the 16 included studies are presented in Additional file 2. Studies differed in terms of study setting, threshold definitions, breast density categorization, reading protocol and whether DBT was used alone or with DM. Among the five diagnostic studies, four studies using DBT and DM reported sensitivity and specificity [12, 18, 21, 30] and one study reported recall rate. It was possible to calculate CDR from three studies which reported sensitivity and specificity [12, 18, 21]. All but two of the 11 screening studies performed one screening round. Nine screening studies reported CDR [16, 19, 20, 23, 24, 25, 26, 27, 29] and nine studies reported recall rate [16, 19, 20, 22, 24, 26, 27, 28, 29].
Quality appraisal of included studies
Comparing tomosynthesis and digital mammography in women with dense breasts in diagnostic settings (N = 5)
Comparing tomosynthesis and digital mammography in women with dense breasts in screening setting
Studies comparing two groups of participants (N = 8)
Studies comparing within the same group of participants (paired data) (N = 3)
These studies used DBT combined with DM [16, 19] or DBT alone, a double-reading protocol and similar definitions for recall. Pooled estimates from three studies showed improved CDR when using DBT with DM compared to DM alone, RR = 1.52, 95% CI 1.08–2.12 (Fig. 3c), with homogeneity across studies (I² = 0%). Using DBT with DM did not reduce recall rate based on two studies (I² = 76%). The pooled RR was 1.12, 95% CI 0.76–1.63 (Fig. 4b). Subgroup analysis was not performed due to the small number of studies.
This systematic review identified 16 studies (5 diagnostic and 11 screening studies) comparing accuracy measures, such as CDR, recall rate, sensitivity and specificity, of DBT and DM in women with dense breasts on mammography. We found that in diagnostic studies, DBT with or without DM improved CDR (RR = 1.12, 95% CI 1.01–1.24) and sensitivity compared to DM alone (84%–89% vs 69%–86%) in women with dense breasts, whereas specificity did not increase when DBT was used (72–93% vs 57–94%). In the screening setting, CDR was improved when using DBT with or without DM, in studies comparing within one study group (RR = 1.52, 95% CI 1.08–2.12) or comparing two study groups of participants (RR = 1.33, 95% CI 1.20–1.47). Recall rate was reduced when using DBT compared to DM alone in screening studies using two study groups (RR = 0.72, 95% CI 0.64–0.80), though heterogeneity across studies was very high (I² = 93%) and partially explained by the two different definitions of outcome.
Almost none of the reviews in the literature comparing DBT with DM in BC screening report separately on women with dense breasts. One review, not restricted to women with dense breasts, reported that DBT with DM increased CDR with a RR of 1.29 (95% CI 1.16–1.43) (Yun et al.), which is comparable to our estimate. We identified only two reviews reporting on women with dense breasts: one was a quantitative rapid review and one was a narrative review without analyses [14, 15]. The rapid review identified eight studies comparing CDR and recall rate of DBT plus DM to DM alone in women with dense breasts but was restricted to screening studies. The rapid review reported a significantly increased CDR when pooling studies comparing within the same group of participants (incremental cancer detection per 1000 screens: 3.9, 95% CI 2.7–5.1) as well as when pooling studies comparing two groups of participants (incremental cancer detection per 1000 screens: 1.4, 95% CI 0.9–2.0). Our results are generally in line with these previous reviews. However, by performing a systematic search and by considering both screening and diagnostic studies, we were able to identify more data sources for the comparison of DBT with DM, and to present data on a broader range of outcomes (sensitivity and specificity as well as CDR and recall rate); our work therefore extends existing reviews. In addition, we conducted quality assessment of the included evidence, which was not done in the other reviews on this issue.
Studies included in our review were heterogeneous in several aspects. Firstly, study populations differed, with some studies including asymptomatic women and others symptomatic or recalled women. Although aiming to investigate the accuracy of DBT and DM in BC screening, some studies included women who were recalled after screening [12, 17, 21, 30]. By doing so, they obtained populations recalled to assessment, which have higher cancer rates than unselected asymptomatic populations. However, the results from the screening and diagnostic settings were generally comparable. Secondly, retrospective studies tended to perform single-reading whereas the prospective studies performed double-reading (reflecting screening practice in various settings), which may increase CDRs in the latter. In the two STORM trials, screen-reading results were based on recall by either reader, making the recall rate of integrated DM and DBT higher than DM in STORM-2 and non-significantly lower in STORM-1. Additionally, all but one screening study used DBT together with DM, while among the five diagnostic studies two used DBT as a stand-alone modality [17, 30].
Another difference among studies was the outcome definitions, which may be contributing to some of the observed heterogeneity. Studies performed in the United States used the BI-RADS system for reporting recall whereas the European studies used a simplified ‘recall or no recall’ reporting for screen-readings. Amongst studies using BI-RADS, different thresholds were used to define recall or positive test. When analysing data for different thresholds, the recall rate in screening studies using two study groups remained significantly lower for DBT compared to DM, but the heterogeneity decreased. Studies defining BI-RADS 0 as recall [22, 27, 29] showed a larger decrease in recall rate than studies using a different definition (BI-RADS 0,3,4,5 or BI-RADS 0,4,5 or where unspecified [26, 28]). Among the four diagnostic studies reporting sensitivity and specificity, one study from the UK used a lower threshold (BI-RADS 3 instead of BI-RADS 4), which may account for more false positives and hence for the lower specificity it reported compared with the other diagnostic studies.
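The effect of the recall definition can be illustrated with a small sketch that applies the three BI-RADS-based definitions named above to one set of screen-reading assessments. The dictionary and function names, and the distribution of assessments, are hypothetical and serve only to show that the same readings yield different recall rates under different definitions.

```python
from collections import Counter

# Recall definitions mentioned in the text, as sets of BI-RADS categories.
RECALL_DEFINITIONS = {
    "BI-RADS 0": {0},
    "BI-RADS 0,4,5": {0, 4, 5},
    "BI-RADS 0,3,4,5": {0, 3, 4, 5},
}


def recall_rate_for(assessments, recall_categories):
    """Proportion of assessments whose category counts as a recall."""
    counts = Counter(assessments)
    recalled = sum(n for cat, n in counts.items() if cat in recall_categories)
    return recalled / len(assessments)


# Hypothetical BI-RADS assessments for 1000 screening examinations.
assessments = [1] * 700 + [2] * 200 + [0] * 60 + [3] * 25 + [4] * 10 + [5] * 5

rates = {
    name: recall_rate_for(assessments, categories)
    for name, categories in RECALL_DEFINITIONS.items()
}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

Because broader definitions count more examinations as recalls, pooling studies that use different definitions is one plausible source of the heterogeneity discussed here.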
The main limitation of the data used in our analysis is that all but two screening studies [20, 23] used only a single screening round (likely to be the first round) for DBT. When only first DBT screening rounds are used [19, 24, 26, 27, 28, 29], more prevalent cases are usually detected, increasing CDR and potentially exaggerating the contribution of DBT to screen-detection measures. Another limitation is the short temporal perspective in all these studies: because of the recent introduction of DBT, the lack of long follow-up makes it impossible to assess whether the improved CDR and sensitivity of DBT screening further reduces BC mortality compared to screening with DM alone. The retrospective studies had one major limitation in that they used two study groups (DM group versus DBT group) which were not randomly assigned to screening modalities. The study groups were from different time periods (or services), and in the DBT implementation phase, or due to limited DBT availability, there may have been selection to DBT screening and hence potential bias. In those studies, DBT groups were more likely to include women with dense breasts and a family history, indicating a higher cancer rate. The incremental value of DBT in those studies may therefore be partially due to selection bias. Finally, in order to be able to compute the results, we made an assumption of independent screens, which might not hold in the few studies that included more than one screening round [20, 25] or where study populations might overlap [16, 19]. However, estimates from those studies were similar to the other studies, thus we do not foresee that this assumption affected our reported findings.
We performed a systematic review to summarize current evidence on the use of DBT in BC screening and diagnosis specifically in women with dense breasts on mammography. We identified and systematically examined data for women with dense breasts from 16 eligible studies to report the most extensive review so far on the accuracy of DBT in women with dense breasts. Moreover, this is the first review assessing the quality of evidence and bias in the studies on DBT in women with dense breasts.
We found that in both the screening and diagnostic settings, DBT improved CDR (versus DM) in women with dense breasts. In the diagnostic setting, using DBT with or without DM increased sensitivity but did not change specificity. There was a significant reduction in recall rate when using DBT with DM (versus DM) in retrospective screening studies comparing between two study groups, although heterogeneity across studies was relatively high. A small number of prospective studies conducted in organized screening programs did not show reduced recall from using DBT. Improved CDR and reduced recall rate from DBT may imply a more effective screening test or diagnostic work-up for women with dense breasts. However, the critical issue is that more studies with longer follow-up and more screening rounds are necessary to draw definite conclusions on whether this improvement in cancer detection has an impact on interval cancer rates and potentially on BC mortality.
We acknowledge A.M. Daszczuk for her contribution to article reviewing and data extraction for this work.
The work had no specific funding. N. Houssami receives support through a National Breast Cancer Foundation (NBCF Australia) Breast Cancer Research Leadership Fellowship.
Availability of data and materials
The authors declare that all the data supporting the findings of this study are available within the article and its supplementary information files.
XAP: methodology, validation, formal analysis, investigation, data curation, writing-original draft, writing-reviewing and editing, project administration. AT: investigation, writing-reviewing and editing. NH: methodology, validation, investigation, writing-reviewing and editing. MJWG: conceptualization, methodology, validation, investigation, writing-reviewing and editing. GHdB: conceptualization, methodology, validation, formal analysis, investigation, data curation, writing-original draft, writing-reviewing and editing, project administration, supervision. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 5. D’Orsi CJ, Mendelson EB, Ikeda DM, et al. Breast imaging reporting and data system: ACR BI-RADS-breast imaging atlas. Reston: American College of Radiology; 2003.
- 6. D’Orsi CJ, Sickles EA, Mendelson EB, et al. ACR BI-RADS atlas, breast imaging reporting and data system. Reston: American College of Radiology; 2013.
- 7. Kerlikowske K, Zhu W, Hubbard RA, Geller B, Dittus K, Braithwaite D, Wernli KJ, Miglioretti DL, O'Meara ES. Breast Cancer Surveillance Consortium. Outcomes of screening mammography by frequency, breast density, and postmenopausal hormone therapy. JAMA Intern Med. 2013;173:807–16.
- 8. Sprague BL, Gangnon RE, Burt V, Trentham-Dietz A, Hampton JM, Wellman RD, Kerlikowske K, Miglioretti DL. Prevalence of mammographically dense breasts in the United States. J Natl Cancer Inst. 2014;106. https://doi.org/10.1093/jnci/dju255.
- 12. Shin SU, Chang JM, Bae MS, Lee SH, Cho N, Seo M, Kim WH, Moon WK. Comparative evaluation of average glandular dose and breast cancer detection between single-view digital breast tomosynthesis (DBT) plus single-view digital mammography (DM) and two-view DM: correlation with breast thickness and density. Eur Radiol. 2015;25:1–8.
- 13. Skaane P, Bandos AI, Gullien R, Eben EB, Ekseth U, Haakenaasen U, Izadi M, Jebsen IN, Jahr G, Krager M, Niklason LT, Hofvind S, Gur D. Comparison of digital mammography alone and digital mammography plus tomosynthesis in a population-based screening program. Radiology. 2013;267:47–56.
- 16. Bernardi D, Macaskill P, Pellegrini M, Valentini M, Fantò C, Ostillio L, Tuttobene P, Luparia A, Houssami N. Breast cancer screening with tomosynthesis (3D mammography) with acquired or synthetic 2D mammography compared with 2D mammography alone (STORM-2): a population-based prospective study. Lancet Oncol. 2016;17:1105–13.
- 17. Carbonaro LA, Di Leo G, Clauser P, Trimboli RM, Verardi N, Fedeli MP, Girometti R, Tafà A, Bruscoli P, Saguatti G, Bazzocchi M, Sardanelli F. Impact on the recall rate of digital breast tomosynthesis as an adjunct to digital mammography in the screening setting. A double reading experience and review of the literature. Eur J Radiol. 2016;85:808–14.
- 18. Chae EY, Kim HH, Cha JH, Shin HJ, Choi WJ. Detection and characterization of breast lesions in a selective diagnostic population: diagnostic accuracy study for comparison between one-view digital breast tomosynthesis and two-view full-field digital mammography. Br J Radiol. 2016;89:20150743.
- 19. Ciatto S, Houssami N, Bernardi D, Caumo F, Pellegrini M, Brunelli S, Tuttobene P, Bricolo P, Fantò C, Valentini M, Montemezzi S, Macaskill P. Integration of 3D digital mammography with tomosynthesis for population breast-cancer screening (STORM): a prospective comparison study. Lancet Oncol. 2013;14:583–9.
- 20. Conant EF, Beaber EF, Sprague BL, Herschorn SD, Weaver DL, Onega T, Tosteson AN, McCarthy AM, Poplack SP, Haas JS, Armstrong K, Schnall MD, Barlow WE. Breast cancer screening using tomosynthesis in combination with digital mammography compared to digital mammography alone: a cohort study within the PROSPR consortium. Breast Cancer Res Treat. 2016;156:109–16.
- 21. Gilbert FJ, Tucker L, Gillan MG, Willsher P, Cooke J, Duncan KA, Michell MJ, Dobson HM, Lim YY, Purushothaman H, Strudley C, Astley SM, Morrish O, Young KC, Duffy SW. The TOMMY trial: a comparison of TOMosynthesis with digital MammographY in the UK NHS breast screening Programme--a multicentre retrospective reading study comparing the diagnostic performance of digital breast tomosynthesis and digital mammography with digital mammography alone. Health Technol Assess. 2015;19(i-xxv):1–136.
- 23. Lang K, Andersson I, Rosso A, Tingberg A, Timberg P, Zackrisson S. Performance of one-view breast tomosynthesis as a stand-alone breast cancer screening modality: results from the Malmo breast Tomosynthesis screening trial, a population-based study. Eur Radiol. 2016;26:184–90.
- 28. Sharpe RE Jr, Venkataraman S, Phillips J, Dialani V, Fein-Zachary VJ, Prakash S, Slanetz PJ, Mehta TS. Increased Cancer detection rate and variations in the recall rate resulting from implementation of 3D digital breast Tomosynthesis into a population-based screening program. Radiology. 2016;278:698–706.
- 30. Waldherr C, Cerny P, Altermatt HJ, Berclaz G, Ciriolo M, Buser K, Sonnenschein MJ. Value of one-view breast tomosynthesis versus two-view mammography in diagnostic workup of women with clinical signs and symptoms and in women recalled from screening. AJR Am J Roentgenol. 2013;200:226–31.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.