Introduction

Digital breast tomosynthesis (DBT) has the potential to replace or complement digital mammography (DM) in breast cancer screening. Before implementing a new breast cancer screening method, false positives should be analyzed to further understand the consequences of the new method. The majority of recalls in breast cancer screening are false positives, and they often cause psychosocial distress and may lead to less re-attendance [1,2,3,4]. The risk of breast cancer is higher in women after a false-positive mammography screening compared to a true negative screening [5, 6]. Prospective European trials have shown both higher and lower false-positive recall rates in screening with DBT compared to DM [7,8,9,10], whereas retrospective trials from the USA have shown lower false-positive recall rates compared to DM [11].

Information about false-positive recall characteristics in screening with DBT, other than rates, is limited. A study investigating finding types with combined DBT/DM compared to DM only, leading to true-positive and false-positive examinations, found that in both groups, the mammographic appearance of asymmetry led to most false-positive examinations, followed by calcifications [12]. An analysis of the false-positive recalls in the Oslo Breast Tomosynthesis Screening Trial showed that the lower false-positive rate with DBT plus DM compared to DM was due to fewer asymmetric densities [13]. Similar results were shown in the randomized controlled To-Be trial [14]. In the first half of the Malmö Breast Tomosynthesis Screening Trial (MBTST), comparing one-view DBT to two-view DM, the false-positive rate was higher for DBT alone compared to DM alone, mainly due to a higher recall of stellate distortions. The false-positive recall rate was lower over time, indicating a learning curve [15]. Results from the whole trial are important for comparison with other trials. Also, information on false positives recalled with DBT only after an initial stabilization in the false-positive recall rate will add further insight into the early clinical experiences of DBT in screening.

In this study, we evaluated the total number of false-positive recalls including radiographic appearances and false-positive biopsies in the MBTST. We also analyzed the false-positive recall rates and appearances the first trial year compared to trial years 2 to 5.

Materials and methods

The Malmö Breast Tomosynthesis Screening Trial (MBTST)

The MBTST is a prospective, population-based one-arm screening trial comparing one-view DBT (mediolateral oblique view, no synthetic images) with two-view DM (craniocaudal and mediolateral oblique views) in breast cancer screening (ClinicalTrials.gov: NCT01091545). A random sample of women in the screening population in Malmö, Sweden, age 40 to 74 years old, were invited through letter between January 27, 2010, and February 13, 2015. In total 21,691, women were invited to participate. Exclusion criteria were pregnancy and non-Swedish non-English-speakers. Participating women gave written informed consent. The trial was approved by the local ethics committee at Lund University.

Participating women had one-view DBT with a wide-angle (50°) system (Mammomat Inspiration, Siemens Healthineers) and two-view DM, acquired with the same machine, at one screening occasion. In total, seven radiologists, with 2 to 41 years of breast radiology experience, were readers in the trial. The images were read independently in two separate reading groups, the DBT reading group and the DM reading group, by two radiologists in each group. The reading procedure has been described in detail elsewhere [9].

False-positive recalls

False-positive recalls were defined as recalled women who were not diagnosed with breast cancer at screening work-up. Women with breast cancer were identified through cross-linkage with the Cancer Registry South. The false-positive recalls were divided into three separate groups based on the reading arm/s where the finding leading to recall was observed: DBT group (i.e., recalled only in the DBT reading arm), DM group (i.e., recalled only in the DM reading arm), and DBT + DM reading (i.e., recalled both in the DBT reading arm and the DM reading arm). Recalled women underwent work-up according to local routine, commonly including further imaging (typically DM and ultrasound) and if needed, fine needle aspiration and/or core needle biopsy. At the time of the trial, fine needle aspiration was still used in the routine work-up of suspicious findings at our institution.

Variables of interest

Radiographic appearance of the dominant imaging finding leading to recall was assessed through the report from the consensus meeting or the report from the initial work-up through the Radiology Information System, divided into the following categories: stellate distortion (the finding includes distortions and a density with stellate configuration), circumscribed mass, indistinct density, architectural distortion (parenchymal disorganization without stellate configuration), focal asymmetry, calcifications, and other (such as nipple retraction or skin thickening). Self-reported breast symptoms, with no imaging findings, could also be reason for recall. If there was no clear description of the appearance in the reports, the images were retrospectively reviewed by three or more members in a panel consisting of three breast radiologists, one radiology resident, and two medical students, who categorized the appearance in consensus. At least one of the panel members was a breast radiologist. Categorization of false positives recalled in both reading groups was performed based on the appearance at DBT if there was a discrepancy between the modalities. The outcome of the work-up was divided into the following categories: normal breast tissue, benign cyst, benign calcifications, fibroadenoma, benign lesion not otherwise specified, radial scar, surgical scar tissue, and other (such as skin lesions). The outcome was primarily based on the result of biopsy, retrieved through pathology reports. If no biopsy was performed, the outcome was based on the description of the outcome in the imaging report or, if not clearly stated in the report, by the panel in consensus. Surgery was defined as any surgical procedure, such as open surgical biopsy and breast-conserving surgery.

The work-up time was defined as the time period between the screening examination and until breast cancer was ruled out, which could include one or several visits to the breast imaging unit and/or one or several visits to the breast surgery clinic. The number of imaging exams, i.e., the number of DM, ultrasound, DBT, and/or magnetic resonance exams, during work-up was retrieved through the Radiology Information System. All women were also followed until the next scheduled screening examination, i.e., 18 to 24 months.

False-positive recall rates, appearances, and outcomes were compared between the reading groups. The same parameters were also analyzed the first trial year (trial year 1) compared to trial years 2 to 5 in the DBT reading group and the DM total group (DM reading group and DBT + DM group).

Some women diagnosed with the high-risk lesion lobular carcinoma in situ go through breast surgery. They are not considered false-positive lesions in this study, but are presented for data completeness.

Statistical analyses

Descriptive methods (numbers and percentages) were used to analyze and present data in the reading groups. The false-positive recall rate was defined as the number of false-positive recalls per 100 screened women (%) and the false-positive biopsy rate as the number of biopsies (fine needle aspiration and core needle biopsy) per 100 false-positive recalls (%) and were calculated with 95% confidence intervals (CI). The subgroups were not compared other than with numbers and percentages because of small sample sizes. Since a large proportion of women in the DBT + DM group were recalled due to symptoms and not imaging findings, the focus was on differences between the DBT group, i.e., the additional false positives, and the DM group.

Results

In total, 14,848 women participated in the MBTST and 660 women were recalled for work-up. One woman was excluded from the analysis due to lymphoma, one woman due to known distant metastases from breast cancer at screening, and three due to declining work-up. There were 137 women with breast cancer and 514 false-positive recalls (Fig. 1). Mean age at screening in women with false-positive recalls was 53 years (standard deviation ± 9.7). The false-positive recall rate was 3.5% (514/14,848, 95% CI: 3.3; 3.8) in total, 1.6% (244/14,848, 95% CI: 1.4; 1.8) in the DBT group, 0.8% (121/14,848, 95% CI: 0.7;1.0) in the DM group, and 1.0% (149/14,848, 95% CI: 0.9; 1.1) in the DBT + DM group (Table 1). The false-positive recall rate in the DBT group was higher during trial year 1, 2.6% (38/1480, 95% CI: 1.8; 3.5) and then stabilizing at 1.5% (206/13,368, 95% CI: 1.3; 1.8). The false-positive recall rate in the DM group varied between 0.5 and 1% throughout the trial (Fig. 2).

Fig. 1
figure 1

Flowchart of population and design in the Malmö Breast Tomosynthesis Screening Trial. DBT, digital breast tomosynthesis; DM, digital mammography; LCIS, lobular carcinoma in situ

Table 1 False positives in the Malmö Breast Tomosynthesis Screening Trial. False-positive recalls and characteristics in different reading groups. Numbers within brackets are percentages if not otherwise stated. DBT digital breast tomosynthesis, DM digital mammography, CI confidence interval, NOS not otherwise specified
Fig. 2
figure 2

False-positive recall rate over time in the Malmö Breast Tomosynthesis Screening Trial for digital breast tomosynthesis (DBT) reading group only, digital mammography (DM) reading group only, and recalls in both reading groups (DBT + DM). FP, false-positive recalls

The most common radiographic appearance of false-positive recalls in the DBT group was a stellate distortion, 37.3% (91/244), whereas the most common radiographic appearance in the DM group was a circumscribed mass, 29.8% (36/121). In the DBT + DM group, the most common reason for recall was symptoms, 38.3% (57/149). Normal breast tissue was the dominant work-up outcome in both the DBT group, 57.0% (139/244) and in the DM group, 50.4% (61/121) (Table 1). The outcome of stellate distortions showed even higher proportions of normal breast tissue, 76.9% (70/91) in the DBT group and 96.6% (28/29) in the DM group (Table 2). Two image examples of false-positive recalls are shown in Figs. 3 and 4.

Table 2 Outcome of false-positive recalls with the radiographic appearance of stellate distortion in different reading groups in the Malmö Breast Tomosynthesis Screening Trial. Numbers within brackets are percentages. DBT digital breast tomosynthesis, DM digital mammography
Fig. 3
figure 3

A 49-year-old woman was recalled in the DBT group due to a circumscribed mass (circle). One-view digital breast tomosynthesis (a) and mediolateral oblique (b) and craniocaudal (c) views at digital mammography at screening. Ultrasound (d) showed a benign cyst confirmed by fine needle aspiration

Fig. 4
figure 4

A 48-year-old woman who was recalled in the DBT group due to a stellate distortion (circle). One-view digital breast tomosynthesis (a) and mediolateral oblique (b) and craniocaudal (c) views at digital mammography and at screening. The false-positive finding enlarged (d). Ultrasound at work-up (e) showed a diffuse irregular lesion (arrows). Core needle biopsy confirmed a radial scar

The false-positive biopsy rate was similar in the DBT group, 29.5% (72/244, 95% CI: 23.9; 35.7), and the DM group, 31.4% (38/121, 95% CI: 23.3; 40.5) and higher in the DBT + DM reading group, 61.7% (92/149, 95% CI: 53.4; 69.6). The false-positive core needle biopsy rate was also similar in the DBT group, 6.1% (15/244, 95% CI: 3.5; 9.9), and the DM group, 4.1% (5/121, 95% CI: 1.4; 9.4), but higher in the DBT + DM group, 12.1% (18/149, 95% CI: 7.3; 18.4). In the DBT group, 26.4% of lesions leading to biopsy were stellate compared to no stellate distortions leading to biopsy in the DM group. The most common outcome of biopsy was a benign cyst in all three groups. There were 12 radial scars as outcome of biopsy in the DBT group, but no radial scars in the DM group. The work-up time was longer in the DBT group, median 48 days, compared to 29 and 31 days in the DM group and in the DBT + DM group, respectively. In total, 11 of the false-positive recalled women underwent surgery (Table 3).

Table 3 False-positive biopsies, work-up time, and imaging work-up in the Malmö Breast Tomosynthesis Screening Trial. Numbers within brackets are percentages if not otherwise stated. DBT digital breast tomosynthesis, DM digital mammography, CI confidence interval, NOS not otherwise specified

The most common radiographic appearance leading to a false-positive recall in the DBT group during trial year 1 was a stellate distortion, 50% (19/38). During trial years 2 to 5, the proportion of stellate distortions was lower, 35.0% (72/206). The false-positive biopsy rate in the DBT group was lower during trial year 1, 16% (6/38, 95% CI: 6; 31) than during trial years 2 to 5, 32.0% (66/206, 95% CI: 25.7; 38.9) (Table 4).

Table 4 False-positive recalls in the Malmö Breast Tomosynthesis Screening Trial in trial year 1 and trial years 2 to 5 in the digital breast tomosynthesis (DBT) group and in the total digital mammography (DM) group (DM group and DMT + DM group). Numbers within brackets are percentages if not otherwise stated. CI confidence interval, NOS not otherwise specified

Lobular carcinoma in situ

There were in total four women (mean age 55 years at screening, standard deviation ± 16) with lobular carcinoma in situ detected in the MBTST; two in the DBT group and two in the DBT + DM group. Three were recalled due to stellate distortions and one was recalled due to symptoms. All four women had surgery.

Discussion

The false-positive recall rate in the prospective population-based Malmö Breast Tomosynthesis Screening Trial (MBTST), with 14,848 participating women, was higher in screening with one-view digital breast tomosynthesis (DBT) only, 1.6%, compared to screening with digital mammography (DM) only, 0.8%. The radiographic appearance of stellate distortion was more common with DBT, 37.3%, compared with DM 24.0%. The higher false-positive recall rate in the DBT group during trial year 1, 2.6%, was stabilized at 1.5% during trial years 2 to 5, mainly due to a lower proportion of stellate distortions over time. Of lesions leading to biopsy, 26.4% were stellate in the DBT group compared to non-stellate distortion leading to biopsy in the DM group. There were 12 radial scars diagnosed with DBT and none with DM, but apart from that, the outcome of the work-up was similar in the two modalities. Our results indicate a small increase in false-positive recalls and increased detection of stellate distortions when introducing DBT in screening. The decrease in proportion of stellate distortions over time could indicate a learning curve.

In the STORM trial, comparing DM and DBT to DM only, the overall false-positive recall rate was 5.5% (395/7 292) [16]. It showed a lower false-positive recall rate with DBT + DM compared to DM only, 1.0% (73/7 292) and 2.0% (141/7 292), respectively. In the STORM-2 trial, DBT and DM, DBT and synthetic DM, and DM only were compared in screening [7]. The false-positive recall rate was significantly higher for DBT + DM, 4.0% (381/9587) and DBT + synthetic DM, 4.5% (427/9587) compared to DM alone, 3.4% (328/9587). The Oslo Tomosynthesis Screening Trial had four reading arms, where reading arm A + B represented DM (DM and DM + computer-aided detection) and reading arm C + D represented DBT (DBT and DBT + synthetic mammogram) [8]. It showed a post-consensus false-positive recall rate at 3.2% (768/24,301) in reading arms C + D and 2.1% in the A + B reading arms, hence a slightly higher false-positive recall rate in DBT screening, as in our study. The randomized controlled To-Be trial, comparing DBT + synthesized DM to standard DM, had false-positive recall rates of 2.4% (349/14,380) and 3.4% (484/14,369), respectively [14]. The false-positive recall rates in the MBTST are in general low compared to these other trials. However, the designs in all trials are different from ours, hampering the comparison of rates between studies. All these trials, including the MBTST, show results from prevalence screening rounds with no previous DBT screening examinations for comparison. A retrospective study from the USA showed a lower false-positive recall rate with DBT + DM compared to DM in the three first DBT rounds but no difference in rounds 4 and 5 [17]. The breast cancer screening strategy in the USA is however different from the European, with for example recommendations of yearly mammography screening from the age of 40 in women at average risk [18], and the results cannot be applied directly to Europe. Increased experience, in combination with access to prior DBT examinations, could decrease the false-positive recall rates in the future.

Few studies describe the radiographic appearance of false positives in DBT screening. The To-Be trial has shown that the most common radiographic appearance of false-positive recalls with DBT was asymmetry, 28.9%, which is different from our results where only 0.4% showed focal asymmetry. Spiculated masses were uncommon, only 0.6%, whereas stellate distortions were very common in our study, 36.9% [14]. Asymmetries were also most frequent in recalls with DBT in combination with DM in a retrospective study by Kim et al (75.9%) [12]. The inconsistent results between those two studies and our trial are likely to be due to different definitions of appearances, various study populations, screening ages, and screening intervals and that the examinations were performed on DBT machines from different vendors with different acquisition angles and other technical specifications. Regardless, the distribution of lesion types that radiologists will assess in DBT screening will differ from DM screening. The proportions of calcifications in DBT false-positive recalls in the mentioned DBT screening trials were however similar, around 10%, probably because the presence of calcifications is less subjective to classify than other appearances. To the best of our knowledge, no other trial has described the distribution of false-positive recalls and appearances over time, and therefore further studies are warranted to investigate how the false-positive recall rate with DBT evolves over time.

This study has limitations. The recall decision was made after a consensus meeting where all images were available. The appearances were not clearly described in the reports in a few cases and were retrospectively reviewed, meaning that the true reason for recall may differ from the one seen retrospectively. Not all findings leading to recall were biopsied which give an uncertainty of the true outcome, but we know that only one of the false-positive recalled women was diagnosed with cancer within a minimum of 2 years of follow-up. Some women were recalled due to self-reported symptoms, which reflects the screening situation. There was no access to DBT-guided biopsy at the breast imaging unit during the trial which, at least in part, could explain the longer work-up time in the DBT group and this could also have affected the biopsy rate. The results cannot necessarily be generalized to other screening settings as the MBTST was performed in a Swedish screening setting at a single-center using a wide-angle, single-vendor DBT machine.

To conclude, the false-positive recall rate in a digital breast tomosynthesis screening trial was higher compared to digital mammography, but still low compared to other digital breast tomosynthesis screening trials. The higher false-positive recall rate with digital breast tomosynthesis was mainly due to an increased recall of stellate findings; the proportion of these findings was reduced after the first trial year. Studies on false-positive recalls and false-positive appearances in subsequent screening rounds are needed to learn how access to prior digital breast tomosynthesis examinations can affect false positives in tomosynthesis screening.