Radiological review of prior screening mammograms of screen-detected breast cancer

Objective To perform a radiological review of mammograms from prior screening and diagnosis of screen-detected breast cancer in BreastScreen Norway, a population-based screening program.
Methods We performed a consensus-based informed review of mammograms from prior screening and diagnosis for screen-detected breast cancers. Mammographic density and findings on screening and diagnostic mammograms were classified according to the Breast Imaging-Reporting and Data System®. Cases were classified based on visible findings on prior screening mammograms as true (no findings), missed (obvious findings), minimal signs (minor/non-specific findings), or occult (no findings at diagnosis). Histopathologic tumor characteristics were extracted from the Cancer Registry of Norway. The Bonferroni correction was used to adjust for multiple testing; p < 0.001 was considered statistically significant.
Results The study included mammograms for 1225 women with screen-detected breast cancer. Mean age was 62 years ± 5 (SD); 46% (567/1225) were classified as true, 22% (266/1225) as missed, and 32% (392/1225) as minimal signs. No difference in mammographic density was observed between the classification categories. At diagnosis, 59% (336/567) of true and 70% (185/266) of missed cancers were classified as masses (p = 0.004). The percentage of histological grade 3 cancers was higher for true (30% (138/469)) than for missed (14% (33/234)) cancers (p < 0.001). Estrogen receptor positivity was observed in 86% (387/469) of true and 95% (215/234) of missed (p < 0.001) cancers.
Conclusions We classified 22% of the screen-detected cancers as missed based on a review of prior screening mammograms with diagnostic images available. One main goal of the study was quality improvement of radiologists’ performance and the program. Visible findings on prior screening mammograms were not necessarily indicative of screening failure.
Key Points
• After a consensus-based informed review, 46% of screen-detected breast cancers were classified as true, 22% as missed, and 32% as minimal signs.
• Less favorable prognostic and predictive tumor characteristics were observed in true screen-detected breast cancer compared with missed.
• The most frequent mammographic finding for all classification categories at the time of diagnosis was mass, while the most frequent mammographic finding on prior screening mammograms was a mass for missed cancers and asymmetry for minimal signs.


Introduction
Breast cancer is diagnosed among screening participants as screen-detected breast cancer or interval breast cancer (breast cancer diagnosed between two scheduled screening rounds after a negative screening episode). When obvious mammographic findings corresponding to the location of the tumor are visible on prior screening mammograms, and in retrospect should have resulted in a recall, the cancer may be defined as missed. In published studies, up to 50% of interval and screen-detected breast cancers show visible findings on prior screening mammograms, ranging from minor benign-looking findings to obviously missed cancers [1-8]. Breast cancer can be missed at screening due to misperception (the lesion is not perceived by the radiologist) or misinterpretation (the lesion is detected by the radiologist but not considered suspicious enough to warrant a recall, either by the reading radiologist or at a consensus meeting). Further, unsatisfactory image quality, positioning, or inadequate assessment at recall may cause a cancer to be missed [9,10]. Mammographic density may also impact the rates of missed breast cancer due to the masking effect of dense breast tissue and overlapping structures [11].
Radiologic reviews may be useful for quality assurance and quality improvement of both the program and the radiologists. European guidelines [12] and the BreastScreen Norway quality manual [13] recommend continuous surveillance and regular review of screening mammograms performed prior to diagnosis of interval breast cancer. Further, in the National Health Service Breast Screening Programme (UK), radiologists are obliged to audit mammograms of interval breast cancer [14,15]. The use of audits is also discussed in other countries, and we expect this topic to receive more attention in the future [16]. However, both the information available to the reviewers and the number of reviewers affect the results of a review [3,4]. Further, whether a commitment to inform the women will affect the results of a review or an audit is debatable.
Several review studies of interval breast cancer have been performed [8,17-20]. Larger review studies of screen-detected breast cancer, particularly including digital mammography (DM), are, to our knowledge, sparse. Screen-detected breast cancer with no visible findings on prior screening mammograms, defined as true cases, may grow faster than missed breast cancer. Thus, different histopathological characteristics and different mammographic findings are anticipated for true versus missed screen-detected breast cancers.
We conducted a nationwide consensus-based, informed review within BreastScreen Norway. The study included prior screening mammograms and mammograms available at diagnosis from women diagnosed with screen-detected breast cancer. The overall aim of the study was quality improvement for radiologists' performance and the program as such. The objectives were to investigate the proportions of true and missed screen-detected breast cancers and to explore whether mammographic findings, density, or histopathological characteristics differed between the two groups. We hypothesized that these three aspects differed between true and missed screen-detected breast cancers.

Materials and methods
The study was approved by the data protection official at the Cancer Registry of Norway (CRN) (PVO approval number: 2016/4696), and the local breast centers agreed to the study. The Cancer Registry Regulation waived the requirement to obtain written informed consent for use of screening data for quality assurance and research [21]. We received de-identified data for analyses from the CRN.
BreastScreen Norway offers women aged 50-69 biennial screening with two-view standard DM. The screening exams take place at 27 stationary or mobile units. Screen reading is performed at 16 breast centers and includes independent double reading by breast radiologists. The radiologists score each breast on a 5-point scale; 1 indicates negative findings, and 5 indicates a high suspicion for malignancy. Exams scored ≥ 2 by either radiologist are discussed in a consensus meeting to decide whether to recall the woman [22]. The median annual reading volume for radiologists during 1996-2016 was 4492 exams; 46% of the radiologists had over 10 years of screen reading experience [23].
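The triage rule described above (independent double reading, with any score of 2 or higher sending the exam to consensus) can be written as a minimal sketch. The function name and the input validation are illustrative assumptions and are not taken from the program's software.

```python
def needs_consensus(score_reader_1: int, score_reader_2: int) -> bool:
    """Return True if an exam should be discussed at a consensus meeting.

    Each radiologist scores the breast on a 5-point scale
    (1 = negative, 5 = high suspicion of malignancy). A score of 2 or
    higher from either reader sends the exam to consensus.
    The function name is hypothetical.
    """
    for score in (score_reader_1, score_reader_2):
        if score not in range(1, 6):
            raise ValueError(f"score must be between 1 and 5, got {score}")
    return score_reader_1 >= 2 or score_reader_2 >= 2


print(needs_consensus(1, 1))
print(needs_consensus(1, 3))
```

Note that the rule only decides whether an exam is discussed; the decision to recall is made at the consensus meeting itself.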

Materials and review logistics
CRN randomly extracted 85 screen-detected breast cancer cases from each of the 16 breast centers. All examinations were performed with DM during 2006-2015, and all women had a screening exam with DM 2 years previously (prior screen). We aimed to review at least 75 cases at each center within an 8-h session, including instruction time. Panels of 5 breast radiologists performed the review from September 2016 to April 2017. Radiologists not participating in the panel could observe the review session.
We performed a consensus-based, fully informed retrospective radiological review. All screening and diagnostic images were available to the reviewers, including ultrasound and MRI, as well as histopathological reports. The breast centers were divided into 8 pairs; two radiologists from each center reviewed each other's images and made up the consensus panel together with one independent radiologist, the first author (T.H.). To qualify for the panel, we required the radiologists to have at least 1 year of experience in screen reading and a reading volume of ≥ 5000 mammograms during the past 2 years.
To ensure consistency across the centers in the review procedures, classifications, registration, and coding of results, T.H. took part in all reviews together with a representative from CRN. Ahead of each session, T.H. presented the classification systems and general instructions for the review. In the case of dissent among the panel members, a majority decision was made. All images were reviewed locally from the picture archiving and communication system (PACS).

Review procedure
The review procedure is described in Fig. 1. First, we reviewed the mammograms that resulted in recall and diagnosis of screen-detected breast cancer and classified mammographic density using the Breast Imaging-Reporting and Data System (BI-RADS) 5th edition categories a-d [24]. We identified the malignancy and classified mammographic findings as mass, calcifications, asymmetry, distortion, or associated findings using the BI-RADS lexicon. If calcifications were present alongside another finding, the non-calcification finding was preferred for classification, unless calcifications were the dominant finding. We classified the largest tumor in case of multifocality or bilateral disease. We measured the diameter (mm) of the findings on the mammogram, using electronic calipers. If no malignancy was visible at the time of diagnosis, the case was classified as occult.
Thereafter, we reviewed and classified the finding, based on its visibility on prior screening mammograms. True cancers showed no findings at the eventual cancer site on prior screening mammograms (Fig. 2a, b). Cancers with obvious findings at the cancer site on priors which retrospectively should have resulted in a recall, as considered by the reviewing radiologists, were defined as missed (Fig. 2c, d).
Minimal sign cancers showed minor findings on prior mammograms, not necessarily warranting assessment (Fig. 2e, f). At review, minimal signs were classified as either actionable (recall considered possible, but not expected within a screening program) or non-actionable (non-specific findings, recall not considered possible). However, we consider all minimal signs as one category in the main analyses. Finally, we classified mammographic findings on prior mammograms for missed and minimal sign cancers.
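The category definitions above can be summarized as a small decision function. This is a sketch only: in the study the classification was a consensus judgment by the review panel, and the enum, its values, and the function name are hypothetical labels introduced for illustration.

```python
from enum import Enum


class PriorVisibility(Enum):
    """Panel judgment of the cancer site on the prior screening mammogram."""
    NONE = "no findings"
    MINOR = "minor or non-specific findings"
    OBVIOUS = "obvious findings that should have led to recall"


def classify_case(visible_at_diagnosis: bool, prior: PriorVisibility) -> str:
    """Map the panel's judgments onto the review classification categories."""
    if not visible_at_diagnosis:
        # No malignancy visible on the mammograms at diagnosis
        return "occult"
    if prior is PriorVisibility.NONE:
        return "true"
    if prior is PriorVisibility.OBVIOUS:
        return "missed"
    # Minor findings; actionable and non-actionable are pooled in the main analyses
    return "minimal signs"


print(classify_case(True, PriorVisibility.MINOR))
```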
Histopathological information extracted from the CRN database was merged with data from the review and communicated to the radiologists after complete classification of each case. Prognostic characteristics included histological type (ductal carcinoma in situ (DCIS), invasive carcinoma of no special type (NST), invasive lobular carcinoma (ILC), other invasive carcinomas) and, for invasive cancers, also histological grade, histopathological tumor diameter (mm), and lymph node status. Predictive tumor characteristics for invasive cancers included estrogen receptor (ER) and progesterone receptor (PR) status.

Statistical analyses
We performed descriptive analyses of age at diagnosis, review classification categories, mammographic findings, mammographic density, and histopathological characteristics. Data were presented as percentages with 95% confidence intervals (CIs), calculated using the Clopper-Pearson mid-P interval; means ± standard deviations (SDs); and medians with interquartile ranges (IQRs). Chi-square tests and independent sample t tests were used to test the differences between review classification categories and mammographic findings or histopathological characteristics, as well as between mammographic findings and histopathological characteristics. The Bonferroni correction was used to adjust for multiple testing, and a p value < 0.001 was considered statistically significant. IBM SPSS Statistics (version 25) was used for all analyses.
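As an illustration of the testing approach, the sketch below applies a Pearson chi-square test to the counts of masses at diagnosis given in the abstract (336/567 true, 185/266 missed) and compares the result with a Bonferroni-adjusted threshold. It is not the SPSS analysis used in the study. Two assumptions are made: no continuity correction is applied, and the number of comparisons (50) is a hypothetical value chosen only because 0.05/50 equals the stated threshold of 0.001; the paper does not report the number of tests.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Table layout: [[a, b], [c, d]]. With 1 degree of freedom the
    p value is erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return statistic, erfc(sqrt(statistic / 2))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Masses at diagnosis, counts from the abstract
true_mass, true_total = 336, 567
missed_mass, missed_total = 185, 266

statistic, p_value = chi_square_2x2(
    true_mass, true_total - true_mass,
    missed_mass, missed_total - missed_mass,
)

# ASSUMPTION: 50 comparisons is hypothetical; the paper states only the threshold
threshold = bonferroni_threshold(0.05, 50)
significant = p_value < threshold

print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}, significant: {significant}")
```

Under these assumptions the difference in the proportion of masses does not pass the adjusted threshold, which is in line with the reported p = 0.004 not being treated as statistically significant.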

Results
We reviewed and classified mammograms from 1227 women screened with DM, recalled due to mammographic findings and diagnosed with breast cancer. We excluded two mammographically occult cases. The final study sample thus consisted of 1225 cases.
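The proportions in each review category (567, 266, and 392 of 1225 cases, as given in the abstract) can be paired with binomial confidence intervals. The sketch below assumes that the "Clopper-Pearson mid-P interval" named in the statistical analyses refers to the mid-P variant of the exact binomial interval, and solves for its limits by bisection. It is not the SPSS routine used in the study, and the function names are illustrative.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability, computed in log space to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def mid_p_tail(x: int, n: int, p: float, upper: bool) -> float:
    """Tail probability beyond x plus half the probability of x itself."""
    ks = range(x + 1, n + 1) if upper else range(0, x)
    return sum(binom_pmf(k, n, p) for k in ks) + 0.5 * binom_pmf(x, n, p)


def mid_p_interval(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided mid-P interval for a binomial proportion (0 < x < n only)."""
    if not 0 < x < n:
        raise ValueError("this sketch handles 0 < x < n only")

    def solve(upper: bool) -> float:
        lo, hi = 1e-12, 1 - 1e-12
        for _ in range(40):
            mid = (lo + hi) / 2
            tail = mid_p_tail(x, n, mid, upper)
            # The upper tail grows with p and the lower tail shrinks with p,
            # so the side to discard depends on which tail is being solved.
            if (tail < alpha / 2) == upper:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # The upper tail determines the lower limit, and vice versa
    return solve(True), solve(False)


n_cases = 1225
for label, count in [("true", 567), ("missed", 266), ("minimal signs", 392)]:
    lower, upper = mid_p_interval(count, n_cases)
    print(f"{label}: {count / n_cases:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```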
Eighty-two percent (247/302) of cases classified as asymmetries on prior mammograms were classified as masses at diagnosis. Ninety-nine percent (132/134) of the cases classified as masses at priors were classified as masses at diagnosis; 79% (112/142) of calcifications on priors also presented as such at diagnosis, and 18% (26/142) of calcifications presented as a mass. Among cancers classified as distortions on priors, 51% (40/78) were classified as distortions at diagnosis and 46% (36/78) as a mass (Fig. 3).
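The percentages above follow directly from the stated counts. The short check below recomputes each one; the tuple layout and variable names are illustrative.

```python
# (finding on prior, finding at diagnosis, cases, cases with that prior finding, reported %)
transitions = [
    ("asymmetry", "mass", 247, 302, 82),
    ("mass", "mass", 132, 134, 99),
    ("calcifications", "calcifications", 112, 142, 79),
    ("calcifications", "mass", 26, 142, 18),
    ("distortion", "distortion", 40, 78, 51),
    ("distortion", "mass", 36, 78, 46),
]

# Recompute each percentage from its counts, rounded to the nearest integer
computed = [round(100 * count / total) for _, _, count, total, _ in transitions]

for (prior, at_diagnosis, count, total, reported), pct in zip(transitions, computed):
    print(f"{prior} -> {at_diagnosis}: {count}/{total} = {pct}% (reported {reported}%)")
```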
We did not observe any differences in mammographic density between classification groups; the percentage of BI-RADS a + b was 66% (372/567) for true, 67% (178/266) for missed, and 66% (260/392) for minimal signs (Table 1, Fig. 4a). However, the percentages of calcifications and distortions were statistically significantly higher in mammograms classified with high (BI-RADS c + d) mammographic density compared with low (BI-RADS a + b), both at diagnosis (Fig. 4b) and at prior screening (Fig. 4c).
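The density comparison can be illustrated with a Pearson chi-square test on the 3 x 2 table of review category by density group. This is a sketch under two assumptions: the high-density counts are derived by subtraction from the category totals, and density is collapsed into two groups. The published analysis may have used a different categorization, so the result need not reproduce the p value reported with Fig. 4a.

```python
from math import exp


def chi_square_statistic(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Low density (BI-RADS a + b) counts and category totals from the text
low_density = {"true": 372, "missed": 178, "minimal signs": 260}
totals = {"true": 567, "missed": 266, "minimal signs": 392}

# ASSUMPTION: high density (BI-RADS c + d) = category total minus low density
table = [[low_density[k], totals[k] - low_density[k]] for k in low_density]

statistic = chi_square_statistic(table)
# With (3 - 1) * (2 - 1) = 2 degrees of freedom, the chi-square
# survival function has the closed form exp(-statistic / 2).
p_value = exp(-statistic / 2)

print(f"chi-square = {statistic:.3f}, df = 2, p = {p_value:.2f}")
```

A p value this far above any conventional threshold agrees with the conclusion that density did not differ between the categories.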
For masses, the percentage of histological grade 3 invasive cancer was higher for true than for missed and minimal sign screen-detected breast cancer (p < 0.001 for true compared with both missed and minimal signs); otherwise, we observed no differences in histopathological tumor characteristics stratified by review classification categories and mammographic findings (Table 4). Among minimal sign cancers, we observed no differences in mammographic findings, histologic tumor type, diameter, or grade between actionable and non-actionable tumors (Table 5 in the Appendix).

Discussion
In this informed, consensus-based review of mammograms from prior screening and diagnosis of 1225 women with screen-detected cancer, radiologists classified 46% as true, 22% as missed, and 32% as minimal signs. The most frequent mammographic finding at diagnosis was a mass for all classification categories; no statistically significant differences were observed between the classification categories regarding mammographic findings at diagnosis. At prior screening, the most frequent mammographic finding for missed cancer was a mass, whereas for minimal sign cancer, it was asymmetry. The majority of asymmetries at prior screening progressed into masses by the time of diagnosis. Mammographic density did not differ between the review classification categories. True invasive cancers were more often histological grade 3 and had less favorable hormonal status than missed and minimal sign invasive cancers.
Fig. 3 Mammographic findings on prior screening mammograms of missed and minimal sign cancers, stratified by mammographic findings on mammograms at diagnosis of screen-detected breast cancer
Fig. 4 a Distribution of review classification groups based on findings on prior screening mammograms (true, missed, or minimal signs) stratified by the BI-RADS density score (low: BI-RADS a + b; high: BI-RADS c + d) (p = 0.88). b Distribution of mammographic findings at diagnosis (mass, calcifications, asymmetry, or distortion) stratified by the BI-RADS density score (p < 0.001). c Distribution of mammographic findings on prior screening mammograms stratified by the BI-RADS density score (p < 0.001)
Our findings support results from other retrospective, informed review studies of screen-detected breast cancer. In a study by Ikeda et al [8], findings at the later cancer site, obvious or non-specific, were observed retrospectively in 67% of screen-detected breast cancers. Van Breest Smallenburg et al [9] found 21% of screen-detected breast cancer to be missed and 22% with non-specific minimal signs at informed review, and Broeders et al [7] identified findings on prior screening mammograms in 53% of screen-detected cancers. However, all three studies included screen-film mammography (SFM).
In an informed review from BreastScreen Norway, 12% of screen-detected breast cancers were classified as missed and 9% minimal sign actionable for screening with SFM [6] and 10% missed and 9% minimal sign actionable for screening with DM [1]. However, the review procedures and definition of classification groups differed between the studies, making comparison challenging. In experimental review studies of interval breast cancer exploring different study designs, the percentage of cancers classified as missed varied widely depending on review procedure and number of radiologists. Hofvind et al [3] found the percentage of missed interval breast cancer ranging from 1% (mixed, blinded individual review) to 34% (informed, consensus-based) and 36% (informed individual). In a study by Ciatto et al [4], the proportion of missed (screening error) cancers varied from 24% in a simulated blinded review to 42% in a simulated fully informed review. Further, studies have demonstrated that the proportion of missed cancer is affected by how close the study setting is to a normal screening setting [25,26]. Easily understandable, standardized definitions and recommendations on classification groups and review procedure are needed to enable future comparison of results from reviews. True cancers include tumors detected at an early stage, reflected by the higher percentage of DCIS among true versus missed cancers. However, true cancers may also be fast-growing tumors, with less favorable tumor characteristics [27,28]. This is illustrated in our study by the larger percentage of histological grade 3 invasive cancer among true compared with missed and minimal sign cancers. [29][30][31].
Table note: Unless otherwise specified, data are the number of patients with 95% confidence intervals in parentheses. DCIS, ductal carcinoma in situ; NST, no special type; ILC, invasive lobular carcinoma; ER, estrogen receptor; PR, progesterone receptor; SD, standard deviation; IQR, interquartile range. *p < 0.001, compared with calcifications; **p < 0.001, compared with distortion
A mass was the most frequent finding on prior mammograms of missed cancers, and a special emphasis on masses at screening might be reasonable. Masses might be misinterpreted as benign, in particular if retrospectively visible on more than one prior screening exam or if not having spiculated margins [18,32]. Further, the mean diameter of mammographic findings of missed cancer was 11 mm, which is usually regarded as above the limit for visual perception. This could indicate that a certain proportion of cancers was missed due to misinterpretation at screen reading or dismissed at consensus. The high frequency of asymmetries on priors of missed and minimal signs developing into masses at the screening exam leading to diagnosis of cancer is in line with other studies; increased awareness of asymmetries may be useful to reduce the burden of missed cancers at screening [32,33]. However, asymmetries are common and most often represent glandular tissue, in particular if visible only in one view. Thus, radiologists should be attentive but should avoid an unreasonable increase in the recall rate for such findings. Evaluating more than one prior screening exam may be valuable in this respect. Moreover, a recent study showed that increasing the recall rate mainly increased detection of low-grade and not high-grade cancer [34]. This is consistent with our results demonstrating a higher proportion of tumors of low and intermediate histological grade among missed/minimal signs than true; an increased recall rate would probably reduce the proportion of missed and minimal signs.
Our study has some limitations. First, the review was consensus-based and all images were available; this design yields the highest percentage of missed cancers in review studies [3-5]. This limits the external generalizability of our results. Second, our study was performed at 16 breast centers with images spanning approximately 10 years and with major heterogeneity in the combinations of PACS, workstations, and mammographic equipment. As a result, the image quality and presentation differed between centers, which might have influenced the assessment of review classifications, mammographic findings, and density. Third, the consensus panel included five radiologists: one who participated in all the reviews and four who participated only in one session at their own center and one at the paired center. Although we presented and communicated the general instructions and classification systems to all radiologists at the start of each review, some differences in interpretation and assessment between radiologists are likely to have occurred. However, we included mammograms and radiologists from all breast centers in Norway, and our study is, as far as we are aware, the largest reported in peer-reviewed journals, which we consider a strength. Finally, we assessed mammographic density from the screening mammograms at diagnosis. This might have biased the association between review categories (classified from prior screening mammograms 2 years earlier) and mammographic density, as the women's breast density might have decreased during the 2 years.
The study confirmed our hypothesis that mammographic findings and histopathologic characteristics differed between true and missed screen-detected breast cancers in BreastScreen Norway. However, we did not identify differences in mammographic density between review classification categories. One main goal of the classification of missed cancers was quality improvement for radiologists' performance and the program. We would like to emphasize that the review and study setting differed substantially from real-life screening settings. Visible findings on priors were not necessarily indicative of a screening failure. Recalling all women with subtle findings would increase the rate of false positive recalls and probably also the detection of small, low proliferation tumors (overdiagnosis). This is important to keep in mind during audits and when exploring medicolegal aspects of mammographic screening.
waiver of informed consent to perform surveillance, quality assurance, and studies based on data collected as a part of invitation to and participation in the program
Ethical approval Institutional review board approval was obtained. Data was collected from BreastScreen Norway and thus covered by the Cancer Registry Regulation. Data was approved by the institutional Data Protection Officer at the Cancer Registry and the Heads of Department and/or research administration at the local breast centers.
Study subjects or cohorts overlap Some study subjects or cohorts are reported in the study of Tsuruda K, et al: Survival among women diagnosed with breast cancer retrospectively classified as true, missed, or minimal signs through radiological review. Will be submitted to European Radiology, March 2020.

Methodology
• retrospective
• observational
• multicenter study

Appendix
Table note: Unless otherwise specified, data are the number of patients with 95% confidence intervals in parentheses. DCIS, ductal carcinoma in situ; NST, no special type; ILC, invasive lobular carcinoma; SD, standard deviation; IQR, interquartile range. *p = 0.002, compared with minimal sign actionable

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.