Screen-detected and interval breast cancer after concordant and discordant interpretations in a population based screening program using independent double reading

Objectives To analyze rates, odds ratios (OR), and characteristics of screen-detected and interval cancers after concordant and discordant initial interpretations and consensus in a population-based screening program. Methods Data were extracted from the Cancer Registry of Norway for 487,118 women who participated in BreastScreen Norway, 2006–2017, with 2 years of follow-up. All mammograms were independently interpreted by two radiologists, using a score from 1 (negative) to 5 (high suspicion of cancer). A score of 2+ by only one of the two radiologists was defined as discordant and 2+ by both radiologists as concordant positive. Consensus was performed on all discordant and concordant positive examinations, with a decision to recall for further assessment or to dismiss. ORs were estimated using logistic regression with 95% confidence intervals (CIs), and histopathological tumor characteristics were analyzed for screen-detected and interval cancers. Results Among screen-detected cancers, 23.0% (697/3024) had discordant scores, while 12.8% (117/911) of the interval cancers were dismissed at index screening. The adjusted OR was 2.4 (95% CI: 1.9–2.9) for interval cancer and 2.8 (95% CI: 2.5–3.2) for subsequent screen-detected cancer for women dismissed at consensus compared to women with concordant negative scores. We found 3.4% (4/117) of the interval cancers diagnosed after being dismissed to be DCIS, compared to 20.3% (12/59) of those with a false-positive result after index screening. Conclusion Twenty-three percent of the screen-detected cancers were scored negative by one of the two radiologists. Higher odds of interval and subsequent screen-detected cancer were observed among women dismissed at consensus compared to women with concordant negative scores. Our findings indicate a benefit of personalized follow-up.
Key Points
• In this study of 487,118 women participating in a screening program using independent double reading with consensus, 23% of screen-detected cancers were detected by only one of the two radiologists.
• The adjusted odds ratio for interval cancer was 2.4 (95% confidence interval: 1.9, 2.9) for cases dismissed at consensus using concordant negative interpretations as the reference.
• Interval cancers diagnosed after being dismissed at consensus or after concordant negative scores had clinically less favorable prognostic tumor characteristics compared to those diagnosed after false-positive results.
Supplementary Information The online version contains supplementary material available at 10.1007/s00330-022-08711-9.


Introduction
Mammographic screening has been shown to reduce mortality from breast cancer and is recommended by international health organizations [1,2]. However, the identification of asymptomatic breast cancers presenting as subtle mammographic findings is challenging, with 20-25% of interval cancers reported to be visible on prior mammograms in informed reviews [3]. Studies from Europe have shown that double reading increases the rate of screen-detected cancer [4]. Compared with single reading, the recall rate has been shown to be higher for double reading without a consensus or arbitration meeting [5], but lower if double reading is followed by a consensus or arbitration meeting [6]. European guidelines and the European Commission Initiative on Breast Cancer suggest double reading with consensus or arbitration, but do not specify whether double reading should be independent [1,7].
Women with false-positive screening results in double-reading programs have been shown to have an increased risk of interval cancer and of cancer detected in the consecutive screening round [8]. However, less is known about the risk of interval cancer among women with screening examinations discussed and dismissed at consensus, as well as about the prognostic characteristics of such tumors. Two studies have reported a higher interval cancer rate after being dismissed at consensus compared to those with concordant negative screening results [9,10]. To examine this, we obtained data collected as a part of BreastScreen Norway, which provides detailed information about the radiologists' interpretation at both initial screening and consensus, as well as the screening outcome. In this study, we aimed to analyze the odds of screen-detected, interval, and subsequent screen-detected cancer by initial interpretation scores and consensus. Furthermore, we described differences in histopathologic tumor characteristics by screening and consensus interpretations.

Materials and methods
This retrospective cohort study was approved by the data protection official at Oslo University Hospital (20/12601).
The data were disclosed with legal basis in the Norwegian Cancer Registry Regulations of 21 December 2001 No. 47 [11].
BreastScreen Norway is a population-based screening program which started in 1996 and invites all women aged 50-69 to biennial two-view mammography. The program is described in detail elsewhere [12]. Briefly, the Cancer Registry of Norway administers the program and collects information about screening examinations, recalls, diagnostic work-ups, treatment, and surveillance. Digital mammography replaced screen-film mammography gradually from 2004, and all women have been offered digital mammography since 2011. During the first 20 years of the screening program, the annual participation rate in the screening program was 75%, the consensus rate 7%, and the recall rate 3.8%. The rate of screen-detected cancer was 5.9 per 1000 screening examinations and the interval cancer rate 1.8 per 1000 examinations.
Independent double reading with consensus is standard practice in BreastScreen Norway. Each breast is assigned a score from 1 to 5 by each radiologist, where 1 indicates normal findings; 2 probably benign; 3 intermediate suspicion; 4 probably malignant; and 5 high suspicion of malignancy. If both radiologists give a score of 1, the screening examination is considered negative. If either radiologist assigns a score of 2 or higher for one or both breasts, the examination is discussed at consensus to determine whether to recall the woman for further assessment (recall) or not (dismiss). If the two radiologists do not reach consensus, a third is consulted. Examinations dismissed at consensus are considered screen-negative. We defined discordant interpretation as a score of 1 by one of the two radiologists and 2 or higher by the other. A score of 2 or higher by both radiologists was defined as concordant positive, while a score of 1 by both radiologists was considered concordant negative. During the study period, 2006-2019, 196 radiologists were registered as readers in the program. The median of the radiologists' average annual number of interpretations was 2992 examinations (interquartile range (IQR): 1357-5327).
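The reading rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the program's registration software; the function names are hypothetical, and taking the higher of the two per-breast scores as each reader's examination-level score is our assumption based on the rule that a score of 2 or higher for one or both breasts triggers consensus.

```python
def exam_score(left_breast: int, right_breast: int) -> int:
    """Examination-level score for one reader (assumption: the higher breast score)."""
    for score in (left_breast, right_breast):
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"score must be 1-5, got {score}")
    return max(left_breast, right_breast)


def initial_category(reader1: int, reader2: int) -> str:
    """Classify an examination from two independent examination-level scores."""
    for score in (reader1, reader2):
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"score must be 1-5, got {score}")
    # Count how many of the two readers gave a positive score (2 or higher).
    n_positive = int(reader1 >= 2) + int(reader2 >= 2)
    return ("concordant negative", "discordant", "concordant positive")[n_positive]


def needs_consensus(reader1: int, reader2: int) -> bool:
    """Discordant and concordant positive examinations are discussed at consensus."""
    return initial_category(reader1, reader2) != "concordant negative"
```

For example, `initial_category(exam_score(1, 3), exam_score(1, 1))` gives a discordant examination that would be discussed at consensus.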
The study sample included women without a history of breast cancer, screened with standard digital mammography within the study period. To ensure availability of prior digital mammograms for comparison at the time of interpretation, the study period started 2 years after implementation of digital mammography at the 17 centers in BreastScreen Norway. The women were followed for 2 years after index screen to identify interval and screen-detected cancers in the consecutive screening round. Index screenings were performed in 2006-2017, while subsequent screenings were performed in 2008-2019 (Fig. 1). Index screenings included women who had their first screening (prevalent) and women with a previous screening (incident) in BreastScreen Norway (Appendix, Figure 1 and 2). We excluded mammograms that were technically inadequate (n = 495), those with registration error or no independent double reading (n = 1018), and those performed among women with self-reported symptoms (n = 1850).
A screen-detected cancer was defined as breast cancer (ductal carcinoma in situ (DCIS) or invasive breast cancer) diagnosed after a recall. If a recall was concluded negative within 6 months after screening, the screening result was considered false positive. Interval cancer was defined as breast cancer detected after a negative screening result or more than 6 months after a false-positive screening result and within 24 months after screening [8]. For women diagnosed with two or more bilateral synchronous breast tumors, we included the interpretation scores from the breast with the highest score.
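As an illustration of how these outcome definitions combine, the sketch below assigns an index screening result and a cancer outcome to one examination. It is a simplified reading of the definitions above, not the registry's actual algorithm: the function name, the use of months as the time unit, and the `at_subsequent_screen` flag (how a cancer found at the next screening round is identified) are assumptions made for the example.

```python
from typing import Optional, Tuple

FALSE_POSITIVE_WINDOW_MONTHS = 6   # recall concluded negative within 6 months
FOLLOW_UP_MONTHS = 24              # follow-up after index screening


def classify_outcome(recalled: bool,
                     months_to_diagnosis: Optional[float],
                     at_subsequent_screen: bool = False) -> Tuple[str, str]:
    """Return (index screening result, cancer outcome) for one examination.

    months_to_diagnosis is None when no breast cancer was diagnosed.
    """
    has_cancer = (months_to_diagnosis is not None
                  and months_to_diagnosis <= FOLLOW_UP_MONTHS)

    # Cancer diagnosed after a recall, within the assessment window.
    if recalled and has_cancer and months_to_diagnosis <= FALSE_POSITIVE_WINDOW_MONTHS:
        return ("recalled", "screen-detected cancer")

    # Any other recall was concluded negative; no recall means screen-negative
    # (this includes examinations dismissed at consensus).
    index_result = "false positive" if recalled else "negative"

    if not has_cancer:
        return (index_result, "no cancer in follow-up")
    if at_subsequent_screen:
        return (index_result, "subsequent screen-detected cancer")
    return (index_result, "interval cancer")
```

Under these assumptions, a recalled woman diagnosed 14 months after index screening, outside the next screening round, is counted as an interval cancer after a false-positive result.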
Histopathologic tumor characteristics were based on surgical specimens and included histologic type (DCIS, invasive carcinoma no special type, invasive lobular carcinoma, and other types of invasive carcinomas), tumor diameter (mm), histologic grade (grade 1-3), and lymph node involvement. Immunohistochemical subtypes were

Statistical analysis
All analyses were conducted at the woman level rather than at the breast level to ensure clinically applicable results. We stratified the index screening examinations by negative, discordant, and concordant positive scores and further into dismissed and recalled at consensus. We used logistic regression to estimate odds of index screen-detected, interval, and incident screen-detected cancer. Results were presented as ORs with 95% confidence intervals (CIs), adjusted for age and prevalent/incident screenings. The chi-square or Fisher exact test was used to test associations between categorical variables (tumor characteristics) and discordant and concordant positive scores, or negative, dismissed, and false-positive screening results. We used a nonparametric test for comparing tumor diameters. A significance level of 0.05 was chosen, and all statistical analyses were performed with Stata MP Version 17.0 (StataCorp).
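To make the OR and its confidence interval concrete, the following sketch computes an unadjusted OR with a Wald 95% CI from a 2x2 table. This corresponds to a logistic regression with a single binary covariate and does not reproduce the adjustment for age and prevalent/incident screening used in the study, which was run in Stata. The function name and the counts in the usage lines are hypothetical and are not study data.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    a: comparison group with cancer      b: comparison group without cancer
    c: reference group with cancer       d: reference group without cancer
    z: normal quantile (1.96 for an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not taken from the study).
or_est, ci_low, ci_high = odds_ratio_wald(30, 970, 15, 985)
print(f"OR {or_est:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```

An interval that excludes 1 corresponds to a difference that is significant at the 0.05 level under this approximation.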

Discussion
We found that nearly a quarter (23%) of screen-detected cancers were scored negative by one of the two interpreting radiologists in an organized screening program using independent double reading with consensus (Figs. 3, 4, and 5). Examinations discussed and dismissed at consensus had higher odds of interval and subsequent screen-detected cancer compared to concordant negative examinations. Histopathological results indicate that interval cancers diagnosed after being dismissed at consensus or after concordant negative scores had less favorable prognostic histopathologic tumor characteristics compared to those diagnosed after a false-positive screening result.
Our results showing higher odds of interval cancer after being dismissed at consensus are in line with previous studies [9,10]. A study from the UK reported the interval cancer rate after dismissal at consensus to range from 6.1 to 7.7/1000 screening examinations, while results from a Norwegian study ranged from 2.9 to 3.1/1000. For comparison, the rates for women with negative screening results were 2.9/1000 screening examinations in the UK and 1.7/1000 in Norway.
A lower proportion of discordant screen-detected cancers was observed among prevalent (20.0%) versus incident (25.2%) screened women. This was also observed in a previous study from BreastScreen Norway, using mainly analog mammograms [10]. Screen-detected cancers among incident screened women have been associated with a smaller proportion of advanced breast cancer compared to first-time, prevalent screened women [14]. However, histopathological tumor size has been reported to be similar among prevalent and incident screenings [12]. Future studies focusing on comparing tumor characteristics between these two groups would help fill this knowledge gap.
In this study, 7.4% of all screening examinations were discussed at consensus due to discordant scores and 75.4% of these were dismissed at consensus. We found that 10.6% of interval cancers and 12.1% of subsequent screen-detected cancers were discordant cases discussed and dismissed at consensus. In other words, 340 (1.3%) out of the 27,008 women with dismissed examinations were diagnosed with breast cancer within 2 years. Using a 1-year follow-up strategy for discordant cases dismissed at consensus may be one strategy for lowering the interval cancer rate and increasing the rate of screen-detected cancer. However, this may also increase the recall and false-positive screening rates and increase the workload for radiologists. Use of tomosynthesis represents a possible strategy due to the higher rate of screen-detected cancers [15][16][17]. However, there are variable results on recall, interval cancer, and reading time compared to standard digital mammography. Formal cost-effectiveness analyses would help weigh the benefits versus costs of such approaches. Another strategy could be use of artificial intelligence (AI). AI has the potential to increase the accuracy of screening interpretations and reduce the radiologists' workload, costs, and subjectivity of the interpretation. Studies introducing AI in the reading process have shown promising results with some studies reporting performance at the level of radiologists [18,19]. However, so far, the evidence is scarce due to small study populations, enhanced data sets often used to train the models, and lack of prospective studies [20,21].
Table 4 Tumor characteristics of subsequent screen-detected cancer, stratified by negative index screening, dismissed at index screening, and false-positive screening results in BreastScreen Norway
Our findings of prognostically favorable histopathological tumor characteristics for discordant versus concordant positive screen-detected cancers are consistent with other studies [5,6]. Among interval cancers, the proportion of invasive cancers was higher after dismissed and concordant negative examinations than after false-positive screening results. Although the differences were not statistically significant, the higher proportion with lymph node involvement and the lower proportions of histological grade 1 invasive cancers and of the Luminal A-like immunohistochemical subtype among dismissed and concordant negative examinations indicate less favorable prognostic characteristics compared to cancers detected after false-positive screening.
High completeness of the data and detailed information about the radiologists' interpretation scores represent strengths of this study. However, despite a large study population, some subgroups had few cancer cases, which limited the statistical power of those analyses. Using woman-level rather than breast-level analyses reflects the clinical approach, but at the cost of accuracy, as some of the cancers might have been in the breast other than the one given a positive score at index screening. Further, some features that resulted in a positive score at index screening might not be the same as those later diagnosed as cancer, even though they appeared in the same breast. A previous retrospective review of screening mammograms in BreastScreen Norway identified that 42.9% of interval cancers diagnosed after a false-positive screening were recalled due to the same mammographic finding [8]. Further, the scoring system used in BreastScreen Norway represents a modified version of BI-RADS [22]. A score of 1 in the Norwegian system corresponds to BI-RADS 1 and 2, scores 2, 3, 4, and 5 are analogous to BI-RADS 3, 4a-b, 4c, and 5, respectively, while BI-RADS 0 and 6 do not apply. We do not consider these differences to affect the generalizability of our study.
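For readers who work with BI-RADS-coded data, the correspondence described above can be written as a simple lookup. This is a minimal sketch of the mapping as stated in the text; the names are hypothetical, and reading "4a-b" as the two categories 4a and 4b is our interpretation.

```python
from typing import Dict, Tuple

# Norwegian interpretation score -> analogous BI-RADS categories, as described
# in the text. BI-RADS 0 and 6 have no counterpart in the Norwegian system.
NORWEGIAN_TO_BIRADS: Dict[int, Tuple[str, ...]] = {
    1: ("1", "2"),
    2: ("3",),
    3: ("4a", "4b"),   # "4a-b" read as categories 4a and 4b (interpretation)
    4: ("4c",),
    5: ("5",),
}


def birads_equivalents(score: int) -> Tuple[str, ...]:
    """Return the BI-RADS categories analogous to a Norwegian score of 1-5."""
    if score not in NORWEGIAN_TO_BIRADS:
        raise ValueError(f"score must be 1-5, got {score}")
    return NORWEGIAN_TO_BIRADS[score]
```

Because several BI-RADS categories can collapse into one Norwegian score, the mapping is one-to-many and cannot be inverted without loss for BI-RADS 0 and 6.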
In conclusion, 23% of screen-detected cancers were detected by only one of the two radiologists. The odds of interval and subsequent screen-detected cancer were 2-3 times higher for women with examinations discussed but dismissed at consensus at index screening compared to those with concordant negative scores. Adding a screening examination 1 year after being dismissed at consensus, or using AI in screen reading and at the time of consensus, are potential strategies that may be considered for the purpose of reducing interval cancers.

Declarations
Guarantor The scientific guarantor of this publication is Solveig Hofvind.

Conflict of Interest Solveig Hofvind is the head of BreastScreen Norway. The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and Biometry Two of the authors, Silje Sagstad and Marthe Larsen, have significant statistical expertise.
Informed Consent Written informed consent was not required for this study because the Cancer Registry of Norway's Regulations waive the requirement for informed consent for surveillance and quality assurance projects based on data collected as a part of invitation to and/or participation in BreastScreen Norway.
Ethical Approval Approval was obtained from the Oslo University Hospital data protection official for research (20/12601).
Study subjects or cohorts overlap This is a study including information from women who participated in BreastScreen Norway 2006-2019. These women are included in several other publications from BreastScreen Norway, but the specific study population used in this study has not been used in any other publication.
Fig. 5 The craniocaudal and mediolateral oblique mammograms of the right breast at index (a and b) and subsequent screening (c and d) from a 54-year-old woman diagnosed with subsequent screen-detected cancer after false-positive index screening. The examination was characterized as a one-plane asymmetry in the craniocaudal view at index screening. At subsequent screening, a circumscribed mass in the upper medial quadrant (arrow) and a smaller mass, located more lateral and inferior (arrowhead), were both histologically verified as cancers

Methodology
• retrospective
• cohort study
• multicenter study
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.