
Prognostic value of histopathological DCIS features in a large-scale international interrater reliability study



For optimal management of ductal carcinoma in situ (DCIS), reproducible histopathological assessment is essential to distinguish low-risk from high-risk DCIS. Therefore, we analyzed interrater reliability of histopathological DCIS features and assessed their associations with subsequent ipsilateral invasive breast cancer (iIBC) risk.


Using a case-cohort design, we assessed reliability with Krippendorff’s alpha (KA) and Gwet’s AC2 (GAC2) in a population-based, nationwide cohort of 2767 women with screen-detected DCIS diagnosed between 1993 and 2004 and treated by breast-conserving surgery with or without radiotherapy (BCS ± RT). Thirty-eight raters scored histopathological DCIS features including grade (2-tiered and 3-tiered), growth pattern, mitotic activity, periductal fibrosis, and lymphocytic infiltrate in 342 women. Using majority opinion-based scores for each feature, we assessed their association with subsequent iIBC risk using Cox regression.


Interrater reliability of grade using various classifications was fair to moderate, and only substantial for grade 1 versus 2 + 3 when using GAC2 (0.78). Reliability for growth pattern (KA 0.44, GAC2 0.78), calcifications (KA 0.49, GAC2 0.70) and necrosis (KA 0.47, GAC2 0.70) was moderate using KA and substantial using GAC2; for (type of) periductal fibrosis and lymphocytic infiltrate, fair to moderate estimates were found, and for mitotic activity reliability was substantial using GAC2 (0.70). Only in patients treated with BCS alone (without RT) was high mitotic activity associated with a higher iIBC risk in univariable analysis (Hazard Ratio (HR) 2.53, 95% Confidence Interval (95% CI) 1.05–6.11); in these patients, grade 3 versus 1 + 2 (HR 2.64, 95% CI 1.35–5.14) and a cribriform/solid versus flat epithelial atypia/clinging/(micro)papillary growth pattern (HR 3.70, 95% CI 1.34–10.23) were independently associated with a higher iIBC risk.


Using majority opinion-based scores, DCIS grade, growth pattern, and mitotic activity are associated with iIBC risk in patients treated with BCS alone, but interrater variability is substantial. Semi-quantitative grading, incorporating and separately evaluating nuclear pleomorphism, growth pattern, and mitotic activity, may improve the reliability and prognostic value of these features.


Ductal carcinoma in situ (DCIS) of the breast is a non-obligate precursor of invasive breast cancer (IBC). Since the introduction of organized population-based breast screening, the incidence of DCIS has increased manyfold [1,2,3]. Although DCIS is almost always treated to avoid progression to IBC, this has not led to a reduced IBC incidence. Breast screening programs are therefore criticized by some for being associated with overdiagnosis and overtreatment of DCIS [4,5,6]. It has been reported that a large proportion of untreated DCIS will not progress to IBC [7, 8]. Ryser et al. reported a 10-year net risk of ipsilateral IBC (iIBC) of 12.2% (95% Confidence Interval (95% CI) 8.6–17.1%) for women with DCIS grade 1/2 and 17.6% (95% CI 12.1–25.2%) for grade 3 [8]. Although based on selected patients, these results underline that at least some DCIS lesions have a low risk of progression and may thus be overtreated. However, reliably distinguishing high- from low-risk DCIS to guide treatment is still challenging.

Many studies have tried to find histopathological markers that could predict progression of DCIS [9, 10]. So far, no single marker has entered clinical practice, because conclusive evidence of predictive ability is lacking. This is partly due to bias-prone study designs, in particular insufficient handling of confounders and poorly described study groups [10]. Grade in particular has been extensively studied as a biomarker for the invasive potential of DCIS. The use of many different grading systems with partly unclear criteria and often only poor to modest interrater reliability makes it difficult to evaluate the role of grade in risk stratification [11,12,13,14,15,16,17,18,19,20,21].

In addition, various studies have assessed reproducibility of histopathological evaluation of DCIS lesions. Unfortunately, these studies were frequently based on highly selected case sets, assessed by expert breast pathologists often after having received instructions or tutorials beforehand and using reference diagnoses without follow-up data [17, 18, 22,23,24,25,26,27,28]. The interpretation of results and evaluation of potential bias is further complicated by inadequate reporting [29].

This study assesses the interrater reliability of various histopathological features in DCIS in a setting which as closely as possible reflects daily practice. We subsequently evaluate whether these features, based on a more robust majority opinion of 38 raters, are associated with risk of development of subsequent iIBC.


Patient selection

We assembled a population-based, nationwide cohort of screen-detected primary and pure DCIS, treated with breast-conserving surgery with or without adjuvant radiotherapy (BCS ± RT) between January 1, 1993 and December 31, 2004, by linkage of data from the Netherlands Cancer Registry (NCR) with data from the Dutch breast cancer screening program [30]. From 1989, the Dutch biennial screening program was gradually introduced, inviting women aged 50–69 years and from 1998 aged 50–75 years. Screen-detected DCIS was defined as DCIS detected within 30 months after a first or subsequent positive screening examination. The cohort was supplemented with data from the nationwide network and registry of histology and cytopathology in the Netherlands (PALGA) [31]. Information on age and date at diagnosis, treatment, and if applicable subsequent iIBC and vital status was provided by the NCR (follow-up data available until January 1, 2011). Patients diagnosed with a prior malignancy, other than non-melanoma skin cancer, were excluded. The review boards of the NCR, PALGA and the Dutch breast cancer screening organization approved this study.

Interrater reliability analysis

We first assessed the interrater reliability of histopathological DCIS features in this cohort using a case-cohort design [32]. From the cohort of 2767 women, we randomly sampled 357 women (subcohort; 13%) and additionally selected all 177 patients who subsequently developed an iIBC but were not included in the random sample, for a total of 534 patients. Figure 1 shows the selection of patients with exclusions at pathology report review (n = 27) and slide review (n = 76). Slide review was performed by EJG on freshly cut slides stained with hematoxylin and eosin and, in case of uncertainty about the in situ nature of the lesion, also with cytokeratin 14 (clone LL002; 1/3200 dilution, 32 min at 37 °C + amplification, Neomarkers/Thermo Scientific).
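As an illustration of the sampling scheme described above, the following minimal Python sketch draws a random subcohort and then adds every case that fell outside it. The function names, the toy cohort, and the seed are assumptions made for illustration and are not the study's code. The weight function is a simplified reading of Barlow-type weighting that assumes a single overall sampling fraction.

```python
import random


def case_cohort_sample(cohort_ids, case_ids, subcohort_size, seed=0):
    """Return (subcohort, extra_cases).

    subcohort   : random sample of the full cohort, drawn regardless of outcome
    extra_cases : all cases (here: later iIBC) that were not drawn into the subcohort
    """
    rng = random.Random(seed)
    subcohort = set(rng.sample(sorted(cohort_ids), subcohort_size))
    extra_cases = set(case_ids) - subcohort
    return subcohort, extra_cases


def barlow_type_weight(in_subcohort, failing_now, sampling_fraction):
    """Simplified weight of one subject in one risk set (assumption: fixed fraction).

    - a case at its own event time counts once (weight 1)
    - a subcohort member who is at risk but not failing stands in for
      1 / sampling_fraction cohort members
    - a case outside the subcohort contributes nothing before its event
    """
    if failing_now:
        return 1.0
    if in_subcohort:
        return 1.0 / sampling_fraction
    return 0.0


# Toy cohort of 1000 hypothetical patients, 50 of whom become cases.
toy_cohort = range(1000)
toy_cases = range(0, 1000, 20)
subcohort, extra_cases = case_cohort_sample(toy_cohort, toy_cases, 130, seed=1)
print(len(subcohort), "subcohort members,", len(extra_cases), "cases added from outside")
```

With the numbers reported in this study, the sampling fraction would be 357/2767 (about 13%), and the analysis set before exclusions is 357 + 177 = 534 patients.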

Fig. 1

Flow diagram for patient selection and exclusions. Subcohort: randomly selected patient group; outside subcohort: patients who developed subsequent ipsilateral invasive breast cancer and were not included in the subcohort; iIBC: ipsilateral invasive breast cancer. a: 2 outside-subcohort patients developed invasive breast cancer after a mastectomy was performed during follow-up for reasons other than iIBC.

For 353 patients the diagnosis of pure DCIS could be confirmed, and from each lesion the single slide with the highest quantity of DCIS was selected. These slides were digitized using an Aperio AT2 scanner (Leica Biosystems) at 20 × magnification and uploaded to an online viewing platform. For each DCIS lesion, a built-in scoring form (see Supplementary methods) contained the items: DCIS present (yes or no), grade (1, 2, or 3), grade (low or high), growth pattern (flat epithelial atypia (FEA), clinging, (micro)papillary, cribriform, or solid), mitotic activity of DCIS (sparse or many mitoses), calcifications (present or absent), necrosis (present or absent), periductal fibrosis (absent, subtle, or prominent) and lymphocytic infiltrate (absent, subtle, or prominent). For each item, a ‘not assessable’ category was also provided. Regarding DCIS growth patterns, there is controversy about whether to consider FEA a subtype of DCIS (clinging, monomorphic type) or not; therefore, this option was included as a possible DCIS growth pattern.

European raters with varying expertise were invited to participate in the study. Each rater was assigned a study set of 146 cases to score independently, blinded to subject information. Raters were not given instructions regarding the (interpretation of) histopathological features and were requested to score as they would in daily practice to provide an unbiased baseline measure of reliability. Further details on rater selection, participation, and the scoring process are described in Supplementary methods.

Statistical analysis

In total, 11 patients were excluded from reliability analysis because > 50% of raters considered their lesion as no DCIS/not assessable (often considering atypical ductal hyperplasia/FEA as alternative diagnosis; n = 5) or > 25% commented on suboptimal slide quality (n = 6). If a rater did not confirm DCIS, any scores that rater gave for the subsequent histopathological features were ignored. Scores for type of fibrosis were only considered when periductal fibrosis was present according to the majority opinion. Raters were excluded from the analysis of a single histopathological feature when they scored that item as ‘not assessable’ in > 50% of their study set.

Krippendorff’s alpha (KA), Gwet’s AC2 (GAC2), and percentage agreement were calculated to assess interrater reliability (‘not assessable’ scores were excluded) [33, 34]. KA and GAC2 are applicable to studies involving nominal/ordinal data and multiple raters scoring different subsets. A weighted analysis using linear weights was used for ordinal variables with > 2 categories. Interpretation was performed according to Landis and Koch [35]. Recategorization of grade, periductal fibrosis, and lymphocytic infiltrate was undertaken during analysis to evaluate reliability using different cut-offs.
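To make the coefficient concrete, the sketch below computes Krippendorff's alpha for nominal data, with several raters and missing scores, from a coincidence matrix, together with a pooled pairwise percentage agreement. It is a simplified illustration and not the analysis code of this study: the study used linear weights for ordinal variables, whereas this version is unweighted; the pooled pairwise definition of percentage agreement is an assumption; and the small data set is a toy example.

```python
from collections import defaultdict
from itertools import combinations, permutations


def krippendorff_alpha_nominal(units):
    """units: one list of scores per case; None marks a missing score."""
    coincidences = defaultdict(float)
    for scores in units:
        values = [s for s in scores if s is not None]
        m = len(values)
        if m < 2:
            continue  # a case scored only once carries no agreement information
        # every ordered pair of scores within a case counts 1 / (m - 1)
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)
    totals = defaultdict(float)
    for (c, _k), count in coincidences.items():
        totals[c] += count
    n = sum(totals.values())
    observed = sum(v for (c, k), v in coincidences.items() if c != k)
    expected = (n * n - sum(t * t for t in totals.values())) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - observed / expected


def percent_agreement(units):
    """Share of agreeing rater pairs, pooled over all cases (an assumed definition)."""
    agree = total = 0
    for scores in units:
        values = [s for s in scores if s is not None]
        for a, b in combinations(values, 2):
            total += 1
            agree += a == b
    return agree / total


# Toy data: three raters, fifteen cases, four categories, many missing scores.
rater_a = [None, None, None, None, None, 3, 4, 1, 2, 1, 1, 3, 3, None, 3]
rater_b = [1, None, 2, 1, 3, 3, 4, 3, None, None, None, None, None, None, None]
rater_c = [None, None, 2, 1, 3, 4, 4, None, 2, 1, 1, 3, 3, None, 4]
units = [list(scores) for scores in zip(rater_a, rater_b, rater_c)]

alpha = krippendorff_alpha_nominal(units)
agreement = percent_agreement(units)
print(f"alpha = {alpha:.3f}, pairwise agreement = {agreement:.2f}")
```

Because each case only needs at least two scores, the same function works when raters score different subsets of cases, which is the situation in this study.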

For the analysis of subsequent iIBC risk, an additional 10 patients were excluded, because > 25% of the raters considered an invasive carcinoma component (mainly microinvasion) to be present adjacent to DCIS (n = 8) or because the patient underwent a mastectomy before developing iIBC (n = 2). For a detailed comparison of clinical characteristics between in- versus excluded patients see Supplementary Table S1.

Associations of histopathological features, treatment, age at diagnosis, and period of diagnosis (1993–1998, reflecting the screening implementation phase, versus 1999–2004, reflecting full nationwide coverage) with risk of iIBC were assessed using Cox models. Analyses were performed irrespective of treatment as well as separately for BCS alone and BCS + RT. Interactions with treatment were also considered. Proportional hazard assumptions (PHA) were tested using residual-based and graphical methods. In case the PHA was violated, a time factor was added, and the associations were estimated for different time periods (i.e., for the first 5 years and after 5 years). For the histopathological features the majority opinion, i.e., the most frequently assigned category, was used in the analysis (‘not assessable’ scores were excluded). In case of equal frequencies, we chose presence of a histopathological feature over absence, the highest grade, the most complex growth pattern (i.e., cribriform/solid), many over sparse mitoses, prominent over subtle periductal fibrosis and lymphocytic infiltrate, and the least common type of fibrosis (i.e., myxoid). Time to iIBC was compared between women with low-grade versus high-grade DCIS and between women treated with BCS + RT versus BCS alone using the median test. Clinicopathological factors were entered in multivariable models including treatment, based on a P value ≤ 0.15 in univariable analyses. Barlow’s inverse probability weights were used to adjust the partial likelihood function for case-cohort analysis with robust variance estimation [32]. Fit of non-nested models was compared using Akaike's and Bayesian information criteria. Two-sided P values ≤ 0.05 were considered statistically significant. All statistical analyses were performed using Stata/SE (version 13.1, StataCorp).
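The majority-opinion rule with its tie-breaking can be sketched in a few lines of Python. This is an illustrative sketch, not the study's implementation: the function name, the category labels, and the priority lists are assumptions chosen to mirror the rules stated above (highest grade, many over sparse mitoses, prominent over subtle over absent).

```python
from collections import Counter


def majority_opinion(scores, tie_priority):
    """Most frequently assigned category among the assessable scores.

    Ties are resolved in favour of the category that appears first in
    tie_priority. Returns None when no assessable score is left.
    """
    counts = Counter(s for s in scores if s != "not assessable")
    if not counts:
        return None
    top = max(counts.values())
    tied = [category for category in counts if counts[category] == top]
    return min(tied, key=tie_priority.index)


# Priority lists (assumed encodings of the tie rules in the text).
GRADE_PRIORITY = [3, 2, 1]                       # highest grade wins a tie
MITOSES_PRIORITY = ["many", "sparse"]            # many over sparse
FIBROSIS_PRIORITY = ["prominent", "subtle", "absent"]

print(majority_opinion([2, 2, 3, 3, 1], GRADE_PRIORITY))
print(majority_opinion(["sparse", "many", "not assessable"], MITOSES_PRIORITY))
```

Keeping the tie rules in explicit priority lists makes it easy to rerun the analysis with a different convention and see how sensitive the results are to it.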


Interrater reliability

The mean number of scores per slide was 14 (range 12–15) (Supplementary Table S2). The raters consisted of a mixed group (Supplementary Table S3), about half of them working in the Netherlands and half in other European countries within a wide range of laboratories regarding size and degree of specialization. Forty-seven percent of raters were members of the European Working Group of Breast Screening Pathologists. The diagnosis of DCIS was confirmed in 98.6% of the patients based on the majority opinion.

The interrater reliability for the 3-tiered grading system (grade 1, 2, or 3), the most commonly used histopathological feature, was only fair (KA 0.34; 95% CI 0.30–0.39) to moderate (GAC2 0.52; 95% CI 0.50–0.55; Table 1). Using a 2-tiered grading system (either low versus high grade or grade 1 + 2 versus grade 3) did not improve reliability. When the 3-tiered grading was recategorized into a category for grade 1 and a category for grade 2 + 3 combined, the reliability was substantial using GAC2 (0.78; 95% CI 0.74–0.82).
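The verbal labels used here follow the Landis and Koch bands named in the statistical analysis. As a small reading aid, the helper below maps a coefficient to its label; the function name is hypothetical and the band edges are the commonly cited Landis and Koch cut-offs.

```python
def landis_koch(coefficient):
    """Return the Landis and Koch label for an agreement coefficient."""
    if coefficient < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if coefficient <= upper:
            return label
    return "almost perfect"


# Coefficients reported for 3-tiered grade and for grade 1 versus 2 + 3.
for name, value in [("KA", 0.34), ("GAC2", 0.52), ("GAC2, grade 1 vs 2 + 3", 0.78)]:
    print(name, value, landis_koch(value))
```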

Table 1 Agreement, Gwet’s AC2 (GAC2), and Krippendorff’s alpha (KA) coefficients per histopathological feature

Comparable moderate (KA) to substantial (GAC2) reliability was found for growth pattern, necrosis, and calcifications, which are all features assessed in daily practice within the context of DCIS. FEA was scored 38 times in 24 different patients (representing 0.76% of all evaluations); in only 1 patient was FEA the majority opinion. Reliability did not change when FEA scores were excluded from analysis. A striking discrepancy in reliability was found for the assessment of mitotic activity, with only fair reliability when considering KA (0.24) but substantial reliability based on GAC2 (0.70). In a 3-tiered system (absent, subtle, or prominent presence), lymphocytic infiltrate showed moderate reliability, which was slightly better than the interrater reliability for periductal fibrosis. Recategorization into presence versus absence of periductal fibrosis led to moderate reliability (GAC2 0.53).

Risk of subsequent iIBC after DCIS

Subcohort patients were diagnosed with DCIS at a median age of 58.4 years (interquartile range 53.4–64.0) and treated by BCS alone in 40.5% (87 patients) and by BCS + RT in 59.5% (128 patients). After a median follow-up of 11.2 years (interquartile range 8.6–14.1), 20 patients developed an iIBC in the subcohort. DCIS was assigned grade 1 in 10.7%, grade 2 in 53.5%, and grade 3 in 35.8%, based on the majority opinion. Median time to iIBC was 5.3 years (interquartile range 3.3–7.6 years). Time to subsequent iIBC for women with low-grade DCIS did not differ significantly from that for women with high-grade DCIS (median 5.3 years versus 5.6 years, respectively; P = 0.57). Time to iIBC for women treated with BCS + RT (median 5.9 years) also did not differ significantly from that for women treated with BCS alone (median 5.1 years; P = 0.12). Table 2 shows clinicopathological characteristics of the subcohort and of all patients who developed an iIBC, and Fig. 2 depicts photomicrographs of several histopathological DCIS features based on the majority opinion.

Table 2 Clinical characteristics and histopathological characteristics (based on the majority opinion) of the study population
Fig. 2

Photomicrographs from histopathological DCIS features based on the majority opinion. a low-grade DCIS (hematoxylin and eosin (H&E); × 200), b high-grade DCIS (H&E; × 200), c many mitoses (H&E; × 200), d necrosis (H&E; × 200), e subtle periductal fibrosis (H&E; × 50), f prominent periductal fibrosis (H&E; × 50), g sclerotic periductal fibrosis (H&E; × 50), h myxoid periductal fibrosis (H&E; × 50), i subtle periductal lymphocytic infiltrate (H&E; × 50), j prominent periductal lymphocytic infiltrate (H&E; × 50)

In univariable analysis, patients treated with BCS alone had a much higher risk of iIBC than patients treated with BCS + RT with a Hazard Ratio (HR) of 4.80 (95% CI 2.49–9.24) in the first 5 years and a HR of 2.47 after 5 years (95% CI 1.42–4.30; Supplementary Table S4). In patients treated with BCS alone, grade 3 (versus grade 1 + 2 combined), a cribriform/solid growth pattern (versus FEA, clinging, and (micro)papillary growth pattern), and mitotically active DCIS (versus DCIS with low mitotic activity) were also associated with a higher iIBC risk, whereas in patients treated with BCS + RT these associations were not found. In univariable analysis, a significant interaction with treatment was found for grade 3 versus 1 + 2 (P = 0.028) and for growth pattern (P = 0.023).

In multivariable analysis, a model which, besides treatment, included grade 3 versus grade 1 + 2 and growth pattern (cribriform and solid versus FEA, clinging, and (micro)papillary) best predicted the risk of developing iIBC in patients treated with BCS alone, while grade and growth pattern were not associated with iIBC risk in patients treated with BCS + RT (Table 3). The risk of developing iIBC did not differ between patients with DCIS grade 1/2 and FEA, clinging, or (micro)papillary growth pattern who were treated with BCS alone or BCS + RT. Figure 3 shows cumulative risk of iIBC based on categories derived from this model.

Table 3 Associations of histopathological features with subsequent iIBC in multivariable analysis
Fig. 3

Kaplan–Meier curve illustrating iIBC incidence after diagnosis of DCIS treated by BCS alone. GP: growth pattern; other: flat epithelial atypia, clinging and (micro)papillary growth pattern. The red dashed reference line depicts the maximum incidence reached in patients with DCIS grade 3 with a cribriform/solid growth pattern treated with BCS + RT


To the best of our knowledge, this is the first study combining a comprehensive interrater reliability study in DCIS, reflecting daily practice as closely as possible, with an analysis of iIBC risk based on the majority opinion of a large group of raters. This approach minimizes the muddling effect of interrater variability and subjectivity on the evaluation of the prognostic value of histopathological features. It will improve our ability to identify those histopathological DCIS features that matter the most in terms of iIBC risk, on which future studies which aim to optimize reliability should focus.

In univariable analysis, patients treated with radiotherapy after BCS had a strongly reduced risk of iIBC compared to those treated by BCS alone, as was already shown previously [30, 36, 37]. Also grade 3 (versus grade 1 + 2 combined), a high mitotic activity and a cribriform/solid growth pattern (versus FEA, clinging, or (micro)papillary growth pattern) were associated with increased iIBC risk in patients treated with BCS alone. In multivariable analysis however, only grade 3 (versus grade 1 + 2) and a cribriform/solid growth pattern were independently associated with an increased iIBC risk. Mitotic activity did not add any predictive value to grade 3 versus 1 + 2 and growth pattern in a multivariable model, though this is likely due to collinearity with grade. Another important finding in our study is that no histopathological features were associated with iIBC risk in the patients treated with BCS + RT. Although women in our study were not randomized for treatment arm, this finding may suggest that radiotherapy neutralizes the effect of these classical histopathological features. This is also in line with the fact that within the large randomized controlled trials of RT in DCIS no subgroup could be identified without RT benefit [36].

So far, grade is the sole histopathological feature in DCIS that is used in clinical practice and also has an impact on eligibility in the context of clinical trials investigating the safety of active surveillance in low-risk DCIS [38,39,40]. In general, only women over the age of 45 or 50 with screen-detected calcifications associated with DCIS grade 1 or grade 2 are eligible in these trials. A three-tiered grading system is used for this selection purpose. Our study supports the rationale to distinguish between grade 1 + 2 versus grade 3 as DCIS grade 3 is independently associated with an increased risk of iIBC in patients treated with BCS alone. Unfortunately, the interrater reliability of assessing grade using either a 3-tiered grading system (grade 1, 2, or 3) or a 2-tiered system differentiating grade 1 + 2 combined versus grade 3 was only fair when considering KA and at best moderate based on the GAC2.

The interrater reliability for growth pattern was moderate (KA) to substantial (GAC2). The predictive ability of grade and growth pattern has been intensively studied previously, with conflicting results [10]. Factors such as substantial interrater variability, the grading system used, bias in study designs, and reliance on the histopathological assessment of a single pathologist may have contributed to these differing findings [10]. Interrater reliability based on GAC2 was higher overall, particularly when histopathological features showed a strongly skewed distribution and agreement was already very high (i.e., grade 1 versus 2 + 3, growth pattern, and mitotic activity). Under these circumstances, GAC2 may yield more accurate reliability coefficients, as was previously shown in comparison with Cohen’s kappa, which overestimates the agreement attributable to chance alone in these situations, leading to lower reliability coefficients [41].
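The effect of a skewed distribution on chance correction can be shown with a two-rater toy example. The sketch below computes Cohen's kappa and Gwet's AC1 (the unweighted counterpart of AC2) from one 2 × 2 table. The table is invented for illustration and the two-rater setting is a simplification of the multi-rater design used in this study.

```python
def cohen_kappa_and_gwet_ac1(table):
    """table[i][j] = number of cases rater A scored as category i and rater B as j."""
    q = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(q)) / n

    row_share = [sum(table[i]) / n for i in range(q)]
    col_share = [sum(table[i][j] for i in range(q)) / n for j in range(q)]

    # Cohen: chance agreement from the product of the two raters' marginals
    pe_kappa = sum(row_share[k] * col_share[k] for k in range(q))
    # Gwet: chance agreement from the average prevalence of each category
    prevalence = [(row_share[k] + col_share[k]) / 2 for k in range(q)]
    pe_ac1 = sum(p * (1 - p) for p in prevalence) / (q - 1)

    kappa = (p_obs - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1


# Hypothetical scores for 100 cases: rows = rater A, columns = rater B,
# category order = (sparse mitoses, many mitoses). Raw agreement is 92%.
skewed_table = [[90, 4],
                [4, 2]]
kappa, ac1 = cohen_kappa_and_gwet_ac1(skewed_table)
print(f"kappa = {kappa:.2f}, AC1 = {ac1:.2f}")
```

In this example the raters agree on 92 of 100 cases, yet kappa stays in the "fair" range because almost all of that agreement is attributed to chance, while AC1 is high. This mirrors the pattern seen for mitotic activity, where the two coefficients diverged strongly.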

In view of the prognostic value and interrater reliability observed in our study, it is questionable whether it is safe to base clinical treatment decisions solely on the assessment of classical histopathological features. Here, we propose four strategies that may improve risk stratification in DCIS.

Within the context of DCIS, the three features with reasonable prognostic value (grade 1 + 2 versus 3, growth pattern, and mitotic activity) are currently used in many grading systems, but without clear definitions and rules about how to weigh each feature. We would therefore first suggest making histological grading more objective by using a numerical semi-quantitative scoring system that separately evaluates each of these features, analogous to the modified Bloom and Richardson grading system for IBC [42, 43]. Dichotomous scoring systems may further improve reliability and prognostic value and should be explored further using different cut-offs [44, 45].

Secondly, performing additional immunohistochemistry to assign specific DCIS profiles may add prognostic value, possibly only in subsets of patients (i.e., grade 2). Previously, associations were reported of human epidermal growth factor receptor 2 (HER2)-positive, estrogen receptor (ER)-negative DCIS, and DCIS with high cyclooxygenase 2, p16, and Ki-67 levels with increased iIBC risk [9, 10, 46, 47]. These markers would be good candidates for further exploration. Automated scoring within this context may result in more standardized and objective assessment [48,49,50,51]. Previously, a 3-tiered grading system in DCIS, combining nuclear grade according to the Van Nuys criteria with automated Ki-67 count, was reported to show excellent correlation with immunohistochemical markers of reported biological relevance such as ER and HER2 [9, 46, 47, 50].

Thirdly, alternative approaches using pathology information such as artificial intelligence-based methods should also be considered in search for clinically relevant biomarkers in DCIS [52]. Recently, others have developed a whole slide image-based machine learning model, which accurately predicted the risk of an invasive or in situ recurrence and significantly outperformed traditional clinicopathological variables [53].

Lastly, besides pathology, other criteria could also be incorporated in clinical decision schemes, e.g., as in current active surveillance trials requiring DCIS to be screen-detected based on calcifications only without clinical symptoms and diagnosed on representative vacuum-assisted biopsies [38,39,40].

Our study had several limitations. From our study population, each rater scored a different subset of patients. Therefore, we were not able to analyze the association of histopathological DCIS features with iIBC risk per rater or grading system used and to study the effect of interrater variability on risk stratification. However, the resulting immense workload would probably have caused major rater-dropout. Also tissue slides were digitally assessed using research technology producing images of somewhat lower resolution. This may have led to difficulty of assessing histopathological features requiring great detail, such as mitotic activity. Our reliability study was nonetheless performed under conditions as close as possible to clinical practice, as a large set of non-selected DCIS cases from a population-based cohort were reviewed by a large group of raters with varying levels of expertise without provision of instructions or tutorials beforehand. And lastly, data on margin status and DCIS lesion size, factors potentially associated with the risk of iIBC, were not collected in a standardized way [10, 46, 47, 54]. However, Dutch guidelines state that a re-excision or mastectomy is obligatory in case of involved margins after a primary excision. An explorative analysis using the available data on margin status indeed showed no significant difference in the risk of iIBC for positive margins and even a protective effect for close margins in women treated with BCS alone in comparison to women with negative margins, suggesting they were subjected to re-excisions.


We evaluated the prognostic value of histopathological DCIS features to inform risk stratification using a unique, combined approach. Our study showed substantial interrater variability in the classification of histopathological DCIS features. Using rater majority opinions, which minimizes the muddling effect of interrater variability, DCIS grade, growth pattern, and mitotic activity were associated with the risk of subsequent ipsilateral invasive breast cancer in patients treated with BCS without radiotherapy. A semi-quantitative grading system incorporating and separately evaluating nuclear pleomorphism, growth pattern, and mitotic activity, analogous to IBC grading, may improve the reliability and prognostic value of these histopathological features.

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request. Requests should be made to Prof. J. Wesseling:



Abbreviations

DCIS: Ductal carcinoma in situ
IBC: Invasive breast cancer
iIBC: Ipsilateral invasive breast cancer
95% CI: 95% Confidence Interval
BCS ± RT: Breast-conserving surgery with or without radiotherapy
NCR: Netherlands Cancer Registry
PALGA: The nationwide network and registry of histology and cytopathology in the Netherlands
KA: Krippendorff’s alpha
GAC2: Gwet’s AC2
PHA: Proportional hazard assumptions
HR: Hazard Ratio
FEA: Flat epithelial atypia
HER2: Human epidermal growth factor receptor 2
ER: Estrogen receptor


  1. Virnig BA, Tuttle TM, Shamliyan T, Kane RL (2010) Ductal carcinoma in situ of the breast: a systematic review of incidence, treatment, and outcomes. J Natl Cancer Inst 102(3):170–178

  2. Netherlands Comprehensive Cancer Organisation (IKNL). Available from:

  3. Cancer Research UK. Available from:

  4. Ripping TM, Verbeek ALM, Fracheboud J, De Koning HJ, Van Ravesteyn NT, Broeders MJM (2015) Overdiagnosis by mammographic screening for breast cancer studied in birth cohorts in the Netherlands. Int J Cancer 137(4):921–929

  5. Harding C, Pompei F, Burmistrov D, Welch HG, Abebe R, Wilson R (2015) Breast cancer screening, incidence, and mortality across US counties. JAMA Intern Med 175(9):1483–1489

  6. van Luijt PA, Heijnsdijk EAM, Fracheboud J, Overbeek LIH, Broeders MJM, Wesseling J et al (2016) The distribution of ductal carcinoma in situ (DCIS) grade in 4232 women and its impact on overdiagnosis in breast cancer screening. Breast Cancer Res 18(1):47

  7. Erbas B, Provenzano E, Armes J, Gertig D (2006) The natural history of ductal carcinoma in situ of the breast: a review. Breast Cancer Res Treat 97(2):135–144

  8. Ryser MD, Weaver DL, Zhao F, Worni M, Grimm LJ, Gulati R et al (2019) Cancer outcomes in DCIS patients without locoregional treatment. J Natl Cancer Inst 111(9):952–960

  9. Lari SA, Kuerer HM (2011) Biological markers in DCIS and risk of breast recurrence: a systematic review. J Cancer 2:232

  10. Visser LL, Groen EJ, Van Leeuwen FE, Lips EH, Schmidt MK, Wesseling J (2019) Predictors of an invasive breast cancer recurrence after DCIS: a systematic review and meta-analyses. Cancer Epidemiol Biomark Prev 28(5):835–845

  11. Holland R, Peterse JL, Millis RR, Eusebi V, Faverly D, Van de Vijver MJ et al (1994) Ductal carcinoma in situ: a proposal for a new classification. Semin Diagn Pathol 11(3):167–180

  12. Pinder SE, Duggan C, Ellis IO, Cuzick J, Forbes JF, Bishop H et al (2010) A new pathological system for grading DCIS with improved prediction of local recurrence: results from the UKCCCR/ANZ DCIS trial. Br J Cancer 103(1):94–100

  13. Cserni G, Sejben A (2019) Grading ductal carcinoma in situ (DCIS) of the breast – what’s wrong with it? Pathol Oncol Res 26(2):665–671

  14. Lagios MD (1990) Duct carcinoma in situ. Pathology and treatment. Surg Clin North Am 70(4):873–883

  15. Silverstein MJ, Poller DN, Waisman JR, Colburn WJ, Barth A, Gierson ED et al (1995) Prognostic classification of breast ductal carcinoma-in-situ. Lancet 345(8958):1154–1157

  16. Sloane JP, Amendoeira I, Apostolikas N, Bellocq JP, Bianchi S, Boecker W et al (1998) Consistency achieved by 23 European pathologists in categorizing ductal carcinoma in situ of the breast using five classifications. European Commission Working Group on Breast Screening Pathology. Hum Pathol 29(10):1056–1062

  17. Wells WA, Carney PA, Eliassen MS, Grove MR, Tosteson ANA (2000) Pathologists’ agreement with experts and reproducibility of breast ductal carcinoma-in-situ classification schemes. Am J Surg Pathol 24(5):651–659

  18. Bethwaite P, Smith N, Delahunt B, Kenwright D (1998) Reproducibility of new classification schemes for the pathology of ductal carcinoma in situ of the breast. J Clin Pathol 51:450–454

  19. Lakhani SR, Ellis IO, Schnitt SJ, Tan PH, van de Vijver MJ (2012) WHO classification of tumours of the breast, 4th edn. International Agency for Research on Cancer, Lyon

  20. College of American Pathologists. Available from:

  21. Poller DN, Silverstein MJ, Galea M, Locker AP, Elston CW, Blamey RW et al (1994) Ideas in pathology. Ductal carcinoma in situ of the breast: a proposal for a new simplified histological classification association between cellular proliferation and c-erbB-2 protein expression. Mod Pathol 7(2):257–262

  22. Elston CW, Sloane JP, Amendoeira I, Apostolikas N, Bellocq JP, Bianchi S et al (2000) Causes of inconsistency in diagnosing and classifying intraductal proliferations of the breast. Eur J Cancer 36(14):1769–1772

  23. Scott MA, Lagios MD, Axelsson K, Rogers LW, Anderson TJ, Page DL (1997) Ductal carcinoma in situ of the breast: Reproducibility of histological subtype analysis. Hum Pathol 28(8):967–973

  24. Schuh F, Biazús JV, Resetkova E, Benfica CZ, Edelweiss MIA (2010) Reproducibility of three classification systems of ductal carcinoma in situ of the breast using a web-based survey. Pathol Res Pract 206(10):705–711

  25. Schuh F, Biazús JV, Resetkova E, Benfica CZ, Ventura A, de Freitas Uchoa D et al (2015) Histopathological grading of breast ductal carcinoma in situ: validation of a web-based survey through intra-observer reproducibility analysis. Diagn Pathol 10(1):93

  26. Elmore JG, Longton GM, Carney PA, Geller BM, Onega T, Tosteson ANA et al (2015) Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA 313(11):1122–1132

  27. Verkooijen HM, Peterse JL, Schipper MEI, Buskens E, Hendriks JHCL, Pijnappel RM et al (2003) Interobserver variability between general and expert pathologists during the histopathological assessment of large-core needle and open biopsies of non-palpable breast lesions. Eur J Cancer 39(15):2187–2191

  28. van Dooijeweert C, van Diest PJ, Willems SM, Kuijpers CCHJ, Overbeek LIH, Deckers IAG (2019) Significant inter- and intra-laboratory variation in grading of ductal carcinoma in situ of the breast: a nationwide study of 4901 patients in the Netherlands. Breast Cancer Res Treat 174(2):479–488

  29. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A et al (2011) Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. Int J Nurs Stud 48(6):661–671

  30. Elshof LE, Schaapveld M, Schmidt MK, Rutgers EJ, van Leeuwen FE, Wesseling J (2016) Subsequent risk of ipsilateral and contralateral invasive breast cancer after treatment for ductal carcinoma in situ: incidence and the effect of radiotherapy in a population-based cohort of 10,090 women. Breast Cancer Res Treat 159(3):553–563

  31. Casparie M, Tiebosch ATMG, Burger G, Blauwgeers H, Van De Pol A, Van Krieken JHJM et al (2007) Pathology databanking and biobanking in The Netherlands, a central role for PALGA, the nationwide histopathology and cytopathology data network and archive. Cell Oncol 29(1):19–24

  32. Barlow WE, Ichikawa L, Rosner D, Izumi S (1999) Analysis of case-cohort designs. J Clin Epidemiol 52(12):1165–1172

  33. Hayes AF, Krippendorff K (2007) Answering the call for a standard reliability measure for coding data. Commun Methods Meas 1(1):77–89

  34. Gwet KL (2014) Handbook of inter-rater reliability: the definitive guide to measuring the extent of agreement among raters, 4th edn. Advanced Analytics, LLC, Gaithersburg

  35. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174

  36. Correa C, McGale P, Taylor C, Davidson N, Gelber R, Piccart M et al (2010) Overview of the randomized trials of radiotherapy in ductal carcinoma in situ of the breast. J Natl Cancer Inst Monogr 2010(41):162–177

  37. Donker M, Litière S, Werutsky G, Julien JP, Fentiman IS, Agresti R et al (2013) Breast-conserving treatment with or without radiotherapy in ductal carcinoma in situ: 15-year recurrence rates and outcome after a recurrence, from the EORTC 10853 randomized phase III trial. J Clin Oncol 31(32):4054–4059

  38. Elshof LE, Tryfonidis K, Slaets L, Van Leeuwen-Stok AE, Skinner VP, Dif N et al (2015) Feasibility of a prospective, randomised, open-label, international multicentre, phase III, non-inferiority trial to assess the safety of active surveillance for low risk ductal carcinoma in situ: The LORD study. Eur J Cancer 51(12):1497–1510

  39. Francis A, Thomas J, Fallowfield L, Wallis M, Bartlett JMS, Brookes C et al (2015) Addressing overtreatment of screen detected DCIS. The LORIS trial. Eur J Cancer 51(16):2296–2303

  40. Hwang ES, Hyslop T, Lynch T, Frank E, Pinto D, Basila D et al (2019) The COMET (Comparison of Operative versus Monitoring and Endocrine Therapy) trial: a phase III randomised controlled clinical trial for low-risk ductal carcinoma in situ (DCIS). BMJ Open 9(3):e026797

  41. Gwet KL (2008) Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol 61(1):29–48

  42. Bloom HJ, Richardson WW (1957) Histological grading and prognosis in breast cancer: a study of 1409 cases of which 359 have been followed for 15 years. Br J Cancer 11(3):359–377

  43. Elston CW, Ellis IO (1991) The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology 19(5):403–410

  44. Van Bockstal M, Baldewijns M, Colpaert C, Dano H, Floris G, Galant C et al (2018) Dichotomous histopathological assessment of ductal carcinoma in situ of the breast results in substantial interobserver concordance. Histopathology 73(6):923–932

  45. Dano H, Altinay S, Arnould L, Bletard N, Colpaert C, Dedeurwaerdere F et al (2019) Interobserver variability in upfront dichotomous histopathological assessment of ductal carcinoma in situ of the breast: the DCISion study. Mod Pathol 33(3):354–366

  46. Visser LL, Elshof LE, Schaapveld M, Van De Vijver K, Groen EJ, Almekinders MM et al (2018) Clinicopathological risk factors for an invasive breast cancer recurrence after ductal carcinoma in situ-a nested case-control study. Clin Cancer Res 24(15):3593–3601

  47. Kerlikowske K, Molinaro AM, Gauthier ML, Berman HK, Waldman F, Bennington J et al (2010) Biomarker expression and risk of subsequent tumors after initial ductal carcinoma in situ diagnosis. J Natl Cancer Inst 102(9):627–637

  48. Mohammed ZMA, McMillan DC, Elsberger B, Going JJ, Orange C, Mallon E et al (2012) Comparison of visual and automated assessment of Ki-67 proliferative activity and their impact on outcome in primary operable invasive ductal breast cancer. Br J Cancer 106(2):383–388

  49. Van Velthuysen MLF, Groen EJ, Sanders J, Prins FA, Van Der Noort V, Korse CM (2014) Reliability of proliferation assessment by Ki-67 expression in neuroendocrine neoplasms: eyeballing or image analysis? Neuroendocrinology 100(4):288–292

  50. Stasik CJ, Davis M, Kimler BF, Fan F, Damjanov I, Thomas P et al (2011) Grading ductal carcinoma in situ of the breast using an automated proliferation index. Ann Clin Lab Sci 41(2):122–130

  51. Balkenhol MCA, Tellez D, Vreuls W, Clahsen PC, Pinckaers H, Ciompi F et al (2019) Deep learning assisted mitotic counting for breast cancer. Lab Investig 99(11):1596–1606

  52. Bera K, Schalper KA, Rimm DL, Velcheti V, Madabhushi A (2019) Artificial intelligence in digital pathology: new tools for diagnosis and precision oncology. Nat Rev Clin Oncol 16(11):703–715

  53. Klimov S, Miligy IM, Gertych A, Jiang Y, Toss MS, Rida P et al (2019) A whole slide image-based machine learning approach to predict ductal carcinoma in situ (DCIS) recurrence risk. Breast Cancer Res 21(1):1–19

  54. Collins LC, Achacoso N, Haque R, Nekhlyudov L, Fletcher SW, Quesenberry CP et al (2013) Risk factors for non-invasive and invasive local recurrence in patients with ductal carcinoma in situ. Breast Cancer Res Treat 139(2):453–460


Acknowledgements

The authors thank all collaborating hospitals and PALGA, the nationwide network and registry of histo- and cytopathology, for facilitating retrieval of archival tissue material and providing pathology data. The authors thank the Netherlands Comprehensive Cancer Organization for providing data from the Netherlands Cancer Registry, the Dutch screening organization for providing screening data, and the NKI-AVL Core Facility Molecular Pathology & Biobanking (CFMPB) for supplying lab support. We thank all other pathologists who participated in the study: Mariëtte Giessen, Erik Nijhuis, Erwin Geuken, Frank Bellot, Karen Koopman, Ivana Verlinden, Mariël Brinkhuis, Franka van Merriënboer, Gesina van Lijnschoten, Horst Bürger, Alicia Córdoba, Inta Liepniece-Karele, and Grace Callagy.


Funding

This work was supported by KWF Kankerbestrijding (Grant Number NKI2014-7167) and by a joint grant from Cancer Research UK and KWF Kankerbestrijding (Grant Number C38317/A24043).

Author information

Contributions

EJG, EHL, MS, and JW were responsible for the study design. EJG coordinated the study. LM provided technical support. EJG revised all slides. JH provided an online platform to enable pathology scoring. MvS, MMA, SA, AK, AR, ZV, FJAN, SB, WV, EB, MVB, JK, EC, EB, MJdR, WV, AF, NELF, PR, PJW, LFSK, CQ, GF, GS, and PJvD scored the slides for the reliability study. EJG analyzed the data under the supervision of MS. EJG wrote the manuscript with significant contributions from all authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jelle Wesseling.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest.

Ethics approval

This retrospective study involving human participants was conducted in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The study was approved by the review boards of the NKI-AVL, the Netherlands Cancer Registry, the nationwide network and registry of histo- and cytopathology in the Netherlands (PALGA), and the Dutch breast cancer screening organization.

Informed consent

The study used only unidentifiable patient information, and no informed consent was required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOCX 63 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Groen, E.J., Hudecek, J., Mulder, L. et al. Prognostic value of histopathological DCIS features in a large-scale international interrater reliability study. Breast Cancer Res Treat 183, 759–770 (2020).


Keywords

  • Ductal carcinoma in situ
  • Invasive breast cancer
  • Interrater reliability
  • Risk stratification