Background

Despite endorsements by several international guidelines [1, 2], KI67 is yet to gain widespread application as a prognostic and/or predictive marker in breast cancer [3]. This is due, largely, to methodological variability in KI67 scoring (such as antibody type, specimen type, type of fixative, antigen retrieval methods, method of scoring, etc.), and limitations in the design and analyses of studies that have reported on this marker [3–7].

In the majority of settings, KI67 is evaluated visually by a pathologist even though there is yet to be consensus regarding which regions to score between the invasive edge, hot spots or the entire spectrum of the whole section or tumour core [8]. As a result, both the intra-observer and, especially, the inter-observer reproducibility of visually derived KI67 scores have been shown to be poor [9–11]. This has not only hampered inter-study comparability for KI67, but has fuelled concerns regarding its analytical validity [3]. To address some of the methodological issues related to KI67 assessment, the International KI67 in Breast Cancer Working Group published recommendations aimed at the standardisation of the analytical processes for KI67 evaluation [8]. This panel, however, fell short of making recommendations regarding the preferred method of scoring for KI67 between visual and automated. Several reports suggest that automated methods could address some of the problems associated with visual scoring [11–19]. These methods are high throughput and are not limited by intra-observer variability. However, concerns exist regarding the accuracy of automated methods and the prognostic power of KI67 derived using these methods relative to that derived visually by pathologists. Few relatively small studies have reported a head-to-head comparison between scores derived using both methods in terms of prognostic properties, and the results from these are conflicting [11, 17–19].

The majority opinion regarding the prognostic property of KI67 derives mostly from reviews and meta-analyses, which support its prognostic role in breast cancer [4–7, 20]. The meta-analyses by de Azambuja et al. [6] involving 12,155 patients and by Stuart-Harris et al. [7] which included over 15,000 patients represent two comprehensive analyses on this subject. These are limited, however, by reported evidence of publication bias, by significant between-study heterogeneity and by the fact that most of the included studies utilised different methodological approaches for KI67 evaluation. Furthermore, while the analysis by de Azambuja et al. [6] was limited by its inclusion of only univariate hazard ratios, that by Stuart-Harris et al. [7] was limited by the small intersection between the sets of covariates in the included studies. In a population-based cohort of a cancer registry, Inwald et al. [21] examined the prognostic role of KI67 in 3658 patients for whom KI67 was routinely measured in clinical practice and reported significant associations between KI67 and overall survival. An important strength of this analysis was that it utilised routinely assessed KI67 measurements in a clinical setting. However, it was also limited by the heterogeneity of the KI67 analytical processes in the different laboratories involved in the study. Nonetheless, KI67 has found use in a variety of clinical and epidemiological scenarios, including its endorsement by a number of international guidelines for use in treatment decision-making in ER-positive breast cancer [1, 2] and its incorporation as part of emerging prognostic tools such as the IHC4 score [22, 23] and PREDICT, a breast cancer treatment benefit tool [24].

In this study, we evaluate the value and robustness of automated scoring of KI67 for large-scale, multicentre studies of breast cancer prognostication. We centrally generated an automated KI67 score from stained tissue microarrays (TMAs), and assessed its prognostic value overall and for different subtypes of breast cancer. We also compared the prognostic performance of automated and visually derived KI67 scores in a subset of patients.

Methods

Study population and study design

This analysis was conducted within the Breast Cancer Association Consortium [25], which is a large, ongoing collaborative project involving study groups across the globe. As shown in Fig. 1, we collected a total of 166 TMAs containing 19,039 cores, representing 10,005 patients from 13 study groups (Additional file 1: Table S1). Ten study groups provided unstained TMAs which were then stained and digitised in the Breakthrough Core Pathology laboratory at the Institute of Cancer Research (ICR) and the academic biochemistry laboratory of the Royal Marsden Hospital (RMH), London, UK. Two groups (MARIE and PBCS) provided pre-stained TMAs which were also digitised in our centre. One study (SEARCH) provided TMA images acquired using a similar Ariol technology (a digital image acquisition and analysis system) to the one adopted for this analysis. Of the 10,005 patients, 1917 were excluded on account of failing predefined quality control checks (N = 946) or due to absent data on follow-up times and/or vital status (N = 971). As a result, a total of 8088 patients from 10 study groups with a median follow-up of 7.5 years and a total of 1401 breast cancer specific deaths were used in the survival analysis involving automated KI67. Of these, 2440 patients with pathologists’ visual KI67 scores in addition to automated KI67 scores were used to extrapolate a visual from an automated cut-off point, following which comparative survival analyses involving visual and automated KI67 scores were conducted. Information on other clinico-pathological characteristics of tumours including histological grade, nodal status, tumour size, stage, adjuvant systemic therapy (endocrine therapy and/or chemotherapy) and other IHC markers (i.e. oestrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2)) was obtained from clinical records. Additional Ariol HER2 data were obtained for a subset of patients with missing clinical HER2 data but for whom data on ER and PR were available (N = 403). All patients provided written informed consent and all participating studies gained approval from the local ethical committees and institutional review boards.

Fig. 1

Study population and study design. We collected 166 TMAs containing 19,039 cores from 10,005 patients. Of these, 15 TMAs containing 1346 cores were selected as the training set and these were used to develop an automated scoring protocol that was validated against corresponding computer-assisted visual (CAV) scores. Ultimately, this protocol was applied to the scoring of all 166 TMAs. Following automated scoring, all cores that failed our a priori defined quality control checks (including total nuclei count >50 and <15,000, and KI67 score = 100 %) were excluded (N = 946 patients). For the purpose of survival analyses, all subjects with missing follow-up/survival data were also excluded (N = 971 patients). As a result, a total of 8088 patients were used in the survival analysis involving automated KI67 score. Furthermore, based on a subset of patients (N = 2440) with pathologists’ KI67 scores in addition to the automated KI67 scores, we extrapolated a visual from an automated cut-off point and used this to compare the prognostic performance of visual and automated KI67 scores in breast cancer. QC quality control, TMA tissue microarray

KI67 immunostaining

Sections were dewaxed using xylene and rehydrated through graded alcohol (100, 90 and 70 %) to water. Slides were then placed in a preheated (5 min, 800 W microwave) solution of Dako Target Retrieval solution pH 6.0 (S1699), microwaved on high power for 10 min and then allowed to cool in this solution at room temperature for 10 min. In the next stage, the slides were placed on a Dako autostainer and stained using a standard protocol with Dako MIB-1 diluted 1/50 and visualised using the Dako REAL kit (K5001). The MIB-1 antibody was also adopted for the staining of the TMAs that were not part of those centrally stained at the ICR, but at varying concentrations (PBCS = 1:500; MARIE = 1:400 and SEARCH = 1:200) (Additional file 1: Table S2).

KI67 scoring

KI67 scoring has been described previously [26], but briefly all TMAs were digitised using the Ariol 50s digital scanner. Fifteen TMAs were selected as a training set. These were scored visually by a pathologist (MA) using a computer-assisted visual (CAV) counting method and used to validate the automated method. The CAV method relied upon built-in features of the Ariol digital system to count negative and positive nuclear populations within 250 μm × 250 μm squares separated by grids. The standard CAV approach entailed the counting of at least 1000 cells across the entire spectrum of each core. In the majority of cores, more than 1000 cells were counted even though fewer than this number was counted in a small minority. Overall, cores with more than 500 cells were considered to be of satisfactory quality. The CAV method is precise, prevents double counting and was observed to have excellent intra-observer reproducibility when a random subset of cores (N = 111) were re-scored at an interval of 3 months from the first time they were scored (observed agreement/kappa = 96 %/0.90); good core-level agreements with two other independent scorers (observed agreement/kappa: CAV vs scorer 2 = 87 %/0.66; CAV vs scorer 3 = 84 %/0.59; scorer 2 vs scorer 3 = 89 %/0.69) were also observed in a randomly selected subset of 202 cores. Visual scoring in the external TMAs involved both quantitative and semi-quantitative methods. Each core from each patient was scored by two independent pathologists and the KI67 score for each patient was then taken as the average score from the two scorers across all cores for that patient.
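The agreement figures quoted above pair an observed agreement with a kappa value. As a minimal sketch of how such a pair can be computed for two scorers, the following assumes that each core has already been assigned a categorical label (for example "high" or "low") by each scorer and that Cohen's unweighted kappa is the statistic of interest; the function name and the toy labels are illustrative only and are not study data.

```python
from collections import Counter


def observed_agreement_and_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cores on which both raters give the same category.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_exp == 1.0:
        # Both raters used one identical category; kappa is set to 1 by convention here.
        return p_obs, 1.0
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical labels for ten cores (illustrative only).
cav = ["high", "high", "low", "low", "low", "high", "low", "low", "high", "low"]
scorer2 = ["high", "low", "low", "low", "low", "high", "low", "high", "high", "low"]
agreement, kappa = observed_agreement_and_kappa(cav, scorer2)
```

Kappa discounts the agreement that two scorers would reach by chance given how often each assigns each category, which is why it is lower than the raw observed agreement.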

The automated scoring was performed using the Ariol machine which has functionality that allows for the automatic detection of malignant and non-malignant nuclei using shape and size characteristics. Using colour deconvolution, it also distinguishes between DAB-positive and DAB-negative (haematoxylin) malignant cells. To determine the negative and positive populations of cells, an appropriate region of interest of the malignant cell population in a core was demarcated and two colours were selected to indicate positive and negative nuclear populations. The appropriate colour pixels were then selected to represent the full range of hue, saturation and intensity that was considered representative of the positive and negative nuclear classes [26]. Subsequently, the best shape parameters that discriminated malignant and non-malignant cells according to their spot width, width, roundness, compactness and axis ratio were also selected. The data were divided into a training and a validation subset and the automated and visual scoring for KI67 showed good agreement (observed agreement = 87 %; Kappa = 0.64) and discriminatory accuracy (AUC = 85 %) in the validation subset, hence allowing for the adoption of this method for the scoring of all 166 TMAs.

Statistical methods

For patients with multiple cores from the same tumour, we used the average KI67 score across valid cores to represent the % positive cells in that tumour. Descriptive analyses of the distribution of KI67 according to clinical and pathological characteristics of the patients were conducted using the non-parametric Kruskal–Wallis equality of medians test for continuous measures and the paired chi-squared test for categorical measures. The relative survival probabilities for patients in different quartiles of the KI67 distribution were compared using Kaplan–Meier survival curves for the 10-year breast cancer specific survival (BCSS). To allow for prevalent cancers, time at risk was left-censored for study entry. It was decided, a priori, not to make any assumptions on a prognostic cut-off point for automated KI67 scores in our dataset but instead to leverage on the continuous values to observe a prognostic threshold. As a result, we performed quartile analysis by dividing the continuous KI67 scores into quartiles (Q1–Q4) and examining the prognostic differences among the different quartiles for all patients in the study. The 10-year BCSS was determined using Kaplan–Meier survival curves and Cox-proportional hazards regression models stratified by ER status (positive vs negative) and according to nodal status (positive vs negative) and other IHC markers. The univariate Cox models were partially adjusted for study group and age at diagnosis while the multivariate models had further adjustments for other known prognostic factors including histological grade, tumour size, nodal status, morphology, ER, PR, HER2 and adjuvant systemic therapy (endocrine and/or chemotherapy). In the multivariate models, missing values for other covariates were addressed using the multiple imputation plus outcome (MI+) approach [27]. 
Because of observed violation of the proportionality assumption of the Cox model by automated KI67, it was modelled as a time-varying covariate using an extension of the Cox model that allows for the inclusion of a coefficient (T) that varied as an exponential function of time. The log of the coefficient is indicative of both the direction and the magnitude of change in hazard ratio with time, such that if log T < 1 then hazard falls with time, while if log T > 1 then hazard increases with time. Known violation of the proportional hazards assumption by ER was addressed in the same way. Consistency of hazard ratio (HR) estimates across the different study groups was evaluated using the I² statistic, derived by performing a fixed-effect meta-analysis of study-specific HR estimates. To enable direct comparison between the visual and automated KI67 scores, we extrapolated a visual from an automated cut-off point in a linear regression model and used the resulting cut-off point for all further analyses. All analyses were conducted using STATA statistical software version 10 (StataCorp, College Station, TX, USA). Statistical tests were two-sided and P < 0.05 was considered statistically significant.
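The first two steps of this workflow (averaging KI67 across valid cores for each patient, then dividing the cohort into quartiles) can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: the study used STATA, the function names are hypothetical, the input is assumed to be (patient identifier, % positive cells) pairs for cores that have already passed quality control, and the quartile cut points use Python's "inclusive" interpolation rule, which may differ slightly from the rule used in the study.

```python
from collections import defaultdict
from statistics import mean, quantiles


def patient_scores(core_scores):
    """Average KI67 (% positive cells) across the valid cores of each patient."""
    by_patient = defaultdict(list)
    for patient_id, score in core_scores:
        by_patient[patient_id].append(score)
    return {pid: mean(scores) for pid, scores in by_patient.items()}


def quartile_labels(scores):
    """Label each patient Q1 to Q4 using the cohort's own quartile cut points."""
    q1, q2, q3 = quantiles(list(scores.values()), n=4, method="inclusive")

    def label(value):
        if value <= q1:
            return "Q1"
        if value <= q2:
            return "Q2"
        if value <= q3:
            return "Q3"
        return "Q4"

    return {pid: label(value) for pid, value in scores.items()}


# Hypothetical core-level scores for five patients (illustrative only).
cores = [
    ("P1", 2.0), ("P1", 4.0),
    ("P2", 10.0),
    ("P3", 20.0), ("P3", 30.0),
    ("P4", 6.0),
    ("P5", 12.0), ("P5", 16.0),
]
per_patient = patient_scores(cores)
labels = quartile_labels(per_patient)
# Dichotomise as in the text: top quartile (Q4) is "high", Q1 to Q3 are "low".
high_ki67 = {pid for pid, q in labels.items() if q == "Q4"}
```

Deriving the cut points from the cohort itself mirrors the decision not to assume a prognostic cut-off a priori; the resulting quartile labels would then feed the Kaplan–Meier and Cox analyses.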

Results

Description of study population and association between automated KI67 score and other clinico-pathological characteristics of breast cancer patients (N = 8088)

In all, a total of 143 TMAs containing 15,313 cores from 8088 patients were used in this analysis, as shown in Fig. 1. The studies included in this analysis used different TMA designs (Table 1). More than half (4431/55 %) of the patients had KI67 scores on at least two cores and evaluation of dichotomous categories revealed concordant KI67 status in 83.7 % of the patients. When we examined the distribution of continuous KI67 scores among categories of the different clinical and pathological characteristics we observed this to differ according to histological grade, tumour size, morphology, ER status, PR status and HER2 status, but not nodal status or stage at diagnosis (Fig. 2). The distribution of these characteristics for patients with high KI67 (Q4 or >12 % positive cells) and low KI67 (Q1–Q3) are shown in Additional file 1: Tables S3 and S4 for ER-positive and ER-negative patients, respectively.

Table 1 Description of study populations, TMA designs and patient characteristics for the 8088 patients included in this analysis

Fig. 2

Distribution of continuous KI67 scores according to categories of other clinical and pathological variables. Significant differences were seen in the distribution of automated KI67 scores according to categories of histological grade, tumour size, morphology, ER status, PR status and HER2 status, but not nodal status or stage. ER oestrogen receptor, HER2 human epidermal growth factor receptor 2, PR progesterone receptor

Association between automated KI67 score and 10-year BCSS among 8088 patients

Using continuous measures of KI67 categorised into quartiles, we observed poorest survival in the highest quartile, corresponding to 12 % positive cells, but little difference in survival between the other three (Q1–Q3) quartiles (log-rank P = 1.2 × 10⁻⁵; Fig. 3a). As a result, the continuous KI67 value was dichotomised at the threshold of 12 % in subsequent analyses. High KI67 was significantly associated with worse 10-year BCSS overall (log-rank P = 3.1 × 10⁻⁷) and among ER-positive cancers (log-rank P = 1.3 × 10⁻³) but not ER-negative cancers (log-rank P = 0.35) (Fig. 3b–d, respectively). Similarly, in multivariate models, high KI67 expression was significantly associated with worse 10-year BCSS among ER-positive cancers (HR at baseline = 1.96; 95 % CI = 1.31–2.93) but not ER-negative breast cancers (HR = 1.23; 95 % CI = 0.86–1.77; P-heterogeneity = 0.064) (Table 2). Further stratification of ER-positive cancers according to nodal status showed that high KI67 was associated with worse survival in both node-negative and node-positive cancers in multivariate analysis (node-negative 2.47 (1.16–5.27); node-positive 1.74 (1.05–2.86); P-heterogeneity = 0.67) (Table 2). The association between KI67 and survival was significant among ER-positive patients who did not receive chemotherapy (1.95 (1.18–3.21); P = 0.009) but not among those who did (1.89 (0.84–4.29); P = 0.124; P-heterogeneity = 0.60). We found no evidence of between-study heterogeneity in estimates of HR for ER-positive patients (I² = 0.0 %, P = 0.94) or ER-negative patients (I² = 0.0 %, P = 0.86) (Additional file 2: Figure S1). Among hormone receptor-positive breast cancers, the HR for KI67 was not significantly different according to HER2 status (Table 2; P-heterogeneity = 0.270). Modest evidence for a poorer prognosis among high, relative to low, KI67 was also seen for triple-negative breast cancers (1.70 (1.02–2.84); P = 0.04). No significant associations with prognosis were found for KI67 among HER2-positive (i.e. ER−/PR−/HER2+) breast cancers (0.91 (0.60–1.36)) (Table 2).
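For readers unfamiliar with the I² statistic used to assess between-study heterogeneity, the sketch below shows the standard calculation from a fixed-effect (inverse-variance) meta-analysis: Cochran's Q is the weighted sum of squared deviations of study-specific log hazard ratios from the pooled estimate, and I² = max(0, (Q − df)/Q) × 100 %. The function name and the example inputs are hypothetical and are not the study-specific estimates; standard errors are assumed to be on the log-HR scale.

```python
import math


def fixed_effect_i2(hazard_ratios, standard_errors):
    """Pool study log-HRs by inverse variance; return (pooled HR, Q, I2 in %)."""
    if len(hazard_ratios) != len(standard_errors) or len(hazard_ratios) < 2:
        raise ValueError("need at least two studies with matching HRs and SEs")
    log_hrs = [math.log(hr) for hr in hazard_ratios]
    # Inverse-variance weights; standard errors are for the log hazard ratio.
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return math.exp(pooled), q, i2


# Hypothetical study-specific estimates (illustrative only).
pooled_hr, q_stat, i2_percent = fixed_effect_i2(
    hazard_ratios=[1.8, 2.0, 2.1],
    standard_errors=[0.30, 0.25, 0.40],
)
```

Because I² is truncated at zero, a value of 0.0 % indicates that Q did not exceed its degrees of freedom, that is, the observed spread of study-specific estimates was no larger than expected from sampling error alone.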

Fig. 3

Kaplan–Meier survival curves for the 10-year BCSS according to strata of automated KI67 scores, overall and by ER status. KM survival curves for the association between KI67 and 10-year BCSS among: (a) quartiles of KI67 (Q1, <25th percentile; Q2, 25th–50th percentile; Q3, >50th to 75th percentile; and Q4, >75th percentile; N = 8088); (b) dichotomous categories of KI67 (≤12 %/low and >12 %/high) overall (N = 8088 patients); (c) ER-positive cancers (N = 5520 patients); and (d) ER-negative cancers (N = 2049 patients)

Table 2 Hazard ratio (HR) and 95 % CI for the association between automated KI67 score and 10-year BCSS in partially and fully adjusted models: analysis stratified overall and according to ER, nodal status and other immunohistochemical markers (N = 8088 patients)

Comparison of 10-year BCSS among 2440 patients with both visual and automated quantitative KI67 scores

The automated cut-off point of 12 % positive cells corresponded to a visual cut-off point of 24.2 % based on a linear regression model comprising patients with quantitative data on both methods. The visual cut-off was rounded up to a cut-off point of 25 %. Strong evidence (P < 0.0001) in support of a positive linear correlation (r = 0.63) between automated and visual scores was observed and continuous automated scores showed good discriminatory accuracy against the visually determined binary classes (AUC = 82 %, 95 % CI = 80–84 %) (Additional file 3: Figure S2). Twenty-six percent of the patients were classified as having high visual KI67, in contrast to 29 % for the automated KI67 scores; cross-classification of visual and automated categories revealed better specificity (84 %) than sensitivity (65.6 %) for the automated score in classifying visually determined categories (Additional file 1: Table S5). High KI67 was associated with worse survival in Kaplan–Meier curves based on both automated (log-rank P = 9.8 × 10⁻⁶) and visual (log-rank P = 3.8 × 10⁻¹⁴) KI67 scores even though attenuation of the difference between strata was observed for automated KI67 scores (Additional file 4: Figure S3). In two separate models for visual and automated KI67 scores each adjusted for age at diagnosis and study group we observed stronger evidence for an association between KI67 and survival for the visual KI67 score than for the automated KI67 score (Table 3). Analysis of model fit revealed similar parameters for both scores, however, especially in ER-positive breast cancers (AIC/BIC: visual = 2656/2618; automated = 2675/2638) (Table 3). When we performed further adjustments for other prognostic factors in multivariate Cox models of imputed datasets, we observed both visual and automated KI67 scores to be significantly associated with survival for all patients (HR (95 % CI): visual = 1.75 (1.23–2.49); automated = 1.61 (1.14–2.28)) and for ER-positive patients (visual = 2.30 (1.34–3.94); automated = 2.10 (1.28–3.47)), but not for ER-negative patients (visual = 1.63 (0.97–2.72); automated = 1.28 (0.79–2.05)) (Table 3).
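Two calculations underlie this comparison: mapping the automated cut-off onto the visual scale with a linear regression, and cross-classifying the resulting binary categories to obtain sensitivity and specificity. A minimal sketch of both is given below, assuming ordinary least squares of visual on automated scores and treating the visual category as the reference; the function names and toy values are hypothetical, and the study's own model was fitted in STATA on the 2440 patients with both scores.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def map_cutoff(automated_cutoff, slope, intercept):
    """Predicted visual score at a given automated cut-off."""
    return intercept + slope * automated_cutoff


def sensitivity_specificity(reference_high, test_high):
    """Sensitivity and specificity of test_high against reference_high.

    Assumes the reference contains at least one high and one low case.
    """
    pairs = list(zip(reference_high, test_high))
    tp = sum(r and t for r, t in pairs)
    fn = sum(r and not t for r, t in pairs)
    tn = sum((not r) and (not t) for r, t in pairs)
    fp = sum((not r) and t for r, t in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical paired scores in % positive cells (illustrative only).
automated = [3.0, 6.0, 10.0, 15.0, 22.0]
visual = [8.0, 13.0, 22.0, 30.0, 47.0]
slope, intercept = fit_line(automated, visual)
visual_cutoff = map_cutoff(12.0, slope, intercept)

# Dichotomise each score at its own cut-off, then cross-classify.
visual_high = [v > visual_cutoff for v in visual]
automated_high = [a > 12.0 for a in automated]
sens, spec = sensitivity_specificity(visual_high, automated_high)
```

Regressing visual on automated scores (rather than the reverse) matches the direction described in the text, where a visual cut-off is derived from the automated one.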

Table 3 Univariate (partially adjusted) and multivariate (fully adjusted) hazard ratio (HR) and 95 % CI for the associations between automated and visual KI67 scores with survival in breast cancer (N = 2440)

Discussion

Findings from our analysis provide strong evidence in support of a prognostic relationship for automated KI67 scoring in ER-positive (node-negative and node-positive) patients that is independent of tumour grade and other prognostic factors. Even though our data suggested a larger magnitude of the association between KI67 and survival among the node-negative patients, the difference between node-positive and node-negative was not statistically significant. Involving over 8000 patients from multiple centres internationally, this represents the largest study that has evaluated the prognostic value of automated KI67 scoring in breast cancer to date. Furthermore, the large sample size allowed us to evaluate its prognostic value in a number of breast cancer subtypes including ER+ (node-negative and node-positive), ER−, ER+ and/or PR+ (HER2+ or HER2−), ER−/PR− and HER2+ (i.e. HER2-enriched) and triple-negative breast cancers.

Our findings suggest that automated KI67 scoring is an analytically valid approach to generating KI67 scores. This is particularly noteworthy given the growing need to incorporate measures of KI67 in prognostic tools such as the IHC4 score and PREDICT [23, 24]. These tools are relatively cheap, readily available and utilise routinely measured IHC markers and, in the case of PREDICT, other routinely available patient data to provide information that can help clinicians and patients make informed decisions regarding the course of treatment. It is acknowledged that prognostication in breast cancer is becoming increasingly more sophisticated and that a number of multigene assays [28, 29] have been validated for this purpose; however, their costs and proprietary concerns limit their use in a large number of settings. Moreover, findings from previous studies suggest that some multigene assays may not perform better than routinely measured IHC markers. For instance, Cuzick et al. [23] reported similar prognostic properties for the Genomic Health recurrence score (GHI-RS, Oncotype DX), a 21-gene panel test, and the IHC4 score in their analysis of 1125 women from the TransATAC study, and notably KI67 was assessed by image analysis in that study [23]. Nonetheless, the relative performance of visual and automated KI67 scores in relation to the IHC4 score or PREDICT can only be assessed in studies that are specifically designed for that purpose.

In addition to lack of analytical validity, the prognostic performance of KI67 has also been questioned due to the design and analysis of studies that have reported previously on this protein [3]. Our evaluation is a large-scale, multicentre analysis which has adopted the recommended laboratory processes for the staining and scoring of KI67 [8]. All TMAs in our analysis were stained using the MIB-1 antibody (even though not all of them were centrally stained in our centre) and scored using a single automated algorithm. Our estimates of ~2-fold and ~1.5-fold increased risk of mortality at baseline for high versus low KI67 in univariate and multivariate analyses, respectively, are similar to those reported by de Azambuja et al. (HR = 1.95) and Stuart-Harris et al. (HR = 1.42) [6, 7] in their univariate and multivariate meta-analyses, respectively. Stratification of our analysis according to other IHC markers (in addition to ER) showed automated KI67 to be prognostic in hormone receptor-positive cancers. These findings, together with our observation of the prognostic value of KI67 in both node-negative and node-positive ER-positive patients, support the decision by the St Gallen International Expert Consensus to endorse KI67 for treatment decision-making in ER-positive early (1–3 axillary nodes) breast cancer patients [1]. We also observed modest evidence in support of poorer survival outcomes among high, relative to low, KI67 expressing triple-negative subtypes of breast cancer. This finding is in support of a previous report by Keam et al. [30]. Our population of triple-negative breast cancers (N = 1001), however, was 9.5 times larger than that of Keam et al. (N = 105).

Comparative analysis of visual and automated KI67 scores showed a stronger survival association for the visual over the automated scores; however, differences were generally modest. Given the advantages of automated versus visual scoring in terms of its potential for standardisation, reproducibility and throughput, automated methods appear to be promising alternatives to visual scoring for KI67 assessment. A potential limitation to the adoption of automated KI67 scoring in the clinical setting is that misclassification of positive nuclei as negative or malignant nuclei as benign could lead to attenuation of prognostic associations, an observation that has been reported previously for ER and PR [31] and one which we have also observed for KI67 in this analysis. This can be mitigated, however, by stringent quality control processes or by the adoption of a synergistic approach that combines the benefits of both the automated and visual scoring methods. One such approach is the CAV scoring method which we developed for the visual counting of negative and positive malignant nuclei. This approach, a variation of which has been reported previously [15], exploits the advantages of both visual and digital imaging tools by enabling the visual counting of KI67-positive cells in well-defined areas of a tumour within a computer microenvironment. This method is limited, however, by the observation that it is time consuming; as such, it may not be efficient if adopted for the large-scale scoring of KI67 in epidemiological studies, clinical trials or biomarker discovery studies. Nonetheless, efforts are currently underway to standardise the methods for the visual scoring of KI67 in core-cuts.

We centrally generated KI67 scores on TMAs and determined a threshold of 12 % positive cells of prognostic relevance in our study population. However, due to possible variations in the distribution of KI67 scores according to specimen type and among different laboratories, this cut-off point may not apply to other types of clinical samples or to other laboratories. As a result, pending international standardisation of the KI67 analytical processes, setting local laboratory-specific cut-off points as recommended by international guidelines [1] remains a pragmatic approach to determining ‘high’ and ‘low’ KI67. Furthermore, although our automated cut-off point of 12 % positive cells was determined to correspond to a visual score of 25 %, this may be related, at least in part, to the fact that automated systems generally count more cells than the visual evaluator, a reason that has been proposed to explain differences in KI67 scores between visual and automated scoring and different automated scoring approaches [26]. Nonetheless, findings from a recent meta-analysis that assessed the prognostic value of different cut-off levels of KI67 suggest that a visual cut-off point >25 % provides greater discrimination in mortality risk than other cut-off points [32].

Some limitations of our analysis include the lack of data on specific chemotherapeutic or endocrine agents received by each patient, as a result of which we were unable to account for the impact of a specific treatment regimen on survival or to examine whether or not KI67 is predictive of response to specific chemotherapeutic and/or endocrine agents. We were, however, able to account for whether or not patients received adjuvant systemic treatment in all our analyses because more than two-thirds of the patients had information on treatment. This also allowed us to perform stratified analysis according to whether or not chemotherapy was administered. Also, we did not have data on disease-free survival which may have been a more informative end point than BCSS in early breast cancer. Our assessment of KI67 on TMAs may mean that direct inference cannot be drawn from our findings on other types of clinical samples, especially whole sections [8]. This is because KI67 scores are speculated to be lower for TMAs than for whole sections and not many studies have assessed the correlation between KI67 scores on TMAs and those on whole sections. However, one such study by Kobierzycki et al. [33] involving 51 archival paraffin blocks of invasive ductal carcinoma showed excellent correlation (r = 0.91) between the TMAs and whole sections. Their paper utilised three 0.6 mm core punches, however, and this may explain the high correlation between KI67 scores on TMAs and whole sections that was observed in that study. Nonetheless, the fact that more than half (4431/55 %) of the patients in our analysis had KI67 scores on two or more cores, with 83 % of these showing concordant KI67 status, should limit the impact of intra-tumour heterogeneity of KI67 scores on our findings.

Conclusion

Our large, multicentre study indicates that automated KI67 scoring provides prognostic information in breast cancer that is independent of standard parameters. In view of its potential for standardisation, throughput and reproducibility, the automated method appears to be a promising alternative to visual scoring for KI67. These findings are important given the increasing need to incorporate measures of KI67 as part of tools that are needed to refine prognostic scores for breast cancer patients; this is especially relevant for patients with ER-positive, node-negative tumours, in order to aid decisions on providing adjuvant chemotherapy. However, further work is needed to standardise the staining and scoring protocols for KI67. In doing so, the potential benefits and drawbacks of automated versus visual scoring systems should merit consideration. In light of this we welcome ongoing efforts by the International Working Party on KI67 in Breast Cancer aimed at standardisation of the analytical processes for KI67.