Abstract
Purpose
Quantification of Ki67 in breast cancer is a well-established prognostic and predictive marker, but inter-laboratory variability has hampered its clinical usefulness. This study compares the prognostic value and reproducibility of Ki67 scoring using four automated, digital image analysis (DIA) methods and two manual methods.
Methods
The study cohort consisted of 367 patients diagnosed between 1990 and 2004, with hormone receptor positive, HER2 negative, lymph node negative breast cancer. Manual scoring of Ki67 was performed using predefined criteria. DIA Ki67 scoring was performed using QuPath and Visiopharm® platforms. Reproducibility was assessed by the intraclass correlation coefficient (ICC). ROC curve survival analysis identified optimal cutoff values in addition to recommendations by the International Ki67 Working Group and Norwegian Guidelines. Kaplan–Meier curves, log-rank test and Cox regression analysis assessed the association between Ki67 scoring and distant metastasis (DM) free survival.
Results
The manual hotspot and global scoring methods showed good agreement with their counterpart DIA methods (ICC > 0.780), and agreement between the different DIA hotspot scoring platforms was good to excellent (ICC 0.781–0.906). Several Ki67 cutoffs were significantly associated with DM-free survival (p < 0.05). DIA scoring had greater prognostic value for DM-free survival using a 14% cutoff (HR 3.054–4.077) than manual scoring (HR 2.012–2.056). The use of a single cutoff for all scoring methods affected the distribution of prediction outcomes (e.g. false positives and negatives).
Conclusion
This study demonstrates that DIA scoring of Ki67 is superior to manual methods, but further study is required to standardize automated, DIA scoring and definition of a clinical cut-off.
Introduction
Breast cancer is the most common cancer globally and the fifth leading cause of cancer deaths [1]. In Norway, 4,247 new cases were reported in 2022 [2]. A panel of immunohistochemistry (IHC) biomarkers, comprising hormone receptors (HR) (estrogen (ER) and progesterone (PR)), HER2 and Ki67, can classify breast cancer into four surrogate subtypes: luminal A-like (low Ki67, HR+, HER2−), luminal B-like (high Ki67, HR+, HER2−), HER2 positive (HER2+) and triple negative (HR−, HER2−) tumors [3, 4]. The endocrine-therapy sensitive luminal-like group is the largest category and is divided by Ki67 into luminal A-like (low Ki67) and luminal B-like (high Ki67) tumors; the latter benefit from additional chemotherapy, whilst the former do not [5].
The Ki67 score defines the percentage of positively stained tumor cells in a defined hotspot or global region [6, 7] and has prognostic, predictive, and monitoring potential [8,9,10,11,12,13]. Patients with a low Ki67 score have a low recurrence risk and may be spared from chemotherapy whilst patients with a high Ki67 score are associated with an increased risk of recurrence, higher mortality rate and may benefit more from adjuvant chemotherapy [14,15,16,17,18,19,20].
Despite its value as a prognostic biomarker, there is no global consensus on how to standardize Ki67 scoring. This includes standardization of pre-analytical and analytical conditions, a protocol for measurement of Ki67, and cutoff score for adjuvant treatment [6, 21,22,23,24]. In 2021 the St. Gallen consensus recommended the guidelines set by the International Ki67 in Breast Cancer Working Group (IKWG): patients with low Ki67 < 5% are not recommended for adjuvant chemotherapy whilst patients with a high Ki67 ≥ 30% are recommended [21]. However, treatment recommendations for the intermediate category (> 5%, < 30%) are still debated, and other cut-offs are still used.
With the rise of digital pathology and artificial intelligence (AI), digital scoring of Ki67 by automated algorithms or applications (APPs) has provided a new avenue for improved quantification. Various studies have demonstrated equal or improved reproducibility and accuracy using automated digital image analysis (DIA) compared to manual scoring methods [9, 25,26,27,28,29,30]. However, no recommendation currently exists for automated DIA methods, and established cutoffs have not yet been extensively validated for these methods.
In this study, we aim to compare the reproducibility and prognostic capacity of Ki67 score using four DIA scoring methods compared to using two manual methods (conventional-HS and global unweighted and weighted).
Materials and methods
Study cohort
The study received approval from the Regional Ethics Committee of Health West Norway (2010/1241), and informed consent was waived. Patients who received a primary diagnosis of breast cancer between 1990 and 1998 (N = 346) and between 2000 and 2004 (N = 253) at Stavanger University Hospital (SUH) were available for this study. Patients who received neoadjuvant treatment (N = 8) were excluded. The following inclusion criteria were used to select cases: 1) archived tumor material available as formalin-fixed paraffin-embedded (FFPE) tissue blocks from surgical excisions, 2) HR+ (ER ≥ 1%, and/or PR ≥ 10%) and HER2− status, and 3) at least one follow-up sample, for non-progression cases > 6 months after initial diagnosis. Of the remaining 591 cases, 64 (11%) were removed due to missing blocks or lack of tumor material, 19 (3%) were lost to follow-up and 141 HER2+ or TNBC cases were excluded. In summary, 367 cases comprised the study cohort.
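The case selection described above can be re-traced with simple arithmetic. The sketch below only recomputes the counts stated in the text; the variable names are illustrative and not taken from the study database.

```python
# Counts as stated in the text; names are illustrative only.
diagnosed_1990_1998 = 346
diagnosed_2000_2004 = 253
neoadjuvant_excluded = 8

# Cases remaining after exclusion of neoadjuvant-treated patients.
eligible = diagnosed_1990_1998 + diagnosed_2000_2004 - neoadjuvant_excluded

exclusions = {
    "missing blocks or lack of tumor material": 64,
    "lost to follow-up": 19,
    "HER2+ or TNBC": 141,
}

study_cohort = eligible - sum(exclusions.values())

print(f"Eligible cases: {eligible}")
for reason, n in exclusions.items():
    print(f"  excluded, {reason}: {n} ({n / eligible:.0%})")
print(f"Study cohort: {study_cohort}")
```

The totals reproduce the 591 eligible cases and the final cohort of 367 reported above.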
Datasets
The development dataset consisted of a training and tuning set (Fig. S1). A training set was used to create manual annotations to train classification algorithms. Five whole slide images (WSI) of good quality and representative of strong and weak staining, and high and low Ki67 positivity were selected. The WSIs were annotated with training labels for segmentation of key features: tissue and background, tumor and non-tumor, and positive brown and negative blue nuclei. Two of the four DIA methods required training (VIS1-HS, QuPath), whilst the two remaining methods (VIS2-HS, VIS2-G) were pre-trained and validated. All DIA methods were run on the tuning dataset, which was used to monitor and evaluate algorithm performance, assess reproducibility, and define prognostic cutoffs.
Immunohistochemistry and imaging
New 3 µm tissue sections were cut, mounted on SuperFrost® Plus slides (Menzel Gläser, Braunschweig, Germany), and dried overnight at 37 °C followed by 1 h at 60 °C. Sections were deparaffinized and transferred to the Dako Omnis (Dako, Glostrup, Denmark). Antigen retrieval was performed using the EnVision FLEX Target Retrieval Solution High pH (Dako Omnis), heated at 97 °C for 30 min. Sections were stained for Ki67 using a pre-optimized protocol (diagnostic protocol). Given the age of the tissue blocks, the diagnostic protocol required adjustment to address the issue of false negative staining. The Dako MIB-1 clone was diluted 1:50 (diagnostic protocol uses 1:100) and incubated for 20 min. Additionally, signal amplification was performed using the EnVision FLEX + Mouse LINKER (Dako Omnis) with a 10-min incubation (diagnostic protocol uses FLEX). Slides stained in 2011 were matched and compared with sections newly stained using the adjusted protocol, to ensure equivalent results.
Manual hotspot Ki67 quantification (conventional)
Using a microscope, the whole section was viewed at low power to identify the most proliferative region (hotspot) in the invasive tumor region. Non-invasive regions, necrotic regions and areas with high lymphocytic infiltration were avoided. Two manual methods, which reflected the diagnostic setting at the time of counting, were used for the 1990–1998 and 2000–2004 cohorts. Manual counts were performed previously and available in the pre-established database.
For the 1990–1998 cohort, a 40X objective was used to count positively stained tumor nuclei (brown) and the total number of tumor nuclei in the hotspot. For each case, at least 500 tumor cells were counted. If the hotspot contained fewer than 500 tumor cells, adjacent fields of view (FOV) were also counted. The Ki67 score was calculated as the percentage of Ki67-positive nuclei (Ki67 positive/(Ki67 positive + Ki67 negative)).
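As a minimal illustration of the score calculation and the 500-cell minimum described above, the following sketch uses hypothetical function names and made-up counts; it is not the counting procedure itself.

```python
MIN_TUMOR_CELLS = 500  # minimum number of tumor cells stated in the text


def ki67_score(positive: int, negative: int) -> float:
    """Percentage of Ki67-positive tumor nuclei among all counted tumor nuclei."""
    total = positive + negative
    if total == 0:
        raise ValueError("no tumor nuclei were counted")
    return 100.0 * positive / total


def enough_cells(positive: int, negative: int, minimum: int = MIN_TUMOR_CELLS) -> bool:
    """True if the count meets the minimum; otherwise adjacent FOVs would be added."""
    return positive + negative >= minimum


# Made-up hotspot count: 84 positive and 516 negative nuclei.
print(ki67_score(84, 516), enough_cells(84, 516))
```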
For the 2000–2004 cohort, the interactive QPRODIT system (Leica, Cambridge, UK) was used to score Ki67 as described previously [31]. Within a hotspot tumor region, 250–350 FOV were defined, and a test grid used to classify each field as Ki67 positive or negative. A field was classified as positive if the first tumor cell that intersected with a grid point was positive, and vice versa for negative cells. Ki67 score was calculated as the percentage positive Ki67.
Both methods were grouped under the term conventional Ki67 score. No significant differences were observed between the two cohorts (Table S1).
Manual global Ki67 quantification (global unweighted and weighted)
Global weighted and unweighted scoring was performed according to the protocol set by the IKWG [32]. The IKWG Ki67 mobile counting tool was used (https://www.ki67inbreastcancerwg.org/). Using the NDP2.view2 image viewing software (v.2.9.29, Hamamatsu Photonics, Japan), a WSI of the Ki67-stained tissue was examined and the percentage area of negligible, low, medium, or high Ki67 was estimated and entered into the counting tool. Three to four circular annotations were placed to simulate a field of view, in each field type, as directed by the tool. In a typewriter fashion, 100 nuclei were counted as either negative or positive in each ROI. The unweighted and weighted global scores were recorded for each slide.
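The text does not give the formulas behind the two global scores. The sketch below assumes that the unweighted score is the plain average of the per-field percentages and that the weighted score averages the fields within each field type and weights each type by its estimated share of the tumor area. The function name, the data layout and the example counts are hypothetical and do not reproduce the IKWG counting tool.

```python
from collections import defaultdict


def global_ki67_scores(fields, area_fraction):
    """Return (unweighted, weighted) global Ki67 scores in percent.

    fields: list of (field_type, positive_nuclei, counted_nuclei) tuples.
    area_fraction: estimated fraction of tumor area for each field type.
    """
    per_field = [100.0 * pos / counted for _, pos, counted in fields]

    # Unweighted (assumed): plain average over all counted fields.
    unweighted = sum(per_field) / len(per_field)

    # Weighted (assumed): average the fields within each type, then weight
    # each type by its estimated area share, normalized over scored types.
    by_type = defaultdict(list)
    for (field_type, _, _), pct in zip(fields, per_field):
        by_type[field_type].append(pct)
    total_weight = sum(area_fraction[t] for t in by_type)
    weighted = sum(
        area_fraction[t] * sum(v) / len(v) for t, v in by_type.items()
    ) / total_weight
    return unweighted, weighted


# Made-up example: four fields of 100 nuclei each.
example_fields = [("low", 5, 100), ("low", 7, 100), ("medium", 20, 100), ("high", 48, 100)]
example_area = {"negligible": 0.0, "low": 0.6, "medium": 0.3, "high": 0.1}
unweighted, weighted = global_ki67_scores(example_fields, example_area)
print(round(unweighted, 1), round(weighted, 1))
```

In this made-up case the single high-proliferation field pulls the unweighted score well above the weighted one, because it represents only a small share of the tumor area.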
Digital image analysis (DIA)
Scanning
Whole sections stained with Ki67 were scanned at 40X magnification using the Hamamatsu Nanozoomer S60 (Hamamatsu Photonics, Hamamatsu City, Japan) at SUH.
Two platforms were used to score Ki67 on whole slide images (WSI): QuPath [33] and Visiopharm® (Version 2022.09.3.12885, Visiopharm A/S, Hørsholm, Denmark). The following hardware was used: Dell Precision 3640 Tower, Intel Core i9-10900, Nvidia GeForce RTX 2080 Ti.
VIS1-HS
An in-house APP (VIS1-HS) was developed using the Visiopharm® platform for quantification of Ki67 score (Fig. 1). Six standalone APPs were developed and batch run: 01 Tissue detection, 02 Positive nuclei detection, 03 Hotspot detection, 04 Tumor detection, 05 Nuclei segmentation, and 06 Hotspot Ki67 score quantification. Manual annotations from the training dataset were used to create labels to train the classifier APPs: (01) tissue and non-tissue training labels to train a tissue classifier, (02) positive nuclei labels to train a positive nuclei classifier, (04) tumor and non-tumor labels to train a tumor classifier, and (05) positive and negative nuclei labels to train a nuclei classifier. Training was performed until the classifier achieved accurate and consistent predictions, as determined by the operator. All classifier APPs used supervised K-means clustering. Specific parameters were trialed with the VIS1-HS APP (Fig. S1): the drawing radius for the heatmap configuration (175 µm, 400 µm), the number of hotspot ROIs (1–5), and the ROI size criteria (0.2 mm², 1 mm², minimum 550 tumor cell count, minimum 1000 tumor cell count).
VIS2-HS/G
Visiopharm® provided a commercial, CE-IVD Ki67 quantification APP (VIS2-HS/G) (Fig. 1). This method consists of six standalone APPs that were batch run in the following order: 01 #10182 – IHC Tissue Detection, AI; 02 #10180 – Invasive Tumor Detection, AI; 03 #10180 – Invasive Tumor Postprocessing; 04 #10173 – Ki-67 Nuclei APP, Breast Cancer, AI; 05 #10114 Hot Spot Detection; 06 #10114 Hot Spot Quantification (Table S2). This method generated both a global score (VIS2-G), which assessed the invasive tumor region of the entire WSI, and a hotspot score (VIS2-HS).
QuPath
A classifier for quantification of Ki67 score was developed using QuPath (Fig. 1), based on the protocol established by Acs and colleagues [34]. To train the cell classifier, positive and negative tumor cell nuclei were annotated manually on WSIs from the training dataset. A tumor ROI was first manually drawn around the invasive tumor region and the cell classifier script (Table S3) was run. The density maps tool generated a heatmap and three ROIs were automatically placed on suspected hotspots. Each hotspot was reviewed, and the highest scoring hotspot was approved according to the following criteria: a minimum of 500 tumor cells and satisfactory labelling of positive and negative tumor nuclei. If the ROI contained fewer than 500 tumor cells, it was manually enlarged until this criterion was met. The QuPath method was semi-automated; manual delineation of the tumor region was required for each case, and the classifier and density maps were run separately per case.
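The hotspot review step can be summarized as a small decision rule. The following is a sketch in plain Python, not QuPath scripting: the `HotspotROI` class, the function name and the example counts are hypothetical, and the visual check of nucleus labelling is left to the operator.

```python
from dataclasses import dataclass

MIN_TUMOR_CELLS = 500  # criterion stated in the text


@dataclass
class HotspotROI:
    name: str
    positive: int  # Ki67-positive tumor nuclei in the ROI
    negative: int  # Ki67-negative tumor nuclei in the ROI

    @property
    def total(self) -> int:
        return self.positive + self.negative

    @property
    def score(self) -> float:
        return 100.0 * self.positive / self.total if self.total else 0.0


def review_hotspots(rois):
    """Pick the highest-scoring ROI and flag it if it needs manual enlargement."""
    best = max(rois, key=lambda roi: roi.score)
    needs_enlargement = best.total < MIN_TUMOR_CELLS
    return best, needs_enlargement


# Made-up candidates, standing in for the three automatically placed ROIs.
candidates = [
    HotspotROI("ROI-1", positive=90, negative=510),
    HotspotROI("ROI-2", positive=100, negative=300),
    HotspotROI("ROI-3", positive=60, negative=640),
]
best, needs_enlargement = review_hotspots(candidates)
print(best.name, round(best.score, 1), needs_enlargement)
```

Here the highest-scoring candidate holds only 400 tumor cells, so it is flagged for manual enlargement before its score would be accepted.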
Statistical analysis
Statistical analyses were performed with SPSS for Windows (version 26.0.0; IBM SPSS Statistics) and RStudio (2023.06+561). Assumptions were verified for each test and p < 0.05 was considered significant. Mann–Whitney U and Kruskal–Wallis tests were used to test for significant differences between patients with no distant metastases (DM) and with DM. Level of agreement, using the intraclass correlation coefficient (ICC), was assessed on transformed Ki67 scores (multiplied by a factor of 10 and log transformed). Receiver operating characteristic (ROC) curves were generated for each scoring method and the area under the curve (AUC) was used to assess each method's discriminative ability. Kaplan–Meier survival analyses were performed to assess the prognostic value of Ki67 score for DM-free survival and compared using the log-rank test (endpoint: first diagnosis of a DM in the follow-up, or censored at the last known follow-up date). Cox regression models for univariate and multivariate analysis were fitted for: age, tumor size, mitotic activity index, Nottingham grade, operation type, adjuvant treatment, and Ki67 score.
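To illustrate the agreement analysis, the sketch below computes an ICC on transformed scores. The text does not state which ICC form was used in SPSS, so the two-way random effects, absolute agreement, single measurement form, ICC(2,1), is an assumption here, as are the natural logarithm and the made-up paired scores. Zero scores are not handled, since the text does not describe how they were treated.

```python
import math


def transform(score_percent: float) -> float:
    """Multiply by 10 and log-transform (natural log is an assumption)."""
    return math.log(10.0 * score_percent)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of rows, one row per case and one column per scoring method.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases (rows), methods (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up Ki67 scores (%) for five cases scored by two methods.
paired_scores = [(4.0, 5.5), (12.0, 10.0), (25.0, 31.0), (8.0, 9.0), (40.0, 36.0)]
transformed = [[transform(a), transform(b)] for a, b in paired_scores]
print(round(icc_2_1(transformed), 3))
```

A different ICC form (for example consistency rather than absolute agreement) would give different values for the same data, so the form should be reported alongside the coefficient.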
Results
Overview
Of the 367 cases from the development dataset, 12 cases were ineligible for analysis due to poor quality material and 61 cases due to poor staining (Fig. S2). This left 294 cases (Table S4). A further 9 cases failed analysis with QuPath due to a false positive edge effect, high numbers of inflammatory cells, necrosis, and artefacts (Fig. S2).
Performance of DIA methods
For the VIS1-HS method, various ROI specifications were trialed and the differences in Ki67 scores between them were considered marginal (Table S5). A 1 mm² ROI was selected for the final VIS1-HS method as it most consistently captured over 500 tumor cells.
All DIA methods required manual editing for some or all cases. For VIS1-HS, nearly all cases required manual removal of either DCIS, artefacts, inflammatory cell clusters or normal tissue. For VIS2-HS, 13% of cases (37/294) required a manual edit. The VIS2-G method required a manual edit in 28% of cases (81/294). For QuPath, all cases required a manual delineation of the tissue region for analysis, and 45% of cases (128/285) required a manual intervention for HS ROI placement or expansion.
Comparison of Ki67 score for four DIA scoring and two manual methods
For DIA and manual scoring methods the mean Ki67 score ranged from 9.5% (VIS2-G) to 16.2% (QuPath) (Table 1, Fig. S3). For cases with no distant metastases (no DM) compared to cases with distant metastases (DM) in the follow-up, the greatest mean difference was recorded for QuPath (11.9%), followed by the VIS2-HS (9.6%), global weighted (8.6%), VIS1-HS (7.2%), VIS2-G (6.9%), global unweighted (6.8%), and conventional hotspot (5.9%) methods. Furthermore, QuPath recorded higher scores on average, with the VIS2-G method recording the lowest (Table 1).
Reproducibility is one of the primary concerns regarding scoring of Ki67. In the study cohort, there was moderate to excellent agreement between all scoring methods (Table 2). Comparison of all hotspot scoring methods revealed good to excellent agreement (ICC 0.781–0.906). Agreement was on average higher between automated hotspot methods than between automated and manual hotspot methods (Table 2). There was good agreement between the global DIA method and manual global method (ICC 0.803–0.810). Agreement was lower between global and hotspot scoring methods (ICC 0.636–0.759).
Defining a prognostic threshold
ROC curves were generated for each scoring method to assess discrimination of DM-free survival and to identify optimal cutoffs. The two methods with the highest AUC were QuPath, 0.721 (95% CI 0.643–0.798) and VIS2-HS, 0.705 (95% CI 0.625–0.785). The manual methods recorded the lowest AUC: conventional hotspot, 0.655 (95% CI 0.569–0.741); global weighted, 0.648 (95% CI 0.554–0.744); and global unweighted, 0.636 (95% CI 0.541–0.731). A range of coordinates was selected from each ROC curve for binary categorization of Ki67 score, which yielded similar DM-free survival results (Table S6, Fig. S4). The manual methods demonstrated lower p-values (log-rank) for cut-offs around 10%, whereas automated methods reported lower p-values between 10 and 14% (Table S6). A 14% cutoff was chosen for further evaluation, due to its recommendation by the Norwegian Guidelines [35].
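The AUC and candidate cutoffs can be illustrated with a simplified sketch that treats DM as a binary outcome and ignores censoring and follow-up time, which is an assumption and differs from a full survival analysis. Ranking cutoffs by Youden's J is also an assumption, since the text does not state how ROC coordinates were chosen. The scores are made up and carry no evidential weight.

```python
def auc(scores_dm, scores_no_dm):
    """Probability that a DM case scores higher than a no-DM case (ties count 0.5)."""
    wins = 0.0
    for s_dm in scores_dm:
        for s_no in scores_no_dm:
            if s_dm > s_no:
                wins += 1.0
            elif s_dm == s_no:
                wins += 0.5
    return wins / (len(scores_dm) * len(scores_no_dm))


def cutoff_table(scores_dm, scores_no_dm, cutoffs):
    """Sensitivity, specificity and Youden's J for each cutoff (high = score >= cutoff)."""
    rows = []
    for cutoff in cutoffs:
        sensitivity = sum(s >= cutoff for s in scores_dm) / len(scores_dm)
        specificity = sum(s < cutoff for s in scores_no_dm) / len(scores_no_dm)
        rows.append((cutoff, sensitivity, specificity, sensitivity + specificity - 1.0))
    return rows


# Made-up Ki67 scores (%) for cases with and without DM.
dm_scores = [22.0, 15.0, 9.0, 35.0, 18.0]
no_dm_scores = [4.0, 12.0, 7.0, 16.0, 3.0, 10.0, 6.0, 25.0]

print(f"AUC = {auc(dm_scores, no_dm_scores):.3f}")
for cutoff, sens, spec, j in cutoff_table(dm_scores, no_dm_scores, [5, 10, 14, 20, 30]):
    print(f"cutoff {cutoff:>2}%: sensitivity {sens:.2f}, specificity {spec:.2f}, J {j:.2f}")
```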
Binary categorization of Ki67 at the 14% cutoff was significantly associated with DM-free survival over a 20-year follow-up period for all methods (Fig. 2a). All methods significantly separated patients with low (< 14%) and high (≥ 14%) Ki67 in terms of DM-free survival, with the manual methods showing the smallest separation in comparison to DIA (Fig. 2a). Additionally, percentage agreement was highest between the VIS1-HS and VIS2-HS methods and lowest between the VIS2-G and QuPath methods (Table S7).
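The binary categorization and the percentage agreement between two methods can be sketched as follows. The definition of percentage agreement used for Table S7 is not given in the text; the share of cases assigned to the same category by both methods is assumed here, and the scores are made up.

```python
CUTOFF = 14.0  # Ki67 cutoff (%) evaluated in the text


def categorize(score: float, cutoff: float = CUTOFF) -> str:
    """Low if the score is below the cutoff, high otherwise."""
    return "high" if score >= cutoff else "low"


def percentage_agreement(scores_a, scores_b, cutoff: float = CUTOFF) -> float:
    """Share of cases (%) placed in the same category by both methods (assumed definition)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must score the same cases")
    same = sum(
        categorize(a, cutoff) == categorize(b, cutoff)
        for a, b in zip(scores_a, scores_b)
    )
    return 100.0 * same / len(scores_a)


# Made-up scores (%) for five cases scored by two methods.
method_a = [5.0, 13.9, 14.0, 30.0, 12.0]
method_b = [6.0, 15.0, 13.0, 28.0, 11.0]
print(percentage_agreement(method_a, method_b))
```

Cases scoring close to the cutoff drive most of the disagreement in this example, even though the two methods differ by only about one percentage point on them.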
International recommendations suggest < 5% to assign low Ki67 and ≥ 30% to assign high Ki67, with remaining cases falling into an intermediate category. The VIS2-G method had the largest proportion of low (< 5%) Ki67 cases (58%) of all methods (Fig. 2b). QuPath and global weighted/unweighted methods had the highest proportion of high (≥ 30%) Ki67 cases (17–18%). The VIS2-G method reported the highest number of false negatives (low Ki67, DM), whilst the highest number of false positives (high Ki67, no DM) was recorded by the manual global methods (Fig. 3). This was also observed at a 10-year follow-up.
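The three-tier categorization and the counting of false positives and false negatives, as defined above, can be sketched with hypothetical function names and made-up cases:

```python
LOW_CUTOFF = 5.0    # below this: low Ki67
HIGH_CUTOFF = 30.0  # at or above this: high Ki67


def three_tier_category(score: float) -> str:
    """Assign low (< 5%), high (>= 30%) or intermediate Ki67."""
    if score < LOW_CUTOFF:
        return "low"
    if score >= HIGH_CUTOFF:
        return "high"
    return "intermediate"


def count_misclassified(cases):
    """cases: list of (ki67_score, developed_dm) pairs.

    False positive: high Ki67 without DM. False negative: low Ki67 with DM.
    Intermediate cases count as neither.
    """
    false_positives = sum(
        1 for score, dm in cases if three_tier_category(score) == "high" and not dm
    )
    false_negatives = sum(
        1 for score, dm in cases if three_tier_category(score) == "low" and dm
    )
    return false_positives, false_negatives


# Made-up cases: (Ki67 score in %, DM during follow-up).
example_cases = [
    (3.0, False), (4.0, True), (12.0, False), (29.9, True),
    (30.0, False), (45.0, True), (33.0, False),
]
print(count_misclassified(example_cases))
```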
Multivariate analysis
To assess the prognostic value of Ki67 scoring, we performed Cox regression analysis of each method alongside established prognostic markers. Of the variables tested, Nottingham grade, mitotic activity index (MAI10), tumor size (2 cm), operation type, and Ki67 for all methods (14%) were statistically significant predictors of DM-free survival in univariate analysis (Table 3). Univariate hazard ratios for all DIA methods (Ki67 14%) ranged from 3.054 to 4.077, with overlapping 95% confidence intervals, whilst the manual methods had lower hazard ratios of 2.012–2.056 (Table 3). In the multivariate analysis, Nottingham grade, adjuvant treatment, and operation type were predictors in the final model.
Discussion
We compared several automated DIA tools for global and hotspot Ki67 scoring with two manual methods in HR+, HER2−, LN− breast tumors. Although Ki67 is considered an important biomarker in breast cancer, concerns surrounding lack of standardization and poor reproducibility have brought its value into question. In this study, we observed that commercial DIA tools (VIS2-HS, VIS2-G) required notably less manual editing compared to the in-house methods (VIS1-HS, QuPath).
Inter-platform comparison demonstrated good to excellent agreement between all hotspot scoring methods and between the global methods (ICC > 0.78). Another inter-platform study reported excellent reproducibility (ICC > 0.9) [26], and our observation of strong agreement between manual and DIA platforms is consistent with observations in the literature [7, 9, 36,37,38,39,40,41,42].
Although efforts have been made to standardize Ki67 scoring, both hotspot and global Ki67 score are still reported in the literature and in Norway, Sweden, and Denmark [6, 38, 43, 44]. In addition, a range of cutoffs for defining low and high Ki67 are still reported (range: 10–20%) [17, 26, 45, 46]. We observed a range of prognostic cutoffs, with 14% being optimal and in agreement with Norwegian guidelines [6]. The hazard ratios (HR 2.7–3.7) reported by Acs et al. [26] for DIA scoring on core needle biopsies and tissue microarrays were similar to those reported in our study (HR 3.1–4.1) and Boyaci et al. [47] (HR 2.6–4.2) for DIA scoring on surgical specimens. Furthermore, we observed that DIA methods had a greater discriminative capacity, using the AUC metric, than manual methods. This was reflected in the hazard ratios for Ki67 score (14%) and DM-free survival (DIA HR 3.054–4.077 vs. Manual HR 2.012–2.056). Another study observed similar hazard ratios for DIA scoring (hotspot HR: 6.88; global HR 3.13) compared to a manual hotspot method (HR 2.76), for recurrence free survival [48].
In 2021, the St. Gallen consensus adopted the IKWG recommendation of < 5% (low) and > 30% (high), with patients between 5 and 30% (intermediate) not recommended for treatment decisions by Ki67 [38]. In our study, evaluation of Ki67 score using these thresholds revealed that QuPath had the largest proportion of Ki67 high cases and highest number of false positives (high Ki67, no DM), whilst the global scoring method VIS2-G had the highest proportion of Ki67 low cases and the highest number of false negatives (low Ki67, DM). This suggests that, even with a more restrictive cutoff, patients are still at risk of over- and under-treatment. Furthermore, assessing the number of false positives and negatives revealed differences in clinical consequence between methods. This is important for future implementation of DIA methods, as the choice between hotspot and global scoring can lead to notable differences in the classification of patients.
A high total tumor count resulted in lower Ki67 scores. The average total number of tumor cells measured for global DIA score was > 100,000, far more than for all other methods and it consistently reported lower Ki67 scores than the others. Norwegian guidelines and IKWG recommendations for Ki67 scoring recommend a minimum of 400 to 500 tumor cells scored [6, 49]. Our results suggest that Ki67 scores from tumor cell counts around 2000–4000 cells, generated by a 1 mm² ROI, were more consistent, with fewer potentially under- or over-treated cases. Observations from Robertson et al. revealed greater reproducibility with increasing tumor cell counts (from 200 to 1000 tumor cells) [48]. This suggests the need for caution when translating current manual methods to a DIA method. A larger study is required to affirm the optimal number of total tumor cells for Ki67 score by DIA.
As use of molecular signature tests for classification of breast cancer increases, the use of Ki67 for treatment decisions is called into question. Molecular testing has demonstrated prognostic and predictive value [50,51,52,53,54]. However, where such molecular panels are unavailable, considered costly, or introduce delays to treatment due to slower return of results, Ki67 could be an equivalent approach. Furthermore, Ki67 has the potential to be used as a screening tool for recommending molecular testing in intermediate cases (Ki67 > 5%, < 30%) [38]. Majority consensus warrants use of both multigene panels and Ki67 score [21].
This study has limitations. It used a retrospective cohort, and many of the tissue blocks were > 20 years old; the staining protocol therefore had to be adjusted due to antigen decay. Additionally, only one scanner type and one automated IHC instrument were used; multiple scanners and staining methods, from different locations and time periods, on different or the same tissue blocks, would be worth investigating. We also used only one global scoring DIA method (VIS2-G), and future work could compare the reproducibility of global DIA methods. For a future study, it would be worthwhile to investigate DIA scoring of Ki67 in a prospective cohort with molecular profile data and planned long-term follow-up, such as the EMIT study [55].
In summary, we report good agreement between manual and counterpart DIA scoring methods. DIA Ki67 scoring methods had a greater discriminative capacity for DM-free survival than manual methods. A range of cutoffs was prognostic for each method, but the choice of scoring method and cutoff can lead to notable differences in the number of patients to be treated or tested, emphasizing the need for further validation in a prospective cohort. Total tumor cell count contributed to changes in risk categorization using the recommended 5% and 30% threshold. Automated, DIA methods may improve reproducibility and prognostic value of Ki67 scoring in comparison to manual methods, if standardized.
Data availability
The patient databases used in this study are not publicly available due to ethical and legal concerns. Anonymized data can be requested from Stavanger University Hospital Institutional Data Access/Ethics Committee (contact via email: rek-vest@uib.no, REK vest, Rogaland, Vestland, Norway) for researchers who meet the criteria for access to confidential data.
References
Sung H, Ferlay J, Siegel RL et al (2021) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 71:209–249. https://doi.org/10.3322/caac.21660
Cancer Registry of Norway. Cancer in Norway 2022 - Cancer incidence, mortality, survival and prevalence in Norway. https://www.kreftregisteret.no/globalassets/cancer-in-norway/2022/cin_report-2022.pdf. Accessed 12 Dec 2023
Yersal O, Barutca S (2014) Biological subtypes of breast cancer: prognostic and therapeutic implications. World J Clin Oncol 5:412–424
WHO Classification of Tumors Editorial Board (2019) Breast Tumours. 5th edn. International Agency for Research on Cancer
Curigliano G, Burstein HJ, Winer EP et al (2017) De-escalating and escalating treatments for early-stage breast cancer the St. Gallen International Expert Consensus Conference on the Primary Therapy of Early Breast Cancer 2017. Ann Oncol 28:1700–1712
Norwegian Directorate of Health (2021) Nasjonalt handlingsprogram med retningslinjer for diagnostikk, behandling og oppfølging av pasienter med brystkreft (National Guidelines for Diagnosis, Treatment and Follow-up of Patients with Breast Cancer). https://nbcgblog.files.wordpress.com/2021/03/nasjonalt-handlingsprogram-for-pasienter-med-brystkreft-01.03.2021-16-utgave.pdf. Accessed 12 Dec 2023
Leung SC, Nielsen TO, Zabaglo LA et al (2019) Analytical validation of a standardised scoring protocol for Ki67 immunohistochemistry on breast cancer excision whole sections: an international multicentre collaboration. Histopathology 75:225–235. https://doi.org/10.1111/his.13880
Smith I, Robertson J, Kilburn L et al (2020) Long-term outcome and prognostic value of Ki67 after perioperative endocrine therapy in postmenopausal women with hormone-sensitive early breast cancer (POETIC): an open-label, multicentre, parallel-group, randomised, phase 3 trial. Lancet Oncol 21:1443–1454. https://doi.org/10.1016/S1470-2045(20)30458-7
Rimm DL, Leung SC, McShane LM et al (2019) An international multicenter study to evaluate reproducibility of automated scoring for assessment of Ki67 in breast cancer. Mod Pathol 32:59–69. https://doi.org/10.1038/s41379-018-0109-4
Cheang MC, Chia SK, Voduc D et al (2009) Ki67 index, HER2 status, and prognosis of patients with luminal B breast cancer. J Natl Cancer Inst 101:736–750. https://doi.org/10.1093/jnci/djp082
De Azambuja E, Cardoso F, de Castro G et al (2007) Ki-67 as prognostic marker in early breast cancer: a meta-analysis of published studies involving 12 155 patients. Br J Cancer 96:1504–1513. https://doi.org/10.1038/sj.bjc.6603756
Yerushalmi R, Woods R, Ravdin PM, Hayes MM, Gelmon KA (2010) Ki67 in breast cancer: prognostic and predictive potential. Lancet Oncol 11:174–183. https://doi.org/10.1016/S1470-2045(09)70262-1
Penault-Llorca F, André F, Sagan C et al (2009) Ki67 expression and docetaxel efficacy in patients with estrogen receptor-positive breast cancer. J Clin Oncol 27:2809–2815. https://doi.org/10.1200/JCO.2008.18.2808
Harbeck N, Rastogi P, Martin M et al (2021) Adjuvant abemaciclib combined with endocrine therapy for high-risk early breast cancer: updated efficacy and Ki-67 analysis from the monarchE study. Ann Oncol 32:1571–1581. https://doi.org/10.1016/j.annonc.2021.09.015
Lee AK, Loda M, Mackarem G et al (1997) Lymph node negative invasive breast carcinoma 1 centimeter or less in size (T1a, bN0M0) clinicopathologic features and outcome. Cancer 79:761–771. https://doi.org/10.1002/(SICI)1097-0142(19970215)79:4%3C761::AID-CNCR13%3E3.0.CO;2-Y
Klintman M, Strand C, Ahlin C et al (2013) The prognostic value of mitotic activity index (MAI), phosphohistone H3 (PPH3), cyclin B1, cyclin A, and Ki67, alone and in combinations, in node-negative premenopausal breast cancer. PLoS ONE 8:e81902. https://doi.org/10.1371/journal.pone.0081902
Abubakar M, Orr N, Daley F et al (2016) Prognostic value of automated KI67 scoring in breast cancer: a centralised evaluation of 8088 patients from 10 study groups. Breast Cancer Res 18:104. https://doi.org/10.1186/s13058-016-0765-6
Viale G, Regan MM, Mastropasqua MG et al (2008) Predictive value of tumor Ki-67 expression in two randomized trials of adjuvant chemoendocrine therapy for node-negative breast cancer. J Natl Cancer Inst 100:207–212. https://doi.org/10.1093/jnci/djm289
Viale G, Giobbie-Hurder A, Regan MM et al (2008) Prognostic and predictive value of centrally reviewed Ki-67 labeling index in postmenopausal women with endocrine-responsive breast cancer: results from Breast International Group Trial 1–98 comparing adjuvant tamoxifen with letrozole. J Clin Oncol 26:5569–5575
Petrelli F, Viale G, Cabiddu M, Barni S (2015) Prognostic value of different cut-off levels of Ki-67 in breast cancer: a systematic review and meta-analysis of 64,196 patients. Breast Cancer Res Treat 153:477–491. https://doi.org/10.1007/s10549-015-3559-0
Thomssen C, Balic M, Harbeck N, Gnant M (2021) St. Gallen/Vienna 2021: a brief summary of the consensus discussion on customizing therapies for women with early breast cancer. Breast Care (Basel) 16:135–143. https://doi.org/10.1159/000516114
Polley MYC, Leung SC, McShane LM et al (2013) An international Ki67 reproducibility study. J Natl Cancer Inst 105:1897–1906. https://doi.org/10.1093/jnci/djt306
Polley MYC, Leung SC, Gao D et al (2015) An international study to increase concordance in Ki67 scoring. Mod Pathol 28:778–786. https://doi.org/10.1038/modpathol.2015.38
Røge R, Nielsen S, Riber-Hansen R, Vyberg M (2019) Impact of primary antibody clone, format, and stainer platform on Ki67 proliferation indices in breast carcinomas. Appl Immunohistochem Mol Morphol 27:732–739. https://doi.org/10.1097/PAI.0000000000000799
Skjervold AH, Pettersen HS, Valla M, Opdahl S, Bofin AM (2022) Visual and digital assessment of Ki-67 in breast cancer tissue - a comparison of methods. Diagn Pathol 17:1–14. https://doi.org/10.1186/s13000-022-01225-4
Acs B, Pelekanou V, Bai Y et al (2019) Ki67 reproducibility using digital image analysis: an inter-platform and inter-operator study. Lab Invest 99:107–117. https://doi.org/10.1038/s41374-018-0123-7
Kwon A-Y, Park HY, Hyeon J et al (2019) Practical approaches to automated digital image analysis of Ki-67 labeling index in 997 breast carcinomas and causes of discordance with visual assessment. PLoS ONE 14:e0212309. https://doi.org/10.1371/journal.pone.0212309
Stålhammar G, Martinez NF, Lippert M et al (2016) Digital image analysis outperforms manual biomarker assessment in breast cancer. Mod Pathol 29:318–329. https://doi.org/10.1038/modpathol.2016.34
Stålhammar G, Robertson S, Wedlund L et al (2018) Digital image analysis of Ki67 in hot spots is superior to both manual Ki67 and mitotic counts in breast cancer. Histopathology 72:974–989
Gudlaugsson E, Skaland I, Janssen EA et al (2012) Comparison of the effect of different techniques for measurement of Ki67 proliferation on reproducibility and prognosis prediction accuracy in breast cancer. Histopathology 61:1134–1144. https://doi.org/10.1111/j.1365-2559.2012.04329.x
Egeland NG, Austdal M, van Diermen-Hidle B et al (2019) Validation study of MARCKSL1 as a prognostic factor in lymph node-negative breast cancer patients. PLoS ONE 14:e0212527. https://doi.org/10.1371/journal.pone.0212527
International Ki67 in Breast Cancer Working Group. https://www.ki67inbreastcancerwg.org/ Published 2009. Accessed 2 Feb 2024
Bankhead P, Loughrey MB, Fernández JA et al (2017) QuPath: Open source software for digital pathology image analysis. Sci Rep 7:16878. https://doi.org/10.1038/s41598-017-17204-5
Acs B, Leung SC, Kidwell KM et al (2022) Systematically higher Ki67 scores on core biopsy samples compared to corresponding resection specimen in breast cancer: a multi-operator and multi-institutional study. Mod Path 35:1362–1369. https://doi.org/10.1038/s41379-022-01104-9
Norwegian Directorate of Health (2023) Nasjonalt handlingsprogram med retningslinjer for diagnostikk, behandling og oppfølging av pasienter med brystkreft (National Guidelines for Diagnosis, Treatment and Follow-up of Patients with Breast Cancer). https://nbcgblog.files.wordpress.com/2023/02/11.01.2023-nasjonalt-handlingsprogram-for-brystkreft-19.-utgave-publisert-11.01.23.pdf. Accessed 12 Dec 2023
Pons L, Hernández-León L, Altaleb A et al (2022) Conventional and digital Ki67 evaluation and their correlation with molecular prognosis and morphological parameters in luminal breast cancer. Sci Rep 12:8176. https://doi.org/10.1038/s41598-022-11411-5
Paik S, Kwon Y, Lee MH et al (2021) Systematic evaluation of scoring methods for Ki67 as a surrogate for 21-gene recurrence score. NPJ breast cancer 7:1–8. https://doi.org/10.1038/s41523-021-00221-z
Nielsen TO, Leung SCY, Rimm DL et al (2021) Assessment of Ki67 in breast cancer: updated recommendations from the international Ki67 in breast cancer working group. J Natl Cancer 113:808–819. https://doi.org/10.1093/jnci/djaa201
Jang MH, Kim HJ, Chung YR, Lee Y, Park SY (2017) A comparison of Ki-67 counting methods in luminal breast cancer: the average method vs. the hot spot method. PLoS ONE 12:e0172031. https://doi.org/10.1371/journal.pone.0172031
Thakur SS, Li H, Chan AM et al (2018) The use of automated Ki67 analysis to predict Oncotype DX risk-of-recurrence categories in early-stage breast cancer. PLoS ONE 13:e0188983. https://doi.org/10.1371/journal.pone.0188983
Røge R, Riber-Hansen R, Nielsen S, Vyberg MJB (2016) Proliferation assessment in breast carcinomas using digital image analysis based on virtual Ki67/cytokeratin double staining. Breast Cancer Res Treat 158:11–19. https://doi.org/10.1007/s10549-016-3852-6
Shui R, Yu B, Bi R, Yang F, Yang W (2015) An interobserver reproducibility analysis of Ki67 visual assessment in breast cancer. PLoS ONE 10:e0125131. https://doi.org/10.1371/journal.pone.0125131
Regional Cancer Centres in Sweden. Bröstcancer vårdprogram - Kvalitetsbilaga för bröstpatologi (KVAST-bilaga) (Breast Cancer Care Program - Quality supplement for breast pathology (KVAST-supplement)). https://kunskapsbanken.cancercentrum.se/diagnoser/brostcancer/vardprogram/kvalitetsdokument-for--patologi/https://kunskapsbanken.cancercentrum.se/diagnoser/brostcancer/vardprogram /kvalitetsdokument-for--patologi/ Accessed 17 Sept 2023
Danish Multidisciplinary Cancer Groups. Patologiprocedurer og molekylærpatologiske analyser ved brystkræft (Pathology Procedures and Molecular Pathology Analyses for Breast Cancer), v.1.3. https://www.dmcg.dk/siteassets/forside/kliniske-retningslinjer/godkendte-kr/dbcg/dbcg_patologiprocedure-v1.3_admgodk040422.pdf. Accessed 17 Sept 2023
Stuart-Harris R, Caldas C, Pinder S, Pharoah P (2008) Proliferation markers and survival in early breast cancer: a systematic review and meta-analysis of 85 studies in 32,825 patients. Breast 17:323–334. https://doi.org/10.1016/j.breast.2008.02.002
Arihiro K, Oda M, Ohara M et al (2016) Comparison of visual assessment and image analysis in the evaluation of Ki-67 expression and their prognostic significance in immunohistochemically defined luminal breast carcinoma. Jpn J Clin Oncol 46:1081–1087. https://doi.org/10.1093/jjco/hyw107
Boyaci C, Sun W, Robertson S, Acs B, Hartman J (2021) Independent clinical validation of the automated Ki67 scoring guideline from the international Ki67 in breast cancer working group. Biomolecules 11:1612. https://doi.org/10.3390/biom11111612
Robertson S, Acs B, Lippert M, Hartman J (2020) Prognostic potential of automated Ki67 evaluation in breast cancer: different hot spot definitions versus true global score. Breast Cancer Res Treat 183:161–175. https://doi.org/10.1007/s10549-020-05752-w
Dowsett M, Nielsen TO, A’Hern R et al (2011) Assessment of Ki67 in breast cancer: recommendations from the International Ki67 in breast cancer working group. J Natl Cancer Inst 103:1656–1664. https://doi.org/10.1093/jnci/djr393
Paik S, Shak S, Tang G et al (2005) Expression of the 21 genes in the recurrence score assay and tamoxifen clinical benefit in the NSABP study B-14 of node negative, estrogen receptor positive breast cancer. J Clin Oncol 23:510. https://doi.org/10.1200/jco.2005.23.16_suppl.510
Sparano JA, Gray RJ, Ravdin PM et al (2019) Clinical and genomic risk to guide the use of adjuvant therapy for breast cancer. N Engl J Med 380:2395–2405. https://doi.org/10.1056/NEJMoa1904819
Piccart M, van’t Veer LJ, Poncet C et al (2021) 70-gene signature as an aid for treatment decisions in early breast cancer: updated results of the phase 3 randomised MINDACT trial with an exploratory analysis by age. Lancet Oncol 22:476–488. https://doi.org/10.1016/S1470-2045(21)00007-3
Ohnstad HO, Borgen E, Falk RS et al (2017) Prognostic value of PAM50 and risk of recurrence score in patients with early-stage breast cancer with long-term follow-up. Breast Cancer Res 19:1–12. https://doi.org/10.1186/s13058-017-0911-9
Sestak I, Buus R, Cuzick J et al (2018) Comparison of the performance of 6 prognostic signatures for estrogen receptor–positive breast cancer: a secondary analysis of a randomized clinical trial. JAMA Oncol 4:545–553. https://doi.org/10.1001/jamaoncol.2017.5524
Ohnstad H, Borgen E, Mortensen E et al (2023) 103P Impact of Prosigna test on treatment decision in lymph node-negative early breast cancer: a prospective multicenter study (EMIT1). ESMO Open Abstract ESMO Open 8:7. https://doi.org/10.1016/j.esmoop.2023.101327
Acknowledgements
The authors would like to thank Marit Nordhus for her technical assistance and the Units for Immunohistochemistry and Histology at Stavanger University Hospital for their technical support.
Funding
Open access funding provided by University of Stavanger & Stavanger University Hospital. The study was funded by the Helse Vest Strategic Research Fund as part of the Pathology in Western Norway project. The Visiopharm® CE-IVD Ki67 Application used in this study was provided free of charge for academic purposes. Visiopharm® did not provide any financial support and had no additional role in study design, data collection and analysis, the decision to publish, or preparation of the manuscript.
Author information
Contributions
Emma Rewcastle and Emiel AM Janssen designed the study. Emma Rewcastle, Ivar Skaland and Emiel AM Janssen contributed to the development of the methodology. Emma Rewcastle, Jan PA Baak, Einar Gudlaugsson, Silja K Fykse, and Emiel AM Janssen performed data acquisition. Emma Rewcastle performed statistical analyses. Emma Rewcastle, Ivar Skaland, Einar Gudlaugsson, Jan PA Baak and Emiel AM Janssen interpreted the data. Einar Gudlaugsson and Jan PA Baak acted as medical consultants. Emma Rewcastle wrote the manuscript. All authors read, corrected, and approved the final paper.
Ethics declarations
Competing interests
The authors declare there are no competing interests.
Ethical approval
The study was retrospective and received approval from the Regional Ethics Committee of Health West Norway (2010/1241); the requirement for informed consent was waived. The study was performed in accordance with the Declaration of Helsinki.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rewcastle, E., Skaland, I., Gudlaugsson, E. et al. The Ki67 dilemma: investigating prognostic cut-offs and reproducibility for automated Ki67 scoring in breast cancer. Breast Cancer Res Treat 207, 1–12 (2024). https://doi.org/10.1007/s10549-024-07352-4
DOI: https://doi.org/10.1007/s10549-024-07352-4