Introduction

Sustained proliferative signalling is one of the hallmarks of cancer, as proposed by Hanahan and Weinberg in 2011 [1]. The nuclear antigen detected by the Ki-67 antibody is a marker of the growth fraction of a tumour. It is expressed in the G1, S, G2 and M phases of the cell cycle, but not in the resting phase, G0. Expression levels are low in G1 and S and peak during G2 and M [2]. In breast cancer (BC), immunohistochemical (IHC) staining of the Ki-67 antigen is commonly used to assess the proliferative activity of the tumour. It can provide prognostic information and predict response to treatment in the adjuvant and neoadjuvant settings [3,4,5,6]. A high Ki-67 score is associated with a poor prognosis [7] but also with a good response to chemotherapy [8, 9].

In molecular subtyping of BC, Ki-67 can be used to distinguish between the Luminal A-like (Ki-67 low) and HER2-negative Luminal B-like (Ki-67 high) subtypes [10, 11]. Luminal A patients generally have a good prognosis and may qualify for endocrine treatment only, whereas Luminal B patients have a poorer prognosis and are often given chemotherapy in addition. Distinguishing between these two subtypes therefore has important therapeutic implications [8, 10, 12].

Although the clinical validity of the Ki-67 proliferation index is accepted in BC, its clinical utility is still regarded as limited, and there is no consensus on the appropriate number of cells to count or on cut-off levels for prognostication and treatment [13]. Furthermore, inter- and intra-observer agreement in the assessment of Ki-67 is poor [14,15,16,17,18,19].

Ki-67 staining is often heterogeneous within a tumour [20, 21]. When assessing Ki-67 IHC, only positively stained nuclei and mitotic figures should be scored, regardless of staining intensity, and between 500 and 1000 tumour cells should be counted in hotspot areas [22, 23]. According to the International Ki67 in Breast Cancer Working Group, Ki-67 levels between 5% and 30% are subject to considerable interobserver and interlaboratory variability, and the group suggests that only very low (< 5%) or very high (≥ 30%) levels should be considered clinically actionable [13, 24]. To offset interlaboratory variation, the 14th St. Gallen International Breast Cancer Conference in 2015 proposed that each laboratory should use its in-house median value to determine cut-off values [17].

Several studies have suggested the use of automated digital image analysis (DIA) to improve reproducibility in the assessment of Ki-67. With the introduction of DIA, it should be possible to redefine interpretation algorithms for biomarker assessment for both established clinical and novel biomarkers in BC, and address the issue of inter- and intraobserver variation in the interpretation of these biomarkers [15, 18, 19, 25,26,27,28,29].

In this study we compared visual assessment (VA) and DIA of tissue sections stained for Ki-67 in a consecutive series of BCs. The aim was to identify the number of tumour cells necessary to count in each method to reflect the growth potential of a given tumour, as measured by tumour grade, mitotic count and patient outcome.

Materials and methods

Study population

The study comprises 250 BCs from a larger series of BC patients. The background population from which this series is drawn comprises 25,727 women born between 1886 and 1928 in Nord-Trøndelag County in Norway, who were followed for BC occurrence from 1961 to 2008. In total, 1379 cases of BC were diagnosed during follow-up, and 909 of these tumours were classified into six molecular subtypes using IHC and chromogenic in situ hybridization (CISH) as surrogates for gene expression analysis [30]. After diagnosis, all patients were followed until death from BC, death from other causes, or 31 December 2015 [30, 31].

In the present study, we included 250 consecutive cases of invasive carcinoma of no special type [32]. Two cases were excluded due to unsatisfactory staining (Fig. 1).

Fig. 1

Flowchart showing an overview of the cases included in this study

Immunohistochemistry

Full-face sections 4 μm thick, mounted on SuperFrost glass slides, were retrieved from storage (-20 °C). Paraffin was removed using TissueClear, and sections were rehydrated with ethanol and water. Slides were heated at 60 °C for two hours and pretreated in a PT Link Pre-Treatment Module for Tissue Specimens (Dako Denmark A/S, 2600 Glostrup, DK) with a buffer (Low pH Target Retrieval Solution K8005) at 97 °C for 20 min. The Ki-67 antibody (clone MIB1, 35 mg/L, 1:100, Dako Denmark A/S, Glostrup, Denmark) was applied in a DakoCytomation Autostainer Plus (Dako) with a 40 min incubation time. The Dako REAL™ EnVision™ Detection System, Peroxidase/DAB+, Rabbit/Mouse (K5007), was used for visualization.

Digital image analysis

The IHC-stained slides were scanned at 40× magnification with a resolution of 0.23 μm/pixel using a Hamamatsu NanoZoomer S360 Digital Slide Scanner C13220-01 (Inter Instruments AS) at the Department of Pathology, St. Olav’s Hospital, Trondheim University Hospital, Norway. The digital images were analysed for Ki-67 protein expression using the open-source DIA software QuPath v. 0.1.2 [27].

Training of the classifier

A separate series of 19 representative cases from the main cohort was used as a training set for a two-class object classifier in QuPath, applied after watershed nucleus detection [27]. The tumour area was delineated manually in QuPath. Cell nuclei (training objects) in the whole slide images (WSI) were selected and labelled either as epithelial tumour cell nuclei or as other nuclei (non-tumour cell nuclei, including tumour stroma cell nuclei).

In the training set, stains were digitally separated by colour deconvolution using QuPath's automated “Estimate stain vectors” function [27]. Watershed cell nucleus detection was performed and visually optimized with the following settings: optical density (OD) sum; requested pixel size 0.4 μm; background radius 8.0 μm; median filter radius 1.5 μm; sigma 1.5 μm; min/max area 10/350 μm²; threshold 0.02; maximum background intensity 3.0; and cell expansion 5 μm. Object features were smoothed at 25, 50 and 100 μm. The threshold for Ki-67 positivity (nucleus DAB OD mean) was adjusted manually to best match the visual perception of Ki-67 positivity in VA, and was finally set to a nucleus DAB OD mean of 0.15 for all slides.

A two-class Random Trees classifier for the detected nuclei (tumour cell nuclei vs. non-tumour cell nuclei) was trained using the default settings [27]. Training continued until visually acceptable classification was achieved with a 67% equally spaced train/test split, giving an accuracy of approximately 85%. This was obtained using 7514 training objects and 135 object features from the 19 annotated images in the training set. The classifier was saved and applied to the watershed nucleus detections within the manually annotated tumour areas of all 248 cases in this study.
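One reading of a "67% equally spaced" split is that test objects are held out at regular intervals through the annotated objects rather than at random. The sketch below assumes exactly that interpretation (QuPath's internal rule may differ) and shows how held-out accuracy would then be computed; the function names are hypothetical.

```python
def equally_spaced_split(objects, train_fraction=0.67):
    """Hold out test objects at evenly spaced positions (assumed interpretation)."""
    n = len(objects)
    n_test = round(n * (1.0 - train_fraction))
    if n_test <= 0:
        return list(objects), []
    step = n / n_test  # >= 1, so every selected position is distinct
    test_positions = {int(k * step) for k in range(n_test)}
    train = [obj for pos, obj in enumerate(objects) if pos not in test_positions]
    test = [obj for pos, obj in enumerate(objects) if pos in test_positions]
    return train, test


def accuracy(predicted, actual):
    """Fraction of held-out objects whose predicted class matches the annotation."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

An evenly spaced hold-out keeps objects from every annotated region in both sets, which a single contiguous cut through the object list would not.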

All nuclei in the tumour were detected with QuPath's positive cell detection and then sub-classified by the trained classifier into epithelial tumour cell nuclei and other intratumoural nuclei. Because of the heterogeneity of BC tissue, additional training annotations were subsequently added to the classifier for most of the digital images until visually acceptable discrimination between epithelial tumour cell nuclei and all other nuclei was achieved for each WSI. Examples of annotated training objects are shown in Fig. 2.

Fig. 2

A Overview image from QuPath showing cell nucleus detection and classification. B Arrows indicate elongated stromal nucleus and lymphocyte (green); Ki-67 positive tumour cell nucleus (red) and Ki-67 negative tumour cell nucleus (blue)

Digital Ki-67 hotspot identification

The tumour area in each of the 248 full-face sections was delineated manually by an experienced breast pathologist, and this manual delineation was then used to guide digital delineation of the tumour in the WSIs in QuPath. Ki-67 positive tumour hotspots were identified semi-automatically using QuPath measurement heat maps of nucleus DAB OD mean smoothed at 50 μm. The heat map display was adjusted manually for each WSI to identify and annotate the area with the highest density of Ki-67 positive tumour cell nuclei (Fig. 3A-D). Areas where obvious artefacts produced false hotspots were excluded manually.
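The smoothing behind the heat map can be sketched as a distance-weighted average of each tumour nucleus's DAB OD mean over its neighbours. This is an illustration under stated assumptions, not QuPath's code: it assumes Gaussian weights with the 50 μm value interpreted as the full width at half maximum, cells given as hypothetical (x_um, y_um, dab_od_mean) tuples for tumour-classified nuclei only, and the hotspot centre taken as the cell with the highest smoothed value.

```python
import math

# ASSUMPTION: the smoothing "radius" is a Gaussian full width at half maximum.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def smooth_feature(cells, fwhm_um=50.0):
    """Gaussian-weighted neighbourhood mean of each cell's value (O(n^2) sketch)."""
    sigma = fwhm_um * FWHM_TO_SIGMA
    two_sigma_sq = 2.0 * sigma * sigma
    smoothed = []
    for xi, yi, _ in cells:
        w_total = v_total = 0.0
        for xj, yj, vj in cells:
            w = math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / two_sigma_sq)
            w_total += w
            v_total += w * vj
        smoothed.append(v_total / w_total)
    return smoothed


def hotspot_centre(cells, fwhm_um=50.0):
    """Return (x, y, smoothed value) of the cell with the highest smoothed value."""
    smoothed = smooth_feature(cells, fwhm_um)
    best = max(range(len(cells)), key=smoothed.__getitem__)
    return cells[best][0], cells[best][1], smoothed[best]
```

Smoothing is what lets a cluster of moderately positive nuclei outrank a single strongly stained object such as an artefact, which is why obvious artefacts still had to be excluded manually when they formed larger areas.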

Fig. 3

A HES-stained WSI; B IHC-stained WSI (Ki-67) with manually delineated tumour area (red); C cell detection within the tumour area, with cells classified as tumour (blue/red) or non-tumour (green); D measurement heat map showing Ki-67 hotspots in red

Scoring and reporting

Visual assessment

Visual assessment of the Ki-67 proliferation rate was performed using a brightfield microscope (Nikon Eclipse 80i) at 40× magnification. A total of 500 tumour cell nuclei (5 × 100) were counted in visually selected hotspot areas in each case, starting with the group of 100 nuclei that appeared to have the highest proportion of Ki-67 positive cells. The number of Ki-67 positive tumour cell nuclei was recorded separately for each 100-cell increment.
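Because positives are recorded per 100-cell increment, the Ki-67 index after 100, 200, ..., 500 cells is a running ratio. A minimal sketch, with a hypothetical function name and made-up counts (not study data):

```python
def cumulative_ki67(positives_per_increment, cells_per_increment=100):
    """Cumulative Ki-67 index (%) after each counted increment."""
    indices = []
    positive_total = 0
    for n, positives in enumerate(positives_per_increment, start=1):
        if not 0 <= positives <= cells_per_increment:
            raise ValueError("positive count must lie between 0 and the increment size")
        positive_total += positives
        indices.append(100.0 * positive_total / (n * cells_per_increment))
    return indices


# Hypothetical counts, highest-density increment first.
print(cumulative_ki67([40, 32, 28, 25, 20]))
```

With these illustrative counts the index falls from 40.0% after 100 cells to 29.0% after 500 cells. Starting in the densest group means the cumulative value can only stay level or fall as sparser increments are added, which is the effect compared between the 100- and 500-cell counts.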

Digital image analysis

All cases were assessed for Ki-67 expression using QuPath. Once the Ki-67 tumour hotspot had been identified on the measurement heat map, five areas each containing 100 tumour cell nuclei were delineated manually using the QuPath “brush tool”. Counting started in the group of 100 nuclei within the hotspot that appeared to have the highest density of positive nuclei on the heat map and continued in decreasing order of density until five sets of 100 nuclei had been counted (Fig. 4).

Fig. 4

A and B Hotspot identification and delineation in QuPath. Areas of 100 tumour cell nuclei are numbered from the area with the highest proportion of Ki-67 positive tumour cell nuclei (1) to the lowest (5)

Cut-off levels for Ki-67 Low/Intermediate/High positivity

We determined cut-off levels from the median Ki-67 value for each method, in line with the 2015 St. Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer [17]. Ki-67 Low was defined as a value more than 10 percentage points below the median, and Ki-67 High as a value more than 10 percentage points above the median. Values between these limits were classified as Intermediate. The median Ki-67 positivity for VA and DIA was calculated after 100 cells (VA100, DIA100), 200 cells (VA200, DIA200), 300 cells (VA300, DIA300), 400 cells (VA400, DIA400) and 500 cells (VA500, DIA500) (Fig. 5). Only the results for VA/DIA100 and VA/DIA500 were used in the statistical analyses.
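The median-derived categorisation can be written as two small functions. This is a sketch with hypothetical names; the band of 10 percentage points and the strict inequalities for Low and High follow the definitions given in the text.

```python
import statistics


def median_cutoffs(ki67_values, band=10.0):
    """Lower/upper limits at the cohort median minus/plus `band` percentage points."""
    median = statistics.median(ki67_values)
    return median - band, median + band


def ki67_category(value, low_limit, high_limit):
    """Low below the lower limit, High above the upper limit, otherwise Intermediate."""
    if value < low_limit:
        return "Low"
    if value > high_limit:
        return "High"
    return "Intermediate"
```

For example, a cohort median of 30.0% gives limits of 20.0% and 40.0%, so a case at exactly 20.0% or 40.0% is Intermediate.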

Fig. 5

Median Ki-67 at each cumulative 100-cell increment for visual assessment (VA) and digital image analysis (DIA)

Statistical analyses

Tumour characteristics across Ki-67 categories (Low, Intermediate and High, as defined above) for VA and DIA at 100 and 500 nuclei counted were compared using Pearson’s chi-squared test. Bland-Altman plots were used to evaluate agreement between VA500, as the reference measurement, and DIA100 and DIA500 by plotting the difference between methods against their mean. The cumulative incidence of death from BC was calculated for VA100, VA500, DIA100 and DIA500, treating death from other causes as a competing event. Gray’s test was used to compare cumulative incidence curves. Cox proportional hazards analyses were used to estimate hazard ratios (HR) for death from BC, with censoring at death from other causes. Harrell’s C statistic was used to compare the predictive ability of VA100, VA500, DIA100 and DIA500. All analyses were performed using Stata v. 16.0 (StataCorp LP, College Station, Texas, USA).
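The Bland-Altman comparison reduces to a few summary quantities. The sketch below computes them for paired Ki-67 percentages, assuming the conventional 95% limits of agreement (bias ± 1.96 SD), which the text does not state, and adding a least-squares slope of difference on mean as one simple check of whether the disagreement grows with the Ki-67 level. The function name and example values are hypothetical.

```python
import statistics


def bland_altman(reference, comparison):
    """Bland-Altman summary for paired measurements (e.g. VA500 vs DIA500)."""
    if len(reference) != len(comparison) or len(reference) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [c - r for r, c in zip(reference, comparison)]
    means = [(c + r) / 2.0 for r, c in zip(reference, comparison)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)  # ASSUMED 95% limits of agreement
    # Least-squares slope of difference on mean: > 0 means the gap widens as Ki-67 rises.
    m_bar = statistics.mean(means)
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx if sxx else 0.0
    return {"means": means, "diffs": diffs, "bias": bias,
            "limits": limits, "slope": slope}
```

A positive bias corresponds to DIA reading higher than VA on average, and a positive slope to differences that increase with the mean.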

Results

Patient and tumour characteristics are presented in Table 1. Of the 248 patients evaluated in this study, 108 had died of BC and 124 had died of other causes by the end of follow-up. There were 16 (6.5%) histopathological grade 1, 131 (52.8%) grade 2, and 101 (40.7%) grade 3 tumours.

Table 1 Patient and tumour characteristics according to Ki-67 visual assessment (VA) and digital image analysis (DIA) of full-face tissue sections

Cut-off levels for Low/Intermediate/High Ki-67 positivity

Cut-off levels for Ki-67 positivity were calculated for both VA and DIA from the median Ki-67 values after 500 tumour cell nuclei had been counted (VA500, DIA500). The median Ki-67 level was 22.3% for VA500 and 30.0% for DIA500, as shown in Fig. 5. Thus, for the present study, cut-off levels for VA were set at < 12.3% (Ki-67 Low), 12.3–32.3% inclusive (Ki-67 Intermediate) and > 32.3% (Ki-67 High). For DIA, cut-off levels were set at < 20.0% (Ki-67 Low), 20.0–40.0% inclusive (Ki-67 Intermediate) and > 40.0% (Ki-67 High).

In VA, there was no clear difference between the median values of the five cumulative 100-cell increments (VA100–VA500; range 22.3–23.2%). Using DIA, the median value was 34.0% for both DIA100 and DIA200, falling to 30.0% at DIA500. Cumulative median values for all 100-cell increments (both VA and DIA) are shown in Fig. 5.

Visual assessment

Using the VA median-derived cut-off levels, 48 cases (19.4%) were classified as Ki-67 Low at VA100, falling to 44 (17.7%) at VA500. Twelve cases were upgraded from Low at VA100 to Intermediate at VA500, and none from Low to High. A total of 123 cases (49.6%) were classified as Intermediate at VA100, rising to 132 (53.2%) at VA500; eight of these were downgraded to Low at VA500 and eight were upgraded to High. A total of 77 cases (31.0%) were classified as High at VA100, falling to 72 (29.0%) at VA500. Thirteen cases were downgraded from High at VA100 to Intermediate at VA500, and none were downgraded to Low (Fig. 6).

Fig. 6

Number of cases in each Ki-67 category (Low, Intermediate and High) for each 100-cell increment in A visual assessment (VA) and B digital image analysis (DIA)

Digital image analysis

Using the DIA median-derived cut-off levels, 44 cases (17.7%) were classified as Low at DIA100, rising to 75 (30.2%) at DIA500. Thus, with an increasing number of cells counted, a further 31 cases (12.5%) were classified as Low. None were upgraded from Low to Intermediate at DIA500. One hundred and four cases (41.9%) were classified as Intermediate at DIA100, falling to 94 (37.9%) at DIA500. Thirty cases were downgraded from Intermediate at DIA100 to Low at DIA500, and none were upgraded to High. One hundred cases (40.3%) were classified as High at DIA100, falling to 79 (31.9%) at DIA500. Twenty-six cases were downgraded from High at DIA100 to Intermediate at DIA500, and none from High to Low (Fig. 6).

The numbers of cases classified as Low were similar for VA100 (48 cases), VA500 (44 cases) and DIA100 (44 cases) but higher for DIA500 (75 cases). The number of cases classified as High was greatest at DIA100 (100 cases) and fell at DIA500 (79 cases) to a level comparable with VA100 (77 cases) and VA500 (72 cases) (Table 1; Fig. 6).

Ki-67 and histopathological grade

Grade 1

Among the 16 Grade 1 tumours, six (37.5%) were classified as Ki-67 Low at VA500. Five were classified as Low at DIA100, rising to nine (56.3%) at DIA500 (Table 1).

Grade 2

Of the 131 Grade 2 tumours, 13 (9.9%) were classified as High at VA500. Using DIA, 30 (22.9%) were High at DIA100, falling to 21 (16.0%) at DIA500. More Grade 2 tumours were classified as Intermediate with VA than with DIA (Table 1).

Grade 3

Of the 101 Grade 3 tumours, 59 (58.4%) were classified as High at VA500. Using DIA, 70 (69.3%) were High at DIA100, falling to 58 (57.4%) at DIA500. The number of Grade 3 tumours classified as Low was greatest at DIA500 (12 cases, 16.0% of the 75 DIA500 Low cases) (Table 1).

Ki-67 and mitotic count

There was a clear association (p < 0.001) between a high mitotic count (> 14.5 mitoses/10 HPF) and Ki-67 High for all counting methods. The largest number of such cases was seen at DIA100, where 51 of 62 (82.3%) cases with a high mitotic count were classified as Ki-67 High (Table 1).

Ki-67 and prognosis

There was no clear association between Ki-67 cell counts and risk of death. By the end of follow-up, 108 (43.5%) patients had died of BC.

For VA100 High, the cumulative risk of death from BC during the first five years after diagnosis was 32.5% (95% CI 23.3–44.2), and 46.8% (95% CI 36.4–58.5) 10 years after diagnosis.

For VA500 High, the corresponding risks were 37.5% (95% CI 27.5–49.7) and 48.6% (95% CI 37.8–60.7). Using VA500 Low as the reference, the rate of death from BC was unchanged for VA500 Intermediate but higher for VA500 High (HR 1.94, 95% CI 1.1–3.4; Table 2; Fig. 7A).

Table 2 Risk of death from breast cancer according to Ki-67 level and counting procedures, expressed as cumulative incidence and hazard ratios of death from breast cancer
Fig. 7

Cumulative incidence of death from breast cancer. A Visual assessment (VA): Gray’s test for VA100 (Low, Intermediate and High) P = 0.1062; Gray’s test for VA500 (Low, Intermediate and High) P = 0.0500. B Digital image analysis (DIA): Gray’s test for DIA100 (Low, Intermediate and High) P = 0.3335; Gray’s test for DIA500 (Low, Intermediate and High) P = 0.0796

For DIA100 High, the cumulative risk of death from BC was 31.0% (95% CI 22.9–41.1) within the first five years after diagnosis and 44.0% (95% CI 34.9–54.3) after 10 years.

For DIA500 High, the risk was 32.9% (95% CI 23.7–44.4) within the first five years and 44.3% (95% CI 34.2–55.9) within the first 10 years.

Using DIA100 Low as the reference, the rate of death from BC was unchanged for DIA100 Intermediate but higher for DIA100 High (HR 1.80, 95% CI 1.02–3.19; Table 2; Fig. 7B).
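The five- and ten-year cumulative risks reported in this section treat death from other causes as a competing event. A common nonparametric estimator for this setting is the Aalen-Johansen estimator; the plain-Python sketch below is for illustration only. The study's estimates were produced in Stata, and the status coding (0 = censored, 1 = death from BC, 2 = death from other causes) and function name are assumptions.

```python
def cumulative_incidence(times, status, horizon, cause=1):
    """Aalen-Johansen cumulative incidence of `cause` up to and including `horizon`.

    status: 0 = censored, 1 = death from BC, 2 = death from other causes (ASSUMED coding).
    """
    records = sorted(zip(times, status))
    at_risk = len(records)
    event_free = 1.0  # probability of no death of any cause just before the current time
    cif = 0.0
    i = 0
    while i < len(records) and records[i][0] <= horizon:
        t = records[i][0]
        d_cause = d_any = n_leaving = 0
        while i < len(records) and records[i][0] == t:
            s = records[i][1]
            d_cause += int(s == cause)
            d_any += int(s != 0)
            n_leaving += 1  # deaths and censorings at t leave the risk set afterwards
            i += 1
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
        at_risk -= n_leaving
    return cif
```

Computing 1 minus a Kaplan-Meier estimate that censors other-cause deaths would generally overstate the BC-specific risk when competing deaths are common, as here, where 124 of 248 patients died of other causes.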

Comparison of methods

The Bland-Altman plots show that both DIA100 and DIA500 were closely related to VA500. However, Ki-67 values measured by DIA (100 and 500) were on average higher than those for VA500, and the differences between DIA and VA500 increased with increasing mean values (Fig. 8). Harrell’s C showed no clear difference in predictive ability between the VA and DIA methods. A Cox model including grade and DIA100 had a C statistic of 0.61, correctly ordering survival times in 61% of comparable patient pairs, compared with 0.60 for models combining grade with any one of the other three methods (VA100, VA500 and DIA500).
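Harrell's C is the proportion of comparable patient pairs in which the patient with the higher predicted risk died of BC first. A minimal sketch follows, assuming that deaths from other causes are handled as censored observations (as in the Cox models) and that pairs with tied follow-up times are skipped; tie handling in Stata may differ, and the function name is hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index.

    times: follow-up times; events: 1 if the patient died of BC, 0 if censored
    (including deaths from other causes); risk_scores: higher = higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is only usable if the shorter time is an observed BC death
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so values around 0.60 indicate modest discrimination for all four counting methods.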

Fig. 8

Bland-Altman plots illustrating the agreement of the VA and DIA methods. A The difference between DIA100 and VA500 compared to the mean of DIA100 and VA500. B The difference between DIA500 and VA500 compared to the mean of VA500 and DIA500

Discussion

In this study we compared Ki-67 protein expression in IHC-stained BC tissue sections assessed by DIA using the QuPath platform and by VA according to current recommended guidelines [22, 23]. We found that the median Ki-67 level was higher with DIA than with VA. While the proportion of Ki-67 positive tumour cells did not change substantially with the number of cells counted using VA, the number of cells counted did affect the result when using DIA. Furthermore, the highest proportion of patients with Ki-67 High tumours was found when 100–200 cells were counted using DIA. All counting methods associated the highest Ki-67 levels with a poor prognosis, with little difference between the methods.

In 1984, Gerdes proposed that the monoclonal antibody Ki-67 provided a simple means of estimating the growth fraction of a given population of human cells. This would be of particular interest in tumour diagnostics, since the proportion of proliferating cells in a given neoplasm could be of prognostic value and contribute to the choice of treatment strategy [2]. Ki-67 is now used as a prognostic marker and may also be used to predict response to chemotherapy [7,8,9]. There has been considerable debate regarding counting methods and cut-off levels for both prognostication and treatment decisions [10, 16, 33,34,35,36,37].

At the St. Gallen conference in 2015, it was proposed that each laboratory's in-house median value should be used to determine cut-off values to offset interlaboratory differences [17]. More recently, the 17th St. Gallen International Breast Cancer Conference proposed that Ki-67 should be used to guide treatment in estrogen receptor-positive, HER2-negative T1-2N0-1 BC, in accordance with the International Ki67 in Breast Cancer Working Group. Determining cut-off levels remains challenging, as reflected in these latest recommendations, in which only clearly low or clearly high levels of Ki-67 protein expression are considered to have clinical utility [13, 24]. In 2014, Romero and co-workers suggested a stepwise counting strategy without fixed denominators, particularly for heterogeneous tumours with some highly proliferative hotspots [29], and the International Ki67 in Breast Cancer Working Group has proposed a standardized visual scoring method using an online scoring app [13]. Thus, the need for a standardized approach to the IHC assessment of Ki-67 in BC has been recognized.

In this study, we found clear differences in median Ki-67 positivity between VA and DIA (22.3% for VA500 and 30.0% for DIA500), reflecting each method's ability to identify hotspot areas in the tissue section. This agrees with previous studies [38,39,40,41], although others have reported no real difference between the two methods [38, 41,42,43,44]. In the present study, the DIA threshold for Ki-67 positivity (nucleus DAB OD mean), and thus the ability to detect positive Ki-67 staining digitally, was set close to the pathologist’s threshold for positive staining before classifier training and digital assessment began. The difference between the median values in VA and DIA suggests that cut-off levels need to be calibrated according to the method employed. The Bland-Altman plots [45, 46] show that the methods perform quite similarly, but that DIA generally reported higher levels of Ki-67 positivity than VA. In our hands, introducing DIA for the assessment of Ki-67 would therefore require recalibration of cut-off levels to correspond to established clinically actionable Ki-67 levels. This underlines the importance of understanding the consequences that introducing a new method may have for patient treatment. However, Harrell’s C [47] and the risk-of-death analyses did not show any clear difference between the methods in their ability to predict survival.

Recent studies have suggested that Ki-67 levels in some tumours may be downgraded in VA when more than 200–300 cells are counted [29, 48]. In the present study, however, the percentage of Ki-67 positive cells changed little across the five 100-cell increments using VA, implying that it may not be necessary to count more than 200–300 cells in VA. By contrast, with DIA there was a clear fall in the number of Ki-67 High cases, and a corresponding rise in the number classified as Low, with increasing cell counts. Thus, using DIA, the highest proportion of Ki-67 positive cell nuclei is obtained by counting 100–200 cells in digitally identified hotspots, which appears to agree with Romero et al. [29]. In our hands, significantly more grade 3 tumours were found in DIA100 High than in VA100 High, VA500 High and DIA500 High (p < 0.0001). This indicates that Ki-67 levels are more likely to decline with increasing cell counts using DIA than using VA. More deaths from BC occurred in the DIA100 Ki-67 High group than in the DIA500 Ki-67 High group (54 vs. 42 cases; 50.0% vs. 38.9% of all BC deaths). For VA, in comparison, the difference in the number of BC deaths between the VA100 and VA500 Ki-67 High groups was negligible (43 vs. 42 cases; 39.8% vs. 38.9%).

The cases included in our study were diagnosed between 1961 and 2008, and pre-analytical conditions may have varied. Ki-67 IHC is robust in formalin-fixed, paraffin-embedded tissue [49, 50], and antigenicity is well preserved, although staining intensity tends to decrease with increasing storage time [51,52,53]. Staining intensity was not assessed in the present study. The International Ki67 in Breast Cancer Working Group has expressed concern about Ki-67 assessment of tissue stored in paraffin blocks for more than five years because of degradation of the epitope. The exact mechanisms of Ki-67 epitope degradation have not yet been fully elucidated, and there is still concern about the precision of the assessment. The Working Group recommends that the internationally standardized laboratory guidelines (ASCO and CAP) for HER2 and hormone receptors should also be applied to Ki-67 IHC [13]. Variation in tissue processing, staining reagents, laboratory protocols and digitization procedures may all contribute to variability in the interpretation of IHC, in both conventional VA and DIA. Standardization of the pre-analytical and analytical phases of tissue processing would greatly help in creating a more robust classifier for digital analysis, although the inherent heterogeneity of BC would remain a challenge [21, 54, 55]. In the present study, we included only invasive carcinoma of no special type. The classifier would require further development to reliably identify other tumour cell nuclear morphologies, such as those typical of lobular carcinoma. Some tissue slides were not suitable for DIA because of artefacts such as tissue folds, damaged tissue or inadequate staining.

Studies comparing QuPath with other digital analysis platforms have shown good reproducibility and functionality [38, 56]. One study comparing DIA using QuPath with VA found that QuPath gave stronger prognostic stratification than the manual method [57]. QuPath was developed to improve the efficiency, objectivity and reproducibility of digital histopathology and of biomarker analysis using digital images [27]. In the present study, more cases were classified as either Low or High using QuPath DIA than using conventional VA. Using the Ventana Virtuoso platform, Kwon et al. reported high concordance between VA and DIA, with higher accuracy for DIA in the High Ki-67 group (≥ 20%) than in the Low Ki-67 group (≤ 10%). They also found DIA more useful in borderline cases between cut-off levels, citing observer variation as a greater challenge in these cases [55].

The initial regions of interest on the WSIs were delineated manually using the brush tool in QuPath. This was time-consuming, and automatic tissue detection or WSI annotation would be preferable. The first 100-cell increment counted by DIA was selected visually within the area of the tumour with the highest Ki-67 expression on the heat map. To identify these hotspots, we created measurement maps of nucleus DAB OD mean with 50 μm smoothing. We were aware that tissue folds, ink debris and abundant lymphocytes could produce high OD values in non-relevant areas, so the measurement map method for detecting hotspots may be unsuitable for sections with many such irregularities and artefacts. We also noted that membranous staining presented a greater challenge to QuPath than to experienced pathologists: a pathologist will ignore non-relevant staining, whereas the software will detect anything coloured unless the classifier is trained to ignore it.

In the present study, the QuPath-based DIA method required considerable manual adjustment, making it time-consuming and impractical for implementation in a clinical setting. In 2020, Robertson et al. suggested that digital global scoring of Ki-67 was a practical and clinically valid approach [58]. The International Ki67 in Breast Cancer Working Group discusses several methods, including global and hotspot scoring, in addition to its own online scoring app, which gives a weighted global score based on the assessment of 100 cells in each of four areas of the tumour section (negligible, low, medium or high). To the best of our knowledge, the latter has not achieved widespread acceptance. The Working Group points out that none of the current scoring systems has achieved high analytical validity [13]. Global scoring was not evaluated in the present study. We chose to follow the guidelines for visual assessment of Ki-67 in BC currently used in Norway, counting 500 cells in the area of the tumour with the highest proliferation as assessed under the light microscope [23], and used the same approach in the digital assessment. We acknowledge that this method may have drawbacks, but our main finding from comparing the two methods remains that recalibration of cut-off levels is essential when introducing new methodology for the assessment of tissue biomarkers [23].

The number of cases in this study was limited and thus survival analyses should be interpreted with caution. Our results need to be validated in larger series of cases from other sources. However, the study clearly illustrates that new methodology in biomarker assessment requires recalibration of established cut-off levels.

Conclusions

In this study we show that assessment of Ki-67 in breast tumours using DIA identifies a greater proportion of cases with high Ki-67 levels than VA of the same tumours. Using VA, the results did not change substantially with an increasing number of cells counted. Using DIA, however, we propose that counting 100–200 cells in a digitally selected hotspot may be sufficient to identify the greatest number of Ki-67 High tumours. Associations with survival should be interpreted with caution because of the limited number of cases and the variation in pre-analytical conditions of the tissue samples in this study. Finally, our findings underline the need to recalibrate established cut-off levels when new methodology is introduced.