Introduction

Breast cancer is the most common type of cancer among women worldwide, and one of the leading causes of cancer-related death [1]. A diversity of histological and molecular parameters exists to predict prognosis and survival [2]. Immunohistochemistry for Ki67 (MKI67), a nuclear antigen which is present in all but the G0 phase of the cell cycle and therefore expressed in proliferating cells, can be used to determine tumor proliferation index [3]. Ki67 is a prognostic and predictive marker in breast cancer patients used in both clinical practice and clinical trials [4, 5]. However, Ki67 staining is subject to intra-tumoral heterogeneity and Ki67 scoring is prone to inter- and intra-observer variability, especially with ‘eyeballing’ [6,7,8,9,10]. Manual counting is time-consuming as at least 500–1000 cells have to be counted to achieve acceptable error rates and to correct for heterogeneity [4, 5].

Recently, digital image analysis (DIA) has emerged as a reproducible and less time-consuming alternative to manual scoring of Ki67 in breast cancer, which potentially offers a standardized diagnostic solution [4, 5, 11]. Several studies report high concordance between manual scoring and DIA [12,13,14]. However, these studies focus mainly on small tumor areas, either tissue microarrays (TMAs) or specific regions of interest (ROIs) within larger sections, and therefore do not take intra-tumoral Ki67 heterogeneity into account. In clinical practice, Ki67 scoring is often performed on whole tissue sections, which is also promoted by the International Ki67 in Breast Cancer Working Group, which recommends ‘an approach that assesses the whole section’ [5]. For DIA on Ki67-stained sections, the distinction between tumor and non-tumor tissue is vital to avoid over- or underestimation of Ki67 proliferation index due to counting of non-neoplastic cells. However, manual tumor outlining in the large tissue areas of whole tumor sections is impractical, and tissue classifiers based on morphological characteristics can be relatively inaccurate [15,16,17]. Physical dual staining offers a possible solution, by identifying tumor with cytokeratin in addition to Ki67 on the same section, but DIA with this method is impaired by overlapping chromogens and pixel intensities of both stains [18, 19]. A novel method which circumvents this issue is virtual dual staining (VDS), in which serial sections stained with Ki67 and cytokeratin are digitally aligned [14, 15].

Studies comparing manual scoring and DIA have used different platforms by various vendors, which have unique image analysis algorithms to determine Ki67 proliferation index [12,13,14,15,16,17]. As these algorithms have different approaches to classify tissue and cellular components, inter-platform variability may be expected [15, 20]. To the best of our knowledge, all studies up to this date have implemented only one DIA platform per study and therefore have not examined inter-platform agreement.

The aims of this study were to validate DIA of Ki67 in breast carcinomas in a clinical setting using VDS on whole sections by comparing a manual whole section scoring protocol with automated scoring, and to assess inter-platform agreement between two independent DIA platforms.

Materials and methods

Resection specimens of 154 consecutive primary invasive breast carcinomas treated in the University Medical Center Groningen (The Netherlands) between August 2015 and February 2017 were prospectively included. Patient and tumor characteristics are shown in Table 1.

Table 1 Patient and tumor characteristics

Immunohistochemistry

Three-micrometer serial sections were cut from formalin-fixed paraffin-embedded tumor blocks during normal clinical workflow. Adjacent sections were stained for Ki67 (CONFIRM anti-Ki-67 (30-9) rabbit monoclonal antibody, Ventana Medical Systems, Illkirch, France) and cytokeratin 8/18 (CK8/18 (B22.1 & B23.1) mouse monoclonal antibody, Ventana Medical Systems) on a Ventana BenchMark Ultra immunostainer (Ventana Medical Systems). Antibodies were pre-diluted by the manufacturer and staining was performed following the manufacturer’s protocols. Antigen retrieval times were 36 min for Ki67 and 64 min for CK8/18 (both using Cell Conditioning 1, pH 9, Ventana Medical Systems). Antibody incubation times were 28 min for Ki67 and 32 min for CK8/18. Antibody amplification was applied for CK8/18 (not for Ki67), using the Ventana Amplification Kit (Ventana Medical Systems).

Image acquisition and DIA platforms

Digital images were acquired by scanning the glass slides in a Philips Ultra Fast Scanner 1.6 (Philips, Eindhoven, The Netherlands) with a 40× magnification lens, using a single focus layer without Z-stacking. Tissue detection with focus points was applied automatically to obtain the optimal image. Digitalized slides were stored on a centralized image server and a direct link with this server was established in both DIA platforms. The DIA platforms were Visiopharm Integrator System (VIS) platform version 6.9.0.2779 (Visiopharm, Hørsholm, Denmark) and HALO platform version 2.0.1061 (Indica Labs, Corrales, New Mexico, United States).

Manual counting

Manual counting of Ki67 proliferation index was performed by a resident pathologist (TK), using a protocol based on the ‘whole section scoring protocol’ by the International Ki67 in Breast Cancer Working Group, with ROIs to represent the spectrum of staining in the whole section [5, 8]. On the digital image, three 0.500 mm² ROIs were annotated within areas with high, medium, and low proliferation, respectively. If only two area types were present, two of three ROIs were selected in the area comprising the most common proliferation rate. Of the three ROIs, at least one ROI was selected centrally and at least one peripherally (i.e., the invasive edge) in the tumor. One of the ROIs was a hot spot, if present. In each ROI, 200 cells were counted in a ‘typewriter’ pattern (i.e., counting in rows within the ROI, from top to bottom, to assure a reproducible counting method) [7, 8]. Any definite brown nuclear staining was considered positive. Ki67 proliferation index representative of the whole tumor section was then calculated by dividing the number of Ki67 positive cells by the total number of counted cells (600 cells for each case).
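The pooled calculation described above can be sketched in a few lines of Python. The function name and the per-ROI counts are hypothetical and serve only as an illustration; they are not data from this study.

```python
def pooled_ki67_index(positive_per_roi, counted_per_roi):
    """Return the Ki67 proliferation index (%) pooled over all ROIs."""
    if len(positive_per_roi) != len(counted_per_roi):
        raise ValueError("need one positive count per ROI")
    total_counted = sum(counted_per_roi)
    if total_counted == 0:
        raise ValueError("no cells were counted")
    # Positive cells over all counted cells, expressed as a percentage.
    return 100.0 * sum(positive_per_roi) / total_counted


# Hypothetical case: high, medium, and low proliferation ROIs, 200 cells each.
example_index = pooled_ki67_index([60, 30, 12], [200, 200, 200])
print(f"Ki67 proliferation index: {example_index:.1f}%")
```

Because every ROI contributes the same number of cells (200), the pooled index equals the unweighted mean of the three per-ROI percentages; with unequal counts the two would differ.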

Digital image analysis

A training set of 20 randomly selected breast carcinoma cases obtained between January and August 2015, which were identically handled and stained but not included in the current study, was used to calibrate tissue classification by CK8/18 in VDS and nuclear classification of Ki67 in both DIA platforms. Calibration was done in close collaboration with both platform vendors, independently of each other. In both platforms, VDS was applied to digitally align corresponding Ki67- and cytokeratin-stained sections. During this process, the algorithms automatically perform distortion and rotation modifications to eliminate small differences due to tissue and section processing. Alignment was verified visually for each case, and misaligned cases were excluded from further analysis. The algorithms were then set to use the cytokeratin-stained area as the tumor classifier on the Ki67-stained section. Within the whole tissue section, the complete invasive tumor area was annotated. If present, large areas of carcinoma in situ, pre-existent epithelium, and tissue or staining artifacts were excluded. Ki67 positivity was analyzed with nuclear classification algorithms which detect nuclei by morphological form and size and classify these as positive or negative based on pixel color and intensity. In both platforms, named ‘platform A’ and ‘platform B’ henceforth, Ki67 proliferation index was calculated by dividing the number of Ki67 positive cells by the total number of positive and negative cells within the area classified as tumor by VDS. In cases with a ≥ 10% Ki67 difference by DIA versus manual counting and intra-tumoral Ki67 heterogeneity, additional DIA was performed on the manually counted ROIs only, to evaluate representativeness of these ROIs for the whole tumor.
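The role of the cytokeratin-derived tumor classifier in this calculation can be illustrated with a toy sketch. It assumes that alignment has already been done and that nuclei have already been detected and classified. The data layout (a boolean mask plus a list of nucleus positions) and all names are hypothetical and do not represent the algorithm of either platform.

```python
def ki67_index_in_tumor(nuclei, tumor_mask):
    """Ki67 index (%) over nuclei that fall inside the tumor mask.

    nuclei: iterable of (row, col, is_positive) tuples.
    tumor_mask: 2D list of booleans, True where the aligned
        cytokeratin stain marks tumor (hypothetical representation).
    """
    positive = negative = 0
    for row, col, is_positive in nuclei:
        if not tumor_mask[row][col]:
            continue  # nucleus lies outside the cytokeratin-positive area
        if is_positive:
            positive += 1
        else:
            negative += 1
    total = positive + negative
    if total == 0:
        raise ValueError("no nuclei inside the tumor mask")
    return 100.0 * positive / total


# Toy 3x3 image: the upper-left 2x2 block is tumor.
toy_mask = [
    [True, True, False],
    [True, True, False],
    [False, False, False],
]
toy_nuclei = [
    (0, 0, True), (0, 1, False), (1, 0, False), (1, 1, False),  # tumor cells
    (0, 2, True), (2, 2, True),  # Ki67-positive non-tumor cells
]
all_tissue = [[True] * 3 for _ in range(3)]

masked_index = ki67_index_in_tumor(toy_nuclei, toy_mask)
unmasked_index = ki67_index_in_tumor(toy_nuclei, all_tissue)
print(masked_index, unmasked_index)
```

In this toy example, counting the two Ki67-positive nuclei outside the mask doubles the index, which is the kind of overestimation by non-neoplastic cells that the tumor classifier is meant to prevent.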

Statistical analysis

Spearman’s correlation coefficients were calculated for inter-observer agreement between Ki67 proliferation index by manual counting and by one of the two DIA platforms, as well as for inter-platform agreement. Scatterplots and Bland–Altman plots were created to assess inter-observer and inter-platform correlation and agreement in relation to data ranges. Plots were created and statistical analysis was performed using IBM SPSS Statistics for Windows version 23.0.0.3 (SPSS, Chicago, Illinois, United States). All testing was two sided. Values of p < 0.05 were considered significant.
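For readers who wish to reproduce this type of analysis outside SPSS, the following is a minimal sketch of Spearman's rank correlation and the Bland–Altman bias with 95% limits of agreement, using only the Python standard library. It is not the SPSS procedure used in this study, it does not compute p values, and the paired scores are invented for illustration.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) of agreement for x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.fmean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias, bias - spread, bias + spread


# Invented paired Ki67 scores (%) for five cases.
manual_scores = [5.0, 12.0, 20.0, 35.0, 60.0]
dia_scores = [4.0, 13.0, 11.0, 30.0, 62.0]

rho = spearman_rho(manual_scores, dia_scores)
bias, lower_limit, upper_limit = bland_altman(manual_scores, dia_scores)
print(f"rho = {rho:.2f}, bias = {bias:.1f}, "
      f"limits = {lower_limit:.1f} to {upper_limit:.1f}")
```

The two measures answer different questions: Spearman's rho reflects whether both methods rank the cases in the same order, whereas the Bland–Altman bias and limits describe how far the paired scores differ in absolute terms.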

Results

Of the 154 cases included, VDS failed in 37 cases (24%) because of alignment issues due to relative folding or twisting of tissue, or because sections were not properly cut in serial order. VDS alignment was not influenced by CK8/18 staining or tumor size. Of these 37 cases, 32 were misaligned in both DIA platforms, 3 cases were misaligned in only one platform (as the other platform’s algorithm was able to correct the relative twisting), and 2 cases could not be aligned by one platform as the stains were mirrored. Therefore, further analysis was performed on 117 cases.

Correlation of manual counting and DIA

Manual and digital cell count profile and Ki67 proliferation index are displayed in Table 2. DIA is illustrated in Fig. 1. Ki67 scores were slightly higher by manual counting than by DIA; mean 19.5% versus 18.3–18.4% and median 13.5% versus 12.2–12.6%. Scatterplots and Bland–Altman plots of manual counting compared to both DIA platforms as well as between platforms are displayed in Fig. 2. There was no skewness within specific data ranges. Correlation for inter-observer agreement between manual counting and DIA was high: Spearman’s correlation coefficients were 0.94 (p < 0.001) for manual counting compared to platform A and 0.93 (p < 0.001) for manual counting compared to platform B. Correlation for inter-platform agreement between platform A and platform B was even higher, with a Spearman’s correlation coefficient of 0.96 (p < 0.001).

Table 2 Cell count profile and Ki67 proliferation indexes by manual counting and digital image analysis
Fig. 1 Digital image analysis of Ki67 with virtual dual staining. Corresponding cytokeratin (a) and Ki67 (b) stains are virtually aligned and Ki67 nuclear classification is determined among the cells in the area classified as tumor, shown in platform A (c, d) and platform B (e, f). Images at 200× magnification

Fig. 2 Scatterplots with correlation coefficients (left) and Bland–Altman plots of agreement (right) between whole tumor Ki67 proliferation index by manual counting versus platform A (upper row), manual counting versus platform B (middle row), and platform A versus platform B (lower row)

Cases with ≥ 10% Ki67 difference

Ten of all 117 cases (8.5%) showed a difference in Ki67 proliferation index of ≥ 10% by DIA compared to manual counting, as shown in Table 3 and illustrated in Fig. 3. Only 2 cases had differences of > 13%. In 5 of the 10 cases, differences between manual counting and DIA were due to intra-tumoral Ki67 heterogeneity. When DIA was done on the manually counted ROIs only (instead of the whole tumor), differences were well below 10%. In the other 5 cases, the difference was due to tumor morphology or staining artifacts. Interestingly, differences between both DIA platforms were < 5% in the majority of cases (8 out of 10). Only one case had a ≥ 10% (10.2%) difference between platforms, due to hematoxylin overstaining which led to positive classification of Ki67 negative cells in platform A but not in platform B. In another case, artifactual cytoplasmic staining was erroneously recognized as positive nuclear staining by platform B but not by platform A (6.2% difference).
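The selection of cases for this sub-analysis can be sketched as a simple filter. The sketch assumes that the ≥ 10% criterion refers to an absolute difference in percentage points between the two scores; the case identifiers and scores are invented for illustration.

```python
def flag_discrepant_cases(cases, threshold=10.0):
    """Return {case_id: absolute difference} for cases at or above threshold.

    cases: dict mapping a case identifier to (manual %, DIA %).
    threshold: minimum absolute difference in percentage points
        (assumed reading of the >= 10% criterion).
    """
    flagged = {}
    for case_id, (manual, dia) in cases.items():
        difference = abs(manual - dia)
        if difference >= threshold:
            flagged[case_id] = difference
    return flagged


# Invented scores: (manual counting %, DIA %).
toy_cases = {
    "case_01": (12.0, 10.5),
    "case_02": (45.0, 33.0),
    "case_03": (20.0, 30.0),
    "case_04": (8.0, 17.5),
}

flagged = flag_discrepant_cases(toy_cases)
for case_id, difference in sorted(flagged.items()):
    print(case_id, difference)
```

Using the absolute difference means that cases are flagged regardless of whether manual counting or DIA gave the higher score.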

Table 3 Cases with ≥ 10% difference of Ki67 proliferation index by manual counting and DIA (total n = 117)
Fig. 3 Cases with ≥ 10% difference in Ki67 proliferation index between digital image analysis and manual counting due to tumor morphology or staining artifacts. One case had clear cell morphology, causing erroneous tumor classification (a–c). Two cases had nuclear overlap and cell clustering (d), causing misclassification of Ki67 both by platform A (e) and platform B (f). In one case, artifactual cytoplasmic Ki67 staining (g) was correctly handled by platform A (h) but was classified as positive nuclear staining by platform B (i). One case with hematoxylin overstaining (j) led to false-positive classification of nuclei by platform A (k) but not by platform B (l). Images at 200× magnification

Clinical context

The clinically relevant Ki67 cut-off is 20%, as defined by the St. Gallen criteria [21]. When this cut-off was applied in our study, discordance of tumor subtype classification due to Ki67 score by DIA versus manual counting occurred in 4 cases (3.4%) with platform A and 2 cases (1.7%) with platform B. Of these cases, one was among the cases with ≥ 10% difference discussed previously. All of the remaining cases were just above or just below the 20% cut-off with differences of 3.9% at most, illustrating a small margin of error (results by both platforms were similar). The degree of Ki67 differences between different counting methods in cases with Ki67 between 15 and 25% (near the 20% cut-off) is displayed in Supplementary Table 1. Clinicopathological characteristics of these cases were similar to those of the total study population (Supplementary Table 2).
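How discordance around the cut-off is counted can be sketched as follows. The sketch assumes that a score at or above 20% is classified as high; the paired scores are invented for illustration, and only the final two lines use figures reported above.

```python
KI67_CUTOFF = 20.0  # per cent


def is_high(ki67, cutoff=KI67_CUTOFF):
    """Classify a Ki67 score; 'at or above the cut-off' is an assumption."""
    return ki67 >= cutoff


def discordant_cases(manual_scores, dia_scores, cutoff=KI67_CUTOFF):
    """Return indices of cases that fall on opposite sides of the cut-off."""
    return [
        index
        for index, (manual, dia) in enumerate(zip(manual_scores, dia_scores))
        if is_high(manual, cutoff) != is_high(dia, cutoff)
    ]


# Invented paired scores (%) for four cases.
toy_manual = [19.0, 21.5, 35.0, 8.0]
toy_dia = [20.5, 18.0, 31.0, 9.0]
toy_discordant = discordant_cases(toy_manual, toy_dia)
print(toy_discordant)

# Reported discordance rates, recomputed from the case counts in the text.
rate_platform_a = round(100 * 4 / 117, 1)
rate_platform_b = round(100 * 2 / 117, 1)
print(rate_platform_a, rate_platform_b)
```

In the toy data, both discordant cases differ by less than 4 percentage points between methods, which shows how small differences become clinically visible only when they straddle the cut-off.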

Discussion

The aims of this study were to clinically validate DIA of Ki67 using VDS on whole tissue breast carcinoma sections, and to assess inter-platform agreement between two independent DIA platforms. We found high inter-observer agreement between manual counting and DIA, and even higher inter-platform agreement.

Correlations in studies comparing manual scoring with DIA vary between 0.89 and 0.97 [12,13,14]. In the current study, Spearman’s correlation coefficients were 0.94 (p < 0.001) and 0.93 (p < 0.001) between manual counting versus platform A and platform B, respectively, which is in line with these studies. Only one study implemented VDS, with an intraclass correlation coefficient of 0.97 between manual counting and DIA [14]. However, that study used TMAs and thereby preselected smaller areas of the whole tumor. In clinical practice, Ki67 is often scored on whole sections, as is recommended by the International Ki67 Working Group [5]. In our study, a manual counting protocol based on the ‘whole section scoring protocol’ by this Working Group was highly concordant with DIA on whole sections. As such, we have confirmed that VDS is an accurate method to perform DIA of Ki67 on whole sections and can be used in clinical practice.

Of the initial 154 cases included in our study, a large number (37 cases; 24%) was excluded due to VDS failure, which occurred in both platforms. For successful VDS alignment, the Ki67- and cytokeratin-stained sections must be identical and accurate serial sectioning is essential. Additionally, folding or twisting of one of the sections can cause VDS misalignment. As such, it is crucial that laboratory technicians responsible for the preparation of the slides are properly instructed and trained. For this study, laboratory technicians did not receive specific instructions on the necessity of careful stretching and serial sectioning, which could be the cause of the large number of misaligned cases. For clinical implementation of VDS, we therefore recommend specific instruction and training courses for laboratory technicians on the effects of inaccurately cut and mounted sections.

Inter-platform variability between different DIA platforms may be expected as tissue morphology, cellular features, and staining patterns are handled differently depending on the platform’s algorithm [15, 20]. In clinical practice, this could lead to inconsistency of Ki67 scores when different platforms are used to perform DIA. To the best of our knowledge, the current study is the first to address inter-platform agreement on one set of tumors. Inter-platform agreement was very high, with a Spearman’s correlation coefficient of 0.96. This shows that DIA is reproducible among different platforms, and therefore a clinical pathology laboratory is not bound to a specific DIA platform or vendor, as long as the algorithm is calibrated and validated in close collaboration with the platform vendor.

In 5 cases, there was a ≥ 10% difference in Ki67 proliferation index between DIA and manual counting due to tumor morphology or staining artifacts. Results by both platforms were similar in these cases, illustrating that both platforms handle troublesome cases in a similar way. We recommend that after analysis, a quick visual check of the results by a clinical pathologist should always be performed.

Intra-tumoral heterogeneity is a known occurrence in Ki67 stains [4, 16, 22, 23]. The manual counting protocol used in our study compensated for heterogeneity in most cases, yet there were 5 cases with a ≥ 10% difference due to heterogeneity. In these cases, we performed additional DIA on the manually counted ROIs (instead of the whole tumor). In that analysis, differences fell well below 10%, showing that the manually selected ROIs were not adequately representative of the whole tumor in these cases (Table 3).

In a clinical context, Ki67 can be used in the distinction of intrinsic tumor subtypes (luminal A or luminal B). Previously, DIA of the relevant surrogate biomarkers (ER, PR, HER2, and Ki67) was shown to be prognostically superior to manual scoring [15]. According to the St. Gallen criteria, the clinically relevant Ki67 cut-off is 20% [21]. When this cut-off was applied for Ki67 by DIA versus manual counting in our study, there was discordance of tumor subtype classification in only a few cases (4 cases with platform A and 2 cases with platform B). Additionally, the difference between counting methods (Supplementary Table 1) was < 5% in the majority of all DIA cases as well as in Ki67 15–25% subgroups (near the 20% cut-off). However, even a small discrepancy can change the assigned subtype in cases near the 20% cut-off. Whether manual counting or DIA should be the ‘gold standard’ in these cases is subject to debate; most studies have correlated clinical significance with manual counting, but others have shown that DIA is prognostically stronger [5, 11, 15]. With regard to the St. Gallen criteria, Ki67 is only of importance in tumors which are ER positive, PR positive, and HER2 negative, especially in low-grade tumors of small size [21]. However, clinicopathological characteristics of cases near the 20% cut-off were similar to those of our total study population (Supplementary Table 2), with possibly slightly higher ER-positivity and HER2-positivity rates. As such, no specific clinicopathological characteristic is predictive of this Ki67 subgroup, though the clinical relevance of Ki67 in ER-negative, PR-negative, and/or HER2-positive tumors could be limited.

Calibration and validation are vital to the success of DIA [9]. Calibration can be challenging, and it is important to realize that the image analysis algorithms of both platforms used in our study were calibrated on our laboratory stains and scans, in close collaboration with the platform vendors. This collaboration is important, as the pathologist has the clinical expertise, whereas the platform vendor has the technical expertise. Differences in protocols and equipment among laboratories, but also within one laboratory, necessitate proper and continuous calibration and validation, as differences in staining methodology and materials can lead to variable texture and color nuances which can influence DIA algorithms [14]. Further studies could investigate the performance of DIA with regard to inter-laboratory variability on identical sets of tumors. Additionally, inter-platform agreement between platforms other than the two included in this study should be investigated.

A last point of interest is the cost of DIA. Initially, DIA would seem expensive, as it requires a scanner for digitalization of the images, DIA software, and a technician to carry out the analysis. However, an increasing number of modern pathology laboratories are incorporating digital pathology in their diagnostic workflow [24]. In addition to being more reproducible, DIA can replace time-consuming manual counting of Ki67, saving pathologists time.

In conclusion, we have shown that DIA using VDS is an accurate method to determine Ki67 proliferation index on whole sections of invasive breast carcinomas. For clinical implementation, proper training of laboratory technicians responsible for the section preparation is crucial to prevent failure of VDS alignment. DIA of Ki67 offers an objective alternative to manual Ki67 counting and has high inter-platform agreement, suggesting that it is clinically implementable independent of a specific platform vendor.