Introduction

Assessment of breast cancer biomarkers has become a staple of routine histopathology for every colleague working in this field. Assessment and quantification of oestrogen receptor (ER), progesterone receptor (PR), and c-erbB2/HER2 are used daily by the clinicians making fundamental therapeutic choices for the patients. On the other hand, Ki-67 has struggled to join this established trifecta in the routine management and risk stratification of breast cancer patients [1,2,3,4] and its prognostic and predictive value is restricted to very specific settings in breast cancer; recently, the results from the monarchE study has suggested a prognostic role for Ki-67 ≥ 20% in patients with early breast cancer treated with cyclin-dependent kinase 4 and 6 (CDK4/6) inhibitors in combination with endocrine therapy [5].

Recently, changes in the stratification of ER positivity have been put forward by ASCO/CAP through an update of the recommendations for ER and PR determination, with the formal introduction of the ER-low-positive (ER-LP, defined as 1–10% of nuclear positivity in the tumour) [6]; this category has been known for some time to share more similarities with basal-type triple-negative breast cancer than with the luminal group [7,8,9,10], in morphological aspects, molecular signature and clinical behaviour.

In the same way, the results of the DESTINY-Breast03 trial [11, 12], and the identification of the HER2-low category [13, 14] as a new group of patients who may significantly benefit from anti-HER2 target therapy administration, have underlined once again the importance of the c-erbB2 status and its assessment with both immunohistochemistry and molecular techniques.

Core needle biopsy (CNB) is the most common method for diagnosis of breast cancer, and it has been demonstrated to be a reliable indicator of the surgical specimen results [15, 16]. Assessment of biomarkers on the preoperative specimen is required to administer neoadjuvant therapy to the selected patients that benefit from it and, in case of a complete pathological response, it represents the only sample of that tumour available; also for metastatic patients, who are not eligible for surgical resection, the bioptical sample is the only one that can be used to take life-changing clinical decisions.

It is clear, then, the importance of a reliable assessment of these biomarkers on biopsy. Given the steadily increasing request for precise molecular characterization of breast cancer to ensure correct patient management, we aimed to retrospectively evaluate our patient cohort for reproducibility of the biopsy results for ER, PR, c-erbB2, and Ki-67, focusing also on the recently defined ER-LP and HER2-low groups. We also review the recent literature on the topic to validate our results in the context of the international available results.

Methods

Patients and clinicopathological characteristics

We reviewed the records of 1654 breast cancer patients who underwent both biopsy and surgical excision at the Department of Breast Surgery, San Matteo Hospital, between January 2014 and December 2020. We excluded patients for which ER, PR, c-erBb2 or Ki-67 values were missing, patients with in situ or microinvasive-only breast cancer, patients with multifocal tumours, metastatic disease and patients who underwent neoadjuvant chemotherapy. Our final cohort comprised 923 patients and their clinical characteristics are summarized in Table 1. The study was conducted according to the guidelines of the Declaration of Helsinki.

Table 1 Clinicopathological characteristics of the cohort

Pathology evaluation

Samples were fixed in 10% neutral buffered formalin and embedded in paraffin before histopathological evaluation. 4-to-5 µm-thick sections were cut and stained with hematoxylin and eosin (HE), and unstained sections were used for immunohistochemistry with antibodies anti-ER (clone EP1, Dako Omnis), anti-PR (clone PgR 1294, Dako Omnis), c-erbB2 (clone A0485, Dako Omnis) and Ki-67 (clone MIB-1, Dako Omnis). All immunoreactions were carried out on a Dako Omnis platform (Dako, Glostrup, Denmark). All cases were seen by at least one pathologist (M.L. and/or C.R.) expert in breast pathology, who also revised all the discrepant cases prior to the final diagnosis.

ER and PR were defined positive when ≥ 1% of the tumour cell nuclei showed immunostaining, according to the 2010 ASCO/CAP guidelines [17]. ER was further stratified into LP and positive using the 10% cut-off, according to the 2020 ASCO/CAP guidelines update [6]. Ki-67 was scored ‘high’ when ≥ 20% of the tumour nuclei were positive, taking into account the cut-off clinically used to define the Luminal B class according to the 2013 St Gallen International Breast Cancer Conference experts Panel opinion [18]. Cells positive for Ki-67 were scored over 100 cells in both ‘cold’ and ‘hot’ tumour areas, and the final value represented an average between those of the different areas. This method is similar to the one recommended by the International Ki-67 in Breast Cancer Working Group (IKWG) in their 2021 updated recommendations [19], with the only difference that the recommended online scoring app was not used. c-erbB2 was scored according to the ASCO/CAP 2013 guidelines and 2018 Focused Update [20, 21] as 0, 1 +, 2 + or 3 + depending on intensity and completeness of the membrane staining. FISH was performed in all the equivocal cases, but the results are not reported in this paper since we focus on the immunohistochemical evaluation alone.

Cohen’s kappa (κ) was used to measure the interobserver agreement between biopsy and surgical specimen; weighted κ was used when concordance between more than one result was evaluated to account for close matches. κ values < 0.20 were interpreted as poor agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 good agreement, 0.81–0.99 very good agreement and 1 perfect agreement. P-values were calculated with Fisher’s exact test and chi-squared test, when appropriated, on GraphPad Prism 5, and p-values < 0.05 were deemed significant.

Results

ER

ER was positive in 829/923 (89%) biopsies and in 831/923 (90%) surgical resection specimens, with a concordance of 97.83% (n = 903/923), a Cohen’s κ of 0.880 (very good agreement) and a p-value < 0.0001. The concordant and discrepant results are reported in Table 2, with 9 positive results (9/923, 1%) later reported as negative on the surgical specimen and 11 negative results (11/923, 1.2%) that were upgraded to positive on the final pathology report. These changes would have had a clinical impact with either withholding or administration of endocrine therapy in these patients, if the decision was based only on the biopsy results.

Table 2 Concordance between biopsy and surgical specimen for oestrogen receptor, progesterone receptor and Ki-67. ER, oestrogen receptor, PR, progesterone receptor

When the positive results were further stratified into ER-LP and ER-positive (Table 3) the overall agreement was still very good (97.18%, weighted κ = 0.924, p value < 0.0001), but the concordance for the ER-LP category itself was low (n = 11/23, 47.8%). Of the 12 discordant ER-LP results on biopsy, 8/12 (66.7%) were ER-negative on the surgical specimen and 4/12 (33.3%) were ER-positive. Of these four cases, three cases were only slightly above the cut-off for ER-LP (15%), whilst one showed positivity in 60% of the cells (Fig. 1).

Table 3 Concordance between biopsy and surgical specimen for oestrogen receptor-negative, oestrogen receptor-LP and oestrogen receptor-positive cases. ER, oestrogen receptor
Fig. 1
figure 1

Example of ER-LP discordance between biopsy and surgical specimen. On biopsy, (A, HE, 10x, B, ER immunohistochemistry, 10x) the tumour showed only faint, very focal (arrowhead) positivity for ER, that was quantified at 1%. The surgical specimen (C, HE, 5x, DF, ER immunohistochemistry, 5x) revealed a dishomogeneous and faint ER positivity. Note the positive internal control (arrowhead) in the top-left corner of panel (E)

PR

PR was positive in 759/923 (82%) biopsies and in 778/923 (84%) surgical resection specimens. Concordance was 94.26% (n = 870/923), Cohen’s κ was 0.794 (good agreement) and p value was < 0.0001. Most (n = 36/53, 67.9%) of the discordant results were biopsies reported as PR negative that were later reported as PR positive on the surgical specimen. The complete results are summarized in Table 2.

c-erbB2/HER2

c-erbB2 was found to be concordant in 68% of cases (n = 631/923), and Table 4 details the results for each category. Cohen’s weighted κ was 0.675 (good agreement) and p-value was < 0.0001. Breaking down the results for each category, 1 + was the least concordant group (37% vs 83%, 79% and 97% for 0, 2 + and 3+ , respectively), with 72% (n = 136/188) of the discordant results being diagnosed as 0 on the surgical specimen.

Table 4 Concordance between biopsy and surgical specimen for c-erbB2 status

According to current clinical practice, only four (4/923, 0.4%) patients had changes in the diagnosis that could significantly impact their treatment choice (two 1 + biopsies later upgraded to 3 and two 3 + biopsies downgraded to 1 +). They were all, except for one locally advanced cancer, early breast cancers that would have not been candidates for neoadjuvant treatment. However, on the account of the results of the biopsy alone, they would have been denied a potentially life-saving treatment or subjected to the toxicities of an ineffective one.

ki-67

Ki-67 was high in 256/923 (28%) biopsies and in 357/923 (38%) surgical resection specimens. Concordance was 86.13% (n = 759/923), Cohen’s κ was 0.686 (good agreement), and p value was < 0.0001. Concordance and discordance are reported in Table 2; most of the discordant results (n = 107/128, 84%) were biopsies in which Ki-67 was reported as low that were later upgraded on the excision specimen.

Discussion

ER

In the published literature ER concordance between biopsy and surgical excision averages at 93.3% (range 78.7–99.1%, Table 5). Our concordance was 97.83%, slightly higher than the average value but in keeping with the known results. This variance can also be attributed to the different cut-offs and scoring methods used in the single laboratories, reported in the table that impair their universal reproducibility.

Table 5 Review of the literature regarding oestrogen receptor concordance

Several factors are recognized to influence concordance between biopsy and surgical specimen, with the pre-analytical phase being the most important to standardize intra-laboratory results, trying to avoid under-fixation [17, 22]. The choice of the antibody clone and the immunostaining platform has also been demonstrated to be relevant in the over- or underestimation of the ER percentage [23], especially when comparing the widely used Dako and Ventana clones.

The high percentage (66.7% of ER-LP cases) of ER-LP that tested ER-negative on the surgical specimen suggests that ER-LP results more frequently represent an “overcalling” of a non-luminal carcinoma, in line with the current understanding of ER-LP tumour biology, that relates them more closely to this group. Recent published data suggest also that artefactual reduced intensity of staining of ER in normal breast tissue adjacent to the neoplasia may concur to report these cases as ER-LP [24], therefore underlying the importance of the presence of internal and/or on-slide controls in the assessment of these borderline cases.

In summary, our results suggest that caution should be taken when calling ER-LP on biopsy and for subsequent decision-making, and further studies are needed to define if this category can be safely defined on a biopsy specimen.

PR

Our final concordance for PR was 94.26%. From a review of the literature, the average reported concordance is 87.3% (range 73.5–95%, Table 6), and our results fall on the upper side of this range.

Table 6 Review of the literature regarding PR concordance

The lower concordance of PR assessment on biopsy and surgical specimen reflects its naturally occurring dishomogeneity in breast normal tissue and tumours, owing to his nature as a down-stream ER effector and therefore requiring an intact ER pathway to be strongly expressed. Our results with a higher proportion of upgrades rather than downgrades on the surgical specimen (67.9% vs 32.1%) suggest that indeed tumour heterogeneity, with a negative spot sampled on biopsy, may be a significant issue in PR assessment.

Given this heterogeneity, PR concordance is especially sensitive to sampling artefacts, especially undersampling of the target lesion. From the current literature, four represents the minimum number of biopsy cores that should be retrieved to ensure a correct preoperative evaluation [49].

c-erbB2/HER2

In the published literature, concordance for c-erbB2 when evaluated with immunohistochemistry alone averages at 85.4% (range 56–98.8%, Table 7); our series demonstrates a concordance in 68% of cases, lower than the reported values, but still in the published range.

Table 7 Review of the literature regarding c-erbB2/HER2 concordance

Breaking down the results, the 1 + category was the one with the lowest concordance, with only 37% of results getting confirmation on the surgical specimens. Whilst we had previously reported that this discordance for the 1 + category was not likely to have a significant therapeutic impact for the patients [50], the recent introduction of the HER2-low category as a subset of patients that could benefit from the administration of targeted anti-HER2 therapy will radically change that in the upcoming years.

The 2013 ASCO/CAP guidelines define 1 + as incomplete membrane staining that is faint or barely perceptible and within > 10% of tumour cells and readily appreciated in a contiguous population using a low-power objective, whereas 0 is defined as absence of staining or incomplete membrane staining that is faint/barely perceptible and within ≤ 10% of tumour cells [20]. This diagnosis must be made on immunohistochemistry alone, with no help coming from ancillary molecular studies.

However, in everyday practice, this distinction is not an easy one to make especially on biopsy, where crushing or technical artefacts and tumour heterogeneity make it a particularly challenging task. In the light of the growing importance of distinguishing between 0 and 1 +, we stress the importance of a dedicated, up-to-date breast pathologist examining these specimens, especially in those difficult cases in which the diagnosis is not immediately clear.

Ki-67

Ki-67 represents an unfulfilled promise in the field of breast cancer; despite being widely used as a marker of proliferation, it has failed time and time again to reach prognostic and predictive significance.

This is at least in part ascribable to the still unclear nature of this biomarker that, despite the best efforts that has been put into it, still does not have a fixed biological cut-off; guidelines for what qualifies as ‘high’ and ‘low’ Ki-67, and whether these definitions truly represent different biological entities, are still unclear.

Recent recommendations from the IKWG [19, 53] report that sufficient levels of evidence of the prognostic value of Ki-67 exists only in the setting of ER-positive early-stage breast cancer, where levels ≥ 5% and ≥ 30%, respectively, may favour withholding or administration of chemotherapy. Historically, the 2013 St Gallen International Breast Cancer Conference suggested a 20% cut-off for the definition of ‘high’ Ki-67 in the definition of the surrogate intrinsic subtypes of breast cancer [18], but still advised that different cut-offs could be adopted by single laboratories. As reported in literature [40, 43, 44, 46, 54, 55], most laboratories use either a 14% or a 20% cut-off.

Our concordance was in line with the values reported in literature (Table 8), especially with those reported by You et al. [44], whose cut-off and population are similar to those of this work. Similar values are reported also with a 14% cut-off by Kim et al., [55], Ahn et al., [54] and Meattini et al. [43], whilst lower concordance levels are reported by studies in which a Ventana antibody is used for immunohistochemistry [40, 46] irrespectively of the cut-off used.

Table 8 Review of the literature regarding Ki-67 concordance

To our knowledge, this work represents one of the largest single-centre series present in literature and the first one where issues within the ER-LP and HER2-low categories were specifically addressed. Moreover, all the immunohistochemistry reactions were carried out with the same antibodies on the same platform, and a breast pathologist (M.L. and/or C.R.) was involved in the diagnosis of all cases, ensuring a high degree of homogeneity in the test results. Possible limitations include the retrospective nature of the study and the lack of data regarding the HER2 2 + amplification status; however, this study aimed to evaluate only the immunohistochemical concordance for HER2, with particular attention to the HER2-low categories.

We confirmed a very good agreement for ER assessment on biopsy, and a still satisfactory, albeit lower, concordance for PR that falls just short of the very good agreement cut-off. Ki-67 evaluation instead was confirmed to be slightly less reliable than ER and PR, with a significantly lower concordance, and could warrant evaluation also on the surgical specimen, if available, in the light of potential treatment options [5].

ER-LP analysis revealed that, even if the global ER concordance is still satisfactory, the concordance for the specific category is still low and must be further investigated to define the boundaries within which it can be safely assessed on a biopsy sample.

The low concordance for the 1 + c-erbB2 immunohistochemistry is particularly relevant in the context of the new therapeutic advances involving it and highlights the technical difficulties in consistently implementing the current diagnostic criteria available for the diagnosis.

In the light of the quick and exciting evolution of the therapeutic landscape of breast cancer, it is important that caution is taken in evaluating biopsy samples, especially for those predictive factors that may significantly impact the therapeutic options of the patient. Although only a very small number of patients in our series would have received an inappropriate treatment based on biopsy results alone, our data underline the relevance of surgical sample retesting, at least in selected cases, including large tumours and cases with discrepant histological characteristics. Specific training needs should be addressed by the national and international pathology societies, especially in those still grey areas of ER-LP and HER2-low categories, where the difference between a negative and a positive result may withhold a target therapy, or cause patients to undergo unnecessary, and often quite burdensome, aggressive therapy.