Background

Until recently, therapies targeting the human epidermal growth factor receptor 2 (HER2) were ineffective in HER2-negative breast cancer (BC), including BC with low levels of HER2 expression [1]. However, phase III results for the novel antibody–drug conjugate trastuzumab deruxtecan (T-DXd) have now shown significantly improved survival in patients with metastatic HER2-low BC, defined with reference to prevailing recommendations for HER2 testing as an immunohistochemical score of 1+ or 2+ without detectable gene amplification [2, 3]. As about 60% of primary invasive BCs belong to the HER2-low category [4,5,6], T-DXd may therefore improve the outcome for a large group of patients.

The prevailing recommendations for HER2 testing from the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) establish criteria for the immunohistochemical scores, as summarized in Table 1, and endorse a testing algorithm with immunohistochemistry (IHC) as the primary test and gene testing by in situ hybridization (ISH) as a supplementary test for cases with a score of 2+ [3, 7, 8]. HER2 status is classified as positive if the score is 3+ or the gene is amplified, and as negative if the score is 0, 1+, or 2+ with normal gene status.
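The scoring and reflex-testing rules above, combined with the HER2-low definition given earlier, can be expressed as a small decision function. This is a minimal sketch, assuming the IHC score is available as one of the strings "0", "1+", "2+", or "3+" and the ISH result as True (amplified), False (normal gene status), or None (not performed or not reported); the function name `her2_category` and its labels are hypothetical and are not part of the ASCO/CAP guidelines or the DBCG database.

```python
from typing import Optional

VALID_SCORES = {"0", "1+", "2+", "3+"}


def her2_category(ihc_score: str, amplified: Optional[bool]) -> str:
    """Classify a tumor from its IHC score and optional ISH result.

    Hypothetical helper; the returned labels are illustrative only.
    amplified: True = gene amplification, False = normal gene status,
               None = ISH not performed or not reported.
    """
    if ihc_score not in VALID_SCORES:
        raise ValueError(f"unexpected IHC score: {ihc_score!r}")
    # Score 3+, or gene amplification at any score, means HER2 positive.
    if ihc_score == "3+" or amplified is True:
        return "HER2-positive"
    # Score 2+ requires ISH; without a result, no status can be assigned.
    if ihc_score == "2+" and amplified is None:
        return "status unknown"
    # All remaining cases are HER2 negative: split into score 0 and HER2-low.
    if ihc_score == "0":
        return "HER2-negative, score 0"
    return "HER2-low"  # score 1+, or score 2+ with normal gene status
```

With this encoding, a score of 2+ with normal gene status falls in the HER2-low group, while a score of 0 or 1+ with amplification is classified as HER2 positive, as for the four such patients described in the Results.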

Table 1 Immunohistochemical scoring of HER2 as recommended by American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) in the guidelines first released in January 2007* and revised in November 2013† and November 2018‡

These recommendations were, however, designed with the aim of allocating HER2-positive patients to HER2-targeted treatment with trastuzumab. While the distinction between positive and negative cases has shown good inter-observer reproducibility [9,10,11,12,13], reasonable consistency among laboratories [14,15,16], and high concordance between biopsy and surgical specimen [17,18,19], the discrimination of HER2-low BC may not be similarly robust. Thus, the development of new, more effective HER2-targeted agents raises a fundamental and urgent methodological problem: can the current HER2 test method discriminate HER2-low BC with reasonable reproducibility? In other words, is the current test method fit to answer a different question from the one it was originally developed for?

To address this problem, we performed a nationwide registry study on real-world HER2 data to explore inter-laboratory variability in the assessment of HER2-low BC across all Danish pathology departments.

Methods

Patient data

The study included women with BC diagnosed between 2007 and 2019 in Denmark.

Since 1977, the Danish Breast Cancer Group (DBCG) has hosted a nationwide clinical database on patients with primary invasive BC in Denmark, and since 2006, the database has been synchronized with the Danish register for pathology reports, Patobank, with a close-to-complete coverage of patients with histopathologically verified BC [20].

From the DBCG database, we obtained data on all female patients diagnosed between January 1, 2007, and December 31, 2019, who were subsequently assigned to curatively intended treatment according to national guidelines. Consequently, most patients with primary advanced BC were not included in the study.

The following clinicopathological parameters were extracted: HER2 IHC score and, if available, HER2 gene status (reported by HER2 gene copy number and HER2/CEN17 ratio), the resulting HER2 status, age at diagnosis, histological subtype, tumor size, estrogen receptor (ER) status (reported as percentage of ER-positive tumor cells; tumor positivity defined as ≥ 1% positive tumor cells), histological grade according to the Nottingham grading system, lymph node status at time of diagnosis, and the examining pathology department. HER2 gene amplification was defined according to the ASCO/CAP recommendations in force at the time in question [3, 7, 8]. The recorded HER2 status was corrected manually in case of clear discrepancy with the recorded IHC score and gene status (N = 11). As we did not have access to patient files, it was not possible to retrieve missing data.

In Denmark, diagnosis and management of BC take place exclusively within the public health system, which is organized under five administrative regions: Capital Region (1.73 million inhabitants in 2013), Zealand (0.82 million), Southern Denmark (1.20 million), Central Denmark (1.27 million), and Northern Denmark (0.58 million) [21]. In consequence, all breast biopsies and surgical specimens are examined at public pathology departments, which all adhere to the national guidelines from DBCG.

In January 2007, DBCG entered recommendations on HER2-targeted treatment in the national guidelines, at first for a limited patient population and since April 2010 for all patients with HER2-positive disease [20]. Since January 2007, Danish pathologists have therefore routinely reported HER2 score and status at the time of diagnosis and progression of BC. From 2005 to September 2008, the DBCG guidelines recommended a testing algorithm for HER2 that was essentially identical to the algorithm later recommended by ASCO/CAP, with reference to the HERceptin Adjuvant trial [20, 22]. The ASCO/CAP recommendations for HER2 testing released in 2007 [7] were implemented in the DBCG guidelines in September 2008, and the 2013 and 2018 revisions [3, 8] in February 2014 and December 2018, respectively [20].

In the reporting of the results, we have chosen to anonymize the pathology departments.

IHC assays

In all Danish pathology departments, the quality of HER2 IHC and ISH is monitored semiannually as part of an external quality assurance program under the auspices of NordiQC [23]. With permission from all Danish pathology departments, we obtained data from NordiQC on assays and staining platforms used for HER2 IHC in Danish pathology departments from 2007 to 2019.

To include the assay as a variable in a logistic regression model (see below), we assigned to each patient in the data set the HER2 IHC assay used by the examining department at the time of diagnosis, based on the departments' semiannual reports to NordiQC. However, when a department changed its assay, the exact date of the change was not reported. We therefore assumed that every change of assay took place on either January 1 or July 1.
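The half-year assumption can be illustrated with a lookup keyed on department and semiannual reporting period. This is a hypothetical sketch: the function names, the dictionary layout, and the example entries are invented for illustration and do not reproduce actual NordiQC data. A diagnosis date falls in period 1 (January to June) or period 2 (July to December), so any change of assay is implicitly dated to January 1 or July 1.

```python
from datetime import date

# Hypothetical example entries: (department, year, half-year) -> reported assay.
# These values are illustrative only and are not actual NordiQC data.
EXAMPLE_REPORTS = {
    ("Dept. X", 2011, 1): "HercepTest K5207",
    ("Dept. X", 2011, 2): "PATHWAY 4B5 790-2991",
}


def half_year(d: date) -> tuple:
    """Return (year, 1) for January-June and (year, 2) for July-December."""
    return (d.year, 1 if d.month <= 6 else 2)


def assign_assay(department: str, diagnosis_date: date, reports: dict) -> str:
    """Assay reported by the department for the half-year containing the diagnosis.

    Because reports are semiannual, an assay change is implicitly dated to
    January 1 or July 1. Periods without a report map to "unknown".
    """
    year, half = half_year(diagnosis_date)
    return reports.get((department, year, half), "unknown")
```

Mapping unreported periods to "unknown" mirrors the separate category for unknown assays used in the regression model.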

Statistical analysis

Distribution of HER2 score and status and ER status according to region, department, and year of diagnosis was evaluated by χ2 test. Patients with unknown score/status were excluded from this analysis. The proportion of patients with unknown HER2 score was evaluated separately according to department. As an alternative measure of variability, the relative difference was calculated as the difference between the highest and the lowest value divided by the lowest value.
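Both measures in this paragraph are simple to compute. The sketch below, assuming rates are given as percentages and counts as a list of rows, implements the relative difference as defined above and the Pearson χ2 statistic with its degrees of freedom for an r × c contingency table (e.g., departments × HER2 scores). It is illustrative only; the study itself used SAS, and a P value would be obtained from the χ2 distribution, for example with SciPy's `scipy.stats.chi2.sf(statistic, df)`.

```python
def relative_difference(rates):
    """(highest - lowest) / lowest, the alternative measure of variability."""
    lowest, highest = min(rates), max(rates)
    if lowest <= 0:
        raise ValueError("the lowest value must be positive")
    return (highest - lowest) / lowest


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows of observed counts, e.g. one row per department and
    one column per HER2 score. Expected counts are row total * column total / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df
```

Applied to the extreme departmental HER2-low rates reported in the Results (46.3% and 71.8%), `round(relative_difference([46.3, 71.8]), 2)` gives 0.55, and the departmental range for score 2+ (6.7% to 31.0%) gives 3.6 when rounded to one decimal. Only the extremes enter the relative difference, whereas the χ2 test uses every cell of the table.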

Multivariable logistic regression was applied to examine how department, year, and IHC assay related to HER2 score and HER2 status, respectively. We evaluated HER2 score 0 versus {1+, 2+, and 3+}, as well as HER2 positive vs. HER2 negative. Reference categories were Dept. 4 (highest patient count), year 2014 (few unknowns), and the PATHWAY assay 4B5 790-2991 (most frequently used). We ran the analysis both with all the different IHC assays and with a grouping of related assays (PATHWAY assays, HercepTest™ assays, others, and unknowns); the former gave a significantly better model and was therefore chosen. The Wald χ2 statistic was used to assess the significance of the variables. We also ran the analysis with HER2 score 0 versus {1+, 2+, 3+, and score unknown} and with HER2 positive vs. {HER2 negative and status unknown}; as this affected the estimates only modestly, the results are not shown. Interactions between pairs of variables were investigated in separate models.
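The quantities reported from such a model can be sketched as follows, assuming a fitted coefficient β and its standard error SE for one indicator variable. Each non-reference category (e.g., a department other than Dept. 4) gets its own 0/1 indicator, the odds ratio is exp(β), the Wald 95% confidence limits are exp(β ± 1.96·SE), and the single-coefficient Wald χ2 is (β/SE)². The helper names are hypothetical. Note that the Wald tests for department, year, and assay as whole variables are joint tests over all their indicators with several degrees of freedom, which this one-coefficient sketch does not reproduce.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def indicators(value, levels, reference):
    """Dummy (reference) coding: one 0/1 indicator per non-reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(value == level) for level in levels if level != reference}


def wald_summary(beta, se, z=Z_95):
    """Odds ratio, Wald confidence limits and 1-df Wald chi-square for one coefficient."""
    return {
        "odds_ratio": math.exp(beta),
        "lower": math.exp(beta - z * se),
        "upper": math.exp(beta + z * se),
        "wald_chi2": (beta / se) ** 2,
    }


def se_from_limits(lower, upper, z=Z_95):
    """Approximate SE of log(OR) recovered from published 95% confidence limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)
```

For instance, the reference department receives all-zero indicators, and `se_from_limits(1.10, 1.46)` recovers an approximate standard error from the rounded limits of a published OR such as 1.27 (95% CL 1.10–1.46), from which `wald_summary` reproduces those limits to within rounding.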

A P value < 0.05 was considered statistically significant. Statistical analysis was performed using SAS Enterprise Guide 7.15, SAS Institute Inc.

Results

From 2007 to 2019, a total of 50,714 women were diagnosed with primary invasive BC and treated with curative intent. The pathological examination was undertaken at 14 Danish pathology departments. Patient characteristics are reported in Table 2, stratified according to pathology department and administrative region. Mean age for the population was 61.2 years; 80.2% of tumors were classified as invasive ductal carcinoma; median tumor size was 16 mm; and 85.8% of tumors were ER positive. Among the three histological grades, grade II was the most frequent (42.7%). Based on sentinel node or axillary dissection, 36.3% of patients had lymph node involvement at the time of diagnosis. Overall, patient characteristics differed only modestly across regions and departments.

Table 2 Patient characteristics stratified according to administrative region and pathology department

For 49,042 patients (96.7%), HER2 IHC score was recorded in the DBCG database, and for 48,382 patients (95.4%), both HER2 score and HER2 status were recorded, as schematized in Fig. 1. For 397 patients (0.78%), HER2 status, but not HER2 score, was recorded; among these, 303 (76.3%) were HER2 negative and 94 (23.7%) HER2 positive, and 371 (94.9%) had a recording of HER2/CEN17 ratio. Among the 8029 patients with a score of 2+, 6308 (78.6%) had normal gene status, 1061 (13.2%) had gene amplification, and 660 (8.2%) had unknown gene status. Among patients with a score of 0 or 1+, gene status was reported for 281, among whom 277 had normal gene status and four had gene amplification (two with a score of 0 and two with a score of 1+, hence classified as HER2 positive).

Fig. 1

Block diagram of the population. Two patients with a score of 0 and two with a score of 1+ were classified as HER2 positive due to gene amplification; these four patients are not plotted explicitly in the chart

Distribution of HER2 scores

Table 3 shows how HER2 was scored in Danish pathology departments from 2007 to 2019. The distribution of the scores varied significantly among regions, departments, and years (P < 0.0001 in all cases). When patients with unknown HER2 score were excluded, the relative frequency of the scores among departments ranged from 10.7 to 38.1% for score 0, from 35.8 to 58.8% for score 1+, from 6.7 to 31.0% for score 2+, and from 9.5 to 15.6% for score 3+. The inter-laboratory variability for scores 0 and 2+ corresponded to very high relative differences of 2.6 and 3.6, respectively. Across years, frequencies ranged from 20.1 to 33.7% for score 0, from 40.4 to 51.3% for score 1+, from 12.9 to 18.8% for score 2+, and from 10.8 to 13.2% for score 3+. Surprisingly, the adjusted definition of score 0 in the 2013 revision of the ASCO/CAP guidelines (cf. Table 1) did not increase the frequency of score 0 (26.6% in the years 2007–2013 vs. 26.4% in the years 2014–2019).

Table 3 Distribution of HER2 scores across administrative regions and pathology departments

In Fig. 2, the distribution of the scores is illustrated over time for the five administrative regions. Striking differences and trends appear: from 2011 onwards, the frequency of score 2+ increased in Central Denmark and declined in Capital Region, and from 2009 onwards, the frequency of score 0 declined in Central Denmark. Likewise, different trends were seen across the years for the individual departments; e.g., from 2017 to 2019, the frequency of score 0 increased from 8.9% to 12.8% and then to 28.6% in Dept. 11, and decreased from 56.5% to 46.2% and then to 40.4% in Dept. 13 (data not shown).

Fig. 2

Distribution of HER2 scores over time in the five administrative regions of Denmark. (NA, not available)

In addition, the proportion of patients with unknown HER2 score differed significantly among departments (P < 0.0001). Three departments stood out: Dept. 3, 10, and 14 with 19.4% (N = 55), 37.4% (N = 278), and 20.8% (N = 349) unknowns, respectively, as compared to 2.1% at the other 11 departments. In the entire population, the number of patients with unknown HER2 score declined from 2007 to 2013, and from 2012 onwards, the proportion was < 1% every year. Dept. 9 deviated from this overall picture, as its proportion increased in the last part of the study period, from 2.5% in the years 2007–2014 (N = 57) to 12.1% in the years 2015–2019 (N = 234). Of the 397 patients with recorded HER2 status but unknown HER2 score, 200 came from Dept. 9 (all with recorded HER2/CEN17 ratio and 171 from the years 2015–2019) and 130 from Dept. 2 (109 with recorded ratio).

Variability in HER2 status and HER2-low BC

Table 4 shows variability in HER2 status for the 48,382 patients with recordings of both HER2 score and HER2 status. Among these patients, 6765 (14.0%) had positive HER2 status and 28,633 (59.2%) belonged to the HER2-low group. HER2 positivity rates ranged from 13.1 to 14.6% among regions (P = 0.004, relative difference 0.11), from 11.8 to 17.2% among departments (P < 0.0001, relative difference 0.46), and from 12.6 to 15.7% over the years (P = 0.005, relative difference 0.25).

Table 4 Variability in HER2 status among patients with recordings of both HER2 score and HER2 status

The proportion of HER2-low cases ranged from 48.3 to 64.5% among regions (P < 0.0001, relative difference 0.34), from 46.3 to 71.8% among departments (P < 0.0001, relative difference 0.55), and from 49.3 to 65.6% over the years (P < 0.0001, relative difference 0.33). When the eight pathology departments with more than 3000 BC patients were considered separately, the frequency of score 0 ranged from 18.1 to 38.4% and the proportion of HER2-low cases from 49.2 to 70.0% (P < 0.0001 in both cases). In Fig. 3, the HER2-low rates in these eight departments are illustrated over time. As shown, the dispersion increased from 2011 to 2019: in 2011, the range was 52.5–64.9%, whereas in 2019, it was 46.5–81.6%. In the three departments with the highest patient count (Dept. 1, 2, and 4), HER2-low rates ranged from 54.4 to 60.0% (P < 0.0001).

Fig. 3

Frequency of HER2-low breast cancer among patients with recordings of both HER2 score and status in the eight pathology departments with more than 3000 breast cancer patients

Of note, HER2 positivity rates showed only slightly higher variability than ER positivity rates (cf. Table 2), which ranged from 85.4 to 86.5% among regions (P = 0.22), from 81.8 to 88.2% among departments (P < 0.0001), and from 82.0 to 87.4% over the years (P < 0.0001).

IHC assays

Figure 4 shows the assays and staining platforms used for HER2 IHC in all Danish pathology laboratories. A general shift from HercepTest™ antibodies K5207 and SK001 (Dako/Agilent) toward PATHWAY antibody 4B5 790-2991 (Ventana/Roche) is evident. In 2007, 11 out of 14 laboratories used different HercepTest™ assays, whereas from 2012 onwards, 11 out of 13 laboratories used 4B5 790-2991, including the eight departments with the highest number of BC patients.

Fig. 4

Assays and staining platforms for HER2 immunohistochemistry (data kindly provided by NordiQC). (CDx, Companion diagnostics; LDT, Laboratory developed test)

Multivariable logistic regression

By multivariable logistic regression, we examined the impact of department, year, and IHC assay on the odds of being classified as HER2 positive or as HER2 score 0, respectively, as reported in Table 5. Besides an analysis of the entire study period (2007–2019), we analyzed the last six years alone (2014–2019), as this period gave a more up-to-date picture and covered only two guideline editions with very similar scoring criteria (cf. Table 1). In the analysis of the last six years, we excluded Dept. 3 due to the shutdown of its laboratory in January 2012 and Dept. 10 due to a patient count of only 22.

Table 5 Multivariable logistic regression testing the impact of department, year, and immunohistochemical assay on the odds of being classified as either HER2 positive or HER2 score 0—performed for both the entire study period and for the last six years

The examining pathology department was significantly related to HER2 positivity (P < 0.0001 for both 2007–2019 and 2014–2019), with odds ratios (ORs) ranging from 0.84 (95% confidence limits (CL) 0.73–0.97) to 1.27 (95% CL 1.10–1.46) among all departments and from 0.86 (95% CL 0.76–0.98) to 1.16 (95% CL 1.03–1.30) for the eight departments with the highest patient count. Similarly, the examining pathology department had a significant impact on the odds for score 0 (P < 0.0001 for both 2007–2019 and 2014–2019), with ORs ranging from 0.25 (95% CL 0.22–0.30) to 1.41 (95% CL 1.19–1.67) among all departments and from 0.46 (95% CL 0.42–0.51) to 1.36 (95% CL 1.24–1.49) for the eight departments with the highest patient count.

In the analysis of the entire study period, IHC assay was significantly related to HER2 score 0 (P < 0.0001) but not to HER2 positivity (P = 0.08), whereas the assay had no significant impact in the period 2014–2019, when 11 out of 12 laboratories in the model used the same assay (P > 0.5 for both HER2 positivity and score 0). Year of diagnosis was significantly related to HER2 score 0 both in the entire period and in the last six years (P < 0.0001 in both cases), but was related to HER2 positivity only in the analysis of the entire period (P = 0.01 vs. P = 0.15 for the last six years).

Tests for interactions in the model of HER2 positive vs. HER2 negative showed significant interactions in the years 2007–2019 between department and year (P = 0.002, indicating that HER2 positivity rates developed differently at the departments across the years) and between department and assay (P < 0.001, indicating that the impact of assay on HER2 positivity differed among departments), but not between assay and year (P = 0.33, indicating that the impact of assay was stable across the years); for the years 2014–2019, no significant interactions were found. In the model of score 0 versus 1+, 2+, and 3+, significant pairwise interactions were found among department, year, and assay for the years 2007–2019 (P < 0.0001 for department/year, department/assay, and assay/year alike), indicating that the frequency of score 0 developed differently at the departments across the years, as exemplified above, and that the impact of assay differed among departments and across the years; for the years 2014–2019, significant interactions were demonstrated between department and year (P < 0.0001) and between assay and year (P = 0.002).

Discussion

With the development of new, more effective anti-HER2 agents, patients with HER2-low BC may now benefit from HER2-targeted treatment. These advances, however, call into question whether the current HER2 test method can discriminate HER2-low disease with reasonable reproducibility [24].

We performed a nationwide registry study on 50,714 women diagnosed with BC in the period 2007–2019, using data from daily clinical practice across all Danish pathology departments. HER2 score and status were recorded for 48,382 patients (95.4%), among whom 59.2% belonged to the HER2-low group and 14.0% were HER2 positive. The proportion of patients with HER2-low disease varied by 25.5 percentage points among departments (range 46.3–71.8%, relative difference 0.55) and 16.3 percentage points over the years (range 49.3–65.6%, relative difference 0.33). Notably, in the eight pathology departments with the highest number of patients, variability in HER2-low rates increased from 2011 onwards, although the same IHC assay and staining platform were used. In comparison, the proportion of HER2-positive cases varied by 5.4 percentage points among departments (range 11.8–17.2%, relative difference 0.46) and 3.1 percentage points over the years (range 12.6–15.7%, relative difference 0.25). By multivariable logistic regression, the examining pathology department was significantly related to both HER2 score 0 and HER2 positivity (P < 0.0001) but showed greater dispersion in ORs in the former case (range 0.25–1.41 vs. 0.84–1.27 among all departments). Overall, IHC assay and year of diagnosis were stronger predictors of HER2 score 0 than of HER2 positivity.

Consequently, the assessment of HER2-low BC showed markedly higher inter-laboratory variability in absolute terms than the assessment of HER2-positive disease, although the relative differences were of similar magnitude. The findings cast doubt on whether the current test method can be used for allocating patients with HER2-low BC to HER2-targeted treatment in daily clinical practice. With the ambition of targeting HER2-low BC therapeutically, reliable and robust delimitation of score 1+ from score 0 is essential, as false results may lead to patients being wrongly assigned to or withheld from treatment. Therefore, if reproducibility is not improved significantly, our data may support offering T-DXd to all patients with metastatic HER2-negative BC, rather than to HER2-low patients alone, given the high efficacy of T-DXd reported by Modi et al. [2]. Indeed, phase II results for T-DXd did show some activity in BC with a score of 0 [25], supposedly primarily in cases with sporadic (≤ 10%) incomplete membrane reaction; this subgroup is therefore also eligible for randomization in the ongoing phase III trial for T-DXd, DESTINY-Breast06 (ClinicalTrials.gov ID: NCT04494425). Our findings stress the need for standardized procedures, as well as further investigation of assay interchangeability. In addition, our findings support the reassessment of previously stained HER2 slides if a metastatic lesion cannot be biopsied. The overall proportion of HER2-low cases in our study is in line with other population-based investigations [5, 6].

Limitations to the study include variability among departments in the proportion of patients with unreported HER2 score. This could be a source of bias, as these patients may not have shown the same patterns of HER2 expression as patients with known HER2 score. In fact, the group of patients with unreported HER2 score but recorded HER2 gene status (N = 371) was enriched for HER2-positive cases (23.7%). Most of these patients came from Dept. 9 (N = 200) and Dept. 2 (N = 109), suggesting some local underreporting of score 2+/3+. However, overall data completeness was high and improved over the years.

The high variability in HER2-low BC presented in the current study is consistent with recent data from CAP's quality assurance program, where tissue microarray cores from 80 BC cases were stained and scored for HER2 at 1400 laboratories [26]. Here, 15 out of 56 cases considered as score 0 or 1+ had less than 70% inter-rater agreement. In the same study, a data set of 170 scanned slides assessed by 18 experienced pathologists showed only 26% concordance between HER2 score 0 and 1+ as compared to 58% between score 2+ and 3+ [26].

This and other studies suggest that the variability demonstrated in the present study is largely attributable to variability in the evaluation of the IHC stains [26,27,28,29,30]. Indeed, the scoring methodology relies on subjective interpretation, and the scores have shown considerable inter-rater variability in several studies, especially for the intermediate scores, as might be expected [28, 29, 31,32,33,34]. As regards HER2-low disease, the decisive distinction is between score 0 and 1+, which has until now been clinically inconsequential; accordingly, pathologists may have adhered less rigorously to the ASCO/CAP criteria for this distinction and may only rarely have conferred with colleagues on such cases. This may have further increased variability in HER2-low rates. In light of this, ASCO/CAP now recommends that cases close to the interpretive threshold between score 0 and 1+ be assessed by two pathologists at 40× magnification [35].

In addition, discrepancies in staining protocols and assays (analytical differences) and in the handling of the tissue (pre-analytical differences) may have contributed to the high variability. Notably, the currently available IHC assays are designed for detecting HER2-positive cases, where the number of HER2 receptors per cell is 25–100 times higher than in normal breast tissue and in BC cases with a score of 0 or 1+ [36, 37]. It is therefore not surprising that the assays lack both sensitivity and specificity in the low-HER2 dynamic range. Moreover, different IHC assays show different staining patterns, as recently demonstrated by Agilent Technologies, whose latest HercepTest™ assay reportedly lowered the frequency of score 0 by 37.5% [38]. However, from 2012 to 2019, 11 out of 13 Danish pathology departments used the same assay and staining platform for HER2 IHC (cf. Fig. 4), although possibly with some discrepancies in protocols, and in all Danish pathology departments, the quality of HER2 IHC and ISH is subject to close external control in a common quality assurance program [23]. For pre-analytical factors such as time to ischemia, fixation, tissue preparation, section thickness, choice of control tissue, and whole-slide vs. tissue microarray evaluation, international standards are widely implemented, yet the impact of these factors is only sporadically monitored. Among these, cold ischemia and underfixation are probably the best elucidated in BC, as delayed and poor fixation reduced HER2 immunoreaction in several studies [39,40,41]. In fact, fixation itself is reported to reduce HER2 receptor antigenicity [42, 43]. The impact of these factors may be relatively greater in the low-HER2 range.

Regarding possible solutions to the high variability in HER2-low rates, we consider it plausible that increased awareness and a formal redefinition of the dichotomous HER2 status (as recently proposed by the European Society for Medical Oncology [44]) will in themselves help reduce differences in scoring practice [24]. Moreover, training of pathologists, possibly assisted by digital learning tools, could improve concordance, just as digital image analysis calibrated to distinguish score 0 from 1+ could be a helpful supplement to light microscopy [32, 45, 46]. In addition, our data may indicate that central review could be part of the solution, as the three departments with the highest patient count differed by only 5.6 percentage points in HER2-low rates; this comes, however, at a cost in terms of turnaround time. Finally, the introduction of novel molecular analyses must be considered, e.g., as an add-on in case of score 0 or 1+. In theory, quantitative measurements of the treatment target, i.e., the HER2 receptor, or of a closely related surrogate marker such as mRNA would be preferable to biomarkers reflecting more upstream molecular events, e.g., gene amplification. Kennedy et al. measured the number of HER2 receptors in HER2-negative tumors by means of targeted mass spectrometry and showed an implied positive correlation with IHC score, although with great variance around the trend and great overlap between the scores (thereby possibly illustrating the inaccuracy of the current test method in the low end of the scoring system) [47]. Recently, Moutafi et al. [48] introduced quantitative immunofluorescence of HER2 in HER2-low BC, showing good association with targeted mass spectrometry and decent association with IHC. RNA-based methods remain to be investigated properly in HER2-low disease [47, 49] but have previously shown conflicting results in gene-amplified BC [50].
In contrast to IHC, proteomic and transcriptomic methods for HER2 quantification provide a normal range for HER2 expression. Our findings highlight the need for further investigation into these methods in search of a quantitative, clinically feasible, and reproducible alternative to IHC.

Conclusions

The findings of this nationwide real-world data study showed high inter-laboratory variability in the assessment of HER2-low BC. The results cast doubt on whether the current test method for HER2 is robust and reliable enough to select HER2-low patients for HER2-directed treatment in daily clinical practice. Our data stress the need for standardized procedures, as well as further research into new, quantitative methods for HER2-low testing.