Introduction

Breast cancer is the most common type of cancer amongst women worldwide and the leading cause of cancer-specific death for women in Europe [1]. Oestrogen receptor (ER) and progesterone receptor (PR) expression are the oldest biomarkers in breast cancer [2, 3].

Different methods exist for determining the expression of hormone receptors (HRs). The tissue can be analysed using enzyme immunoassays (EIA), in which the amount of HRs is expressed in fmol/mg, defining HR positive as 15 fmol/mg or more [4, 5]. More recently, however, immunohistochemistry (IHC) has been the preferred method of staining hormone receptors. The number of cells expressing HRs is counted, generating a percentage of positive cells [6]. Different cut-off levels are used to determine whether a tumour is considered HR positive. Usually, a tumour is considered HR positive when more than 10% of the tumour cells express HRs [7, 8].

Furthermore, nuclei can be grouped into categories of negative, weak, moderate and strong nuclear staining to generate a continuous histoscore ranging from 0 to 300. The histoscore is calculated as the sum of the percentage of weakly stained cells multiplied by 1, the percentage of moderately stained cells multiplied by 2, and the percentage of strongly stained cells multiplied by 3 [9]. Tumours with a histoscore of 50 or more are usually considered HR positive.
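As an illustration only, the histoscore calculation described above can be sketched in Python. The function and parameter names are hypothetical and are not taken from the cited studies; the cut-off of 50 is the one quoted in the text, and the input validation is an added assumption.

```python
def histoscore(weak_pct: float, moderate_pct: float, strong_pct: float) -> float:
    """Return the histoscore (0-300) from the percentages of stained tumour cells.

    Each argument is the percentage (0-100) of tumour cells in that intensity
    category; unstained cells contribute nothing to the score.
    """
    percentages = (weak_pct, moderate_pct, strong_pct)
    # Assumed sanity check: categories are mutually exclusive, so they cannot exceed 100%.
    if any(p < 0 for p in percentages) or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * weak_pct + 2 * moderate_pct + 3 * strong_pct


def is_hr_positive_by_histoscore(score: float, cutoff: float = 50) -> bool:
    """Apply the usual cut-off quoted in the text (histoscore of 50 or more)."""
    return score >= cutoff


if __name__ == "__main__":
    score = histoscore(weak_pct=20, moderate_pct=30, strong_pct=40)
    print(score, is_hr_positive_by_histoscore(score))
```

In this hypothetical example, a tumour with 20% weakly, 30% moderately and 40% strongly stained cells has a histoscore of 20 + 60 + 120 = 200.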

Additionally, the Allred scoring system has been used, which is a semi-quantitative measure that takes into consideration the proportion of positive cells (scored on a scale of 0–5) and staining intensity (scored on a scale of 0–3). The sum of these produces a score between 0 and 8, and tumours with a score of 3 or more are usually considered HR positive [10].
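The Allred score can be sketched in the same way. The text above gives only the two scales and the cut-off; the percentage bins used here for the proportion score (0%, below 1%, 1–10%, up to one third, up to two thirds, above two thirds) are the commonly cited ones and should be read as an assumption, as should all function names.

```python
def allred_proportion_score(positive_pct: float) -> int:
    """Map the percentage of positive cells to a proportion score of 0-5.

    The bin boundaries are an assumption (commonly cited Allred bins);
    they are not specified in the text.
    """
    if not 0 <= positive_pct <= 100:
        raise ValueError("positive_pct must lie between 0 and 100")
    if positive_pct == 0:
        return 0
    if positive_pct < 1:
        return 1
    if positive_pct <= 10:
        return 2
    if positive_pct <= 100 / 3:
        return 3
    if positive_pct <= 200 / 3:
        return 4
    return 5


def allred_score(positive_pct: float, intensity: int) -> int:
    """Return the Allred score (0-8): proportion score (0-5) plus intensity (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    proportion = allred_proportion_score(positive_pct)
    # Assumed consistency check: no positive cells implies no staining intensity.
    if (proportion == 0) != (intensity == 0):
        raise ValueError("proportion and intensity must both be zero or both be non-zero")
    return proportion + intensity


def is_hr_positive_by_allred(score: int, cutoff: int = 3) -> bool:
    """Apply the usual cut-off quoted in the text (score of 3 or more)."""
    return score >= cutoff
```

Under these assumed bins, a tumour with 5% weakly stained cells scores 2 + 1 = 3 and would be considered HR positive, whereas 0.5% weakly stained cells scores 1 + 1 = 2 and would not.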

Another semi-quantitative measure is the ER immunoreactive score (IRS), which also relies on the proportion of positive cells and the staining intensity. This produces a score between 0 and 12, and tumours with a score of 2 or higher are considered HR positive [11].
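A comparable sketch can be given for the IRS. The text states only that the score combines proportion and intensity and ranges from 0 to 12. The sketch below assumes the usual formulation, in which proportion points (0–4) are multiplied by an intensity score (0–3), and it assumes commonly cited percentage bins. Both assumptions, and all names, are illustrative rather than taken from the cited study.

```python
def irs_proportion_points(positive_pct: float) -> int:
    """Map the percentage of positive cells to 0-4 points.

    Assumed bins: 0%, below 10%, 10-50%, 51-80%, above 80%.
    """
    if not 0 <= positive_pct <= 100:
        raise ValueError("positive_pct must lie between 0 and 100")
    if positive_pct == 0:
        return 0
    if positive_pct < 10:
        return 1
    if positive_pct <= 50:
        return 2
    if positive_pct <= 80:
        return 3
    return 4


def immunoreactive_score(positive_pct: float, intensity: int) -> int:
    """Return the IRS (0-12), assumed here to be proportion points times intensity (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return irs_proportion_points(positive_pct) * intensity


def is_hr_positive_by_irs(score: int, cutoff: int = 2) -> bool:
    """Apply the cut-off quoted in the text (score of 2 or higher)."""
    return score >= cutoff
```

Because the two components are multiplied rather than added, the IRS weights strongly stained, widely positive tumours more heavily than the Allred score does.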

More than 15 years ago, IHC was proposed as the reference method by different boards and peer committees [12], and a dichotomous, qualitative scale of HR expression (i.e. “positive” or “negative”) was unanimously adopted [13]. This method remains the gold standard for HR expression evaluation [14].

Although it is generally claimed that tumours with strong ER and/or PR expression are more sensitive to endocrine therapy (ET), there is no clear definition of weak or strong ER/PR expression. The most recent guidelines, proposed by the St. Gallen International Expert Consensus Conference on the Primary Therapy of Early Breast Cancer in 2017, briefly mention high ER expression as a characteristic of a low-risk tumour and vice versa, but fail to provide any definition or cut-off value to determine which tumours are in fact high in ER expression [14]. As there is no consensus on the value of quantitative HR expression analysis, it is not (yet) common practice to report on HR load in the clinical setting.

This systematic review gives an overview of the methods to quantitatively assess HR load and of the predictive and prognostic value of determining the HR load, gives recommendations for clinical practice and discusses future developments for HR analysis and endocrine treatment.

Methods

Data searches and study selection

To obtain all the relevant literature, the electronic databases PubMed, Embase and Web of Science were searched in March 2018, using the keywords presented in Table 1. The complete search string for all databases can be found in supplementary Table 1. This search was updated in August 2018 and in January 2019. According to PRISMA guidelines for systematic reviews, two of the authors (IN and AFG) individually and independently screened the articles for predefined inclusion criteria [15]. These were stated as follows:

  • The article was published in English in a peer reviewed journal;

  • The article was a primary report of original data;

  • The study concerned women diagnosed with stage 1–3 adenocarcinoma of the breast;

  • The tumour’s ER and/or PR expression was analysed using IHC (the international gold standard);

  • ER and/or PR expression was reported quantitatively (continuous) or semi-quantitatively (minimum of three groups);

  • Within the subset of HR-positive cases, the (semi-)quantitative measure of ER and/or PR was analysed in association with the primary clinical endpoint.

Table 1 Key words used for data search

Only studies on which the reviewers reached consensus were included. If needed, a third reviewer was consulted.

Due to the retrospective nature of most included studies, it was elected not to perform a risk of bias assessment. Each study was awarded a level of evidence according to the Oxford Centre for Evidence-Based Medicine [16].

Data extraction

All data from the included studies were analysed and data regarding the following items were extracted:

  • Number of participating patients;

  • Method of HR expression determination;

  • Method of HR expression quantification;

  • Type of systemic treatment (ET and chemotherapy) and timing (adjuvant or neoadjuvant);

  • Primary clinical endpoint and follow-up time;

  • Association between the primary clinical endpoint and quantified HR expression.

Due to the heterogeneity of the included studies, data were not pooled, and no meta-analyses were performed.

Results

Characteristics of included studies

Using the key words presented in Table 1, 777 unique articles were identified. After matching these to the inclusion criteria, 19 articles were included. The most common ground to exclude studies was not reporting ER and/or PR expression (semi-)quantitatively (n = 273) (Fig. 1). Combined, all included studies comprised a cohort of 30,754 patients.

Fig. 1 CONSORT diagram to account for excluded studies. ER Oestrogen receptor, PR Progesterone receptor, IHC Immunohistochemistry

Quantitative assessment of HR expression

Of the nineteen included studies, six performed HR staining on whole-section slides of the tumour tissue, whereas nine first created tissue microarrays (TMAs), in which several cores are taken from the tissue blocks and HR staining is performed on these cores instead of on whole-section slides. In four studies, it was not specified how the staining was performed.

In five studies, a continuous quantitative measure (percentage or histoscore) was used to determine HR load. In four studies, patients were divided into groups of negative, low and high expression, and in nine studies patients were divided into four or more groups according to HR expression. In one study, both a continuous and a semi-quantitative measure were used.

Systemic treatment of included patients

Of all included patients, 5812 were treated with tamoxifen, 3111 were treated with an aromatase inhibitor (AI), 2164 were treated with a combination of tamoxifen and an AI, 6614 were treated with unspecified ET and 7769 patients did not receive any ET. For 5284 patients, it was not specified whether they received ET (Fig. 2). Additionally, 10,036 patients were treated with chemotherapy and 12,930 patients did not receive chemotherapy; for 7788 patients, it was not specified whether they received chemotherapy. Treatment with anti-HER2 medication was explicitly stated for only three patients [17]. Supplementary Table 2 provides detailed information on all included studies.
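The treatment counts reported above can be checked against the total cohort of 30,754 patients with a short tally. The sketch below uses only the numbers given in the text; the variable and function names are hypothetical.

```python
# Patient counts per endocrine therapy (ET) category, as reported in the text.
ET_COUNTS = {
    "tamoxifen": 5812,
    "aromatase_inhibitor": 3111,
    "tamoxifen_plus_ai": 2164,
    "unspecified_et": 6614,
    "no_et": 7769,
    "et_not_specified": 5284,
}

# Patient counts per chemotherapy category, as reported in the text.
CHEMO_COUNTS = {
    "chemotherapy": 10036,
    "no_chemotherapy": 12930,
    "chemotherapy_not_specified": 7788,
}

TOTAL_PATIENTS = 30754  # combined cohort of all included studies


def categories_cover_cohort(counts: dict, total: int) -> bool:
    """Return True when the mutually exclusive categories sum to the cohort size."""
    return sum(counts.values()) == total


if __name__ == "__main__":
    print(categories_cover_cohort(ET_COUNTS, TOTAL_PATIENTS))
    print(categories_cover_cohort(CHEMO_COUNTS, TOTAL_PATIENTS))
```

Both sets of categories add up to the full cohort, which indicates that each patient is counted exactly once per treatment type.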

Fig. 2 Distribution of types of endocrine therapy over patients. TAM Tamoxifen, AI Aromatase inhibitor, ET Endocrine therapy

Table 2 Overview of results of the included articles studying oestrogen receptor (ER) load in 26,259 patients

Overall association ER load and clinical outcome

In 17 of the 19 included studies, the ER load was analysed, in a total of 26,259 patients with stage 1–3 breast cancer (Table 2) [11, 17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]. In 11 studies, HR-negative patients were also included; all reported associations between the ER load and the primary outcome measure regard the subset of HR-positive cases only. Disease-free survival (DFS) was used as primary outcome measure in seven studies, overall survival (OS) was used in five studies, recurrence in three studies and breast cancer-specific mortality in two studies.

When studying ER load as a continuous measure (either using percentage or histoscore, n = 6), a higher ER load was significantly associated with better clinical outcome in two studies and marginally significantly associated in one study, whereas three studies did not find a significant association between higher ER load and better clinical outcome.

When dividing patients into three groups, i.e. ER-negative, low ER and high ER expression (n = 4), three studies did not find a significantly longer DFS for patients in the high ER expression group than for patients in the low ER expression group, and one study found only a marginally significant association between longer OS and higher ER expression.

When dividing patients into four or more groups based on ER expression (n = 8), seven studies did not find a significantly better clinical outcome for patients with a higher ER expression and one study found only a marginally significant association between better clinical outcome and higher ER expression.

These results are summarised in Table 2.

Using ER load as a prognostic and/or predictive marker

There was only one randomised trial that compared patients with and without ET, which could be used to analyse the prognostic and predictive properties of the quantitative ER load without the risk of bias due to treatment indication. This study by Chapman et al. (n = 345) [21] performed staining on TMAs, used a continuous quantitative measure and found no significant correlation between higher ER percentages and longer DFS in the overall study population (p = 0.24). The authors did not find any association between higher ER load and longer DFS in the subgroup that was randomised to receive no adjuvant ET and thus concluded that the quantitative ER load cannot be used as a prognostic marker. They also did not find any association between higher ER load and longer DFS in the subgroup of patients treated with adjuvant ET and, therefore, concluded that quantitative ER load is not an adequate predictive marker for sensitivity to ET either.

Overall association PR load and clinical outcome

Of the 19 included studies, ten analysed PR load in 14,161 early breast cancer patients (Table 3) [18,19,20,21,22,23,24, 29, 33, 34]. In six studies, HR-negative patients were also included; all reported associations between the PR load and the primary outcome measure regard the subset of HR-positive cases only. DFS was used as primary outcome measure in six studies, three studies used recurrence as primary outcome and one study used breast cancer-specific mortality.

Table 3 Overview of results of the included articles studying progesterone receptor (PR) load in 14,161 patients

When studying PR load as a continuous measure (n = 6), a higher PR load was found to be significantly associated with better clinical outcome in two studies, a higher PR load was found marginally significantly associated with better clinical outcome in one study, and three studies did not find any association between PR load and clinical outcome.

When dividing patients into PR-negative, low PR and high PR expression groups (n = 2), DFS was not significantly longer in the high PR expression group than in the low PR expression group.

When dividing patients into four or more groups based on their PR expression (n = 3), clinical outcome was not significantly better for a higher PR load.

These results are summarised in Table 3.

Using PR load as a prognostic and/or predictive marker

There were two randomised trials comparing patients with and without ET, which could be used to analyse the prognostic and predictive properties of the quantitative PR load without risk of bias [21, 34]. Both studies randomised patients to tamoxifen or no adjuvant ET and used TMAs to stain the PR.

The study by Chapman et al. (n = 345) [21] used a continuous quantitative measure and found no association between a higher PR percentage and longer DFS in the overall randomised study population (p = 0.04; uncorrected for multiple testing). They did not find any association between higher PR load and longer DFS in the subgroup that received no adjuvant ET, nor in the subgroup of patients treated with adjuvant ET.

The study by Nordenskjold et al. (n = 449) [34] divided patients into seven groups based on the number of positive PR staining cells and did not find an association between the PR percentage groups and the occurrence of disease recurrences in the overall study population. They also found no association between higher PR load and fewer disease recurrences, either in the subgroup of patients that received ET or in the subgroup that did not.

Thus, both studies concluded that quantitative PR load is not an adequate tool to determine the prognosis of early breast cancer patients, nor to predict sensitivity to ET.

Interaction between ER and PR

Of the eight studies that examined both the ER and PR load, only two studied the interaction between ER and PR load. The study by Campbell et al. [19] found a statistically significant interaction between the quantitative ER and PR load, and found a significant association between higher PR load and better outcome only in those patients that also had a higher ER load. The study by Harigopal et al. [24] found a moderate correlation between continuous quantitative ER and PR percentage (Pearson r = 0.43, p < 0.0001).

This suggests that the quantitative PR load is not independently associated with outcome, but only in relation to the quantitative ER load.

Discussion

Many efforts have been made to identify biomarkers or profiles in breast cancer patients capable of predicting sensitivity to endocrine treatment and the risk of recurrence after treatment is discontinued [35, 36]. One of these methods is the quantitative assessment of ER and PR expression, i.e. the ER and PR load, instead of merely assigning tumours an ER and PR-positive or negative status [37].

This review concludes that in patients with an ER-positive tumour (defined as ER > 10%), a higher ER load as assessed by IHC is not correlated with better outcome, and no evidence could be found for using quantitative ER load as a prognostic marker. In other words, patients with a higher ER load (e.g. 100%) do not inherently have a better prognosis than patients with a lower ER load (e.g. 20%). Furthermore, no evidence could be found for using quantitative ER load as a predictive marker, i.e. patients with a higher ER load do not benefit more from ET than patients with a lower ER load.

This review also concludes that in patients with an HR-positive tumour, higher PR load does not seem to be correlated with better outcome. Based on the included studies, quantitative PR load is not a suitable prognostic marker; patients with a higher PR load do not inherently have a better prognosis than patients with a lower PR load, nor is it suitable as a predictive marker. Furthermore, PR load seems to interact with ER load and is therefore not recognised as an independent predictor.

One of the included studies, by Esslimani-Sahla et al. [23], found an unusually high number of recurrences and found an association between recurrence and ER load only when examining ERβ, not when examining ERα. As this is the only study that specifically examines ERβ, it is somewhat of an outlier, and its results should be interpreted with caution.

In this analysis, only studies that examined HR expression using IHC were included. This method is the gold standard for determining HR status, and other methods, such as EIA or mRNA expression profiles, are not routinely used in clinical practice. Specifically, we have not focussed on articles studying EIA to determine HR status, as this method is outdated. Restricting the analysis to IHC also ensures that the included studies form a homogeneous cohort. Even so, different methods were used to stain the HRs, such as staining on whole-section slides or on TMAs, though this did not seem to influence the outcome; studies staining on TMAs were not less likely to find a correlation between HR load and outcome than studies staining on whole-section slides.

Studies also differed in how they quantified HR load; some used a continuous percentage or histoscore, some used groups of HR-negative, low HR expression and high HR expression, and some divided patients into four or more groups based on Allred score or percentage. This does influence the outcome: studies were more likely to find a positive association between HR load and outcome if a continuous score was used. However, the use of a continuous quantitative measure to assess HR expression is questioned by several articles. Interobserver variability is high, and samples are assigned different HR percentages depending on the pathologist and the laboratory in which they were reviewed [38]. Most importantly, staining breast cancer tissue using IHC does not allow for measurement of HR load that is precise enough to generate a continuous score; it can only quantify expression as negative, weak positive or strong positive [7, 39, 40]. The problem with this approach is defining “weak” and “strong”. The lack of a generally accepted definition results in pathologists and papers choosing their own definitions, which makes it difficult to compare studies. Furthermore, as mentioned previously, the St. Gallen Consensus makes a distinction between high and low ER expression but fails to provide any definition or cut-off value to determine which tumours are in fact high in ER expression [14]. The St. Gallen Consensus does not mention high and low PR expression at all.

Based on the results of this review, we propose using both ER and PR expression only as a qualitative measure, defining tumours with < 1–10% of cells expressing the receptor as negative and tumours with more than 1–10% of cells expressing the receptor as positive [17, 41]. Using a continuous quantitative measure does not seem feasible without centralised, unambiguous and clear pathological measurement. The implications for the daily clinical practice of pathologists are that more detailed information on the HR status beyond “positive” or “negative” should no longer be provided, to prevent oncologists from subconsciously or instinctively making different treatment decisions based on this information. Since there is no evidence for different treatment strategies, providing extra information is both unnecessary and undesirable.

At the same time, one can question whether measuring the PR status has any added value at all. It is generally accepted that there is no such thing as an ER-negative/PR-positive tumour [42, 43]. Since the quantitative PR load is correlated with the quantitative ER load and is inversely correlated with the histological grade of the tumour [19, 24], the question arises whether PR status provides any additional prognostic information when ER status, grade and potentially a proliferation factor such as Ki-67 are known. Likewise, guidelines on adjuvant treatment do not propose different treatment strategies for tumours that are ER positive/PR positive compared to tumours that are ER positive/PR negative [8, 14]. Therefore, if the PR status is unlikely to change the course of treatment, it could be considered wasteful and excessive to continue measuring it [44, 45]. It might be worthwhile to focus future research on the independent contribution of PR status using multivariable models.

Gene expression profiling can be used to identify two inherently different entities within breast cancer, known as luminal-A and luminal-B. These are intrinsic molecular subtypes that reflect a different tumour biology and disease prognosis. Unfortunately, neither subtype is predictive of a better response to ET [29, 46,47,48,49,50]. Moreover, the gene expression profiles used to differentiate between these subtypes are expensive and not universally available, and their added value for daily clinical practice, in particular for which subgroup of patients, is still debated [35]. For these reasons, researchers have tried to approximate the distinction between these molecular subtypes using IHC, which resulted in subtypes called luminal-A-like and luminal-B-like [29]. When defining IHC-based luminal-A-like tumours as HR positive, HER2 negative and Ki-67 below 14%, approximately 81–85% of luminal-A tumours were correctly identified as luminal-A-like. However, approximately 35–52% of luminal-B tumours were incorrectly identified as luminal-A-like. When expanding the definition of luminal-A-like to HR positive, HER2 negative, Ki-67 below 14% and PR above 20%, the specificity improves somewhat but not enough to accurately discriminate between the two subtypes [29, 48, 50].

With these considerations, and the lack of prognostic and predictive value of IHC-assessed quantitative ER and PR load as shown in this review, the distinction between IHC-based luminal-A-like and luminal-B-like tumours should not be used to tailor treatment decisions for women with HR-positive stage 1–3 breast cancer.

Future perspectives

All in all, identifying the early breast cancer patients that could benefit most from ET remains a challenge, as more than half of all patients with an ER-positive breast cancer will not respond to ET [51]. Considering the frequent and often severe side effects of ET, an improved upfront selection of likely responders may lower the treatment burden. Since quantitative measurement of HR does not seem an appropriate instrument for identifying these patients, the oncologic community is searching for other, much-needed means to predict response to ET. One potential method is to measure the activity of the ER pathway to distinguish in which patients the oestrogen receptor is not only expressed but also active and thus a suitable target for ET [52]. The use of predictive biomarkers in the neoadjuvant ET setting will be clearer after results of ongoing trials become available. Potentially, response to neoadjuvant therapy can be measured at a per-patient level using postoperative pathology, bypassing the need for predictive markers altogether.

Conclusion

There is no clear evidence for using quantitative ER and PR load assessed by immunohistochemistry either as a prognostic marker or as a predictive marker for response to ET in patients with stage 1–3 breast cancer. Immunohistochemistry is the gold standard for measuring HR status but should only be used to distinguish HR-negative and HR-positive tumours. Gene expression profiles have prognostic value for women with ER-positive disease, and early response evaluation to neoadjuvant therapy holds promise in the prediction of long-term response to endocrine therapy.