Co-expressed genes enhance precision of receptor status identification in breast cancer patients

Purpose Therapeutic decisions in breast cancer patients crucially depend on the status of estrogen receptor, progesterone receptor and HER2, obtained by immunohistochemistry (IHC). These are known to be inaccurate sometimes, and we demonstrate how to use gene-expression to increase precision of receptor status. Methods We downloaded data from 3241 breast cancer patients out of 36 clinical studies. For each receptor, we modelled the mRNA expression of the receptor gene and a co-gene by logistic regression. For each patient, predictions from logistic regression were merged with information from IHC on a probabilistic basis to arrive at a fused prediction result. Results We introduce Sankey diagrams to visualize the step by step increase of precision as information is added from gene expression: IHC-estimates are qualified as ‘confirmed’, ‘rejected’ or ‘corrected’. Additionally, we introduce the category ‘inconclusive’ to spot those patients in need for additional assessments so as to increase diagnostic precision and safety. Conclusions We demonstrate a sound mathematical basis for the fusion of information, even if partly contradictive. The concept is extendable to more than three sources of information, as particularly important for OMICS data. The overall number of undecidable cases is reduced as well as those assessed falsely. We outline how decision rules may be extended to also weigh consequences, being different in severity for false-positive and false-negative assessments, respectively. The possible benefit is demonstrated by comparing the disease free survival between patients whose IHC could be confirmed versus those for which it was corrected. Electronic supplementary material The online version of this article (10.1007/s10549-018-4920-x) contains supplementary material, which is available to authorized users.


Background and significance
Individualized breast cancer therapy is based on molecular characterization [1][2][3], in particular the presence of receptors for estrogen (ER), progesterone (PGR) and human epidermal growth factor 2 (HER2) in an incoming patient. It is hence essential to reliably assess the status of these three receptors when aiming at optimum individualized therapy within precision medicine [1][2][3][4][5].
Receptor status obtained from immunohistochemistry (IHC) is usually considered standard of care, and crucially guides therapy. However, in up to 20% of patients, assigned ER + status may be erroneously classified [6][7][8]. Multicenter studies have been performed for quality assessment [9,10] and guidelines have been issued [8,11]. Possible consequences of misclassification on outcome have been evaluated Electronic supplementary material The online version of this article (https ://doi.org/10.1007/s1054 9-018-4920-x) contains supplementary material, which is available to authorized users. [12] and several authors have suggested making improvements on the reliability of IHC estimates by additionally considering gene-expression data [13][14][15][16].
In a previous paper [17], we have substantiated this suggestion by devising refined decision criteria based on geneexpression data.

Receptor status from IHC and one single gene
Our previous work [17] started out from IHC measurements (e.g. ER + IHC , ER − IHC and ER 0 IHC for estrogen positive, negative or missing). In a second step, estimates for gene-expression (GE) were added for ER (gene ESR1), for PGR (gene PGR) and for HER2 (gene ERBB2). Combined results were obtained in each patient via a scoring system based on all three receptors.
As a result, the IHC estimates of receptor status were questioned in a significant portion of patients. These patients might receive more adequate treatment due to an improvement of receptor status assessment, as proposed.

Adding co-genes
In the present work we now extend our previous analysis to qualified co-genes as suggested by several authors [18,19]. We were able to demonstrate how adding co-expression (CO) can even further improve receptor status assessment.
We first demonstrate how co-genes can be properly selected and why we ultimately chose AGR3 as co-gene for ESR1 [20,21], ESR1 as co-gene for PGR and PGAP3 as co-gene for HER2, see

Objectives regarding patient benefit
The usefulness of our method is assessed as follows: (a) Disease free survival curves are compared for those patients having their IHC estimate confirmed by both, GE as well as CO. They have received optimum therapy, as concluded from IHC alone. Second, we compute the disease free survival for those patients whose IHC estimates have been questioned by GE and/or CO. Therapies might have been erroneous, or at least suboptimal. The difference in disease free survival is considered a direct indicator of a benefit possibly being leveraged by this work. (b) Paired survival curves are computed for the ER, PGR and HER2.

Predictive co-genes
All genes were subjected to a numerical 'co-expression check' to ascertain their usability, for details see the methods section. All in all we ended up with pairs of receptor-genes and cogenes as shown in Table 1.

Predicting receptor status separately from genes and co-genes
For a given receptor, such as the ER, we performed two separate logistic regressions, one for the very receptor gene and a second one for a co-gene, see Fig. 1, left panel.  Table 8. Right panel: corresponding receiver operator characteristics (ROC)-curves. For quantitative estimates of regression quality, see Table 8 1 3 Each curve is represented by a logit function. For simplicity of notation, we exemplify the formalism only for the estrogen receptor: The differences between the curves in Fig. 1 are reflected in individual parameters β 0 and β 1 , resulting from different logistic regressions for each gene and co-gene. See supplementary material for numerical results and the methods section for computational details.
Upon entering the expression value, x GE , above formula yields the probability p ER + GE (x GE ) for the patient being receptor positive. Vice versus, the probability for being receptor negative is given by A similar formula is obtained for the co-gene of estrogen, AGR3, with different coefficients β 0 and β 1 , however. Thus, for a given receptor being positive we end up with two probabilities, p ER + GE (x GE ) and p ER + CO (x CO ).
The very same procedure applies to PGR and HER2. Mathematical details and values for β 0 and β 1 are given in supplementary material. Note that all curves tend towards p(x) = 1, since very high expression indicates receptor positivity with almost certainty.

Joint prediction of receptor status from IHC, genes and co-genes
In this section we demonstrate the benefit achieved by enriching IHC estimates with information from receptorgenes and co-genes.
Considering only IHC estimates, numbers of patients are given in column 'IHC only' of Table 2. Results '−' and '+' directly enter treatment allocation, patients with IHC estimates 'not available' cannot be properly allocated (no conclusions can be drawn, hence we use the term 'inconclusive' for the rest of this article). (1)

Probabilistic view on IHC estimates
As a first step towards joining information from IHC and gene-expression ( Fig. 1), IHC estimates are interpreted probabilistically as follows: 1. If an IHC-assay yields receptor positive, we do not take this for sure but attribute the precision p + IHC = 0.85 for the sample being truly positive and insert this into Eq. 2. This is reasonable, since we have to bear in mind that about 15% of IHC estimates are considered false [6,7]. 2. Conversely, if an IHC-assay yields receptor negative, we credit p + IHC = 0.15 for truly being receptor positive. 3. If an IHC estimate is not available, we attribute the precision of p + IHC = 0.5 . Note that this precision bears no context to the prevalence of receptor status.

Joint prediction from IHC, expression of genes and co-genes
For a specific patient, the probabilities obtained from IHC, gene-expression and co-expression have now to be fused to arrive at a joint estimate.
For reasons outlined in the methods section, we consider odds, aggregate them by adding their logarithms and arrive at a score S + for the patient being receptor positive: Numerical values for the parameters β are given in supplementary material, for each of the receptors. To arrive at a decision, this score S + is compared with a threshold, S 0 , which we set S 0 = log(0.85∕0.15) ≈ 1.735 = logit(precision). 1 This represents an executable procedure for aggregating information into a comprehensive receptor status assessment: For mathematical details and threshold setting, please see the methods section.
Combining information from IHC, gene-expression and co-expression yields the numbers of patients as shown in the rightmost parts of (2) 1 3

Overall improvement of receptor diagnostics based on joint assessment
We then analysed the overall improvement of receptor assessment due to adding expression data for the receptor gene and a co-gene. To illustrate the overall effect of such a joint assessment, flows of patients between diagnostic states 'IHC' and 'IHC & GE & Co' are shown in a Sankey diagram, see Figs. 2, 3 and 4. The Sankey diagram displays changes in estimated receptor status ('flows' of patients) after enriching information from IHC by information from GE and CO.
Since we discriminate three different categories ('+', '−' and 'inconclusive'), there are 9 possible types of flow from initial IHC estimates towards some final result which is based on all information available (IHC & GE & CO). Flows are labelled from (a) to (i), see also Table 3, and the examples below for ER, PGR and HER2.
The relevance of this sort of enriched receptor diagnosis is reflected in the fact that out of 9 patient flows possible in theory, each one actually occurs in practice.

Estrogen receptor assessment
As expected, the flow category 'confirmed' of the IHC estimates represent the largest flows [in Fig. 2: red → red (label a: 1562, ≈ 94%) and blue → blue (label b: 1219, ≈ 89%)]. The error rates reported (6% and 11%, respectively) are only Results are given separately for each receptor. For IHC (leftmost column) we discern the categories-/inconclusive (inc)/ +. In some cases information from IHC is not available but we use the term 'inconclusive' for consistency of notation. Information from gene expression (GE, CO) is but always available, however it may yield 'inconclusive' as a result, see the column headings seemingly contradictive with the initial guess of 15%, in fact they are not. 15% invalid IHC results have been reported in the literature (as quoted). Adding gene plus co-gene information fixes only a portion-not all of those. Very important are flows allocating missing IHC estimates from 'inconclusive' into 'definite', after adding information from GE & CO. They represent diagnostic improvements, resulting in ER + for ≈ 42% (92 patients) and in ER − for ≈ 42% (91 patients), see Fig. 2, labels (c) and (d), respectively.
Of utmost interest for patient safety are 'corrected' cases, in which the IHC estimate is converted into its opposite. Fortunately, we found only a few such cases: 52 (≈ 3%) correcting ER + → ER − and 68 (≈ 5%) correcting ER − → ER + , see labels (e) and (f), respectively. Even though improvements are small in terms of percentages, they helped to fine tune the treatment approach and be more precise in treatment selection for better results.
A third type of flow represents 'rejected' estimates, i.e. patients starting with a definite IHC estimate, which is questioned thereafter and ends up inconclusive after adding 'GE & CO'. In our data we observe 45 such cases for ER + (≈ 3%) and 78 for ER − (≈ 6%), see Fig. 2, labels (g) and (h), respectively. These cases also represent an improvement, even though the receptor status results inconclusive and has to be re-determined: This way, possible suboptimal treatments may be avoided.
The last flow represents 'inconclusive' patients (in our data 34, i.e. ≈ 16%) for which not even the full information (IHC & GE & CO) sufficed to arrive at a definite receptor status, see Fig. 2, label (i).

Progesterone receptor assessment
In most cases, enhanced information leads to the confirmation of PGR-status, see Fig. 3: red → red (label a: 808 patients) and blue → blue (label b: 1076 patients).
IHC estimates initially missing were upgraded into definitely PGR + in a flow comprising 373 patients and into definite PGR − in 477 patients, see Fig. 3, labels (c) and (d), respectively.
Cases in which PGR-status needs to be corrected are rare: 23 turning PGR + →PGR − (label e) and 25 PGR − → PGR + The flows leading into assessments in question are moderate in size: 93 patients initially within PGR + evade to 'inconclusive', see Fig. 3, label (g), and 135 initially PGR − end up 'inconclusive', see Fig. 3, label (h). As mentioned above for ER status, the category 'inconclusive' being rendered may be seen as a warning to improve assessment (in which way ever) so as to avoid possibly suboptimal treatment.
Inconclusive PGR-status remains as such in 231 patients, despite full information, see Fig. 3, label (i).
The overall improvement of PGR diagnostics is reflected in the increase of definite results from 2160 (= 924 + 1236) to 2882 (= 1206 + 1576), cf. Table 2 and Fig. 3. Concomitantly, the number of inconclusive receptor estimates declines from 1081 to 459.

HER2 assessment
Despite the availability of standardized HER2 testing strategies and the widespread use of ASCO/CAP guidelines, amplification results vary considerably. Our approach to enrich information for HER2 assessment, leads to confirmation in about 72% of HER2 + IHC patients, see Corrected cases for HER2 are asymmetric: 85 turn HER2 + → HER2 − (≈ 13%, label e) and 13 HER2 − → HER2 + (≈ 1%, label f), see the blue and the faint red flow crossing over into the opposite domains, respectively.

Selection of co-genes
One would expect co-genes could be found by looking for genes which show the strongest correlation with the corresponding receptor gene. This is not optimum, however, for the following reason: Given a gene with 100% correlation, it could clearly deliver no additional information on   top of the gene itself. Hence, looking for largest possible correlations is suboptimal. For this reason we applied linear discriminant analysis via the limma software package, as described in the methods section, results for the estrogen receptor see table 4. Discriminant analysis in fact led to the surprising finding that a co-gene (in this case ERS1) of progesterone may be more predictive than the very receptor gene itself (PGR).

Concordance of estrogen and progesterone receptor status
ER and PGR are concordant in the majority of cases. However, in accordance with literature [8] a small portion (23 ≈ 1.7%) of the patients assessed ER − IHC were at the same time found PGR + IHC in our dataset, see Table 5. Likewise, 240 patients assessed PGR − IHC were at the same time found ER + IHC . As a consequence, both receptors have to be considered in combination to optimize the stratification of therapies.

Impact of false positive hormone receptor assessment on outcome
In clinical practice, therapy is allocated according to IHC estimates. We know, however, that these may sometimes be inaccurate, and we have to envisage worse outcomes as compared to patients with correctly assessed receptor status. In order to quantify these effects (based on our model with parameters given in Table 8) we build sets of patients as follows, cf. Fig. 2 Kaplan Meier estimates of disease-free survival were computed separately for positive estrogen receptor status assigned correctly ({ER a }) and erroneously ({ER e,g }), see      Fig. 6, left panel. On the contrary, 639 patients have originally been assessed HER2 + IHC , out of which 458 were confirmed, 85 corrected towards negative (flow e) and 96 rendered inconclusive (flow g). The merged set {HER2 e,g } = {HER2 e } ∪ {HER2 g } is comprised of 181 patients who may have received unnecessary anti-HER2 therapy. The impact on disease-free survival is shown in Fig. 6, right panel.

Enhanced precision of receptor status: impact on outcome
IHC estimates rejected or even corrected by GE & CO definitely represent improvements in diagnostic quality. Corrected cases might receive more adequate therapies (flows In displaying the impact on outcome, we merge corrections and rejections, e.g. show that the disease free survival for erroneously positive assigned ER (set {ER e,g }) is worse than for confirmed positive cases (set {ER a }), Wilcoxon test, p = 0.03, see left panel Fig. 5.
For PGR, the negative effect of wrong assignments cannot be substantiated (right panel Fig. 5), survival curves fail to show significant differences (Wilcoxon test, p = 0.08). The reason may lie in the fact that patients falsely negative in PGR nevertheless received anti-hormone therapy, due to being assessed ER + IHC . Please note that the numbers of erroneously assigned receptor status are comparatively low and statistical test results are therefore insignificant in many cases. However, such findings are nevertheless highly important for the patients concerned, and their relevance must not be judged according to p-values.
Strictly speaking, the worse survival of patients with illassigned IHC-estimates could also have other causes than suboptimal therapy. However, since we know that therapy was likely suboptimal in these cases, it seems the most probable cause and worth being improved.
All in all it is obvious that the number of assignments increases by adding a co-gene.
It is important to understand that this is achieved by the intake of additional information given by the co-gene rather than by relaxing the threshold, S 0 , of acceptance. In fact, relaxing the threshold, S 0 , would also increase the number of seemingly conclusive assignments-at the cost of concomitantly increasing the rate of wrong assignments, however. Fiddling around with the threshold would only seem to be an improvement. Adding information from a co-gene, however, leads to a real and substantial improvement.
Another issue pertains to the number of co-genes to be considered for each receptor. Of note, adding correlated variables does not confer much additional information. Each variable-considered on its own-holds valuable information, and a statistical test would recommend its inclusion. However, the theory of feature selection recommends caution so as to avoid overfitting due to including a whole bunch of such correlated variables. As broadly described in the literature, many expression profiles up to now have suffered from overfitting, yielding results not reproducible for newly incoming patients.

Setting the precision threshold
We have chosen the threshold probability, S 0 , for acceptance exactly at the logit of precision of a positive IHC measurement without any further information from gene expression.
The reason for this is that any evidence from expression data not contradicting the IHC measurement should yield a definite result.

Different clinical weights of false positive and false negative assessments
In this work we reveal the impact of erroneously assessed receptor status on disease free survival and ignore all other aspects, e.g. side effects and quality of life being reduced by unnecessary treatment.
In an overall optimization one would have to include weights (judged by experts and patients) in order to tune sensitivity versus specificity of all assessments involved in a comprehensive manner. In particular, gains and losses due to falsely positive and negative are often assumed symmetric for simplicity-but this does not sincerely reflect reality.
A detailed analysis of gains versus losses would be needed, as a matter of fact. Gains in lifetime may be weighed against losses in quality of life for each type of correction envisaged (flows e, f, g and h). Should different sets of weights be advocated (e.g. by different panels of doctors and/or patients), slightly different strategies would mathematically result as respective optima. On the contrary, should ethic discussions arise and call for quantitative arguments, this work could readily provide 'criteria and scores for ethic strategies' in terms of lifetime.
This work helps to better identify patients for relevant and more appropriate therapy with long overall survival.

Study selection, normalization and co-genes
The dataset for this study has been assembled as follows [25]: out of several hundred breast cancer studies on Gene Expression Omnibus (GEO), which use the Affymetrix chip U133A + 2.0 ('platform GPL570' in GEO), we retained only those with 12 samples or more and data for receptor status and/or survival. Out of these 43 studies, 5 were dismissed due to incompatible normalization and two more because of insufficient receptor status. We finally used 36 breast cancer studies from gene-expression omnibus, see Table 9, curated and normalized them as described in Supplementary Materials and Methods.
Receptor-genes are uniquely defined for ER, PGR and HER2, and hence their expression values can directly be used. As opposed, possible co-expressed genes have to be selected according to criteria to be defined. To these end we developed and performed a co-expression check, based on intricate criteria, spotting those genes capable to yield maximum information on top of what is known from the very receptor-genes. Finally we end up with AGR3 as co-gene for ESR1, ESR1 as co-gene for PGR and PGAP3 as co-gene for Her2. For details see the Supplementary Materials and Methods and Fig. 7.

Information extraction and modelling
We performed logistic regressions to model the impact of gene-expression (of genes and co-genes) on receptor status and fused information from three sources (IHC, expression of receptor gene and co-gene) via the product of odds to arrive at a unique and most reliable assessment for each receptor and single patient. For details see the Supplementary Materials and Methods.  Table 9 List of series-IDs (GSExxxx) and sample-IDs (GSMxxxxx) downloaded from gene expression omnibus (GEO) to be used in the current work. As an example we show the first few IDs out of the first two series. The full list can be downloaded from the Supplementary  Table   GSM-ID  GSE-ID   GSM124996  GSE5460  Series GSE5460  GSM125003  GSE5460  GSM125005  GSE5460  GSM125007  GSE5460  GSM125022  GSE5460  GSM125023  GSE5460  GSM125039  GSE5460  GSM125042  GSE5460  …  …  GSM151259  GSE6532  Series GSE6532  GSM151260  GSE6532  GSM151261  GSE6532  GSM151262  GSE6532  GSM151263 GSE6532 … … Fig. 7 Agreement between IHC and gene-expression measurements. The agreement is measured by the Matthew coefficient [23]. It can be shown [24] that MCC is suitable also for imbalanced group size as in the case of HER2. Setting p + IHC entails a certain threshold via S 0 = log p + IHC ∕ 1 − p + IHC = logit p + IHC , the optimum value 0.85 being indicated by the reference line. The higher one chooses p + IHC , the higher the threshold (S 0 ) results above which an expression measurement is considered conclusive. Concomitantly, with rising threshold, the agreement between IHC and GE also rises, as reflected by an increasing MCC. Beyond p + IHC = 0.90 , however, only few geneexpression measurements remain conclusive, causing the graphs to fluctuate due to sparsity of data. Accordingly, there is no special meaning to the fact that the MCCs for ER and PGR further increase while the MCC for HER2 declines in the rightmost parts

Fusion of information from different sources
Of note, the step-wise increase of information and reliability, as quantified in Table 2, can most vividly be presented in Sankey diagrams, see Figs. 2, 3 4, 8, 9 and 10. They display clearly, how many patients arrive at increasingly secure and precise receptor diagnostics as a result of stepwise fusion of OMICs data (IHC, expression of receptorgenes and expression of co-genes).