Impact of the menstrual cycle on commercial prognostic gene signatures in oestrogen receptor-positive primary breast cancer

Purpose Changes occur in the expression of oestrogen-regulated and proliferation-associated genes in oestrogen receptor (ER)-positive breast tumours during the menstrual cycle. We investigated whether the Oncotype® DX recurrence score (RS), Prosigna® (ROR) and EndoPredict® (EP/EPclin) prognostic tests, which include some of these genes, vary according to the time in the menstrual cycle at which they are measured.
Methods Pairs of test scores were derived from 30 ER-positive/human epidermal growth factor receptor-2-negative tumours, each sampled at two different points in the menstrual cycle. Menstrual cycle windows were prospectively defined as either W1 (days 1–6 and 27–35; low oestrogen and low progesterone) or W2 (days 7–26; high oestrogen and high or low progesterone).
Results In W2, the RS invasion module score was lower (− 10.9%; p = 0.098), whereas the ER (+ 16.6%; p = 0.046) and proliferation (+ 7.3%; p = 0.13) module scores were higher. PGR expression was significantly increased in W2 (+ 81.4%; p = 0.0029). Despite this, mean scores did not differ significantly between W1 and W2 for any of the tests, and the two measurements were highly correlated (r = 0.72–0.93). However, variability between the two measurements led to tumours being assigned to a different risk category in the following proportions of cases: RS 22.7%, ROR 27.3%, EP 13.6% and EPclin 13.6%.
Conclusion There are significant changes during the menstrual cycle in the expression of some of the genes and gene module scores comprising the RS, ROR and EP/EPclin scores. These did not affect any of the prognostic scores systematically, but there was substantial variability between paired measurements.
Supplementary Information The online version contains supplementary material available at 10.1007/s10549-021-06377-3.


Introduction
Oestrogen receptor (ER)-positive disease accounts for approximately 80% of breast cancers [1,2]. Standard treatment of patients with ER-positive disease comprises surgery and adjuvant endocrine therapy, with chemotherapy added on the basis of clinical risk factors and/or prognostic estimates from one of several gene expression-based tools. Three of the most widely used tumour profiling tests are the Oncotype DX Recurrence Score (RS) [3], the Prosigna risk of recurrence (ROR) score, often known as the PAM50 [4], and EndoPredict (EP/EPclin) [5]. Each provides an estimate of the 10-year risk of distant recurrence assuming 5 years of adjuvant endocrine therapy without chemotherapy, and all are endorsed by authoritative guidelines for use in ER-positive, human epidermal growth factor receptor-2 (HER2)-negative, lymph node-negative disease [6,7]. The data supporting their use are stronger in postmenopausal than in premenopausal patients, although the tests are applied clinically in both settings.
The ROR is a 50-gene (plus eight reference genes) test performed on the NanoString nCounter platform [4,12]. In addition to a continuous risk score (0–100), the test provides an intrinsic subtype classification (Luminal A, Luminal B, HER2-enriched or Basal-like). The ROR is calculated from the correlation of the sample's expression profile with the reference expression profile (centroid) for each intrinsic subtype, combined with a proliferation score and tumour size [4,12]. Risk categories are defined by cut-points of 0–40 (low), 41–60 (intermediate) and 61–100 (high) for node-negative cancers and 0–15 (low), 16–40 (intermediate) and 41–100 (high) for cancers with one to three positive nodes.
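The ROR cut-points amount to a simple lookup that depends on nodal status. The sketch below is a hypothetical helper (`ror_risk_category` is not part of any NanoString software); it assumes `node_positive=True` means one to three positive nodes and takes the node-positive intermediate band as 16–40, the range left between the stated low (0–15) and high (41–100) bands.

```python
def ror_risk_category(ror: float, node_positive: bool) -> str:
    """Map a ROR score (0-100) to a risk category.

    Hypothetical helper. node_positive=True is taken to mean one to three
    positive nodes; these cut-points do not cover more extensive nodal disease.
    """
    if not 0 <= ror <= 100:
        raise ValueError("ROR score must lie between 0 and 100")
    # Upper bounds of the low and intermediate bands for each nodal group.
    low_max, intermediate_max = (15, 40) if node_positive else (40, 60)
    if ror <= low_max:
        return "low"
    if ror <= intermediate_max:
        return "intermediate"
    return "high"


# The same score can fall in different categories depending on nodal status.
print(ror_risk_category(34, node_positive=False))  # low
print(ror_risk_category(34, node_positive=True))   # intermediate
```

Because the band boundaries shift with nodal status, a fixed change in ROR between two samples can cross a cut-point in one nodal group but not in the other.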
The EP score is the molecular component of EPclin and comprises eight prognostic genes and four reference genes [5]. The test is based on RT-PCR. The EP score ranges from 0 to 15, and a cut-point of 5 is used to categorise patients into low- and high-risk groups. EPclin, the readout of the clinically available EndoPredict test, combines the EP score with tumour size and nodal status; it ranges from 0 to 8.16, and a cut-point of 3.3 is used to categorise patients into low- and high-risk groups [5].
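The way EPclin folds tumour size and nodal status into the EP score can be sketched as a weighted sum of the EP score and two categorical variables. The coefficients (0.35, 0.64, 0.28) and the size and nodal category codings below are assumptions taken from the original EndoPredict publication, not from this paper, and should be checked against [5]. As a sanity check, the largest categories (t = 4, n = 4) with EP = 15 give 0.35×4 + 0.64×4 + 0.28×15 = 8.16, the upper end of the range quoted above. All function names are hypothetical, and scores exactly at a cut-point are assumed to be high risk.

```python
# ASSUMPTION: coefficients and category codings as in the original EndoPredict
# publication; verify against reference [5] before any real use.

def size_category(size_cm: float) -> int:
    """Tumour size category t: 1 (<=1 cm), 2 (>1-2 cm), 3 (>2-5 cm), 4 (>5 cm)."""
    if size_cm <= 1:
        return 1
    if size_cm <= 2:
        return 2
    if size_cm <= 5:
        return 3
    return 4


def node_category(positive_nodes: int) -> int:
    """Nodal category n: 1 (none), 2 (1-3), 3 (4-10), 4 (>10 positive nodes)."""
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    if positive_nodes <= 10:
        return 3
    return 4


def epclin(ep: float, size_cm: float, positive_nodes: int) -> float:
    """Combine the molecular EP score with tumour size and nodal status."""
    return 0.35 * size_category(size_cm) + 0.64 * node_category(positive_nodes) + 0.28 * ep


def risk_group(score: float, cut_point: float) -> str:
    """Two-level categorisation; a score at the cut-point is treated as high risk."""
    return "high" if score >= cut_point else "low"


ep = 4.0  # hypothetical EP score, below the EP cut-point of 5
clin = epclin(ep, size_cm=1.5, positive_nodes=0)
print(risk_group(ep, 5), round(clin, 2), risk_group(clin, 3.3))  # low 2.46 low
```

Since the clinical terms are identical for both samples of a pair, any paired difference in EPclin is the EP difference scaled by the EP coefficient.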
Each of the above tests includes a number of oestrogen-responsive genes (ERGs) and proliferation-associated genes (PAGs). The expression of some ERGs and PAGs in ER-positive breast cancers is known to vary across the menstrual cycle [13,14]. A recent study found significant within-patient changes in the expression of ERGs (twofold to threefold) and PAGs (1.4-fold) that were related to the hormonal changes occurring during the menstrual cycle [15].
The presence of multiple ERGs and PAGs within the commercial signatures suggests that these tests may be sensitive to the prevailing hormonal milieu at the time of testing. In principle, a different score and risk category might then be obtained depending on the point in the menstrual cycle at which the signature is measured. We therefore investigated whether RS, ROR and EP/EPclin scores vary according to the time in the menstrual cycle at which they are measured.

Patients and samples
Samples were selected from two clinical studies included in a recent report on the effect of the menstrual cycle on breast tumour biology in ER-positive breast cancer [15]: MenCER, a UK-based multicentre study [15], and a study of neoadjuvant oophorectomy in Vietnam [16]. Paired tumour samples were taken at diagnosis and 1–4 weeks later, with no treatment given between these time-points.
In the current study, samples were assigned to one of two menstrual cycle windows on the basis of previously measured serum hormone concentrations and menstrual cycle data [15].
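The day ranges that define the two windows (W1: days 1–6 and 27–35; W2: days 7–26) can be written as a small rule, together with the distinction between different-window pairs and same-window control pairs. This is a simplified, hypothetical sketch: the study also used serum hormone concentrations, whereas the rule below uses cycle day alone, and the function names are our own.

```python
def menstrual_window(cycle_day: int) -> str:
    """Assign a day of the menstrual cycle to W1 or W2 (cycle day only).

    W1: days 1-6 and 27-35 (low oestrogen and low progesterone).
    W2: days 7-26 (high oestrogen, high or low progesterone).
    Simplification: ignores the serum hormone data also used in the study.
    """
    if 7 <= cycle_day <= 26:
        return "W2"
    if 1 <= cycle_day <= 6 or 27 <= cycle_day <= 35:
        return "W1"
    raise ValueError(f"cycle day {cycle_day} lies outside the defined windows")


def pair_design(day_first: int, day_second: int) -> str:
    """Classify a sample pair as a same-window control or a W1/W2 comparison."""
    if menstrual_window(day_first) == menstrual_window(day_second):
        return "same window"
    return "different windows"


print(menstrual_window(3), menstrual_window(14), menstrual_window(30))  # W1 W2 W1
print(pair_design(3, 17))  # different windows
```

Note that W1 wraps around the start of the cycle, so days 30 and 3 fall in the same window even though they are far apart numerically.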

Measurement of gene expression
The NanoString nCounter gene expression system (GEN2) (NanoString Technologies, Seattle, WA) was used to measure gene expression without target amplification [17]. A custom gene expression nCounter CodeSet was used to measure the expression of 82 genes including 14 reference genes (Supplementary Table 1) that include the genes of the RS, ROR and EP prognostic signatures. In brief, the CodeSet was hybridised to 150-200 ng total RNA and samples were processed using the NanoString nCounter Prep Station and Digital Analyzer according to the manufacturer's instructions.
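Raw nCounter counts are usually expressed relative to the reference genes before any score is computed. The sketch below shows one common approach, log2 counts minus the mean log2 count of the reference genes (equivalent to dividing by their geometric mean). It is a generic illustration, not necessarily the exact normalisation of Buus et al. [18]; the function name, the floor of one count and the gene labels `REF_A`/`REF_B` are hypothetical.

```python
import math


def normalise_counts(counts: dict[str, float], reference_genes: list[str]) -> dict[str, float]:
    """Express each non-reference gene as log2(count) minus the mean log2 count
    of the reference genes, i.e. relative to their geometric mean.

    Generic sketch; not necessarily the procedure used in the study.
    """
    # Floor counts at 1 so that genes with zero counts do not break the logarithm.
    log_counts = {gene: math.log2(max(count, 1.0)) for gene, count in counts.items()}
    reference_level = sum(log_counts[g] for g in reference_genes) / len(reference_genes)
    return {
        gene: value - reference_level
        for gene, value in log_counts.items()
        if gene not in reference_genes
    }


# Hypothetical counts for two reference genes and one target gene.
sample = {"REF_A": 100, "REF_B": 400, "PGR": 400}
print(normalise_counts(sample, ["REF_A", "REF_B"]))
```

Here PGR is twice the geometric mean of the reference counts (200), so its normalised value is approximately 1 on the log2 scale.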

Calculation of RS, ROR and EP/EPclin scores and % risk of distant recurrence
The gene expression normalisation and adjustment factors applied to NanoString data to calculate 'research use only' (RUO) RS and EP scores are described in Buus et al. [18]. Briefly, validated linear models were used to adjust each signature gene for cross-platform (NanoString vs. RT-PCR) variation and to generate RUO scores according to the published algorithms [3,5]. RUO EPclin scores were calculated from RUO EP scores using the EPclin algorithm [5], incorporating tumour size and nodal status. The corresponding % risk of distant recurrence at 10 years was calculated for RS using web-based tools provided by Genomic Health Inc. (GHI) [19] and for EP/EPclin by digital read-out (https://apps.automeris.io/wpd/) from the published graphs of EP/EPclin score vs. % risk [5]. RUO ROR scores and their corresponding % risk of distant recurrence at 10 years were calculated by NanoString.
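To make the RS module structure discussed in this paper concrete, including the floors applied to the HER2 (8) and proliferation (6.5) module scores, the sketch below combines gene values into an RS. The coefficients and thresholds are those of the original Oncotype DX publication (Paik et al., 2004) as we understand it and should be checked against [3]; this is not the RUO implementation of Buus et al. [18]. The inputs are assumed to be reference-normalised values already adjusted to the RT-PCR scale, and the gene symbols (e.g. BIRC5 for survivin, AURKA for STK15, ERBB2 for HER2) and function name are our own choices. Rounding to an integer, as in clinical reports, is omitted.

```python
def recurrence_score(g: dict[str, float]) -> float:
    """Recurrence score (0-100) from reference-normalised expression values.

    ASSUMPTION: coefficients and thresholds as published for Oncotype DX
    (Paik et al., 2004); `g` maps gene symbols to RT-PCR-scale values.
    """
    # HER2 group score, floored at 8.
    her2 = max(0.9 * g["GRB7"] + 0.1 * g["ERBB2"], 8.0)
    # ER group score (PGR carries the largest weight).
    er = (0.8 * g["ESR1"] + 1.2 * g["PGR"] + g["BCL2"] + g["SCUBE2"]) / 4
    # Proliferation group score, floored at 6.5.
    proliferation = max(
        (g["BIRC5"] + g["MKI67"] + g["MYBL2"] + g["CCNB1"] + g["AURKA"]) / 5, 6.5
    )
    # Invasion group score.
    invasion = (g["CTSL2"] + g["MMP11"]) / 2
    rs_unscaled = (
        0.47 * her2 - 0.34 * er + 1.04 * proliferation + 0.10 * invasion
        + 0.05 * g["CD68"] - 0.08 * g["GSTM1"] - 0.07 * g["BAG1"]
    )
    # Rescale and clip to the 0-100 reporting range.
    return min(max(20 * (rs_unscaled - 6.7), 0.0), 100.0)


# Hypothetical input: every gene at the same normalised value of 8.0.
genes = dict.fromkeys(
    ["GRB7", "ERBB2", "ESR1", "PGR", "BCL2", "SCUBE2", "BIRC5", "MKI67",
     "MYBL2", "CCNB1", "AURKA", "CTSL2", "MMP11", "CD68", "GSTM1", "BAG1"],
    8.0,
)
print(round(recurrence_score(genes), 1))  # 53.2
```

The signs make the offsetting described in the Discussion visible: a rise in PGR raises the ER module and lowers the RS, a rise in proliferation raises it, and once the proliferation genes fall below the floor, further falls have no effect on the score.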

Data analysis
For paired data, the Wilcoxon matched-pairs signed-rank test was used to compare gene expression between windows. For individual genes, the false discovery rate was calculated using the Benjamini–Hochberg procedure to adjust for multiple testing. The F test was used to compare the variance of paired score and risk differences between pairs sampled in different windows and pairs sampled in the same window. Spearman's rank correlation was used to assess associations between continuous variables.
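The Benjamini–Hochberg adjustment used for the per-gene tests can be sketched in a few lines: sort the p-values, scale each by m/rank and enforce monotonicity from the largest down. This is a minimal, hypothetical helper written for illustration; established implementations include `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from paired Wilcoxon tests.
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])
# [0.04, 0.0533, 0.0533, 0.2]
```

With 82 genes measured, a nominal p-value near 0.05 for a single gene is typically well above conventional false discovery thresholds after this adjustment.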

Patient demographics
Patient demographics of the 30 patients are described in Supplementary Table 2. All patients were premenopausal with ER-positive/HER2-negative tumours. Of those, 88% were progesterone receptor (PgR)-positive and 8 patients had node-positive disease (range 1-2 nodes positive). Figure 1a shows the individual changes in the prognostic scores between W1 (low oestrogen and progesterone) and W2 (high oestrogen ± progesterone) for each test. Mean [± standard error of the mean (SEM)] scores were not significantly different between W1 and W2 for RS (26.7 ± 3.5 vs. 26.9 ± 3.9; Wilcoxon p = 0.96), ROR (34.2 ± 3.7 vs.

Variation of scores measured in the same window vs. different windows
Measurements of the four signature scores taken in the same window, one menstrual cycle apart, in eight patients showed no significant changes (Fig. 2). The variation between paired measurements taken in different windows (W1 and W2) was significantly greater than that between pairs taken in the same window for RS (F test; p = 0.0003) and EP/EPclin (p = 0.029 and 0.019, respectively), but not for ROR (p > 0.05) (Fig. 2a). Variation in the corresponding estimates of % risk of disease recurrence showed the same pattern, with significant differences for RS (p = 0.0008) and EP/EPclin (p = 0.0064 and 0.0071, respectively), but again not for ROR (p > 0.05) (Fig. 2b).
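The F test comparison can be sketched as the ratio of the variance of paired differences in different-window pairs to that in same-window pairs, with n − 1 degrees of freedom for each group. The text does not say whether variances were taken about the mean difference or about zero; this sketch assumes the usual sample variance about the mean. The data and function names are hypothetical; a one-sided p-value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(f_stat, df1, df2)`.

```python
from statistics import variance


def paired_differences(first: list[float], second: list[float]) -> list[float]:
    """Second-minus-first differences for each patient's pair of scores."""
    if len(first) != len(second):
        raise ValueError("each patient needs exactly two measurements")
    return [b - a for a, b in zip(first, second)]


def variance_ratio_f(diff_between: list[float], diff_within: list[float]) -> tuple[float, int, int]:
    """F statistic comparing between-window and same-window paired differences,
    plus its numerator and denominator degrees of freedom.

    ASSUMPTION: sample variances about the mean difference (n - 1 denominator).
    """
    f_stat = variance(diff_between) / variance(diff_within)
    return f_stat, len(diff_between) - 1, len(diff_within) - 1


# Hypothetical scores: four W1/W2 pairs and four same-window pairs.
between = paired_differences([20, 30, 25, 40], [22, 28, 29, 36])
within = paired_differences([18, 33, 27, 21], [19, 32, 28, 20])
f_stat, df1, df2 = variance_ratio_f(between, within)
print(round(f_stat, 3), df1, df2)  # 10.0 3 3
```

With only eight same-window pairs, the denominator has few degrees of freedom, which limits the power of this comparison.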

Changes in gene signature component modules and individual genes
Of the individual modules of the RS, the mean ER module score was significantly higher in the window with high oestrogen (W2) (+ 16.6%; p = 0.046), whilst the mean invasion module score tended to be lower in W2 than in W1 (− 10.9%; p = 0.098), with a more than twofold reduction in W2 in some patients (Fig. 3). The change in ER module score was driven by a significant increase in PGR expression between the two windows (+ 81.4%; p = 0.0029), with no apparent change in the other three genes in the module (ESR1, BCL2 and SCUBE2) (Supplementary Fig. 2a). There was a trend towards a higher RS proliferation module score in W2 (mean + 7.3%; p = 0.13), even though the score was thresholded in 13 cases in W1 and 10 cases in W2 (Fig. 3). All five individual PAGs that make up the RS proliferation module showed higher mean expression in W2 than in W1 (9.6–44.6%; p = 0.065–0.21) (Supplementary Fig. 2b), but none of these increases was statistically significant. Both genes in the RS invasion module (MMP11 and CTSL2) showed lower expression in W2, but this did not reach significance for either (Supplementary Fig. 2c). There was no significant change between windows in the HER2 module score, which was thresholded in 21/22 cases (mean + 1.7%; p = 0.25) (Fig. 3). The ROR proliferation score showed a non-significant trend to be higher in W2 than in W1 (+ 23.9%; p = 0.092; Supplementary Fig. 3), and the change in ROR proliferation score correlated very strongly with the change in ROR score between W1 and W2 (r = 0.86, p < 0.0001). Other than PGR, no individual gene in any of the signatures showed a significant change between W1 and W2.
When the RS, ROR and EPclin scores were correlated with one another (Fig. 4), RS and ROR showed the weakest correlation in both windows, and all correlations were stronger in W1 than in W2.

Correlation of RS, ROR and EPclin signature scores
Changes in estimated risk between W1 and W2 with RS did not correlate significantly with the corresponding changes with any of the other three signatures (r = 0.32–0.41; p = 0.06–0.15). However, the changes in estimated risk with ROR, EP and EPclin correlated significantly with one another (r = 0.73–0.98; p ≤ 0.001), so that in most cases a tumour showing an increase or decrease in risk with one of these tests also showed an increase or decrease, respectively, with the others (Fig. 4).
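The correlations between changes in estimated risk are Spearman rank correlations, which can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a minimal pure-Python illustration on hypothetical W2-minus-W1 changes; `scipy.stats.spearmanr` is the standard implementation and also returns a p-value.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_r(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical W2-minus-W1 changes in % risk for five tumours with two tests.
change_test_a = [1.5, -2.0, 0.5, 3.0, -0.5]
change_test_b = [2.0, -1.0, 0.0, 2.5, 0.5]
print(round(spearman_r(change_test_a, change_test_b), 2))  # 0.9
```

Because only ranks enter the calculation, a tumour with an unusually large change in one test does not dominate the correlation.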

Discussion
Earlier studies of changes in tumour biology during the menstrual cycle focused mainly on ER and PgR protein levels and produced inconsistent results [20–27], reflecting the difficulty of reliably determining the timing of the menstrual cycle. In more recent retrospective studies, we have shown tumour ERG expression to be significantly higher in mid-to-late cycle and PAG expression to be lower later in the cycle [13,14]. In a prospective study, significant changes in the expression of ERGs and PAGs were demonstrated within the same tumour [15].
There is very little previous work examining the effect of the menstrual cycle on gene expression-based prognostic signatures such as RS, ROR and EP, which are widely used in ER-positive breast cancer to estimate the risk of distant recurrence in patients receiving endocrine therapy and to help guide the use of adjuvant chemotherapy. A recent study by Bernhardt et al. [28] in 25 women reported greater discordance in RS between paired samples from the 16 women aged < 50 years. Eight of these 16 cases showed differences of > 4 units in RS between the paired biopsies, compared with none of the 9 cases from older women. The calculation of an 'analogous' RS in that study did not appear to threshold the proliferation and HER2 modules as the clinically used RS algorithm does, so the results may not correctly replicate the clinically used RS. This highlights the importance of using methodology that accurately recapitulates clinical prognostic signature scores in the research setting. In the current study, we used our published method for deriving RUO RS and EP/EPclin scores from gene expression data generated on the NanoString nCounter platform [18]. Nonetheless, the data from the Bernhardt study support the concept of substantially greater variation in RS in premenopausal than in postmenopausal women.
Although none of the individual gene signatures showed systematic changes between the windows of the menstrual cycle, either in score or in estimated risk of distant recurrence in the absence of chemotherapy, substantial variability was observed between paired samples for all three scores. Some of these changes might lead to different clinical decisions regarding the use of chemotherapy in the affected patient. However, it is not possible to say whether results would align more closely with clinical outcome if tests were conducted in one window rather than the other. The lower % discordance for EP/EPclin is expected because these scores have no intermediate-risk group, which leaves less scope for discordance. Risk category changes also occurred in the small group of control samples taken within the same window of the menstrual cycle, suggesting that a substantial proportion of the observed variation may be inherent to the assays or due to tissue heterogeneity or subtle menstrual cycle effects. The proportion of patients who switch from one category to another is clinically relevant and is most easily judged with EPclin, which has only low- and high-risk categories. In this study set, 3/22 (14%) tumours switched category in this way, but the size of the study does not allow this figure to be generalised. The proportion switching will also vary with the population in which it is assessed, being higher when estimates lie close to a risk category cut-off.
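The discordance figures are simply the number of W1/W2 pairs whose risk category differs, divided by the number of pairs. The sketch below uses hypothetical categories for 22 tumours with three switches, reproducing 3/22 ≈ 13.6%; by the same arithmetic, the RS and ROR percentages in the abstract (22.7% and 27.3%) are consistent with 5/22 and 6/22 pairs. The function name is our own.

```python
def category_discordance(first: list[str], second: list[str]) -> tuple[int, float]:
    """Count pairs whose risk category differs and express it as a percentage."""
    if len(first) != len(second) or not first:
        raise ValueError("need one category per sample in each window")
    changed = sum(a != b for a, b in zip(first, second))
    return changed, 100 * changed / len(first)


# Hypothetical EPclin categories for 22 tumours, three of which switch group.
w1 = ["low"] * 12 + ["high"] * 10
w2 = ["low"] * 10 + ["high"] * 11 + ["low"]
changed, percent = category_discordance(w1, w2)
print(changed, round(percent, 1))  # 3 13.6
```

With 22 pairs, each switching tumour moves the discordance by about 4.5 percentage points, which is one reason the estimates are imprecise.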
It should be noted that changes in risk categorisation can give a very variable read-out of a test's reproducibility. Thus, when the revised RS cut-points from the TAILORx study [9] (intermediate risk 11–25) were used, 50% of tumours sampled twice in the same window were classified differently, whereas there were no misclassifications using the original cut-off values. Changes in the intrinsic subtype assigned by the ROR test were similar for samples taken in different windows (14%) and in the same window (25%), providing no evidence of additional variation in intrinsic subtype determination due to menstrual cycle effects.
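How strongly the choice of cut-points affects apparent reproducibility can be seen with a single hypothetical pair of RS values. The sketch assumes the widely used original RS bands (low < 18, intermediate 18–30, high ≥ 31) and the TAILORx bands (low 0–10, intermediate 11–25, high 26–100); the function names and the pair (9, 14) are illustrative only.

```python
def rs_category_original(rs: int) -> str:
    """Original RS bands: low < 18, intermediate 18-30, high >= 31."""
    if rs < 18:
        return "low"
    if rs <= 30:
        return "intermediate"
    return "high"


def rs_category_tailorx(rs: int) -> str:
    """TAILORx-revised RS bands: low 0-10, intermediate 11-25, high >= 26."""
    if rs <= 10:
        return "low"
    if rs <= 25:
        return "intermediate"
    return "high"


# A hypothetical pair of RS values from the same tumour measured twice.
pair = (9, 14)
for scheme in (rs_category_original, rs_category_tailorx):
    first, second = (scheme(rs) for rs in pair)
    print(scheme.__name__, first, second, "concordant" if first == second else "discordant")
# rs_category_original low low concordant
# rs_category_tailorx low intermediate discordant
```

The same five-point difference is invisible under one scheme and a category switch under the other, because the TAILORx bands place a cut-point in the low-score region where many tumours lie.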
Comparison of the variation in the signature scores, and in their estimates of % risk of distant recurrence in the absence of chemotherapy, between pairs measured in W1 and W2 and pairs measured in the same window indicated a significant difference for RS and EP/EPclin but not for ROR. This suggests a greater effect of the menstrual cycle on the former signatures, with the caveat that the numbers in the same-window group are low. Alternatively, it may reflect greater inherent variability in the ROR score, which would make it harder to detect a difference in variability between pairs measured in the same and in different windows. Published analytical and reproducibility data for the clinical versions of the tests show standard deviations of 1.53 (1.53% of the reporting range) for RS [29], 0.21 (1.40% of the reporting range) for EP, 0.057 (0.70% of the reporting range) for EPclin [30] and 2.9 (2.9% of the reporting range) for ROR [31], with 90% concordance of subtype classification for the latter. This provides some evidence of greater inherent variability in the ROR score, although the underlying estimates come from different populations and the comparisons are therefore indirect. Interestingly, RS, ROR and EPclin scores correlated more strongly with each other in W1 than in W2, possibly reflecting the less variable hormonal milieu in W1. Incorporating clinical information might be expected to reduce the observed variability between paired measurements in the same patient, as this information is identical for both samples of a pair. However, there was little evidence for this when EPclin was compared with EP.
The variability of the RS during the menstrual cycle was investigated further by examining changes in its component modules and genes. The ER module score was significantly higher in W2, with its higher oestrogen levels (and, for part of the window, higher progesterone levels), than in W1, driven by a significant increase in PGR expression.
Additionally, the proliferation module score, although thresholded, showed a trend to increase in W2, whilst the invasion module score tended to be lower in W2. These data confirm that changes in individual genes and gene modules do occur across the cycle, but that these changes largely offset one another because they act in opposite directions in the RS algorithm. Consistent with the trend for the RS proliferation score to increase in W2, the ROR proliferation score also tended to increase in W2, and the change in ROR proliferation score correlated very strongly with the change in ROR between windows. This concurs with recent work indicating that proliferation appears to be the main driver of ROR, in contrast to RS, which may be driven more by its ER module (and predominantly by PGR itself) in a postmenopausal population [8].
Strengths of the current study include the careful assignment of menstrual cycle timing, the use of validated methodology to accurately recapitulate the prognostic signature scores and the availability of a group of tumours sampled in the same window to act as a control. A weakness of the study was the modest number of patients available, particularly pairs of samples taken in the same window. To maximise numbers, we used samples from two independent studies [15] and split the menstrual cycle into just two windows. As a consequence, W2 in particular contained a wide range of progesterone concentrations, from very low in the first half of the window to maximal in the latter half. This is likely to have added extra variability to measurements made in W2, reducing the power of the study to detect significant differences between paired samples taken in W1 and W2. Another limitation of the study is the inclusion of patients with node-positive disease, although the RxPONDER trial found no evidence that Oncotype DX is informative in choosing whether premenopausal patients with node-positive disease should receive chemotherapy. There is no reason to expect the variability of the molecular scores of the primary tumour to differ according to lymph node status, but nodal status does affect the estimates of risk of distant recurrence.
In summary, we show that there are significant changes during the menstrual cycle in the expression of some of the genes and gene module scores comprising the RS, ROR and EP/EPclin scores, but these do not affect any of the prognostic scores in a systematic fashion. Whilst none of the individual gene signatures showed significant changes between different windows of the menstrual cycle, substantial variability was observed for all three scores, such that 14-27% of samples were assigned to a different risk category.
Author contributions MD, IES, SC, MC and OG designed the initial study proposal. BPH and OG managed the study and data collection. IES, MC, CH, CO, AE, AS, MS, CR, SL, LN, LHQ, PTH, PHK and NVD recruited and managed the study patients. TVT conducted pathology studies and managed the tissue samples in Vietnam. BPH performed the gene expression measurements. BPH, MD and OG collected and managed the study data and wrote the manuscript with assistance from AA, RB, GS and MC.
Data availability The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations
Conflict of interest Mitch Dowsett received lecture fees from NanoString Technologies and served on an Agilent advisory board.
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Consent to participate Informed consent was obtained from all individual participants included in the study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.