Introduction

A younger age at menarche and an older age at menopause are well-established risk factors for the development of breast cancer [1]. In the general population, the risk of breast cancer decreases by 10% for each 2-year delay in menarche [2] but increases by 3% for each year that menopause is delayed [3]. These associations are consistent with the hypothesis that breast cancer risk is related to the extent of steroid hormone exposure during a woman's reproductive years, which drives breast mitotic activity and determines the probability of tumorigenic somatic events [4].

Recently, genome-wide association studies (GWAS) have identified several new common genetic loci associated with either age at menarche or age at natural menopause. Four independent GWAS of age at menarche have identified two novel loci at LIN28B and 9q31.2 [5–8], and two GWAS of age at natural menopause have identified four novel loci on chromosomes 5, 6, 19, and 20 [5, 9]. Most recently, the ReproGen Consortium, which comprised these initial GWAS and many additional studies, conducted expanded meta-analyses for age at menarche [10] and age at natural menopause [11] and reported additional new loci for each trait. Given the well-established associations of age at menarche and age at natural menopause with breast cancer risk, we set out to assess whether these common genetic loci influence breast cancer risk and whether a genetic risk score (GRS) for these reproductive events might be useful for identifying a high-risk subgroup for breast cancer. Furthermore, since the reproductive risk factors have been observed to be differentially associated with breast cancer by tumor histological subtypes [12–16], we assessed these genetic associations by tumor histological subtypes defined by estrogen receptor (ER) status.

We therefore conducted a meta-analysis of six population-based studies to investigate the association between genetic loci associated with age at menarche or age at natural menopause and breast cancer risk. We assessed 19 and 17 single-nucleotide polymorphisms (SNPs) that have been previously reported to be linked to age at menarche [10] and age at natural menopause [11], respectively, among up to 3,683 breast cancer cases and 34,174 controls in women of European ancestry and evaluated whether these SNPs were differentially associated with breast cancer subtypes defined by ER status in two studies in which such data were available.

Materials and methods

Study population

The ReproGen Consortium was formed by more than 30 studies in the US and Europe to investigate the genetics of reproductive aging traits [10, 11]. Our analysis used data from six population-based studies from the ReproGen Consortium: the Nurses' Health Study (NHS), the Women's Genome Health Study (WGHS), the SardiNIA Breast Cancer Study (SardiNIA), the Rotterdam Study I and II (RSI+II), the Framingham Heart Study (FHS), and the Atherosclerosis Risk in Communities Study (ARIC). Each study had at least 200 breast cancer cases. Four studies were prospective cohort studies, one was a nested case-control study, and one was a case-control study. A description of the six studies is provided in Table 1, and more information is given in Additional file 1. Briefly, breast cancer cases occurring in defined populations during specific periods of time were identified by structured questionnaires, medical records, or linkage with a nationwide registry of cancer or death index or both. By the time we conducted this study, the majority of the women in these studies had passed through menopause. As most of the participants in these studies were European whites, we restricted analyses to women of European ancestry. We excluded subjects with missing information on age. Two studies (NHS and WGHS) provided information on the ER status of the breast tumors for a subset of the cases. This information was extracted from medical records. Each study was approved by the relevant local institutional review boards.

Table 1 List of participating studies and number of case and control subjects

Genotype data

We analyzed genotypes for 19 and 17 independent SNPs with reported associations with age at menarche and age at natural menopause, respectively, in the ReproGen Consortium, in which all SNPs achieved genome-wide significance in the meta-analysis of each trait (combined stage 1 and replication P value of less than 1 × 10⁻⁸) [10, 11]. None of these SNPs has been reported to be associated with breast cancer risk in previous GWAS, likely because of the very stringent P value threshold used to declare genome-wide significance (usually, P values of less than 1 × 10⁻⁷ were required). As positive controls, 10 SNPs with consistently reported associations with breast cancer as shown in recent GWAS were included [17–19]. All 46 SNPs are listed in Table S1 of Additional file 2. Genotypes used in this analysis have been previously described [10, 11]. Complete genotype data from a total of up to 3,683 cases and 34,174 control subjects were available for analysis after the exclusions described in the 'Study population' section.

Breast cancer risk factors

The six studies from the ReproGen Consortium provided information on one or more of the following risk factors for breast cancer: age (continuous, at study entry or diagnosis), age at menarche (continuous, between 9 and 17 years), age at natural menopause (continuous, between 40 and 60 years), age at first live birth (less than 20, 20 to 24, 25 to 29 or no birth, at least 30 years), family history of breast cancer in first-degree relatives (yes/no), alcohol consumption (less than 5, 5 to 15, 15 to 30, at least 30 g/day), parity (0, 1 to 2, at least 3), menopausal hormone therapy (ever/never), oral contraceptive (OC) use (ever/never), and adult body mass index (BMI) (continuous).

Genetic risk score computation

The GRS was calculated on the basis of the 19 and 17 independent SNPs identified in previous studies as being associated with age at menarche and age at natural menopause, respectively [10, 11]. As a younger age at menarche and an older age at menopause are independently associated with an elevated breast cancer risk, we computed separate GRSs for a younger age at menarche and an older age at natural menopause. The risk allele was defined as an allele that was associated with a younger age at menarche or an older age at natural menopause. Two methods were used to determine the GRS: a simple count method (count GRS) and a weighted method (weighted GRS). We assumed an additive genetic model for each SNP, applying a linear weighting of 0, 1, or 2 to genotypes containing 0, 1, or 2 risk alleles, respectively. The count method assumes that each SNP contributes equally to the risk of breast cancer. The count GRS was calculated by simply summing the number of risk alleles of each SNP. For the weighted GRS, each SNP was weighted by β-coefficients obtained from the replication studies of recent meta-analyses of two traits [10, 11]. The weighted GRS was calculated by multiplying each β-coefficient by the number of corresponding risk alleles (0, 1, or 2) and then summing the products. To simplify interpretation and facilitate comparison with the count GRS, the weighted GRS was further divided by twice the sum of the β-coefficients and then multiplied by the total number of risk alleles. To provide a positive control and also to control for potential confounding by known breast cancer-associated genetic variants, a count GRS was computed on the basis of the 10 SNPs with consistently reported associations with breast cancer [19]: rs2981582, rs3803662, rs11249433, rs7716600, rs13387042, rs889312, rs13281615, rs999737, rs3817198, and rs1045485.
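The two scoring methods described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, genotypes, and β-coefficients are hypothetical, and "the total number of risk alleles" is assumed to mean the maximum possible count (twice the number of SNPs). Under that reading, the weighted GRS equals the count GRS whenever all β-coefficients are equal.

```python
from typing import Sequence


def count_grs(risk_allele_counts: Sequence[int]) -> int:
    """Count GRS: sum of risk alleles (0, 1, or 2 per SNP)."""
    return sum(risk_allele_counts)


def weighted_grs(risk_allele_counts: Sequence[int], betas: Sequence[float]) -> float:
    """Weighted GRS, rescaled so that it is comparable with the count GRS."""
    if len(risk_allele_counts) != len(betas):
        raise ValueError("one beta-coefficient per SNP is required")
    # Multiply each beta by the number of risk alleles and sum the products.
    raw = sum(b * g for b, g in zip(betas, risk_allele_counts))
    # Assumption: "total number of risk alleles" = 2 alleles x number of SNPs.
    max_alleles = 2 * len(betas)
    # Divide by twice the sum of the betas, then multiply by the allele total.
    return raw / (2 * sum(betas)) * max_alleles


# Hypothetical three-SNP example (all values invented for illustration).
genotypes = [2, 1, 0]
betas = [0.10, 0.05, 0.05]
print(count_grs(genotypes))
print(weighted_grs(genotypes, betas))
```

Because the first SNP carries the largest weight and two risk alleles, the weighted score for this hypothetical genotype is higher than the simple count.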

Statistical analysis

In each of the six studies, we performed logistic regression to evaluate the association with breast cancer for each of the 46 candidate SNPs, assuming an additive genetic model. Logistic regression was also used to analyze the association between GRS and breast cancer by including both GRSs for age at menarche and age at natural menopause in the model as the main effects. The GRSs were modeled as continuous variables or categorized into quintiles, and the cutoff points for quintiles were based on the WGHS population, which is the largest prospective cohort population among all participating studies. This approach was applied to each of the six participating studies. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated from logistic regression. To control for potential confounding by population stratification, we adjusted for the top principal components of genetic variation chosen for each study. We adjusted for age in the main model. To examine whether the genetic association of each of the candidate SNPs or GRSs with breast cancer is mediated through the onset of menarche or natural menopause, we then adjusted for self-reported age at menarche and age at natural menopause in the main model. Other conventional risk factors for breast cancer - including age at first live birth, family history of breast cancer in first-degree relatives, alcohol consumption, parity, menopausal hormone therapy, OC use, and adult BMI - were further included in the model to control for potential confounding in studies which had such data available. To examine whether these genetic associations differ by breast cancer subtypes, in each of the two studies that provided information on ER status, we then investigated the genetic association of each of the candidate SNPs or GRSs with breast cancer in subgroup analysis by ER histological status (positive or negative).
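The use of quintile cutoffs from one reference cohort in every study can be illustrated with a short sketch. The scores below are invented, the helper names are hypothetical, and the handling of scores that fall exactly on a cut point is an assumption; the source does not specify these details.

```python
import bisect
import statistics
from typing import List, Sequence


def quintile_cutoffs(reference_scores: Sequence[float]) -> List[float]:
    """Return the four cut points that split the reference scores into fifths."""
    return statistics.quantiles(reference_scores, n=5)


def assign_quintile(score: float, cutoffs: Sequence[float]) -> int:
    """Map a score to quintile 1..5 using cut points from the reference cohort."""
    # Assumption: a score exactly equal to a cut point goes to the upper group.
    return bisect.bisect_right(cutoffs, score) + 1


# Hypothetical count GRS values standing in for the reference cohort.
reference = [14, 16, 17, 18, 19, 19, 20, 20, 21, 21, 22, 23, 24, 25, 27]
cutoffs = quintile_cutoffs(reference)

# Scores from another (hypothetical) study are binned with the same cut points.
other_study = [15, 20, 26]
print([assign_quintile(s, cutoffs) for s in other_study])
```

Fixing the cut points in one population keeps the quintile categories comparable across studies, at the cost of unequal group sizes in studies whose score distribution differs from the reference.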

Forest plots were used to present study-specific ORs and 95% CIs. We then performed meta-analyses by using the fixed-effects model to estimate summary ORs from study-specific estimates that were weighted by the inverse of the variance of each study. As the meta-analyses restricted to prospective cohort studies or case-control studies yielded similar results, we present results from only the meta-analysis of all six participating studies. We also tested the heterogeneity of associations across studies as well as across different tumor subtypes by using the Q test [20].
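As a rough sketch of the inverse-variance fixed-effects pooling and the Q statistic described above, the following function combines study-specific log ORs and their standard errors. The function name and the input values are hypothetical and are not taken from the study data; the P value for heterogeneity would be obtained by comparing Q with a chi-square distribution on k - 1 degrees of freedom, which is omitted here to keep the example within the standard library.

```python
import math
from typing import Dict, Sequence


def fixed_effect_meta(log_ors: Sequence[float], ses: Sequence[float]) -> Dict[str, float]:
    """Inverse-variance fixed-effects summary OR with Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]           # weight = 1 / variance
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * pooled_se),
        "ci_high": math.exp(pooled + 1.96 * pooled_se),
        "q": q,
        "df": len(log_ors) - 1,
    }


# Hypothetical per-allele ORs and standard errors from three studies.
study_log_ors = [math.log(1.10), math.log(1.20), math.log(1.15)]
study_ses = [0.05, 0.10, 0.08]
print(fixed_effect_meta(study_log_ors, study_ses))
```

Studies with smaller standard errors receive larger weights, so the summary OR is pulled toward the most precise estimates.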

All statistical analyses were performed by using SAS version 9.1 software (SAS Institute Inc., Cary, NC, USA). Power calculations were carried out by using Quanto (University of Southern California, Los Angeles, CA, USA). All P values were based on two-sided tests and were considered statistically significant if less than 0.05. Because SNPs were selected on the basis of an a priori hypothesis, adjustments for multiple comparison tests were not performed.

Results

The six participating studies contributed 3,683 breast cancer cases and 34,174 controls of self-reported white women of European ancestry (Table 1), all with available data on age and the 46 candidate SNPs, and at least one of the conventional risk factors considered. Of the 3,683 cases, about 52% were from the four prospective cohort studies (WGHS, RSI+II, FHS, and ARIC), about 30% were from the nested case-control study in NHS, and about 18% were from the population-based case-control study in SardiNIA. ER status was known for 2,087 cases in the NHS and the WGHS. On average, compared with the controls, the cases had a younger age at menarche and an older age at natural menopause. The expected associations with breast cancer were generally observed for the conventional risk factors across all of the studies (Table S2 of Additional file 3). The associations of the 46 candidate SNPs with age at menarche or age at natural menopause in the six studies were consistent with the original findings from the two meta-analyses [10, 11].

Table 2 shows the risk allele frequency and the corresponding per-risk-allele OR of breast cancer for each of the 46 candidate SNPs. The results are arranged in order of the strength of statistical significance (P value). The allele frequency for each SNP in the controls was similar to those reported for populations of European descent [21–23]. After adjusting for age and potential population stratification, we found that, among the 19 candidate SNPs for a younger age at menarche, two SNPs, rs1079866 and rs7821178, were significantly associated with breast cancer risk and had corresponding per-risk-allele ORs of 1.14 (95% CI = 1.05 to 1.24; P value = 0.003; P for heterogeneity = 0.37) and 1.08 (95% CI = 1.02 to 1.15; P value = 0.009; P for heterogeneity = 0.43), respectively. The SNP rs1079866 is located about 250 kb away from the INHBA gene on chromosome 7, whereas SNP rs7821178 is about 181 kb away from the PXMP3 gene (also known as PEX2) on chromosome 8. The strongest GWAS hit for age at menarche, rs7759938 at LIN28B on chromosome 6, was not found to be associated with breast cancer risk (P value = 0.60). Of the 17 candidate SNPs associated with an older age at natural menopause, one SNP, rs2517388, was significantly associated with breast cancer risk with a per-risk-allele OR of 1.10 (95% CI = 1.01 to 1.20; P value = 0.023; P for heterogeneity = 0.08). This SNP is an intronic SNP in the ASH2L gene on chromosome 8. The study-specific and summary ORs for the three associated SNPs are shown in Figure 1. Further adjustment for conventional risk factors - including age at menarche, age at natural menopause, age at first live birth, family history of breast cancer in first-degree relatives, alcohol consumption, parity, menopausal hormone therapy, OC use, and adult BMI - did not change the results substantially. For candidate loci for age at menarche and age at natural menopause, the findings did not differ materially when we further adjusted for known breast cancer-associated SNPs.

Table 2 Association of candidate single-nucleotide polymorphism loci and the risk of breast cancer
Figure 1

Forest plots for the three candidate loci (rs1079866, rs7821178, and rs2517388) in association with breast cancer risk. Per-risk-allele odds ratios (ORs) and 95% confidence intervals (CIs) were obtained from unconditional logistic regression in each study, and age and potential population stratification were adjusted for. The size of the box is inversely proportional to the standard error of the log OR estimate. P values for heterogeneity across studies are 0.37, 0.43, and 0.08, respectively. ARIC, Atherosclerosis Risk in Communities Study; FHS, Framingham Heart Study; NHS, Nurses' Health Study; RSI+II, Rotterdam Study I, II; SardiNIA, SardiNIA Breast Cancer Study; WGHS, Women's Genome Health Study.

To evaluate the combined effect of candidate SNPs on breast cancer risk, we calculated a GRS for each trait by using either a count GRS or a weighted GRS approach. The mean values of count and weighted GRSs were 20.41 and 20.03, respectively, for age at menarche and 16.21 and 14.32, respectively, for age at natural menopause (Table 3). Based on the count GRS for a younger age at menarche, the OR for breast cancer associated with each point scored, corresponding to 1 risk allele, was 1.01 (95% CI = 1.00 to 1.03) after age and potential population stratification were adjusted for. ORs did not increase linearly across quintiles of GRS for age at menarche (P for trend = 0.06). Compared with women in the lowest quintile, women in the fourth and fifth quintiles had ORs for breast cancer of 1.14 (95% CI = 1.01 to 1.28) and 1.13 (95% CI = 1.00 to 1.27), respectively. Results were similar when analyses were performed by using the weighted GRS. Overall, we did not observe statistically significant associations between breast cancer risk and the GRS for an older age at natural menopause when either the count or the weighted GRS was used.

Table 3 Association between genetic risk score and risk of breast cancer

In secondary analyses, we then determined whether the associations of the 46 candidate SNPs with breast cancer vary across tumor subtypes defined by ER status in the NHS and the WGHS (Table 4). For the two SNPs (rs1079866 and rs7821178) that had reported associations with age at menarche and that were associated with overall breast cancer risk, we found no statistically significant evidence that the associations differed across subtypes (P for heterogeneity = 0.31 and 0.66, respectively), although rs1079866 appeared to have a stronger association with ER+ tumors (per-allele OR = 1.26; 95% CI = 1.12 to 1.41) than with ER- tumors (per-allele OR = 1.11; 95% CI = 0.89 to 1.38). Of note, one SNP that had a reported association with age at menarche, rs17188434, had a significantly stronger association with ER- tumors (per-allele OR = 1.51; 95% CI = 1.15 to 1.98) than with ER+ tumors (per-allele OR = 1.08; 95% CI = 0.92 to 1.26; P for heterogeneity = 0.035). Another SNP that had a reported association with age at menarche, rs17268785, was associated with a decreased risk of ER- tumors (per-allele OR = 0.83; 95% CI = 0.68 to 1.00) but an increased risk of ER+ tumors (per-allele OR = 1.07; 95% CI = 0.96 to 1.19; P for heterogeneity = 0.023). For the SNP that had a reported association with age at natural menopause and that was associated with overall breast cancer risk, we observed a stronger association with ER- tumors (per-allele OR = 1.32; 95% CI = 1.09 to 1.61) than ER+ tumors (per-allele OR = 1.11; 95% CI = 1.00 to 1.24); however, the test for heterogeneity was not statistically significant (P for heterogeneity = 0.12). When the count GRS for age at menarche or age at natural menopause was applied to ER+ and ER- breast cancer separately, the trend in the OR for ER+ tumors was very similar to that for overall breast cancer. The ER- tumor data suggested a somewhat different pattern, although the statistical power was limited for this subtype (Figure 2).
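To illustrate how a two-subtype heterogeneity test of this kind can be approximated from published summary estimates, the sketch below recovers log ORs and standard errors from the reported ORs and 95% CIs and compares them with a z-test (for two groups, Cochran's Q equals z squared). This is an approximation under stated assumptions and not the authors' procedure: it treats the ER+ and ER- estimates as independent, although both were estimated against the same controls, and the function names are hypothetical.

```python
import math
from typing import Tuple


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float) -> Tuple[float, float]:
    """Recover the log OR and its standard error from a reported 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return log_or, se


def two_group_heterogeneity_p(est_a: Tuple[float, float], est_b: Tuple[float, float]) -> float:
    """Two-sided P value for the difference between two log ORs."""
    (b_a, se_a), (b_b, se_b) = est_a, est_b
    # Assumption: the two estimates are independent.
    z = (b_a - b_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Reported estimates for rs17188434 (ER- versus ER+ tumors).
er_negative = log_or_and_se(1.51, 1.15, 1.98)
er_positive = log_or_and_se(1.08, 0.92, 1.26)
print(two_group_heterogeneity_p(er_negative, er_positive))
```

With the rounded values reported for rs17188434, this approximation should land close to the reported P for heterogeneity of 0.035; small discrepancies are expected from rounding and from the independence assumption.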

Table 4 Association of candidate single-nucleotide polymorphism loci and risk of breast cancer by estrogen receptor status in the Nurses' Health Study and Women's Genome Health Study
Figure 2

The associations between groups defined by quintiles of genetic risk scores (GRSs) and risk of breast cancer by estrogen receptor (ER) status in the Nurses' Health Study and the Women's Genome Health Study. (a) Count GRS for age at menarche. (b) Count GRS for age at natural menopause. (c) Count GRS for breast cancer-associated SNPs. CI, confidence interval.

Of the 10 candidate SNPs with consistently reported associations with breast cancer risk, five SNPs (rs11249433, rs3803662, rs2981582, rs13387042, and rs999737) appeared to have a stronger association with ER+ tumors than with ER- tumors, and the difference reached statistical significance for rs3803662 (P for heterogeneity = 0.008), with per-risk-allele ORs of 1.26 (95% CI = 1.16 to 1.37) and 0.98 (95% CI = 0.82 to 1.16) for ER+ and ER- tumors, respectively (Table 4). Two breast cancer candidate SNPs, rs1045485 and rs3817198, did not show statistically significant associations with overall risk. However, rs1045485 appeared to have a stronger association with ER- tumors (per-allele OR = 1.20; 95% CI = 0.95 to 1.52) than with ER+ tumors (per-allele OR = 1.06; 95% CI = 0.95 to 1.19; P for heterogeneity = 0.35), and rs3817198 was associated with a decreased risk of ER- tumors (per-allele OR = 0.88; 95% CI = 0.74 to 1.04) but an increased risk of ER+ tumors (per-allele OR = 1.06; 95% CI = 0.98 to 1.15; P for heterogeneity = 0.047).

In these analyses, we further confirmed statistically significant associations with breast cancer risk for 8 of the 10 candidate SNPs that were identified previously in published GWAS of breast cancer (most P values were less than 0.001) (Table 2). We did not observe a statistically significant association for either LSP1-rs3817198 or CASP8-rs1045485 (both with P values of 0.13) in our study, although the direction of the associations was consistent with that of previous reports [21, 22]. We also calculated, as a positive control, a count GRS based on these 10 SNPs. We found that each score point increase, corresponding to one-risk-allele increase, was significantly associated with an OR of 1.13 (95% CI = 1.11 to 1.15) for breast cancer (Table 3). Compared with women in the lowest quintile, women in the highest quintile had an OR for breast cancer of 1.89 (95% CI = 1.67 to 2.14). For this GRS, the trend in log odds was significantly steeper for ER+ than for ER- tumors (P for heterogeneity < 0.001), and the OR across quintiles was no longer monotonic in ER- tumors (Figure 2).

Discussion

In this large meta-analysis of six population-based studies, we investigated whether 19 loci linked with age at menarche and 17 loci linked with age at natural menopause were associated with breast cancer risk among up to 3,683 breast cancer cases and 34,174 controls. We found that two SNPs with reported associations with age at menarche and one SNP with a reported association with age at natural menopause were significantly associated with breast cancer risk. However, no statistically significant associations were found for GRSs that combined all 19 or 17 loci associated with each trait, although the association for age-at-menarche GRS was marginally statistically significant. We confirmed most of the candidate loci for breast cancer which were identified in previous GWAS. Some of these associations appeared to differ by tumor subtypes defined by ER status.

In our analyses, most of the candidate SNPs, including the strongest GWAS hit for age at menarche or age at natural menopause, were not found to be associated with breast cancer risk. This is not necessarily surprising given that age at menarche and age at natural menopause are relatively weak risk factors [2, 3], and all candidate SNPs collectively explain only a small portion of the variation of each trait [10, 11]. However, two candidate SNPs for age at menarche, rs1079866 and rs7821178, and one candidate SNP for age at natural menopause, rs2517388, were found to be associated with breast cancer risk. These associations were not attenuated after we further adjusted for self-reported age at menarche and age at natural menopause, suggesting these three genetic loci were associated with breast cancer risk independently of their associations with age at menarche or age at natural menopause. It is possible that these genetic loci have pleiotropic effects on reproductive timing as well as other biological processes leading to breast cancer, and the observed associations might be due largely to other biological consequences of these risk variants that do not manifest themselves as changes in age at menarche or age at natural menopause. Alternatively, it is also possible that the relatively crude assignment of these reproductive events to a single chronological year is not sufficiently accurate to capture the biological effect of these processes on breast cancer risk and the genetic variants contribute independent information on the underlying biological risk. The three candidate SNPs also contributed to breast cancer risk independently of the known susceptibility loci for breast cancer, as further adjustment for breast cancer loci did not materially alter the results.

We found statistically significant evidence of association with breast cancer for eight of the 10 breast cancer susceptibility loci examined: FGFR2-rs2981582, TNRC9-rs3803662, 1p-rs11249433, 5p-rs7716600, 2q35-rs13387042, MAP3K1-rs889312, 8q24-rs13281615, and RAD51L1-rs999737. The direction and magnitude of these associations were consistent with those of previous reports [17, 18, 22–25]. We did not observe a statistically significant association for either LSP1-rs3817198 or CASP8-rs1045485. However, these two SNPs had relatively small reported effects that our study might not have been able to detect. When the 10 candidate SNPs were combined by using a polygenic risk score, the relative risk for women in the highest quintile was about twice that in the lowest quintile, and this is in accordance with other published results [19, 26]. In this study, none of the 10 breast cancer susceptibility loci was significantly associated with age at menarche or age at natural menopause, and this is in line with a previous report [27].

Given that most of the candidate loci for age at menarche and age at natural menopause were not associated with breast cancer risk, it is not surprising that there were no statistically significant associations for the polygenic risk scores that combined all candidate loci for each trait. As a post hoc, exploratory analysis, we created a polygenic risk score by including only the three candidate loci associated with either age at menarche or age at natural menopause and with breast cancer risk and found that each risk allele increment was associated with an approximately 17% increased risk for breast cancer. Women with four or more risk alleles had an approximately 60% increased risk for breast cancer in comparison with those with two or fewer risk alleles. When we further combined the three associated SNPs with the 10 breast cancer susceptibility loci to create a polygenic risk score, each risk allele increment was associated with an approximately 18% increased risk for breast cancer. For women with 14 or more risk alleles (the highest quintile), the risk for breast cancer increased threefold in comparison with those with 10 or fewer (the lowest quintile). Because the former group constitutes approximately 20% of the study population, the GRS that combines the three candidate SNPs for age at menarche and age at natural menopause and the identified breast cancer susceptibility loci might be useful for identifying a subgroup of women with a high genetic risk for breast cancer. Further research is needed to confirm this finding.

It has been hypothesized that the risk of ER+ breast cancer is positively associated with a woman's cumulative lifetime exposure to endogenous ovarian hormones [28]. A younger age at menarche [12, 15, 29] and an older age at menopause [30] have been observed to be more consistently associated with ER+ than ER- tumors. In this report, we found that candidate loci for age at menarche and age at natural menopause may also be differentially associated with tumor subtypes defined by ER status. Of the three candidate loci that were found to be associated with overall breast cancer risk, rs1079866 was more strongly associated with ER+ tumors, rs7821178 was equally associated with both subtypes, and rs2517388 was more strongly associated with ER- tumors, although the differences were not statistically significant. Importantly, two candidate loci for age at menarche, rs17188434 and rs17268785, had significantly different associations with ER+ and ER- tumors. Neither SNP was significantly associated with overall or ER+ breast cancer; however, the former showed a statistically significant positive association with ER- tumors, whereas the latter showed a statistically significant inverse association with ER- tumors. These findings provide further support for the notion that ER+ and ER- tumors are the result of different etiologic pathways [31].

Although common genetic variants that influence the intermediate phenotypes or risk factors have been hypothesized to subsequently affect disease risk, few studies have assessed the association between these genetic variants and disease risk or, furthermore, whether these associations are mediated through the intermediate phenotypes. Chen and colleagues [32] investigated obesity-linked genetic variants in relation to breast cancer risk but found no statistically significant association. To our knowledge, ours is the first study to evaluate the associations of candidate loci for age at menarche and age at natural menopause with breast cancer risk. One of the strengths of our study is the relatively large combined sample size achieved through international collaboration. We had adequate statistical power (80%) to detect an OR of 1.12 for SNPs with a minor allele frequency (MAF) of 0.10 and an OR of 1.09 for SNPs with an MAF of 0.20. However, our analysis of ER+ tumors was less adequately powered, as the ER status was not available for all cases, and the study had limited statistical power for ER- tumors. One limitation in our study is the multiple comparisons that could lead to false-positive results. Although none of the candidate SNPs with a reported association with age at menarche or age at natural menopause survived Bonferroni correction in the test of breast cancer association, this correction is considered to be overly conservative given that the candidates were chosen on the basis of promising hypotheses. Another potential limitation of our study comes from differences in the study population and designs and methods of collecting risk factors and genetic marker data across studies. However, the findings were generally consistent across studies, arguing for the robustness of our results. Finally, as our analyses were restricted to women of European ancestry, results from this study may not be generalizable to other ethnic groups.

Conclusions

In summary, in this large analysis of the association of several novel candidate loci for age at menarche and age at natural menopause with breast cancer risk, we observed that three loci - two for age at menarche and one for age at natural menopause - were significantly associated with breast cancer risk independently of their associations with each trait and independently of known breast cancer susceptibility loci. These associations may differ by tumor subtypes defined by ER status. A combination of all 19 loci associated with age at menarche or 17 loci associated with age at natural menopause did not appear to be helpful for identifying a high-risk subgroup for breast cancer.