Introduction

Oestrogenic stimulation is central in breast carcinogenesis, and oestrogen receptor α is the most important mediator of the response to oestrogens in classical target tissues such as the breast epithelium. There are several genetic variants of the oestrogen receptor α gene (ESR1), many of which have been studied in relation to breast cancer. Linkage was first described between an intron 1 variant (c.454-351A → G) and breast cancer in a family with late-onset cases [1]. However, later case-control studies have not found any convincing evidence for an association between ESR1 variants and breast cancer risk (Table 1). There is nevertheless an established positive association between breast cancer risk and bone mineral density (BMD) [2], which in turn seems to be influenced by ESR1 variation [3–5]. Hence, a role of ESR1 polymorphism in the aetiology of breast cancer seems biologically plausible, although it might have been overlooked in earlier studies because of small sample sizes (Table 1). We have genotyped five ESR1 variants in a large, population-based case-control study: a dinucleotide (TA)n repeat polymorphism (microsatellite) in the promoter region; two single-nucleotide polymorphisms (SNPs) in intron 1 (c.454-397C → T and c.454-351A → G, also known as PvuII or IVS1-401 and XbaI or IVS1-354, respectively); and one silent SNP each in exon 3 (c.729C → T) and exon 4 (c.975C → G). We estimated the influence of ESR1 genotypes and haplotypes on breast cancer risk overall, in histopathological subgroups, and according to oestrogen-related breast cancer risk factors.

Table 1 Previously published studies about oestrogen receptor α gene polymorphisms and breast cancer

Methods

Parent study

This nationwide population-based case-control study encompassed all incident cases of primary invasive breast cancer among women 50 to 74 years of age resident in Sweden between October 1993 and March 1995, as previously described in detail [6]. Cases of breast cancer in situ were not included. Breast cancer patients were identified at diagnosis through the six Swedish regional cancer registries, to which the reporting of all malignant tumours is mandatory. All Swedish residents are assigned a unique national registration number. This number is recorded in all registries, including the Total Population Register. Provided that the appropriate permissions are granted, researchers can ask the authority in charge of the Total Population Register (currently the Tax Authority) for the national registration numbers and addresses of people who fulfil criteria specified by the researcher. Control women were randomly selected from the general population according to the expected age frequency distribution (in 5-year age groups) of cases.

Cases were asked to participate in the study by their respective physicians. When patients consented, they received a mailed questionnaire asking for detailed information about intake of menopausal hormones and oral contraceptives, weight, height, reproductive history, medical history, and other lifestyle factors. Controls were sent the questionnaire directly. Eighty-four percent of eligible cases (n = 3345) and 82% of eligible controls (n = 3454) ultimately participated in the parent study. Among the participating controls, 455 who failed to return the mailed questionnaire were interviewed by phone. Results from the parent study are available in previous publications [6–8].

Selection of present study population

We randomly selected 1500 women with invasive breast cancer and 1500 controls (frequency-matched by age) among postmenopausal participants in the parent study without any previous malignancy (except carcinoma in situ of the cervix or non-melanoma skin cancer). To increase statistical power in subgroup analyses, we additionally selected all remaining eligible cases and controls who had taken menopausal hormone treatment (either medium-potency oestrogen alone or medium-potency oestrogen in combination with progestin) for at least 4 years (191 cases and 108 controls) and all women with self-reported diabetes mellitus (110 cases and 104 controls). In total, 1801 cases and 1712 controls were selected. In addition, 345 controls from the parent study who had been selected for a parallel endometrial cancer study and fulfilled the inclusion criteria were added to our sample of breast cancer-free controls. The present study was approved by the respective Institutional Review Boards at Karolinska Institutet and Uppsala University and was performed in compliance with the Helsinki Declaration.

Collection of biological samples

We contacted all selected living women by mail, and those who gave informed consent received a blood sampling kit by mail. Whole blood samples were drawn at a primary health care facility close to the woman's home. Breast cancer cases who declined to donate a blood sample were asked to authorise our use of archived paraffin-embedded tissue taken at breast cancer surgery. We also attempted to retrieve archived tissue samples from all deceased breast cancer cases. We obtained blood samples from 1322 and archived tissue samples for 247 breast cancer patients (87% of all selected). Among the chosen control women, 1524 (74%) contributed blood samples. Reasons for non-participation included lack of interest in or scepticism about genetic research, old age and, in some instances, severe disease or death. We thus obtained final population-based participation rates of 73% in cases and 61% in controls.
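The sample sizes and participation rates above can be cross-checked with a few lines of arithmetic. This is only a sketch: the counts are copied from the text, and the overall rate is assumed to be the parent-study participation rate multiplied by the fraction of selected women who contributed a sample. Because 84% and 82% are themselves rounded, the products reproduce the reported 73% and 61% only approximately.

```python
# Arithmetic cross-check of the selection and participation figures in the text.
# Assumption: overall rate = parent-study rate x fraction of selected women sampled.

selected_cases = 1500 + 191 + 110            # random sample + long-term hormone users + diabetics
selected_controls = 1500 + 108 + 104 + 345   # ... plus controls from the endometrial cancer study

genotyped_cases = 1322 + 247                 # blood samples + archived tissue
genotyped_controls = 1524                    # blood samples only

frac_cases = genotyped_cases / selected_cases
frac_controls = genotyped_controls / selected_controls

overall_cases = 0.84 * frac_cases            # parent-study rates are rounded values
overall_controls = 0.82 * frac_controls

print(f"cases: {selected_cases} selected, {frac_cases:.0%} sampled, {overall_cases:.0%} overall")
print(f"controls: {selected_controls} selected, {frac_controls:.0%} sampled, {overall_controls:.0%} overall")
```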

We isolated DNA from 3 ml of whole blood with the Wizard Genomic DNA Purification Kit (Promega, Madison, WI) in accordance with the manufacturer's instructions. DNA was extracted from non-malignant cells in paraffin-embedded tissue with a standard phenol/chloroform protocol [9].

Genetic analyses

SNPs and microsatellite markers

We selected the ESR1 polymorphisms to be analysed from the literature [10–13]. All primers and probes for the 5' promoter TA repeat 1174 base pairs upstream of exon 1 [HGVbase STR000063453], the intron 1 SNPs c.454-397C → T (previously known as PvuII or IVS1-401, [dbSNP rs2234693]) and c.454-351A → G (previously known as XbaI or IVS1-354, [dbSNP rs9340799]), and the exonic c.729C → T (codon 243 CGC → CGT, synonymous Arg, [dbSNP rs4986934]) and c.975C → G SNPs (codon 325 CCC → CCG, synonymous Pro, [dbSNP rs1801132]) were designed on the basis of the reference sequences [GenBank AF082876], [GenBank AF326912] and [GenBank NM_000125], respectively. For the fluorescence polarisation and minisequencing assays [14], we designed minisequencing primers complementary to the sequence immediately adjacent to the SNPs. For the Molecular Beacon assay [15], we designed two fluorescently labelled allele-specific probes for each SNP, carrying the variable position in the middle of the loop region [16]. We used a web-based DNA folding program (mfold) [17] to estimate the stability of the stem and loop structure of the Molecular Beacon probes. A full description of the laboratory protocol, together with the primer and probe sequences and their modifications, is given in Additional file 1.

Minisequencing assay with fluorescence polarisation detection

The region containing the intron 1 SNPs, c.454-397C → T and c.454-351A → G, was amplified in one polymerase chain reaction (PCR) fragment from 20 ng of genomic DNA. The PCR products were treated enzymatically to remove remaining primers and nucleotides, and the minisequencing reactions were performed with the two dideoxynucleotides relevant to the particular SNP, which were fluorescently labelled and included at a 1:5 ratio relative to unlabelled dideoxynucleotides. The fluorescence signals were read on an Analyst AD™ (Molecular Devices Corporation, Sunnyvale, CA), and genotypes were assigned with the software supplied with the instrument and a custom-made Excel macro. In each run, positive controls for the three genotypes and negative controls were included. Both DNA strands were analysed and the results were concordant in all samples. About 30% of the assays were repeated, with identical results. In addition, 3% of the genotypes were validated by solid-phase minisequencing.

Molecular Beacon assay

To analyse the exonic SNPs c.729C → T and c.975C → G, 10 ng of genomic DNA was amplified with real-time PCR monitoring of fluorescence signals with an ABI Prism 7700 Sequence Detection System (Applied Biosystems). The increase in fluorescent signal was registered during the annealing step of the reaction, and the end-point signals were used to assign the genotypes as previously described [16]. In each run, positive controls for the three genotypes and negative controls were included. Both possible nucleotides of the SNP were interrogated in the same reaction. About 3% of the assays were repeated; the results were identical. Two percent of the assays were validated by solid-phase minisequencing. The Molecular Beacon assay has previously been quantitatively validated in our laboratory [16].
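One generic way to turn end-point signals from two allele-specific probes into genotype calls is to look at the fraction of the total signal contributed by each probe. The sketch below illustrates that idea only; it is not the procedure of [16], and the function name `call_genotype` and all cut-offs (`min_signal`, `low`, `high`) are hypothetical values chosen for illustration.

```python
def call_genotype(f_allele1, f_allele2, min_signal=1.0, low=0.3, high=0.7):
    """Hypothetical end-point genotype caller for a two-probe assay.

    f_allele1, f_allele2 -- background-corrected end-point fluorescence of the
                            two allele-specific probes (same reaction)
    min_signal, low, high -- illustrative cut-offs, not values from the study
    """
    total = f_allele1 + f_allele2
    if total < min_signal:
        return "no call"           # e.g. failed amplification or a negative control
    fraction = f_allele2 / total   # share of signal from the allele-2 probe
    if fraction < low:
        return "11"                # homozygous for allele 1
    if fraction > high:
        return "22"                # homozygous for allele 2
    return "12"                    # heterozygous


for signals in [(9.0, 0.5), (4.0, 5.0), (0.4, 8.0), (0.2, 0.3)]:
    print(signals, call_genotype(*signals))
```

In practice the cut-offs would be set from the positive controls for the three genotypes that are included in each run.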

Solid-phase minisequencing assay

Solid-phase minisequencing in microtitre plates [18] was used in part to genotype the c.729C → T SNP. The assay also served as a reference method for the other SNP assays.

Microsatellite assay

The TA-repeat region was amplified, with one PCR primer fluorescently labelled, using an ABI-877 Integrated Thermal Cycler PCR robot with standard reagents (Applied Biosystems) [19]. The PCR products were separated on a 96-well ABI 377 automated sequencer and analysed with the software supplied with the instrument (all Applied Biosystems). In each run, two or three negative controls were included. About 1% of the assays were repeated, with concordant results: after conversion of fragment sizes to the number of TA repeats, repeated runs of the same sample did not differ. The assay was validated by control sequencing of three different repeat lengths.
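The conversion from fragment size to repeat number can be sketched as follows, assuming that each TA unit contributes 2 bp and that the non-repeat flanking length is known from a sequenced reference allele. `FLANK_BP` and `ta_repeats` are hypothetical; the flanking length used in the study is not stated here.

```python
# Hypothetical sketch: fragment size (bp) -> number of TA repeats.
FLANK_BP = 120  # assumed combined length of non-repeat sequence between the primers


def ta_repeats(fragment_bp, flank_bp=FLANK_BP):
    """Return the number of TA units, rounding small sizing noise to the nearest repeat."""
    repeats = (fragment_bp - flank_bp) / 2  # each TA unit adds 2 bp
    if repeats < 0:
        raise ValueError("fragment is shorter than the assumed flanking sequence")
    return round(repeats)


# Two runs of the same sample with slightly different sizing agree after conversion.
print(ta_repeats(142.1), ta_repeats(141.8))
```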

Statistical methods

We tested for Hardy–Weinberg equilibrium (HWE) among cases and controls separately. We considered the (TA)n data either in their original form, as eight categories, or dichotomised into short (14 repeats or fewer) and long (more than 14 repeats) alleles, because the (TA)n lengths were bimodally distributed, with peaks at 11 and 18 repeats and a dip at 14 repeats.
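For a biallelic marker, including the (TA)n after dichotomisation at 14 repeats, an HWE test can be sketched as a Pearson goodness-of-fit test with one degree of freedom. This is an illustrative assumption: the text does not state which HWE test was used (an exact test is a common alternative), and the genotype counts in the checks are made up.

```python
import math


def hwe_chi2(n_AA, n_Aa, n_aa):
    """Pearson chi-square test of HWE at a biallelic locus; returns (statistic, P)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic in this sample")
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def dichotomised_ta_counts(genotypes, cut=14):
    """Collapse (TA)n genotypes, given as pairs of repeat counts, into counts of
    short/short, short/long and long/long, where 'short' means <= cut repeats."""
    counts = [0, 0, 0]
    for a, b in genotypes:
        counts[(a > cut) + (b > cut)] += 1  # number of long alleles: 0, 1 or 2
    return tuple(counts)
```

A heterozygote deficit such as `hwe_chi2(40, 20, 40)` gives a statistic of 36 and a very small P value, whereas counts in exact proportion, such as `hwe_chi2(25, 50, 25)`, give 0 and P = 1.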

We also estimated all pairwise linkage disequilibrium values |D'| and r² [20, 21]. Although there are disadvantages associated with r² compared with |D'| for linkage disequilibrium mapping [22, 23], r² is arguably the most relevant measure for association analysis, because there is a simple inverse relationship between r² and the sample size required to detect association between susceptibility loci and SNPs [21].
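Both measures can be computed from one two-locus haplotype frequency and the two allele frequencies. The sketch below uses the standard definitions (|D| divided by its maximum attainable value for |D'|; D² divided by the product of the allele-frequency variances for r²). The example frequencies are illustrative, chosen to show that |D'| can equal 1 while r² is small. The last function expresses the inverse relationship with sample size as the usual approximation N_marker ≈ N_causal / r²; its name is hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """|D'| and r² for two biallelic loci.

    p_ab     -- frequency of the haplotype carrying allele a at locus 1 and b at locus 2
    p_a, p_b -- frequencies of alleles a and b (strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


def required_n_at_marker(n_at_causal_locus, r2):
    """Approximate sample size needed when a marker is tested instead of the causal locus."""
    return n_at_causal_locus / r2


# Rare allele b always sits on an a background: no recombination, weak allelic association.
d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
print(round(d_prime, 3), round(r2, 3), round(required_n_at_marker(1000, r2)))
```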

For association analyses based on single-locus genotypes, we used conditional logistic regression [24] to calculate odds ratios (ORs) and 95% confidence intervals (CIs). We conditioned on the variables used for selection, namely age in 5-year categories, long-term use of menopausal hormones, and diabetes mellitus (see the section on Selection of present study population above). We evaluated possible associations between ESR1 polymorphisms and other exposures/covariates by scrutinising 2 × k tables and calculating χ2 statistics. Continuous variables were categorised for this purpose. Where there seemed to be plausible evidence of association, the exposure was considered either a potential confounder or a factor in the causal pathway between ESR1 and breast cancer, and was tested as such by introducing it into the logistic regression model.
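With a single binary exposure and strata defined by the selection variables, the Mantel–Haenszel odds ratio is a simple, closely related alternative to conditional logistic regression: both condition on stratum. The sketch below illustrates only that stratified estimate; it is not the model the authors fitted, and the stratum counts are invented.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over strata.

    Each stratum is (a, b, c, d) = (exposed cases, unexposed cases,
    exposed controls, unexposed controls).
    """
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def crude_or(strata):
    """Odds ratio after collapsing all strata into one 2 x 2 table."""
    a, b, c, d = (sum(column) for column in zip(*strata))
    return a * d / (b * c)


# Hypothetical age strata: exposure and case status are both more common in the
# older stratum, so the crude OR is inflated while each stratum has OR = 2.
strata = [(10, 40, 10, 80), (40, 20, 20, 20)]
print(round(crude_or(strata), 2), round(mantel_haenszel_or(strata), 2))
```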

We tested for disease–haplotype association by using likelihood-based approaches. We used the software EH plus [25] as well as routines written in the S-PLUS (Insightful) programming language. Most available software for testing association between haplotypes and case/control status (including EH plus) assumes multiplicative penetrance; that is, the OR comparing two haplotype copies against none is assumed to be the square of the OR comparing one copy against none. Under this assumption, together with the assumptions of HWE in the population and of a rare disease, it can be shown that cases, as well as controls, will be in HWE. It is therefore possible to perform a likelihood ratio test by comparing a model that infers haplotype phase for cases and controls separately (both under HWE, with different haplotype frequencies) with a model that infers haplotype phase for cases and controls jointly (under HWE, with identical haplotype frequencies). For the most part we have used this approach (using EH plus). We have also written our own program to estimate haplotype–case/control status association. The method in this program is in essence identical to that used by Stram and colleagues [26]. It makes use of sampling fractions that are assumed to be known (in practice estimated from population register data; our program requires as input the ratio of sampling fractions between cases and controls). With this ascertainment information we can in principle estimate any model (we are not restricted to particular penetrance assumptions). For our program we consider the likelihood pr(y|x, s = 1), where y is case/control status, x represents covariate information and s is an indicator variable for whether a subject is selected into the sample. We essentially adapt the approach of Neuhaus [27] to the situation in which there is missing covariate information, in this case haplotype phase.
The program is able to estimate models other than multiplicative penetrance and enables adjustment for other covariates. Given that the present study is based on large age strata we wished to adjust for this by including age group as a covariate. A likelihood ratio test did not give evidence against a model of multiplicative penetrance (we compared the goodness of fit of the multiplicative penetrance model to the goodness of fit of a model that specifies an individual risk for each unique haplotype pair). Our program also represents a convenient framework within which to estimate haplotype–environment interaction. Results reported here for models involving haplotypes are based on our own program.
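The separate-versus-joint likelihood ratio test described above can be sketched for two biallelic SNPs. Haplotype frequencies are estimated by EM in cases, in controls, and in the pooled sample, and twice the log-likelihood difference is referred to a chi-square distribution with 3 degrees of freedom (four haplotype frequencies that sum to one). This is a minimal illustration of the logic under the HWE and multiplicative-penetrance assumptions, not the EH plus or S-PLUS code. All function names are hypothetical, the EM runs a fixed number of iterations instead of checking convergence, and the sampling-fraction likelihood of the authors' own program is not implemented.

```python
import math

# Two biallelic SNPs; alleles coded 0/1, haplotype index h = 2 * allele_snp1 + allele_snp2,
# so h = 0, 1, 2, 3 stands for haplotypes 00, 01, 10, 11.
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # copies of allele 1 -> pair of alleles


def hap_pairs(g1, g2):
    """Unordered haplotype pairs compatible with a two-locus genotype (g1, g2)."""
    if g1 == 1 and g2 == 1:
        return [(0, 3), (1, 2)]                  # double heterozygote: phase unknown
    a, b = ALLELES[g1], ALLELES[g2]
    return [(2 * a[0] + b[0], 2 * a[1] + b[1])]  # at most one heterozygous locus: phase known


def pair_prob(f, i, j):
    """Probability of the unordered haplotype pair (i, j) under HWE."""
    return f[i] * f[j] * (1 if i == j else 2)


def em_haplotypes(counts, n_iter=200):
    """EM estimate of the four haplotype frequencies from {(g1, g2): n} counts."""
    f = [0.25] * 4
    n_total = sum(counts.values())
    for _ in range(n_iter):  # fixed iteration count for simplicity
        expected = [0.0] * 4
        for (g1, g2), n in counts.items():
            pairs = hap_pairs(g1, g2)
            probs = [pair_prob(f, i, j) for i, j in pairs]
            total = sum(probs)
            for (i, j), pr in zip(pairs, probs):
                w = n * pr / total if total > 0 else n / len(pairs)  # E-step
                expected[i] += w
                expected[j] += w
        f = [e / (2 * n_total) for e in expected]  # M-step
    return f


def log_likelihood(counts, f):
    return sum(n * math.log(sum(pair_prob(f, i, j) for i, j in hap_pairs(g1, g2)))
               for (g1, g2), n in counts.items())


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def haplotype_lrt(case_counts, control_counts):
    """LRT of separate versus common haplotype frequencies in cases and controls."""
    pooled = dict(control_counts)
    for g, n in case_counts.items():
        pooled[g] = pooled.get(g, 0) + n
    ll_separate = (log_likelihood(case_counts, em_haplotypes(case_counts))
                   + log_likelihood(control_counts, em_haplotypes(control_counts)))
    ll_joint = log_likelihood(pooled, em_haplotypes(pooled))
    stat = max(0.0, 2 * (ll_separate - ll_joint))  # clip tiny negative rounding error
    return stat, chi2_sf_3df(stat)
```

Genotype counts are keyed by the number of copies of allele 1 at each SNP, e.g. `{(0, 0): 40, (1, 1): 20, (2, 2): 15}`; only the double heterozygote `(1, 1)` has ambiguous phase, and that is the quantity the EM step resolves.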

We fit a variety of models of association between ESR1 variants and breast cancer risk. We chose to fit models separately for lobular and ductal cases because there is an indication in the literature that these two histotypes have partly different aetiologies. For single-locus genotype effects we opted to include a parameter for each genotype (AA, Aa, aa) rather than test a battery of models with a specified penetrance (dominant, recessive, multiplicative penetrance/allele counting), which have one degree of freedom less. For haplotype effects we fitted multiplicative penetrance models (see above).
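The difference in degrees of freedom comes from how a genotype is coded as covariates. A minimal sketch, with a hypothetical helper name: the genotypic coding uses two indicator columns (2 df), whereas dominant, recessive and allele-counting codings each use a single column (1 df). On the log-odds scale, the allele-counting column corresponds to multiplicative penetrance.

```python
def genotype_codings(n_variant):
    """Covariate columns for a genotype with 0 (AA), 1 (Aa) or 2 (aa) variant alleles."""
    if n_variant not in (0, 1, 2):
        raise ValueError("genotype must carry 0, 1 or 2 variant alleles")
    return {
        "genotypic (2 df)": (int(n_variant == 1), int(n_variant == 2)),  # Aa and aa vs AA
        "additive (1 df)": (n_variant,),            # allele counting: multiplicative penetrance
        "dominant (1 df)": (int(n_variant >= 1),),
        "recessive (1 df)": (int(n_variant == 2),),
    }


for g in (0, 1, 2):
    print(g, genotype_codings(g))
```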

Because we fit several models, we also adjust for multiplicity. We use a permutation-based approach that controls the family-wise error rate (the probability of rejecting one or more true null hypotheses of no association). This is based on the permutation step-down procedure of Westfall and Young [28] and takes into account the dependence structure of the polymorphisms/hypotheses. The permutation approach to multiple testing is computationally demanding, particularly when haplotype phase has to be inferred, and for this reason we adjust only the P values from the tests of ductal cases versus controls. This is reasonable because the power of the individual tests of lobular cases versus controls is low; unlike Bonferroni procedures, for the permutation approach that we use, a completely uninformative test (power ≈ 0) does not affect the other tests at all, because it will never affect the distribution of the minimum P-value statistic in the lower α tail. When adjusting for multiplicity, we additionally exclude the TA repeat, because of its deviation from HWE (see below), and c.729C → T, because of its low rare-allele frequency. We therefore apply a multiplicity adjustment to P values from tests of seven dependent hypotheses.
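Once permuted P values are available, the step-down minP adjustment itself is short. Generating those P values, by re-running every test (including haplotype phase inference) on data with shuffled case/control labels, is the computationally expensive part and is omitted here. The function name and the tiny example inputs are hypothetical; some implementations also use (count + 1)/(B + 1) instead of count/B.

```python
def westfall_young_minp(p_obs, p_perm):
    """Step-down minP adjusted P values (after Westfall and Young).

    p_obs  -- observed P values for m hypotheses
    p_perm -- B rows of m P values, each row computed on one permutation of the
              case/control labels (all m tests on the same permuted data set)
    """
    m = len(p_obs)
    order = sorted(range(m), key=lambda j: p_obs[j])  # most significant first
    exceed = [0] * m
    for row in p_perm:
        # successive minima over the hypotheses at or below each rank
        q = float("inf")
        q_sorted = [0.0] * m
        for k in range(m - 1, -1, -1):
            q = min(q, row[order[k]])
            q_sorted[k] = q
        for k in range(m):
            if q_sorted[k] <= p_obs[order[k]]:
                exceed[k] += 1
    adjusted = [0.0] * m
    running = 0.0
    for k in range(m):
        running = max(running, exceed[k] / len(p_perm))  # enforce monotonicity
        adjusted[order[k]] = running
    return adjusted


print(westfall_young_minp([0.01, 0.2], [[0.5, 0.005], [0.3, 0.4], [0.02, 0.15], [0.6, 0.7]]))
```

Because each permutation row keeps all tests together, the dependence between hypotheses is carried into the distribution of the minimum P value without being modelled explicitly.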

Results

General

We were able to genotype the SNPs in more than 99% of the biological samples. We obtained complete information on all five markers from 1512 cases and 1511 controls.

For those participating by means of a blood sample or a normal tissue sample, the mean time between breast cancer diagnosis and enrolment in the present study was 5.1 years (range 3.5–6.8 years) and 5.9 years (range 4.6–7.5 years), respectively. Breast cancer stage was more advanced in those who participated by means of normal tissue samples: 58% of tissue samples were from cases with stage 2 or more advanced disease, whereas the corresponding proportion was 38% in cases who donated blood (P < 0.0001). The mean age of those who donated a blood sample was similar in cases and controls, whereas those for whom we used tissue were on average 1.6 years older. Non-participants were on average slightly older than genotyped participants. There were no notable differences or trends in genotype frequencies over quartiles of time between breast cancer diagnosis and blood donation (data not shown), nor were there any differences in genotype frequencies between those who donated a blood sample and those for whom we used normal tissue (data not shown). Further, there were no significant differences in genotype frequencies across breast cancer stages at diagnosis. As expected, mean age at first birth and mean number of births reflected known case-control differences but were largely the same in participants and non-participants. The allele and genotype frequencies in our study population were similar to those previously found in Caucasian populations [11, 29, 30].

Among controls, we did not find any convincing associations between any of the studied ESR1 polymorphisms singly and height, body mass index (BMI), smoking, diabetes mellitus, age at menarche, age at first full-term birth, parity, age at menopause, menopausal hormone use, weight gain during adult life, alcohol intake, history of benign breast disease, first-degree family history, or use of oral contraceptives (data not shown).

Association with breast cancer

Single loci

There were no compelling overall relations between any of the studied polymorphisms singly and breast cancer risk (Table 2). If anything, there seemed to be a slight negative association between ductal cancer risk and homozygosity for either of the two intron 1 SNPs or heterozygosity for the c.975C → G SNP, but only one estimate was significant. The association patterns between single-locus genotype and lobular cancer risk differed somewhat from those for ductal cancer risk, but the lack of power precluded a conclusive comparison. In secondary analyses, we tested different cut-points for the (TA)n but found that no choice of cut-point would have resulted in a significant association between this marker and breast cancer risk.

Table 2 Odds ratios for ductal and lobular breast cancer in relation to single locus genotype

Haplotypes

Haplotypes describe the genetic make-up more thoroughly than single SNPs. Our further analyses of the association between ESR1 polymorphism and breast cancer development, overall and stratified according to hormonal factors, are therefore confined to the influence of different haplotypes.

There is no evidence of extensive historical recombination between the typed SNPs (pairwise |D'| values 0.998, 0.872 and 0.848; Table 3). However, the r² values indicate that the allelic association is not strong.

Table 3 Pairwise linkage between polymorphisms in the oestrogen receptor α

Table 4 shows the estimated haplotype frequencies based on the c.454-397C → T, c.454-351A → G, c.729C → T and c.975C → G SNPs. Of the controls, 65% carried either of the two most common haplotypes. Of the 16 possible haplotypes, only 6 were represented among more than 98% of the women in our study. Although the genotype frequencies of the four SNPs were in accordance with HWE, those of the (TA)n were not, whether all alleles were considered or the repeat lengths were categorised (P < 0.0001). We therefore excluded the (TA)n marker from haplotype reconstruction because it did not fulfil the necessary assumptions. Furthermore, the c.729C → T marker was excluded from further haplotype analyses, because the very low frequency of its rare allele would not allow meaningful inference.

Table 4 Distribution of ESR1 four-locus haplotype frequencies as estimated through expectation-maximisation algorithms

In Table 5 we present P values, unadjusted and adjusted for multiple comparisons, for the main effects tested, beginning with single-locus associations. Next we compared the frequencies of haplotypes for three pairs of loci between cancer cases and controls (Table 5). We found an association between ESR1 and ductal, but not lobular, cancer risk in haplotype analyses based on the c.975C → G marker in combination with either c.454-397C → T or c.454-351A → G (Table 5). Likelihood ratio tests evaluating models in which each haplotype carries its own risk (that is, a variable with four categories) were statistically significant (P = 0.019 and 0.022, respectively; df = 3). The corresponding P values adjusted for multiple comparisons were 0.07 and 0.08, respectively.

Table 5 P values from single genotype and haplotype association tests

The three-SNP haplotype composed of the c.454-397C → T, c.454-351A → G and c.975C → G SNPs was not more strongly associated with breast cancer risk than any of the two-SNP haplotypes based on the c.975C → G marker in combination with either c.454-397C → T or c.454-351A → G (Table 5).

Table 6 shows the age-adjusted relative risks, overall and stratified according to breast cancer risk factors, for ductal breast cancer in relation to the haplotypes formed by the c.975C → G SNP together with c.454-351A → G or with c.454-397C → T (the AC and TC haplotypes, respectively). Under a model of multiplicative penetrance, the OR for carrying two copies of the haplotype compared with none is the square of the OR for carrying one copy compared with none. To save space we present only the latter estimates. The OR for carrying one copy of the AC haplotype was 1.19 (95% CI 1.06–1.33) compared with carrying no copy. The association was confined to overweight women (BMI > 25) and seemed more pronounced among those with a BMI of more than 28 kg/m². There was a similar pattern of risk for the TC haplotype. Further stratification revealed an even stronger relation: among women with BMI > 30, the OR for one AC copy compared with none was 1.48 (95% CI 1.08–2.03), and among those with BMI > 32 it was 1.60 (95% CI 1.02–2.49). The corresponding ORs for the TC haplotype were 1.59 (95% CI 1.15–2.21) for BMI > 30 and 1.71 (95% CI 1.08–2.70) for BMI > 32. The effect of the AC or TC haplotype was more pronounced in uniparous women, but there was no trend over number of pregnancies. There were no indications of interaction between the AC or TC haplotype and menopausal hormone use, family history, years of menstruation or age at menopause.
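Because the two-copy OR is a fixed function of the one-copy OR under multiplicative penetrance, both the point estimate and the confidence limits can be transformed by squaring (the log OR, and hence its interval on the log scale, is simply doubled). The sketch below applies this to the one-copy AC estimate of 1.19 (95% CI 1.06–1.33); the resulting values are implied by the model, not estimates reported in Table 6, and the function name is hypothetical.

```python
def two_copy_or(or_one, lo, hi):
    """Under multiplicative penetrance, log OR(2 vs 0) = 2 * log OR(1 vs 0),
    so the point estimate and both confidence limits are squared."""
    return or_one ** 2, lo ** 2, hi ** 2


est, lo, hi = two_copy_or(1.19, 1.06, 1.33)  # AC haplotype, ductal cancer, one copy vs none
print(f"OR(2 vs 0 copies) = {est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```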

Table 6 Ductal breast cancer risk in relation to high-risk haplotypes, stratified by breast cancer risk factors

The AC haplotype seemed to be more common in controls with low BMI (Table 6). A test of association between BMI in two categories (less than 28 and 28 or more) and AC haplotype versus the other three possible haplotypes yielded P = 0.09. There was no association between BMI and TC haplotype (P = 0.48). In a model with BMI in two categories the P value for interaction between haplotype and BMI was 0.12 for the AC and 0.047 for the TC haplotype.

Discussion

Our results indicate that natural allelic variation in ESR1 might be associated with postmenopausal ductal breast cancer risk; common haplotypes, composed of weakly linked markers, were in our data associated with increased breast cancer risk. The associations seemed the most pronounced in groups with high BMI.

Our study has strengths in that it is population-based, large, and set in a comparatively homogeneous population with regard to ethnicity and menopausal status. In addition, we were able, to some extent, to evaluate potential interaction with other breast cancer risk factors.

To our knowledge, no previous study has considered the influence of three-marker haplotypes in ESR1 on breast cancer risk. Two of the loci studied are located in intron 1 and are thus non-coding, and the third is a base-pair exchange in the third codon position that does not alter the resulting amino acid. The polymorphisms could theoretically have direct regulatory roles (see below) but can also be regarded as markers, potentially in linkage disequilibrium with a functional locus or loci.

If the underlying model for the disease is that combinations of SNP alleles are causally important, it is essential to define haplotypes and to use them as the unit of exposure in the analysis. A possible explanation for the observed association, apart from its being a chance finding, is that there is such a functional combination of SNP alleles; that is, that the two markers (or loci linked to the markers) in combination alter the function of the gene, such as RNA stability or the translation machinery [31]. Functional combinations of SNP alleles composed of polymorphisms in non-coding regions have been shown to exist [32].

If in contrast it seems likely that there is only one disease locus, haplotypes are useful in that they can help to make the study of a gene more efficient through reducing the number of SNPs that need to be typed in the entire study population. This can be accomplished by making use of the linkage disequilibrium between markers in a gene and selecting for subsequent typing the SNPs that best capture the haplotype structure of a gene, the so-called haplotype tagging SNPs.
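A common simplified way to choose tagging SNPs is greedy selection on pairwise r²: repeatedly pick the SNP that captures (r² at or above a threshold) the largest number of SNPs not yet captured. This is a sketch of that pairwise variant, which is related to, but simpler than, haplotype-based tagging; the threshold of 0.8, the function name and the SNP names are illustrative assumptions.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedy pairwise-r² tag selection.

    snps -- list of SNP names
    r2   -- dict mapping frozenset({snp_a, snp_b}) to pairwise r²
    """
    def captures(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # the SNP capturing the most untagged SNPs (ties broken by name)
        best = max(snps, key=lambda s: (sum(captures(s, t) for t in untagged), s))
        tags.append(best)
        untagged -= {t for t in untagged if captures(best, t)}
    return tags


r2 = {frozenset(("snp1", "snp2")): 0.9,
      frozenset(("snp2", "snp3")): 0.85,
      frozenset(("snp1", "snp3")): 0.5}
print(greedy_tag_snps(["snp1", "snp2", "snp3", "snp4"], r2))
```

Here snp2 captures snp1 and snp3, and snp4, which is not in strong linkage disequilibrium with any other SNP, has to be typed on its own.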

Our four SNP markers and one dinucleotide repeat were not specifically selected to define ESR1 haplotypes. Rather, they were chosen because they were known to be polymorphic when the study was planned. Clearly, the ability to capture the gene's haplotype diversity is limited by the use of only four markers, only three of which have an appreciable rare-allele frequency. Our design is not optimal in the light of current knowledge. A more systematic strategy would be to identify and validate a comprehensive set of SNPs and subsequently choose a smaller set specifically to tag the important haplotypes in the gene. Our intention here was merely to make the best possible use of data that were already available.

A weakness in our study is that, despite its size, we had only a limited potential to investigate possible interactions between haplotype and other breast cancer risk factors. Because it is plausible that moderate genetic effects are manifested only in the presence of other exposures, it is highly desirable to be able to perform analyses of interaction.

The error of haplotype frequency estimation by the expectation-maximisation (EM) algorithm has been shown to be low under various conditions with regard to, for example, heterozygosity, haplotype frequency distributions, and linkage disequilibrium [33]. Haplotype frequency estimation with EM algorithms assumes that the single-marker genotype frequencies are in HWE. This assumption is easily testable. In this study we could not establish HWE for the (TA)n among controls; this polymorphism was therefore not included in haplotype reconstruction. The reason for the deviation of the (TA)n from HWE is unclear. If, contrary to our belief, the deviation were due to genotyping error, and if that error were random with regard to case-control status, any association between the (TA)n and breast cancer would be underestimated.

Another supposition used for our disease–haplotype association estimation is a model of multiplicative penetrance. A multiplicative rather than a dominant model is plausible for common genetic variants, because their effects are thought to be modest. If instead a recessive model were correct, our estimates would be conservative. In our data, a model of multiplicative penetrance could not be rejected.

Selection bias is a potential problem in case-control studies, and our participation rates, calculated with those eligible for the parent study in the denominator (73% and 61% in cases and controls, respectively), could lead to spurious findings, but only if participation were related to genotype or haplotype. Survivor bias might be a concern in our study because death and severe disease were reasons for non-participation. If a genetic variant is associated with severe but not with less severe breast cancer, and if the more severe cases are less likely to participate because they have died, any association with breast cancer overall would be biased towards the null. This can also be viewed as an issue of generalisability: our findings would not pertain to more aggressive breast cancer. However, in our data, genotype frequencies did not vary according to time between diagnosis and enrolment, nor between cases who donated blood and those for whom we used archived tissue (the latter mainly representing deceased cases). Neither were genotype associations appreciably different across stages at breast cancer diagnosis. We therefore conclude that survivor bias is not a major problem in our study.

Some observations make functional consequences of some of the five studied ESR1 polymorphisms plausible. The (TA)n microsatellite is located in the promoter region, and, as directly exemplified by Enattah and colleagues [34] in a study of lactose intolerance, promoter variants can severely affect gene function. Herrington and colleagues [35] found that the IVS1-401 C allele contains a potential binding site for Myb transcription factors and showed that this can augment transcription up to 10-fold. In general, our knowledge about the functional significance of non-coding DNA sequences is incomplete, but there is evidence that upstream, downstream and intronic sequences have important regulatory roles [36].

There are some reports that suggest connections between ESR1 variants and breast cancer risk factors. Ushiroyama and colleagues found that Japanese women with the c.454-397C → T CT genotype had higher plasma and serum oestradiol levels than those with the TT genotype [37]. In a Dutch population, the c.454-397C → T TT genotype was associated with an earlier onset of menopause, which would entail a decreased risk for breast cancer [38]. In contrast, Deng and colleagues [39] found the c.454-397C → T TT genotype to be associated with higher BMI and with a tendency to gain weight with age, characteristics previously shown to be associated with an increased risk for breast cancer. These associations between the c.454-397C → T variant and onset of menopause, BMI, and weight gain were not corroborated by our data. However, we did find a tentative association between BMI and a haplotype composed of the c.454-351A → G SNP and the c.975C → G SNP. In a previous study of endometrial cancer, a disease closely related to oestrogen exposure, we found that the c.454-397C → T CC and c.454-351A → G GG genotypes were associated with a decreased risk for endometrial cancer [40].

The previous literature on ESR1 polymorphism in relation to breast cancer risk is inconclusive (Table 1). In contrast to our findings, individual studies have shown an increased risk with the c.454-351A → G G allele [41], a decreased risk with the c.454-351A → G A allele [42], and an increased risk with the c.975C → G G allele [43, 44]. In a recent large Chinese study [45], the c.454-397C → T C allele was associated with an increased breast cancer risk, which is contrary to the tendencies in our data for ductal cancers but in line with those for lobular cancers. Histotypes were not reported in the Chinese study, but the majority were most probably ductal cancers. The remaining investigators did not find any association with breast cancer risk [11, 13, 30, 46–50].

BMI is the most important determinant of oestrogen levels in postmenopausal women. Our finding that the association grew stronger among women with progressively higher BMI is biologically plausible and would, if real, indicate that ESR1 variation is more influential in the presence of higher oestrogen levels. Against this hypothesis, however, we saw no corresponding modifying influence of the duration of menopausal hormone treatment.

The association between BMD and ESR1 polymorphism is by far the most extensively investigated, in particular with regard to the intron 1 SNPs. Any ESR1 variant that seems to increase bone mass or to decrease the risk for osteoporosis and fractures would be expected to increase breast cancer risk, because it might indicate a greater influence of oestrogen. However, the results of studies of BMD and ESR1 polymorphism are conflicting. Among the largest studies, some have shown, in line with the suggestions in our data, that the c.454-397C → T T allele or TT genotype confers the highest BMD, that is, the strongest oestrogenic influence [3, 51], whereas others found the highest BMD in those with the C allele or CC genotype [52]. Similarly, for the c.454-351A → G locus, some studies, consistent with our findings, found the A variant to be associated with higher BMD [3, 51, 53], whereas others found higher BMD with the G variant [54].

The search for disease-causing genes has intensified markedly over the years, yet these efforts have yielded few clear results. It is likely that the genetic alterations we are looking for are of low penetrance and therefore require co-action or interaction with other exposures to produce strong associations. Another plausible scenario is that breast cancers arise through several genetically distinct causal mechanisms; such genetic heterogeneity would blur the gene–disease associations under study. Differences in population history might also help explain previous divergent results in different populations: the causative variant might be linked to one marker allele in one population but to the other allele in another population. In addition, many studies have lacked sufficient power to capture such biological complexity, either because of small sample size or, as some have argued, because of inappropriate study design [55].

When initiating this study we had a strong belief that ESR1 variation might be involved in breast carcinogenesis: there is undisputed evidence of a causal role for oestrogens in breast carcinogenesis, and oestrogenic action is mediated through oestrogen receptor α. Furthermore, existing data indicated that certain ESR1 variants are associated with oestrogenic action. Wacholder and colleagues [56] suggest using prior belief in a genetic variant when determining whether a statistically significant finding is noteworthy. They recommend calculating the false positive report probability (FPRP) from the power of the study to detect the particular finding, the P value for the association, and the prior probability. Our prior estimate of the probability of an association between ESR1 and breast cancer was at least 10%. In our sample, the power to detect an odds ratio of 1.19 for the AC haplotype (the most common haplotype) compared with the next most common haplotype was 0.79 at an α level of 0.05. P for the association was 0.02. With our own prior of 0.1, the FPRP was 0.18, which makes our finding noteworthy. However, many previous studies have failed to establish a role for ESR1 in breast carcinogenesis, which lowers the prior. Yet, because most previous studies were small and because no other study has considered haplotypes spanning the SNPs in intron 1 and exon 4, the prior would not be lowered much. Nevertheless, our design might not have captured the full haplotype diversity of ESR1, so the prior belief in haplotypes based on the few markers we chose is clearly lower. If, say, the prior is 0.01, the FPRP rises to a substantial 0.71.
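The FPRP calculation can be made explicit. In Wacholder and colleagues' formulation, with prior probability π of a true association, significance level α and power 1 − β, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π]. The sketch below assumes, as a simplification, that the observed P value (0.02) stands in for α and that the reported power of 0.79 is used unchanged; the function name `fprp` is illustrative only, not taken from any published software.

```python
def fprp(alpha: float, power: float, prior: float) -> float:
    """False positive report probability for a significant finding.

    alpha: significance level; here the observed P value is plugged in (assumption)
    power: probability of detecting the assumed effect (1 - beta)
    prior: prior probability that the association is real (pi)
    """
    if not (0 < alpha < 1 and 0 < power <= 1 and 0 < prior < 1):
        raise ValueError("alpha, power and prior must be probabilities")
    false_positive = alpha * (1.0 - prior)  # null true, yet the test is significant
    true_positive = power * prior           # association real and detected
    return false_positive / (false_positive + true_positive)


if __name__ == "__main__":
    # Inputs as reported in the text: P = 0.02, power = 0.79.
    for prior in (0.1, 0.05, 0.01):
        print(f"prior={prior:<5} FPRP={fprp(0.02, 0.79, prior):.3f}")
```

With these rounded inputs the formula gives roughly 0.19 for a prior of 0.1 and 0.71 for a prior of 0.01; the small difference from the reported 0.18 may reflect rounding of the P value. The sweep over priors shows how sharply the FPRP depends on prior belief when power and P are held fixed.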

Conclusions

We found suggestions of an association between ESR1 haplotypes and the risk for postmenopausal ductal breast cancer of mild to moderate severity, although this may be a false positive finding. This association appeared stronger among women with higher BMI. If these haplotypes truly confer an increased breast cancer risk, their high population prevalence means they could play a substantial role in breast cancer aetiology overall.