Introduction

The estrogen receptor beta (ESR2) is one of the two nuclear receptors that mediate most of the actions of estrogen. ESR2, the gene that encodes the ESR2 protein, has many polymorphisms that have been associated with a wide variety of phenotypes, such as hot flashes [1], serum lipids [2], endometrial cancer [3], cardiovascular diseases [4], Alzheimer's disease [5], and breast cancer [6]. Despite the many clinical association studies, it is still not clear which of the identified single-nucleotide polymorphisms (SNPs) are functional. Although the effect sizes of the ESR2 SNPs in clinical association studies have been relatively small, this does not necessarily mean that the effects of ESR2 genetic variation are small. One recent explanation for the relatively small effect sizes of many of the common SNPs observed in genome-wide association studies (GWAS) is the phenomenon of “synthetic associations” [7, 8]. Such an apparently small effect size is observed when multiple rare SNPs with large functional effects load disproportionately onto haplotypes that are imperfectly tagged by the more common SNPs. Since no common variant in the ESR2 gene appears to have an obvious, recognizable function, we hypothesized that there are multiple rare SNPs in the ESR2 gene that have larger effects. Because of their low minor allele frequencies, testing the rare SNPs individually would require very large populations, which is impractical for the phenotypes that we measure because they require intensive prospective phenotype measurements [1, 2]. Therefore, we have taken the approach of functionally characterizing the rare SNPs in the ESR2 gene so that they can be evaluated together in related clinical studies, using an approach similar to the scoring system for polymorphisms in the drug-metabolizing enzyme genes [9].

To address this hypothesis, we have resequenced a section of the ESR2 promoter and identified rare SNPs that are predicted to impact promoter function. We have also validated the bioinformatic prediction using in vitro functional promoter assays. Our studies have identified a SNP in the ESR2 promoter TATA box that substantially reduces the promoter activity.

Materials and Methods

DNA Sequencing

We resequenced 1.6 kb of the ESR2 gene. The target region included exon ON and the flanking regions (nucleotide 45760236–45761880 from accession no. NT_026437). The DNA was amplified using three PCR reactions. The primers were designed using Primer3 software and obtained from Integrated DNA Technologies, Inc. (Coralville, Iowa, USA). The sequences of the primers are listed in Supplementary Table 1. The target region was amplified in 50 African American DNA samples from the Coriell diversity panel and in 50 samples from the COBRA tamoxifen breast cancer clinical trial (ClinicalTrials.gov Identifier no. NCT00228930). The subjects in this tamoxifen trial were mostly Caucasians. Each PCR reaction contained 50 ng of genomic DNA, 200 nM of primers, and 5% DMSO, and the final volume was made up to 50 μl with Accuprime Pfx Supermix (Invitrogen Corporation, Carlsbad, CA). The DNA was amplified on a GeneAmp PCR System 9700 (Applied Biosystems, Foster City, CA) using the following cycle: (a) an initial denaturation step of 95°C for 5 min; (b) 40 cycles of 95°C for 15 s, 52°C (Amplicon 2 and 3)/57°C (Amplicon 1) for 30 s, and 68°C for 1 min; and (c) a final primer extension step of 68°C for 5 min. The size and purity of the amplicons were confirmed by agarose gel electrophoresis. The amplicons were purified using Microcon centrifugal filter devices (Millipore Corporation, Billerica, MA) and sequenced at the Biochemistry Biotechnology Facility (Indiana University, Indianapolis, IN) using the Big Dye® Terminator v3.1 cycle sequencing kit (Applied Biosystems, Inc., Foster City, CA) on the ABI 3100 Genetic Analyzer (Applied Biosystems, Foster City, CA). The ABI sequencing trace files were assembled and analyzed using the VectorNTI software (Invitrogen Corporation, Carlsbad, CA). The chromatogram for each subject was visually scanned for the presence of SNPs. The haplotypes were estimated from the DNA sequencing data using the PHASE software [10, 11].

Bioinformatic Analysis of the Polymorphic Regions of the ESR2 Gene

Two bioinformatic analyses of the polymorphic regions of the ESR2 gene were conducted to prioritize the SNPs for functional laboratory studies. The first was a determination of species conservation of the regions containing the SNP. The human (NT_026437), mouse (NW_047761), rat (NT_039551), and chimpanzee (Chimp Nov. 2003 Assembly, UCSC) sequences were aligned and visualized using ClustalW [12] and VISTA [13]. The level of conservation of the predicted TATA box among the species was observed using ClustalW. It was further confirmed using the rVISTA tool [14]; the cut-offs for the similarity score and core similarity were left at the default of 0.7 and 0.75 respectively. The second bioinformatic analysis was to determine if the SNPs caused any predicted changes in any transcription factor binding sites. Transcription factor binding sites were identified using TFSEARCH against the TRANSFAC database using the default threshold score of 85. For SNPs that are located in transcription factor binding sites, the variant sequences were also analyzed to determine if the SNP caused a change in the score of the predicted binding site.
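The idea of comparing binding-site scores for the wild-type and variant sequences can be illustrated with a minimal sketch. It is not the TFSEARCH algorithm and does not use TRANSFAC data: the position frequency matrix below, the 6-bp site, and the position of the T>G change within it are all invented for illustration, and the 0 to 100 rescaling is simply one common way to make scores comparable to a threshold.

```python
from math import log2

# Hypothetical position frequency matrix for a 6-bp TATA-like motif
# (one dict of base counts per position). These counts are invented for
# illustration and are NOT taken from TRANSFAC.
PFM = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 12, "C": 1, "G": 1, "T": 6},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]


def log_odds(column, base, background=0.25):
    """Log2 ratio of the observed base frequency to a uniform background."""
    total = sum(column.values())
    return log2((column[base] / total) / background)


def site_score(site):
    """Sum of per-position log-odds, rescaled to 0-100 between the
    worst and best possible sites for this matrix."""
    raw = sum(log_odds(col, base) for col, base in zip(PFM, site))
    best = sum(max(log_odds(col, b) for b in "ACGT") for col in PFM)
    worst = sum(min(log_odds(col, b) for b in "ACGT") for col in PFM)
    return 100 * (raw - worst) / (best - worst)


# Hypothetical sites; the position of the T>G change is arbitrary here.
wild_type_site = "TATAAA"
variant_site = "TAGAAA"

print(round(site_score(wild_type_site), 1), round(site_score(variant_site), 1))
```

In this toy setting, a single substitution at a strongly preferred position lowers the rescaled score, which is the kind of difference that was used here to prioritize SNPs.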

TATA Box SNP Genotyping

We designed a Taqman™ assay to genotype the TATA SNP (rs35036378) in samples from additional populations. The sequences of the primers and probes are listed in the Supplementary Table 1. The Taqman assay was used to genotype an additional 50 African American, 10 Chinese, 10 Japanese, and 9 Indo-Pakistanis from the Coriell repository and 221 Caucasians from the COBRA tamoxifen breast cancer clinical trial (ClinicalTrials.gov Identifier no. NCT00228930). This was a prospective observational clinical trial of women 18 years or older who were treated with tamoxifen in the adjuvant setting or for the prevention of breast cancer [15]. The use of the samples from the COBRA tamoxifen clinical trial was approved by the IRBs from Georgetown University Medical Center, the University of Michigan, and Indiana University Medical Center.

We also used a PCR restriction fragment length polymorphism (RFLP) assay to verify a subset of the sequencing and genotyping results for rs35036378. We used the amplicon 4 primers and digested the resultant amplicon with the DraI restriction enzyme (New England Biolabs, Ipswich, MA), which cuts the wild-type (T) but not the variant (G).
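The logic of the RFLP readout can be sketched in a few lines. DraI recognizes TTTAAA and cuts between TTT and AAA, so a T>G change inside the site abolishes cutting. The sequences below are hypothetical toy amplicons, not the ESR2 amplicon, and the position of the change within the site is chosen arbitrarily.

```python
# DraI recognition site; the enzyme makes a blunt cut between TTT and AAA.
DRAI_SITE = "TTTAAA"


def drai_fragments(seq):
    """Return fragment lengths after a complete DraI digest of a linear sequence."""
    seq = seq.upper()
    cuts = []
    start = seq.find(DRAI_SITE)
    while start != -1:
        cuts.append(start + 3)  # the cut falls immediately after TTT
        start = seq.find(DRAI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy amplicons (NOT the real ESR2 sequence).
wild_type = "GCGCGC" + "TTTAAA" + "GCGCGCGC"
variant = "GCGCGC" + "TTGAAA" + "GCGCGCGC"  # T>G destroys the DraI site

print(drai_fragments(wild_type))  # two fragments: the site is cut
print(drai_fragments(variant))    # one fragment: the amplicon stays intact
```

A heterozygous sample would be expected to show both patterns on the gel: the uncut amplicon from the variant allele and the two digestion fragments from the wild-type allele.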

ESR2-TATA Box Luciferase Reporter Assays

A 410-bp sequence of the ESR2 promoter containing the TATA box (nucleotide 45761496 to 45761087 from accession no. NT_026437) was cloned into the HindIII/XhoI sites of the pGL3 Basic promoterless vector (Promega Corporation, Madison, WI). The plasmid with the variant TATA box was created using the QuikChange® Site-Directed Mutagenesis Kit (Stratagene, La Jolla, CA). The sequences of the wild-type and variant inserts were confirmed by DNA sequencing. The wild-type and variant plasmids were transfected into LNCaP and SKBR3 cells.

LNCaP cells were plated in a 24-well plate at a cell density of 7 × 10⁴ cells/well and were grown overnight in DMEM with 10% FBS to approximately 80% confluency. The DNA mix for transfection was prepared in Opti-MEM I Reduced Serum Medium (Invitrogen Corporation, Carlsbad, CA) and consisted of 0.5 μg of the test plasmid (wild type or variant) with 0.05 μg of the phRGTK renilla control vector (Promega, Madison, WI) that served as an internal control to normalize luciferase activity. The transfection was carried out using Lipofectamine LTX (Invitrogen Corporation, Carlsbad, CA).

SKBR3 cells were plated in a 12-well plate at a cell density of 4 × 10⁵ cells/well and were grown overnight in DMEM with 10% FBS to approximately 80% confluency. The DNA mix for transfection was prepared in Opti-MEM I Reduced Serum Medium (Invitrogen Corporation, Carlsbad, CA) and consisted of 1.6 μg of the test plasmid (wild type or variant) with 0.30 μg of the phRGTK renilla control vector (Promega, Madison, WI) that served as an internal control to normalize luciferase activity. The transfection was carried out using Lipofectamine 2000 (Invitrogen Corporation, Carlsbad, CA).

Cells were harvested after 24 h and the luciferase activity was determined using the Dual Luciferase assay kit (Promega Corporation, Madison, WI) in a SIRIUS luminometer (Berthold Technologies, Oak Ridge, TN). The experiments were repeated on three different days.
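The normalization step described above (firefly signal divided by the renilla internal control, then summarized across experiments) can be sketched as follows. The luminometer readings are hypothetical placeholders, not data from this study, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_activity(firefly, renilla):
    """Firefly/renilla ratio for each experiment (internal-control normalization)."""
    return [f / r for f, r in zip(firefly, renilla)]


def mean_and_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical raw luminometer readings from three experiments (NOT study data).
wt_firefly, wt_renilla = [1200.0, 1500.0, 900.0], [100.0, 120.0, 80.0]
var_firefly, var_renilla = [600.0, 700.0, 500.0], [100.0, 115.0, 85.0]

wt_mean, wt_sem = mean_and_sem(normalized_activity(wt_firefly, wt_renilla))
var_mean, var_sem = mean_and_sem(normalized_activity(var_firefly, var_renilla))

# Variant promoter activity expressed relative to wild type.
relative = var_mean / wt_mean
print(f"WT {wt_mean:.2f} ± {wt_sem:.2f}, VAR {var_mean:.2f} ± {var_sem:.2f}, "
      f"VAR/WT {relative:.2f}")
```

Dividing by the renilla signal corrects each well for differences in transfection efficiency and cell number before the wild-type and variant promoters are compared.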

To assure reproducibility of the reporter assay results, the experiments with the LNCaP cells were repeated on different days, in two different laboratories (TS and SO), using different batches of DNA.

Results

ESR2 Resequencing

Resequencing of a 1.6-kb ESR2 region in 100 samples led to the identification of five novel SNPs in the promoter and one in intron 1 of the ESR2 gene (Table 1, Fig. 1). These results have been submitted to the PharmGKB (PS203888, PS203889) and dbSNP (rs nos. are in Table 1) databases. The five promoter SNPs were observed in the African American population, but not in the Caucasian samples, whereas the intronic SNP was observed in both populations. No SNPs were identified in exon ON. The PHASE software was used to estimate the haplotypes from the DNA sequencing data (Table 2). The program predicted 11 different haplotypes, with the wild-type haplotype being the most frequent.

Table 1 SNPs in the ESR2 gene
Fig. 1

Schematic representation of the positions of the SNPs in the ESR2 gene. The designated position of each SNP is relative to the first nucleotide of Exon ON, which is nucleotide no. 45761078 of accession no. NT_026437. The black boxes represent the transcription factor binding sites in which the SNP exists. TSS, transcription start site

Table 2 The 11 haplotypes that were identified in the 50 African American and 50 Caucasians along with their frequency of occurrence

The TFsearch software was used to predict the functional significance of promoter SNPs. Based on the difference in TFsearch TFBS scores, three of the promoter SNPs were predicted to alter a transcription factor binding site (Table 1). We focused additional studies on confirming the rs35036378 (−43 T > G) SNP, which was predicted to reduce the function of the ESR2 TATA box.

Functional Variant in TATA Box

To confirm the presence of this TATA box SNP (rs35036378) in the Coriell DNA samples, we used an RFLP assay. The results were concordant with the sequencing results (data not shown). Using a Taqman assay, we genotyped additional samples for a total of 100 African American, 271 Caucasian, 10 Chinese, 10 Japanese, and 9 Indo-Pakistani samples. We did not identify this SNP in any of the Caucasian, Chinese, Japanese, or Indo-Pakistani samples, but only in the African American samples. In total, there were four samples (all African Americans) that were heterozygous for the TATA box SNP (NA17110, NA17131, NA17170, and NA17196).
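As a worked example of the arithmetic behind these counts, the following sketch computes the minor allele frequency in the African American samples. It assumes that none of the samples was homozygous for the variant, since only heterozygotes are reported; the function name is illustrative.

```python
def minor_allele_frequency(n_het, n_hom_variant, n_samples):
    """Variant allele count divided by total chromosomes (2 per diploid sample)."""
    return (n_het + 2 * n_hom_variant) / (2 * n_samples)


# 4 heterozygotes among 100 African American samples (from the text);
# zero homozygous-variant samples is an assumption.
maf_african_american = minor_allele_frequency(n_het=4, n_hom_variant=0, n_samples=100)

# No carriers among the 271 Caucasian samples (from the text).
maf_caucasian = minor_allele_frequency(n_het=0, n_hom_variant=0, n_samples=271)

print(maf_african_american, maf_caucasian)
```

Under that assumption, four variant alleles among 200 chromosomes correspond to a frequency of 2% in the African American samples.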

We prioritized the SNPs for functional studies in part based on the documented conservation of the SNP regions. Using ClustalW and rVISTA, we determined the inter-species conservation of the region of the ESR2 gene that was sequenced. The wild-type TATA box sequence, which includes rs35036378, and several neighboring nucleotides were 100% conserved across all four species (Fig. 2). In contrast, the regions around the other SNPs were not highly conserved (data not shown).

Fig. 2

Conservation of the TATA box region of the ESR2 gene. The alignment was constructed using ClustalW. The asterisk below the sequence indicates conservation across all four species. The bold nucleotides indicate the TATA box and the arrow above indicates the position of the SNP

Functional Studies

We used a pGL3 luciferase reporter plasmid to determine the functional significance of the TATA box SNP (rs35036378). We cloned 410 bp (nucleotide 45761087–45761496 from accession no. NT_026437) of the wild-type ESR2 promoter into the pGL3 Basic vector. The rs35036378 SNP was incorporated using site-directed mutagenesis. These plasmids were then transfected into LNCaP and SKBR3 cells. These cell lines were chosen for this study because they are known to express ESR2, suggesting that the cloned ESR2 promoter would be active in these cells. Compared to the wild-type plasmid, the luciferase activity from the variant plasmid was reduced by ∼50% in both cell lines (Fig. 3). These results suggest that rs35036378 in the ESR2 TATA box is a functional SNP with a significant effect on ESR2 gene expression.

Fig. 3

Effect of the TATA box SNP on ESR2 promoter activity. Wild-type (WT) and variant (VAR; -43(T/G)) promoters were cloned into the pGL3 plasmid and expressed in LNCaP (A) and SKBR3 (B) cells. Values are means and standard errors of the firefly luciferase activity normalized to the renilla luciferase activity of three experiments performed on three different days (p value 0.009 (LNCaP), 0.0002 (SKBR3))

Discussion

In this study, we identified six SNPs in the ESR2 gene, five in the promoter and one in intron 1. Three of these are in predicted transcription factor binding sites, and all three reduced the TRANSFAC transcription factor binding site scores for their respective binding sites. We focused our laboratory validation on rs35036378, since our bioinformatic analyses predicted that this SNP would reduce the functionality of the ESR2 promoter TATA box. Others have previously shown that this TATA box is functionally important in the proximal promoter of the ESR2 gene [16]. Previous studies have shown that two specific isoforms of ESR2 originate from two distinct untranslated first exons, namely exon OK and exon ON [17], and that these regulatory mechanisms can influence the deregulation of ESR2 expression in cancer [18]. Studies by Ashworth et al. have shown an association between venous ulceration and SNPs in the region spanning exon ON and the promoter [19]. Various other studies have shown that methylation of the ESR2 promoter ON is associated with reduced expression of ESR2 isoforms in breast, prostate, and ovarian cancer tissue and cell lines [20–22].

The rs35036378 SNP, found in the untranslated region 5′ of exon ON, reduced the ESR2 promoter luciferase reporter activity by approximately 50%. The luciferase assays were performed in the SKBR3 breast cancer and LNCaP prostate cancer cell lines because they are known to express ESR2 mRNA. In the context of cell types in vivo, the SNP could have a larger or smaller effect. Since ESR2 appears to have roles in multiple tissues, such as the brain [23], ovary [24], and prostate [25], this SNP could have effects on a variety of functions. The altered phenotypes observed in the ESR2 knockout mice (βERKO) [26] and in mice treated with ESR2-selective ligands [27] should be useful in guiding which phenotypes may be affected by this SNP in humans. These would likely include effects on neurologic and reproductive function.

Since the ESR2 promoter SNPs are relatively rare, the large numbers of subjects that would be required to obtain sufficient power to detect an association would likely make a traditional genotype–phenotype study impractical. Consequently, deriving an algorithm for combining multiple rare SNPs will be necessary. We have used such a scoring system for the drug-metabolizing enzyme genes for which the functions of multiple rare SNPs are well characterized [9]. In order to establish these algorithms, the function of each SNP must be known. The studies described here provide a functional characterization of the rs35036378 SNP that would be the beginning of such an effort. In addition, a better understanding of the function of the individual ESR2 SNPs may also help understand the mechanisms behind the SNPs that have been associated with clinical phenotypes [16].

These studies are the beginning of what will be required to thoroughly understand the functional genetic variability in the ESR2 gene. Within our study, there are two additional SNPs that are predicted to affect three transcription factor binding sites. Each of these will require careful testing. Individually, each of the three SNPs that were predicted to be functional (rs8008187, rs3829768, rs35036378) is relatively rare. However, if the bioinformatic functional predictions are correct, then the combined frequency of the haplotypes containing one or more functionally affected alleles would total nearly 10% in the African American population. It may then be possible to conduct genotype–phenotype association studies with all of the reduced-function alleles combined into one group, similar to the way that we combine CYP2D6 poor metabolizer alleles. A combined frequency of approximately 10% should be sufficient to detect genotype–phenotype associations in reasonably sized studies. This illustrates the importance of further in vitro studies to test each of the promoter SNPs. There are also many SNPs throughout the rest of the ESR2 gene that may have effects on gene function through mechanisms such as alternative splicing. Since methylation appears to be part of the regulatory mechanisms of ESR2 expression [28], SNPs that affect ESR2 methylation may also be of substantial importance. These SNPs will all require similar testing to obtain a comprehensive understanding of the influence of genetic variability on the function of the ESR2 gene.
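The grouping idea described above can be sketched as a simple classification step. This is an illustration of pooling reduced-function alleles into one group, not the scoring system of reference [9]. The three rs numbers come from the text, but only rs35036378 has been tested functionally here; the subject identifiers, genotypes, and function names are hypothetical.

```python
# SNPs predicted in the text to reduce function; only rs35036378 has been
# tested in vitro, so membership of the other two is an assumption.
REDUCED_FUNCTION_SNPS = {"rs8008187", "rs3829768", "rs35036378"}


def reduced_function_allele_count(genotype):
    """genotype maps SNP id -> number of variant alleles carried (0, 1, or 2)."""
    return sum(n for snp, n in genotype.items() if snp in REDUCED_FUNCTION_SNPS)


def assign_group(genotype):
    """Pool carriers of any reduced-function allele into a single group."""
    return "reduced" if reduced_function_allele_count(genotype) > 0 else "reference"


# Hypothetical subjects and genotypes (NOT study data).
subjects = {
    "S1": {"rs35036378": 1, "rs8008187": 0},
    "S2": {"rs3829768": 0},
    "S3": {"rs8008187": 1, "rs3829768": 1},
}

groups = {sid: assign_group(g) for sid, g in subjects.items()}
carrier_fraction = sum(g == "reduced" for g in groups.values()) / len(groups)
print(groups, carrier_fraction)
```

Counting variant alleles per SNP ignores phase: two variants on the same haplotype would be counted twice, so with phased data (for example, from PHASE) it may be preferable to count affected haplotypes instead.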