Introduction

Reproductive and hormonal factors, including age at menarche, parity, number of full-term pregnancies, age at first full-term pregnancy, breastfeeding, age at menopause, body mass index (BMI), and physical activity, are associated with breast cancer risk [1, 2]. Consistent with these observations, breast cancer risk is higher among women with higher circulating levels of endogenous estrogen [3-5] and among women using combined postmenopausal estrogen and progestin therapy (EPT) [6-11].

Sex steroid hormones, whether endogenous or exogenous, are synthesized and metabolized by many different enzymes (reviewed in [12]). Therefore, genetic variation in genes regulating sex steroid hormone levels may increase or decrease breast cancer risk by influencing hormone metabolism. Polymorphisms in several hormone pathway genes, including CYP19A1 and COMT, have been associated with endogenous hormone levels [13-16]; however, studies investigating the association between genetic variation in hormone metabolism genes and breast cancer risk have generated mixed results [14, 17-29]. Recently, a large study from the Breast and Prostate Cancer Cohort Consortium (BPC3) comprehensively analyzed 37 steroid hormone metabolism pathway genes in relation to breast cancer risk and reported null associations [16, 30], suggesting that inconsistencies in the literature may be due to chance findings in small studies. However, the inconsistencies may also be explained, at least partly, by differences in the distribution of environmental factors that modify the effects of genetic polymorphisms.

EPT use increases breast cancer risk to a much greater extent than estrogen-only therapy (ET) [6-11]. Therefore, it is important to examine EPT and ET use separately when investigating gene-hormone therapy (HT) interactions. To date, few studies have investigated gene-HT interactions by hormone formulation using a comprehensive single-nucleotide polymorphism (SNP)-tagging approach. The California Teachers Study is an effective resource for studying these questions because detailed data on hormone use were collected at baseline, and approximately 41% and 28% of the postmenopausal participants reported current use of EPT and ET, respectively [31]. Thus, using data from a case-control study nested within the California Teachers Study, we systematically investigated whether any of 24 hormone metabolism pathway genes or their interactions with HT were associated with breast cancer risk.

Materials and methods

Participants

The California Teachers Study has been previously described in detail [32]. Briefly, the California Teachers Study is a prospective cohort of women who were current, recent, or retired California public school professionals in 1995. By returning a baseline questionnaire in 1995-1996, 133,479 women joined the cohort and provided detailed information on menopausal status, HT use, and other lifestyle and medical factors. The baseline questionnaire is available on the California Teachers Study website [33]. Cancer diagnoses in the cohort are identified through annual linkage with the California Cancer Registry, which identifies at least 99% of cancers diagnosed in California [34]. The California Teachers Study has been approved by the institutional review boards at all participating institutions: the Cancer Prevention Institute of California (CPIC), the University of California at Irvine (UCI), the University of Southern California (USC), and the City of Hope in accordance with assurances filed with and approved by the US Department of Health and Human Services.

The nested breast cancer case-control study was designed to obtain biospecimens from breast cancer cases and unaffected controls within the 113,590 members of the cohort who were less than 80 years old at baseline, had continued residence in California during the study period (1995 to time of blood draw), and, before 1998, had no prior history of invasive or in situ breast cancer. Cases were women who had a histologically confirmed invasive primary carcinoma of the breast (International Classification of Diseases for Oncology code C50 restricted to morphology codes under 8590) and who were 80 years old or younger between 1 January 1998 and 31 May 2007. One control participant per case was randomly selected from the cohort and frequency-matched to the case on age at baseline (within 5-year age groups), self-reported race/ethnicity (white, African-American, Latina, Asian, and other), and three broad geographic regions (that is, California Teachers Study specimen collection centers). Cancer cases were identified through quarterly linkages of the cohort to the California Cancer Registry database. Control selection was conducted on a quarterly basis without replacement. For each wave of control selection, a reference date was determined (that is, January, April, July, and October of each year). Nearly equal numbers of controls were selected in each wave. One control participant had her breast cancer diagnosed after her control selection and was included in the analysis as both a control and a case.

Collection of biological specimens and DNA extraction

Collection of biological specimens was conducted at three study centers (CPIC in the northern half of California and USC and UCI in the southern half). Women who declined blood draw were asked whether they were willing to provide a saliva sample, and, if so, an Oragene DNA self-collection kit (DNA Genotek, Kanata, ON, Canada) was mailed to the participant with informed consent and return postage paid mailing materials. From the 8,118 eligible participants (2,618 cases and 5,500 selected controls), we collected biological specimens for 74% of the cases (1,923 cases: 1,684 blood specimens and 239 saliva specimens) and 61% of the controls (3,350 controls: 3,012 blood specimens and 338 saliva specimens). All biologic samples were sent via overnight courier to the UCI laboratory for DNA extraction. DNA was extracted from blood clots by using Qiagen Clotspin Baskets and DNA QIAmp DNA Blood maxi kits (Qiagen Inc., Valencia, CA, USA) in accordance with Qiagen protocols. DNA was extracted from saliva samples by using the Oragene protocol (DNA Genotek). The nested breast cancer case-control study of the California Teachers Study has been approved by the institutional review boards at all participating institutions, and all participants provided written informed consent.

Tagging SNP selection and genotyping

We investigated 24 genes that are involved in female sex steroid hormone biosynthesis, metabolism, or excretion. Reviews on the function of these genes are available elsewhere [12, 35, 36]. For 21 of these genes, a tagging SNP approach was used (Supplementary Table S1 in Additional file 1). For 16 of the 21 genes, we selected linkage disequilibrium tagging SNPs across each gene, including 20 kb upstream of the 5' untranslated region (UTR) and 10 kb downstream of the 3' UTR. The tagging SNPs were selected to capture all common SNPs (minor allele frequency (MAF) of at least 5%) in individuals of European ancestry with a pairwise R2 of at least 0.80 by using the Snagger software [37] and the data from the International HapMap Project for the white CEPH (Utah residents with ancestry from northern and western Europe) population (HapMap release 21, July 2006, genotype build 36 [38]). Tagging SNPs for five genes included in the present study had been selected by BPC3 by using the TagSNPs program [30, 39]. To facilitate comparison across studies, we used the BPC3-selected tagging SNPs for these five genes (Supplementary Table S1 in Additional file 1). The BPC3-selected tagging SNPs captured all common SNPs (MAF of at least 5%) in whites with a pairwise R2 of at least 0.80. For the remaining 3 of the 24 genes, we genotyped a few selected SNPs because of space limitations on our genotyping platform. For CYP19A1, the selected SNPs were shown to be associated with circulating estrogen concentrations in a comprehensive analysis [13] (Supplementary Table S1 in Additional file 1).
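The pairwise R2 threshold used for tag selection can be illustrated with a small sketch. The text does not describe the internal algorithms of Snagger or TagSNPs, so the following is only a generic greedy tagger under stated assumptions: the function name `greedy_tag_snps`, the SNP labels, and the toy R2 values are hypothetical and are not taken from the study.

```python
def greedy_tag_snps(r2, threshold=0.80):
    """Pick tag SNPs so every SNP is itself a tag or has r2 >= threshold with a tag.

    r2 is a symmetric dict of dicts: r2[a][b] is the pairwise r2 between SNPs a and b.
    This is a generic greedy set-cover heuristic, not the Snagger algorithm.
    """
    snps = sorted(r2)          # sorted for deterministic tie-breaking
    untagged = set(snps)
    tags = []

    def covered_by(candidate):
        # SNPs still untagged that this candidate would capture
        return {s for s in untagged
                if s == candidate or r2[candidate].get(s, 0.0) >= threshold}

    while untagged:
        # choose the SNP that captures the most still-untagged SNPs
        best = max(snps, key=lambda c: len(covered_by(c)))
        tags.append(best)
        untagged -= covered_by(best)
    return tags


# Toy example with hypothetical SNP labels and r2 values
toy_r2 = {
    "snpA": {"snpB": 0.90, "snpC": 0.50},
    "snpB": {"snpA": 0.90, "snpC": 0.85},
    "snpC": {"snpA": 0.50, "snpB": 0.85},
    "snpD": {},
}
print(greedy_tag_snps(toy_r2))
```

In this toy input, one SNP is in strong linkage disequilibrium with two others and a fourth is uncorrelated with the rest, so two tags suffice to capture all four at the 0.80 threshold.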

A total of 1,751 breast cancer cases and 1,697 controls were available for genotyping. We included a random sample of 193 replicates (105 cases and 88 controls) to monitor reproducibility and track plate flips or switches. The DNA samples were genotyped for the selected tagging SNPs by using the Illumina GoldenGate Assay (Illumina, Inc., San Diego, CA, USA) in the USC Core Facility. About 10% of the genotyped samples, including 189 cases and 150 controls, had a genotyping success rate (call rate) of less than 90% and were excluded from the analyses. The genotyping concordance rate based on the 160 duplicate samples with a call rate of at least 90% was 99.9%. Call rates were lower when the DNA was obtained from saliva samples: 23% of saliva samples and 8% of blood samples had a call rate of less than 90% and were therefore excluded from the analyses. However, the genotyping concordance after excluding the low call rate samples was excellent for saliva samples (greater than 99.9%). In addition, results from sensitivity analyses excluding saliva samples were similar to those using all samples. For the present study, we also excluded 88 women (52 cases and 36 controls) who self-reported a previous history of cancer, leaving 1,510 cases and 1,511 controls. Because the majority (approximately 91%) of participants were non-Hispanic whites, we restricted the analyses to 2,746 non-Hispanic white women (1,351 cases and 1,395 controls).

Of the 355 SNPs genotyped, 332 had an SNP call rate of at least 90%. We excluded an additional three SNPs that had discordant readings in more than two duplicate pairs, eight SNPs with an MAF of less than 1% among non-Hispanic white controls, and four SNPs (in COMT, CYP11A, SULT1A1/SULT1A2, and UGT1A8) that were not in Hardy-Weinberg equilibrium (P < 0.001); thus, we analyzed 317 SNPs in the present study.
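As an illustration of the MAF and Hardy-Weinberg filters, the sketch below computes the minor allele frequency and a one-degree-of-freedom chi-square test from genotype counts. The text does not state which Hardy-Weinberg test was used (an exact test is a common alternative), so the chi-square form, the function names, and the example counts are assumptions for illustration only.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    return min(p, 1 - p)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium.

    Returns (chi2, p_value). Assumes the chi-square approximation is adequate;
    with small expected counts an exact test would be preferable.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def passes_qc(n_aa, n_ab, n_bb, maf_min=0.01, hwe_alpha=0.001):
    """Apply the MAF (at least 1%) and HWE (P >= 0.001) thresholds from the text."""
    _, p_hwe = hwe_chi2_pvalue(n_aa, n_ab, n_bb)
    return minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min and p_hwe >= hwe_alpha
```

Under these assumptions, a SNP would be retained only if `passes_qc` returns `True` for its genotype counts among controls.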

For 19 of the 20 genes for which we used the tagging SNP approach, the genotyped tagging SNPs efficiently captured 70% to 100% of all common SNPs (MAF of greater than 5%) in the HapMap dataset of European ancestry (HapMap release 24, genotype build 36) with pairwise R2 of at least 0.80 (Supplementary Table S1 in Additional file 1). We did not have sufficient tagging coverage for CYP21A2.

Imputation

We imputed SNPs in gene regions where we found a statistically significant association (P < 0.01 before multiple testing correction) with breast cancer risk among all women or among subgroups defined by menopausal status. To do this, we used publicly available HapMap genotype data in the CEPH population as the reference sample (HapMap release 24, genotype build 36 [38]) and MACH 1.0 [40]. We excluded imputed SNPs when the MAF was less than 1% or when the R2 was less than 0.30 [40].

Statistical analyses

We used conditional logistic regression models with strata defined by 5-year age group and the three specimen collection centers to estimate the odds ratios (ORs), 95% confidence intervals (CIs), and P values associated with each SNP by using log-additive models. The results did not change after further adjustment for potential confounders including menopausal status (premenopausal, postmenopausal, and unknown), HT use status at baseline (never used HT, currently using ET, currently using EPT, used HT in the past, and unknown), BMI (less than 25, 25 to less than 30, at least 30 kg/m2, and unknown), parity (0, 1 to 2, at least 3, and unknown), and oral contraceptive use (never, ever, and unknown). Therefore, we presented the results from the conditional logistic regression models not adjusting for these potential confounders.

We performed subgroup-specific analyses by menopausal status and, among postmenopausal women, by HT use at baseline, defining the groups of interest as never used HT, currently using ET, and currently using EPT. We calculated P for interaction by a likelihood ratio test comparing the models with and without the product term of genotype (0, 1, or 2 copies of the minor allele, as an ordinal variable) and menopausal status or HT use. The interaction tests for HT use were done separately for current EPT use (as compared with never HT use) and for current ET use (as compared with never HT use).
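The likelihood ratio test for interaction described above can be sketched as follows. This assumes a single product term (ordinal genotype by a two-level factor such as EPT versus never HT use), so the test has one degree of freedom; the function name and the log-likelihood values in the usage lines are hypothetical placeholders, not results from the study.

```python
import math


def lrt_interaction_pvalue(loglik_without, loglik_with):
    """Likelihood ratio test for one added product term (1 df).

    loglik_without: maximized log-likelihood of the model without the product term
    loglik_with:    maximized log-likelihood of the model with the product term
    Returns (statistic, p_value).
    """
    stat = max(2.0 * (loglik_with - loglik_without), 0.0)
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only
stat, p = lrt_interaction_pvalue(-1000.0, -998.0794)
print(round(stat, 4), round(p, 3))
```

The two log-likelihoods would come from fitting the conditional logistic regression models with and without the genotype-by-exposure product term on the same set of women.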

Results

A greater proportion of breast cancer cases than controls were currently using EPT at baseline; a lower proportion of cases than controls had high parity. Cases also had slightly earlier age at menarche than control women (Table 1).

Table 1 Characteristics of the study participants at time of joining the cohort

Evaluation of the q-q plot of the P values for the association between the 317 SNPs in the 24 hormone metabolism genes and breast cancer risk showed no evidence of systematic bias (Supplementary Figure S1 in Additional file 2).

We observed statistically significant associations at a P value of less than 0.01 with two SNPs in SLCO1B1 and one SNP in HSD17B4 in the overall analysis (Table 2). However, after correction for multiple comparisons, none of these associations was statistically significant. For postmenopausal women, 10 SNPs in SLCO1B1 were associated with breast cancer risk with an uncorrected P value of less than 0.01. Of these, the associations for SNPs rs11045773, rs11045777, rs16923519, rs4149057, and rs11045884 remained statistically significant after correction for multiple testing within the gene (within-gene PACT < 0.05). However, after correction for multiple testing across all genes, none of these associations was statistically significant. The ORs and 95% CIs for all tested SNPs are provided in Supplementary Table S2 in Additional file 3. There was also some evidence that rs4149013 in SLCO1B1 was associated with breast cancer risk in postmenopausal women (OR 1.39, 95% CI 1.07 to 1.81; uncorrected P = 0.015).

Table 2 Single-nucleotide polymorphisms associated with breast cancer risk with P values of less than 0.01

When examining by HT use, we observed a strong association between several SNPs in SLCO1B1 and postmenopausal breast cancer risk in current EPT users (Table 3). Breast cancer risk among postmenopausal women who were using EPT at baseline increased more than twofold per minor allele of rs4149013 (OR 2.31, 95% CI 1.47 to 3.62; P = 0.0003, within-gene PACT = 0.002). This association was statistically significant even after PACT adjustment across all SNPs studied (PACT = 0.023). The P value for interaction (EPT versus never HT use) was 0.019 (not corrected for multiple testing). When we combined the homozygous and heterozygous minor allele carriers (that is, a dominant genetic model), we observed similar OR estimates and P values (OR 2.43, 95% CI 1.53 to 3.85; P = 0.0002) (Supplementary Table S3 in Additional file 4). We did not observe any significant associations in never HT users and ET users. There was no statistically significant difference in effects when stratifying by estrogen receptor status.
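As a rough consistency check on the reported per-allele estimate, the standard error and a Wald-type P value can be recovered from an OR and its 95% CI. This is a sketch under the assumption that the interval is symmetric on the log scale; the text does not state whether the reported P values come from Wald or likelihood ratio tests, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_from_or_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Recover log-OR, standard error, z, and two-sided P from an OR and its CI.

    Assumes the CI was computed as exp(beta +/- z_crit * se).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, z, p_value


# Per-allele estimate for rs4149013 among current EPT users, as reported in the text
beta, se, z, p = wald_from_or_ci(2.31, 1.47, 3.62)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.5f}")
```

With the reported values, the implied P value is approximately 0.0003, in line with the text. Under the log-additive model, the implied OR for carrying two copies of the minor allele is the square of the per-allele OR.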

Table 3 Single-nucleotide polymorphisms associated with breast cancer risk in postmenopausal women using estrogen-progestin therapy at baseline

Discussion

In this case-control study nested within the California Teachers Study cohort, genetic variation in only 1 (SLCO1B1) of 24 genes in the hormone metabolism pathway genes was associated with breast cancer risk. SLCO1B1, a gene involved in the hepatic uptake of female sex steroids, seemed to be associated with breast cancer risk among postmenopausal women. This association was statistically significant after correcting for multiple testing within the gene but was not statistically significant after we corrected for multiple testing across genes. However, there was also an indication that EPT may interact with SNPs in SLCO1B1; one variant in SLCO1B1 (rs4149013) was statistically significantly associated with breast cancer risk in EPT users.

Our findings of no association between SNPs in hormone metabolism pathway genes and breast cancer risk are consistent with the results from other large studies such as BPC3 [13, 29, 30, 41] and meta-analyses of selected functional SNPs in CYP1A1 [23], SULT1A1 [42, 43], CYP1B1 [44, 45], and COMT [46], although two smaller meta-analyses of selected functional SNPs in COMT supported an association in Caucasian populations [47, 48]. Although a few studies have suggested associations between genetic polymorphisms in other hormone pathway genes, including CYP11A [25, 26], CYP1A1/CYP1A2 [49, 50], CYP1B1 [51, 52], SULT1E1 [53], or COMT [54], many of these associations were not observed consistently [30, 49, 50, 55-57]. In addition, the few studies other than BPC3 that have investigated polymorphisms in CYP2C9 [51], CYP3A4 [49, 51, 56], HSD17B2 [58], SRD5A1 [56], and UGT2B7 [51] in relation to breast cancer risk among Caucasian populations have reported no associations. However, a recent study using admixture maximum likelihood (AML)-based global tests reported that genetic variation in the androgen-estrogen conversion pathway was associated with breast cancer risk, although no single SNP was significant after correction for multiple testing [59].

To our knowledge, no studies have investigated genetic variation in SLCO1B1 and breast cancer risk by hormone therapy use. SLCO1B1, also known as OATP-C or SLC21A6, is expressed in the liver and plays an important role in transporting drugs and endogenous substrates from the blood into the hepatocytes (reviewed in [60]). Endogenous substrates of SLCO1B1 include steroid hormone conjugates such as estradiol-17β-glucuronide and estrone-3-sulfate [61, 62]. Serum estrone sulfate (E1S) is a major form of circulating estrogen in postmenopausal women and can be converted to estradiol in breast tissue [63]. E1S is also a major component of conjugated equine estrogens, the estrogen component of the predominant (prior to 2002) HT regimens in the US [64]. Genetic variation in SLCO1B1 has been shown to decrease the uptake of E1S and estradiol glucuronide in several [61, 65] but not all [66] studies. Furthermore, one study has shown that genetic variation in SLCO1B1 is associated with blood E1S levels in Caucasians [61], suggesting that genetic variation in SLCO1B1 may interact with HT use. The variant rs4149013 is located near the 5' end of SLCO1B1. Its functional significance is not known, but even if it has none, this variant could be in linkage disequilibrium with a causal allele.

In the publicly available Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer data [67], we found additional support implicating SLCO1B1. In CGEMS, five genotyped SNPs in SLCO1B1 (rs704166, rs852550, rs852549, rs7489119, and rs2306283) were associated with breast cancer risk with a P value of less than 0.05. These five SNPs, analyzed as imputed genotypes, showed no association in our dataset (data not shown), but this could be due to misclassification from imputation or to false-positive associations in CGEMS or in our data. However, our findings and those of CGEMS, combined with the previous literature on the role of this gene in affecting estrone absorption, suggest that further investigation of the role of SLCO1B1 genetic variation and its interaction with EPT on breast cancer risk is warranted.

The strengths of this study include the systematic investigation of a large number of hormone metabolism genes and the detailed information on HT use collected at baseline. A limitation of our study was the inability to genotype all tagging SNPs for several of the genes of interest, including AKR1C4, ARSC, and CYP19A1. Thus, we cannot exclude the possibility that the lack of associations for these loci was due to incomplete tagging. Overall, we had 80% statistical power to detect ORs ranging from 1.17 to 1.40 for SNPs with an MAF of 0.05 to 0.49 by using log-additive models and an alpha of 0.05. For the subset analyses among postmenopausal women, the minimum detectable OR ranged from 1.20 to 1.46 for SNPs with an MAF of at least 0.05. The statistical power to detect associations in premenopausal women or to detect interactions with menopausal status or HT use was limited. Another limitation of this study is that HT use status assessed at baseline may have changed during follow-up. The participation rates (donating biological specimens for this nested case-control study) among the potentially eligible cohort members were moderate (74% for cases and 61% for controls). However, it is unlikely that the participation was differential according to genotype and case status, and thus selection bias is unlikely to have influenced our findings.
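The minimum detectable ORs quoted above can be approximated with a simple closed-form calculation. The text does not state how power was computed, so this sketch makes several assumptions: an unmatched design, genotypes in Hardy-Weinberg equilibrium, a log-additive effect, and the variance of the log-OR evaluated under the null. The function name is hypothetical; the sample sizes are those of the non-Hispanic white analysis set given in the Methods.

```python
import math
from statistics import NormalDist


def min_detectable_or(n_cases, n_controls, maf, alpha=0.05, power=0.80):
    """Approximate minimum detectable per-allele OR for a log-additive model.

    Uses the large-sample approximation
        var(beta) ~ 1 / (N * f * (1 - f) * 2 * maf * (1 - maf)),
    where f is the fraction of cases and 2*maf*(1-maf) is the genotype
    variance under Hardy-Weinberg equilibrium.
    """
    n = n_cases + n_controls
    case_fraction = n_cases / n
    genotype_variance = 2 * maf * (1 - maf)
    se = 1 / math.sqrt(n * case_fraction * (1 - case_fraction) * genotype_variance)
    norm = NormalDist()
    z_total = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    return math.exp(z_total * se)


# Analysis set from the Methods: 1,351 cases and 1,395 controls
for maf in (0.05, 0.49):
    print(maf, round(min_detectable_or(1351, 1395, maf), 2))
```

Under these assumptions the approximation gives values of roughly 1.41 (MAF 0.05) and 1.16 (MAF 0.49), close to the reported range of 1.17 to 1.40; small differences are expected because the actual power calculation may have used a different method.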

Conclusions

Common genetic variations in SLCO1B1 may be associated with breast cancer risk in postmenopausal women, particularly in EPT users. The known effects of variants in SLCO1B1 on estrogen metabolism suggest that further study of the role of SLCO1B1 is warranted.