Background

Prolactin (PRL) is an essential regulator of mammary development, acting synergistically with a wide variety of hormones during puberty and pregnancy [1, 2]. Early studies in animals first demonstrated that prolactin could induce spontaneous mammary tumors [3–6]. Results from in vitro studies support the findings from animal studies and suggest that PRL stimulates proliferation [7–10], increases cell motility and cytoskeleton alterations [11], and promotes angiogenesis [12] in human breast cells. Prolactin receptor (PRLR), found in both normal and malignant breast tissue, has been reported to be slightly more prevalent in malignant tissue [13]. Though early clinical studies of patients treated with bromocriptine, an inhibitor of pituitary PRL, found no association with breast cancer, recent evidence of autocrine/paracrine regulation [14, 15] of PRL in extra-pituitary tissue provides further support for a possible role of PRL in tumorigenesis.

There are few prospective epidemiological studies evaluating plasma PRL levels and breast cancer risk. The largest prospective cohort study of postmenopausal women reported a 34% increase in risk of breast cancer when comparing top to bottom quartiles (> 12 vs. < 7.4 ng/mL) of PRL levels [16]; these findings were similar to results from an earlier study that reported a non-significant risk estimate of 1.34, based on a smaller sample size [17]. Two smaller studies of postmenopausal women also reported a positive association, but these were also non-significant [18, 19]. Case-control studies [20–27] give conflicting results and are difficult to interpret due to the retrospective nature of blood collection. Prospective data on prolactin levels and breast cancer risk among premenopausal women were limited [18, 19, 28] until recently, when the Nurses' Health Study reported a non-significant 30% increase in breast cancer risk among premenopausal women when comparing top to bottom quartiles (> 17.6 vs. < 9.8 ng/mL) of PRL levels among 377 cases and 786 controls [29].

In humans, the PRL gene lies on chromosome 6 and is approximately 10 kilobases (kb) in length with five coding exons [30]. An additional non-coding first exon has been described that lies 5.8 kb upstream of the pituitary promoter site [31]. This distal promoter region has been associated with extra-pituitary expression of PRL, described in a variety of tissues including decidua, lymphocytes, and breast tissue. Depending on promoter usage, PRL mRNAs may differ slightly in length but encode the same mature polypeptide protein hormone [32].

The human PRLR gene is located on chromosome 5, is approximately 180 kb in length, and was originally described as having 10 exons, of which exons 3–10 are coding exons [33]. Recently, six alternative non-coding first exons have been described; their functions are unknown, but they have been found to be expressed in human ovary, testis, liver, breast tissue, and breast cells [34, 35]. In addition, an exon 11 located 15 kb downstream of exon 10 has been reported; alternative splicing of exons 10 and 11 appears to produce novel short forms of the receptor that may be involved in signaling pathways distinct from those of the common long form [36, 37].

Previous studies have demonstrated that genetic polymorphisms in candidate genes can lead to variations in plasma levels of encoded proteins [38, 39]. In this study, we tested the hypothesis that genetic variations in PRL and PRLR are associated with plasma PRL levels and breast cancer risk. We used a combination of approaches: sequencing the coding regions to identify common missense variation, and haplotype-based analyses to characterize common patterns of genetic variation across each locus. Tests of association were performed in a large case-control study of breast cancer among African-American (AA), Native Hawaiian (NH), Japanese-American (JA), Latina (LA), and White (WH) women in the prospective Multiethnic Cohort Study (MEC). To our knowledge, this is the first comprehensive study of common genetic variation in PRL and PRLR genes in relation to breast cancer risk and plasma PRL levels in a multiethnic population.

Results

Characterization of Genetic Variation at PRL and PRLR loci

We genotyped 80 SNPs in PRL and 173 SNPs in PRLR (approximately 1 SNP every 1 kb) to characterize linkage disequilibrium (LD) and haplotype patterns in a multiethnic panel of 349 unaffected subjects (69–70 from each of the 5 racial/ethnic populations in the MEC). We characterized genetic variation across 59 kb of the PRL locus, from 24 kb upstream of PRL's alternative first exon 1a (5.8 kb upstream of the pituitary promoter site) to 20 kb downstream of exon 5, using 80 common (minor allele frequency, MAF, ≥ 5%) SNPs (Additional File 1, Table S1). In PRL, we observed three regions of LD (blocks 1, 3, and 4; see the Methods section for a description of the criteria used to define LD block regions) and one 19 kb region ("pseudo-block 2") with little evidence of LD. Given the dense coverage across this 19 kb region (on average, more than 1 common SNP per kb), we decided to construct haplotypes to test associations with common variation (Figure 1, Additional File 1, Table S1). In this region, the multivariate squared correlation, $R_s^2$, between the selected tagSNPs and all SNPs examined in the multiethnic panel was ≥ 0.70 in all ethnic groups, which suggests that unmeasured SNPs in this region are most likely well predicted by our set of tags. Thus, we describe four regions in PRL: block 1 (SNPs 1–24; 14 kb), "block" 2 (SNPs 25–45; 19 kb), block 3 (SNPs 46–59; 7 kb), and block 4 (SNPs 61–77; 14 kb). In general, block sizes in PRL were similar among racial/ethnic groups (Additional Files 2, 3, 4, 5, 6).

Figure 1

Linkage disequilibrium (LD) plot across the prolactin (PRL) locus for all racial/ethnic groups combined. The horizontal black line depicts the 59-kilobase region of chromosome (chr) 6 analyzed in our multiethnic panel. The PRL gene is shown in grey (RefSeq gene = completed genes from the human genome assembly). Alternative exon 1a (associated with the distal extra-pituitary promoter region) lies 5.8 kb upstream of exon 1 (associated with the pituitary promoter region). The 80 single nucleotide polymorphisms (SNPs) used for genetic characterization are listed below the black line. The LD plot, presented at the bottom of the figures, is based on the measure of D'. Each diamond indicates the pairwise magnitude of LD, with dark grey indicating strong LD (D' > 0.8) and a logarithm of odds score of greater than 2.0. (Figure prepared with LocusView, Broad Institute, Cambridge, MA, unpublished software by T. Petryshen, A. Kirby, and M. Ainscow [61]).

In PRLR, we assessed 210 kb of the locus, from 25 kb upstream of the first alternative exon E13 to 10 kb downstream of exon 11 (the alternatively spliced exon 10) (Additional File 1, Table S2). Using 173 common SNPs, we defined nine blocks of LD in PRLR: block 1 (SNPs 6–30; 14 kb), block 2 (SNPs 31–39; 10 kb), block 3 (SNPs 41–66; 29 kb), block 4 (SNPs 73–88; 22 kb), block 5 (SNPs 95–113; 31 kb), block 6 (SNPs 114–135; 35 kb), block 7 (SNPs 136–153; 24 kb), block 8 (SNPs 154–161; 3 kb), and block 9 (SNPs 167–173; 6 kb) (Figure 2, Additional File 1, Table S2). Compared to the other racial/ethnic groups, African-Americans demonstrated smaller block sizes for block 3 (SNPs 49–58), block 5 (SNPs 102–113), block 6 (SNPs 114–124), and block 7 (SNPs 147–153), and Native Hawaiians had larger block sizes, with combined blocks 1–3 (SNPs 1–72) and blocks 5–9 (SNPs 97–173) (Additional Files 7, 8, 9, 10, 11). "Tagging" SNPs (tagSNPs) were selected to allow for high predictability of the common haplotypes (≥ 5% frequency in any one ethnic population) within LD blocks in both genes: 33 tagSNPs in PRL and 60 tagSNPs in PRLR (Additional File 1, Tables S1 and S2; see Methods for a description of the approach used to select tagSNPs). African-Americans demonstrated a greater number of common haplotypes per block (Additional File 1, Table S9). Therefore, to accurately predict the common haplotypes in PRLR for African-Americans, additional tagSNPs were genotyped for blocks 1, 2, 3, 5, 7, and 9 (tagSNPs 16, 24, 35, 49, 111, 112, 151, 153, 167, and 171).

Figure 2

Linkage disequilibrium (LD) plot across the prolactin receptor (PRLR) locus for all racial/ethnic groups combined. The horizontal black line depicts the 210-kilobase region of chromosome (chr) 5 analyzed in our multiethnic panel. The PRLR gene is shown in grey (RefSeq gene = completed genes from the human genome assembly). Alternative first exons are shown in black below the gene: hE13, hE1N1, hE1N2, hE1N3, hE1N4, and hE1N5. The 173 single nucleotide polymorphisms (SNPs) used for genetic characterization are listed below the black line. The LD plot, presented at the bottom of the figures, is based on the measure of D'. Each diamond indicates the pairwise magnitude of LD, with dark grey indicating strong LD (D' > 0.8) and a logarithm of odds score of greater than 2.0. (Figure prepared with LocusView, Broad Institute, Cambridge, MA, unpublished software by T. Petryshen, A. Kirby, and M. Ainscow [61]).

Of the 60 tagSNPs selected in PRLR, we were unable to genotype four in the case-control study because Illumina assays could not be designed. All four lie in block 1: SNP6 (rs9986182), SNP12 (rs9292582), SNP24 (rs6451192), and SNP29 (rs7701473). As a result, within block 1, which spans 14.2 kb and lies 142 kb upstream of the start codon in exon 3, we could not distinguish between haplotypes 1A1, 1A2, and 1A3 in LA (frequencies of 16.9%, 6.4%, and 6.6%), between haplotypes 1A1 and 1A3 in AA (9.2% and 2.2%), or between 1A1 and 1A2 in NH (17.2% and 4.5%) and in WH (34.6% and 5.9%) (Additional File 1, Table S9). Aside from block 1 of PRLR, the predicted common haplotype frequencies in the multiethnic panel were similar to those observed in the larger case-control sample (Additional File 1, Tables S8-S11). Therefore, only haplotypes with ≥ 5% frequency in cases or controls, within each racial/ethnic group, are shown in Additional File 1, Tables S10 and S11. To assess how well the selected tagSNPs perform in capturing the common SNPs that were not selected as tagSNPs in each population, we calculated multi-marker $R^2$ measures for both genes [40]. For PRL, the fraction of SNPs predicted with a multi-marker $R^2$ > 0.7 was 89%, 93%, 98%, 100%, and 100% for AA, NH, JA, LA, and WH, respectively. For PRLR (even without the four tagSNPs), the fraction of SNPs captured with multi-marker $R^2$ > 0.7 was 84%, 92%, 90%, 92%, and 93%, respectively. Thus, the selected tagSNPs capture most of the SNPs evaluated in the LD characterization phase, and based on the high-density SNP coverage in this study (1 SNP every ~1 kb, on average), we expect these tags to also predict the vast majority of all common alleles in these genes.

We sequenced the exons and splice-site regions of PRL and PRLR in germline DNA from 95 advanced breast cancer cases (19 of each racial/ethnic group). PRL and PRLR sequencing confirmed only one missense SNP, Ile100Val (rs16871473) in exon 5 of PRLR. The SNP was observed most commonly among Native Hawaiians (MAFs, 11%, 15%, 5%, 1%, and 2% in AA, NH, JA, LA, and WH, respectively) (Additional File 1, Table S2). A previously reported missense SNP in exon 6 of PRLR (Ile170Leu) was monomorphic in all ethnic groups [41]. For PRL, we discovered a low frequency synonymous SNP in exon 3 (A+444152G). We were also able to validate a previously reported synonymous SNP in exon 5 (rs6239), but not a synonymous SNP in exon 2 (rs6240) or a missense SNP in exon 4 (rs6238) (Additional File 1, Table S1).

Case-control analysis

The distribution of breast cancer risk factors among the 1,615 breast cancer cases and 1,962 controls was consistent with the patterns observed in the overall cohort, and has been previously published [42] (Additional File 1, Table S3). We tested the independent effects of each tagSNP for PRL and PRLR in the case-control population (Additional File 1, Tables S4 and S5). Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated for each tagSNP using unconditional logistic regression adjusted for age and ethnicity (co-dominant effects are reported in the manuscript; detailed genotype-specific effects are shown in the tables). Because of the large number of comparisons being performed, we used a relatively stringent type I error criterion (p < 0.0005) for evaluating the significance of any single association. (This "corrects" for performing approximately 100 independent tests, close to the number of tagSNPs genotyped for both genes.) The strongest associations between individual SNPs and breast cancer risk were with SNP34 (rs9466314) in "block 2" of PRL (co-dominant effect OR, 1.48; 95% CI, 1.00–2.18; p = 0.049) and SNP49 (rs34024951) in block 3 of PRLR (co-dominant effect OR, 0.85; 95% CI, 0.73–0.99; p = 0.032) (Table 1). Of note, SNP34 in PRL was only observed among AAs, with a MAF of 6% in cases and 5% in controls in our sample. The missense Ile100Val SNP in PRLR was not associated with breast cancer risk (co-dominant effect OR, 1.02; 95% CI, 0.83–1.24; p = 0.883) (Additional File 1, Table S5).
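The significance threshold described above can be illustrated with a short calculation. The following is a minimal sketch, not the analysis code used in this study: it assumes a Bonferroni-style correction for 100 tests and a normal (Wald) approximation to recover an approximate p-value from a reported OR and 95% CI. The function names are hypothetical, and because the reported CIs are rounded, the recovered p-value only approximates the published one.

```python
import math

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise type I error near alpha."""
    return alpha / n_tests

def wald_p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Approximate two-sided p-value from an OR and its 95% CI (Wald approximation)."""
    # The 95% CI on the log-odds scale spans 2 * 1.96 standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

threshold = bonferroni_threshold(0.05, 100)
# Values reported in the text for SNP49 in PRLR (rounded, so p is approximate).
p_snp49 = wald_p_from_or_ci(0.85, 0.73, 0.99)
print(f"threshold = {threshold:.4f}")
print(f"SNP49 approximate p = {p_snp49:.3f}; below threshold: {p_snp49 < threshold}")
```

Under these assumptions, a nominally significant result such as that for SNP49 remains far above the corrected threshold, which is why it is reported as nominal only.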

Table 1 Nominally significant associations between prolactin (PRL) and prolactin receptor (PRLR) tagSNPs and breast cancer risk

We performed haplotype analyses using the most common haplotype as the reference group (Additional File 1, Tables S10 and S11); results were similar when we used all other haplotypes as the reference group (data not shown). In the analysis of the common haplotypes, haplotype 3I of PRL was nominally associated with risk (OR, 1.27; 95% CI, 1.02–1.59; p = 0.036) (Additional File 1, Table S10). This haplotype was only common in NH (14%) and JA (18%), and the effect was observed only in JA (OR, 1.39; 95% CI, 1.07–1.81; p = 0.015; p-heterogeneity = 0.193). No haplotypes in PRL or PRLR were significantly associated with breast cancer risk using our type I error criterion (p < 0.0005) (Additional File 1, Tables S10 and S11).

Plasma prolactin level analysis

Among the 362 postmenopausal controls in the biomarker analysis, the median plasma PRL level was 8.1 ng/mL. Prolactin levels did not vary by race/ethnicity, before or after adjusting for potential confounders: parity, age at first pregnancy, body mass index, family history of breast cancer, and menopause age and type (p-heterogeneity = 0.447) (data not shown). The strongest association between a single SNP and PRL levels was with SNP44 (rs2244502) of PRL, which showed approximately a 50% increase in levels in minor allele homozygotes relative to major allele homozygotes (Additional File 1, Tables S6 and S7). Overall, we observed nominally significant associations between prolactin levels and seven SNPs in PRL (SNP33, SNP34, SNP39, SNP44, SNP54, SNP62, SNP65) and two SNPs in PRLR (SNP73, SNP148) (Table 2). None of these associations was significant at the p < 0.0005 level.

Table 2 Nominally significant associations between prolactin (PRL) and prolactin receptor (PRLR) tagSNPs and plasma PRL levels

Discussion

We genotyped a high density of SNPs to characterize the haplotype structure of PRL and PRLR genes, using the criterion for haplotype-based studies described by Gabriel et al. [43] and the multivariate $R_h^2$ statistic [44] to provide high predictability of the common haplotypes in PRL and PRLR. We found that in almost all ethnic groups and for both genes, the selected tagSNPs performed well in predicting the common SNPs typed in the LD characterization phase (average multi-marker $R^2$ = 0.95) and the common haplotypes defined by the tagSNPs (average minimum $R_h^2$ = 0.87).

Assuming an average multi-marker $R^2$ = 0.90 between causal alleles and tagSNPs or haplotype predictors, we had 96% power to detect relative risks of 1.29 per haplotype or genotype copy with 10% frequency, allowing for a 5% type I error rate. However, given the large number of statistical tests for each gene, we expected several false positive associations. Under a more stringent type I error criterion (p < 0.0005), the detectable relative risk, at 90% power, for a dominant allele with 10% frequency is 1.45 per copy. By ethnic group, we had 78–82% power to detect large ORs ≥ 2.1 (except in NH, ORs ≥ 3.0) at this significance level. The purpose of this study, however, was to assess shared common genetic variation across ethnic groups. For PRL levels among 362 controls, only fairly large differences in mean levels could be detected with good power. For example, after correcting for 100 comparisons (i.e., using p < 0.0005), we estimate that we had 90% power to detect an association between PRL levels and a common (10%) variant only when that variant was associated with approximately a 50% change in mean levels per genotype/haplotype copy.

A recent German study of 441 cases and 552 controls reported an increase in breast cancer risk associated with genetic variation in PRL: rs1341239 (SNP35) (OR, 1.67; 95% CI, 1.11–2.50 for homozygous individuals) and rs12210179 (OR, 2.09; 95% CI, 1.23–3.52), which we did not genotype in our sample. SNP35 has been shown to be functionally significant in relation to Systemic Lupus Erythematosus (SLE) [45, 46]. Vaclavicek et al. reported that rs12210179 does not lie within any transcription factor binding site and is in high LD (|D'| = 0.91) with SNP35 [47]. Among Whites in the MEC, SNP35 is well predicted by tagSNP33 (pairwise $R^2$ = 0.86). Using HapMap data [48], rs12210179 is common (27%) among Caucasians (vs. Yorubans 4%, Japanese 1%) and, for Caucasians, is well predicted by tagSNP43 (pairwise $R^2$ = 1.00). Though we did not test these SNPs directly in our study, using these "surrogate" tagSNPs, we did not find any significant association with breast cancer risk among Whites (tagSNP33: OR 0.96; 95% CI, 0.80–1.16, p = 0.705; tagSNP43: OR 0.98; 95% CI, 0.78–1.23, p = 0.879) or overall (tagSNP33: OR 1.03; 95% CI, 0.93–1.14, p = 0.584; tagSNP43: OR 1.07; 95% CI, 0.93–1.22, p = 0.346).

Vaclavicek et al. also reported a TGTG haplotype in PRL composed of rs1341239 (SNP35), rs12210179 (not genotyped in our sample), rs2244502 (tagSNP44), and rs1205960 (tagSNP56) associated with breast cancer risk (OR, 1.42; 95% CI, 1.07–1.90) [47]. This haplotype falls in "block" 2 and block 3 of our characterization of the PRL locus (Additional File 1, Table S1). Using 11 tagSNPs for "block 2" (multi-marker $R^2$ = 0.79–1.00 for Whites) and 7 tagSNPs for block 3 (multi-marker $R^2$ = 0.92–1.00 for Whites), we did not observe an association with breast cancer risk (Additional File 1, Table S10). We used "surrogate" tagSNPs 33, 43, 44, and 56 to best approximate the TGTG haplotype but did not observe an association between common surrogate haplotypes and breast cancer risk among Whites (global test p = 0.78) or overall (global test p = 0.70). Further studies are needed to directly evaluate the TGTG haplotype in relation to breast cancer risk, especially among Whites.

We found that tagSNP34 (2.1 kb upstream of SNP35 in the promoter region of PRL) had the strongest association with risk of breast cancer (p = 0.049). It is possible that this SNP may be functionally significant as both SNP34 and SNP35 lie in the distal extra-pituitary promoter region of prolactin. However, this SNP was only observed among AAs, with a minor allele frequency (MAF) of 6% in cases and 5% in controls in our sample. Further studies are needed to assess the relevance of this finding. The strongest association in PRL between a haplotype and breast cancer risk was with haplotype 3I in block 3 (p = 0.036). This haplotype was only observed in JA and NH, and the association with risk was confined to JA.

For PRLR, the only missense SNP previously described in relation to breast cancer risk is a Leu150Ile SNP in exon 6 which was reported in 2 of 38 cases in a Turkish study [41]. In our large sample, this SNP was monomorphic; however, it is possible that it is rare or only observed in certain populations.

Vaclavicek et al. also reported a protective TCC haplotype in PRLR (OR, 0.69; 95% CI, 0.54–0.89; p = 0.004) using just three tagSNPs. The TCC haplotype consists of rs13354826 (not genotyped in our sample, block 2), rs9292573 (SNP59, block 3), and rs37389 (SNP141, block 7). In Whites, these SNPs are well predicted: rs13354826 (tagSNPs 7 and 35, HapMap data, multi-marker $R^2$ = 1.00), SNP59 (tagSNP55, pairwise $R^2$ = 1.00), and SNP141 (tagSNP139, pairwise $R^2$ = 0.94). We used "surrogate" tagSNPs 7, 35, 55, and 139 to approximate the TCC haplotype and found that the common haplotypes composed of these surrogate SNPs were not significantly associated with risk. Though we are unable to form a direct prediction of the TCC haplotype, we believe that our approach is comprehensive enough to have detected a true association within this region of the strength reported by Vaclavicek et al. Using 56 tagSNPs across high-density coverage of 210 kb of the PRLR locus (25 kb upstream of first alternative exon E13 to 10 kb downstream of exon 11), we did not find an association between SNPs or haplotypes in PRLR and breast cancer risk.

We did not generate convincing evidence of an association between PRL levels and common genetic variation in PRL and PRLR, although our study was limited by small sample size. The most significant p-value was 0.002 for SNP44 in PRL, which corresponds to a 48% increase in PRL levels between major and minor allele homozygotes. The Nurses' Health Study (NHS) [16] demonstrated that a > 1.6-fold difference between upper and lower quartiles of PRL levels was associated with a 34% increase in breast cancer risk. We did not observe an association between breast cancer risk and SNP44 (p = 0.575). However, even if the association between SNP44 and prolactin levels were correct, and assuming a direct influence of genetically determined prolactin levels on breast cancer risk consistent with the NHS, the 48% increase in PRL levels for minor allele homozygotes of SNP44 would still only correspond to a 10% risk increase between carriers and non-carriers of two copies. Such an increase in risk is not detectable in this study with reasonable power, which could explain the apparent lack of association between SNP44 and breast cancer risk in this study. Further studies in larger samples are needed to definitively assess the relationship between this polymorphism, plasma PRL levels, and breast cancer. In addition, our results may not be generalizable to premenopausal women since we only included postmenopausal women in our analysis. Prolactin levels have been shown to decline slightly among postmenopausal women compared to premenopausal women [2]. However, the NHS evaluated prolactin levels among premenopausal and postmenopausal women and found no difference in risk of breast cancer by menopausal status: premenopausal (RR 1.3, 95% CI 0.9–1.9) vs. postmenopausal (RR 1.3, 95% CI 1.0–1.8) women [16, 29]. It is unclear whether we could draw similar conclusions from our study population.

Strengths of this study include the large case-control sample size, comprehensive assessment of LD block structure, and tagSNP selection providing excellent prediction of nearly all SNPs or common haplotypes across five racial/ethnic populations. However, ethnic-specific risks and associations with plasma PRL levels should be interpreted with caution, due to the small number of subjects in these groups. Further studies using larger samples with measured PRL levels are needed to assess the relationship with polymorphisms in the PRL and PRLR genes and, in particular, to validate the association observed between PRL levels and SNP44 in PRL.

Conclusion

This is the largest and most comprehensive study of common genetic variation in PRL pathway genes in relation to breast cancer risk and plasma PRL levels. In contrast to a recent study of PRL and PRLR in relation to breast cancer, we observed no strongly significant associations with breast cancer risk. We also did not find an association between common genetic variation in PRL or PRLR and circulating plasma PRL levels. Our results emphasize the importance of using high-density genotyping to adequately characterize genes for use in association studies and caution against false positive results when interpreting these data. Though we did not observe an association with breast cancer risk, results from our study provide a framework for future association studies of PRL pathway genes in relation to other diseases (such as Systemic Lupus Erythematosus) and for larger studies of plasma PRL levels.

Methods

Subjects

The MEC consists of over 215,000 men and women in Hawaii and Los Angeles (with additional African-Americans from elsewhere in California) and has been previously described in detail [49]. The cohort comprises mainly five self-described racial-ethnic populations: Native Hawaiians, Japanese-Americans and Whites from Hawaii, and African-Americans, Japanese-Americans and Latinos from Los Angeles. Between 1993 and 1996, participants entered the MEC by completing a self-administered mail questionnaire that asked for detailed information about dietary habits, demographic factors, personal behaviors, history of prior medical conditions, family history of common cancers, and, for women, reproductive history and exogenous hormone use. The participants were between the ages of 45 and 75 when they entered the cohort.

Incident cancers in the MEC are identified by record linkage to the Hawaii Tumor Registry, the Cancer Surveillance Program for Los Angeles County, and the California State Cancer Registry. These population-based tumor registries participate in the National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) program of cancer registration, which is known to have excellent (98%) case ascertainment. From the registries we also obtained information about stage of disease at diagnosis. Breast cancer cases were classified as "advanced" when invasive/non-localized disease (SEER stage ≥ 2) was present at diagnosis.

Beginning in 1996, blood samples were collected from incident breast cancer cases. At this time, blood collection was also initiated in a random sample of MEC participants to serve as a control pool for genetic analyses. The participation rates for providing a blood sample were ≥ 65% for cases and controls. Demographic characteristics related to socio-economic status and acculturation (e.g. age at cohort entry, education, place of birth, and years living in the United States) were similar among those who provided a blood sample and women in the entire cohort. Eligible breast cancer cases in this study consisted of women with incident breast cancer diagnosed after enrollment in the MEC through April 2002. Controls were women without breast cancer prior to entry into the cohort and without a cancer diagnosis up to April 2002, and were frequency matched to cases by age and ethnicity. Because < 6% of cohort members moved outside of Hawaii and Los Angeles between enrollment (1993–1996) and the cut-off date for diagnosis (April 2002), the likelihood of missing cases that accrued in the cohort over this period of time is low.

The study consists of 1,615 invasive breast cancer cases (345 African Americans, 425 Japanese Americans, 335 Latinas, 109 Native Hawaiians, and 401 Whites) and 1,962 controls. By racial/ethnic group, the number of cases and controls were 345/426 AA, 109/290 NH, 425/420 JA, 335/386 LA, and 401/440 WH. The study protocol was approved by the Institutional Review Boards at the University of Hawaii and at the University of Southern California.

Subjects included in the analysis of plasma PRL levels were a random sample of the controls in the case-control panel. A total of 500 postmenopausal women with previously collected biospecimens (100 in each ethnic group) were included. Women reporting hormone therapy use at blood draw were excluded (n = 128), and individuals with PRL levels that were 2.5-fold outside the normal range were excluded (n = 10).

Gene Sequencing

We sequenced the exons and splice-site regions of PRL and PRLR in germline DNA from 95 advanced breast cancer cases (19 of each racial/ethnic group). We used DNA samples from advanced cases to increase the probability of discovering single nucleotide polymorphisms (SNPs) that are biologically relevant to breast cancer. Sequencing was performed using ABI BigDye terminator chemistry on the ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, CA). The PolyPhred program was used to identify polymorphisms with manual review by at least two observers, and all putative coding variants were validated by genotyping in the same panel of advanced cases and in the multiethnic panel (discussed below).

Characterization of Linkage Disequilibrium and Haplotype Patterns

We used a haplotype-based approach to study common variation in PRL and PRLR in the MEC, as described previously [42]. We selected single nucleotide polymorphisms (SNPs) from both public (National Center for Biotechnology Information [50]) and private (Celera [51]) databases to construct high-density SNP maps that included up to 20 kilobases (kb) upstream of the transcription initiation site and 10 kb downstream of the last exon of each gene, for a total coverage of 59 kb in PRL and 210 kb in PRLR. Block structure was assessed using SNPs with MAF ≥ 10%. Blocks were initially defined following alignment across racial/ethnic groups; borders were characterized by SNPs at the extreme ends of the block in any one ethnic group, except for African-Americans, whose block sizes, as expected, were modestly smaller than those of the other groups. We tested the suitability of this block definition by evaluating whether SNPs surrounding presumed block borders modified the number or identity of common haplotypes estimated within the blocks; changes in the number of haplotypes and the introduction of recombinant haplotypes would indicate whether SNPs were spanning a potentially important site of historical recombination and guided us in redefining a block boundary.

We genotyped common SNPs (MAF > 5% in at least one racial/ethnic group) at a density of 1 SNP every ~1 kb on average across the locus, all known missense SNPs in public databases, and all newly identified missense SNPs in our sequencing effort. In total, 139 (PRL) and 276 (PRLR) SNPs were selected and genotyped in a multiethnic panel of 349 women in the MEC without a history of cancer (n = 69–70 per racial-ethnic group). This sample size allows > 99% power to detect common haplotypes (≥ 5% frequency) that are shared across all ethnic groups, and about 90% power to detect common ethnic-specific haplotypes. Of these SNPs, 36 (PRL) and 74 (PRLR) were identified as monomorphic and 17 (PRL) and 22 (PRLR) genotyped poorly (SNPs missing genotype data for ≥ 25% of samples or out of Hardy-Weinberg equilibrium in more than one of the populations, p ≤ 0.01). This left 80 (PRL) and 173 (PRLR) SNPs with MAF ≥ 5% in at least one racial-ethnic group to be included in the haplotype analysis.
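The quality-control rule described above (exclude a SNP if ≥ 25% of genotypes are missing or if it departs from Hardy-Weinberg equilibrium at p ≤ 0.01 in more than one population) can be sketched as follows. This is an illustration only: it assumes a 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium, which may differ from the test actually used, and the function names, population labels, and genotype counts are hypothetical.

```python
import math

def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium from genotype counts.

    Assumes at least one genotyped individual.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(missing_rate: float, counts_by_pop: dict) -> bool:
    """Apply the two exclusion criteria described in the text to one SNP."""
    n_hwe_fail = sum(1 for counts in counts_by_pop.values()
                     if hwe_chi2_p(*counts) <= 0.01)
    return missing_rate < 0.25 and n_hwe_fail <= 1

# Hypothetical genotype counts (AA, AB, BB) in two populations.
good_snp = {"pop1": (25, 50, 25), "pop2": (36, 48, 16)}
bad_snp = {"pop1": (50, 0, 50), "pop2": (50, 0, 50)}  # no heterozygotes at all
print(passes_qc(0.02, good_snp))  # kept
print(passes_qc(0.02, bad_snp))   # excluded: HWE failure in two populations
print(passes_qc(0.30, good_snp))  # excluded: too much missing data
```

A SNP that fails the equilibrium test in only one population is retained under this rule, since a single departure may reflect chance or population-specific history rather than genotyping error.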

The |D'| and $r^2$ statistics were used to assess pairwise linkage disequilibrium (LD) between the common SNPs. Within regions of strong LD [43], haplotype frequency estimates were constructed from the genotype data in the multiethnic panel (one ethnicity at a time) using the expectation-maximization (E-M) algorithm of Excoffier and Slatkin [52]. The squared correlation ($R_h^2$) between the true haplotypes (h) and their estimates was then calculated as described by Stram et al. [44]. "Tagging" SNPs (tagSNPs) for the case-control study were then chosen by finding the minimum set of SNPs for each ethnic group that would have $R_h^2$ > 0.7 for all common haplotypes with an estimated frequency of ≥ 5%. TagSNP selection was performed using the tagSNPs program [53].
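For readers less familiar with the two pairwise LD measures, the following sketch computes |D'| and $r^2$ for a pair of biallelic SNPs from haplotype and allele frequencies using the standard definitions. It is illustrative only and is not the software used in this study; the function name and the example frequencies are hypothetical.

```python
def pairwise_ld(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2

# Allele A (20%) always occurs on haplotypes carrying allele B (50%):
# no evidence of historical recombination, yet the SNPs predict each other poorly.
d_prime, r2 = pairwise_ld(p_ab=0.2, p_a=0.2, p_b=0.5)
print(f"|D'| = {d_prime:.2f}, r^2 = {r2:.2f}")
```

The example shows why both statistics are useful: |D'| can be at its maximum when allele frequencies differ, while $r^2$, which governs how well one SNP predicts another, stays low.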

Multi-marker and pairwise R² values between tagSNPs and unmeasured SNPs were calculated using the Tagger algorithm [40] in Haploview and the slightly more general method given in Stram 2004 [54].

Genotyping

DNA for all subjects was extracted from white blood cell fractions using the Qiagen Blood Kit (Qiagen, Chatsworth, CA). SNP genotyping in the multiethnic panel was performed using the Sequenom (Sequenom Inc, San Diego, CA) platform. TagSNP genotyping in the breast cancer cases and controls was performed using the 5' nuclease TaqMan allelic discrimination assay (ABI7900) and the Illumina (Illumina Inc, San Diego, CA) platform. Replicate blinded quality control samples (5%) were included to assess reproducibility of the genotyping procedure; the concordance was ≥ 99.7% for all platforms.

Plasma Prolactin Measurements

Prolactin was measured using a double-antibody immunoradiometric assay from Diagnostic System Laboratories (Webster, Texas) in the hormone analysis laboratories at the International Agency for Research on Cancer. The assay was performed in multiple batches with equal numbers of each population in each batch. The theoretical sensitivity (as stated by the manufacturer) is 0.1 ng/ml. Mean intra- and inter-batch coefficients of variation were 5.4% and 12.8%, respectively, using 25-microliter sample volumes. Plasma PRL levels have been shown to be stable in whole blood for 24–48 hours [55]. In the MEC, time from blood collection to processing was no more than six hours.

Statistical Analysis

Haplotype frequencies among breast cancer cases and controls were estimated using the tagSNPs selected to distinguish the common haplotypes (≥ 5% frequency) for each ethnic group in the multiethnic panel, as described [56]. The E-M algorithm was used to estimate haplotype frequencies for the tagSNPs in the combined dataset (cases + controls), and individual estimates of haplotype count (the expected number of copies of each haplotype carried by each individual) from the E-M algorithm were written to an external file and merged with case-control status. These estimates were then used as explanatory variables in logistic regression models.
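The idea of an expected haplotype count can be made concrete with a small E-M sketch for unphased genotypes at a handful of biallelic SNPs. This is a generic textbook formulation assuming Hardy-Weinberg equilibrium and no missing data; it is not the tagSNPs program, and all names below are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """Ordered haplotype pairs (h1, h2) consistent with an unphased genotype.

    genotype: tuple of 0/1/2 allele counts, one per SNP.
    Haplotypes are tuples of 0/1 alleles.
    """
    pairs = []
    for h1 in product((0, 1), repeat=len(genotype)):
        h2 = tuple(g - a for g, a in zip(genotype, h1))
        if all(b in (0, 1) for b in h2):
            pairs.append((h1, h2))
    return pairs


def expected_dosage(genotype, freq):
    """Expected copies of each haplotype carried by one individual (E-step)."""
    pairs = compatible_pairs(genotype)
    weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
    total = sum(weights)
    dosage = {h: 0.0 for h in freq}
    for (h1, h2), w in zip(pairs, weights):
        dosage[h1] += w / total
        dosage[h2] += w / total
    return dosage


def em_haplotypes(genotypes, n_iter=100):
    """Estimate haplotype frequencies with a fixed number of E-M iterations."""
    haps = list(product((0, 1), repeat=len(genotypes[0])))
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g in genotypes:
            for h, d in expected_dosage(g, freq).items():
                counts[h] += d
        # M-step: each individual contributes two chromosomes.
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq
```

Each individual's dosages sum to 2, so they can enter a regression model in the same way as an additive genotype code.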

The majority of common variation has been shown empirically to be shared across racial and ethnic populations [57, 58], and the biological effects on risk for the majority of common disease-associated alleles have also been shown to be consistent across populations [59]. These observations justify pooling genetic data across racial and ethnic populations if no heterogeneity is noted. To assess the consistency of genetic effects across populations, we tested for heterogeneity across racial/ethnic groups before pooling genetic data. These tests were performed using a likelihood ratio test following the inclusion of an interaction term between each haplotype (or SNP) and ethnicity in the logistic regression model. Pooled odds ratios (ORs) and 95% confidence intervals (CIs) were then estimated for each haplotype and tagSNP using unconditional logistic regression adjusted for age and ethnicity. Because of the large number of comparisons being performed, we used a relatively stringent type I error criterion (p < 0.0005) for evaluating the significance of any single association. (This "corrects" for performing approximately 100 independent tests, close to the number of tagSNPs genotyped for both genes.)
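The arithmetic behind the significance threshold, and the form of the heterogeneity test, can be sketched as follows. The sketch assumes a Bonferroni-style division of a nominal 0.05 level, which reproduces the stated 0.0005, and a chi-square reference distribution for the likelihood ratio statistic. The closed-form tail probability used here holds only for even degrees of freedom (for example, 4 df for one genetic term crossed with five ethnic groups, itself an assumption about the model coding). Function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports positive even df only")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def lrt_pvalue(loglik_reduced, loglik_full, df):
    """Likelihood ratio test comparing nested models.

    loglik_reduced: log-likelihood without the interaction terms
    loglik_full:    log-likelihood with the interaction terms
    df:             number of added parameters (even, in this sketch)
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return chi2_sf_even_df(max(stat, 0.0), df)
```

Here `bonferroni_threshold(0.05, 100)` gives the 0.0005 criterion quoted above.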

We used the methods described by Zaykin et al. to perform global tests of association between haplotypes and cancer risk within each LD block and to estimate haplotype-specific odds ratios [60]. ORs were estimated for each common haplotype using the most common haplotype as the reference group and for each SNP using the more common genotype as the reference group. We also performed the haplotype analyses using all other haplotypes as the reference group and performed individual SNP analyses for co-dominant effects, both of which yielded similar results (data not shown). Because further adjustment for study area (Hawaii or Los Angeles) and the established breast cancer risk factors (first-degree family history of breast cancer, body mass index, parity, age at first birth, age at menarche, type and age at menopause, use of hormone replacement therapy, and alcohol consumption) did not impact our results, we only present results from the age- and ethnicity-adjusted models.

We also estimated the effect of SNPs and estimated haplotypes on plasma PRL levels using generalized linear models adjusted for continuous (age, anthropometry) and categorical (reproductive history) variables. The hormone measurements were log-transformed to better approximate a normal distribution, and the resulting estimates were back-transformed to the original scale for presentation. Means are presented as least-squares means (LS means). For all analyses, dominant, co-dominant, and recessive models were fitted.
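Back-transforming a mean computed on the log scale yields a geometric mean. The sketch below shows this for unadjusted group means only; it does not reproduce the covariate-adjusted LS means from the models described above, and the function names and genotype coding are hypothetical.

```python
import math
from collections import defaultdict


def geometric_mean(values):
    """Mean on the natural-log scale, transformed back to original units."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


def geometric_means_by_genotype(levels, genotypes):
    """Unadjusted geometric mean hormone level for each genotype group.

    levels:    positive hormone measurements, one per subject
    genotypes: genotype codes (e.g. 0, 1, 2 copies of the minor allele)
    """
    groups = defaultdict(list)
    for level, genotype in zip(levels, genotypes):
        groups[genotype].append(level)
    return {g: geometric_mean(v) for g, v in groups.items()}
```

Because the geometric mean is never larger than the arithmetic mean, values reported this way are not directly comparable with arithmetic means from other studies.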

The haplotype frequencies and counts were estimated using the tagSNPs program [53]. All other statistical analyses were conducted using SAS version 9.1 (SAS Institute, Cary, NC).