Introduction

BRCA1 and BRCA2 mutation carriers are at increased risk of developing breast cancer and/or ovarian cancer. Estimates of the cumulative risk of breast cancer by age 70 years range from 46% to 87% for BRCA1 mutation carriers and from 43% to 84% for BRCA2 mutation carriers [1-6]. Evidence from these studies suggests that breast cancer risks in mutation carriers are modified by environmental or genetic factors. A number of large studies, facilitated through the Consortium of Investigators of Modifiers of BRCA1/BRCA2 (CIMBA), have evaluated associations between genetic polymorphisms and breast cancer risk in BRCA1 and BRCA2 mutation carriers [7-15].

The candidate gene (or candidate SNP) approach for identifying potential risk modifiers has been successfully used to identify a SNP in the 5' untranslated region of RAD51. Until recently, this finding provided the most reliable evidence for a genetic modifier in BRCA2 mutation carriers [7]. A major disadvantage of this approach for identifying common genetic modifiers of breast cancer, however, is the limited understanding of the mechanisms and pathways that underlie breast cancer development in families carrying mutations in BRCA1 or BRCA2. An alternative and powerful approach that can overcome this limitation is the use of genome-wide association (GWA) studies to identify candidate SNPs. Analysis of breast cancer risk-associated SNPs identified by a large population-based GWA study of breast cancer [16] has shown that several of these SNPs also appear to modify risk in BRCA1 and/or BRCA2 mutation carriers [8]. Not all of the breast cancer-associated SNPs assessed have been found to modify risk in carriers, however, and some of the risk associations are specific to BRCA2 mutation carriers and are not observed in BRCA1 mutation carriers [8]. While GWA studies that specifically address risk in BRCA1 and/or BRCA2 carriers are a more direct, agnostic way of identifying modifiers of these genes, they require large sample sizes to identify genetic modifiers with confidence. To address the problem of inadequate sample size, CIMBA was established in 2005 to link clinical and epidemiological data from many groups around the world [17]. The GWA approach is still limited, however, in that study designs apply predefined, stringent criteria to select which SNPs from the initial whole genome scan are analysed in subsequent replication studies, a design enforced by current genotyping costs. Moreover, GWA studies often have limited information about exogenous risk factors, such as environmental exposures, which hampers any effort to explore the effect of environmental factors in modifying gene-disease associations. Global gene expression analysis, as a means of agnostically identifying candidate genetic modifiers, has the potential to prioritise SNPs and candidate genes for association studies. This may be particularly valuable given recent observations that SNPs associated with risk of cancer in the general population appear to reside in noncoding regions that may modulate gene expression.

An alternative approach to prioritising SNPs and candidate genes for association studies in BRCA1 and BRCA2 mutation carriers could rely on the selection of genes whose expression in response to DNA damage is associated with BRCA1 or BRCA2 mutation status. In a previous study, we used a novel combinatorial approach to identify a subset of 20 irradiation-responsive genes as high-priority candidate BRCA1 and/or BRCA2 modifier genes [18]. The expression levels of these genes were shown to be associated with BRCA1 and/or BRCA2 mutation status when irradiated lymphoblastoid cell lines from female carriers were compared with irradiated lymphoblastoid cell lines from healthy controls. Furthermore, each of the genes was tagged with one or more SNPs shown to be associated with breast cancer risk in the Cancer Genetic Markers of Susceptibility (CGEMS) Phase 1 Breast Cancer Whole Genome Association Scan [19, 20]. In the present study we investigated the association of these polymorphisms, tagged to genes demonstrated in vitro to be involved in irradiation response, with risk of breast cancer for BRCA1 and BRCA2 mutation carriers.

Materials and methods

Study participants

Eligibility was restricted to female carriers of a pathogenic BRCA1 or BRCA2 mutation who were aged 18 years or older. Fifteen clinic and population-based research studies from the USA, Canada, Australia, the UK and Europe submitted data to the present study (Table 1). Information collected included year of birth, age at diagnosis of breast cancer or ovarian cancer, age at last observation, family membership, ethnicity and information on bilateral prophylactic mastectomy and oophorectomy. All centres obtained informed consent from study participants and used protocols approved by their institutional review boards. In total, this study included up to 4,724 BRCA1 and 2,693 BRCA2 eligible female mutation carriers. Of the 2,193 and 1,189 unaffected BRCA1 and BRCA2 carriers, respectively, 972 (44.3%) and 589 (49.5%) had a relative in the affected group.

Table 1 Distribution of BRCA1 and BRCA2 mutation carriers by study site

SNP selection and genotyping

In a previous report, we proposed 13 genes (ARHGEF2, HNRPDL, IL4R, JUND, LSM2, MAGED2, MLF2, MS4A1, SMAD3, STIP1, THEM2, TOMM40, VNN2) as candidate modifiers of breast cancer risk for BRCA1 mutation carriers, and 14 genes (ARHGEF2, JUND, MLF2, SMAD3, STIP1, THEM2, TOMM40, ABL1, ELMO1, EPM2AIP1, PER1, PLCG2, PLD3, SLC20A1) as candidate modifiers of breast cancer risk for BRCA2 mutation carriers (see Additional file 1) [18]. Thirty-seven SNPs denoted by CGEMS as being tagged to these genes were initially identified as showing some association with breast cancer risk (P < 0.05) (see Additional file 2). Of these 37 SNPs, a panel of 32 variants was selected after successful assay design and genotyped on two platforms, the Illumina GoldenGate assay (Illumina Inc., San Diego, CA, USA) and the Sequenom MassARRAY iPLEX platform (Sequenom, San Diego, CA, USA), as previously described [21, 22]. The genotyping method used for each participating study is detailed in Table 1. Five SNPs tagged to five candidate genes (JUND, MAGED2, MLF2, MLH1, STIP1) had call rates <95% and were excluded from the analysis. The minor allele frequencies of three SNPs (rs2893535 - ELMO1, minor allele frequency = 0.033; rs2304911 - PER1, minor allele frequency = 0.043; and rs3802957 - MS4A1, minor allele frequency = 0.04) were considered too low for reliable analysis. The number of genes assessed for their associations with breast cancer risk for BRCA1 and BRCA2 mutation carriers was therefore eight and 10, respectively.
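The quality-control filtering described in this section can be illustrated with a minimal Python sketch (the study itself reports using R, so this is illustrative only). The 95% call-rate threshold is taken from the text. The minor allele frequency cut-off of 0.05 is an assumption, because the text gives no explicit value, and the class name, function names and genotype counts are hypothetical.

```python
from dataclasses import dataclass

CALL_RATE_MIN = 0.95  # from the text: SNPs with call rates <95% were excluded
MAF_MIN = 0.05        # ASSUMPTION: the text does not state an explicit cut-off


@dataclass
class SnpSummary:
    """Genotype counts for one SNP (all counts here are hypothetical)."""
    snp_id: str
    n_hom_a: int   # homozygous for allele A
    n_het: int     # heterozygous
    n_hom_b: int   # homozygous for allele B
    n_failed: int  # samples with no genotype call


def call_rate(s: SnpSummary) -> float:
    """Fraction of samples with a successful genotype call."""
    called = s.n_hom_a + s.n_het + s.n_hom_b
    return called / (called + s.n_failed)


def minor_allele_frequency(s: SnpSummary) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = s.n_hom_a + s.n_het + s.n_hom_b
    freq_b = (s.n_het + 2 * s.n_hom_b) / (2 * called)
    return min(freq_b, 1.0 - freq_b)


def passes_qc(s: SnpSummary) -> bool:
    """Apply the call-rate filter and the assumed MAF filter."""
    return call_rate(s) >= CALL_RATE_MIN and minor_allele_frequency(s) >= MAF_MIN


# Hypothetical examples, not study data.
common = SnpSummary("snp_common", n_hom_a=600, n_het=340, n_hom_b=50, n_failed=10)
rare = SnpSummary("snp_rare", n_hom_a=940, n_het=58, n_hom_b=2, n_failed=0)
poorly_called = SnpSummary("snp_low_call", n_hom_a=500, n_het=350, n_hom_b=50, n_failed=100)

kept = [s.snp_id for s in (common, rare, poorly_called) if passes_qc(s)]
```

With these made-up counts only `snp_common` is retained. In the study, the same two filters reduce the 32 genotyped SNPs to 24 (32 - 5 - 3).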

Statistical methods

Relative risks (RRs) and 95% confidence intervals were estimated using weighted Cox proportional hazards models. Each subject was followed from birth to the earliest of breast cancer, bilateral mastectomy, ovarian cancer, last follow-up, or age 80 years. The phenotype of interest was time to breast cancer. Mutation-specific weights were calculated using the age distribution of affected and unaffected individuals according to the methods previously outlined by Antoniou and colleagues [23]. Analyses were stratified by year of birth, ethnicity, country of residence, study site, and mutation status. A robust variance estimate was used to account for relatedness amongst individuals. Primary SNP analyses assumed a log-additive relationship between the number of minor alleles carried by each individual and time to breast cancer. Wald P values below 0.05 were considered of interest. Secondary analyses were carried out in which RR estimates were generated separately for those carrying one and two copies of the minor allele versus those with two copies of the major allele. Between-study heterogeneity was examined for each SNP by including an interaction term between genotype and study centre.
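As a minimal sketch of the genotype coding and Wald summary described here, the Python below shows how a log-additive covariate and the two-indicator coding of the secondary analyses can be built, and how a fitted log hazard coefficient and its standard error translate into an RR, a 95% confidence interval and a two-sided Wald P value. All function names are hypothetical. Fitting the weighted, stratified Cox model with robust variance is not attempted, so the coefficient and standard error are treated as given inputs.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def additive_code(n_minor_alleles: int) -> int:
    """Log-additive coding: the covariate is the minor allele count (0, 1 or 2)."""
    if n_minor_alleles not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 minor alleles")
    return n_minor_alleles


def genotype_indicators(n_minor_alleles: int) -> tuple:
    """Secondary coding: (heterozygote, minor homozygote) indicators,
    with the major homozygote as the reference category."""
    additive_code(n_minor_alleles)  # reuse the validation
    return (int(n_minor_alleles == 1), int(n_minor_alleles == 2))


def wald_summary(beta: float, se: float) -> tuple:
    """Return (RR, lower 95% limit, upper 95% limit, two-sided Wald P)."""
    rr = math.exp(beta)
    lower = math.exp(beta - Z_95 * se)
    upper = math.exp(beta + Z_95 * se)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return rr, lower, upper, p_value


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate the standard error of log(RR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
```

Under the log-additive model, a per-allele RR of r implies an RR of r squared for carriers of two minor alleles. As a plausibility check, `wald_summary(math.log(1.25), se_from_ci(1.07, 1.45))` applied to a reported estimate of 1.25 (1.07, 1.45) returns a P value close to 0.004. The agreement is only approximate, because the published limits are rounded.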

Owing to the highly selected nature of the subjects, a number of sensitivity analyses were performed. To limit the effect of potential survival bias, subjects diagnosed more than 5 years prior to study enrolment were excluded (number affected analysed = 1,342 and 762 for BRCA1 and BRCA2 carriers, respectively). Other models were examined that excluded women with ovarian cancer (number excluded = 491 and 151 BRCA1 and BRCA2 carriers, respectively). Finally, as risk of breast cancer is reduced after bilateral oophorectomy [24, 25], analyses were carried out treating oophorectomy as a time-dependent covariate in the Cox proportional hazards models. All P values are two-sided and analyses were carried out using R software [26].
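The follow-up rule from the Statistical methods and the treatment of oophorectomy as a time-dependent covariate can be sketched as follows. This is a minimal illustration of the common counting-process layout, in which each woman contributes one record before and one after oophorectomy. It is an assumption that the analysis was set up this way, and the class, field names and tie-breaking rule are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_AGE = 80.0  # from the text: follow-up ends at age 80 at the latest


@dataclass
class Carrier:
    """Ages in years; None means the event was not recorded."""
    age_last_followup: float
    age_breast_cancer: Optional[float] = None
    age_ovarian_cancer: Optional[float] = None
    age_bilateral_mastectomy: Optional[float] = None
    age_oophorectomy: Optional[float] = None


def follow_up(c: Carrier) -> Tuple[float, int]:
    """Return (exit age, event), where event = 1 only if breast cancer
    is the earliest of the competing end points."""
    candidates = [(c.age_last_followup, 0), (MAX_AGE, 0)]
    if c.age_breast_cancer is not None:
        candidates.append((c.age_breast_cancer, 1))
    for censor_age in (c.age_ovarian_cancer, c.age_bilateral_mastectomy):
        if censor_age is not None:
            candidates.append((censor_age, 0))
    # ASSUMPTION: if breast cancer ties with a censoring age, count the event.
    return min(candidates, key=lambda pair: (pair[0], -pair[1]))


def episodes(c: Carrier) -> List[Tuple[float, float, int, int]]:
    """Split follow-up from birth into (start, stop, oophorectomy, event) records."""
    exit_age, event = follow_up(c)
    ooph = c.age_oophorectomy
    if ooph is None or ooph <= 0.0 or ooph >= exit_age:
        return [(0.0, exit_age, 0, event)]
    return [(0.0, ooph, 0, 0), (ooph, exit_age, 1, event)]


# Hypothetical carrier: oophorectomy at 45, breast cancer at 52, last seen at 60.
records = episodes(Carrier(age_last_followup=60.0,
                           age_breast_cancer=52.0,
                           age_oophorectomy=45.0))
# records == [(0.0, 45.0, 0, 0), (45.0, 52.0, 1, 1)]
```

Records in this layout can be passed to any Cox implementation that accepts start and stop times, so that the covariate switches from 0 to 1 at the age of surgery rather than being fixed for the whole of follow-up.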

Results and Discussion

A cohort of up to 4,724 BRCA1 and 2,693 BRCA2 female mutation carriers was used for the present study. Of these, 4,035 mutation carriers had been diagnosed with breast cancer or ovarian cancer by the end of follow-up and 3,382 were censored as unaffected at a mean age of 44 years. The characteristics of the BRCA1 and BRCA2 mutation carriers are presented in Table 2.

Table 2 Patient characteristics

The RR estimates for the association between SNP genotypes and risk of breast cancer for BRCA1 and BRCA2 mutation carriers are presented in Tables 3 and 4, respectively. Of the 24 SNPs that passed quality control, the minor alleles of two SNPs were associated with increased risk for BRCA1 mutation carriers (rs10242920 - ELMO1, P = 0.043; and rs480092 - LSM2, P = 0.015) and the minor alleles of three SNPs were associated with increased risk for BRCA2 mutation carriers (rs1559949 - HNRPDL, P = 0.021; rs3825977 - SMAD3, P = 0.018; and rs7166081 - SMAD3, P = 0.004). The minor alleles of two SNPs, rs1559949 (HNRPDL) and rs3808814 (ABL1), were associated with decreased risk for BRCA1 (P = 0.022) and BRCA2 (P = 0.030) mutation carriers, respectively.

Table 3 Genotype distributions of 24 candidate modifier SNPs and hazard ratio estimates for BRCA1 mutation carriers
Table 4 Genotype distributions of 24 candidate modifier SNPs and hazard ratio estimates for BRCA2 mutation carriers

All SNPs selected for the present study (see Additional file 2) had previously been reported to be at least marginally associated (P < 0.05) with breast cancer risk through the CGEMS Phase 1 Breast Cancer Whole Genome Association Scan [18], and to be tagged to a gene whose expression level was associated with BRCA1 and/or BRCA2 mutation status in irradiated lymphoblastoid cell lines [18]. The minor alleles of four of the six SNPs shown here to be associated with risk in BRCA1 mutation carriers (rs1559949 - HNRPDL; rs480092 - LSM2) or BRCA2 mutation carriers (rs3825977 and rs7166081 - SMAD3) had risk estimates for the homozygous genotype that were concordant with the odds ratio reported by the CGEMS study (Tables 3 and 4, and Additional file 2). Furthermore, the expression of HNRPDL and LSM2 was associated with BRCA1 mutation status and the expression of SMAD3 was associated with BRCA2 mutation status [18]. The risk estimate of rs10242920 (ELMO1) was also concordant with the odds ratio determined by the CGEMS study; and although the expression of ELMO1 was not associated with BRCA1 mutation status at P < 0.001, there was an association with gene expression at P < 0.005 [18]. In contrast, the risk estimates of rs3808814 (ABL1) and rs1559949 (HNRPDL) in BRCA2 mutation carriers are not concordant with the odds ratio determined by the CGEMS study. Forest plots of study groups with 70 or more carriers and tests of heterogeneity are shown for two of the most significant SNPs (rs3825977, P-het = 0.619 and rs7166081, P-het = 0.218 at the SMAD3 locus), stratified by study site (Figure 1). The minor alleles of rs3825977 and rs7166081 are in high linkage disequilibrium (r² = 0.77), which would be expected if their association with increased breast cancer risk is bona fide.

Figure 1

BRCA2 plot of relative risk for rs3825977 and rs7166081 at the SMAD3 locus. BRCA2 plots of study group-specific relative risk (RR) for rs3825977 and rs7166081 at the SMAD3 locus. Study groups with 70 or more carriers and tests of heterogeneity are shown for (a) rs3825977 (overall RR (95% confidence interval (CI)) = 1.20 (1.03, 1.40), Ptrend = 0.018) and (b) rs7166081 (overall RR (95% CI) = 1.25 (1.07, 1.45), Ptrend = 0.004). OR, odds ratio.

Although further study is required to confirm whether genetic variation in SMAD3 plays a role in modifying risk of breast cancer, SMAD3 has been shown to interact with the BRCA2 protein, suggesting a possible mechanism through which SMAD3 may modify BRCA2 function [27]. Furthermore, SMAD3 is a critical regulatory factor of the transforming growth factor beta pathway, which is known to play a key role in the development of breast cancer as well as many other cancers [28, 29]. In addition, a recent study comparing dense breast tissue (a known breast cancer risk factor) with nondense tissue found reduced expression of SMAD3 to be associated with dense tissue, indirectly supporting a role for SMAD3 expression in breast cancer risk [29].

Using candidate BRCA1 and BRCA2 modifier genes selected by a novel combinatorial approach [18], we show that four SNPs tagged to three of the 14 candidate genes are associated with breast cancer risk for BRCA1 or BRCA2 mutation carriers. We initiated the present study, however, with the expectation that SNPs in eight of the 14 genes may be associated with altered expression in BRCA1 mutation carriers, and 10 of the 14 genes with altered expression in BRCA2 carriers. We can thus argue that three out of 18 (17%) valid comparisons showed an association with risk. Under either interpretation, the rate of observed association is greater than the one in 20 (5%) expected by chance. In addition, post hoc mining of the expression dataset showed that another SNP (rs10242920 - ELMO1), whose association was consistent with the effect reported in the CGEMS dataset, was also tagged to a gene with altered expression in carriers, albeit at a less stringent significance level (P = 0.005) than that originally used for gene and SNP selection. These findings suggest that the combinatorial approach may be a useful method to prioritise candidate modifier genes for polymorphism association studies. It is notable that CIMBA GWA studies of BRCA1 and BRCA2 mutation carriers are currently underway [30]. One might therefore anticipate that the combinatorial approach would provide even greater enrichment for prioritising SNPs from GWA studies that directly relate to the disease state under study. Further studies with larger cohorts are therefore warranted to assess the benefit of carrying out such an approach.
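The arithmetic behind the two interpretations can be written out as a short Python sketch. The counts (three associations among 14 genes or 18 comparisons) and the 5% chance expectation come from the text. The function name is hypothetical, and the ratio is a descriptive fold enrichment rather than a formal significance test.

```python
EXPECTED_BY_CHANCE = 0.05  # one in 20, as stated in the text


def enrichment(n_associated: int, n_tested: int,
               expected: float = EXPECTED_BY_CHANCE) -> tuple:
    """Return (observed proportion, fold enrichment over the chance rate)."""
    observed = n_associated / n_tested
    return observed, observed / expected


by_gene = enrichment(3, 14)        # three of the 14 candidate genes
by_comparison = enrichment(3, 18)  # three of the 18 valid comparisons

for label, (observed, fold) in (("per gene", by_gene),
                                ("per comparison", by_comparison)):
    print(f"{label}: {observed:.0%} observed, {fold:.1f}-fold over 5%")
```

The per-comparison figure (17%, a little over threefold) is the more conservative of the two. The per-gene figure is about 21%, or just over fourfold.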

Conclusions

We have explored the value of using biological information embedded in gene expression data to prioritise candidate modifier genes for SNP association studies. Using this combinatorial approach we were able to demonstrate a threefold enrichment of genes that contain SNPs associated with breast cancer risk for BRCA1 or BRCA2 mutation carriers. Most notable was the evidence that the SMAD3 gene, which encodes a key regulatory protein in the transforming growth factor beta signalling pathway, may contribute to increased risk of breast cancer in BRCA2 mutation carriers. These results suggest that the combinatorial approach may be a useful method to prioritise candidate modifier genes for polymorphism association studies.