Background

Epidemiological studies have provided consistent evidence of associations between environmental (predominantly lifestyle and reproductive) factors and subsequent risk of breast cancer (BC). More recently, genome-wide association studies (GWAS) have identified more than 70 single nucleotide polymorphisms (SNPs) that influence breast cancer risk [1]. Detecting a gene-environment (GxE) interaction between a SNP and an environmental risk factor has the potential to shed light on the biological process leading to disease, identify women for whom these risk factors are most relevant, and improve the accuracy of epidemiological risk models [2]. A comprehensive review summarising the rationale for and the challenges of studying GxE interactions advocated a range of measures, including supporting new and larger prospective studies, reporting stratified analyses as supplementary material and coordinating pre-planned analyses across multiple studies [2]. In this commentary we review progress in investigating GxE interactions in the field of BC. We define GxE interaction as the modification of the effect of a genetic risk factor by an environmental factor, assessed statistically by testing the joint effects of gene and environment for departure from additivity on an appropriate scale (usually the log or logit scale in disease studies). We focus on GxE interactions between common SNPs and established risk factors for BC (Table 1), discuss the implications of testing marker SNPs rather than the underlying causal variants that they tag, and consider whether GxE studies have fulfilled their potential for illuminating disease processes or predicting risk.
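
As an illustration of this definition on the logit scale, assume the standard logistic regression model, with genotype G coded as the number of risk alleles (0, 1 or 2) and a binary exposure E. The notation is ours and is not taken from the cited studies.

```latex
\operatorname{logit} \Pr(D = 1 \mid G, E) = \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E

\mathrm{OR}_{\text{per allele} \mid E = 0} = e^{\beta_G},
\qquad
\mathrm{OR}_{\text{per allele} \mid E = 1} = e^{\beta_G + \beta_{GE}},
\qquad
\mathrm{OR}_{GE} = \frac{\mathrm{OR}_{\text{per allele} \mid E = 1}}{\mathrm{OR}_{\text{per allele} \mid E = 0}} = e^{\beta_{GE}}

H_0 : \beta_{GE} = 0 \quad \text{(no departure from additivity on the logit scale)}
```

Under this model, an interaction OR such as those quoted later in this commentary is the factor by which exposure multiplies the per-allele OR.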

Table 1 Established risk factors assessed in GxE interaction studies

GxE interactions between previously reported SNPs and established risk factors for BC

The first large GxE study of this type (that is, one with at least 5,000 cases and 5,000 controls) was carried out within the Million Women Study [3]. In this analysis of 7,610 cases and 10,196 controls, which investigated potential GxE interactions between 12 SNPs and 10 established risk factors for BC, no GxE interaction was significant after adjustment for multiple testing. The most significant was between CASP8-rs1045485 and alcohol consumption (unadjusted P = 0.003). Since the publication of this report, there have been four further analyses of this type (Table 2), two from the Breast Cancer Association Consortium (BCAC) [4],[5] and two from the Breast and Prostate Cancer Cohort Consortium (BPC3) [6],[7]. Only one of these, the largest (23 SNPs in 34,793 cases and 41,099 controls) [5], reported statistically significant GxE interactions: between LSP1-rs3817198 and parity (number of live births), between CASP8-rs1045485 and alcohol consumption (replicating the most significant finding in the Million Women Study [3]) and between 1p11.2-rs11249433 and ever being parous. However, none of these interactions was replicated in the largest BPC3 study (39 SNPs in 16,285 BC cases and 19,376 controls [7]). A meta-analysis of the BCAC and BPC3 data suggested a possible interaction between SLC4A7-rs4973768 and smoking status, but replication of this result has not yet been attempted.
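
To make the multiple-testing point concrete, the sketch below assumes a Bonferroni correction over all 12 × 10 = 120 SNP-by-risk-factor pairs, each tested once at a family-wise alpha of 0.05. This is our assumption for illustration; reference [3] may have used a different correction or number of tests.

```python
# Number of SNP-by-risk-factor tests in the Million Women Study analysis [3]
n_snps = 12
n_risk_factors = 10
n_tests = n_snps * n_risk_factors  # assumes one test per pair: 120 tests

# Assumption: Bonferroni correction at a family-wise alpha of 0.05
alpha = 0.05
threshold = alpha / n_tests  # per-test significance threshold

p_casp8_alcohol = 0.003  # most significant unadjusted P value reported in [3]
adjusted_p = min(1.0, p_casp8_alcohol * n_tests)  # Bonferroni-adjusted P

print(f"per-test threshold: {threshold:.2e}")
print(f"Bonferroni-adjusted P: {adjusted_p:.2f}")
print("significant after correction:", p_casp8_alcohol < threshold)
```

Under these assumptions the per-test threshold is about 4 × 10^-4, so an unadjusted P of 0.003 falls well short of significance.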

Table 2 Details of GxE interaction studies comprising at least 5,000 cases and 5,000 controls

The Shanghai Breast Cancer Genetics Study tested for interactions using a risk score formed as the weighted sum of genotypes from 10 SNPs [8]. This approach can improve the power to detect a risk factor that interacts with many SNPs when the individual interactions are too weak to be detected one at a time. Although this study found no interactions with the risk score, the approach holds promise for identifying interacting risk factors in limited sample sizes.
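
A weighted genotype score of this kind can be sketched as follows. The weighting used in [8] is not described here; this sketch assumes the common choice of weighting each risk-allele count by that SNP's per-allele log OR. The three SNP ORs and the genotype vector are hypothetical values chosen only for illustration.

```python
import math


def weighted_risk_score(genotypes, log_ors):
    """Weighted genotype risk score: the sum over SNPs of
    (risk-allele count, 0/1/2) x (per-allele log odds ratio)."""
    if len(genotypes) != len(log_ors):
        raise ValueError("one weight is needed per SNP")
    for g in genotypes:
        if g not in (0, 1, 2):
            raise ValueError("genotypes must be risk-allele counts 0, 1 or 2")
    return sum(g * w for g, w in zip(genotypes, log_ors))


# Hypothetical per-allele ORs for three SNPs (illustrative values only)
per_allele_ors = [1.26, 1.10, 1.07]
weights = [math.log(or_) for or_ in per_allele_ors]

woman = [2, 0, 1]  # hypothetical risk-allele counts for one woman
score = weighted_risk_score(woman, weights)
# exp(score) is her OR relative to a woman carrying no risk alleles,
# assuming a multiplicative (log-additive) model across SNPs
print(f"score = {score:.3f}; implied OR = {math.exp(score):.3f}")
```

In a GxE analysis, a score like this would then enter a logistic model together with the environmental factor and a score × environment product term, giving a single 1-df test of interaction in place of one test per SNP. This illustrates the general idea, not the exact model fitted in [8].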

Identification of novel risk SNPs through GxE interactions

SNPs with strong interaction effects but weak marginal effects may only be detectable when gene and environment are analysed together, and are therefore missed by studies that consider SNPs in isolation. Methods have been developed for this purpose that either model and test the main and interaction effects of gene and environment jointly [9], or exploit the power of a case-only design while retaining robustness to possible gene-environment dependence [10],[11]. Recently, several of these methods were applied to 71,527 SNPs with suggestive association with BC [12]. Interactions were identified between two SNPs on 21q22.12 (rs10483028 and rs2242714) and adult body mass index (BMI), and between one SNP in ARID1B (rs12197388) and both age at menarche and parity. The ARID1B SNP rs12197388 was significant only in the joint test of main and interaction effects (its interaction term alone was not significant), whereas the two SNPs on 21q22.12 were detected through their interactions. Further studies using these novel methods may discover more interactions.
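
The case-only design mentioned above rests on a simple estimator: if genotype and exposure are independent in the source population and the disease is rare, the G-E odds ratio among cases estimates the multiplicative interaction OR. The sketch below shows only that basic estimator for a binary carrier status and a binary exposure, with Woolf's standard error. It is not the robust shrinkage methods of [10],[11], which additionally guard against G-E dependence, and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def case_only_interaction(n11, n10, n01, n00, level=0.95):
    """Case-only estimate of the multiplicative GxE interaction OR.

    Counts are among cases only:
      n11 = carrier and exposed,      n10 = carrier and unexposed,
      n01 = non-carrier and exposed,  n00 = non-carrier and unexposed.
    Valid only under G-E independence in the source population and a
    rare disease. Returns (OR, lower CI, upper CI) using Woolf's
    standard error for the log odds ratio.
    """
    if min(n11, n10, n01, n00) <= 0:
        raise ValueError("all cells must be positive")
    log_or = math.log((n11 * n00) / (n10 * n01))
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts from 4,000 cases (illustrative values only)
or_, lo, hi = case_only_interaction(n11=660, n10=1340, n01=600, n00=1400)
print(f"case-only interaction OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because no controls enter the estimate, its variance is smaller than that of the case-control interaction estimate, which is the source of the power gain; the price is bias whenever G and E are associated in the population.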

Using tag-SNPs as proxies for an underlying causal variant

The GxE studies described above have relied on marker SNPs, predominantly identified through GWAS, as proxies for the underlying causal variants. This usually leads to a loss of power to detect interactions [13]. However, if gene and environment are dependent, a marker SNP can show an interaction even when there is no interaction at the causal variant [14]. These ‘spurious interactions’ tend to arise when the causal variant is rare compared with the marker. This situation may be uncommon, but it nevertheless warrants caution when reporting GxE interactions. We recently studied a marker SNP (rs10235235) associated with a reduction in urinary levels of an estrogen metabolite [15]. In 47,346 cases and 47,569 controls in the Collaborative Oncological Gene-environment Study (COGS) [1],[16], this SNP showed (1) an association with BC risk, (2) an association with age at menarche in controls (but not in cases) and (3) an interaction in which age at menarche modified the effect of rs10235235 on BC risk. In this example, therefore, the genetic risk factor (rs10235235) is associated with the environmental risk factor (age at menarche), which could lead to a false-positive interaction [14]. Of the interactions reported to date, gene-environment dependence has been observed between LSP1-rs3817198 and parity and between 21q22.12-rs10483028/rs2242714 and BMI. In cases such as these, an interaction can only be definitively established once all variation in the associated regions has been identified and tested.

Conclusions

Several of the recommendations made by Hunter in 2005 [2] have been pursued: large new prospective studies continue to be supported (for example, the Breakthrough Generations Study, a long-term cohort study focused on BC, recruited 112,049 women between 2003 and 2011 [17]); consortia of case-control (BCAC) and cohort (BPC3) studies have coordinated their efforts to analyse data from more than 70,000 women; and the results of stratified analyses have been conscientiously reported in supplementary tables [5],[7]. However, one of the lessons of the first generation of BC GWAS [18]-[20] was that the per-allele disease odds ratios (ORs) associated with individual tag-SNPs were much smaller than hypothesised (1.07 to 1.26). Results from the first generation of GxE analyses suggest that the same may be true for interactions, with reported interaction ORs ranging from 1.06 to 1.59. If marginal ORs of 1.07 to 1.26 require scans of several thousand cases and several thousand controls, then, depending on the number of GxE interactions being tested, only GxE studies that include tens of thousands of cases and controls will have the power required to detect interactions. It is hardly a coincidence that the first study to report statistically significant GxE interactions was the first of this order of magnitude [5]. Of the three significant interactions reported by Nickels and colleagues [5], only that between CASP8-rs1045485 and alcohol consumption has been replicated, having also been the most significant finding in the Million Women Study [3]. It is currently too soon to tell whether GxE interactions will shed light on disease processes and improve the accuracy of epidemiological risk models. Before we can make this assessment, we will need to replicate or refute the reported interactions, identify the causal variants that underlie tag-SNP associations and validate the next generation of epidemiological risk models.
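
The dependence of sample size on effect size can be illustrated with the standard normal-approximation formula for comparing two proportions, applied to risk-allele frequencies in cases and controls (two alleles per person). The sketch assumes a hypothetical control risk-allele frequency of 0.3, a genome-wide significance level of 5 × 10^-8 and 80% power. It ignores covariates, genotype-model misspecification and imperfect LD between tag and causal SNPs, so it gives rough orders of magnitude only.

```python
import math
from statistics import NormalDist


def cases_needed(per_allele_or, control_raf, alpha=5e-8, power=0.8):
    """Approximate number of cases (with an equal number of controls)
    for a two-sided allelic test of a per-allele odds ratio.

    Uses the normal-approximation sample-size formula for two
    proportions, counting alleles (two per person).
    """
    p0 = control_raf
    odds1 = per_allele_or * p0 / (1 - p0)  # risk-allele odds in cases
    p1 = odds1 / (1 + odds1)               # risk-allele frequency in cases
    p_bar = (p0 + p1) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    numer = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
             + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numer / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)  # convert alleles to people


for or_ in (1.26, 1.07):
    n = cases_needed(or_, control_raf=0.3)
    print(f"OR {or_}: about {n:,} cases and as many controls")
```

Under these assumptions, moving from the upper to the lower end of the GWAS OR range raises the requirement by roughly an order of magnitude. Interaction terms are generally estimated less precisely than main effects of the same magnitude, so figures like these are best read as a lower bound for GxE scans.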

Authors’ contributions

OF and FD wrote this commentary jointly. Both authors approved the final version.

Authors’ information

OF is a group leader in genetic epidemiology at the Breakthrough Breast Cancer Research Centre. FD is professor of statistical genetics at the London School of Hygiene and Tropical Medicine.