Introduction

Breast cancer is the most common malignancy in women, excluding nonmelanoma skin cancer, and the second leading cause of cancer death in women after lung cancer [1]. A family history of breast cancer is a significant risk factor for the development of the disease; at least 10 to 15% of all breast cancer cases may be due to inheritance of a single gene mutation or multiple genetic variants [2–4]. The identification of the hereditary breast and ovarian cancer genes BRCA1 and BRCA2, which are involved in the DNA damage response, specifically double-strand break repair by homologous recombination [5, 6], has greatly increased our understanding of the molecular pathways important in breast cancer susceptibility. Additionally, clinical studies have led to the development of effective screening and prevention strategies in women who carry BRCA1 and BRCA2 mutations [7], some of which, such as risk-reducing oophorectomy, confer a survival benefit [8]. However, only 20 to 30% of familial breast cancers are due to BRCA1 or BRCA2 mutations [9], so additional genetic variants, such as common low penetrance alleles, must contribute to breast cancer susceptibility. Early identification of individuals at increased risk of breast cancer due to these low-risk variants may lead to enhanced screening and prevention strategies and potentially improved overall survival for this group of patients. This review summarizes the current understanding of common low penetrance susceptibility alleles that contribute to breast cancer risk.

Classification of breast cancer susceptibility genes

With some exceptions, there is an inverse relationship between the risk conferred by a variant in a breast cancer susceptibility gene and the frequency of the variant in the population [4]. Variants can generally be classified as rare high-risk mutations, rare moderate-risk mutations, and common low-risk variants. Most variants within a given gene fall into the same one of these categories, although exceptions are increasingly recognized; for example, the moderate penetrance mutation BRCA1 p.Arg1699Gln [10]. Deleterious mutations in the BRCA1 and BRCA2 genes occur in approximately 1:400 to 1:800 women in an unselected population [11] and confer a high (40 to 80%) lifetime risk of breast cancer [12]. Deleterious mutations in genes such as TP53 and PTEN also confer high lifetime risks of breast cancer, but occur even more rarely in the population [13–16]. A number of moderate penetrance mutations have been identified in genes such as ATM, BRIP1, CHEK2 and PALB2, among others; these confer a twofold to fourfold increased risk of breast cancer, although the risk appears higher in the context of a family history [17–20]. Altogether, high and moderate penetrance mutations in these genes probably account for just over 30% of familial breast cancer cases.

A proportion of the remaining familial breast cancer risk is explained by common low penetrance alleles, the majority of which studied to date are single nucleotide polymorphisms (SNPs). Carriage of the risk allele at these loci typically confers a 1.04-fold to 1.40-fold increase in breast cancer risk or, for protective alleles, a relative risk of 0.75 to 0.95. These alleles often have population frequencies of 10 to 50% and may act in a combinatorial, polygenic manner within an individual to affect breast cancer risk. These interactions are complex; for alleles with a relative risk for breast cancer of 1.5 and a population frequency of 30%, it has been estimated that 33 to 40 such risk alleles would be needed to explain a threefold increased risk of breast cancer [21]. Whereas genes with mainly high penetrance mutations were identified by linkage analysis and positional cloning, and genes with mostly moderate penetrance mutations by candidate gene approaches, identifying the large number of important common low-risk breast cancer variants has required large-scale unbiased discovery approaches. Genome-wide association studies (GWAS) have been critical in the identification of these common low-risk variants.

Evolution in methods to identify common low-risk breast cancer susceptibility alleles

Candidate gene approaches to identify common breast cancer risk alleles

Beginning in the 1990s, multiple case–control studies were conducted to identify common low penetrance variants associated with a risk of breast cancer. These initial studies by necessity employed a candidate gene approach. Most of the loci studied are biallelic, giving rise to three genotypes: the common allele homozygote, the heterozygote and the minor allele homozygote [22]. In many candidate gene studies, the frequencies of the three genotypes at a given candidate locus are assayed in population-based breast cancer cases and matched controls, and, assuming a dominant genetic model, the relative risk of breast cancer for individuals with the heterozygote or minor allele homozygote genotypes is estimated (as an odds ratio) using the common allele homozygotes as the baseline. Unfortunately, many of these studies yielded either false positive or nonsignificant associations with breast cancer risk, probably because the majority of the early studies were underpowered [22]. Dunning and colleagues [22] reviewed 46 early candidate gene association studies in breast cancer. The median number of cases and controls was only 319, and only 10 of the 46 studies had >90% power to detect a 2.5-fold increase in risk for a SNP with a minor allele frequency of 0.2. Furthermore, the three loci that remained potentially significant in their meta-analysis were considered likely to be false positives, because their P values (0.02 to 0.002) exceeded 0.05 after correction for multiple comparisons.
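The dominant-model calculation described above can be sketched as follows: carriers of at least one minor allele are compared with common allele homozygotes in a 2 × 2 table, giving an odds ratio with a Woolf (log-scale) 95% confidence interval, and a Bonferroni correction shows how a nominal P value can exceed 0.05 once multiple comparisons are considered. The genotype counts and the number of tests are hypothetical (the counts were chosen only to match the median study size of 319 cases and 319 controls), and Bonferroni is one of several possible corrections, not necessarily the one applied by Dunning and colleagues.

```python
import math


def dominant_model_or(case_counts, control_counts, z=1.96):
    """Odds ratio for carriers (heterozygotes + minor-allele homozygotes)
    versus common-allele homozygotes, with a Woolf 95% confidence interval.

    Each argument is a tuple of genotype counts:
    (common homozygote, heterozygote, minor homozygote).
    """
    carriers_cases = case_counts[1] + case_counts[2]
    baseline_cases = case_counts[0]
    carriers_controls = control_counts[1] + control_counts[2]
    baseline_controls = control_counts[0]

    odds_ratio = (carriers_cases * baseline_controls) / (baseline_cases * carriers_controls)
    # Standard error of log(OR) from the four cells of the 2 x 2 table
    se_log_or = math.sqrt(
        1 / carriers_cases + 1 / baseline_cases + 1 / carriers_controls + 1 / baseline_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical genotype counts for 319 cases and 319 controls
or_, lower, upper = dominant_model_or((150, 130, 39), (170, 120, 29))
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# Hypothetical number of tests; a nominal P of 0.002 is no longer below 0.05
print(bonferroni(0.002, 30))
```

With these made-up counts the point estimate suggests a modest increase in risk, but the confidence interval still includes 1, which is the kind of result an underpowered study produces.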

Therefore, to identify novel low penetrance breast cancer risk loci, large consortia were formed with the goal of conducting adequately powered case–control genetic association studies. The Breast Cancer Association Consortium (BCAC) was established with this goal and initially included over 20 collaborative groups from European, Australian, American and Asian centers. The BCAC initially employed a candidate gene approach and pooled data from 18 mostly European studies to evaluate 16 SNPs previously reported to be significantly associated with breast cancer risk [23]. By increasing the number of cases and controls genotyped for each SNP approximately threefold over any single study, the BCAC found that only five of the 16 SNPs (31%) had borderline statistical significance in the larger pooled population [23]. Genotyping of additional individuals within the BCAC study population showed that the associations of only two SNPs (12%) retained statistical significance: rs1045485 (2q33.1, CASP8 p.Asp302His) and rs1982073 (19q13.2, TGFB1 p.Pro10Leu) (Table 1) [24]. These studies confirmed that large numbers of cases and controls are needed to identify replicable associations between SNPs and breast cancer risk.
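Why pooling helps can be illustrated with a standard normal-approximation power calculation for an allelic case–control test. The sketch below is an illustration under stated assumptions, not the BCAC's own calculation: it assumes a multiplicative (per-allele) model, so that the allelic odds ratio equals the per-allele odds ratio, equal numbers of cases and controls, and a two-sided α of 0.05 by default. The allele frequency and odds ratio in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_raf, per_allele_or, alpha=0.05):
    """Approximate power of a two-sided test comparing risk-allele
    frequencies between cases and controls (normal approximation).

    Assumes a multiplicative (per-allele) model, so that the allelic odds
    ratio equals per_allele_or; the opposite rejection tail is ignored.
    """
    normal = NormalDist()
    p0 = control_raf
    # Risk-allele frequency expected in cases under the allelic odds ratio
    p1 = per_allele_or * p0 / (1 - p0 + per_allele_or * p0)
    n1, n0 = 2 * n_cases, 2 * n_controls  # each person contributes two alleles
    p_bar = (n1 * p1 + n0 * p0) / (n1 + n0)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n0))
    se_alt = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z_crit = normal.inv_cdf(1 - alpha / 2)
    return normal.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


# Hypothetical scenario: a single study versus a threefold larger pooled sample
for n in (319, 3 * 319):
    print(n, round(allelic_test_power(n, n, 0.2, 1.2), 2))
```

Setting `alpha` to a stricter threshold, such as the conventional genome-wide level of 5e-8, lowers the power for the same sample size, which is why far larger samples are needed once many SNPs are tested.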

Table 1 Major genome-wide association and candidate gene studies to identify SNPs associated with breast cancer risk in populations of European and Asian ancestry

The multistage genome-wide association study approach

The development of platforms to assay the genotypes of hundreds of thousands of SNPs simultaneously allowed identification of risk alleles without a priori knowledge of the location of the risk locus. Because of linkage disequilibrium, several hundred thousand SNPs (tag SNPs) can be used to represent the several million SNPs in the human genome, allowing GWAS to associate SNP genotypes with disease risk without relying on a candidate gene approach. Because genotyping enough cases and controls on a genome-wide array to achieve sufficient power was impractical and costly, the initial GWAS used multistage approaches [25–27] (Table 1). Easton and colleagues initially performed genome-wide genotyping of 227,876 SNPs in 390 breast cancer cases with a positive family history and 364 controls from the United Kingdom, followed by genotyping of the 12,711 most significant SNPs in 3,990 unselected cases and 3,916 controls, also from the United Kingdom [25]. From this set, 30 SNPs were selected and genotyped in 21,860 cases and 22,578 controls from the BCAC. Hunter and colleagues performed genome-wide genotyping of 528,173 SNPs in 1,145 cases and 1,142 controls, followed by genotyping of eight SNPs in an additional 1,776 cases and 2,072 controls from the Cancer Genetic Markers of Susceptibility population from the United States [26]. In that study, cases were non-Hispanic Caucasian women with mostly postmenopausal breast cancer, unselected for family history. Finally, Stacey and colleagues genotyped 311,524 SNPs in 2,183 cases and 12,877 controls from Iceland, again unselected for a family history of breast cancer [27]. The nine top SNPs from this study were then genotyped in an independent replication set of 3,898 cases and 6,921 controls from Iceland, Spain, Sweden and the Netherlands. In addition, Stacey and colleagues genotyped 10 candidate SNPs at 5p12-11 in 5,028 cases and 32,090 controls of European descent [28].
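The tagging idea rests on the r² measure of linkage disequilibrium between two biallelic SNPs. The sketch below computes r² from a haplotype frequency and the two allele frequencies, then picks tag SNPs greedily so that every SNP has r² at or above a chosen threshold with at least one tag. The greedy strategy, the 0.8 threshold and the SNP names are illustrative assumptions; this is not the procedure used to design any particular genotyping array.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def greedy_tag_snps(snps, r2, threshold=0.8):
    """Choose tag SNPs so that each SNP has r^2 >= threshold with some tag.

    r2 maps frozenset({snp1, snp2}) to the pairwise r^2; missing pairs count as 0.
    """
    def captures(tag, snp):
        return tag == snp or r2.get(frozenset((tag, snp)), 0.0) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that covers the most still-untagged SNPs (ties: first listed)
        best = max(snps, key=lambda t: sum(captures(t, s) for s in untagged))
        tags.append(best)
        untagged = {s for s in untagged if not captures(best, s)}
    return tags


# Hypothetical pairwise r^2 values for four SNPs
pairs = {
    frozenset(("snp1", "snp2")): 0.90,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp3", "snp4")): 0.10,
}
print(greedy_tag_snps(["snp1", "snp2", "snp3", "snp4"], pairs))  # ['snp2', 'snp4']
```

Here two tags stand in for four SNPs; at genome scale the same principle lets a few hundred thousand SNPs represent several million.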

In total, these three initial GWAS identified significant associations with breast cancer risk for nine SNPs in seven genomic regions. Three SNPs within the FGFR2 gene were identified – rs2981582 (10q26.13, FGFR2 c.109 + 906 T > C), rs2420946 (10q26.13, FGFR2 c.109 + 1899A > G), and rs1219648 (10q26.13, FGFR2 c.109 + 7033 T > C) [25, 26] – and the association of this locus with breast cancer risk has been replicated in a number of other studies and retained statistical significance in meta-analyses (see references in Table 3). The other SNPs identified were rs13387042 (2q35, intergenic), rs10941679 (5p12, intergenic), rs889312 (5q11.2, intergenic, 5′ to MAP3K1), rs13281615 (8q24.21, intergenic), rs3817198 (11p15.5, LSP1 c.*13 + 200 T > C), and rs3803662 (16q12.1, intergenic, 5′ to TOX3) [25, 27, 28] (Table 1), and their associations with breast cancer risk have also been well replicated (see references in Table 3). Of note, the SNPs identified by Stacey and colleagues showed association only in estrogen receptor (ER)-positive breast cancer; however, the number of ER-negative cases included was small, and the other studies did not stratify by ER status [27, 28].

Subsequent genome-wide association studies and candidate gene studies in European and Asian populations

Several other multistage GWAS and candidate gene studies were carried out between 2008 and 2012 in populations of mainly European or Asian ancestry [29–34]. These studies identified 18 SNPs showing significant associations with a risk of breast cancer. Zheng and colleagues identified rs2046210 (6q25, intergenic, 6 kb 3′ to CCDC170) as associated with a risk of both ER-positive and ER-negative breast cancer in Chinese individuals; this association was replicated in a population of American women of European ancestry [29]. Three GWAS and two candidate gene studies continued the analysis of populations of European ancestry [30–34]. These studies identified associations with breast cancer risk for 14 SNPs: rs11249433 (1p11.2, intergenic), rs4973768 (3p24.1, SLC4A7 c.*2242G > A), rs3757318 (6q25, CCDC170 c.1294-129G > A), rs1011970 (9p21.3, CDKN2B antisense RNA), rs865686 (9q31.2, intergenic), rs2380205 (10p15.1, intergenic, 2.6 kb 5′ to GDI2), rs10995190 (10q21.2, ZNF365 c.981 + 59126G > A), rs704010 (10q22.3, ZMIZ1 c.-337 + 12121 T > C), rs614367 (11q13.3, intergenic), rs10771399 (12p11.22, intergenic, 29 kb 5′ to PTHLH), rs1292011 (12q24.21, intergenic), rs999737 (14q24.1, RAD51B c.1037-43041C > T), rs6504950 (17q22, STXBP4 c.-156-6504G > A), and rs2823093 (21q21, intergenic, 5′ to NRIP1) (Table 1).

The iCOGS array and the advent of the large-scale candidate gene association study

The most recently identified set of SNPs associated with breast cancer risk in Europeans and Asians comes from an unprecedented genotyping effort by the Collaborative Oncological Gene–environment Study (COGS) [35]. A total of 211,155 SNPs were rationally chosen for a custom array (iCOGS) and tested in unselected breast cancer cases and controls from the BCAC. The SNPs on the iCOGS array were nominated by consortium members and comprised SNPs identified by meta-analysis of prior GWAS; SNPs for fine mapping of known susceptibility loci; candidate functional variants and moderate penetrance variants; and SNPs related to other traits, including other cancers and medical conditions. In all, data were obtained for 199,961 SNPs in 52,675 cases and 49,436 controls from the BCAC study groups, which now include 41 European populations, nine Asian populations, and two African-American populations.

In the first phase of the main COGS breast cancer study, associations of previously established breast cancer loci with breast cancer risk were studied [35]. For the FGFR2 genomic region, SNP rs2981579 (10q26.13, FGFR2 c.110-12117 T > C) was chosen (this SNP is in linkage disequilibrium with the three SNPs described above, with r² < 0.6) and showed highly significant associations with breast cancer risk. Of the remaining 26 SNPs presented in Table 1, 22 showed strong associations with breast cancer risk. The SNPs rs2380205 (10p15.1, intergenic, 2.6 kb 5′ to GDI2) and rs1045485 (2q33.1, CASP8 p.Asp302His) showed a trend towards association, and the SNPs rs2284378 (20q11, RALY c.-93 + 6158 T > C) [63] and rs1982073 (19q13.2, TGFB1 p.Pro10Leu) [23, 24, 44] were not tested on the iCOGS array. Finally, three SNPs initially found to be associated with ER-negative breast cancer – rs10069690 (5p15.33, TERT c.1951-205G > A), rs17530068 (6q14.1, intergenic) and rs8170 (19p13.1, BABAM1 c.837G > A) (Table 2) [57, 63, 92] – were tested as established loci in this phase of the study. SNPs rs10069690 and rs17530068 showed significant associations, and rs8170 showed a trend towards association with breast cancer risk. Michailidou and colleagues therefore considered there to be 27 established low penetrance breast cancer risk loci before the discovery phase of the COGS study (the 26 SNPs shown in bold in Table 3 that were studied in COGS, plus rs2284378) [35].

Table 2 Major genome-wide association and candidate gene studies to identify SNPs associated with breast cancer risk in other populations
Table 3 Extended information on SNPs identified in the major GWAS for breast cancer risk or SNPs used to generate published polygenic risk scores

To identify novel SNPs associated with breast cancer risk, 29,807 SNPs identified by analysis of nine prior GWAS and lying outside previously established breast cancer loci were selected, and data were collected for 45,290 cases and 41,880 controls [35]. From this analysis, 41 SNPs were newly identified as significantly associated with breast cancer risk in unselected cases of mainly European and Asian descent [35]. In all, the authors estimated that ~14% of the familial breast cancer risk in people of European descent is explained by these 67 established loci, with an additional ~14% explained by SNPs that showed associations in the COGS study but did not reach statistical significance.
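Estimates of the proportion of familial risk explained are usually obtained by summing, on a log scale, the familial relative risk attributable to each locus and dividing by the log of the overall familial relative risk. The sketch below follows that logic under a multiplicative model and Hardy–Weinberg equilibrium, using the per-locus familial relative risk to a first-degree relative (exact for parent–offspring pairs), (p·r² + q)/(p·r + q)², for risk-allele frequency p = 1 − q and per-allele odds ratio r. The overall familial relative risk of 2 is an assumption (a commonly quoted value for first-degree relatives of breast cancer cases), loci are assumed to be independent, and the allele frequencies and odds ratios in the usage line are hypothetical rather than the COGS estimates.

```python
import math


def locus_familial_rr(p, r):
    """Familial relative risk to a first-degree relative (parent-offspring)
    contributed by one SNP with risk-allele frequency p and per-allele OR r,
    assuming a multiplicative model and Hardy-Weinberg equilibrium."""
    q = 1 - p
    return (p * r * r + q) / (p * r + q) ** 2


def fraction_familial_risk_explained(loci, overall_familial_rr=2.0):
    """Share of log familial risk explained by independent loci.

    loci: iterable of (risk-allele frequency, per-allele OR) pairs.
    overall_familial_rr: assumed total familial relative risk (lambda_0).
    """
    explained = sum(math.log(locus_familial_rr(p, r)) for p, r in loci)
    return explained / math.log(overall_familial_rr)


# Hypothetical set of loci: (risk-allele frequency, per-allele OR)
example_loci = [(0.40, 1.26), (0.30, 1.10), (0.25, 1.07), (0.50, 0.92)]
print(f"{fraction_familial_risk_explained(example_loci):.1%}")
```

Each locus with an odds ratio in the 1.04 to 1.40 range contributes only a small term, so many loci are needed before the explained fraction becomes appreciable. Protective alleles (odds ratio below 1) also add to the familial risk explained, because the term depends on the variance of risk rather than its direction.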

Some of the SNPs newly identified in the COGS study as associated with breast cancer risk lie in genes with a known or plausible role in cancer susceptibility. Seven of the SNPs lie within or near DNA repair or cell cycle genes, namely: rs11571833 (13q13.1, BRCA2 p.Lys3326Ter), rs11552449 (1p13.2, DCLRE1B p.His61Tyr), rs9790517 (4q24, TET2 c.-193 + 17535C > T), rs2046210 (6q25, intergenic, 6 kb 3′ to CCDC170), rs2236007 (14q13.3, PAX9 c.631 + 41G > A), rs2588809 (14q24.1, RAD51B c.757-98173 T > C), and rs941764 (14q32.11, CCDC88C c.271-15014 T > C). In addition, three SNPs lie in the immediate vicinity of genes previously associated with the risk of other malignancies: rs12493607 (3p24.1, TGFBR2 c.95-3300G > C), rs7072776 (10p12.31, intergenic, 382 bp 3′ of MLLT10), and rs6001930 (22q13.1, MKL1 c.-59-16944A > G). Further studies are needed to determine whether these SNPs, or other variants in linkage disequilibrium with them, are functionally causative, and whether they influence the expression or function of the nearest gene or of other surrounding genes.

From lumping to splitting – genome-wide association studies and candidate gene studies identify common low penetrance risk alleles in subgroups of patients

Common low penetrance loci in estrogen receptor-positive versus estrogen receptor-negative breast cancer

Breast cancers that are negative for ERs (ER-negative), including triple-negative breast cancers (TNBC), which additionally lack progesterone receptor and Her2 expression, have a significantly poorer prognosis and arise in patients with different clinical characteristics than ER-positive breast cancers [93, 94]. The set of common variants contributing to breast cancer risk is therefore likely to differ among patients with ER-positive breast cancer, ER-negative breast cancer and TNBC. Delineating loci specific to ER-negative/TNBC versus ER-positive breast cancer could refine risk models and help identify women at higher risk of TNBC, the subtype with the poorer prognosis. In the GWAS by Stacey and colleagues, the cases were stratified by ER status, and the three identified SNPs showed stronger associations with ER-positive than with ER-negative breast cancer [27, 28].

The majority of the other early GWAS, however, did not stratify by ER status because the number of patients with ER-negative breast cancer in these studies was small. Adequately powered, focused studies were therefore needed to identify SNPs associated with risk of this breast cancer subtype. A number of candidate SNP studies have investigated some of the known common risk variants specifically in patients with ER-positive breast cancer, ER-negative breast cancer or TNBC [36, 44, 66, 75, 84, 92, 95, 96]. For some SNPs at established breast cancer loci, the associations were stronger for, or specific to, TNBC – that is, rs13387042 (2q35, intergenic), rs889312 (5q11.2, intergenic, 5′ to MAP3K1), rs3817198 (11p15.5, LSP1 c.*13 + 200 T > C), rs999737 (14q24.1, RAD51B c.1037-43041C > T), rs3803662 (16q12.1, intergenic, 5′ to TOX3), and rs8170 (19p13.1, BABAM1 c.837G > A) [44, 66, 92] – or ER-negative breast cancer – that is, rs11249433 (1p11.2, intergenic) [36]. For others, the associations were stronger for, or specific to, ER-positive breast cancer – that is, rs865686 (9q31.2, intergenic) [75] and rs614367 (11q13.3, intergenic) [84].

Other studies have used the fact that certain patient subgroups have a higher incidence of TNBC, for example BRCA1 mutation carriers and African-Americans, to study common risk variants in this subgroup. Antoniou and colleagues performed a GWAS in BRCA1 carriers; in this study, rs8170 (19p13.1, BABAM1 c.837G > A) was then genotyped in ER-negative and TNBC cases and found to be significantly associated with the risk of breast cancer in these subgroups (Table 2) [51]. In the study by Haiman and colleagues, a GWAS was first performed in patients of both African and European ancestry unselected for hormone receptor status; the identified SNP rs10069690 (5p15.33, TERT c.1951-205G > A) was then genotyped in a set of TNBC patients and found to be significantly associated with this subgroup (Table 2) [57]. Finally, Siddiq and colleagues performed a meta-analysis of prior GWAS and found that rs17530068 (6q14.1, intergenic) and rs2284378 (20q11, RALY c.-93 + 6158 T > C) were associated with a risk of ER-negative breast cancer (Table 2) [63]. The COGS group has also used the large-scale candidate SNP approach specifically in patients with ER-negative breast cancer and TNBC [39]. In this study, 13,276 SNPs on the iCOGS array were assayed in 6,514 ER-negative cases and 41,455 controls. The study identified four new SNPs – rs4245739 (1q32.1, MDM4 c.*32C > A), rs6678914 (1q32.1, LGR6 c.213-7375G > A), rs12710696 (2p24.1, intergenic), and rs11075995 (16q12.2, FTO c.-138 + 11162A > T) – that show specific associations with ER-negative breast cancer (Table 2) [39]. In addition, this study group performed a meta-analysis of 10,707 ER-negative cases and 76,649 controls and found that 18 of the 26 established breast cancer loci had associations with P <0.05 with ER-negative breast cancer, as did 25 of the 41 new loci from the main COGS breast cancer study (Table 3) [39].
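Pooling several case–control data sets, as in the ER-negative meta-analysis above, is commonly done with a fixed-effect, inverse-variance weighted average of log odds ratios. The sketch below shows that calculation; it is not stated here whether the COGS group used exactly this estimator, and the study-level estimates in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(estimates, z=1.959963984540054):
    """Inverse-variance fixed-effect meta-analysis.

    estimates: list of (log odds ratio, standard error) pairs, one per study.
    Returns (pooled OR, lower 95% limit, upper 95% limit, two-sided P value).
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    # Two-sided P value from the lower tail, which avoids 1 - cdf rounding to 0
    p_value = 2 * NormalDist().cdf(-abs(pooled / pooled_se))
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
        p_value,
    )


# Hypothetical per-study log ORs and standard errors for one SNP
studies = [(math.log(1.12), 0.06), (math.log(1.08), 0.05), (math.log(1.15), 0.09)]
print(fixed_effect_meta(studies))
```

Because the pooled standard error shrinks as studies are added, associations too weak to reach significance in any single study can become significant in the combined analysis. A random-effects model would be preferable if the study-specific estimates were heterogeneous.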

Common low penetrance loci in ethnic subgroups

There is much interest in using SNP genotyping to stratify breast cancer risk and inform clinical management (as discussed below). However, one recognized limitation is the extent to which the results of the major GWAS performed in European populations can be generalized across ethnicities. A number of candidate gene studies and GWAS have now been carried out in patients of African and East Asian descent; some of the best-studied common SNPs have shown odds ratios (ORs) consistent with those seen in European studies (Table 3) [56, 58, 81, 97–99]. However, many of the other SNPs associated with risk in Europeans have not been studied in African, East Asian or other populations. More recently, Zheng and colleagues performed a GWAS in women of Asian ancestry with breast cancer and combined the data with genotyping of Asian individuals on the iCOGS array. They found that 31 of the 67 established breast cancer loci are also associated with breast cancer risk in East Asians (Tables 2 and 3) [37]. In addition, Long and colleagues genotyped the 67 loci in African-American cases and controls and found that seven of the 67 SNPs showed significant associations, and three showed borderline associations, with breast cancer risk in African-Americans with an average of 83% African ancestry (Tables 2 and 3) [42].

Several SNPs have been associated with breast cancer solely in studies of women of African or Asian descent, but many need to be replicated in independent populations of the same ancestry. The association of one Asian-specific SNP, rs9485372 (6q25.1, TAB2 c.6 + 68962G > A), which is not in linkage disequilibrium with the established SNP in that genomic region, was replicated in the study described above (Table 3) [37]; however, other associations have not yet been replicated. The 6q25.1 chromosomal region illustrates an important caveat when attempting to generalize associations of specific loci across ethnicities. Associations with breast cancer risk were found independently within 6q25.1 at rs2046210 (intergenic, 6 kb 3′ to CCDC170) in an East Asian population, with an OR of 1.29 (95% confidence interval (CI) = 1.21 to 1.37) [29], and at rs3757318 (CCDC170 c.1294-129G > A) in European populations, with an OR of 1.30 (95% CI = 1.17 to 1.46) [32]. The association at rs2046210 has been replicated in European populations [67] but not in African populations, and other SNPs have not yet been studied. However, subsequent studies have found associations of different SNPs within this region with breast cancer risk in African populations. SNP rs9397435 (intergenic, 8.9 kb 3′ of CCDC170) has an OR of 1.34 (95% CI = 1.10 to 1.63) in Africans [99], and rs2046211 (intergenic, 6 kb 3′ of CCDC170) has a protective effect, with an OR of 0.80 (95% CI = 0.67 to 0.95) (note that none of these SNPs are in linkage disequilibrium with one another, although all lie within a 35 kb region near CCDC170) [68].
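Reported odds ratios such as those above are easier to compare once the standard error of the log OR, the z statistic and the P value are recovered from the published 95% CI. The sketch below assumes each interval was computed symmetrically on the log scale with a normal approximation, which is the usual convention but may not hold for every study; rounding of the published values also makes the results approximate. The input values are the ORs and CIs quoted in the paragraph above.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def summarize_reported_or(odds_ratio, lower, upper):
    """Recover the log-scale standard error, z statistic and two-sided
    P value from an odds ratio and its 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return se, z, p_value


# ORs and 95% CIs quoted in the text for the 6q25.1 region
reported = {
    "rs2046210, East Asian": (1.29, 1.21, 1.37),
    "rs3757318, European": (1.30, 1.17, 1.46),
    "rs9397435, African": (1.34, 1.10, 1.63),
    "rs2046211, African": (0.80, 0.67, 0.95),
}
for label, values in reported.items():
    se, z, p = summarize_reported_or(*values)
    print(f"{label}: SE(log OR) = {se:.3f}, z = {z:.2f}, P = {p:.2g}")
```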

Common risk alleles in other populations

A small number of studies have begun to look for common breast cancer risk variants in other populations. Orr and colleagues performed a GWAS in European men with breast cancer and found that rs1314913 (14q24.1, RAD51B c.757-59007C > T) and rs3803662 (16q12.1, intergenic) were significantly associated with cancer risk [70]. Rafiq and colleagues performed a GWAS in women diagnosed with breast cancer before the age of 40 and found an association of the SNP rs421379 (5q14, intergenic) with breast cancer prognosis [100]. However, both of these studies included small numbers of cases (~500) and therefore require independent validation. Gold and colleagues studied women of Ashkenazi Jewish descent and found an association between the SNP rs6569479 (6q22.33, RNF146 c.3-1173 T > C) and breast cancer risk [64]. However, SNP rs2180341 (RNF146 c.-108-746G > A), which is in linkage disequilibrium with rs6569479, was not significantly associated with breast cancer risk in the BCAC population [57]. A weak association was seen in BRCA1/2 carriers; however, this genomic region was not included in the recent COGS studies [40, 41, 65]. This example illustrates the difficulty of studying SNPs in populations with limited numbers of individuals and underscores the need for large collaborative consortia to pool resources to study common variants in such populations.

Common low penetrance risk alleles modify breast cancer risk in BRCA1 and BRCA2 patients

Individuals with BRCA1 or BRCA2 germline mutations have a 60 to 80% and 40 to 60% risk of developing breast cancer in their lifetime, respectively [12, 101]. Multiple environmental or genetic factors may be responsible for the variable penetrance seen within individuals and families; one possible modifier of risk is the presence of common low-risk breast cancer susceptibility variants. Study of the genetic variation influencing the risk of breast cancer and ovarian cancer in BRCA1/2 mutation carriers has been mainly carried out by the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA), an international group of over 40 institutions [102]. Identification of genetic loci associated with a decreased or increased risk of breast cancer in BRCA1/2 carriers may allow the development of personalized risk prediction models. A number of candidate gene studies of the established breast cancer loci have been carried out in female BRCA1/2 mutation carriers as part of the CIMBA effort [38, 45, 54, 60, 77, 80, 103].

In addition, two multistage GWAS had been carried out in BRCA1/2 carriers before COGS (Table 2). Gaudet and colleagues genotyped 529,163 SNPs, first in 809 BRCA2 carriers with breast cancer (cases) and 804 BRCA2 carriers without breast cancer (controls), followed by genotyping of 113 SNPs in 1,263 cases and 1,222 controls [78]. This study found significant associations for three established breast cancer loci: rs2981582 (10q26.13, FGFR2 c.109 + 906 T > C), rs3803662 (16q12.1, intergenic, 5′ to TOX3), and rs16917302 (10q21.2, ZNF365 c.981 + 41642A > C). In addition, the novel SNP rs311499 (intergenic, 1.4 kb 3′ of GMEB2) was significantly associated with breast cancer in BRCA2 carriers. In BRCA1 mutation carriers, Antoniou and colleagues first genotyped 1,190 BRCA1 mutation carriers with breast cancer and 1,193 carriers without breast cancer, and then studied the identified SNPs in unselected cases from both the United Kingdom and the TNBC consortium, identifying rs8170 (19p13.1, BABAM1 c.837G > A) as modifying the risk of breast cancer in these populations [51].

The COGS studies have now clarified the associations of the established breast cancer SNPs with breast cancer risk in BRCA1/2 carriers and identified some novel associations (Table 2). Couch and colleagues [40] and Gaudet and colleagues [41] genotyped 11,705 BRCA1 carriers and 8,211 BRCA2 carriers, respectively, from CIMBA. In analyses comparing BRCA1 mutation carriers with and without breast cancer, associations with breast cancer risk were confirmed for three established loci that had previously been associated with breast cancer risk in BRCA1 carriers, namely: rs2046210 (6q25, intergenic, 6 kb 3′ to CCDC170), rs10771399 (12p11.22, intergenic, 29 kb 5′ to PTHLH) and rs8170 (19p13.1, BABAM1 c.837G > A) [40]. Associations were borderline for rs3803662 (16q12.1, intergenic, 5′ to TOX3) and nonsignificant for rs13387042 (2q35, intergenic). For BRCA2 mutation carriers, associations with breast cancer risk were confirmed for five of the established risk loci, namely: rs4973768 (3p24.1, SLC4A7 c.*2242G > A), rs2420946 (10q26.13, FGFR2 c.109 + 1899A > G), rs16917302 (10q21.2, ZNF365 c.981 + 41642A > C), rs3817198 (11p15.5, LSP1 c.*13 + 200 T > C), and rs3806332 (16q12.1, intergenic, 5′ to TOX3) [41]. Six chromosomal regions that had not previously been significantly associated with breast cancer risk in BRCA2 mutation carriers, probably owing to inadequate power, were found to contain SNPs with significant associations, namely: rs27633 (12p11.22, PTHLH c.-266 + 555A > C), rs1688611 (5q11.2, intergenic, 5′ to MAP3K1), rs10965163 (9p21.3, MTAP c.561C > T), rs4733664 (8q24.21, intergenic), rs13039229 (20q13.33, intergenic, 2.1 kb 3′ of PTK6), and rs2253407 (6q25.1, SYNE1 c.23020-466C > A). Associations with breast cancer risk were nonsignificant for nine of the established breast cancer SNPs.

Importantly, four SNPs were newly found to be associated with breast cancer risk in BRCA1 mutation carriers: rs2290854 (1q32.1, MDM4 c.903 + 20A > G), rs6682208 (1q32.1, intergenic, 3′ to MDM4), rs11196174 (10q25.3, TCF7L2 c.381 + 22730A > G), and rs11196175 (10q25.3, TCF7L2 c.381 + 25248 T > C). In BRCA2 mutation carriers, three novel SNP associations with breast cancer risk were identified: rs184577 (2p22.2, intergenic), rs9348512 (6p24.3, intergenic), and rs619373 (Xq27.1, FGF13 c.50-73946C > T).
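To see how a common modifier allele could translate into absolute risk for a mutation carrier, one can assume proportional hazards: if a reference group has cumulative risk F by a given age, a group with hazard ratio HR has cumulative risk 1 − (1 − F)^HR. The sketch below applies this multiplicatively per copy of the allele. It assumes that the baseline risk applies to carriers with no copies of the modifier allele, ignores competing mortality, and uses a hypothetical per-allele HR of 1.2 rather than any estimate reported by CIMBA; the 60% baseline is taken from the lower end of the BRCA1 lifetime risk range quoted above.

```python
import math


def carrier_risk_with_modifier(baseline_risk, per_allele_hr, n_risk_alleles):
    """Cumulative breast cancer risk for a mutation carrier with 0, 1 or 2
    copies of a modifier allele, assuming proportional hazards and that
    baseline_risk applies to carriers with no copies."""
    cumulative_hazard = -math.log(1 - baseline_risk)
    hazard_ratio = per_allele_hr ** n_risk_alleles  # multiplicative per allele
    return 1 - math.exp(-cumulative_hazard * hazard_ratio)


# Hypothetical: 60% baseline lifetime risk, per-allele HR of 1.2
for copies in (0, 1, 2):
    print(copies, round(carrier_risk_with_modifier(0.60, 1.2, copies), 3))
```

Because the baseline risk in carriers is already high, even a modest hazard ratio shifts absolute risk by several percentage points, which is why modifiers are of interest for personalized risk prediction in this group.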

Fine mapping of genomic regions – towards assigning functionality to common risk variants

The COGS efforts to fine map a number of identified loci pave the way for one of the next crucial steps in the study of common breast cancer risk variants, namely the functional characterization of identified SNPs [59, 85]. A number of SNPs within or near the TERT genomic region have been associated with a risk of a variety of cancers [104], including breast cancer and ovarian cancer. To study the TERT region in breast cancer risk, 110 SNPs on the iCOGS array were genotyped in 46,451 women with breast cancer and 42,599 controls from the BCAC and in 11,709 BRCA1 mutation carriers from CIMBA; 7,435 cases were ER-negative and 27,074 cases were ER-positive [59]. The authors identified three regions within the TERT locus on 5p15.33 and found that the SNP most strongly associated with risk depended on the type of breast cancer. The six SNPs in linkage disequilibrium (r² > 0.6) in the first region, corresponding to the TERT promoter (for example, rs2736107), were associated with overall breast cancer risk. In regions two and three, which correspond to introns 2 to 4, the five SNPs in linkage disequilibrium (r² > 0.6) in region two (for example, rs7734992) and the three SNPs in region three (rs72709458, rs2242652, and rs10069690) were associated with a risk of overall breast cancer, ER-negative breast cancer and cancer in BRCA1 carriers, but showed only a borderline association with ER-positive breast cancer risk. The group further characterized the potential functionality of the SNPs by investigating whether they lay in open chromatin regions using in silico ENCODE data, and whether they affected TERT promoter activity or splicing in in vitro assays.
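Grouping correlated SNPs into regions with an r² > 0.6 criterion, as described for the TERT locus, can be sketched as single-linkage clustering: two SNPs are joined whenever their pairwise r² exceeds the threshold, using a union–find structure. This illustrates the grouping idea only and is not the statistical procedure used in the fine-mapping studies cited above; the SNP names and r² values are hypothetical. Note that single linkage can chain SNPs into one group even when some pairs within the group fall below the threshold.

```python
def ld_groups(snps, r2, threshold=0.6):
    """Single-linkage groups of SNPs connected by pairwise r^2 > threshold.

    r2 maps frozenset({snp1, snp2}) to r^2; unlisted pairs are treated as unlinked.
    """
    parent = {s: s for s in snps}

    def find(s):
        # Follow parent links to the group representative (with path halving)
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for pair, value in r2.items():
        if value > threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for s in snps:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


# Hypothetical r^2 values among five SNPs
r2_values = {
    frozenset(("snpA", "snpB")): 0.92,
    frozenset(("snpB", "snpC")): 0.70,
    frozenset(("snpC", "snpD")): 0.15,
    frozenset(("snpD", "snpE")): 0.65,
}
print(ld_groups(["snpA", "snpB", "snpC", "snpD", "snpE"], r2_values))
# [['snpA', 'snpB', 'snpC'], ['snpD', 'snpE']]
```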

French and colleagues performed fine-mapping and functional studies at the 11q13.3 region using 731 SNPs on the iCOGS array [85]. This study found three independent signals within the 11q13.3 region, which were strongly associated with ER-positive but not with ER-negative breast cancer risk. Through functional studies, French and colleagues also provided evidence that the SNPs within this region influence the risk of ER-positive breast cancer by reducing transcriptional activation of CCND1. This study provides a model for the future studies needed to establish the functional consequences of the extensive list of common low-risk variants now identified from GWAS. It is hoped that functional studies of these SNPs will lead to a better understanding of breast cancer pathogenesis and, eventually, to novel prevention strategies or treatments for breast cancer.

Use of common breast cancer risk alleles in clinical practice

The identification of mutations in high penetrance breast cancer susceptibility genes allows individual risk to be delineated and helps guide clinical recommendations, such as enhanced breast cancer screening (for example, magnetic resonance imaging screening starting at age 25) or chemoprevention with tamoxifen [2]. However, the majority of familial breast cancer patients are BRCA1/2 mutation-negative. In these women, one or more low and/or moderate penetrance genetic variants may be responsible for an increased risk of breast cancer. Currently, genotyping of common breast cancer risk variants is not incorporated into breast cancer risk assessment models.

In clinical practice, breast cancer risk is predicted using models such as the National Cancer Institute’s Breast Cancer Risk Assessment Tool, which is based on the Gail model [105]. This model estimates a woman’s risk of breast cancer from her ethnicity, personal and family history of breast cancer, age at first menstruation, age at first live birth and number of breast biopsies. Because the model has been validated to predict breast cancer risk accurately in Caucasian, African American and Asian American women [106–108], any risk assessment incorporating SNP genotyping must improve upon this well-validated model.

Using computer simulation, Gail demonstrated that adding genotype information from seven SNPs (Table 3, polygenic risk score (PRS) model D) slightly improved the discriminatory accuracy of breast cancer risk prediction, as measured by the area under the receiver operating characteristic curve, but less so than adding mammographic density [109]. Gail predicted that the inclusion of SNP data would change women’s risk categories only minimally [91]. To study the addition of SNP genotyping to breast cancer risk models in human subjects, Wacholder and colleagues genotyped 10 SNPs (eight established SNPs, two others; Table 3, PRS model C) in 5,590 cases and 5,998 controls [90]. The cases were women with breast cancer between the ages of 50 and 79, and 98.5% of patients were of European ancestry. Adding genotype information improved the area under the curve of the Gail model only minimally, from 58.0% to 61.8%. Interestingly, however, 32.5% of patients were reclassified into the highest risk quintile and 20.4% into the lowest risk quintile. To test the hypothesis that additional SNPs may improve the performance of SNP genotyping, Husing and colleagues analyzed 32 SNPs (24 established SNPs, eight others; Table 3, PRS model B) in 6,009 European postmenopausal breast cancer cases and 7,827 controls [89]. The additional SNPs modestly improved the discriminatory accuracy of classic risk models, but again the improvement would probably not be of sufficient clinical benefit to justify genotyping costs.
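The area under the curve used in these studies can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The Python sketch below computes it with that pairwise (Mann–Whitney) formulation, counting ties as one half. The function name and the risk scores are hypothetical and are not drawn from the studies above.

```python
from itertools import product


def auc(case_scores, control_scores):
    """AUC as P(case score > control score), with ties counted as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case with every control.
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks (%), for illustration only
cases = [2.1, 3.4, 1.8, 2.9]
controls = [1.9, 2.2, 1.5, 2.9]
print(auc(cases, controls))  # 0.65625
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation of cases from controls, which puts an improvement from 58.0% to 61.8% in perspective. The pairwise loop is quadratic in sample size; for large studies a rank-based computation gives the same value more efficiently.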

It is possible that SNP genotyping did not improve breast cancer risk prediction in the above studies because they were performed in unselected patient populations. Sawyer and colleagues therefore recently genotyped 22 breast cancer risk variants (Table 3, PRS model A) in 1,143 high-risk women from a familial breast cancer clinic [88]. These women had on average 1.82 first-degree to third-degree relatives with breast cancer and a median age at diagnosis of 45 years. Sawyer and colleagues calculated a log-additive score that weighted each risk allele by its odds ratio, called the PRS, and found that this score increased significantly from 0 (95% CI = −0.03 to 0.03) in controls to 0.30 (95% CI = 0.26 to 0.33, P = 2.4 × 10⁻²⁹) in BRCA1/2 mutation-negative familial breast cancer cases. The PRS was also significantly associated with breast cancer diagnosis before age 35 and with contralateral breast cancer risk. The PRS could therefore potentially reclassify young women with a family history of breast cancer into a risk category for which intensive breast cancer screening or chemoprevention with tamoxifen might be recommended. In addition, the PRS may help predict contralateral breast cancer risk in women already diagnosed with breast cancer, and thereby inform the decision for or against bilateral mastectomy.
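A log-additive PRS of the kind described is commonly computed as the sum, over SNPs, of the number of risk alleles carried (0, 1 or 2) multiplied by the natural logarithm of that allele’s per-allele odds ratio. The Python sketch below assumes that formulation and additionally centers scores on the mean in controls, so that controls average 0 as in the reported results. The exact weighting and normalization used by Sawyer and colleagues may differ, and the SNP identifiers, odds ratios and genotypes are hypothetical placeholders, not published estimates.

```python
import math
from statistics import mean


def prs(genotype, odds_ratios):
    """Log-additive polygenic risk score for one individual (hypothetical helper).

    genotype: dict mapping SNP id -> number of risk alleles carried (0, 1 or 2)
    odds_ratios: dict mapping SNP id -> per-allele odds ratio
    """
    score = 0.0
    for snp, odds_ratio in odds_ratios.items():
        count = genotype[snp]  # raises KeyError if a SNP was not genotyped
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1 or 2")
        # Each risk allele adds log(OR); an OR below 1 contributes a negative weight.
        score += count * math.log(odds_ratio)
    return score


def centered_prs(genotypes, control_genotypes, odds_ratios):
    """Subtract the mean control score so that controls average 0 (assumed normalization)."""
    offset = mean(prs(g, odds_ratios) for g in control_genotypes)
    return [prs(g, odds_ratios) - offset for g in genotypes]


# Hypothetical SNP ids and odds ratios, not published estimates
ORS = {"snp1": 1.26, "snp2": 1.13, "snp3": 0.88}
controls = [
    {"snp1": 0, "snp2": 1, "snp3": 1},
    {"snp1": 1, "snp2": 0, "snp3": 2},
    {"snp1": 0, "snp2": 2, "snp3": 0},
]
cases = [{"snp1": 2, "snp2": 1, "snp3": 0}]
print(centered_prs(cases, controls, ORS))
```

Because the score is on the log-odds scale, the difference between two women’s scores is a log odds ratio under this multiplicative model; exponentiating it gives the implied odds of breast cancer for one woman relative to the other.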

In summary, genotyping women with familial breast cancer for common low penetrance variants may have a significant impact on risk assessment, particularly in women in whom no high penetrance mutation is found. However, prospective trials using a PRS that includes the recent SNPs contributed by COGS will need to be conducted before this strategy can be used routinely. These trials must not only assess whether SNP genotyping improves risk stratification, but also determine whether reassigning patients to different risk strata affects endpoints such as breast cancer detection rates, breast cancer-related morbidity and mortality, patient quality of life, or overall healthcare costs. It is important to note that these studies have so far been performed in individuals of European descent, using SNPs associated with risk in those populations; great caution is therefore needed, because any resulting model may not accurately predict risk in other ethnic groups.

Conclusions

Variants in multiple genes and intergenic regions are associated with risk of developing breast cancer. These variants range from rare mutations that confer a high risk of breast cancer to common SNPs that individually confer only minimally increased or decreased risks and probably act in a polygenic manner. Multistage GWAS, together with smaller association, linkage and candidate gene studies, have led to the publication of thousands of variants, but only a relatively small set are likely to be truly related to breast cancer risk. Associations found in one ethnic group cannot be generalized to all ethnic groups, so findings in one population need to be specifically validated in another. In addition, certain common risk variants are likely to be important only in particular subgroups, such as ER-positive or ER-negative breast cancer, BRCA1 or BRCA2 carriers, or male breast cancer patients. Although the identification of common breast cancer risk variants has so far had little clinical impact, studies published before COGS demonstrate that it may be possible to use carefully designed SNP genotyping panels to augment breast cancer risk models for high-risk populations, such as women with familial breast cancer, or to predict contralateral breast cancer risk. The recent COGS studies estimate that approximately 28% of familial breast cancer risk is accounted for by common variants. A thorough understanding of the SNPs involved may allow them to be used in risk stratification. Given continual advances in technology and falling costs of SNP genotyping, it may now be possible to develop a low-cost SNP genotyping tool that can be used in conjunction with standard risk assessment models. Such tools must be tested prospectively and across ethnic groups to demonstrate clinical utility.