Introduction

Breast cancer is the most commonly diagnosed cancer in women worldwide, with an estimated 1.67 million new cases diagnosed in 2012. It is the second most common cause of cancer-related death in women in the more developed regions of the world and accounts for 15.4% of cancer-related deaths in women [1]. Breast cancer outcome is affected by several factors, including age, tumour size, tumour grade, extent of local and distal spread at diagnosis, oestrogen receptor (ER) status, human epidermal growth factor receptor 2 (HER2) status and treatment received. It is also likely that inherited host characteristics, such as genetic variants, are important [2].

The association between common germline genetic variation and breast cancer survival has been examined in many candidate gene studies investigating genes in pathways known to be involved in breast cancer [3]. These studies have identified numerous single nucleotide polymorphisms (SNPs) associated with outcome at nominal significance levels, but none have been widely replicated in further studies. The exceptions to this are three genome-wide association studies (GWAS) [4-6] and a study from the Breast Cancer Association Consortium, which had substantial power to detect associated variants with large effect sizes (hazard ratio (HR) >2) [7]. Two of those GWAS have reported significant associations for three polymorphisms (rs9934948, rs3784099, rs4778137) [4,6]. The aim of this study was to evaluate the association of previously reported SNPs with prognosis using data from a hypothesis-generating pooled analysis of eight breast cancer survival GWAS from ten studies including 37,954 breast cancer cases [8].

Methods

Literature review

Studies reporting common polymorphisms associated with breast cancer prognosis were identified by searching both Google Scholar and PubMed. We searched Google Scholar using the search terms: ‘breast cancer’, ‘survival’, ‘prognosis’, ‘polymorphisms’ and ‘SNPs’. The search terms for PubMed were ‘breast cancer’ AND (‘survival’ OR ‘prognosis’) AND (‘polymorphism’ OR ‘SNP’). The reference lists of all identified studies were then checked individually for any additional studies. The search was last updated on 6 June 2014. We considered studies to be eligible for inclusion if they reported an association between a germline genetic variant and at least one of the following end points: overall survival, disease-free survival and breast cancer-specific survival (BCSS). Studies evaluating the prognostic importance of rare high-penetrance variants with minor allele frequency <2% in BRCA1, BRCA2 and CHEK2 were omitted from the review. Only one study conducted ER subtype-specific analyses.

For the purposes of comparison, all studies that used genetic models that grouped together two genotypes into a single category were defined as using ‘dominance models’. This category includes both dominant and recessive models as each study's definition of a dominant or recessive model is dependent on which allele is the major or minor allele, whether they consider the effect allele to be bi-directional, or whether they focus on only the risk allele.
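The grouping described above can be illustrated with a minimal sketch of the usual genotype codings. The function name `encode` and the model labels are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies. The sketch shows why dominant and recessive models can be pooled into one category: a dominant model for one allele splits the three genotypes in exactly the same way as a recessive model for the other allele.

```python
def encode(effect_allele_count: int, model: str) -> int:
    """Code a genotype (0, 1 or 2 copies of the effect allele) under a genetic model.

    'additive'  : per-allele (co-dominant, log-additive) coding 0 / 1 / 2
    'dominant'  : carriers of at least one copy versus non-carriers
    'recessive' : carriers of two copies versus everyone else
    """
    if effect_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 copies of the effect allele")
    if model == "additive":
        return effect_allele_count
    if model == "dominant":
        return 1 if effect_allele_count >= 1 else 0
    if model == "recessive":
        return 1 if effect_allele_count == 2 else 0
    raise ValueError(f"unknown model: {model}")


# Counting the other allele instead turns a count c into 2 - c.
# A recessive model for that allele is the mirror image of a dominant
# model for the original allele, so both group two genotypes together.
for c in (0, 1, 2):
    assert encode(2 - c, "recessive") == 1 - encode(c, "dominant")
```

Only the additive coding keeps the three genotypes distinct, which is the coding used for the co-dominant model in the meta-analysis.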

Genome-wide association studies

We used data from a combined analysis of eight breast cancer GWAS, from ten studies [9-19], that had genotype data from a genome-wide SNP array and had collected follow-up time data for the 37,954 breast cancer cases [8]. Genotype and sample quality control were carried out separately for each study. In short, SNPs were excluded based on low call rate, minor allele frequency <1% and significant deviation of genotype frequencies from Hardy-Weinberg equilibrium. Samples were excluded for low call rate, ambiguous gender, relatedness and extreme heterozygosity. Sample ancestry was determined separately for each GWAS included in the meta-analysis using either principal component analysis, multi-dimensional scaling or LAMP based on ethnicities from HapMap samples, and samples with less than 90% European ancestry were excluded. As different genotyping arrays had been used for the different studies, imputation had been performed using a reference panel from the 1000 Genomes Project [8,20]. We utilised the imputed data for the SNPs of interest in this study. Details of the pooled studies are shown in Additional files 1 and 2.
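The SNP-level exclusion criteria listed above can be sketched as follows. This is an illustration only, not the pipeline used by the contributing studies: the call-rate and Hardy-Weinberg thresholds (`min_call_rate`, `hwe_p_threshold`) are assumed values, as only the minor allele frequency cut-off of 1% is given in the text, and the Hardy-Weinberg check uses a simple 1-degree-of-freedom chi-square test rather than an exact test.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    freq_b = (2 * n_bb + n_ab) / (2 * n)
    return min(freq_b, 1 - freq_b)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(n_aa, n_ab, n_bb, n_attempted,
                  min_call_rate=0.95,     # assumed threshold, not from the text
                  min_maf=0.01,           # minor allele frequency <1% is excluded
                  hwe_p_threshold=1e-6):  # assumed threshold, not from the text
    """Return True if a SNP survives the three SNP-level filters."""
    call_rate = (n_aa + n_ab + n_bb) / n_attempted
    if call_rate < min_call_rate:
        return False
    if minor_allele_frequency(n_aa, n_ab, n_bb) < min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_threshold
```

For example, genotype counts of 250, 500 and 250 in 1,000 attempted samples are in exact Hardy-Weinberg proportions and pass all three filters, whereas counts of 990, 10 and 0 fail on minor allele frequency.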

Cox proportional hazards models were fitted to assess the association of genotype with breast cancer-specific mortality under a co-dominant (log-additive) genetic model using the likelihood ratio test. The models were adjusted for principal components in order to minimise the effect of population substructure, and the Collaborative Oncological Gene-environment Study (COGS) [16] dataset was stratified by study. Each survival GWAS was analysed separately and the results were harmonised and combined using a standard inverse-variance weighted fixed-effects meta-analysis. In order to compare the results with the published associations we used a one-sided test based on the reported direction of effect. In the initial analysis, the models for all 56 SNPs were not adjusted for prognostic factors. We then conducted a multivariable analysis of the previously reported SNPs that were significantly associated with survival, adjusting for age, stage and grade using 29,360 samples from the COGS study.
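The combination step can be sketched as below, assuming each GWAS contributes a per-allele log hazard ratio and its standard error, already harmonised to the same effect allele. The function name `fixed_effects_meta` and its arguments are hypothetical, and the sketch uses a Wald-type z statistic for the pooled estimate rather than the likelihood ratio test used within each study. It also returns Cochran's Q and I2, the heterogeneity measures reported in the Results.

```python
import math
from statistics import NormalDist


def fixed_effects_meta(log_hrs, std_errs, reported_direction):
    """Inverse-variance weighted fixed-effects meta-analysis of log hazard ratios.

    reported_direction is +1 if the published effect allele increased the
    hazard and -1 if it decreased it; the P value is one-sided in that direction.
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_one_sided = 1.0 - NormalDist().cdf(reported_direction * z)

    # A 90% two-sided interval matches a one-sided test at the 0.05 level.
    z90 = NormalDist().inv_cdf(0.95)
    ci90 = (math.exp(pooled - z90 * pooled_se), math.exp(pooled + z90 * pooled_se))

    # Cochran's Q and I2 describe heterogeneity between the studies.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {"hr": math.exp(pooled), "log_hr": pooled, "se": pooled_se,
            "ci90": ci90, "p_one_sided": p_one_sided, "q": q, "i2": i2}


# Two hypothetical studies with identical estimates (illustrative numbers only).
result = fixed_effects_meta([0.10, 0.10], [0.05, 0.05], reported_direction=+1)
```

Because the weights are the inverse variances, larger studies with smaller standard errors dominate the pooled estimate, and a pooled effect in the opposite direction to the published one yields a one-sided P value above 0.5.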

Results

Literature review

We identified 46 publications reporting nominally significant associations between 62 germline variants and survival after a breast cancer diagnosis. Details of each variant and the reported association with breast cancer prognosis are shown in Additional file 3. The median sample size was 890 cases; the smallest study had 85 cases and the largest 25,853. Fifty-nine variants were from 44 candidate gene studies and three variants were identified through GWAS. The candidate genes were involved in the following pathways: DNA repair, cell cycle control, matrix metalloproteinases, immune response, drug response, tumour progression, vitamin D receptors and miscellaneous other pathways (Table 1). Findings from the identified publications were infrequently replicated; only six variants out of the 62 were reported in at least one subsequent publication.

Table 1 Previously identified breast cancer survival genes in cancer-related pathways

Meta-analysis findings

Results from the GWAS meta-analysis included 58 of the 62 previously identified variants discussed above. One SNP (rs2886162) was replaced by a perfectly correlated tag SNP (rs2364725, r2 = 1). Associations for four of the variants identified (rs4778137 in OCA2, rs3803662 in TOX3, rs1042522 in TP53 and rs2479717 in CCND1) were discovered in studies carried out by the Breast Cancer Association Consortium using sets of samples included in our GWAS meta-analysis. Therefore, we are unable to replicate these associations independently in the full dataset. The substantial sample overlap between the studies that identified associations with rs4778137 and rs3803662 means that there is little to be gained by attempting to replicate their associations in the additional samples included in the meta-analysis. However, the sample sizes in the studies identifying rs1042522 and rs2479717 were relatively small, so we evaluated their association with BCSS in the GWAS meta-analysis omitting the samples from studies used in the original publications. The two SNPs were evaluated in 29,224 and 31,434 samples, respectively.

The results for the 56 SNPs evaluated in the meta-analysis are presented in Additional file 4. In the analysis of all cases, five SNPs (rs2981582, rs1800566, rs9934948, rs1800470 and rs3775775) were significant with one-sided P value <0.05; the remaining 51 SNPs were not significant at this nominal P value. The most significant association was for rs2981582 in FGFR2 (per G allele HR 1.09, 90% confidence interval (CI) 1.04 to 1.14, one-sided P value = 0.00085). All significantly associated SNPs had good imputation quality (r2 = 0.9 to 1). The imputation r2 for all 56 SNPs can be found in Additional file 4. No single SNP reached the stringent level of significance generally regarded as genome-wide significant (P value <5 × 10−8), but the number of moderately significant associations (5) was somewhat greater than that expected by chance (2.8). This is illustrated by the quantile-quantile plot shown in Figure 1. Seven SNPs not significantly associated with prognosis in all patients were significant in ER-positive disease. We therefore found evidence of ER-positive-specific associations with prognosis for seven of the twelve SNPs nominally associated (P <0.05) with survival. These SNPs were not previously identified in patients with specifically ER-positive disease; however, our observations may agree with the previously reported results as most breast cancers are ER positive. We measured the level of heterogeneity between the studies included in the pooled analysis for the 12 SNPs associated with survival. There was moderate evidence of heterogeneity for the SNP rs2981582 (I2 = 41.1%, P value = 0.084). For all other SNPs there was low heterogeneity (I2 < 25%, P value >0.2). Details of the SNPs nominally associated with BCSS are shown in Table 2. The results for the nominally associated SNPs adjusted for age, stage and grade are shown in Additional file 5. The HRs for some of the SNPs were attenuated after adjustment. In contrast, the associations with BCSS of SNPs rs3775775 and rs2333227 were stronger in the multivariable analysis.
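The number of associations expected by chance follows directly from the number of SNPs tested and the nominal significance level: 56 × 0.05 = 2.8. The sketch below reproduces that figure and, as an additional illustration that is not part of the reported analysis, gives the binomial probability of observing at least five nominally significant results if no SNP were truly associated. It assumes the 56 tests are independent, which is a simplification, and the function names are hypothetical.

```python
from math import comb


def expected_by_chance(n_tests: int, alpha: float) -> float:
    """Expected number of tests with P < alpha when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float) -> float:
    """Binomial probability of k or more significant tests among n_tests
    independent tests, each with false-positive probability alpha."""
    return sum(comb(n_tests, i) * alpha ** i * (1 - alpha) ** (n_tests - i)
               for i in range(k, n_tests + 1))


expected = expected_by_chance(56, 0.05)   # the 2.8 quoted in the text
tail = prob_at_least(5, 56, 0.05)         # chance of 5 or more hits under the null
```

A tail probability that is not small would be in line with the description of the excess as only somewhat greater than expected.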

Figure 1

Quantile-quantile plot of results from look-up of previously reported associations in genome-wide association studies. Tests were one-sided with direction assumed from previous association.

Table 2 Previously reported associations replicated in the meta-analysis

Discussion

There have been few studies focused on the replication of sub-genome-wide significant associations identified previously. Previous replication studies have focused on reporting the SNPs with the strongest evidence of association. We have found some evidence to support previously reported associations between common germline genetic variants and breast cancer prognosis. The moderate evidence for some variants provides a rationale for continued research efforts to identify such variants. Significant variants were for the most part candidates in cancer-related genes, as shown in Table 1. Despite the larger sample size, and therefore increased power to detect true associations with prognosis in comparison to previous studies, a possible reason for associations failing to reach genome-wide significance may still be limited power. Figure 2a illustrates that for our analysis, with 2,900 survival events from 37,954 cases, there is limited power to detect associations at stringent significance levels for modest effect sizes based on a variant with a minor allele frequency of 0.3. Figure 2b shows that almost five times as many events would be needed to detect, with 80% power at P value <10−8, an allele with a minor allele frequency of 0.3 that confers an HR of 1.1.

Figure 2

Power (%) to detect true associations with survival time across a range of minor allele frequencies and numbers of events. (a) Power (%) to detect true associations with survival time over a range of effect sizes at increasing orders of significance given a minor allele frequency of 0.3 and 2,900 events. (b) Power (%) to detect true associations with survival time for increasing numbers of events, at increasing orders of significance, given a minor allele frequency of 0.3 and an effect size of 1.1. In both panels we used an imputation r2 = 0.8 to account for suboptimal imputation.
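The order of magnitude of these power figures can be checked with a standard approximation for the Cox model (Schoenfeld's formula), in which the number of events required depends on the variance of the genotype and the squared log hazard ratio. The sketch below is an assumption about how such a calculation could be done, not a description of the method used to produce Figure 2: it assumes a two-sided test, Hardy-Weinberg proportions so that the additive genotype has variance 2 × MAF × (1 − MAF), and an imputation r2 that simply scales that variance. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def events_needed(hr, maf, alpha, power, r2=1.0):
    """Approximate number of events needed to detect a per-allele hazard ratio."""
    genotype_variance = 2 * maf * (1 - maf) * r2  # r2 discounts imperfect imputation
    z_alpha = _NORMAL.inv_cdf(1 - alpha / 2)      # two-sided significance threshold
    z_power = _NORMAL.inv_cdf(power)
    return (z_alpha + z_power) ** 2 / (genotype_variance * math.log(hr) ** 2)


def power_for_events(events, hr, maf, alpha, r2=1.0):
    """Approximate power for a given number of events (inverse of events_needed)."""
    genotype_variance = 2 * maf * (1 - maf) * r2
    z_alpha = _NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = abs(math.log(hr)) * math.sqrt(events * genotype_variance)
    return _NORMAL.cdf(noncentrality - z_alpha)


# Scenario from Figure 2b: HR 1.1, minor allele frequency 0.3, P < 10^-8, 80% power.
needed = events_needed(hr=1.1, maf=0.3, alpha=1e-8, power=0.8, r2=0.8)
ratio_to_current = needed / 2900
current_power = power_for_events(2900, hr=1.1, maf=0.3, alpha=1e-8, r2=0.8)
```

Under these assumptions the required number of events is between four and five times the 2,900 events available, which is consistent with the statement that almost five times as many events would be needed.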

In a two-sided test, five of the previously reported associations with prognosis were significantly associated with BCSS in the GWAS meta-analysis but had discordant directions of effect to the original results. These discrepancies may be caused by differing ethnicity between the sample populations [21] as the meta-analysis is specific to patients with European ancestry whereas the five original studies consider non-European populations [6,22-24]. On the other hand, they may also represent false positive associations in both discovery and replication data.

Many previously published studies used a dominance model to evaluate associations. We only used a co-dominant model to detect association in the GWAS. This is justified because thousands of common variants [25] associated with a range of diseases have been identified using a co-dominant model with little or no evidence for dominance. It seems unlikely that breast cancer survival would differ substantially from other phenotypes in any true, underlying genetic model. Where the true underlying model is co-dominant this approach will maximise statistical power. While it is possible that some variants may be truly associated under a dominance model, for example through loss of heterozygosity of the specific germline variant in the tumour, we would still have reasonable power to detect such an association with the large sample size of the GWAS under a co-dominant model.

A further way to increase power to detect robust associations with prognosis is to reduce the level of heterogeneity in the phenotype. Studies focusing on identifying subtype-specific associations will have greater power to detect variants associated with a particular subtype than an analysis of all patients. In particular, studies considering disease subtypes, for example ER-negative disease, may provide valuable insight into the reasons for known prognostic differences between subtypes. We identified seven SNPs associated with ER-positive disease. These SNPs were not previously identified in specifically ER-positive disease; however, our observations may agree with the previously reported results as most breast cancers are ER positive. In addition, studies looking at interactions with specific treatments, most notably adjuvant chemotherapy, hormonal therapy and adjuvant radiotherapy, may further inform targeted treatment of subgroups of patients according to their inherited genetic information. Some of the previously reported associations with prognosis were found in specific subgroups of patients; however, as yet the sizes of these studies are limited. Large subtype-specific studies are needed in order to investigate interactions with particular subgroups effectively. The generation of sufficiently large studies to deliver strongly significant results, as well as having good outcome and treatment data to enable powerful subtype-specific analyses, will only be possible by combining data resources through large-scale global collaborations. Case-control studies including approximately 100,000 cases are now being conducted to identify common variants associated with risk. It seems a realistic goal to carry out case-cohort studies of a similar size. Reliable identification of SNPs associated with breast cancer prognosis may help to understand the molecular mechanisms of tumour progression and metastasis. Ultimately, this may lead to the development of new therapeutic targets. Polygenic risk scores based on multiple risk alleles have been shown to have potentially useful discrimination [26]. Similar polygenic prognostic scores may improve discrimination of prognostic and treatment benefit tools such as PREDICT [27].

Conclusions

We have found limited evidence to support the assertion that germline genetic variation influences outcome after a diagnosis of breast cancer. Large studies with detailed clinical and follow-up information are needed in order to achieve sufficient statistical power to detect associations at stringent significance thresholds. Power can also be increased by reducing the level of phenotype heterogeneity, which will in turn provide valuable insights into prognostic differences between subgroups.