Introduction

Inherited common genetic variation is likely to influence survival in breast cancer patients [1]. Results from pre-clinical experiments have shown different metastatic behaviors in mice with different genetic backgrounds [2,3,4,5,6,7]. In addition, familial studies of breast cancer patients have shown that women with a first-degree relative with a poor prognosis breast cancer have a worse prognosis compared to women with a first-degree relative with a good prognosis cancer [8]. Moreover, genome-wide and candidate gene association studies have discovered common genetic variants associated with specific subtypes of breast cancer based on the expression of the estrogen receptor (ER) [9,10,11], progesterone receptor (PR), and the amplification of the human epidermal growth factor receptor 2 (HER2) [12, 13], which are known breast cancer prognostic factors [14, 15]. Finally, a number of studies have suggested that specific common germline genetic variants affect breast cancer prognosis both overall and within subgroups of patients [16,17,18,19,20,21,22,23,24].

Despite the supporting evidence, it remains challenging to identify common germline variants associated with breast cancer-specific survival. This may partially be explained by the good prognosis of breast cancer patients, which leads to underpowered analyses. Even large studies based on worldwide consortia cannot reach the number of breast cancer deaths necessary to detect small to moderate associations at a genome-wide significant level [19, 20, 25]. However, breast cancer is a heterogeneous disease, and it is possible that stronger associations between common germline variants and breast cancer-specific survival are present in certain patient subgroups, but cannot be detected in breast cancer overall. Previous studies provide modest evidence supporting this hypothesis [16, 19, 20].

The aim of our study was to evaluate the evidence for associations of inherited common genetic variants with breast cancer-specific survival within more homogeneous subgroups of breast cancer patients, defined by prognostic factors representative of tumor biology and/or by the type of systemic treatment. To this end, we performed genome-wide association analyses within clinically relevant, defined subgroups of patients based on hormone receptors, tumor grade, age at diagnosis, and type of systemic treatment [26, 27]. We also explored the subgroup-specific associations identified by previous studies [16, 19, 23, 28, 29], to confirm or refute those results.

Materials and methods

Study sample

We selected female breast cancer patients of European ancestry from studies participating in the Breast Cancer Association Consortium (BCAC). We included patients with available information about vital status and number of years from diagnosis to last follow-up who were diagnosed with a primary invasive breast cancer of any stage and were at least 18 years old at diagnosis. The final study sample consisted of 91,686 breast cancer patients from 70 BCAC studies. A description of the included studies is given in Additional file 1: Supplementary Table S1.

Information about histopathology, survival, and treatment was collected by individual studies and pooled and harmonized at the Netherlands Cancer Institute before incorporation into the BCAC database at the University of Cambridge (version 12, July 2019). All studies were approved by the relevant ethics committees and informed consent was obtained from all patients.

Patient subgroups

The subgroups of interest were defined based on age at diagnosis, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, tumor grade, and the use and type of systemic treatment, as available in the BCAC database. For age at diagnosis and tumor grade, we focused on subgroups characterized by worse prognosis. We thus defined 15 subgroups: (a) patients younger than age 40 years at diagnosis; (b) patients with grade 3 tumors; (c) patients with ER-positive (ER+) tumors, who received endocrine therapy (any kind); (d) patients diagnosed with ER-negative (ER−) tumors, who received chemotherapy (any kind); (e) patients with tumors that were hormone receptor (HR) positive (ER+ or PR+) and HER2-negative (HER2−); (f) patients with HR-positive (HR+), HER2− tumors, who received chemotherapy (any kind); (g) patients with HR+, HER2− tumors, who did not receive chemotherapy; (h) patients with HR+, HER2-positive (HER2+) tumors; (i) patients with HR-negative (HR−), HER2+ tumors, (j) patients with HR−, HER2− tumors; (k) patients who received Tamoxifen; (l) patients who received an aromatase inhibitor; (m) patients who received a Cyclophosphamide Methotrexate Fluorouracil (CMF)-like chemotherapy regimen; (n) patients who received taxanes; (o) patients who received anthracyclines.

The rationale and references to the literature supporting the choice of each subgroup for inclusion in a genome-wide association study on survival are given in Additional file 1: Supplementary Table S2. We did not include the subgroup of HER2+ tumors treated with Trastuzumab because of a relatively small number of patients and low event rate, leading to analyses that are more underpowered than those presented.

Patients with metastatic breast tumors at diagnosis (1.1 % of all included patients) were excluded from the subgroup analyses whose definition was based on the use and type of systemic therapy as generally they are treated with palliative intent [15, 30, 31].

In addition to the subgroup analyses, we also performed a genome-wide analysis of 15-year breast cancer-specific survival in all breast cancer patients. We performed this analysis to evaluate whether associations between common germline variants and breast cancer-specific survival in subgroups could be detected in the full dataset of patients. Genome-wide analyses for survival (unstratified by subtype) were previously performed [20] based on 12 GWAS datasets, but these included fewer patients from iCOGS and OncoArray (n = 84,757), with shorter follow-up, than were available in the current dataset. We focused our analyses on the iCOGS and OncoArray datasets, because the remaining 10 GWAS datasets used in the previous study did not include information about tumor characteristics, beyond ER status, or treatment, which were crucial for the subgroup analyses.

Due to the presence of missing values in the variables used to define the subgroups, not all patients could be classified by each subgroup. The number of patients included in each subgroup, together with the number of breast cancer-specific deaths, patient/tumor characteristics, treatment, and follow-up information, are shown in Additional file 1: Supplementary Tables S3-S4, and Additional file 2: Supplementary Table S5.

Imputation of missing values in clinical and pathological variables

For secondary adjusted analyses, we imputed missing values in the clinical and pathological variables using the Multiple imputation by Chained Equations (MICE) R package (v. 3.2.0), as described in Additional file 2: Supplementary Methods. A list of imputed variables and corresponding percentages of missing values and imputation methods is provided in Additional file 2: Supplementary Table S6.

Genotyping and imputation of genetic variants, ancestry analysis, and quality controls

Methods related to genotyping and genotype imputation have been described previously [17, 18, 20]. In brief, patients were genotyped with two different arrays: iCOGS and OncoArray [17, 32]. Only samples that were inferred to have European ancestry, based on genotype data, were included in the analyses. Non-genotyped variants were initially imputed based on the 1000 Genomes Project Phase 3 (October 2014) release as reference panel. More recently, non-genotyped single-nucleotide variants (SNVs) were re-imputed using a reference panel from the Haplotype Reference Consortium (HRC) [33] in order to improve imputation quality, especially for rarer variants. Analyses were performed on genotyped variants or imputed variants with a minor allele frequency (MAF) > 0.01. Imputed variants were included in the analyses if they had imputation r2 > 0.7. Approximately 10 million variants were analyzed.

Statistical analyses

The outcome in the analyses was breast cancer-specific survival (time to death due to breast cancer). Hazard ratios (HR) and 95% confidence intervals (CI) were estimated using delayed entry Cox regression models, where the time at risk was considered as starting from the time of study entry if the study entry was after diagnosis (22.9% within 1 year after diagnosis, and 27.3% more than 1 year after diagnosis) and from diagnosis if the time of study entry was missing (24.5%), at diagnosis (16.9%) or before diagnosis (8.4%). The time-to-event was right censored at the time of last follow-up, or at 15 years after diagnosis, whichever came first. Patients who died of unknown cause or causes other than breast cancer were censored at the time of death if death occurred before 15 years from diagnosis or at 15 years otherwise.

With reference to the results of Early Breast Cancer Trialists’ Collaborative Group (EBCTCG) [34], we additionally performed analyses within the subgroups whose definition was based on the use and type of systemic therapy, where we restricted the maximum follow-up time to 5 years after diagnosis (Additional file 2: Supplementary Table S5). The goal of those analyses was to investigate the potential short-term effects of germline variants on patients who received specific types of systemic treatment, since the effect might not be constant over time and treatment plans tend to focus on the first 5 years after diagnosis [15, 31].

Cox regression analysis was performed within each subgroup of interest, separately, and was stratified by country. All the analyses were performed separately by genotyping platform (iCOGS vs OncoArray), and the results were combined via a fixed-effects meta-analysis. The standard errors of the HR estimates were re-computed based on the likelihood ratio test statistic, as done previously [20] (Figs. 1 and 2). For variants that satisfied the inclusion criteria (MAF>0.01 and r2 > 0.7) on only one genotyping platform, we included the result for that specific platform (Tables 1 and 2). However, for variants with an association P < 5E−08, we also computed HR and 95% confidence interval in the other genotyping platform to verify that the direction of the association was the same (Additional file 2: Supplementary Table S7).

Fig. 1
figure 1

Genome-wide analyses of 15-year breast cancer-specific survival. The Manhattan plots show the results of the unadjusted analyses (stratified by country). Results are represented as −log10 of the p value from Cox regression models (y-axis). The x-axis shows the chromosome number, where chromosome 23 represents the X chromosome. The red line represents the genome-wide significant threshold 5E−08, the blue line corresponds to the threshold 1E−05. Breast cancer patients included in the analysis in panel: a younger than age 40 years at diagnosis; b diagnosed with a grade 3 tumor; c diagnosed with an ER+ tumor and treated with (any) endocrine therapy; d diagnosed with a ER tumor and treated with (any) chemotherapy; e diagnosed with an ER+ or PR+, and HER2 tumor; f diagnosed with an ER+ or PR+, and HER2 tumor treated with (any) chemotherapy; g diagnosed with an ER+ or PR+, and HER2 tumor not treated with chemotherapy; h diagnosed with an ER+ or PR+, and HER2+ tumor; i diagnosed with an ER and PR and HER2+ tumor; j diagnosed with an ER and PR and HER2 tumor; k treated with tamoxifen; l treated with aromatase inhibitor; m treated with CMF-like chemotherapy; n treated with taxanes; o treated with anthracyclines; p all

Fig. 2
figure 2

Genome-wide analyses of 5-year breast cancer-specific survival. The Manhattan plots show the results of the unadjusted analyses (stratified by country). Results are represented as −log10 of the p value from Cox regression models (y-axis). The x-axis shows the chromosome number, where chromosome 23 represents the X chromosome. The red line represents the genome-wide significant threshold 5E−08, the blue line corresponds to the threshold 1E−05. Breast cancer patients included in the analysis in panel: a diagnosed with a ER+ tumor and treated with (any) endocrine therapy; b diagnosed with a ER− tumor and treated with (any) chemotherapy; c diagnosed with a ER+ or PR+, and HER2− tumor treated with (any) chemotherapy; d diagnosed with a ER+ or PR+, and HER2− tumor not treated with chemotherapy; e treated with tamoxifen; f treated with aromatase inhibitor; g treated with CMF-like chemotherapy; h treated with taxanes; i treated with anthracyclines

Table 1 GWAS significant (P<5×10-8) and noteworthy (BFDP<0.15) results by subgroup, and corresponding results from adjusted analyses
Table 2 Meta-analysis results for variants analysed on OncoArray only in the unadjusted analyses in Table 1

Inflation of the likelihood ratio test statistics was estimated, within each subgroup, by dividing the median of the observed test statistics values by the median of a \( {\chi}_1^2 \) distribution (Additional file 2: Supplementary Figures S1 and S2). To assess the noteworthiness of the observed associations, we made use of the Bayesian false discovery probability (BFDP) measure [35]. To compute BFDPs, we set the prior probability of true association to 10−4 [36, 37], as done previously [20], and chose the prior distribution of the log hazard ratio of interest (effect size of a variant) to be a Normal distribution with mean 0 and standard error equal to 0.2 [36]. We describe associations with BFDP < 0.15 as “noteworthy” [20]. For each noteworthy result at a prior of 10−4, we also provided BFDPs under two, more restrictive, prior probabilities of true association (10−5 and 10−6; Additional file 3: Supplementary Tables S8-S9) [36, 37]. In addition, we estimated the power to detect genetic variant associated with 15-year and 5-year breast cancer-specific survival by subgroup (Additional file 3: Supplementary Table S10 and S11) as described in Additional file 2: Supplementary Methods.

For each genome-wide significant (P < 5E−08) [38] and/or noteworthy (BFDP < 0.15) association observed in the primary unadjusted subgroup analyses, we performed secondary analyses adjusted for age at diagnosis, tumor characteristics, and type of systemic treatment not used in the definition of the specific subgroup in which the association was detected (Tables 1 and 2). Secondary adjusted analyses were performed to account for residual heterogeneity; we used imputed covariates in order to keep the same sample size.

For each genome-wide significant or noteworthy association in the primary unadjusted analyses, we looked at the functional annotation of the surrounding genomic area, using the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA GWAS) tool [39] (Additional file 2: Supplementary Figures S5 and S6). We also tested whether the expression of the nearest genes correlated with distant metastasis-free survival in breast cancer patients using KMplotter [40, 41].

PancanQTL [42] was used to identify cis-expression quantitative trait locus (eQTLs), trans-eQTLs, and survival eQTLs in breast cancer to see whether the genome-wide significant or noteworthy genetic variants from the primary analyses could be linked with the expression levels of genes affecting survival. In addition, for all the genome-wide significant and/or noteworthy associations detected in the primary analyses, we searched the GWAS catalog [43] to see whether there was already evidence of those being associated with breast cancer or other traits.

Results

After a median follow-up of 8.1 years, there were a total of 7531 breast cancer deaths among 91686 breast cancer patients (Additional file 1: Supplementary Tables S3-S4 and Additional file 2: Supplementary Table S5).

In the 15-year breast cancer-specific survival analyses, power for detecting genome-wide significant associations was < 0.45 for effect sizes (HRs) < 1.20 in all subgroups investigated and for all minor allele frequencies. The power was highest in the subgroups of grade 3 tumors, ER+ tumors treated with endocrine therapy, HR+ and HER2− tumors, and patients who received tamoxifen (Additional file 3: Supplementary Table S10). In the 5-year breast cancer-specific survival analyses, power was highest in the subgroups of patients with an ER− tumor who received chemotherapy, patients who received tamoxifen, and patients who received anthracyclines (Additional file 3: Supplementary Table S11).

Genome-wide significant and/or noteworthy associations with 15-year or 5-year breast cancer-specific survival were observed in the unadjusted analyses based on all patients (Additional file 2: Supplementary Table S7) and analyses of eight out of the 15 subgroups investigated (Tables 1 and 2; Figs. 1 and 2). The genomic inflation factor of the unadjusted genome-wide analyses varied from 0.981 to 1.028 (Additional file 2: Supplementary Figures S1- S4); it is therefore unlikely that the association results were affected by cryptic population substructure.

Two genome-wide significant associations were observed in the unstratified analysis based on patients genotyped using OncoArray (Additional file 2: Supplementary Table S7), namely variants rs57714252 (P = 4.7E−08) and rs4129285 (P = 4.9E−08), both situated in an intergenic region of chromosome 4 (Additional file 2: Supplementary Figure S5). These results were only based on the OncoArray data, since on iCOGS the variants did not satisfy the inclusion criteria for genotypes (iCOGS imputation r2 = 0.62). The corresponding estimates in the iCOGS data were in the opposite direction compared to the OncoArray estimates, and the results from the meta-analysis were not genome-wide significant and showed a large BFDP (Additional file 2: Supplementary Table S7).

Genome-wide significant associations were observed in the analysis restricted to patients diagnosed with a grade 3 tumor, with five correlated variants (Tables 1 and 2) located on chromosome X (Additional file 2: Supplementary Figure S5) in intron 1 of TBL1X. For the most significant variant, rs5934618, the alternative G allele was associated with increased risk of breast cancer death in unadjusted analyses (meta-analysis hazard ratio (HR) [95% confidence interval (CI)] 1.32 [1.20, 1.45], P = 1.4E−08, BFDP = 0.01; Table 2). The meta-analysis result remained substantially unchanged after adjusting for age at diagnosis, additional tumor characteristics, and treatment with (neo)adjuvant chemotherapy (HR [95% CI] 1.31 [1.18, 1.44], P = 7.9E−08, BFDP = 0.05; Table 2). The variant was not associated with the outcome in lower-grade tumors or in all patients combined (heterogeneity by grade P = 1.5E−03; Additional file 2: Supplementary Figure S7). All the five variants overlap chromatin features H3K4me3 and H3K27ac (associated with active transcription start sites) in multiple mammary cell types from normal breast tissue (Additional file 2: Supplementary Figure S8). Furthermore, there was evidence of TBL1X expression being associated with distant metastasis-free survival (HR [95%CI] for high vs low expression 1.71 [1.20,2.44], P = 2.7E−03) specifically in grade 3 patients, but not in patients with lower-grade disease (Additional file 2: Supplementary Figure S9). However, there was no evidence of association with TBL1X expression in normal breast tissue with any of the variants identified in our genome-wide analyses (Additional file 2: Supplementary Figure S10).

In the same subgroup of grade 3 tumors, we observed a noteworthy, non-genome-wide significant association with variant rs66871326, located on chromosome 2 in an intron of C2orf80. For variant rs66871326, the alternative A allele was associated with decreased risk of breast cancer death (HR [95% CI] 0.85 [0.80,0.90], P = 2.1E−07, BFDP = 0.11). The corresponding BFDP increased to 0.49 after adjusting for age at diagnosis, additional tumor characteristics, and treatment with (neo)adjuvant chemotherapy (Table 1).

We identified six variants on chromosome 15 with genome-wide significant associations within the subgroup of patients diagnosed with an ER+ or PR+, HER2− tumor (Table 1). We identified two independent variants, namely rs8030394 and rs112813972, both situated in an intronic region of THSD4 (Additional file 2: Supplementary Figure S5). For the most significant variant, rs8030394, the T allele was associated with increased risk of breast cancer death (HR [95% CI]: 2.47 [1.81, 3.37], P = 1.1E−08, BFDP = 0.42). For the second variant, rs112813972, the C allele was associated with decreased risk of death (HR [95% CI] 0.40 [0.28,0.55], P = 4.0E−08, BFDP = 0.70). These associations were not genome-wide significant after adjusting for age at diagnosis, additional tumor characteristics, and treatment with (neo)adjuvant chemotherapy, and the corresponding BFDPs increased to 0.72 and 0.90, respectively (Table 1).

We observed genome-wide significant associations from the 15-year breast cancer-specific analyses in the subgroups of patients with an ER+ or PR+ and HER2− tumor who did and did not receive chemotherapy. Three correlated variants on chromosome 2 were identified within the subgroup of patients who received chemotherapy. The most significant variant, rs62192052, is located in an intronic region of DNER (Additional file 2: Supplementary Figure S5) and was associated with decreased risk of death (HR [95% CI] 0.15 [0.08, 0.28], P = 2.6E−09, per T allele; Table 1). Although the result remained genome-wide significant after adjusting for age at diagnosis and additional tumor characteristics, the BFDP from both the unadjusted and adjusted analysis was ≥ 0.99 (Table 1; Supplementary Table S8), indicating that this association is almost certainly a false positive. Variant rs56248395, located on chromosome 11 in an intron of NAV2 (Additional file 2: Supplementary Figure S5), was associated with breast cancer death in the subgroup of patients with an ER+ or PR+ and HER2− tumor who did not receive chemotherapy (HR [95% CI] 2.33 [1.72,3.15], P = 4.8E−08, per T allele; Table 1). This result had BFDP ≥ 0.59 and was only based on the OncoArray data, since on iCOGS the variants did not satisfy the inclusion criteria for genotypes (iCOGS imputation r2 = 0.66; Additional file 2: Supplementary Table S7). The corresponding estimates in the iCOGS data (HR [95% CI] 1.07 [0.80,1.41], P = 6.6E−01; Additional file 2: Supplementary Table S7) and from the meta-analysis (HR [95% CI] 1.53 [1.25,1.89], P = 5.2E−05; Table 2) were not genome-wide significant and not noteworthy (meta-analysis BFDP≥0.97; Table 2; Supplementary Table S9).

We observed three additional single SNP noteworthy associations from the 15-year breast cancer-specific survival analyses. The intergenic variant rs4679741 on chromosome 3 was associated with breast cancer death in the subgroup of patients with an ER+ tumor treated with endocrine therapy (HR [95% CI] 1.18 [1.11, 1.26], P = 1.6E−07, BFDP = 0.09, per G allele). This result became genome-wide significant after adjusting for age at diagnosis, tumor characteristics, and treatment with chemotherapy (HR [95% CI] 1.20 [1.13, 1.28], P = 1.1E−08, BFDP = 0.01). The BFDP of this association remained < 0.15 when considering 10−5 as prior probability of true association (Additional file 3: Supplementary Table S8). PanCanQTL did not show any cis-eQTLs, trans-eQTLs nor survival eQTLs for this variant. Variant rs1106333 on chromosome 3, whose nearest gene is GRIP2, was associated with risk of dying of breast cancer in the subgroup of patients with an ER− tumor who received chemotherapy (HR [95% CI] 1.68 [1.39,2.03], P = 5.6E−08, BFDP = 0.12 per A allele). This result also became genome-wide significant after adjusting for additional prognostic factors (HR [95% CI] 1.70 [1.41,2.05], P = 4.4E−08, BFDP = 0.11). PanCanGTL revealed the presence of a cis-eQTL linking variant rs1106333 with GRIP2 expression in prostate adenocarcinoma but not in breast cancer. There was no evidence of association of GRIP2 expression levels with distant metastasis-free survival within ER− breast cancer patients treated with chemotherapy based on KMPlotter data (Additional file 2: Supplementary Figure S11). The last association of interest was observed in the subgroup of patients who received anthracyclines with intergenic variant rs34072391 on chromosome 7, but was not noteworthy after adjustment for additional prognostic factors (Table 1).

In the 5-year survival analyses focused on the treatment subgroups, we observed two genome-wide significant associations within the subgroup of patients diagnosed with an ER− tumor who received chemotherapy. The most significant variant was rs78754389, located on chromosome 4 in an intronic region of gene ARAP2 (Additional file 2: Supplementary Figure S5; HR [95% CI] 1.79 [1.46,2.20], P = 1.7E−08, BFDP = 0.07 per A allele). This result remained both genome-wide significant and noteworthy after adjusting for age at diagnosis and additional tumor characteristics (HR [95% CI] 1.67 [1.39,2.00], P = 4.1E−08, BFDP = 0.09; Table 1). However, PanCanQTL did not show any cis-eQTLs, trans-eQTLs nor survival eQTLs for rs78754389 and there was no evidence of association of ARAP2 expression levels with distant metastasis-free survival within ER− breast cancer patients treated with chemotherapy based on KMPlotter data (Additional file 2: Supplementary Figure S12). The second genome-wide significant variant was rs117685664, located on chromosome 8 (HR [95% CI] 0.26 [0.16,0.42], P = 4.6E−08, per T allele). This association was not genome-wide significant after accounting for the age at diagnosis and additional tumor characteristics (Table 1), and the corresponding BFDPs from unadjusted and adjusted analysis were 1.00 and 0.99, respectively, indicating a false positive finding.

We also observed one additional noteworthy but not genome-wide significant association in the 5-year breast cancer-specific survival analyses in the subgroup of patients who received Tamoxifen with variant rs72775397, situated in the 3′ untranslated region of MCTP1 (HR [95% CI] 1.36 [1.21,1.53], P = 1.8E−07, BFDP = 0.11, per C allele). The association was attenuated after adjustment for additional prognostic factors (HR [95% CI] 1.11 [1.04,1.20], P = 3.8E−03, BFDP = 1.00).

We did not identify any genome-wide significant or noteworthy association in any of the remaining seven subgroups investigated (Figs. 1 and 2). In addition, none of the above reported associations have a BFDP < 0.15 when considering 10−6 as prior probability of true association (Additional file 3: Supplementary Tables S8). Moreover, none of the subgroup-specific genome-wide significant associations detected by previous studies using a smaller version (both in terms of number of cases and of length of follow-up) of the iCOGS and/or OncoArray BCAC datasets were replicated at P < 0.001 (Additional file 3: Supplementary Table S12).

Discussion

We investigated the association of over 10 million common germline genetic variants with breast cancer-specific survival within 15 patient subgroups based on prognostic factors representative of tumor biology or related to the type of systemic treatment. Our hypothesis was that focusing on more homogeneous subgroups of breast cancer patients might reveal otherwise undetected associations. Besides type of systemic treatment, the definition of the subgroups was based on current clinically used biological characteristics for tumor subtyping and treatment decisions: patient’s age, tumor histological grade, ER, PR, and HER2 status ( [31]; Supplementary table S2). We did not have gene expression or copy number aberration data available to classify tumors on the basis of specific biological processes [44, 45]; however, relevant survival differences have been reported among the four subtypes based on ER, PR, and HER2 status [27, 46, 47].

A concern about performing GWAS on several subgroups of patients is the increased proportion of false discoveries, also known as type I errors. For this reason and to overcome additional limitations of the association p values [35,36,37], we made use of the BFDP approach and only considered as robust candidates those associations with BFDP < 0.15 at a prior probability of 10−4.

We found evidence of four loci potentially associated with breast cancer survival: one in the subgroup of patients diagnosed with a grade 3 tumor, one in the subgroup of patients with an ER+ tumor and treated with endocrine therapy, and two in the subgroup of patients with an ER− tumor and treated with chemotherapy.

The most significant variant identified in the subgroup of grade 3 tumors, rs5934618, is situated in intron 1 of TBL1X, a gene which encodes the Transducin (beta)-like 1X-linked protein. Both TBL1X and the closely related gene TBLR1 have been implicated in the activation of the Wnt/beta-catenin signaling pathway, which has been reported to be overactivated in the progression and proliferation of several tumors, including breast tumors, where it has been linked with reduced overall survival [48,49,50]. Rs5934618 and the other four correlated variants identified in our genome-wide analysis overlap with chromatin features H3K4me3 and H3K27ac in normal breast; these histone marks are generally characteristics of gene promoters and/or enhancers and might indicate that one or more of these variants act through modulating expression of TBL1X. There was no direct evidence that any of these variants are expression single-nucleotide polymorphisms (eSNPs) for TBL1X, but this might reflect the tissues examined or that the variants only regulate the gene in a specific context.

The remaining three variants potentially associated with breast cancer survival were as follows: rs4679741, identified in the subgroup of patients diagnosed with an ER+ tumor and treated with endocrine therapy; rs1106333 and rs78754389, identified in the subgroup of patients diagnosed with an ER− tumor and treated with chemotherapy. For variant rs4679741, it is unclear which the potential target genes might be, while there was no evidence linking the other two variants to the expression of the closest genes in breast cancer nor evidence of association between those genes and survival within the specific subgroups of breast cancer patients. Nevertheless, there are several mechanisms through which the four identified variants could affect survival. For example, they could act through regulation of an unannotated long noncoding RNA [51, 52] or microRNA [53, 54]. Further functional studies including epigenetic mechanisms are needed in order to gain more insights about the detected associations and to ascertain the potential underlying biological mechanisms.

We did not find strong evidence of germline variants associated with breast cancer-specific survival in any of the other subgroups of patients investigated. In addition, we did not replicate any of the subgroup-specific associations identified by previous studies. One of these associations, with variant rs4458204, was previously detected in the subgroup of patients with an ER− tumor who received chemotherapy [16]. The estimated HR (95% CI) was 1.81 (1.49–2.19) with association P = 1.9E−09. In our analysis of the same subgroup, we obtained a much lower HR estimate and the association was no longer statistically significant (HR (95% CI) 1.14 (0.99, 1.32), P = 6.0E−02), suggesting that the previous result was a false positive. Even though there is some overlap in terms of patients between the previous study and our current study, the latter is based on a substantially larger number of breast cancer patients and it includes more complete follow-up data.

A major strength of our study is the sample size, which was the largest to date and provided reasonable power to detect associations with breast cancer-specific survival within specific subgroups of patients. On the other hand, our study is subject to several limitations that are intrinsic to large consortium studies: these include variation in study design, time periods of diagnosis, and duration of follow-up, all of which can contribute to within subgroup heterogeneity. Some broad treatment-related subgroups, namely ER+ treated with any endocrine therapy and ER− treated with any chemotherapy, may include different treatments due to the wide period of diagnosis included in our study. On the other hand, the majority of patients were diagnosed between 2000 and 2009 (69.9% and 64.5% for ER+ treated with any endocrine therapy and ER− treated with any chemotherapy, respectively). If any impact on the results, the variation in treatment over time might have hampered the detection of associations between variants and survival in these subgroups. Several studies did not report the cause of death for all patients. Out of 14,606 deaths observed within the first 15 years after diagnosis, 7531 (51.6%) were due to breast cancer. Of the remaining 7075 deaths, 4905 (33.6%) were due to causes other than breast cancer, and for 2170 deaths (14.8%) it was unknown whether they were due to breast cancer or to other causes. This will have led to a loss of power, given that most of the deaths of unknown cause are likely to have been due to breast cancer. A related weakness of the study is its dependence on accuracy of cause of death certification and on coding practices of underlying cause of death in different countries. However, despite potential inaccuracies in cause of death, we considered it more valid to focus on deaths reported as due to breast cancer than by considering all deaths together, which would include those due to other causes. An additional limitation of the study is that in most subgroups we had very limited power to detect highly significant associations, particularly for small to moderate effect sizes (HRs 1.05–1.30), even for variants of relatively high minor allele frequency (MAF = 0.20). Therefore, we may have missed variants with low to moderate associations with survival.

Conclusions

In conclusion, we found evidence of four loci associated with breast cancer-specific survival within specific patient subgroups. The variants identified appear to be independent of known additional prognostic factors, as shown in the results of the adjusted analyses based on imputed clinic-pathological variables, and could, after proper validation, improve prognostic estimates and potentially help in better stratifying patients in treatment subgroups. However, the power for many subgroups is limited due to the low number of events. Even so, given the lack of evidence of strong associations in many of the patient subgroups investigated, and the fact that previously reported variants were not confirmed, our results suggest that the impact of common germline genetic variant on breast cancer-specific survival might be limited.