Breast cancer is the most frequent cancer among women and the second leading cause of cancer-related death after lung cancer in Europe. In addition to genetic variants with high and moderate penetrance, more than 90 common germline genetic variants contributing to breast cancer risk have been identified, comprising about 37 % of the familial relative risk of the disease (Michailidou et al. 2013, 2015). This suggests that a substantial portion of inherited variation has not yet been identified. In addition, most of the known common susceptibility variants reside in non-coding regions and result in subtle regulation of gene expression. The biological mechanisms through which genetic variants exert their functions are still not entirely understood.

The ability to evade immune destruction has been increasingly recognized as a key hallmark of tumors (Hanahan and Weinberg 2011). Tumor cells may secrete immunosuppressive factors such as TGF-β, which hampers infiltrating cytotoxic T lymphocytes and natural killer cells (Yang et al. 2010). Actively immunosuppressive inflammatory cells, such as regulatory T cells (Treg cells), a subset of CD4+ T lymphocytes, and myeloid-derived suppressor cells (MDSCs), may be recruited into the tumor environment (Lindau et al. 2013; Reisfeld 2013). A higher prevalence of Treg cells has been found in various cancers (Chang et al. 2010; Michel et al. 2008; Watanabe et al. 2002), including breast cancer (Bates et al. 2006). There is evidence that tumor-infiltrating Treg cells endowed with immunosuppressive potential are associated with tumor progression and unfavorable prognosis, especially in estrogen receptor (ER)-negative breast cancer (Bates et al. 2006; Kim et al. 2013; Liu et al. 2012a). In addition, infiltrating MDSCs have been found in murine mammary tumor models (Aliper et al. 2014; Gad et al. 2014), but their relevance for breast cancer patients, including for prognosis, is not well understood. Furthermore, previous association studies have identified susceptibility alleles for breast cancer in two genes, TGFBR2 (transforming growth factor beta receptor II) (Michailidou et al. 2013) and CCND1 (cyclin D1) (French et al. 2013), which may be involved in immune regulation in cancer patients (Gabrilovich and Nagaraj 2009; Krieg and Boyman 2009), including those with breast cancer. We hypothesized that immunosuppression pathway genes, particularly those relevant to Treg cell and MDSC functions, may harbor further susceptibility variants associated with breast cancer tumorigenesis, with a possible differential association by ER status.

In this analysis, we investigated associations between breast cancer risk and single nucleotide polymorphisms (SNPs) in 133 candidate genes in the immunosuppression pathway in individual level data from the Breast Cancer Association Consortium (BCAC). We also assessed associations with breast cancer risk at the gene and pathway levels. Furthermore, we used publicly available datasets through the UCSC Genome Browser (2015) to examine the putative genetic susceptibility loci for potential regulatory function.

Materials and methods

Study participants

In this analysis, participants were restricted to 83,087 women of European ancestry from 37 case–control studies participating in BCAC, including 42,510 invasive breast cancer cases with stage I–III disease and 40,577 cancer-free controls. Of all breast cancer patients, 26,094 were known to have ER-positive disease and 6870 to have ER-negative disease. Details of included studies are summarized in Online Resource 1. All studies were approved by the relevant ethics committees and all participants gave informed consent (Michailidou et al. 2013).

Candidate gene selection

Candidate genes relevant to the Treg cell and MDSC pathways were identified through a comprehensive literature review in PubMed (DeNardo et al. 2010; DeNardo and Coussens 2007; Driessens et al. 2009; Gabrilovich and Nagaraj 2009; Krieg and Boyman 2009; Mills 2004; Ostrand-Rosenberg 2008; Poschke et al. 2011; Sakaguchi et al. 2013; Sica et al. 2008; Wilczynski and Duechler 2010; Zitvogel et al. 2006; Zou 2005), using the search terms “immunosuppression”/“immunosuppressive”, “regulatory T cells”/“Treg cells”/“FOXP3+ T cells”, “myeloid derived suppressor cells”/“MDSCs”, “immunosurveillance”, and “tumor escape”. The final candidate gene list included 133 immunosuppression-related genes (Online Resource 2). SNPs within 50 kb upstream and downstream of each gene were identified using HapMap CEU genotype data (2015) and dbSNP 126.

SNP association analyses

For the BCAC studies, genotyping was carried out using a custom Illumina iSelect array (iCOGS) designed for the Collaborative Oncological Gene-Environment Study (COGS) project (Michailidou et al. 2013). Of the 211,155 SNPs on the array, 4246 were located within 50 kb of the selected candidate genes. Centralized quality control of genotype data led to the exclusion of 651 SNPs. The exclusion criteria included a call rate less than 95 % in all samples genotyped with iCOGS, minor allele frequency (MAF) less than 0.05 in all samples, evidence of deviation from Hardy–Weinberg equilibrium (HWE) at p value <10⁻⁷, and concordance in duplicate samples less than 98 % (Michailidou et al. 2013). A total of 3595 SNPs passed all quality controls and were analyzed.
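The four exclusion criteria can be read as a simple per-SNP filter. The following is a minimal sketch only, not the consortium's actual quality control pipeline: the `SnpQc` record, its field names, and the example values are hypothetical, and only the thresholds are taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality metrics (hypothetical field names, for illustration)."""
    snp_id: str
    call_rate: float    # fraction of samples with a genotype call
    maf: float          # minor allele frequency
    hwe_p: float        # p value of the Hardy-Weinberg equilibrium test
    concordance: float  # genotype concordance in duplicate samples


def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP meets none of the four exclusion criteria."""
    return (
        snp.call_rate >= 0.95
        and snp.maf >= 0.05
        and snp.hwe_p >= 1e-7
        and snp.concordance >= 0.98
    )


# Made-up example records, one per failure mode plus one passing SNP.
snps = [
    SnpQc("snp_a", 0.99, 0.21, 0.40, 0.999),  # passes all criteria
    SnpQc("snp_b", 0.93, 0.21, 0.40, 0.999),  # call rate below 95 %
    SnpQc("snp_c", 0.99, 0.02, 0.40, 0.999),  # MAF below 0.05
    SnpQc("snp_d", 0.99, 0.21, 1e-9, 0.999),  # deviates from HWE
    SnpQc("snp_e", 0.99, 0.21, 0.40, 0.970),  # duplicate concordance below 98 %
]
kept = [s.snp_id for s in snps if passes_qc(s)]
```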

Per-allele associations with the number of minor alleles were assessed using multiple logistic regression models, adjusted for study, age (at diagnosis for cases or at recruitment for controls) and nine principal components (PCs) derived from genotyped variants to account for European population substructure. We assessed the associations of SNPs with overall breast cancer risk as primary analyses, and then restricted the analyses to ER-positive (26,094 cases and 40,577 controls) and ER-negative subtypes (6870 cases and 40,577 controls) as secondary analyses. Differences in the associations between ER-positive and ER-negative diseases were assessed by case-only analyses, using ER status as the dependent variable. To determine the number of “independent” SNPs for adjustment of multiple testing, we applied the option “--indep-pairwise” in PLINK (Purcell et al. 2007). SNPs were pruned so that the retained SNPs had pairwise linkage disequilibrium (LD) of r² < 0.2, using a window size of 50 SNPs and a step size of 10 SNPs, yielding 689 “independent” SNPs. The significance threshold using Bonferroni correction corresponding to an alpha of 5 % was 7.3 × 10⁻⁵.
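The Bonferroni threshold follows directly from the number of independent SNPs. A minimal sketch of the arithmetic, in which the helper name `bonferroni_threshold` is an illustrative assumption and only the alpha level and SNP count come from the text:

```python
ALPHA = 0.05              # family-wise significance level, from the text
N_INDEPENDENT_SNPS = 689  # SNPs remaining after LD pruning, from the text


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


snp_threshold = bonferroni_threshold(ALPHA, N_INDEPENDENT_SNPS)
print(f"{snp_threshold:.1e}")  # prints 7.3e-05
```

Dividing by the number of pruned SNPs rather than by all 3595 analyzed SNPs gives a less conservative threshold, on the reasoning that SNPs in strong LD do not represent independent tests.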

In order to identify more strongly associated variants, genotypes were imputed for SNPs at the locus for which the strongest evidence of association was observed, via a two-stage procedure involving SHAPEIT (Howie et al. 2012) and IMPUTEv2 (Howie et al. 2009), using the 1000 Genomes Project data as the reference panel (Abecasis et al. 2012). Details of the imputation procedure are described elsewhere (Michailidou et al. 2015). Models assessing associations with imputed SNPs were adjusted for 16 PCs based on 1000 Genomes imputed data to further improve adjustment for population stratification. To determine independent signals within imputed SNPs at STAT3, we ran a stepwise forward multiple logistic regression model including the most significant genotyped SNP rs1905339 and all imputed SNPs, adjusted for study, age and 16 PCs.

SNP association analyses and case-only analyses were all conducted using SAS 9.3 (Cary, NC, USA). All tests were two-sided.

For multiple associated SNPs located at the same gene, a Microsoft Excel SNP tool created by Chen et al. (2009) and the software HaploView 4.2 (Barrett et al. 2005) were used to examine LD structure between these SNPs. To be able to inspect LD structures and also for gene-level analyses, allele dosages of imputed SNPs had to be converted into the most probable genotypes. Therefore, we categorized the imputed allele dosage between [0, 0.5] as homozygote of the reference allele, the value between [0.5, 1.5] as heterozygote, and the value between [1.5, 2.0] as homozygote of the counted allele. The regional association plot was generated using the online tool LocusZoom (Pruim et al. 2010).
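The dosage-to-genotype conversion amounts to rounding the allele dosage to the nearest genotype class. The sketch below is illustrative only. The intervals as stated share their endpoints, so the handling of dosages of exactly 0.5 and 1.5 (assigned here to the higher category) is an assumption, as is the function name.

```python
def dosage_to_genotype(dosage: float) -> int:
    """Convert an imputed allele dosage in [0, 2] to a most probable genotype.

    Returns 0 (homozygote of the reference allele), 1 (heterozygote) or
    2 (homozygote of the counted allele). Boundary values 0.5 and 1.5 are
    assigned to the higher category, which is an assumption of this sketch.
    """
    if not 0.0 <= dosage <= 2.0:
        raise ValueError("allele dosage must lie between 0 and 2")
    if dosage < 0.5:
        return 0
    if dosage < 1.5:
        return 1
    return 2


# Made-up dosages for illustration.
genotypes = [dosage_to_genotype(d) for d in (0.02, 0.49, 0.97, 1.62, 2.0)]
```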

Gene-level and pathway association analyses

Gene-level associations were determined by a subset of PCs, which were derived from a linear combination of SNPs in each gene explaining 80 % of the variation in the joint distribution of all relevant SNPs. Associations with derived PCs were assessed within a logistic regression framework (Biernacka et al. 2012), for overall breast cancer, ER-positive and ER-negative diseases, respectively. Pathway association of the immunosuppression pathway was assessed based on a global test of association by combining the gene-level p values via the Gamma method (Biernacka et al. 2012). For gene-level associations, associations with p value <3.8 × 10⁻⁴ (Bonferroni correction) were considered statistically significant. To obtain empirical p values for gene-level associations of TGFBR2 and CCND1 as well as for the pathway association, a Monte Carlo procedure was used with up to 1,000,000 randomizations (Biernacka et al. 2012). An exact binomial test based on the results of the single SNP association analyses was carried out to estimate enrichment of association in the immunosuppression pathway. Gene-level and pathway association analyses were carried out in R (version 3.1.1) using the package ‘GSAgm’ version 1.0.
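Choosing the subset of PCs for a gene can be illustrated by picking the smallest number of components whose cumulative variance reaches 80 %. This is a minimal sketch of that selection step alone, not of the GSAgm implementation: the function name and the example eigenvalues are hypothetical, and it assumes the eigenvalues of the per-gene SNP matrix have already been computed.

```python
from typing import Sequence


def n_components_for_variance(eigenvalues: Sequence[float],
                              target: float = 0.80) -> int:
    """Smallest number of leading PCs explaining at least `target` variance."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    # Largest eigenvalues first, so the leading PCs are counted first.
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(eigenvalues)


# Made-up eigenvalues for a gene with five SNPs.
example_eigenvalues = [5.0, 2.0, 1.5, 1.0, 0.5]
n_pcs = n_components_for_variance(example_eigenvalues)  # 50 %, 70 %, 85 %
```

The selected PCs would then enter the logistic regression model in place of the individual SNPs, which reduces the number of parameters for genes with many correlated SNPs.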

Haplotype analyses

To follow up the interesting gene associations observed, haplotype analyses were performed to identify potential susceptibility variants. Haplotype frequencies were determined with the use of the expectation–maximization (EM) algorithm (Long et al. 1995) implemented in PROC HAPLOTYPE in SAS 9.3 (Cary, NC, USA). Haplotypes with a frequency of at least 1 % were examined and the most common haplotype was used as the reference. Rare haplotypes with frequency less than 1 % were grouped into one category. Haplotype-specific odds ratios (ORs) and 95 % confidence intervals (CIs) were estimated within a multiple logistic regression framework, adjusted for the same covariates as in the single SNP association analyses. Global p values for association of haplotypes with breast cancer risk were computed using a likelihood ratio test comparing models with and without haplotypes of the gene of interest.
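The grouping of haplotypes before modeling can be sketched as follows. This is a simplification under a stated assumption: each chromosome is taken to carry one already-assigned haplotype, whereas the EM algorithm in PROC HAPLOTYPE estimates haplotype frequencies probabilistically. The function name, the `rare` label, and the example haplotypes are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def group_haplotypes(haplotypes: Sequence[str],
                     min_freq: float = 0.01,
                     rare_label: str = "rare") -> Tuple[str, List[str]]:
    """Pick the most common haplotype as reference and pool rare ones.

    Haplotypes with frequency below `min_freq` are recoded to `rare_label`.
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)
    total = len(haplotypes)
    reference = counts.most_common(1)[0][0]
    recoded = [h if counts[h] / total >= min_freq else rare_label
               for h in haplotypes]
    return reference, recoded


# Made-up sample: frequencies of 60 %, 39.5 % and 0.5 %.
sample = ["AAG"] * 600 + ["GAG"] * 395 + ["GCT"] * 5
reference, recoded = group_haplotypes(sample)
```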

Gene expression analyses

In order to examine whether potential causative genes influence RNA expression in breast tumor tissue, we downloaded RNA sequence level 3 data from The Cancer Genome Atlas (TCGA) (2015). We retrieved the RNA expression level in the form of RNA-Seq by expectation–maximization (RSEM) based on the IlluminaHiSeq_RNASeqV2 array. Gene expression differences in RNA levels between 989 invasive breast cancer tissues and 113 matched normal tissues for four genes of interest (STAT3, PTRF, IL5, and GM-CSF) were analyzed using a two-sided Wilcoxon–Mann–Whitney test. In addition, data from 183 breast tissues in the publicly available GTEx (V6) (2015) online database were evaluated to obtain information on whether the most interesting variants (rs1905339, rs8074296, rs146170568, chr17:40607850:I and rs77942990) were expression quantitative trait loci (eQTL) for any gene. Also, GTEx was queried to obtain information on whether the five variants were eQTL for STAT3 or PTRF.

Functional annotation

To investigate potential regulatory functions of interesting polymorphisms, we used the Encyclopedia of DNA Elements (ENCODE) database through the UCSC Genome Browser as well as Haploreg v4 (Ward and Kellis 2012).

Results


Selected characteristics of the study population are described in Table 1. The controls and breast cancer patients included in this study had comparable mean reference ages of 54.8 and 55.9 years and also the proportion of postmenopausal women was similar (68 % in controls and 69 % in breast cancer patients). The proportion of women indicating a family history of breast cancer in first degree relatives was as expected greater in breast cancer patients (25 %) than in controls (12 %).

Table 1 Characteristics of breast cancer cases and controls

Single SNP associations

Excluding the known TGFBR2 and CCND1 breast cancer susceptibility loci, the quantile–quantile (QQ) plot for associations with overall breast cancer risk for the genotyped SNPs of the other candidate genes indicated deviation from expected p values and thus evidence of further SNPs associated with breast cancer risk (Online Resource 3). Genetic associations with overall breast cancer risk for all assessed 3595 SNPs are summarized in Online Resource 4.

Four independent genotyped SNPs (LD r² < 0.3) were significantly associated with breast cancer risk at p value <7.3 × 10⁻⁵, accounting for the multiple comparisons (Table 2). The four significant SNPs were located in or near TGFBR2, STAT3 and CCND1. Since TGFBR2 and CCND1 have been identified as breast cancer susceptibility loci in previous studies (French et al. 2013; Michailidou et al. 2013; Rhie et al. 2013), we focused on the association of the SNP at STAT3. The variant rs1905339 (A>G) at STAT3 was positively associated with overall breast cancer risk (per allele odds ratio (OR) 1.05, 95 % confidence interval (CI) 1.03–1.08, p value = 1.4 × 10⁻⁶). It showed similar associations with ER-positive and ER-negative cancers (Online Resource 5). We did not observe further SNPs that were significantly associated with ER-positive or ER-negative disease (data not shown).

Table 2 TGFBR2, CCND1 and STAT3 SNPs associated with overall breast cancer risk in women of European ancestry after Bonferroni correction (p value <7.3 × 10⁻⁵)

To identify additional susceptibility variants at STAT3, we further investigated 707 well-imputed SNPs (imputation accuracy r² > 0.3) with MAF >0.01 spanning a ±50 kb window around STAT3. Seven independent signals at STAT3 were found through the stepwise forward selection procedure. The genotyped SNP rs1905339 was not selected. The imputed SNP rs8074296 (A>G), which was in high LD with rs1905339 (r² = 0.99), showed a comparable OR for the association with overall breast cancer risk with a more extreme p value (per allele OR 1.05, 95 % CI 1.03–1.08, p value = 8.6 × 10⁻⁷, Table 3). A second imputed SNP rs146170568 (C>T), associated with a per allele OR of 1.32 (95 % CI 1.16–1.50, p value = 2.1 × 10⁻⁵), was still strongly associated at a p value of 3.2 × 10⁻⁴ after accounting for rs8074296 (Table 3). None of the independently associated imputed SNPs besides rs8074296 were correlated with rs1905339 or with each other (r² ≤ 0.01, Fig. 1). As rs8074296 and rs1905339 are located closer to PTRF than to STAT3, we additionally analyzed data of 178 imputed variants located within ±50 kb of PTRF. Associations of most additional variants in the PTRF region with breast cancer risk were attenuated in analyses conditioning on rs8074296 (Table 4). The variants chr17:40607850:I and rs77942990 still showed a strong association with breast cancer risk (per allele OR 1.09, 95 % CI 1.04–1.15, p value = 0.0005; and per allele OR 1.09, 95 % CI 1.04–1.15, p value = 0.0007, respectively). These two variants were also not in LD with rs8074296 (r² = 0.09 and 0.07, respectively), while all other variants in Table 4 were at least in moderate LD with rs8074296 (r² ≥ 0.46, Online Resource 6). The LD plot (Online Resource 6) also shows that chr17:40607850:I and rs77942990 are in high LD (r² = 0.83). A regional association plot for the genotyped SNP rs1905339 and all 885 imputed SNPs within ±50 kb of STAT3 and PTRF included in this analysis is shown in Fig. 2.
Associations of SNPs shown in Table 3 as well as associations of chr17:40607850:I and rs77942990 with breast cancer risk were not significantly heterogeneous between studies (all p values for heterogeneity >0.1); forest plots can be found in Online Resource 7 to 16.

Table 3 Associations with overall breast cancer risk for seven independent imputed SNPs at STAT3 in women of European ancestry
Fig. 1

Linkage disequilibrium plot showing r² values and color schemes for the genotyped SNP rs1905339 and seven independent imputed SNPs as well as imputed SNP rs181888151 within ±50 kb of STAT3. The linkage disequilibrium (LD) plot shows that SNP rs1905339 is in strong LD with the imputed SNP rs8074296 (r² = 0.99), and independent of the other six imputed SNPs (r² ≤ 0.01) at STAT3. LD was estimated based on control data

Table 4 Associations with overall breast cancer risk for 19 imputed variants near PTRF in women of European ancestry
Fig. 2

Regional association plot for the genotyped SNP rs1905339 and 885 imputed SNPs within ±50 kb of STAT3 and PTRF. Each dot represents an SNP. The color of each dot reflects the extent of linkage disequilibrium (r²) with SNP rs1032070 (purple diamond). Genomic positions of SNPs were plotted based on hg19/1000 Genomes Mar 2012 European. Association is represented on the −log10 scale. cM/Mb centiMorgans/megabase

Gene-level and pathway associations

Gene-level associations with risks of overall breast cancer, ER-positive and ER-negative diseases, respectively, for the 133 candidate genes in the immunosuppression pathway are summarized in Online Resource 17. TGFBR2 and CCND1 showed significant associations with overall breast cancer risk (p value <10⁻⁶ and 3.0 × 10⁻⁴, respectively). In addition, IL5 and GM-CSF may be further potential susceptibility loci of breast cancer (p value = 1.0 × 10⁻³ and 7.0 × 10⁻³, respectively). STAT3 showed a less significant association with overall breast cancer risk (p value = 0.033). The immunosuppression pathway as a whole yielded a significant association with overall breast cancer risk (p value <10⁻⁶). Similar gene-level and pathway associations were found for ER-positive but not for ER-negative breast cancer (Online Resource 17). We found significant enrichment of association in the immunosuppression pathway based on the results of the single SNP association analyses (313 of 3595 tests significant at α = 0.05, exact binomial test p value = 2.2 × 10⁻¹⁶).
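The enrichment test compares the observed number of nominally significant SNPs with the roughly 180 expected by chance (3595 × 0.05). A minimal sketch of an exact binomial tail probability, assuming a one-sided alternative (the text does not state the sidedness used); the function name is illustrative:

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), with terms built in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i), via log-gamma for stability
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


expected_hits = 3595 * 0.05  # about 180 SNPs expected below 0.05 by chance
p_enrichment = binomial_upper_tail(313, 3595, 0.05)
```

Under these assumptions the tail probability falls well below 2.2 × 10⁻¹⁶. That value is also the smallest p value R prints by default, so the reported figure may reflect that reporting floor rather than an exact value.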

Haplotype analyses

Despite the evidence for a possible role of IL5 and GM-CSF in breast cancer susceptibility from the gene-level analysis, no individual SNPs at IL5 or GM-CSF yielded significant genetic associations. To identify potential susceptibility haplotypes, haplotype-specific associations were assessed based on seven SNPs in or near IL5 (rs4143832, rs2079103, rs2706399, rs743562, rs739719, rs2069812 and rs2244012) and nine SNPs in or near GM-CSF (rs11575022, rs2069616, rs25881, rs25882, rs25883, rs27349, rs27438, rs40401 and rs743564). The LD structures for these SNPs at IL5 and GM-CSF are shown in Online Resources 18 and 19, respectively. In our study sample of women of European ancestry, 11 and 7 common haplotypes with frequency >1 % were observed at IL5 and GM-CSF, respectively. The haplotype AAAACGG in IL5 was associated with a decreased overall breast cancer risk (OR 0.96, 95 % CI 0.93–0.99, p value = 5.0 × 10⁻³, Table 5). In GM-CSF, the haplotype AAGAGCGAA was also associated with a decreased overall breast cancer risk (OR 0.92, 95 % CI 0.87–0.96, p value = 2.7 × 10⁻⁴, Table 6). The global p value for haplotype association was significant for both IL5 (p value = 0.005) and GM-CSF (p value = 0.007).

Table 5 Haplotype associations with overall breast cancer risk for seven SNPs at IL5 in women of European ancestry
Table 6 Haplotype associations with overall breast cancer risk for nine SNPs at GM-CSF in women of European ancestry

Gene expression analyses

Using TCGA RNA sequencing level 3 data, we found that RNA expression levels of STAT3 and IL5 were significantly higher in 113 normal tissue samples than in 989 breast tumor samples (p value = 1.3 × 10⁻³ and 7.0 × 10⁻⁴, respectively, Online Resources 20 and 21), although overall expression of IL5 was low in both tissues. Expression levels of PTRF were also significantly higher in normal tissue than in tumor tissue samples (p value ≤0.0001, Online Resource 22). GM-CSF expression was very low and did not differ between breast tumor samples and normal tissue samples (p value = 0.49, Online Resource 23). Among 183 mammary tissues in the GTEx database, SNPs rs1905339, rs8074296 and rs77942990 were not significantly correlated with STAT3 expression (p values = 0.36, 0.36, and 0.2, respectively; Online Resources 24 to 26) or PTRF expression (p values = 0.4, 0.4, and 0.39, respectively; Online Resources 27 to 29). The SNPs rs1905339 and rs8074296 were significant eQTL for TUBG2 (both p values = 9.9 × 10⁻⁷, Online Resources 30 and 31). The STAT3/PTRF variants rs146170568 and chr17:40607850:I were not available in the GTEx database.

Discussion


Our comprehensive examination of associations between polymorphisms in the immunosuppression pathway genes and breast cancer risk revealed that STAT3, IL5, and GM-CSF may play a role in overall breast cancer susceptibility among women of European ancestry.

The in silico functional analysis revealed that within a ±50 kb window of STAT3, several polymorphisms are located in regulatory regions that could actively affect DNA transcription (Fig. 3). The SNP rs181888151, which is in complete LD with rs146170568 (r² = 1) but independent of rs1905339 (r² = 0.01, Fig. 1), was significantly associated with increased risk for overall breast cancer (per allele OR 1.31, 95 % CI 1.16–1.49, p value = 2.8 × 10⁻⁵). Together with a further independently associated imputed SNP rs141732716, these polymorphisms reside in strong DNase I hypersensitivity and transcription regulatory sites (Fig. 3). This suggests that they may be functional polymorphisms, but further experimental work is required for confirmation.

Fig. 3

UCSC genome browser graphic for SNPs at the STAT3/PTRF region. The UCSC genome browser graphic shows functional annotations for the SNPs rs1905339 (red), correlated SNPs (r² > 0.80, green), as well as the other independent imputed SNPs (black) in or near the STAT3/PTRF region

STAT3 encodes the signal transducer and activator of transcription 3, which is a member of the STAT protein family. Activated by corresponding cytokines or growth factors, STAT3 can be phosphorylated and translocate into the cell nucleus, acting as a transcription activator. In addition, STAT3 plays a key role in regulating immune response in the tumor microenvironment (Yu et al. 2009). STAT3 signaling is required for immunosuppressive and tumor-promoting functions of MDSCs (Cheng et al. 2003, 2008; Kortylewski et al. 2005, 2009; Kujawski et al. 2008; Ostrand-Rosenberg and Sinha 2009; Yu et al. 2009), as well as for Treg cell expansion (Kortylewski et al. 2005, 2009; Matsumura et al. 2007). STAT3 has been reported in several previous genome-wide association studies (GWAS) to be associated with immune-related diseases such as Crohn’s disease (Barrett et al. 2008; Franke et al. 2008; Yamazaki et al. 2013), inflammatory bowel disease (Jostins et al. 2012), and multiple sclerosis (Jakkula et al. 2010; Patsopoulos et al. 2011; Sawcer et al. 2011). Additionally, expression of STAT3 was suggested to be enriched in triple-negative breast cancer, and negatively associated with lymph node involvement and breast tumor stage in a study based on an in silico network approach (Liu et al. 2012b). However, the association of rs1905339 with triple-negative breast cancer risk in our study (2600 triple-negative breast cancer cases) was similar to, and not stronger than, the association observed for overall breast cancer risk (per allele OR 1.06, 95 % CI 0.99–1.14, p value = 0.11).

The genotyped SNP rs1905339 is also located 7 kb 5′ of PTRF, which encodes the polymerase I and transcript release factor and is not known to be directly involved in immunosuppression. In addition, two independently associated imputed SNPs, rs8074296 and rs12952342 (r² = 0.99 and 0 with rs1905339, respectively, Fig. 1), are located 8 kb 5′ and 0.8 kb 3′ of PTRF, respectively (Fig. 3). PTRF is known to contribute to the formation of caveolae, small membrane caves involved in cell signaling, lipid regulation, and endocytosis (Chadda and Mayor 2008). Recently, down-regulation of PTRF was observed in breast cancer cell lines and breast tumor tissue, suggesting that PTRF expression might be an indicator for breast cancer progression (Bai et al. 2012). The SNPs rs1905339 and rs8074296 were also found to be eQTL for TUBG2 (tubulin, gamma 2) in the GTEx database; the expression of TUBG2 decreased with each variant allele (Online Resources 30 and 31, respectively). TUBG2 encodes γ-tubulin, a protein required for the formation and polar orientation of microtubules in cells. It is currently unknown whether TUBG2 plays a role in breast cancer development or progression.

The other two potential susceptibility loci, IL5 and GM-CSF, are both located in a known cytokine gene cluster at 5q31. IL5 encodes interleukin 5, a cytokine secreted by CD4+ T helper 2 cells (Mills 2004; Parker 1993). IL5 is a growth and differentiation factor for both B cells and eosinophils, triggering eosinophil- and B cell-dependent immune response (Mills 2004; Parker 1993). GM-CSF encodes granulocyte–macrophage colony stimulating factor, a cytokine that controls differentiation and function of granulocytes and macrophages. GM-CSF is also an MDSC-inducing and activating factor in the bone marrow (Ostrand-Rosenberg and Sinha 2009; Serafini et al. 2004). In the tumor microenvironment, GM-CSF is the cytokine for dendritic cell differentiation and function, and it is often found to be underexpressed (Zou 2005). Additionally, 5q31 has been found to be a susceptibility locus for rheumatoid arthritis (Okada et al. 2012, 2014) and inflammatory bowel disease (Jostins et al. 2012).

Immunosuppression involves a complex network with many contributors, including transcription factors (e.g., STAT3) as well as immune-mediating cytokines (e.g., IL5 and GM-CSF). Results of this analysis indicate that genes encoding different components of the immunosuppression pathway may harbor breast cancer susceptibility loci among women of European ancestry.

The main strengths of the present analysis were its large sample size, the uniform genotyping procedures and centralized quality controls used. The imputation of genotypes in the most interesting susceptibility loci provided an opportunity to identify more strongly associated variants. Assessments of gene-level associations also provided evidence for additional putative susceptibility loci. A limitation was the lack of an independent sample to replicate the observed associations; this will be feasible in the future using new studies participating in the BCAC. Further functional studies are still needed to identify causal variants and to investigate the underlying biological mechanisms.


Overall, our data provide strong evidence that common variation in the immunosuppression pathway is associated with breast cancer susceptibility. The strongest candidates for mediating this association were STAT3, IL5, and GM-CSF, but we cannot exclude the possibility of multiple alleles each with effects too small to confirm.