Introduction

Breast cancer in situ (BCIS) is a preinvasive breast cancer (BC) with the potential to transform into an invasive tumor over a period that can range from a few years to decades [1]. Only a subset of BCIS evolves into the invasive stage, and not all invasive cancers arise from BCIS [2–4]. Which factors influence the progression of BCIS to invasive BC is still unclear [2, 5, 6]. BCIS was rarely diagnosed before mass screening for BC, but since the introduction of screening it has comprised about 20 % of all diagnosed BC [7, 8].

Ductal carcinoma in situ (DCIS) is the most common form of noninvasive BC. It is characterized by malignant epithelial cells inside the milk ducts of the breast. DCIS is known to be a different entity from lobular carcinoma in situ (LCIS), which is characterized by proliferation of malignant cells in the lobules of the breast [9] and is more frequently associated with lobular invasive BC than with ductal invasive BC. DCIS is generally considered a precursor lesion of invasive BC; however, direct causality has not been firmly established because it is not possible to verify that the removal of DCIS decreases the risk of developing the invasive disease [3, 10].

BCIS is largely understudied and its etiology is poorly understood compared to invasive BC. Family history of BC is considered one of the strongest risk factors [11, 12], clearly stressing the importance of the genetic background. However, only a small number of studies have investigated the genetic risk factors specific for BCIS [13, 14] or DCIS [15, 16]. Genome-wide association studies (GWAS) including both invasive and BCIS cases tend to find similar associations between the two diseases, but no specific loci have been identified for BCIS [17–19]. Findings from the Million Women Study indicated that 2p-rs4666451 may be differentially associated with invasive BC and BCIS [13], while Milne and colleagues identified the association of 5p12-rs10941679 with lower-grade BC as well as with DCIS, but not with high-grade BC [15].

With the aim of verifying whether susceptibility SNPs identified through GWAS on invasive BC are also relevant for BCIS, we selected 39 single nucleotide polymorphisms (SNPs) previously shown to be associated with invasive BC, and performed an association study on 1317 BCIS cases and 14,006 controls in the context of the US National Cancer Institute’s Breast and Prostate Cancer Cohort Consortium (BPC3). In addition, we compared the association in BCIS with 10,645 invasive BC cases to investigate whether the two types of disease share a common genetic profile or not.

Methods

Study population

The National Cancer Institute’s Breast and Prostate Cancer Cohort Consortium (BPC3) has been described extensively elsewhere [20]. Briefly, it consists of large, well-established cohorts assembled in Europe, Australia and the United States that have both DNA samples and extensive questionnaire information collected at baseline. Cases were women who had been diagnosed with BCIS or invasive BC after enrolment in one of the BPC3 cohorts. This study included 10,645 invasive BC cases, 1317 BCIS cases and 14,006 controls. Of the 1317 BCIS cases included in this study, 71 % had information on tumor histology. Out of these, 85 % had DCIS and 15 % had LCIS. Controls were healthy women selected from each cohort. Relevant institutional review boards from each cohort approved the project and informed consent was obtained from all participants. The names of all approving Institutional Review Boards can be found in the Acknowledgements section.

SNP selection and genotyping

The SNPs included in this analysis were reported to show a statistically significant association with invasive BC risk (P < 5 × 10⁻⁷) in at least one published study. For eight SNPs whose assays did not work satisfactorily we selected a surrogate in complete linkage disequilibrium (r² = 1 in HapMap Caucasian in Europe (CEU)). In particular, for the following SNPs we genotyped either the original SNP or the surrogate: rs4415084 (surrogate rs920329), rs9344191 (surrogate rs9449341), rs1250003 (surrogate rs704010), rs999737 (surrogate rs10483813), rs2284378 (surrogates rs8119937 and rs6059651), rs2180341 (surrogate rs9398840), rs311499 (surrogate rs311498), and rs1917063 (surrogate rs9344208).

Genotyping was performed using TaqMan assays (Applied Biosystems, Foster City, CA, USA), as specified by the producer. Genotyping of the cases and controls was performed in four laboratories (the German Cancer Research Center (DKFZ), the University of Southern California, the US National Cancer Institute (NCI), and Harvard School of Public Health). Additional information on the genotyping techniques is given elsewhere [21]. Laboratory personnel were blinded to whether the subjects were cases or controls. Duplicate samples (approximately 8 %) were also included.

Data filtering and statistical analysis

Concordance of the duplicate samples was evaluated and found to be greater than 99.99 % for each SNP. Each SNP was tested for Hardy-Weinberg equilibrium in the controls by study. We investigated the association between genetic variants and BCIS risk by fitting an unconditional logistic regression model, adjusted for age at recruitment and cohort (defined as study phase in NHS). The genotypes were treated as nominal variables, comparing heterozygotes and minor allele homozygotes to the reference group of major allele homozygotes. Since there were only 19 BCIS patients in the European Prospective Investigation into Cancer (EPIC), we did not adjust the BCIS risk models for country. Instead, we performed sensitivity analyses excluding EPIC. For the same reason, we did not adjust the risk models for ethnicity but performed sensitivity analyses excluding non-Caucasians.
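As an illustration of the Hardy-Weinberg check described above, the following is a minimal Python sketch of a chi-square goodness-of-fit test (one degree of freedom) on genotype counts. The text does not state which test was used, and the analyses were run in SAS, so the choice of the chi-square test, the function name `hwe_chi_square` and the genotype counts are assumptions made for illustration only.

```python
import math


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Returns the chi-square statistic and its p-value.
    """
    n = n_major_hom + n_het + n_minor_hom
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Major allele frequency estimated from the observed genotype counts.
    p = (2 * n_major_hom + n_het) / (2 * n)
    q = 1.0 - p

    observed = (n_major_hom, n_het, n_minor_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)

    # Skip classes with zero expectation (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, not taken from the study.
chi2_ok, p_ok = hwe_chi_square(360, 480, 160)    # matches HWE proportions
chi2_bad, p_bad = hwe_chi_square(400, 400, 200)  # deficit of heterozygotes
print(f"in equilibrium: chi2={chi2_ok:.3f}, P={p_ok:.3f}")
print(f"departure:      chi2={chi2_bad:.3f}, P={p_bad:.2e}")
```

In a setting like the one described, such a test would be applied to the controls of each cohort separately, so that a departure confined to one cohort can be identified.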

To test whether there were differences in the genetic susceptibility for the two diseases, we performed case-case analyses and subgroup analyses, matching distinct controls to BCIS cases and invasive cases, respectively. The matching factors were age at baseline, menopausal status at baseline and cohort. The same type of case-case analysis was carried out comparing allele distributions between invasive BC and DCIS cases. Furthermore, we investigated the specific associations of the alleles with DCIS.
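The genotype comparisons described above can be illustrated with a crude odds ratio and Wald confidence interval computed from a 2 × 2 table. This is a simplified sketch, not the adjusted logistic regression used in the study: it ignores age, cohort and the matching factors, and the function name `odds_ratio_wald` and all counts are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Crude odds ratio with a Wald 95 % confidence interval.

    a: group 1, comparison genotype (e.g. BCIS cases, heterozygotes)
    b: group 1, reference genotype  (e.g. BCIS cases, major allele homozygotes)
    c: group 2, comparison genotype (e.g. invasive cases, heterozygotes)
    d: group 2, reference genotype  (e.g. invasive cases, major allele homozygotes)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical case-case table, not taken from the study.
or_est, ci_low, ci_high = odds_ratio_wald(a=20, b=10, c=10, d=20)
print(f"OR = {or_est:.2f} (95 % CI {ci_low:.2f} to {ci_high:.2f})")
```

Because genotypes are treated as nominal, the same calculation would be repeated with minor allele homozygotes as the comparison genotype. For a case-control rather than case-case comparison, group 2 would be the controls.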

The significance threshold was adjusted to account for the large number of tests carried out. Since some of the SNPs map to the same regions and might be in linkage disequilibrium, for each locus we calculated the number of effectively independent variables (Meff) using the SNP Spectral Decomposition approach (simpleM method) (13). The study-wise Meff obtained was 31, and the adjusted threshold for significance was 0.05/31 = 0.0016. All statistical tests were two-sided and all statistical analyses were performed with SAS software version 9.2 (SAS Institute, Inc., Cary, NC, USA).
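The threshold arithmetic above can be reproduced in a few lines of Python. This sketch takes the study-wise Meff of 31 as given and does not implement the simpleM estimation itself; the function names are hypothetical, and the P values used in the example are those reported in the Results.

```python
def adjusted_threshold(alpha, m_eff):
    """Bonferroni-type threshold based on the effective number of tests."""
    if m_eff < 1:
        raise ValueError("m_eff must be at least 1")
    return alpha / m_eff


def is_significant(p_value, alpha=0.05, m_eff=31):
    """True if p_value is below the multiple-testing-adjusted threshold."""
    return p_value < adjusted_threshold(alpha, m_eff)


threshold = adjusted_threshold(0.05, 31)
print(f"adjusted threshold: {threshold:.4f}")

# P values reported in the Results section.
reported = {
    "FGFR2-rs2981582 (BCIS vs. controls)": 3.0e-6,
    "ZNF365-rs10995190 (BCIS vs. controls)": 0.0019,
    "CDKN2BAS-rs1011970 (case-case)": 0.006,
}
for label, p in reported.items():
    print(label, "significant" if is_significant(p) else "not significant")
```

Only the first of the three example P values falls below the adjusted threshold, in line with how they are described in the text.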

Bioinformatic analysis

We used several bioinformatic tools to assess the possible functional relevance of the SNP-BCIS associations. RegulomeDB [22] and HaploReg v2B [23] were used to identify the regulatory potential of the region near each SNP. The GENe Expression VARiation database (Genevar) [24] was used to identify potential associations between the SNP and expression levels of nearby genes (expression quantitative trait loci, eQTL).

Results

In this study, we investigated the possible effect of 39 SNPs associated with invasive BC on susceptibility to BCIS using 1317 BCIS cases and 14,006 healthy controls in the framework of BPC3. The relevant characteristics of the study population are presented in Table 1. The majority (69 %) of the study participants were postmenopausal and of European ancestry.

Table 1 Characteristics of the study subjects (BCIS and controls)

We removed subjects from the NHS cohort for the analysis of ZMIZ1-rs1045485 and 11q13-rs614367 since the genotype distribution showed departure from Hardy-Weinberg equilibrium among the controls (P = 8.4 × 10⁻⁴ and P = 6 × 10⁻⁴, respectively) in this cohort. All other SNPs were in Hardy-Weinberg equilibrium (P > 0.05). The results of the sensitivity analyses showed that the exclusion of EPIC and non-Caucasian subjects did not affect the results (data not shown).

SNP associations comparing BCIS with controls

We found significant associations (at the conventional 0.05 level) between 14 SNPs and risk of BCIS, with P values ranging from 0.041 (GMEB2-rs311499) to 3.0 × 10⁻⁶ (FGFR2-rs2981582) (Table 2). When accounting for multiple testing (P < 0.0016), five SNPs (CDKN2BAS-rs1011970, FGFR2-rs3750817, FGFR2-rs2981582, TNRC9-rs3803662, 5p12-rs10941679) showed a statistically significant association with BCIS. Another variant (ZNF365-rs10995190) was very close to this significance threshold (P = 0.0019). None of the SNPs reported in the literature to be associated exclusively with estrogen receptor negative (ER-) BC (C19Orf62-rs8170, RALY-rs2284378, USHBP1-rs12982178 and TERT-rs10069690) or with both ER- and estrogen receptor positive (ER+) BC (6q14-rs13437553, 6q14-rs9344191, 6q14-rs17530068 and 20q11-rs4911414) showed an association with BCIS in this study, even at the 0.05 level.

Table 2 Association between the selected SNPs and risk of developing breast cancer in situ

SNP associations comparing DCIS with controls

By utilizing information on tumor histology we selected the DCIS cases and investigated the associations between the alleles and risk. Of the five SNPs significantly associated with BCIS, two (CDKN2BAS-rs1011970, TNRC9-rs3803662) showed a statistically significant association with DCIS (Table S1 in Additional file 1).

SNP associations comparing BCIS with invasive BC

Using case-case analyses to explore possible heterogeneity of associations of the SNPs with the risk of BCIS compared to invasive BC, we found no significant differences in the distribution of the genotypes of the selected SNPs by outcome (Table 3). The strongest difference was observed for CDKN2BAS-rs1011970, although it was not statistically significant considering multiple testing (P value for case-case comparison = 0.006), suggesting a stronger association of CDKN2BAS-rs1011970 with BCIS than with invasive BC. We also performed a subgroup analysis (BCIS vs. invasive) using matched controls in order to observe more clearly the direction of the associations between the selected SNPs and the risk of the two diseases. These latter analyses confirmed that CDKN2BAS-rs1011970 had a preferential association with BCIS compared to invasive BC; however, in both cases the minor T allele was associated with increased risk (Table S2 in Additional file 2).

Table 3 Case-case analysis between invasive breast cancer and breast cancer in situ

When comparing invasive BC to DCIS, we observed that CDKN2BAS-rs1011970 showed the most promising, albeit nonsignificant, association (P value for DCIS vs. BC case-case comparison = 0.0206, Table S3 in Additional file 3). We also noticed a stronger association of CDKN2BAS-rs1011970 with DCIS compared to invasive BC in the subgroup analyses (Table S4 in Additional file 4).

We also performed an association study considering only invasive BC and found significant associations at the conventional 0.05 level for 28 loci (P values ranging from 0.0387 to 2.27 × 10⁻¹⁶) (Table S2 in Additional file 2).

Possible functional effects

For CDKN2BAS-rs1011970, HaploReg showed that the G to T nucleotide change of the SNP may alter the binding site for three transcription factors: FOXO4, TCF12 and p300. RegulomeDB had no data for this SNP, and Genevar showed that the T allele is associated with decreased CDKN2B gene expression (P = 0.002).

Discussion

With the aim of better understanding the relationship of the genetic background with BCIS, we analyzed the associations of 39 previously identified BC susceptibility SNPs with BCIS risk compared to normal controls and invasive BC cases. Our general observation, as noted by others [13, 16], is that BCIS and invasive BC seem to share the same genetic risk factors. This is also supported by the fact that, for the five alleles that were significantly associated (P < 0.0016) with BCIS risk, the odds ratio (OR) for BCIS risk was on the same side of 1 as the OR for invasive disease. This was also true for all 14 alleles that were nominally (P < 0.05) associated with BCIS risk, with the exception of GMEB2-rs311499. However, none of the established ER- specific BC susceptibility loci were associated with BCIS risk in our study. This is not surprising because most of the BCIS cases in our study are likely to be ER+ (the information on this variable is extremely sparse in BPC3), and it suggests that, from a genetic point of view, ER+ and ER- tumors have different risk factors even for the first stages of carcinogenesis. However, it is difficult to draw a definitive conclusion without more complete ER status data in BPC3.

When conducting case-case analysis, we observed a difference in the association of CDKN2BAS-rs1011970 with invasive BC and BCIS, suggesting an association with BCIS only, although this difference was not statistically significant after adjusting for multiple comparisons (P = 0.006). The association between rs1011970 and BC risk (OR = 1.20) was reported by Turnbull and colleagues in a large GWAS conducted in European studies and was replicated in the Breast Cancer Association Consortium (BCAC; OR = 1.09) [25, 26]. The lack of association between this SNP and risk of invasive BC in our study does not appear to be due to a lack of statistical power: with 10,645 invasive BC cases and 14,006 controls we had more than 80 % power to detect an OR of 1.1 or greater, while the ORs reported by Turnbull and colleagues for this polymorphism ranged from 1.19 to 1.45, depending on the type of statistical model used. However, the results reported by Turnbull and colleagues originate from cases with a family history of invasive BC, which might explain the contradictory results. The discrepancy could also arise from differing adjustments in the statistical models, different screening programs or ways of diagnosing BCIS, or chance. Additionally, the results from Turnbull and colleagues arise from a case-control study while ours are from a prospective cohort, and it has been observed that there might be discrepancies between the two study designs [27]. We found significant associations at the conventional 0.05 level with invasive BC risk for 28 of the loci. For all of these SNPs, the directions of the associations were consistent with those reported in the literature [25, 28].

From a biological point of view, the association between rs1011970 and BCIS is intriguing since the SNP lies on 9p21, in an intron of the CDKN2B antisense (CDKN2B-AS1) gene, whose sequence overlaps with that of CDKN2B and flanks CDKN2A. These two genes encode cyclin-dependent kinase inhibitors and are frequently mutated, deleted or hypermethylated in several cancer types, including BC [29–32].

HaploReg showed that the G to T nucleotide change of rs1011970 altered the binding ability of three important cell cycle regulators (FOXO4, TCF12 and p300), possibly altering CDKN2B regulation. This hypothesis is corroborated by Genevar, which showed that the T allele was associated with decreased gene expression. These data are consistent with the observation of an increased BC risk associated with the minor allele. The CDKN2B gene regulates cell growth and inhibits cell cycle G1 progression. The malfunctioning of this checkpoint might be particularly important in the initiation of the tumor. CDKN2B has been repeatedly found to be hypermethylated, a sign that the gene has been shut down, in benign lesions of the breast and in BCIS [30, 31], indicating its involvement in the early phases of carcinogenesis. Furthermore, Worsham and colleagues found that CDKN2B was crucial for initiating immortalization events but less important for progression to malignancy [33]. Taken together, these results suggest an involvement of the gene in early BC carcinogenesis and are consistent with our findings that the association of the SNP with BC overall could be due to its association with development of early-stage tumors, including BCIS, through the downregulation of the CDKN2B gene.

A limitation of this report is that, because the study focuses on the 39 SNPs associated with risk of invasive BC, other SNPs specific for BCIS could not be identified with this approach.

Conclusions

In conclusion, our findings further support that the genetic variants associated with risk of BCIS and invasive BC largely overlap, with the possible exception of rs1011970, a putatively functionally relevant SNP situated in the CDKN2BAS gene that may be a specific BCIS locus. The discovery of a specific locus for BCIS may improve our understanding of both invasive and noninvasive BC susceptibility. However, our results for rs1011970 do not meet the criteria of statistical significance imposed by the number of tests and therefore could still reflect a chance finding.