Introduction

Breast cancer aggregates in families and has a considerable inherited component. Approximately 20% of the genetic risk for breast cancer is explained by pathogenic mutations in the high-penetrance genes BRCA1, BRCA2, TP53, STK11 and PTEN1. Other rare, intermediate-risk variants in genes such as PALB2, CHEK2 and ATM account for about 5% of the inherited risk2,3, and common low-risk variants account for another 18–19%4,5,6.

Checkpoint kinase 2 is the protein product of the CHEK2 gene, which is located on chromosome 22q12.1. It is part of the network that responds to DNA damage in order to maintain genomic integrity7. The protein-truncating variant CHEK2:c.1100delC is associated with a two- to threefold increased risk of breast cancer8,9. In women with familial aggregation of breast cancer, the risk is even higher. An odds ratio of up to 4.8 has been seen in women with a family history of breast cancer, which is equivalent to a 37% cumulative risk of breast cancer by the age of 70 years8,9,10. In addition, the c.1100delC allele has been associated with younger age at onset, a threefold increased risk of a second breast cancer, and a worse prognosis among women with oestrogen receptor-positive cancer9,11,12.

The considerably higher risk in women with a family history of breast cancer is in accordance with the suggested polygenic model, in which several susceptibility loci together confer a multiplicative effect on breast cancer risk13,14. That the model can also be applied to CHEK2:c.1100delC carriers is supported by a study of low-risk breast cancer variants in 34 000 women with and without a family history of breast cancer. A polygenic risk score (PRS) based on the combined risk of 74 low-risk variants was calculated. The result suggested that the PRS could be used to stratify risk in c.1100delC carriers and that the low-risk variants explained a part of the familial risk. The authors estimated that the 20% of CHEK2:c.1100delC carriers with the highest PRS had a lifetime breast cancer risk of > 30%. Correspondingly, the 20% of carriers with the lowest PRS had an estimated lifetime risk of 14%, which is close to the average population risk15. A synergistic effect between low-risk variants and BRCA1 and BRCA2 mutations has also been shown16. The risk of mutation carriers being affected is thus modified by other genetic variants and family history in addition to lifestyle factors. A risk prediction model, the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA), has been developed to calculate the lifetime risk of breast cancer, including in carriers of a moderate-penetrance allele such as CHEK2:c.1100delC. The BOADICEA model allows risk stratification for established genetic and non-genetic risk factors17. Still, other causative gene variants possibly remain to be identified, since the previously identified low-, intermediate- and high-risk genes cover less than half of the estimated heritable component. Characterising factors that increase the risk in carriers of moderate-risk alleles is important in order to identify the high-risk group that benefits most from preventive interventions.
In this study, we used whole-exome sequencing of a CHEK2:c.1100delC-positive cohort with familial breast cancer to identify putative risk-modifying alleles. In the first phase, we aimed to find candidate risk alleles for further validation in the second phase with larger cohorts of CHEK2:c.1100delC-positive cases and controls.

Results

We performed whole-exome sequencing in 28 breast cancer cases with germline CHEK2:c.1100delC, 28 familial breast cancer cases and 70 controls. Candidate alleles were selected for validation in larger cohorts (Fig. 1).

Figure 1

Flowchart describing the working process of evaluating genotype data in search of variants that specifically modify breast cancer risk in CHEK2:c.1100delC carriers. *Breast Cancer Association Consortium.

Recessive variants

We analysed the exome sequencing data for rare homozygous variants in CHEK2:c.1100delC carriers to identify risk alleles with a recessive inheritance pattern. Only one variant was suggested, rs16897117. Among the 28 CHEK2 carriers, there were 3 patients homozygous for rs16897117, whereas there were no rs16897117 homozygotes among the non-carrier breast cancer cases or healthy controls. We set out to test the hypothesis that rs16897117 is a CHEK2:c.1100delC risk modifier in larger sample collections, starting with 67 CHEK2 patients, as well as 688 non-carrier breast cancer cases and 246 healthy controls. This study confirmed the skewed allele distribution, with fewer individuals heterozygous for rs16897117 among the CHEK2 patients than among non-carrier patients or healthy controls. In a case-only analysis, the odds ratio between the rs16897117 rare allele (A) and CHEK2:c.1100delC was 0.46 (95% confidence interval [CI] 0.17–1.04, P = 0.053) (Table 1: SWEA1).
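As an illustration of how an allelic odds ratio of this kind can be obtained from a 2×2 table, the following minimal Python sketch computes the odds ratio with a Woolf (log-normal) 95% confidence interval. It is not the procedure used in this study, and an exact method may give a different, asymmetric interval. The function name and the layout of the table (rare and common allele counts in carriers and non-carriers) are assumptions made for the example, and no counts from Table 1 are reproduced here.

```python
import math


def allelic_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and approximate (Woolf) confidence interval for a 2x2 table.

    Hypothetical layout (an assumption for this sketch):
      a = rare alleles in c.1100delC carriers
      b = common alleles in c.1100delC carriers
      c = rare alleles in non-carriers
      d = common alleles in non-carriers
    z = 1.96 corresponds to a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf interval needs all four counts to be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With small cell counts, as in the discovery cohort, the Woolf approximation is rough, and an exact or conditional estimate would usually be preferred.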

Table 1 Rs16897117 association with CHEK2:c.1100delC in a case-only analysis.

Next, we performed another follow-up using 45 CHEK2 carriers plus 87 familial breast cancer patients and 47 controls from the Swedish cohorts. None of the CHEK2 carriers or the familial breast cancer patients were found to be homozygous for the rs16897117 variant. The only two homozygous individuals in this follow-up were identified in the control group. No skewness in allele distribution was observed in any of these groups (Table 1: SWEA2). As these results were less clear, we tested the association between rs16897117 and c.1100delC in the Finnish population, where the c.1100delC allele has a relatively high frequency (1.2%)18. Genotyping of three independent patient series identified a single c.1100delC carrier patient who was homozygous for rs16897117. The skewed allele distribution for rs16897117 was observed in the Helsinki cohorts, but not in the Tampere cohort. A study-stratified OR for association between rs16897117 and c.1100delC, combining all cohorts from Sweden and Finland, was 0.69 (95% CI 0.46–1.03, P = 0.073), encouraging further analysis.

Finally, the genotype data for rs16897117 and c.1100delC were obtained from the OncoArray project of the Breast Cancer Association Consortium5. The availability of a good number of healthy c.1100delC carriers in the consortium data enabled a proper interaction analysis for c.1100delC, rs16897117 and breast cancer risk. In the BCAC data, there was no allelic imbalance between rs16897117 and c.1100delC (Table 2). A likelihood-ratio test comparing a breast cancer risk model that included a c.1100delC-rs16897117 interaction term with a model that included c.1100delC and rs16897117 only as independent risk factors did not support rs16897117 as a dosage-dependent risk modifier for c.1100delC carriers (Table 3). The BCAC data included four c.1100delC carriers who were homozygous for rs16897117. These were all breast cancer cases, but the sample counts were too low for a reliable analysis.
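The comparison of a model with and without an interaction term can be sketched as follows. This is a minimal pure-Python illustration and not the code used for the BCAC analysis: it fits two unadjusted logistic regressions by Newton-Raphson on grouped counts and compares them with a one-degree-of-freedom likelihood-ratio test. The function names, the grouped input format and the absence of any adjustment for study or other covariates are assumptions made for the example.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def case_probability(beta, x):
    """Logistic model probability of being a case for covariate row x."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))


def max_loglik(rows):
    """Maximised log-likelihood; rows = [(covariates, cases, controls), ...]."""
    p = len(rows[0][0])
    beta = [0.0] * p
    for _ in range(100):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, cases, controls in rows:
            n = cases + controls
            mu = case_probability(beta, x)
            for j in range(p):
                grad[j] += (cases - n * mu) * x[j]
                for k in range(p):
                    hess[j][k] += n * mu * (1 - mu) * x[j] * x[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return sum(
        cases * math.log(case_probability(beta, x))
        + controls * math.log(1 - case_probability(beta, x))
        for x, cases, controls in rows
    )


def interaction_lrt(cells):
    """cells = [(carrier 0/1, rare-allele dosage 0/1/2, cases, controls), ...].

    Returns the likelihood-ratio statistic and its chi-square (1 df) p-value.
    """
    reduced = [([1.0, c, g], y, n) for c, g, y, n in cells]
    full = [([1.0, c, g, c * g], y, n) for c, g, y, n in cells]
    stat = max(0.0, 2 * (max_loglik(full) - max_loglik(reduced)))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))
```

A large p-value from `interaction_lrt` means that adding the interaction term does not improve the fit, which corresponds to the conclusion drawn above for rs16897117. In practice an established statistics package would be used rather than a hand-written fitting routine.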

Table 2 BCAC breast cancer cases and healthy controls with available data on CHEK2:c.1100delC and rs16897117 from the OncoArray project.
Table 3 The breast cancer risk associated with CHEK2:c.1100delC and rs16897117 in the BCAC data.

Coding non-synonymous candidate variants

In the discovery phase, exome sequencing data were analysed with a set of criteria in search of CHEK2:c.1100delC candidate variants. Fourteen non-synonymous variants were selected for testing, but only 11 were analysed due to technical issues with TaqMan probes (Table 4). The 11 missense variants detected in the CHEK2:c.1100delC carriers were evaluated in the validation phase. None of the variants could be replicated with patterns similar to those in the discovery phase (Table 5). Thus, none was suggested to be a modifier of breast cancer risk in CHEK2:c.1100delC carriers.

Table 4 Variants selected in the discovery phase for further validation.
Table 5 Odds ratios for the 11 validated candidate alleles in CHEK2:c.1100delC familial breast cancer and sporadic breast cancer.

Discussion

We aimed to identify candidate risk variants that specifically modify risk in CHEK2:c.1100delC carriers through whole-exome sequencing of a small number of samples followed by validation in a case–control association study. No CHEK2:c.1100delC-specific candidate variants could be identified. Previously identified variants that modify breast cancer risk in CHEK2:c.1100delC carriers are also risk variants in the general breast cancer population. The common low-risk variants that predispose to breast cancer have also shown synergistic effects with CHEK214. To our knowledge, no other genetic modifiers of CHEK2:c.1100delC have been suggested. Previously identified common alleles associated with breast cancer in the general population have also been shown to modify risk in BRCA1 and BRCA2 mutation carriers in a subtype-specific manner16. A recent GWAS identified several novel loci that were associated with at least one tumour feature (ER status, progesterone receptor status, tumour grade, human epidermal growth factor receptor 2 status) and also loci that differed by the molecular subtype of breast cancer, luminal or non-luminal19. These observations imply that tumour features should be taken into account when searching for candidate variants in CHEK2:c.1100delC carriers. Several loci that specifically modify risk in BRCA1 and BRCA2 carriers have also been found16,20,21,22,23,24,25,26,27,28,29. These are all low-risk susceptibility alleles identified through testing of candidates from breast cancer genome-wide association studies in BRCA1/2 mutation carriers and through fine-mapping of candidate regions.

Future studies of CHEK2:c.1100delC-modifying candidates could use less stringent criteria in the discovery phase to increase the probability of finding good candidates for further testing. In accordance with previous findings, gene-specific modifiers are likely to be common low-risk variants. CHEK2:c.1100delC-specific modifiers may therefore be more readily identified through large-scale genome-wide association studies. With the method used here, we found no support for a CHEK2:c.1100delC-specific genetic modifier. More studies of CHEK2:c.1100delC genetic modifiers are therefore warranted to improve risk assessment in clinical practice.

Methods

In order to identify candidate variants, we conducted a discovery phase, where whole-exome sequencing was performed in 28 CHEK2:c.1100delC carriers with familial breast cancer, another 28 familial breast cancer patients and 70 healthy controls (spouses of colorectal cancer patients) from the Swedish cohorts. Candidate variants were validated in larger cohorts (Fig. 1).

Sample preparation, discovery phase

Genomic DNA was subjected to whole-exome sequencing at the National Genomics Infrastructure in Uppsala, Sweden. Exome-enriched sequencing libraries were prepared using the Agilent SureSelectXT Human All Exon V5 XT2 + UTR kit (Agilent, Santa Clara, California, US). Cluster generation and 125-cycle paired-end sequencing were performed using the Illumina HiSeq 2500 system and v4 sequencing chemistry (Illumina, San Diego, California, US). Next-generation sequencing was performed at SciLifeLab, Uppsala University.

Selection of non-synonymous candidate variants

After exome sequencing, all detected coding non-synonymous variants in the CHEK2:c.1100delC carriers were evaluated. The cases of hereditary breast cancer and the healthy controls (spouses of cases with hereditary colon cancer) served as genotyping controls when identifying candidate alleles. Only variants passing the set of criteria described below were selected for further evaluation. The criteria were as follows:

Allele frequency

Ratios of the allele frequencies of the variants were calculated. A ratio of 2.0 or more between CHEK2:c.1100delC cases and healthy controls and/or a ratio of 1.5 or more between CHEK2:c.1100delC cases and familial breast cancer cases was required.

Gene function

Selected genes/variants were required to have a function consistent with that of a putative cancer driver gene when evaluated using online databases (OMIM, GeneCards) and scientific publications available on PubMed.

Reference databases

A more than 30% higher allele frequency in CHEK2:c.1100delC carriers compared with regional reference databases was required (ExAC non-Finnish population, 1000genome2014oct European, SweGen Variant Frequency Browser, exome sequencing data from 200 Danes30 and anonymous exome data from a cohort of 249 controls from the Department of Clinical Genetics, Karolinska University Hospital).

Sequencing accuracy

Only variants with a sequencing accuracy of 65% or more in all study groups were included. The variants passing the selection criteria were functionally annotated using the in silico tools SIFT, Polyphen2 HDIV/HVAR, LRT, MutationTaster, FATHMM, RadialSVM, LR and MutationAssessor.
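The quantitative selection criteria above (allele frequency ratios, comparison with reference databases and sequencing accuracy) can be expressed as a simple filter. The Python sketch below is illustrative only and is not the pipeline used in this study. The class and field names are hypothetical, the reference criterion is assumed to apply to the highest frequency across the reference databases, and a zero denominator is assumed to count as passing a ratio threshold. The gene function criterion relied on manual review and is not represented.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantFrequencies:
    """Per-variant summary; all field names are hypothetical."""
    chek2_cases: float      # allele frequency in CHEK2:c.1100delC cases
    familial_cases: float   # allele frequency in familial breast cancer cases
    controls: float         # allele frequency in healthy controls
    reference_max: float    # highest frequency across reference databases (assumption)
    min_accuracy: float     # lowest sequencing accuracy across study groups (0-1)


def frequency_ratio(numerator, denominator):
    """Ratio of allele frequencies; a zero denominator is treated as infinite enrichment."""
    if denominator == 0:
        return math.inf if numerator > 0 else 0.0
    return numerator / denominator


def passes_quantitative_criteria(v):
    """Apply the three numeric criteria described in the text."""
    # Ratio >= 2.0 versus controls and/or >= 1.5 versus familial cases.
    enriched = (
        frequency_ratio(v.chek2_cases, v.controls) >= 2.0
        or frequency_ratio(v.chek2_cases, v.familial_cases) >= 1.5
    )
    # More than 30% higher frequency than in the reference databases.
    above_reference = v.chek2_cases > 1.3 * v.reference_max
    # Sequencing accuracy of 65% or more in all study groups.
    accurate = v.min_accuracy >= 0.65
    return enriched and above_reference and accurate
```

A variant that passes this filter would still need the manual gene function review and in silico annotation described above before being considered a candidate.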

Validation of non-synonymous candidate variants

Eleven SNPs (rs2297809, rs17860405, rs8176786, rs34523498, rs117739035, rs34983477, rs152451, rs811925, rs7962217, rs34492126 and rs2287749) were genotyped using TaqMan SNP genotyping assays (Thermo Fisher Scientific, Waltham, Massachusetts, USA). rs35932273 was genotyped by Sanger sequencing following PCR. The candidates were validated in 72 cases with CHEK2:c.1100delC, 328 cases of sporadic breast cancer, 408 cases of familial breast cancer and 284 controls from the Swedish cohorts.

Genotyping of a recessive candidate allele

Exome sequencing data were analysed in search of recessive candidate variants in CHEK2:c.1100delC carriers. One recessive variant, rs16897117, was suggested: among the 28 CHEK2 carriers, there were 3 patients homozygous for rs16897117, whereas there were no rs16897117 homozygotes among the non-carrier breast cancer cases or healthy controls. The rs16897117 variant was further evaluated in Swedish and Finnish cohorts and in data from the Breast Cancer Association Consortium (BCAC).

Swedish cohorts

The 28 samples from CHEK2:c.1100delC carriers analysed in the discovery phase were collected from the Department of Clinical Genetics, Karolinska University Hospital. A total of 112 samples from CHEK2:c.1100delC carriers were collected from the SWEA study, a national Swedish collaboration aiming to study the prevalence of established breast cancer genes as well as to validate candidate genes and single nucleotide polymorphisms (SNPs) in Swedish women with familial breast and ovarian cancer (72 and 112 samples for validation of non-synonymous variants and the recessive variant, respectively). All CHEK2:c.1100delC carriers were previously affected by breast cancer except for two carriers who had been diagnosed with ovarian cancer. All cases of hereditary breast cancer were collected from the Department of Clinical Genetics, Karolinska University Hospital, and had previously received counselling and screened negative for relevant high-risk genes (28 samples for the discovery phase, 87 and 408 samples for validation of non-synonymous variants and the recessive variant, respectively). Cancer-free spouses of colorectal cancer patients served as controls (70 samples for the discovery phase, 284 and 293 samples for validation). They were recruited through the Swedish Colorectal Cancer Low-Risk Study. All 775 cases of breast cancer used for evaluating the recessive variant were collected from the Department of Clinical Genetics, Karolinska University Hospital. The 328 sporadic breast cancer samples used in validation of non-synonymous variants were collected from a population-based cohort from Södersjukhuset, Stockholm. Genomic DNA was extracted from peripheral blood samples. Samples were genotyped using TaqMan SNP genotyping assays (Thermo Fisher Scientific, Waltham, Massachusetts, USA).

Finnish validation cohorts

Rs16897117 was genotyped in two breast cancer cohorts from the Helsinki region, one including 1721 unselected cases and 755 additional familial cases18,31,32,33 and another consisting of 993 unselected cases34, as well as in a cohort of 666 breast cancer patients from the Tampere region, described in detail previously31,33 (Table 1). CHEK2:c.1100delC genotype data were readily available from one of the Helsinki cohorts35; the other two Finnish cohorts were genotyped for c.1100delC with a TaqMan assay.

BCAC data

The BCAC data used for final validation of rs16897117 were retrieved from the OncoArray project, described previously5. We included in the analysis the independent studies participating in the consortium for which sufficient data on reliably imputed c.1100delC were available (at least 10 carrier cases and 10 healthy carrier controls per study). Only study subjects with European ethnic background were included, and the Swedish and Finnish cohorts included in the discovery analyses were excluded. The selection yielded 13,767 breast cancer cases and 21,456 controls (Table 2).

Statistical analysis

Odds ratios, 95% confidence intervals and p-values were calculated to test the association with allele frequency using the DeFinetti programme, provided as an online source36. The validation analyses were performed using the R environment for statistical computing, version 3.6.1 (R Core Team, 2019)37. For the case-only analysis of the Swedish and Finnish cohorts, a stratified Mantel–Haenszel odds ratio was estimated with the R library epiDisplay38. The BCAC data analysis was performed with logistic regression. The interaction between c.1100delC and rs16897117 was assessed with a likelihood-ratio test.
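For readers unfamiliar with the stratified estimate, the Mantel–Haenszel odds ratio pools one 2×2 table per cohort by weighting each table by its size. The Python sketch below shows only the point estimate and is a generic illustration, not a reproduction of the epiDisplay analysis. The function name and the ordering of the four counts in each stratum are assumptions made for the example.

```python
def mantel_haenszel_or(strata):
    """Pooled Mantel-Haenszel odds ratio across 2x2 tables.

    strata: iterable of (a, b, c, d) counts, one tuple per cohort, where
    (a hypothetical layout for this sketch)
      a = rare alleles in carriers,     b = common alleles in carriers,
      c = rare alleles in non-carriers, d = common alleles in non-carriers.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined (no b*c contribution)")
    return numerator / denominator
```

Because each cohort contributes its own table, differences in allele frequency between the Swedish and Finnish populations do not bias the pooled estimate in the way that simply summing the counts across cohorts could.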

Ethics declaration

This study was approved by the Ethics Committee of Karolinska Institutet/Karolinska University Hospital. All individual studies from which data were used were approved by the appropriate medical ethical committees and/or institutional review boards. All methods were performed in accordance with the relevant guidelines and regulations. All study participants provided informed consent.