Introduction

Breast cancer (BC) is the most common cancer in women and the leading cause of cancer death1. Hereditary risk factors account for many cases, and having a close relative with BC increases the risk substantially. It is estimated that about 20–30% of all new BC cases are due to hereditary risk factors2,3 and that high-risk genes account for 5–10%2.

Analyzing families to identify variants shared by affected individuals led to the identification of the BRCA1 and BRCA2 genes4,5,6. The BRCA1/2 genes are the most common high-risk genes, accounting for 15% of the familial cases, and confer up to a 60–85% risk of developing BC6. Additional high-risk genes have been identified by the candidate gene approach, in which genes with a function that could contribute to BC, such as DNA repair, were screened, resulting in the identification of CHEK2, ATM, PALB2, and BRIP17.

Despite the identification of strong genetic risk factors, many BC cases have no known genetic cause. With improved technology and increased sample collection, extensive genome-wide association studies (GWAS) have linked more than 170 genomic loci to increased risk of BC8,9,10,11,12,13. However, the variants at these loci are common and confer a low risk of BC.

Here, we performed exome sequencing on 59 BC patients from 24 Swedish families with the aim of identifying variants that could contribute to BC. First, pathogenic variants in known BC susceptibility genes were analyzed. Second, rare and high-impact variants in new BC candidate genes shared by all affected family members were identified.

Material and methods

Families

The individuals in this study were BC patients from families that had undergone genetic counseling at the Department of Clinical Genetics, Karolinska University Hospital Solna, Sweden. All families comprised at least three close relatives with BC (range 3–8 BC patients). As a part of the study, additional family members were recruited when possible. For each family, one to four individuals were whole-exome sequenced, resulting in 59 BC patients from 24 families, represented by 1st to 4th degree relatives. In total, the study used three families with four sequenced individuals (WES-4s, average age of onset 50.8 ± 6.7 years, consisting of 1st to 3rd degree relatives), six families with three sequenced individuals (WES-3s, average age of onset 49.4 ± 11.9 years, consisting of 1st to 4th degree relatives), 14 families with two sequenced individuals (WES-2s, average age of onset 49.4 ± 10.9 years, consisting of 1st to 4th degree relatives) and one family with one sequenced individual (WES-1s, age of onset 47 years).

All patients gave written informed consent to participate in the study and to donate blood samples. The study was approved by the regional ethics committee in Stockholm. All methods were conducted in accordance with the Declaration of Helsinki guidelines.

Exome sequencing of BC patients

DNA was quantified using a Qubit Fluorometer (Life Technologies, US). Sequencing libraries were prepared according to the TruSeq DNA Sample Preparation Kit EUC 15005180 or EUC 15026489 (Illumina, US), and samples were sequenced to an average coverage of 100×. Briefly, 1–1.5 µg of genomic DNA was fragmented (Covaris, Inc., US) and all samples were subjected to end-repair, A-tailing, and adapter ligation (Illumina Multiplexing PE adapters). A gel-based size selection step was performed, and the adapter-ligated fragments were enriched by PCR, followed by purification using Agencourt AMPure Beads (Beckman Coulter, Sweden). Exome capture was performed by pre-pooling equimolar amounts and performing enrichment in 5- or 6-plex reactions according to the TruSeq Exome Enrichment Kit Protocol (EUC 15013230). Library size was analyzed on a Bioanalyzer High Sensitivity DNA chip (Agilent Technologies, Sweden) and the concentration was calculated by quantitative PCR. The pooled DNA libraries were clustered on a cBot instrument (Illumina) using the TruSeq PE Cluster Kit v3. Paired-end sequencing was performed for 100 cycles using a HiSeq 2000 instrument (Illumina) with TruSeq SBS Chemistry v3, according to the manufacturer’s protocol. Basecalling was performed with RTA (1.12.4.2 or 1.13.48) and the resulting BCL files were filtered, de-multiplexed, and converted to FASTQ format using CASAVA 1.7 or 1.8 (Illumina).

Bioinformatics workflow

Sequencing reads were aligned to the reference genome GRCh37 using BWA14, and Picard (http://broadinstitute.github.io/picard/) was used to mark PCR-duplicated reads. Variants were called using GATK by following the best practice procedure implemented at the Broad Institute15. Variant annotation was done with ANNOVAR16, including RefSeq gene17 and dbSNP15018. Max minor allele frequency (MMAF) was calculated from the ExAC19, 200Danes20, SweGen21, and 1000 Genomes Project allele frequencies22. To assess and predict pathogenic effects of the variants, ClinVar23,24, ACMG classification25 and the in silico predictor tool CADD26 were used. CADD > 20 and CADD > 30 indicate the top 1% and 0.1% of most deleterious variants, respectively.
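As an illustration of the two annotation quantities above, the following minimal Python sketch computes an MMAF as the largest allele frequency reported across the reference databases and converts a scaled CADD score into the fraction of most deleterious variants it corresponds to. The function names, the dictionary layout and the choice to treat a variant absent from a database as having frequency 0 are assumptions made for this example and are not taken from the study pipeline. The conversion relies on the Phred-like scaling of CADD scores, score = -10 · log10(rank fraction).

```python
def max_minor_allele_frequency(freqs):
    """Return the largest allele frequency across reference databases.

    `freqs` maps a database name to a frequency, or None when the variant
    is absent from that database. Absent values count as 0.0, which is an
    assumption of this sketch.
    """
    values = [af if af is not None else 0.0 for af in freqs.values()]
    return max(values) if values else 0.0


def cadd_top_fraction(scaled_score):
    """Fraction of most deleterious variants implied by a scaled CADD score."""
    return 10 ** (-scaled_score / 10)


# Illustrative (hypothetical) frequencies for a single variant.
example = {"ExAC": 0.0004, "200Danes": None, "SweGen": 0.002, "1000G": 0.001}
mmaf = max_minor_allele_frequency(example)
```

With this scaling, thresholds of 20 and 30 correspond to fractions of 0.01 and 0.001, matching the 1% and 0.1% stated in the text.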

To exclude variants with missing data, BC genotype frequency (BC_GF) was calculated for every variant. A BC_GF of 0.8 indicates that a genotype was called for that particular variant in 80% of the patients. No alternative method was used to confirm the genetic variants identified in this study. The presence of high-risk variants was confirmed by manual inspection of the BAM files in the IGV software27.
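The BC_GF definition can be sketched in a few lines of Python. The genotype encoding (VCF-style strings, with None or "./." marking a missing call) and the function name are assumptions for this example; the study does not describe how missing genotypes were represented.

```python
MISSING = {None, ".", "./.", ".|."}  # assumed encodings of a missing call


def bc_genotype_frequency(genotypes):
    """Fraction of patients with a called genotype for one variant."""
    if not genotypes:
        raise ValueError("at least one patient is required")
    called = sum(1 for g in genotypes if g not in MISSING)
    return called / len(genotypes)


# Five hypothetical patients, two of them without a genotype call.
bc_gf = bc_genotype_frequency(["0/1", "0/0", None, "1/1", "./."])
keep_variant = bc_gf > 0.8  # threshold used in the variant selection
```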

Known BC-predisposing genes: variant selection

Variants in 15 BC and ovarian cancer (OC) genes commonly screened at Karolinska University Hospital as a part of genetic counseling (ATM, BRCA1, BRCA2, BRIP1, CHEK2, EPCAM, MLH1, MSH2, MSH6, NBN, PALB2, PMS2, RAD51C, RAD51D and TP53) were identified in the BC families. All variants that (1) had BC_GF > 0.8; (2) had MMAF < 0.2; (3) were not considered benign according to ClinVar; and (4) had CADD > 20 were selected for analysis.
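The four selection criteria for the gene panel can be expressed as a single predicate. The sketch below is illustrative only: the Variant record, its field names and the set of ClinVar labels counted as benign are assumptions of this example, and the example values are hypothetical.

```python
from dataclasses import dataclass

PANEL_GENES = {
    "ATM", "BRCA1", "BRCA2", "BRIP1", "CHEK2", "EPCAM", "MLH1", "MSH2",
    "MSH6", "NBN", "PALB2", "PMS2", "RAD51C", "RAD51D", "TP53",
}
# Assumed set of ClinVar labels treated as benign in this sketch.
BENIGN_LABELS = {"benign", "likely benign", "benign/likely benign"}


@dataclass
class Variant:
    gene: str
    bc_gf: float   # BC genotype frequency
    mmaf: float    # max minor allele frequency
    clinvar: str   # ClinVar significance label, "" when absent
    cadd: float    # scaled CADD score


def passes_known_gene_filter(v):
    """Apply criteria (1) to (4) to a variant in one of the 15 panel genes."""
    return (
        v.gene in PANEL_GENES
        and v.bc_gf > 0.8
        and v.mmaf < 0.2
        and v.clinvar.strip().lower() not in BENIGN_LABELS
        and v.cadd > 20
    )


# Hypothetical annotation values for illustration.
candidate = Variant("CHEK2", bc_gf=1.0, mmaf=0.002, clinvar="Pathogenic", cadd=35.0)
selected = passes_known_gene_filter(candidate)
```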

Novel BC-predisposing genes: variant selection

Variants that (1) were detected in all sequenced family members; (2) had BC_GF > 0.8; (3) had MMAF < 0.01; and (4) had CADD > 20 were selected for further analysis. Additionally, variants that (1) were detected in all sequenced family members of WES-3s and WES-4s; (2) had BC_GF > 0.8; (3) had MMAF < 0.001; and (4) had CADD > 25 were defined as high-risk variants.
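The two-tier selection described above can be sketched as follows. The dictionary keys, the tier labels "candidate" and "high-risk" as return values, and the sample identifiers are assumptions introduced for this example. Because the high-risk thresholds are stricter than the candidate thresholds, every high-risk variant also satisfies the candidate criteria.

```python
def classify_variant(variant, sequenced_members):
    """Return "high-risk", "candidate" or None for one variant in one family.

    `variant` is a dict with keys "carriers" (sample IDs carrying the
    variant), "bc_gf", "mmaf" and "cadd"; `sequenced_members` lists the
    sequenced individuals of the family.
    """
    members = set(sequenced_members)
    if not members or not members <= set(variant["carriers"]):
        return None  # not shared by all sequenced family members
    if variant["bc_gf"] <= 0.8:
        return None
    # The stricter tier applies only to families with three or four samples.
    if len(members) >= 3 and variant["mmaf"] < 0.001 and variant["cadd"] > 25:
        return "high-risk"
    if variant["mmaf"] < 0.01 and variant["cadd"] > 20:
        return "candidate"
    return None


# Hypothetical variant shared by three sequenced relatives.
v = {"carriers": {"s1", "s2", "s3"}, "bc_gf": 0.95, "mmaf": 0.0005, "cadd": 27.0}
tier = classify_variant(v, ["s1", "s2", "s3"])
```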

Ethical statements

All patients gave written informed consent to participate in the study and to donate blood samples. The study was approved by the research ethics committee at Karolinska Institutet and the regional ethics committee in Stockholm. All methods were conducted in accordance with the Declaration of Helsinki guidelines.

Results

Pathogenic variants in known BC-predisposing genes were seen in five BC families

Previously, only one affected individual in each family had been tested for variants in known BC and OC-predisposing genes. Therefore, we searched for variants in the 15 genes from the clinical panel (see “Material and methods” section) in all 59 BC patients from the 24 families.

In total, 10 variants were seen in 13 individuals from 9 families (Table 1). Three of the variants were known pathogenic variants: (1) c.2108delTinsGGA (rs786203384, p.(Lys703fs)) in the BRIP1 gene, (2) c.2748+1G>T (rs753153576) in the PALB2 gene, and (3) c.1100delC (rs555607708) in the CHEK2 gene. The BRIP1 frameshift variant and the PALB2 splice donor variant result in protein truncation and were observed in one family each. The CHEK2 variant c.1100delC was seen in four individuals from three different families: in one individual each from the WES-2 family Br15 and the WES-3 family Br7, and in two individuals from the WES-4 family Br1 (Table 1). Five additional missense variants listed as VUS (variant of uncertain significance) or as having conflicting interpretations of pathogenicity were detected in the BC families (Table 1).

Table 1 Variants in known BC and OC predisposing genes.

Since family members of families Br4 and Br16 carried clear pathogenic variants in the PALB2 and the BRIP1 genes, these two families were excluded from further analysis.

Nearly 40 pathogenic variants in novel BC candidate genes were seen in BC families

In the remaining 22 BC families we searched for new BC-predisposing genes. All variants that (1) were observed in all family members within each family; (2) had BC_GF > 0.8; (3) MMAF < 0.01 and (4) CADD > 20 were selected for further analysis.

In two families, the WES-4 and WES-2 families Br2 and Br18, no variants were observed after applying the criteria. In the remaining 20 families, 544 variants in 521 genes were observed (Tables 2, S1, S2), where the majority of the variants were missense (n = 506, Table S2). There were a total of 38 variants with potential pathogenic effect (stop-gain, splicing and frameshift indels), where 20 variants were detected in four of the 12 WES-2s families (Br10, Br12, Br21 and Br22) (Table S1).

Table 2 Overview of risk variants with MMAF < 0.01, CADD > 20 and shared by all family members.

Most of the deleterious variants were stop-gain (n = 22) and mainly detected in WES-2s families (n = 15). Two of the stop-gain variants were detected in genes that are involved in DNA damage response: (1) rs146594026 (c.C2152T, p.(Q718*)) in the EXO1 gene, detected in the WES-1 family Br24, and (2) rs147021911 (c.C5101T, p.(Q1701*)) in the FANCM gene, detected in the WES-2 family Br21 (Table S1). Furthermore, 29 missense variants with CADD > 30 were detected in the BC families (Table S2). Among those 29 were rs28363218 (c.C604T, p.(R202C)) in the RAD54L gene detected in the WES-2 family Br22, chr2:216240089A>G (c.T5462C, p.(F1821S)) in the FN1 gene detected in the WES-2 family Br12, rs544274181 (c.G1382A, p.(R461H)) in the MET gene detected in the WES-1 family Br24 and rs138942541 (c.G331T, p.(D111Y)) in the ECD gene detected in the WES-2 family Br23 (Table S2).

Recurrent genes, defined as genes segregating variants in more than one family, were seen among the BC families, where we identified 46 variants located in 23 genes (Table S3). Two of the variants were detected in two families each: (1) rs142493383 in the ALPP gene in families Br11 and Br24, and (2) rs200175537 in the CLEC16A gene in families Br12 and Br17. The DNAH14 and OBSCN genes harbored three missense variants each, while we observed two variants in each of the remaining 19 genes. All variants were missense, apart from one stop-gain variant seen in the LHCGR gene in the WES-2 family Br19, and two frameshift deletions in the RTN3 and TTLL12 genes seen in the WES-2 families Br22 and Br10, respectively (Table S3).
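The definition of a recurrent gene lends itself to a short grouping step. The sketch below assumes that the segregating variants are available as (gene, variant, family) records; this layout, the function name and the placeholder gene GENE_X are hypothetical. The two example variants and their families are those reported above.

```python
from collections import defaultdict


def recurrent_genes(records):
    """Map each gene with segregating variants in >1 family to its families.

    `records` is an iterable of (gene, variant_id, family) tuples, one per
    variant shared by all sequenced members of a family.
    """
    families_by_gene = defaultdict(set)
    for gene, _variant_id, family in records:
        families_by_gene[gene].add(family)
    return {
        gene: sorted(families)
        for gene, families in families_by_gene.items()
        if len(families) > 1
    }


records = [
    ("ALPP", "rs142493383", "Br11"),
    ("ALPP", "rs142493383", "Br24"),
    ("CLEC16A", "rs200175537", "Br12"),
    ("CLEC16A", "rs200175537", "Br17"),
    ("GENE_X", "var_1", "Br10"),  # placeholder gene seen in one family only
]
recurrent = recurrent_genes(records)
```

Note that a gene counts as recurrent both when the same variant is seen in two families and when different variants in that gene segregate in different families.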

To further identify the most likely high-risk variants, stricter criteria were applied to identify very rare and high-impact variants in the larger families with sequencing data from three or four family members. The variants with MMAF < 0.001 and CADD > 25 were considered the most likely high-risk variants. In total, 22 variants in 22 genes were identified in six of the nine WES-3s and WES-4s families (Table 3). All variants except one were detected in the WES-3 families, and all but two were missense. Most high-risk variants were detected in family Br7, followed by families Br9, Br5 and Br6 (n = 6, 5, 4 and 4, respectively) (Table 3). A stop-gain variant, rs143701013 (c.C889T, p.(R297*)), in the last exon of the ZNF563 gene was observed in the WES-4 family Br3, where one individual was a homozygous carrier, and a frameshift deletion, rs769623079 (c.631_632del, p.(C211fs)), was seen in exon 7 of the FANK1 gene in the WES-3 family Br5 (Tables 3, S1).

Table 3 High-risk variants with MMAF < 0.001, CADD > 25 and shared by all family members within families of 3 and 4 sequenced individuals.

FANCM stop-gain variant was observed in BC patients from four BC families

Finally, we searched for rare variants with high CADD that were observed in several BC patients, although not segregating within the families. In total, 15 variants with MMAF < 0.01 and CADD > 25 and detected in at least three families were seen (Table S4). A stop-gain variant, rs147021911 (c.C5101T, p.(Q1701*)) in the FANCM gene, was found in six individuals from four families. The variant was detected in both family members of the WES-2 family Br21, as well as in two individuals from the WES-4 family Br3 and one individual from each of the two WES-2 families Br18 and Br23 (Table S4). Similarly, a missense variant, rs1065746 (c.G3244C, p.(D1082H)) in the HTT gene, was observed in both family members of the WES-2 family Br20, as well as in two individuals from the WES-4 family Br3 and two individuals from the WES-3 family Br8 (Table S4). A missense variant, rs149133270 (c.G1379A, p.(R460Q)), in the MPO gene found in the WES-1 family Br24, was also seen in four individuals from two WES-4 families and one WES-3 family. The remaining seven variants were seen in three families each (Table S4).
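Counting carriers and families per variant, without requiring segregation, can be sketched as below. The record layout and the individual identifiers are hypothetical; the family assignments for the FANCM and HTT variants follow the description above.

```python
from collections import defaultdict


def summarize_carriers(records):
    """Return {variant_id: (n_individuals, n_families)}.

    `records` is an iterable of (variant_id, family, individual) tuples,
    one per carrier.
    """
    families = defaultdict(set)
    individuals = defaultdict(set)
    for variant_id, family, individual in records:
        families[variant_id].add(family)
        individuals[variant_id].add((family, individual))
    return {v: (len(individuals[v]), len(families[v])) for v in families}


def seen_in_at_least(records, min_families=3):
    """Variant IDs observed in at least `min_families` families."""
    summary = summarize_carriers(records)
    return sorted(v for v, (_, n_fam) in summary.items() if n_fam >= min_families)


# Individual labels ("a", "b") are placeholders, not real sample IDs.
carriers = [
    ("rs147021911", "Br21", "a"), ("rs147021911", "Br21", "b"),
    ("rs147021911", "Br3", "a"), ("rs147021911", "Br3", "b"),
    ("rs147021911", "Br18", "a"), ("rs147021911", "Br23", "a"),
    ("rs1065746", "Br20", "a"), ("rs1065746", "Br20", "b"),
    ("rs1065746", "Br3", "a"), ("rs1065746", "Br3", "b"),
    ("rs1065746", "Br8", "a"), ("rs1065746", "Br8", "b"),
]
summary = summarize_carriers(carriers)
```

For the FANCM variant this reproduces the six individuals from four families stated in the text.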

Discussion

To identify known and novel causative variants that could contribute to hereditary BC, we exome sequenced a selection of patients from 24 Swedish BC families. First, we screened for variants with a pathogenic or possible pathogenic consequence in known BC predisposing genes. Secondly, we searched for rare variants that segregated in BC families with predicted high impact and which could have contributed to the disease.

Three pathogenic variants in the BRIP1, PALB2 and CHEK2 genes were found in five families. Since loss-of-function variants in the BRIP1 and PALB2 genes increase the risk of BC and OC28,29, these variants were considered to be the main cause of the increased cancer risk in these two families. The c.1100delC variant in the CHEK2 gene is a well-known variant considered to confer an increased risk of BC30. However, the risk is considered moderate, and it cannot be concluded that this variant solely explains the BC risk in these families. Several variants with uncertain significance were detected in the BC families. However, further analyses are needed to determine their contribution to BC.

To identify new BC predisposing genes, strict filtering was performed on the remaining families. Variants shared by all family members and with deleterious effects or high CADD were used as criteria for possible high-risk predisposing variants in the families. In total, 38 deleterious variants and over 500 missense variants were seen in the families, most of them in WES-2s families. Of the 506 missense variants, 29 had CADD > 30 and were considered strong candidates to predispose to the disease.

We observed variants located within genes that have previously been linked to BC. The FANCM gene is part of the Fanconi anemia complementation group, which includes the well-known BC risk genes BRCA2, BRIP1 and PALB2. Like those genes, FANCM is involved in DNA double-strand break repair and has been linked to BC31,32,33. The stop-gain variant in the FANCM gene seen here in Swedish BC patients has previously been reported in BC patients31,32, including familial cases31, and has been suggested to be common in Finnish triple-negative BC patients32. Here, it was found in four families, although not in all family members, suggesting that this variant may be a risk factor for BC.

Several other interesting variants were seen in genes that could contribute to BC, such as the RAD54L and FN1 genes. The RAD54L gene is involved in DNA recombination, along with the RAD51C and RAD51D genes, and has been linked to BC34. The variant is located in exon 7, which contains helicase motifs I and Ia35,36. These motifs identify helicases and are important for protein function. The FN1 gene, which is involved in cell adhesion, the oncogene MET, and the ECD gene, a cell cycle regulator, are all interesting candidates and have previously been reported in BC37,38,39,40,41. Further studies are needed to understand their contribution to BC.

This study has several limitations that need to be considered. The cohort consists of a limited number of BC patients and families. Because the patients were exome sequenced, variants outside of the exons were not analyzed, and our analysis is limited to single nucleotide variants and smaller indels. Furthermore, strict selection criteria were applied to identify novel risk variants that are rare and assumed to have a high impact, thereby excluding more common variants that might contribute to the disease. Since part of our criteria was that variants needed to be shared by all sequenced family members, our results are biased towards more variants being detected in smaller families and in families containing close relatives. Finally, only affected family members were analyzed. Including unaffected family members could have improved variant filtering.

Conclusions

Identifying new risk genes is important for genetic counseling of BC families and for determining the cancer risk in family members. Here, we analyzed pathogenic variants in known and novel BC predisposing genes in families with a strong history of BC. Several interesting candidate genes were observed that could have contributed to the disease in these families. Further studies are needed to evaluate the contribution of those genes and variants to an increased BC risk.