Background

Heredity represents a major cause of colorectal cancer (CRC) with at least 20% of the cases estimated to develop due to genetic factors and about 5% being linked to inherited variants in cancer-predisposing genes [1,2,3,4]. Currently, patients with CRC are referred to germline mismatch repair (MMR) testing based on the identification of high-risk phenotypic features (i.e. early age of onset, family history, clinical criteria), but beyond microsatellite instability (MSI) and MMR immunohistochemistry (IHC) testing for Lynch syndrome (LS), no systematic approach to hereditary risk assessment exists [5].

LS is caused by a defective MMR system due to the presence of germline defects in at least one of the MMR genes (MLH1, MSH2, MSH6 or PMS2) or to deletions of the 3′ portion of the EPCAM gene [6]. LS is clinically classified according to the Amsterdam (AMS) criteria and/or the Bethesda guidelines, both of which rely on clinical information and family history. The Bethesda guidelines also take into account the MSI signature characteristic of MMR-deficient tumors [7,8,9,10]. LS patients have an increased lifetime risk of CRC (70–80%), endometrial cancer (50–60%), stomach cancer (13–19%), ovarian cancer (9–14%), cancers of the small intestine, the biliary tract and brain as well as carcinoma of the ureters and renal pelvis [11].

However, a high proportion of cases who meet the clinical criteria for LS (~ 60%) do not carry pathogenic variants in the MMR genes and have been reported as familial colorectal cancer type X (FCCTX) or Lynch-like syndrome (LLS) according to their MSI status [12,13,14,15,16]. The genetic mechanisms are undetermined in the majority of these families [14].

DNA sequencing (DNA-seq) studies using multigene panels have reported that as many as ~18% of patients diagnosed with CRC below the age of 50 years have pathogenic variants in several genes that are not traditionally associated with CRC (ATM, CHEK2, BRCA1, BRCA2, CDKN2A and PALB2) [5, 17]. Notably, there is a need to determine whether these variants contribute to hereditary CRC risk via the combination of low- and moderate-penetrance susceptibility alleles [5, 17, 18].

Given the high frequency and wide spectrum of pathogenic variants, it has been suggested that genetic counseling and testing with a multigene panel should be considered for all patients with early-onset CRC [17, 19,20,21,22,23]. Importantly, the identification of high-risk CRC patients is a major issue, because morbidity and mortality from CRC and extracolonic cancers in these patients and their relatives can be decreased by early screening and intensive surveillance [19, 24,25,26].

In an effort to discover inherited genetic variants that influence the biological and clinical characteristics of familial CRC in unrelated high-risk patients who had previously tested negative for pathogenic variants in MMR genes, we examined 44 cancer-associated genes using next generation sequencing (NGS) and applied a minigene-based assay to analyze the impact of a subset of genetic variants on RNA splicing.

Methods

Study population

The Hereditary Cancer Biobank of the Norwegian Radium Hospital was used to identify unrelated high-risk CRC individuals from families that fulfilled the AMS criteria or the revised Bethesda guidelines [7,8,9,10, 27]. Standard diagnostic clinical techniques had shown that none of the study subjects carried pathogenic variants or large genomic rearrangements in MMR genes (MLH1, MSH2, MSH6 or PMS2).

Ethical approval for the study was granted by the Norwegian Data Inspectorate and Ethical Review Board (ref 2015/2382). All examined patients signed an informed consent for their participation in the study.

Targeted sequencing

Genomic DNA was isolated from peripheral blood samples, and targeted sequencing was carried out using a TruSeq amplicon-based assay v.1.5 on a MiSeq instrument, as previously described [28, 29]. The 44-gene panel used in this study includes genes associated with cancer predisposition, as described in a prior study [28, 29].

Sequencing data analysis

Paired-end sequence reads were aligned to the human reference genome (build GRCh37) using the BWA-mem algorithm (v.0.7.8-r55) [30]. The initial sequence alignments were converted to BAM format and subsequently sorted and indexed with SAMtools (v.1.1) [30]. Genotyping of single nucleotide variants (SNVs) and short indels was performed with GATK’s HaplotypeCaller. Filtering of raw genotype calls and assessment of callable regions/loci were done according to GATK’s best practice procedures, as described in more detail previously [28].

Variants were annotated using ANNOVAR (version November 2015) [31] and were queried against a range of variant databases and protein resources, namely dbSNP (build 147) [32], 1000 Genome Project phase3 [33], Exome Aggregation Consortium (ExAC) (http://exac.broadinstitute.org, accessed August 2015) [34], Genome Aggregation Database (gnomAD) (http://gnomad.broadinstitute.org, accessed October 2017) [34], Norwegian Germline Variations Database (http://norgene.no/vcf-miner/, accessed October 2017), ClinVar (May 2016) [35], UniProt Knowledgebase (release March 2016) [36] and the Pfam protein domain database (v29, December 2015) [28, 37].

Nomenclature and classification of genetic variants

The nomenclature guidelines of the Human Genome Variation Society (HGVS) were used to describe the detected genetic variants [38]. The recurrence of the identified variants was established by interrogating four databases (in their latest releases as of November 2016): the Leiden Open Variation Database (LOVD), the Universal Mutation Database (UMD), ClinVar and the Human Gene Mutation Database (HGMD). The variants were classified according to the 5-tier classification system into the following categories: class 5 (pathogenic), class 4 (likely pathogenic), class 3 (uncertain variants or variants of unknown significance, VUS), class 2 (likely not pathogenic) and class 1 (not pathogenic) [3].

In silico analyses of VUS

Two types of bioinformatics methods were used to predict the impact of selected variants on RNA splicing. First, we used MaxEntScan (MES) and SSF-like (SSFL) to predict variant-induced alterations in 3′ and 5′ splice site strength, as described by Houdayer et al. 2012 [39], except that here both algorithms were interrogated using the integrated software tool Alamut Batch version 1.5 (Interactive Biosoftware, http://www.interactive-biosoftware.com). Second, for prediction of variant-induced impact on exonic splicing regulatory elements (ESR), we used the ΔtESRseq- [40], ΔHZei- [41] and SPANR-based [42] approaches, as described by Soukarieh et al. [43]. Score differences (Δ) between variant and wild-type (WT) cases were taken as proxies for the probability of a splicing defect. More precisely, a variant mapping at a splice site was considered likely to impair exon inclusion if ΔMES ≥ 15% and ΔSSFL ≥ 5% [39], whereas an exonic variant located outside the splice sites was considered a probable inducer of exon skipping if all three ESR-dedicated in silico tools returned negative Δ scores below the following thresholds: < −0.5 for ΔtESRseq-, < −10 for ΔHZei- and < −0.5 for SPANR-based scores. In addition, we evaluated the possibility of variant-induced de novo splice sites by taking into consideration local changes in MES and SSFL scores. In this case, variants located outside the splice sites were considered likely to create a competing splice site if local MES scores were equal to or greater than those of the corresponding reference splice site for the same exon.
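The decision rules above can be summarized in a short sketch. This is an illustration only, not the code used in the study: the function names and input layout are hypothetical, and ΔMES and ΔSSFL are assumed here to be percentage decreases relative to a positive WT score.

```python
# Thresholds quoted in the text; all names below are hypothetical.
MES_DROP_MIN = 15.0    # minimum percent decrease in MaxEntScan score
SSFL_DROP_MIN = 5.0    # minimum percent decrease in SSF-like score
ESR_CUTOFFS = {"tESRseq": -0.5, "HZei": -10.0, "SPANR": -0.5}


def percent_decrease(wt: float, variant: float) -> float:
    """Drop of a splice-site score as a percentage of the WT score.

    Assumes wt > 0 (an assumption of this sketch, not stated in the text).
    """
    return (wt - variant) / wt * 100.0


def weakens_splice_site(mes_wt, mes_var, ssfl_wt, ssfl_var) -> bool:
    """Splice-site variant rule: both MES and SSFL must drop enough."""
    return (percent_decrease(mes_wt, mes_var) >= MES_DROP_MIN
            and percent_decrease(ssfl_wt, ssfl_var) >= SSFL_DROP_MIN)


def predicts_exon_skipping(deltas: dict) -> bool:
    """Exonic variant rule: all three ESR tools must fall below their cutoff."""
    return all(deltas[tool] < cutoff for tool, cutoff in ESR_CUTOFFS.items())


def creates_competing_site(local_mes_variant: float, reference_mes: float) -> bool:
    """De novo site rule: local MES at least as strong as the reference site."""
    return local_mes_variant >= reference_mes
```

Requiring agreement of all three ESR tools makes the exon-skipping call conservative: a single score above its cutoff is enough to leave the variant unflagged.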

In silico protein impact predictions of missense variants were performed with Align-GVGD (VUS were predicted as deleterious when graded C35 or higher), SIFT and MAPP using Alamut Batch version 1.4.4 (Interactive Biosoftware), and additionally with PolyPhen-2 and MutationTaster [44,45,46,47,48].
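As a rough illustration of how calls from the five tools might be combined, the sketch below applies the C35 cutoff to an Align-GVGD grade and labels a variant only when every tool agrees. The unanimity rule and all function names are assumptions made for this example; the per-tool deleterious/benign calls are taken as given inputs.

```python
# Align-GVGD grades ordered from least to most likely deleterious.
GVGD_GRADES = ["C0", "C15", "C25", "C35", "C45", "C55", "C65"]


def gvgd_is_deleterious(grade: str) -> bool:
    """Apply the cutoff from the text: C35 or higher counts as deleterious."""
    return GVGD_GRADES.index(grade) >= GVGD_GRADES.index("C35")


def consensus(calls: dict) -> str:
    """Combine per-tool calls (True = deleterious, False = benign).

    Hypothetical rule: a label is returned only when all tools agree.
    """
    values = set(calls.values())
    if values == {True}:
        return "deleterious"
    if values == {False}:
        return "benign"
    return "discordant"


example_calls = {
    "Align-GVGD": gvgd_is_deleterious("C65"),
    "SIFT": True,
    "MAPP": True,
    "PolyPhen-2": True,
    "MutationTaster": True,
}
print(consensus(example_calls))  # prints "deleterious"
```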

Cell-based minigene splicing assays

To determine the impact of selected exonic variants on splicing, we performed functional assays based on the comparative analysis of the splicing pattern of WT and mutant reporter minigenes, as follows. First, genomic regions containing the exon of interest (internal exons only) and at least 150 nucleotides of the flanking introns were amplified by PCR [49] using patients’ DNA as template and the primers indicated in Additional file 1: Table S1. Next, representative minigenes were created by inserting the PCR-amplified fragments into a previously linearized pCAS2 vector [43]. All constructs were sequenced to ensure that no unwanted mutations had been introduced into the inserted fragments during PCR or cloning. Then, WT and mutant minigenes were transfected into HeLa cells grown in 12-well plates (at ~70% confluence) using the FuGENE 6 transfection reagent (Roche Applied Science). Twenty-four hours later, total RNA was extracted using the NucleoSpin RNA II kit (Macherey Nagel), and the minigenes’ transcripts were analyzed by semi-quantitative RT-PCR using the OneStep RT-PCR kit (Qiagen), as previously described [43]. The sequences of the RT-PCR primers are shown in Additional file 1: Table S1. RT-PCR products were then separated by electrophoresis on a 2.5% agarose gel containing EtBr and visualized by exposure to UV light under saturating conditions using the Gel Doc XR image acquisition system (Bio-Rad), followed by gel purification and Sanger sequencing for proper identification of the minigenes’ transcripts. Finally, splicing events were quantitated by performing equivalent fluorescent RT-PCR reactions followed by capillary electrophoresis on an automated sequencer (Applied Biosystems) and computational analysis using the GeneMapper v5.0 software (Applied Biosystems).
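The quantification step can be pictured with a minimal sketch. It assumes that relative exon inclusion is computed from the peak areas of the exon-inclusion and exon-skipping products and then averaged across independent experiments; the exact formula used with GeneMapper is not stated in the text, so the function names and the calculation are illustrative only.

```python
def percent_exon_inclusion(inclusion_area: float, skipping_area: float) -> float:
    """Share of the exon-inclusion peak in the total signal, in percent."""
    total = inclusion_area + skipping_area
    if total <= 0:
        raise ValueError("no fluorescent signal to quantify")
    return 100.0 * inclusion_area / total


def mean_inclusion(replicates) -> float:
    """Average exon inclusion over (inclusion_area, skipping_area) pairs."""
    values = [percent_exon_inclusion(inc, skip) for inc, skip in replicates]
    return sum(values) / len(values)


# Hypothetical peak areas from two independent experiments.
wt_replicates = [(900.0, 100.0), (800.0, 200.0)]
print(mean_inclusion(wt_replicates))  # prints 85.0
```

Comparing the mean value for a mutant minigene with that of its WT counterpart would then indicate whether the variant shifts the balance toward exon skipping.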

Results

Clinical characteristics and family history

Upon querying the Hereditary Cancer Biobank of the Norwegian Radium Hospital for cases that fulfill the AMS and/or the revised Bethesda guidelines, we identified 34 unrelated potential high-risk CRC individuals who did not carry pathogenic variants in MMR genes. The median age at first CRC diagnosis was 51.5 years (range: 34–86 years).

Pedigree information showed that 13 (38%) families fulfilled the AMS I and/or II criteria and the revised Bethesda guidelines while 21 (62%) met the revised Bethesda guidelines only (Table 1). Fifteen (44%) patients had tumors with MSI and/or MMR IHC data available, of which 2 (13%) were MSI-high and/or MMR deficient. Clinical, family and tumor data information is detailed in Table 1.

Table 1 Summary of International Classification of Diseases, 9th Revision (ICD9), gender, age at diagnosis, clinical criteria and tumor molecular characteristics of the familial CRC families

Germline findings

Given that the families that fulfilled the AMS criteria and/or the Bethesda guidelines did not carry pathogenic variants in the MMR genes, we hypothesized that other genes could be implicated in the genetic determinism of these phenotypes.

To pursue this hypothesis, we collected DNA samples from all probands and performed high-throughput sequencing of a panel of 44 cancer-associated genes. For the 34 samples, the mean depth of coverage ranged from 127 to 507, with the fraction of target bases with coverage ≥25 ranging from 80% to 93%. The NGS results revealed that each individual carried an average of 26 SNVs (between 19 and 33 per individual) in the set of 44 cancer susceptibility genes, most of which were common polymorphisms (allele frequency ≥ 1% in the general population) according to the ExAC database, and some of which were classified as benign or likely benign (class 1 or class 2) according to either ClinVar or the American College of Medical Genetics and Genomics (ACMG) guidelines [35, 50] (Table 2).

Table 2 Characterization of germline variants found among Norwegian familial CRC individuals

Importantly, we identified a likely pathogenic variant in a moderate-penetrance gene (CHEK2 c.470T>C, p.I157T) in a female patient (Patient 19,609) who was diagnosed with colon cancer at 42 years, melanoma at 44 years and breast cancer (BC) at 57 years, had a proficient IHC MMR profile and fulfilled the revised Bethesda guidelines (Table 1).

The CHEK2 c.470T>C variant has been classified as pathogenic according to the ACMG guidelines [51], and has a lower allele frequency in the Norwegian population (1.89 × 10⁻³) than in the non-Finnish European population (5.4 × 10⁻³) (http://norgene.no/vcf-miner/ and gnomAD database, respectively) [34, 35, 50]. The variant is reported in ClinVar as “conflicting interpretations of pathogenicity, risk factor” (Variation ID: 5591). When the revised Bethesda guidelines were considered, the mutation detection rate was thus 4.8% (1/21).

Overall, 25 unique VUS were found in 18 out of the 34 patients (Table 2). The detected VUS were distributed among 17 different genes: MAP3K1 (in 2 patients), NBN (in 3 patients), NOTCH3 (in 3 patients), RAD51B (in 3 patients), MSH2 (in 2 patients), PALB2 (in 2 patients), POLE (in 2 patients) and the remaining were found in APC, ATM, AXIN2, BRCA1, CHEK2, EPCAM, MSH6, MUTYH, RAD51C and STK11 (Table 2). The minor allele frequency (MAF) values of these variants were very low, or no frequency data had been reported.

Protein and splicing-dedicated in silico analyses

The 25 unique VUS were analyzed by using five in silico prediction tools with different underlying algorithms to estimate the impact of the variants on the structure and function of the corresponding proteins.

The 5 prediction tools were concordant for 2 out of the 25 VUS, suggesting a potentially damaging effect at the protein level for MUTYH c.812G>A (p.R271Q) and MSH2 c.128A>G (p.Y43C) (Table 3). On the other hand, 6 out of 25 VUS were consistently predicted as benign: NBN c.1720T>A (p.L574I), BRCA1 c.4315C>T (p.L1439F), MAP3K1 c.764A>G (p.N255S), CHEK2 c.74T>C (p.V25A), PALB2 c.232G>A (p.V78I) and APC c.4334C>T (p.T1445I). Discrepancies were pronounced for the variants in the POLE (n = 2), STK11, MAP3K1, PSMC3IP, RAD51C, MSH6, AXIN2, MSH2, NBN, NOTCH3, RAD51B, PALB2 and EPCAM genes (Table 3).

Table 3 In silico data obtained for the variants of unknown significance (VUS) identified in our study of familial CRC individuals

Two out of the 25 VUS were bioinformatically predicted to affect RNA maturation by potentially modifying splicing signals (Table 3). More specifically, according to our in silico results, NOTCH3 c.5854G>A was predicted to potentially induce exon 32 skipping by alteration of exonic splicing regulatory elements, whereas MAP3K1 c.764A>G was predicted to introduce a deletion of the first 131 nucleotides of exon 3 (r.634_764del) due to the creation of a putative new acceptor splice site. Skipping of NOTCH3 exon 32 would produce a transcript with a frameshift deletion of 98 nucleotides (NOTCH3 r.5816_5913del), potentially leading to the production of a carboxy-terminally truncated NOTCH3 protein p.(Lys1940Glyfs*14). The MAP3K1 r.634_764del transcript would be expected to be degraded by nonsense-mediated decay and/or result in a very short MAP3K1 protein p.(Val212Leufs*45). NOTCH3 c.5854G>A was identified in two patients (Patients 3222 and 4932), who fulfilled the revised Bethesda guidelines and the AMS criteria, respectively, while MAP3K1 c.764A>G was detected in a patient (Patient 21,368) whose family fulfilled the revised Bethesda guidelines (Table 1).
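The frameshift reasoning for the two predicted transcripts can be checked with simple arithmetic: a deletion spanning RNA positions start to end removes end − start + 1 nucleotides, and the reading frame is disrupted whenever that length is not a multiple of 3. The helper names below are hypothetical and serve only this worked check.

```python
def deleted_length(start: int, end: int) -> int:
    """Number of nucleotides removed by an r.start_enddel deletion."""
    return end - start + 1


def is_frameshift(length: int) -> bool:
    """A deletion shifts the reading frame unless its length is divisible by 3."""
    return length % 3 != 0


# Predicted transcripts named in the text.
notch3 = deleted_length(5816, 5913)   # NOTCH3 r.5816_5913del
map3k1 = deleted_length(634, 764)     # MAP3K1 r.634_764del

print(notch3, is_frameshift(notch3))  # prints: 98 True
print(map3k1, is_frameshift(map3k1))  # prints: 131 True
```

Both lengths leave a remainder of 2 when divided by 3, which is consistent with the frameshift protein annotations given for the two predicted transcripts.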

Minigene splicing assays

Because patient RNA was not available, we experimentally assessed the impact that these 2 variants (NOTCH3 c.5854G>A and MAP3K1 c.764A>G) might have on RNA splicing by performing cell-based minigene splicing assays.

As shown in Fig. 1, NOTCH3 c.5854G>A and MAP3K1 c.764A>G did not modify the splicing pattern of the minigenes’ transcripts. These data thus disagree with the in silico predictions and suggest either that exon 32 of NOTCH3 and exon 3 of MAP3K1 are refractory to splicing mutations (the predictions thus being incorrect) or that the minigenes used in our study do not fully reproduce the splicing pattern of the mutant exons in bona fide NOTCH3 and MAP3K1 transcripts (the predictions possibly being correct). Complementary studies using RNA from NOTCH3 c.5854G>A and MAP3K1 c.764A>G carriers need to be performed to verify the pertinence of these results.

Fig. 1 Evaluation of variant-induced splicing alterations by using a cell-based minigene assay. a Structure of pCAS2 minigenes used in the splicing reporter assay. The bent arrow indicates the CMV promoter, boxes represent exons, lines in between the boxes indicate introns, and arrows below the exons represent primers used in RT-PCR reactions. The minigenes were generated by inserting a genomic fragment containing the exon of interest together with its flanking intronic sequences into the intron of pCAS2, as described under Materials and Methods. b Analysis of the splicing pattern of pCAS2 minigenes carrying variants identified in this study. Wild-type (WT) and mutant constructs, as indicated, were introduced into HeLa cells and the transcripts of the minigenes were analyzed by RT-PCR 24 h post-transfection. The image shows the results of a representative experiment in which the RT-PCR products were separated on a 2.5% agarose gel stained with EtBr and visualized by exposure to ultraviolet light. M, 100 bp DNA ladder (New England Biolabs). c Quantification of splicing events observed in the minigene splicing assay. The relative levels of exon inclusion indicated under the gel are based on RT-PCR experiments equivalent to those shown in b but performed with a fluorescent forward primer and then separated on an automated sequencer. Quantification results were obtained by using the GeneMapper v5.0 software (Applied Biosystems) and correspond to the average of two independent fluorescent-RT-PCR experiments. d Representative fluorescent RT-PCR experiment. The panel shows superposed peaks corresponding to the WT and mutant products (in blue and red, respectively), as indicated

Discussion

The major unexpected finding in our Norwegian high-risk CRC cohort was the detection of a likely pathogenic variant in CHEK2 (c.470T>C, p.I157T), a moderate-penetrance gene not traditionally associated with CRC, in an individual with an LS-evocative personal/family history and a high number of class 3 variants in BC- and CRC-associated genes. Interestingly, CHEK2 c.470T>C (p.I157T) has an allele frequency of 1.89 × 10⁻³ in the Norwegian population (http://norgene.no/vcf-miner/), and is reported in ClinVar as having conflicting interpretations of pathogenicity/being a risk factor (Variation ID: 5591). Importantly, there is no systematic classification for most of the genetic variants found by NGS, and, in more general terms, the impact of low- to moderate-penetrance pathogenic variants with respect to clinical management is not fully understood [52]. Co-segregation or case-control studies will be key to understanding whether such germline variants have a modifying effect, since we do not yet have evidence-based guidelines for the majority of these genes.

On the other hand, CHEK2 germline variants have been described to confer an elevated risk of BC (relative risk = 3.0) [53]. However, the presence of pathogenic variants in CHEK2 is not frequently associated with cancer in high-risk BC families, prompting speculation that there may be several low-penetrance or moderate-penetrance BC risk genes segregating independently within these families [23, 54, 55]. Co-segregation analyses may help clarify whether this germline variant is implicated in CRC predisposition. Finally, we did not find pathogenic variants in POLE in our cohort, which is in contrast to what has been described in families with a high burden of CRC adenomas and carcinomas in addition to extra-colonic cancers [56].

According to the Prospective LS Database (PLSDB), a total of 125 Norwegian families had a demonstrated pathogenic variant in either MLH1 (n = 21), MSH2 (n = 52), MSH6 (n = 36), or PMS2 (n = 16) [25]. On the other hand, a large portion of high-risk CRC families without pathogenic variants in MMR or EPCAM genes may be explained by a polygenic model involving a combination of multiple genomic risk factors, including the effect of either low-penetrance susceptibility alleles [57], high-penetrance genes which have not been tested, or the effect of environmental factors. In addition, emerging data suggest that CRC cases negative for pathogenic MMR variants may contain a significantly higher number of copy-neutral loss of heterozygosity (cnLOH) regions, some located within well-known oncogenes and tumor suppressor genes, compared to cases of sporadic CRC [58]. These genomic variations, which were not investigated in this study, may provide an additional explanation for high-risk CRC phenotypes without MMR or EPCAM pathogenic variants.

Recent NGS studies described the presence of heterozygous pathogenic BRCA1/2 or APC variants as well as biallelic MUTYH alterations in individuals with clinical features resembling those of LS [5, 22]. More precisely, those studies reported that 7% of patients with CRC carried pathogenic variants in non-LS genes, including 1.0% with BRCA1/2 mutations, and nearly two thirds of probands with high-penetrance non-LS mutations lacked clinical histories suggestive of their respective syndromes [5].

Among the 34 high-risk CRC individuals, our NGS panel testing identified one patient who carried a pathogenic variant in a gene with reportedly moderate penetrance. This finding is in line with the mutation frequency (6%) reported in non-LS cancer susceptibility genes for individuals undergoing LS genetic testing [21] and with the 4% reported for patients with BC who tested negative for BRCA1/2 variants [23]. Our results may have implications for appropriate genetic counseling and follow-up of the patients and their family members.

Besides the likely pathogenic CHEK2 variant, we identified a total of 25 variants in our cohort for which few data were available regarding their clinical significance. We thus undertook bioinformatics analyses in an attempt to predict the biological impact of these class 3 variants, both at the RNA and protein level, the ultimate goals being: (i) to discriminate pathogenic from non-pathogenic alterations in this set of variants and (ii) to further pinpoint the genetic determinants of high-risk CRC in our cohort. On one hand, our RNA splicing-dedicated bioinformatics evaluation predicted that 2 out of the 25 VUS identified in this study (NOTCH3 c.5854G>A, p.V1952M and MAP3K1 c.764A>G, p.N255S) could potentially affect RNA splicing. These two variants were then experimentally analyzed by performing minigene splicing assays. Our results revealed that neither variant altered the splicing pattern of the representative minigenes, suggesting that they do not affect the splicing of NOTCH3 or MAP3K1 transcripts. Additional experiments based on the analysis of RNA from carriers of these variants will be important to verify our minigene results. On the other hand, our protein-dedicated bioinformatics analysis yielded 8 consistent predictions (2 VUS predicted as deleterious and 6 as benign) and several conflicting results that were not explored further.

In this scenario, not only functional tests, but also co-segregation studies will be key to understanding whether the VUS detected in this work are non-pathogenic or otherwise have a causal or a modifying effect. Importantly, we do not yet have evidence-based guidelines for the majority of the genes carrying the VUS identified in this study and, in more general terms, the impact of low- to moderate-penetrance pathogenic variants with respect to clinical management is not fully understood. Most of these variants may in the future be reclassified as deleterious or benign, but in the meantime, they cannot be used to make clinical decisions [59]. Informed (re)classification of VUS in cancer-associated genes may cater to more appropriate risk-management, and may provide significant clues for the identification of additional patients carrying such uncommon variants.

NGS panel testing may benefit patients with a personal or family history compatible with more than one recognized inherited CRC syndrome. A CRC risk management strategy for these individuals is not yet available, and there is a need to identify new high-, moderate- and low-penetrance gene variants that may affect the risk of CRC or LS-associated tumors in individuals who do not carry pathogenic MMR variants. The identification of such gene variants in combination with family history may contribute to more intensive surveillance and improved prevention [23].

Conclusions

Our study provides information on genetic loci that may be related to cancer susceptibility, suggesting that genes not currently tested routinely may be important for capturing cancer predisposition in these patients. In addition, we stratified 25 VUS using RNA splicing- and protein-dedicated in silico analyses. Further studies are necessary to make reliable estimates of cancer risk for the VUS found in this study and to allow appropriate genetic counseling for the patients and their relatives.

Surveillance for early cancer detection is essential to ensure optimal survival for patients afflicted with familial cancers. Our findings highlight the need for more studies to unravel the mechanisms underlying the development of CRC in high-risk patients and to identify new cancer predisposition genes.