Introduction

Around 6 % of colorectal cancers (CRC) arise within hereditary syndromes for which highly penetrant mutations are found in syndrome-specific genes [37]. Lynch syndrome, familial adenomatous polyposis (FAP), MUTYH-associated polyposis (MAP), juvenile polyposis syndrome (JPS), PTEN hamartoma tumor syndrome and Peutz–Jeghers syndrome (PJS) are among the best known.

The introduction of next generation sequencing (NGS), using whole-genome sequencing (WGS), whole-exome sequencing (WES) and multigene panels, has made it possible to identify a spectrum of new mutations and also new causative genes in hereditary CRC. New syndromes have been described, such as the recently reported polymerase proofreading-associated polyposis (PPAP) and NTHL1-associated polyposis [31, 50]. Established syndromes with unsolved causative genetic mechanisms are also gradually being explored, as is the case for hereditary mixed polyposis syndrome (HMPS) [12, 51].

CRC syndromes have historically been defined based on family history and/or genetics as well as tumor characteristics. For hereditary non-polyposis colorectal cancer, the Amsterdam criteria, tumor testing for microsatellite instability (MSI) and the presence of causative mutations in the mismatch repair (MMR) genes have been used for sub-classification [19]. Overlaps in mutation spectrum between polyposis and non-polyposis syndromes are also recognized. HMPS is characterized by the presence of polyps of several histological types localized to the large bowel. Adenomatous polyps as well as polyps of serrated or sessile serrated type can be present. JPS and HMPS may show overlapping phenotypes and may appear indistinguishable [13, 26]. In HMPS, duplications in the regulatory domain of GREM1 were recently identified [12, 34], but apart from these pathogenic GREM1 regulatory duplications, causative mutations in BMPR1A have also been found [4]. Mutations in BMPR1A have also been reported in hereditary non-polyposis CRC with microsatellite stable (MSS) tumors (FCC type-X) [8, 29]. The duplication upstream of GREM1 reported by our group in a family with a few polyps of a more juvenile histology demonstrates the complexity of the phenotype-based classification of this syndrome [34].

Multigene panel testing in Lynch syndrome has recently been used to identify mutations in unexpected highly penetrant cancer-predisposing genes (e.g. BRCA1 and BRCA2) [45, 54]. Several studies have also demonstrated cost benefits as well as a gain in mutation detection when using multigene panels compared to analyses of single genes or a few genes [9, 39, 40, 45].

Additional research is required to understand the genetic heterogeneity in these groups and the diversity in genotype to phenotype correlations. In our study we used a multigene CRC panel consisting of 19 high-risk and moderate-risk genes as well as clinically less well defined genes in the MMR system and the Wnt signaling pathway [28, 48]. The panel was applied to patients diagnosed with CRC divided into six clinical subgroups. The classification into subgroups was based on family history and/or phenotype of the disease. All patients had initially been referred for a specific diagnostic test, for either Lynch syndrome or a polyposis syndrome. The purpose of this study was to demonstrate the diagnostic difficulties associated with genotype to phenotype diversity. In this study, which also included screening for large deletions and duplications, we were able to demonstrate improved mutation detection frequencies compared to conventional multi-step analyses. The strategy also allowed for a reduction in costs compared to previously used screening procedures.

Materials and methods

Ninety-one index patients were included in this study. Clinical characteristics of the index patients and families are summarized in Supplementary Table 1. The medical records were reviewed and the patients were divided into six clinical subgroups. All the patients were originally referred for clinical FAP and/or Lynch syndrome mutation analyses during 2000–2015, but no mutations were identified in the genes analyzed. The subgroups were: (1) CRC familial or unknown inheritance, not polyposis, (2) unexplained adenomatous polyposis >100 polyps, inheritance, (3) unexplained adenomatous polyposis 1–100 polyps, inheritance, (4) unexplained adenomatous polyposis, unknown inheritance, (5) familial or simplex atypical polyposis/mixed polyposis/serrated polyposis and (6) polymerase proofreading-associated polyposis (PPAP). The study was approved by the local ethics committee at the University of Gothenburg, Sweden.

DNA extraction, amplification and Sanger sequencing

Genomic DNA was extracted using the BioRobot EZ1 (Qiagen, Hilden, Germany) with the EZ1 DNA Blood 350 µl kit (Qiagen). Amplification, purification and Sanger sequencing were carried out as described previously [14]. Primers used for direct sequencing were identical to those used in the amplification reactions. All primer information is available upon request. All variants found by capture NGS (Next Generation Sequencing) were confirmed with Sanger sequencing.

Library preparation, hybridization capture and MPS sequencing

DNA samples were quantified using the Qubit system (Life Technologies, Carlsbad, CA, USA). Two µg of DNA were fragmented using the Covaris S2 Ultrasonicator (Covaris, Woburn, MA, USA), and the samples were then analyzed on the Bioanalyzer (Agilent Technology, Santa Clara, CA, USA) for correct fragment sizes. The SureSelectXT Custom 3-5.9 Mb library kit (Agilent Technology, Santa Clara, CA, USA) was used for the capture and included 19 genes. APC, MUTYH, BMPR1A, SMAD4, STK11, PTEN, MLH1, MSH2, MSH6, PMS2, EPCAM and CDH1 were all high-risk genes. MLH3, MSH3, PMS1, AXIN2, CTNNB1, CHEK2 and MET were genes that are part of the MMR system or Wnt signaling and/or were found, at the time the test was developed, to confer an increased but not well defined risk [24, 28, 44, 53]. Regions of 50 kb upstream and downstream of all genes and all intronic regions were included in the target region. For the APC gene an additional region of 100 kb upstream was included, since causative deletions in the promoter region have been found [23, 32, 35, 41]. For the MET gene only coding exons were targeted. Eight samples were pooled before capture and the concentration of each pooled library was determined using the Qubit and the Bioanalyzer. Sequencing was performed on the Illumina HiSeq 2000 (Illumina, San Diego, CA, USA) with 2 × 94 or 2 × 97 bp paired-end reads.

Analysis of sequencing data

An in-house analysis pipeline was used in which the main steps after demultiplexing included read alignment to the reference human genome hs37d5ss (1000 genome with decoy sequences) by Novoalign, marking of PCR duplicates (Picard tools, http://picard.sourceforge.net) and quality score recalibration, indel realignment and variant calling performed with the Genome Analysis Tool Kit (GATK) package [27]. For all samples and positive controls variants were called with GATK UnifiedGenotyper with a call confidence of 10.

Copy number variation (CNV) analysis

The CNV analysis was based on read depth, where one read pair represents one data point in a sliding window over the target region. A normalized coverage depth ratio, including GC-normalization, between a sample and an average of 23 normal samples (baseline) was computed. Abnormal coverage ratios were detected by visual inspection of plots of the coverage ratios over the targeted regions. Deletions were detected as a lower coverage (cut-off 0.75) and duplications as a higher coverage (cut-off 1.25). Regions with an abnormal coverage ratio were further inspected in IGV (Integrative Genomics Viewer) and breakpoints were analyzed [33].
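The read-depth logic described above can be illustrated with a minimal Python sketch. It is not the in-house pipeline: GC-normalization is omitted, the per-window depths are assumed to be already counted, and the visual inspection used in the study is replaced here by simple thresholding for the sake of illustration. The function names (`normalize`, `coverage_ratios`, `classify`) are hypothetical; only the 0.75 and 1.25 cut-offs come from the text.

```python
from statistics import mean

DELETION_CUTOFF = 0.75     # ratio below this suggests a deletion (from the text)
DUPLICATION_CUTOFF = 1.25  # ratio above this suggests a duplication (from the text)


def normalize(depths):
    """Scale the window depths of one sample so that they average 1.0."""
    average = mean(depths)
    return [depth / average for depth in depths]


def coverage_ratios(sample_depths, baseline_depths):
    """Per-window ratio of a sample to the mean of the normalized baseline samples.

    sample_depths: read-pair counts per window for the tested sample.
    baseline_depths: list of such count lists, one per normal sample.
    """
    sample = normalize(sample_depths)
    normals = [normalize(depths) for depths in baseline_depths]
    ratios = []
    for index, value in enumerate(sample):
        baseline = mean([normal[index] for normal in normals])
        ratios.append(value / baseline)
    return ratios


def classify(ratio):
    """Label one window using the cut-offs stated in the text."""
    if ratio < DELETION_CUTOFF:
        return "deletion"
    if ratio > DUPLICATION_CUTOFF:
        return "duplication"
    return "normal"


# Made-up depths: the third window of the sample has roughly half the expected depth.
baseline = [[100, 100, 100, 100], [200, 200, 200, 200]]
sample = [100, 100, 50, 100]
calls = [classify(r) for r in coverage_ratios(sample, baseline)]
```

With only four windows the within-sample normalization is visibly skewed by the deleted window. Over a large target region this effect would be much smaller, which is one reason a sketch like this should not be read as a validated caller.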

Filtration of variants and databases

Variants in exons and in ± 20 bp flanking intronic sequences were evaluated for pathogenicity. Truncating nonsense variants, frameshift indels and variants located in consensus splice-acceptor and -donor sites were presumed and evaluated as disease causing. All other variants, synonymous and non-synonymous, were compared with the following public databases: the Single Nucleotide Polymorphism database (dbSNP) together with 1000 Genomes [1], the National Heart, Lung and Blood Institute (NHLBI) Exome Sequencing Project (ESP) (http://evs.gs.washington.edu/EVS/), ExAC (Exome Aggregation Consortium, Cambridge, MA (URL: http://exac.broadinstitute.org) [20 (02, 2014) accessed]), TCGA data (www.cbioportal.org) and in-house information. Variants with a minor allele frequency (MAF) of ≤1 % were further analyzed. The remaining variants, including likely benign variants, were treated as polymorphisms. Thirty-two missense variants with an MAF ≤1 % were classified based on three in silico protein prediction tools: SIFT, PolyPhen-2 and CADD. SIFT (Sorting Intolerant From Tolerant) predicts a mutation to be damaging if the score is ≤0.05 and tolerated if the score is >0.05 [20]. PolyPhen-2 (Polymorphism Phenotyping version 2) predicts probably damaging and possibly damaging mutations, with a higher confidence if values are near 1 [3]. The Combined Annotation Dependent Depletion (CADD) is a method to measure deleteriousness by comparing the annotation of fixed or almost fixed derived alleles with those of simulated variants. Several parameters are taken into account when using CADD, including allelic diversity, annotation and functionality, pathogenicity, disease severity, experimentally measured regulatory effects, complex trait associations and highly ranked known pathogenic variants within individual genomes. Variants that are more likely to be observed in the genome are proposed to be more benign, while variants that are more likely to be simulated (not observed) are proposed to have a more deleterious effect. This is measured as a C-score on a Phred-like scale, where a score of 10 represents the 10 % most deleterious substitutions that can be made to the human genome and a score of 20 represents the 1 % most deleterious variants. A higher score is associated with a higher probability of a deleterious effect, with a recommended cut-off at 15 [16].
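The filtering thresholds and the Phred-like scaling described above can be summarized in a short Python sketch. This is an illustration under stated assumptions, not the procedure used in the study: the `Variant` record, the function names and the example values are hypothetical, PolyPhen-2 is left out, and treating a CADD score equal to the cut-off as deleterious is an assumption. The three numeric cut-offs are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

MAF_CUTOFF = 0.01    # MAF <= 1 % is analyzed further (from the text)
SIFT_CUTOFF = 0.05   # SIFT score <= 0.05 is predicted damaging (from the text)
CADD_CUTOFF = 15.0   # recommended C-score cut-off (from the text)


@dataclass
class Variant:
    label: str            # hypothetical identifier
    maf: Optional[float]  # allele frequency as a fraction; None if absent from the databases
    sift: float
    cadd: float


def passes_maf_filter(variant: Variant) -> bool:
    """Keep variants that are absent from the databases or have MAF <= 1 %."""
    return variant.maf is None or variant.maf <= MAF_CUTOFF


def in_silico_flags(variant: Variant) -> dict:
    """Apply the SIFT and CADD thresholds quoted in the text."""
    return {
        "sift_damaging": variant.sift <= SIFT_CUTOFF,
        # Assumption: a score equal to the cut-off counts as deleterious.
        "cadd_deleterious": variant.cadd >= CADD_CUTOFF,
    }


def cadd_top_fraction(c_score: float) -> float:
    """Fraction of most deleterious substitutions implied by a Phred-like C-score."""
    return 10 ** (-c_score / 10)


# Made-up example records, not data from the study.
examples = [
    Variant("rare_damaging", maf=0.0001, sift=0.01, cadd=28.0),
    Variant("common_variant", maf=0.05, sift=0.40, cadd=3.0),
]
candidates = [v for v in examples if passes_maf_filter(v)]
```

The `cadd_top_fraction` helper makes the scale explicit: a C-score of 10 corresponds to the top 10 % and a C-score of 20 to the top 1 %, as stated above.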

Classification of variants by the InSiGHT database [46] was considered correct. For variants not included in this database, published literature and the classifications in HGMD [42], the Leiden Open Variation Database (LOVD) and ClinVar [22] were used in combination with in-house information to make a manual classification. The manual classification followed the five-class system according to guidelines from the International Agency for Research on Cancer (IARC): 1 = Benign, 2 = Likely benign, 3 = Variant of Unknown clinical Significance (VUS), 4 = Likely pathogenic and 5 = Pathogenic.

Results

The gene panel was applied to 91 patients who had previously tested negative for mutations in the polyposis genes (APC, MUTYH, BMPR1A, SMAD4, STK11) and/or a combination of different MMR genes (MLH1, MSH2, MSH6, PMS2), depending on the primary indication when the referral was issued. The patients were sub-grouped based on their clinical characteristics (Supplementary Table 1). Sequencing was performed over the entire gene regions as described and all coding regions were covered at least 30× except for CDH1 ex1, EPCAM ex1, MSH3 ex1 and MLH1 ex12, which in five samples were covered at least 25×. For the whole targeted region the mean coverage was 417× in all 91 samples. The analyses of variants included the coding region and ± 20 bp of intronic sequences. The CNV analysis was based on the entire covered gene regions.

In total 8 pathogenic class 5 and 8 likely pathogenic class 4 variants were found (Tables 1, 2). This gives a mutation detection frequency of 8.8 % (8/91) for the class 5 variants only and a frequency of 17.6 % (16/91) when class 4 variants are also included. These results are in concordance with the results obtained in other studies of similar gene panels [6, 18]. Two pathogenic variants in PMS2 in patients I:26 and I:50 were missed in the initial analyses of the MMR genes performed in an external laboratory. Thirty-two missense variants with MAF ≤1 % according to the filtration criteria, all of them found in a heterozygous state, were analyzed and classified manually or, where the variant was included in the InSiGHT database [46], according to that database. The results are presented in Table 2 and include four class 5 pathogenic variants, two likely pathogenic class 4 variants and 26 class 3 variants of unknown clinical significance (VUS). The APC variant, c.1902 T > G, was recently found to have a major splicing effect on exon 14 resulting in loss of this exon [10]. The variant was found in a patient (III:61) with unexplained familial adenomatous polyposis (1–100 polyps). This patient also had a VUS, APC c.4472T > A, p.Phe1491Tyr. Both of these variants segregated with affected individuals and neither of them was found among healthy individuals from the family. Two class 5 variants were found in MUTYH, one each in patients III:71 and I:42. The c.536A > G, p.Tyr179Cys and c.1187G > A, p.Gly396Asp variants were found in a heterozygous state and are classified as pathogenic if found in a homozygous or compound heterozygous state.
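The detection frequencies quoted above follow directly from the carrier counts and the cohort size. A minimal Python sketch of the arithmetic, in which the helper name `detection_frequency` is hypothetical and the counts are those given in the text:

```python
COHORT_SIZE = 91  # index patients in the study


def detection_frequency(carriers: int, cohort_size: int = COHORT_SIZE) -> float:
    """Percentage of the cohort carrying a variant, rounded to one decimal."""
    return round(100 * carriers / cohort_size, 1)


class5_only = detection_frequency(8)       # class 5 variants only
class4_and_5 = detection_frequency(8 + 8)  # class 5 plus class 4 variants

print(class5_only)   # 8.8
print(class4_and_5)  # 17.6
```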

Table 1 Truncating variants among 91 index patients
Table 2 Classification of missense variants with a MAF < 1 % among 91 index patients

Nine patients had more than one variant remaining after the filtration, including three with truncating variants in BMPR1A, PMS2 and AXIN2. The BMPR1A c.969delT variant (Table 1) was found together with one likely pathogenic variant (class 4) in CHEK2, c.470T > C, p.Ile157Thr (Table 2), in a mixed polyposis case (V:87). This patient additionally carried a CHEK2 VUS, c.190G > A, p.Glu64Lys (Table 2). A truncating variant in PMS2, c.861_864del (Table 1), was found together with the VUS APC c.7402T > C, p.Ser2468Pro (Table 2) in patient I:50. The AXIN2 c.254del variant (Table 1) and the synonymous VUS MSH2 c.1275A > G (Table 2) were both found in patient I:55.

Tumor characteristics, e.g. MSI and IHC, can be of value for interpretation of the VUS. For 52 of these patients we had results from MSI tests only or from both MSI and IHC tests (Supplementary Table 1). Investigating the VUS present among these patients yielded some findings. Patient I:47 has a tumor which is MSI-H and presents a loss of MLH1/PMS2 proteins. This patient has a VUS in MLH3, c.1870G > C, p.Glu624Gln. This VUS was also found in two patients with an MSS tumor phenotype (I:8, I:10). The variant is interpreted differently by the in silico protein prediction tools used, has a low CADD score (17) and is quite common in the ExAC population database (0.73 %). Since tumors from patients with this variant can be either MSI or MSS, it is difficult to draw conclusions about the pathogenic effect of the variant. In two patients with an MSI-H tumor phenotype, one MSH6:c.3226C > T, p.Arg1076Cys (I:56) and one MSH2:c.2013T > A, p.Asn671Lys (I:92) VUS were found. These variants are predicted damaging by all in silico protein prediction tools, exhibit a high CADD score (32 and 28.2, respectively) and are rare (0.0091 %) or not present in the population database ExAC. Both variants might be predicted to have a likely pathogenic effect. TCGA data (www.cbioportal.org) shows a high functional impact for the MSH6 variant. In the patients with an MSS tumor phenotype, eight unique MMR variants were found. The variants exhibit conflicting in silico protein prediction results. Combined with a generally lower CADD score, the variants might be predicted to have a likely benign effect, consistent with their MSS phenotype.

Four structural variants were found and they are presented in Table 3. An individual (patient III:65) from a family with phenotypic AFAP was found to carry a 1.9 kb heterozygous deletion located 2 kb upstream of SMAD4 (hg19/chr18:g.48537165_48539080del). The deleted region includes an insulator element 200 bp in size (chr18:g.48537803-48538002). Additional upstream deletions were found in MSH3 (I:34) and CTNNB1 (I:57). Another patient (I:6) had a 24.2 kb duplication in CDH1 intron 1 (hg19/chr16:g.68802080_68826280dup).
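The sizes and the containment relationship reported above can be checked from the genomic coordinates alone. The following Python sketch parses the coordinate strings as written in the text and compares intervals. The regular expression and the helper names are hypothetical and handle only this simple range notation, not HGVS nomenclature in general.

```python
import re

# Matches strings such as "hg19/chr18:g.48537165_48539080del" or
# "chr18:g.48537803-48538002" (assumed notation, taken from the text).
RANGE_PATTERN = re.compile(
    r"(?:hg19/)?(chr\w+):g\.(\d+)[_-](\d+)(del|dup)?", re.IGNORECASE
)


def parse_range(text):
    """Return (chromosome, start, end, kind) for a coordinate string."""
    match = RANGE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text}")
    chrom, start, end, kind = match.groups()
    return chrom.lower(), int(start), int(end), kind


def span_bp(start, end):
    """Length of an interval in base pairs, counting both end positions."""
    return end - start + 1


def contains(outer, inner):
    """True if the inner interval lies entirely within the outer interval."""
    return outer[0] == inner[0] and outer[1] <= inner[1] and inner[2] <= outer[2]


deletion = parse_range("hg19/chr18:g.48537165_48539080del")
insulator = parse_range("chr18:g.48537803-48538002")

deletion_size = span_bp(deletion[1], deletion[2])     # 1916 bp, about 1.9 kb
insulator_size = span_bp(insulator[1], insulator[2])  # 200 bp
insulator_deleted = contains(deletion, insulator)     # True
```

Counting both end positions gives 1916 bp for the SMAD4 upstream deletion and 200 bp for the insulator element, in line with the sizes stated in the text.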

Table 3 Structural variations detected among 91 index patients

Discussion

In this study we show the importance of using multigene panels, which allow for a parallel comprehensive screening for CRC syndromes. Mutations in BMPR1A have been found in an extended phenotypic spectrum beyond juvenile polyposis, including HMPS, AFAP simplex, familial colorectal cancer type X (FCCX) and early onset CRC without familial history and with MSI-negative tumors [4, 8, 29, 30]. To this spectrum we add a patient with an atypical polyposis (V:87, this patient also carries two CHEK2 variants, Table 2) and three patients with unexplained adenomatous polyposis and different numbers of polyps. Patient IV:76 had a splice-site variant, c.230 + 2T > C (class 4), II:59 had a truncating variant, c.441delT, p.Phe147Leufs*18 (class 5), and the last patient (II:58) had a probable pathogenic (class 4) missense mutation, c.1409 C > T, p.Met470Thr, in BMPR1A. This missense mutation has previously been found in a patient with a juvenile polyposis phenotype and around 300 polyps throughout the entire gastrointestinal tract [15]. Two patients from Group I, “CRC familial or unknown inheritance not polyposis”, had variants in AXIN2. In one of these patients, with late onset of CRC, a truncating AXIN2 variant was found together with an MSH2 variant of unknown significance (I:55). The second patient (I:11) presented with an AXIN2 missense variant, c.2051C > T, p.Ala684Val. Variants in AXIN2 have been reported in patients with CRC and oligodontia and in patients with oligodontia only [21, 52]. It has been suggested that truncating pathogenic variants in AXIN2 are more likely than missense variants to predispose carriers to syndromic oligodontia and colorectal cancer [25]. To our knowledge oligodontia was not present in any of our patients.

We found a large deletion in the regulatory region of SMAD4 in a patient with unexplained adenomatous polyposis (1–100 polyps) (III:65). An insulator element that may act as a barrier to enhancer action is located in the deleted region. Transcription of genes beyond the insulator is not stimulated by the enhancer when the insulator is active. This deletion might therefore have an effect on the expression of the gene. In a recent study two SMAD4 mutations in patients without juvenile polyps were identified, one with around 20–99 adenomatous polyps and the other one without reported polyps, which further extends the phenotypical spectrum for this gene [6].

There is a complexity of combinations of possible ligand receptors and downstream effectors in the BMP/TGFR-β signalling pathways and this might explain part of the genotype-phenotype relationship. There might also be a genotype-phenotype correlation depending on where in the gene the mutation is located. Several genes in the BMP/TGFR-β signalling pathway are mutated in hereditary CRC as well as sporadic CRC, and inactivation of other genes in this pathway might possibly also predispose carriers to CRC. Patients with mutations in APC, BMPR1A, SMAD4 and GREM1 appear to have similar polyposis phenotypes, but carriers of GREM1 mutations with HMPS might not have the same risk for extra-colonic disease as patients with BMPR1A mutations and HMPS [47].

In a recent multigene-panel based CRC study, 1.4 % (8/586) of the patients had CHEK2 risk alleles or truncating mutations. Two of the patients had the c.470T > C, p.Ile157Thr variant and four had c.1100delC alleles. All had polyps or CRC and none of them had a personal history of breast cancer, but six had at least one family member with breast cancer [6]. Around 2 % (2/91) of our patients had CHEK2 variants: V:87 had both c.190G > A, p.Glu64Lys and c.470T > C, p.Ile157Thr, and I:20 had the splice variant c.319 + 2T > A. Both presented with polyps or CRC, but to our knowledge no breast cancer has been reported in the families. Variants in CHEK2 still remain of uncertain clinical relevance, as is further emphasized by the fact that V:87 also carried a truncating, probably pathogenic variant in the BMPR1A gene (c.969delT). In a recent study CHEK2 variants were found among individuals with various types of cancer, which might be partly due to the high population frequency of the common CHEK2 variants (c.1100delC and p.Ile157Thr) [45].

The truncating MLH3 mutation c.3563 C > G, p.Ser1188* was found in a homozygous state in an unexplained polyposis case (IV:69) with duodenal polyps and CRC. The MLH3 protein as well as the PMS1 protein can dimerize with MLH1 and assist in single nucleotide mismatch DNA repair, but their roles are not well understood [38]. Variants in the genes have been found in patients without a family history, in some cases also in sporadic patients and/or in healthy controls. Variants have also been found together with other MMR gene variants, suggesting that PMS1 and MLH3 are low-risk genes in Lynch syndrome [17]. The clinical significance of the variant we report here is therefore difficult to estimate. However, compound heterozygous loss of function (LoF) germline mutations in the MSH3 gene were recently identified in patients with an unexplained adenomatous polyposis. The data presented by Adam et al. strongly support that disease-causing MSH3 mutations follow a recessive mode of inheritance [2]. A comparable scenario might possibly also be considered for mutations in MLH3.

When comparing the VUS in the MMR genes to the corresponding results from the MSI and IHC tests of the tumors, some conclusions might be drawn concerning the pathogenicity. Two variants, one in MSH6, c.3226C > T, p.Arg1076Cys (I:56), and one in MSH2, c.2013T > A, p.Asn671Lys (I:92), that were identified in patients who presented with an MSI-H phenotype (no IHC results were available) might be predicted to be likely pathogenic. Both of these variants are predicted damaging by all the protein prediction tools used. They also have a very high CADD score and are very rare or not present in the population database ExAC. TCGA data shows a high functional impact for the MSH6 variant. It is reasonable to regard these variants as likely pathogenic until more functional data are available.

The patient (I:6) with the intronic duplication in CDH1 also had breast cancer. It is known that CDH1 mutations can be found in patients with lobular breast cancer and in hereditary diffuse gastric cancer. Although no obvious functional elements are found in this region, it cannot be ruled out that the duplication has an effect on the transcription or regulation of the gene.

The search for germ-line mutations in at-risk individuals has been focused on mutations associated with highly penetrant disease phenotypes and has followed a stepwise approach, leading to an expensive strategy and underestimation of familial cases [43]. The increased use of multigene panels has already shown a higher mutation detection rate compared with traditional testing based on clinical criteria [6, 18], as is also confirmed by this study. The reason for this is probably a large genetic heterogeneity and overlapping clinical presentation of the different CRC syndromes. Limited knowledge of the medical and/or family history or an atypical presentation of the CRC syndromes might lead to an incorrect diagnosis of patients. Panel-based testing is beneficial not only for the patient but also in terms of time and cost savings. However, there is also a complexity of information that can result from a multigene-panel test. Variants may also be coincidental or explain only part of the clinical phenotype. Segregation analyses could in these cases be used to further understand the clinical significance of variants. In this study, when structural variants are also included, in total 33 % (30/91) of the patients have at least one VUS. When those with a disease-causing variant already identified are excluded, 29 % (26/91) of the patients have a VUS, the majority of which are located in MMR genes, in concordance with other reports [6]. A patient without an identified mutation in this study could have mutations in recently identified highly penetrant genes that were not included in this panel. Several candidate genes for both polyposis and non-polyposis syndromes have been identified [5, 11]. Multigene panels used for detection of pathogenic variants in CRC syndromes frequently include genes for which the cancer risk is not well known and management guidelines are not yet established. Classifying the genes into different categories based on these issues might be advisable [7]. The implementation of multigene-panel based technology into the clinic implies new opportunities and challenges which might also require the introduction of new models for genetic counselling.