Background

Hirschsprung disease (HSCR, OMIM 142623) is a developmental disorder occurring in 1 in 5,000 live births. It is characterized by the absence of ganglion cells along variable lengths of the distal gastrointestinal tract, which results in tonic contraction of the aganglionic colon segment and functional intestinal obstruction. Such aganglionosis is associated with delayed entry of neural crest-derived cells into the foregut and with delayed progression of enteric neural crest cells along the gut [1–5]. Based on the length of the aganglionic region, patients can be classified as having short-segment (S-HSCR: aganglionosis up to the upper sigmoid colon, 80 % of cases), long-segment (L-HSCR: aganglionosis up to the splenic flexure and beyond, 17 % of cases) or total colonic aganglionosis (TCA, 3 % of cases) forms [1]. HSCR most commonly presents sporadically, with reduced penetrance and male predominance, although it can also be familial, with autosomal dominant or autosomal recessive inheritance. HSCR occurs as an isolated trait in 70 % of cases and is associated with other congenital malformation syndromes in the remaining 30 % [1, 3, 4].

Therefore, HSCR is regarded as a disorder with a complex genetic basis, in which the combined contribution of several loci, acting in an additive or multiplicative manner, is usually required to cause the disease. The RET proto-oncogene is the major susceptibility gene for HSCR, as more than 80 % of identified HSCR-associated mutations, both coding and noncoding, are located in this gene [6–8]. Mutations in the RET coding sequence account for up to 50 % of familial cases and 7–20 % of sporadic cases [1]. Other genes encoding members of a variety of signalling pathways involved in enteric nervous system (ENS) development have also been related to HSCR (GDNF, NRTN, PSPN, EDNRB, EDN3, ECE1, NTF3, NTRK3, SOX10, PHOX2B, L1CAM, ZFHX1B, KIAA1279, TCF4, PROK1, PROKR1, PROKR2, GFRA1, NRG1, SEMAPHORIN 3A, SEMAPHORIN 3C and SEMAPHORIN 3D). However, mutations in these genes explain only a minority of cases, mainly L-HSCR/TCA or syndromic forms of the disease [9–15].

The development of next-generation sequencing (NGS) technologies has had a great impact on human mutation detection, given their high-throughput nature. Over the last 10 years, sequencing speed has increased tremendously while costs have fallen 10,000- to 100,000-fold compared with the classical Sanger method [16–19].

The 454 GS Junior (Roche) is an NGS sequencer that allows rapid sample processing. In 2012, three class III semaphorin genes were studied as HSCR candidate genes in 47 HSCR samples by amplicon sequencing on the 454 GS Junior platform. The authors reported 37 sequence variants, of which 10 were unique to HSCR patients, including 5 missense mutations in these three genes that may be involved in the pathogenesis of HSCR [11]. More recently, PCR-based RainDance technology and 454 FLX sequencing were applied to analyze 62 genes in 20 Chinese HSCR patients and 20 Chinese non-HSCR controls, identifying 5 rare damaging variants likely involved in the disease [20].

Here, we used the 454 GS Junior platform to perform NGS-based targeted sequencing in order to validate the design of our panel. For this purpose, we selected a group of 11 patients carrying a total of 18 different variants, previously identified by Sanger sequencing, in genes included in the panel. After panel validation, we determined the set of candidate variants carried by these patients in HSCR-associated genes.

Methods

Patients and control subjects

Our study involved a total of 11 Spanish HSCR index patients, with a male:female ratio of 10:1 and different phenotypic features (two with TCA, four with L-HSCR, four with S-HSCR and one with no available data) (Table 1). All patients were referred to our Department of Genetics, Reproduction and Fetal Medicine. In addition, DNA samples from 26 available family members of our patients were used for subsequent segregation analysis of the newly identified variants.

Table 1 Description of the 11 patients included in the study, detailing all of the variants previously detected

We also included a group of 200 healthy control subjects (unselected, unrelated, and race-, age- and sex-matched individuals) to determine the allele frequency of the new variants in our population.
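Because each control contributes two alleles, 200 controls correspond to 400 screened chromosomes. As a minimal sketch (the helper names are hypothetical and not part of the study's pipeline), the allele frequency in controls can be computed as below, together with the exact one-sided 95 % upper bound on the population frequency of a variant that is absent from all controls. The bound solves (1 - p)^400 = 0.05 and is close to the familiar "rule of three" approximation, 3/400.

```python
def allele_frequency(het_carriers: int, hom_carriers: int, n_individuals: int) -> float:
    """Variant-allele frequency in a diploid group: each person carries two alleles."""
    return (het_carriers + 2 * hom_carriers) / (2 * n_individuals)


def upper_bound_if_absent(n_alleles: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper confidence bound on the allele frequency
    when the variant is observed in 0 of n_alleles chromosomes.
    Solves (1 - p) ** n_alleles == alpha for p."""
    return 1.0 - alpha ** (1.0 / n_alleles)


n_controls = 200
n_alleles = 2 * n_controls  # 400 chromosomes

print(allele_frequency(1, 0, n_controls))          # one heterozygous carrier: 0.0025
print(round(upper_bound_if_absent(n_alleles), 5))  # about 0.00746
print(3 / n_alleles)                               # rule-of-three approximation: 0.0075
```

Under these assumptions, absence from all 400 control chromosomes argues against a population frequency above roughly 0.75 %, but it cannot exclude rarer polymorphisms.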

Peripheral blood was obtained from all subjects, and genomic DNA was isolated using the MagNA Pure LC system (Roche, Indianapolis, IN) according to the manufacturer’s instructions. DNA samples were stored at −80 °C until further analysis.

Ethics statement

Written informed consent for clinical and molecular genetic studies was obtained from all participants. The study was approved by the Ethics Committee for clinical research of the University Hospital Virgen del Rocío (Seville, Spain) and complies with the tenets of the Declaration of Helsinki.

Design of the capture panel and estimation of panel yield

A capture panel of HSCR-related genes was designed by our group, and the final design file was submitted to Roche NimbleGen (Roche NimbleGen Inc., Madison, WI, USA) for synthesis of the hybridization probes. The probes covered 235 regions (exons and adjacent intronic sequences) of 26 known HSCR genes, for a total design region of 44,196 bp (Additional file 1). Sequencing reads also covered flanking sequences, raising the number of sequenced bases to 62,515 bp.
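The gap between the 44,196-bp design region and the 62,515 bp actually sequenced arises because reads extend beyond the probe targets. As an illustrative sketch (not the NimbleGen design software; the intervals and padding value are hypothetical), the size of a target region can be computed from BED-style intervals by merging overlaps, optionally after padding each interval to approximate flanking coverage:

```python
from typing import List, Tuple

# (chromosome, start, end) with 0-based, half-open coordinates as in BED files
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def territory_bp(intervals: List[Interval], padding: int = 0) -> int:
    """Total bases covered after padding each interval on both sides and merging."""
    padded = [(c, max(0, s - padding), e + padding) for c, s, e in intervals]
    return sum(end - start for _, start, end in merge_intervals(padded))


# Hypothetical target exons: two overlap, one is separate.
targets = [("chr10", 100, 200), ("chr10", 150, 300), ("chr10", 1000, 1100)]
print(territory_bp(targets))              # 300
print(territory_bp(targets, padding=50))  # 500
```

Merging before summing avoids double-counting bases shared by overlapping probes or padded flanks.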

Sensitivity and specificity of the panel were calculated as previously described [21]. Sensitivity was calculated as the percentage of variants previously detected by conventional Sanger sequencing that the panel was able to detect, and was tested with 18 previously diagnosed variants (SNVs, insertions and deletions). Specificity was calculated as the percentage of variants detected by the panel that passed sequencing quality controls and were confirmed by Sanger sequencing, and were therefore true variants.
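Both measures are simple proportions. The sketch below uses hypothetical function names, and the counts in the usage example are illustrative only (17 of 18 is one count consistent with the 94 % sensitivity reported in the Results). Note that the second measure, as defined above, corresponds to what is usually called the positive predictive value (precision), since it does not involve true negatives.

```python
def sensitivity(known_detected: int, known_total: int) -> float:
    """Share of Sanger-known variants that the panel also detected."""
    return known_detected / known_total


def confirmed_fraction(sanger_confirmed: int, panel_calls: int) -> float:
    """Share of QC-passing panel calls that Sanger sequencing confirmed.
    This is the ratio the text calls 'specificity' (a positive predictive value)."""
    return sanger_confirmed / panel_calls


print(f"{100 * sensitivity(17, 18):.1f} %")  # 94.4 %
```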

DNA library preparation and targeted sequencing

Library preparation was performed according to the manufacturer’s protocols (SeqCap_EZ_Library_LR_Guide_v2.0 and SeqCap_EZ_LR_DoubleCapture_Rapid_v1p4_2; Roche NimbleGen Inc., Madison, WI, USA). Briefly, 500 ng of genomic DNA was fragmented to 500–1500 bp, end-repaired and ligated to adaptors. The library was amplified by pre-capture linker-mediated PCR (LM-PCR). After purification, 1 μg of LM-PCR product was hybridized to the custom-designed SeqCap EZ Library (Roche NimbleGen, Madison, WI, USA). After washing, the captured DNA was amplified by post-capture LM-PCR. This capture process was performed twice (double capture). The final concentration of each captured library was measured with the Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen, Carlsbad, CA, USA), and libraries were diluted to 10⁶ molecules/μl. Emulsion PCR was performed at a ratio of 0.7 molecules per bead. After enrichment, a maximum of 250,000 beads were sequenced on the 454 GS Junior (Roche) sequencer according to the manufacturer’s protocol (Sequencing Method Manual GS Junior, Titanium Series).
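Two numbers in this protocol can be made concrete with a short calculation. First, converting a PicoGreen mass concentration into molecules per microlitre requires the mean fragment length and the approximate mass of one double-stranded base pair (about 660 g/mol). Second, if template molecules are distributed over beads according to a Poisson distribution (a standard idealized model of emulsion PCR, assumed here), a ratio of 0.7 molecules per bead leaves about half the beads empty and keeps most DNA-carrying beads clonal. The 1 ng/μl concentration and 800-bp fragment length are hypothetical values chosen for illustration.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS = 660.0            # approximate g/mol per double-stranded base pair


def molecules_per_ul(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/ul) into molecules per ul."""
    moles_per_ul = (conc_ng_per_ul * 1e-9) / (mean_fragment_bp * BP_MASS)
    return moles_per_ul * AVOGADRO


def bead_occupancy(molecules_per_bead: float) -> tuple:
    """Poisson fractions of beads carrying 0, exactly 1, and 2 or more templates."""
    p0 = math.exp(-molecules_per_bead)
    p1 = molecules_per_bead * p0
    return p0, p1, 1.0 - p0 - p1


stock = molecules_per_ul(1.0, 800)           # hypothetical library: 1 ng/ul, 800 bp
print(f"{stock:.3g} molecules/ul")           # 1.14e+09
print(f"dilute {stock / 1e6:.0f}-fold to reach 10^6 molecules/ul")
empty, clonal, mixed = bead_occupancy(0.7)
print(f"empty {empty:.3f}, single template {clonal:.3f}, mixed {mixed:.3f}")
```

Under this model, lowering the ratio reduces the fraction of mixed (polyclonal) beads at the cost of more empty beads.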

Bioinformatic analyses of the sequencing results

Sequencing reads were aligned to the human hg19 reference with GS Reference Mapper software (Roche, version 2.7). Improperly mapped reads were filtered out with the SAMtools package, and the BEDtools package was used to analyze coverage and the percentage of covered bases. Variant calling was performed with GATK (Genome Analysis Toolkit; version 1.4 for SNVs and 1.0 for INDELs). Every detected variant required a minimum coverage of 6X; at least 25 % of total reads had to support the variant allele, and variants with a forward/reverse strand imbalance (fewer than 15 % of supporting reads on one of the two strands) were removed. Variants were annotated using the VARIant ANalysis Tool (version 2.1.0) [22]. Annotated variants present in the NCBI dbSNP [23] and 1000 Genomes Project [24] databases with a minor allele frequency (MAF) > 0.05 were discarded. The remaining variants were compared with human mutation databases such as HGMD [25] and ClinVar [26] to detect known disease-associated variants previously identified by Sanger sequencing. Additional novel sequence variants were further prioritized according to their inheritance and type of change. Candidate variants were selected based on two criteria:

  1) New variants present in only one patient:

    In a first step, we discarded variants registered in the Ensembl [27] and dbSNP databases, and only exonic and adjacent intronic regions were retained. All newly detected variants were searched in 1000 Genomes and the Exome Variant Server [28] to confirm their novelty.

  2) Variants registered in databases:

    Variants with MAF < 0.05 present in Biomart [29] and Variant Effect Predictor [27] were considered.

All data were managed with the online web tool Galaxy Project [30–32].
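The numeric thresholds described above can be expressed as a single filter. This is a minimal sketch, not the GATK or VARIANT configuration used in the study: the `Call` record and its fields are hypothetical, the strand criterion is read as "at least 15 % of variant-supporting reads on each strand", a missing MAF stands in for absence from all the population databases listed, and the single-patient requirement of criterion 1 is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    depth: int              # total reads covering the position
    alt_fwd: int            # variant-supporting reads on the forward strand
    alt_rev: int            # variant-supporting reads on the reverse strand
    maf: Optional[float]    # population MAF; None if not registered in any database


def passes_filters(c: Call, min_depth: int = 6, min_alt_frac: float = 0.25,
                   min_strand_frac: float = 0.15, max_maf: float = 0.05) -> bool:
    alt = c.alt_fwd + c.alt_rev
    if c.depth < min_depth or alt == 0:
        return False                               # coverage below 6X or no support
    if alt / c.depth < min_alt_frac:
        return False                               # < 25 % of reads support the variant
    if min(c.alt_fwd, c.alt_rev) / alt < min_strand_frac:
        return False                               # forward/reverse strand imbalance
    if c.maf is not None and c.maf > max_maf:
        return False                               # common polymorphism
    return True


def candidate_group(c: Call) -> str:
    """Assign a passing call to criterion 1 (new) or criterion 2 (registered, rare)."""
    return "1) new variant" if c.maf is None else "2) registered variant"


calls = [Call(40, 9, 8, None), Call(40, 15, 1, None), Call(5, 2, 2, None), Call(30, 6, 6, 0.20)]
for c in calls:
    print(passes_filters(c), candidate_group(c) if passes_filters(c) else "-")
```

In this toy list only the first call survives; the others fail on strand balance, depth and MAF, respectively.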

Assessment of the pathogenicity of variants

The following in silico prediction tools were used: SIFT [33] and PolyPhen2 [34] to assess the pathogenicity of amino acid changes; the ENCODE Project [35] to determine whether variants were located in regulatory regions; the Berkeley Drosophila Genome Project [36] to study splice-site changes; MUpro [37] and I-Mutant2.0 [38] to predict effects on protein stability; and UniProt [39] to determine the protein domains in which the variants were located.
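Predictions in Table 4 and in the patient descriptions are reported as paired labels such as "tolerated/possibly damaging" (SIFT/PolyPhen2). The sketch below shows how raw scores could map to such labels, assuming commonly used cutoffs: SIFT scores below 0.05 called damaging, and the PolyPhen-2 HumDiv thresholds listed in dbNSFP (0.957 and 0.453). The text does not state which cutoffs or PolyPhen-2 model were applied, so these values are an assumption. For I-Mutant2.0 and MUpro, a negative predicted stability change is read as decreased stability.

```python
def sift_label(score: float) -> str:
    # Assumed convention: SIFT scores below 0.05 are predicted deleterious.
    return "damaging" if score < 0.05 else "tolerated"


def polyphen2_label(score: float) -> str:
    # Assumed PolyPhen-2 HumDiv cutoffs (as listed in dbNSFP).
    if score >= 0.957:
        return "probably damaging"
    if score >= 0.453:
        return "possibly damaging"
    return "benign"


def stability_label(ddg: float) -> str:
    # Sign convention of I-Mutant2.0 / MUpro: a negative change means lower stability.
    if ddg < 0:
        return "decreased stability"
    if ddg > 0:
        return "increased stability"
    return "no predicted change"


def paired_label(sift_score: float, pph2_score: float) -> str:
    """Combine the two scores into the 'SIFT/PolyPhen2' style used in the text."""
    return f"{sift_label(sift_score)}/{polyphen2_label(pph2_score)}"


# Hypothetical scores, not values from Table 4:
print(paired_label(0.30, 0.60))   # tolerated/possibly damaging
print(paired_label(0.01, 0.10))   # damaging/benign
print(stability_label(-0.8))      # decreased stability
```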

Criteria to select patients after NGS analyses for further discussion

After NGS analyses, patients were selected based on the new variants detected in this study, in combination with their previously known genetic background. We excluded patients who 1) carried one or more previously described variants that could explain their phenotype and/or 2) whose new variants were predicted to be benign or were located in regulatory regions, which would require additional studies to establish their role in gene function.
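Read as a decision rule, the two exclusion criteria can be sketched as below. The data classes are hypothetical, and interpreting criterion 2 as "exclude when no new variant is both non-benign and outside purely regulatory regions" is one reading of the text rather than an algorithm stated in the study.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewVariant:
    predicted_benign: bool   # summary of the in silico predictions
    regulatory_only: bool    # located only in a regulatory (non-coding) region


@dataclass
class Patient:
    known_variant_explains_phenotype: bool
    new_variants: List[NewVariant] = field(default_factory=list)


def selected_for_discussion(p: Patient) -> bool:
    # Criterion 1: a previously described variant already explains the phenotype.
    if p.known_variant_explains_phenotype:
        return False
    # Criterion 2: keep only if at least one new variant is non-benign and not purely regulatory.
    return any(not v.predicted_benign and not v.regulatory_only for v in p.new_variants)


print(selected_for_discussion(Patient(False, [NewVariant(False, False)])))  # True
print(selected_for_discussion(Patient(False, [NewVariant(True, False), NewVariant(False, True)])))  # False
print(selected_for_discussion(Patient(True, [NewVariant(False, False)])))   # False
```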

Sanger validation and segregation analyses

All putative HSCR-related variants, as well as the 4 panel regions with insufficient NGS coverage, were checked by Sanger sequencing. DNA sequences were obtained from Ensembl, and Primer3 [40, 41] was used for primer design (primers and conditions available on request). The products were sequenced on an automated 3730 DNA Analyzer (Applied Biosystems®). Variants were analyzed with DNASTAR® Lasergene 8 SeqMan Pro™ (DNAstar, Madison, WI) [42]. All variants were tested for segregation in all available family members by Sanger sequencing and screened in the group of 200 healthy control subjects.
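Segregation analysis establishes which parent transmitted each heterozygous variant, the information summarized later in Fig. 1 (for example, the maternal SEMA3D and paternal GFRA1 variants described for patient 2). A minimal sketch, assuming genotypes coded as the number of variant alleles (0, 1 or 2) and None when a relative's DNA is unavailable; the function is hypothetical and ignores genotyping error and sample mix-ups:

```python
from typing import Optional


def parental_origin(child: int, mother: Optional[int], father: Optional[int]) -> str:
    """Infer the transmitting parent of a variant carried by the child."""
    if child != 1:
        return "unresolved"            # only heterozygous children are handled here
    if mother is None or father is None:
        return "unresolved"            # a parent's DNA is not available
    if mother > 0 and father == 0:
        return "maternal"
    if father > 0 and mother == 0:
        return "paternal"
    if mother == 0 and father == 0:
        return "apparently de novo"
    return "ambiguous"                 # both parents carry the variant


# Hypothetical trio genotypes for two variants in one patient
origins = {
    "variant A": parental_origin(1, 1, 0),
    "variant B": parental_origin(1, 0, 1),
}
print(origins)  # {'variant A': 'maternal', 'variant B': 'paternal'}
# Variants inherited from both parents are compatible with an additive model.
print({"maternal", "paternal"} <= set(origins.values()))  # True
```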

The dataset has been submitted to the European Nucleotide Archive under accession number PRJEB7384.

Results and discussion

Panel yield

The average percentage of covered bases was 97 %, and the median percentage of reads on target was 82.5 %. The high mean coverage (422X) is likely explained by the small size of the panel (less than 50,000 bp) (Table 2). Of the 235 regions in the panel, 231 had a minimum coverage above 6X, and 91.3 % of bases had a coverage above 20X. Sensitivity and specificity were 94 % and 82.8 %, respectively.
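These metrics can be derived from per-base depths, such as those reported by `bedtools coverage -d`. A minimal sketch with hypothetical depth values; "above 6X" and "above 20X" are interpreted here as at least 6 and at least 20 reads, and "covered" as a depth greater than zero, which may differ from the exact definitions used to build Table 2:

```python
from statistics import mean
from typing import Dict, List


def coverage_summary(depths_by_region: Dict[str, List[int]],
                     min_region_depth: int = 6, high_depth: int = 20) -> dict:
    """Panel-level coverage metrics from per-base depths grouped by target region."""
    all_depths = [d for depths in depths_by_region.values() for d in depths]
    n = len(all_depths)
    return {
        "mean_depth": mean(all_depths),
        "pct_bases_covered": 100 * sum(d > 0 for d in all_depths) / n,
        "pct_bases_ge_high": 100 * sum(d >= high_depth for d in all_depths) / n,
        "regions_passing_min": sum(min(depths) >= min_region_depth
                                   for depths in depths_by_region.values()),
    }


# Two toy regions: one well covered, one with a drop-out
toy = {"region_1": [30, 25, 22], "region_2": [10, 5, 0]}
print(coverage_summary(toy))
```

Counting regions by their minimum depth, rather than their mean, is what flags partially uncovered regions such as the 4 that required Sanger sequencing.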

Table 2 Summary of statistics of targeted sequencing in our patients

Validation of the panel and detection of new variants

This approach had two main goals: to validate our panel using variants previously identified by Sanger sequencing in our series of patients (Table 1), and to discover new variants that could help further define the complex genetic basis of the disease in each patient (Table 3). An average of 200 different SNVs was detected per patient. After stringent filtering, between 1 and 4 candidate variants per patient were selected. The SNV validation rate was 88 %. In addition, 6 INDELs were selected for further analysis, of which 3 were confirmed by Sanger sequencing.

Table 3 New variants detected by NGS-based targeted sequencing in all patients

After exclusion of all false positives, validation and segregation analyses were performed. A total of 13 different coding variants newly detected in these patients and potentially involved in HSCR were obtained, of which only 5 had been previously described. In addition, we identified 11 new non-coding variants in regulatory regions, most of them predicted in silico to affect enhancer, promoter and/or CCCTC-binding factor (CTCF) sites (Table 3). The critical role of intronic regulatory variants has been established previously, most notably for a common RET variant (rs2435357; 10:g.43086608 T > C) located in a gut-specific RET enhancer element in intron 1 [8]. Such variants deserve greater attention in future studies: most targeted NGS studies report only coding variants, yet non-coding variants in regulatory regions can also affect gene expression and, thereby, the disease phenotype.

Contributions of new variants

The previously known genetic background of our patients, together with the new variants found, allowed us to define the molecular basis of the disease more precisely in 4 of the 11 patients (patients 2, 3, 5 and 8) (Fig. 1). In the remaining 7 patients, no new relevant variant was found that helped to explain the phenotype.

Fig. 1

Family trees of patients 2, 3, 5 and 8, showing all previously identified variants and the new ones found in this study. Symbols: V = variant; arrow = patient included in the study; genotypes: − = wild-type allele; + = variant allele; * = DNA not available

Of note, patients 2, 3 and 5 carried alterations in both class III semaphorin and GFRα receptor genes (Fig. 1 and Table 3). Several families of molecules that provide attractive and repulsive cues are involved in axon guidance, including semaphorins and GDNF. Some crucial mechanisms in HSCR are mediated by GDNF, which requires GFRα1 as a co-receptor for optimal ligand binding and activation; both act as chemoattractants that promote neurite outgrowth [43, 44]. Based on these studies, we hypothesize that an additive effect of variants in semaphorins (involved in cell migration) and GFRα receptors (related to proliferation and cell survival) may act as a modifier in HSCR. Sema3C/3D signaling has recently been shown to be an evolutionarily conserved regulator of ENS development whose dysregulation leads to enteric aganglionosis [45]. Paratcha et al. and Charoy et al. functionally demonstrated the interplay between GDNF and GFRα, and between SEMAs and GDNF signaling during axon guidance, respectively. Charoy et al. analyzed single and double mutant mouse models to confirm that gdnf is the principal trigger of Sema3B, acting with NrCAM. In addition, genetic and in vitro experiments provide evidence that this gdnf effect is mediated by NCAM/GFRα1 signaling. In conclusion, given the demonstrated interplay among these molecules, our observations suggest that combinations of variants in these genes could contribute to the disease, although further functional and statistical studies would be required for confirmation.

Patient 2 (L-HSCR) carried a previously known SEMA3D p.Arg634Gln variant, inherited from the mother and predicted in silico to be damaging by SIFT and benign by PolyPhen. We identified four new heterozygous variants in this patient. The most relevant was GFRA1 p.Gln222Leu, inherited from the father and predicted to be tolerated/possibly damaging (Fig. 1, Tables 3 and 4). Given the maternal and paternal inheritance of his variants, this patient could fit the additive model proposed for HSCR. As noted above, the joint effect of variants in SEMA and GFRA genes could provide insight into the genetic basis of the disease in this patient.

Table 4 In silico predictions of functional effect for most relevant variants in patients 2, 3, 5 and 8

Patient 3 (L-HSCR) was previously known to carry the PROK1 p.Arg48Trp variant, predicted in silico to be probably damaging. Our group previously reported that PROK1 may participate in signalling complementary to the RET/GFRα1/GDNF pathway, supporting the proliferation/survival and differentiation of precursor cells during ENS development [14]. Among the new variants detected in this patient (Fig. 1, Tables 3 and 4), and based on both the in silico prediction of "probably damaging" and the described interconnection among these genes, we suggest that GFRA1 p.Tyr85Asn could act jointly with SEMA3D p.His424Gln and, together with PROK1 p.Arg48Trp, help to explain the genetic basis of HSCR in this patient.

Patient 5 (S-HSCR) carried a previously known variant, PROKR1 p.Lys354Asn, predicted in silico to be tolerated/probably damaging. In this study, he was found to carry a synonymous variant, SEMA3C p.Ala571Ala, predicted to be pathogenic because it alters an exonic splicing enhancer (ESE) site (Fig. 1, Tables 3 and 4). ESE sites are bound mainly by serine/arginine-rich proteins, which help define splice sites within exons [46]. Genomic variants causing aberrant splicing may represent up to 50 % of all mutations that lead to gene dysfunction and disease [47–49]. Furthermore, patient 5 carried a new variant, GFRA2 p.Arg94Cys, predicted to be damaging (Table 4). GFRA2 had been evaluated as a candidate gene for HSCR in only one previous study [50]. Six coding variants were identified, but only 2 led to an amino acid change. Both changes were located in the C-terminus of GFRA2, a region that is not crucial for GFRα binding to either RET or GDNF family members, and the authors concluded that GFRA2 variants were unlikely to represent a common genetic cause or modifier of the HSCR phenotype. In contrast, our analyses revealed a heterozygous C > T variant in exon 2 of GFRA2 that replaces the highly conserved arginine 94 with a cysteine in the cysteine-rich domain of the receptor. In silico predictions suggest that p.Arg94Cys would decrease the stability of the protein structure and could be a non-neutral change. Our results suggest that GFRA2 might be reconsidered as a candidate gene for HSCR.

Finally, patient 8 (L-HSCR), who carried a known variant in NTF3, p.Gly76Arg (predicted benign), was also found to carry GFRA1 p.Tyr85Asn (predicted pathogenic) (Fig. 1, Tables 3 and 4). Several studies have described associations between polymorphisms and HSCR [51, 52], which suggests that GFRA1 p.Tyr85Asn might be considered a putative susceptibility factor in this patient. However, further case-control studies in additional series of patients are required to confirm this hypothesis.

Conclusions

We have validated the high capacity of NGS targeted sequencing to detect SNVs, which account for most of the variants, pathogenic or not, in the genes included in the panel. Many of the putative insertions and deletions called from 454 GS Junior data (around a thousand per patient) were false positives, owing to the limitations of this technique in detecting this type of variant [53]. Compared with other studies [54], our approach also provides higher coverage of the included regions and a more manageable amount of data to analyze. Owing to their flexibility, panels similar to ours could be extended with newly discovered HSCR-associated genes, and this design could also be adapted to different sequencing platforms.

Our validated NGS panel has proved to be a fast, effective and easy method to characterize the genetic background of our patients and to identify new variants that could be associated with HSCR. Our results expand the previously known set of variants carried by these patients and further support the feasibility of NGS targeted sequencing in diseases with a complex genetic basis such as HSCR. Moreover, this technique may help elucidate the genetic and molecular basis of the disease, providing a new tool for clinical practice to analyze many genes simultaneously and to identify several molecular events contributing to the phenotype.