Prioritization of genes driving congenital phenotypes of patients with de novo genomic structural variants
Genomic structural variants (SVs) can affect many genes and regulatory elements. Therefore, the molecular mechanisms driving the phenotypes of patients carrying de novo SVs are frequently unknown.
We applied a combination of systematic experimental and bioinformatic methods to improve the molecular diagnosis of 39 patients with multiple congenital abnormalities and/or intellectual disability harboring apparent de novo SVs, most with an inconclusive diagnosis after regular genetic testing.
In 7 of these cases (18%), whole-genome sequencing analysis revealed disease-relevant complexities of the SVs missed in routine microarray-based analyses. We developed a computational tool to predict the effects on genes directly affected by SVs and on genes indirectly affected likely due to the changes in chromatin organization and impact on regulatory mechanisms. By combining these functional predictions with extensive phenotype information, candidate driver genes were identified in 16/39 (41%) patients. In 8 cases, evidence was found for the involvement of multiple candidate drivers contributing to different parts of the phenotypes. Subsequently, we applied this computational method to two cohorts containing a total of 379 patients with previously detected and classified de novo SVs and identified candidate driver genes in 189 cases (50%), including 40 cases whose SVs were previously not classified as pathogenic. Pathogenic position effects were predicted in 28% of all studied cases with balanced SVs and in 11% of the cases with copy number variants.
These results demonstrate an integrated computational and experimental approach to predict driver genes based on analyses of WGS data with phenotype association and chromatin organization datasets. These analyses nominate new pathogenic loci and have strong potential to improve the molecular diagnosis of patients with de novo SVs.
KeywordsStructural variation Copy number variants Neurodevelopmental disorders Intellectual disability Multiple congenital anomalies Driver genes Whole-genome sequencing Transcriptome sequencing Topologically associating domains Position effects
Human Phenotype Ontology
Reads per kilobase per million mapped reads
Single nucleotide variant
Topologically associating domain
Variant of unknown significance
De novo constitutional structural variations (SVs) including deletions, duplications, inversions, insertions, and translocations are important causes of (neuro-)developmental disorders such as intellectual disability and autism spectrum disorder (ASD) [1, 2]. Clinical genetic centers routinely use microarrays, as well as karyotyping in some cases, to detect SVs at kilo- to megabase resolution . The interpretation of the pathogenicity of an SV mainly relies on finding overlap with SVs in other patients with similar phenotypes [4, 5]. SVs can affect large genomic regions which can contain many genes and non-coding regulatory elements . This makes it challenging to determine which and how specific affected gene(s) and regulatory elements contributed to the phenotype of a patient. Therefore, the causative genes driving the phenotype are frequently unknown for patients with de novo SVs which can hamper conclusive genetic diagnosis.
SVs can have a direct effect on the expression and functioning of genes by altering their copy number or by truncating their coding sequences . In addition, SVs can indirectly influence the expression of adjacent genes by disrupting the interactions with their regulatory elements . New developments in chromatin conformation capture (3C)-based technologies such as Hi-C have provided the means to study these indirect, position effects . Most of the genomic interactions (loops) between genes and enhancers occur within megabase-sized topologically associating domains (TADs). These domains are separated from each other by boundary elements characterized by CTCF-binding, which limit the interactions between genes and enhancers that are not located within the same TAD [8, 9]. For several loci, such as the EPHA4 , SOX9 , IHH , and Pitx  loci, it has been demonstrated that disruption of TAD boundaries by SVs can cause rewiring of genomic interactions between genes and enhancers, which can lead to altered gene expression during embryonic development and ultimately in disease phenotypes . Although the organization of TADs appears to be stable across cell types, sub-TAD genomic interactions between genes and regulatory elements have been shown to be relatively dynamic and cell type-specific . Disruptions of genomic interactions are therefore optimally studied in disease-relevant cell types, which may be obtained from mouse models or from patient-derived induced pluripotent stem cells. However, it is not feasible to study each individual locus or patient with such elaborate approaches, and disease-relevant tissues derived from patients are usually not available. Therefore, it is not yet precisely known how frequently position effects contribute to the phenotypes of patients with developmental disorders.
A few computational tools such as SVScore and the Ensembl Variant Effect Predictor have been developed to predict the pathogenicity of SVs, but these mainly predict the potential direct impact of SVs on genes and do not take the specific phenotype of the patient into account [16, 17]. It has been shown that the use of computational methods based on combining phenotypic information from the Human Phenotype Ontology (HPO) database (phenomatching) with previously published chromatin interaction datasets can improve the interpretation of the molecular consequences of de novo SVs [18, 19, 20]. These approaches have largely been based on data derived from a small set of cell types and techniques. Here, we further expand these in silico approaches by integrating detailed phenotype information with genome-wide chromatin conformation datasets of many different cell types. By combining this method with whole-genome and transcriptome sequencing, we predicted which genes are affected by the SVs and which of these genes have likely been involved in the development of the disease phenotype (e.g., candidate driver genes). Accurate characterization of the effects of SVs on genes can be beneficial for the prediction of potential clinical relevance of the SVs. Detailed interpretation of the molecular effects of the SVs helped to identify candidate driver genes in 16 out of 39 patients who had an inconclusive diagnosis after conventional genetic testing. By applying the computational method on larger cohorts of patients with de novo SVs, we estimated the contribution of position effects for both balanced and unbalanced SVs.
Patient selection and phenotyping
A total of 39 individuals with de novo germline SVs and an inconclusive diagnosis were included in this study. Individuals P1 to P21 and their biological parents were included at the University Medical Center Utrecht (the Netherlands) under study ID NL55260.041.15 15-736/M. Individual P22, previously described by Redin et al. as UTR22 , and her parents were included at the San Luigi University Hospital (Italy). For individuals P23 to P39, lymphoblastoid cell lines (LCL) were previously derived as part of the Developmental Genome Anatomy Project (DGAP) of the Brigham and Women’s Hospital and Massachusetts General Hospital, Boston, MA, USA . Written informed consent was obtained for all included individuals and parents, and the studies were approved by the respective institutional review boards.
DNA and RNA extraction
Peripheral blood mononuclear cells (PBMCs) were isolated from whole blood samples of individuals P1 to P22 and their biological parents using a Ficoll-Paque Plus gradient (GE Healthcare Life Sciences) in SepMate tubes (STEMCELL Technologies) according to the manufacturer’s protocols. LCL derived from individuals P23 to P39 were expanded in RPMI 1640 medium supplemented with GlutaMAX (Thermo Fisher Scientific), 10% fetal bovine serum, 1% penicillin, and 1% streptomycin at 37 °C. LCL cultures of each individual were split into three flasks and cultured separately for at least 1 week to obtain technical replicate samples for RNA isolation. Genomic DNA was isolated from the PBMCs or LCL using the QIASymphony DNA kit (Qiagen). Total RNA was isolated using the QIAsymphony RNA Kit (Qiagen), and RNA quality (RIN > 8) was determined using the Agilent RNA 6000 Nano Kit.
Purified DNA was sheared into fragments of 400–500 bp using a Covaris sonicator. WGS libraries were prepared using the TruSeq DNA Nano Library Prep Kit (Illumina). WGS libraries were sequenced on an Illumina Hiseq X instrument generating 2 × 150 bp paired-end reads to a mean coverage depth of at least × 30. The WGS data was processed using an in-house Illumina analysis pipeline (https://github.com/UMCUGenetics/IAP). Briefly, reads were mapped to the CRCh37/hg19 human reference genome using BWA-0.7.5a using “BWA-MEM -t 12 -c 100 -M -R” . GATK IndelRealigner  was used to realign the reads. Duplicated reads were removed using Sambamba markdup .
Structural variant calling and filtering
Raw SV candidates were called with Manta v0.29.5 using standard settings  and Delly v0.7.2  using the following settings: “-q 1 -s 9 -m 13 -u 5.” Only Manta calls overlapping with breakpoint junctions called by Delly (± 100 bp) were selected. Rare SVs were selected by filtering against SV calls of 1000 Genomes  and against an in-house database containing raw Manta SV calls of ~ 120 samples (https://github.com/UMCUGenetics/vcf-explorer). De novo SVs were identified in individuals P1 to P22 by filtering the SVs of the children against the Manta calls (± 100 bp) of the father and the mother. Filtered SV calls were manually inspected in the Integrative Genome Viewer (IGV). The conformations of the complex derivative chromosomes were manually reconstructed based on genomic orientations of the filtered SV calls. De novo breakpoint junctions of individuals P1 to P21 were validated by PCR using AmpliTaq gold (Thermo Scientific) under standard cycling conditions and by Sanger sequencing. Primers were designed using Primer3 software (Additional file 1: Table S1). Breakpoint junction coordinates for individuals P22 to P39 were previously validated by PCR [21, 28].
Single nucleotide variant filtering
Single nucleotide variants and indels were called using GATK HaplotypeCaller. For individuals P1 to P21 (whose parents were also sequenced), reads overlapping exons were selected and the Bench NGS Lab platform (Agilent-Cartagenia) was used to detect possible pathogenic de novo or recessive variants in the exome. The identified single nucleotide variants were classified according to the American College of Medical Genetics and Genomics (ACMG) criteria. De novo variants were only analyzed if they affect the protein structure of the genes that are intolerant to missense and loss-of-function variants. Only putative protein-changing homozygous and compound heterozygous variants with an allele frequency of < 0.5% in ExAC  were reported.
RNA sequencing and analysis
RNA-seq libraries were prepared using TruSeq Stranded Total RNA Library Prep Kit (Illumina) according to the manufacturer’s protocol. RNA-seq libraries were pooled and sequenced on a NextSeq500 (Illumina) in 2 × 75 bp paired-end mode. Processing of RNA sequencing data was performed using a custom in-house pipeline (https://github.com/UMCUGenetics/RNASeq). Briefly, reads were aligned to the CRCh37/hg19 human reference genome using STAR 2.4.2a . The number of reads mapping to genes were counted using HTSeq-count 0.6.1 . Genes overlapping with SV breakpoints (e.g., truncated genes) were also analyzed separately by counting the number of reads mapping to exons per truncated gene fragment (up- and downstream of the breakpoint junction). RNA-seq data obtained from PBMCs (individuals P1 to P22) and LCL (individuals P23 to P39) were processed as separate datasets. The R-package DESeq2 was used to normalize raw read counts and to perform differential gene expression analysis for both datasets separately . Genes with more than 0.5 reads per kilobase per million (RPKM) mapped reads were considered to be expressed.
Cutoffs used to classify affected genes as T1, T2, or T3 candidate driver genes
1. Phenotype association
Disease association score (0–5)
pLI > 0.9
RVIS < 10
HI < 10
Total phenomatch score
Phenomatches (% of HPO terms with phenomatch score > 5)
Mode of inheritance
2. Effect of SV on gene
Support score (0–6)
3. Driver classification
Phenotype association + effect of SV on gene
Weak + weak
Strong + weak
Medium + strong
Strong + strong
Computational prediction of the effects of SVs on genes
For each patient, the protein-coding genes located at or adjacent (< 2 Mb) to the SVs were selected. The HPO terms linked to these genes in the HPO database were matched to each individual HPO term assigned to the patient and to the combination of the patient’s HPO terms. For each gene, the number of phenomatch scores higher than 1 (low phenomatches) and higher than 5 (high phenomatches) with individual patient HPO terms was calculated. The strength of the association (none, weak, medium, or strong) of each selected gene with the phenotype of the patient was determined based on the total phenomatch score, the number of low and high phenomatches, the mode of inheritance, and the disease association score (Table 1, Additional file 2: S1a).
Subsequently, potential direct and indirect effects of the SVs (none, weak, or strong) on the genes were predicted (Table 1, Additional file 2: Figure S1a). The prediction analyses were based on chromatin organization and epigenetic datasets of many different cell types obtained from previous studies (see Additional file 1: Table S2 for data sources).
First, we determined which TADs of 20 different cell types overlapped with the de novo SVs and which genes were located within these disrupted TADs [34, 35, 36] (Additional file 2: Figure S1b). To determine if the disrupted portions of the TADs contained regulatory elements that may be relevant for the genes located in the affected TADs, we selected the 3 cell types in which the gene is highly expressed based on RNA-seq data from the Encode/Roadmap projects  reanalyzed by Schmitt et al.  (Additional file 2: Figure S1C). The number of active enhancers (determined by chromHMM analysis of Encode/Roadmap ChIP-seq data ) in the TADs up- and downstream of the breakpoint junction in the 3 selected cell types was counted (Additional file 2: Figure SS1D). Virtual 4C was performed by selecting the rows of the normalized Hi-C matrices containing the transcription start site coordinates of the genes. The v4C profiles were overlapped with the breakpoint junctions to determine the portion of interrupted Hi-C interactions of the gene (Additional file 2: Figure S1e). In addition, promoter capture Hi-C data of 22 tissue types [38, 39, 40, 41] and DNAse hypersensitivity site (DHS) connections  were overlapped with the SV breakpoints to predict disruption of long-range interactions over the breakpoint junctions (Additional file 2: Figure S1f). Genes with at least a weak phenotype association and a weak SV effect are considered as T3 candidate genes. Genes were classified as T1 candidate drivers if they have a strong association with the phenotype and are strongly affected by the SV. Genes classified as T2 candidate driver can have a weak/medium phenotype association combined with a strong SV effect or they can have a medium/strong phenotype association with a weak SV effect (Fig. 2a, Table 1).
SV and phenotype information large patient cohorts
Breakpoint junction information and HPO terms for 228 individuals (excluding the individuals already included in this study for WGS and RNA-seq analysis) with mostly balanced SVs were obtained from Redin et al. . Phenotype and genomic information for 154 patients with de novo copy number variants ascertained by clinical genomic arrays were obtained from an in-house patient database from the University Medical Center Utrecht (the Netherlands).
WGS reveals hidden complexity of de novo SVs
In silico phenomatching approach links directly affected genes to phenotypes
Subsequently, we performed RNA sequencing on primary blood cells or lymphoblastoid cell lines derived from all 39 individuals to determine the impact of de novo SVs on RNA expression of candidate driver genes. RNA sequencing confirmed that most expressed genes directly affected by de novo deletions show a reduced RNA expression (97 of 107 genes with a median reduction of 0.46-fold compared to non-affected individuals) (Fig. 2d). Although duplicated genes show a median of 1.44-fold increase in expression, only 14 of 43 (~ 30%) of them are significantly overexpressed compared to the expression levels in non-affected individuals. In total, 87 genes are truncated by SVs and 4 of these are classified as T1/T2 candidate drivers. The genomic rearrangements lead to 12 possible fusions of truncated genes, and RNA-seq showed an increased expression for 2 gene fragments due to the formation of a fusion gene (Additional file 2: Figure S4, Additional file 1: Table S5). None of the genes involved in the formation of fusion genes were associated with the phenotypes of the patients, although we cannot exclude an unknown pathogenic effect of the newly identified fusion genes. We could detect expression for 3 deleted and 2 duplicated T1/T2 candidate drivers, and these were differentially expressed when compared to controls. The RNA sequencing data suggests that most genes affected by de novo deletions show reduced RNA expression levels and limited dosage compensation. However, increased gene dosage by de novo duplications does not always lead to increased RNA expression, at least in the blood cells of patients.
Prediction of position effects of de novo SVs on neighboring genes
In 28 of the included cases (72%), our prioritization method did not predict T1/T2 candidate driver genes that are directly affected by the de novo SVs. Therefore, we investigated the position effects on the genes surrounding the de novo SVs to explain the phenotypes in those cases that were not fully explained by directly affected candidate driver genes. We extended our candidate driver gene prioritization analysis by including all the protein-coding genes located within 2 Mb of the breakpoint junctions, as most chromatin interactions are formed between the loci that are less than 2 Mb apart from each other . Of the 2754 genes adjacent to the SVs, 117 are moderately to strongly associated with the specific phenotypes of the individuals based on the phenotype association analysis. However, this association with the phenotype does not necessarily mean that these genes located within 2 Mb of the breakpoint junctions are really affected by the SVs and thus contributing to the phenotype. To determine if the regulation of these genes was affected, we first evaluated the RNA expression levels of those genes. Three quarters (81/117) of the genes linked to the phenotypes were expressed, but only 9 of these showed reduced or increased expression (Fig. 2d). However, RNA expression in the blood may not always be a relevant proxy for most neurodevelopmental phenotypes [61, 62]. Therefore, we developed an extensive in silico strategy to predict potential disruption of the regulatory landscape of the genes surrounding the SVs (Additional file 2: Figure S1). Because the interactions between genes and their regulatory elements are cell type-specific, a large collection of tissue-specific Hi-C, TAD, promoter capture Hi-C (PCHiC), DNase hypersensitivity site (DHS), RNA, and ChIP-seq datasets was included (Additional file 1: Table S2). Several embryonic and neural cell type (such as fetal brain and neural progenitor cells) datasets were included that may be especially relevant to study the neurodevelopmental phenotypes in our cohort.
To predict potential disruption of the regulatory landscape of genes, we first selected for each of the assessed cell types the (1) TADs [34, 35, 36], (2) the PCHiC interactions [38, 39, 40, 41], and (3) DHS connections  overlapping with the transcription start site of each gene adjacent to the SVs. We overlapped these gene-specific genome conformation features with the breakpoint junctions of the identified SVs to determine the proportion of disrupted genomic interactions for each gene (the “Methods” section, Additional file 2: Figure S1). We also counted the number of enhancers (which are active in cell types in which the genes show the highest RNA expression ) that are located on disrupted portions of the TADs. Additionally, we performed virtual 4C (v4C) for each gene by selecting the rows of the normalized Hi-C matrices containing the transcription start site coordinates of the genes as viewpoints, because the coordinates of TAD boundaries can be dependent on the calling method and the resolution of the Hi-C [63, 64, 65] and because a significant portion of genomic interactions crosses TAD boundaries . Integrated scores for TAD disruption, v4C disruption, potential enhancer loss, disruption of PCHiC interactions, and DHS connections were used to calculate a position effect support score for each gene (Additional file 2: Figure S1). Finally, indirectly affected genes were classified as tier 1, 2, or 3 candidate drivers based on a combination of their association with the phenotype and their support score (Fig. 2a, Table 1).
Of the 117 genes that were associated with the phenotypes and located within 2 Mb of the SVs, 16 genes were predicted to be affected by the SVs based on the in silico analysis and therefore classified as T1/T2 candidate driver gene (Fig. 2b, Additional file 2: Figure S5). The validity of the approach was supported by the detection of pathogenic position effects identified in previous studies. For example, the regulatory landscape of SOX9 was predicted to be disturbed by a translocation 721 kb upstream of the gene in individual P5, whose phenotype is mainly characterized by acampomelic campomelic dysplasia with Pierre-Robin syndrome (PRS) including a cleft palate (Additional file 2: Figure S6). SVs in this region have been predicted to disrupt interactions of SOX9 with several of its enhancers further upstream, leading to phenotypes similar to the phenotype of individual P5 [66, 67]. In individual P39, who has been previously included in other studies, our method predicted a disruption of FOXG1 expression regulation due to a translocation (Additional file 2: Figure S1), further supporting the hypothesis that deregulation of FOXG1 caused the phenotype of this individual [21, 68].
Another example of a predicted position effect is the disruption of the regulatory landscape of the HOXD locus in individual P22. This individual has complex genomic rearrangements consisting of 40 breakpoint junctions on 4 different chromosomes likely caused by chromothripsis . One of the inversions and 1 of the translocations are located in the TAD upstream (centromeric) of the HOXD gene cluster (Fig. 2c). This TAD contains multiple enhancers that regulate the precise expression patterns of the HOXD genes during the development of the digits [43, 69, 70]. Deletions of the gene cluster itself, but also deletions upstream of the cluster, are associated with hand malformations [71, 72, 73]. The translocation in individual P22 disrupts 1 of the main enhancer regions (the global control region (GCR)), which may have led to altered regulation of the expression of HOXD genes, ultimately causing brachydactyly and clinodactyly in this patient.
Our approach predicted position effects on T1/T2 candidate driver genes in ten included cases (26%) of which eight cases have balanced or complex SVs. This suggests that these effects may be especially important for balanced SVs.
Prediction of driver genes improves molecular diagnosis
In silico driver gene prediction in larger patient cohorts
Subsequently, we also assessed the value of our driver prioritization approach for individuals with unbalanced copy number variants. We collected genetic and phenotypic information of 154 patients with a broad spectrum of (neuro-)developmental disorders who harbor de novo copy number variants (< 10 Mb) that were identified by clinical array-based copy number profiling (Additional file 2: Figure S7a,b; Additional file 1: Table S7). The CNVs in the majority (83%) of these individuals have been previously classified as pathogenic according to the clinical genetic diagnostic criteria (Additional file 2: Figure S7b). These criteria are mostly based on the overlap of the CNVs with CNVs of other individuals with similar phenotypes, and the causative driver genes were typically not previously specified. Our method identified T1/T2 candidate driver genes in 88/154 (57%) individuals, including 9/26 individuals with CNVs previously classified as VUS (Fig. 4a, Additional file 1: Table S6). Interestingly, support for position effects on candidate drivers was only found in 11% of the cases with CNVs, suggesting that pathogenic position effects are more common in patients with balanced SVs than in patients with unbalanced SVs (Fig. 4b). No driver genes were identified for 39% of the previously considered pathogenic CNVs (based on recurrence in other patients). In some cases, the potential drivers may remain unidentified because of incompleteness of the HPO database or insufficient description of the patient’s phenotypes. However, given the WGS results described for our patient cohort, it is also likely that some complexities of the CNVs may have been missed by the array-based detection method. The data also suggests that many disease-causing genes or mechanisms are still not known and that some SVs are incorrectly classified as pathogenic.
More than half of the patients with neurodevelopmental disorders do not receive a diagnosis after regular genetic testing based on whole-exome sequencing and microarray-based copy number profiling . Furthermore, the molecular mechanisms underlying the disease phenotype often remain unknown, even when a genetic variant is diagnosed as (potentially) pathogenic in an individual, as this is often only based on recurrence in patients with a similar phenotype. Here, we applied an integrative method based on WGS, computational phenomatching and prediction of position effects to improve the diagnosis, and molecular understanding of the disease etiology of individuals with de novo SVs.
Our WGS approach identified additional complexities of the de novo SVs previously missed by array-based analysis in 7 of 18 cases, supporting previous findings that WGS can have an added value in identifying additional SVs that are not routinely detected by microarrays [50, 80, 81]. Our results indicate that duplications in particular are often more complex than interpreted by microarrays, which is in line with previous studies [48, 82]. WGS can therefore be a valuable follow-up method to improve the diagnosis particularly of patients with copy number gains classified as VUS. Knowing the exact genomic location and orientation of SVs is important for the identification of possible position effects.
To systematically dissect and understand the impact of de novo SVs, we developed a computational tool based on integration of HiC, RNA-seq, and ChIP-seq datasets to predict position effects of SVs on the regulation of gene expression. We combined these predictions with phenotype association information to identify candidate driver genes. In 9/39 of the complex cases, we identified candidate drivers that are directly affected by the breakpoint junctions of the SVs. Position effects of SVs have been shown to cause congenital disorders, but their significance is still unclear . Our method predicted position effects on genes associated with the phenotype in 28% and 11% of all studied cases with balanced and unbalanced de novo SVs, respectively. Previous studies estimated that disruptions of TAD boundaries may be the underlying cause of the phenotypes of ~ 7.3% patients with balanced rearrangements  and of ~ 11.8% of patients with large rare deletions . Our method identified a higher contribution of position effects in patients with balanced rearrangements mainly because our method included more extensive chromatin conformation datasets and also screened for effects that may explain smaller portions of the phenotypes. Our method, although it incorporates most of all published chromatin conformation datasets on untransformed human cells, focuses on the disruptions of interactions, which is a simplification of the complex nature of position effects. It gives an insight in the potential effects that lead to the phenotypes and prioritizes candidates that need to be followed up experimentally, ideally in a developmental context for proofing causality.
SVs can affect many genes, and multiple “disturbed” genes may together contribute to the phenotype. Indeed, in eight, cases we found support for the involvement of multiple candidate drivers that were affected by one or more de novo SVs. This supports previous findings that it can be important to consider multigenic effects to obtain a complete genetic diagnosis . Such multigenic effects may be especially important for patients with large and complex SVs affecting many genes. This may underlie the relatively high amount of multigenic effects we predicted in our cohort compared to previous, mainly exome sequencing-based work that found a contribution of multilocus variation in 4.9% of cases . In many of the studied cases, our method did not detect candidate drivers. This may be due to insufficient data or knowledge about the genes and regulatory elements in the affected locus and/or due to missing disease associations in the used databases. Additionally, de novo SVs are also frequently identified in healthy individuals in whom they do not have any pathogenic impact [83, 84, 85]. Some of the detected SVs of unknown significance may actually be benign and the disease caused by other genetic or non-genetic factors. The datasets underlying our computational workflow can be easily updated with more detailed data when emerging in the future, thereby enabling routine reanalysis of previously identified SVs. Moreover, our approach can be extended to study the consequences of SVs in different disease contexts such as cancer, where SVs also play a major causal role.
Interpretation of SVs is important for clinical diagnosis of patients with developmental disorders, but it remains a challenge because SVs can have many different effects on multiple genes. We developed an approach to gain a detailed overview of the genes and regulatory elements affected by de novo SVs in patients with congenital disease. We show that WGS, if not available as a first-tier test, can be useful as a second-tier test to detect variants that are not detected by exome- and array-based approaches.
We wish to thank all individuals who participated in this research and the participants of the Developmental Genome Anatomy projects and their families. We thank Giulia Pregno and Giorgia Mandrile for the clinical and biological study of individual P22, as well as the investigators (J. Gusella, E Liao, C. Morton, M Talkowski) for sharing the samples (P23–P39) and data for further research. We thank Utrecht Sequencing Facility (USEQ) for providing RNA sequencing service and data. USEQ is subsidized by the University Medical Center Utrecht, Hubrecht Institute and Utrecht University. We would also like to thank the Hartwig Medical Foundation for providing whole-genome sequencing services.
SM and JV performed the experiments and computational analyses. JG and RH ascertained and enrolled individuals P1 to P21 and provided the phenotypic information. JK and SM cultured the LCL. NB and LdlF performed the DNA and RNA isolations and lab support. SB, RJ, MJvR, and WK provided support for computational analyses. EvB collected the genomic and phenotypic information of individuals U1–U111. GP provided the material for individual P22. MET the provided LCL and information of individuals P22–P39 and performed the prior analyses of these cases. SM, JV, JG, JK, and EC designed the study. SM, JV, and EC wrote the manuscript. All authors read and approved the final manuscript.
This work is supported by the funding provided by the Netherlands Science Foundation (NWO) Vici grant (865.12.004) to Edwin Cuppen, as well as the National Institutes of Health (GM061354, MH115957, HD081256) to Michael Talkowski.
Ethics approval and consent to participate
All individuals or their parents provided written informed consent to participate in the study. The study was approved by the Medical Ethics Committee (METC) of the University Medical Center Utrecht (NL55260.041.15 15-736/M). The study was performed in accordance with the Declaration of Helsinki.
Consent for publication
All participants in this study provided consent for publication.
The authors declare that they have no competing interests.
- 19.Zepeda-Mendoza CJ, Ibn-Salem J, Kammin T, Harris DJ, Rita D, Gripp KW, et al. Computational prediction of position effects of apparently balanced human chromosomal rearrangements. Am J Hum Genet. 2017:1–12.Google Scholar
- 20.Kevin Y, Vincent G, Thomas G, Satisch S, Jacques P, Jean Baptiste B, et al. Looking for broken TAD boundaries and changes on DNA interactions: clinical guide to 3D chromatin change analysis in complex chromosomal rearrangements and chromothripsis. Methods Mol Biol. 2018;1769:197–208.CrossRefGoogle Scholar
- 23.McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2009;20:254–60.Google Scholar
- 49.Nazaryan-Petersen L, Eisfeldt J, Pettersson M, Lundin J, Nilsson D, Wincent J, et al. Replicative and non-replicative mechanisms in the formation of clustered CNVs are indicated by whole genome characterization. PLoS Genet. 2018:1–25.Google Scholar
- 71.Svensson AM, Curry CJ, South ST, Whitby H, Maxwell TM, Aston E, et al. Detection of a de novo interstitial 2q microdeletion by CGH microarray analysis in a patient with limb malformations, microcephaly and mental retardation. Am J Med Genet. 2007;1889:1885–9.Google Scholar
- 74.Webster E, Cho MT, Alexander N, Desai S, Naidu S, Bekheirnia MR, et al. De novo PHIP-predicted deleterious variants are associated with developmental delay, intellectual disability, obesity, and dysmorphic features. Cold Spring Harb Mol case Stud. 2016;2:a001172.PubMedPubMedCentralCrossRefGoogle Scholar
- 77.Tan TY, Gonzaga-Jauregui C, Bhoj EJ, Strauss KA, Brigatti K, Puffenberger E, et al. Monoallelic BMP2 variants predicted to result in haploinsufficiency cause craniofacial, skeletal, and cardiac features overlapping those of 20p12 deletions. Am J Hum Genet. 2017;101:985–94.PubMedPubMedCentralCrossRefGoogle Scholar
- 80.Stavropoulos DJ, Merico D, Jobling R, Bowdin S, Monfared N, Thiruvahindrapuram B, et al. Whole-genome sequencing expands diagnostic utility and improves clinical management in paediatric medicine. npj Genomic Med. 2016;1:15012.Google Scholar
- 85.Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Khera A V, et al. An open resource of structural variation for medical and population genetics. bioRxiv. 2019;578674, https://doi.org/10.1101/578674.
- 86.Middelkamp S, Vlaar JM, Giltay J, Korzelius J, Besselink N, Boymans S, et al. Sequencing datasets, European Genome-phenome Archive. 2019;https://www.ebi.ac.uk/ega/studies/EGAS00001003489.
- 87.Middelkamp S, Vlaar JM, Giltay J, Korzelius J, Besselink N, Boymans S, et al. Custom code, Github. 2019;https://github.com/UMCUGenetics/Complex_SVs.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.