Background

Developmental delay, intellectual disability, and related phenotypes (DD/ID) affect 1–2% of children and pose medical, financial, and psychological challenges [1]. Although many cases are genetic in origin, a large fraction remain undiagnosed, and many families undergo a “diagnostic odyssey” of numerous uninformative tests over many years. A lack of diagnosis undermines counseling and medical management and slows research toward improved educational or therapeutic options.

Standard clinical genetic testing for DD/ID includes karyotype, microarray, Fragile X, single-gene, gene panel, and/or mitochondrial DNA testing [2]. The first two tests examine an individual’s entire genome at low resolution, while the remaining tests offer higher resolution but cover only a small fraction of the genome. Whole-exome or whole-genome sequencing (WES or WGS, respectively) can identify genetic variants both broadly and at high resolution, and both hold great promise as effective diagnostic assays [3].

As part of the Clinical Sequencing Exploratory Research (CSER) consortium [4], we have sequenced 371 individuals with one or more DD/ID-related phenotypes. One hundred affected individuals (27%) were found to harbor a pathogenic or likely pathogenic (P/LP) variant, most of which were de novo. Sixteen percent of P/LP variation was identified upon reanalysis performed after initial assessment and return of results, supporting the value of systematic reanalysis of variant data to maximize clinical effectiveness. We also describe 21 variants of uncertain significance (VUS; a subset of the 42 total VUSs identified in this study) in 19 genes not currently associated with disease but which are intriguing candidates. The genomic data we generated and shared through dbGaP [5], ClinVar [6], and GeneMatcher [7] may prove useful to other clinical genetics labs and researchers. Our experiences and data strongly support the value of large-scale sequencing for clinical and research progress related to pediatric neurological disease.

Methods

Study participant population

Participants were enrolled at North Alabama Children’s Specialists in Huntsville, AL, USA. A parent or legal guardian was required to give consent for all probands and assent was obtained from those probands who were capable. Probands were required to be at least two years old, weigh at least 9 kg (19.8 lbs), and be affected with developmental and/or intellectual delays; more detailed information regarding enrollment, including phenotypic criteria, is provided in the “Supplemental Methods” (see Additional file 1).

Whole-exome and whole-genome sequencing

Blood samples were sent for sequencing at the HudsonAlpha Genomic Services Laboratory (http://gsl.hudsonalpha.org). Genomic DNA was isolated from peripheral blood, and WES (Nimblegen v3) or WGS was conducted to a mean depth of 71X or 35X, respectively, with > 80% of bases covered at ≥ 20X. WES was conducted on Illumina HiSeq 2000 or 2500 instruments; WGS was conducted on Illumina HiSeq X instruments. Reads were aligned and variants called according to standard protocols [8, 9]. A robust relationship inference algorithm (KING) was used to confirm familial relationships [10].

WGS copy number variant calling

Copy number variants (CNVs) were called from WGS BAM files using ERDS [11] and read depth [12]. Calls from the two methods that overlapped with at least 90% reciprocity, that had less than 50% overlap with segmental duplications, and that were observed in five or fewer unaffected parents were retained and analyzed for potential disease relevance. All CNVs within 5 kb of a known DD/ID gene or of an OMIM disease-associated gene [13], or intersecting one or more exons of any gene, were manually curated.
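The CNV retention rule can be expressed compactly. The following is a minimal sketch, assuming each caller's output has already been parsed into intervals annotated with the fraction lying in segmental duplications and the number of unaffected parents carrying a matching call; the CnvCall class and its fields are hypothetical and do not reflect the actual ERDS or read-depth output formats. Reciprocal overlap is computed as the shared length divided by the longer call, which is equivalent to requiring that the overlap cover at least 90% of each call.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CnvCall:
    """Hypothetical parsed CNV call (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    segdup_fraction: float = 0.0  # fraction of the call lying in segmental duplications
    parent_count: int = 0         # unaffected parents with a matching call


def reciprocal_overlap(a: CnvCall, b: CnvCall) -> float:
    """Shared length as a fraction of the longer call (0.0 if no overlap)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)


def retain_calls(caller_a: List[CnvCall], caller_b: List[CnvCall],
                 min_reciprocal: float = 0.9, max_segdup: float = 0.5,
                 max_parents: int = 5) -> List[CnvCall]:
    """Keep caller_a calls supported by caller_b that also pass the other filters."""
    kept = []
    for a in caller_a:
        supported = any(reciprocal_overlap(a, b) >= min_reciprocal for b in caller_b)
        if supported and a.segdup_fraction < max_segdup and a.parent_count <= max_parents:
            kept.append(a)
    return kept
```

The 5 kb proximity rule for manual curation could be added as a separate pass over the retained calls.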

Filtering and reanalysis

Using filters related to call quality, allele frequency, and impact predictions, we searched for rare, damaging de novo variation or inherited X-linked, recessive or compound heterozygous variation in affected probands, with modifications for probands with only one (duos) or neither (singletons) biological parent available for sequencing.
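For a complete trio, the inheritance models named above can be sketched as simple genotype rules. This is an illustrative simplification, not the study's pipeline: it assumes variants have already passed the quality, frequency, and impact filters, encodes genotypes as alternate-allele counts (0, 1, 2), treats any non-reference call on chrX in a male proband as hemizygous (ignoring pseudoautosomal regions), and omits the duo and singleton modifications. The Variant class and the gene names in the example are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Variant:
    gene: str
    chrom: str
    pos: int
    proband: int  # alternate-allele count in the proband (0, 1, or 2)
    mother: int
    father: int


def classify_trio(variants: List[Variant], proband_is_male: bool) -> Dict[str, List[Variant]]:
    """Group pre-filtered rare, damaging variants by candidate inheritance model."""
    hits: Dict[str, List[Variant]] = defaultdict(list)
    hets: Dict[str, Dict[str, List[Variant]]] = defaultdict(lambda: {"maternal": [], "paternal": []})
    for v in variants:
        if v.proband >= 1 and v.mother == 0 and v.father == 0:
            hits["de_novo"].append(v)
        elif v.chrom == "chrX" and proband_is_male:
            # Simplification: any alternate call in a male on chrX is treated as hemizygous.
            if v.proband >= 1 and v.mother == 1 and v.father == 0:
                hits["x_linked_maternal"].append(v)
        elif v.proband == 2 and v.mother == 1 and v.father == 1:
            hits["homozygous_recessive"].append(v)
        elif v.proband == 1 and v.mother == 1 and v.father == 0:
            hets[v.gene]["maternal"].append(v)
        elif v.proband == 1 and v.father == 1 and v.mother == 0:
            hets[v.gene]["paternal"].append(v)
    # Compound heterozygous: at least one maternal and one paternal het in the same gene.
    for gene, sides in hets.items():
        if sides["maternal"] and sides["paternal"]:
            hits["compound_heterozygous"].extend(sides["maternal"] + sides["paternal"])
    return dict(hits)
```

A single inherited heterozygous variant with no partner allele in the same gene is deliberately left unreported, since it fits none of the models above.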

We also searched parents for potential secondary findings (i.e. variants that are medically relevant but unrelated to the proband’s DD/ID). We assessed variants in the 56 genes flagged by the American College of Medical Genetics and Genomics (ACMG) as potentially harboring medically actionable, highly penetrant genetic variation [14] and in genes associated with recessive disease in OMIM [13], as well as carrier status for CFTR, HBB, and HEXA.

We also searched for all those variants listed as pathogenic or likely pathogenic in ClinVar [6], regardless of inheritance or affected status. Further details for variant annotation and filtration are supplied in “Supplemental Methods” (see Additional file 1).

For reanalysis, variants were reannotated with additional data, including updated versions of ClinVar [6], ExAC [15], and DDG2P [16] and gene or variant lists from publications on DD/ID genetics [17,18,19], and were refiltered as described above and in “Supplemental Methods” (see Additional file 1). Candidate variants in genes that either were not known to be associated with disease or were found in individuals whose phenotypes differed from previously reported associations were submitted to GeneMatcher (https://genematcher.org/) [7].

Variant classification

Variants were classified into one of five categories: pathogenic; likely pathogenic; VUS; likely benign; or benign. Our study began prior to publication of the formal classification system proposed by the ACMG [20], although our evidence and interpretation criteria are conceptually similar. Multiple lines of evidence, with mode of inheritance, allele frequency in population databases, and quality of previously reported disease associations weighing most heavily, are required to support assignments of pathogenicity. The “Supplemental Methods” section (see Additional file 1) contains a detailed description of our assertion criteria and these criteria are also available via ClinVar [6]. The key annotations, including mode of inheritance, allele frequencies, PubMed identifiers, and computational inferences of variant effect, used to support the disease relevance of each variant are supplied in Additional file 2: Table S1.

Variant validation

WES and WGS were carried out under a research protocol and were not completed within a CAP/CLIA laboratory. All variants found to be medically relevant and returnable were validated by Sanger sequencing in an independent CLIA laboratory (Emory Genetics Laboratory) before being returned to participants, although these validated variant results are not CLIA-compliant as the input DNA was originally isolated in a research laboratory.

Analysis of trios as singletons

For probands who underwent WGS as part of trios, we removed parental genotype information from their VCFs and then filtered to identify variants that are expected to be extremely rare in the general population and/or affect genes known to be associated with disease. Scores from the Combined Annotation Dependent Depletion (CADD) algorithm [21] were then used to rank the variants in each proband’s filtered subset, and the rank of each previously identified P/LP variant was recorded. See “Supplemental Methods” (Additional file 1) for details.
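A minimal sketch of parent-blind filtering followed by CADD ranking is shown below. It assumes each variant is a dict with a population allele frequency, a consequence label, a gene symbol, and a CADD phred score; the field names, the 0.1% frequency cutoff, and the consequence labels are illustrative assumptions rather than the study's exact filters, which are described in the Supplemental Methods.

```python
from typing import Dict, List, Optional, Set

# Hypothetical consequence labels for "protein-altering" variants
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def parent_blind_filter(variants: List[Dict], disease_genes: Set[str],
                        max_af: float = 0.001) -> List[Dict]:
    """Keep rare, protein-altering variants in known disease genes; no parental data used."""
    return [v for v in variants
            if v["af"] <= max_af
            and v["consequence"] in PROTEIN_ALTERING
            and v["gene"] in disease_genes]


def cadd_rank(variant_id: str, variants: List[Dict]) -> Optional[int]:
    """1-based rank of a variant after sorting by CADD (highest first); None if absent."""
    ordered = sorted(variants, key=lambda v: v["cadd"], reverse=True)
    for rank, v in enumerate(ordered, start=1):
        if v["id"] == variant_id:
            return rank
    return None
```

A rank of None means the known P/LP variant was removed by the filter, i.e. the filter was not sensitive enough for that proband.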

Functional assays

RNA isolation, complementary DNA (cDNA) synthesis, quantitative polymerase chain reaction (qPCR), and western blotting were conducted according to standard protocols. Details are provided in “Supplemental Methods” (see Additional file 1).

Results

Demographics of study population

We enrolled 339 families (977 individuals total) with at least one proband with an unexplained DD/ID-related phenotype (see “Study participant population” in “Supplemental Methods” – Additional file 1). A total of 284 participating families were enrolled with both biological parents; 261 of these families had one affected proband, 21 families had two affected probands, and two families had three affected probands. As each proband (including siblings within a family) was used to anchor a proband-parent “trio” as an analytical unit, our study includes a total of 309 trios from 284 families. We also enrolled 35 proband-parent “duos” that included one proband and one biological parent. Additionally, we enrolled two families with one biological parent and two affected probands (four “duos”) and one duo family with three affected probands (three “duos”), for a total of 42 “duos” from 38 families. Finally, we enrolled 17 “singleton” families in which no parents were available for testing; in 14 of these only one proband was tested, and in three families two affected siblings were sequenced (a total of 20 “singleton” probands).
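The family and proband counts above can be cross-checked with simple arithmetic. Each tuple below is (number of families, affected probands per family), taken directly from the text; each proband anchors one analytical unit, and sequenced parents are counted as two per trio family and one per duo family.

```python
from typing import List, Tuple

# (number of families, affected probands per family), as given in the text
TRIO_FAMILIES = [(261, 1), (21, 2), (2, 3)]
DUO_FAMILIES = [(35, 1), (2, 2), (1, 3)]
SINGLETON_FAMILIES = [(14, 1), (3, 2)]


def tally(groups: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (families, probands) for one family structure."""
    families = sum(n for n, _ in groups)
    probands = sum(n * per_family for n, per_family in groups)
    return families, probands


trio_fam, trios = tally(TRIO_FAMILIES)             # 284 families, 309 trios
duo_fam, duos = tally(DUO_FAMILIES)                # 38 families, 42 duos
single_fam, singles = tally(SINGLETON_FAMILIES)    # 17 families, 20 singletons

parents = 2 * trio_fam + 1 * duo_fam               # 606 sequenced parents
print(trio_fam + duo_fam + single_fam, trios + duos + singles, trios + duos + singles + parents)
```

This prints 339 families, 371 affected probands, and 977 individuals, which also matches the 365 WES plus 612 WGS individuals reported in the next paragraph.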

During the course of this study, we decided to replace WES with WGS. In total, WES was performed on 365 individuals (127 affected) and WGS on 612 individuals (244 affected). WES and WGS data had average depths of 71X and 35X, respectively, with > 80% of bases covered at ≥ 20X for both assays. For probands who underwent WES, DNA was also analyzed with a SNP array to detect CNVs if clinical array testing had not been performed previously.

The study population had a mean age of 11 years and was 58% male. Affected individuals displayed symptoms described by 333 unique HPO [22] terms with over 90% of individuals displaying intellectual disability, 69% with speech delay, 45% with seizures, and 20% with microcephaly or macrocephaly. Of the affected individuals, 18% had an abnormal brain magnetic resonance imaging (MRI) result and 81% had been subjected to genetic testing prior to enrollment in this study (Table 1).

Table 1 Pathogenic/Likely pathogenic rates by clinical annotation and family structure among the 371 DD/ID-affected individuals

DD/ID-associated genetic variation

WES and WGS data were processed with standard protocols to produce variant lists in each family that were subsequently annotated and filtered; filtered variant lists were subject to manual review (see “Methods”). KING, a robust relationship inference algorithm, was used to confirm familial relationships [10]. Variant pathogenicity was classified based on allele frequency, inheritance status, published reports, computational deleteriousness predictions, and other sources of evidence; these assertion criteria are described in detail in the “Supplemental Methods.” All variants described here were confirmed by Sanger sequencing (see “Methods”) in probands and available family members before being returned to participants.

One hundred (27%) of the 371 probands had P/LP variants, while an additional 42 (11.3%) harbored a VUS (Table 2). Because most probands had undergone microarray testing before enrolling in this study, large CNVs were detected in only 11 affected individuals; three were classified as VUSs, while the remaining eight were P/LP (Table 2; Additional file 2: Table S1; Additional file 3: Figure S1).

Table 2 Results of WES and/or WGS for 371 DD/ID-affected individuals

Most (76%) P/LP variation occurred de novo, while 12% of individuals inherited P/LP variants as compound heterozygotes or homozygotes (Additional file 3: Figure S2A). An additional 5% of individuals were males with an X-linked maternally inherited P/LP variant. Finally, 7% of participants who harbored a P/LP result were sequenced with one or no biological parent and thus have unknown inheritance (Additional file 3: Figure S2A). Most P/LP variants were missense mutations (52%), while 39% were nonsense or frameshift, 7% were predicted to disrupt splicing, and 2% were in-frame deletions (Additional file 3: Figure S2B). Excluding large CNVs, variants classified as a VUS or higher were identified in 97 genes, with variants in 23 (24%) of these genes observed in two or more unrelated individuals (Additional file 2: Table S1; Additional file 4: Table S2).

Pathogenic/likely pathogenic variant rates across families of varying structure and phenotypic complexity

Affected individuals were categorized into one of three analytical structures based on the number of parents that were sequenced along with the proband(s): proband-parent trios (309); duos with one parent (42); and proband-only singletons (20). A P/LP result was found in 29.1% of trio individuals, 19% of duo individuals, and 15% of singletons (Table 1).

We believe that at least some of the decline in P/LP variant yield in duos and singletons reflects the analytical benefit of trio sequencing for efficiently highlighting de novo variation. However, because one or both biological parents were unavailable or unwilling to participate in duo or singleton analyses, P/LP rate comparisons among trios, duos, and singletons may be confounded by other disease-associated factors (depression, schizophrenia, ADHD, etc.). For example, most (11 of 20) of the singleton probands were adopted owing to death or disability associated with neurological disease in their biological parents. To assess the relationship between identification of a P/LP variant and family history, we separated families into three types: simplex families, in which there was only one affected proband and no first- to third-degree relatives reported to be affected with any neurological condition (n = 93); families in which the enrolled proband had no affected first-degree relatives but one or more second- or third-degree relatives reported to be affected with a neurological condition (n = 85); and multiplex families, in which the proband had at least one first-degree relative affected with a neurological condition (n = 123) (Additional file 5: Table S3). Thirty-eight families with limited or no family history information were excluded from this analysis.

P/LP variants were found in 24 (19.5%) of the 123 multiplex families (20 of 97 trios), in contrast with 35 (37.6%) of the 93 simplex families (31 of 80 trios), suggesting that the P/LP identification rate is roughly twice as high for simplex as for multiplex families. While larger sample sizes are needed to confirm this effect, the rate difference is significant whether all enrolled families (p = 0.002) or only those sequenced as trios (p = 0.008) are considered. Rates in families that were neither simplex nor multiplex (i.e. the proband lacks an affected first-degree relative but has one or more affected second- or third-degree relatives) were intermediate, with 26% of all such families having a P/LP result (28% of trios). Of relevance to the trio/duo/singleton comparison described above, 11 of 13 (85%) singletons for which we had family history information had an affected first-degree relative, in contrast with 41% of duos and 39% of trios (Additional file 5: Table S3). This enrichment for affected first-degree relatives likely contributed to the reduced rate of P/LP variants in singletons observed here.
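The text does not state which statistical test produced these p values. As one hedged illustration, a two-sided Fisher's exact test on the 2 × 2 table of P/LP versus no P/LP results in simplex versus multiplex families can be computed with the standard library alone; the resulting p values need not match those reported exactly if a different test was used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def pmf(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = pmf(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so floating-point ties with p_obs are included.
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: simplex, multiplex; columns: P/LP result, no P/LP result
p_all = fisher_exact_two_sided(35, 93 - 35, 24, 123 - 24)
p_trios = fisher_exact_two_sided(31, 80 - 31, 20, 97 - 20)
print(f"all families: p = {p_all:.4f}; trios only: p = {p_trios:.4f}")
```

Exact integer binomial coefficients avoid overflow for tables of this size; only the final ratios are converted to floating point.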

Findings in multiplex families include examples of both expected and unexpected inheritance patterns. For example, two affected male siblings were hemizygous for a nonsense mutation in PHF6 (Börjeson-Forssman-Lehmann syndrome MIM:301900) inherited from their unaffected mother. In another family, the proband was compound heterozygous for two variants in GRIK4, with one allele inherited from each parent. Interestingly, both parents of this proband report psychiatric illness, and the extended family history of psychiatric phenotypes is notable. While these data are insufficient to conclude that the GRIK4 variants are causative, it is plausible that the psychiatric phenotypes observed in this family are at least partially attributable to them. We also found two instances in which related probands harbored distinct returnable de novo variants. Affected siblings in family 00135 each harbored a returnable de novo variant in a different gene, a VUS in SPR (Dystonia MIM:612716) and a pathogenic variant in RIT1 (Noonan syndrome MIM:615355), while two probands (00075-C and 00078-C) who were second-degree relatives harbored independent pathogenic de novo variants, one in DDX3X (X-linked ID MIM:300958) and the other in TCF20 (Additional file 2: Table S1).

Alternative mechanisms of disease

While the majority of DD/ID-associated genetic variation found here is predicted to have missense, frameshift, or nonsense effects (Additional file 3: Figure S2B), a subset of probands harbor variants predicted to disrupt splicing that may, in some cases, act through alternative mechanisms of disease. As an example, we sequenced an affected 14-year-old girl (00003-C, Additional file 2: Table S1) who presented with severe ID, seizures, speech delay, autism, and stereotypic behaviors. WES revealed an SNV within the splice acceptor site of intron 2 in MECP2 (c.27-6C > G, MIM:312750), identical to a previously observed de novo variant in a 5-year-old girl with several features of Rett syndrome who lacked deceleration of head growth and showed typical growth [23]. Laccone et al. showed by qPCR that the variant creates a cryptic splice acceptor site that adds five nucleotides to the messenger RNA, resulting in a frameshift (p.R9fs24X) [23]. It is likely that both the canonical and cryptic splice sites are used, allowing most MECP2 transcripts to encode full-length protein and resulting in the milder Rett phenotype observed in the individual described here and in the girl described by Laccone et al. [23].

In another affected proband (00126-C), we identified compound heterozygous variants in ALG1 (Additional file 2: Table S1). This proband has phenotypes consistent with ALG1-CDG (congenital disorder of glycosylation MIM:608540), including severe ID, hypotonia, growth retardation, microcephaly, and seizures, and was included in a comprehensive study of ALG1-associated phenotypes [24]. The paternally inherited missense mutation (c.773C > T (p.S258L)) has been previously reported as pathogenic [25], while the maternally inherited variant (c.1187 + 3A > G), which had not been observed before, lies three bases downstream of an exon/intron junction (Fig. 1a). qPCR of patient blood RNA showed that transcripts retaining the entire intron 11 of ALG1 are present in both the proband and the mother (Fig. 1a–d). Retention of intron 11 adds 84 nucleotides (28 codons) before a premature stop codon.
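Whether a retained intronic segment preserves or shifts the reading frame, and how many codons are added before a premature stop, can be checked by scanning codons from the insertion point. The sketch below assumes the retained sequence is inserted exactly at a codon boundary (a phase 0 intron) and that sequences are given on the coding strand; the example sequences in the tests are synthetic, not the ALG1 or MTOR sequences.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def insertion_effect(inserted: str, downstream: str) -> Tuple[bool, Optional[int]]:
    """Frame effect of inserting `inserted` at a codon boundary upstream of `downstream`.

    Returns (in_frame, codons_before_stop): in_frame is True when the insertion length
    is a multiple of three; codons_before_stop counts the sense codons read from the
    insertion point until the first stop, or None if no stop is reached.
    """
    seq = (inserted + downstream).upper()
    in_frame = len(inserted) % 3 == 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return in_frame, i // 3
    return in_frame, None
```

For example, under the phase 0 assumption an 84-nucleotide addition is a multiple of three and keeps the downstream frame, so the premature stop must be encoded within the retained intron itself, whereas a 134-nucleotide addition (134 mod 3 = 2) shifts the frame of all downstream codons.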

Fig. 1

Intronic variants in ALG1 and MTOR disrupt splicing and introduce early stop codons. a Diagram showing the region of ALG1 surrounding the variant found in the proband and mother, an A > G transition three nucleotides downstream from the splice donor site of intron 11. E = exon. b The ALG1 variant leads to increased retention of intron 11. cDNA from patient-derived blood RNA was amplified using the PCR F/PCR R primer set (shown in panel a) to test for intron 11 retention. Control samples are cDNA derived from blood RNA of an unrelated individual and of the proband’s father, who did not harbor the variant. The proband and the mother, from whom the variant was transmitted, both carry the incorrectly spliced transcript retaining intron 11. Control reactions lacking RT were also performed and did not show the PCR product containing the fully retained intron (data not shown). c, d qPCR analysis shows that the variant leads to inclusion of the entire intron 11. Controls are two unrelated individuals and the father of the proband. The affected individuals are the proband and mother. e Diagram showing the region of MTOR surrounding the variant, an A > G transition two nucleotides upstream of the splice acceptor site. E = exon. f The region surrounding intron 4 was amplified using PCR F and PCR R (positions indicated in e) and shows partial retention of the intron. The retained partial intron was not detected in control reactions lacking RT (data not shown). g, h qPCR from blood RNA shows that the 5′ splice site is not affected by the variant but the 3′ acceptor site is, leading to partial retention (134 bp) of intron 4. Controls included unrelated individuals and the maternal half aunt of the proband. Affected individuals are the proband and half-sibling. For all qPCR analyses, RNA was extracted from blood and ΔΔCT values were calculated as a percent of affected individuals and normalized to GAPDH.
The sequences of all oligos used are found in Additional file 3: Table S7.
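The qPCR quantification referred to in the Fig. 1 legend (ΔΔCT normalized to GAPDH) is commonly performed with the 2^(-ΔΔCt) method of Livak and Schmittgen. The sketch below assumes that method and roughly 100% amplification efficiency for both the target assay and the reference gene; the legend does not specify the exact calculation, so the function names, the choice of calibrator, and the Ct values in the tests are illustrative.

```python
from statistics import mean
from typing import Sequence, Tuple

# (target Ct replicates, reference-gene Ct replicates), e.g. an intron-spanning assay vs GAPDH
Sample = Tuple[Sequence[float], Sequence[float]]


def delta_ct(sample: Sample) -> float:
    """Mean target Ct minus mean reference Ct for one sample."""
    target, reference = sample
    return mean(target) - mean(reference)


def relative_expression(sample: Sample, calibrator: Sample) -> float:
    """Fold change of sample relative to calibrator by 2^(-ddCt), assuming ~100% efficiency."""
    ddct = delta_ct(sample) - delta_ct(calibrator)
    return 2 ** (-ddct)


def percent_of_calibrator(sample: Sample, calibrator: Sample) -> float:
    """Express the fold change as a percentage of the calibrator (calibrator = 100%)."""
    return 100 * relative_expression(sample, calibrator)
```

Because each Ct unit corresponds to a doubling under the efficiency assumption, a sample whose normalized Ct is two cycles lower than the calibrator shows a fourfold higher level of the target transcript.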

In a separate family consisting of affected maternal half siblings (00218-C and 00218-S, Additional file 2: Table S1, Fig. 1e), we found a variant in the canonical splice acceptor site (c.505-2A > G) of MTOR intron 4. Both half siblings have ID; the younger sibling has no seizures but has facial dysmorphism, speech delay, and autism, while his older sister has seizures. We presume that the half siblings inherited the splice variant from their mother, who was reported to have seizures and for whom DNA was not available. qPCR and Sanger sequencing of blood-derived RNA from both siblings revealed transcripts that included an additional 134 nucleotides from the 3′ end of intron 4, leading to the addition of 20 amino acids before a premature stop codon (Fig. 1f–h, Additional file 3: Figure S3). Because the stop-gain occurs early in the protein, this splice variant likely leads to MTOR loss-of-function. Mutations in MTOR are associated with a broad spectrum of phenotypes including epilepsy, hemimegalencephaly, and intellectual disability [26]. However, previously reported pathogenic variants in MTOR are all missense and suspected to result in gain-of-function [27]. Owing to this mechanistic uncertainty, we have classified this splice variant as a VUS. Nevertheless, given the phenotypic overlap between this family and previously reported families, we find this variant highly intriguing and suggestive that MTOR loss-of-function variation may also lead to disease. MTOR is highly intolerant of mutations in the general population (RVIS [28] percentile of 0.09%), supporting the hypothesis that loss-of-function is deleterious and likely has disease consequences.

Proband-only versus trio sequencing

Our trio-based study design allows rapid identification of de novo variants, which are enriched among variants causally related to deleterious pediatric phenotypes [29]. However, we also assessed how our P/LP rate would have differed had we enrolled only probands. To do so while avoiding confounding by the family-history differences among trios, duos, and singletons (see above), we applied various filtering scenarios, blinded to parental genotypes, to WGS variants from all trio probands and assessed the CADD [21] ranks of de novo variants previously classified as P/LP (Fig. 2; Additional file 6: Table S4). While parentally informed filters were the most sensitive and efficient (e.g. > 60% of P/LP variants were the top-ranked variant among all de novo events in the respective proband), filters defined without parental information were also effective. For example, among all rare, protein-altering (i.e. missense, nonsense, frameshift, or canonical splice-site) variants in genes associated with Mendelian disease in OMIM [13] or with DD/ID in DDG2P [16], 20% of P/LP variants were the top-ranked variant in the given proband, most ranked among the top five, and > 80% ranked among the top 25. These data suggest that most P/LP variants could be found in probands analyzed without parental information, although additional curation time, likely in proportion to the drop in P/LP variant rank within any given filtered subset, would be required (Additional file 6: Table S4).
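Summaries such as "top-ranked", "top five", and "top 25" can be computed from the per-proband rank of each known P/LP variant under a given filter. In this sketch a rank of None marks a P/LP variant that the filter removed, and counting it as a miss is an assumed design choice so that filter sensitivity and ranking are summarized together; the ranks in the example are hypothetical, not study data.

```python
from typing import Dict, Optional, Sequence


def top_k_fractions(ranks: Sequence[Optional[int]],
                    ks: Sequence[int] = (1, 5, 25)) -> Dict[int, float]:
    """Fraction of P/LP variants ranked within the top k of their proband's filtered list.

    A rank of None means the variant was removed by the filter and counts as a miss.
    """
    n = len(ranks)
    return {k: sum(1 for r in ranks if r is not None and r <= k) / n for k in ks}
```

For example, ranks of [1, 3, 7, 30, None] give 0.2 within the top 1, 0.4 within the top 5, and 0.6 within the top 25.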

Fig. 2

Ranks of pathogenic/likely pathogenic variants filtered without parental data relative to trio-defined de novo events. Most pathogenic/likely pathogenic variants, even under models that only consider population frequencies (e.g. “Rare”), rank (based on CADD) among the top 25 hits in a patient, and many rank as the top hit. Restrictions to rare coding variants and/or those affecting OMIM/DDG2P [13, 16] genes further enrich for causal variants among top candidates, making diagnosis feasible without parents

In contrast to P/LP variants, VUSs would have been more difficult to identify without parental sequencing (Additional file 3: Figure S4), because many VUSs affect genes not known to be associated with disease. Moreover, VUSs in genes known to be associated with disease tended to have smaller computationally predicted effects and therefore lower CADD ranks [21]; had they been more overtly deleterious, they would likely have been classified as P/LP. Discovery of candidate or novel disease associations, many of which are likely eventually to be confirmed, is thus substantially more effective in trios.

Secondary findings in participating parents

We found genetic variation unrelated to DD/ID, i.e. secondary findings, in 8.7% of parents (Additional file 7: Table S5). Among parents, 1.5% harbored a P/LP variant related to a self-reported secondary condition, such as variants in SLC22A5 that underlie primary carnitine deficiency (MIM:212140). We also examined the 56 genes identified by the ACMG as potentially harboring actionable secondary findings [14], revealing P/LP variants in 12 parents (2.0%), a rate similar to that observed in other cohorts [14, 30]. Finally, we performed a limited carrier screening assessment, identifying 28 (4.6%) parents as carriers of P/LP variation in HBB (sickle cell anemia MIM:603903), HEXA (Tay-Sachs disease MIM:272800), or CFTR (cystic fibrosis MIM:219700). We also assessed parents as mate pairs, searching for genes in which both partners are heterozygous for a P/LP recessive allele. These analyses identified one parental pair (among 285 total) in which both partners carry variants in ATP7B, which is associated with Wilson disease (MIM:277900).
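The mate-pair search reduces to intersecting the recessive-disease genes in which each partner carries a heterozygous P/LP allele. The sketch below assumes per-parent results are already available as a mapping from gene symbol to variant identifiers; the identifiers in the tests are placeholders, and the 25% figure is the standard Mendelian chance of an affected child when both partners are heterozygous carriers, assuming full penetrance.

```python
from typing import Dict, List, Set


def shared_carrier_genes(mother: Dict[str, List[str]], father: Dict[str, List[str]],
                         recessive_genes: Set[str]) -> Dict[str, dict]:
    """Recessive-disease genes in which both partners carry a P/LP allele.

    mother/father map gene symbol -> P/LP variant identifiers found heterozygous in that parent.
    """
    report = {}
    for gene in sorted(mother.keys() & father.keys() & recessive_genes):
        report[gene] = {
            "mother": mother[gene],
            "father": father[gene],
            # Mendelian chance of an affected child for two heterozygous carriers
            "risk_per_pregnancy": 0.25,
        }
    return report
```

Restricting the intersection to recessive-disease genes keeps dominant-gene findings, which are reported per individual, out of the couple-level analysis.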

Reanalysis of WES and WGS data

To take advantage of steady increases in human genetic knowledge, we performed systematic reanalyses of WES/WGS data in three ways: (1) systematic reanalysis of existing data, with the goal of reassessing each dataset every 12 months after initial analysis; (2) targeted mining of variants prompted by new publications on DD/ID genetics; and (3) use of GeneMatcher [7] to aid in interpreting variants in genes of uncertain disease significance.

As shown in Table 3, these efforts led to an increase in pathogenicity score for 15 variants in 17 individuals. In nine cases, a new publication allowed a variant that had not been previously reported, or that was previously reported as a VUS, to be reclassified as P/LP. Three additional changes resulted from discussions facilitated by GeneMatcher [7], while the remaining upgrades resulted from reductions in filter stringency (changes to read depth and batch allele frequency thresholds) or clarification of the clinical phenotype. Among all 44 variants originally classified as VUSs, five (11.4%) have been upgraded. The most rapid change affected a de novo variant in DDX3X, which was upgraded from VUS to pathogenic approximately one month after initial assessment, while a de novo disruption of EBF3 was upgraded from VUS to pathogenic approximately 2.5 years after initial assessment. VUSs associated with DD/ID, especially when identified via parent-proband trio sequencing, thus have considerable potential for upgrade. Additionally, of the 211 families who originally received a negative result, P/LP variation was identified for ten (4.7%) through reanalysis. These data show that regular reanalysis of both uncertain and negative results is an effective way to improve diagnostic yield.
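Tracking reclassifications between two analysis rounds is a simple comparison of classification snapshots. The sketch below assumes curators have already reclassified variants after reannotation and that each round is stored as a mapping from a variant identifier to one of the five classes used here; the identifiers are hypothetical, and the code only records which variants moved toward pathogenic.

```python
from typing import Dict, List

CLASS_ORDER = ["benign", "likely benign", "VUS", "likely pathogenic", "pathogenic"]
CLASS_RANK = {label: i for i, label in enumerate(CLASS_ORDER)}


def upgraded(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Variant IDs whose classification moved toward pathogenic between two analyses."""
    return sorted(v for v, label in current.items()
                  if v in previous and CLASS_RANK[label] > CLASS_RANK[previous[v]])


def vus_upgrade_rate(previous: Dict[str, str], current: Dict[str, str]) -> float:
    """Fraction of variants previously classified as VUS that have since been upgraded."""
    vus = [v for v, label in previous.items() if label == "VUS"]
    if not vus:
        return 0.0
    up = [v for v in vus if CLASS_RANK[current.get(v, "VUS")] > CLASS_RANK["VUS"]]
    return len(up) / len(vus)
```

With 44 original VUSs of which five are upgraded, the rate is 5/44, or about 11.4%.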

Table 3 Variants with an increase in pathogenicity score due to reanalysis

Identification of novel candidate genes

We have identified 21 variants within 19 genes that have no known disease association but are interesting candidates. For example, in one proband (00265-C) we identified an early nonsense variant (c.2140C > T (p.R714X), CADD score 44) in ROCK2, with reduction of ROCK2 protein confirmed by western blot (Additional file 3: Figure S5). ROCK2 is a conserved Rho-associated serine/threonine kinase involved in a number of cellular processes including actin cytoskeleton organization, proliferation, apoptosis, extracellular matrix remodeling, and smooth muscle cell contraction, and has an RVIS [28] score placing it among the top 17.93% most intolerant genes [31]. As a second example, in two unrelated probands (00310-C and 00030-C), we identified de novo variants in NBEA: a nonsense variant at codon 2213 (of 2946; c.6637C > T (p.R2213X), CADD score 52) and a missense variant at codon 946 (c.2836C > T (p.H946Y), CADD score 25.6). NBEA is a kinase anchoring protein with roles in the recruitment of cAMP-dependent protein kinase A to endomembranes near the trans-Golgi network [32]. NBEA has an RVIS [28] percentile of 0.75%. While these variants remain VUSs, the fact that they are de novo, predicted to be deleterious, and affect genes under strong selective constraint in human populations suggests that they have a good chance of being disease-associated.
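RVIS values like those cited above come from a genome-wide regression. As we understand the RVIS method [28], the number of common functional variants in each gene is regressed on its total number of protein-coding variants, and the studentized residual is the gene's score, with negative values indicating fewer common functional variants than expected (intolerance). The sketch below implements that idea with ordinary least squares and internally studentized residuals on toy numbers; it omits details of the published method (such as the allele-frequency threshold and the variant data source), and the percentile convention is an assumption.

```python
from math import sqrt
from typing import List, Sequence


def rvis_like_scores(total: Sequence[float], common_functional: Sequence[float]) -> List[float]:
    """Internally studentized residuals from regressing common functional on total variants."""
    n = len(total)
    x_bar = sum(total) / n
    y_bar = sum(common_functional) / n
    sxx = sum((x - x_bar) ** 2 for x in total)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(total, common_functional))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(total, common_functional)]
    s = sqrt(sum(r * r for r in resid) / (n - 2))  # residual standard error
    # Leverage of each gene: h = 1/n + (x - x_bar)^2 / Sxx
    return [r / (s * sqrt(1 - (1 / n + (x - x_bar) ** 2 / sxx)))
            for r, x in zip(resid, total)]


def percentile_ranks(scores: Sequence[float]) -> List[float]:
    """Percent of genes scoring <= each gene; low percentiles mark the most intolerant genes."""
    n = len(scores)
    return [100 * sum(1 for t in scores if t <= s) / n for s in scores]
```

In a toy set of five genes, the gene with far fewer common functional variants than its total variant count predicts receives the most negative score and therefore the lowest percentile.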

Discussion

We have sequenced 371 individuals with various DD/ID-related phenotypes. Of these individuals, 27% harbored a P/LP variant, most of which were de novo and protein-altering. We found that P/LP yield is affected by the presence of neurological disease in family members: our success rate drops from 38% for probands without any affected relatives to 19.5% for probands with one or more affected first-degree relatives. These data are consistent with the higher causal variant yields observed in simplex relative to multiplex families affected with autism [33]. This difference likely reflects, in part, the easier interpretation of de novo causal variation relative to inherited causal variation, which in many cases is likely to be variably expressive or incompletely penetrant (e.g. 16p12) [34].

A total of 127 probands underwent WES and 244 underwent WGS. The P/LP identification rate did not differ significantly between the two assays when considering only SNVs or small indels (p = 0.30). However, WGS is a better assay for detecting CNVs [35], and although our patient population is depleted of large causal CNVs owing to prior array or karyotype testing, we identified CNVs that we classified as P/LP in eight individuals.

We have also demonstrated the value of systematic reanalysis, which has thus far yielded P/LP variants for an additional 17 individuals (17% of total P/LP variation, 4.6% of total probands). Given the rate of progress in Mendelian disease genetics [36] and the development of new genomic annotations, we believe that systematic reanalysis of genomic data should become standard practice. While the costs and logistical demands of implementation at large scale are unclear, reanalysis has the potential to considerably increase P/LP variant yields over time (e.g. ~8% in our study for cases > 1 year removed from initial analysis). Furthermore, as more pathogenic coding and non-coding variants are found, the potential benefit of reanalysis is greater for WGS than for WES: WGS typically has slightly better coverage of coding exons, both in our data (Additional file 3: Table S6) and in previous studies [35], and pathogenic non-coding variation cannot be found by reanalyzing WES data.

Our data clearly indicate that trio-based sequencing is more sensitive and analytically efficient than proband-only sequencing, supporting the value of trios in clinical diagnostics; as sequencing costs continue to drop, testing of parents should eventually be offered routinely. Further, VUSs and novel candidates are more difficult to identify without parental sequence data, so proband-only approaches will ultimately confer less benefit for the discovery of new disease associations. However, current sequencing costs, combined with overall priorities (e.g. per-patient yield versus total number of diagnoses), may lead to variability in decisions about how best to allocate resources. For example, under many realistic cost scenarios, tripling per-patient sequencing costs will lead to fewer total diagnoses within a given total budget, even though per-patient diagnostic yield is higher and curation time is reduced for trios relative to singletons. Our retrospective analyses, in which we evaluated ranks of pathogenic variants under various filtering parameters, may help inform these decisions. Trade-offs between curation time, which will correlate with P/LP variant ranks, and sensitivity can be estimated empirically, in relative terms, using these data (Fig. 2; Additional file 6: Table S4).
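The budget argument can be made concrete with a small expected-value calculation. In this sketch only the roughly 29% trio yield reflects this study; the proband-only yield, the per-genome cost, the budget, and the optional curation cost are hypothetical placeholders chosen to illustrate the trade-off, not estimates from our data.

```python
def expected_diagnoses(budget: float, cost_per_genome: float, genomes_per_case: int,
                       yield_per_case: float, curation_cost_per_case: float = 0.0) -> float:
    """Expected number of diagnoses from a fixed budget (whole cases only)."""
    cost_per_case = cost_per_genome * genomes_per_case + curation_cost_per_case
    cases = int(budget // cost_per_case)
    return cases * yield_per_case


# Hypothetical inputs: only the ~29% trio yield is taken from this study;
# the proband-only yield and all costs are placeholders.
trio = expected_diagnoses(300_000, 1_000, genomes_per_case=3, yield_per_case=0.29)
proband_only = expected_diagnoses(300_000, 1_000, genomes_per_case=1, yield_per_case=0.22)
print(f"trios: {trio:.1f} diagnoses; proband-only: {proband_only:.1f} diagnoses")
```

Ignoring curation, trios deliver more diagnoses per budget only when the trio yield exceeds three times the proband-only yield; assigning a larger curation cost to proband-only cases narrows that gap.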

Variation detected through our studies has already helped lead to the discovery of at least one new disease association, as we identified two patients who harbor de novo variants in EBF3, a highly conserved transcription factor involved in neurodevelopment that is relatively intolerant of mutations in the general population (RVIS [28] percentile: 6.78%). Through collaboration with other researchers via GeneMatcher [7], we identified a total of ten DD/ID-affected individuals who harbor EBF3 variants, supporting the conclusion that disruption of EBF3 function leads to neurodevelopmental phenotypes [37]. It is our hope that the other VUSs described here and shared systematically via ClinVar [6] and GeneMatcher [7] will also help facilitate new associations.

Conclusions

We have demonstrated the benefits of genomic sequencing for identifying disease-associated variation in probands with developmental disabilities who otherwise lack a precise clinical diagnosis. Indeed, by combining genomic breadth with resolution capable of detecting SNVs, indels, and CNVs in a single assay, WGS is a highly effective choice as the first diagnostic test, rather than a last resort, for unexplained developmental disabilities. The ability of WGS to serve as a single-assay replacement for WES and microarrays underscores its value as a frontline test. Furthermore, the benefits and effectiveness of WGS testing are likely to grow over time, both by accelerating research (for example, into the discovery of smaller pathogenic CNVs and pathogenic SNVs outside of coding exons) and by facilitating more effective reanalysis, a process that we show to be an essential component of maximizing diagnostic yield.