Introduction

Neurodevelopmental disorders (NDDs), such as autism spectrum disorder (ASD), epilepsy, intellectual disability (ID), and developmental disability (DD) are complex disorders characterized by impairment in cognition, learning, and motor skills. From twin and family studies, it has become apparent that NDDs possess a strong genetic component (Freitag, 2007; Gejman et al., 2010). Estimates of heritability for various NDDs have ranged from 0.3 to 0.9, with heritability estimated to be greater than 0.5 for both ASD and ID (Flint, 2001; Kaufman et al., 2010; Tick et al., 2016). The evident contribution of genetic factors to NDD diagnoses has provided reason for routine prenatal whole exome or genome sequencing to identify potentially deleterious genetic variations (Soden et al., 2014; Tărlungeanu & Novarino, 2018). In particular, whole exome sequencing has proved a useful tool to identify, at a low cost, coding variants in genes that are highly intolerant to mutation and play important roles in typical neurodevelopment (Srivastava et al., 2019).

The identification and prioritization of NDD risk genes is important for the discovery of underlying biological mechanisms that are perturbed in NDDs (Cardoso et al., 2019). Previous studies have identified many monogenic forms of NDDs and revealed the multifactorial and polygenic nature of most NDD diagnoses (De Felice et al., 2015; Niemi et al., 2018; Sztainberg & Zoghbi, 2016). In particular, rare de novo mutations that are observed in genes in NDD cases at a significantly higher rate than expected relative to unaffected controls have pinpointed many candidate NDD genes, with more than one thousand genes estimated to be NDD risk genes (De Rubeis et al., 2014; Heyne et al., 2018; Iossifov et al., 2012; Kaplanis et al., 2020; McRae et al., 2017; O’Roak et al., 2012a, b; Sanders et al., 2012; Satterstrom et al., 2020; Wilfert et al., 2017).

De novo mutations are a class of rare genetic variation in which variants, that are not observed in parental genomes, exist in offspring due to mutagenesis in germ cells or errors in replication or recombination (Acuna-Hidalgo et al., 2016). De novo mutations may exist as single nucleotide variants, insertions and deletions (indels), and copy number variants. Because de novo mutations are not inherited, highly penetrant mutations can arise in genes that are critical to neurodevelopment and likely under purifying selection (Iossifov et al., 2012; Uddin et al., 2014). In fact, individuals affected by NDDs experience a greater burden of non-synonymous de novo mutation compared to unaffected controls (Coe et al., 2019; Wilfert et al., 2017). Study of ASD simplex families from the Simons Simplex Collection (SSC) has found that de novo likely gene disruptive (LGD) mutations occur at a nearly twofold increased rate in affected cases (0.21) relative to controls (0.12), as well as displaying an increased rate of missense mutation (Iossifov et al., 2014). Furthermore, the study of genetic modules impacted by these de novo mutations has pinpointed several biological processes relevant to NDD etiology, such as chromatin remodeling, the Wnt pathway, synaptic transmission, and the long-term potentiation pathway (Chow et al., 2019; Kwan et al., 2016; O’Roak et al., 2012a, b; Wilfert et al., 2017).

The benefits associated with successful early prediction of NDDs include the improved ability of parents to make informed decisions about potential early application of treatments (Boivin et al., 2015; Cioni et al., 2016; Corsello, 2005). It is important to note that most NDDs cases cannot be predicted using de novo coding variation alone; the exome constitutes 1–2% of the human genome and the majority of NDD-associated variants are likely to reside in non-coding regions involved in the regulation of gene expression (Short et al., 2018; Turner & Eichler, 2019). Currently, it is estimated that only ~ 10% of ASD cases and ~ 20–30% of ID/DD cases have de novo LGD variants, and the rate of such variants in the general population is significantly lower (Wang et al., 2021). The genetically and phenotypically heterogeneous nature of NDDs indicates that many factors, including common and non-coding genetic variants and non-genetic factors, account for a large fraction of diagnoses, further complicating our ability for the early prediction of these disorders. However, it is possible to confidently predict a subset of individuals who will likely develop NDDs due to de novo coding variation in the form of non-synonymous de novo mutations. Despite the polygenic nature of NDDs and the multitude of potential genetic or environmental causes, focusing specifically on un-inherited, de novo mutations that disrupt protein coding sequence permits early prediction for a small fraction of cases with very low false positive rates.

The early prediction of NDDs requires a very low false positive rate (FPR) due to potential negative consequences, such as the costs associated with early intervention treatments, that may result from false positive prediction. The minimization of the FPR is clinically most relevant in genetic counseling settings for parents with suspected or confirmed familial risk for NDDs and to aid in the decision to begin early intervention treatments in young children. Early diagnosis of NDDs via a combination of behavioral and motor assessments, imaging, and genetic testing followed by early prediction methods can greatly benefit patient outcomes and lead to timely, appropriate treatment (Hadders-Algra, 2021). Previously, a method for the early prediction of complex disorders, Odin, used de novo LGD variants observed in cases and controls and co-expression data to identify cases at very low FPR (Huynh & Hormozdiari, 2018). The shallow neural net (SNN) with novel objective function introduced here incorporates LGD de novo mutation, constraint, and conservation data to achieve a higher (> 0.30129) true positive rate (TPR) at very low FPR (< 0.01) in comparison to traditional classification models such as random forest (RF), support-vector machine (SVM), and logistic regression (LR). Furthermore, the proposed SNN model achieves similar PR-AUC and ROC-AUC to other machine learning approaches. An ensemble model that averages predictions among the SNN, RF, SVM, and LR models is able to achieve a slightly increased TPR at FPR < 0.01 and comparable PR-AUC. Additionally, the SNN is able to rank genes according to their relative importance in NDDs given LGD or missense de novo variation, prioritizing candidate NDD genes.

Methods

Objective

The main objective is to investigate the potential of using machine learning approaches for early prediction of NDDs using de novo coding genetic variants in a subset of cases. More formally, we are interested in utilizing de novo coding variants in maximizing the fraction of affected NDD cases accurately predicted when limiting the FPR to virtually zero.

Data Collection and Preprocessing

To distinguish NDD cases from unaffected controls using de novo coding variation, de novo likely gene-disruptive (LGD) and missense variants were retrieved from denovo-db (version 1.6.1) (Turner et al., 2017, p.). These data consisted of 9962 individuals with primary phenotypes of autism spectrum disorder (ASD), intellectual disability, and developmental disability and 2245 controls, of which 6509 cases and 1251 controls possess non-synonymous coding de novo mutation (Supplementary Table 1). In total, the 7760 samples possessed 1974 LGD (cases: 1715; controls: 259) and 10,777 (cases: 9073; controls: 1704) missense de novo coding mutations. PrimateAI scores were used to quantify the pathogenicity of missense variants, in which position-specific scores were calculated for each missense variant while incorporating conservation, solvent accessibility, and secondary structure data to yield predictions of deleteriousness (Sundaram et al., 2018). Probability of loss-of-function intolerance (pLI) and loss-of-function observed/expected upper bound fraction (LOEUF) scores from the gnomAD browser (v2.1.1), Residual Variation Intolerance (RVIS) scores based on ExAC v2 release 2.0 (March 15, 2017 version), and phastCons element scores were also used as features (Karczewski et al., 2020; Petrovski et al., 2013; Siepel et al., 2005).

LGD-specific and missense-specific feature matrices were generated, in which rows represent individuals with LGD or missense variation from denovo-db and columns represent genes containing de novo mutations (Fig. 1A, Additional File 1).

Fig. 1
figure 1

Methods overview. A De novo LGD and missense variants from probands with NDDs and controls were retrieved from denovo-db and arranged into feature matrices. Constraint and conservation information, in the forms of pLI, LOEUF, RVIS, and average phastCons element conservation scores were incorporated as gene score features (Karczewski et al., 2020; Petrovski et al., 2013; Siepel et al., 2005) (Additional File 1). B To perform hyperparameter optimization and model training, samples were divided into training (75%) and testing (25%) sets. Hyperparameter optimization occurs via threefold cross validation on the partitioned training set. For the SNN model (C), performance is measured as the TPR at FPR < 0.01, which is calculated by determining the number of cases (class: black) with predicted probability greater than that of any control (class: white) in the validation fold. For baseline models, consisting of the RF, LR, and SVM classifiers, respective loss functions are minimized. C The SNN consists of a single hidden layer and a loss function that seeks to minimize the product of predicted FP and a parameter \({\lambda }_{1}\), subtracted from the TP. D During the prediction phase, using the model trained with optimized hyperparameters, a prediction is made on the withheld testing set. For samples that simultaneously have both LGD and missense variation, two separate probabilities are retrieved from LGD- and missense-specific models for a given individual, and the maximum predicted probability is returned per individual. E For ranking genes based on their importance to NDDs, artificial samples are generated such that each artificial sample has a single de novo variant in a unique gene, using either LGD or missense variation, separately. Application of the prediction phase (D) on artificial samples yielded a ranking of the relative importance of a gene to NDDs determined via de novo coding variation

Model Architecture Development and Hyperparameter Tuning

Separate models were trained for de novo LGD variation and missense variation, referred to as SNN LGD-specific and missense-specific models. Each variation-specific SNN consists of two phases, a hyperparameter optimization phase and a prediction phase. After splitting all samples into training (75%) and testing (25%) sets, the hyperparameter optimization phase is applied to the training set, choosing optimal hyperparameters within a selected search space (Fig. 1B, Supplementary Table 2, Additional File 1). The purpose of the hyperparameter optimization phase for the SNN is to select a set of hyperparameters that yield the largest TPR at very low FPR on the training set to use during the prediction phase. Similarly, RF (sklearn.ensemble.RandomForestClassifier), SVM (sklearn.svm.LinearSVC), and LR (sklearn.linear_model.LogisticRegression) classifiers, hereon referred to as baseline models, are individually subjected to hyperparameter optimization and prediction phases. To allow direct comparison of each model’s performance, identical training/testing splits are provided to SNN and baseline models. The performance of SNN and baseline models are additionally compared to the TPR and FPR of the following heuristics, in which an individual is classified as a case if the individual has an LGD mutation in: (1) any gene with a (i) SFARI score of 1 (high confidence ASD gene) or (ii) SFARI score of 1 or 2 (strong candidate ASD gene) (https://gene.sfari.org/database/gene-scoring/), (2) any gene identified by SPARK as a (i) prioritized or (ii) risk gene, and (3) any gene with i) pLI >  = 0.90 or (ii) LOEUF < 0.35 (Additional File 1).

In the hyperparameter optimization and prediction phases (Fig. 1C),

$$loss=1-(TP- {(\lambda }_{1}*FP))$$
(1)

is used as a custom loss function (Eq. 1) for the SNNs, in which the objective is to minimize the product of the number of false positives (FP) and the hyperparameter \({\lambda }_{1}\) subtracted from the true positives (TP). The value of \({\lambda }_{1}\) is selected during the hyperparameter optimization phase. The SNN architecture consists of an input layer, a hidden layer with ReLU activation and an optimized number of neurons, and an output layer with sigmoid activation and L2 regularization with an optimized regularization parameter \({\lambda }_{2}\). The SNN uses the Adam optimization algorithm.

To return a prediction that incorporates both LGD and missense variation for individuals who possess both types of variants simultaneously (referred to as a ‘combined’ prediction), predictions are retrieved from the separately trained LGD- and missense-specific models for SNN and baseline models. For a given sample with both LGD and missense variation, the maximum predicted probability from the two separately trained variation-specific models is returned as the predicted probability of being a case primarily due to de novo coding variation (Fig. 1D). By using the maximum predicted probability, the model is trained to learn the class of an individual given their de novo mutation that is predicted to have the largest deleterious effect.

The average performance of a model over 100 independent training and testing splits is measured by determining the average TPR at FPR < 0.01, ROC-AUC, and PR-AUC for LGD-specific, missense-specific, and combined predictions for the SNN approach using the custom loss function, three baseline models, an ensemble model, and an ensemble model excluding SNN predictions (Additional File 1). To demonstrate the importance of gene score features and PrimateAI scores to increased TPR at FPR < 0.01, SNN and baseline models were trivially trained on one-hot encoded feature matrices indicating only the presence or absence (denoted as 1 or 0, respectively) of de novo LGD or missense mutation, and performance metrics were returned. To additionally assess the performance of the missense-specific model using only deleterious missense variation with PrimateAI scores ≥ 0.803 (as described in Sundaram et al., 2018), the missense-specific model (i) was trained using only samples with deleterious missense variation (PrimateAI ≥ 0.803) without discarding any samples, or (ii) was executed while excluding samples without deleterious missense mutation from training and testing sets.

NDD Gene Ranking

To rank genes according to their relative importance to NDDs using de novo coding variation in the form of de novo LGD mutations or missense mutations, two sets of artificial samples (LGD- and missense-specific) were created. The artificial samples each contain a single LGD (or missense) variant in a unique gene in the human genome (Fig. 1E). The probability of being a case is predicted for each of these artificial samples using the previously trained SNN LGD- or missense-specific models. For every artificial sample and its corresponding gene containing a de novo LGD (or missense) variant, the predicted probability indicates the relative importance of the gene to NDD risk from de novo coding variation. Enrichment of de novo LGD and missense mutation in NDD cases relative to controls was assessed (Additional File 1).

Results

To identify, at very low FPRs, a subset of affected NDD cases using rare coding variation consisting of de novo LGD and missense variation, LGD- and missense-specific feature matrices indicating the presence of de novo variation within genes were constructed (Fig. 1A). Additional features incorporating constraint and conservation data were used to improve classification of NDD cases using LGD variation. The ability of SNNs (Fig. 1C) to classify NDD cases at very low FPRs were compared to various classifiers, including RF, SVM, and LR (baseline models), in addition to three heuristics.

De Novo LGD Mutations Distinguish a Subset of NDD Cases from Controls with Low FPR

At very low FPRs (FPR < 0.01), an SNN trained on an LGD-specific feature matrix captures 30.1% of NDD cases possessing any de novo LGD coding variation. In comparison to baseline models, the SNN trained on an LGD-specific feature matrix is able to identify 5.29–10.25% [95% confidence interval (CI)] more NDD cases at FPR < 0.01 than the RF classifier, and more than 5.73–17.26% (95% CI) NDD cases than SVM or LR models (Fig. 2, Table 1, Supplementary Fig. 1). To measure the extent to which the SNN and other models achieve increased TPR at FPR < 0.01 compared to a randomized model, a z-score was also calculated (Additional File 1, Table 1).

Fig. 2
figure 2

Receiver operating characteristic and PR curves for LGD- and missense-specific and combined predictions for SNN and baseline (RF, SVM, and LR) models. Random classification is displayed as a dashed blue line in all PR curves. Models trained on LGD-specific variation feature matrices additionally use constraint and conservation gene score information, whereas models provided with missense-specific feature matrices do not use gene score information. For LGD-specific features, the SNN achieves greater TPR at low FPR < 0.01 compared to baseline models, a trend which is evident even at FPR < 0.05 (A), and the SNN achieves comparable precision at lower recall compared to baseline models (B). Models trained on missense-specific variation alone are poor predictors of NDD status; SNN and baseline models show similar TPR at FPR < 0.05, with the SNN achieving slightly higher rates at low FPR (C). The SNN displays comparable precision at low recall thresholds when trained on missense-specific variation (D). E For combined prediction for samples with both missense and LGD variation, the proportion of cases captured at FPR < 0.01 is largest for the SNN, and similar precision at low recall is observed for the SNN compared to baseline models (F)

Table 1 Average TPR at FPR < 0.01, ROC-AUC, and PR-AUC for LGD-specific, missense-specific, and combined SNN, baseline, ensemble models, and randomized predictions

For the SNN, ROC-AUC and PR-AUC values of 0.72785 (0.7227–0.7326, 95% CI) and 0.9505 (0.9490 to 0.9519, 95% CI), respectively, were observed (Table 1). Observed PR-AUC values were comparable among the SNN and baseline models in their deviance from the randomized model, displaying similar z-scores. Note that due to the large number of cases in proportion to controls in available datasets, PR-AUC values for SNN and baseline models are significantly inflated; the random assignment model has an PR-AUC of over 0.85.

The inclusion of gene score features derived from pLI, LOEUF, RVIS, and phastCons element scores improves upon an SNN trivially trained only on LGD mutations (TPR at FPR < 0.01 = 0.24532, ROC-AUC = 0.66696, PR-AUC = 0.93597) (Supplementary Table 3, Supplementary Fig. 2). In addition, baseline and SNN models yield similar performance metrics when trivially trained on only LGD mutations, indicating. that the inclusion of gene constraint and conservation information is important to accurate classification of NDD cases using de novo LGD mutations (Supplementary Table 3).

Compared to the TPR and FPR of the three previously described heuristics, in which a sample was classified as a case if the sample possessed an LGD mutation in a set of prioritized genes, decreased TPR at low FPR thresholds in comparison to the SNN was observed for each heuristic (Supplementary Table 4, Supplementary Fig. 3). No heuristic achieved similar TPR values greater than 0.30 at FPR less than 0.01.

Integration of Missense and LGD-Specific Models Improves Prediction on Individuals with Both De Novo Missense and LGD Variants

To assess the ability of de novo missense mutations to distinguish NDD cases from unaffected controls, de novo missense variants from individuals with at least one missense variant were retrieved, consisting of 6947 samples possessing a total of 10,777 missense mutations. SNN and baseline models trained on missense variation capture less than 2.6% of NDD cases at FPR < 0.01 (Fig. 2, Table 1), indicating that accurate prediction of NDDs using only missense de novo variants is an extremely challenging problem. Slightly increased TPR at FPR < 0.01 is observed when the missense-specific model is trained only on deleterious missense variation without removing any samples from training and testing; excluding samples without deleterious missense variation from training and testing yields 2242 samples (2257 cases; 248 controls) with 2,505 deleterious missense variants and increased TPR at FPR < 0.01 (Supplementary Table 5).

For samples possessing both de novo missense and LGD variants, accurate prediction of NDD cases at low FPR can be improved by taking the maximum predicted probability from two models trained separately on only missense or LGD variation, referred to as a ‘combined’ prediction (Fig. 2, Table 1). Combined prediction on samples with both missense and LGD variation captures an increased fraction of cases. For example, compared to the LGD-specific SNN, an SNN using combined prediction is able to detect at most 4.22% more affected cases at FPR < 0.01 (95% CI). TPR at FPR < 0.01—associated z-scores for the SNN are greater by 1.41- 2.51 than values observed for baseline models using combined predictions.

Ensemble Prediction Yields Increased TPR at Very Low FPRs Compared to Separately Trained SNN and Baseline Models

An ensemble prediction was generated by returning the average predicted probability from the SNN, RF, SVM, and LR models for a given sample in the testing set. Compared to SNN and baseline models for LGD-specific, missense-specific, and combined models, the ensemble model consistently yields a larger TPR at low FPR values (Supplementary Fig. 4, Table 1). The predictive contribution of the SNN to the ensemble model is more substantial than that of the baseline models. For example, the TPR at FPR < 0.01 is greater for LGD-specific and combined prediction SNNs than ensemble models that exclude SNN predictions, referred to as ‘Ensemble—SNN’ (Table 1, Supplementary Fig. 4). Additionally, for LGD-specific and combined predictions, there is no overlap of 95% CIs between SNN and Ensemble—SNN models. From the ensemble prediction's constituent models, the SNN performs most similarly to the full ensemble method, differing by 0.586% and 1.582% in TPR at FPR < 0.01 given LGD-specific and combined predictions, respectively. In addition, the ensemble model achieves a slightly higher average PR-AUC, as evidenced by an increased z-score, than any individual SNN or baseline model for corresponding LGD-specific (0.95176) or combined predictions (0.95215) (Table 1).

Integration of Constraint, Conservation, and De Novo Mutation Data Permit NDD Gene Prioritization

Training of SNNs (Fig. 1C) on variation-specific feature matrices enables NDD risk gene ranking according to the effect of de novo missense and LGD mutations within specific genes (Fig. 1E). For example, using only LGD variants during SNN training reveals genes that are sensitive to LGD mutations and play important roles in typical neurodevelopment. Gene rankings and associated SFARI Gene scores are displayed in Supplementary Table 6 in descending order according to their relative importance to NDD risk.

For artificial LGD samples (that each possess a single LGD variant in a unique gene), an increased enrichment of LGD variants is observed in NDD cases relative to unaffected controls at increasing predicted probabilities (Fig. 3A), and a slight increased enrichment of missense variants is also observed in NDD cases for genes ranked according to a trained LGD-specific SNN (Fig. 3B). The difference in enrichment \(({E}_{diff})\) of LGD or missense mutation in cases relative to controls per gene is calculated by Eq. 2 (Additional File 1). Significant correlation exists between pLI (p-value < 2.25e − 79) and LOEUF (p-value < 1.09e − 63) values with predicted probability ranks for artificial LGD samples (Fig. 3C, D). For gene rankings produced by a missense-specific SNN, similar trends in enrichment of de novo coding variation in NDD cases relative to controls are observed, although the range of probabilities predicted by the missense-specific SNN narrows compared to the LGD-specific SNN, and the strength of correlation amongst pLI and LOEUF values with predicted probabilities is reduced (Supplementary Fig. 5).

Fig. 3
figure 3

Increased enrichment of de novo LGD and missense mutation in NDD cases relative to unaffected controls in highly ranked NDD genes according to an SNN trained on an LGD-specific feature matrix. Applying a trained SNN on artificial samples containing a single unique LGD variant allows the SNN to rank genes according to their relative importance to NDD risk with respect to LGD coding variation. The difference in enrichment in NDD cases versus controls per ranked gene is calculated by Eq. 2 (Additional File 1) and displayed on the y-axes. Increasing probability (x-axes) indicates increasing importance to NDD risk. The average predicted probability was determined for each artificial sample over 100 independent iterations, and 95% CI are shown. At increasing probabilities for artificial samples with LGD variants, a steady, increased enrichment of LGD in cases (A) is observed, and a slight enrichment of missense variation (B) in cases relative to controls is also observed at increasing probabilities. The probability (ranks) assigned to genes is significantly correlated with both pLI (C) and LOEUF (D) values retrieved from gnomAD (v2.1.1). pLI values range from 0 to 1, where values above 0.9 suggest intolerance to LGD mutation, whereas LOEUF values represent a ratio of observed over expected LGD mutations and values less than 0.35 suggest intolerance to LGD mutation

For the LGD-specific SNN model, inclusion of gene score features generated from pLI, LOEUF, RVIS, and PhastCons produces rankings with greater enrichment of LGD variation in cases relative to controls at higher probabilities than an LGD-specific SNN model trivially trained on one-hot encoded mutation information that excludes gene score features (Supplementary Fig. 6).

Discussion

To distinguish NDD cases from unaffected controls at extremely low FPRs using de novo coding variation and measures of gene constraint and conservation, we developed a SNN with a customized objective function to maximize TP while simultaneously minimizing FP (Fig. 1). Although most cases of NDDs arise from a variety of classes of genetic variation, particularly common, non-coding, and structural variants, focusing specifically on de novo coding variation of relatively large effect size is a tradeoff to obtain significantly reduced FPR on a small but significant subset of samples. Compared to traditional machine learning techniques, such as RF, support vector machine (SVM), and LR (referred to as ‘baseline’ models), the constructed SNN is able to achieve greater TPR at FPR less than 0.01 given LGD-specific variation (Table 1, Fig. 2). The ability of the SNN to capture more than 30% of cases at FPR < 0.01, corresponding to at least 5.29% more cases than any baseline model (Table 1), indicates that the use of a SNN with the custom loss function (Eq. 1) is beneficial in classifying NDD cases at very low FPR. Note that it is estimated that LGD variants have been observed in roughly 10% of ASD cases and up to 30% of DD cases (Wang et al., 2021). Thus, our results indicate that the proposed SNN should be able to identify > 3% of ASD and > 10% of all DD cases while having an FPR of virtually zero simply by considering de novo LGD variants.

To demonstrate that gene scores related to constraint and conservation, including pLI, LOEUF, RVIS, and phastCons, were useful and necessary for the SNN to yield elevated TPR at FPR < 0.01 compared to baseline models given LGD-specific variation, the performance of trivially trained SNN and baseline models were measured (Supplementary Table 3, Supplementary Fig. 2). During trivial training, only a feature matrix of one-hot encoded values (1 or 0) denoting the presence or absence of a de novo coding variation within a gene were provided as input features to models. We note that most de novo mutations retrieved from denovo-db were identified via simplex studies that facilitate the identification of de novo variants, thus potentially introducing biased prediction in favor of variants identified via simplex rather than multiplex studies. We would also like to note that multiplex NDD cases will have a potentially lower chance of being caused by de novo variants and thus reduce the ability of our model’s accurate prediction of these cases. Similar TPR at FPR < 0.01 values are reported for trivially trained and trivially trained baseline models, indicating that the inclusion of gene score features greatly contributes to the SNN's improved ability to classify NDD cases at very low FPR.

In addition, a simple ensemble method that uses the average predicted probability from SNN and baseline model predictions is able to identify NDD cases at greater TPR at FPR < 0.01 and slightly increased precision at lowered recall than any of its constituent models (Table 1, Supplementary Fig. 4). Excluding SNN predictions from the ensemble model reveals that the SNN, compared to baseline models, contributes substantially to the ensemble model’s ability to accurately classify NDD cases at low FPR values. In fact, for LGD-specific variation, an ensemble method that excludes SNN predictions produces decreased TPR at FPR < 0.01 metrics compared to the SNN alone (Table 1).

The ability of SNN and baseline models to use only missense variation to identify NDD cases is relatively poor. However, the incorporation of both missense and LGD-specific predictions during ‘combined’ prediction for samples containing both LGD and missense variation, in which the maximum predicted probability from two separately trained missense- and LGD-specific models are returned, increases average TPR at FPR < 0.01 compared to using only probabilities predicted by an LGD-specific model (Table 1, Fig. 2). The improved performance of combined predictions indicates that certain samples possessing very deleterious missense variation (in addition to LGD variation) are correctly classified as cases when the predicted probability associated with the missense-specific model, rather than the LGD-specific model, is retrieved.

SNNs trained on LGD- and missense-specific feature matrices containing de novo coding variation from NDD cases and controls are able to rank genes according to their relative importance to NDD risk when applied to artificial samples which each contain a single type of de novo variant in a single gene (Supplementary Table 6). An increased enrichment of de novo LGD and missense mutation in NDD cases relative to controls is observed in highly ranked genes (those with higher predicted probability of being a case) using LGD-specific variation (Fig. 3). Significant, strong correlation exists between predicted probability for artificial samples for both the pLI and LOEUF constraint metrics, showing that the ranking via LGD-specific variation can accurately detect most high risk NDD genes. Among the 50 most highly ranked genes using LGD-specific variation, a total of 47 out of 50 genes are classified as high confidence (39 genes with score 1), strong candidate (6 genes with score 2), and suggestive evidence (2 genes with score 3) autism spectrum disorder (ASD) risk genes, including genes relevant to syndromes, according to SFARI Gene and OMIM (Supplementary Table 6). Among genes with predicted probabilities greater than 0.90 (ranks 1–55), four genes (WDR45, CLTC, BRPF1, and GATAD2B) do not possess SFARI annotations, but have been associated with neurodegeneration and intellectual disability according to OMIM annotations. Highly ranked genes lacking both SFARI Gene scores and OMIM annotations suggest candidate NDD genes susceptible to de novo LGD variation. Evidence of association with NDDs [ZFHX3 (Fuller et al., 2018), CHD5 (Parenti et al., 2021), UBR3 (Murcia Pienkowski et al., 2020)) or enrichment of de novo LGD mutation in NDD cases (ANP32A, SKIDA1 (Coe et al., 2019)), neurodegeneration (ANP32A (Podvin et al., 2020), HECTD1 (Schmidt et al., 2021)), gliomas (LARP4B (Koso et al., 2016)), synapses and neuronal formation (LMTK3 (Takahashi et al., 2020), DOT1L (Franz et al., 2019)] have been studied in model organisms, cell lines, and families for these candidate NDD genes.

Weaker correlation is observed for missense-specific rankings with pLI and LOEUF values, and enrichment of de novo non-synonymous mutation is also present in NDD cases relative to controls, although to a lesser extent compared to LGD-specific rankings (Supplementary Fig. 5). The missense-specific rankings are distinct from LGD rankings in their ability to identify genes potentially sensitive to missense variation (Supplementary Table 6). Among highly ranked genes lacking SFARI Gene scores and OMIM annotations, previous studies suggest association with NDDs and schizophrenia [OBSCN (Hashimoto et al., 2016), PLEC (Dincer et al., 2015), RYR2 (Lieve et al., 2019), ZSWIM8 (Tischfield et al., 2017)], cortical formation and thickness [LAMA5 (Omar et al., 2017), GOLGA3 (Kim et al., 2017)), and neurodegenerative diseases (PKHD1 (Santos-Laso et al., 2020), DNAH1 (Thonberg et al., 2017)].

Our results indicate that we can accurately predict a small, yet significant fraction of NDD cases using de novo coding variants. Currently, whole-exome or whole-genome sequencing of trios is not common practice. However, to make the early prediction of these disorders a reality, such sequencing should become common practice. Furthermore, our approach only covers a small fraction of affected patients and additional methods that use other types of biomolecular signatures, such as common variants, rare non-coding variants, and epigenomic markers, are needed to increase the reach of early prediction to a larger fraction of cases.

Conclusions

In summary, the described SNN identifies NDD cases at higher TPR while having very low FPR in comparison to traditional machine learning methods. Several factors contribute to the improved performance of the proposed approach, namely: the use of gene constraint and conservation features in LGD-specific prediction and a custom loss function that specifically seeks to maximize the TPR while minimizing the FPR. An ensemble method, aggregated from SNN and baseline model predictions, is able to correctly classify a greater proportion of cases at FPR < 0.01 compared to any individual model. The SNN itself is a major contributor to increased TPR at FPR < 0.01 observed in the ensemble model. Although de novo missense mutation alone is a poor predictor of case status relative to LGD mutation, missense-specific predictions are useful during combined prediction for identifying additional cases that possess highly deleterious missense mutation in addition to LGD mutation. Fully trained SNNs on LGD- or missense-specific variation are also useful in NDD risk gene prioritization, revealing candidate NDD genes enriched in de novo non-synonymous mutations in NDD cases relative to controls.