Background

Autism is a neurodevelopmental condition with high heritability and a complex genetic mutational landscape. It is characterized by social communication deficits, restricted interactions, and repetitive behavior patterns and interests [1]. Its prevalence rate is 1 in 59 children worldwide [2]. Landmark symptoms of autism include hypersensitivity, impulsivity, agitation, mood swings, and mild to severe impairment of cognitive function [3]. Cognitive ability ranges from above average to intellectual disability, and the condition can be accompanied by seizures and language impairment. Defective cross-functionality in relevant domains and other cranial defects in subjects result in autism manifestation, generally before the age of three [4]. Speech-language delay is a unique and empirical phenomenon observed in autistic children. It is crucial to study the causality and molecular markers of autism with speech-language delay [5].

Unequivocal causative genes for autism pathophysiology have not been pinpointed, even after decades of advances in autism research, from linkage analysis to next-generation sequencing. Linkage and candidate gene studies aimed at autism-specific variants have yielded several significant findings [6]. Genome-wide association studies (GWAS) were scalable to neuronal function and corticogenesis, which provided confidence in identifying autism risk variants, viz. PTBP2, CADPS, and KMT2E [7]. However, GWAS could not confirm common alleles as strong contributors to autism. Association between autism and specific Mendelian disorders has been observed; for example, PTEN macrocephaly is associated with autism severity [8]. Specific copy number loci have been associated with autism with statistical significance. Various microRNA recognition element (MRE)-modulating single-nucleotide polymorphisms (SNPs) and MRE-creating SNPs present in the 3′ UTR of autism genes have significant implications. These genes have a notable effect on autism manifestation and severity [9]. The limited information obtained from GWAS, genotyping, and other processes has directed the interest of researchers towards rare variants and point mutation studies. Autism-associated point mutations in various genes have added information and clarity to the molecular basis of its manifestation. De novo and other types of point mutations have been identified in 15–20% of autism subjects [10].

Whole-exome sequencing (WES) can be used as a reliable tool to scan exomes and identify causal variants for autism. WES helps explore the exome for rare autism-specific de novo and transmitted variants that disrupt proteins [11]. It can help evaluate whether the co-occurrence of de novo events in the same individual increases the risk for autism. These sequence datasets showcase over 120 causal genes with clinically relevant genetic variants identified for autism in the last decade [12]. These variations can be point mutations, insertions, deletions, and copy number variations in the coding regions, in either the homozygous or the heterozygous state [13]. Successive sequencing advancements have opened new, quicker, and more cost-effective avenues. Several SNPs with recurrent deleterious mutations in the ARID1B, SCN1A, SCN2A, and SETD2 genes have been reported so far. These mutations result in gain or loss of function of one or more functional copies of a gene or contiguous genes, besides biallelic mutations of both gene copies, which is suggestive of a contribution to autism susceptibility [14]. Numerous genes have been well established, with validation in several cohorts, to elucidate gene dosage sensitivity and relevance to autism-specific pathways [15]. Although numerous studies have been conducted on autism using whole-exome sequencing, its pathophysiology has not been thoroughly addressed.

Disease susceptibility for heterozygous variants can be assessed using haploinsufficiency scores and probability of being loss-of-function intolerant (pLI) scores for the mutated genes [16, 17]. These scores reflect the intolerance to loss-of-function and deleterious mutations via the functional impact of essential genes (EGs) commonly observed in heterozygous mutations. They indicate the cumulative effect of deleterious variants in EGs on complex neurodevelopmental disorders such as autism, with a threshold of > 20 [17]. Therefore, this study aims to identify high confidence genes for autism and study their interplay through an in-depth analysis of one master gene, its variants, and the associated causative pathways. It pinpoints damaging heterozygous variants in CNTNAP2, a significant gene in autism and the delayed speech-language phenotype.

Methods

The present study consists of 222 whole-exome sequences deposited by Simons Simplex Collection (SSC) at the European Nucleotide Archive with the accession number PRJNA167318. The SSC group has carried out the sequencing using Illumina Genome Analyzer IIx paired-end sequencing platform at the coverage of 100X, with library preparation according to Illumina protocols. In the current study, we performed an exhaustive analysis to identify the high confidence autism genes in the 222 exome sequences of quartet sample sets from the SSC family. Each family under study comprises an autistic subject with unaffected siblings and parents with a detailed case history. Only the affected probands have been studied under the current investigation.

Whole-exome sequences in .fastq format were aligned against the hg19 build of the human reference genome using the Strand Next-Generation Sequencing (NGS) platform, chosen for its accuracy, proportion of correctly mapped reads, and receiver operating characteristic curves. A post-alignment quality check was performed to remove false-positive variant reads. The sequence data were run on multiple platforms (Strand NGS, Partek, and direct command-line pipelines) to further reduce false positives. Partek® software (©Partek Inc., St. Louis, MO, USA) and Strand NGS software (Version 2.8, Build 230243 ©Strand Life Sciences, Bangalore, India) were used for the analysis, along with the command-line tools Burrows-Wheeler Aligner (BWA)-backtrack [18] and Bowtie 2 [19]. BWA and Bowtie 2 are known as ultrafast and memory-efficient tools for mapping human exomes or genomes. Comparative results were obtained pre- and post-alignment, along with multi-level quality checks (QCs). All the variants identified across multiple platforms were considered for the study and exported in variant call format (vcf).

Further, the variant files were annotated using web ANNOVAR (wANNOVAR) with vcf files as input, based on position, gene, amino acid change, zygosity, and mutation effects. Variant calls with a read depth of ≥ 20 were included in the study. Minor allele frequency was limited to a P value of ≤ 0.05 based on the ExAC and 1000G studies. The candidate genes were filtered for deleterious and damaging mutations: stop gain, stop loss, missense, and nonsynonymous. Pathogenicity scores were calculated across eleven platforms, and a mutation had to be tagged as damaging on at least five platforms to be considered for further analysis. The priority-based classification of the known autism candidate genes was performed using the Simons Foundation Autism Research Initiative (SFARI) gene list [20]. The PredictSNP tool, which uses six different robust prediction classifiers for disease-related mutations, was used to predict the effects of the identified mutations on protein function and to prioritize them for further characterization.
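The filtering criteria described above can be pictured as a simple predicate over annotated variant records. The following is a minimal Python sketch, not the pipeline used in the study: the record fields, effect labels, and example records are hypothetical, and it assumes that the 0.05 cutoff applies to the population allele frequency.

```python
from dataclasses import dataclass

# Thresholds taken from the text; all names below are illustrative assumptions.
MIN_READ_DEPTH = 20
MAX_ALLELE_FREQ = 0.05          # assumed to apply to population allele frequency
MIN_DAMAGING_CALLS = 5          # out of eleven pathogenicity predictors
KEPT_EFFECTS = {"stopgain", "stoploss", "missense", "nonsynonymous"}


@dataclass
class Variant:
    gene: str
    effect: str           # annotated mutation effect (hypothetical labels)
    read_depth: int
    allele_freq: float    # population frequency, e.g. from ExAC / 1000G
    damaging_calls: int   # number of predictors tagging the variant as damaging


def passes_filters(v: Variant) -> bool:
    """Return True if the variant satisfies every filter listed in the text."""
    return (
        v.read_depth >= MIN_READ_DEPTH
        and v.allele_freq <= MAX_ALLELE_FREQ
        and v.effect in KEPT_EFFECTS
        and v.damaging_calls >= MIN_DAMAGING_CALLS
    )


# Made-up records for illustration only; these are not study data.
variants = [
    Variant("GENE_A", "nonsynonymous", 85, 0.001, 7),   # passes all filters
    Variant("GENE_B", "synonymous", 120, 0.002, 0),     # wrong effect class
    Variant("GENE_C", "stopgain", 12, 0.0005, 9),       # read depth too low
    Variant("GENE_D", "nonsynonymous", 40, 0.30, 6),    # too common
]
kept = [v.gene for v in variants if passes_filters(v)]
print(kept)
```

Keeping each threshold as a named constant makes it easy to see how the retained variant set changes when a single criterion is relaxed.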

Haploinsufficiency and pLI scores for heterozygous mutations have been used to support the clinical interpretation of novel loss-of-function variants and gene prioritization in whole-exome sequencing [21]. They are calculated to understand and validate the presence of heterozygous mutations and their functionality [16]. The pLI score indicates the tolerance of a given gene to loss of function (LoF) based on the number of protein-truncating variants. Thus, stop gain and frameshift variants are referenced in the human genome using gene size and sequencing coverage metrics. The score is often used for the prioritization of candidate genes. For LoF mutations, three gene classes are assumed for tolerance to LoF variation: null (where LoF variation is completely tolerated), recessive (where heterozygous LoFs are tolerated), and haploinsufficient (where heterozygous LoFs are not tolerated). Observed and expected variant counts are used to determine the probability that a given gene is extremely intolerant of LoF variation. The closer pLI is to one, the more intolerant the gene is to LoF. Genes with pLI ≥ 0.9 are considered an extremely LoF-intolerant gene set [22].
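As an illustration of how the pLI ≥ 0.9 cutoff can be applied during gene prioritization, here is a small Python sketch. It only applies the threshold to scores that are assumed to be precomputed; it does not estimate pLI from observed and expected counts. The gene labels and score values are made up.

```python
PLI_INTOLERANT = 0.9  # cutoff for the extremely LoF-intolerant set, from the text


def is_lof_intolerant(pli: float) -> bool:
    """Flag a gene as extremely LoF-intolerant based on its pLI score."""
    if not 0.0 <= pli <= 1.0:
        raise ValueError("pLI is a probability and must lie in [0, 1]")
    return pli >= PLI_INTOLERANT


# Hypothetical precomputed scores; not values from the study.
scores = {"GENE_A": 0.99, "GENE_B": 0.42, "GENE_C": 0.90}
prioritized = sorted(gene for gene, pli in scores.items() if is_lof_intolerant(pli))
print(prioritized)
```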

Enrichment analysis for the identified genes was performed through KEGG pathways and an extensive literature review. Gene-enriched pathways relevant to autism were selected along with other known associated genes. BIOSTRING software was used to create gene-gene, gene-protein, and protein-protein interaction networks using the top ten genes.

The disease pathway was created using Ingenuity Pathway Analysis (IPA) with an inbuilt database of curated literature. Upstream and downstream of the mutant genes were overlaid. The pathway enrichment was performed using Z-score, P value, and Jaccard similarity testing to identify enriched disease pathways with disruptions/blocks caused due to mutations [23,24,25,26].
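For readers unfamiliar with Jaccard similarity, it measures the overlap between two sets as the size of their intersection divided by the size of their union. The Python sketch below shows the calculation on two gene sets; it is not the IPA implementation, and the membership of the example pathway set is hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# A few genes named in this study, and a hypothetical pathway gene set.
study_genes = {"CNTNAP2", "ANK2", "SHANK3", "CACNA1C"}
pathway_genes = {"CNTNAP2", "ANK2", "HOMER1", "SYNGAP1", "SHANK2", "WDFY3"}

# Two shared genes out of eight distinct genes overall.
print(jaccard(study_genes, pathway_genes))
```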

Protein modeling was performed using STRUM software [27]. STRUM is a method for predicting fold stability change (ΔΔG) of protein molecules upon single-point nonsynonymous mutations. It adopts a gradient boosting regression approach to train the Gibbs free-energy changes on various features with different levels of sequence and structure properties [27]. Its uniqueness lies in combining sequence profiles with low-resolution protein structure models from structural prediction. This process enhances the method’s robustness and accuracy, making it applicable to various protein sequences, including those without experimental structures. It starts from wild-type sequences and constructs 3D models by the iterative threading assembly refinement simulations. DOMPred tool was used to derive a graph from the aligned termini positions using PSI-BLAST local alignments. In this case, larger values indicate regions with sequence discontinuities in putative domain boundaries. This also gives the predicted number of domains and the positions of domain boundaries for the predictive peaks. The graph can be visualized to confirm the predicted number of domains and possible domain boundaries. In case of a mutated protein, a larger degree of variation is possible due to disorder and variation in the domain linker region aspects [28].

Results

The authors performed WES analysis for 222 autism subjects with 100X coverage and a 95% confidence interval. A mean read length of ≥ 100 bp with 10.19 GB of raw data was obtained from the sequencing reaction. It generated a total of 47 million reads, with read quality in terms of Phred score of 33.32 for the forward read (R1) and 33.10 for the reverse read (R2). The quality scores were 27.65% and 66.47% for R1 and R2, respectively (Fig. 1). The pre- and post-alignment quality checks were of an appropriate standard. Alignment breakdown was marked at ≥ 90% with unique paired alignment in almost all the cases, while the unaligned part was minimal. Local alignment captured 10,000 variants on average in the SNP processing step before SNP detection was conducted with the false discovery rate (FDR) set at ≤ 0.5.
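To put the Phred scores quoted above in context, a Phred quality score Q corresponds to a base-calling error probability of P = 10^(−Q/10). The short Python sketch below applies this standard conversion to the reported mean scores; the function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a base-calling error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a base-calling error probability back to a Phred score."""
    return -10 * math.log10(p)


# Mean Phred scores reported for the forward (R1) and reverse (R2) reads.
for label, q in (("R1", 33.32), ("R2", 33.10)):
    print(f"{label}: Q = {q}, error probability = {phred_to_error_prob(q):.2e}")
```

Both scores lie above Q30, which corresponds to fewer than one expected error per 1,000 base calls.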

Fig. 1

Representative pre- and post-alignment quality check, coverage, and scores for the datasets under study. The graphs represent the quality check scores, alignment breakdown, and average base quality scores and validate the proper alignment of the datasets against the hg19 human reference genome

The identified 10,000 variants were annotated across the regulatory, untranslated regions, exons, introns, downstream, and intergenic regions using SNP detection. On analysis of the exonic variants from the annotated whole-exome sequencing dataset, 943 genes were identified as damaging for autism. On applying SFARI gene scoring, identified genes with scores 1 and 2 were 192 and 182 in number, respectively. Further, on filtering the genes based on pathogenicity and haploinsufficiency scores, the gene list was streamlined to 15 genes with 24 variants. Annotation of these genes revealed rare and deleterious variants for KMT2C, CNTNAP2, CACNA1C, SHANK3, ANK2, HECTD4, MAP1A, SKI, SCRAP, CUL7, ZNF804A, CNTNAP3, CACNA1H, LRP1, and CNTN4 genes (Table 1). These variants belonged to either nonsynonymous, frameshift insertions-deletions, or stop gain variants in the coding regions. This mechanistic study was assisted by exhaustive literature and SFARI gene scoring for in silico validation. The burden of gene variants was observed on chromosomes 1, 2, 3, 4, 6, 7, 11, 12, 14, 16, and 22 (Table 1).

Table 1 Distribution of autism high-risk gene variants in 222 global whole-exome sequences

Among these 15 genes, the study of overlapping variants present across all the samples revealed exclusive variants for KMT2C in 167 cases, CNTNAP2 in 192 cases, CACNA1C in 152 cases, and SHANK3 in 124 cases (Table 1). Previously reported variants in the filtered gene sets were also identified, consistent with the reported mutational landscape.

Pathway analysis of the 15 high confidence genes outlined a clustering of the autism-relevant genes WDFY3, SHANK2, CNTNAP2, HOMER1, SYNGAP1, and ANK2 with several primary and secondary physical interactors. These genes are encircled and highlighted in red in the schematic pathway represented in Fig. 2. Autism-related processes and phenotypes were enriched in the pathway obtained.

Fig. 2

Pathway analysis for the high-risk autism genes using Ingenuity Pathway Analysis (IPA). Clustered protein within the top network-associated genes as derived from IPA algorithms is shown. Proteins identified are encircled and labeled with the protein symbol in red fill. Direct connections between/among proteins are shown in solid lines; indirect interactions are shown as dashed lines or edges. The constructed pathways analysis of the 15 high-risk genes has shown an association with various autism phenotypes and processes, including learning, synaptic transmission, abnormal social behavior, and social withdrawal. These genes coincide with multiple pathways with overlapping connections across different pathways and present in the upstream and downstream of vital processes. Each gene has a divergent and convergent pathway and can pave a path towards autism manifestation

CNTNAP2 showed a haploinsufficiency score of 4.94 with nine damaging nonsynonymous variants (T589P, T118P, H764P, G285A, A588P, W134G, N139S, R160H, and T831S) in seven exons, with a GERP rank score range of 0.065–0.95. CNTNAP2 showed exclusive variants with relevant read depth and P value in 86.5% of the cases under investigation (Table 2). Based on the stepwise analysis and interpretations, CNTNAP2 was selected for the downstream analysis. Minor allele frequency ranged from 1.65 × 10^−5 to 0.216, with the highest read depth being 210. One unique variant was identified in 192 cases at position 589, resulting in an amino acid change from threonine to proline that impairs the epidermal growth factor (EGF) domain, with a mutation-induced change in protein folding stability (ΔΔG) of 2.67 kcal/mol (Fig. 3). The amino acid change T589P was present across 91.06% of the sample cohort. It has an overall confidence score of 87% with a deleterious effect across multiple pathogenicity platforms, calculated using the PredictSNP tool. The established miRNA genes miRNA548AQ and miRNA548F were mutated within the CNTNAP2 region, adding to the severity in the cohort.

Table 2 Overlapping CNTNAP2 variants for the transcript NM_014141 positioned on chromosome 7 across multiple whole-exome sequence datasets for the current investigation
Fig. 3

Schematic gene structure of CNTNAP2 with the unique T589P variant marked with the mapped protein domain

The protein structures for the normal and the mutated CNTNAP2 protein were modeled by STRUM using the CNTNAP2 protein structure with PDB ID 5Y4M as a template. The variants showed ΔΔG values of ≥ 0.5, indicating that protein stability is sensitive to these mutations. The protein folding was distorted and disoriented in the mutated protein. The unique variant T589P, present across multiple populations, showed stability reduced by 0.25, solvent accessibility increased by 9%, and depth reduced by 0.2 in the mutant protein product (Fig. 4). The secondary structure underwent negligible changes between the normal and mutated protein structures. Ten domain boundaries positioned at 185, 341, 481, 594, 677, 802, 967, 1048, and 1253 were predicted for the modeled normal protein using the DOMPred tool (Marsden, R). In the modeled mutated protein, these were positioned at 174, 341, 478, 591, 717, 802, 967, 1078, and 1253, with changes in coiling and in the aligned termini profile (Fig. 4). Residues in terms of helix, coiling, and strand have been indicated with a center at the 700th residue from the domain boundaries.
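The two lists of predicted boundary positions can be compared directly to see which boundaries moved and by how much. The Python sketch below does this for the positions listed in the text; pairing the boundaries by their order in the two lists is an assumption made for illustration, not something stated by the DOMPred output.

```python
# Predicted domain boundary positions as listed in the text.
normal = [185, 341, 481, 594, 677, 802, 967, 1048, 1253]
mutant = [174, 341, 478, 591, 717, 802, 967, 1078, 1253]

# Pair boundaries by list order (an assumption) and keep only those that moved.
shifts = [(n, m, m - n) for n, m in zip(normal, mutant) if m != n]

for n, m, delta in shifts:
    print(f"boundary {n} -> {m} (shift {delta:+d} residues)")

# The boundary with the largest absolute displacement.
largest = max(shifts, key=lambda s: abs(s[2]))
print("largest shift:", largest)
```

Under this pairing, five boundaries are displaced, including the one at 594, which lies close to the variant position 589.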

Fig. 4

a Protein modeling for the normal and mutated CNTNAP2 protein product, with the marked mutation and conformational change highlighted in a box. The mutated protein structure was rendered non-functional, with reduced stability and depth and increased solvent accessibility. b Domain boundary prediction to understand the changes in residues and the aligned termini profile. The coil residues have undergone multiple changes, highlighted with an asterisk in the graph, between the normal and mutated protein, leading to disorder and variation

Network analysis of CNTNAP2 protein revealed various physical interactors: CNTN2, CALM1, CALM3, CACNB1, CACNB2, ANK2, ZNF804A, CACNA1H, and CACNA1C proteins with varying interaction scores between 0.5 and 1. Interestingly, all the interactors enriched have also been mutated in the dataset under study contributing to the manifestation of autism through different entry-exit points. The average local clustering coefficient was significant at 0.548, with a P value of 0.00103. The study of each physical interactor’s expression levels revealed shared expression patterns of CNTNAP2 with CACNB2, CALM1, CNTN2, and CALM3 proteins in the human model (Supplementary Figure 2).
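The average local clustering coefficient reported above summarizes how densely the neighbors of each protein are connected to one another. A minimal Python sketch of the calculation on a toy undirected network is given below; the edge list is hypothetical and is not the interaction network obtained in the study.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical edges among a few of the named proteins, for illustration only.
edges = [
    ("CNTNAP2", "CNTN2"),
    ("CNTNAP2", "ANK2"),
    ("CNTN2", "ANK2"),
    ("CNTNAP2", "CACNA1C"),
]
adj = build_adjacency(edges)
print(round(average_clustering(adj), 3))
```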

Discussion

Deciphering causal genes in autism has been difficult due to its genetic heterogeneity, varied pathogenicity, and associated comorbidities. WES is employed as a single genetic tool with high-throughput computational programs to identify causal gene variants and disease pathways for autism [26]. The confidence interval of 95% with an optimal insert length of 200 bp indicates effective enrichment, leading to reliable sequencing results. Appropriate coverage of 100X with a significant Phred score and read depth enhances the confidence, with 98% sensitivity and higher positive predictive values for nonsynonymous variants [45]. Quality check scores and mean alignment breakdown values are typical and follow the default value range of the sequenced data. The significance and confidence of detected rare variants depend on the sequence quality, sample size, and the prior probability that the allele exists. The difference in coverage range for variations is 15X across the sample cohort, which is optimal according to standard sequence data protocol [46]. An FDR below 0.5 increases confidence that the variants are crucial for disease manifestation.

For SNP detection and gene identification, rare coding variants were selected with a P value of ≤ 0.05 for stringent and accurate correlation of variants with autism. The universally accepted Gene Score database housed at SFARI places each gene into a category with a score based on the relevant evidence available. These scores indicate the gene's relevance and severity in causing autism [47]. The identified variations belong to regulatory regions, exons, introns, and downstream regions in protein-coding and non-protein-coding regions. Multiple genes containing damaging variations have been observed across all samples. A custom-developed pipeline with stringent filters revealed 24 significant disease-causing variants. Coding variants were taken into consideration for the downstream analysis, focusing on deleterious/damaging variants. Multiple lines of evidence point to the cumulative effect of deleterious/damaging coding variants in autism [48]. Out of the 15 identified genes, six gene variants with high P value and damaging variants have been shown to disrupt normal gene function. The identified chromosome burden is consistent with previous trends for autism. Each gene identified plays a crucial role in autism manifestation in a unique, well-defined manner [49].

KMT2C, CNTNAP2, and SHANK3 are well-established causal genes for autism coupled with speech-language disabilities and delays [29, 50, 51]. These genes showed damaging mutations impairing crucial protein domains related to speech-language processes and autism. A phenotype-genotype correlation could be set for these genes for speech-language disability, a crucial phenotype in 90% of the sample cohort. For instance, several variants in SHANK3 with a non-functional SRC Homology 3 domain are known to impair the dendritic spine morphology and synaptic transmission, resulting in autism with delayed speech-language [52, 53]. Similarly, KMT2C and CACNA1C have established trajectories for autism manifestation [30, 54]. Despite convergent evidence from multiple studies, the CNTNAP2 gene shows the strongest association with autism [55]. Knockout mouse experiments with CNTNAP2 show striking similarities with autism symptoms [56]. The CNTNAP2 gene is well established, yet its connections remain to be explored, making it an exciting gene to study further.

In parallel, the pathway enrichment analysis constructed gene clusters that showed association with various autism phenotypes and processes such as learning, synaptic transmission, abnormal social behavior, and social withdrawal. Previous studies in autism using machine learning have reported 77 such gene clusters with significant enrichment in crucial pathways in autism pathophysiology, ultimately resulting in autism manifestation [57]. They have overlapping connections across different pathways and are present upstream and downstream of vital processes. Each gene has a divergent and convergent pathway and can pave a path towards autism manifestation, as shown by multiple study groups [49, 58, 59].

CNTNAP2 was identified as a high-risk autism gene through sequential criteria: filtration of genes, overlap of genes across various studies, haploinsufficiency and pLI scores, prevalence of variants, pathogenicity, and pathway clustering. Considering overlapping studies using multi-faceted criteria of damaging variants and impaired pathways, CNTNAP2 has shown damaging/deleterious nonsynonymous, stop gain, and frameshift deletion variants with a haploinsufficiency score of ≥ 4.94 calculated at a read depth (RD) of > 70. A connecting link between CNTNAP2 and autism was established through its biological functionality. CNTNAP2 is present at the synaptic junction, and its impairment affects axonal growth in cortical neurons [60] responsible for language ability, which is vital to autism [58].

CNTNAP2 is a crucial player in synaptic plasticity and is localized at myelinated axons in association with potassium channels. It functions in the vertebrate nervous system as a cell adhesion molecule and receptor. The 2.4-Mb CNTNAP2 gene is located at 7q35-q36.1 and comprises 24 exons. The gene variant identified in 192 cases lies in exon 11 (Table 2, Fig. 3). CNTNAP2 encodes a cell adhesion molecule (CAM) that regulates signaling between neurons and is highly expressed in neurons that control language; it has been linked to language development difficulties. It encodes CASPR2, a transmembrane scaffolding protein with expression restricted to neurons, which clusters voltage-gated potassium channels at the nodes of Ranvier. It plays a significant role in language development in autism. It is highly expressed in a cortical–striatal–thalamic circuit involved in diverse higher-order cognitive functions.

Interestingly, previous studies have identified SNP clusters in intronic regions to be associated with communicative behavioral delays in screening normal healthy cohorts. Genetic variance at this locus is suggestive of its role in language endophenotypes [61]. Associations of CNTNAP2 have been identified with crucial autism phenotypes and speech-language impairment [29].

CNTNAP2 showed nine damaging variants with relevant GERP scores and appropriate read depth. Considering the threshold values of GERP, PolyPhen, and SIFT pathogenicity scores enhanced the probability of the selected variants to be disease-causing [62]. Eight out of the nine variants have been cataloged as causal variants with read sequence identifiers and the associated protein domains and motifs impaired. ΔΔG value could be used to predict the variant’s effect on protein folding and perturbations caused by it. A value of ≥ 0.5 kcal/mol is considered destabilizing, enhancing the effect of dysfunctional protein on disease severity [63].
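The destabilizing cutoff quoted above can be applied as a simple check on a predicted ΔΔG value. The Python sketch below follows the sign convention used in this text, where larger positive values are treated as destabilizing; other tools may use the opposite sign convention, and the function name is illustrative.

```python
DESTABILIZING_CUTOFF = 0.5  # kcal/mol, as quoted in the text


def is_destabilizing(ddg_kcal_per_mol: float) -> bool:
    """Flag a mutation as destabilizing under the text's convention (ΔΔG >= 0.5)."""
    return ddg_kcal_per_mol >= DESTABILIZING_CUTOFF


# ΔΔG reported in the Results for the T589P variant.
t589p_ddg = 2.67
print("T589P destabilizing:", is_destabilizing(t589p_ddg))
```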

The overlapping CNTNAP2 variant was taken forward to the downstream analysis because of its deleterious nature and high confidence. It lies in a coiled region of the protein and is present across 192 samples. It falls within the epidermal growth factor (EGF) protein domain, which is involved in the proliferation and differentiation of nervous tissue during neurogenesis and promotes wound healing [64]. SNPs for EGF play a significant role in the etiology of abnormal behavior in children with autism. Decreased EGF plasma levels are correlated with hyperactivity, decreased motor skills, and the tendency for tiptoeing [64]. The decreased EGF could be due to increased ligand binding to its receptor, resulting in increased EGFR and decreased EGF. This suggests their association with the etiology of autism [64]. Gene disruptions can affect CNTNAP2 expression through deletion or duplication of regulatory miRNAs. miRNAs affect cell differentiation in neuronal cells by downregulating nonsense-mediated RNA decay of genes involved in neurodevelopment [65]. Among their potential targets are a few notable autism genes (PTEN, SLC1A1, GRIK2, GABRG1, and GABRA4), which have been evaluated through miRNA expression profiling of cell-derived total RNA [66]. For CNTNAP2, ten miRNAs have been identified in regulatory networks as a hub with transcription factors coupled with target genes. However, no such associations have been established so far for miRNAs regulating CNTNAP2 expression. Intron 3 of CNTNAP2 shows the deletion of one copy of miR548AQ and miR548F in several patients with autism, as is also evident from the current investigation [67].

Protein modeling shows conformational changes in the structure of the mutated protein product. The normal protein, on having a mutation at position 589, undergoes a change in ligand-to-protein binding by introducing a premature stop codon that is predicted to produce a non-functional protein with impaired secondary structure. The stability is considerably reduced in the mutated proteins in the current investigation, indicating a considerable impact on protein folding and disease manifestation. A more negative Gibbs free energy change upon protein folding is indicative of greater stability. Substitution in the protein sequence due to an SNP can result in a change in ΔΔG, wherein values of ≤ −0.5 and ≥ 0.5 kcal/mol indicate stabilizing and destabilizing mutations, respectively [68]. Increased solvent accessibility would expose many more residual active sites of the protein structure, resulting in easier replacement of amino acids for conformation in the secondary structure [69, 70]. Destabilizing mutations render disease susceptibility and have a much greater collective effect than their stabilizing counterparts. The mutant protein product would contain only extracellular domains secreted from the cell, which can further have additional deletions resulting in impaired protein [61]. Random coiling at the predicted domain boundaries would result in a disordered protein structure with fewer structural elements and the vast majority of residues solvent exposed [71]. This would render the protein non-functional with discontinuous domains. Domain boundary analysis is crucial to understand the local decrease of protein structural constraints when variants are present [72]. Also, it reduces specificity, which would ultimately decrease the sensitivity in the ROC curve [28, 73]. This would help understand the protein folding in secondary structure stability and other properties for autism-related functionality. Such protein-level studies in autism are warranted in larger cohorts.

Rare variants in the CNTNAP2 gene, including deletions and nonsynonymous changes, have indicative roles in autism, intellectual disability, developmental delay, and language impairment [29]. Autism shows genetic heterogeneity, and hence, convergence and divergence of gene clusters are often seen in previous studies. Physical interactors of the CNTNAP2 protein that carry variants in the sample cohort aggravate the mutational sensitivity and amplify the overall impact. Taking these interactions into account, a model can be put forth with the CNTNAP2 protein as a bridge connecting variants in different cell types through CNTN2. Disruptions in such bridges are vital to account for the various genotypes and phenotypes attributed to intracellular variations of CNTNAP2 and for the roles of the intact and mutant protein studied in model animals [74]. Several studies provide genetic evidence that the CNTNAP2 gene is closely related to altered gene expression in the autism brain [56]. CNTNAP2 directly interacts with FOXP2, suggesting a link between language impairment and the autism circuital pathway [75]. Mutations in CACNA1H and CACNA1C, which form the alpha subunits, and in CACNB1 and CACNB2, which form the beta subunits, impair the multiprotein voltage-gated calcium channel complex.

Similarly, various gradients are established and maintained by a rich array of calcium pumps, exchangers such as ANK2, phosphorylases CALM1 and CALM2, voltage-activated ion channels, and an array of calcium-binding proteins. These permit tight regulation of calcium concentrations in the cytosol and intercellular spaces and of downstream signals. These functionalities are essential for normal cognitive functions, especially synaptic plasticity, memory, the excitability of neurons, neurotransmitter release, and axon growth in neurons [76]. Adhesion molecule CNTN2 and cell recognition molecule CNTNAP2 aid the process in downstream analysis.

Therefore, rare mutations in CNTNAP2 present in the EGF domain could be critical players in the manifestation of autism along with its physical interactors. The gene could act as a marker to warrant studies at the molecular level for further insights into the underlying pathways.

Conclusion

Autism exhibits genetic heterogeneity, and hence, it becomes difficult to pinpoint one single gene for its manifestation. The gene clusters with varied pathways show the convergence of multiple gene variants, resulting in autism manifestation. Whole-exome sequencing proves to be a reliable tool for deciphering the causal genes for autism manifestation. Deciphering the autism exome identified the mutational landscape derived from single and multi-base DNA variants. Genes carrying mutations were identified in synaptogenesis processes, EGF signaling, and PI3K/MAPK signaling. Protein-protein interactions of NrCAM and CNTN4 with CNTNAP2 increased the impact and burden on autism.

Limitation of the study

A detailed study in a larger cohort with parental and sibling exome analysis is warranted to identify familial markers. Overlapping studies could be performed on similar datasets using re-sequencing techniques. Further, the variants identified could be validated using Sanger sequencing if samples were available.