Background

Autism spectrum disorder (ASD) is a heterogeneous group of neurodevelopmental disorders (NDD) with a prevalence of approximately 1 in 160 children worldwide [1] and with variable clinical presentations and outcomes [2]. According to the latest version of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), it is characterized by impaired social communication along with repetitive behavior or restricted interests, which can persist throughout life [3, 4]. In addition to these core features, many affected individuals have comorbidities such as intellectual disability and epilepsy. A review and meta-analysis of ASD in India reported a low prevalence of only 0.0014 − 0.0012% in children aged 1–18 years, compared to developed countries like the United States and United Kingdom with a prevalence of 1–1.5% [5]. However, a review across the South Asian population reported a prevalence ranging from 0.09 to 1.07%, which is similar to that observed in developed countries [6].

The etiology of ASD is not fully understood, although, similar to several neurodevelopmental disorders, genetic risk and environmental exposure appear to contribute to the pathogenesis of ASD [7, 8]. Data from twin studies suggest a strong genetic role, and a quantitative meta-analysis of all published twin studies in the context of ASD has estimated the heritability component to be between 64 and 91% [9]. Therefore, genetic testing is recommended in ASD patients and, as of 2013, an etiology underlying ASD could be established in around 6–15% of cases [10]. Guidelines put forth a decade ago by the American College of Medical Genetics (ACMG) suggest using chromosomal microarray (CMA) as a first-line test in ASD since its diagnostic yield was estimated to be between 7 and 9% [2, 10]. However, since then, studies using whole exome sequencing (WES) have evidenced a sequence-level contribution of de novo variants to the etiology of ASD, and recent advancements in computational analyses of WES data suggest improvement in the detection of copy number variants (CNVs) too. Indeed, two recent studies have shown that WES was able to detect nearly all clinically relevant CNVs that were detected by CMA, thereby increasing its diagnostic yield by approximately 1.6% [11, 12]. In addition, a recent retrospective study using WES on 343 clinically diagnosed children with ASD from Spain suggested a diagnostic yield of ~14% with 75% of the cases harbouring a de novo variant [1]. It is predicted that nearly 85% of the disease-causing variants reside in the protein-coding and splice site regions of the genome, which are well covered by WES [13,14,15]. Various studies have repeatedly shown a better yield and utility of WES over CMA in NDD and thus, WES has now been suggested as a first-tier test for patients with intellectual disability/ NDD [16, 17].

Selection and availability of a first-tier test with high diagnostic yield is desirable in low-middle income countries (LMICs) like India, since patients and families bear the cost of genetic testing. To our knowledge, no study to date has been performed in the Indian population to delineate the genetic architecture of ASD, which could aid in the selection of a first-tier genetic test. Here, we report the first systematic study to assess the genetic architecture and molecular diagnostic yields of karyotype, Fragile-X testing, CMA and WES in a population-based cohort of 101 patient-parent trios with ASD from India.

Materials and methods

Patient recruitment and sample collection

The study included 101 consecutively recruited children with a confirmed clinical diagnosis of idiopathic ASD based on the DSM-5 [3, 4]. Children with prominent syndromic features, isolated speech delay or isolated sensory processing disorders were excluded from this study. Blood samples of the patient-parent trios were collected. The parents or guardians of all probands provided written informed consent as per the Helsinki Declaration and the study was approved by the research ethics committee at Foundation for Research in Genetics and Endocrinology, Ahmedabad (ID: FRIGE/IEC/19/2020). All the methods in the study were carried out as per the Helsinki Declaration. High molecular weight genomic DNA was extracted using a desalting method [18] and was stored at -20 °C until molecular genetic testing was carried out.

Karyotyping and Fragile-X testing

Karyotyping was performed in all cases regardless of sex, whereas Fragile-X testing was performed only in male probands. Karyotyping was carried out using GTG banding at 500 band resolution to check for gross chromosomal aberrations. Fragile-X testing was carried out by triplet repeat primed polymerase chain reaction (TP-PCR), which involved analyzing CGG repeat expansion in the 5’ UTR of the FMR1 gene using a method described previously [19]. Children with a normal chromosomal constitution and no expansion of the CGG repeats in the 5’ UTR of the FMR1 gene were subsequently assessed with CMA and WES.

Chromosomal microarray

CMA was carried out using the CytoScan™ Optima array, GeneChip™ System 3000 and Affymetrix platform (Thermo Fisher Scientific, USA) as per the manufacturer’s instructions. Chromosome Analysis Suite Software (ChAS) (Thermo Fisher Scientific, USA) was used to analyze the data as per the manufacturer’s recommendations, which suggested a minimum resolution of 1 Mb for losses, 2 Mb for gains and 5 Mb for copy neutral loss of heterozygosity. All candidate CNVs were primarily screened for population frequency and known disease associations using publicly available databases such as gnomAD [20], DGV [21], DECIPHER [22] and OMIM [23]. Pathogenicity of CNVs was classified in accordance with the ACMG and ClinGen classification system [24]. All candidate CNVs were validated in the proband and parents using SYBR Green based quantitative PCR (Q-PCR) on ABI’s StepOne Real Time PCR system (Thermo Fisher Scientific, USA) (Supplementary Table 1).

Whole exome sequencing

Genomic DNA of the proband was subjected to selective capture and sequencing of the protein coding regions, which included exons and exon-intron boundaries of genes, using the Agilent SureSelect v6 enrichment kit (Agilent, USA). The prepared library was subjected to paired-end sequencing with a mean coverage of > 80-100x on the Illumina HiSeq or NovaSeq platform (Illumina, USA). Sequences obtained as FASTQ files were aligned to the human reference genome (GRCh37/hg19) using BWA MEM v0.7.12 [25]. SNVs and indels were called using GATK v4.12 HaplotypeCaller [26]. In addition to SNVs and small indels, copy number variants (CNVs) were detected from the data using ExomeDepth v1.1.10 [27].

Variant annotation, filtration and prioritization were performed using Exomiser v12.1.0 [28]. Exomiser uses the hiPHIVE prioritization method, which incorporates protein-protein interaction networks and multi-species ontologies and ranks candidate genes based on the predicted variant pathogenicity associated with the phenotype. The phenotype information was coded in uniform human phenotype ontology (HPO) terminologies [29]. Common variants were filtered based on minor allele frequency in the 1000Genome Phase 3 [30] and gnomAD v2.1 [20] databases. The minor allele frequency cut off was set at 0.02 (2%). The cut-off was set assuming ASD has a global prevalence of 1:100; the frequency of major and minor alleles would be 0.9 (p) and 0.1 (q), respectively, based on the Hardy-Weinberg equilibrium. As ASD is caused by dominant de novo variants in the majority of cases (pq = 0.09) and prior estimates suggest a genetic diagnostic yield of approximately 33%, pq would be 0.027. Only non-synonymous variants in the coding region and canonical splice site variants with a depth of > 20x were used for analysis and clinical correlation. Various in-silico prediction tools such as PolyPhen-2 [31], SIFT [32], MutationTaster2 [33], LRT [34], CADD [35] and MetaDome [36] were used to predict the pathogenicity of non-synonymous and indel variants. A CADD_phred score of ≥ 15, a slightly intolerant, intolerant or highly intolerant prediction from MetaDome and at least two damaging predictions from the remaining in-silico tools were used for selection of candidate variants. In-silico predictions, along with available knowledge from various sources and databases as described below, were used in prioritising the variants.
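The filtering criteria described above can be summarised as a simple predicate. The following is a minimal sketch only, not the Exomiser configuration used in the study: the `Variant` record, its field names and the effect labels are hypothetical, and treating the 2% frequency cut-off as inclusive is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MAF_CUTOFF = 0.02        # minor allele frequency cut-off (2%)
MIN_DEPTH = 20           # variants must have a depth of > 20x
MIN_CADD_PHRED = 15      # CADD_phred score of >= 15
METADOME_INTOLERANT = {"slightly intolerant", "intolerant", "highly intolerant"}

# Hypothetical effect labels standing in for "non-synonymous coding or canonical splice site".
ACCEPTED_EFFECTS = {"missense", "nonsense", "frameshift", "inframe_indel", "canonical_splice"}


@dataclass
class Variant:
    effect: str            # predicted consequence (hypothetical labels, see above)
    depth: int             # read depth at the variant site
    max_maf: float         # highest frequency seen in 1000 Genomes or gnomAD
    cadd_phred: float      # CADD phred-scaled score
    metadome: str          # MetaDome tolerance category
    damaging_calls: int    # damaging calls among PolyPhen-2, SIFT, MutationTaster2 and LRT


def passes_gross_filter(v: Variant) -> bool:
    """Frequency, consequence and depth filters."""
    return (
        v.effect in ACCEPTED_EFFECTS
        and v.depth > MIN_DEPTH
        and v.max_maf <= MAF_CUTOFF  # inclusive bound is an assumption
    )


def passes_in_silico(v: Variant) -> bool:
    """In-silico criteria: CADD, MetaDome and at least two other damaging predictions."""
    return (
        v.cadd_phred >= MIN_CADD_PHRED
        and v.metadome in METADOME_INTOLERANT
        and v.damaging_calls >= 2
    )


def is_candidate(v: Variant) -> bool:
    return passes_gross_filter(v) and passes_in_silico(v)


rare_damaging = Variant("missense", 45, 0.0001, 27.3, "intolerant", 3)
common_variant = Variant("missense", 45, 0.08, 27.3, "intolerant", 3)
print(is_candidate(rare_damaging), is_candidate(common_variant))
```

In this sketch the in-silico step is applied uniformly to every variant class for simplicity; in practice, predictors such as MetaDome are informative mainly for missense changes.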

After gross filtering, variants were prioritized based on the following: (a) known disease-causing variants previously reported in databases like ClinVar [37] and HGMD [38]; (b) novel variants in known genes based on the Z-score for missense variants and the pLoF or LOEUF score for loss of function variants available in the gnomAD database [20]; (c) variants in novel candidate genes, wherein the respective gene was additionally evaluated for its function using UniProt [39] and Human Protein Atlas (proteinatlas.org) [40]. Tissue expression using the GTEx database (gtexportal.org), association/ interaction with known ASD genes using the STRING database [41] and plausible phenotypic outcome in murine models based on the MGI database [42] were also assessed. All candidate variants were assessed using IGV [43] to evaluate their quality.

In the case of candidate CNVs, variants were primarily screened for population frequency and known disease associations using publicly available databases such as gnomAD [20], DGV [21], DECIPHER [22] and OMIM [23]. Pathogenicity of CNVs was classified in accordance with the ACMG and ClinGen classification system [24].

All candidate SNVs and indels were validated in proband and parents using bi-directional Sanger sequencing using ABI’s SeqStudio platform (Thermo Fisher Scientific, USA) whereas all candidate CNVs were validated using SYBR Green based quantitative PCR (Q-PCR) using ABI’s StepOne Real Time PCR system (Thermo Fisher Scientific, USA) (Supplementary Table 1). This was conducted to delineate mode of inheritance and reclassify variant pathogenicity.

The classification of SNVs was carried out according to the American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG-AMP) guidelines [44] and the ClinGen framework [24].

Results

Study cohort

The study cohort consisted of 101 well defined patient-parent trios diagnosed with moderate to severe ASD of unknown etiology as per the DSM-5 criteria. The average age at recruitment was 5 ± 3 years and ranged from 2 to 6 months to 16 years (Table 1). The average maternal and paternal age at the time of conception was 28 ± 4 years and 30 ± 4 years, respectively. The cohort included 72 males (71%) and 29 females (29%), suggesting a male to female ratio of approximately 3:1. Five families had more than one child diagnosed with ASD (Supplementary Information 1). Consanguinity was noted in 8 families (7.9%), whereas non-consanguinity and endogamy in 31 (30.7%) and 62 (61.4%) families, respectively. All 101 probands with ASD also had developmental delay and intellectual disability with some of them having subtle dysmorphism (large and/ or cupped ears, long eyelashes, telecanthus, thin upper lip) (n = 28/101; 27.7%) and epilepsy (n = 28/101; 27.7%) (Supplementary Table 2).

Table 1 Demographics of 101 patient-parent trios

Outcomes from karyotype and fragile X testing

Sequential genetic testing was performed in all 101 patients, beginning with karyotyping and followed by fragile X testing (only in male probands), CMA and WES. None of the probands showed gross chromosomal aberrations or had expanded triplet repeat tracts (full-mutation alleles with > 200 CGG repeats) in the 5’-UTR region of the FMR1 gene. Therefore, all probands were subsequently tested using CMA and WES.

Outcomes from chromosomal microarray

From the 101 probands in whom CMA was performed, pathogenic CNVs were detected in 3 cases (2.9%) including two deletions and one duplication (Table 2). Proband ASD-076 had an 8 Mb deletion at the 15q11.2 locus which encompassed 20 OMIM genes and is known to cause 15q11.2 deletion syndrome (OMIM#615656) or Angelman syndrome (OMIM#105830). Compared to the individuals with class II deletions (BP2-BP3; ISCA-37478), individuals with large class I deletions (BP1-BP3; ISCA-37404) at the 15q11.2 region are observed to have a high likelihood of language impairment and autistic traits, similar to that seen in the proband in our study [45]. Patient ASD-103 was detected with a deletion of 0.19 Mb size at the 9q34.3 locus which encompassed 6 OMIM genes and is associated with Kleefstra syndrome 1 (OMIM#610253). Individuals with a > 1 Mb deletion of the 9q34 locus have a severe phenotype such as congenital anomalies including heart defects, limb anomalies, seizures and respiratory distress. In contrast, individuals having a < 1 Mb deletion are observed with a milder phenotype, which in part could explain the phenotype of the proband in the current study, such as bruxism, drooling, subtle facial dysmorphism and recurrent episodes of vomiting [46, 47]. Lastly, proband ASD-050 was detected with a 0.52 Mb duplication at the 1q22 locus which consists of 8 OMIM genes. This is a rare CNV which has previously only been reported in a boy with intellectual disability and psychiatric disturbances [48]. Multiple individuals in this family were affected and the duplication variant segregated with the neurological features in all family members with this variant. All CNVs in our cohort were de novo in origin and were observed exclusively in male probands.

Table 2 List of cases observed with pathogenic or likely pathogenic copy number variation using CMA and/or WES.

Outcomes from whole exome sequencing

WES was carried out in 99 of 101 cases, as the cohort contained two monozygotic twin pairs and only one proband from each twin pair was processed for WES. The 99 cases also included the three cases that yielded a result by CMA, in order to assess the sensitivity of WES for detecting CNVs. On average, approximately 3 candidate genes or variants were identified per proband (Supplementary Table 3).

From the 101 patients, pathogenic and/ or likely pathogenic variants were identified in 30 cases (29.7%), of which SNVs were detected in 27 cases (90%) and CNVs in 3 cases (10%) (Table 3). Interestingly, the 3 CNVs detected by CMA were also identified by WES; however, a 0.8 Mb de novo deletion encompassing the BP1 region of the 15q11.2 locus was detected by WES alone (Table 2). Further analysis showed that CMA failed to detect this CNV because the CytoScan™ Optima array lacks probes covering this region.

Table 3 List of cases observed with pathogenic or likely pathogenic single nucleotide variation using WES

Segregation analysis revealed that approximately 66.6% (n = 3 for CNVs and n = 17 for SNVs) of the cases were caused by a de novo variant. De novo SNVs were found primarily in previously known ASD genes: MECP2, SCN2A, KCNQ2, TBL1XR1, CNTNAP2, TCF4, CAMK2A, NF1, AUTS2, FOXP2 and NLGN3. Of 17 de novo variants, 6 were predicted to be loss of function (pLOF) variants (35.2%) whereas the remaining were missense variants. Remarkably, 6 of the 17 patients had a de novo SNV in the MECP2 gene, which is associated with Rett syndrome (OMIM#312750). Of these, 5 were female and 1 was a male proband. Interestingly, in the rare case of the male proband aged 2.5 years with Rett syndrome, we observed that the variant c.538C>T (p.Arg180Ter) in the MECP2 gene originated through a post-zygotic de novo event which led to somatic mosaicism in the proband (Table 3) [49].

In our cohort of patients with pathogenic/ likely pathogenic variants, 5 probands (n = 5/30; 16.6%) were observed with biallelic or hemizygous variants in genes associated with NDD or metabolic disorders with a recessive mode of inheritance (Table 3). Specifically, biallelic variants were detected in (i) ALDH4A1 gene which is associated with hyperprolinemia type II (OMIM#239510), (ii) NEUROG1 gene which is associated with congenital cranial dysinnervation disorder and autism spectrum disorder [50], (iii) KDM6A gene which is associated with Kabuki syndrome 2 (OMIM#300867), (iv) LMAN2L gene which is associated with mental retardation 52 (OMIM#616887) and, (v) ALDH7A1 gene which is associated with pyridoxine dependent epilepsy (OMIM#266100).

In addition, 4 probands were identified with pathogenic/ likely pathogenic heterozygous variants which were inherited from one of their parents. In 2 cases, the variants were inherited from an unaffected mother and in 1 case the variant was inherited from an unaffected father. In the 4th case, the pLOF variant c.202C>T (p.Gln68Ter) in the RORB gene was inherited from the father, who also had a clinical history of seizures (Supplementary Table 2; Supplementary Information 1). Of note, in one case (ASD-003), the paternal sample was unavailable, hence the mode of inheritance could not be deduced. Interestingly, ASD probands with epilepsy had a higher diagnostic yield (n = 15/28; 53.6%) compared to ASD probands without epilepsy (n = 15/73; 20.5%) (χ2 = 10.6, p = 0.001); however, no such association was observed for facial dysmorphism (χ2 = 0.67, p = 0.41) or social/ speech regression phenotypes (χ2 = 0.53, p = 0.47).
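The epilepsy comparison above can be reproduced from the reported counts. The sketch below assumes an uncorrected Pearson chi-square test on a 2 × 2 table with one degree of freedom; the text does not state which procedure or software was used, so this is an illustration under that assumption. The function name is ours.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: probands with / without epilepsy; columns: diagnosed / undiagnosed.
chi2, p_value = pearson_chi2_2x2(15, 13, 15, 58)
print(f"chi2 = {chi2:.1f}, p = {p_value:.3f}")  # chi2 = 10.6, p = 0.001
```

The uncorrected statistic matches the reported value; applying a continuity correction would give a smaller statistic.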

Lastly, WES identified 22 variants of uncertain significance (VUS) in 21 patients (n = 21/101; 20.8%; Supplementary Table 4). The variants were identified in genes that have previously been associated with or implicated in ASD etiology as per the Simons Foundation Autism Research Initiative (SFARI) Gene Database and Autism Database (AutDB). Of these, the majority of the probands were detected with heterozygous variants (66.6%) which were inherited from either of the unaffected parents with equal distribution. Of note, following segregation analysis, 3 of the 21 patients were detected with missense variants in the KMT2C gene (Kleefstra syndrome 2; OMIM#617768) which were inherited from a healthy parent. Whilst the majority of cases have been reported with a de novo variant in the KMT2C gene, 4 reports observed variants being inherited from a healthy parent, suggesting a potential oligogenic mode of inheritance [51,52,53,54].

Discussion

Almost a decade ago, the ACMG published guidelines recommending CMA as a first tier test for delineating the genetic cause of ASD and other NDDs [2, 10]. Since then, WES coupled with advancements in computational analyses has led to simultaneous detection of SNVs and CNVs. Studies carried out in multiple ethnic populations since 2015 have shown an increased diagnostic yield from WES compared to CMA in ASD [1, 2, 55, 56]. This outcome is supported by the observation of a high proportion of de novo SNVs in ASD patients which are not detectable by CMA. To our knowledge, we here report the first description of the genetic architecture of ASD and simultaneously carry out diagnostic yield comparisons of karyotype, FMR1 triplet repeat expansion, CMA and WES in a cohort of 101 patient-parent trios of Indian origin.

Our data are in congruence with prior reports and support the utility of WES as a primary genetic diagnostic method for ASD. In the present cohort, WES detected pathogenic/ likely pathogenic variants causative of the ASD phenotype in 29.7% of the cases in contrast with 2.9%, 0% and 0% from CMA, FMR1 triplet repeat expansion and karyotype testing, respectively. Indeed, all three CNVs detected by CMA were also detected by WES together with a fourth CNV which was detected by WES alone. Interestingly, the low yield of CMA in the present cohort can be attributed to two potential reasons. First, gross dysmorphism was an exclusion criterion during recruitment of cases for the study. A prior study by Tammimies et al. has shown a higher diagnostic yield of CMA in children with ASD and a major congenital anomaly compared with children with a minor physical anomaly [2]. Second, the Affymetrix CytoScan Optima oligonucleotide array was used in the current study. The platform consists of 315,608 probes and requires at least 25 probes to call a loss or gain of approximately 100 kb in size. A prior study has shown a trend for differential diagnostic yield with CMA based on both platform resolution and phenotypic manifestation in ASD patients [2]. A higher resolution microarray (1 million probes or more) had a higher diagnostic yield in ASD patients with minor physical anomalies compared to a low resolution microarray (44k platform); however, this difference was abated when the test was carried out in ASD patients with major congenital anomalies [2]. It is therefore plausible that the current platform may have missed CNVs that are beyond its detection limit, which could have been picked up with a higher resolution microarray platform. The diagnostic yield in the present cohort is concordant with those reported previously from individual cohort studies [1, 2, 55, 56]. Indeed, a recent meta-analysis in patients with NDD, i.e. global developmental delay, intellectual disability and ASD, showed the diagnostic yield of WES to range from 31 to 53% in contrast to CMA with a yield ranging from 15 to 20% [16]. Based on these results, Srivastava et al. outlined a consensus statement and a stepwise algorithm for NDD diagnosis whereby WES is presented as the first-tier test followed by CMA and/or other orthogonal tests.

Interestingly, we observed that in 66.6% and 16.1% of the cases with a genetic diagnosis for ASD, the mode of inheritance for the variant was de novo and recessive, respectively. This is in congruence with prior patient-parent trio cohort studies in which similar rates for the variants' mode of inheritance were observed [1, 2, 57]. All genes identified as carrying potential causative variants were subjected to STRING analysis v11.5 (Fig. 1). The network consisted of 37 unique proteins with 67 protein-protein interactions (PPI) amongst themselves. In comparison, a random set of the same number of proteins would result in only 12 interactions. With a p-value of < 1.0e-16, the statistically significant enrichment of PPI in the present cohort indicated a biological connection amongst these proteins. The majority of these proteins are involved in synaptic formation, transcription and its regulation, ubiquitination and chromatin remodeling, as has been observed in prior studies [58]. This leads to a plausible hypothesis that the genetic architecture and etiopathogenesis of ASD are similar across ethnicities and that the introduction of a uniform stepwise genetic testing algorithm would give similar diagnostic yields.
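To give a feel for the scale of the PPI enrichment described above (67 observed versus 12 expected interactions), the sketch below computes a fold enrichment and a crude Poisson upper-tail probability. This is not the null model that STRING itself uses to derive its enrichment p-value; treating the edge count as Poisson-distributed is a simplifying assumption made here purely for illustration, and the function name is ours.

```python
import math


def poisson_upper_tail(observed: int, expected: float, extra_terms: int = 500) -> float:
    """Approximate P(X >= observed) for X ~ Poisson(expected).

    Terms are evaluated in log space and summed directly, which avoids the
    cancellation that 1 - CDF would suffer for very small tail probabilities.
    """
    total = 0.0
    for k in range(observed, observed + extra_terms):
        log_term = -expected + k * math.log(expected) - math.lgamma(k + 1)
        total += math.exp(log_term)
    return total


observed_edges = 67   # interactions among the 37 proteins (from the text)
expected_edges = 12   # interactions expected for a random set of the same size (from the text)

fold_enrichment = observed_edges / expected_edges
tail_probability = poisson_upper_tail(observed_edges, expected_edges)
print(f"fold enrichment = {fold_enrichment:.1f}")
print(f"Poisson tail probability below 1e-16: {tail_probability < 1e-16}")
```

Even under this rough approximation the excess of interactions is far beyond what chance would produce, which is in line with the reported p-value of < 1.0e-16.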

Fig. 1 STRING network analysis showing genes involved in synaptic junction formation (dark red), signal transduction (grey), transcription regulation (orange) and histone modification (light blue)

In our cohort, three genes (LRFN1, UNC13A and UNC79) were identified as potential novel candidates for ASD. The variant in the LRFN1 gene was a result of a de novo event. LRFN1 interacts with DLG4, a known ASD gene vital in the formation of the post-synaptic complex required for signal transduction [59]. DLG4 is classed under a high confidence category with a gene score of 1 in the SFARI database and has an Evaluation of Autism Gene Link Evidence (EAGLE) score of 2.45, which suggests limited but no contradicting evidence of its role in ASD. Due to the direct interaction between the two genes, LRFN1 could be considered as a potential candidate for ASD, although functional validation is required and was beyond the scope of the current study. The variants in the UNC13A and UNC79 genes were inherited from likely asymptomatic parents and classed as VUS. Both these genes have been listed in the AutDB and SFARI database and have been considered novel due to the absence of an associated phenotype in the OMIM database. A patient with developmental delay, dyskinetic movement disorder and autism has been previously identified with a de novo variant in the UNC13A gene [60]. Additionally, experimental evidence suggests its direct interaction with a known ASD associated gene, STXBP1. Only recently, UNC79 gene has also been associated with neurodevelopmental features including autism [61].

With increasing awareness of ASD amongst the general population, demand for genetic testing in children with ASD is likely to increase. In a survey of parents of a child with ASD in the USA, 80% of the parents indicated that they would pursue genetic testing to identify the risk of ASD in a younger sibling [62]. However, financial concerns, not being offered genetic testing by a physician or a geneticist and lack of awareness are amongst the most common reasons for not opting for genetic diagnosis [63]. In addition, with the development and deployment of new treatments such as trofinetide for Rett syndrome, the uptake of genetic testing is likely to increase [64]. This suggests that adoption of a uniform genetic testing algorithm, coupled with educating primary care physicians and non-genetic specialists, could improve rates of genetic testing and diagnosis in children with ASD.

Limitations

The limitations of our study include a relatively small sample size; possible ascertainment bias related to patients having a primarily non-syndromic form of ASD without gross congenital dysmorphism; carrying out WES and CMA in the proband only, followed by segregation analysis of prioritized variants by orthogonal approaches; and the absence of a detailed cost-effectiveness assessment. Despite this, we observe diagnostic yields similar to those observed in other cohorts [1, 2, 55]. Additionally, there are technical and interpretation limitations to the identification and prioritization of variants which were classified as VUS. Delineation of the pathogenicity of these variants is often challenging because of their incomplete penetrance, variable expressivity and/or sex-specific bias [65]. This, however, would require re-assessment of WES data every 2–3 years, as per the consensus statement by Srivastava et al., using updated datasets and new computational tools [16]. Lastly, WES and CMA, due to their inherent technical limitations, are unable to resolve complex structural re-arrangements (e.g. inversions and translocations) which could play a role in the pathogenesis of NDD [66], although newer genomic technologies such as long-read whole genome sequencing could help to assess their role in the etiology of ASD.

Conclusion

Data from large scale genomic and transcriptomic studies have helped to delineate the genetic architecture of ASD in European/ non-Hispanic white populations. To the best of our knowledge, this is the first study to delineate the genetic architecture of ASD in the Indian population, with de novo variants in genes involved in synaptic formation, transcription and its regulation, ubiquitination and chromatin remodeling as the primary cause. In congruence with data from other ethnic populations, the current study provides evidence supporting the implementation of WES as the first-tier test in the genetic diagnosis of ASD.