Background

Epilepsy is a group of chronic neurological diseases characterized by unprovoked seizures due to abnormal neuronal firing in the brain. It is one of the most frequently encountered medical problems in neurology clinics. Epilepsy affects up to 1 in 26 individuals in the USA and approximately 5–7 per 1000 individuals worldwide. Reports indicate that at least 50 to 100 million people in the world suffer from this disease, approximately 85% of whom live in developing countries [1,2,3]. Of all epilepsy cases, 20–30% are due to acquired factors, such as infection, stroke, trauma, neoplasms, and autoimmunity, while the remaining cases are thought to be due to genetic factors [4]. Epidemiological studies have observed an increased risk of epilepsy development in the relatives of epileptic individuals [5].

Genome-wide association studies (GWAS) of epilepsy phenotypes using genome-wide genotyping (GWG) arrays have discovered many single-nucleotide polymorphisms (SNPs) associated with the disease. A meta-analysis conducted by the International League Against Epilepsy in 2014 identified SNPs associated with the development of epilepsy in the sodium voltage-gated channel alpha subunit 1 (SCN1A), Protocadherin 7 (PCDH7), FA Complementation Group L (FANCL) and Vaccinia Related Kinase 2 (VRK2) genes [6]. A subsequent large GWAS encompassing over 15,200 epilepsy subjects and 29,600 controls revealed 16 genome-wide significant signals. Genes in these loci have diverse roles in histone modification, ion channels, pyridoxine metabolism and synaptic transmission, and include transcription factors. Interestingly, functional annotation of almost 500 SNPs that were significant at a genome-wide statistical threshold found that most are in intronic (46%) or intergenic (29%) regions, with only 4 non-synonymous SNP associations observed, of which 2 were missense variants [7]. Furthermore, gene regulation database analyses implicated approximately half of the SNPs in affecting gene transcription [7].

Second-generation sequencing technologies including whole-exome sequencing (WES) have been used in the diagnostic or research settings to identify genetic mutation(s) which may cause highly variable phenotypic expression in epileptic subjects [8,9,10]. The very nature of the WES approach favors detection of rarer highly penetrant gene-coding variants which are often hard to find in GWG-based GWAS approaches.

Performing WES in Saudi Arabian epilepsy populations offers a unique opportunity for the discovery of rare genetic variants impacting this disease, as there is a high rate of consanguinity among large tribal pedigrees. A study in the Eastern province of Saudi Arabia has reported the prevalence of epilepsy to be around 6–7 per 1000 individuals [11]. In the present study, WES was performed on 144 individuals diagnosed with epilepsy with varying age of onset. WES was followed by a variant prioritization approach based on American College of Medical Genetics (ACMG) guidelines, which allowed the discovery of potentially causative variants in our cohort of epilepsy subjects.

Materials and methods

Patient sampling and ethical approval

Over a period spanning 2018–2020, samples and data were collected from consecutive subjects with epilepsy attending the Neurosurgery Clinics, King Fahd Hospital of the University, Al-Khobar, and King Fahd Hospital, Alhafof, Saudi Arabia, for inclusion in this study. Participants ranged in age from 13 to 51 years and were clinically diagnosed with epilepsy at the point of recruitment. The phenotype data of all subjects were reviewed by a consultant committee to verify uniformity among sites and eligibility consistent with the International League Against Epilepsy [12]. The diagnosis of epilepsy was made by a consultant specializing in epilepsy based on the patients’ clinical history. Cases with moderate-to-severe intellectual disability or cancer were excluded from the study. If there was doubt regarding a patient’s phenotypic eligibility, the case was reviewed by the consultant committee and, if needed, additional data were requested before the committee decided on inclusion. On completion of the project, the medical records of all patients were reviewed again. Among the epileptic patients included in the study, eight also had type 2 diabetes, five had hypertension, one had cardiovascular disease and stroke, one had polycystic ovarian syndrome, and two had muscular dystrophy. Table 1 outlines the subjects’ demographic and clinical characteristics.

Table 1 Clinical and demographic characteristics of the 144 Saudi epilepsy subjects

Ethical approval for the study was obtained from the local Institutional Review Board (IRB) committees (IRB-2015-01-063), and the study was conducted according to the ethical principles of the Declaration of Helsinki and Good Clinical Practice guidelines. All patients included in the study provided written informed consent.

DNA sequencing, read alignment, variant calling and quality control

Blood samples were collected from subjects in EDTA vacutainers and immediately stored at −80 °C. Standard DNA preparation was performed using DNeasy Blood kits (Qiagen, MD, USA). Whole-exome sequencing libraries were generated using the Agilent SureSelect Human All Exon Kit V5 (Agilent, CA, USA) and sequenced on a HiSeq 2500 instrument (Illumina, CA, USA) using a standard paired-end sequencing protocol. Raw sequencing reads were stored as FASTQ files and then aligned to the human reference genome (GRCh37) using Illumina’s Dynamic Read Analysis for GENomics (DRAGEN) Pipeline. The resulting BAM files were position-sorted and duplicate reads were marked. Single-sample gVCF files were generated by the DRAGEN Germline Pipeline, and joint calling of all samples in the study cohort was performed by DRAGEN Joint Genotyping.

Principal components analysis (PCA) and Kinship

KING was used for relatedness inference based on the genotypes of exome SNPs (MAF > 0.01) [13]. For each pair of individuals, the estimated kinship coefficient was plotted against the number of SNPs with zero shared alleles (IBS0). Parent–offspring pairs, sibling pairs, and unrelated pairs can be distinguished as separate clusters on the scatterplot. The Ancestry and Kinship Toolkit (AKT) was used to calculate principal components and plot the results [14].
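The pairwise classification described above can be illustrated with a minimal sketch. The kinship cut-offs are the degree boundaries given in the KING documentation; the IBS0 tolerance used to separate parent–offspring from sibling pairs, the `Pair` record and the function name are hypothetical and are not taken from the study. IBS0 is represented here as a proportion of SNPs, which is also an assumption.

```python
from dataclasses import dataclass

# Kinship-coefficient boundaries as listed in the KING documentation.
DUPLICATE_MIN = 0.354
FIRST_DEGREE_MIN = 0.177
SECOND_DEGREE_MIN = 0.0884
THIRD_DEGREE_MIN = 0.0442

# Hypothetical tolerance (assumption): parent-offspring pairs always share
# at least one allele, so their IBS0 proportion should be close to zero.
IBS0_PARENT_OFFSPRING_MAX = 0.001


@dataclass
class Pair:
    id1: str
    id2: str
    kinship: float  # estimated kinship coefficient
    ibs0: float     # proportion of SNPs with zero shared alleles


def classify_pair(pair: Pair) -> str:
    """Assign a coarse relationship label to one pair of individuals."""
    if pair.kinship > DUPLICATE_MIN:
        return "duplicate_or_mz_twin"
    if pair.kinship > FIRST_DEGREE_MIN:
        # First-degree pairs are split on IBS0, as on the scatterplot.
        if pair.ibs0 <= IBS0_PARENT_OFFSPRING_MAX:
            return "parent_offspring"
        return "full_sibling"
    if pair.kinship > SECOND_DEGREE_MIN:
        return "second_degree"
    if pair.kinship > THIRD_DEGREE_MIN:
        return "third_degree"
    return "unrelated"
```

In a highly consanguineous cohort, background relatedness can shift these estimates, so fixed cut-offs of this kind are best treated as a starting point for inspecting the scatterplot rather than as a final call.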

Variant annotation, filtering and prioritization

Variants were annotated with SnpEff to predict their effects [15]. Rare variants were defined as those with a minor allele frequency (MAF) < 1% in the Genome Aggregation Database (gnomAD) [16]. Intronic, synonymous, 3’ and 5’ UTR, and up- and downstream variants were identified and excluded from the analysis. The remaining rare variants were considered to be potentially deleterious. Genetic variants classified in ClinVar as “Likely pathogenic” or “Pathogenic”, and in the Human Gene Mutation Database (HGMD) as disease-causing mutations (DM) for epilepsy or seizures, were collected and curated together with the research literature to serve as the knowledge base for variant prioritization and classification [17, 18].
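The filtering logic described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the record layout, the function names, the toy variants and the rule that a variant absent from gnomAD counts as rare are all hypothetical. The effect names are the Sequence Ontology terms that SnpEff emits for the excluded classes.

```python
# Sequence Ontology effect terms for the variant classes excluded in the text.
EXCLUDED_EFFECTS = frozenset({
    "intron_variant",
    "synonymous_variant",
    "3_prime_UTR_variant",
    "5_prime_UTR_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
})

RARE_MAF_THRESHOLD = 0.01  # "MAF < 1%" in gnomAD


def is_rare(gnomad_maf):
    # Assumption: a variant with no gnomAD entry (None) is treated as rare.
    return gnomad_maf is None or gnomad_maf < RARE_MAF_THRESHOLD


def prioritize(variants, knowledgebase):
    """Return (variant_key, in_knowledgebase) for each retained variant."""
    retained = []
    for v in variants:
        if not is_rare(v["gnomad_maf"]):
            continue  # common variant
        if v["effect"] in EXCLUDED_EFFECTS:
            continue  # excluded annotation class
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        # Flag variants already curated as (likely) pathogenic or DM.
        retained.append((key, key in knowledgebase))
    return retained


# Toy records on a made-up contig; none of these are variants from the study.
toy_variants = [
    {"chrom": "toy1", "pos": 100, "ref": "A", "alt": "G",
     "effect": "missense_variant", "gnomad_maf": 0.0002},
    {"chrom": "toy1", "pos": 200, "ref": "C", "alt": "T",
     "effect": "intron_variant", "gnomad_maf": 0.0001},
    {"chrom": "toy1", "pos": 300, "ref": "G", "alt": "A",
     "effect": "missense_variant", "gnomad_maf": 0.12},
    {"chrom": "toy1", "pos": 400, "ref": "T", "alt": "TA",
     "effect": "frameshift_variant", "gnomad_maf": None},
]
toy_knowledgebase = {("toy1", 100, "A", "G")}

result = prioritize(toy_variants, toy_knowledgebase)
```

Here the intronic and the common variant are dropped, and of the two retained variants only the first matches the toy knowledge base.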

Results

Principal component analysis

The common genetic variants of these Saudi epilepsy individuals form a distinct cluster when compared with the world’s major populations based on principal component analysis (Fig. 1). The samples shared a matched genetic background, which guards against spurious associations arising from population stratification.

Fig. 1

The x-axis and y-axis denote the values of the first two principal components (PC1, PC2), with each dot in the figure representing one individual. Individuals belonging to the Epilepsy Disease study group are shown in black. Individuals from the 1000 Genomes Project, namely Europeans (EUR), East Asians (EAS), Admixed Americans (AMR), South Asians (SAS), and Africans (AFR), are shown in red, blue, green, purple and orange, respectively.

Potentially pathogenic rare variants identified in 44 epilepsy subjects

Based on a combination of variant filtering and integration of the ClinVar and HGMD variant databases, we identified a total of 32 potentially causative pathogenic variants across 30 different genes in 44/144 (30%) epilepsy subjects, as shown in Table 2. Additional file 1: Supplementary Table S1 lists minor allele frequencies from further databases and additional annotation of these putative pathogenic variants.

Table 2 Potentially causative pathogenic variants derived from screening of 144 Saudi epilepsy subjects. Chromosomal position is outlined for the 32 putative pathogenic variants in 30 gene regions along with annotation of the putative pathogenic variant mapped to human reference genome build 37 (GRCh37). Minor allele frequencies (MAF) are shown for the: Saudi epilepsy cohort; Genome Aggregation Database (GnomAD); and Human Gene Mutation Database (HGMD). ClinVar annotation for likely clinical significance is also listed

The 30 genes harboring the likely pathogenic variants observed in these 44 Saudi epilepsy subjects were then assessed for overlap with 102 previously collated monogenic epilepsy genes [7]. Likely pathogenic variants in 12 of these 102 monogenic epilepsy genes were observed among the 44 epileptic subjects: CHRNA4, CLN3, CLN8, DEPDC5, KCNJ10, KCNMA1, POLG, PRICKLE1, SCN1A, SCN2A, SCN8A and SCN9A. Of the 18 additional genes from Table 2 with likely pathogenic variants, a number, including SZT2, SCN10A and UBA5, have been reported in whole-exome sequencing studies of epilepsy subjects [19, 20]. Only one homozygous mutation, a stop-gain, was observed, in a single individual [21], in Potassium Calcium-Activated Channel Subfamily M Alpha 1 (KCNMA1), a calcium-sensitive potassium channel gene which has been shown to have a role in general and early-onset epilepsy-related phenotypes [22,23,24,25,26]. This individual, a female who was diagnosed with childhood epilepsy at 12 years of age, has one family member with a diagnosis of epilepsy and was assessed as having first-degree consanguinity.
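The overlap assessment above amounts to a set intersection, which can be sketched with the gene names given in the text. Neither the full 102-gene panel nor the full list of 30 cohort genes is reproduced in this section, so both sets below are partial: the panel stand-in contains the 12 overlapping genes plus two placeholder entries, and the cohort stand-in contains the 12 overlapping genes plus seven other genes named in this article. The variable names and placeholders are hypothetical.

```python
# The 12 genes the text reports as shared between the cohort and the panel.
OVERLAP_NAMED_IN_TEXT = {
    "CHRNA4", "CLN3", "CLN8", "DEPDC5", "KCNJ10", "KCNMA1",
    "POLG", "PRICKLE1", "SCN1A", "SCN2A", "SCN8A", "SCN9A",
}

# Partial stand-in for the 102-gene monogenic epilepsy panel (see [7] for
# the real list). The two placeholder names are NOT real gene symbols.
panel_genes = OVERLAP_NAMED_IN_TEXT | {"PANEL_ONLY_GENE_A", "PANEL_ONLY_GENE_B"}

# Partial stand-in for the 30 cohort genes in Table 2: the 12 above plus
# other genes this article names as carrying likely pathogenic variants.
cohort_genes = OVERLAP_NAMED_IN_TEXT | {
    "SZT2", "SCN10A", "UBA5", "NIPA2", "SH2B3", "STIL", "WARS2",
}

# Genes found in both sets, and cohort genes outside the panel.
overlap = sorted(cohort_genes & panel_genes)
cohort_only = sorted(cohort_genes - panel_genes)

print(len(overlap), "cohort genes are in the panel:", ", ".join(overlap))
print("Cohort genes outside the panel:", ", ".join(cohort_only))
```

With the complete gene lists substituted for the two stand-ins, the same two set operations would yield the 12 overlapping and 18 additional genes reported above.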

Most commonly observed likely pathogenic variants

An in-frame insertion in Non-imprinted in Prader–Willi/Angelman Syndrome Region Protein 2 (NIPA2) (chr15, GRCh37 position: 23006299) was observed in seven of the study subjects (shown in Table 3 along with age of onset). Missense variants in SH2B Adaptor Protein 3 (SH2B3) (chr12, position: 111856571) and in Cholinergic Receptor Nicotinic Alpha 4 Subunit (CHRNA4) (chr20, position: 61981924) were each observed in four individuals. Missense variants in STIL Centriolar Assembly Protein (STIL) (chr1, position: 47746675) and in Tryptophanyl-tRNA Synthetase 2, Mitochondrial (WARS2) were each observed in three individuals.

Table 3 Most commonly observed putative pathogenic variants from whole-exome sequencing across 144 Saudi epilepsy patients

Variants of unknown significance identified in 133 epilepsy subjects

In highly curated genomic disease databases such as ClinVar, there are a large number of variants of unknown significance (VUS), for which the clinical significance of any association with epilepsy-related phenotypes is currently unknown or conflicting. We identified 232 VUS across 101 different genes in 133/144 (92%) subjects, as shown in Additional file 2: Supplementary Table S2. Interestingly, when the genes harboring these VUS were intersected with the 102 previously collated monogenic epilepsy genes, 43 of the monogenic epilepsy genes were found among the 101 loci with VUS in these 133 Saudi epilepsy subjects (Additional file 2: Supplementary Table S2). Two individuals were observed to have homozygous potentially pathogenic variants. One individual was homozygous for a missense variant in Spermatogenesis Associated 5 (SPATA5), and 3 individuals were heterozygous for this mutation (Additional file 2: Supplementary Table S2). SPATA5 has shown clear association with severe childhood epilepsy [27,28,29,30,31,32]. This female individual was diagnosed with childhood epilepsy at 10 years of age and has two family members with a diagnosis of epilepsy; her parents are listed as cousins.

The other homozygous variant, a missense mutation in Calcium Voltage-Gated Channel Subunit Alpha1 H (CACNA1H), was observed in one individual, with three additional individuals carrying one copy of this mutation. Mutations in this gene have been associated with generalized and severe epilepsies [33, 34], although it has been debated whether this is a bona fide monogenic epilepsy gene [35]. This female individual was diagnosed with epilepsy at 15 years of age and has no other family members with a diagnosis of epilepsy and no consanguinity noted.
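For both the SPATA5 and the CACNA1H variants, one homozygous and three heterozygous carriers were reported. As a worked illustration, the corresponding within-cohort allele frequency is (2 × homozygotes + heterozygotes) / (2 × subjects). The sketch below assumes that all 144 subjects had a called genotype at the site and that no other carriers exist in the cohort; the function name is hypothetical.

```python
from fractions import Fraction


def cohort_allele_frequency(n_hom_alt, n_het, n_subjects):
    """Alternate-allele frequency in a cohort of diploid individuals."""
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    if n_hom_alt < 0 or n_het < 0 or n_hom_alt + n_het > n_subjects:
        raise ValueError("carrier counts are inconsistent with cohort size")
    # Each homozygote contributes two alternate alleles, each heterozygote one.
    return Fraction(2 * n_hom_alt + n_het, 2 * n_subjects)


# One homozygote and three heterozygotes among 144 subjects (assumed all called).
af = cohort_allele_frequency(n_hom_alt=1, n_het=3, n_subjects=144)
print(af, round(float(af), 4))  # 5/288 0.0174
```

Under these assumptions each variant has a cohort allele frequency of about 1.7%, which is the figure to compare against the population frequencies reported for the same variants.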

Discussion

We performed the first whole-exome sequencing study in Saudi Arabian epilepsy subjects. Using 144 individuals, we compared putative pathogenic variants as well as variants of unknown significance with population-based whole-exome sequencing and whole-genome sequencing databases.

The most frequently observed variant across the 144 subjects was in NIPA2, which encodes a highly selective magnesium transporter. This in-frame insertion variant (NP_001171818.1:p.(Asn334_Glu335insAsp)) was observed in 7 of the 144 subjects (5%). This variant has been previously reported within a population of subjects with childhood absence epilepsy (CAE) [36,37,38].

A missense variant in CHRNA4 was observed in 4 subjects. CHRNA4 encodes a nicotinic acetylcholine receptor subunit, belonging to a superfamily of ligand-gated ion channels which play an established role in signal transmission at synapses. Mutations in CHRNA4 have been reported in nocturnal frontal lobe epilepsy type 1. A missense variant in SH2B3 was also observed in 4 subjects. This gene is involved in a range of signaling activities by growth factor and cytokine receptors as part of the SH2B adaptor family of proteins. Mutations in this gene have been associated with susceptibility to celiac disease type 13 and susceptibility to insulin-dependent diabetes mellitus. However, it has low expression in the brain, as is evident in the Genotype-Tissue Expression (GTEx) database. Missense mutations in STIL were observed in 3 subjects. STIL encodes a cytoplasmic protein which plays a role in the regulation of the mitotic checkpoint machinery. It too has low expression levels in all GTEx brain tissues.

A number of prioritized signals observed in this Saudi epilepsy study may be novel or have very limited reports of association. Deficiency of WARS2 was observed in a patient with severe infantile-onset leukoencephalopathy, profound intellectual disability, spastic quadriplegia, epilepsy and microcephaly [39]. Rare mutations in DPYD have been implicated in children with unspecific neurological symptoms [40]. Epileptic encephalopathy caused by recessive loss-of-function (LoF) mutations has been reported in DENND5A [41]. A previous report of infantile cerebral and cerebellar atrophy showed association with a mutation in MED17 [42]. A LoF mutation in HCN4 has been reported to be associated with familial benign myoclonic epilepsy in infancy [43]. Mutations in STRADA, SYNJ1, CACNA1A and NPRL3 have also been reported with severe epilepsy-related disease, but the association has not been reported for heterozygous variants in isolated forms of epilepsy [44,45,46,47,48,49,50,51].

A number of prioritized signals of putative pathogenicity were observed in genes that have no reports of association with epilepsy in the literature, including SEC24D, PCCA and MYO5A. SEC24D was reported to play a role in vesicle trafficking, and mutations in this gene are associated with Cole-Carpenter syndrome, a disorder affecting bone formation [52]. PCCA codes for the alpha subunit of the mitochondrial enzyme propionyl-CoA carboxylase; mutations in this gene lead to enzyme deficiency and are associated with propionic acidemia [53]. MYO5A encodes myosin 5A, and mutations in this gene are associated with Griscelli syndrome, which is characterized by hypopigmentation and a primary neurological abnormality [54]. These genes may be good candidates for further functional studies.

This study is limited in that only incomplete pedigrees and a limited depth of sub-phenotyping are available, although putative pathogenic variants in many known epilepsy-related loci are evident, and a number of potential new loci may be prioritized for further investigation. The Saudi population offers considerable promise for elucidating the genetic etiology of common diseases such as epilepsy because of consanguinity: extended stretches of homozygosity are often observed over several megabases, affording the opportunity to enrich for recessive forms of epilepsy. While this study looked at 144 subjects, we are aiming to expand the cohort to encompass genetic analyses of extended pedigrees with additional phenotyping.

Conclusions

We identified for the first time 32 potentially causative pathogenic variants in Saudi individuals with epilepsy. In addition, several potential new loci have been identified that have no reports of association with epilepsy in other populations. These potentially causative pathogenic variants in these new loci may be prioritized for further functional studies.