Exome sequence analysis of rare frequency variants in Late-Onset Alzheimer Disease

Alzheimer disease (AD) is a leading cause of dementia in elderly patients who continue to live between 3 and 11 years of diagnosis. A steep rise in AD incidents is observed in the elderly population in East-Asian countries. The disease progresses through several changes, including memory loss, behavioural issues, and cognitive impairment. The etiology of AD is hard to determine because of its complex nature. The whole exome sequences of late-onset AD (LOAD) patients of Korean origin are investigated to identify rare genetic variants that may influence the complex disorder. Computational annotation was performed to assess the function of candidate variants in LOAD. The in silico pathogenicity prediction tools such as SIFT, Polyphen-2, Mutation Taster, CADD, LRT, PROVEAN, DANN, VEST3, fathmm-MKL, GERP +  + , SiPhy, phastCons, and phyloP identified around 17 genes harbouring deleterious variants. The variants in the ALDH3A2 and RAD54B genes were pathogenic, while in 15 other genes were predicted to be variants of unknown significance. These variants can be potential risk candidates contributing to AD. In silico computational techniques such as molecular docking, molecular dynamic simulation and steered molecular dynamics were carried out to understand the structural insights of RAD54B with ATP. The simulation of mutant (T459N) RAD54B with ATP revealed reduced binding strength of ATP at its binding site. In addition, lower binding free energy was observed when compared to the wild-type RAD54B. Our study shows that the identified uncommon variants are linked to AD and could be probable predisposing genetic factors of LOAD.


Introduction
Alzheimer disease (AD) is a chronic, progressive neurodegenerative disorder. AD is characterized by the accumulation of extracellular neurotic beta-amyloid plaques and intracellular neurofibrillary tangles composed of hyper-phosphorylated Tau protein leading to neuronal death and cerebral atrophy (Ballard et al. 2011;Harris 2012;Holtzman et al. 2011). The disorder is associated with cognitive dysfunctions and psychiatric and behavioural disturbances. AD significantly contributes to dementia, affecting around 35 million people worldwide. Based on the onset age, AD is classified as earlyonset (age < 65 years) and late-onset (age > = 65 years). Around 10% of AD patients are diagnosed with early-onset AD (EOAD) (Bekris et al. 2010). Studies have documented that the incidence rates for familial dementia and LOAD are significantly higher than population-based estimates. LOAD encompasses complex aetiology with a heritability rate of 58-79% (Vardarajan et al. 2014;Awada 2015).
The prevalence and incidence of AD indicated age as the most influential known risk factor. Early efforts in understanding AD were mainly focused on EOAD. These studies were centred on large multi-generation families' harbouring clear autosomal dominant patterns of inheritance related to mutations in genes that alter amyloid-beta (Aβ) protein production, aggregation, or clearance. EOAD patients were often associated with 3 common genes; Amyloid Beta Precursor Protein (APP) and Presenilin genes (PS1 and PS2), which occur in half of the EOAD cases. The genetic basis of LOAD is more complex, with susceptibility likely conferred by various common but less penetrant genetic factors, such as apolipoprotein E (APOE) alleles, interacting with environmental and epigenetic factors. Although substantial evidence indicates genetic factors as a key player, APOE-4, the only identified LOAD gene seems to act as a primary modifier at the age of onset and in patients with onset before age 70 years (Blacker and Tanzi 1998) (Bekris et al. 2010) (Bellenguez et al. 2019). The APOE-4 allele was overrepresented in both AD divisions. APOE-4/4 homozygotes act as a risk factor in LOAD cases.
Apart from complex genetics, other risk factors contribute to the difficulty in identifying LOAD genes. i) The base rate of LOAD cases is high and increases steeply with age. Hence clustering among families can occur due to chance alone, and several sources of the disease may co-occur in the same family. ii) LOAD occurs at the end of life span, and many individuals do not survive to the age of risk. iii) Elderly patients are prone to other sources of cognitive decline, diluting the power of genetic studies with individuals carrying the disease but are not actual gene carriers (phenocopies) (Blacker and Tanzi 1998) (Andrade-Guerrero et al. 2023). An early diagnosis of the disease is highly crucial for effective treatment. An early diagnosis helps the affected individuals to explore and benefit from drug and non-drug treatment available. An early diagnosis opens prospects for participating in wide range of clinical trials, leading to advancing research and providing medical benefits. The current treatment strategies mainly focus on reliving and delaying the progression of the symptoms.
Large-scale genome-wide association studies (GWAS) and meta-analyses of the GWAS have identified more than 30 different LOAD-susceptible loci, focusing on the European population (Karch and Goate 2015). These GWAS hits documented polymorphisms, mostly in intronic and intergenic regions. In addition, whole-exome microarray and whole-exome sequencing (WES) also contributed to identifying rare and novel variants. Recently, whole exome sequencing techniques have been utilized to understand the mutational mechanisms in several diseases, including cancer . These advanced techniques also help to comprehend the structural mechanisms disrupted upon mutations using computer simulations and docking methods (Udhaya Kumar et al. 2022) (Kumar and Priya Doss 2021) (Tayubi et al. 2022).
The genetic diversity existing among different ethnic groups may have an effect on genetic factors in the pathogenesis of AD. AD genetic and clinical sucecptibility profile seems to be different with different ethnic groups (Al-ThAni et al. 2021). Most of the genetic studies on AD were based on European cohorts, implying the absence of ethnic diversity data in genetic research. Including genetic studies on other populations, including East Asians, can lead to exhaustive genetic details of AD pathogenesis (Miyashita et al., 2022). The current work utilizes the distinct genetic profiles of Koreans with AD to discover LOAD-associated genes and variants. Higher AD occurrences are observed among older Koreans in East-Asian populations (Jang et al. 2021). Whole-exome sequencing analysis of post-mortem hippocampal regions from AD patients and age-matched healthy controls of Korean ethnicity is performed to identify novel LOAD risk genes.

Data retrieval
The exome sequences of LOAD-affected individuals and control samples were retrieved from NCBI SRA (Accession: PRJNA532465) (Leinonen et al. 2011). Post-mortem hippocampal regions of 52 AD and 11 age-matched control samples were retrieved. About 37 LOAD samples were selected based on the disease onset (age ≥ 65 years). 15 samples were from EOAD cases (age < 65 years) and used to verify rare variants. The disease stage, ethnicity, and gender were also verified for the samples (Table 1). The selected samples were chosen for further analysis.

Data processing
The quality control of the raw paired-end exome sequences was carried out using FastQC. Following quality control, high-quality reads were mapped to human genome build GRCh38 using Burrows-Wheeler Aligner (Li and Durbin 2009). The mapped reads were duplicate marked, and base quality score recalibration was carried out using Picard and GATK suite. GATK suite was employed to identify variants. The variants were annotated using Annotate Variation (ANNOVAR) tool (Wang et al. 2010

Docking
Using the molecular docking approach, users can theoretically screen a library of chemicals and forecast the most potent binding sites using a variety of scoring algorithm (AgrAhAri et al. 2018a(AgrAhAri et al. , 2019Ali et al. 2017). Using Autodock Vina docking software, the docking analysis of the ligand ATP with RAD54B was performed. The predicted active site was the primary target for docking the protein-ligand complex. The anticipated active site of RAD54B was then docked with arylidenes ATP using AutodockVina (Trott and Olson 2010). The pose with the highest Auto-dockVina dock score was then chosen for molecular dynamics simulation studies.

Protein-ligand dynamic simulation (MDS)
Selected Protein-Ligand complexes from docking data were subjected to MDS using Gromacs-2019 (AgrAhAri et al. 2018b; Abraham et al. 2015). We obtained the chosen ligand topology by downloading it from the PRODRG website. Utilizing the steepest descent technique, we setup the system and reduced the vacuum for 1500 steps (Petrova and Solov'ev 1997). The structures in a cubic periodic box of 0.5 nm were solvated using the simple point charge (SPC) water model. It was, therefore, sufficient to sustain the complex system with a salt concentration of 0.15 M by introducing an adequate amount of Na + and Cl − counter ions. The system setup was covered based on a previously published work. The ensemble underwent a final simulation run of 100 ns following the NPT equilibration stage. The trajectory was examined using various GROMACS analytic techniques as a last step. The RMSD, RMSF, Rg, solvent accessible surface area (SASA), and intramolecular H-bonds between our protein molecules were calculated for the wild and all mutant proteins, respectively, using the gmx rms, gmx rmsf, gmx gyrate, gmx sasa, and gmx hbond. Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) was employed to comprehend a substrate's binding free energy (ΔG binding) with a protein throughout the simulation. The GROMACS function g_mmpbsa was utilized to estimate the ΔG binding. Within the final 1000 frames, in the last 50 ns, computation ΔG produced our results (Homeyer and Gohlke 2012).

Steered molecular dynamics
We ran a brief MD run on each system to establish equilibrated structures before running the SMD simulation (Izrailev et al. 1999). The Alphafold structure of RAD54B uploaded to UniProt served as the basis for the MD simulations. The GROMOS54A7 force field was used to run the simulations for both the wild and mutant RAD54B-ATP systems (Huang et al. 2011). GROMOS54A7 force field compatible parameters for the ATP molecule were retrieved from the PRODRG server (Schüttelkopf and van Aalten 2004). The structures were subjected to vacuum energy minimization before using the steepest descent algorithm. With the help of the SPC model, water molecules extended 20 Å from the protein across all sides of the cubic box in which the structures were solvated. Using the steepest descent algorithm, an appropriate number of ions was added to maintain a salt concentration of 0.15 M. The systems were energyminimized for 5000 steps. The systems were first brought into equilibrium in the NVT ensemble and then in the NPT ensemble for 100 ps each. A V-rescale thermostat 37 with a coupling constant of 0.1 ps was used to maintain the physiological temperature of 310 K, and a Parinello-Rahman barostat with isotropic pressure coupling was used to maintain the pressure at 1 bar. The long-range electrostatic interactions were handled by the particle mesh Ewald (PME) sum with a cutoff of 1.0 nm (Essmann et al. 1995). Periodic boundary conditions and a threshold radius of 1.0 nm were employed for van der Waals interactions. Using the LINCS algorithm, all hydrogen atom bonds were restricted (Hess et al. 1997). The final equilibrated structure from the NPT equilibration step was considered the initial structure for performing Steered Molecular Dynamics (SMD) Simulation Studies. Various biomolecular systems have been evaluated using constant-velocity SMD simulation, and encouraging correlations with experimental results are reported. To investigate the ATP molecule's binding affinity at the ATP binding site, separate MD simulations were run on each system. Using constant-velocity SMD simulations, ATP was pulled from the ATP binding site in each system. The centre of mass of the steered group, or the ATP tail region, and the centre of mass of the protein residues constituting the ATP binding pocket were used to define the pulling vector. The constant velocity simulations used a force constant of 1000 kJ mol-1 nm-2. ATP was drawn with a 5 nm/ns pulling velocity for each system. We emphasize that the systems are frequently severely perturbed when greater pulling rates are used in SMD, and some details are less likely to be recorded at such higher pulling rates. The increased pulling rates can also impair the protein's normal elastic response. The pulling velocity was selected to achieve the ideal balance of precision and computing speed.

Results and discussion
The exomes of individuals affected with LOAD were compared with healthy controls. Around 85,000 variants per sample qualifying the initial quality control filters were investigated further. The filtering of variants for synonymous variants and known LOAD risk genes (ABI3,

Pathogenic variants
The variants from two genes, ALDH3A2 and RAD54B, were predicted to be deleterious and pathogenic by the in silico tools.
ALDH3A2 reported two single nucleotide variants (SNV) at exon 7. The gene product is an NAD + oxidoreductase enzyme complex component responsible for oxidizing fatty alcohol to a fatty acid. Mouse knockout studies of the gene  showed several abnormalities corresponding to behavioural traits correlating with movement instability and anxiety issues observed in AD patients. The SNV observed at exon 7 of the ALDH3A2 gene in our sample correlates with the point mutation associated with Sjogren-Larsson Syndrome (SLS). The mutation causes C to T exchange at the nucleotide position 943 in the cDNA leading to the replacement of highly conserved proline to serine (De Laurenzi et al. 1997;Kanetake et al. 2019).
RAD54B reported two SNVs at exon 6 and exon 8. The gene is a member of the helicase superfamily involved in recombination and DNA repair. DNA damage is one of the critical pathological causes of AD, as DNA damage accumulation is noted in patients' brains. Defects in DNA damage and repair enzymes such as RAD54B may facilitate the disease pathogenesis (Murzik et al. 2008;Lin et al. 2020).
CHRNB3 reported two SNVs in exons 5 and 6. The gene codes for the neuronal nicotinic acetylcholine receptor (nAChR) component. The variant associated with CHRNB3 may potentially affect the regulation of nAChRs leading to the disruption of transmitter release, neuronal integration, and cell excitability, as documented in many neurological disorders (Hogg et al. 2003;Abu-Amero et al. 2015). CCR10 reported an SNV in exon 2. CCR10 is a chemokine receptor expressed in astrocytes. Mutation in the gene may alter its ligand binding, inducing immune cascade disturbances in the Central Nervous System (Liu et al. 2014). CLDN3 also reported an SNV in exon 1. The gene codes for a component of tight junction strands. CLDN3 is known to localize in the brain endothelial cells. Any genetic changes in the gene may introduce a breakdown in the blood-brain barrier associated with the endothelial cells (Romanitan et al. 2010). GDA/ Cypin reported around seven SNV and observed among four samples in our dataset. The gene is localized in dendrites, increasing branching and promoting microtubule assembly. Patients with AD show fewer branches of neurons within the hippocampus, potentially reflecting the loss of learning and memory (Arikkath 2012). TMLHE reported two SNVs in exon 6. The gene product is the first enzyme in the carnitine biosynthesis pathway. Carnitine contributes to neuroprotective, neuromodulatory, and neurotrophic functions. Mutations in the gene may affect the carnitine pathway. Loss of function of TMLHE is also shown to be associated with autism spectrum disorders (Nałecz et al. 2004;Virmani and Binienda 2004;Nava et al. 2012).
KLC4 reported five SNVs in exons 7, 8, and 9. Studies on mutant KLC4 zebrafish revealed that it is crucial for peripheral sensory axon branching and proper arborization. The study also showed altered microtubule dynamics. An increased anxiety-like behaviour was also observed, indicating the role of KLC4 in neural circuits (Haynes et al. 2022). MAGI1 reported an SNV in exon 6. The gene plays a significant role in the organization of membrane proteins and cytoskeletons by transmitting signals pertaining to cell-cell or cell-matrix interactions. Mutations in MAGI1 may result in the perturbation of these cell-cell signalling (Hammad et al. 2016). WNT10A reported an SNV in exon 3. WNT10A −/− knockout mice exhibited spatial memory impairment and anxiety-like behaviour (Zhang et al. 2022). WNT10A deficiency was proven to cause hippocampal neuro-degeneration in mice indicating similar effects in AD patients. WFDC2 reported an SNV in exon 2. Human epididymis protein 4 (HE4), encoded by the WFDC2 gene, is a secretory protein expressed in human epididymis and an important biomarker for ovarian epithelial cancer (James et al. 2018). Studies on serum levels of HE4 revealed it as a sensitive biomarker for the early recognition of the cognitive decline in patients suffering from diabetes mellitus (Bai et al. 2020). As observed in AD patients, impairment in the WFDC2 gene function may lead to cognitive decline.
One of the limitations of the current study is that no literature support was found to assess the function and involvement of other genes (GDF9, GK2, GSR, MUTANTHFS; ST20-MUTANTHFS, MIXL1 and CPXM1) in AD or other neurological disorders.

Molecular dynamics (MDS)
Through 100-ns MD simulations, we carried out all-atom MDS to examine the consequences of the complex mutation T459N on the structural integrity of the RAD54B protein [Wild type with ATP (Wild-ATP) and mutant type [T459N] with ATP (Mutant-ATP)].
After 20 ns of observation, we discovered that these trajectory motions grew steadier. As a result, the second half of the trajectory was considered for additional investigation. The mutant complex's RMSD measurements did not reveal appreciable variations in these outcomes. The RMSD of Wild-AT and Mutant-ATP complex is illustrated in (Fig. 1). The RMSD value has produced a consistent  To evaluate and fully grasp the effects of the Wild-AT and Mutant-ATP complex on the flexible areas of the RAD54B, the Ca residue (RMSF) was determined from its time-averaged stance. The RMSF describes that the residue backbone adopts a higher level of fluctuation in Mutant-ATP than in the Wild-ATP complex. It is hypothesized that the mutation T459N alters how ATP binds to the protein and increases the backbone's flexibility. The RMSF information of Wild-ATP and Mutant-ATP complexes is illustrated in (Fig. 2).
The mass-weighted root mean square distance of atoms from their centers of mass can be used to characterize the radius of gyration (Lobanov et al. 2008). The Rg figure shows the competency and form folding of the entire RAD54B structure at various times during the trajectory (Fig. 3). Throughout the simulation, the Wild-ATP complex exhibited a nearly similar Rg value of which the Mutant-ATP complex showed a higher deviation of Rg. As a result, the mutant protein has become more compact, resulting in a slower folding rate than the Wild-ATP.
To assess the hydrophobic core's compactness, the SASA change was studied. The change of SASA of the Wild-ATP and Mutant-ATP complexes with time is shown in (Fig. 4). Both complexes exhibited a nearly similar SASA value throughout the simulation.
The H-bonds numbers formed between RAD54B and ATP during the MDS were also evaluated. The H-bond profile was varied, fluctuating from 0 to 6 with a median of 3 H-bonds in the Wild-ATP and Mutant-ATP complexes (Fig. 5). The average values for the WT-ATP and MT-ATP simulations are provided in Table 5.

Steered molecular dynamics (SMD)
On-time scales attainable by molecular dynamics simulations steered molecular dynamics (SMD) causes ligands to unbind from their biomolecules and change in conformation. A system is subjected to time-dependent external influences, and the system's responses are studied. In the present work, SMD simulations pulled ATP along its unbinding path. SMD simulations were run using the stiff spring constant to pull the ATP molecule from its binding site on the RAD54B protein. In the case of the Wild-ATP complex, the force value increased along the time evolution of the pulling simulation during the initial unbinding phase of ATP (0-600 ps). ATP was released from the binding pocket with a peak rupture force of 925.27 kJ mol −1 nm −1 . However, in the Mutant-ATP complex, the ATP molecule was found to be released from the ATP binding pocket with a similar peak rupture force value of 978.37 kJ mol −1 nm −1 . The forces start to decline after reaching a maximum, indicating disruption of strong non-bonded interactions between ATP and the lining residues of the ATP binding pocket. The force profile of the Wild-ATP complex was relatively flat after 700 ps, indicating only modest interaction in the dissociation path. However, the force plot of the Mutant-ATP complex became flat after 400 ps. Thus, the ATP molecule is difficult to pull out of the binding pocket in the case of wild protein, as strong non-bonded interactions exist between ATP and the pocketlining residues. This fact is reflected in the force plot of the Wild-ATP complex, where a force of 900 kJ mol −1 nm −1 was applied in large timeframes (350-600 ps) to rupture all interactions and release the molecule. This was not observed in the case of the Mutant-ATP complex as the pull force was able to rupture the interactions between ATP and the protein in a very short time frame (300-340 ps), indicating that the ATP molecule weakly bound in the mutant protein, thereby capable of being extracted very quickly from the complex. The Wild-ATP and Mutant-ATP complexes of the SMD graph are illustrated in (Fig. 6).
Wild-ATP and mutant-ATP complexes' binding affinities were measured. Within the active site, we looked at the differential binding capacity. (Fig. 6) compares the binding strength of Wild-ATP and Mutant-ATP complexes examined via the MM-PBSA method. We determined residue-level contributions to the interaction energy throughout a steady simulation trajectory. (Fig. 7) demonstrates that the binding energy of Wild-ATP in the active center pocket was found to be -10.5416 kcal/mol, while the Mutant-ATP procured binding energy of -3.401 kcal/mol. The MM-PBSA suggested that Wild-ATP exhibited significant binding energy in the active binding pocket. They show that Wild-ATP interacted with the active site pocket more favourably than Mutant-ATP (Table 5). Hence, the above computational analyses exhibited the importance of wild RAD54B compared to the mutant RAD54B. These differences would make the mutant RAD54B either disrupt its functional role due to mutation or could influence the pathway that the RAD54B involved.
However, a wider population and biochemical techniques are required to validate the RAD54B mutations in AD patients.

Conclusion
Identifying genetic modifiers is crucial in understanding their significant contribution to the disease's pathogenesis. In the current study, 37 whole exome sequences of LOAD samples were investigated to identify rare variants specific to the East-Asian population. Around 17 genes with potentially deleterious variants not previously studied in LOAD cases were identified. These rare variants are reported for the first time in individuals with LOAD from our study. Association signals that GWAS previously discovered with common and primarily non-functional variants cannot be explained by rare variants. However, our study findings explicitly showed the structural mechanisms of RAD54B mutation and also, more target-based biological experiments should be implemented to learn more about how these genes contribute to AD pathogenesis. Nevertheless, the identified variants were not previously linked to AD, highlighting the capacity of the whole-exome sequencing method to find uncommon variations linked to AD. These rare variants could be considered novel predisposing genetic factors for LOAD and might increase neuro-degeneration.