Milestones in genetics of cerebellar ataxias

Cerebellar ataxias (CAs) comprise a group of rare neurological disorders characterized by extensive phenotypic and genetic heterogeneity. The core clinical feature is the cerebellar syndrome, which is often accompanied by other neurological or non-neurological signs. In the last 30 years, our understanding of CA etiology has increased significantly, and numerous ataxia-associated genes have been discovered. Conventional variants or tandem repeat expansions, located in coding or non-coding DNA sequences, cause hereditary ataxias, which can display different patterns of inheritance. Advances in molecular techniques have enabled rapid and cost-effective detection of causative variants in a significant number of CA patients. However, despite extensive investigations, a definite diagnosis is still lacking in the majority of affected individuals. In this review, we discuss the major advances in the genetics of CAs over the last 30 years, focusing on the impact of next-generation sequencing on the genetic landscape of childhood- and adult-onset CAs. Additionally, we outline possible directions for further genetic research in hereditary and sporadic CAs in an era of increasing application of whole-genome sequencing and genome-wide association studies to various neurological disorders.


Introduction
Cerebellar ataxias (CAs) are a heterogeneous group of neurological disorders characterized by impaired coordination of limb and eye movements and by dysarthria. The primary pathology in CAs is progressive cerebellar atrophy; however, in most cases the phenotype is complex and involves multiple neurological deficits. Because of this large phenotypic and genetic heterogeneity, the diagnostic work-up in CAs remains a challenge. In terms of etiology, we distinguish acquired, sporadic, and hereditary ataxias. Hereditary ataxias can be further divided into autosomal dominant cerebellar ataxias (ADCAs, also known as spinocerebellar ataxias, SCAs), autosomal recessive cerebellar ataxias (ARCAs), and X-linked ataxias. Based on shared clinical features, episodic ataxias (EAs) and spastic ataxias (SPAX) are also distinguished. Sporadic ataxia, previously known as idiopathic ataxia, is a progressive disorder of the cerebellum of unknown etiology that is neither acquired nor monogenic [1]. The term sporadic is also commonly used for cases with no relevant family history, but in this article it is used as defined above.
It is difficult to estimate the frequency of specific etiologies among CA patients because the studied groups differ in ethnicity, age, family history, and various other inclusion criteria. For many years, the cause of ataxia was unknown in the majority of affected individuals. Since the 1990s, understanding of the causes of cerebellar degeneration has progressed significantly, mainly owing to the development of novel genetic techniques. In this review, we discuss milestones in the genetics of CAs and outline possible directions for further genetic studies.

Cerebellar ataxias caused by tandem repeat expansions
After the discovery of the first trinucleotide repeat expansions in fragile X syndrome and spinobulbar muscular atrophy in 1991 [2,3], the era of identifying tandem repeat disorders (TRDs) began. Most TRDs in humans are caused by the expansion of short tandem repeats (STRs, also known as microsatellites), which consist of repetitive DNA motifs of 1-6 bp. Polyglutamine diseases, caused by expansion of a coding CAG trinucleotide repeat, account for the majority of TRDs [4]. Among dominantly inherited CAs, CAG repeat expansions were first described in SCA1 in 1993 and subsequently in dentatorubral-pallidoluysian atrophy (DRPLA), SCA3, SCA2, SCA6, SCA7, SCA12, and SCA17 [5]. In recessive ataxias, an intronic GAA repeat expansion was described in Friedreich's ataxia (FRDA) in 1996 [6]. To date, at least 16 repeat expansion ataxias are known. Pathogenic expansions are located in coding and/or non-coding DNA sequences and comprise repeated motifs of 3 to 6 bp (Table 1). The most recent discovery, in 2019, was a biallelic intronic AAGGG repeat expansion in the RFC1 gene in cerebellar ataxia, neuropathy, vestibular areflexia syndrome (CANVAS) [7].
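Because an expanded allele is defined by the number of consecutive copies of a short motif, the basic computation behind sizing a repeat from sequence data can be illustrated with a minimal Python sketch. The helper `longest_repeat_run` is hypothetical and only counts the longest uninterrupted run of a motif in a single read; clinical sizing relies on dedicated assays and gene-specific thresholds that are not modelled here.

```python
def longest_repeat_run(seq: str, motif: str) -> int:
    """Return the length, in motif copies, of the longest uninterrupted run
    of `motif` in `seq` (hypothetical helper for illustration only)."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    best = 0
    # A run can start at any offset, so scan each of the k reading frames.
    for frame in range(k):
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption (e.g. CAT within CAG) ends the run
    return best


if __name__ == "__main__":
    # Toy read: 12 CAG copies, one CAT interruption, then 5 more CAG copies.
    read = "GGC" + "CAG" * 12 + "CAT" + "CAG" * 5 + "TTA"
    print(longest_repeat_run(read, "CAG"))  # 12
```

The toy read contains 17 CAG copies in total but a longest pure run of only 12, which is why interruptions within a repeat tract have to be taken into account when an allele is interpreted.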
Epidemiological studies show that TRDs are a frequent cause of CA worldwide. FRDA is the most common inherited ataxia in Europe, with a prevalence of 2-4 per 100 000 [8]. Fragile X-associated tremor/ataxia syndrome (FXTAS), caused by a CGG trinucleotide expansion in the FMR1 gene, is found in 2-4% of men with adult-onset CA and a negative family history [9]. A homozygous RFC1 pentanucleotide expansion, which can manifest not only as full CANVAS but also as more limited peripheral, vestibular, or cerebellar dysfunction, was detected in 22% of late-onset ataxia cases in the original study [7]. The overall prevalence of SCAs is estimated at 1-3 per 100 000. The most common SCA worldwide is SCA3, followed by SCA1, SCA2, SCA6, and SCA7 [10]. Polyglutamine SCAs may together account for about half of ADCAs, although their frequency varies significantly by geographic region [10,11]. A positive family history strongly influences the proportion of expansions detected in the studied groups. While screening for the most common repeat expansion SCAs is routinely performed in patients with CA, the detection rate is rather low, ranging from 0 to 18.9%, in individuals with an informative, negative family history [10-14]. However, in some non-familial cases, testing for repeat expansion SCAs should be specifically considered, such as SCA7 in early-onset ataxia with retinal dystrophy (possible dramatic anticipation), SCA6 in late-onset ataxia, and SCA8 in slowly progressive ataxia (possible incomplete penetrance) [10,14].

Cerebellar ataxias caused by conventional variants
Since the 1990s, an increasing number of novel CAs caused by sequence variants and copy number variants (CNVs) in various genes have been described. The phenotypes of ataxia with oculomotor apraxia (AOA) and ataxia-telangiectasia (AT), among the most common ARCAs, were differentiated in the 1970s and 1980s [15]. The causative gene for AT was identified by positional cloning in 1995 [16]. In the same year, Ouahchi et al. [17] reported that ataxia with vitamin E deficiency (AVED) is caused by biallelic pathogenic variants in the TTPA gene. Variants in APTX and SETX were described as the causes of AOA1 and AOA2 in 2001 and 2004, respectively [18,19]. However, it was only with the introduction of next-generation sequencing (NGS) that multiple novel hereditary cerebellar ataxias, most of them ultrarare, could be identified. The application of NGS in clinical practice showed that this method is highly effective in the diagnosis of heterogeneous neurological disorders. Since the 2010s, many studies have applied NGS to patients with various ataxia-related phenotypes. Three main approaches have been used: targeted sequencing panels, which analyze the coding exons and flanking intronic regions of a restricted number of genes; whole-exome sequencing (WES); and, more recently, whole-genome sequencing (WGS). Before NGS, patients underwent numerous diagnostic tests in accordance with the standards of a given center. In general, common repeat expansion CAs had to be excluded by targeted techniques because such expansions are not reliably detected by NGS. In a pilot study, Németh et al. [20] analyzed 118 known and candidate ataxia genes in 50 index patients. The overall detection rate was 18% and reached 75% in a subgroup of patients with adolescent onset and a positive family history. The application of WES in pediatric patients with ataxia showed a 46% success rate [21].
Subsequent studies confirmed that the highest percentage of diagnoses can be achieved in groups of early-onset CA, in consanguineous families and in patients with a positive family history. However, such cases represent only a minority of CAs.
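Many of the yields quoted in this review come from small cohorts, so the point estimates carry considerable sampling uncertainty. As a minimal sketch (our own illustration, not a method used in the cited studies), a 95% Wilson score interval for a diagnostic rate can be computed as follows; the function name `diagnostic_rate_ci` is hypothetical.

```python
import math


def diagnostic_rate_ci(solved: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (rate, lower, upper): the observed diagnostic rate and its
    Wilson score interval (z = 1.96 gives an approximate 95% interval)."""
    if total <= 0 or not 0 <= solved <= total:
        raise ValueError("require total > 0 and 0 <= solved <= total")
    p = solved / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # 9 of 50 index patients corresponds to an 18% yield; 13 of 28 families to 46%.
    for solved, total in [(9, 50), (13, 28)]:
        rate, lo, hi = diagnostic_rate_ci(solved, total)
        print(f"{solved}/{total}: {rate:.0%} (95% CI {lo:.0%}-{hi:.0%})")
```

With these inputs the script prints 18% (95% CI 10%-31%) for 9 of 50 index patients, matching the 18% yield of Németh et al. [20], and 46% (95% CI 30%-64%) for the 13 of 28 families reported by Sawyer et al. [21]. Intervals this wide are one reason why yields reported for small series differ so much.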

Utility of next-generation sequencing in childhood-onset cerebellar ataxias
Several studies have analyzed the utility of NGS in cohorts consisting solely of children and adolescents with CA. While the overall prevalence of childhood ataxia is relatively high, estimated at 26 per 100 000 children, a significant proportion can be attributed to acquired and mixed etiologies, such as ataxic cerebral palsy (CP) [22]. However, how many of the so-called CP cases are actually due to non-progressive brain damage remains a matter of debate and further investigation. Hereditary ataxias in children are characterized by large phenotypic and genetic heterogeneity, and ataxia is often part of a complex phenotype with multiple comorbidities. Ataxia can be a sign of congenital hindbrain abnormalities (such as Joubert syndrome, Dandy-Walker malformation, and pontocerebellar hypoplasia), complex neurodevelopmental disorders (such as MECP2-related disorders), or various metabolic and mitochondrial conditions. Specific non-ataxic symptoms, as well as abnormalities on imaging and laboratory investigations, often guide the diagnosis and enable targeted genetic testing.
Diagnostic rates (DRs) of NGS in pediatric ataxia cohorts vary from 25% to over 80%, depending on the selection of the study group and the type of NGS method (panel vs exome sequencing). Sawyer et al. [21] analyzed the utility of WES in childhood-onset CA and obtained a molecular diagnosis in 13 of 28 families (DR = 46%). A similar result was published by Ohba et al. [23], who evaluated children with cerebellar atrophy and reported a 39% success rate (9 of 23 families). Application of a targeted ataxia gene panel in a group of 84 pediatric patients yielded a genetic diagnosis in 25% [24]. In consanguineous families, exome sequencing can provide a molecular diagnosis in up to 80% of cases [25,26]. Several genes were recurrently implicated in congenital or infantile-onset CAs with cognitive impairment across several series. These are predominantly ion channel-coding genes, such as CACNA1A, CACNA1G, KCNC3, and ITPR1, as well as the β-III spectrin gene SPTBN2, which is involved in the trafficking and stabilization of membrane proteins [25,27-32]. All are associated with variable degenerative and developmental neurological disorders, including non-progressive or slowly progressive cerebellar ataxia, with a wide range of ages at onset, from infancy to adulthood. Further information on the utility of NGS in early-onset cerebellar ataxias (EOCAs) comes from studies on larger series of adults with symptom onset before the age of 40. Overall, these studies showed a diagnostic yield of NGS ranging from 21% to over 50%, with a higher proportion of molecular diagnoses in patients with a family history compatible with Mendelian inheritance [20,32-39]. The results of NGS studies have outlined the most common etiologies of EOCAs in different populations.
According to the literature, the most common recessive ataxias in Western countries are FRDA, spastic paraplegia 7 (SPG7), autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS), AOA2, spectrin repeat-containing nuclear envelope protein 1 (SYNE1)-related ataxia, AT, AOA1, and polymerase gamma (POLG)-related ataxia. In addition, Marinesco-Sjögren syndrome and AVED show a relatively high frequency irrespective of ethnic origin, the latter being particularly common in North Africa and the Mediterranean region [40,41]. Together, these disorders account for the majority of ARCA cases, although more than 100 ARCA etiologies are known so far. While most present in childhood and adolescence, onset in adulthood is also frequently described, especially for SPG7 and SYNE1 ataxias, which are typically adult-onset disorders. In 2019, the International Parkinson and Movement Disorder Society Task Force on Classification and Nomenclature of Genetic Movement Disorders proposed a revised naming system for ARCAs, based on a phenotypic prefix followed by the gene name. Overall, 62 disorders with CA as a prominent feature were assigned the ATX prefix, while 30 disorders in which CA coexists with another predominant movement disorder were assigned a double prefix [40]. In the same year, the Consensus Statement from the Society for Research on the Cerebellum and Ataxias Task Force proposed a list of 59 primary ARCAs [41]. Furthermore, both classifications listed numerous other disorders that may present with ataxia as an additional feature.

Utility of next-generation sequencing in adult-onset cerebellar ataxias
Before the NGS era, genetic diagnosis of SCAs was based mainly on testing for common CAG trinucleotide expansions, which allowed the diagnosis of approximately half of ADCAs [10,11,13]. Advances in molecular techniques have led to the identification of numerous novel SCAs, with 48 SCA subtypes and 36 causal genes identified so far [42]. Coutelier et al. [27] examined a large cohort of 412 index cases with dominantly inherited CAs who had tested negative for polyglutamine SCAs, using combined panel sequencing and a TaqMan® polymerase chain reaction assay, and reported a high frequency of channelopathies. Pathogenic variants in CACNA1A were the most frequent genetic cause of SCA in this group, followed by variants in other ion channel-coding genes, such as KCND3, KCNC3, and KCNA1. However, despite a positive first-degree family history, relevant genetic variants were detected in only 15% of cases. Similarly, a low diagnostic rate (9.8%) in dominant CAs negative for CAG repeat expansions was reported by Chen et al. in the largest sample described so far in China [11]. In this study, over 80% of 480 cases with a negative family history remained without a genetic diagnosis. These results indicate that a significant proportion of CAs remain genetically unexplained despite strong indications of a genetic contribution. Of note, channelopathies are also a frequent cause of cerebellar ataxia in Canada but appear to be ultrarare in China and Japan [11,32,43].
The majority of patients presenting to ataxia clinics have late onset of symptoms and a negative family history. After acquired and hereditary causes have been excluded, they are classified as having sporadic adult-onset ataxia (SAOA) [1]. According to the literature, NGS methods may detect conventional variants in 6-33% of patients with apparent SAOA [44-48]. Giordano et al. [48] screened a large cohort of 194 patients with progressive SAOA for causative variants in 201 ataxia-associated genes and obtained a genetic diagnosis in 6%. In a study by Coutelier et al. [34], the diagnostic yield of exome-targeted capture sequencing in patients with disease onset after 40 years of age was 6.4%. Klockgether and Giordano et al. [1,48] estimated that testing for common tandem repeat expansions, followed by an ataxia-specific NGS panel, may yield a genetic diagnosis in about 20% of patients with apparent SAOA. Apart from detecting ultrarare monogenic causes, NGS studies identified several genes commonly implicated in adult-onset CAs, such as SYNE1, SPG7, and ANO10 [34,36,44]. Importantly, a significant number of patients classified as having CA were found to carry pathogenic variants in genes traditionally associated with hereditary spastic paraplegias (HSPs). This suggests that ataxias and HSPs share pathways and mechanisms, and it gave rise to the concept of a continuous ataxia-spasticity disease spectrum [49]. Table 2 presents genes recurrently implicated in several series of patients with ataxia and reported as common causes of rare ataxias.
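The overall yield of a tiered strategy, such as testing for common repeat expansions followed by an ataxia-specific NGS panel, combines conditional yields rather than simply adding them. The following minimal sketch assumes that each tier is applied only to patients who remained undiagnosed after the previous tier; the function name `cumulative_yield` and the stage yields in the example are hypothetical and not taken from the cited studies.

```python
def cumulative_yield(stage_yields: list[float]) -> float:
    """Overall diagnostic yield of a sequential (tiered) testing strategy.

    Each entry is the *conditional* yield of a stage, i.e. the fraction of
    patients still undiagnosed before that stage who receive a diagnosis at it.
    """
    undiagnosed = 1.0
    for y in stage_yields:
        if not 0.0 <= y <= 1.0:
            raise ValueError("stage yields must be proportions between 0 and 1")
        undiagnosed *= 1.0 - y  # patients passed on to the next tier
    return 1.0 - undiagnosed


if __name__ == "__main__":
    # Hypothetical inputs, NOT taken from the cited studies:
    # tier 1 = repeat-expansion testing, tier 2 = ataxia-specific NGS panel.
    print(round(cumulative_yield([0.12, 0.09]), 4))  # 0.1992
```

Because the second-tier yield is measured only among patients negative at the first tier, yields reported for different tests in different cohorts cannot be summed directly to estimate the yield of the combined strategy.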

Future genetic testing in cerebellar ataxias
Despite the wide application of NGS in the diagnosis of heterogeneous neurological diseases, these methods are well known to have several limitations. Multigene panels have proven to be an effective diagnostic tool, showing a significant diagnostic yield at relatively low cost. However, they analyze a limited number of genes, which may be an important problem given the heterogeneity of CAs. Clinical features can overlap with those of other neurological disorders that may be separately classified as leukodystrophies, metabolic disorders, spastic paraplegias, or intellectual disability, so causative variants may not be captured by a typical ataxia-specific gene panel. In this respect, WES has the advantage of analyzing nearly 95% of the protein-coding regions of the genome. Owing to continuous improvements in NGS and bioinformatic data analysis, WES has recently become widely available in the clinical setting.
In terms of CAs, the major limitations of NGS are the difficulty of detecting tandem repeat expansions and diseases caused by alterations in mitochondrial DNA (mtDNA). WES is also considered inadequate for detecting deep intronic variants, CNVs (defined as deletions or duplications of a single exon or larger), balanced translocations or complex inversions, and low-level mosaicism. Achieving good coverage of the GC-rich regions of the exome may also be problematic [50]. However, many of these issues have recently been addressed. Many services now offer multigene panels and exome sequencing together with analysis of mtDNA, CNVs, and selected deep intronic variants, with efficient capture and satisfactory read depth. Although CNVs account for a minority of variants in progressive CAs, their involvement cannot be ignored. Ngo et al. [46] performed WES in a heterogeneous cohort of 260 patients with CA and/or spastic paraplegia, with additional CNV and repeat expansion analysis in a representative subset of cases (n = 68), and found two pathogenic CNVs and one trinucleotide expansion. In a study by Marelli et al. [35], 33 patients were examined using a mini-exome coupled with read-depth-based CNV analysis, and two pathogenic CNVs were found in the SETX gene. While in autosomal recessive diseases the diagnosis is usually facilitated by the presence of a concomitant single-nucleotide variant (SNV) on the other allele, undetected CNVs may cause false-negative results, especially in cases with de novo dominant variants. Combining multiple NGS approaches, such as exome sequencing, targeted testing, CNV analysis, and repeat expansion analysis, can provide a high diagnostic yield of over 50% in various heterogeneous ataxia cohorts [31,46,51].
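The read-depth principle behind exome-based CNV detection can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline of any cited study: per-exon depths are normalised to total depth, compared with the mean of reference samples captured the same way, and labelled with hypothetical thresholds (a ratio near 0.5 suggests a heterozygous deletion, near 0 a homozygous deletion, near 1.5 a duplication). The function names are hypothetical, and dedicated CNV callers additionally model depth variability statistically and merge calls across adjacent exons.

```python
import math
from statistics import mean


def exon_depth_ratios(sample: dict[str, float],
                      references: list[dict[str, float]]) -> dict[str, float]:
    """Per-exon ratio of the sample's normalised read depth to the mean
    normalised depth of reference samples sequenced with the same capture."""
    def normalise(depths: dict[str, float]) -> dict[str, float]:
        total = sum(depths.values())  # corrects for overall library size
        return {exon: d / total for exon, d in depths.items()}

    norm_sample = normalise(sample)
    norm_refs = [normalise(r) for r in references]
    ratios = {}
    for exon, value in norm_sample.items():
        ref_mean = mean(r[exon] for r in norm_refs)
        ratios[exon] = value / ref_mean if ref_mean > 0 else math.nan
    return ratios


def call_copy_state(ratio: float, del_max: float = 0.65, dup_min: float = 1.35) -> str:
    """Crude copy-state label; all thresholds are illustrative assumptions."""
    if math.isnan(ratio):
        return "no call"
    if ratio < 0.2:
        return "homozygous deletion"
    if ratio < del_max:
        return "heterozygous deletion"
    if ratio > dup_min:
        return "duplication"
    return "normal"


if __name__ == "__main__":
    exons = [f"exon{i}" for i in range(1, 11)]
    refs = [{e: 100.0 for e in exons} for _ in range(5)]
    patient = {e: 100.0 for e in exons}
    patient["exon5"] = 50.0  # simulated single-exon heterozygous deletion
    # exon5 is expected to be flagged as a heterozygous deletion.
    for exon, r in exon_depth_ratios(patient, refs).items():
        print(exon, round(r, 2), call_copy_state(r))
```

The simulated example also shows why a single-exon deletion is easy to miss: it changes only one of many per-exon ratios, and the SNV-centred variant callers used in standard WES analysis do not look at depth at all.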
Among patients with unexplained CA, WGS may be a promising option. WGS is designed to analyze all coding and non-coding sequences of nuclear DNA and covers up to 98% of the human genome. By comparison, WES captures up to 95% of the exome, which represents only 1-2% of the human genome. Additionally, WGS has a more uniform depth of coverage and is more efficient than WES at detecting SNVs, small insertions and deletions (indels), and CNVs even within the regions targeted by WES [52]. Indeed, data show that, so far, the diagnostic advantage of WGS over WES lies mainly in better detection of alterations in the coding regions of the genome [53]. Nevertheless, CAs caused by variants in non-coding DNA sequences have also been reported, such as early-onset cerebellar ataxia associated with variants in the non-coding RNA gene RNU12 [54]. Determining the pathogenicity of deep intronic variants remains a challenge, but this is likely to change with further improvements in functional studies and bioinformatic tools.
So far, only a few studies have assessed the utility of WGS in CAs. Kang et al. [55] performed WGS in patients with CA after negative testing for repeat expansions and a multigene panel and found a causative variant in one of three individuals. Kim et al. [56] examined a heterogeneous group of 18 patients with spastic paraplegia with or without CA and reported a diagnostic rate of 38.9%. However, only one intronic variant that could have been missed by WES was detected. Further research on larger groups of patients is needed to determine the benefits of WGS in CAs.
In the case of CAs, the possibility of detecting tandem repeat expansions using WES and WGS is particularly promising. Standard NGS techniques based on short-read sequencing were traditionally unable to detect TRDs, except for SCA6, which is caused by the smallest expansion. Recently, several algorithms have been developed to analyze STRs from short-read NGS data and have been successfully applied in CA patients. Retrospective use of these algorithms can lead to a diagnosis in individuals with negative NGS results. Additionally, new ataxia-causing expansions can be expected to be identified in the future [57]. WGS in conjunction with non-parametric linkage analysis led to the identification of the pathogenic repeat expansion in CANVAS and late-onset ataxia [7]. WGS also contributed to the discovery of several novel TRDs, such as benign adult familial myoclonic epilepsy, neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and others [58,59]. At present, there are several types of SCA, i.e., SCA4, SCA25, SCA30, and SCA32, whose genetic cause remains to be determined. New molecular technologies, such as PacBio and Nanopore long-read sequencing, which enable sequencing through normal and expanded STR alleles, may lead to further discoveries in the field of repeat expansion disorders [57].
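Why short reads struggle with most expansions can be illustrated with simple length arithmetic. The following minimal sketch assumes 150 bp reads and a hypothetical requirement of 10 bp of unique flanking sequence on each side of the repeat. It classifies reads using the spanning, flanking, and in-repeat categories on which short-read STR genotypers such as ExpansionHunter reason, but it is not the algorithm of any specific tool, and both function names are hypothetical.

```python
def max_spanned_units(motif_len: int, read_len: int = 150, min_flank: int = 10) -> int:
    """Largest repeat (in motif copies) that one read can cover end to end
    while keeping at least `min_flank` bp of unique sequence on each side."""
    return max(0, (read_len - 2 * min_flank) // motif_len)


def possible_read_classes(repeat_units: int, motif_len: int,
                          read_len: int = 150, min_flank: int = 10) -> set[str]:
    """Read types a short-read library can contain for one repeat allele."""
    allele_bp = repeat_units * motif_len
    # A read starting in unique flank and ending inside the repeat is always possible.
    classes = {"flanking"}
    if repeat_units <= max_spanned_units(motif_len, read_len, min_flank):
        classes.add("spanning")  # repeat plus both anchors fits in one read: exact size
    if allele_bp >= read_len:
        classes.add("in-repeat")  # a read can lie entirely within the repeat
    return classes


if __name__ == "__main__":
    print(max_spanned_units(3))  # 43 CAG copies with 150 bp reads
    for units in (20, 40, 60, 300):
        print(units, sorted(possible_read_classes(units, 3)))
```

Alleles short enough to be spanned can be sized exactly from a single read, whereas longer alleles leave only flanking and in-repeat reads, from which their size must be estimated. This is consistent with the observation above that SCA6, caused by the smallest expansion, was the exception for standard short-read NGS.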

Cerebellar ataxias: beyond monogenic diseases
Despite the increasing use of advanced genetic techniques, more than half of patients with CA still remain without a specific diagnosis [60]. Determining the cause of the disease is of particular concern in patients with late onset of symptoms and a negative family history, who constitute the majority of patients with CA. The term idiopathic late-onset CA was previously used for all individuals without an apparent acquired or hereditary etiology. Further studies made it possible to distinguish within this group patients with the cerebellar type of multiple system atrophy (MSA-C), a separate disease entity characterized by the presence of glial cytoplasmic inclusions. SAOA has to be differentiated from MSA-C, and its cause remains unknown. Despite significant phenotypic heterogeneity, patients with SAOA share several common features, such as isolated cerebellar atrophy on neuroimaging, pyramidal signs, absent ankle reflexes, reduced vibration sense, and mild urinary symptoms. The mean age at onset reported in the literature varies from 41 to 56 years, and the disease progresses significantly more slowly than MSA-C [1].
Presumably, some patients with sporadic CA have undiagnosed acquired, autoimmune, or hereditary causes. With the introduction of WGS, novel monogenic etiologies can be expected to be identified, especially in early-onset and familial cases. However, a significant proportion of sporadic ataxias appear to have a multifactorial or polygenic background. Genome-wide association studies (GWASs) identify genomic risk variants by comparing variant frequencies between patients and population controls. To date, no GWASs have been published in patients with sporadic CA. The main issue is that GWASs require large groups of patients and controls, typically thousands of individuals, which is difficult to achieve in CAs. An additional challenge is to assemble phenotypically homogeneous groups of cases. So far, GWASs of common and complex neurodegenerative disorders, such as Alzheimer's disease (AD), Parkinson's disease, amyotrophic lateral sclerosis, and frontotemporal dementia, have identified several susceptibility variants, but most of them have a very small effect on risk [61]. A GWAS in MSA found no significant loci but detected several potential variants that need to be examined in a larger sample [62]. Studies show that AD has a significant polygenic component, which may be used to calculate the genetic risk of developing the disease. The sum of an individual's risk alleles, each weighted by its effect size from a prior GWAS, is called a polygenic risk score (PRS) and may have predictive value for multiple common diseases. Promising results pointing to the potential clinical utility of PRS have been published for AD and epilepsy [63,64]. Collecting large numbers of clinically homogeneous patients with ataxia through international consortia offers hope of conducting GWASs and identifying variants enriched in CA. Epigenetic alterations may be responsible for some cases of unexplained CA; however, there is little literature on this topic.
Epigenetic mechanisms such as DNA methylation, histone modifications, and microRNAs (miRNAs) regulate gene expression without changing the underlying genomic sequence and are implicated in various neurodevelopmental processes. Studies on epigenetics in CAs have shown that epigenetic dysregulation is involved in the pathogenesis of FRDA, FXTAS, ataxia-telangiectasia, and several SCAs [65,66]. miRNAs have been shown to be essential for the survival of Purkinje cells, and their loss leads to cerebellar degeneration and the development of ataxia [67]. Alterations in the levels of specific miRNAs have been described in SCA1 and SCA3. Given the role of miRNAs in CA neuropathology, it has been suggested that pathogenic variants in miRNAs or miRNA-binding sites may cause a subset of unexplained ataxias [66]. Aberrant DNA methylation profiles have been found in several trinucleotide expansion disorders and related to age at onset and somatic repeat instability of the mutated allele [68]. Recently, genome-wide DNA methylation profiling revealed differentially methylated loci in ataxia-telangiectasia [69]. Further studies are needed to determine the possible role of epigenetic dysregulation in the pathogenesis of unexplained CAs.
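Returning to the GWAS and PRS concepts introduced earlier in this section, both calculations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a GWAS pipeline: a single variant with hypothetical allele counts is tested with an allelic 2x2 chi-square test, whereas real GWASs typically fit regression models adjusted for covariates such as ancestry and correct for testing millions of variants. The PRS is computed as the sum of risk-allele dosages weighted by hypothetical log odds ratios. Function names, variant names, and numbers are all placeholders.

```python
import math


def allelic_test(case_alt: int, case_ref: int,
                 ctrl_alt: int, ctrl_ref: int) -> tuple[float, float, float]:
    """Allelic 2x2 association test: returns (odds ratio, chi-square, p-value).

    Pearson chi-square without continuity correction, 1 degree of freedom.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return odds_ratio, chi2, p_value


def polygenic_risk_score(dosages: dict[str, int], log_odds: dict[str, float]) -> float:
    """Sum of risk-allele dosages (0, 1 or 2) weighted by GWAS effect sizes."""
    return sum(beta * dosages.get(variant, 0) for variant, beta in log_odds.items())


if __name__ == "__main__":
    # Hypothetical allele counts for one variant: 1000 case and 1000 control alleles.
    or_, chi2, p = allelic_test(300, 700, 200, 800)
    print(f"OR={or_:.2f} chi2={chi2:.1f} p={p:.1e}")
    # Hypothetical variants and effect sizes (log odds ratios) from a prior GWAS.
    weights = {"variant_A": math.log(1.2), "variant_B": math.log(0.9)}
    print(round(polygenic_risk_score({"variant_A": 2, "variant_B": 1}, weights), 3))
```

In practice, PRS weights are taken from a published GWAS, and an individual's score is interpreted relative to the score distribution in a reference population rather than as an absolute risk, which is one reason why large, well-characterized cohorts are a prerequisite for applying this approach to CAs.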

Summary
The last 30 years have been the era of discovering the monogenic causes of CAs. After the trinucleotide expansion ataxias and several conventional ataxias were identified by targeted techniques, the number of novel ataxia-causing variants increased rapidly with the advent of massively parallel sequencing. The introduction of NGS into clinical and experimental neurology significantly changed our understanding of the complex genetic landscape of cerebellar ataxias. Currently, the common use of multigene panels and WES in clinical practice allows rapid diagnosis of CA etiology in a substantial number of patients. However, for the majority of affected individuals, especially adults, genetic findings do not currently inform diagnosis or management. In fact, a significant proportion of CAs remain genetically unexplained despite strong indications of a genetic contribution. Further advances in NGS techniques and bioinformatic analysis, as well as the widespread use of WGS, give hope for better detection of conventional variants and repeat expansions in the future. In addition, GWASs in large series of patients with ataxia will be needed to identify risk loci contributing to cerebellar ataxia. Combining GWAS results with rare-variant burden analyses and repeat expansion data from whole-genome sequencing may unravel the genetic architecture of unexplained CAs.
Funding This publication was prepared without any external source of funding.

Data availability Not applicable.
Code availability Not applicable.

Declarations
Ethics approval Ethical approval was not necessary for the preparation of this article.

Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.