Genetics of Amyotrophic Lateral Sclerosis

Amyotrophic lateral sclerosis and frontotemporal dementia (ALS-FTD) spectrum disorder is a rare, fatal disease with strong genetic influences. The application of short-read sequencing to increasingly large patient cohorts has rapidly expanded our knowledge of the complex genetic architecture of the disease. We aim to convey the broad history of ALS gene discovery as context for a focused review of 11 ALS gene associations reported over the last 5 years. We also summarize the current level of genetic evidence for all previously reported genes. ALS gene discovery has proceeded in at least four identifiable phases, each powered by different technologies and scales of investigation. The most recent epoch, benefitting from population-scale genome data, large international consortia, and low-cost sequencing, has yielded 11 new gene associations. We summarize the current level of genetic evidence supporting these ALS genes, highlight any genotype-phenotype or genotype-pathology correlations, and discuss the preliminary understanding of their molecular pathogenesis. This era has also raised uncertainty about some previously reported ALS genes and clarified the role of others. Our understanding of the genetic underpinnings of ALS has expanded rapidly over the last 25 years and has led directly to the clinical application of molecularly driven therapies. Ongoing sequencing efforts in ALS will identify new causative and risk factor genes while clarifying the status of genes reported in prior eras of research.


The ALS-FTD Spectrum of Disease Has Complex Genetic Architecture
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease clinically recognized by progressive paralysis due to the degeneration of spinal cord and cortical motor neurons [1]. Even with improvements in the multidisciplinary care of patients and two medications approved by the FDA, ALS is invariably fatal due to respiratory failure, with patients surviving a median of 3 years after symptom onset. The clinical picture of a typical ALS patient is so dominated by mounting motor disability that ALS has historically been considered a "motor neuron disease." This nomenclature likely delayed recognition that ALS is a multisystem disease in which a range of non-motor neuronal populations can also be affected [2,3]. The most commonly involved non-motor neurons are those regulating fronto-executive cognitive function; more than half of ALS patients develop fronto-executive dysfunction during the course of the disease, with 5-10% meeting criteria for frontotemporal dementia (FTD) [4-7]. In addition to frequent clinical co-occurrence, the ALS-FTD spectrum shares an increasing number of genetic causes and a common underlying neuropathology (mislocalized and aggregated TDP-43 protein) [8,9]. This pathological hallmark places the ALS-FTD spectrum among the many neurodegenerative diseases characterized by pathological protein aggregates, the so-called proteinopathies.
Knowledge of key biological pathways important to ALS-FTD pathogenesis has been greatly advanced by progress in uncovering causative genes and genetic risk factors for the disease. Familial ALS (fALS) has been recognized since the very first published case series, with 7-10% of individuals having a family history of either ALS or FTD [10]. It is within these families that the bulk of monogenic causes have been found. However, 85-90% of ALS patients, so-called sporadic ALS (sALS), have no known family history and typically do not carry rare, highly penetrant mutations in known genes. Genetic factors influence disease propensity in this group as well, with studies showing polygenic risk from many common variants [11,12] or, as preliminary data suggest, from smaller numbers of larger-effect variants (so-called oligogenic inheritance) [13,14]. Thus, the genetic architecture of ALS is proving to be complex, with a broad range of allele frequencies and effect sizes (reviewed in [15]).
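As a rough illustration of what "polygenic risk from many common variants" means in practice, a polygenic risk score is commonly computed as a weighted sum of risk-allele counts, with each weight being the log odds ratio estimated by a GWAS. The sketch below is a minimal example under stated assumptions: the variant names and odds ratios in `HYPOTHETICAL_ORS` are invented for illustration and do not correspond to any published ALS score, and the function name is hypothetical.

```python
import math

# HYPOTHETICAL per-allele odds ratios; illustrative only, not real ALS GWAS results.
HYPOTHETICAL_ORS = {
    "variant_A": 1.20,
    "variant_B": 1.10,
    "variant_C": 1.05,
}


def polygenic_risk_score(dosages: dict[str, int]) -> float:
    """Weighted sum of risk-allele dosages, using log(OR) as each weight.

    `dosages` maps a variant name to the number of risk alleles carried (0, 1 or 2).
    Variants absent from `dosages` are treated as 0 copies, a simplification.
    """
    score = 0.0
    for variant, odds_ratio in HYPOTHETICAL_ORS.items():
        dosage = dosages.get(variant, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1 or 2, got {dosage}")
        score += math.log(odds_ratio) * dosage
    return score


if __name__ == "__main__":
    person = {"variant_A": 2, "variant_B": 1}
    score = polygenic_risk_score(person)
    # Under a multiplicative model, exp(score) is the odds relative to carrying no risk alleles.
    print(f"PRS = {score:.3f}; relative odds = {math.exp(score):.3f}")
    # prints: PRS = 0.460; relative odds = 1.584
```

Each variant nudges the odds only slightly; appreciable differences in predicted risk arise only from the combined effect of many such variants, which is why this kind of risk is detectable mainly in large cohorts.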

Timeline of Gene Discovery in ALS
Since Charcot described motor neuron disease in the late 1800s, over 40 genes have been associated with ALS, explaining 25-35% of fALS and 5-10% of apparent sALS. Genetic discoveries in ALS have occurred in four distinguishable epochs, each leveraging important methodological advances (Fig. 1).
In the first era (c.1990-1999), gene discovery required large ALS families powered for high-resolution linkage analysis followed by the arduous task of positional cloning and sequencing. Accordingly, discoveries came slowly in this era for all human diseases and yielded only a single gene for typical ALS, SOD1 [16].
Publication of the human genome inaugurated the second era (c.2000-2008). As an exhaustive compendium of genes and their genomic locations, the draft genome allowed a simple database query to replace time-consuming positional cloning (ALS2, [17]). Genes within a linkage peak could be quickly prioritized for sequencing based on predicted biological roles. Large families were still required to identify ALS-associated loci, but hypothesis-driven candidate gene sequencing could now be pursued. Candidate genes included those causing other human diseases with phenotypic overlap (DCTN1, GRN, CHMP2B, FIG4) and those implicated by neuropathological analysis of human tissues (TARDBP, PRPH), biomarker investigations (ANG), or insights from mouse models of ALS. During this era, increasingly cheap genome-wide genotyping of common variants made unbiased genome-wide association studies (GWAS) possible [18]. Instead of depending on large families, the success of candidate gene and GWAS methods hinges on large numbers of DNA samples and replication in additional cohorts. This fact prompted organized efforts to collect and bank samples from patients, family members, and controls (predominantly of European ancestry). Some of these efforts (e.g., the NINDS MND collection housed at Coriell) recognized the importance of consenting participants for broad sample and data sharing, ensuring that some cohorts would follow an open-access model rather than remaining in the hands of single institutions or consortia.
Fig. 1 Strategies for the identification of ALS genes. Strategies have evolved from individual gene mapping to increasingly powered technological and statistical methodologies. Since the implication of SOD1, 42 additional genes have been implicated in ALS to date, with variable genetic support in replication and functional studies. GWAS, genome-wide association studies; WES, whole-exome sequencing; WGS, whole-genome sequencing
During this era, nearly a dozen genes were reported to cause ALS, and common variants at a dozen loci were implicated by GWAS. Only two of these GWAS associations have been repeatedly replicated (the region around C9ORF72 and UNC13A, with an OR of ~1.2 each [12, 19••]); for the others, the evidence has either refuted an association or been inconclusive.
The explosion in sequencing speed and reduction in costs brought by "next-generation" short-read sequencing ushered in the third era (c.2009-2014) by permitting rapid, simultaneous sequencing of larger numbers of genes (targeted panels and whole exomes) in increasing numbers of ALS patients. Studies could rapidly identify mutated genes within a locus implicated by linkage in larger ALS families, if such families could be found; families with DNA from multiple affected individuals were increasingly scarce [20]. Alternatively, families too small for linkage could be analyzed for recurrently mutated genes. Because whole-exome sequencing returned genome-wide mutational data, it revealed that several genes already known to cause other diseases had ALS presentations within their phenotypic spectrum (VCP, MATR3). These methods also showed that portions of the ALS population carried potentially pathogenic variants in more than one ALS gene, possibly representing oligogenic inheritance [14,21]. This era also saw the first use of whole-genome sequencing (WGS) and repeat-primed PCR to discover the most common cause of ALS to date, expansions of a GGGGCC hexanucleotide repeat in the C9orf72 gene [22,23], as well as even larger-scale and increasingly prospective DNA sample collections (e.g., Project MinE), including in non-European populations. More than 15 genes and additional GWAS loci were reported in this 5-year period, but approximately one-third of these associations remain uncertain pending additional replication and/or functional studies.
The fourth and current era (c.2015 onward) has been enabled by large, well-annotated sample repositories focused on simplex/sporadic cases, broader data-sharing and consortium efforts, and the falling cost of whole-genome sequencing. Ever-larger GWA studies have validated known loci and provided strong evidence for new regions [12, 19••], while comprehensive ascertainment of known ALS genes has continued to uncover individuals with possible oligogenic ALS [21,26]. By far the biggest advance in ALS gene discovery has been the application of improved statistical frameworks for genome-wide rare-variant case-control studies (e.g., collapsing analysis and rare-variant burden testing) [27, 28, 29••]. The ability to identify genes or other genomic regions enriched for mutations in ALS has helped implicate at least 11 new ALS-associated genes: TBK1, TUBA4A, KIF5A, ANXA11, TIA1, CCNF, DNAJC7, NEK1, C21orf2, LGALSL, and GLT8D1. These genes are summarized below, focusing on the strength of genetic and functional evidence, genotype-phenotype correlations, proposed mechanisms by which these genes could lead to neurodegeneration, and gaps in our understanding that future investigation needs to fill.
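To make gene-level collapsing analysis more concrete, the sketch below shows one common form of it under simplifying assumptions. Each individual is reduced to a single yes/no call for a gene (carries at least one "qualifying" variant or not), and carrier counts in cases and controls are compared with a two-sided Fisher's exact test. The `Variant` type, the consequence labels, and the qualifying rule (predicted loss-of-function with reference allele frequency below 0.1%) are hypothetical choices for illustration; published studies define their own qualifying-variant models and also control for ancestry, sequencing coverage, and genome-wide multiple testing.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Variant:
    gene: str
    consequence: str      # hypothetical labels, e.g. "stop_gained", "missense_variant"
    population_af: float  # allele frequency in a reference population


# Hypothetical qualifying-variant rule: rare and predicted loss-of-function.
LOF_CONSEQUENCES = {"stop_gained", "frameshift_variant",
                    "splice_donor_variant", "splice_acceptor_variant"}
MAX_AF = 0.001


def is_qualifying(variant: Variant, gene: str) -> bool:
    return (variant.gene == gene
            and variant.consequence in LOF_CONSEQUENCES
            and variant.population_af < MAX_AF)


def count_carriers(cohort: list[list[Variant]], gene: str) -> int:
    """Collapse each person's variants into one carrier/non-carrier call for the gene."""
    return sum(any(is_qualifying(v, gene) for v in person) for person in cohort)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x carriers among cases, with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = table_prob(a)
    probs = [table_prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    # Sum over all tables no more probable than the observed one (tolerance for rounding).
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-9)))


def collapsing_test(cases, controls, gene):
    """Return (case carriers, control carriers, two-sided p-value) for one gene."""
    a = count_carriers(cases, gene)
    c = count_carriers(controls, gene)
    return a, c, fisher_exact_two_sided(a, len(cases) - a, c, len(controls) - c)


if __name__ == "__main__":
    lof = Variant("GENE_X", "stop_gained", 0.0)
    benign = Variant("GENE_X", "synonymous_variant", 0.2)
    cases = [[lof]] * 3 + [[benign]] * 7   # 10 toy cases, 3 carriers
    controls = [[benign]] * 10             # 10 toy controls, 0 carriers
    print(collapsing_test(cases, controls, "GENE_X"))  # p is about 0.21
```

In a genome-wide study this test is repeated for every gene, which is why very large case and control cohorts are needed before any single gene survives multiple-testing correction; the toy example above, with 3 of 10 cases carrying a variant, is nowhere near significant.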

Many ALS Genes Highlight Key Shared Pathways
By comparing the known functions of causative ALS genes, the field has gained valuable insight into key cellular pathways underlying the pathogenesis of the disease. Although the genetics of ALS highlight many pathways, several stand out because multiple genes converge on them. These include RNA processing, proteostasis, neuroinflammation, vesicle trafficking, and axonal transport.

RNA Processing and Metabolism
TARDBP (which encodes TAR DNA-binding protein 43 kDa, TDP-43) and FUS encode "prion-like domain" (PLD)-containing proteins with increased aggregation propensity and key functions in RNA processing. Both proteins are predominantly localized to the nucleus, and their ability to shuttle between the nucleus and cytosol is compromised in ALS, resulting in pathological TDP-43 and FUS inclusions in the cytosol (reviewed in [30]). Mutations in the genes themselves (in TARDBP, mostly affecting the C-terminal domain of TDP-43) remain rare [31], yet TDP-43 deposition within insoluble cytoplasmic inclusions is seen in almost all sALS cases, whereas a minority of fALS cases instead show FUS or SOD1 deposition; SOD1/FUS pathology and TDP-43 pathology are mutually exclusive. TDP-43 aggregation and its pathomechanisms are therefore more broadly relevant to our understanding of ALS pathogenesis [32,33]. When RNA-binding proteins involved in all steps of mRNA transcription, processing, storage, and degradation misfold and accumulate in nonfunctional cytoplasmic aggregates, the consequences are severe in multiple cell types. The loss of TDP-43 and FUS from the nucleus ("loss-of-function"), the formation of protein aggregates ("gain-of-function"), and a combination of both have been implicated in ALS. Loss-of-function mechanisms include dysregulation of RNA metabolism, splicing, and mRNA transport (via hnRNPs for TDP-43 and ELAV-4 for FUS) [34,35]. Mutations in TARDBP can also act through a toxic gain of function: misfolded TDP-43 accumulating in the cytoplasm sequesters mRNA transport proteins and even mRNAs themselves into aggregates, impeding efficient mRNA transport in the long processes of astrocytes and neurons and thereby disrupting essential functions distant from the nucleus [36]. Beyond TDP-43 or FUS ubiquitylation, aging, and cellular stress, genetic factors related to cellular architecture have emerged as central underlying contributors to ALS pathology.
A role for nuclear import (Nup proteins) and for dysfunctional stress granule (SG) dynamics in seeding TDP-43 aggregation has recently crystallized with the discovery of mutations in TIA1 and CCNF ([37]; see below). The tankyrase-1/2 inhibitor veliparib was found to mitigate cytoplasmic accumulation of TDP-43 in SGs in mammalian systems, likely by inhibiting PARylation in SGs and allowing TDP-43 to shuttle back to the nucleus, which makes this pathway a potential therapeutic target [38]. The specificity of this mechanism will be interesting to test, as hnRNPA1, another RNA-binding protein (RBP) and ALS gene, undergoes PARylation for nucleocytoplasmic shuttling [39]. The interplay between TDP-43 aggregation, SGs, and the nuclear membrane is not restricted to ALS; it is emerging as a common theme in neurodegeneration, with similar involvement of SGs and the nuclear pore in tau and TDP-43 pathology in Alzheimer disease [40,41].

Proteostasis
Cellular stress and subsequent pathological protein misfolding are a common feature of all neurodegenerative diseases, including ALS-FTD, where aggregated TDP-43 and other misfolded proteins are pathological hallmarks [33]. Many established ALS genes, including OPTN, SQSTM1, UBQLN2, and VCP, play key roles in regulating autophagy or the ubiquitin-mediated pathways that degrade misfolded proteins. Several of the newly reported genes also act in these pathways. TBK1 is a clear regulator of autophagy and directly interacts with OPTN [42]. DNAJC7 is a heat shock protein co-chaperone whose mutations may disrupt its ability to act as an effective intracellular chaperone, possibly increasing the burden of protein misfolding in ALS-FTD [28]. TIA1 plays a key role in the nucleation of stress granules [43], which are emerging as a potential nidus for the aggregation of TDP-43 and other ALS-related proteins such as FUS and hnRNPs [44-46]. CCNF encodes cyclin F, a binding partner of another ALS-associated protein, VCP. A recent cell culture study suggested that CCNF mutations may contribute to ALS pathogenesis by increasing the ATPase activity of VCP in the cytoplasm, which in turn increases TDP-43 aggregation [47].

Neuroinflammation
The role of microglia and other effector cells of the immune system has been well recognized from human neuropathological and animal modeling data. Perhaps the clearest recent genetic validation of its importance comes from the discovery of mutations in TANK-binding kinase 1 (TBK1). A primary function of TBK1 is to activate autophagy and inflammatory pathways in response to pathogen exposure (through activation of both NF-κB and interferon signaling). The complex role of neuroinflammation in ALS is illustrated by mouse studies of the interaction between TBK1 and SOD1: loss of one copy of TBK1 precipitated earlier disease onset with impaired autophagy but, counterintuitively, extended survival by reducing inflammatory activation in later stages of disease [48,49]. These studies suggest that neuroinflammation is deleterious in late disease, highlighting the therapeutic potential of targeted anti-inflammatory therapies as disease-modifying agents.

Impaired Axonal Transport
Maintaining adequate axonal transport is crucial in large cells such as motor neurons, and axonal transport is known to be dysregulated in ALS [50]. Axonal transport requires the assembly and maintenance of microtubule networks that act as transport highways, supporting the anterograde and retrograde movement of cargo, including mRNAs, organelles, and proteins. NEK1, through its interactions with FEZ1 and FEZ2, has been shown to be involved in pathways crucial to the maintenance, growth, and repair of these networks, and NEK1 mutations impair microtubule stability [51]. Furthermore, TUBA4A mutations examined in cell culture impaired microtubule network assembly and reduced microtubule stability [52]. Mutations in KIF5A are also thought to impair axonal transport, as KIF5A is a kinesin, a microtubule-associated motor protein responsible for organelle transport, including of lysosomes [19••]. Cells also rely on local translation of mRNAs to respond to local stimuli without the need for global changes in the cellular environment [32], and local translation depends on efficient mRNA transport along axons. ANXA11, a recently identified ALS-associated gene [53], tethers mRNAs, as part of an RNA granule complex, to actively transported lysosomes, enabling them to hitchhike to their terminal destination, where they support local neuronal translation.

Challenges in the Interpretation of ALS Genetics
The number of sequenced ALS cases and controls has grown exponentially over the last decade, such that the number of reported ALS genes has been doubling every 4 years [15]. This pace coincides with equally rapid advances in molecular therapies such as viral gene delivery, gene silencing, and antisense oligonucleotides (ASOs) for gene knockdown or splicing modification. ASOs targeting SOD1, C9orf72, and FUS are already in clinical trials, and several pharmaceutical and biotechnology companies are expected to launch trials of AAV vectors targeting SOD1 in the next year. As companies deliberate on which genes to target next, it is essential that the genetic and functional evidence for reported genes be continuously re-evaluated.
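Doubling every 4 years is a statement of exponential growth. Writing $N_0$ for the number of reported ALS genes at some reference time and $t$ for the years elapsed since then (symbols introduced here only for illustration), the implied trajectory, year-on-year factor, and decade-on-decade factor are:

```latex
N(t) = N_0 \cdot 2^{t/4}, \qquad
\frac{N(t+1)}{N(t)} = 2^{1/4} \approx 1.19, \qquad
\frac{N(t+10)}{N(t)} = 2^{10/4} \approx 5.66
```

That is, roughly 19% more reported genes each year and more than a fivefold increase per decade if the trend were to continue, which is one way of seeing why the evidence behind each newly reported gene needs continual re-evaluation.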
It is worth stating explicitly that our certainty that a gene strongly influences ALS correlates with the route by which the ALS association was made and the length of time since the gene was first reported (Fig. 1). Genes identified in large family linkage studies are clearly causative even on the strength of a single reported mutation (e.g., SOD1, C9orf72, VAPB, SETX). Other genes have been securely implicated when smaller unrelated families show segregation of the same mutation (TARDBP, FUS), when multiple different mutations produce the same molecular defect (KIF5A, TBK1), or when the mutations reported in smaller families cluster regionally within protein domains (UBQLN2). For rare-variant burden testing and GWA studies, the evidence is strong because of the unbiased nature of the approach, the statistical power of genome-wide association, and replication.
For many ALS genes reported from candidate gene sequencing efforts or found in small families, the evidence is less definitive. In many cases, the literature consists of variants unique to ALS patients, identified by sequencing many simplex/sporadic patients. When variants have been found in familial cases, DNA has only rarely been available from other individuals for segregation testing. Even when a second affected individual is available for testing, it is typically a first-degree relative, who has a 50% chance of sharing any variant, ALS-related or not. In the absence of clear-cut genetic evidence, neuropathological signatures can strengthen the case for gene pathogenicity (e.g., aggregates of the ANXA11 gene product exclusive to mutation carriers [53] or the remarkable number of Lewy body-like inclusions found in TIA1 carriers [54]). Unfortunately, the number of genes for which even a single mutation carrier has had neuropathological assessment is small. Finally, experimental data from functional studies or animal modeling can help in interpreting the importance of these genes. In ALS, even for genes with long-standing and incontrovertible evidence for monogenic causation, recapitulating disease features in mice or finding reproducible cellular phenotypes in patient-derived stem cell models has been challenging. For many of the uncertain ALS genes, there are some experimental data showing functional effects of some ALS-associated mutations, sometimes even in the robustly implicated pathways outlined earlier. However, the specificity of the demonstrated defects is almost never tested against variants in the same gene that were also found in controls. Thus, the strength of evidence varies across individual genes, and this uncertainty is represented in Fig. 1 by color coding.
These assignments are based on interpretation of the current literature, but other experts may feel more or less confident in particular individual genes based on their own weighting of the available evidence. The important point of the color-coding is to emphasize that not every gene reported to cause ALS and appearing on a timeline shares the same level of certainty.
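The point that a first-degree relative has a 50% chance of sharing any variant can be made quantitative with a deliberately idealized calculation. If each tested affected relative independently has probability 1/2 of carrying a variant that has nothing to do with ALS, the chance that all n of them carry it is (1/2)^n. The sketch below assumes this independence, which ignores real pedigree structure, phenocopies, and incomplete penetrance; the function names are hypothetical.

```python
def chance_cosegregation(n_relatives: int, share_prob: float = 0.5) -> float:
    """Probability that a disease-unrelated variant is carried by all n tested
    affected relatives, assuming each shares it independently with share_prob."""
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    if not 0.0 <= share_prob <= 1.0:
        raise ValueError("share_prob must be between 0 and 1")
    return share_prob ** n_relatives


def relatives_needed(alpha: float = 0.05, share_prob: float = 0.5) -> int:
    """Smallest number of carrier relatives for which chance co-segregation < alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not 0.0 < share_prob < 1.0:
        raise ValueError("share_prob must be strictly between 0 and 1")
    n = 0
    while chance_cosegregation(n, share_prob) >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, chance_cosegregation(n))  # 0.5, 0.25, 0.125, 0.0625, 0.03125
    print(relatives_needed())              # 5
```

Under these assumptions, a single affected sibling or parent who shares the variant leaves a 50% chance of coincidence, and about five carrier relatives are needed before chance co-segregation drops below 5%, which is why DNA from extended families is so valuable.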

Recently Discovered ALS Genes
Because many of the newly discovered genes account for small proportions of the ALS population and were only recently discovered, most have not yet been characterized neuropathologically or studied in model systems. Further studies are clearly warranted to fully understand the functional consequences of these gene mutations.
TBK1: TBK1 (TANK-binding kinase 1) plays important roles in autophagy and activation of innate immunity and is a direct regulator of optineurin, the product of another ALS-associated gene [42,55]. It was the first ALS gene identified using rare-variant burden (collapsing) analysis [56] in a predominantly sporadic cohort and was subsequently validated by a similar approach in familial ALS [57]. The strongest signal in both studies came from loss-of-function mutations, several of which have shown segregation in large families or been found in multiple studies. Thus, loss-of-function mutations in TBK1 are a clearly established cause of ALS. Several single amino acid deletions have also been reported and may act by reducing TBK1 phosphorylation [58]. Individual studies and a meta-analysis [59] have confirmed increased risk with missense variants, but implicating specific mutations has been challenging: only a single missense mutation (p.Arg573Gly) has shown segregation in familial disease (primary lateral sclerosis or FTD [60]). Therefore, the pathogenicity of specific missense changes in TBK1 cannot currently be ascertained in the absence of segregation or clear functional data [61], although some missense variants from ALS patients do impair TBK1 signaling [57,58]. Given the challenges of interpreting the pathogenicity of missense changes, it is difficult to estimate how common TBK1-related ALS is; a German cohort reported loss-of-function mutations in around 1% of fALS cases [62].
TBK1 mutation carriers show typical ages at onset, sites of onset, and rates of progression but appear to have higher rates of cognitive involvement and frank FTD [10]. Mutation carriers with FTD alone have been reported [63]. In fact, TBK1 may be second only to C9orf72 in frequency among patients with ALS and FTD [64]. Atypical phenotypes, including primary lateral sclerosis [60], progressive supranuclear palsy (PSP) [65], cerebellar ataxia [66], and corticobasal degeneration [67], have also been described. PSP and ataxia have been reported in individuals carrying a single amino acid deletion that has also been found in ALS-FTD and for which functional loss-of-function data exist [58].
TUBA4A: TUBA4A encodes a major protein constituent of microtubules, essential for cytoskeletal integrity and axonal transport. TUBA4A was first implicated in ALS by burden testing in a large familial ALS cohort, with confirmation in a second cohort [52]. Unfortunately, all pedigrees with potential mutations were too small or lacked DNA for definitive segregation of any single mutation. Among the other reported rare TUBA4A variants is p.A383T, found in both an Italian fALS patient and a sporadic patient of Chinese ancestry [67]. Other investigations in familial ALS or FTD have found few or no additional TUBA4A mutations [68,69], and burden testing in large cohorts (ALSdb and Project MinE) has not demonstrated an association in sporadic patients. The functional effects of only a few TUBA4A mutations have been examined in cell culture systems. Not surprisingly, some variants impair microtubule network assembly and reduce microtubule stability [52]. Neuropathological analysis of mutation carriers has not been reported, but tubulin alpha-4A is not clearly mislocalized in spinal cord from sporadic ALS patients [52]. To date, patients with putative TUBA4A mutations have shown spinal onset with typical upper and lower motor neuron involvement, and only a small number have had cognitive impairment or FTD [52,67,68].
KIF5A: KIF5A is a kinesin, a microtubule-associated motor protein responsible for organelle transport, including of lysosomes and mitochondria. Missense mutations in the N-terminal motor domain of KIF5A have long been known to cause spastic paraplegia type 10 (SPG10) and Charcot-Marie-Tooth disease type 2 (CMT2) [70-73]. Recent evidence links loss-of-function mutations in the C-terminal cargo-binding domain to ALS. This evidence came from both the largest ALS GWAS to date and two rare-variant burden analyses in familial ALS cohorts [19••, 49]. The ALS-associated mutations cluster at the 3′ end of the gene, where they disrupt splicing of exon 27 and thereby alter the C-terminal cargo-binding domain. Loss of the cargo-binding domain or haploinsufficiency presumably disrupts axonal transport in some fashion, though this has yet to be demonstrated. The neuropathology of KIF5A loss-of-function carriers has not been reported. ALS patients with loss-of-function mutations in KIF5A show a younger age at onset (46 vs. 65 years) and remarkably long survival (10 vs. 3 years) [19••]. Although the strongest associations are between C-terminal loss-of-function mutations and ALS and between N-terminal missense mutations and SPG10/CMT2, some individuals within SPG10/CMT2 families have presented with ALS phenotypes, raising the possibility that even missense KIF5A mutations in the N-terminal motor domain can present as ALS [74].
ANXA11: ANXA11 is involved in the calcium-dependent formation of vesicles and potentially in tethering mRNAs for transport along axons. Its association with ALS was discovered when whole-exome data from 50 familial ALS cases were overlapped to find recurrently mutated genes or variants [53]. This analysis "rediscovered" p.M337V in TARDBP and also revealed a second recurrent mutation, p.D40G in ANXA11. This mutation was identified in additional patients with ALS and appears to derive from a shared European founder.
Post-mortem analysis of a single p.D40G mutation carrier showed large cytoplasmic inclusions of insoluble ANXA11 protein [53]. However, the most definitively causative mutations do not seem to enhance ANXA11 aggregation but instead disrupt calcyclin binding. There is also evidence that some mutations impair the ability of ANXA11 to tether RNA granules to actively transported lysosomes [75]. ANXA11 mutations are rare in ALS, and the handful of reported patients have shown typical ALS features; only a single individual has had clinically apparent dementia [75].
TIA1: TIA1 is an RNA-binding protein, like TDP-43 and FUS, with a key role in the formation of stress granules. Mutations in TIA1 were already associated with a distal myopathy with vacuoles (as are mutations in the ALS genes VCP, MATR3, and HNRNPA2B1 with myopathic phenotypes). Whole-exome sequencing in a small ALS-FTD family identified a segregating TIA1 variant in the low-complexity domain, and subsequent analysis of this region in cases and controls showed a statistical burden of rare variants in ALS cases [43]. Targeted sequencing in several Chinese ALS cohorts has identified other rare mutations in this region of TIA1, but no segregating mutations have been reported [76,77]. Attempts to replicate this genetic association in three larger cohorts have been unsuccessful [29••, 78, 79••], emphasizing that more investigation is needed. However, neuropathological evaluation strongly suggests that TIA1 mutations contribute to ALS pathogenesis. Nine individuals with seven different TIA1 mutations (P362L; M334I; A381T; G355R; V294M; V360M; A381T) showed a remarkably consistent pathological phenotype characterized by frequent round eosinophilic, Lewy body-like inclusions [54], in addition to widespread TDP-43 aggregation. Besides this shared, distinctive pathology, almost all reported carriers have had clinical FTD in addition to ALS.
CCNF: CCNF encodes cyclin F, a member of the F-box family of proteins, which serves as the substrate-binding module of the SKP1-CUL1-F-box protein (SCF) ubiquitin ligase complex. In this role, it binds substrates and directs their ubiquitylation for subsequent degradation by the ubiquitin-proteasome pathway. CCNF was linked to ALS when single-nucleotide polymorphism (SNP) and microsatellite linkage analyses in a large Australian family pointed to chromosome 16p13.3 and whole-exome sequencing identified p.S621G in CCNF [80].
Other rare or even novel variants have been identified in other cohorts, but none has been recurrent or shown to segregate with disease. Mutations reported in ALS do not alter cyclin F stability or disrupt formation of the SCF complex [47]; in vitro, however, some increase the ATPase activity of VCP (another ALS-causative gene), while others impair ubiquitin-mediated proteasomal degradation [47]. Either mechanism disrupts the normal processing of TDP-43, leading to its abnormal cytoplasmic accumulation [47,81]. The strongest effect occurred with the clearly segregating mutation (p.S621G), which also disrupts Lys48-specific ubiquitylation and impairs autophagy [81]. When expressed in zebrafish, this mutation also increased spinal cord neuronal death, reduced motor axon length, and produced a motor deficit [82]. In sum, the functional analysis of cyclin F supports the pathogenicity of the only segregating mutation (p.S621G); whether other identified variants are causative remains to be clarified. The entire spectrum of ALS-FTD has been reported in individuals with rare CCNF variants, with no standout features to date, and neuropathological evaluation has yet to be reported [80].
DNAJC7: DNAJC7 encodes a heat shock protein co-chaperone known to regulate the key protein-folding chaperones Hsp70 and Hsp90, with roles in neuroprotection. DNAJC7's role in protein folding and quality control has largely been studied in the context of steroid hormone regulation, but there is emerging evidence that it may suppress cellular innate immune responses [83]. Gene-based burden methods demonstrated an enrichment of premature truncating variants in DNAJC7 [28]. The discovery of DNAJC7 was enabled by the aggregation of three previous ALS whole-exome datasets into a much larger collaborative cohort with unprecedented power. This illustrates that further gene discovery in ALS will require increasing cooperation and harmonization across the many genetic projects in the field, and it makes the case for open-access models of data production and sharing. Premature truncating mutations in DNAJC7 are presumed to act via haploinsufficiency, but functional investigations have not been reported. Similarly, the neuropathological and clinical phenotypes of mutation carriers are yet to be described.
NEK1: NEK1 (NIMA-related kinase 1) is a serine/threonine kinase involved in cell-cycle regulation and axonal development and implicated in axonal polarity, axon guidance, and DNA damage repair [84,85]. Loss-of-function mutations were first implicated in ALS by gene burden testing in a mostly sporadic cohort [56] and then validated in two independent familial cohorts of European ancestry [84,86] and in a Chinese population [87]. As with TBK1 and OPTN, the weight of evidence favors loss-of-function mutations as causative, but rare missense variation also modestly increases risk [86], especially the p.Arg261His variant [84,88].
How loss of NEK1 function or NEK1 missense mutations lead to ALS is not yet known. One possibility, demonstrated in cell culture, is the accumulation of DNA damage that contributes directly to injury and death of neurons already vulnerable because of widespread protein misfolding [89]. It is also possible that NEK1 mutations contribute directly to protein misfolding. Indeed, insoluble SOD1 aggregates were recently reported at post-mortem in an ALS case carrying a NEK1 R812X mutation [90]. Detailed phenotypic information has not been published for carriers of either loss-of-function or missense variants.
C21orf2: The biological role of C21orf2 (also known as CFAP410) is poorly understood, but limited evidence suggests that, like NEK1, it functions in DNA damage control and repair, and that it plays a role in cilia formation [91]. Its Drosophila homolog is implicated in actin structure [92], with possible relevance to ALS genes such as TUBA4A, PFN1, KIF5A, and ANXA11. C21orf2 was identified in a large genome-wide association study of common variation [19••] and subsequently replicated in a large cohort study [12]. The same study also used gene burden testing to demonstrate an excess of rare nonsynonymous C21orf2 variants in patients with ALS. To date, no segregating or recurrent mutations have been reported, making it difficult to interpret the pathogenicity of individual mutations. Investigations into the functional consequences of ALS-associated variants have only recently begun, with structural modeling suggesting that a fraction of them are likely to disrupt protein structure [93]. The clinical and neuropathological phenotypes of ALS patients carrying C21orf2 mutations have not yet been reported.
LGALSL: LGALSL encodes a galectin-related protein whose function is largely unknown. As with most of the new genes highlighted in this review, LGALSL was identified by applying gene burden methods to whole-exome sequencing data. Unlike other discoveries, however, this study used burden testing methods capable of implicating mutational hotspots or domains rather than entire genes [29••]. As in most other gene burden analyses, loss-of-function mutations showed the strongest association with ALS. This very recent report has not yet been replicated, and no mutations segregating in familial ALS or ALS-FTD have been reported. A subsequent study found LGALSL variants in 0.382% of cases and 0.068% of controls, an enrichment that did not reach statistical significance, leaving LGALSL a candidate gene [56].
In the initial report, carriers of LGALSL mutations had a significantly earlier age at symptom onset (13 years earlier than the cohort average) but were otherwise typical of ALS. Neuropathology of mutation carriers has not been reported. If future studies confirm an association between LGALSL and ALS, this gene could implicate an entirely new biological pathway in ALS pathogenesis.
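Several of the genes reviewed here (DNAJC7, NEK1, LGALSL) were implicated by gene burden testing, in which rare qualifying variants in a gene are collapsed into a single carrier status per individual and carrier counts are then compared between cases and controls. The sketch below is a minimal, illustrative version of that idea, not the method of any cited study. It assumes a hypothetical set of loss-of-function consequence labels as "qualifying", ignores allele-frequency filtering, population stratification, and covariates, and uses a one-sided Fisher's exact test computed from the hypergeometric distribution. The sample data are invented.

```python
from math import comb

# Hypothetical set of consequence labels treated as "qualifying"
# (predicted loss-of-function). Real studies add further filters,
# e.g., population allele frequency and annotation quality.
QUALIFYING = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def count_carriers(samples):
    """Collapse each individual's variants in one gene into carrier status.

    `samples` has one entry per individual; each entry is an iterable of
    consequence labels observed in that gene for that person. An individual
    counts as a carrier if at least one variant is qualifying.
    """
    return sum(1 for variants in samples if any(v in QUALIFYING for v in variants))


def fisher_one_sided(case_carriers, n_cases, ctrl_carriers, n_controls):
    """One-sided Fisher's exact test for an excess of carriers among cases.

    With the 2x2 table margins fixed, the number of carriers falling among
    cases is hypergeometric under the null; the p-value sums the
    probabilities of all tables with at least as many case carriers.
    """
    total_carriers = case_carriers + ctrl_carriers
    denom = comb(n_cases + n_controls, total_carriers)
    upper = min(total_carriers, n_cases)
    numer = sum(
        comb(n_cases, k) * comb(n_controls, total_carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numer / denom


def burden_test(cases, controls):
    """Return (case carriers, control carriers, one-sided p-value)."""
    a = count_carriers(cases)
    c = count_carriers(controls)
    return a, c, fisher_one_sided(a, len(cases), c, len(controls))


# Toy, invented data for illustration only.
cases = [["stop_gained"], ["missense_variant", "frameshift_variant"], [], ["splice_donor_variant"]]
controls = [["missense_variant"], [], [], ["synonymous_variant"]]
print(burden_test(cases, controls))  # 3 case carriers, 0 control carriers, p = 1/14 (about 0.071)
```

Even a 3-versus-0 split of carriers is not significant at conventional thresholds in this toy cohort of eight individuals, which illustrates why rare-variant associations depend on large, aggregated case-control datasets.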
GLT8D1: GLT8D1 is a ubiquitously expressed glycosyltransferase of unknown function implicated in ALS by family-based whole-exome analysis and candidate gene sequencing [94]. Study of a small ALS family identified a disease-associated haplotype containing rare missense mutations in two genes, ARPP21 and GLT8D1 [94]. The same haplotype was found in additional ALS patients, along with a much smaller number of patients who carried one mutation but not the other. The identification of two ALS patients with other rare variants in exon 4 of GLT8D1 made this gene statistically more likely to be causative, but a role for ARPP21 has not been excluded, and the authors hypothesized a synergistic effect between the two mutations. Interestingly, a study in Chinese patients found a nonsignificant enrichment of mutations in ARPP21 but not in GLT8D1 [95], making further study of both genes essential for clarifying their role in ALS. ARPP21 is an RNA-binding protein with known roles in neuronal dendrite elaboration [96]. GLT8D1 missense variants show a mild deficit in enzyme activity and, in an overexpression model, induce a mild increase in cytotoxicity in vitro and impaired locomotion in zebrafish [95]. As with LGALSL, conclusive implication of GLT8D1 would highlight a previously unexplored pathway in ALS pathogenesis. Given the small number of cases reported thus far, phenotypic patterns remain tentative: mutation carriers showed earlier disease onset, and those carrying both mutations on the haplotype (GLT8D1 and ARPP21) trended toward shorter survival.

Conclusions and Future Directions
Recent advances in understanding the complex genetic architecture of ALS have been the direct consequence of global collaborative efforts to assemble large cohorts with harmonized, well-curated clinical information from many sites. It is therefore crucial that these efforts continue, with a standardized approach to collaboration that includes comparable methodologies, open access to data, and an emphasis on high-quality genetic and biological validation. Even so, the genes identified will be rare in the ALS population, and their mutations will be difficult to interpret on the basis of genetic data alone. Neuropathological evaluation and model systems that assess the functional effects of mutations will therefore become increasingly important for interpreting ALS genetic variants.
This review highlights the importance of ongoing aggregation of datasets through data sharing and collaboration from the very inception of large-scale genomic investigations, to improve ascertainment of structural variants, somatic variation, copy number variation, and novel repeats contributing to disease. Future studies of these large datasets, potentially including applications of artificial intelligence, will need to develop the capability to interpret non-coding variation with robust and reproducible effect sizes. In addition, collecting longitudinal data may help identify key associations with disease progression and other phenotypic traits, which could reveal promising targets for molecularly directed personalized therapies.

Compliance with Ethical Standards
Conflict of Interest Jenna M Gregory has received research grants from the Academy of Medical Sciences, the Jean Shanks Foundation, and the Pathological Society. Delphine Fagegaltier declares no conflict of interest. Hemali Phatnani has received research grants from the ALS Association, the Tow Foundation, Target ALS, and the NIH. Matthew Harms has received research grants from Biogen, the ALS Association, Target ALS Foundation, and Project ALS Foundation. He is a consultant with the Muscular Dystrophy Association.
Human and Animal Rights and Informed Consent This article does not contain any studies with human or animal subjects performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.