Introduction

Neurodegeneration is defined as progressive neuronal vulnerability and loss of function. To date, this progression cannot be stopped and the diseases are incurable. Neurodegenerative diseases (ND) are proteinopathies: their pathogenesis is characterized by the accumulation and aggregation of specific pathogenic proteins in the brain, in intracellular inclusions or extracellular aggregates. The hallmark protein tends to be different for each disease [1] but the overlap is striking (Table 1). Amyloid-beta (Aβ) plaques and tau tangles characterize Alzheimer’s disease (AD) affected brains, and tau, but not Aβ, is also aggregated in frontotemporal dementia (FTD). Alpha-synuclein is aggregated in Parkinson’s disease (PD) and dementia with Lewy bodies (DLB). The proteins FUS, TDP-43, and SOD-1 have been found aggregated in amyotrophic lateral sclerosis (ALS), and TDP-43 has also been found in FTD brains. Amyloid-beta plaques have been found in PD brains, and alpha-synuclein aggregates are also present in several AD brains [2]. In fact, abnormal cortical Aβ deposition is present in PD patients with dementia [3, 4].

Table 1 Protein aggregates found in neurodegenerative diseases

A combination of multiple genetic, lifestyle, and environmental factors modulates the risk of ND. A very low percentage of cases show Mendelian inheritance patterns, while the majority of cases have complex genetic architectures that define the genetic predisposition to ND. This genetic burden interacts with lifestyle and environmental factors to predispose patients to ND.

AD is the most common neurodegenerative disease. Aside from the accumulation of amyloid beta plaques and neurofibrillary tangles [5], it is also characterized by the degeneration of the subcortical hippocampal regions and the medial temporal lobe, which are associated with memory impairment [6,7,8]. Even though AD and FTD are both characterized by tau aggregates, in FTD, the degeneration happens in the frontal and anterior temporal lobes, rather than the hippocampus and medial temporal lobe [9]. In fact, FTD and AD are sometimes difficult to distinguish at onset and even during disease progression [10••]. FTD also overlaps genetically, pathologically, and neuropsychologically with ALS. ALS is characterized by the loss of motor neurons in the brain and spinal cord that consequently results in muscle wasting, spasticity, and death, usually within three years [8, 11]. PD is characterized by the loss of dopaminergic neurons in the substantia nigra pars compacta and the presence of Lewy bodies in the surviving neurons. The presence of Lewy bodies, or aggregates of alpha-synuclein, is a characteristic shared with DLB, but the diagnosis of either PD or DLB depends on the order of symptom manifestation. If motor symptoms are the first to manifest, followed by dementia symptoms within a year of PD diagnosis, the individual will be diagnosed with PD with Parkinson’s disease dementia (PDD). If dementia manifests first, the diagnosis will be DLB, even though the person could develop motor symptoms later on in the disease. This is an arbitrary definition that was reached by consensus due to the overlap of these two diseases [12, 13].

Shared Genetic Architecture among Neurodegenerative Diseases

Early genetic studies were focused on the identification of variants within the coding regions of proteins associated with each neurodegenerative trait. These studies allowed the identification of variants in genes such as apolipoprotein E (APOE) for AD and alpha-synuclein (SNCA) for PD among others. The study of families with extreme phenotypic characteristics, such as early age at onset for AD, allowed the identification of additional variants with Mendelian inheritance patterns. However, these fully penetrant mutations in general are present in a low percentage of ND cases.

Genome-wide association studies (GWAS) enabled the systematic screening of the genome. The analyses and meta-analyses of large cohorts identified additional variants with smaller effects on risk that were more common in the population. Currently, the largest GWAS is the meta-analysis of PD that includes a very large number of participants (Table 2) from the 23andMe PD cohort [15]. This study detected 44 loci, of which 17 are new findings missed in a previous meta-analysis [19]. The International Genomics of Alzheimer’s Project (IGAP) is the latest and largest GWAS meta-analysis published for AD (Table 2). This effort analyzed 74,046 individuals and identified 22 genetic loci. The estimated proportion of variation tagged by all SNPs is 0.24, while the genetic heritability of AD is 0.74 [20, 21]. The largest GWAS for FTD is a two-stage meta-analysis with a total of 12,928 participants (Table 2) [16]. Interestingly, analyses stratified by FTD subtype identified additional loci, which supports the view that different FTD subtypes have distinct genetic architectures. A two-stage meta-analysis for DLB analyzed 6197 individuals [17]. In accordance with the already known overlap with AD, the top GWAS hit was the APOE locus. The largest genetic analysis for ALS analyzed 41,398 individuals and reported four loci with genome-wide significant association [18]. GWAS chips captured 8.5% of the genetic heritability, while the total is estimated at 65%. Additional modeling of the data (linear mixed models) identified four additional loci. This may indicate that the genetic architecture of ALS is extremely heterogeneous, and that it might be more informative to subclassify it into subtypes, similar to what was done in the FTD GWAS.

Table 2 Largest GWAS for neurodegenerative diseases

Remarkably, these studies confirmed what had already been observed during pathological examination of ND brains: the extent of overlap in genetic architecture among neurodegenerative diseases is surprisingly high. Genetic studies of late-onset AD identified the ε4 allele of APOE, which increases the risk for AD (OR = 3.1 for heterozygotes, OR = 12 for homozygotes [22]) and is present in approximately 15% of individuals of European ancestry. Even though it was first described in AD, the ε4 allele has since been associated with DLB severity [23] and ALS age at onset [24], and its association with PD has also been reported [25, 26].

Autosomal-dominant AD accounts for between 1 and 5% of total AD cases [27] and presents a dominant inheritance pattern with variants in amyloid precursor protein (APP), presenilin 1 (PSEN1), and presenilin 2 (PSEN2) [28]. Several rare variants in PSEN1 have also been reported in PD patients [29, 30]. More recent genetic studies have identified low frequency coding variants in TREM2 associated with AD [31, 32]. Afterwards, variants in the same gene were identified in PD [33] and in FTD [34].

Mutations in LRRK2, a gene associated with PD [35], have also been found in two families with AD [36]. FTD has been linked to mutations in the MAPT and GRN genes [37] which have also been found to be involved in AD and PD [38, 39••]. Finally, mutations in C9ORF72, SOD1, FUS, and others are associated with both FTD and ALS [16, 18, 40,41,42,43].

Polygenic Risk Scores

Polygenic risk scores (PRSs) are simple models that have been instrumental in analyzing the genetic architecture and predicting the disease risk of complex traits, such as schizophrenia and bipolar disorder [44]. These scores aggregate genome-wide information to account for the phenotypic variation observed in complex traits, by assuming an additive (non-multiplicative) effect of multiple variants with variable effect sizes. This allows a more accurate assessment of an individual’s risk for a disease, given their genetic background, than evaluating each genetic variant independently. PRS can highlight at-risk individuals for closer examination and allow for the application of early intervention strategies. In addition, PRS can be applied as inclusion criteria for targeted clinical trials. Furthermore, the genetic overlap between comorbid diseases, previously identified only by epidemiological or clinical studies, can be evaluated by PRS to determine whether the pleiotropic effects of variants identified in one disease lead to increased risk for another disease. For example, major depressive disorder and current psychological distress positively moderate the effect of polygenic risk for obesity on body mass index [45]. Many other examples of the use of PRS can be found in the literature, such as the identification of the genetic overlap between schizophrenia and cognitive ability [46]. Our group has employed PRS to study the extent of the overlap of the genetic architecture among distinct clinical manifestations of Alzheimer’s disease [47••].

Polygenic Risk Score Calculation

Polygenic risk scores are an estimate of the disease risk carried by an individual, based on the risk alleles and the corresponding effect sizes obtained from GWAS summary statistics. The GWAS summary statistics will be referred to as the base and the dataset to be evaluated as the target. There are three important factors for PRS construction: the base and target must be independent datasets; quality control has to be performed on both the base and the target GWAS; and the selection of the significance threshold has to be evaluated to optimize prediction power.
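The score described above can be sketched as a weighted sum: for each SNP shared by the base and the target, the number of effect alleles carried by the individual is multiplied by the base effect size (for a case-control GWAS, typically the log odds ratio), and the products are added. The Python sketch below is illustrative only. The SNP identifiers, effect sizes, and function name are hypothetical and are not taken from any study cited here, and the sketch assumes that quality control and allele matching have already been done.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Additive PRS: sum of effect-allele dosage times base effect size.

    Only SNPs present in both the target genotype (dosages) and the
    base summary statistics (weights) contribute to the score.
    """
    shared_snps = dosages.keys() & weights.keys()
    return sum(dosages[snp] * weights[snp] for snp in shared_snps)


# Hypothetical base effect sizes (log odds ratios) for three SNPs.
base_weights = {"rs_a": 0.20, "rs_b": -0.10, "rs_c": 0.05}

# Hypothetical target individual: effect-allele counts (0, 1, or 2).
# rs_d is absent from the base and rs_c is absent from the target,
# so neither contributes.
target_dosages = {"rs_a": 2, "rs_b": 1, "rs_d": 0}

score = polygenic_risk_score(target_dosages, base_weights)
print(f"PRS = {score:.2f}")
```

In practice the raw score is usually standardized within the target cohort before it is entered into a regression model, so that effect estimates are expressed per standard deviation of the PRS.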

Several quality controls need to be applied to the base and target datasets. First, the genome build and effect alleles must be matched between the base and target datasets. Ambiguous SNPs (A/T and C/G variants) should be removed from the datasets, since it is not possible to match them with certainty. Because GWAS are typically performed one SNP at a time, the identification of independent genetic signals is challenging. Consequently, it is necessary to control for the genetic architecture of the population defined by linkage disequilibrium (LD). Clumping is the standard approach to deal with LD: within each region of LD, it retains the SNP with the lowest p value, so that only independent associations are kept. This is preferable to pruning, which is a random process with respect to the association signal and thus may retain a representative SNP that explains less of the total variance of the LD region than a clumped SNP [48]. It is also possible to calculate PRS using variants whose association significance passes alternative cutoff values. In this way, PRSs allow the evaluation not only of variants with stringent genome-wide p values, but also of suggestive ones, or even of variants with marginal p values (for example, thresholds of 0.05 × 10⁻⁵, 1 × 10⁻⁴, …, 0.05, 0.1, …, 0.5) [49•]. Then, the association with additional traits is usually evaluated using logistic regression. Additional covariates (for example, sex and age) and confounding factors (such as principal components capturing population stratification) are also modeled. Given the multiple thresholds evaluated, an association is usually considered statistically significant when the p value is < 1.00 × 10⁻³ [49•].
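The three selection steps described above (removal of ambiguous SNPs, clumping, and p value thresholding) can be sketched in a few lines of Python. This is a simplified illustration and not the implementation used by PLINK or PRSice: real clumping also restricts comparisons to a physical distance window and estimates r² from genotype data, whereas here the pairwise r² values are supplied directly. All SNP identifiers, p values, r² values, and the r² cutoff of 0.1 are assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Snp(NamedTuple):
    rsid: str
    effect_allele: str
    other_allele: str
    p_value: float


# Strand-ambiguous allele pairs: the complement of A/T is T/A, and of C/G is G/C.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def drop_ambiguous(snps: List[Snp]) -> List[Snp]:
    """Remove A/T and C/G SNPs, which cannot be strand-matched with certainty."""
    return [s for s in snps
            if frozenset((s.effect_allele, s.other_allele)) not in AMBIGUOUS_PAIRS]


def clump(snps: List[Snp],
          r2: Dict[FrozenSet[str], float],
          r2_threshold: float = 0.1) -> List[Snp]:
    """Greedy clumping: visit SNPs from lowest to highest p value and keep a
    SNP only if it is not in LD (r2 >= threshold) with one already kept."""
    kept: List[Snp] = []
    for snp in sorted(snps, key=lambda s: s.p_value):
        in_ld_with_kept = any(
            r2.get(frozenset((snp.rsid, k.rsid)), 0.0) >= r2_threshold
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(snp)
    return kept


def select_by_threshold(snps: List[Snp], p_threshold: float) -> List[Snp]:
    """Keep SNPs whose base p value passes the chosen cutoff."""
    return [s for s in snps if s.p_value <= p_threshold]


# Hypothetical base summary statistics.
base = [
    Snp("rs1", "A", "G", 1e-9),
    Snp("rs2", "A", "T", 1e-12),  # ambiguous A/T SNP, removed in QC
    Snp("rs3", "C", "T", 1e-6),   # in LD with rs1, removed by clumping
    Snp("rs4", "G", "T", 0.03),
]
# Hypothetical pairwise LD; pairs that are not listed are treated as r2 = 0.
ld_r2 = {frozenset(("rs1", "rs3")): 0.8}

qc_passed = drop_ambiguous(base)
index_snps = clump(qc_passed, ld_r2)

for threshold in (1e-4, 0.05, 0.5):
    selected = select_by_threshold(index_snps, threshold)
    print(threshold, [s.rsid for s in selected])
```

A PRS would then be computed once per threshold from the selected SNPs, and each score tested against the phenotype in the target, which is why a corrected significance level is needed.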

The calculation of PRS can be performed using PLINK [50], PRSice [49•], lassosum [51•], LDpred [52•], or Multi-trait analysis of GWAS (MTAG) [53•]. Since the PLINK score function uses a linear scoring system for calculating PRS, all quality control needs to be performed on the input data before running the calculation. Unlike PLINK, PRSice performs clumping and removes ambiguous SNPs by default. It also allows the evaluation of different p value thresholds and the selection of the one that gives the best-fit score for the data. The other three methods introduce additional methodological aspects to calculate the PRS. Lassosum uses penalized regression to correct for LD structure and adjust effect sizes, while LDpred assumes a prior for the genetic architecture and uses LD information from a reference panel. Consequently, the choice of methodology to calculate PRS also has to take into consideration the characteristics of the data available for the base and target GWAS. Several manuscripts provide guidelines for selecting optimal approaches [52•, 54•].

Estimates of the shared genetic architecture among traits can also be calculated using summary statistics instead of individual genotype data. PRSice provides a convenient model to test it, based on the inverse-variance method, which corresponds to the instrumental variable method that uses individual-level data in Mendelian randomization approaches [55]. There are also other approaches to estimate the extent of overlap among traits. Linkage disequilibrium (LD) score regression [56] estimates the genetic correlation of two traits by analyzing all SNPs in LD. It regresses the χ² statistic against the LD scores, which are estimated by summarizing the LD r² in a predefined region. The coloc method [57] tests whether two association signals share a common causal variant. It employs a Bayesian approach and was conceived to integrate genetic association analysis for one disease with expression quantitative trait locus (eQTL) data, but it can also be employed to compare two trait association analyses. Another alternative is GNOVA (genetic covariance analyzer), which estimates genetic covariance using the method of moments [58] and allows the variants analyzed to be stratified. Finally, MTAG analyzes summary statistics to estimate genetic correlations among traits using bivariate linkage disequilibrium score regression while correcting for the possibility of overlap between samples. The research questions to be answered, the data available, and the planned analyses should also guide the choice of methodology.
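To make the regression step of LD score regression concrete: in its single-trait form, the expected χ² of SNP j is modeled as N·h²·ℓ_j/M plus an intercept, where ℓ_j is the LD score of the SNP, N the sample size, M the number of SNPs, and h² the SNP heritability. The two-trait version replaces χ² with the product of the z scores of the two traits, so that the slope reflects genetic covariance. The Python sketch below fits the single-trait model by ordinary least squares on toy data. It is a simplification under stated assumptions: the published method uses a weighted regression with block-jackknife standard errors, and every number below is invented for illustration.

```python
from typing import Sequence, Tuple


def ordinary_least_squares(x: Sequence[float],
                           y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ldsc_heritability(ld_scores: Sequence[float],
                      chi2: Sequence[float],
                      n_samples: int,
                      n_snps: int) -> Tuple[float, float]:
    """Unweighted LD score regression: slope = N * h2 / M, so h2 = slope * M / N.

    Returns (h2 estimate, regression intercept).
    """
    slope, intercept = ordinary_least_squares(ld_scores, chi2)
    return slope * n_snps / n_samples, intercept


# Toy data generated without noise from h2 = 0.2, N = 50,000, M = 1,000,000,
# and an intercept of 1, i.e. chi2_j = 1 + 0.01 * ld_score_j.
toy_ld_scores = [10.0, 50.0, 100.0, 200.0]
toy_chi2 = [1.1, 1.5, 2.0, 3.0]

h2_estimate, intercept = ldsc_heritability(toy_ld_scores, toy_chi2,
                                           n_samples=50_000, n_snps=1_000_000)
print(f"h2 = {h2_estimate:.3f}, intercept = {intercept:.3f}")
```

An intercept close to 1 indicates that inflation of the test statistics is explained by polygenicity rather than by confounding such as population stratification.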

Polygenic Risk Score in Neurodegenerative Diseases

Polygenic risk scores have been used in neurodegeneration for both testing the genetic overlap between characteristics of the same neurodegenerative disease, such as risk and age at onset, and testing the genetic overlap among neurodegenerative diseases.

Genetic Overlap within Characteristics of the Same Disease

Alzheimer’s Disease

Even though many methods have been used to calculate PRS for AD, the overall results are consistent across studies. In general, the results from the GWAS in the International Genomics of Alzheimer’s Project (IGAP) study have been used to model the PRS [14]. The first PRS was published in 2015. By adding the polygenic burden, the authors were able to predict disease development with an area under the ROC curve (AUC) of 78% when age, sex, and APOE genotype were included [59]. In this study, a subset of the IGAP samples was used to investigate the prediction accuracy of models trained with the weights learned from the analysis of the entire IGAP cohort [59]. In another study, the analysis of a subset of the IGAP cohort with neuropathological data produced an increased AUC of 84% [60]. The authors conclude that most of the missing heritability and the moderate AUC values may be due to limited diagnostic accuracy, and thus, for non-pathologically confirmed AD, there is room for improvement [60, 61••].
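For readers less familiar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The Python sketch below computes it directly from that definition. The scores are invented for illustration; in the studies above they would be the predictions of a model that includes the PRS together with covariates such as age, sex, and APOE genotype.

```python
from typing import Sequence


def area_under_roc(case_scores: Sequence[float],
                   control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case-control pairs in which the case scores
    higher than the control (ties count as half a pair)."""
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for three cases and three controls.
cases = [0.9, 0.6, 0.4]
controls = [0.5, 0.3, 0.2]

# 8 of the 9 case-control pairs are ordered correctly; only the case scored
# 0.4 falls below the control scored 0.5.
example_auc = area_under_roc(cases, controls)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases and controls.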

The PRS has also been used to detect individuals at greater risk for developing AD, and proved to be successful even in individuals who were noncarriers of the APOE ε4 allele. The PRS predicted longitudinal clinical decline in older individuals who showed moderate to high deposition of amyloid beta and/or tau [62] and in clinically diagnosed AD individuals [63]. Moreover, the AD PRS was found to predict the level and rate of memory loss in a sample of non-Hispanic whites from the Health and Retirement Study [64] and in the ADNI (Alzheimer’s Disease Neuroimaging Initiative) cohort [65]. In the same work, the authors evaluated the AD PRS in non-Hispanic black participants and found similar results but with a weaker association [64]. The authors argue that distinct factors can limit the power of the PRS in populations with different genetic backgrounds. The genotyping platforms employed for the discovery analyses were designed to capture variation among populations with European ancestry, and the SNPs included might not be as effective in tagging the significant loci in other populations. In addition, most of the GWAS for AD predominantly analyzed participants with European ancestry, and allele frequencies may vary between ethnicities and could alter the detectable effect sizes [64].

PRS has not only been utilized to improve AD diagnosis. It has also been used to test for genetic commonalities between characteristics of AD, such as AD risk [47••, 66••, 67], age of disease onset [47••], and AD biomarkers [47••, 68]. PRS has been used to demonstrate that the genetic architecture of sporadic AD is shared with familial AD without Mendelian mutations [66••, 67]. Similarly, the extent of overlap of the known genetic architecture was compared between early-onset (< 65 years at clinical manifestation of symptoms) and late-onset (> 65 years) AD. The odds ratios differ between these strata (1.40 for sporadic late onset and 1.75 for familial late onset versus 2.27 for sporadic early onset) [47••]. In fact, the genetic factors included in the PRS seem to have additive effects on age at onset [47••]. Finally, PRS has been associated with the CSF ptau181-Aβ42 ratio and CSF tau in autosomal dominant AD [47••]. In patients without clinical dementia, the predictive value of amyloidopathy and tauopathy seems to increase as a function of the PRS [68].

Parkinson’s Disease

The largest meta-analysis performed by Nalls et al. [19], prior to the inclusion of the 23andMe data [15], has usually been employed as the reference to build PD-related PRS. These PRSs have been associated with the age at onset of PD [69••], faster motor and cognitive decline [70], and PD status [69••]. An initial study reported that only PRS that included the effect of SNPs with p values below nominal significance thresholds were significantly associated with PD in an additional independent PD dataset [71••], implying that the genetic architecture of PD includes many common variants with small effects. Further studies showed that PRS based on more significant (sentinel) SNPs are also associated with PD risk [69••]. In addition, PRSs were employed to show a higher genetic burden in early-onset PD compared to late-onset PD (maximum OR of 4.8 and p < 0.001 [71••]). Thus far, the PD PRS has not been successfully associated with CSF alpha-synuclein levels [69••].

Dementia with Lewy Bodies, Frontotemporal Dementia and Amyotrophic Lateral Sclerosis

At the time of this review, no PRS had been attempted for these diseases. Several factors can explain the lack of a PRS in these ND. First of all, the sample sizes for the GWAS of DLB, FTD, and ALS are not as large as those for AD or PD (Table 2), nor is the amount of explained heritability. In addition, both FTD and ALS are very heterogeneous and can be stratified into different subtypes, reducing the sample size for each group and thus the power of the GWAS and the PRS.

Genetic Overlap among Neurodegenerative Diseases

PRS provides a means to test whether diseases that are suspected to have some genetic overlap share genetic architecture. The PRS for AD has been found to be associated with amnestic and nonamnestic mild cognitive impairment, whereas the PRSs for PD and FTD were only associated with nonamnestic mild cognitive impairment. However, using these scores to predict future dementia was unsuccessful, probably due to the heterogeneity of the population with mild cognitive impairment [72]. A recent report [73] shows that even though the association of AD and PD PRSs with case-control status of DLB is highly significant, the amount of variance explained by both PRSs is relatively small (for AD, 1.33% with and 0.14% without the APOE locus; for PD, 0.37%). This adds evidence that DLB is an entity unto itself, with unique genetic risk factors, and not a mixture of AD and PD [73]. Similarly, PRSs have been used to test whether AD, ALS, and FTD are associated with cognitive function and physical health in healthy individuals [8] and to show that, while the three diseases showed an association with cognitive function, the risk for ALS was not associated with physical function.

PRSs have been used to investigate the extent of overlap of the genetic architecture between ALS and schizophrenia [44] and have been widely employed to determine the shared genetic risk between AD and PD risk and additional characteristics of the diseases, including age at onset, biomarker levels, and disease progression [8, 10••]. In addition, other methods have been employed to evaluate the genetic overlap among neurodegenerative diseases [10••], including fold-enrichment plots of the nominal −log10 (p values) for all FTD-SNPs and for a subset of SNPs determined by the significance of their association with PD and AD. The study concluded that there is genetic pleiotropy between AD, PD, and FTD.

Pathway Specific PRSs

Additional research has been performed to interrogate the extent of overlap of pathway specific genetic architecture among traits, by aggregating biological knowledge into the calculation of the polygenic risk to derive pathway specific risk. In these studies, the PRS is calculated for variants located in genes that are part of specific pathways, instead of considering the entire genome. For AD, PRSs summarizing the immune, amyloid beta clearance, and cholesterol pathways were created to predict AD-related biomarkers. In this case, the PRSs were poor predictors of cognitive function [74••]. This may be due in part to the incompleteness of pathways, such that variants in genes not known to be part of a pathway are excluded. Two studies have demonstrated the relationship between AD and inflammatory diseases using PRS [75, 76]. AD risk PRS was associated with increased levels of plasma inflammatory biomarkers, adding additional evidence to the involvement of inflammatory processes in AD [75]. While the global increase in inflammation seen in diseases such as multiple sclerosis does not influence age-related cognitive decline, variants that alter peripheral immunity influence microglial density and expression of immune genes in the aging brain. Thus, the influence of peripheral immune function on glial cell activation warrants further study [76].

Future Directions

The research community has invested a considerable amount of effort generating GWAS data for ND, which has proven instrumental to the discovery of novel variants and genes associated with several disease traits. An ongoing challenge in the field of genetics is determining the relative roles of common variants with small effects and rare variants with large effects on disease phenotypes. The omnigenic model is a promising framework that integrates the effect of both common and rare variants along with additional gene regulatory networks to analyze the genetic architecture of complex traits [77]. The power of this approach is constrained by the number of subjects analyzed in whole genome sequencing projects, which thus far is lower than the number of subjects included in GWAS. In addition, it was proposed that participant stratification is a promising strategy for genetic studies [78]. Current analyses of FTD and ALS data support this approach.

PRS combined with pathway analysis is enabling researchers to determine which model better fits complex diseases such as neurodegeneration and can help to determine whether therapeutic and preventative measures will be best targeted to specific genes or more broadly to pathways. Importantly, genetic studies have demonstrated the high complexity of neurodegenerative traits, whose risk is modulated by a large number of variants, with either small effect or very low frequency in the population. Thus, genetic studies need to include a very large number of subjects to pass stringent genome-wide thresholds. However, the sample sizes collected and meta-analyzed vary considerably among diseases (Table 2). The same can be said of the effort invested in constructing PRS for the different diseases. Querying PubMed for publications that employed PRS for ND provides a snapshot of the quantitative effort thus far invested for the different traits. A total of 43 manuscripts were retrieved when querying for AD with “(PRS or Polygenic Risk Score) AND (Alzheimer)”, but for FTD, the search produced only two manuscripts. Furthermore, a search of very active and highly invested research areas other than ND shows that PRSs are widely used, as PubMed included 218 publications for heart disease, 141 for cardiovascular disease, and 1127 for cancer.

Conclusions

Genetic studies have provided valuable insights and novel understanding of ND. Increasing the sample size of the neurodegenerative disease cohorts and performing meta-analyses on larger studies for FTD, DLB, and ALS will be critical to decipher the genetic structure of each disease and to investigate the genetic architecture shared by these neurodegenerative diseases with overlapping symptomatology. In fact, this may also aid in the creation of a clinically useful PRS for neurodegeneration that allows the detection of individuals at risk, so they can enroll in clinical trials of neurodegenerative therapeutics. It is also plausible that a neurodegeneration PRS can be further optimized or combined with other phenotypic or molecular characteristics to specifically predict AD, FTD, DLB, and ALS.

One limitation of PRSs is that they can only reach a maximum accuracy [61••], which is bounded by heritability and by disease prevalence, which in turn varies with age. For example, GWAS chips only capture approximately a third of the estimated genetic heritability of AD (0.24 vs 0.76 [20, 21]), and prevalence for AD varies greatly with age (from 3% in the 65–74 age range to > 30% for those older than 85 years). Thus, the genetic risk captured by PRSs, which is constant throughout life, should be combined with additional biomarkers and clinical and environmental data to select at-risk individuals for therapeutic interventions and to produce better diagnostic tools.

To date, no PRS for ND that combines both common and rare variants has been calculated; doing so may increase the accuracy of these scores. Novel machine learning methods are being developed to capture compact representations of GWAS [79] and are being coupled with powerful classification methods, namely deep neural networks, to produce highly accurate predictions [80,81,82]. The parallel development of larger cohorts and molecular phenotyping, together with the advancement of novel and more sophisticated methods to represent and operate multi-variant models, will allow a more precise discernment of the overlap of the genetic architecture among ND and the prediction of individuals at risk with the accuracy required in clinical settings.