Introduction: Recent Perspective on Alzheimer’s Disease (AD) Genetics

Alzheimer’s disease (AD) is often considered from distinct genetic perspectives [1] and grouped into early-onset AD (EOAD) (i.e., age at onset <65 years) and late-onset AD (LOAD). At the same time, AD is often viewed as familial (i.e., when two or more family members are affected) or sporadic. However, even in sporadic AD, it is expected that genetics, interacting with environment, are strongly involved. Recent genetic findings in AD are built upon earlier discoveries that revealed the autosomal dominant forms of AD [related to mutations in APP (amyloid precursor protein) and PSEN1 and PSEN2 (presenilin-1 and -2)]. These autosomal dominant forms of AD are most frequently found in EOAD and account for about 2 % of AD cases overall [1]. Despite the tantalizing indication that a substantial amount of the genetic contribution to AD remained following the discovery of the autosomal dominant forms of AD and the apolipoprotein E (APOE) genotype risk factor, which can be seen as acting as a semi-dominant, incompletely penetrant genetic cause of AD, it is only recently that more advanced genetic analyses have begun uncovering additional genetic risk factors for AD. These new methods fall into two general categories: genome-wide association studies (GWASs) for the detection of common alleles and, more recently, studies using whole-genome and whole-exome sequencing (WGS and WES, respectively) for the detection of rare alleles. We review findings from these types of studies in AD in this article.

A Multitude of Risk Factors: The Genome-Wide Association Study (GWAS) Harvest

GWASs Based on Disease Phenotype: Large-Scale Studies

The first wave of GWASs in AD focused on LOAD. Candidate loci with genome-wide significant statistical associations with AD were identified, but these findings proved difficult to replicate, and few, if any, of these initial genetic associations have remained. However, this first wave of studies enabled researchers to strengthen their approaches in GWAS design and execution, efforts that have borne fruit in the second wave of GWASs, which we examine here. In particular, larger sample sizes, the growing sophistication of statistical methods, especially those related to the pooling of data in meta-analyses, and the use of multi-staged studies that include both discovery and confirmation stages have together greatly increased the power to detect relatively small genetic influences on AD. The limitations of GWASs are well-known, including, most prominently, the fact that the findings are often hard to decipher: significant markers from GWASs do not always implicate specific genes, and the relationships between implicated single nucleotide polymorphisms (SNPs) and relevant genes are often unclear. However, because GWASs point strongly toward genes that neighbor or incorporate the identified SNPs, researchers have nonetheless been able to identify an increasing number of specific genes that are now thought to increase the risk for AD. It should be noted that additional levels of complexity have also been incorporated into such analyses, including replications across different populations, examinations of the impact of allele frequencies in these different populations, and methods that control for potentially confounding factors. For example, in a study combining a GWAS with a family-study structure [2] and a methodology that carefully controlled for APOE allele status, some previously significant genetic variants appeared to be weakened by controlling for APOE while others were confirmed.

The second wave of GWASs has identified a large group of genetic risk factors. Those with the strongest evidence include CLU (clusterin), CR1 (complement receptor 1), and PICALM (phosphatidylinositol-binding clathrin assembly protein), as discussed below. Others include BIN1 (bridging integrator protein 1), EPHA1 (ephrin receptor A1), ABCA7 (adenosine triphosphate-binding cassette protein A7), the MS4A (membrane-spanning A4) genes, CD33 (sialic acid-binding immunoglobulin-like lectin 33), and CD2AP (CD2-associated protein). Perhaps most intriguingly, these findings are beginning to suggest a pattern of non-random association with biochemical functional groups of genes, and, ultimately, this pattern is likely to be where GWASs will make their strongest contribution. One suggested grouping [3] of these GWAS-based candidates points to increasingly strong associations with β-amyloid, lipid export, immune, and synaptic function pathways; these authors note that some genes span and interconnect these pathways, which strengthens the case that the pathways are involved in AD pathobiology.

In fact, prior to about 2009, apart from the clearly autosomal dominant forms of AD, only APOE was found, robustly and repeatedly, to be a genetic risk factor for AD. The difficulty with the first wave of GWAS replication pointed to a need for larger sample sizes and for the more robust application of meta-analytic techniques, as well as increasingly wide networks of consortia to undertake these studies. Since then, consortia have been pooling together large GWAS datasets in meta-analysis-like approaches. With this unprecedented power, a growing list of markers that are thought to be associated with specific genes and candidate risk factor genes has emerged, and some of these risk factors have been replicated. Note that these genes are not necessarily considered causative; rather, they are relatively common alleles that either increase or decrease AD risk. Each individual genomic location may contribute a small amount of risk. Perhaps most significantly, these genes are painting an ever more detailed picture of the pathways involved in AD pathogenesis. Ultimately, an understanding of these genes will greatly enhance our ability to develop specific, disease-modifying treatments that have been lacking to this point and that are critically needed.
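
To make the pooling step concrete, the following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis of per-study log odds ratios. It is an illustration only: the consortia cited here do not necessarily use this exact model, the function name `fixed_effect_meta` and all input numbers are hypothetical, and the threshold of 5e-8 is the commonly used convention for genome-wide significance rather than a value taken from the studies reviewed.

```python
import math

# Conventional genome-wide significance threshold (an assumption here,
# not a value reported by the studies discussed in this article).
GENOME_WIDE_ALPHA = 5e-8


def fixed_effect_meta(log_ors, std_errs):
    """Pool per-study log odds ratios using inverse-variance weights."""
    if len(log_ors) != len(std_errs) or not log_ors:
        raise ValueError("need one standard error per log odds ratio")
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p_value


# Hypothetical estimates for one SNP in three cohorts (illustrative only).
log_ors = [math.log(1.15), math.log(1.20), math.log(1.10)]
std_errs = [0.06, 0.05, 0.08]

pooled, se, z, p = fixed_effect_meta(log_ors, std_errs)
print(f"pooled OR = {math.exp(pooled):.3f}, z = {z:.2f}, p = {p:.2e}")
print("genome-wide significant:", p < GENOME_WIDE_ALPHA)
```

Because the pooled standard error shrinks as studies are added, a modest effect that is not significant in any single cohort can reach significance in the combined analysis, which is the gain in power described above.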

A second-wave GWAS that included a two-stage design and more than 16,000 individuals [4] was the first to reliably identify genetic contributors beyond APOE. In this study, CLU and PICALM surpassed genome-wide statistical significance, while BIN1 reached “suggestive” statistical significance, and, as expected, APOE was clearly associated with AD risk. A later study then replicated CLU and PICALM but found that these loci did not add to a predictive model that included the APOE genotype. However, these authors note that this does not preclude a biochemically important role for CLU and PICALM in AD [5]. The study also found that two other markers, near BIN1 and MARK4 [microtubule-associated protein (MAP)/microtubule affinity-regulating kinase 4], reached significance. Another large GWAS (2,032 cases) that was conducted early in the second wave used a multi-stage analysis that provided for within-study replicative evidence to report significant risk effects linked to CLU, again, and CR1, both with increased risk odds ratios (ORs) of around 1.22 [6]. The authors of this study note that both CLU and CR1 had been previously linked to AD pathobiology, in particular with the clearance of β-amyloid.

The Alzheimer’s Disease Genetics Consortium (ADGC) carried out a large US GWAS with a discovery stage, two replication stages, and a meta-analysis [7]. In this study, loci associated with MS4A4A, CD2AP, EPHA1, and CD33 were shown to significantly alter the risk of LOAD. In addition, CR1, CLU, BIN1, and PICALM findings were replicated, but findings related to EXOC3L2 (exocyst complex component 3-like 2) were not replicated. The replicated genes were estimated to have population-attributable fractions between 2.72 and 5.97 %. In a companion European Genetic and Environmental Risk for Alzheimer’s Disease Consortium (GERAD) GWAS that included a meta-analysis, MS4A4A, EPHA1, CD2AP, and CD33 were also identified, and a marker at ABCA7 also reached significance [8]. This study also replicated evidence for an association between AD and BIN1 and CR1.
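
For readers unfamiliar with population-attributable estimates such as those quoted above, the sketch below shows one common way such a quantity can be computed, Levin's formula, with the odds ratio standing in for the relative risk. This is an assumption made for illustration: the cited study may have used a different estimator, and the function name `attributable_fraction` and the input values are hypothetical.

```python
def attributable_fraction(exposure_freq, relative_risk):
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1)).

    exposure_freq is the proportion of the population carrying the risk
    factor; relative_risk is approximated here by an odds ratio.
    """
    if not 0.0 <= exposure_freq <= 1.0:
        raise ValueError("exposure_freq must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs (not taken from any study cited in this article):
# a risk factor carried by 30 % of the population with a relative risk of 1.15.
fraction = attributable_fraction(0.30, 1.15)
print(f"attributable fraction = {100 * fraction:.2f} %")
```

The formula makes clear why a common allele with a small effect can still account for a few percent of cases at the population level. For a protective allele (relative risk below 1), the result is negative and is usually reported as a preventive fraction instead.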

Most recently, in the largest GWAS published to date from the International Genomics of Alzheimer’s Project (IGAP), which included 74,046 subjects and a large, two-stage meta-analysis, 19 loci reached significance [9]; 11 are described as novel, while 8 provided replicative evidence. The previously known loci were related to CR1, BIN1, CD2AP, EPHA1, CLU, MS4A, PICALM, ABCA7, and CD33. Among the novel loci associated with AD risk in this study, the authors particularly highlighted markers near HLA-DRB (human leukocyte antigen-DRB) (encoding major histocompatibility complex class II, DRβB) and ZCWPW1 (zinc finger, CW type, with PWWP domain). Evidence was also found tying SORL1 (sortilin-related receptor L) to AD risk; this is of interest because this locus has been identified previously in other studies. These findings illustrate the way that evidence for specific genes can accumulate when multiple approaches are leveraged.

At the same time, complementary evidence has emerged from smaller-scale GWASs that, in an effort to assess the commonality of genetic influences on AD risk, have focused on more specific ethnic groups. A two-stage GWAS in 1,009 African Americans with LOAD, for example, identified APOE as well as confirmatory evidence of an association between LOAD and ABCA7 [10]. The study, which included a second stage for replication and comparison, also identified CLU, PICALM, BIN1, EPHA1, MS4A, and CD33 as significant, though, interestingly, the related risk was not always in the same direction as in previous studies; the authors suggest that while the same genes may contribute to AD pathogenesis in African Americans, the causal variants may not be identical. This study also found evidence that suggests PROX1 (prospero homeobox protein 1) and CNTNAP2 (contactin-associated protein-like 2) as additional candidate genes for AD risk.

Alternatively, smaller GWASs can be successful by employing novel or variant analysis methods and approaches. A recent GWAS aimed to reveal patterns of associations within genes by employing a “mega meta-analysis” and an alternative gene-wide analysis; this study led to confirmatory evidence for 20 genes previously identified in second-wave GWASs and also identified two novel loci, TP53INP1 (tumor protein p53-inducible nuclear protein 1) and IGHV1-67 (IGHV1-67 immunoglobulin heavy variable 1-67, pseudogene) [11]. Likewise, a similar study implicated three genes near recently reported SNPs [ZCWPW1, NDUFS3 (nicotinamide adenine dinucleotide [NADH] dehydrogenase [ubiquinone] Fe-S protein 3), and MTCH2 (mitochondrial carrier 2)] [9]. More modestly scaled studies have also been valuable in that they have allowed confirmatory evidence to emerge regarding the markers uncovered in larger studies. A study of 1,291 novel cases with LOAD that included a second meta-analysis step with a larger group, for example, reconfirmed six previously identified markers (inclusive of APOE) [12]: PICALM, BIN1, ABCA7, MS4A4/MS4A6E, and EPHA1. Additionally, a suggestive novel marker was identified as potentially significant in the PPP1R3B (protein phosphatase 1, regulatory subunit 3B) gene. Ideally, such smaller studies serve to reinforce and complement large-scale studies and to enrich the gene discovery process.

Endophenotype-Based GWASs

Along with conventional GWASs that are based on the clinical diagnosis of AD, recent GWASs have utilized endophenotypes. These endophenotype-based studies allow investigators to explore different neurobiological aspects of AD, potentially revealing information that is not accessible when studying AD as the only phenotype. However, because these studies require a more detailed analysis of individual cases beyond the binary presence or absence of the AD diagnosis, they have not yet been scaled up to the large study sizes that are more typical of the second wave of GWASs. Still, although the small sample sizes reduce the power of these studies to detect the relatively small genetic effects expected, interesting and novel findings have emerged.

Neuritic Plaque Burden

In a relatively small sample for which detailed pathological data had been obtained, a candidate-based GWAS approach was undertaken to explore genetic links to neuritic plaque burden [13]. In the GWAS phase of the study, no genome-wide significant SNPs were identified. However, a suggestive link was made to novel candidate genes, including KCNIP4 (Kv channel interacting protein 4), PTGS1 (prostaglandin-endoperoxide synthase 1), and the HLA locus. The latter is particularly interesting given the possible links between immune function and AD. In the candidate gene analyses, APOE, CR1, ABCA7, and CD2AP were found to be linked to neuritic plaque burden as well, thereby connecting findings from previously conducted disease-based, large-scale GWASs to an endophenotype.

Rate of Cognitive Decline

A recent two-stage GWAS used a functional marker of disease progression to find genes related to the rate of cognitive decline in AD [14]. Because there is very little known about the variability in the rate of the progression of AD, this study potentially represents a highly fruitful area for research. Although the sample size of the GWAS was small (303 cases), SPON1 (spondin 1) was identified in both the discovery and replication phases of the study; its minor allele was linked to slowed disease progression. The authors also linked a novel genetic variant to the progression of AD that had not been found in larger studies that used clinical AD as the phenotype.

Cerebrospinal Fluid Tau and p-Tau Levels

A GWAS that was conducted in 1,269 AD cases with cerebrospinal fluid (CSF) tau and p-tau levels as biomarkers revealed four loci that reached genome-wide significance [15]: APOE and markers at 3q28 (which is not clearly linked to any gene), GLIS3 (GLIS family zinc finger 3), and 6p21.1 in the TREM (triggering receptor expressed on myeloid cells) gene cluster. These four loci accounted for 22 % of the genetic variation in CSF p-tau levels. However, not all of these markers were linked to AD risk itself.

Plasma Amyloid Peptide Levels

A GWAS was conducted in a sample of older European adults with plasma amyloid peptide levels used as an AD endophenotype [16]. Although no markers reached genome-wide significance in this study, there were suggestive links to several interesting markers.

Taken together, these studies, though as yet small, already suggest that pursuing GWASs using endophenotypes, whether as biomarkers or functional measures, can yield novel genetic associations that are not identified in larger, more typical GWASs. Even with appropriate caution, given that these study findings are not as yet replicated (and keeping in mind the negative fate of findings from the first wave of GWASs), these endophenotype studies are a promising step toward uncovering functionally relevant genetic contributors to AD pathogenesis.

Whole-Genome and Whole-Exome Sequencing

Our ability to sequence whole exomes and whole genomes is progressing rapidly and is just beginning to be applied to the exploration of the genetics of AD. For example, in a familial AD dataset that excluded the three major autosomal forms of AD, exome sequencing identified rare variants in SORL1, a gene that had been previously identified in other settings as playing a role in β-amyloid production [17]. In this case, these rare SORL1 variants would not have been associated with AD in standard GWASs, but they were identified from exome sequencing in a group of families known to be at high risk for AD due to their family history.

TREM2

Jonsson et al. used extensive WGS data in a set of 2,261 Icelanders to identify variants in TREM2. Their genetic analyses found a rare missense mutation in TREM2 (predicted mutation R47H) that was associated with the risk of AD with an OR estimated at 2.09–4.09 (95 % confidence interval), and they replicated this finding in other datasets [18]. A combined genome, exome, and Sanger sequencing approach was used to independently identify the same mutation, associating this heterozygous rare variant in TREM2 with an increased risk of LOAD; a brief letter describing a meta-analysis also reported this finding, with a similar risk (OR 2.65–4.35) [19]. TREM2 had previously been identified as harboring the causative mutation in autosomal recessive Nasu-Hakola disease (polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy), a disease that is distinct from AD. Interestingly, however, one individual with Nasu-Hakola disease had been identified as bearing senile plaques and neurofibrillary tangles long before the genetic source of the disease was known [20]. As with hits from GWASs, these sequencing findings offer clues toward identifying the functional pathways in AD, as TREM2 has been implicated in immune function in the brain. Recently, a TREM-like 2 coding missense variant was identified as potentially playing a protective role in AD, further suggesting that this family of receptors may play crucial roles in AD pathogenesis [21].
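
As an aid to reading the odds ratios and confidence intervals quoted above, the following minimal sketch computes an unadjusted odds ratio and a Wald-type 95 % confidence interval from a 2×2 table of carrier counts. It is not the method of the cited studies, which may have used adjusted or more elaborate models; the function name `odds_ratio_ci` and the counts are hypothetical.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) from a 2x2 table.

    The interval is computed on the log scale with the standard error
    sqrt(1/a + 1/b + 1/c + 1/d); z = 1.96 gives roughly 95 % coverage.
    """
    counts = (case_carriers, case_noncarriers,
              control_carriers, control_noncarriers)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    std_err = math.sqrt(sum(1.0 / n for n in counts))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * std_err)
    upper = math.exp(log_or + z * std_err)
    return odds_ratio, lower, upper


# Hypothetical carrier counts for a rare variant (illustrative only).
or_est, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_est:.2f}, 95 % CI {ci_low:.2f} to {ci_high:.2f}")
```

With a rare variant, the carrier cells are small, so the standard error is large and the interval is wide; an interval that excludes 1 indicates an association at roughly the 5 % level under these simplifying assumptions.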

PLD3

WES can be particularly effective at revealing low-frequency coding variations with relatively large effects on AD risk. PLD3 (phospholipase D family, member 3) has been identified using a WES approach in 14 large LOAD families [22]. Unlike GWAS methods, this approach by definition directly identifies the mutations linked to risk. In this case, PLD3 is an especially intriguing finding, as it has been linked to APP processing, and the risk associated with these variants is up to twofold.

APP

A major familial form of EOAD is known to be caused by APP mutations, but recently an allele of APP was found to be protective. In a study interrogating large Icelandic WGS data [23], a predicted A673T change in APP reduced both the risk of AD and of cognitive decline in older subjects without AD. This mutation may function by reducing beta cleavage in APP, as in vivo studies revealed a 40 % decline in amyloidogenic peptides. From a genetic perspective, this study illustrates how WGS can lead to further studies that uncover surprising new genetic influences on AD.

NOTCH3

At a much smaller scale than the studies of APP, one study used WES to identify an AD-related mutation in NOTCH3. This mutation had previously been associated with the cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) phenotype but not with AD [24].

Copy Number Variants

Copy number variants (CNVs) represent a final source of genetic variation that may be associated with the risk of AD. An initial study identified variation in APP CNVs as underlying familial AD [25]. In a follow-up study of ten families, ten novel private CNVs segregated with EOAD (or families with mixed EOAD/LOAD onset types). The CNVs were noted to be in gene-rich areas, suggesting that these areas may be linked with AD pathogenesis [26]. However, on the whole, this type of genetic variation has not been consistently associated with AD.

Conclusion: Perspective on Pathways

The findings from large-scale GWASs, endophenotype-based GWASs, and studies that use WES and WGS collectively suggest that, apart from a few relatively rare though biologically highly informative causes of AD, much of the variance accounting for genetic risk in AD occurs through common variants that individually contribute modest amounts of risk. This phenomenon has also been observed in other complex genetic diseases that are characterized by both substantial genetic contributions and environmental factors. Perhaps the most promising feature of these early days of “big data” AD genetic research is that the genetic findings discussed here are gradually illuminating the biochemical pathways underlying the pathogenesis of AD; as more genes are identified through large-scale GWASs, WES/WGS, and novel combinations of these methodologies, a picture of the pathophysiology of AD is coming into sharper focus. Although the total genetic risk accounted for by the markers implicated in AD since the discovery of APOE continues to be relatively small, the discovery of these markers has great potential for understanding the disease. Indeed, with the increasing profusion of genetic markers that can reasonably be associated with particular genes, it has become possible to begin placing these markers into potential pathways, which in turn might be targeted for novel AD treatments. And, as we have noted, while the genetic risk attributable to any given locus may be small, treatments targeting the pathways implicated by these loci may have much larger therapeutic effects.