Current Neurology and Neuroscience Reports

Volume 11, Issue 3, pp 246–253

Alzheimer’s Genetics in the GWAS Era: A Continuing Story of ‘Replications and Refutations’


Department of Vertebrate Genomics, Neuropsychiatric Genetics Group, Max-Planck Institute for Molecular Genetics

DOI: 10.1007/s11910-011-0193-z

Cite this article as:
Bertram, L. Curr Neurol Neurosci Rep (2011) 11: 246. doi:10.1007/s11910-011-0193-z


After a decade of intensive investigation but only a few replicable results, Alzheimer’s disease (AD) genetics research is slowly picking up pace. This is mostly owing to the completion of several genome-wide association studies (GWAS), which have suggested the existence of over three dozen potential new AD susceptibility genes. Although only a handful of these have been confirmed in independent replication efforts to date, this success rate is still much higher than in the pre-GWAS era. This review provides a brief summary of the principal methodologic advances in genetics research of the past decade, followed by a description of the most compelling findings that these advances have unearthed in AD. The paper closes with a discussion of the persistent methodologic difficulties and challenges and an outlook on what we can expect to gain from the next 10 years of AD genetics research.


Keywords: Alzheimer’s disease · Causal genes · Risk genes · Susceptibility factors · Genome-wide association study · Meta-analysis · Complex genetics


Ten years ago, I co-authored a review article in this journal on the status of Alzheimer’s disease (AD) genetics research [1]. At the time, a “novel AD gene” was proclaimed almost by the month, but subsequent independent replication efforts essentially always failed to support the initial findings. Accordingly, our review was somewhat heretically subtitled “Of Replications and Refutations,” and provided an update on the then most promising genetics findings in AD. Apart from rare and highly penetrant disease-causing mutations in APP (amyloid precursor protein; located on chromosome 21q21.3), PSEN1 (presenilin 1; on 14q24.2), and PSEN2 (presenilin 2; on 1q42.13) leading to early-onset Mendelian forms of AD, the hottest leads in 2001 revolved around genetic linkage findings on chromosomes 10 and 12, each containing a number of interesting putative susceptibility genes. In addition, over three dozen loci on other chromosomes were discussed as potential AD risk factors.

As we now know from systematic evaluations of the field as a whole, about 450 genetic association studies had been published by the end of 2001, about one third of which claimed to have found significant association between AD risk and at least one of approximately 150 potential candidate genes. However, only nine of these showed evidence for significant association based on meta-analysis of the data then available (re-analysis based on [2]). In the decade since, the body of genetic evidence, as judged by the number of publications, has more than tripled (currently nearly 1,400 studies), giving rise to no fewer than 40 candidate AD loci that currently show at least nominally significant risk effects in meta-analyses combining all available published data (see the AlzGene database [2] for an up-to-date status of the field).

These numbers mean little without considering the scope and quality of the underlying studies. And it is precisely the latter that has seen some drastic changes since the first writing of our status report. In this review, I will provide a brief summary of the principal methodologic advances in genetics research of the past decade, followed by a description of the most compelling findings that these advances have brought about in AD. The review closes with a discussion of the persistent difficulties and challenges and what we can expect from the next 10 years of AD genetics research.

Strategies to Identify Novel Genes in a Complex Trait

AD is a classic “genetically complex disease.” This term describes a situation in which the same clinical syndrome is induced by a multitude of genetic factors that either cause or modify predisposition for the disease in question. Although the genetic underpinnings of some forms of AD can be attributed to a single variant in the DNA sequence of affected individuals (ie, “early-onset familial” or “Mendelian” AD), the vast majority of AD cases do not carry such disease-causing mutations. Instead, these forms (usually summarized as “late-onset” or “non-Mendelian” AD) are believed to be the result of multiple genetic and nongenetic factors that, in concert, significantly alter the risk of developing neurodegeneration and eventually AD. There is now unequivocal evidence that genes play a (more appropriately: the) significant role in this scenario. For a more detailed discussion of the data supporting this claim, the reader is referred to the most recent and largest twin study on the topic [3], which suggested that the heritability of AD (ie, the proportion of the variation in disease susceptibility in a given population that is due to genetic variation) may be between 60% and 80%. These numbers are in line with earlier estimates from much smaller samples [4].
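Twin-based heritability estimates such as those quoted above rest on comparing monozygotic (MZ) with dizygotic (DZ) twin pairs. As a minimal illustration, the sketch below applies Falconer’s classic approximation, which splits the variance in liability into additive genetic (a²), shared environmental (c²), and unique environmental (e²) components using only the two twin correlations. Large twin studies such as [3] typically rely on formal model fitting rather than this shortcut, and for a binary outcome like AD the correlations must refer to the underlying liability scale; the correlation values in the example are hypothetical and are not taken from [3].

```python
def falconer_ace(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Falconer-style ACE decomposition from twin correlations.

    r_mz, r_dz: liability-scale correlations within MZ and DZ twin pairs.
    Returns (a2, c2, e2): additive genetic, shared-environment and
    unique-environment proportions of variance. Assumes equal environments
    for MZ and DZ pairs and purely additive genetic effects.
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    a2 = 2.0 * (r_mz - r_dz)   # MZ twins share ~100%, DZ twins ~50% of segregating alleles
    c2 = 2.0 * r_dz - r_mz     # twin resemblance not explained by shared genes
    e2 = 1.0 - r_mz            # whatever makes MZ co-twins differ
    return a2, c2, e2


# Hypothetical liability correlations (illustrative only).
a2, c2, e2 = falconer_ace(r_mz=0.78, r_dz=0.42)
print(f"h2 ~ {a2:.2f}, c2 ~ {c2:.2f}, e2 ~ {e2:.2f}")
```

The three components always sum to 1 by construction; estimates outside [0, 1] in real data usually signal that the simple additive, equal-environment assumptions do not hold.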

Finding Disease-Causing Mutations Via DNA Sequencing

Once a genetic contribution to disease susceptibility is suspected or proven, the hunt for the underlying disease genes can commence. There are two overarching strategies that differ based on the individuals under scrutiny (Fig. 1): if a specific disease is “running in the family” and DNA is available from a sufficient number of affected and unaffected relatives, the most efficient strategy is to single out chromosomal regions that are co-inherited with the disease, a strategy referred to as genetic linkage analysis. Once significant linkage intervals have been identified, the underlying disease-related DNA variants can be sought. Before completion of the Human Genome Project [5], this often required identifying the genes localized under the linkage peak (usually referred to as “positional cloning”) before attempting to identify the functional mutations. This strategy was highly successful for a large number of Mendelian diseases, including many common neurodegenerative disorders [6]. In AD, linkage analysis followed by positional cloning led to the identification of APP, PSEN1, and PSEN2 as disease genes [7] (for an up-to-date overview of causal mutations in these and other neurodegenerative disease genes, see the Alzheimer Disease & Frontotemporal Dementia Mutation Database).
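Evidence for linkage is conventionally summarized as a LOD score: the base-10 logarithm of the likelihood of the observed meioses if the marker and disease locus are linked at recombination fraction θ, divided by the likelihood under no linkage (θ = 0.5). A maximum LOD of 3 or more is the traditional threshold for significant linkage. The sketch below assumes the simplest possible case, phase-known and fully informative meioses that can each be scored as recombinant or non-recombinant; real pedigree analyses must also handle unknown phase, missing genotypes, and reduced penetrance, which this toy function ignores. The function names and counts are hypothetical.

```python
import math


def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return -math.inf  # a single recombinant excludes complete linkage
    log_l_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_l_linked += recombinants * math.log10(theta)
    log_l_unlinked = (recombinants + nonrecombinants) * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, nonrecombinants: int) -> tuple[float, float]:
    """Maximize the LOD over a grid of theta values in [0, 0.5] (step 0.001)."""
    grid = [i / 1000 for i in range(501)]
    best_lod, best_theta = max((lod(t, recombinants, nonrecombinants), t) for t in grid)
    return best_lod, best_theta


# Hypothetical family data: 2 recombinants among 20 informative meioses.
score, theta_hat = max_lod(recombinants=2, nonrecombinants=18)
print(f"max LOD = {score:.2f} at theta = {theta_hat:.3f}")
```

With 2 recombinants in 20 meioses the maximum falls at θ = 0.1 with a LOD of about 3.2, just over the conventional threshold; the same counts spread over fewer meioses would not reach it, which is why linkage studies need sufficiently large families.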
Fig. 1

Flow chart of the strategies used to identify novel disease genes. This simplified schema outlines popular strategies to identify mutations and polymorphisms causing or predisposing to disease. Depending on the observed or suspected mode of inheritance, the search for disease-related sequence variants typically involves mutation screenings (Mendelian forms) or association analyses (non-Mendelian or sporadic forms). “Focused” in this context usually entails the study of certain gene(s) (usually considered “candidate genes” based on evidence from functional experiments or genetic linkage data), whereas “systematic” refers to large-scale approaches unbiased with respect to the nature of the underlying genes. The latter can be achieved by applying high-throughput sequencing (next-generation sequencing [NGS]; eg, analyzing the entire exome or genome) or genotyping (genome-wide association study [GWAS]) technologies. Initial findings are followed up depending on the original design, and are subsequently (or simultaneously) subjected to functional characterization. Broken lines indicate “short cuts” defining novel disease genes based on the genetic evidence alone, which is the case for most of the currently known susceptibility genes in the neurodegenerative disease field. Note that there are examples of genes/mutations with reduced penetrance (left-hand red box) that are considered bona fide disease genes (eg, certain mutations in PSEN1 in Alzheimer’s disease). (From Lill et al. [25]; with permission)

Fortunately, the advent of an essentially complete map of the human genome as a result of the Human Genome Project eliminates the need for the “positional cloning” step, and enables researchers to plan and prioritize their gene-finding strategy in silico. The most efficient (but also most expensive) approach makes use of recent advances in high-throughput sequencing technologies that now allow researchers to determine the sequence of an individual’s whole genome at base-pair resolution in one experiment. These developments were made possible through massive parallelization and miniaturization of the underlying sequencing reactions (for a recent review see [8]). Although generating large-scale sequence data with these new and powerful technologies is relatively straightforward, the bioinformatic management and interpretation of these data are not. This is partly due to the sheer amount of information created. For instance, a single human genome consists of approximately 3.2 billion base-pairs (3.2 Gbp), each of which needs to be covered at least 15- to 20-fold to confidently differentiate between the wild-type allele and a mutation, yielding a minimum of roughly 50 Gbp of sequence per DNA sample per experiment. The other, even more problematic aspect is that rare and potentially “functional” DNA sequence variants occur at much higher frequency in the general population than anticipated, albeit without any obvious detrimental effect [9••]. This means that not every amino-acid changing nucleotide substitution found to cosegregate with disease in a given family automatically represents the underlying disease-causing variant. Notwithstanding these difficulties, massively parallel sequencing has already been successfully applied in the identification of disease-causing mutations in a number of Mendelian disorders [10•, 11•], including neurodegenerative diseases [12•].
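The sequencing yield quoted above follows from simple arithmetic: required raw output ≈ genome size × mean depth, so 3.2 Gbp at 15- to 20-fold coverage gives roughly 48 to 64 Gbp. Because reads land unevenly, a given mean depth does not guarantee that every base reaches it. The sketch below therefore also uses the idealized Poisson (Lander–Waterman) model to estimate the fraction of positions that fall below a chosen minimum depth. The 10-read minimum in the example is a hypothetical calling threshold, and real sequencing data are typically less uniform than the Poisson model assumes.

```python
import math

GENOME_SIZE_BP = 3.2e9  # approximate human genome size quoted in the text


def raw_yield_gbp(mean_depth: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Sequence output (in Gbp) needed to reach a given mean depth."""
    return mean_depth * genome_size_bp / 1e9


def fraction_below_depth(min_depth: int, mean_depth: float) -> float:
    """Poisson estimate of the fraction of positions covered by < min_depth reads."""
    term = math.exp(-mean_depth)  # P(depth == 0)
    total = 0.0
    for k in range(min_depth):
        total += term                 # add P(depth == k)
        term *= mean_depth / (k + 1)  # advance to P(depth == k + 1)
    return total


for depth in (15, 20):
    print(f"{depth}x: {raw_yield_gbp(depth):.0f} Gbp, "
          f"{fraction_below_depth(10, depth):.2%} of positions below 10 reads")
```

Even under this optimistic model, a few percent of positions remain under-covered at 15-fold, which is one reason why the figures in the text are described as minimum requirements.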

Finding Genetic Risk Factors Via Genome-Wide Association Screening

The majority of today’s genetics projects are not aimed at the identification of rare, disease-causing mutations, but instead at finding common DNA variants (ie, “polymorphisms”) that can modify susceptibility to high-prevalence disorders, such as AD. Currently, the method of choice in terms of efficiency, cost-effectiveness, and subsequent replication success is to determine the genotypes of hundreds of thousands of single nucleotide polymorphisms (SNPs; the simplest type of polymorphism) on a genome-wide scale (eg, in affected cases and healthy controls). This is followed by statistical analyses, which probe for significant differences in allele or genotype frequencies between cases and controls. Studies using this strategy are most commonly referred to as genome-wide association studies (GWAS). In addition to not being limited to some predefined set of candidate genes, the GWAS approach has several other advantages that increase the validity of the emerging results, eg, the ability to adjust for otherwise difficult-to-detect population substructure, to perform in silico fine-mapping based on genotype imputation, and to serve as a “replication engine” for proposed associations from other datasets without having to perform additional experiments. Conversely, owing to the very large number of markers tested in a typical GWAS, many SNPs are bound to show significant effects by chance alone, which has to be taken into account when reporting and interpreting GWAS results. One significance threshold commonly used to declare “genome-wide significance” in this setting is an α level of ≤5 × 10⁻⁸ [13].
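As a concrete illustration of the case-control comparison described above, the sketch below runs a basic allelic test for a single SNP: allele counts in cases and controls form a 2 × 2 table, a Pearson chi-square statistic with one degree of freedom yields a P value, and the allelic odds ratio summarizes the effect size. It also encodes the commonly cited rationale for the 5 × 10⁻⁸ threshold, a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. The counts are hypothetical, and real GWAS pipelines usually fit logistic regression with covariates (eg, principal components to capture population substructure) rather than this bare test.

```python
import math

# Bonferroni correction of alpha = 0.05 for ~1 million independent tests (= 5 x 10^-8).
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def allelic_test(case_alleles: tuple[int, int], control_alleles: tuple[int, int]):
    """Allelic 2x2 chi-square test (1 df, no continuity correction).

    Each argument is (count of tested allele, count of other allele).
    Returns (chi2, p_value, odds_ratio) for the tested allele.
    """
    a, b = case_alleles
    c, d = control_alleles
    if min(a, b, c, d) <= 0:
        raise ValueError("all four allele counts must be positive")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each).
chi2, p, odds = allelic_test((1200, 2800), (900, 3100))
print(f"chi2 = {chi2:.1f}, P = {p:.1e}, OR = {odds:.2f}, "
      f"genome-wide significant: {p <= GENOME_WIDE_ALPHA}")
```

The erfc expression works because a 1-df chi-square variable is the square of a standard normal, so its upper tail equals the two-sided normal tail at √χ².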

Although GWAS have significantly reshaped the genetic landscape of many common human diseases over the past few years, GWAS-based association findings can still be false. In AD, over 40 GWAS “hits” have been proposed since 2007 (Table 1), but probably fewer than 20% of these appear to be real upon evaluation in independent samples. There are a number of potential reasons for this failure to replicate GWAS findings in AD, the most important being random error aggravated by lack of power due to insufficient sample size. This was already the main reason for the long list of refuted association findings in the candidate gene era, just on a much smaller scale. The solution, as before, is to use sufficiently large sample sizes in the initial screening and to rigorously replicate significant or suggestive findings in independent datasets. Such datasets can either be genotyped de novo for the putative disease-associated variants, or the relevant genotypes can be extracted directly from existing independent GWAS datasets without the need to perform any additional laboratory experiments. The most efficient approach is to combine multiple independent GWAS datasets by meta-analysis, an approach that has led to some significant novel discoveries in other neurodegenerative diseases [14, 15], and that is now catching on in the AD field as well.
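Combining datasets by meta-analysis is often done with a fixed-effect, inverse-variance model: each study’s log odds ratio is weighted by the inverse of its squared standard error, so larger and more precise studies count more, and the pooled estimate has a smaller standard error than any single input. The sketch below implements that model, together with Cochran’s Q as a basic heterogeneity check. The per-study estimates are hypothetical, and published meta-analyses often add random-effects models or other refinements not shown here.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.959964) -> float:
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def fixed_effect_meta(studies: list[tuple[float, float]]) -> dict[str, float]:
    """Inverse-variance fixed-effect meta-analysis.

    studies: (log_odds_ratio, standard_error) for each independent dataset.
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total_w = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, studies)) / total_w
    pooled_se = 1.0 / math.sqrt(total_w)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, studies))
    return {"or": math.exp(pooled), "log_or": pooled, "se": pooled_se,
            "p": p, "cochran_q": q}


# Hypothetical per-study results reported as OR (95% CI).
reported = [(1.18, 1.02, 1.37), (1.10, 0.97, 1.25), (1.15, 1.05, 1.26)]
studies = [(math.log(or_), se_from_ci(lo, hi)) for or_, lo, hi in reported]
result = fixed_effect_meta(studies)
print({k: round(v, 4) for k, v in result.items()})
```

Note that one of the hypothetical studies is not significant on its own (its CI spans 1), yet it still contributes weight to the pooled estimate; this is the gain in power that motivates combining GWAS datasets.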
Table 1

Overview of all published GWAS in AD

Study | Sample origin | SNPs (n) | Samples in GWAS (n) (AD and CTRL) | Samples in follow-up (n) (AD and CTRL) | “Featured” genes
Grupe et al. [37] | USA and UK |  |  |  | APOE, ACAN, BCR, CTSS, EBF3, FAM63Aᵉ, GALP, GWA_14q32.13, GWA_7p15.2, LMNA, LOC651924, MYH13, PCK1, PGBD1, TNK1, TRAK2, UBD
Coon et al. [38] | USA, Netherlandsᵃ |  |  |  | 
Reiman et al. [39] |  |  |  |  | 
Li et al. [40] | Canada and UK |  |  |  | APOE, GOLM1, GWA_15q21.2, GWA_9p24.3
Poduslo et al. [41] |  |  |  |  | 
Abraham et al. [42] |  |  |  |  | 
Bertram et al. [43] |  |  |  |  | APOE, ATXN1, CD33, GWA_14q31
Beecham et al. [44] |  |  |  |  | 
Carrasquillo et al. [45] |  |  |  |  | 
Lambert et al. [46•] |  |  |  |  | 
Harold et al. [30•] | USA and Europeᵇ,ᵈ |  |  |  | 
Heinzen et al. [36] (CNV) |  |  |  |  | 
Potkin et al. [47] |  |  |  |  | 
Seshadri et al. [48] | Europe and USAᵃ,ᵇ,ᵈ |  |  |  | 
Naj et al. [49] | USA and Europeᵃ,ᶜ,ᵈ |  |  |  | 

Modified after content on the AlzGene website (current on February 1, 2011). Studies are listed in order of publication date (determined by PubMed ID number). “Featured genes” are those genes/loci that were declared as “associated” in the original publication, but note that criteria for declaring association may vary across studies; in many studies, surrogate markers were used for APOE. “Samples in GWAS” refers to sample sizes used in initial GWAS analyses, whereas “Samples in follow-up” refers to sample sizes in follow-up datasets (where applicable); please consult the AlzGene website for more details on these studies.

a, b, c, d Identical letters indicate sample overlap across studies

e This locus was originally named “THEM5”

AD Alzheimer’s disease; ADNI Alzheimer’s Disease Neuroimaging Initiative; CNV copy number variants; CTRL controls; GWAS genome-wide association studies; ng not given; SNPs single nucleotide polymorphisms

How Have the Past 10 Years Advanced AD Genetics Research?

To date, the results of 15 GWAS have been published in the field of AD. Overall, these studies have highlighted more than 40 different loci as potential AD susceptibility modifiers (Table 1). This number will likely increase further once additional datasets are screened and/or the existing GWAS datasets are merged and meta-analyzed, as this will increase the power to detect the very small effect sizes typical of genetically complex diseases. Interestingly, the one overarching common finding emerging from all published AD GWAS to date is the highly significant association between increased AD risk and the presence of the apolipoprotein E (APOE) ε4 allele. Of course, the APOE association with AD resulted from conventional candidate gene analyses performed in the early 1990s [16], and was already the single most outstanding finding of our AD genetics review in this journal 10 years ago [1]. Of the more than 40 loci implicated in addition to APOE by GWAS, only four currently show sufficient evidence to consider them real (ie, BIN1 [bridging integrator 1], CLU [clusterin], CR1 [complement component (3b/4b) receptor 1], and PICALM [phosphatidylinositol-binding clathrin assembly protein]; Table 2). A few others show at least some evidence for replication in independent follow-up studies (eg, CD33 [siglec-3], GAB2 [GRB2-associated binding protein 2], or GWA_14q32 [not yet assigned to a specific gene]), albeit currently without association at genome-wide significance in meta-analyses (see the AlzGene database for up-to-date results). For reviews of the potential pathomechanistic implications of these genes, the reader is referred to Table 2 and the existing literature [17–22]. The remaining approximately 30 GWAS findings, however, stand only a poor chance of representing genuine associations with AD risk.
Table 2

Established susceptibility genes for non-Mendelian forms of Alzheimer’s disease
Gene | Name | OR (95% CI)ᵇ | P value | Proposed molecular effects/pathogenic relevanceᵈ
APOE | Apolipoprotein E | 3.685 (3.30–4.12) | 1.0 × 10⁻¹¹⁸ | Aggregation and clearance of Aβ; cholesterol metabolism
BIN1ᵃ | Bridging integrator 1 | 1.151 (1.10–1.20) | 3.0 × 10⁻¹⁰ | Production and clearance of Aβ
CLUᵃ | Clusterin | 0.875 (0.86–0.90) | 2.4 × 10⁻²³ | Aggregation and clearance of Aβ; inflammation
CR1ᵃ | Complement component (3b/4b) receptor 1 | 1.159 (1.10–1.22) | 2.7 × 10⁻⁸ | Clearance of Aβ; inflammation
PICALMᵃ | Phosphatidylinositol-binding clathrin assembly protein | 0.879 (0.85–0.91) | 1.1 × 10⁻¹⁶ | Production and clearance of Aβ; synaptic transmission

Only genes currently showing evidence for genome-wide significant association (P value ≤1 × 10⁻⁷) are displayed. For an up-to-date overview of these and other potential susceptibility genes, see the AlzGene database.

a Indicates genes/loci originally identified by GWAS

b Based on information available on AlzGene (current on February 1, 2011)

c Calculated as described in [2] using the ORs and MAFs in this table; note that these PAF estimates only represent rough approximations of the true values, which are impossible to calculate from ascertained observational samples

d Selection of proposed effects; note that the functional evidence for these loci is often scarce

GWAS genome-wide association studies; MAF minor allele frequency; OR odds ratio; PAF population attributable fraction
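Footnote c above notes that the PAF values were approximated from ORs and allele frequencies. As a rough illustration of how such an approximation can work, the sketch below uses Levin’s formula, PAF = p(OR − 1) / [1 + p(OR − 1)], treating the OR as a stand-in for the relative risk and the risk-allele frequency p as the exposure frequency; the exact procedure described in [2] may differ. Two entries in this table (CLU and PICALM) have ORs below 1, ie, the counted allele is protective, so a hypothetical helper first re-expresses such effects for the opposite, risk-increasing allele. The allele frequency and OR in the example are hypothetical and do not correspond to any locus in the table.

```python
def orient_to_risk_allele(allele_freq: float, odds_ratio: float) -> tuple[float, float]:
    """Express an association in terms of the risk-increasing allele.

    If the counted allele is protective (OR < 1), switch to the other allele:
    its frequency is 1 - allele_freq and its per-allele OR is 1 / odds_ratio.
    """
    if not 0.0 < allele_freq < 1.0 or odds_ratio <= 0.0:
        raise ValueError("need 0 < allele_freq < 1 and odds_ratio > 0")
    if odds_ratio < 1.0:
        return 1.0 - allele_freq, 1.0 / odds_ratio
    return allele_freq, odds_ratio


def levin_paf(risk_freq: float, odds_ratio: float) -> float:
    """Levin's population attributable fraction, using the OR as a proxy for the relative risk."""
    excess = risk_freq * (odds_ratio - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: a counted allele with frequency 0.25 and OR 0.85.
freq, or_risk = orient_to_risk_allele(0.25, 0.85)
print(f"risk allele freq = {freq:.2f}, OR = {or_risk:.3f}, PAF ~ {levin_paf(freq, or_risk):.1%}")
```

Because a small per-allele OR is multiplied by a large allele frequency, even a weak common risk allele can yield a non-trivial PAF, which is the pattern the text describes for the non-APOE loci.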

More interesting than APOE remaining the single most important risk factor for late-onset AD is the fact that its association with AD still explains more of the population attributable risk than all current non-APOE GWAS findings together (Table 2). Although this situation will likely change with the identification of additional GWAS loci, this comparison already highlights one important point: complex genetics is characterized not only by a great deal of locus heterogeneity and difficult-to-model inheritance and interaction patterns, but also by very small effect sizes exerted by the individual risk alleles. In retrospect, APOE must be regarded as the proverbial exception to this rule, which seems to apply to the majority of genetically complex diseases [23]. For instance, one copy of the APOE ε4 allele increases the risk for AD more than threefold (OR ≈ 3.7; Table 2). One copy of the next best non-APOE allele (ie, rs11136000 in CLU) merely increases the risk for AD by approximately 14%. Although on a population level this association may still account for up to 6.5% of the attributable fraction of AD risk (Table 2) and may therefore add valuable insight to a better understanding of AD pathophysiology, this information is essentially meaningless on the level of the individual carrying the “risk allele.” It remains to be seen whether or not the genetics of non-Mendelian AD follows a pattern similar to that of schizophrenia, where literally tens of thousands of common polymorphisms appear to determine the extremely polygenic architecture of this neuropsychiatric disorder, with each risk allele exerting only minuscule effects (ie, risk increases ≤5% per allele) [24••]. This situation would be bad news for AD geneticists, as effect sizes that small are extremely difficult to distinguish from statistical noise unless very large samples (ie, >100,000 combined cases and controls) are tested. Proving a functional basis for such weakly associated alleles will be even more difficult with current-day biochemical and molecular assays.
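The sample sizes mentioned above can be motivated with a standard power approximation for an allelic case-control test. The sketch below converts a per-allele OR and a control risk-allele frequency into the expected case allele frequency, then applies the normal-approximation formula for comparing two proportions at a genome-wide α of 5 × 10⁻⁸ with 80% power. It assumes equal numbers of cases and controls, a multiplicative per-allele model, Hardy–Weinberg equilibrium, and that the causal variant itself is genotyped (imperfect tagging would inflate the numbers further); the allele frequency and ORs in the example are hypothetical.

```python
import math
from statistics import NormalDist


def cases_needed(risk_freq: float, odds_ratio: float,
                 alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate number of cases (plus an equal number of controls) for an allelic test."""
    p0 = risk_freq                                       # risk-allele frequency in controls
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)  # case frequency implied by the OR
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    alleles = ((z_alpha + z_beta) ** 2
               * (p0 * (1.0 - p0) + p1 * (1.0 - p1)) / (p1 - p0) ** 2)
    return math.ceil(alleles / 2.0)                      # two alleles per person


for or_ in (1.15, 1.05):
    n = cases_needed(risk_freq=0.3, odds_ratio=or_)
    print(f"OR {or_}: ~{n:,} cases + {n:,} controls = {2 * n:,} in total")
```

Under these assumptions an OR of 1.15 requires on the order of 10,000 cases, whereas an OR of 1.05 pushes the total sample well beyond 100,000, in line with the scale quoted in the text.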

What else have the past 10 years of AD genetics research brought to the table other than a handful of new GWAS findings that appear to be real? To date, surprisingly little. This is remarkably different for other major neurodegenerative diseases such as Parkinson’s disease (PD), frontotemporal dementia (FTD), or amyotrophic lateral sclerosis (ALS). Like AD, all of these are characterized by a dichotomy of rare Mendelian versus more common non-Mendelian disease forms. The Mendelian forms in particular have seen some major discoveries over the past decade (eg, the identification of disease-causing mutations in LRRK2 [leucine-rich repeat kinase 2], DJ-1 [Parkinson disease {autosomal recessive, early onset} 7], and PINK1 [PTEN induced putative kinase 1] in PD; CHMP2B [chromatin modifying protein 2B], GRN [granulin], and VCP [valosin containing protein] in FTD; and ANG [angiogenin, ribonuclease, RNase A family, 5], FUS [fused in sarcoma], SETX [senataxin], and TARDBP [TAR DNA binding protein] in ALS; all reviewed in [25]). In AD, on the other hand, no additional Mendelian AD genes have been found since the discovery of AD-causing mutations in PSEN2 in 1995 [26]. Does this mean that no additional early-onset AD genes exist? This appears unlikely, given that mutations in APP, PSEN1, or PSEN2 likely do not account for all cases of Mendelian AD [27, 28]. Identifying these still elusive AD genes will require more systematic efforts (eg, based on high-throughput massively parallel sequencing).

Another interesting observation is that in AD there appears to be no overlap between the genes driving Mendelian versus non-Mendelian forms of the disease. That is, common polymorphisms in APP, PSEN1, and PSEN2 do not seem to contribute to risk for late-onset AD. This, again, is different from other neurodegenerative disorders like PD or FTD, where the major familial disease genes also represent the top susceptibility factors for sporadic forms of the disease (eg, SNCA [α-synuclein] and LRRK2 in PD [15], and GRN in FTD [29]). From a pathomechanistic point of view, it makes sense that common variants in genes known to cause a disease when highly dysfunctional (eg, owing to amino-acid changing or frameshift mutations) can also cause the same clinical syndrome at later onset ages when the underlying functional effect is more subtle (eg, changes in gene expression due to polymorphisms in regulatory regions). Based on current data, this does not appear to be the case in AD, at least not obviously so. This observation is likely not due to a lack of genotyping data in the respective genes. Overall, more than 80 studies have tested for a potential association between APP, PSEN1, and PSEN2 and AD risk during the past two decades, and several have reported significant effects. However, AlzGene meta-analyses across all available genotype data currently do not support a role of these genes as AD susceptibility factors. In addition, none of the GWAS published to date have reported association signals near these three genes. It remains to be seen whether this conclusion continues to hold once improved genotype imputation techniques allow a more complete investigation of these loci in the GWAS setting.

What is in Store for the Next 10 Years of AD Genetics Research?

Much of the variance of non-Mendelian AD remains unexplained by the currently known susceptibility genes. If this is not owing to a schizophrenia-like polygenic architecture, in which any single small-effect risk factor will be difficult to identify and validate with any type of genetics technology, the underlying and still elusive AD risk alleles should be discoverable. In this context it needs to be emphasized that little is currently known about the genetic and biochemical mechanisms underlying even the most consistently associated GWAS loci. This is because current GWAS microarrays usually do not assay functional polymorphisms (ie, DNA sequence changes directly invoking a change in gene/protein expression or function), but instead are designed to capture as much of the “common variation” as possible. For instance, the lead AD GWAS signal on chromosome 11q14 is elicited by SNP rs3851179, which maps approximately 90,000 bp upstream (ie, 5′) of the PICALM gene [30•]. In all likelihood, this SNP therefore has no direct impact on AD pathogenesis but merely “tags” (ie, serves as a proxy for) the actual underlying functional variant. This variant may be located in PICALM, which represents the nearest known open reading frame in the region, but could also be located in another gene, or in a noncoding element nearby. Thus, in addition to identifying entirely novel genome-wide association regions, a great deal of effort in genetics research of AD over the next 10 years will focus on resolving the functional basis for the already known GWAS signals. This strategy is sometimes referred to as “fine mapping,” and is similar in concept to the “positional cloning” experiments in Mendelian genetics. Of course, this problem is not restricted to AD but applies to the vast majority of genetically complex disorders for which GWAS signals have been described [23].
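How well a GWAS SNP “tags” a functional variant is usually quantified with the linkage disequilibrium measure r², the squared correlation between the alleles at two loci across haplotypes; r² = 1 means the tag is a perfect proxy, and r² near 0 means it carries essentially no information about the other variant. The sketch below computes the disequilibrium coefficient D and r² from phased haplotypes coded 0/1 at a tag SNP and a candidate functional variant. It assumes phase is already known (in practice it must be estimated), and the toy haplotype counts are hypothetical rather than data for rs3851179 or PICALM.

```python
def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """Linkage disequilibrium r^2 between two biallelic loci.

    Each haplotype is (allele at tag SNP, allele at candidate variant),
    coded 1 for the allele of interest and 0 otherwise.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(t for t, _ in haplotypes) / n       # tag allele frequency
    p_b = sum(v for _, v in haplotypes) / n       # variant allele frequency
    p_ab = sum(t * v for t, v in haplotypes) / n  # frequency of the joint haplotype
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic in the sample")
    return d * d / denom


# Hypothetical samples of 100 haplotypes each.
perfect_tag = [(1, 1)] * 30 + [(0, 0)] * 70
partial_tag = [(1, 1)] * 25 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 65
print(ld_r2(perfect_tag), round(ld_r2(partial_tag), 3))
```

A partial tag with r² well below 1 still yields an association signal, but a diluted one, which is why fine mapping must look beyond the lead SNP for the variant that actually drives the signal.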

The still elusive functional sequence variants underlying association signals could themselves be common (ie, show a minor allele frequency ≥5% in the general population). However, this would likely imply that their biochemical/functional impact is minimal and will be difficult to establish using today’s variant activity assays (if the functional impact were large [eg, as elicited by many amino-acid changing substitutions], then the association should also account for much of the attributable risk, which is not the case for any of the current GWAS signals in AD with the exception of APOE). One popular alternative theory [31, 32] posits that much of the unexplained variance of complex phenotypes may be contributed by rare alleles (ie, those with a minor allele frequency <<5%), possibly of relatively large genetic effect (eg, with odds ratios >2). If this hypothesis is correct, the underlying alleles should eventually be identifiable via re-sequencing (eg, using one of the newer high-throughput technologies, which would allow fine-mapping several large GWAS regions at once or discovering disease-associated rare alleles de novo). One example of a “rare variant association” in the neurodegenerative disease field is the N370S amino-acid substitution in GBA (glucosidase, beta, acid) in PD [33]. This variant, which has a frequency of less than 1% in most Caucasian populations, increases the risk for PD over threefold at genome-wide significance. However, owing to its rarity, this polymorphism cannot be adequately assessed using genotype data derived from today’s GWAS microarrays, and it was originally identified using a “candidate gene” approach.
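Why a rare allele such as GBA N370S is poorly captured by common-SNP arrays can be seen from a standard population-genetics bound: for two biallelic loci with minor allele frequencies p ≤ q, the largest attainable r² is p(1 − q) / [q(1 − p)], reached when every copy of the rarer allele sits on a haplotype that also carries the other locus’s minor allele. The sketch below evaluates this bound; the allele frequencies are illustrative assumptions, with values of 5% or less standing in for rare variants and 30% for a typical common array SNP.

```python
def max_r2(maf_1: float, maf_2: float) -> float:
    """Largest r^2 attainable between two biallelic loci given their minor allele frequencies."""
    p, q = sorted((maf_1, maf_2))  # ensure p <= q
    if not (0.0 < p and q <= 0.5):
        raise ValueError("minor allele frequencies must lie in (0, 0.5]")
    # D is maximal when every copy of the rarer minor allele sits on a haplotype
    # that also carries the other locus's minor allele: D_max = p * (1 - q).
    return p * (1.0 - q) / (q * (1.0 - p))


for rare in (0.05, 0.01, 0.005):
    print(f"MAF {rare:.3f} vs common SNP at 0.30: max r^2 = {max_r2(rare, 0.30):.3f}")
```

For a variant at 1% frequency, even the best-placed common SNP can reach an r² of only about 0.02, so its association signal is almost entirely lost on a common-variant array; this is the quantitative reason re-sequencing is needed to assess such alleles.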

The above discussion has mainly revolved around simple nucleotide changes (eg, SNPs, the type of sequence variant typically assayed on a GWAS microarray). However, regardless of whether they are common or rare in the general population, the underlying sequence variants can also be of a more complex nature (eg, variation in copy number [copy number variants, CNVs] or structural changes of the chromosomal interval in question). Although some of these changes can be assessed to a certain degree on existing GWAS microarrays, a more definitive characterization often requires the application of additional and usually more laborious experimental methods, such as fluorescence in situ hybridization (FISH) or array-based comparative genomic hybridization (CGH). Although rare and highly penetrant CNVs have been described to cause Mendelian forms of both AD (duplications of APP [34]) and PD (duplications and triplications of SNCA [35]), the role of common CNVs in neurodegeneration is less clear. In AD, only one GWAS has thus far assessed the contribution of CNVs to disease risk [36], albeit with no conclusive results.

Finally, in addition to very likely substantially extending our knowledge about non-Mendelian risk factors, AD genetics research over the next 10 years will also re-focus on the search for novel Mendelian AD genes. For the first time in the history of human genetics research, it is now both technically feasible and economically affordable to systematically screen for novel AD-causing mutations at base-pair resolution. As outlined above, the search for novel disease-causing mutations has been successful for a number of other Mendelian disorders over the past decade [10•–12•], and there is no reason to assume that this should not also be the case in AD, provided such additional genes do exist. Although the identification of novel Mendelian disease genes will not make any significant contribution to understanding the incidence of AD on a population-wide level, such discoveries would likely provide invaluable insights into the major pathogenic forces driving neuronal cell death and neurodegeneration in AD, much like APP, PSEN1, and PSEN2 have done in the 1990s, and continue to do across many aspects of today’s basic research in AD.


After a decade of intensive investigation but only a few replicable results, AD genetics research is slowly picking up pace. The last 2 years have yielded nearly half a dozen novel genetic risk factors that can now be considered established. The next 10 years will likely produce additional disease-modifying factors and, thanks to recent advances in high-throughput sequencing technologies, possibly additional disease-causing genes. Once the pathophysiologic basis of these genetic correlations has been established, the way is paved toward translating these findings into clinical applications that hopefully in the not too distant future will allow for an earlier prediction and better therapy of this devastating disease.


This work was sponsored by funding from the Cure Alzheimer Fund, the Michael J. Fox Foundation for Parkinson’s Research, and the German Federal Ministry for Education and Research (BMBF). L. Bertram was financially supported by funds from the Deutsche Forschungsgemeinschaft (DFG).


No potential conflict of interest relevant to this article was reported.

Copyright information

© Springer Science+Business Media, LLC 2011