Genome-wide Association Studies in Alzheimer’s Disease: A Review
- Cite this article as:
- Tosto, G. & Reitz, C. Curr Neurol Neurosci Rep (2013) 13: 381. doi:10.1007/s11910-013-0381-0
Over the past decade, research aiming to disentangle the genetic underpinnings of late-onset Alzheimer’s disease has mostly focused on the identification of common variants through genome-wide association studies. The identification of several new susceptibility genes through these efforts has reinforced the importance of amyloid precursor protein and tau metabolism in the cause of the disease and has also implicated immune response, inflammation, lipid metabolism, endocytosis/intracellular trafficking, and cell migration. Ongoing and future large-scale genome-wide association studies, translational studies, and next-generation whole genome or whole exome sequencing efforts hold the promise to map the specific causative variants in these genes, to identify several additional risk variants, including rare and structural variants, and to identify novel targets for genetic testing, prevention, and treatment.
Keywords: Alzheimer’s disease; Genetics; Gene; Variation; Polymorphism; Genome-wide association study; Sequencing
Late-onset Alzheimer’s disease (LOAD) typically begins with the onset of symptoms after the age of 60 years and evolves slowly from mildly impaired memory to severe cognitive loss. At death, the most frequent pathological manifestations in the brain include extracellular β-amyloid protein (Aβ) in diffuse and neuritic plaques and intracellular deposits of hyperphosphorylated tau protein, a microtubule assembly protein, in the form of neurofibrillary tangles. Widespread loss of both neurons and synapses also occurs.
An estimated 4.5 million Americans have LOAD. The annual incidence of LOAD increases from 1 % at the age of 60–70 years to 10–30 % at 85 years and older. As the US population ages, it is expected that the number of LOAD cases will increase to 16 million to 20 million by 2050, with one in 45 Americans affected [3, 4]. A critical barrier to lessening the impact of this disease is the limited development of drugs to prevent or treat LOAD, which is mostly attributable to incomplete characterization of the basic underlying pathologic mechanisms. Determining which genes and gene networks contribute to LOAD risk would reveal basic pathogenic mechanisms, highlight key proteins and pathways for drug development (“druggable targets”), and inform the development of genetic testing methods for identifying those at greatest risk of LOAD when preventive measures become available.
In recent years, the genetic analysis of LOAD has focused on identification of common variants through genome-wide association studies (GWAS) and has identified several novel susceptibility genes implicating specific pathways in the disease. This article reviews these studies, discusses their potential and limitations, and provides suggestions for future research.
Data Source and Study Selection
The primary sources of the studies addressed in this review were full-text articles and abstracts published in English in the PubMed database between 2010 and February 2013. The keywords used for searching PubMed were “dementia,” “Alzheimer’s disease,” “gene,” “genetics,” “epigenetics,” “endophenotype,” and “genome-wide association study.” The abstracts retrieved were read to identify studies addressing the topics included in this review. We also performed a manual search of references cited in published articles. The studies were read in their entirety to assess their appropriateness for inclusion in this article.
Genetic Epidemiology of LOAD
A family history of dementia is one of the most important risk factors for LOAD [5, 6]. Families multiply affected by LOAD are at increased risk of dementia, but the distribution of secondary cases is not consistent with Mendelian inheritance. LOAD is more frequent among monozygotic twins than dizygotic twins [7, 8, 9], and first-degree relatives of patients with LOAD have approximately twice the expected lifetime risk of developing the disease. Heritabilities of 58–79 % for LOAD indicate that in spite of progress made in identifying the underpinnings of the disease, a substantial fraction of LOAD is attributable to unknown genetic factors.
Apolipoprotein E Region
For more than a decade, only one genetic risk factor, the APOE ε4 allele, located on chromosome band 19q13, was an unequivocally established “susceptibility” gene in non-Hispanic Whites of European ancestry. Apolipoprotein E (ApoE) is a lipid-binding protein and is expressed in humans as three common isoforms coded for by three alleles, ε2, ε3, and ε4. A single APOE ε4 allele is associated with a twofold to threefold increased risk; having two copies is associated with a fivefold or more increased risk. In addition, each inherited APOE ε4 allele lowers the age at onset by 6–7 years [11, 12, 13, 14, 15, 16, 17, 18]. APOE ε4 is also associated with lower cognitive performance, in particular in the memory domain, with mild cognitive impairment [19, 20, 21, 22], and with progression from mild cognitive impairment to dementia [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]. Although the population attributable risk of APOE ε4 is estimated at 20–50 %, the presence of ε4 is neither necessary nor sufficient for development of the disease. In ethnic groups other than non-Hispanic Whites, the association between APOE and LOAD has been largely inconsistent across studies.
Findings from GWAS
At the beginning of the century, thousands of candidate-gene-based association studies aiming to identify additional susceptibility loci were performed, but only one gene, the sortilin-related receptor (SORL1), which is implicated in intracellular trafficking of amyloid precursor protein (APP), could be consistently replicated in independent datasets and implicated in the disease. The main reasons for these inconsistencies between studies are sample heterogeneity with differences in linkage disequilibrium (LD) patterns and allele frequencies, and small sample sizes, leading to limited power to detect small or moderate effect sizes. In the past 5 years, technological advances in high-throughput genome-wide arrays have allowed the hypothesis-free simultaneous examination of thousands to millions of polymorphisms across the genome, and large collaborative efforts capitalizing on this technology have significantly advanced knowledge of the genetic underpinnings of LOAD and the pathways involved by identifying several novel risk loci.
Major Alzheimer’s disease (AD) genome-wide association studies (GWAS) performed
Study | Sample | Genes identified outside APOE region
Lambert et al. | Stage 1: 2,032 AD cases; 5,328 controls. Stage 2: 3,978 AD cases; 3,297 controls |
Harold et al. | Stage 1: 3,941 AD cases; 7,848 controls. Stage 2: 2,023 AD cases; 2,340 controls |
Seshadri et al. | Stage 1: 3,006 AD cases; 4,642 controls. Stage 2: 2,032 AD cases; 5,328 controls. Stage 3: 3,333 AD cases; 6,995 controls | BIN1, EXOC3L2/BLOC1S3/MARK4, CLU, PICALM
Naj et al. [44••] | Stage 1: 8,309 AD cases; 7,366 controls. Stage 2: 3,531 AD cases; 3,565 controls | MS4A4A, CD2AP, CD33, EPHA1, CR1, CLU, BIN1, PICALM
Hollingworth et al. [43••] | Stage 1: 6,688 AD cases; 13,685 controls. Stage 2: 4,896 AD cases; 4,903 controls. Stage 3: 8,286 AD cases; 21,258 controls | ABCA7, MS4A6A/MS4A4E, EPHA1, CD33, CD2AP
Lee et al. | 549 AD cases; 544 controls | CLU, PICALM, BIN1, CUGBP2, loci on 2p25.1, 3q25.2, 7p21.1, and 10q23.1
Reitz et al. [59••] | 1,968 AD cases; 3,928 controls | ABCA7, intergenic locus on 5q35.2
Major pathways identified by GWAS
Pathway | Genes
 | APOE, SORL1, CLU, CR1, PICALM, BIN1, ABCA7
 | CLU, CR1, EPHA1, ABCA7, MS4A4A/MS4A6E, CD33, CD2AP
Lipid transport and metabolism | APOE, CLU, ABCA7
Synaptic cell functioning/endocytosis | CLU, PICALM, BIN1, EPHA1, MS4A4A/MS4A6E, CD33, CD2AP
In these large-scale GWAS performed in non-Hispanic Whites of European ancestry, the most strongly associated single-nucleotide polymorphisms (SNPs) at each locus other than APOE demonstrated population attributable fractions between 1.0 and 8.0 %, with effect sizes ranging from an odds ratio of 1.16 to an odds ratio of 1.20, i.e., much smaller than for APOE. In the largest GWAS performed to date in Caribbean Hispanics, associations in CLU, PICALM, and BIN1 were replicated, and several additional loci on 2p25.1, 3q25.2, 7p21.1, and 10q23.1 were observed; these additional loci could be replicated in an independent cohort of non-Hispanic Whites of European ancestry from the National Institute on Aging Late-Onset Alzheimer’s Disease Family Study (NIA-LOAD). Finally, in the largest GWAS of African Americans performed, Reitz et al. [59••] identified ABCA7 as a major susceptibility locus in this ethnic group. Interestingly, in contrast to all GWAS loci identified in non-Hispanic Whites, in African Americans the ABCA7 locus had an effect size as strong as that of APOE ε4 (i.e., a 70–80 % increase in risk compared with a 10–20 % increase in risk through the GWAS loci observed in Whites). Although this finding may represent a winner’s curse (i.e., inflation of the estimated effect in a discovery set in relation to follow-up studies) and needs to be confirmed by independent studies in African Americans and by functional methods, it may have major implications for developing targets for genetic testing, prevention, and treatment in this ethnic group if proven true. In addition, this study confirmed APOE as a susceptibility gene in this ethnic group, evidence for which had previously been inconsistent across studies, and also replicated CR1, BIN1, EPHA1, and CD33.
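The winner’s curse mentioned above can be illustrated with a small simulation. The sketch below is not taken from any of the reviewed studies: the true odds ratio, the standard error, the one-sided selection threshold, and the function name are all assumptions chosen for illustration. It draws many noisy estimates of the same true effect and compares the average over all of them with the average over only those that pass a stringent significance threshold.

```python
import math
import random
import statistics

def simulate_winners_curse(true_odds_ratio=1.15, standard_error=0.05,
                           z_threshold=5.45, n_studies=100_000, seed=1):
    """Simulate discovery studies of one variant (all parameters are hypothetical).

    Each study yields a log odds ratio estimate drawn from a normal distribution
    centred on the true value. Only estimates whose z-score exceeds the threshold
    (risk-increasing direction only, for simplicity) count as discoveries.
    Returns (odds ratio averaged over all studies,
             odds ratio averaged over discoveries only,
             number of discoveries).
    """
    rng = random.Random(seed)
    true_beta = math.log(true_odds_ratio)
    estimates = [rng.gauss(true_beta, standard_error) for _ in range(n_studies)]
    # Keep only the estimates that would have been reported as significant.
    selected = [b for b in estimates if b / standard_error > z_threshold]
    mean_all = math.exp(statistics.mean(estimates))
    mean_selected = math.exp(statistics.mean(selected)) if selected else float("nan")
    return mean_all, mean_selected, len(selected)

if __name__ == "__main__":
    or_all, or_selected, n_selected = simulate_winners_curse()
    print(f"all studies: OR {or_all:.3f}; discoveries only: OR {or_selected:.3f} "
          f"({n_selected} discoveries)")
```

Because every selected estimate has to exceed the threshold, the average over discoveries is necessarily larger than the true effect when the study is underpowered, which is why replication in independent samples tends to give smaller estimates.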
The recent GWAS for LOAD using large numbers of cases and controls identified several novel susceptibility loci that are biologically plausible, cluster in specific pathways, and have significantly advanced the understanding of the pathogenic mechanism underlying the disease. Common to all novel loci in non-Hispanic Whites of European ancestry is the modest effect size, with odds ratios ranging from 1.1 to 1.5, leaving the APOE ε4 allele by far the strongest risk factor. In contrast, in the largest GWAS performed to date in African Americans, the ABCA7 locus was observed to have an effect size similar to that of APOE (70–80 % increase in risk). The population attributable risk of each of the non-APOE loci is estimated to be 1–8 %. However, this estimate will change with elucidation of the number, allele frequencies, and risk effects of the true functional variants at each locus, and the detection of additional common and rare risk variants and patterns of epistasis.
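To make the relation between effect size and population attributable risk concrete, the sketch below uses Levin’s formula, PAF = p(RR − 1) / (1 + p(RR − 1)), where p is the proportion of the population exposed and RR is the relative risk. The reviewed studies may have used other estimators; treating an odds ratio as an approximation of the relative risk and the exposure proportion of 0.30 are assumptions made here for illustration only.

```python
def population_attributable_fraction(exposure_prevalence, relative_risk):
    """Levin's formula for the population attributable fraction.

    exposure_prevalence: proportion of the population carrying the risk factor (0..1).
    relative_risk: risk in carriers divided by risk in non-carriers; an odds ratio
                   is only an approximation of this quantity.
    """
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("exposure_prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

if __name__ == "__main__":
    # Hypothetical locus: 30 % of people exposed, relative risk approximated by OR 1.18.
    paf = population_attributable_fraction(0.30, 1.18)
    print(f"PAF = {paf:.1%}")
```

With these illustrative inputs the result falls inside the 1–8 % range quoted above, and the formula shows why the estimate shifts when the frequency or the effect of the true functional variant differs from that of the tagging SNP.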
Replication in independent datasets, if possible across different ethnic groups, and functional validation of the loci identified by GWAS are crucial for several reasons. First, GWAS are not designed to identify the specific causative variants, but rather are designed to screen the genome, capitalizing on the LD between genotyped SNPs and the potentially causative variants. As LD can extend over large intervals, the true genetic effectors may be located far away from the SNP showing the disease association, limiting the ability to detect true associations from GWAS. The development of high-throughput genotyping arrays, which have increased the number of genotyped markers to several millions, has reduced this problem to some extent, but not entirely, depending on the LD pattern in the region. Second, signals selected on the basis of statistical significance thresholds in underpowered settings are often subject to the winner’s curse (bias away from the null in the estimated effect of a newly identified allele on disease) [61, 62, 63], and replication can help produce a more accurate, unbiased estimate of the genetic effect of a locus. Third, the probability that an observed association truly exists depends on the power to detect the association, which in turn is a function of minor allele frequency, effect size, sample size, and the observed p value. The distribution of effect sizes of true associations in complex diseases is unknown, but it is likely that most of the large effects in LOAD GWAS have been identified, whereas most of the smaller effects remain to be discovered. The significance threshold needed to preserve the genome-wide type I error rate in studies of individuals with European ancestry is estimated at 5 × 10⁻⁷ to 1 × 10⁻⁸ [64, 65].
This threshold is even lower in ethnic groups with greater genetic diversity such as Hispanics, Africans, and African Americans and, consequently, most individual GWAS do not have enough power to distinguish false positives from false negatives. Finally, replication in a population with different environmental or genetic backgrounds may—if assessed in a population with a lower extent of LD such as Africans—help narrow down the location of the causative variant. In addition, it allows one to determine the generalizability of the observed association. However, when the aim is to replicate an observed association, it has to be kept in mind that there are several reasons for the observation of no association, including differences in allele frequencies or LD patterns across populations, or allelic or locus heterogeneity.
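The dependence of the significance threshold and of power on the number of tests, minor allele frequency, effect size, and sample size can be sketched as follows. This is a rough illustration, not the method of the cited studies: the Bonferroni correction is one common way to derive a genome-wide threshold, the numbers of independent tests are assumptions, and the power function uses a standard normal approximation for a per-allele (log-additive) test that is only reasonable for small effects. All function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def bonferroni_threshold(family_wise_alpha, n_independent_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return family_wise_alpha / n_independent_tests

def approximate_power(odds_ratio, maf, n_cases, n_controls, alpha):
    """Approximate power of a two-sided per-allele association test.

    Assumes a log-additive model and a small effect, so that the information about
    the log odds ratio is roughly N * f * (1 - f) * 2 * maf * (1 - maf),
    where N is the total sample size and f the fraction of cases.
    The negligible contribution of the opposite tail is ignored.
    """
    n_total = n_cases + n_controls
    case_fraction = n_cases / n_total
    information = n_total * case_fraction * (1 - case_fraction) * 2 * maf * (1 - maf)
    z_effect = abs(math.log(odds_ratio)) * math.sqrt(information)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z_effect - z_critical)

if __name__ == "__main__":
    # Assumed numbers of effectively independent tests, for illustration only.
    for n_tests in (100_000, 1_000_000, 5_000_000):
        print(n_tests, bonferroni_threshold(0.05, n_tests))
    alpha = bonferroni_threshold(0.05, 1_000_000)
    for n in (3_000, 20_000):
        print(n, round(approximate_power(1.18, 0.30, n, n, alpha), 3))
```

Under these assumptions, 100,000 and 5,000,000 independent tests give thresholds of 5 × 10⁻⁷ and 1 × 10⁻⁸, which brackets the range quoted above; more independent tests, as expected in populations with lower LD, push the threshold lower. The power loop illustrates why odds ratios below 1.2 only became detectable once consortia pooled tens of thousands of subjects.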
There are several additional approaches that can address some of these issues inherent to GWAS. Reclassifying sample subjects into more homogeneous subgroups, for example, based on endophenotypes, can reduce phenotypic heterogeneity and increase the power to detect true associations.
Gene-based association studies, which consider association between a trait and all markers within a gene rather than each marker individually, can be more powerful than traditional individual-SNP-based GWAS. For example, if a gene contains more than one causative variant, then several SNPs within that gene might show marginal levels of significance that are often indistinguishable from random noise in the initial GWAS results. If the effects of all SNPs in a gene are combined into a test statistic and correction is made for LD, the gene-based test might be able to detect these effects. Similarly, genome-wide haplotype-based association studies can characterize loci not detected by univariate analyses. Such gene-based or haplotype-based analyses led to the discovery of NARS2, FRMD6, and FRMD4A as susceptibility loci [66, 67], the latter of which is immediately adjacent to GAB2. Identification and examination of regions with runs of homozygosity (i.e., excess burden of homozygous markers) can help identify recessive causative genes. Evidence is accumulating that a substantial part of the missing genetic variability could be due to epistatic effects or gene–environment interactions. Thus, exploration of gene–gene and gene–environment interactions can identify novel variants not detected by individual testing of SNPs. However, such studies require large sample sizes and/or large effect sizes to achieve adequate power.
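The idea of combining several marginal SNP signals into one gene-level statistic can be sketched with Fisher’s method for combining p values. This is a deliberately simplified stand-in and not the method used in the cited gene-based studies: it assumes the SNPs are independent, so it omits the LD correction that the text describes as necessary, and the p values in the example are invented.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with 2k degrees
    of freedom under the null when the k tests are independent. For even degrees of
    freedom the chi-square tail probability has the closed form
    exp(-x/2) * sum_{i<k} (x/2)^i / i!, which is used here.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p value is required")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    half_statistic = -sum(math.log(p) for p in p_values)  # equals statistic / 2
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_statistic / i
        series += term
    return math.exp(-half_statistic) * series

if __name__ == "__main__":
    # Invented p values for four SNPs in one gene, none genome-wide significant alone.
    snp_p_values = [0.01, 0.02, 0.03, 0.04]
    print(f"gene-level p = {fisher_combined_p(snp_p_values):.2e}")
```

In real data, neighbouring SNPs are correlated through LD, so an uncorrected combination like this one would overstate the evidence; dedicated gene-based tests adjust for that correlation.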
Although the latest GWAS arrays include dense SNP maps of several million SNPs with minor allele frequency down to 1 % and novel functional exonic variants that were identified through sequencing of thousands of exomes, they are limited in their ability to detect associations with variants not tagged by the genotyped SNPs. In addition, they are limited in their ability to identify structural variants or rare variants with minor allele frequency of less than 1 %. However, both rare and structural variants are increasingly recognized as being implicated in complex disease. In fact, two recent studies that performed genome sequencing followed by imputation of identified variants in independent datasets implicated the triggering receptor expressed on myeloid cells 2 gene (TREM2) in Alzheimer’s disease by identifying a causative rare missense mutation (rs75932628) resulting in an R47H substitution affecting the gene’s anti-inflammatory function [69•, 70•]. Additional sequencing studies identified rare causative variants in the nicastrin gene (NCSTN), which encodes an obligatory component of the γ-secretase complex involved in cleavage of APP, as well as in CLU. Although individual rare variants may have an effect size large enough to cause disease, the accumulation of several rare variants each with small or modest effect sizes may also cross the susceptibility threshold. Ongoing and future large-scale next-generation whole exome or whole genome sequencing techniques will fill this gap and further provide the means to identify the specific causative variants in the genes/regions identified by GWAS.
Appropriate algorithms for the statistical and bioinformatic analysis of sequencing data, in particular whole genome sequencing data and whole exome or whole genome sequencing data derived from families, still need to be developed and implemented. Nevertheless, the recent identification of rare variants in CLU, NCSTN, and TREM2 in LOAD, which cluster in the amyloid processing and immune-response/inflammation pathways and were missed by GWAS but detected by sequencing studies, clearly belies the common disease–common variant hypothesis and demonstrates the necessity of approaches able to detect rare variants [69•, 70•, 71, 72]. Once causative variants are identified, functional studies can assess the pathogenic effects of the variants and characterize the molecular pathways in which they are involved or with which they interact, further implicating the gene in the disease and potentially providing targets for effective intervention.
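One simple way to examine whether rare variants in a gene jointly accumulate in cases is a collapsing or burden-style comparison: count the subjects who carry at least one rare allele in the gene and compare cases with controls. The sketch below is an illustration under assumptions, not the analysis used in the cited sequencing studies. The genotype coding (0, 1, or 2 rare alleles per variant), the toy data, and the use of a one-sided Fisher exact test are all choices made here.

```python
from math import comb

def count_carriers(genotypes_by_person):
    """Count subjects carrying at least one rare allele at any variant in the gene.

    genotypes_by_person: one list per subject with the rare-allele count (0, 1, 2)
    at each variant of the gene.
    """
    return sum(1 for genotypes in genotypes_by_person if any(g > 0 for g in genotypes))

def fisher_exact_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact p value for carriers being enriched among cases.

    Sums hypergeometric probabilities of seeing at least `case_carriers` carriers
    among the cases, given the total number of carriers in the sample.
    """
    total_carriers = case_carriers + control_carriers
    n_total = n_cases + n_controls
    upper = min(total_carriers, n_cases)
    favourable = 0
    for x in range(case_carriers, upper + 1):
        favourable += comb(total_carriers, x) * comb(n_total - total_carriers, n_cases - x)
    return favourable / comb(n_total, n_cases)

if __name__ == "__main__":
    # Toy data: four cases and four controls, three rare variants in one gene.
    cases = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 2]]
    controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
    p = fisher_exact_one_sided(count_carriers(cases), len(cases),
                               count_carriers(controls), len(controls))
    print(f"one-sided p = {p:.4f}")
```

Collapsing variants this way gains power when most of them act in the same direction, but it loses power when protective and risk variants are mixed, which is one reason the analysis methods for sequencing data are still under development.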
Conclusions and Future Directions
Over the past 10 years, studies capitalizing on high-throughput genome technologies have significantly advanced knowledge of the genetic underpinnings of LOAD. GWAS have identified several susceptibility genes, and sequencing studies have identified specific causative variants in these genes, but have also provided invaluable evidence for an involvement of rare variants in this complex disease, overturning the common disease–common variant hypothesis that had long defined the genetic research of complex diseases. Ongoing and future large-scale next-generation sequencing approaches (both hypothesis-driven and hypothesis-free) are likely to disentangle a significant part of the missing heritability of LOAD, and have the potential to identify targets for genetic testing, prevention, and treatment.
Christiane Reitz was supported by a Paul B. Beeson Career Development Award (K23AG034550) and has received grant support from the National Institutes of Health/National Institute on Aging. Giuseppe Tosto was supported by a Defense Medical Research and Development Program (DMRDP) grant (W81XWH).
Compliance with Ethics Guidelines
Conflict of Interest
Giuseppe Tosto and Christiane Reitz declare that they have no conflict of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.