Introduction

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public–private partnership (Weiner et al. 2012). The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians with developing new treatments and monitoring their effectiveness, as well as lessen the time and cost of clinical trials.

The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California – San Francisco. ADNI is the result of efforts of many co-investigators from a broad range of academic institutions and private corporations, and participants have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit and clinically monitor 800 participants, but this initial study (i.e., ADNI-1) has been followed by additional funding and additional iterations, known as ADNI-GO and ADNI-2. To date, these three protocols have recruited over 1,500 adults, ages 55 to 90, to participate in the research, consisting of cognitively normal older individuals, people with significant memory concerns, people with early or late MCI, and people with early AD. The follow-up duration for each group is specified in the ADNI-1, ADNI-GO and ADNI-2 protocols. Participants originally recruited for ADNI-1 and ADNI-GO had the option to be followed in ADNI-2. For up-to-date information, see www.adni-info.org.

The ADNI Genetics Core (Saykin et al. 2010), formally established in 2009, aims to provide genetic resources and facilitate genetics research related to ADNI multidimensional phenotypes. As of June 9, 2013, the available ADNI genetics data include the APOE genotyping data for 1,909 participants (818 ADNI-1 participants, 341 additional ADNI-1 participants who failed screening, and 750 ADNI-GO/2 participants), and Genome Wide Association Study (GWAS) data for 1,252 participants. The GWAS data were collected from 818 DNA samples of ADNI-1 participants using the Illumina Human 610-Quad genotyping array, and from 434 DNA samples of ADNI-GO/2 participants using the Illumina OmniExpress genotyping array.

In this paper, we review the ADNI genetic studies published between 2009 and 2012, where either ADNI APOE or GWAS data have been used. We searched the PubMed database using the EndNote X4 online search tool with the following three criteria: (1) The “Author”, “Title”, “Abstract” or “Keywords” field contains “ADNI” or “Alzheimer’s Disease Neuroimaging Initiative”; (2) The “Title”, “Abstract” or “Keywords” field contains “APOE”, “apolipoprotein”, “gene”, “genetic”, “genetics”, “genotyping”, “genome”, “genomic”, or “genomics”; and (3) The “Year” field value is between 2009 and 2012. We integrated the search results with the ADNI publication database maintained by the ADNI Data and Publications Committee (DPC). We manually reviewed all the abstracts and identified 106 relevant ADNI genetics publications through this extensive search. The number of publications (Fig. 1) grew from 3 in 2009 to 23 in 2010, 28 in 2011, and 52 in 2012.

Fig. 1 Distribution of publications using the ADNI APOE and GWAS genotyping data between 2009 and 2012: Of the 106 papers, 30 papers used only APOE data, and 76 papers used GWAS data

Among these, 30 papers analyzed only APOE, and 76 used the GWAS data. Table 1 shows a high level classification of these papers based on different genotype, phenotype, and method categories. In particular, rich multidimensional ADNI phenotypes have facilitated a range of quantitative trait (QT) analyses, including structural imaging (N = 55), functional imaging (N = 15, including functional MRI and PET), biofluids (N = 24), and cognition (N = 22), in addition to case control analyses (N = 26). The availability of ADNI genetic and multimodal phenotypic data led to numerous research findings that demonstrate the power of multidimensional quantitative phenotype data for identifying novel genetic variants. For example, as reported in Saykin et al. (2012), the FRMD6 gene (Fig. 2) was identified in three ADNI imaging genetics studies (Furney et al. 2011; Potkin et al. 2009a; Stein et al. 2010a) (N ≤ 1,004) and later validated by a case control GWAS with a much larger sample (Hong et al. 2012) (N > 12,500). This clearly illustrates the statistical power of QT analyses.

Table 1 Classification of reviewed papers based on genotype, phenotype and method categories with an example

Fig. 2 As reported in Saykin et al. (2012), FRMD6 (FERM domain-containing protein 6) was detected in 3 imaging genetics studies using the ADNI data (Potkin et al. 2009a; Stein et al. 2010a; Furney et al. 2011) and validated by case control GWAS (Hong et al. 2012)

Binary versus quantitative phenotypes

Case control analysis

ADNI data have been investigated alone or in combination with other cohorts, such as those in the Alzheimer’s Disease Genetics Consortium (ADGC, http://alois.med.upenn.edu/adgc/), in numerous case control analyses including both candidate gene and GWAS studies. The following highlights a few case control studies that examined ADNI data alone: Potkin et al. (2009a) performed a GWAS and identified APOE and TOMM40 loci at a significance level of p ≤ 10⁻⁶; Biffi et al. (2010) focused on a few candidate single nucleotide polymorphisms (SNPs) and confirmed the APOE locus as associated with Alzheimer’s disease in the ADNI cohort; and Lakatos et al. (2010) studied mitochondrial haplogroups and SNPs and reported that a mitochondrial haplogroup (UK) might confer genetic susceptibility to AD independent of the APOE ε4 allele. Many more studies combined ADNI with other cohorts and performed candidate gene/GWAS and meta-analyses. These studies nominated or confirmed multiple AD susceptibility loci, including: CR1, CLU and PICALM (Jun et al. 2010), epistatic interaction between TF and HFE (Kauwe et al. 2010a), APOE and MTHFD1L (Naj et al. 2010), CR1 (Antunez et al. 2011b), MS4A gene cluster (Antunez et al. 2011a), ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP (Hollingworth et al. 2011), BIN1 (Hu et al. 2011a), MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 (Naj et al. 2011). Some studies also discovered suggestive novel associations in PPP1R3B (Kamboh et al. 2012) and FTO (Reitz et al. 2012).

Quantitative phenotype analysis

Given the very rich multimodal quantitative phenotype data available in ADNI, many QT studies have been performed in addition to the case control analyses mentioned above (Table 1). The QT approach has distinct advantages in power over categorical diagnoses (i.e., healthy control vs. AD). QT approaches have ~4–10 times more statistical power (Potkin et al. 2009e; Purcell et al. 2003) (Fig. 3), because they make use of the entire distribution of trait values and avoid the often arbitrary or error-prone cutoff distinctions, for example, thresholds that distinguish MCI from AD. This observation is illustrated by the FRMD6 gene example shown above (Fig. 2); see also “Strength and limitations of QT analyses” section for more relevant discussion. In addition, QT analysis offers an alternative strategy to discover unanticipated genes associated with AD or AD risk. One can begin with brain imaging or other biomarkers characteristic of AD and identify the genes (or SNPs, or other types of genetic variation) associated with that phenotype. Using imaging and biomarkers as an intermediate phenotype may have greater sensitivity in clarifying the functional links related to the AD genes than diagnostic categories. In the following section, we provide a systematic review of ADNI genetics findings where imaging, cognition and biomarkers have been used as quantitative phenotypes.

Fig. 3 Comparison of the sample sizes needed to reach a GWAS significance level of p < 10⁻⁸ for case control and QT approaches (OR = 1.5), with 10 % variance explained for the QT, a MAF of .10 and a marker SNP MAF of .20. (See also Potkin et al. 2009e)

Multi-modal neuroimaging and biomarkers as quantitative phenotypes

Structural neuroimaging

Structural neuroimaging is the most widely studied phenotype category in ADNI (Table 1). Interesting brain-genome associations can be identified at multiple levels (Fig. 4): In the genomic domain, we can examine candidate genes/SNPs, relevant biological pathways/networks, or the entire genome; similarly, in the neuroimaging domain, we can study an individual region of interest (ROI), interesting brain circuits including multiple ROIs, or the whole brain. Figure 4 shows examples of studies in each category (Chiang et al. 2012; Ho et al. 2010; Potkin et al. 2009a, c; Reiman et al. 2009; Risacher et al. 2010, 2013; Saykin et al. 2010; Shen et al. 2010; Sloan et al. 2010; Stein et al. 2010a; Swaminathan et al. 2012c). Below, we first focus on candidate and genome-wide genetic association studies of targeted brain phenotypes, then whole brain analysis of targeted SNPs, and finally review whole genome whole brain analysis.

Fig. 4 Multi-level brain-genome association strategies and examples of studies in each category (Risacher et al. 2010; Sloan et al. 2010; Potkin et al. 2009a; Saykin et al. 2010; Risacher et al. 2013; Swaminathan et al. 2012c; Potkin et al. 2009c; Ho et al. 2010; Reiman et al. 2009; Chiang et al. 2012; Shen et al. 2010; Stein et al. 2010a). Relevant thumbnails were reprinted by permission from (1) Elsevier: [Neurobiology of Aging], (Risacher et al. 2010), copyright (2010); (2) John Wiley & Sons, Inc.: [American Journal of Medical Genetics], (Sloan et al. 2010), copyright (2010); (3) Elsevier: [Alzheimer's & Dementia], (Saykin et al. 2010), copyright (2010); (4) the terms of the Creative Commons Attribution Non Commercial License: [Frontiers in Aging Neuroscience], (Risacher et al. 2013), copyright (2013); (5) Macmillan Publishers Ltd: [Molecular Psychiatry], (Potkin et al. 2009c), copyright (2009); (6) the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License: [Journal of Neuroscience], (Chiang et al. 2012), copyright (2012); and (7) Elsevier: [Neuroimage], (Shen et al. 2010), copyright (2010)

Note that the findings discussed below included some “promising” results which did not reach conventionally accepted genome-wide thresholds. In addition, some of the studies performed various analyses on related phenotypes using the same dataset. While these studies yielded interesting and promising results warranting further investigation, readers should be aware that the issue of determining the proper statistical threshold in these complex quantitative genetics studies is still a challenging area and a topic of ongoing investigation. Replication of any of the current results in independent samples remains of critical importance for confirmation.

Candidate genetic association study of targeted phenotypes

Structural neuroimaging data have been used in many genetic studies of ADNI data to provide targeted QTs to increase detection power and improve biological interpretability. The relationship of APOE to targeted MRI phenotypes has been investigated in a few ADNI studies. Wolk and Dickerson (2010) analyzed mild AD participants with a cerebrospinal fluid (CSF) molecular profile consistent with AD to show that carriers of the APOE ε4 allele exhibited greater medial temporal lobe (MTL) atrophy, whereas non-carriers had greater frontoparietal atrophy. Andrawis et al. (2012) showed that pooled HC, MCI and AD participants with the APOE ε4 allele had significantly smaller hippocampal volume at 1-year follow-up and non-significantly smaller hippocampal volume at baseline. Jack et al. (2012) showed that the shape of hippocampal volume trajectory as a function of the Mini-Mental State Examination (MMSE) score was affected by interactions with APOE ε4 status. Risacher et al. (2010) showed that annual percent change rates in MRI-based neurodegeneration markers are influenced by APOE genotype. Desikan et al. (2012) and Tosun et al. (2010) both performed longitudinal MRI studies to investigate relations between brain atrophy rate, CSF biomarkers, and APOE ε4 status.

Several other ADNI studies have also reported the effects of specific candidate genes on targeted imaging phenotypes. Rimol et al. (2010) studied sex-dependent association of common variants of microcephaly genes with brain structure to identify and confirm two SNPs (rs914592 and rs2297453) in CDK5RAP2 associated with total cortical surface area only in males. Biffi et al. (2010) reported the association of GWAS-validated and GWAS-promising novel AD loci with several imaging phenotypes. Sabuncu et al. (2012) computed a polygenic score using candidate AD-related SNPs to examine the association between a polygenic AD score and cortical thickness in clinically normal participants. Murphy et al. (2012) showed that in APOE ε4 carriers, the V and A alleles of the cholesteryl ester transfer protein (CETP) gene were associated with greater baseline cortical thickness and less 12-month atrophy in the MTL. Luo et al. (2012) demonstrated that the genetic variation in the interleukin 3 (IL3) promoter is a regulator of human brain volume, consistent with a novel role of IL3 in regulating brain development.
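As a rough illustration of the polygenic score idea mentioned above, the following minimal sketch computes a weighted sum of risk-allele dosages for one participant. It is not the procedure of Sabuncu et al. (2012): the SNP identifiers, the weights (taken here to be log odds ratios) and the handling of missing genotypes are all assumptions made for the example.

```python
import math

def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP).

    dosages: dict mapping SNP id -> allele count for one participant
    weights: dict mapping SNP id -> per-allele weight (assumed: log odds ratio)
    SNPs missing from `dosages` are skipped, which is a simplification.
    """
    score = 0.0
    for snp, weight in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            continue  # missing genotype: contributes nothing in this sketch
        if dosage not in (0, 1, 2):
            raise ValueError(f"unexpected dosage {dosage!r} for {snp}")
        score += weight * dosage
    return score

# Hypothetical SNP ids and odds ratios, for illustration only
weights = {"snp_a": math.log(1.2), "snp_b": math.log(0.9), "snp_c": math.log(1.5)}
participant = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(participant, weights)
print(f"polygenic score: {score:.4f}")
```

The resulting score can then be used as a single predictor in a regression on an imaging phenotype such as cortical thickness, which avoids testing each SNP separately.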

GWAS study of targeted phenotypes

GWAS analyses have also been performed on targeted MRI phenotypes in ADNI. Potkin et al. (2009a) performed a QT analysis on HC and AD participants using hippocampal volume to identify 21 genes or chromosomal areas with at least one SNP with P ≤ 10⁻⁶ including EFNA5, CAND1, and MAGI2. Stein et al. (2010b) performed a GWAS to identify 2 SNPs (rs10845840 (GRIN2B) and rs2456930) associated with bilateral temporal lobe volume. Stein et al. (2011) identified and replicated two genes (WDR41 and PDE8B), involved in dopamine signaling and development, associated with right caudate volume. Furney et al. (2011), combining the ADNI-1 and AddNeuroMed data sets, performed a GWAS and identified two SNPs (rs1925690 (ZNF292) and rs11129640 (ARPP-21)) associated with entorhinal cortical volume. Gene-wide scoring highlighted PICALM (phosphatidylinositol-binding clathrin assembly protein) as the most significant gene associated with entorhinal cortical thickness. This PICALM finding confirmed an earlier candidate gene analysis (Saykin et al. 2010) associating PICALM genotype and baseline mean bilateral entorhinal cortex thickness in ADNI-1. Hibar et al. (2012) performed a GWAS that identified common SNPs in the FMO gene cluster associated with differences in lentiform nucleus volume and replicated rs1795240 in FMO in meta-analysis. Melville et al. (2012) performed a meta-analysis using 2 different populations to identify novel GWAS hits associated with hippocampal volume in the APOE, F5/SELP, LHFP, and GCFC2 gene regions. Bakken et al. (2012) identified and replicated two SNPs (rs6116869 and rs238295) in GPCPD1, which is highly expressed in occipital cortex in humans, associated with the proportional surface area of visual cortex. Two recent large-scale GWAS analyses (Bis et al. 2012; Stein et al. 2012) identified and confirmed two SNPs (rs7294919 and rs17178006) associated with hippocampal volume and one SNP (rs10784502) associated with intracranial volume.
Saykin et al. (2010) performed a GWAS and identified several genes (CDH8, SLC6A13, MAD2L2, QPCT, and GRB2) in addition to APOE and TOMM40 associated with rate of hippocampal volume loss and rate of change in hippocampal gray matter density over 1 year.

Whole brain analysis of targeted SNPs

A complementary line of research has been the whole brain analysis of targeted SNPs. In morphometric research, the traditional approach has been to trace a set of brain structures on MRI scans, either manually or automatically, and compute their volumes. In parallel, a set of voxel-based methods has been refined over the years to create statistical maps, revealing associations between an imaging measure at each location in the brain and an external predictor, such as genotype. One such method, tensor-based morphometry (TBM), uses nonlinear deformations to align each participant’s MRI scan to an average brain template, and the degree of compression or expansion is used as a measure of regional brain volume. Using TBM, Ho et al. (2010) showed that carriers of an obesity-related SNP in the FTO gene had lower regional brain volumes than non-carriers in the frontal and occipital lobes. About 46 % of Western Europeans carry the obesity-associated SNP in FTO, and this genetic association with brain structure offers one plausible biological pathway to explain why people with higher body mass index tend to have smaller volumes for some brain regions. Stein et al. (2010b) also used TBM to search for common genetic variants associated with temporal lobe volume. One such SNP, rs10845840, was located in the GRIN2B gene, which encodes the N-methyl-D-aspartate (NMDA) glutamate receptor NR2B subunit. This protein, which is involved in learning and memory, and excitotoxic cell death, has age-dependent prevalence in the synapse and is already a therapeutic target in Alzheimer’s disease. Risk alleles for lower temporal lobe volume at this SNP were significantly over-represented in AD and MCI patients vs. controls (odds ratio = 1.3; P = 0.039) and were negatively correlated with MMSE score, suggesting lower global cognitive function. Voxelwise maps of genetic associations revealed strong temporal lobe effects. Rajagopalan et al. (2012) also found greater brain atrophy in carriers of a common SNP in the folate-related gene MTHFR, and replicated the association in a young adult cohort. Carriers of this SNP have elevated homocysteine levels. Homocysteine is neurotoxic and may itself promote brain atrophy, therefore constituting a risk for cardiovascular disease that carries additional risk for AD. The FTO and MTHFR studies targeted SNPs that already had a known association with a factor that promotes brain atrophy, suggesting a general “stepping-stone” approach to focus on brain-relevant SNPs. A number of studies mapped the effect of APOE genotype on several brain measures in ADNI, including cortical thickness (Fan et al. 2010) and memory and MRI measures (Wolk and Dickerson 2010).

Whole genome whole brain analysis

Most genetic analyses of the ADNI dataset, and of brain image databases in general, have focused on testing the effects of one or a handful of candidate SNPs. Even so, one can also search the entire genome for common variants that are associated with signals in any part of an image. Several published papers used the ADNI dataset to perform this kind of genome-wide, image-wide search. In a “brute force” approach, Stein et al. (2010a) proposed “voxelwise GWAS”, or vGWAS, which explored the relation between each of 448,293 SNPs and each of 31,622 voxels in the brain across 740 ADNI participants. No variants reached significance for association with regional brain volumes, but several genes worthy of further exploration were identified, including CSMD2 and CADPS2. Two subsequent papers (Hibar et al. 2011b, c) proposed “vGeneWAS”, where the SNPs in each gene are first prioritized using a principal components regression. The top gene in the study was GAB2, which has been previously associated with late-onset AD (Reiman et al. 2007), suggesting validity of the approach. In a combined voxel-wise and ROI approach, Shen et al. (2010) performed a GWAS, and confirmed that SNPs in the APOE and TOMM40 genes were strongly associated with volumetric variation in multiple brain regions. Their genome-wide, whole brain search also revealed several novel candidate loci (EPHA4, TP63 and NXPH1) warranting further investigation and replication. The EPHA4 finding was interesting in that EPHA1, also a member of the ephrin gene family, was later identified and replicated in two large case control GWAS (Hollingworth et al. 2011; Naj et al. 2011). The NXPH1 association points to the neurexin and neuroligin gene pathway which regulates cell adhesion related proteins that are receiving increasing attention in AD research (Martinez-Mir et al. 2013).
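To give a sense of the scale of such a genome-wide, image-wide search, the short calculation below counts the SNP-by-voxel tests implied by the figures quoted for Stein et al. (2010a) and derives a naive Bonferroni threshold. The family-wise error rate of 0.05 is an assumption for illustration, and this is not the correction those authors applied; it only shows why a brute-force search over roughly 10^10 tests leaves very little power with 740 participants.

```python
# Figures quoted in the text for the voxelwise GWAS of Stein et al. (2010a)
n_snps = 448_293
n_voxels = 31_622

# One association test per SNP-voxel pair
n_tests = n_snps * n_voxels

# Assumed family-wise error rate; the naive Bonferroni threshold treats
# every SNP-voxel test as independent, which overstates the effective
# number of tests because SNPs and neighboring voxels are correlated.
alpha = 0.05
naive_threshold = alpha / n_tests

print(f"{n_tests:,} tests; naive Bonferroni threshold = {naive_threshold:.2e}")
```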

To help find signals in datasets of such vast dimensionality, sparse regression and machine learning methods have been adapted by several research groups to handle imaging measures and identify a compact set of genetic predictors from a vast set of SNPs (reviewed in Hibar et al. (2011a)). Sparse reduced-rank regression (sRRR) was proposed and applied to the ADNI data in Vounou et al. (2012), confirming the key role of the APOE and TOMM40 genes, and also highlighting some novel potential associations with AD. Ge et al. (2012) proposed a new multi-locus method for voxelwise GWAS, and re-analyzed the TBM dataset from Stein et al. (2010a). They found a number of genes with statistically significant associations with regional brain volumes; the most strongly associated gene was GRIN2B, already implicated by Stein et al. (2010a), which encodes the N-methyl-D-aspartate (NMDA) glutamate receptor NR2B subunit and was associated with parietal and temporal lobe volumes. Meda et al. (2012b) adapted parallel independent components analysis (pICA) to whole brain genome-wide analysis, and identified four primary “genetic components” that were associated with a single structural network including regions involved neuropathologically in late-onset AD. In pathway analyses, each component included several genes known to contribute to AD risk (e.g., APOE) or pathologic processes contributing to AD, such as inflammation, diabetes, obesity and cardiovascular disease. Silver et al. (2012) proposed pathways sparse reduced-rank regression (PsRRR) and applied the method to voxel-wise maps of brain change, computed using TBM at 6, 12 and 24 month intervals. High ranking genes included a number previously linked to β-amyloid plaque formation (PIK3R3, PIK3CG, PRKCA and PRKCB), and several known AD risk genes—CR1, TOMM40 and APOE.

Functional neuroimaging

A number of ADNI studies investigated how genetic variants influence functional neuroimaging measures from functional MRI (fMRI) and PET. The role of the APOE ε4 allele, the strongest known genetic risk factor for late-onset AD, has been evaluated in many studies. In one fMRI study (Damoiseaux et al. 2012), the authors found significantly decreased connectivity in the default mode network in healthy older participants who carried the APOE ε4 allele compared to APOE ε3 homozygotes. A significant sex by APOE genotype interaction in the precuneus was identified, with a significant reduction in default mode connectivity observed only in female ε4 carriers. Jack et al. (2012) evaluated the shape of five biomarker trajectories including CSF Aβ42 and tau levels, amyloid burden measured using [11C]Pittsburgh Compound-B (PiB)-PET, cerebral metabolic rate measured using [18F]fluorodeoxyglucose (FDG)-PET, and MRI, as a function of MMSE. The authors identified complex biomarker trajectories, which were affected by age and APOE ε4 status. A different study in cognitively normal ADNI participants also showed the association of APOE ε4 status with amyloid burden measured with [18F]florbetapir and glucose metabolism measured with FDG-PET (Jagust and Landau 2012). Interestingly, plasma APOE level was one of the biomarkers associated with brain amyloid burden measured by [11C]PiB-PET in the ADNI cohort (Kiddle et al. 2012). APOE ε4 genotype predicts a higher rate of conversion of MCI to AD and has been suggested, along with other markers, for selecting participants for clinical trials (Singh et al. 2012; Yu et al. 2012).

The role of other genetic variants has also been investigated in a few studies. Xu et al. (2010) investigated the effect of the Val66Met polymorphism of the BDNF gene on regional cerebral metabolic rate measured by FDG-PET, and found significant differences in many brain regions in Met carriers compared to non-carriers. A candidate pathway-based analysis of amyloid-related genes in ADNI-1 participants with [11C]PiB-PET imaging data identified an intronic SNP within the DHCR24 gene, where carriers of the protective allele showed lower amyloid burden on a whole-brain voxel-wise level compared to non-carriers (Swaminathan et al. 2012c). Another targeted [11C]PiB-PET study, combining the BLSA cohort and ADNI PiB-PET participants, identified a lower amyloid burden in protective allele carriers compared to non-carriers of the rs3818361 SNP of the CR1 gene (Thambisetty et al. 2012). A SNP by APOE interaction was also observed: APOE ε4 carriers showed a significantly higher amyloid burden compared to APOE ε4 non-carriers, in non-carriers of the CR1 allele.

Fluid biomarkers

To date, explorations of ADNI fluid biomarkers have helped to identify novel candidate loci for CSF biomarker levels, to characterize biomarker effects of established and candidate AD risk markers, and to analyze combinations of fluid, imaging and genetic data in order to predict AD status and AD conversion.

Attempts to identify novel genetic variants have used an endophenotype-based approach, first working to understand how variants influence cerebrospinal fluid levels, then testing those variants as candidates to modify disease risk or other aspects of disease. As demonstrated by Schott (2012) using ADNI data, this approach may reduce heterogeneity in clinical diagnosis, increasing power to detect genetic associations. In addition, this approach leverages the strengths of quantitative traits and provides a clear biological mechanism for disease modifying action of the identified variants. Genome-wide association studies of cerebrospinal fluid levels of Aβ42 and tau/p-tau have been conducted in the ADNI dataset. Han et al. (2010) and Kim et al. (2011) completed the first two GWAS analyses of CSF biomarkers, and these studies confirmed the known associations between APOE ε2/ε3/ε4 genotype and these AD biomarkers and reported suggestive associations with other genetic variants. Use of the ADNI data in combination with similar data from other groups has led to the identification of variation in PPP3R1, which is associated with cerebrospinal fluid p-tau levels and rate of decline in AD patients (Cruchaga et al. 2010). These findings were recently confirmed in another dataset (Peterson et al. 2013) and represent a clear example of the successful use of ADNI data in executing an endophenotype-based approach to discover novel disease modifying variants. More recently, Cruchaga et al. (2013) published a genome-wide association study of cerebrospinal fluid tau levels using more than 1,200 samples, including ADNI data. This study identified three new genome-wide significant loci for cerebrospinal fluid tau levels including rs9877502, which is located at 3q28 between GEMC1 and OSTN, and also shows association with AD risk, tangle counts in AD patients, and cognitive decline.
While Aβ42 and tau levels have been the focus of these studies, ADNI data for other proteins found in blood and/or cerebrospinal fluid have also yielded interesting results. Lourdusamy et al. (2012) used ADNI data to confirm associations between plasma levels of several disease related proteins and cis-located genetic variants. A genome-wide association study of cerebrospinal fluid APOE levels showed association with the APOE ε4 allele and several other suggestive associations (Cruchaga et al. 2012).

There has been frequent use of fluid biomarkers to further characterize the biological effects of reported AD risk markers (Alexopoulos et al. 2011; Cruchaga et al. 2011, 2013; Kauwe et al. 2010b, 2011; Soares et al. 2012; Vemuri et al. 2010). In addition to the well-established associations between the APOE ε4 allele and cerebrospinal fluid Aβ and tau levels, associations between variants in SORL1 and CSF Aβ levels (Alexopoulos et al. 2011) and between variants in BIN1, CD2AP, PICALM and CR1 and cerebrospinal fluid tau levels (Cruchaga et al. 2013) were also reported. Soares et al. (2012) examined the association of APOE ε2, ε3, ε4 genotype with multiple analytes and provided evidence for association with cortisol, interleukin 13, apolipoprotein B, and gamma interferon levels.

Finally, fluid biomarker data combined with genetic and imaging data have been used to classify case control status and to predict biomarker trajectory (Hu et al. 2012; Jack et al. 2012; Kiddle et al. 2012; Tosun et al. 2010; Wolz et al. 2012; Yu et al. 2012). Research by Jack et al. (2012) shows the complexity of the relationships between biomarkers and AD, as well as the importance of age and APOE ε4 genotype in the longitudinal trajectories of fluid and imaging biomarkers. The results of these studies suggest that combined evaluation of these factors can provide information to improve diagnosis (Hu et al. 2012; Kiddle et al. 2012; Tosun et al. 2010) and prediction of conversion from non-demented to Alzheimer’s disease status (Wolz et al. 2012; Yu et al. 2012).

In sum, fluid biomarker data in ADNI have been leveraged using several strategies. These data have helped to discover novel genetic variants that modulate AD, to characterize effects of AD risk variants, and to improve diagnostic and prognostic evaluations.

Cognitive performance, clinical measures, and other QTs

Neuropsychological performance scores and other measures, including age at onset of AD symptoms and neuropsychiatric inventory questionnaire variables, have been used in several ADNI studies as quantitative phenotypes or independent predictors to investigate the roles of genetic variants. Many studies have focused on the role of APOE genotype. APOE ε4 alleles were more frequent in the memory deficit group (amnestic phenotype) than the dysexecutive function group (dysexecutive phenotype) in mild AD patients (Dickerson and Wolk 2011) and ε4-carriers displayed greater impairment of memory retention compared to ε4-noncarriers in AD patients. ε4-noncarriers showed more impairment in working memory, executive function, and lexical access (Wolk and Dickerson 2010). However, no significant relationship between APOE alleles and dysexecutive and amnestic phenotypes was found in another study using a different phenotypic approach (Mukherjee et al. 2012b). While these studies focused on APOE ε4-carriers compared to ε4-noncarriers, another study examined the role of the protective APOE ε2 allele (Bonner-Jackson et al. 2012). The ε2 allele was associated with slower changes in daily functioning over time and better neuropsychological performance across a number of measures, demonstrating the expected protective effect of the APOE ε2 allele. In addition, the relationship between subsyndromal symptoms of depression (SSD) in MCI patients and APOE ε4 allele carrier status was investigated, and a higher frequency of ε4 allele carriers was found in participants with SSD (Mackin et al. 2012).

While many studies investigated a targeted set of SNPs, other studies have performed GWAS of cognitive decline of patients with MCI (Hu et al. 2011b), of age-at-onset of AD (Kamboh et al. 2011), of global cognitive decline in several studies with mixed diagnoses (De Jager et al. 2012), of executive functioning resilience (Mukherjee et al. 2012a), and of a composite measure of memory performance (Ramanan et al. 2012a). ADNI data were used as a discovery or replication sample in these studies. Sherva et al. (2013) performed an unbiased GWAS on rate of cognitive decline in patients with AD that included ADNI participants. These studies identified the association of SNPs from AD-candidate genes (PICALM and CLU) and novel genes (UBR5, DCHS2, RNASE13, FLJ10357, SORL1) and enriched pathways in GWAS data.

Two studies integrated multi-modal data including imaging, genetics and/or cognitive data to improve classification accuracy (Yu et al. 2012) and the prediction accuracy of neuropsychological performance (Wang et al. 2012b). One study identified a significant association of one coding variant in the CR1 gene with episodic memory decline (Keenan et al. 2012). SNPs previously associated with AD, white matter hyperintensity (WMH), and MRI-identified infarcts were also investigated (Mukherjee et al. 2012b). This study found 58 AD-related and 25 WMH- or infarct-related SNPs with odds ratios >1.5 or <0.67 in the amnestic subgroup of AD patients in contrast to the dysexecutive subgroup. Other targeted studies examined the relationship between SNPs in the COMT gene and the presence of apathy (David et al. 2011) and between circadian related SNPs and sleep–wake state (Yesavage et al. 2011). However, these studies did not observe any significant associations.

Analytical strategies

Univariate analysis and multiple testing issues

Univariate analysis is widely used in genetic association studies. Among 106 ADNI papers reviewed here, 72 performed univariate analyses (Table 1). Typical methods used in univariate analysis include Pearson’s chi-squared test as the standard case control allelic test, or linear regression of the phenotype on the allele dosage for quantitative trait analysis (Purcell et al. 2007). As appropriate, covariates are often included in these analyses to remove effects of confounding factors such as APOE status (e.g., Kim et al. (2011)), and different genetic models (e.g., additive, dominant, or recessive) are sometimes considered (e.g., Cruchaga et al. (2010)).
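As a minimal sketch of the case control allelic test mentioned above, the following Python function computes Pearson's chi-squared statistic (1 degree of freedom) on a 2 × 2 table of allele counts. The function name and the allele counts in the example are hypothetical and are not taken from any ADNI study; covariate adjustment and alternative genetic models are not handled here.

```python
import math


def allelic_chi2(case_alleles, control_alleles):
    """Pearson chi-squared test (1 df) on a 2x2 table of allele counts.

    Each argument is a pair (minor_allele_count, major_allele_count).
    Returns (chi2_statistic, p_value).
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("table has an empty margin")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-squared distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 100 cases and 100 controls, two alleles each.
chi2, p_value = allelic_chi2((60, 140), (40, 160))
```

For quantitative traits, the analogous univariate step would be a regression of the phenotype on allele dosage (0, 1 or 2), typically run once per SNP.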

Protecting against type I (false positive) errors is crucial here. Unbiased gene discovery methods raise important issues related to false positive findings because the number of associations tested can exceed one million. Classical statistical techniques were not designed for situations where the number of variables exceeds the number of participants by many orders of magnitude. The classical approach treats each association between a SNP and the phenotype as a “repeated” test, and argues that one should correct for the number of tests performed (e.g., with Bonferroni’s method, see critique by Gombar et al. (2012)). The widely used Bonferroni method is not ideal for this application, as it treats all the factors, the SNPs in this case, as if they were statistically independent of each other. This does not hold for the genome, or for typical imaging genetics studies, where most SNPs are correlated with other SNPs through linkage disequilibrium (LD), and the result is an overcorrection. Genome-wide thresholds of p < 10^−8 and Bonferroni corrections have been applied and questioned as either too conservative or too liberal. One alternative is permutation analysis, in which diagnostic labels and/or genotypes are randomly reassigned to estimate a null distribution of the test statistic (Collingridge 2013) (e.g., used in Stein et al. (2010b)). This has been used to identify adjusted thresholds for single SNPs and avoids the need to meet assumptions of normality, but it is highly computationally demanding and can be slow even on advanced supercomputers. The false discovery rate (FDR) (Benjamini and Hochberg 1995) (e.g., used in Biffi et al. (2010)) is an alternative to the family-wise error rate (FWER) (Hochberg and Tamhane 1987) and may be more appropriate to balance the complementary risks of false positives and false negatives. False negatives can be potentially more harmful than false positives, since a missed finding may be discarded forever, while a false positive can always be disproved or “falsified” later, albeit at some cost.
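To make the contrast between FWER and FDR control concrete, the sketch below applies a Bonferroni correction and the Benjamini-Hochberg step-up procedure to the same list of p-values. The function names and the ten p-values are hypothetical illustrations, and the Benjamini-Hochberg guarantee assumes independent or positively dependent tests, which is only an approximation for SNPs in LD.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H0 for every test with p <= alpha / m (controls the FWER)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from ten association tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni_reject(pvals))
n_fdr = sum(benjamini_hochberg_reject(pvals))
```

In this toy example the FDR procedure retains one more association than Bonferroni, which illustrates the trade-off between false positives and false negatives discussed above.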

Examples of ADNI studies using a univariate analysis strategy include those evaluating: (1) candidate SNP/gene analyses on disease status (Lakatos et al. 2010), imaging measures (Biffi et al. 2010), fluid biomarkers (Kauwe et al. 2011), and cognitive scores (Dickerson and Wolk 2011), and (2) GWAS analyses of disease status (Potkin et al. 2009a), imaging measures (Shen et al. 2010), fluid biomarkers (Kim et al. 2011), and cognitive scores (Mukherjee et al. 2012a). The “voxelwise GWAS” proposed in Stein et al. (2010a) is an extreme case of massive univariate analysis. It examined the association between each possible SNP and voxel pair, and retained the most highly associated genetic variant at each voxel, to reduce the computational burden associated with the data. After applying an inverse beta transform to define an appropriate multiple comparisons correction, this study identified no significant SNP-voxel associations but several promising variants warranting further investigation.
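The exact transform used by Stein et al. (2010a) is described in the source publication. As a rough illustration of why a beta distribution arises when only the top SNP is retained at each voxel: if n_eff independent tests are performed under the null hypothesis, the smallest p-value follows a Beta(1, n_eff) distribution, so its cumulative distribution function gives a corrected p-value. The sketch below assumes a known effective number of independent tests (the hypothetical parameter `n_eff`), which in practice must be estimated because SNPs are correlated.

```python
import math


def min_p_corrected(p_min, n_eff):
    """Corrected p-value for the smallest of n_eff independent null p-values.

    Equals the Beta(1, n_eff) CDF at p_min: 1 - (1 - p_min) ** n_eff,
    evaluated with log1p/expm1 for numerical stability at small p_min.
    """
    if not 0.0 <= p_min <= 1.0:
        raise ValueError("p_min must lie in [0, 1]")
    if n_eff <= 0:
        raise ValueError("n_eff must be positive")
    if p_min == 1.0:
        return 1.0
    return -math.expm1(n_eff * math.log1p(-p_min))


# Hypothetical example: best SNP p-value of 1e-6 among 100,000 effective tests.
corrected = min_p_corrected(1e-6, 100_000)
```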

Most genetic analyses of the ADNI dataset—and of brain image databases in general—have focused on the univariate strategy. This approach can quickly identify important single-SNP-single-trait associations, which are easy to interpret. However, this approach treats SNPs and traits as independent units, and overlooks relationships in which multiple SNPs jointly affect a single trait or multiple traits as well as pleiotropy, where one genetic locus influences multiple phenotypic traits. This may potentially lead to overly conservative statistical thresholds. To address these issues, multivariate analysis has also been employed in the ADNI genetics studies, and is discussed in the following section.

Multivariate analysis

To further boost the detection power, some ADNI studies have proposed or employed polygenic or multi-locus approaches, modeling the combined effect of multiple SNPs across the genome (Hibar et al. 2011a). For example, Biffi et al. (2010) calculated a cumulative score to represent the combined effect of a few non-APOE candidate SNPs, and identified its significant association with neuroimaging and clinical outcomes. Kohannim et al. (2011) and Kohannim et al. (2012a) used novel regression models to boost power to detect genetic associations with imaging phenotypes. In particular, they used LASSO regression to evaluate gene effects in a GWAS of the temporal lobe volume and discovered 22 genes that passed genome-wide significance. The effect of the MACROD2 gene was successfully replicated in an independent cohort (Kohannim et al. 2012b).
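A cumulative genetic score of the kind described above can be sketched as a weighted sum of risk-allele counts. The scoring details used by Biffi et al. (2010) are not given here, so the SNP identifiers, the weights (which could be, for example, log odds ratios from a prior study), and the function name below are all hypothetical.

```python
def cumulative_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP) for one participant.

    dosages: dict mapping SNP id -> number of risk alleles carried
    weights: dict mapping SNP id -> per-allele weight
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical participant and hypothetical weights.
participant = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
weights = {"snp_a": 0.5, "snp_b": 0.25, "snp_c": 0.25}
score = cumulative_risk_score(participant, weights)
```

The resulting score can then be tested against an imaging or clinical outcome as a single predictor, which avoids one test per SNP.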

Following the univariate “voxelwise GWAS” by Stein et al. (2010a), two subsequent papers (Hibar et al. 2011b, c) proposed “vGeneWAS”, a multivariate version of voxelwise GWAS. In “vGeneWAS”, the SNPs in each gene are first encoded using a principal components regression, and then combined effects of these SNPs are tested against image voxels in the brain. Another novel multivariate version of voxelwise GWAS was proposed by Ge et al. (2012), where they used a multi-locus model based on least squares kernel machines to associate the joint effect of several SNPs with neuroimaging traits.

To help find signals in datasets of vast dimension in both imaging and genomic domains, sparse regression methods have also been adapted by several research groups to handle imaging measures and identify a compact set of genetic predictors from a vast set of genotyped SNPs (reviewed in Hibar et al. (2011a)). Vounou et al. (2010) proposed a sparse reduced-rank regression (sRRR) method to detect whole genome-whole image associations, and detected simulated genetic effects introduced into a number of ROIs. Relative to standard univariate modeling, they generally obtained higher detection sensitivities with sRRR, and rapid gains in sensitivity as the sample size increased. In a later report, Vounou et al. (2012) used linear discriminant analysis to first find brain voxels in structural brain images that helped to classify individuals as AD or normal elderly. They then used this multivariate biomarker as a phenotype and performed sRRR analysis against the entire genome, which yielded promising results. A further refinement of sRRR was proposed by Silver and Montana (2012). Their method, known as pathways sparse reduced-rank regression (PsRRR), uses group lasso-penalized regression to jointly model the effects of genome-wide SNPs, but also groups them into functional pathways using prior knowledge of gene–gene interactions. Wang et al. (2012a) proposed a sparse multitask regression model that took into account the group structure within the predictors and demonstrated improved performance on predicting candidate AD MRI phenotypes using SNPs from 37 top AD risk genes. A relevant study by the same group proposed a novel task-correlated longitudinal sparse regression model and showed its promise for relating longitudinal MRI phenotypes to AD risk genes (Wang et al. 2012c).
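The sparse regression methods above share one core idea: an L1-type penalty that sets most coefficients exactly to zero. The sketch below shows this with a plain lasso fitted by cyclic coordinate descent. It is a generic illustration, not the sRRR, PsRRR, or multitask models of the cited papers, and the function names and toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimise (1 / 2n) * ||y - X b||^2 + penalty * ||b||_1.

    X is a list of rows (one per participant) of SNP predictors and
    y is the list of phenotype values. Returns the coefficient list b.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # constant-zero column carries no information
            # Covariance of column j with the partial residual
            # (column j's current contribution added back in).
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_beta = soft_threshold(rho, penalty) / col_scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data: three centred predictors, only the first has a strong effect.
x0 = [1, -1, 1, -1, 1, -1, 1, -1]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
x2 = [1, 1, 1, 1, -1, -1, -1, -1]
X = [list(row) for row in zip(x0, x1, x2)]
y = [2.0 * a + 0.1 * b for a, b in zip(x0, x1)]
beta = lasso_coordinate_descent(X, y, penalty=0.5)
```

With this penalty the weak and the null predictor are dropped entirely, while the strong predictor is kept with a coefficient shrunk towards zero, which is the behaviour that makes such models attractive when predictors vastly outnumber participants.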

Bi-multivariate correlation models have also been applied in ADNI genetics studies. Meda et al. (2012b) adapted parallel independent components analysis (pICA) to whole brain genome-wide analysis. A popular method for discerning functional networks in resting-state functional MRI scans, pICA identified several genetic components that were associated with an AD-relevant structural brain network. Wan et al. (2011) used elastic net (EN) and sparse canonical correlation analysis (sCCA) to examine SNP associations with hippocampal shape. Sparse regression and correlation methods outperformed standard multiple regression, suggesting the power and efficiency of sparse learning methods in imaging genetics.

A few other multivariate methods have also been used in ADNI genetics studies: (1) Reitz et al. (2012) performed haplotype analysis for relating FTO SNPs to diagnostic status, (2) Sabuncu et al. (2012) calculated a polygenic AD score and studied its association with cortical thickness, and (3) Swaminathan et al. (2012c) employed a candidate gene pathway set-based PLINK analysis to identify genes associated with amyloid imaging measures. In addition, many conventional and novel multivariate machine learning methods (including regression and/or classification) have been applied to predict APOE status (Soares et al. 2012), cognitive status (Wang et al. 2012b; Wolz et al. 2012), diagnostic status (Mattila et al. 2012; Wang et al. 2012b; Wolz et al. 2012), and/or disease conversion (Singh et al. 2012; Ye et al. 2012; Yu et al. 2012) from ADNI multimodal genetic and phenotypic measures.

These multivariate methods can provide boosted power to identify complex multi-SNP-multi-trait relationships that univariate approaches are unable to reveal. They are especially suitable for studying disorders with complex genetic and phenotypic structures, such as AD. On the other hand, given the complexity of the identified multi-SNP-multi-trait patterns, it remains a challenge to convert this information into biologically meaningful interpretation.

Meta-analysis

Meta-analysis of GWAS results has become a popular strategy for gene discovery and validation (Evangelou and Ioannidis 2013). Meta-analysis provides a means to quantitatively synthesize GWAS findings from multiple independent studies and has the potential to increase statistical power and reduce false positives. The AlzGene database (http://www.alzgene.org/) provides a collection of meta-analysis results of genetic association studies using AD risk as the primary phenotype in either population-based or family-based cohorts (Bertram et al. 2007). A number of case control studies combined ADNI results with results from multiple other cohorts and performed meta-analytic analyses, including candidate gene studies (Antunez et al. 2011b; Jun et al. 2010; Reitz et al. 2012), GWAS analyses (Antunez et al. 2011a; Hollingworth et al. 2011; Kamboh et al. 2012; Naj et al. 2011), as well as a copy number variation (CNV) study (Swaminathan et al. 2012a).

Meta-analysis has also been performed on several quantitative trait studies using ADNI multimodal phenotype data. The largest study was done by the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) consortium (http://enigma.loni.ucla.edu/) (Stein et al. 2012). This study identified common variants associated with human hippocampal and intracranial volumes. It discovered an intergenic variant rs7294919 associated with hippocampal volume (12q24.22; N = 21,151; P = 6.70 × 10^−16), an HMGA2 locus rs10784502 associated with intracranial volume (12q14.3; N = 15,782; P = 1.12 × 10^−12), and a suggestive association with total brain volume at rs10494373 within DDR2 (1q23.3; N = 6,500; P = 5.81 × 10^−7). Other meta-analysis studies where ADNI was included have been performed on findings from GWAS association studies of cognitive measures (Hu et al. 2011a), visual cortical surface area (Bakken et al. 2012), lentiform nucleus volume (Hibar et al. 2012), and hippocampal volume, total cranial volume, and white matter hyper-intensities (Melville et al. 2012), as well as on candidate gene (IL3) findings on brain volume, gray matter volume, and white matter volume (Luo et al. 2012).

Meta-analysis offers a quick and convenient approach to generalize results to a larger population, especially when a combined analysis is impractical among multiple sites (e.g., differences in imaging protocols, genotyping platforms, and/or data sharing policies). As more data is included, the statistical power will typically be increased to detect an effect. However, meta-analysis should be carefully conducted. If the study diversity is ignored (e.g., variability of participants, quality of data, potential for underlying biases), the increased statistical power might be lost and the meta-analysis results could be misleading.
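A minimal sketch of how per-cohort results can be pooled is the inverse-variance fixed-effect model, shown below together with Cochran's Q as a simple indicator of between-study heterogeneity. This is a generic textbook approach and is not claimed to be the procedure of any study cited above; the function name and the example effect sizes are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis of one SNP.

    betas: per-study effect estimates for the same allele
    standard_errors: matching per-study standard errors
    Returns the pooled estimate, its standard error, the z score,
    a two-sided p-value and Cochran's Q heterogeneity statistic.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    # Large Q relative to (number of studies - 1) suggests heterogeneity.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return {"beta": pooled, "se": pooled_se, "z": z, "p": p_value, "q": q}


# Hypothetical effect sizes from two cohorts with equal precision.
result = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
```

When Q is large, a random-effects model or a closer look at cohort differences (imaging protocol, genotyping platform, ascertainment) is usually warranted before interpreting the pooled estimate.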

Pathway analysis

Pathway analysis (or gene set enrichment analysis) tests for collective effects among multiple variants within the same biological pathway and represents an emerging strategy that may better connect genetic associations to biological interpretations (Cantor et al. 2010; Hirschhorn 2009; Ramanan et al. 2012b; Wang et al. 2010). A typical pathway enrichment analysis includes the following steps: (1) select one or more pathways to be tested in a hypothesis-driven approach (e.g., Sloan et al. 2010) or test all pathways in a hypothesis-free, discovery-oriented approach; (2) select one or more appropriate knowledge bases to delineate the genes in the pathway(s) or to score gene relatedness; (3) assign SNPs to a gene or genetic region; (4) use a pathway scoring system to score each pathway or to identify the most implicated gene in a region; (5) use a statistical framework (e.g., permutation as implemented in GENGEN or bootstrapping as implemented in ALIGATOR (Holmans et al. 2009), among others) to test for enrichment of association within a pathway. The diversity of extant strategies for implementing these steps can impact the overall pathway enrichment outcome. Some algorithms (such as GENGEN) require individual genotype and phenotype data, while others only require summary association statistics. For example, text mining tools such as GRAIL (Raychaudhuri et al. 2009) and Chilibot (Chen and Sharp 2004) typically query a list of genes or SNPs to identify functional relationships based on published literature. Other quantitative methods such as GENGEN (Wang et al. 2009), GSA-SNP (Nam et al. 2010), the SNP ratio test (O’Dushlaine et al. 2009), and over-representation algorithms in the Ingenuity (http://www.ingenuity.com/) and MetaCore (http://portal.genego.com/) commercial software packages typically use rankings or other summary statistics from genetic studies to test for trends of association across multiple genes.
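The final enrichment step can be illustrated with the simplest over-representation test, a one-sided hypergeometric test on gene counts. This sketch is not the permutation or bootstrap framework of GENGEN or ALIGATOR, and it ignores confounders such as LD and gene size; the function name and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_hits, n_overlap):
    """One-sided hypergeometric over-representation test.

    n_background: genes that could have been selected
    n_pathway:    background genes annotated to the pathway
    n_hits:       genes selected as associated
    n_overlap:    selected genes that fall in the pathway
    Returns P(X >= n_overlap) when n_hits genes are drawn at random.
    """
    upper = min(n_pathway, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 associated genes of which 2 are pathway members.
p_value = enrichment_p_value(20, 5, 4, 2)
```

Because many pathways are usually tested, the resulting p-values would themselves need a multiple testing correction such as FDR.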

Existing pathway analysis methods have a variety of strengths and limitations and can be more or less conservative in relation to statistical confounders such as LD and pathway/gene size (Ramanan et al. 2012b). Nevertheless, pathway-based approaches can have increased power to discover functional relationships that are concealed at the level of individual SNPs or genes (Ramanan et al. 2012b; Wang et al. 2010). In addition, pathways occupied by robustly-associated genes can highlight additional genes with more modest effects on susceptibility but which may provide better targets for biomarker and drug development (Penrod et al. 2011). Finally, pathways may be more likely to bridge differences in genotyping platforms, phenotype processing, and the distribution of common and rare variants across populations, which can impair replication at the SNP or gene level.

Several pathway-based strategies have been successfully applied to ADNI data and reinforce these potential benefits. For example, a pathway analysis of CSF APOE levels identified lipid metabolism processes as enriched with genetic association even after removing APOE SNPs from the analysis, further supporting the hypothesis that lipid-related genes other than APOE contribute to AD neuropathology (Cruchaga et al. 2012). Similarly, despite finding no genome-wide significant associations at the SNP level after controlling for APOE ε4 allele status, a pathway analysis of episodic memory impairment found enrichment of association in numerous pathways, including those related to neurotransmission, calcium and cAMP response element binding protein (CREB) signaling, long-term potentiation, cell adhesion, and inflammation (Ramanan et al. 2012a). Further, Hu et al. (2011a) used GENGEN to demonstrate replication of association for the Gleevec signaling pathway in three independent case control data analyses including ADNI. These results provide striking complements to those obtained via standard methods focusing on individual genetic variants.

Pathways can also be used to focus analyses of smaller samples with rich endophenotypes such as amyloid imaging (Swaminathan et al. 2012c), to find common threads across multimodal phenotype analyses (Meda et al. 2012b), or to constrain the search space for computationally demanding approaches such as gene–gene interaction analyses (Meda et al. 2012a). The ability to prioritize pathways of interest may be particularly important to maximize the utility of next-generation sequencing (NGS) data, given its high granularity and facility for studying common, rare, and structural genetic variation that is more difficult to assess using GWAS alone.

Interaction and network analysis

Most reviewed studies examined the main effect of genetic predictors, but some studies investigated whether SNP-by-SNP interaction or SNP-by-phenotype interaction can affect various ADNI phenotypes. These interaction analyses may explain part of the “missing heritability” in AD. Examples of ADNI SNP-by-phenotype interaction results include the effect of multiple SNP-by-diagnosis interactions (e.g., MAGI2, EFNA5) on hippocampal volume (Potkin et al. 2009a), association between the interaction of a CR1 coding variant and APOE ε4 status and episodic memory decline (Keenan et al. 2012), an interaction between CR1 and APOE which was associated with brain amyloid phenotype (Thambisetty et al. 2012), an association between an interaction among APOE ε4 vs CSF p-tau/Aβ and entorhinal cortex atrophy rate (Desikan et al. 2012), and biomarker trajectory shapes as functions of MMSE that were affected by interactions with age and APOE status (Jack et al. 2012). In addition, two studies performed SNP-by-SNP interaction analyses. Kauwe et al. (2010a) demonstrated that an epistatic interaction between TF and HFE variants was significantly associated with AD risk. Meda et al. (2012a) studied gene–gene interactions within biological pathways associated with 12-month atrophy rate in regional brain volume from the hippocampus and entorhinal cortex, and identified 109 SNP–SNP interactions for right hippocampal atrophy and 125 for right entorhinal cortex atrophy. Enrichment analysis indicated significant SNP–SNP interactions were over-represented in the calcium signaling and axon guidance pathways for both hippocampal and entorhinal cortex atrophy, as well as in the ErbB signaling pathway for hippocampal atrophy. This study also constructed gene–gene interaction networks for entorhinal and hippocampal atrophy, respectively.

Interaction analyses have the potential to explain part of the “missing heritability” in AD. However, given the exponential nature of the search space, identifying high order interactions is not only computationally intensive but also statistically challenging. Developing effective strategies to address these challenges is still an active research topic in bioinformatics (Pan et al. 2013).
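The size of the search space can be made concrete by counting the number of distinct SNP combinations at each interaction order. The sketch below does this with binomial coefficients; the figure of 500,000 SNPs is a hypothetical round number used only for illustration, and the function name is likewise an assumption.

```python
from math import comb


def n_interaction_tests(n_snps, order):
    """Number of distinct SNP sets of the given size (order 2 = pairwise)."""
    return comb(n_snps, order)


# Hypothetical genome-wide panel of 500,000 SNPs.
n_pairs = n_interaction_tests(500_000, 2)
n_triples = n_interaction_tests(500_000, 3)
# A Bonferroni threshold for an exhaustive pairwise scan at alpha = 0.05.
pairwise_threshold = 0.05 / n_pairs
```

Restricting the scan to SNPs within selected pathways, as in Meda et al. (2012a), reduces `n_snps` and therefore shrinks both the computation and the multiple testing burden.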

Biological findings, interpretation and validation

Major findings and implicated biological pathways

Shown in Supplemental Table 1 is a summary of major findings using ADNI genetics data, compiled according to the following criteria: (1) Only SNP-based and gene-based findings are included; (2) Pathway analysis results are not included; (3) Gene–gene interaction results (e.g., Meda et al. (2012a) and Kauwe et al. (2010a)) are not included; (4) If SNPs or genes are mentioned in the abstract, we include results only from these SNPs or genes (to prevent the table from growing too large); (5) If no SNP or gene is mentioned in the abstract, we include findings satisfying one of the following conditions: uncorrected p ≤ 0.00001 for GWAS studies, uncorrected p ≤ 0.01 for candidate SNP/gene studies, or corrected p ≤ 0.05 if no uncorrected p-values are available. Criterion 5 was determined with the following considerations: (a) most studies reported uncorrected p values; (b) some studies reported corrected p values with a variety of different correction schemes, making it hard for comparison; (c) p value thresholds were determined not based on statistical significance but to include a reasonable number of top hits from each paper. Of note, the FRMD6 gene (Fig. 2) was not included in this table. Although it was identified in each of multiple studies (Furney et al. 2011; Potkin et al. 2009a; Stein et al. 2010a) with uncorrected p ≤ 0.00001, the abstracts of these studies highlighted other more significant findings. The gene names in this table are either from the corresponding papers or extracted from SCAN (http://www.scandb.org/, a SNP and CNV Annotation Database) by querying the corresponding SNPs.
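Criterion 5 can be expressed as a small decision rule. The sketch below encodes only that criterion, for papers whose abstracts name no SNP or gene; the function name, argument names, and the study type labels "gwas" and "candidate" are hypothetical conventions chosen for this example.

```python
def include_finding(study_type, p_uncorrected=None, p_corrected=None):
    """Apply criterion 5 to a single reported association.

    study_type: "gwas" or "candidate" (SNP/gene study)
    p_uncorrected: uncorrected p-value, or None if not reported
    p_corrected: corrected p-value, used only when no uncorrected value exists
    """
    if study_type not in ("gwas", "candidate"):
        raise ValueError("study_type must be 'gwas' or 'candidate'")
    if p_uncorrected is not None:
        threshold = 0.00001 if study_type == "gwas" else 0.01
        return p_uncorrected <= threshold
    return p_corrected is not None and p_corrected <= 0.05


# Hypothetical reported associations.
keep_gwas_hit = include_finding("gwas", p_uncorrected=5e-6)
keep_candidate_hit = include_finding("candidate", p_uncorrected=0.02)
```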

Several points regarding the sample size (N) of these studies are noteworthy. Almost all these studies analyzed the ADNI Phase 1 (ADNI-1) cohort, where the GWAS data were available on 818 ADNI-1 subjects, including 229 HC, 396 MCI and 193 AD participants at baseline. In Supplemental Table 1, we specified which ADNI groups (HC, MCI, and/or AD) were included in the analyses; see the “cohort” column. After standard quality control (QC) and population stratification procedures for GWAS data, the total QC’ed sample typically included approximately 750 or fewer Caucasian participants. For some QT analyses, the phenotype data were available only on a subset of participants, e.g., half of ADNI-1 participants with CSF biomarker data, half with FDG-PET, and ~100 with PiB-PET. Therefore, in some cases the QT analysis with these phenotypes had a reduced sample size. Some studies treated MCI converters as AD patients, and thus the AD sample size could be increased and MCI decreased when the longitudinal diagnosis information was used. Some studies coupled the ADNI cohort with other cohorts for combined analysis or meta-analysis, which could result in a larger N. We refer readers to the source publications for the actual N.

The top 10 AD genes (APOE, BIN1, CLU, ABCA7, CR1, PICALM, MS4A6A, CD33, MS4A4E, CD2AP) based on large case control GWAS (Hollingworth et al. 2011; Naj et al. 2011) listed in the AlzGene database (http://www.alzgene.org/) were all discovered or replicated in these ADNI genetics studies and are included in Supplemental Table 1. In particular, several of these top hits were associated with ADNI quantitative phenotypes. For example, the APOE ε4 allele was associated with CSF Aβ42 and tau/Aβ42 (Kim et al. 2011), annual percent change of hippocampal volume (Saykin et al. 2010), total cerebral volume and hippocampal volume (Melville et al. 2012), entorhinal cortex thickness, hippocampal volume and temporal pole cortex thickness (Biffi et al. 2010), and multiple MRI measures (Shen et al. 2010). BIN1 was associated with entorhinal cortex thickness (Biffi et al. 2010). CLU was associated with CSF Aβ42 and p-tau (Kauwe et al. 2011). CR1 was associated with brain amyloid burden (Thambisetty et al. 2012) and episodic memory decline in healthy and cognitively impaired elders (Keenan et al. 2012). Several PICALM variants were associated with entorhinal cortex thickness (Biffi et al. 2010; Furney et al. 2011; Saykin et al. 2010) and CSF p-tau (Kauwe et al. 2011).

In addition to these top AD genes, several other genes were associated with different quantitative traits and discovered in multiple ADNI studies. For example, an APOC1 variant was associated with age-at-onset (Kamboh et al. 2011) and with rate of cognitive decline in elders with mixed phenotypes (De Jager et al. 2012), while variants in SORL1 were associated with rate of decline in patients with AD. Different FTO variants were found to be associated with MRI phenotypes (Ho et al. 2010) and with disease status (Reitz et al. 2012). GRIN2B was associated with MRI phenotypes in multiple studies using different methods (Ge et al. 2012; Kohannim et al. 2012a; Stein et al. 2010b). MAGI2 was associated with hippocampal volume (Potkin et al. 2009a) and temporal lobe volume (Kohannim et al. 2012a). TOMM40 was found to be associated with disease status (Naj et al. 2010; Potkin et al. 2009a), with MRI traits (Saykin et al. 2010; Shen et al. 2010), and with CSF biomarkers (Cruchaga et al. 2012; Kim et al. 2011; Schott 2012). Overall, 14 genes (APOC1, APOE, BIN1, CD2AP, CLU, CR1, EPHA1, FTO, GRIN2B, MAGI2, MS4A4A, PICALM, TOMM40) have been replicated by at least two groups and in some cases (APOE, PICALM, TOMM40) by up to seven independent analyses. TOMM40 and APOC1 signals are in strong LD with APOE and are therefore often considered to be indicating the APOE locus (Jun et al. 2012), though the topic of multiple signals from this key region remains under investigation.

To better understand the broader functional implications of these findings, we performed pathway enrichment analysis using the MetaCore software package (http://portal.genego.com/). We mapped all the 101 unique hit genes from ADNI papers, which are listed in Supplemental Table 1, to 124 MetaCore objects. Top enrichment results are shown in Table 2. Significantly enriched results (FDR p ≤ 0.05, see also Supplemental Fig. 1) by canonical pathway maps, which represent a set of signaling and metabolic maps covering human biology in a comprehensive way, include “cell adhesion, ephrin signaling”, “neurophysiological process, nNOS signaling in neuronal synapses”, “neurophysiological process, NMDA-dependent postsynaptic long-term potentiation in CA1 hippocampal neurons”, “immune response, alternative complement pathway”, and “development, neurotrophin family signaling”. Significantly enriched results (FDR p ≤ 0.05) by cellular and molecular process networks include “development, neurogenesis, axonal guidance”, “cell adhesion, synaptic contact”, “development, regulation of angiogenesis”, “cell adhesion, attractive and repulsive receptors”, and “development, neurogenesis, synaptogenesis”. The top 5 enriched results (FDR p ≤ 1.4 × 10^−24) by diseases include “Alzheimer’s disease”, “tauopathies”, “mental disorders”, “late onset Alzheimer’s disease”, and “psychiatry and psychology”. Some of these pathways, including those related to cell adhesion, complement activation and immune responses, and neurotrophic signaling, reinforce extant leading hypotheses about the pathogenesis of AD (Crehan et al. 2012; Liu et al. 2012; Rubio-Perez and Morillas-Ruiz 2012; Wyss-Coray and Rogers 2012). Meanwhile, other pathways such as those related to nitric oxide signaling and angiogenesis may represent underexplored or novel targets for future genetic, molecular, and pharmacologic studies.

Table 2 MetaCore pathway enrichment analysis results: 101 unique hit genes from ADNI papers are mapped to 124 MetaCore objects

As a complementary approach, we also performed a network analysis using Chilibot (Chen and Sharp 2004) on 51 genes discovered from ADNI structural MRI genetic studies. Chilibot (http://www.chilibot.net/) searches PubMed abstracts and constructs content-rich relationship networks among biological concepts, genes, proteins, or drugs. No relationship was reported for five genes: BICD1, CAND1, GPCPD1, MAD2L2, and PRUNE2. We further filtered the graph by displaying only interactive relationships (i.e., excluding non-interactive relationships and relationships based on abstract co-occurrence only). Figure 5 shows the resulting graph, containing 46 (51 − 5) query terms, which provides an aerial view of the genes associated with ADNI structural MRI phenotypes. It also shows the stimulatory and/or inhibitory relationships among these genes, as well as how many PubMed abstracts support each of these relationships. The findings from MetaCore and Chilibot synthesize key disease-related mechanisms highlighted by the wealth of ADNI genetic studies and provide novel targets for further analyses.

Fig. 5

Chilibot analysis on 51 genes discovered from ADNI structural MRI genetic studies. Chilibot (http://www.chilibot.net/) searches PubMed abstracts and constructs content-rich relationship networks among biological concepts, genes, proteins, or drugs. We ran a Chilibot query using 51 genes discovered from ADNI structural MRI genetic studies. No relationship was reported for five genes: BICD1, CAND1, GPCPD1, MAD2L2, and PRUNE2. We further filtered the graph by displaying only interactive relationships (i.e., excluding non-interactive relationships and relationships based on abstract co-occurrence only). Shown here is the resulting graph, containing 51 − 5 = 46 query terms. Note that the 15 isolated genes shown at the bottom were included in the figure because non-interactive or co-occurrence-only relationships (not shown) exist between them and some query genes. They are shown here as isolated units because no interactive relationship connects them to other genes.

Molecular validation and other follow-ups

Genetic studies provide potential insights into novel disease mechanisms. The Aβ hypothesis was first derived from pathological evidence and is further supported by genetic evidence in early-onset familial Alzheimer’s disease. In late-onset Alzheimer’s disease, approximately 10 loci have been robustly associated with AD susceptibility and other intermediate endpoints (reviewed above). Molecular characterizations of these findings have started to emerge, beyond APOE, which was discovered two decades ago. Recent studies suggested that the PICALM gene may be involved in Aβ transport across the blood–brain barrier and Aβ internalization (Baig et al. 2010; Xiao et al. 2012). CR1 has also drawn considerable attention since potential functional variations of the gene have been proposed either through structural variation (Brouwers et al. 2012) or a coding variant that may be associated with episodic memory decline (Keenan et al. 2012). Molecular characterization of this gene is challenging as a single gene in the mouse encodes CR1 and CR2 while two separate genes are present in humans (Jacobson and Weis 2008).

Each of the additional genetic loci from GWAS defines a region rather than pinpointing a causal gene or variant, which can limit the scope of the molecular characterization. Initial regional sequencing and eQTL studies for CLU, PICALM, CR1, BIN1, MS4A6A/MS4A4E, CD33, CD2AP, ABCA7, and EPHA1 have not successfully identified common variants explaining the regional expressions of the genes in the brain or additional coding variants responsible for the observed association signal (Holton et al. 2013). Future studies will likely require larger sample sizes and more comprehensive characterizations of the RNA species in the brain regions (e.g., RNA sequencing). Other novel promising technologies to follow up genetic loci include induced pluripotent stem cell models that carry the genetic background from the patient.

Discussion and future directions

Summary of analytical strategies and biological findings

We have provided a systematic review of 106 papers published between 2009 and 2012, which analyzed ADNI APOE and GWAS data (Fig. 1, Table 1). We presented an overview of genetic findings from case control studies, as well as association analyses of multi-modal quantitative phenotypes including structural neuroimaging, functional neuroimaging, fluid biomarkers, and cognitive performance. We also reviewed a variety of analytical strategies used in these studies, including univariate and multivariate analysis, meta-analysis, pathway analysis, and interaction and network analysis.

We summarized the major findings in Supplemental Table 1, which includes all of the top 10 AD genes from the AlzGene database. In particular, several of these top AD genes (e.g., APOE, BIN1, CLU, CR1, and PICALM) were corroborated by ADNI imaging, fluid, and cognitive phenotypes. ADNI imaging genetics studies also discovered novel findings (e.g., FRMD6) that were later replicated on different data sets. Several genes (e.g., APOC1, FTO, GRIN2B, MAGI2, and TOMM40) were discovered to be associated with multiple phenotypes by multiple studies using different approaches, and these genes warrant further investigation and replication on other data sets.

Pathway and network analyses of these findings implicated multiple interesting biological pathways as well as stimulatory and/or inhibitory relationships among these genes. It is noteworthy that the majority of mapped pathways pointed to neurodevelopmental rather than neurodegenerative processes. In addition to standard SNP-based association studies, ADNI genetics data have also been used in several other applications, including a comparative study of genotype imputation (Nho et al. 2011), copy number variation analyses (Swaminathan et al. 2011, 2012a, b), and modeling of disease progression (Samtani et al. 2012).

These genetics research accomplishments using ADNI data have indicated the strong potential power of multidimensional quantitative phenotypic data for identification of novel genetic variants and for investigation of disease mechanisms. This should deepen our understanding of the biological pathways involved in the disease trajectory and cognitive decline. The availability of ADNI genetic data and multidimensional phenotypes from multiple imaging and biomarker modalities has enabled new discovery of promising candidate genetic variants which may serve as targets for development of disease-modifying agents or lead to enhanced diagnostic techniques and biomarkers.

Strengths and limitations of QT analyses

Compared to case control status, quantitative traits can offer increased statistical power and thus require smaller samples. This has been demonstrated by the discovery of the FRMD6 gene (Fig. 2) mentioned earlier: the gene was identified in three ADNI imaging genetics studies (Furney et al. 2011; Potkin et al. 2009a; Stein et al. 2010a) with modest sample sizes (N ≤ 1,004) and later validated by a case control GWAS with a much larger sample (N > 12,500; Hong et al. 2012). The samples studied in most QT analyses reviewed here are much smaller than those used in large scale GWAS of AD, yet these ADNI QT analyses not only identified associations with top AD risk genes but also discovered potential novel AD loci, which further illustrates the increased statistical power of QT analyses over case control analyses. Another advantage of using QTs as intermediate phenotypes is that this strategy may be more sensitive than diagnostic categories in clarifying the functional links related to susceptibility genes. Continuous measures may more fully capture participants who have subsyndromal changes in pathologic markers of AD but do not meet MCI or AD diagnostic criteria, making it easier to identify these genes.
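
The power argument can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not an analysis from any of the reviewed studies: the sample size, minor allele frequency, and effect size are hypothetical, the "case" label is simply the upper half of the simulated trait, and the function names (row_corr, simulate_power) are introduced here for illustration. It shows only the information lost by dichotomizing a continuous trait; it does not model case control ascertainment.

```python
import numpy as np

CHI2_1DF_CRIT_05 = 3.841  # chi-square (1 df) critical value at alpha = 0.05


def row_corr(x, y):
    """Pearson correlation between matching rows of two 2-D arrays."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    num = (xc * yc).sum(axis=1)
    den = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    return num / den


def simulate_power(n=500, maf=0.3, beta=0.2, n_sims=2000, seed=0):
    """Estimate power for a quantitative trait vs. its dichotomized version.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    # Additive genotype coding (0, 1, or 2 copies of the minor allele).
    geno = rng.binomial(2, maf, size=(n_sims, n)).astype(float)
    # Quantitative trait: additive SNP effect plus standard normal noise.
    trait = beta * geno + rng.standard_normal((n_sims, n))
    # "Cases" are the upper half of the trait distribution in each replicate.
    status = (trait > np.median(trait, axis=1, keepdims=True)).astype(float)
    # n * r^2 is a 1-df trend (score) statistic for either outcome type.
    stat_qt = n * row_corr(geno, trait) ** 2
    stat_cc = n * row_corr(geno, status) ** 2
    power_qt = (stat_qt > CHI2_1DF_CRIT_05).mean()
    power_cc = (stat_cc > CHI2_1DF_CRIT_05).mean()
    return power_qt, power_cc


power_qt, power_cc = simulate_power()
print(f"power, quantitative trait: {power_qt:.2f}")
print(f"power, dichotomized trait: {power_cc:.2f}")
```

Under these assumptions the quantitative analysis rejects the null more often than the dichotomized one on the same simulated data, which is the general statistical point made above; the size of the gap depends on the assumed effect size and threshold.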

Quantitative phenotypes in theory may achieve better statistical power than disease susceptibility analysis. The analysis, however, can be greatly influenced by the relative sensitivities and accuracies of different phenotypes (e.g., the 12-item ADAS-cog may not be as sensitive to differences in cognition in prodromal AD). Investigators have detected significant genetic markers from relatively modest sample sets based on quantitative traits (e.g., Hu et al. 2011b), yet the additional requirements in characterizing phenotypes (e.g., longitudinal quantitative traits) have also made replication studies more challenging. Improved statistical methods that minimize false positives while increasing sensitivity warrant further investigation.

Next generation sequencing and convergent “omics”

The ultimate goal of any genetic study of disease is to improve the treatment of the disease. This end can be achieved most directly through new therapeutics, but also via enhanced diagnostics and biomarkers for the improved targeting of patients and for use in therapeutic development. An increased understanding of the genetic basis of any disease should lead to the development of novel treatment options; of the roughly 500 human genes that have been successfully used as drug targets, ~50 % are also linked to human diseases (Wang et al. 2012d). This is substantially higher than the proportion of human genes in the genome that have been linked to disease (roughly 11 %). Although it is impossible to definitively prove the reason for this association, the most likely explanation is that in most cases a gene known to be causal for any given disease is (almost by definition) in a pathway possibly suitable for phenotypic modification to ameliorate the disease. Before this can be attempted, the gene’s “druggability”, or how easily a drug against that gene class can be synthesized, needs to be assessed.

Next generation sequencing (NGS) holds promise in this area, especially when used in conjunction with GWAS data, and ideally will assist in overcoming some of the limitations of traditional GWAS approaches. Specifically, GWAS arrays were designed to identify genetically homogeneous small genomic regions and/or haplotypes (as opposed to individual genes). This fact is not always appreciated by non-geneticists and has undoubtedly led to incorrect assumptions concerning which genes have been implicated by certain GWAS. Upon completion of a GWAS, significant SNPs are frequently identified as “hits” which are then assigned to the closest gene. This strategy does not consider the fact that the SNP in question may well be in tight linkage with another SNP or genetic variant several to hundreds of kb away (Christoforou et al. 2012). Students of the history of AD genetics research may be less likely to make this error, as the association of APOE with AD serves as a case study for this phenomenon. The APOE ε4 allele has unequivocally been linked to Alzheimer’s disease in over 100 studies (Bertram et al. 2007); however, in large AD GWAS the most significant SNPs have frequently been located physically closer to other genes (APOC1 in Naj et al. 2011; TOMM40 and PVRL2 in Harold et al. 2009) than to APOE (though in tight linkage with the APOE ε4 allele). Jun et al. (2012) concluded “APOE alleles ε2, ε3, and ε4 account for essentially all the inherited risk of AD associated with this region. Other variants including a poly-T track in TOMM40 are not independent risk or AAO (Age At Onset) loci”. The correct assignment of SNPs to genes remains an issue if linkage is not taken into account during GWAS analysis (Christoforou et al. 2012). Imputation can lessen this effect to some extent, but cannot completely remove it. Additional resolution is afforded by analyses probing for association while conditioning on genotypes at known disease-associated variants.
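
The logic of a conditional analysis can be sketched with simulated data. This is an illustrative example only: the two variants, their allele frequencies, the degree of linkage, and the effect size are all hypothetical, and ols_t is a helper name introduced here. A "proxy" SNP with no effect of its own is simulated in tight linkage with a causal variant; it appears associated with the trait in a marginal test, but the signal disappears once the causal genotype is included as a covariate.

```python
import numpy as np


def ols_t(y, *predictors):
    """Ordinary least squares t-statistics for each predictor (intercept excluded)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return (beta / se)[1:]


rng = np.random.default_rng(1)
n = 2000  # hypothetical sample size

# Two haplotypes per person; the causal allele has frequency 0.25 (assumed).
hap_causal = rng.random((n, 2)) < 0.25
# The proxy allele matches the causal allele on 90% of haplotypes (assumed),
# which places the two variants in tight linkage.
flip = rng.random((n, 2)) < 0.10
hap_proxy = hap_causal ^ flip

causal = hap_causal.sum(axis=1).astype(float)
proxy = hap_proxy.sum(axis=1).astype(float)

# Only the causal variant influences the trait.
trait = 0.5 * causal + rng.standard_normal(n)

t_marginal = ols_t(trait, proxy)[0]             # proxy tested alone
t_conditional = ols_t(trait, proxy, causal)[0]  # proxy tested given causal

print(f"proxy SNP, marginal t:    {t_marginal:.2f}")
print(f"proxy SNP, conditional t: {t_conditional:.2f}")
```

In this sketch the marginal statistic for the proxy is large while the conditional statistic is close to zero, mirroring the situation described for SNPs near APOE that do not carry independent risk.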

NGS (either whole genome sequencing or whole exome sequencing) alleviates, to some extent, the difficulty of pinpointing the actual causal variants, and hopefully will identify the next generation of drug targets for AD and other diseases. There is good reason to be optimistic about the use of NGS in Alzheimer’s disease and specifically ADNI, based on some early successes. The genetic profile of the ideal drug target is relatively straightforward: loss-of-function and nonsense variants in the gene would correlate perfectly with protection from the disease in question (this is because it has historically been easier to make antagonists of a gene than agonists, but the reverse would hold true for gene classes for which agonists can be developed). The population frequency of the protective allele is less relevant; extremely rare alleles can identify drug targets in the correct pathway as well as common ones. A recent example (Jonsson et al. 2012a) comes from the analysis of whole genome sequencing (WGS) of roughly 1,800 Icelanders. Here, the A673T allele in the APP gene appears to confer a roughly 4- to 8-fold reduction in the risk of developing AD, possibly through a reduction in the production of Aβ via less efficient cleavage of APP by BACE1. Assuming these results are verified/replicated, they would strongly suggest that BACE1 inhibition would alter the progression of AD if given at an early time point. In a similar example, whole exome/whole genome sequencing in large numbers of individuals led to the identification of the R47H variant in TREM2 as a risk factor for late-onset AD (Guerreiro et al. 2013; Jonsson et al. 2012b). The identification of TREM2 as an AD risk gene is notable in two respects. First, with an odds ratio in the range of 3–5 over normal controls, if confirmed it may be among the largest effect sizes detected for any AD risk factor since the discovery of APOE ε4 in 1993. Second, because TREM2 is an immune receptor expressed on microglia and linked to inflammation in the brain, the finding suggests new directions (or at least the elevation of certain candidate pathways) for compound development. Regardless of whether either of these findings directly leads to a therapy for AD, it is becoming clear that the application of NGS has already provided new insights into the pathophysiology of AD and other neurodegenerative diseases. Early applications of whole exome sequencing in an extreme phenotype design in ADNI-1 data have indicated variants associated with rate of progression of hippocampal volume loss (Nho et al. 2013a, b).

The use of NGS data in conjunction with the ADNI multimodal imaging, fluid and clinical phenotypes will likely be extremely informative. While the discovery of a new risk factor is interesting, the question as to whether the disease is the same in carriers and non-carriers remains open. Is the progression rate identical? Do the relative levels of disease progression biomarkers differ? Do they change at different rates with disease? And equally importantly, are there other genetic factors that affect biomarker levels/activity even if they do not affect the risk of developing the disease? Controlling for variants such as these will be critical during the design/implementation of phase II and III clinical trials. One can argue that it would have been nearly impossible to detect these variants (let alone know the extent of their effect on biomarkers) without the combined analysis/intersection of NGS data with ADNI imaging, fluid and/or cognitive biomarker (and/or detailed clinical) progression information. Indeed, the most common genetic risk factor for AD, the APOE ε4 allele, increases not only the risk of AD but also the rate of AD progression, both in terms of cognitive decline (e.g., Stone et al. 2010) and temporal lobe atrophy (e.g., Hua et al. 2010; Risacher et al. 2010; Saykin et al. 2010). This makes APOE genotyping critical in clinical trials of AD progression in order to control for variance among individuals over time. Further understanding of the genetic variants influencing imaging, fluid and/or cognitive biomarker levels and change will also increase the power and accuracy of clinical trials.

The first, pioneering “Big Data” project for AD has been launched to enable scientists to analyze the entire genome sequence of over 800 individuals enrolled in ADNI-GO/2. Sequencing and initial QC have been completed and data release is expected by fall of 2013. This ADNI whole genome sequencing project, coupled with very rich multimodal ADNI phenotypes, is expected to provide important new insights into the genetic mechanisms of Alzheimer’s disease, which in turn will impact the development of new diagnostic, therapeutic and preventive approaches.

Application to clinical drug development

Use of genetic information within clinical drug development primarily depends on the effect size and predictive value of the marker(s) for the associated endpoint. To date, no genetic factor beyond the APOE genotype has provided meaningful application to the design and interpretation of clinical trials for dementia and Alzheimer’s disease therapies. While AD case control genetic studies provide drug developers with new targets and pathways to pursue, the risk variants, with their small individual effect sizes, may not be as useful as APOE for direct application within clinical studies. However, genetic studies of quantitative phenotypes and the convergence of genetics with other biomarker endpoints are providing opportunities for new application of genetics to clinical trial design beyond and in addition to APOE. Potential uses of genetic biomarkers in combination with other biomarker and clinical data include trial design enrichment (enrolling mostly or entirely biomarker-positive subjects), prospective or post-hoc stratification (reducing or correcting for heterogeneity), and/or balancing or tailoring strategies. A recent finding from ADNI, the first GWAS of amyloid PET (Ramanan et al. 2013), confirmed the role of APOE in amyloid burden on [18F]florbetapir PET but also detected a genome-wide significant role of the BCHE gene that codes for butyrylcholinesterase, long studied as a constituent of plaque. Together these loci accounted for 15 % of the variance in amyloid deposition.

As the paradigm of patient selection for AD-modifying candidate therapies shifts to an enrichment of the MCI population, drug developers work to reduce costs of clinical trials by aiming to demonstrate response over a shorter duration and with fewer subjects. Many factors challenge the success of this development model. Most stem from a lack of accepted diagnostic and prognostic biomarkers of disease. The work of several groups (Singh et al. 2012; Wolz et al. 2012; Yu et al. 2012) demonstrates that use of novel statistical and modeling approaches for stratification based on composite biomarker and cognitive rating scales can remove unwanted heterogeneity and will likely improve statistical power within the enrolled cohort. This type of approach can provide a powerful tool for prospective enrichment or stratification in clinical trials. To further increase predictive or prognostic power of these algorithms, inclusion of additional genetic biomarkers may be warranted. For example, if the markers identified in age-at-onset studies can be replicated in a prospective cohort, can these markers add additional predictive power beyond APOE for MCI to AD progression (Kamboh et al. 2011; Roses et al. 2010)? And while the effect of APOE on observed variability in rate of AD progression is at least in part driven by carrier status of the APOE ε4 allele, if the remaining heritability is identified will it be able to increase power in models of prognosis (Potkin et al. 2009b, d; Stone et al. 2010)? If validated, such models could help select a clinical trial population with less heterogeneity, decreasing trial duration and size.
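
The link between reduced heterogeneity and trial size can be made concrete with the standard normal-approximation sample size formula for comparing two means. The sketch below is illustrative only: the effect size, outcome standard deviation, and the fraction of outcome variance explained by baseline markers (r2) are hypothetical values, not estimates from ADNI or from any trial, and n_per_arm is a helper name introduced here.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, r2=0.0, alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-arm comparison of means.

    delta : treatment difference to detect (same units as the outcome)
    sd    : outcome standard deviation before any adjustment
    r2    : fraction of outcome variance removed by stratifying on, or
            adjusting for, baseline markers (hypothetical input)
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    residual_var = sd ** 2 * (1 - r2)
    return ceil(2 * z_sum ** 2 * residual_var / delta ** 2)


# Hypothetical scenario: detect a 0.5 SD difference with 80% power.
n_unadjusted = n_per_arm(delta=0.5, sd=1.0)
# Same scenario if baseline markers explained 20% of outcome variance.
n_adjusted = n_per_arm(delta=0.5, sd=1.0, r2=0.20)

print(n_unadjusted, n_adjusted)
```

Because the required sample size scales with the residual variance, any marker set that explains a fraction r2 of outcome variability reduces the approximate sample size by that same fraction, all else being equal.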

Additionally, there is opportunity to incorporate genetic findings associated with quantitative phenotypes into analyses within early phase clinical development. Most early phase programs use markers of pharmacodynamic response and target engagement/specificity to demonstrate pharmacologic response, including CSF, plasma, and imaging-based biomarkers. Genetic markers identified as being associated with longitudinal changes in these biomarkers could be assessed within the clinical programs and used as covariates in analyses for post-hoc stratification or as enrichment criteria, decreasing phenotypic variability within the trial cohort.

Among the most desired uses of genetic biomarker information is the prospective definition of enhanced pharmacologic response, that is, use as a predictive pharmacogenetic marker. No disease-modifying agent has yet made it into the marketplace, but recent reports have shown hints of efficacy in subgroups of individuals treated with drug candidates. APOE has been assessed in most of these clinical trials, but it alone has not shown correlation with subgroups of responders vs. non-responders. It would be valuable to drug development programs to determine whether additional biomarkers, including genetic markers, are associated with response profiles. A list of plausible candidate genes could easily be assembled from the output of studies of molecular heterogeneity, disease pathways, endophenotypes, and quantitative phenotypes, and could serve as testable hypotheses in an analysis of drug response within phase II testing. These markers could then be prospectively defined and tested for validity in phase III clinical trials, potentially defining which individuals could optimally be prescribed the drug.