Background

Late-onset Alzheimer’s disease (AD), which occurs after age 65, is the most common type of dementia and is highly heritable, with heritability estimated at 60% to 80% [1]. Common single-nucleotide polymorphisms (SNPs) explain 24% to 33% of the total phenotypic variance of AD [2,3,4], of which up to 6% is accounted for by APOE [3]. More than 75 loci affecting AD risk have been identified in several large-scale genome-wide association studies (GWAS) [5,6,7,8,9,10], but much of the underlying genetic architecture of AD remains unknown [11].

Although GWAS conducted in larger samples will undoubtedly reveal additional AD loci, understanding of the genetic influence on AD risk can be improved by examining the association with endophenotypes that potentially highlight specific pathways underlying the complex disease phenotype. Previously, numerous genetic associations have been identified for AD-related endophenotypes, such as cognitive performance [12,13,14,15], brain imaging traits [16,17,18], neuropathological traits [19,20,21], and biomarkers measured in cerebrospinal fluid [22,23,24]. GWAS have found several loci for general cognitive ability [13, 25], but most findings for specific cognitive domains are not genome-wide significant (GWS), inconsistent, and rarely replicated in independent datasets, perhaps because of the variability in neuropsychological (NP) tests administered across cohorts [26,27,28,29]. To address this concern, Mukherjee and colleagues applied confirmatory factor analysis models to co-calibrate and harmonize composite scores for several cognitive domains. The scores obtained are on the same scale, making them comparable to each other regardless of the NP protocol [15, 30].

Here, we conducted a GWAS for cognitive scores of three domains derived from longitudinal, prospectively collected NP tests administered to participants of several large cohort studies. The statistical power for detecting associations with endophenotypes can be increased by studying outcomes of multiple correlated traits under a model of pleiotropy—a phenomenon where a single gene or variant affects multiple phenotypes [31,32,33,34]. This approach has successfully identified novel associations for neuropathological processes in AD [35,36,37]. Because measures of cognitive performance are highly heritable and correlated with each other [38,39,40], they are well suited as outcomes for cross-phenotype genetic association studies. Therefore, we also tested pleiotropy models for each pair of the three cognitive domains to identify novel loci which may be involved in AD.

Methods

Participants

This study included non-Hispanic white participants of the Framingham Heart Study (FHS), National Institute on Aging-sponsored Alzheimer’s Disease Research Centers whose phenotypic information was assembled and curated by the National Alzheimer’s Coordinating Center (NACC), the Adult Changes in Thought (ACT) Study, the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and the Religious Orders Study/Rush Memory and Aging Project (ROSMAP). Briefly, FHS is a long-running multi-generation community-based study of cardiovascular disease and other age-related disorders [41,42,43], including cognitive decline and dementia [44, 45]. ACT and ROSMAP are also community-based cohorts that recruited unrelated cognitively normal participants who are followed longitudinally for cognitive disorders [46, 47]. Participants of NACC and ADNI were clinically ascertained for AD research and were cognitively normal or met the criteria for mild cognitive impairment (MCI) or AD at the time of enrollment [48,49,50,51,52]. Extensive cognitive testing of participants in all cohorts is conducted at all visits. Details regarding the ascertainment, evaluation, and diagnosis of members of these five cohorts were reported elsewhere [45,46,47,48, 52].

Cognitive domain scores

Scores for executive function, language, and memory domains were derived as previously described [15, 30, 53]. Scores were co-calibrated to put them on the same scale regardless of the cognitive battery administered. Briefly, an expert panel of neuropsychologists (EHT, AJS) and a behavioral neurologist (JBM) assigned each NP test item to one of the three domains. Confirmatory factor analysis in Mplus [54] was used for co-calibration. Cognitive data from the most recent visit were first used to derive scores for each domain, with each domain modeled separately. Test items administered in multiple cohorts functioned as anchors for co-calibration. Parameters for anchor items were constrained to be the same across studies to put scores across studies on the same metric. Within each study, multiple models (including single-factor and bifactor models) were considered, with the choice of model based on a combination of model fit and concordance with neuropsychological theory. Next, each study’s item parameters from calibration of data at the last visit were fixed and used to obtain scores for each person at each time point. Co-calibrated cognitive scores with a standard error (SE) > 0.6 or derived solely from the mini-mental state examination (MMSE), which has a ceiling effect, were excluded. Time points before age 60 were available only for FHS and were excluded because of concern that cognitive performance under age 60 may have a different genetic architecture that would be captured in only a single study.

Genotype data processing

We obtained genome-wide SNP data that were processed and imputed using the Trans-Omics for Precision Medicine (TOPMed) reference panel and aligned to Genome Reference Consortium human build 38 (GRCh38) [10, 55]. Variants with poor imputation quality (r2 < 0.3), minor allele frequencies (MAF) < 0.01, call rates < 95%, and Hardy-Weinberg Equilibrium (HWE) test p-value < 1 × 10–6 were excluded, and approximately nine million variants remained for each cohort after quality control (QC). Principal components (PCs) of population structure were generated for individuals within each cohort using the set of post-QC variants that were pruned on the basis of a linkage disequilibrium (LD) threshold of 0.1 using the R package GENESIS [56]. Measures of relatedness, kinship coefficients for family-based samples and empirical identity by descent (IBD) in the other samples, were estimated using established procedures [57,58,59].
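The variant-level exclusion criteria above can be written as a simple filter. The following is a minimal Python sketch, not the pipeline used in this study; the `Variant` record, its field names, and the example identifiers are hypothetical, and only the four thresholds are taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    # Hypothetical per-variant summary record; field names are illustrative.
    variant_id: str
    imputation_r2: float
    maf: float
    call_rate: float
    hwe_p: float


# Thresholds stated in the text: variants falling below any of them are excluded.
MIN_R2 = 0.3
MIN_MAF = 0.01
MIN_CALL_RATE = 0.95
MIN_HWE_P = 1e-6


def passes_qc(v: Variant) -> bool:
    """Return True if the variant survives all four QC filters."""
    return (
        v.imputation_r2 >= MIN_R2
        and v.maf >= MIN_MAF
        and v.call_rate >= MIN_CALL_RATE
        and v.hwe_p >= MIN_HWE_P
    )


def filter_variants(variants: List[Variant]) -> List[Variant]:
    """Keep only the variants that pass QC, preserving input order."""
    return [v for v in variants if passes_qc(v)]


# Made-up example records (not real data).
example = [
    Variant("var_ok", 0.95, 0.20, 0.99, 0.50),
    Variant("var_low_r2", 0.25, 0.20, 0.99, 0.50),
    Variant("var_rare", 0.95, 0.005, 0.99, 0.50),
    Variant("var_hwe_fail", 0.95, 0.20, 0.99, 1e-8),
]
kept_ids = [v.variant_id for v in filter_variants(example)]
```

In practice such filters are usually applied with dedicated genetics software rather than in-memory records; the sketch only makes the direction of each inequality explicit.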

Genetic and phenotypic correlation estimation

Genetic correlations between each pair of cognitive domain scores (executive function, language, and memory) were estimated in each cohort using GREML [60, 61]. The kinship matrix derived from self-reported FHS pedigrees was incorporated in the estimates using the kinship2 package [62]. We used the empirical genetic relationship matrix (GRM) to account for relatedness among individuals in the other cohorts. To concurrently investigate SNP associations with both performance at the median age and change in cognitive function over time in each domain, we applied a joint test of the marginal genetic effects and gene × age interaction together in a generalized linear mixed model framework with a random slope and intercept as implemented in the mixed‐model association test for gene-environment interactions (MAGEE) R package [63, 64]. Models included terms for SNP, the interaction between SNP and age, and covariates for age, sex, educational level (less than high school, high school, some college, or college graduate), and the first five PCs represented as follows:

$$\text{factor score}=\alpha_{A}\,\text{age}+\alpha_{B}\,\text{sex}+\alpha_{C}\,\text{education}+\beta_{G}\,\text{SNP}+\gamma_{X}\left(\text{SNP}\times \text{age}\right)+\sum\limits_{i=1}^{5}\alpha_{i}\,\text{PC}_{i}+r$$

where αA, αB, and αC indicate the effects of age, sex, and educational level, respectively; βG is the main SNP effect; γX represents the SNP × age interaction effect; αi is the effect of the ith PC (i between 1 and 5); and r is a random intercept. To center age, we subtracted the median age across all observations for all individuals in the dataset from the individual’s age at each exam, so that the intercept and the SNP main effect refer to the expected outcome at the median age in each dataset. Models also incorporated the GRM as a random effect.
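To illustrate what a joint test of βG and γX evaluates, the sketch below computes a 2-degree-of-freedom Wald-type statistic from the two estimates and their covariance. This is an illustration under stated assumptions, not the MAGEE implementation, which may compute its joint statistic differently; the function name and the example numbers are hypothetical.

```python
import math
from typing import Tuple


def joint_wald_test(
    beta_g: float,
    gamma_x: float,
    var_g: float,
    var_x: float,
    cov_gx: float,
) -> Tuple[float, float]:
    """Joint test of H0: beta_g = 0 and gamma_x = 0.

    The statistic is b' V^{-1} b, where b = (beta_g, gamma_x) and V is the
    2 x 2 covariance matrix of the two estimates. Under H0 it follows a
    chi-square distribution with 2 degrees of freedom, whose upper tail
    probability is exp(-x / 2).
    """
    det = var_g * var_x - cov_gx ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    stat = (
        beta_g ** 2 * var_x
        - 2.0 * beta_g * gamma_x * cov_gx
        + gamma_x ** 2 * var_g
    ) / det
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Made-up estimates: main effect 0.2 (SE 0.1), interaction 0.1 (SE 0.05),
# with no covariance between them.
stat, p_value = joint_wald_test(0.2, 0.1, 0.01, 0.0025, 0.0)
```

A small joint p-value alone does not show which term drives the signal, which is why the main and interaction estimates are also examined separately.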

Cross-trait analyses

We performed cross-trait LD score regression [65] to estimate genetic correlations across general cognitive function, cognitive domain scores, and neuropsychiatric disorders. We used GWAS summary statistics for general cognitive function (n = 300,486) [13], cognitive factor scores (n = 23,066) from the current study, and neuropsychiatric disorders from the Pan-UK Biobank, namely AD, bipolar disorder, schizophrenia (n = 420,531), and depression (n = 370,457), together with LD scores derived from the 1000 Genomes Project (phase 3) European samples. We included only the 4,815,014 variants with imputation quality r2 > 0.6 and MAF > 0.01 in cross-trait analyses.

Genetic association analyses

Analyses were performed in each dataset separately, and the GWAS results were combined across datasets by meta-analyses. To correct systematic inflation in a joint test of the SNP’s main and interaction effects [66, 67], we applied the joint meta-analysis method [68] which considers the covariance between the main and interaction effects, and the inverse variance weighted approach in METAL [69]. Meta-analyses were performed for each cognitive domain in the total sample, clinic-based cohorts (NACC and ADNI), and community-based cohorts (FHS, ACT, and ROSMAP) separately (Fig. S1). Results for the clinic- and community-based cohorts were considered separately because some associations might be unique to one of the cohort groups due to disparity in age or proportion of participants with AD. The genomic inflation factor (λ) was calculated for each GWAS and applied to adjust p-values for each test. A GWS threshold was set at P = 5 × 10–8.
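The idea of combining main and interaction estimates across cohorts while respecting their covariance can be sketched as a multivariate inverse-variance weighted average. The code below is a simplified illustration in plain Python, not the METAL implementation; the function names and inputs are hypothetical. It also shows one way a genomic inflation factor could be computed for a 2-degree-of-freedom statistic, which is an assumption because the text does not specify how λ was derived.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]
Mat = Sequence[Sequence[float]]


def inv2(m: Mat) -> Tuple[Vec, Vec]:
    """Invert a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return ((d / det, -b / det), (-c / det, a / det))


def joint_meta(cohorts: List[Tuple[Vec, Mat]]):
    """Combine (main, interaction) estimates across cohorts.

    Each cohort supplies b_k = (beta_G, gamma_X) and its 2 x 2 covariance V_k.
    The pooled estimate is (sum V_k^-1)^-1 * sum(V_k^-1 b_k), and the joint
    statistic b' V^-1 b is referred to a chi-square with 2 df.
    """
    w_sum = [[0.0, 0.0], [0.0, 0.0]]
    wb_sum = [0.0, 0.0]
    for (b1, b2), cov in cohorts:
        w = inv2(cov)
        for i in range(2):
            for j in range(2):
                w_sum[i][j] += w[i][j]
        wb_sum[0] += w[0][0] * b1 + w[0][1] * b2
        wb_sum[1] += w[1][0] * b1 + w[1][1] * b2
    v = inv2(w_sum)
    beta = (
        v[0][0] * wb_sum[0] + v[0][1] * wb_sum[1],
        v[1][0] * wb_sum[0] + v[1][1] * wb_sum[1],
    )
    # The pooled precision matrix w_sum equals V^-1, so stat = beta' w_sum beta.
    stat = beta[0] * (w_sum[0][0] * beta[0] + w_sum[0][1] * beta[1]) + beta[1] * (
        w_sum[1][0] * beta[0] + w_sum[1][1] * beta[1]
    )
    return beta, v, stat, math.exp(-stat / 2.0)


def genomic_lambda_2df(stats: List[float]) -> float:
    """Observed median statistic divided by the chi-square(2) median, 2 ln 2."""
    return statistics.median(stats) / (2.0 * math.log(2.0))


# Two made-up cohorts with identical estimates and covariances.
cohort = ((0.2, 0.1), ((0.02, 0.0), (0.0, 0.005)))
beta, v, stat, p_value = joint_meta([cohort, cohort])
```

With two identical cohorts the pooled estimate is unchanged and its covariance is halved, which is the expected behavior of inverse-variance weighting.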

Genome-wide pleiotropy analyses

We conducted a pleiotropy GWAS for each pair of cognitive domains in the total sample, clinic-based cohorts, and community-based cohorts using the pooled GWAS results from the joint meta-analysis (Fig. S1) and the R package PLACO [70, 71]. Because rejecting the global null hypothesis that neither phenotype is associated does not specifically imply the existence of pleiotropy, PLACO tests the composite null hypothesis that no more than one phenotype is associated with a variant. Thus, rejecting the composite null hypothesis implies that both phenotypes are associated with the variant, i.e., pleiotropy. The test statistic is the product of the Z-statistics for the association of a given variant with each of the two traits. The null distribution of the test statistic takes the form of a mixture distribution that allows for the variant to be associated with none or only one of the traits. Variants with squared Z-scores > 80 for one trait were removed because they could cause spurious pleiotropic signals [72, 73]. Because correlations between the Z-statistics for the association between a variant and the two traits can result in inflated type I errors [74], we adjusted for the Pearson correlation between the Z-statistics, estimated from variants with no apparent effect (P > 1 × 10–4), as suggested by the developers of the method.
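The preprocessing steps described above (removing extreme Z-scores, estimating the correlation from apparently null variants, decorrelating, and forming the product statistic) can be sketched as follows. This is a simplified Python illustration rather than the PLACO R package: the function names are hypothetical, the decorrelation by the inverse square root of the 2 × 2 correlation matrix is an assumption about how the adjustment works, and the mixture null distribution needed for p-values is deliberately omitted.

```python
import math
from statistics import NormalDist
from typing import List, Tuple

Z2_MAX = 80.0   # variants with Z^2 above this for either trait are dropped
NULL_P = 1e-4   # variants with P above this for both traits are treated as null


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value for a Z-statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def product_statistics(z1: List[float], z2: List[float]) -> Tuple[float, List[float]]:
    """Return the null correlation and the products of decorrelated Z pairs."""
    # Step 1: drop variants with an extreme Z-score for either trait.
    kept = [(a, b) for a, b in zip(z1, z2) if a * a <= Z2_MAX and b * b <= Z2_MAX]
    # Step 2: estimate the correlation from variants that look null for both traits.
    null = [(a, b) for a, b in kept
            if two_sided_p(a) > NULL_P and two_sided_p(b) > NULL_P]
    r = pearson([a for a, _ in null], [b for _, b in null])
    # Step 3: multiply by R^{-1/2}, where R = [[1, r], [r, 1]].
    # R has eigenvalues 1 + r and 1 - r, which gives the closed form below.
    d = 0.5 * (1.0 / math.sqrt(1.0 + r) + 1.0 / math.sqrt(1.0 - r))
    o = 0.5 * (1.0 / math.sqrt(1.0 + r) - 1.0 / math.sqrt(1.0 - r))
    decorrelated = [(d * a + o * b, o * a + d * b) for a, b in kept]
    # Step 4: the product of the two decorrelated Z-statistics per variant.
    return r, [a * b for a, b in decorrelated]


# Made-up Z-scores; the last variant is removed because 10^2 > 80.
z_trait1 = [1.0, -1.0, 0.5, -0.5, 10.0]
z_trait2 = [1.0, -1.0, -0.5, 0.5, 2.0]
r_null, products = product_statistics(z_trait1, z_trait2)
```

Converting the product statistic into a p-value requires the mixture null distribution described in the text, so that step is best left to the PLACO package itself.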

Pathway enrichment analyses

We performed several pathway analyses, each of which was seeded with genes containing variants associated with a single cognitive domain or pleiotropy for paired domains (P < 1 × 10–4) in the respective GWAS, using the Ingenuity Pathway Analysis software (QIAGEN Inc.) [75]. Enrichment p-values for each canonical pathway were adjusted for a false discovery rate (FDR) using the Benjamini-Hochberg method [76], and an FDR-adjusted P-value threshold was set at 0.001 to account for the 18 separate pathway analyses (six single and paired domains multiplied by three sample strata).
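The Benjamini-Hochberg adjustment and the 0.001 threshold described above can be sketched as follows. This is a generic Python illustration of the procedure, not the Ingenuity Pathway Analysis implementation; the function names and pathway labels are hypothetical.

```python
from typing import Dict, List

FDR_THRESHOLD = 0.001  # threshold on the FDR-adjusted p-value, as stated in the text


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_pathways(pathway_p: Dict[str, float]) -> List[str]:
    """Names of pathways whose adjusted p-value is below the threshold."""
    names = list(pathway_p)
    adjusted = benjamini_hochberg([pathway_p[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q < FDR_THRESHOLD]


# Made-up enrichment p-values for four hypothetical pathways.
adjusted_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
hits = significant_pathways(
    {"pathway_a": 1e-6, "pathway_b": 0.2, "pathway_c": 0.0004, "pathway_d": 0.5}
)
```

The adjustment is applied within each analysis, and the stricter 0.001 cutoff on the adjusted values is what accounts for running 18 analyses in total.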

Results

Cognitive domains are phenotypically and genetically correlated

Compared to the community-based cohorts, the clinic-based cohorts had more males, younger and better-educated participants, and a higher proportion of participants who were diagnosed with MCI or AD (Table 1). Even though the mean and median ages at the last visits of individuals in the clinic-based cohorts were slightly lower than those in the community-based cohorts, scores for executive function and memory were significantly lower (P < 0.001), and the language score was significantly higher (P < 0.001) in the clinic-based cohorts. Phenotypic and genetic correlations for each pair of factor scores in each dataset were moderate to high (phenotypic r = 0.56–0.86, genetic r = 0.57–0.72) (Table S1). Most phenotypic and genetic correlations were higher for ROSMAP compared to the other cohorts. Cross-trait analyses revealed that factor scores for all three cognitive domains are significantly genetically correlated with general cognitive function (0.51 ≤ r ≤ 0.77) (Table S2). Although none of the traits were significantly correlated with AD or other psychiatric disorders, the language domain score was moderately correlated with depression (r = 0.60, P = 0.11) and the memory domain score was strongly correlated with AD (r = 0.90, P = 0.40). Lack of significance for these results may be due to insufficient power for genetic correlations with dichotomous outcomes.

Table 1 Characteristics of study participants

GWAS identifies multiple established AD and novel loci associated with individual cognitive domains

There was little evidence of genomic inflation (λ = 1.006–1.023) in the GWAS for each cognitive domain and sample stratum (Figs. S2, S3 and S4). GWS associations were observed for many SNPs in the APOE region for all traits (Table S3). We also identified associations with several other established AD loci (Table 2). BIN1 SNP rs6733839 was associated with language (PJoint = 2.70 × 10–8) and memory (PJoint = 2.37 × 10–9) in the total sample and with both language (PJoint = 1.98 × 10–9) and memory (PJoint = 1.60 × 10–8) in the clinic-based cohorts. The significant joint effect of rs6733839 is due primarily to the SNP’s main effect rather than its interaction with age and is supported by multiple adjacent variants (Figs. S5 and S6). GWS associations for memory were also observed with CR1 SNP rs1752684 (PJoint = 8.85 × 10–9) and MS4A6A SNP rs7232 (PJoint = 3.97 × 10–8) in the clinic-based cohorts, findings which were supported by adjacent SNPs (Fig. S7). Similar to BIN1, results of the joint test of the main and interaction effects for rs1752684 and rs7232 reflect the SNPs’ main effects. GWS associations were also detected with SNPs in four additional loci: ULK2 (rs157405, PJoint = 2.19 × 10–9) with executive function in the community-based cohorts, CDK14 (rs705353, PJoint = 1.73 × 10–8) with language in the clinic-based cohorts, PURG (rs117523305, PJoint = 1.73 × 10–8) with memory in the community-based cohorts, and LINC02712 (rs145012974, PJoint = 3.66 × 10–8) with language in the total sample (Fig. 1). We also identified a GWS association of memory with GRN (rs5848, PJoint = 4.21 × 10–8) in the total sample (Fig. 1). Unlike the associations with the other known AD loci, the interactions of the ULK2 (PG×Age = 7.65 × 10–7), CDK14 (PG×Age = 2.54 × 10–9), PURG (PG×Age = 1.41 × 10–8), LINC02712 (PG×Age = 7.69 × 10–9), and GRN (PG×Age = 1.07 × 10–6) SNPs with age accounted for the significant joint test findings (Table 2).

Table 2 Genome-wide significant associations for cognitive domain scores
Fig. 1

Locus Zoom plots showing the association of SNPs in the regions of novel loci with cognitive domains. The SNP with the lowest p-value at each locus is indicated with a purple diamond. Computed estimates of linkage disequilibrium (r2) of SNPs in the region with top-ranked SNP are color-coded according to the key. Vertical blue lines indicate locations of high recombination rates. Locations of genes in the region are shown below the diagram. a Association of rs157405 with executive function in the community-based cohorts. b Association of rs705353 with language in the clinic-based cohorts. c Association of rs117523305 with memory in the community-based cohorts. d Association of rs145012974 with language in the total sample. e Association of rs5848 with memory in the total sample

The APOE region contained many variants significantly associated with all the cognitive domains in all cohort groupings (Table S3). The high LD between these variants suggests that there are not multiple independent association signals, a conclusion supported by evidence that no genes in this region other than APOE account for the observed association with AD risk or onset age [77]. Focusing on the APOE SNPs encoding the ε4 (rs429358) and ε2 (rs7412) alleles, the ε4 SNP was significantly associated with lower (worse) scores for all cognitive domains in an age-dependent manner, based on the negative sign of βG×Age. Notably, the magnitude of the effect of the ε4 SNP on memory was approximately 1.7 times greater than on executive function or language in the total sample at the median age (Table 2). In the same sample, the effect of interaction between ε4 and age was 1.5–1.9 times larger for memory compared to executive function or language. Conversely, the ε2 SNP was significantly associated with higher (better) cognitive domain scores in the clinic-based cohorts at the median age with some limited age-dependent effect (Table 2).

Numerous highly suggestive associations (P < 1 × 10–6), including several that were nearly GWS (P < 1 × 10–7), were found for individual cognitive domains with other loci (Table S3). Notably, language was associated with 18 ADCY2 SNPs in the community-based cohorts (top SNP: rs7734697, PJoint = 6.34 × 10–8) and with 19 DAPK2 SNPs in the clinic-based cohorts (top SNP: rs112972763, PJoint = 7.76 × 10–8). Six PLXDC2 SNPs were associated with executive function in the total sample (top SNP: rs7083449, PJoint = 7.02 × 10–8), and most of the evidence was derived from the community-based cohorts.

Genome-wide pleiotropy analysis identifies the association of cognitive domains with the progranulin gene and four novel loci

GWAS for the three pairs of cognitive domains collectively identified GWS evidence of association with SNPs in five independent loci (Table 3, Table S4) with little evidence of genomic inflation in the total sample or separately within the clinic-based and community-based cohorts (λ = 0.964–0.994, Figs. S8, S9 and S10). Consistent with the findings from analyses of individual cognitive domains, GWS evidence of pleiotropy was found for the association of the APOE region SNPs with all cognitive domain pairs in the total sample and the clinic-based and community-based cohorts (Table S4). GWS pleiotropy was also observed with rs6733839, located between BIN1 and CYP27C1, in the total sample (PJoint = 9.01 × 10–12) and the clinic-based cohorts (PJoint = 6.85 × 10–10) for language and memory (Table 3). The association with rs6733839 was evident in the community-based cohorts (PJoint = 2.52 × 10–4), strengthened in the total sample (PJoint = 9.01 × 10–12), well supported by association with neighboring variants (Fig. S11), and attributable primarily to its main effect for each domain (Table 3).

Table 3 Genome-wide significant pleiotropic loci for each pair of cognitive domains (excluding the APOE region)

In the clinic-based cohorts, there was GWS pleiotropy for language and memory with rs73005629 (PJoint = 3.12 × 10–8) located in an intergenic region on chromosome 4 (Table 3, Fig. 2). The joint effect of rs73005629 on language (PJoint = 4.66 × 10–7) and memory (PJoint = 1.47 × 10–7) was equally attributable to its main and interaction effects. There was no evidence of pleiotropy for rs73005629 in the community-based cohorts. Conversely, significant pleiotropy for the same domain pair was observed with NCALD SNP rs56162098 (PJoint = 1.23 × 10–9) and PTPRD SNP rs145989094 (PJoint = 8.34 × 10–9) in the community-based cohorts (Table 3). The association with rs56162098 was not evident in the clinic-based cohorts but was supported by the association with neighboring variants (Fig. 2). The same PTPRD SNP was also pleiotropic for executive function and memory (PJoint = 3.85 × 10–8), but this association is not supported by findings in the clinic-based cohorts (Table 3) or neighboring SNPs (Fig. 2).

Fig. 2

Locus Zoom plots showing genome-wide significant pleiotropy for SNPs in the regions of novel loci. The SNP with the lowest p-value at each locus is indicated with a purple diamond. Computed estimates of linkage disequilibrium (r2) of SNPs in the region with top-ranked SNP are color-coded according to the key. Vertical blue lines indicate locations of high recombination rates. Locations of genes in the region are shown below the diagram. a Association of rs12447050 with executive function and memory in the community-based cohorts. b Association of rs56162098 with language and memory in the community-based cohorts. c Association of rs145989094 with executive function and memory in the community-based cohorts. d Association of rs145989094 with language and memory in the community-based cohorts. e Association of rs73005629 with language and memory in the clinic-based cohorts

We also identified significant pleiotropy in the community-based cohorts for executive function and memory with rs12447050, located 5.5 kb upstream from OSGIN1 (PJoint = 4.09 × 10–8). This association was comparably supported by each domain and the SNP’s main effect and interaction with age (Table 3), as well as by neighboring SNPs (Fig. 2). There was no evidence of association with the individual domains or in the pleiotropy model in the clinic-based cohorts. However, the magnitude of effect for rs12447050 and its interaction with age in each domain, as well as the significance levels for the main, interaction, and joint pleiotropy tests in the community-based cohorts and the total sample, were nearly identical (Table 3).

Highly suggestive pleiotropy was observed in the community-based cohorts with two SNPs (rs7081658 and rs7070729) located in the USP6NL/ECHDC3 region, an established AD risk locus, for executive function and language (PJoint = 3.54 × 10–7 and PJoint = 8.76 × 10–8, respectively) and for executive function and memory (PJoint = 2.11 × 10–7 and PJoint = 5.09 × 10–8, respectively); rs7070729 was also pleiotropic for language and memory (PJoint = 7.76 × 10–7) (Table S4). There was also suggestive pleiotropy for executive function and language with two SNPs in the AD risk locus WWOX (rs13329990, PJoint = 8.45 × 10–7; rs11862902, PJoint = 9.60 × 10–7).

Pathways involved in neuronal development or signaling, vascular and endocrine systems are related to cognitive domain performance

A total of 28 canonical pathways were significantly enriched for loci associated with pleiotropy for paired domains (Table 4), noting that none of these pathways were specific to the clinic-based cohorts, and no significant pathways were identified in analyses seeded with top-ranked genes in the GWAS for individual cognitive domains. The evidence for approximately 60% (17/28) of these pathways was derived from analyses of the community-based cohorts only. The top-ranked pathway, synaptogenesis signaling, was significantly enriched for genes that emerged from pleiotropy analysis for all three pairs of cognitive domains. Several pathways are related to neuronal development or signaling (e.g., synaptogenesis signaling, synaptic long-term depression, endocannabinoid neuronal synapse, netrin signaling, GABA and glutamate receptor signaling, and calcium signaling), AD-associated vascular risk factors (e.g., type II diabetes and maturity onset diabetes of young signaling, insulin secretion signaling, dilated cardiomyopathy and cardiac hypertrophy signaling, and nitric oxide signaling in the cardiovascular system), and the endocrine system (e.g., G protein-coupled receptor-mediated nutrient sensing in enteroendocrine cells, insulin, corticotropin-releasing hormone, gonadotropin-releasing hormone, androgen, and oxytocin signaling). Details for suggestive pathways (FDR-adjusted P < 0.05) and the number of seed genes selected from each GWAS and pleiotropy analysis are summarized in Tables S5 and S6, respectively.

Table 4 Canonical pathways significantly enriched for top-ranked GWAS genes

Discussion

Genome-wide scans for performance measures in three cognitive domains in two large clinically ascertained and three community-based cohorts revealed GWS associations with four well-established AD loci (BIN1, CR1, MS4A6A, and APOE) and eight loci not previously genetically linked to AD or cognitive decline (ULK2, CDK14, PURG, LINC02712, LOC107984373, NCALD, PTPRD, and OSGIN1), as well as with GRN, which has been associated with AD and several other dementing illnesses [7, 10, 78,79,80]. These findings were based on analyses that leveraged data obtained from one or more cognitive examinations, considered cognitive performance changes over time, and examined genetic effects on individual or pairs of domains. In comparison to previous GWAS of cognitive performance, which were limited by the availability of data for particular NP tests and focused primarily on clinic-based or community-based samples [26,27,28,29], our study utilized harmonized measures that enabled pooling data obtained using multiple NP protocols and considered associations that may be common or unique to differentially ascertained samples.

To our knowledge, this is the first genome-wide pleiotropy study using harmonized cognitive domain scores. Compared to previous conventional GWAS or pleiotropy studies of individual cognitive traits [26,27,28,29], our genetic analysis of harmonized cognitive scores allows combining results from studies using different NP protocols and permits greater opportunities for replication and meta-analyses. This approach has been successfully used in a variety of studies of cognitive aging [81,82,83,84]. A recent study of five preclinical AD cohorts conducted a factor analysis on three domains—general cognitive performance, episodic memory, and executive function—and established a common algorithm for classifying MCI progression across the heterogeneously evaluated samples [85]. Cognitive factor scores derived in an identical fashion as those used in this study have been utilized for a variety of investigations of AD subgroups [81], which linked cognition to imaging [83], neuropathology [84], and genetics [15]. Similarly, they have been used in genetic studies of cognitive resilience to AD [82, 86].

We identified three novel loci that have functional relevance to processes implicated in AD. Dysfunction of the protein encoded by ULK2, unc51 like autophagy activating kinase 2, has been suggested to cause multiple diseases. ULK2 SNPs have been associated with schizophrenia [87], and a ULK2 circular RNA is expressed more than tenfold in a vascular dementia rat model [88]. Lee and colleagues recently demonstrated that amyloid-β 42 oligomer-mediated loss of excitatory synapses in cortical neurons and hippocampal CA1 neurons requires AMPK-mediated activation of ULK2-dependent mitophagy [89]. PTPRD, protein tyrosine phosphatase receptor type D, was previously reported to be associated with AD susceptibility [90]. A recent study identified a significant association of PTPRD with the accumulation of neurofibrillary tangles that was independent of amyloid-β pathology [20]. NCALD encodes a member of the neuronal calcium sensor family of calcium-binding proteins, which mediates signal transduction in response to calcium in neurons. NCALD is downregulated in the AD brain and may play a protective role in hippocampal CA1 and CA3 regions [91, 92]. This observation is consistent with a finding from a study of differentially expressed proteins in rats fed a high-fat diet suggesting that the memory-impairing effects of diet-induced obesity might potentially be mediated by down-regulated NCALD within the hippocampus [93].

We also identified a GWS signal in GRN, the gene that encodes the anti-inflammatory and neurotrophic factor progranulin (PGRN) [94]. GRN mutations are a well-established cause of frontotemporal lobar degeneration (FTLD). More than 60 disease-causing GRN mutations have been identified, accounting for 20% to 25% of familial FTLD cases and about 10% of all FTLD cases [95]. The most significantly associated GRN SNP in our study, rs5848, was found in the 3’-untranslated region, which is predicted to be a microRNA binding site. Rs5848 is the GRN variant most frequently associated with FTLD and is associated with a reduction in PGRN in plasma and cerebrospinal fluid [96, 97]. In addition to FTLD, several studies have shown an association between clinical AD and the rs5848 T allele, which we found to be linked to lower memory performance in both clinic- and community-based cohorts [98]. A recent large GWAS meta-analysis found a GWS association of AD risk with rs5848-T [10]. A recent study examining neuropathological AD correlates showed that rs5848 T allele carriers had a higher frequency of hippocampal sclerosis and TDP-43 deposits, significantly increased tau pathology burden, but showed no specific association with β-amyloid load or AD neuropathological diagnosis [99]. Interestingly, our finding was exclusive to the memory domain, which is affected in early stages of AD, but also commonly affected in hippocampal sclerosis and limbic-predominant age-related TDP-43 encephalopathy (LATE) [100, 101]. Effects were driven by both the SNP’s main and SNP × age interaction effects. This finding provides additional evidence that variation in GRN may be related to neurodegeneration more broadly and that restoring PGRN levels may be an effective way to prevent and treat dementia [102].

Highly suggestive pleiotropy (P < 1 × 10–6) was also found with other established AD risk loci, including USP6NL/ECHDC3 for all three paired cognitive domains and WWOX for executive function and language. ECHDC3, enoyl-CoA hydratase domain containing 3, was previously reported to be associated with AD [7, 10]. ADCY2, adenylate cyclase 2, was reported to be associated with AD-related changes in hippocampal gene expression [103, 104], as well as AD-associated structural changes detected by brain imaging [105]. A recent GWAS reported the association of DAPK2, death associated protein kinase 2, with amyloid deposition in the brain [106], a finding consistent with studies showing that DAPK1 promotes APP phosphorylation and amyloidogenic processing [107]. PLXDC2, plexin domain containing 2, is upregulated with increasing β-amyloid plaque load or Braak stages [108].

There are no established links of OSGIN1, CDK14, and PURG to AD. The product encoded by OSGIN1 is an oxidative stress response protein that regulates cell death and appears to be a key regulator of both inflammatory and anti-inflammatory molecules [109, 110]. CDK14 encodes a protein kinase whose expression is more than two-fold higher in the brain than in any other tissue. However, it has been linked to cancer in various tissues, primarily outside of the brain. Although the function of PURG is unknown, a SNP in this gene showed significant associations in GWAS of cognitive performance and intelligence [111, 112]. The biological significance of the pleiotropic association of memory and language with a chromosome 4 variant located about 170 kb from LOC107984373, which encodes a long non-protein coding RNA, is also puzzling at this time.

Bioinformatic analyses of the top-ranked genes emerging from GWAS implicated several biological pathways related to neuronal development and signaling, AD-associated vascular risk factors, and endocrine pathways. Notably, all of the significant pathways were identified from analyses of findings from the pleiotropy GWAS analyses, especially those supported by the community-based cohorts. Because pathways were constructed using information from well-established metabolic and cell signaling pathways, they tend to reflect more common or shared mechanisms rather than particular or trait-specific mechanisms. Therefore, pleiotropic loci affecting multiple cognitive domains may be more suitable as seed genes for canonical pathways than loci associated with a single domain. Indeed, in the analyses of community-based cohorts, the numbers of seed genes from pleiotropy GWAS were 1.5–1.7 times larger than those from GWAS of individual cognitive domains (Table S6). Considering that AD pathology results in progressive dysfunction in several cognitive domains over time, the majority of our findings, which emerged from analyses of the community-based rather than the clinic-based cohorts, may represent pathways underlying cognitive processes related to AD progression rather than AD risk.

Interestingly, associations with several well-established AD loci, including BIN1, CR1, and MS4A6A, were observed only in the clinic-based cohorts. Lack of replication in the community-based cohorts might be due to the relative paucity of AD cases and the higher likelihood of mixed pathologies. Conversely, the associations with the known AD locus USP6NL/ECHDC3 and novel loci, including ULK2, NCALD, PTPRD, ADCY2, and OSGIN1, were observed only in the community-based cohorts. Lack of replication in the clinic-based cohorts may indicate that these loci are associated with normal age-related cognitive changes rather than an AD process. However, this explanation seems less likely given their previous association with AD risk (USP6NL/ECHDC3) or functional relevance to processes implicated in AD and/or their association with other AD-related endophenotypes (NCALD, ULK2, and PTPRD). Alternatively, findings specific to the community-based cohorts may indicate that the effects of these genes are age-dependent or detectable only when tracked over time. This idea is supported by the highly significant SNP × age interaction terms for these loci, which for ULK2, PTPRD, ADCY2, and OSGIN1 were responsible for the significant joint effect to a much greater extent than the SNP’s main effect.

We employed a joint test that combines the main genetic effect and the SNP × age interaction to increase our power to detect genetic associations. Nonetheless, interpreting a joint test can be challenging, requiring examination of effect estimates for both the main and interaction terms. The contribution of the SNP’s main effect to the joint association for some findings, including the well-established AD loci and several novel ones (e.g., ULK2, CDK14, PURG, LINC02712, and GRN), was much stronger than its interaction with age. This may indicate that these loci are associated with the development rather than the progression of AD, which is consistent with the well-established loci having been initially identified using a case-control design. In contrast, for other loci, particularly the more novel ones, the contribution of the SNP × age interaction to the joint association was stronger than the main effect. This may indicate that these loci are associated with the progression rather than the development of AD and could explain why they have not been identified previously, as few genetic studies of AD have utilized a longitudinal design.
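To make the logic of a joint test concrete, the following minimal sketch computes a 2-degree-of-freedom Wald statistic from a main-effect estimate and a SNP × age interaction estimate, and compares it with the two 1-degree-of-freedom tests. It is an illustration under stated assumptions, not the procedure used in this study: the Wald form, the function names, and the example estimates are all hypothetical.

```python
import math


def joint_wald_test(beta_main, beta_int, var_main, var_int, cov_main_int=0.0):
    """2-df Wald test of H0: beta_main = 0 and beta_int = 0.

    Assumes the two estimates are approximately bivariate normal with the
    supplied variances and covariance (an illustrative assumption).
    """
    det = var_main * var_int - cov_main_int ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form b' V^{-1} b written out for a 2x2 covariance matrix V.
    stat = (
        beta_main ** 2 * var_int
        - 2.0 * beta_main * beta_int * cov_main_int
        + beta_int ** 2 * var_main
    ) / det
    # Survival function of a chi-square with 2 df is exp(-x / 2).
    p_joint = math.exp(-stat / 2.0)
    return stat, p_joint


def one_df_p(beta, var):
    """Two-sided p-value of a 1-df Wald (z) test for a single coefficient."""
    z = beta / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical estimates: weak main effect, stronger interaction with age.
    b_main, b_int = 0.010, 0.020
    v_main, v_int, cov = 0.0001, 0.000016, 0.0
    stat, p_joint = joint_wald_test(b_main, b_int, v_main, v_int, cov)
    print("main effect p :", one_df_p(b_main, v_main))
    print("interaction p :", one_df_p(b_int, v_int))
    print("joint 2-df p  :", p_joint, "(statistic", round(stat, 2), ")")
```

Comparing the two 1-df p-values with the joint p-value shows which term drives a joint association, which is the kind of inspection the interpretation above relies on.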

Of note, all of the GWS pleiotropic associations and half of the GWS single-domain associations involved the memory domain. This is not surprising, as prominent memory impairment is the most common cognitive feature in AD. Nonetheless, prominent impairment in other cognitive domains occurs in an appreciable number of AD cases. The specific loci implicated for particular cognitive domains may provide syndrome-specific therapeutic targets with an eye toward a precision medicine approach to AD.

Our results also highlight that the biology underlying cognitive performance in older individuals is complex and likely a function of multiple processes, including lifelong ability, neurodegeneration, and resilience to neurodegeneration. Genetic architecture may influence cognition through each of these processes. Without a measure of underlying pathology, disentangling the mechanism by which genes affect cognition is difficult. This point applies to both the current study and the large AD GWAS in which most participants received a diagnosis based only upon assessment of cognition in life. Our finding that AD risk is strongly genetically correlated with the factor score for memory but not executive function or language might provide some insight into these processes. GWAS findings for the individual cognitive domains showed that memory was associated only with established AD risk genes. However, all of the novel associations identified in the pleiotropy analysis included memory as part of the paired outcome. These genetic association patterns might argue that our phenotypes for executive function and language could reflect decline from AD (rather than development of AD), underlying cognitive ability, and/or cognitive resilience. Future studies that utilize both measures of cognition and underlying pathology will be needed to better disentangle the genetic architecture underlying these different processes that influence cognition.

This study has several limitations. Although we included several large cohorts whose cognitive test data were co-calibrated and harmonized with each other, the sample size was small compared to previous GWAS of AD risk. In addition, power for tests of marginal genetic effects or SNP × age interactions was reduced because a large portion of the subjects had only one visit (FHS: 21.5%, NACC: 21.1%, ACT: 9.9%, ADNI: 13.3%, and ROSMAP: 6.0%). The interpretation of our findings based on the joint effect of the SNP and SNP × age interaction is complicated because the identified loci could relate to cognitive domain function or AD in several ways. These findings may reflect genetic associations with the development or progression of AD or both, but additional work is needed to address this issue confidently. Further, our model assumes linearity in the cognitive trajectories, but cognitive trajectories at different disease stages may be non-linear. Nonetheless, the observed associations with known AD loci provide validation for our modeling approach. Our results are not adjusted for the number of genome-wide scans performed, but the analyses for each cognitive domain and paired cognitive domain are testing separate hypotheses. Correction for analyses conducted separately in the clinic- and community-based cohorts would make the significance threshold more stringent (2.5 × 10⁻⁸), which would render associations for MS4A6A, LINC02712, GRN, LOC107984373, PTPRD, and OSGIN1 as borderline GWS. Another concern is the lack of replication, which will require the availability of co-calibrated, longitudinally obtained cognitive data from independent samples that are informative for AD. Ongoing phenotype harmonization efforts of the Alzheimer’s Disease Genetics Consortium, Alzheimer’s Disease Sequencing Project, and other studies will likely yield the data necessary for replication testing. Because some of the identified loci have no obvious connection to AD or cognition, further research is required to determine their mechanistic pathways. Finally, datasets from other population groups containing cognitive domain factor scores for adequately powered samples will be needed to extend our findings, which were derived from non-Hispanic whites only.
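The threshold arithmetic in the limitations above can be sketched as a Bonferroni-style division of the genome-wide significance level by the number of separately analyzed cohort groups. The sketch assumes the conventional level of 5 × 10⁻⁸ as the starting point; the function names, the category labels, and the example p-values are hypothetical and are not results from this study.

```python
GWS_THRESHOLD = 5e-8  # conventional genome-wide significance level (assumed)


def corrected_threshold(base=GWS_THRESHOLD, n_scans=2):
    """Bonferroni-style threshold after dividing by the number of scans."""
    if n_scans < 1:
        raise ValueError("n_scans must be at least 1")
    return base / n_scans


def classify(p_value, base=GWS_THRESHOLD, n_scans=2):
    """Label a p-value relative to the uncorrected and corrected thresholds."""
    strict = corrected_threshold(base, n_scans)
    if p_value < strict:
        return "GWS after correction"
    if p_value < base:
        # Passes the uncorrected level only; read here as "borderline GWS".
        return "borderline GWS"
    return "not GWS"


if __name__ == "__main__":
    print("corrected threshold:", corrected_threshold())
    for p in (1.2e-8, 3.6e-8, 2.0e-7):  # hypothetical p-values
        print(p, "->", classify(p))
```

With two cohort groups (clinic-based and community-based), the corrected level is half the uncorrected one, so an association with a p-value between the two levels falls into the borderline category.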

Conclusion

Our results provide some insight into biological pathways underlying processes leading to domain-specific cognitive impairment and AD. The findings may provide a conduit toward a syndrome-specific precision medicine approach to AD. Increasing the number of datasets by harmonizing measures of cognitive performance in other cohorts, as applied in this study, would likely enhance the discovery of additional genetic factors of cognitive decline leading to AD and related dementias.