Background

Late-onset Alzheimer’s disease (AD) is a leading cause of death in the USA, affecting approximately 1 in 10 Americans over the age of 65 years, with prevalence expected to double by 2050 [1]. Heritability estimates for AD range from 58 to 79% [2]; despite this strong genetic component, much of the underlying genetic variance remains to be explained [3]. Although the APOE ε4 allele is the strongest common genetic risk factor for AD [4, 5], dozens of loci have been associated with AD via genome-wide association studies (GWAS) [6, 7]. However, similar to other complex diseases, the vast majority of genetic discoveries for AD have been from GWAS performed in samples with predominantly European ancestry [8]. Much less is known about AD genetics across diverse populations, and in particular African Americans and Hispanic/Latino Americans who have increased risk of AD compared to European Americans [9,10,11].

Admixture mapping is a powerful alternative approach to GWAS for gene-mapping in recently admixed populations. Unlike widely used GWAS approaches that treat genetic ancestry differences as potential confounders in the analysis, admixture mapping leverages genetic ancestry differences [12,13,14,15]. With admixture mapping, regions of the genome with unusual local ancestry patterns relative to genome-wide averages are tested for association with a phenotype [16]. Admixture mapping is most powerful when both disease risk and trait-locus allele frequencies differ across groups, and it can be viewed as a complement to GWAS [17, 18].

Here, we performed genome scans of AD using both GWAS and admixture mapping approaches to identify regions associated with AD in Caribbean Hispanics, an admixed population with European, Native American, and African ancestry [19, 20]. Admixture mapping identified a genome-wide significant association between AD and Native American ancestry on 3q13.11, while GWAS identified no loci reaching genome-wide significance. Transcriptomic studies in samples with European ancestry nominate ALCAM and BBX as candidate protein-coding genes within the significant admixture mapping signal on 3q13.11, supported by the association between genetic variation and gene expression levels as well as differential expression between AD cases and controls. These results underline the power and challenges of leveraging genetic ancestry differences for new insight into the genetic architecture of late-onset AD in multiethnic populations.

Methods

Data

Genotype and phenotype data for 3067 participants in the Columbia University Study of Caribbean Hispanics and Late Onset Alzheimer’s Disease (CU Hispanics) were downloaded through dbGaP (Study Accession number phs000496.v1.p1), described in detail elsewhere [21]. The CU Hispanics study recruited subjects using both familial AD (22%) and sporadic case-control (78%) ascertainment. Subjects were excluded if they had any missing data for sex, AD status, APOE ε2/ε3/ε4 genotypes, and either age-at-onset of dementia or age-at-last-evaluation.

European (from Utah) and African (Yorubans) samples from HapMap 3 [22] and Native American samples (Colombians, Pima, and Maya) from the Human Genome Diversity Project [23, 24] were used as reference populations. The reference datasets were merged using PLINK (v1.07) [25], resulting in 598,470 common autosomal single-nucleotide polymorphisms (SNPs). Genome coordinates were updated to build NCBI37/hg19 using LiftOver [26] to match the CU Hispanic data. The reference and CU Hispanics datasets were merged, randomly removing reference samples to balance ancestral population representation. Variants with a genotype missing rate > 5%, samples missing > 5% of genotypes, and 502 duplicated samples were excluded. Heterozygosity analysis identified 43 CU Hispanic outliers for both the F coefficient ( > 0.12, mean 0.02 ± 0.03) and heterozygosity rate (< 0.28, mean 0.32 ± 0.01), consistent with previous reports and pedigree documentation of consanguinity [27]. As both our admixture mapping and association tests adjust for genetic relatedness, keeping these samples had minimal impact on results (Additional file 1). The final combined dataset included 294,252 SNPs and 2754 samples: 2565 CU Hispanics plus 63 samples from each reference population. The overall genotyping rate was 0.993.

Genetic relatedness matrix

A genetic relatedness matrix (GRM) was estimated in a recursive manner using the PC-AiR and PC-Relate functions within the GENESIS R package [28,29,30]. The final combined data set was included in these analyses to improve inference of population structure. PC-AiR partitions subjects into unrelated and related sets based on kinship estimates from KING-robust [31], performs principal components (PC) analysis on the set of unrelated subjects, then projects PC values for the related set. PC-Relate adjusted the GRM for the first four PCs derived by PC-AiR, and the PC-AiR and PC-Relate steps were repeated using this adjusted GRM. The final GRM contains kinship coefficients that are robust to the population structure within our sample.

Ancestry proportions

As suggested by an established pipeline [32], the CU Hispanic and reference samples were phased jointly using ShapeIt2 [33] and 1000 Genomes phase 3 haplotypes [34] as reference. Local ancestry estimation was performed using RFMix (v1.5.4) [35]. Local ancestry values were averaged to estimate global European, African, and Native American ancestry proportions.
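The averaging step can be sketched as follows. This is a minimal illustration, assuming local ancestry calls are available as one integer label per haplotype and genomic window; the array layout, the label coding, and the function name `global_ancestry` are assumptions made for this example and do not reproduce the RFMix output format. Windows are weighted equally here, which is also an assumption.

```python
import numpy as np

def global_ancestry(local_calls, n_ancestries=3):
    """Average local ancestry calls into global ancestry proportions.

    local_calls: integer array of shape (n_samples, 2, n_windows), where
    entry [i, h, w] is the ancestry label (0 .. n_ancestries - 1) assigned
    to haplotype h of sample i in window w (hypothetical layout).
    Returns an (n_samples, n_ancestries) array whose rows sum to 1.
    """
    local_calls = np.asarray(local_calls)
    n_samples = local_calls.shape[0]
    # Pool both haplotypes and all windows for each sample.
    flat = local_calls.reshape(n_samples, -1)
    # Fraction of haplotype-windows assigned to each ancestry label.
    return np.stack(
        [(flat == k).mean(axis=1) for k in range(n_ancestries)], axis=1
    )

# Toy data: 2 samples, 2 haplotypes each, 4 windows.
calls = np.array([
    [[0, 0, 1, 2], [0, 1, 1, 0]],
    [[2, 2, 2, 2], [0, 0, 0, 0]],
])
props = global_ancestry(calls)
```

In practice, windows of unequal length could be weighted by their genetic or physical size before averaging.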

Admixture mapping

Admixture mapping was performed using a logistic mixed model for the AD phenotype, in which European, African, and Native American ancestries were tested simultaneously. Admixture mapping was conducted using the GENESIS R package [30] available in Bioconductor [36]. We fit mixed models under the null hypothesis of no genetic association, adjusting for global ancestry proportions and APOE ε2 and ε4 allele dosages as fixed effects and the GRM as a random effect. The association between each admixture linkage disequilibrium (LD) block and the phenotype was evaluated by a score test against the null model. Recent admixture, such as that observed in Hispanics/Latinos, creates long-range LD, which dramatically reduces the number of independent tests in an admixture mapping genome scan and leads to a less severe multiple testing correction. Genome-wide significance was defined as P < 5E−05 and suggestive evidence for significance was defined as P < 0.001, as suggested by previous studies of Hispanic populations [37, 38]. We evaluated the suitability of these significance thresholds by extending the method proposed by Shriner et al. 2011 [39] to three ancestral populations. We estimated the effective number of tests for each ancestral population by fitting autoregressive models to the vectors of African, European, and Native American local ancestry dosages in our sample (African: 251.1, European: 210.3, Native American: 281.2) and defined the final effective number of tests as the sum of the two largest values (532.3). The resulting Bonferroni-corrected significance threshold of P < 9.39E−05 (0.05/532.3) is slightly less conservative than our original threshold of P < 5E−05, suggesting that the original threshold is well-suited for this sample.
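The threshold arithmetic can be checked with a short sketch. The effective numbers of tests are those reported above; the variable names and the family-wise error rate of 0.05 are assumptions made for this illustration.

```python
# Effective number of tests per ancestry, as reported in the text.
effective_tests = {
    "African": 251.1,
    "European": 210.3,
    "Native American": 281.2,
}

# Final effective number of tests: sum of the two largest values.
m_eff = sum(sorted(effective_tests.values(), reverse=True)[:2])

# Bonferroni correction at an assumed family-wise error rate of 0.05.
alpha = 0.05
threshold = alpha / m_eff

print(f"effective tests = {m_eff:.1f}, threshold = {threshold:.2e}")
# prints: effective tests = 532.3, threshold = 9.39e-05
```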

Secondary admixture mapping analyses considered the effect of each reference group separately to identify which ancestral population was driving the significant signals. The coefficients of each lead SNP in the most significant LD-block were estimated, taking the allelic dosage of the ancestry driving the signal into account. Manhattan plots were prepared using the qqman R package [40], while regional association plots were generated using LocusZoom [41]. Additional sensitivity analyses assessed the robustness of our findings to age and sex covariate adjustment.

Association testing

SNPs and samples were subjected to the data cleaning procedures described above without the inclusion of reference samples, leaving 931,670 SNPs and 2565 CU Hispanic samples. We conducted the association testing for AD using a logistic mixed model implemented in the GENESIS R package [30]. Using the fitted null model described above, we tested the association between each SNP and the phenotype with a score test. Genome-wide significance was defined as P < 5E−08. Region-specific thresholds within the 3q13.11 locus for significant (P < 6.74E−05) and suggestive (P < 1.35E−03) evidence for association were adjusted for the effective number of tests, estimated by the Genetic Type I Error Calculator [42].

Locus interpretation and gene prioritization

Conditional admixture mapping analyses were performed, applying the original model with further adjustment for allele dosage at SNPs of interest, individually and jointly. LD was estimated by both r2 and D’ using PLINK [25] in a set of 1349 unrelated CU Hispanics. LD plots based on the correlation statistic D’ by reference population were prepared using Haploview [43]. The Ensembl Variant Effect Predictor (v99 [44]) toolset generated SNP-level annotations within regions of interest.
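The two LD statistics can behave quite differently, as the following minimal sketch illustrates. It assumes phased haplotypes coded 0/1 for two biallelic SNPs; the function name `ld_stats` is hypothetical, and the sketch does not reproduce how PLINK estimates LD from unphased genotypes. D’ is returned as an absolute value.

```python
import numpy as np

def ld_stats(hap_a, hap_b):
    """Return (r2, D') for two biallelic SNPs from phased 0/1 haplotypes."""
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()   # allele frequencies
    p_ab = (a * b).mean()           # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b            # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                  # a monomorphic SNP has undefined LD
        return float("nan"), float("nan")
    r2 = d ** 2 / denom
    # D' rescales |D| by its maximum possible value given the frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max

# Toy haplotypes: allele 1 at SNP B occurs only alongside allele 1 at SNP A.
hap_a = [1, 1, 1, 1, 0, 0, 0, 0]
hap_b = [1, 1, 1, 0, 0, 0, 0, 0]
r2, d_prime = ld_stats(hap_a, hap_b)
```

In this toy example D’ equals 1 because one of the four possible haplotypes is absent, while r2 is 0.6 because the allele frequencies differ.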

The Accelerating Medicines Partnership for Alzheimer’s Disease (AMP-AD) project has provided a publicly available repository of multi-omic data aimed at finding genetic targets for AD therapeutics. We extracted significant cis expression quantitative trait loci (cis-eQTLs) from a recent AMP-AD study [45] (https://www.synapse.org/#!Synapse:syn17015233), representing RNA-sequencing data generated on brain samples from the Mayo study, Religious Orders Study, Rush Memory and Aging studies, and Mount Sinai Brain Bank study generated across 7 tissue types: cerebellum (N = 261), temporal cortex (N = 262), dorsolateral prefrontal cortex (N = 573), inferior gyrus (N = 230), superior temporal gyrus (N = 225), frontal pole (N = 260), and parahippocampal gyrus (N = 225). We extracted evidence for differential gene expression in post-mortem brain tissues between those affected by AD and controls from another AMP-AD study [46] (https://www.synapse.org/#!Synapse:syn11914606), restricted to the meta-analysis results from the random effects model. A false discovery rate (FDR) cutoff of < 0.05 provided by the AMP-AD studies was applied to both the differential gene expression and eQTL results.

The genome is organized into topologically associated domains (TADs) in three-dimensional space, where genes within the same TAD are more likely to be regulated by common cis-regulatory elements and transcription factors. Genes within the same TAD as the haplotypes associated with AD were extracted from the 3D Genome Browser [47] and the human dorsolateral prefrontal cortex data (DLPFC) [48], again using the study-specific FDR < 0.05 as the significance threshold.

Genetic variation and patterns of LD vary across populations, and ideally colocalization analyses should use association and eQTL results representing the same population; unfortunately, large eQTL studies of Caribbean Hispanic populations are unavailable. Colocalization analyses of our admixture mapping or association results are therefore restricted to comparisons with the AMP-AD eQTLs, which represent samples with primarily European ancestry; such comparisons may identify relationships between eQTLs and AD risk that are shared between these populations [49, 50]. Approximate Bayes factor colocalization was performed using the Coloc package in R (v3.2-1) [50], which computes five posterior probabilities: PP0 = no association with either trait; PP1 = association with trait 1 but not trait 2; PP2 = association with trait 2 but not trait 1; PP3 = association with both traits, two independent causal SNPs; and PP4 = association with trait 1 and trait 2, one causal SNP shared for both traits. The LocusCompareR package in R (v1.0.0) [51] illustrated the correlation between admixture mapping or association results and eQTL data.
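The way per-SNP evidence is combined into the five posterior probabilities can be sketched as follows. This is a simplified illustration of the approximate Bayes factor approach, not the Coloc implementation itself: it assumes per-SNP log approximate Bayes factors have already been computed for both traits, and the function names and prior values (p1, p2, p12) are assumptions chosen for the example.

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Combine per-SNP log approximate Bayes factors into PP0..PP4.

    labf1, labf2: log ABFs for trait 1 and trait 2 at the same SNPs.
    p1, p2, p12: prior probabilities that a SNP is causal for trait 1 only,
    trait 2 only, or both (illustrative values, assumed here).
    """
    l1 = np.asarray(labf1, dtype=float)
    l2 = np.asarray(labf2, dtype=float)
    s1, s2, s12 = logsumexp(l1), logsumexp(l2), logsumexp(l1 + l2)
    # H3 sums over pairs of distinct SNPs: sum1 * sum2 minus same-SNP terms.
    ratio = min(np.exp(s12 - (s1 + s2)), 1.0)
    with np.errstate(divide="ignore"):
        log_pairs = s1 + s2 + np.log1p(-ratio)
    log_h = np.array([
        0.0,                                  # H0: no association
        np.log(p1) + s1,                      # H1: trait 1 only
        np.log(p2) + s2,                      # H2: trait 2 only
        np.log(p1) + np.log(p2) + log_pairs,  # H3: two distinct causal SNPs
        np.log(p12) + s12,                    # H4: one shared causal SNP
    ])
    post = np.exp(log_h - logsumexp(log_h))
    return dict(zip(["PP0", "PP1", "PP2", "PP3", "PP4"], post))

# Toy inputs: strong evidence at the same SNP versus at different SNPs.
shared = coloc_posteriors([0, 0, 12, 0], [0, 0, 12, 0])
distinct = coloc_posteriors([12, 0, 0, 0], [0, 0, 12, 0])
```

With these toy inputs, `shared` places nearly all posterior mass on PP4, while `distinct` favors PP3 over PP4. When neither trait shows strong evidence in the region, PP0 dominates.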

Results

Sample description

The CU Hispanics data represented 2565 subjects, where the 1174 cases were affected either by familial AD (22%) or sporadic AD (78%). Age-at-onset ranged from 44 to 100 years while the censoring age among the unaffected controls ranged from 35 to 100 years. The mean age and sex distribution are similar across cases and controls (Table 1). The frequency of the protective APOE ε2 allele [52] is approximately 35% lower among cases, while the well-established risk allele ε4 [4, 5] is almost twice as common among cases as among controls (Table 1). Global average ancestry proportion estimates vary widely across samples, from nearly zero to 0.99 per reference population (Fig. 1). Average ancestry proportions are 0.58 ± 0.17 European, 0.33 ± 0.19 African, and 0.09 ± 0.08 Native American.

Table 1 Sample description
Fig. 1 Estimated global ancestry proportions within the Caribbean Hispanics. X-axis: samples sorted by proportion of European ancestry, Y-axis: estimated global ancestry proportion. Colors correspond to reference populations: Blue for African, Purple for European, and Cyan for Native American

Admixture mapping and GWAS

We identified a genome-wide significant association between AD and local ancestry at 3q13.11 (P < 5E−05; Table 2, Fig. 2). The 3q13.11 signal is supported by significant evidence of association across multiple LD-blocks (103.7 to 107.7 Mb, min. P = 8.76E-07; Table 2), where the lead SNP rs10933849 is a common variant across the ancestral populations (alternate allele frequency: 0.56, 0.84, and 0.61 for 1000 Genomes phase 3 Africans, Europeans, and Native Americans, respectively). This region spans five protein-coding genes: ALCAM, CBLB, BBX, CCDC54, and CD47. Secondary analyses indicated that Native American ancestry at the lead SNP of each LD-block was associated with a protective effect against AD risk (OR 0.58 to 0.66; P < 3.24E-04; Additional file 2). Greater correlation is observed between 15 SNPs tagging the LD blocks within the 3q13.11 locus in the Native American reference data than in the European or African data (Additional file 3), providing further evidence that the admixture mapping signal between AD and 3q13.11 is driven by a Native American haplotype.

Table 2 Evidence of association between local ancestry and Alzheimer’s disease in the Caribbean Hispanics
Fig. 2 Association between Alzheimer’s disease and local ancestry among Caribbean Hispanics. Panel a: the joint European, African, and Native American ancestries admixture mapping analysis. Panels b, c, and d: results from single ancestry admixture mapping analyses for Native American, African, and European ancestries, respectively. Significant and suggestive thresholds represented by red and blue lines, respectively. Loci with significant or suggestive evidence of association with Alzheimer’s disease are highlighted with vertical bars labeled with their chromosomal position

Suggestive evidence of association between local ancestry and AD was observed at six additional loci: 2q22.2, 6q22.31, 8q24.22, 9p21.3, 14q12, and 19p13.3 (P < 0.001; Fig. 2, Table 2). LD-block-specific results for significant and suggestive associations with AD are provided in Additional file 2. Two LD-blocks with European background were responsible for the suggestive signal at 2q22.2, intersecting the gene LRP1B. The suggestive signal on 6q22.31 is driven by the Native American ancestry and was captured by a single LD-block within the TRDN gene. On 8q24.22, we observed three LD-blocks with Native American background driving the signal which spans the ZFAT gene. Two LD-blocks spanning the DMRTA1 gene were responsible for the signal on 9p21.3, driven by the Native American ancestry. The signal on 14q12, driven by the African ancestry, was captured by five LD-blocks implicating ARHGAP5 and AKAP6. Nine LD-blocks within a 1.3Mb region were responsible for the signal on 19p13.3 driven by African ancestry, implicating ABCA7 and dozens of other genes (Table 2). Sensitivity analyses revealed the admixture mapping results are robust to the inclusion of age and sex as covariates (Additional file 4). In contrast, traditional GWAS for AD did not identify any loci reaching genome-wide significance (P < 5E-08; Additional file 5).

Locus interpretation and gene prioritization

Targeted association testing within the 3q13.11 locus found two SNPs significantly associated with AD (rs12494162, P = 2.33E-06; rs1731642, P = 6.36E-05), and 22 SNPs with suggestive evidence of association with AD (P < 1.35E-03; Table 3). The first SNP, rs12494162, falls within an intron of lncRNA DUBR, while rs1731642 is an intergenic variant. These two SNPs, rs12494162 and rs1731642, are not in LD within our data (r2 = 0.003; D’= 0.17) and may represent independent association signals. This is consistent with LocusZoom plots of the admixture mapping and association signals at 3q13.11 using 1000 Genomes Native American estimates of LD (Fig. 3). The lead SNP rs12494162 is in LD with several other SNPs with evidence of association with AD, as expected. In contrast, the lead SNP from the admixture mapping analysis has modest evidence of LD with other SNPs on haplotypes associated with AD, as the admixture mapping signal is driven by differences in ancestry proportions rather than specific genotypes at the locus.

Table 3 Variants with significant or suggestive evidence for association with Alzheimer’s disease within 3q13.11
Fig. 3 Patterns of linkage disequilibrium within the admixture mapping (top) and association (bottom) signals at 3q13.11. LocusZoom plots were drawn using the 1000 Genomes Native American estimates of linkage disequilibrium (r2; Nov. 2014). Chromosomal position on the hg19 map is shown on the X-axis, while the Y-axis provides evidence of association with Alzheimer’s disease as the –log10(P) value

Conditional admixture mapping analyses including both rs12494162 and rs1731642 as covariates eliminated the signal at 3q13.11 (P = 0.01), while analyses conditioned on either SNP alone only weakened the signals (Additional file 6), suggesting that admixture mapping and GWAS approaches may be tagging the same underlying variant. We assessed evidence of colocalization between eQTLs identified in DLPFC samples from subjects with primarily European ancestry and the admixture mapping and association signals at 3q13.11, as comparable studies representing Native Americans are unavailable. These analyses can only identify shared genetic architecture between our admixture mapping or association results and eQTLs that are shared across populations, which may represent fewer than half of eQTLs [49]. The leading eQTL within the 3q13.11 locus falls within a haplotype significantly associated with AD in the admixture mapping analysis (Fig. 4a): rs12629430 is significantly associated with the expression of lncRNA DUBR (Z = −4.47, FDR = 4.9E−04), lincRNA RP11-446H18.1 (Z = −6.22, FDR = 1.0E−07), and lincRNA RP11-446H18.6 (Z = −4.77, FDR = 1.4E−04). Colocalization analyses of the eQTL and admixture mapping signals did not reject the null hypothesis (PP0 = 0.9550). In contrast, the lead SNP from our targeted association testing within 3q13.11, rs12494162, is also an eQTL significantly associated with the expression of lincRNA RP11-446H18.1 (Z = −6.01, FDR = 3.4E−07) and lincRNA RP11-446H18.6 (Z = −4.44, FDR = 5.4E−04) (Fig. 4b). Colocalization analyses are not significant, but suggest association with both AD and gene expression here and weakly favor the model of independent SNPs driving these associations (PP3 = 0.5070) rather than one shared SNP (PP4 = 0.4130).

Fig. 4 Colocalization between cis-eQTLs and admixture mapping (a) and association results (b) at 3q13.11. a Colocalization results between significant admixture mapping haplotypes and eQTLs from dorsolateral prefrontal cortex (DLPFC) data. The color scale depicts extent of linkage disequilibrium (LD) with the lead cis-eQTL (rs12629430, purple diamond) in the 1000 Genomes Native American sample, which falls within a significant haplotype block. b Colocalization results between association test of 3q13.11 and cis-eQTLs from DLPFC data. The color scale depicts the amount of LD with the lead SNP from the association tests (rs12494162, purple diamond), based on 1000 Genomes Native American samples. The lead SNP rs12494162 is a significant cis-eQTL in the DLPFC data. Note: the differences in eQTL plots between panels a and b are due to differences in SNP marker density between admixture mapping and association testing

Within the 3q13.11 locus (chr3:103,747,624-107,725,831), we prioritized candidate protein-coding genes which fell either within one of the 15 LD-blocks associated with AD or within an intersecting TAD using the following features in transcriptomic studies representing European ancestry: (1) genes in which expression in brain tissue is significantly associated with cis-eQTLs within the region of interest and (2) genes which are differentially expressed in the brain between AD cases and controls. The 3q13.11 region of interest spans five protein-coding genes (ALCAM, BBX, CBLB, CCDC54, and CD47), while four additional genes fall within the same TAD as BBX: IFT57, HHLA2, MYH15, and KIAA1524 (Additional file 7). A recent transcriptomic study [45] of 1694 brain samples identified 369 significant cis-eQTLs for IFT57, 182 significant cis-eQTLs for ALCAM, 118 significant cis-eQTLs for CBLB, 47 significant cis-eQTLs for CD47, 22 significant cis-eQTLs for BBX, and 6 significant cis-eQTLs for MYH15 (FDR < 0.05). The strongest cis-eQTL per gene is reported in Table 4, with all cis-eQTLs reported in Additional file 8. Another transcriptomic study [46] including an overlapping sample set of 2114 brain samples representing 478 cases and 300 controls identified significant evidence for differential gene expression in AD for both ALCAM (Z = 2.75, FDR = 2.76E-02) and BBX (Z = 3.73, FDR = 1.84E-03) (Additional file 9). While variation in the 3q13.11 region is associated with expression levels of ALCAM, BBX, CBLB, CD47, IFT57, and MYH15 in the brain, only ALCAM and BBX were significantly differentially expressed between AD cases and controls.

Table 4 Significant cis expression quantitative trait loci (eQTLs) for candidate genes within 3q13.11 region of interest

Discussion

Admixture mapping of AD within a Caribbean Hispanic sample identified one genome-wide significant signal on 3q13.11 (P = 8.76E-07) and six unique suggestive signals at 2q22.2, 6q22.31, 8q24.22, 9p21.3, 14q12, and 19p13.3. The admixture mapping signal on 3q13.11 spanned 15 haplotype blocks, where Native American ancestry is associated with reduced risk of AD. Association between Native American ancestry and reduced risk of AD has previously been reported [53, 54]. Suggestive evidence of association between the 3q13.11 locus and AD has recently been reported in an African American GWAS with nearly three times the sample size of our study [55], demonstrating the effectiveness of the admixture mapping approach as a complement to GWAS.

While admixture mapping provides insights into the genetic basis of disease in multiethnic populations, integration of AD transcriptomics allowed us to nominate candidate genes within 3q13.11. ALCAM and BBX, the genes with significant evidence for both brain eQTLs and differential expression between AD cases and controls, both have robust support in the literature for a functional relationship to AD. Proteomic studies suggest ALCAM, which plays a role in neuron-neuron adhesion and neurite growth networks, is dysregulated during the progression of AD [56]. ALCAM is also involved in blood-brain barrier disruption and T cell-dependent neurodegeneration [57], biological pathways implicated in the progression of AD [58]. Furthermore, ALCAM is a target gene of miR-142 which is significantly upregulated in the AD brain [59, 60]. BBX is differentially expressed in the entorhinal cortex and hippocampus and appears to play a role in the crosstalk between the peripheral blood and the central nervous system [61]. Multiple studies have shown that BBX is differentially expressed in the AD brain [61, 62], while another implicated BBX as a candidate Master Regulator responsible for AD progression [63].

Each of the loci harboring suggestive admixture mapping signals has also been previously associated with AD risk and/or pathology. The signal on 19p13.3 is driven by African ancestry and spans ABCA7, a gene in which coding changes have been associated with risk of AD in both African American and European American samples [64,65,66,67]. LRP1B within the 2q22.2 signal has been implicated in the production and presentation of amyloid beta (Aβ) [68], while multiple LRP1B haplotypes are associated with risk of developing AD in studies representing European Americans [69] and Caribbean Hispanics [70]. Variants on 14q13.1 near NPAS3 have been associated with AD biomarkers [71] and general cognitive function [72]. Variants in ZFAT on 8q24.22 have been associated with extreme longevity [73] and cerebrospinal fluid tau/Aβ42 levels [74]. Within 6q22.31, TRDN variants have been implicated in cerebral Aβ deposition in APOE ε4 non-carriers [75] and rate of cognitive decline in AD [76]. Finally, 9p21.3 has previously been linked to AD risk [77], and variants within the region have been associated with both vascular dementia and AD [78].

Limitations

Our study has several limitations. Admixture mapping identifies regions associated with a given trait which must then be fine-mapped to identify the underlying risk variants. Colocalization analysis is not well powered in our study due to the poor representation of non-European populations in large eQTL data sets, as the genetic architecture of eQTLs can be ancestry specific [49, 79]. Fine-mapping analyses of whole-genome sequence data collected in this sample may allow the detection of the variants responsible for the admixture mapping signals. Publicly available datasets comparable in size or Native American ancestry proportions suitable for replication analyses are not available. Ongoing efforts, including AMP-AD and the Alzheimer’s Disease Sequencing Project, will provide data which may assist these efforts in the future.

Conclusions

Most AD GWAS have represented samples with European ancestry, and alternative strategies may detect additional genetic variants influencing AD in multiethnic populations. Caribbean Hispanics, despite being more likely to be diagnosed with AD [80, 81], have been underrepresented in AD genetics studies [82]. We illustrated the power of admixture mapping for detecting loci associated with AD in a Caribbean Hispanic sample, provided robust evidence for this association, and nominated several candidate genes with orthogonal functional and statistical evidence for a role in AD. Further investigation of these loci and nominated genes could lead to a better understanding of the genetic heterogeneity of AD in populations with significant Native American ancestry.