Introduction

The apolipoprotein E (APOE) gene is the strongest Alzheimer’s disease (AD)-associated genetic factor [1,2,3], explaining 13.4% of the phenotypic variance and 25.2% of the genetic variance of AD [4]. Minor alleles of the exonic single-nucleotide polymorphisms (SNPs) rs429358 and rs7412 in the APOE gene encode the ε4 and ε2 alleles, respectively. The ε2 allele is considered a protective factor against AD, whereas the ε4 allele is regarded as a major variant predisposing to AD [3, 5].

The APOE gene encodes a lipoprotein mainly involved in lipid transfer and metabolism. Nevertheless, its functional impacts are not limited to lipid profile alterations and related vasculopathies [6]. The APOE involvement in AD pathogenesis has been widely studied, revealing various molecular and biological processes differentially impacted by different APOE alleles. For instance, the ε4 allele has been linked to increased production and decreased clearance of β-amyloid, stress-mediated increased tau hyperphosphorylation, accelerated cortical atrophy (e.g., in the medial temporal lobe), baseline neuronal hyperactivity (e.g., in the hippocampus), reduced cerebral glucose metabolism, damaged synaptic structure and function, increased cytoskeletal and mitochondrial dysfunction, and abnormal hippocampal neurogenesis [7].

Despite strong associations between APOE and AD, neither the ε2 nor the ε4 allele is considered a causal factor for AD development [5, 8,9,10]. Addressing the mechanisms of action of the ε2 and ε4 alleles is essential for understanding AD pathogenesis and for AD risk assessment. The complex regional interactions and haplotype structures in the APOE locus (19q13.3) have been emphasized by a growing body of studies [11,12,13,14,15,16,17,18,19]. These studies indicate the potential roles of nearby polymorphisms in modulating the impacts of the APOE alleles on AD risk in the form of haplotypes and combinations of genotypes (called compound genotypes). The analyses of haplotypes leverage the idea that AD can be affected by haplotypes driven by genetic linkage between nearby SNPs [20]. Functional linkage, however, may drive compound genotypes consisting of not only local but also distant variants [21].

In this study, we used a comprehensive approach to examine intra- (cis-acting) and inter- (trans-acting) chromosomal modulators of the impacts of the APOE rs7412 or rs429358 SNPs on the AD risk in the ε4- or ε2-negative sample. We leveraged samples of the AD-affected (N = 6,136) and unaffected (N = 10,555) subjects from five studies: (i) to perform a comparative analysis of LD between rs7412 or rs429358 and other autosomal SNPs in the human genome in the AD-affected and unaffected subjects, (ii) to examine AD risks for carriers of compound genotypes comprised of rs7412 or rs429358 and the identified intra- and inter-chromosomal SNPs in LD with them, and (iii) to identify biological functions and diseases enriched by genes harboring these SNPs.

Methods

Study participants

We used data on subjects of European ancestry from five studies (Table S1): three National Institute on Aging (NIA) Alzheimer’s Disease Centers (ADCs) cohorts from the Alzheimer’s Disease Genetics Consortium (ADGC) initiative [22], whole-genome sequencing (WGS) data from the Alzheimer’s Disease Sequencing Project (ADSP-WGS) [23, 24], the Cardiovascular Health Study (CHS) [25], the Framingham Heart Study (FHS) [26, 27], and the NIA Late-Onset Alzheimer’s Disease Family Based Study (LOAD FBS) [28]. ADSP-WGS subjects who were also present in other datasets were excluded to make the datasets independent. The APOE genotypes were either directly reported by the original studies (ADGC, ADSP-WGS, FHS) or determined from the rs429358 and rs7412 genotypes (CHS and LOAD FBS). The diagnoses of AD cases in the five analyzed datasets were mainly based on neurologic exams [29, 30], and the AD status was reported either directly (ADGC, ADSP-WGS, FHS, LOAD FBS) or in the form of International Classification of Diseases, ninth revision (ICD-9) codes (CHS).

Genotype data and quality control (QC)

We used whole-genome sequencing (ADSP-WGS) and genome-wide data from different array-based platforms (ADGC, CHS, FHS, LOAD FBS). SNPs were first imputed to harmonize them across the analyzed datasets [31]. Low-quality data were excluded using PLINK [32] as follows: (1) SNPs and subjects with missing rates > 5%, (2) SNPs with minor allele frequencies (MAF) < 5%, (3) SNPs deviating from Hardy–Weinberg equilibrium at P < 1E-06, and (4) SNPs, subjects, and/or families with Mendel error rates > 2% (in ADSP-WGS, FHS, and LOAD FBS, which include families). In addition, imputed SNPs with r² < 0.7 were filtered out (ADGC, CHS, FHS, LOAD FBS). Selecting SNPs present in at least one study resulted in a set of 1,645,025 SNPs for the analysis.
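The SNP-level exclusion criteria listed above can be summarized as a simple filter. The following is only an illustrative Python sketch of the thresholds, not the PLINK workflow used in the study; the record type `SnpQC` and its field names are hypothetical, and subject- and family-level exclusions are not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpQC:
    """Hypothetical per-SNP summary; field names are illustrative only."""
    snp_id: str
    missing_rate: float                     # fraction of subjects with a missing call
    maf: float                              # minor allele frequency
    hwe_p: float                            # Hardy-Weinberg test P value
    mendel_error_rate: float = 0.0          # relevant only in family-based datasets
    imputation_r2: Optional[float] = None   # None for genotyped or sequenced SNPs


def passes_qc(snp: SnpQC) -> bool:
    """Apply the SNP-level thresholds stated in the text."""
    if snp.missing_rate > 0.05:
        return False
    if snp.maf < 0.05:
        return False
    if snp.hwe_p < 1e-6:
        return False
    if snp.mendel_error_rate > 0.02:
        return False
    if snp.imputation_r2 is not None and snp.imputation_r2 < 0.7:
        return False
    return True
```

For example, `passes_qc(SnpQC("rs_example", 0.01, 0.20, 0.3, imputation_r2=0.65))` returns `False` because the imputation quality falls below 0.7.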

Two-stage LD analysis

Design

Our analyses were performed separately in stratified samples obtained by dividing each dataset into four groups based on the APOE genotypes and AD status. First, we determined ε4-negative (ε2ε2, ε2ε3, and ε3ε3 genotypes) and ε2-negative (ε4ε4, ε3ε4, and ε3ε3 genotypes) subsamples. Then, each subsample was divided into AD-affected and unaffected groups (herein referred to as AD and NAD groups, respectively). We evaluated LD between the APOE rs7412 or rs429358 SNP and each SNP in the genome in two stages.

Stage 1: LD analysis in individual and pooled datasets

We examined LD (i.e., the r statistic) using the haplotype-based method [33,34,35] in each of the four selected subsamples, both in each dataset individually and in all datasets combined. Statistically significant LD estimates were determined using a conservative chi-square test, χ² = r²n [35], where n is the number of subjects rather than gametes, to address the uncertainty in inferring haplotypes from unphased genetic data [16, 18, 36, 37]. The variances of the r statistics were calculated using the asymptotic variance method detailed in [37]. The LD analysis was performed using the haplo.stats R package [38].
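To make the significance test concrete, the sketch below converts an LD estimate r and a subject count n into a P value. It assumes one degree of freedom, as is standard for a pair of biallelic loci; the function names are illustrative, and the estimation of r itself (done with haplo.stats in the study) is not reproduced.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def ld_p_value(r: float, n_subjects: int) -> float:
    """P value for the conservative test chi2 = r^2 * n, with n counted in subjects."""
    chi2 = r * r * n_subjects
    return chi2_sf_1df(chi2)
```

Under these assumptions, an estimate of r = 0.1 in 3,000 subjects gives χ² = 30, which falls below the genome-wide threshold of P < 5E-08. Using the number of gametes (2n) instead would double the statistic, which is why counting subjects is the conservative choice.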

Stage 1 provided two sets of SNPs in LD with the APOE SNPs in each subsample. The first set was generated following the discovery-replication strategy (herein referred to as replication set). In this case, SNPs were selected if their LD with the APOE SNP attained: (1) genome-wide (P < 5E-08) or suggestive-effect (5E-08 ≤ P < 5E-06) significance in any of the five datasets, which was considered as a discovery set, and (2) Bonferroni-adjusted P < 0.0125 (= 0.05/4, where 4 is the number of potential replication sets) in at least one of the other four datasets [31]. The second set included SNPs in significant LD with the APOE SNPs at genome-wide or suggestive significance in the pooled samples of all five datasets that were not in the replication set.
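The discovery-replication rule for the first set can be expressed as a short check over per-dataset P values. This is a minimal sketch under the thresholds stated above; the function name and the dictionary input are hypothetical, and any additional criteria the analysis may have applied are not modeled.

```python
from typing import Dict

SUGGESTIVE_P = 5e-6          # upper bound of the suggestive-effect range
REPLICATION_P = 0.05 / 4     # Bonferroni-adjusted for four potential replication sets


def in_replication_set(p_by_dataset: Dict[str, float]) -> bool:
    """Return True if a SNP meets the discovery-replication rule.

    p_by_dataset maps a dataset name to the LD P value of one SNP
    with the APOE SNP in one subsample.
    """
    for discovery, p_disc in p_by_dataset.items():
        if p_disc >= SUGGESTIVE_P:
            continue  # not significant enough to serve as a discovery set
        others = (p for name, p in p_by_dataset.items() if name != discovery)
        if any(p < REPLICATION_P for p in others):
            return True
    return False
```

A SNP with P = 1E-09 in one dataset and P = 0.01 in another would be retained, whereas a SNP with P = 1E-09 in one dataset and P ≥ 0.0125 in all others would not.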

Stage 2: group-specific LD

We examined whether SNPs identified in stage 1 had group-specific LD by contrasting r between pooled AD and NAD groups, Δr = rAD-rNAD, using a permutation test [39, 40]. Significant Δr indicated SNPs in group-specific LD with rs7412 or rs429358. Bonferroni-adjusted thresholds, accounting for the number of tested SNPs, were used to identify significant findings.
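The contrast Δr can be tested by shuffling group labels, as sketched below. This is an illustration of a generic label-permutation scheme, not necessarily the procedure of [39, 40]. As a loudly stated simplification, r is approximated here by the Pearson correlation of minor-allele dosages (0, 1, or 2) rather than by the haplotype-based estimate used in the study, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (APOE SNP dosage, other SNP dosage) for one subject


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(x)
    if n == 0:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def delta_r(pairs: Sequence[Pair], is_ad: Sequence[bool]) -> float:
    """Difference r_AD - r_NAD for the given group labels."""
    ad = [p for p, a in zip(pairs, is_ad) if a]
    nad = [p for p, a in zip(pairs, is_ad) if not a]
    r_ad = pearson_r([p[0] for p in ad], [p[1] for p in ad])
    r_nad = pearson_r([p[0] for p in nad], [p[1] for p in nad])
    return r_ad - r_nad


def permutation_p(pairs: Sequence[Pair], is_ad: Sequence[bool],
                  n_perm: int = 1000, seed: int = 0) -> float:
    """Two-sided permutation P value for delta_r under shuffled AD/NAD labels."""
    rng = random.Random(seed)
    observed = abs(delta_r(pairs, is_ad))
    labels: List[bool] = list(is_ad)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(delta_r(pairs, labels)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

The resulting P value would then be compared with the Bonferroni-adjusted threshold, for example 0.05 divided by the number of SNPs carried forward from stage 1. Note that the number of permutations bounds the smallest attainable P value, so thresholds of this size require many permutations.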

Analysis of the AD risk

For each group-specific SNP, survival-type analysis was performed to examine the impact of a compound genotype variable (CompG) on the AD risk. The CompG included four compound genotypes comprised of rs7412 or rs429358 genotypes and genotypes of a group-specific SNP (Table 1).

Table 1 Compound genotype constructed based on the genotypes at rs7412 or rs429358 and the identified group-specific SNPs
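The four levels of CompG can be derived from minor-allele counts at the two SNPs. The sketch below follows the definitions given in the legend of Fig. 3 (carrier status of the ε2 or ε4 allele combined with carrier status of the minor allele of the group-specific SNP); the function name and the integer coding of genotypes are assumptions made for illustration.

```python
def compound_genotype(apoe_minor_alleles: int, snp_minor_alleles: int) -> str:
    """Assign a compound genotype level.

    apoe_minor_alleles: copies (0, 1, or 2) of the rs7412 or rs429358 minor
        allele; 0 corresponds to the e3e3 genotype in the stratified samples.
    snp_minor_alleles: copies (0, 1, or 2) of the group-specific SNP minor allele.
    """
    for count in (apoe_minor_alleles, snp_minor_alleles):
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1, or 2")
    carries_apoe = apoe_minor_alleles > 0
    carries_snp = snp_minor_alleles > 0
    if not carries_apoe and not carries_snp:
        return "CompG1"  # reference: e3e3, major-allele homozygote at the SNP
    if not carries_apoe:
        return "CompG2"  # e3e3 with at least one SNP minor allele
    if not carries_snp:
        return "CompG3"  # e2 (or e4) carrier, major-allele homozygote at the SNP
    return "CompG4"      # e2 (or e4) carrier with at least one SNP minor allele
```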

We fitted the Cox regression model (coxme and survival R packages [41, 42]), considering the age at onset of AD as the time variable. We used sex, the top five principal components of genetic data, and ADC cohorts (in ADGC) as fixed-effects covariates, and family IDs (LOAD FBS, FHS, ADSP-WGS) as a random-effects covariate. The results from the five datasets were combined through inverse-variance meta-analysis using the GWAMA package [43]. The CompG1 compound genotype was the reference factor level. We used a chi-square test with one degree of freedom [44] to estimate the significance of the difference between the effect sizes for CompG3 and CompG4:

$${\chi }^{2}= \frac{{\left({b}_{CompG3}-{b}_{CompG4}\right)}^{2}}{{se}_{CompG3}^{2}+ {se}_{CompG4}^{2}}$$

where bCompG3 (seCompG3) and bCompG4 (seCompG4) are the beta coefficients (standard errors) corresponding to the CompG3 and CompG4 genotypes in the Cox model, respectively. Significant findings were identified at the Bonferroni-adjusted levels correcting for the numbers of ε2- and ε4-associated group-specific SNPs.
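The two computations described above, pooling per-dataset estimates and testing the CompG3 versus CompG4 difference, can be sketched as follows. Fixed-effect inverse-variance weighting is assumed here for illustration (the study used the GWAMA package), and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(betas: Sequence[float],
                          ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance pooled beta and its standard error."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def compg_difference_test(b3: float, se3: float,
                          b4: float, se4: float) -> Tuple[float, float]:
    """Chi-square statistic (1 df) and P value for b_CompG3 - b_CompG4."""
    chi2 = (b3 - b4) ** 2 / (se3 ** 2 + se4 ** 2)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p
```

The P value from `compg_difference_test` would be compared with the Bonferroni-adjusted level for the corresponding allele, that is, 0.05 divided by the number of group-specific SNPs tested. The statistic treats the two estimates as independent, which follows the formula as written.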

Functional enrichment analysis

The Database for Annotation, Visualization and Integrated Discovery (DAVID) [45] and Metascape [46] web tools were used to identify gene-enriched REACTOME pathways [47] and DisGeNET diseases [48]. The analysis was performed separately for genes harboring SNPs in group-specific LD with rs7412 and with rs429358. We used a false discovery rate (FDR)-adjusted significance cutoff of PFDR < 0.05 [49] to identify terms significantly enriched by two or more genes.
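The FDR-based selection of enriched terms can be illustrated with a short sketch. It assumes the Benjamini-Hochberg step-up procedure, which may differ from the exact adjustment applied by the DAVID and Metascape tools; the function names and the tuple layout of `terms` are hypothetical.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_terms(terms: Sequence[Tuple[str, float, int]],
                   alpha: float = 0.05, min_genes: int = 2) -> List[str]:
    """Select terms with adjusted P < alpha that are supported by >= min_genes genes.

    terms: (term name, raw P value, number of genes from the set in the term).
    """
    adjusted = benjamini_hochberg([p for _, p, _ in terms])
    return [name for (name, _, n_genes), q in zip(terms, adjusted)
            if q < alpha and n_genes >= min_genes]
```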

Results

SNPs in LD with rs7412 (APOE ε2 allele)

In stage 1, we found that 306 SNPs mapped to 27 loci were in LD with rs7412 at P < 5E-06 in the AD group (21 SNPs in 9 loci, Table S2), the NAD group (198 SNPs in 20 loci, Table S3), or both AD and NAD groups (87 SNPs, all in the APOE locus, Table S4). Of them, we identified LD of rs7412 with 58 SNPs not in the APOE locus (or other loci on chromosome 19) in the AD (19 SNPs in 8 loci) or NAD (39 SNPs in 19 loci) groups. For most SNPs, 219 of 306, the magnitudes of LD (i.e., |r|) were smaller in the pooled AD than NAD group (181 of 248 SNPs in the APOE locus and 38 of 58 inter-chromosomal SNPs). We also observed that the r signs were the same in these two groups for 272 of 306 SNPs.

In stage 2, we found 24 SNPs (Table S5) having group-specific LD with rs7412 at a Bonferroni-adjusted significance P < 1.63E-04 (= 0.05/306). Of them, 16 SNPs were mapped to 6 non-APOE loci. All of them were identified in the pooled sample of either the AD (14 SNPs) or NAD (2 SNPs) group. LD estimates for 14 of these 16 SNPs were characterized by opposite signs of r in these groups (Fig. 1). Also, 15 of them had larger magnitudes of r in the AD group than in the NAD group. The remaining 8 SNPs were in the APOE locus, of which rs11669338 (NECTIN2) attained significance only in the NAD group, whereas all the others were significant in both groups. All 8 SNPs had the same signs of r in the AD and NAD groups, and their magnitudes of r were smaller in the AD group than in the NAD group (Fig. 1).

Fig. 1

Linkage disequilibrium r between the identified group-specific SNPs and rs7412 in the ε4-negative sample of all five datasets combined. The x-axis shows SNP identifiers, genes harboring these SNPs, and chromosomes. Red boxes: Alzheimer’s disease-affected group (AD). Blue boxes: Alzheimer’s disease-unaffected group (NAD). The vertical lines show 95% confidence intervals

SNPs in LD with rs429358 (APOE ε4 allele)

In stage 1, we found that rs429358 was in LD with 801 SNPs (143 loci) at P < 5E-06 in the AD group (301 SNPs in 73 loci, Table S6), the NAD group (351 SNPs in 81 loci, Table S7), or both AD and NAD groups (149 SNPs; all in the APOE locus, except 2 SNPs, Table S8). In the AD and NAD groups, we identified LD of rs429358 with 159 (72 loci) and 344 (80 loci) SNPs not in the APOE region, respectively, totaling 503 SNPs. Of all 505 SNPs (154 loci) not in the APOE locus in AD, NAD, and AD&NAD groups, one locus harboring FXYD5 and FAM187B genes (11 SNPs, NAD group) was on chromosome 19, and the other 494 SNPs (153 loci) were not on chromosome 19. The LD magnitudes were smaller in the pooled AD than NAD group for 370 of 801 SNPs (26 of 296 SNPs in the APOE locus and 344 of 505 SNPs in the non-APOE loci). The r signs were the same in these two groups for 711 of 801 SNPs.

In stage 2, we identified 57 SNPs with group-specific LD at a Bonferroni-adjusted significance P < 6.24E-05 (= 0.05/801). As seen in Table S9, 17 of 57 SNPs are mapped to 11 non-APOE loci. All of them were identified in the pooled sample of either the AD (10 SNPs) or NAD (7 SNPs) group. The magnitudes of r were larger in the pooled AD than NAD sample for SNPs whose significant LD was identified in the AD group and vice versa. The r signs for 13 of these 17 SNPs were opposite in these AD and NAD samples. The other 40 SNPs were located in the APOE locus. Magnitudes of r for all SNPs, except rs769449 (APOE), were larger in the pooled AD than NAD sample. For all SNPs, except rs11083767 (EXOC3L2), the r signs were the same in these AD and NAD samples (Fig. 2).

Fig. 2

Linkage disequilibrium (LD) r between the identified group-specific SNPs and rs429358 in the ε2-negative sample of all five datasets combined. (A) LD for inter-chromosomal SNPs, i.e., SNPs not on chromosome 19. (B) LD for intra-chromosomal SNPs. The x-axis shows SNP identifiers, genes harboring these SNPs, and chromosomes in A. Red boxes: Alzheimer’s disease-affected group (AD). Blue boxes: Alzheimer’s disease-unaffected group (NAD). The vertical lines show 95% confidence intervals

AD risk for carriers of compound genotypes

We performed Cox regression analysis to examine the impact of compound genotypes comprised of a group-specific SNP and either rs7412 (Tables 2 and S10, Fig. 3A) or rs429358 (Tables 2 and S11, Fig. 3B) on the AD risk. An advantage of using compound genotypes is that we can explicitly examine the effect of a minor allele of a group-specific SNP independently of the effect of the ε2 or ε4 allele (CompG2), the impact of the ε2 or ε4 allele independently of the minor allele of that SNP (CompG3), and the combined effects of these minor alleles (CompG4) in the same model with the same reference genotype (CompG1) (Table 1).

Table 2 Bonferroni-adjusted significant results from the survival-type meta-analysis of compound genotype (CompG) associations with Alzheimer’s disease risk using SNPs in group-specific LD with rs7412 or rs429358
Fig. 3

The results of the meta-analysis of the associations of compound genotypes comprised of SNPs (shown on the x-axis) in group-specific linkage disequilibrium with (A) rs7412 in the ε4-negative sample or (B) rs429358 in the ε2-negative sample with the Alzheimer’s disease risk. CompG2 (green) indicates ε3ε3 subjects carrying at least one minor allele of the SNP; CompG3 (red) denotes (A) ε2 or (B) ε4 carriers having major allele homozygotes of the SNP; CompG4 (blue) indicates (A) ε2 or (B) ε4 carriers having at least one minor allele of the SNP. CompG1 indicating the ε3ε3 subjects carrying major allele homozygotes of the SNP was the reference. Black vertical lines show 95% confidence intervals (negative direction for rs769449 was truncated for better resolution). The x-axis shows SNP identifiers, genes harboring these SNPs, and chromosomes. One asterisk (*) indicates nominally significant differences in the effects between CompG3 and CompG4 at (A) 2.08E-03 ≤ P < 0.05 and (B) 8.77E-04 ≤ P < 0.05. Two asterisks (**) indicate Bonferroni-adjusted significance in those differences at (A) P < 2.08E-03 and (B) P < 8.77E-04. No asterisk indicates non-significant differences in (A). B shows only 17 group-specific SNPs for which the differences in the effects between CompG3 and CompG4 attained P < 0.05

AD risk for carriers of 24 rs7412-bearing compound genotypes (Tables 2 and S10, Fig. 3A)

Our analysis showed that none of the eight CompG2 genotypes bearing SNPs from the APOE locus attained Bonferroni-adjusted significance PBε2 = 2.08E-03 (= 0.05/24), although the rs405509 minor allele was beneficially associated with AD, independently of ε2, at nominal significance P = 0.0238. In contrast, six of the 16 CompG2 genotypes comprised of rs7412 and non-APOE locus SNPs were beneficially associated with AD at nominal significance (PBε2 ≤ P < 0.05). For one CompG2, we observed a beneficial association of the rs2884183 minor allele (11q22.3, DDX10) with AD at P < PBε2 independently of the ε2 allele.

All CompG3 genotypes were beneficially associated with AD (although non-significantly for rs11668861) because of the leading role of the ε2 allele and the lack of minor alleles of the group-specific SNPs. Also, regardless of the significance, all CompG4 genotypes were beneficially associated with AD risk, with 10 of them (seven in the APOE locus) reaching P < PBε2. For all 16 group-specific inter-chromosomal SNPs, the effects for CompG4 were smaller in magnitude than those for CompG3 either at the nominal (12 SNPs) or P < PBε2 (four SNPs) significance (Fig. 3A).

AD risk for carriers of 57 rs429358-bearing compound genotypes (Tables 2 and S11, Fig. 3B)

We found that one of 40 intra-chromosomal CompG2 genotypes comprised of rs483082 (APOC1) and rs429358 was adversely associated with AD risk independently of the ε4 allele at Bonferroni-adjusted significance PBε4 = 8.77E-04 (= 0.05/57). None of 17 CompG2 with inter-chromosomal SNPs attained P < PBε4.

Each of 57 CompG3 and CompG4 genotypes was adversely associated with the AD risk. None of the differences in the effects between them attained P < PBε4 for inter-chromosomal SNPs. In contrast, we identified seven (PBε4 ≤ P < 0.05) and 10 (P < PBε4) differences in the effects between CompG3 and CompG4 for SNPs within the APOE locus (Fig. 3B).

Biological functions and diseases

Our analysis was performed for 11 and 19 genes harboring SNPs in group-specific LD with ε2-encoding rs7412 and ε4-encoding rs429358, respectively. We found that 7 and 4 REACTOME pathways were enriched at P < 0.05 using genes from the ε2 (Fig. S1) and ε4 (Fig. S2) sets, respectively. Four of them, i.e., “plasma lipoprotein assembly,” “plasma lipoprotein clearance,” “NR1H3 and NR1H2 regulate gene expression linked to cholesterol transport and efflux,” and “NR1H2 and NR1H3-mediated signaling,” were enriched in both ε2 and ε4 sets. Three pathways, however, were ε2-specific, including “cell–cell junction organization,” “plasma lipoprotein assembly, remodeling, and clearance,” and “cell junction organization.” There were no enriched ε4-specific pathways.

Disease annotations (Tables S12 and S13) included 14 terms that were enriched at PFDR < 0.05 by both the ε2 and ε4 gene sets. They were mainly related to neurological diseases (e.g., AD and other dementia phenotypes, memory performance, mild cognitive disorder, and primary progressive aphasia), serum lipid traits (e.g., dyslipoproteinemias, serum low-density lipoprotein (LDL) cholesterol measurement, and serum total cholesterol measurement), serum albumin measurement, and C-reactive protein measurement.

Seven terms were only enriched in the ε4 set at PFDR < 0.05 (Table S13) which included mental deterioration, atherogenesis, triglycerides measurement, and high-density lipoprotein measurement as well as multiple hematological and immune system-related terms (i.e., autoantibody measurement, acute monocytic leukemia, and peripheral T-cell lymphoma).

Discussion

Our comprehensive approach examining intra- and inter-chromosomal modulators of the impacts of the APOE rs7412 or rs429358 SNP encoding the ε2 or ε4 allele on the AD risk provided four insights.

First, we identified 306 (27 loci) and 801 (143 loci) SNPs in LD with rs7412 and rs429358, respectively, at genome-wide (P < 5E-08) or suggestive-effect (5E-08 ≤ P < 5E-06) significance in AD, NAD, or both groups. Of them, 58 (27 loci) and 505 (154 loci) SNPs were not in the APOE locus, indicating potential inter-chromosomal modulators of the impacts of the ε2 or ε4 allele on the AD risk.

Second, among these SNPs, we found significant differences in LD between AD and NAD groups for 24 (16 inter-chromosomal SNPs in 6 loci) and 57 (17 inter-chromosomal SNPs in 11 loci) SNPs with rs7412 and rs429358, respectively, at the Bonferroni-adjusted significance level (Figs. 1 and 2, and Tables S5 and S9). This finding strongly supports modulating roles of the intra- and inter-chromosomal SNPs on the impacts of the ε2 or ε4 allele on the AD risk, predominantly tailored to either AD-affected or unaffected subjects.

Third, Cox regression analysis identified Bonferroni-adjusted associations of minor alleles of rs2884183 (11q22.3, DDX10) and rs483082 (19q13.32, APOC1) with decreased and increased AD risk independently of the ε2 and ε4 alleles, respectively (Table 2).

Fourth, Cox regression analysis revealed that the beneficial and adverse effects of the ε2 and ε4 alleles, respectively, on the AD risks were significantly modulated by other SNPs, and that this modulation was fundamentally different for these alleles. Specifically, the beneficial effect of the ε2 allele was decreased by minor alleles of all 16 group-specific inter-chromosomal SNPs (with a significant decrease at Bonferroni-adjusted level for variants mapped to JADE2 and SDK2 genes) (Fig. 3A). In contrast, the adverse effect of the ε4 allele was significantly modulated by ten APOE locus (intra-chromosomal) SNPs; the ε4 impact was weakened by minor alleles of four SNPs mapped to TOMM40 and APOE genes and major alleles of six SNPs mapped to NECTIN2, TOMM40, APOE, and APOC1 genes (Fig. 3B).

The APOE locus-specific LD patterns corroborated our previous findings observed for SNP pairs [18] and triples [17]. However, according to the GWAS catalog [50], none of the identified 33 inter-chromosomal group-specific SNPs has been associated with AD or AD-related pathologies (e.g., amyloid plaque) in previous GWAS at genome-wide or suggestive significance. Rs1884507 (ZFP64, in LD with rs7412) and rs12139692 (NEGR1, in LD with rs429358) were associated with triglycerides [51] and intelligence [52], respectively, at P < 5E-08. Other SNPs mapped to FRMD4A [53] and NEGR1 [54] have been previously associated with AD at genome-wide and suggestive significance, respectively. In addition, several SNPs mapped to the JADE2, FRMD4A, DDX10, SDK2, ZFP64, TSHZ2, ZDHHC14, NEGR1, and SLC5A8 genes, in interaction with SNPs in the other non-APOE-locus genes, were associated with AD-related brain pathologies such as diffuse amyloid plaque, PHF-tau, and neurofibrillary tangles at P < 5E-08 [55]. Also, an IL1RAP variant was previously associated with amyloid plaque accumulation rate at P < 5E-08 [56]. Additionally, SNPs mapped to JADE2, ELAVL2, and TSHZ2 have been associated with educational attainment [57] and those mapped to ELAVL2 and NEGR1 with intelligence and general cognitive ability [58, 59].

Next, we discuss JADE2 and SDK2 genes harboring inter-chromosomal SNPs, which significantly modulate the effects of the ε2 allele on AD risk (Table 2). JADE2 is involved in ubiquitination of histone demethylase LSD1 [60] and may play roles in the LSD1-mediated regulation of neurogenesis and myogenesis [61, 62]. LSD1 is required for neuronal survival and was implicated in tau-induced neurodegeneration in AD and frontotemporal dementia [63, 64]. Additionally, JADE2 (alias PHF15) may regulate the microglial inflammatory response [65].

SDK2 is involved in lamina-specific synaptic connections which are essential to form neuronal circuits in retina that detect motion [66]. Visual impairments including motion detection abnormalities have been reported in AD [67] and Huntington’s disease [68]. Also, visual working memory (i.e., object identification and location recall) was previously associated with the ε4 allele and β-amyloid accumulation [69].

We also highlight the DDX10 gene harboring rs2884183, which is associated with AD risk independently of ε2 (Table 2). The RNA helicase DDX10 affects ribosome assembly and modulates α-synuclein toxicity [70]. α-Synuclein may synergistically interact with β-amyloid and tau protein to promote their accumulation [71] and may be involved in the pathogenesis of AD in addition to synucleinopathies (e.g., Parkinson’s disease) [72, 73]. DDX10 may also affect ovarian senescence [74].

Our enrichment analysis of biological functions (Figs. S1 and S2) suggested that group-specific LD with rs7412 or rs429358 entails SNPs in genes that are involved in lipid and lipoprotein metabolism. Additionally, LD with rs7412 entails SNPs in genes that may contribute to cell junction organization. These biological processes have been implicated in AD pathogenesis [31, 75,76,77,78,79]. The disease enrichment analysis (Tables S12 and S13) mostly highlighted the enrichment of AD, dementia phenotypes, and other neurological diseases as well as serum lipid traits in both the ε2 and ε4 gene sets. In addition, multiple lipid traits and neurological and immune system-related disorders were enriched in the ε4 gene set.

Investigating the impacts of group-specific SNPs on gene expression revealed that several SNPs in LD with rs7412 (Table S5), including rs11668861 (NECTIN2), rs6021874, rs6021877, rs2426435, and rs1884507 (ZFP64), are in LD (P < 0.0001 in the CEU population of Utah Residents with Northern and Western European Ancestry [80]) with expression quantitative trait loci (eQTLs) whose minor alleles increase NECTIN2 and ZFP64 expression in the brain tissue (Table S14). Also, among SNPs in group-specific LD with rs429358 (Table S9), SNPs mapped to CLEC12A (rs611819, rs479624, rs478829, rs2760953, rs526157, and rs2961542) and NECTIN2 (rs416041, rs11668861, rs6859, rs406456, and rs3852860) are in LD (P < 0.0001 [80]) with eQTLs altering the expression of these two genes. In addition, rs4803770 and rs71352239 (APOC1P1) are themselves eQTLs for this gene whose minor alleles decrease APOC1P1 expression in the brain tissue (Table S14) [81]. Furthermore, the transcription factor-binding sites (TFBS) enrichment [46] shows that the JADE2, ELAVL2, FRMD4A, and APOC1 genes (harboring ε2 group-specific SNPs) have a common TFBS motif corresponding to RXRB within ± 2 kb of their transcription start sites (P < 2.00E-06 and PFDR < 5.01E-03) [82]. Also, the TMEM125, DNMT3A, ZDHHC14, and BCL3 genes (harboring ε4 group-specific SNPs) share a TFBS motif corresponding to SP3 within ± 2 kb of their transcription start sites (P < 1.58E-05 and PFDR < 2.51E-02) [82].

Despite the rigor, this study has limitations. First, GWAS datasets do not provide phased genetic data, and therefore, probabilistic estimates of haplotypes may adversely impact the power of LD analyses. Second, due to the small frequency of the ε2 allele in the general population, the LD analysis of rs7412 may not have optimal statistical power, particularly in the AD-affected group because of the protective role of the ε2 allele against AD. Third, because genotypes were available from WGS in ADSP and genome-wide arrays in the other datasets, we imputed SNPs to harmonize them across all five datasets. Imputation generally results in less accurate genotype calls compared with WGS, particularly in genomic regions with low coverage on the arrays. Low imputation quality may adversely impact the results of the analyses. Although we excluded SNPs with imputation quality of r² < 0.7 to offset the impacts of potential inaccuracies, replication of the results using directly genotyped SNPs could add robustness to our findings. Fourth, while the Cox regression analysis of genetic associations using the age at onset (AAO) of a complex trait provides higher statistical power than the logistic regression analysis of the case–control status [83], we acknowledge a limited ability to determine the exact AAO due to the slow progression of AD. For instance, AD is not usually diagnosed when the brain pathologies start to develop years before clinical manifestations. Fifth, the small number of genes may affect the accuracy of the functional enrichment analysis. Finally, further stratification of the AD group based on pathological information on AD sub-phenotypes would provide valuable insights into the genetic heterogeneity of AD. Also, including subjects with mild cognitive impairment (MCI) in LD analyses as a separate stratum may help to identify APOE allele-dependent genetic factors contributing to MCI progression to AD. Such additional stratifications would require large datasets with more comprehensive clinical and pathological data.

Conclusion

Our comprehensive analysis provides compelling evidence that intra- and inter-chromosomal variants can modulate the impacts of the ε2 and ε4 alleles on the AD risk. The survival-type analysis robustly shows predominant modulating roles of the inter-chromosomal SNPs for the ε2 allele and the APOE-region SNPs for the ε4 allele. We identified two variants in DDX10 (11q22.3) and APOC1 (19q13.32) genes with beneficial and adverse associations with AD risk independently of the ε2 and ε4 alleles, respectively. Functional enrichment analysis highlighted ε2- and/or ε4-linked processes involved in lipid and lipoprotein metabolism and cell junction organization which have been implicated in AD pathogenesis. Our results advance the understanding of the mechanisms of AD pathogenesis and help improve the accuracy of AD risk assessment.