Background

Following the identification of the APOE ε4 allele as a risk factor for late-onset Alzheimer's disease (LOAD) in 1993 [1], consistent replication of subsequently identified candidates was not achieved until 2009, when two genome-wide association studies (GWAS) [2, 3] identified associations of variants in or near CLU, PICALM, and CR1 with LOAD, which were consistently replicated in multiple large, independent case-control studies [4-17]. Subsequently, a variant near BIN1 was reported [4] to achieve genome-wide significant association in a later GWAS published in 2010; this association also replicated well in follow-up studies [14-19]. These results demonstrate the utility of the hypothesis-free GWAS approach for identifying loci that associate with LOAD, and the necessity of pooling samples and data from multiple centers to obtain resources with sufficient statistical power (GWAS typically > 14,000 subjects; follow-up studies typically > 28,000 in total) to detect the modest ORs (e.g. 0.8/1.2) associated with these variants in GWAS and follow-up studies.

Two recently published companion studies by Hollingworth et al. [20] and Naj et al. [17] performed meta-analysis of two large GWAS datasets (n > 75,000). Besides APOE, CLU, PICALM, and CR1, the meta-analyses revealed association at ABCA7 (p = 5 × 10⁻²¹), MS4A6A (p = 1.2 × 10⁻¹⁶), MS4A4E (p = 1.1 × 10⁻¹⁰), EPHA1 (p = 6 × 10⁻¹⁰), CD2AP (p = 8.6 × 10⁻⁹) and CD33 (p = 1.6 × 10⁻⁹). In addition, the two datasets revealed opposing association (Naj et al. OR = 0.93, p = 0.001; Hollingworth et al. OR = 1.06, p = 0.03) of the variant near ARID5B (rs2588969) with LOAD, suggesting potential heterogeneity at this locus. In this study, we genotyped the variants identified at the CD2AP, EPHA1, and CD33 loci in our independent case-control dataset comprising six case-control series (n = 6,835). To assess the opposing associations at the ARID5B locus, we also genotyped the two ARID5B variants included in the Hollingworth et al. study. Genotypes from our follow-up case-control series (Mayo2) for variants in ABCA7, MS4A6A and MS4A4E were included in Stage 3 of the Hollingworth et al. study, so we have not included these three variants in this study. We performed meta-analyses of five variants (at the CD2AP, EPHA1, ARID5B and CD33 loci) in our six case-control series, which showed no significant heterogeneity among series. Furthermore, we performed logistic regression analysis of our pooled series, adjusting for covariates. Finally, we used a Fisher's combined test to evaluate the significance of the association of these five variants in our data combined with the data in the Hollingworth et al. and Naj et al. studies.

Results

We genotyped five variants (CD2AP: rs9349407; EPHA1: rs11767557; ARID5B: rs2588969 and rs4948288; CD33: rs3865444) in our independent follow-up case-control series (Mayo2), which comprises three North American and three European Caucasian series. Detailed information about these samples is shown in Table 1 and genotype counts are shown in Table 2. Samples used in this study do not overlap with those included in the Naj et al. study and have not been included in any of the published LOAD GWAS. The Mayo2 dataset included in the Hollingworth et al. publication only included genotypes for ABCA7, MS4A6A and MS4A4E.

Table 1 Details of the Mayo2 samples used in this study and genotype counts
Table 2 Genotype counts for each of the six Mayo2 series

Meta-analyses of allelic association in the six Mayo2 series performed using a DerSimonian-Laird random effects model (Figure 1) revealed a significant pooled OR for the EPHA1 variant (Figure 1b; OR = 0.88, p = 0.008) comparable to that previously published by Naj et al. (OR = 0.87) and by Hollingworth et al. (OR = 0.90). As shown in Figure 1c and 1d, we also observed significant association for both ARID5B variants (rs2588969, OR = 1.08, p = 0.046; rs4948288, OR = 1.11, p = 0.008) with ORs comparable to those reported by Hollingworth et al. (OR = 1.06 and 1.07, respectively) and in the opposing direction to that reported by Naj et al. for rs2588969 (Stage 1+2 OR = 0.93, p = 7.7 × 10⁻⁴). As shown in Figure 1a and 1e, we did not observe significant association for CD2AP (OR = 0.98, p = 0.76) or CD33 (OR = 0.96, p = 0.32) in our meta-analyses. Breslow-Day tests provided no significant evidence that the ORs for any of these variants were heterogeneous among our series (all p > 0.25), as shown in Figure 1. The variant with the most heterogeneity was CD2AP (rs9349407), for which the estimated percentage of variation due to heterogeneity across series (I²) was 25.1% (95% CI 0%-70%), suggesting the presence of some heterogeneity for that variant.
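To make the pooling step concrete, the following is a minimal Python sketch of a DerSimonian-Laird random-effects meta-analysis of allelic ORs. It is not the StatsDirect implementation used for the reported results. The function name, the input layout and the allele counts are hypothetical, and heterogeneity is summarised here with Cochran's Q (from which I² is derived) rather than with the Breslow-Day test.

```python
import math

def dersimonian_laird(tables):
    """Pool allelic ORs from 2x2 allele-count tables with a random-effects model.

    Each table is (case_minor, case_major, control_minor, control_major).
    This layout is an assumption made for the example.
    """
    log_ors, variances = [], []
    for a, b, c, d in tables:
        log_ors.append(math.log((a * d) / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf variance of log OR

    k = len(tables)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / scale)  # between-series variance
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0

    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    p = math.erfc(abs(pooled / se) / math.sqrt(2))  # two-sided normal p-value
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "p": p, "q": q, "tau2": tau2, "i2": i2,
    }

# Hypothetical allele counts for three series (not data from this study)
example = dersimonian_laird([
    (380, 1620, 450, 1550),
    (210, 890, 260, 940),
    (150, 650, 200, 700),
])
```

When the series are homogeneous, the estimated between-series variance is zero and the result reduces to the fixed-effect inverse-variance estimate.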

Figure 1
Forest plots for meta-analysis of CD2AP, EPHA1, ARID5B, and CD33 variants in our six Mayo2 case-control series. ORs (boxes) and 95% CIs (whiskers) are plotted for each population and shown on the right of each plot. Combined OR is the overall OR calculated by the meta-analysis using a random effects model. P-values are provided for the combined ORs and for Breslow-Day tests of heterogeneity. I² estimates the percentage of variation across series that is due to heterogeneity.

To adjust for important covariates, we included age-at-diagnosis/entry, sex and APOE ε4 dosage in logistic regression analyses of all five variants in each of the six Mayo2 series; in our analysis of all Mayo2 series combined, series was included as an additional covariate. Table 3 shows the results for the six Mayo2 series combined (Mayo follow-up) as well as for each of the six individual Mayo2 series. For the purpose of comparison, we have also included in Table 3 the published GWAS results for the same variants. Adjustment for covariates revealed comparable ORs to those obtained in the meta-analyses, with improved p-values for the EPHA1 (OR = 0.87, p = 5 × 10⁻⁴), CD33 (OR = 0.92, p = 0.049) and CD2AP (OR = 0.97, p = 0.56) loci. However, the associations of the ARID5B variants were no longer significant following adjustment for covariates (rs2588969: OR = 1.05, p = 0.30; rs4948288: OR = 1.07, p = 0.11), suggesting that these associations may be dependent upon the series, age-at-diagnosis/entry, sex and/or APOE ε4 dosage of the individual.
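The covariate-adjusted additive model can be illustrated with the short Python sketch below. It is not PLINK's implementation: it is a plain Newton-Raphson logistic regression in which the genotype is coded as the number of minor alleles (0, 1 or 2), so that the exponentiated genotype coefficient is the per-allele OR. The function names and the toy counts are hypothetical, and only APOE ε4 dose is used as a covariate to keep the example small.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(rows, status, iterations=25):
    """Fit a logistic regression by Newton-Raphson; an intercept is prepended.

    rows   -- one tuple of predictors per subject
    status -- 1 for case, 0 for control
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(x, status):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical toy data: (minor_allele_count, apoe_e4_dose) -> (cases, controls)
counts = {
    (0, 0): (40, 80), (1, 0): (30, 70), (2, 0): (8, 22),
    (0, 1): (60, 40), (1, 1): (45, 35), (2, 1): (12, 11),
}
rows, status = [], []
for predictors, (n_case, n_control) in counts.items():
    rows += [predictors] * (n_case + n_control)
    status += [1] * n_case + [0] * n_control

beta = fit_logistic(rows, status)
per_allele_or = math.exp(beta[1])  # OR per copy of the minor allele, adjusted for APOE e4 dose
```

Further covariates such as age-at-diagnosis/entry, sex and a series indicator would be appended to each predictor tuple in the same way; continuous covariates such as age are best centred before fitting to keep the Newton steps well conditioned.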

Table 3 Association of CD2AP, EPHA1, ARID5B, and CD33 variants with LOAD in the initial studies (ADGC and GERAD+) and Mayo2 follow-up series

To estimate the overall association of these five variants in our data combined with the previously published associations, we used Fisher's method to combine the p-values for all associations (Table 3; Mayo2/ADGC/Hollingworth). We found that adding our data to those previously reported increased the strength of evidence for all variants as LOAD risk modifiers (CD2AP: p = 6.5 × 10⁻¹¹, EPHA1: p = 2.1 × 10⁻¹⁵, ARID5B rs2588969: p = 2.3 × 10⁻⁹, ARID5B rs4948288: p = 4.0 × 10⁻⁴, CD33: p = 1.8 × 10⁻¹³).
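Fisher's method combines k independent p-values through the statistic X = -2 Σ ln(p), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below shows the calculation in Python using only the standard library; the function name and the input p-values are hypothetical and are not taken from Table 3.

```python
import math

def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (chi-square statistic, combined p-value). The statistic has
    2k degrees of freedom, where k is the number of p-values combined.
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # For even degrees of freedom (2k) the chi-square upper tail has the
    # closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

# Hypothetical p-values from three independent studies of one variant
statistic, combined_p = fisher_combined([0.01, 0.002, 0.03])
```

Because only p-values enter the calculation, the method ignores the direction of effect; variants with opposing ORs across studies can therefore still yield a small combined p-value.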

Discussion

We report here successful replication of the association of two variants with LOAD in a large (n = 6,835), independent case-control study: rs11767557, which is located 3 kb upstream of EPHA1 (p = 5 × 10⁻⁴), and rs3865444, which is located 373 bp upstream of CD33 (p = 0.049). The ORs we observed in our meta-analyses (EPHA1 = 0.88, CD33 = 0.96) were comparable to those reported by both Naj et al. (EPHA1 = 0.87, CD33 = 0.89) and by Hollingworth et al. (EPHA1 = 0.90, CD33 = 0.89) such that the estimated p-values for association of these variants in all data (n > 42,000) were an impressive 2.1 × 10⁻¹⁵ for EPHA1 and 1.8 × 10⁻¹³ for CD33.

Although our meta-analyses showed successful replication of the association of the ARID5B variants rs2588969 (OR = 1.08, p = 0.046) and rs4948288 (OR = 1.11, p = 0.008) with a direction of association consistent with that reported by Hollingworth et al. (OR = 1.06 and 1.07, respectively), the associations did not survive adjustment for age-at-diagnosis/entry, sex and APOE ε4 dosage (p = 0.30 and 0.11, respectively). This covariate-dependent association could explain the opposing association reported by Naj et al. in their discovery (OR = 0.88) and replication (OR = 1.05) datasets for rs2588969, the only ARID5B variant they tested. Therefore, while the estimated p-values for association of the ARID5B variants in all datasets combined were highly significant (rs2588969: p = 2.3 × 10⁻⁹; rs4948288: p = 4.0 × 10⁻⁴), these associations should be interpreted with caution, taking into account the age-at-diagnosis/entry, sex and APOE ε4 dosage of the populations. Finally, although the estimated p-value for association of rs9349407 (located in intron 1 of CD2AP) in all datasets was 6.5 × 10⁻¹¹, there was no evidence for association of this variant in our dataset alone (OR = 0.97, p = 0.56).

Our Mayo2 collection of case-control series provided a total of 2,634 LOAD cases and 4,201 controls. Combining across series to perform global tests of significance for additive genotypic trend tests gave us 80% power to detect ORs ranging from 1.17 (or 0.855) for variants with a minor allele frequency (MAF) of 0.2 to 1.13 (or 0.883) for variants with a MAF of 0.45 in controls. The study provided approximately 60% power to detect the OR of 1.11 previously reported for CD2AP (MAF = 0.27).

Case-control studies such as this are not designed to ascertain whether the variants with reported association with LOAD risk are themselves functional, but they can identify a linkage disequilibrium (LD) block within which a truly functional variant may reside. Our results indicate that the EPHA1 and CD33 variants represent excellent candidates for targeted deep sequencing or high-density genotyping in order to define the locus further, followed by functional studies of nearby genes to elucidate the mechanism behind these associations. With the exception of rs9349407, which lies within intron 1 of CD2AP, all of these variants lie within intergenic regions; for the reader's ease, we have thus far referred only to the nearest gene for each variant. This by no means signifies that these variants (or the functional variants in LD with them) are assumed to affect the expression or function of the nearest gene; they may instead affect other nearby genes. Until it is known which gene underlies these associations, all nearby genes should be included in follow-up functional investigation (all genes that reside within 100 kb of these variants are listed in Additional file 1, Table S1).

Conclusions

Together with our previous publications [5, 18, 20, 21], we have now performed follow-up association studies of 25 of the top GWAS-identified candidate LOAD genes and successfully replicated the association of eleven variants (in or near ABCA7, BIN1, CD33, CLU, CR1, EPHA1, GAB2, LOC651924, MS4A6A/4E and PICALM), eight of which are currently ranked in the top ten (after APOE) on AlzGene. This recent success in replicating genetic association highlights the utility of multiple, large case-control follow-up studies to confirm the novel associations reported by large GWAS, thereby establishing the implicated genes as good candidates for functional follow-up studies.

Methods

Ethics statement

Approval was obtained from the ethics committee or institutional review board of each institution responsible for the ascertainment and collection of samples. Written informed consent was obtained for all individuals who participated in this study.

Case-control subjects

The Mayo2 case-control series consisted of Caucasian subjects from the United States ascertained at the Mayo Clinic Jacksonville, Mayo Clinic Rochester, or through the Mayo Clinic Brain Bank. Additional Caucasian subjects from Europe were obtained from Norway [22], Poland [23], and from six research institutes in the United Kingdom that are part of the Alzheimer's Research UK (ARUK) Network. Although the ARUK samples used in this follow-up do not overlap with those employed in the original GWAS publication by Hollingworth et al., the same subject/sample ascertainment methodology was followed. The ARUK series included here are from Bristol, Leeds, Manchester, Nottingham, Oxford and Southampton. Since the Manchester cohort only consisted of LOAD cases, the Manchester cases were combined with subjects in the Nottingham series.

Genotyping

All genotyping was performed at the Mayo Clinic in Jacksonville using TaqMan® SNP Genotyping Assays in an ABI PRISM® 7900HT Sequence Detection System with 384-Well Block Module from Applied Biosystems, California, USA. The genotype data were analyzed using the SDS software version 2.2.3 (Applied Biosystems, California, USA).

Statistical Analyses

Meta-analysis of allelic association and Breslow-Day tests were performed using StatsDirect v2.5.8 software. Meta-analyses were performed using the results from each individual case-control series. Summary ORs and 95% CIs were calculated using the DerSimonian and Laird (1986) random-effects model [24]. Breslow-Day tests were used to test for heterogeneity between populations. PLINK software [25] (http://pngu.mgh.harvard.edu/purcell/plink/) was used to perform logistic regression analysis under an additive model, adjusting for age-at-diagnosis/entry, sex and APOE ε4 dose as covariates. In our analysis of all series combined, series was included as an additional covariate. Since genotype counts were not reported for the series included in the Naj et al. or Hollingworth et al. studies, we employed a Fisher combined test to combine p-values across studies. Power calculations, based on a Mantel-Haenszel chi-square test that pooled across six different study groups, were obtained to estimate the detectable odds ratios for an ordinal effect using a range of minor allele frequencies spanning those expected from the candidate variants.