Polygenic Risk Score Analysis of Alzheimer’s Disease in Cases without APOE4 or APOE2 Alleles

The We and others have previously shown that polygenic risk score analysis (PRS) has considerable predictive utility for identifying those at high risk of developing Alzheimer’s disease (AD) with an area under the curve (AUC) of >0.8. However, by far the greatest determinant of this risk is the apolipoprotein E locus with the E4 allele alone giving an AUC of ∼0.68 and the inclusion of the protective E2 allele increasing this to ∼0.69 in a clinical cohort. An important question is to determine how good PRS is at predicting risk in those who do not carry the E4 allele (E3 homozygotes, E3E2 and E2E2) and in those who carry neither the E4 or E2 allele (i.e. E3 homozygotes). Previous studies have shown that PRS remains a significant predictor of AD risk in clinical cohorts after controlling for APOE ε4 carrier status. In this study we assess the accuracy of PRS prediction in a cohort of pathologically confirmed AD cases and controls. The exclusion of APOE4 carriers has surprisingly little effect on the PRS prediction accuracy (AUC ∼0.83 [95% CI: 0.80-0.86]), and the accuracy remained higher than that in clinical cohorts with APOE included as a predictor. From a practical perspective this suggests that PRS analysis will have predictive utility even in E4 negative individuals and may be useful in clinical trial design.


Introduction
P olygenic risk score (PRS) analysis enhances the predictability of the diagnosis of AD (1). In a recent analysis, we showed that the area under the curve (AUC) in a pathologically confirmed case/ control series was 0.84 (2). However, by far the largest contribution to this risk analysis is the E4 allele (risk) and the E2 allele (protective) which gave AUC of 0.68 (E4 alone) and 0.69 (E4+E2) as compared to the overall PRS AUC=0.75 in clinical samples (1). An important practical and theoretical consideration is to understand how good PRS is when the risk at the APOE locus is removed. When this was tested in the clinical series (1) the AUC was reduced from 0.75 in the whole dataset to 0.65 in E3 homozygotes. Assessment of the significance of PRS adjusting for APOE4 statistically was performed (3) and indicated little change in the models' statistical significance. However for practical application, e.g. selecting individuals for clinical trials, statistical significance is not an informative measure of the algorithm performance. To our knowledge, the PRS accuracy in E3 homozygotes has never been directly investigated in pathologically confirmed samples. Therefore, we tested this in our pathological series by removing from the analysis, first all E4 carriers and then, all E4 and E2 carriers from both the case and the control data sets.
The sample characteristics of the original dataset used in this study were the same as in our previous analysis (2). This project was declared IRB exempt (MedstarProject #2003-118) under the Code of Federal Regulations, 45 CFR, 46. The primary data consisted of 1011 cases and 583 controls. We first eliminated all those samples who had an E4 allele (leaving 354 cases and 454 controls) and then additionally those with an E2 allele (leaving 321 cases and 365 controls homozygous for the E3 allele). From the total 36,481,940 imputed single nucleotide polymorphisms (SNPs), we excluded those with an Info score below 0.8 and MAF<0.01. This resulted in 7,868,100 SNPs which were used for the analysis. Genome-wide association analysis was performed for each SNP using logistic regression analysis as implemented in snptest (4) with adjusting for gender and first two principal components which were selected after visual inspection of each pair of PCs to adjust foe any potential stratification in the data Predictive modelling was performed using a polygenic score approach based upon AD associated SNPs according to the IGAP study (5). We converted the imputed genotypes of our samples into "most probable" genotypes with a probability over 90%. The correlated SNPs were pruned using parameters r2=0.1, a physical distance threshold of 500Kb, preferentially retaining the SNP most significantly associated with AD (2). The AD GWAS association p-value threshold for SNP inclusion was 0.5, as this currently maximally captures polygenic risk in the greatest number of samples (1). The models were fitted using IGAP (stage I) summary statistics data as a training set and predicting AD/control status in our study. We note that our cohort is part of the IGAP study (5) and therefore the results maybe slightly biased due to the 1.3% overlap. To adjust for the overlap, we used a simulation approach as described in (2), assuming that our dataset (N=686) is a random subset of the IGAP study (N=54,162). In short, we simulated 1000 times effect sizes of SNPs with mean b~N(BIGAP, sd=0.12*SEIGAP), where BIGAP is the beta-coefficient and SEIGAP is the standard error for that SNP in the IGAP study, and the coefficient 0.12 was estimated empirically (see (2) for details).

Results
The prediction accuracy (AUC) in the full pathologically confirmed dataset was AUC=0.73 [0.71-0.75] for E4 alleles and AUC= 0.75 [0.73-0.77] for E4 and E2. When PRS was included to the predictive model, the AUC for the full pathologically confirmed dataset was 0.84, after adjusting for the overlap with the training IGAP dataset used for SNP selection (2). The original unadjusted AUC was 0.87 and was 0.84, after adjusting for the overlap with the training set used for SNP selection (2).
Removing all individuals with an E4 allele only reduced the unadjusted AUC from 0.87 to 0.84 and then removing all E2 carriers (i.e. restricting the analysis to E3 homozygotes) had a further small effect and reduced the AUC to 0.83. Thus, in contrast to the results obtained with the clinical series, the AUC is only marginally reduced by removing E4 and E2 carriers.
We tested three possible explanations for this finding: 1) the people who get AD without an E4 allele have more AD risk alleles, i.e. alleles at other loci have bigger effects in the absence of E4; 2) the effects of APOE and other risk SNPs are independent; 3) the results are driven by inflation due to the overlap between the discovery (IGAP) and test (E3 homozygote pathologically confirmed AD cases and controls) datasets.
First, we ran a GWAS analysis with snptest software only for E3 homozygote cases and controls. The majority of the top IGAP SNPs did not show statistically significant association in this small sample set and their effect sizes were not higher than the effect sizes in the whole data set (data not shown). This strongly suggests that the E3 homozygotes with disease do not have a greater excess of other AD risk alleles.
Next, we counted the number of risk and alternative alleles for sets of SNPs at different significance thresholds (reported by the IGAP study) for each subject in the E3 homozygote subgroup and in the rest of the dataset. We compared the average number of risk and alternative alleles (per person) using a chi-square test for a 2x2 table: (Risk Allele -Alternative Allele) x (E3 homozygotes -other genotypes). This analysis was performed in cases and controls separately as cases in general may have more risk alleles than controls. The results are summarized in Tables 1 and 2, respectively. There were no significant differences in the mean number of risk and alternative alleles per person among the E3 homozygotes versus the other genotypes in either the pathologically confirmed AD cases or the pathologically confirmed controls.
We also compared the predictive accuracy of the best model (PRS for SNPs with p-values≤0.5) with and without APOE in three subgroups, namely E4 carriers (644 cases and 115 controls), E4 and E2 carriers (677 cases and 204 controls) and E3 homozygotes (321 cases and 365 controls). Table 3 shows the estimated AUC in those subgroups for the PRS models with more significant SNPs (p≤0.001 in IGAP study) and the best predictive PRS model (1), combining all available independent SNPs with p-values ≤0.5, when the APOE region is included and excluded. The results clearly show that the PRS predictive accuracy is almost the same in any subgroup, when APOE is excluded. Note that the full dataset (shown in the second column of Table 3), has the largest overlap with IGAP, and therefore the AUC estimate for this group has the most (~2%) inflation (see (2) for details). Finally, we adjusted our main result (AUC=0.83 in the E3 homozygote dataset) for the overlap with the discovery IGAP dataset using a simulation approach (2). The adjusted AUC and the confidence intervals were calculated as average AUC and CI over 1000 simulations, AUCADJ = 0.83 [95% CI: 0.80-0.86]. Figure 2 shows the distribution of standardized PRS for the E3 homozygote cases and E3 homozygote controls. In the negative polygenic extreme group (PRS smaller than -2), there were 17 controls and 0 cases. In the positive extreme group (PRS greater than 2), there were 11 cases and 1 control. Looking at the extremes (PRS < -1.5) and (PRS > 1.5), there were 1 case and 49 controls and 41 cases and 4 controls, respectively.

Discussion
Our results show that the predictive accuracy of PRS in pathologically confirmed E3 homozygotes is high and equivalent to the predictive accuracy of the whole dataset. This finding indicates that APOE is an independent risk factor for the disease. This result is in contrast to the PRS observed in clinical cohorts where restricting analyses to E3 homozygotes resulted in a large reduction in the PRS accuracy. We believe this is likely to be because of poor diagnostic accuracy among those labeled as AD in the absence of an E4 allele: this interpretation is consistent with post mortem follow up of AD clinical trials, which suggested a diagnostic inaccuracy of up to 25% (6,7). From a mechanistic perspective, this result suggests that the genetic architecture of AD in E3 homozygotes is similar to that in the other APOE genotypes since a similar proportion of risk is captured by PRS in all genotypes. This result does not support the belief that E3 homozygotes with AD have more predisposing variants at other loci. This result suggests that PRS analysis is likely to have utility in clinical trial design.