Predicting Amyloid Burden to Accelerate Recruitment of Secondary Prevention Clinical Trials

BACKGROUND: Screening to identify individuals with elevated brain amyloid (Aβ+) for clinical trials in Preclinical Alzheimer’s Disease (PAD), such as the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s disease (A4) trial, is slow and costly. The Trial-Ready Cohort in Preclinical/Prodromal Alzheimer’s Disease (TRC-PAD) aims to accelerate and reduce the costs of AD trial recruitment by maintaining a web-based registry of potential trial participants and using predictive algorithms to assess their likelihood of suitability for PAD trials.
OBJECTIVES: Here we describe how the algorithms used to predict amyloid burden within the TRC-PAD project were derived using screening data from the A4 trial.
DESIGN: We apply machine learning techniques to predict amyloid positivity. Demographic variables, APOE genotype, and measures of cognition and function are considered as predictors. Model data were derived from the A4 trial.
SETTING: TRC-PAD data are collected from web-based and in-person assessments and are used to predict the risk of elevated amyloid and assess eligibility for AD trials.
PARTICIPANTS: Pre-randomization, cross-sectional data from the ongoing A4 trial are used to develop statistical models.
MEASUREMENTS: Models use a range of cognitive tests and subjective memory assessments, along with demographic variables. Amyloid positivity in A4 was confirmed using positron emission tomography (PET).
RESULTS: The A4 trial screened N=4,486 participants, of whom N=1,323 (29%) were classified as Aβ+ (SUVR ≥ 1.15). The Area under the Receiver Operating Characteristic curves for these models ranged from 0.60 (95% CI 0.56 to 0.64) for a web-based battery without APOE to 0.74 (95% CI 0.70 to 0.78) for an in-person battery. The number needed to screen to identify an Aβ+ individual is reduced from 3.39 in A4 to 2.62 in the remote setting without APOE, and 1.61 in the remote setting with APOE.
CONCLUSIONS: Predictive algorithms in a web-based registry can improve the efficiency of screening in future secondary prevention trials. APOE status contributes most to predictive accuracy with cross-sectional data. Blood-based assays of amyloid will likely improve the prediction of amyloid PET positivity.

Keywords
Trial-ready cohort; Alzheimer's disease; machine learning

Background
Screening cognitively normal older individuals for the presence of elevated cerebral amyloid-beta protein ("Aβ+") for inclusion in secondary prevention trials for Alzheimer's disease (AD) is invasive, expensive and slow. The current gold standards for measuring amyloid-β in the brain require either positron emission tomography (PET) or cerebrospinal fluid (CSF) assay. For example, the Anti-Amyloid Treatment in Asymptomatic Alzheimer's disease (A4) trial conducted amyloid PET on 4,486 individuals in order to identify 1,323 Aβ+ individuals, an amyloid PET screen fail rate of 71% (1). The Number Needed to Screen (NNS) to identify each Aβ+ individual was 3.39 individuals.
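The screen fail rate and NNS quoted above follow directly from the two A4 counts. The short Python sketch below reproduces that arithmetic; the variable names are illustrative and not taken from the study.

```python
# Counts reported for the A4 screening cohort in the text above.
screened = 4486   # participants who received a screening amyloid PET scan
positive = 1323   # participants classified as Aβ+

prevalence = positive / screened     # fraction of scans that were Aβ+
screen_fail_rate = 1 - prevalence    # fraction of scans that were not Aβ+
nns = screened / positive            # scans needed per Aβ+ participant found

print(f"prevalence={prevalence:.0%}, "
      f"screen fail rate={screen_fail_rate:.0%}, NNS={nns:.2f}")
```

Because NNS is the reciprocal of the proportion of scanned participants who are Aβ+, any pre-screen that enriches the scanned pool for Aβ+ individuals lowers the NNS.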
The Trial-Ready Cohort in Preclinical/Prodromal Alzheimer's Disease (TRC-PAD) is a research program that was initiated to find solutions to these challenges in trial recruitment and site management, as described in Aisen, et al. Submitted (2). Three elements make up the TRC-PAD platform: the Alzheimer's Prevention Trial (APT) webstudy (aptwebstudy.org), the Site Referral System (SRS) and the Trial Ready Cohort (TRC). The APT webstudy invites participants to enroll in the study. At the time of enrollment, participants are asked for demographic, medical and lifestyle information. They are then asked to complete longitudinal web-based cognitive testing and symptom questionnaires. With these data, we aim to estimate the likelihood that an individual is Aβ+ before they are invited to participate in a secondary prevention trial. The SRS refers the APT participants deemed most likely to be Aβ+ for in-clinic assessments, where they proceed with TRC screening. During the TRC screening phase, participants are administered additional testing, including the Preclinical Alzheimer's Cognitive Composite (PACC) (3) and genotyping, before their eligibility for an amyloid test is assessed.
In this paper, we describe how the prediction models and algorithms used in TRC-PAD were derived from A4 screening data. We anticipate blood-based biomarkers will greatly improve predictions of amyloid positivity, and this is a focus of future work and an aim of TRC-PAD. Predictors in the current analysis are limited to demographics, cognitive and functional assessments, and APOE genotype.

Population and Study Design
The study design and screening data for A4 have been previously described (7,8), and Institutional Review Boards have approved both the A4 and TRC-PAD studies. The A4 screening dataset contains N=4,486 participants, of whom 1,323 (29%) were classified as Aβ+. Amyloid PET imaging was conducted with florbetapir F18 and summarized by mean cortical standardized uptake value ratio (SUVR) relative to the whole cerebellum. Participants were considered eligible to continue screening for A4 based on an algorithm combining both quantitative SUVR (≥1.15) and qualitative visual read performed at a central laboratory. A SUVR between 1.10 and 1.15 was considered to be elevated amyloid only if the visual read was considered positive by a two-reader consensus determination. Participants who were considered Aβ+ were slightly older, with a mean (standard deviation, SD) age of 72.10 (4.89) years in the Aβ+ group and 70.95 (4.53) years in the Aβ− group. However, there were no observed differences in sex or education. Aβ+ participants were more likely to have a family history of dementia and at least one APOEε4 allele. In addition, Aβ+ participants performed worse on the screening Preclinical Alzheimer Cognitive Composite (PACC) and had higher scores on the Cognitive Function Index.
Table 1 describes the collections of predictors that we considered when training the different predictive algorithms. All screening data for the A4 Study were collected during supervised clinic visits. However, some components of the A4 screening battery are being captured remotely in the APT webstudy, including the demographic, Cogstate brief battery (9), family history (sibling or parent with Alzheimer's), and Cognitive Function Instrument (10) (CFI) variables indicated in Table 1. We consider predictive algorithms using these "remote" variables only, as well as a more thorough battery that would require a supervised clinic visit with an administration of the PACC (3).
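The A4 amyloid eligibility rule described in this section can be written as a small decision function. This is a minimal Python sketch, not the study's implementation: the function name is hypothetical, and it assumes that an SUVR of 1.15 or higher counts as elevated regardless of the visual read, which the text implies but does not state explicitly.

```python
def is_elevated_amyloid(suvr: float, visual_read_positive: bool) -> bool:
    """Classify a florbetapir PET scan under the assumed A4 screening rule.

    suvr: mean cortical SUVR relative to whole cerebellum.
    visual_read_positive: two-reader consensus visual read result.
    """
    if suvr >= 1.15:
        # Assumption: the quantitative threshold alone is sufficient.
        return True
    if 1.10 <= suvr < 1.15:
        # Intermediate range: elevated only with a positive visual read.
        return visual_read_positive
    return False


# Illustrative inputs only; these are not A4 data.
for suvr, read in [(1.20, False), (1.12, True), (1.12, False), (1.05, True)]:
    print(suvr, read, is_elevated_amyloid(suvr, read))
```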
In all, we considered 6 models: (1) remote battery without APOE, (2) remote battery with APOE, (3) in clinic battery without APOE, (4) in clinic battery with APOE, (5)

Statistical Analysis
Extreme Gradient Boosting (XGBoost) (4) is a decision tree-based machine learning technique (6). A single decision tree, or regression tree, is easy to interpret but provides relatively poor prediction. Aggregating a large number of trees can improve prediction accuracy. Boosting is a technique in which models are trained in sequence, with each new model making cumulative improvements. At each iteration the data are re-weighted such that misclassified data points receive larger weights. XGBoost is a scalable tree boosting algorithm that is optimized and designed to be highly efficient, flexible, and portable.
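To make the idea of sequential tree fitting concrete, the sketch below boosts one-split regression trees ("stumps") on a single predictor using only the Python standard library. It is a simplified stand-in, not XGBoost: it uses the residual-fitting formulation for squared loss rather than explicit re-weighting of observations, it has no regularization, and the data, function names and learning rate are made up for illustration.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def boost(x, y, rounds=50, learning_rate=0.1):
    """Fit stumps in sequence, each one to the residuals of the current model."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, xi):
    return model["base"] + model["learning_rate"] * sum(
        predict_stump(s, xi) for s in model["stumps"])


def mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Made-up toy data: age (years) and an SUVR-like outcome.
ages = [65, 67, 69, 71, 73, 75, 77, 79]
suvr = [1.00, 1.02, 1.05, 1.04, 1.10, 1.15, 1.18, 1.25]

for rounds in (0, 1, 50):
    print(rounds, mse(boost(ages, suvr, rounds=rounds), ages, suvr))
```

The training error falls as rounds are added, which is the "cumulative improvement" described above; in practice the number of rounds and the learning rate are among the hyper-parameters that need tuning to avoid overfitting.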
XGBoost supports monotone constraints and customized objective functions. We applied monotone constraints to predictors such as age, number of APOEε4 alleles (0, 1 or 2), and assessment scores that we expect to have a generally monotonic relationship with amyloid PET SUVR (Supplemental Figure 1). The default XGBoost objective function is mean squared loss, meaning decision trees are selected to minimize the residual sum of squares. Because XGBoost does not provide confidence intervals with mean squared loss, we applied the Quantile Regression loss function to estimate the 50%, 2.5%, and 97.5% quantiles of the predictions.
The XGBoost model has a number of hyper-parameters that control the bias-variance trade-off (13). Hyper-parameters are fixed before the model is fitted and are not learned from data. We used 10-fold Cross-Validation (CV) to assess the out-of-sample bias and variance for given hyper-parameter values, and Bayesian Optimization (14) to optimize the hyper-parameter selection. We use SHapley Additive exPlanation (SHAP) (15) values to summarize the importance of each predictor to the overall predictive accuracy of each model. More details about the model fitting procedures are provided in the supplemental material (Supplemental Table 1).
Our main interest lies in the predictive accuracy of the models. To assess this, we split the data randomly into 80% training and 20% test sets. After fitting the models with the training data, we assess their predictive accuracy with the independent test data. Analyses were conducted with R version 3.6.2 (r-project.org) with packages xgboost (4).

Results
Figure 1 shows the relative contributions, in terms of SHAP values, of each predictor to the predictive accuracy of each model. As expected, when available, APOE genotype is the most important predictor for these cross-sectional models.
We see that age, CFI, education, and family history are also among the five most valuable predictors in some models. Figure 2, which shows the Receiver Operating Characteristic (ROC) curves and Area under the Curve (AUC) for the 6 models, also demonstrates the relative value of APOE. The dashed lines are models fitted without the APOEε4 variable and the solid lines are models that include APOEε4. The ROC curves were generated using a cut point SUVR value of 1.15 for a binary separation between amyloid positive and negative. In general, we see AUCs in the range 0.60 (without APOE) to 0.73 (with APOE).
Table 2 reports operating characteristics from several screening algorithm scenarios. The top half provides operating characteristics when a threshold is selected to provide 50% prediction prevalence (i.e. half the participant pool is selected to receive amyloid PET scans). With 50% prediction prevalence, the NNS is about 2.5 participants with APOE and 3.0 participants without APOE. When the threshold for predicted amyloid PET SUVR is increased to 1.15, the NNS is reduced to about 1.7 participants with APOE and 2.5 participants without APOE. However, this results in much lower sensitivity and, as we can see from Figure 3, a threshold of 1.15 would be practical only with participant registries of 10,000-13,000 to identify 1,000 Aβ+ participants.
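The operating characteristics discussed above can all be computed from a vector of predicted SUVR values and the observed amyloid status. The Python sketch below is illustrative only: the function names and the ten toy observations are made up, NNS is taken as the reciprocal of the positive predictive value, and the AUC is computed as the proportion of positive/negative pairs ranked correctly (ties counted as one half).

```python
def operating_characteristics(predicted_suvr, is_positive, threshold):
    """Summarise a rule that refers participants with predicted SUVR >= threshold."""
    selected = [pos for pred, pos in zip(predicted_suvr, is_positive)
                if pred >= threshold]
    n_selected, true_pos = len(selected), sum(selected)
    ppv = true_pos / n_selected if n_selected else float("nan")
    return {
        "prediction_prevalence": n_selected / len(predicted_suvr),
        "sensitivity": true_pos / sum(is_positive),
        "ppv": ppv,
        "nns": 1 / ppv if true_pos else float("inf"),
    }


def auc(predicted_suvr, is_positive):
    """Probability that a random Aβ+ case is scored above a random Aβ- case."""
    pos = [p for p, y in zip(predicted_suvr, is_positive) if y]
    neg = [p for p, y in zip(predicted_suvr, is_positive) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data: predicted SUVR and observed amyloid status (1 = Aβ+).
predicted = [1.02, 1.05, 1.08, 1.10, 1.12, 1.14, 1.16, 1.18, 1.20, 1.25]
observed = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

# Threshold giving 50% prediction prevalence: the upper half of the predictions.
half_threshold = sorted(predicted)[len(predicted) // 2]

print(operating_characteristics(predicted, observed, half_threshold))
print(operating_characteristics(predicted, observed, 1.15))
print(auc(predicted, observed))
```

Raising the threshold refers fewer participants, so the NNS can only be compared fairly alongside sensitivity and the registry size needed to reach a given number of Aβ+ participants.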

Discussion
This work, in the context of the TRC-PAD platform, can facilitate the development of participant selection algorithms. TRC-PAD has two main selection points: the first is from the APT webstudy to in-clinic assessment (stage 1) and the second is from in-clinic assessment to amyloid testing (stage 2). In stage 1, consented webstudy participants are referred to their nearest TRC-PAD site, identified via self-reported zip codes. They are then ranked based on their SUVR prediction. In addition to this predicted SUVR, the selection process considers demographics, to achieve diversity, and whether the participant has known prior amyloid testing and results. During the first in-clinic visit of the participants referred in stage 1, additional cognitive testing, in the form of the PACC, and APOE genotyping are performed. With this additional information, the SUVR predictions are updated and presented for central authorization of amyloid testing.
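The stage 1 ranking step can be pictured as grouping participants by their nearest site and ordering each group by predicted SUVR. The Python sketch below is a deliberately reduced illustration: the record fields (`id`, `site`, `predicted_suvr`) and the function name are hypothetical, and it omits the demographic and prior-amyloid-testing considerations that the actual selection process also takes into account.

```python
from collections import defaultdict


def rank_by_site(participants):
    """Group participants by site and sort each group by predicted SUVR, highest first."""
    by_site = defaultdict(list)
    for participant in participants:
        by_site[participant["site"]].append(participant)
    return {
        site: sorted(group, key=lambda p: p["predicted_suvr"], reverse=True)
        for site, group in by_site.items()
    }


# Made-up webstudy records; site labels and values are illustrative only.
webstudy = [
    {"id": "P1", "site": "A", "predicted_suvr": 1.08},
    {"id": "P2", "site": "B", "predicted_suvr": 1.21},
    {"id": "P3", "site": "A", "predicted_suvr": 1.17},
    {"id": "P4", "site": "B", "predicted_suvr": 1.03},
]

for site, ranked in rank_by_site(webstudy).items():
    print(site, [p["id"] for p in ranked])
```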
This work has shown that collecting relatively simple demographics and cognitive and functional assessments remotely, via the webstudy, should allow us to reduce screen fail rates and improve enrollment. Even small improvements in NNS can have a large impact on the expense of screening for Preclinical AD clinical trials. For example, assuming a conservative estimate of 3,500 US Dollars (USD) per scan, the A4 study spent a total of about 4,486 × 3,500 USD = 15,701,000 USD on screening amyloid PET scans alone to identify 1,323 Aβ+ individuals (NNS=3.39). Reducing the NNS from 3.39 to 2.62, which seems plausible with the simplest remote battery, would have reduced this cost by 3,569,090 USD to 1,323 × 2.62 × 3,500 USD = 12,131,910 USD. In addition to the remote data setting, this work assessed the value of APOE genotyping and collection of the PACC during an in-clinic screening. Adding APOE genotype might reduce the NNS to below 2.00, for a total PET screening cost of 1,323 × 2.00 × 3,500 USD = 9,261,000 USD. The financial impact would be smaller with a cerebrospinal fluid (CSF)-based, or blood-based, amyloid screen, but the impact on subject and site burden would remain significant. From a statistical perspective, we have demonstrated the use of machine learning techniques both to tune models, via Bayesian Optimization, and to produce predictive models using XGBoost. We have also illustrated how the SHAP metric can be used to make inferences from a modelling approach that is primarily used for prediction.
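The cost figures above can be checked with a few lines of Python. The sketch below uses only the numbers given in the text (3,500 USD per scan, 1,323 Aβ+ participants, and the quoted NNS values); the function and constant names are illustrative.

```python
COST_PER_SCAN_USD = 3500   # conservative per-scan estimate assumed in the text
TARGET_POSITIVES = 1323    # Aβ+ participants identified in A4


def pet_screening_cost(nns, positives=TARGET_POSITIVES,
                       cost_per_scan=COST_PER_SCAN_USD):
    """Total PET screening cost: positives needed x scans per positive x cost per scan."""
    return positives * nns * cost_per_scan


a4_cost = 4486 * COST_PER_SCAN_USD              # actual number of A4 screening scans
remote_cost = pet_screening_cost(2.62)          # remote battery without APOE
with_apoe_cost = pet_screening_cost(2.00)       # upper bound if NNS falls below 2.00

print(f"A4 screening scans:  {a4_cost:,.0f} USD")
print(f"NNS 2.62:            {remote_cost:,.0f} USD "
      f"(saving {a4_cost - remote_cost:,.0f} USD)")
print(f"NNS 2.00:            {with_apoe_cost:,.0f} USD")
```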
One limitation of these pre-screening algorithms is that the cohort characteristics will be affected. For example, we would expect the algorithms to produce an older cohort with an even greater proportion of APOEε4 carriers than a cohort selected without a pre-screen. This could be mitigated by stratifying the screening process to ensure an adequate sample of younger APOEε4 noncarriers, but with adverse effects on the NNS. Another consideration is the inability of these models to extrapolate beyond the observed range of continuous variables such as age. A second potential limitation is bias in the training data. As we start using these models in TRC-PAD and collect additional data, we will assess whether the models are biased with respect to any additional covariates collected.
Future work will focus on using longitudinal cognitive and functional change and/or blood-based biomarkers to improve the performance of these predictive models and algorithms. We anticipate, based on analyses of the Alzheimer's Disease Neuroimaging Initiative (ADNI) (5), that longitudinal change may be a valuable predictor of amyloid status. In addition, we will incorporate plasma amyloid peptide ratios (currently in validation testing) into the final stage of prediction and expect a large improvement in prediction.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Table note: We considered predictive algorithms which could be applied to data captured either remotely via a web-based registry, or in the clinic (though all data in A4 were collected in clinic), as indicated in the table. In all we considered 6 models: (1)

Table note: The top half of the table provides demographic characteristics when a threshold is applied to predicted amyloid PET SUVR that results in a 50% prediction prevalence (half of the screening pool is predicted positive and tested with a PET scan). The first column indicates the threshold required to attain 50% prediction prevalence. The bottom half of the table applies a threshold of 1.15. We can see that in all the scenarios where APOE is included in the model, at least 29 of the 30 participants with two APOEε4 alleles (in the test data) have been selected.