Introduction

Alzheimer disease (AD) is a progressive neurodegenerative disorder that is characterized by the abnormal accumulation of amyloid plaques and tau neurofibrillary tangles followed by neuronal loss. Both tau pathology and neuronal atrophy are most prominent in the medial temporal lobes and the hippocampus, brain regions involved in episodic memory (1–3). The primary clinical feature of AD is a progressive deterioration of cognitive ability, which often initially presents as an impairment in episodic memory (4). Cognitive tests used in the diagnosis of prodromal AD or mild cognitive impairment (MCI) due to AD emphasize tasks of episodic memory (5, 6). In many cases, the degree of memory impairment strongly predicts who will progress to more severe stages of AD dementia (7, 8).

AD pathology can accumulate for years before the onset of detectable clinical or cognitive symptoms (9). Biomarkers are now widely available to establish evidence of preclinical AD and include amyloid and tau PET imaging, cerebrospinal fluid markers of amyloid-β peptide (Aβ), tau, and tau phosphorylated at position 181 (p-tau181), and more recently, plasma biomarkers of similar analytes including tau phosphorylated at position 217 (p-tau217) (10, 11). Individuals who are in the preclinical phase of the disease are at high risk of ultimately developing symptomatic AD (12, 13) and therefore are increasingly targeted for enrollment in clinical trials. The rate of cognitive decline in preclinical AD is strongly associated with performance on episodic memory tasks (12, 14–17). As such, recently completed clinical trials in preclinical AD have emphasized episodic memory measures in their primary endpoints to demonstrate clinical efficacy of treatment (18, 19).

Cognitive data can also be used cross-sectionally to enrich clinical trial cohorts for patients at high risk of progression. The original preclinical AD staging model (13) suggested that individuals with abnormal AD biomarkers coupled with subtle cognitive decline are at the highest risk of disease progression, a hypothesis that has been supported empirically (12), leading some trials, such as the A4 study (18), to require a certain score on an episodic memory test to be eligible for enrollment.

Despite the established importance of episodic memory in preclinical and prodromal AD, there is little consensus on how best to measure this construct. Indeed, memory measures vary widely in terms of their stimuli (e.g., single item words vs. paired associates vs. paragraphs vs. objects vs. faces), method of administration (e.g., number of learning trials, number of retrieval attempts, presence of semantic or phonological cues), method of testing (free recall, cued recall, recognition) and retention interval (immediate recall vs. delayed recall) to name just a few. Nevertheless, memory measures are often considered interchangeable. For example, the study validating the most popular general cognitive composite (ADCS-PACC) utilized different list learning measures in separate cohorts (20). This presumed equivalency is unwarranted as there are numerous demonstrations that one memory test (typically a list learning measure) is superior to another (typically paragraph recall) (15, 21).

The primary goal of this report is to compare three common memory tests in terms of their association with baseline differences in preclinical AD pathology. It is important to note that memory tests that are highly sensitive to AD pathology at baseline may not be the same tests that are sensitive to longitudinal change, if, for example, there are confounding effects of practice-related improvements (22). Previous work examining the associations of cognitive composites with AD pathology has used CSF or neuroimaging biomarkers (23, 24). Recently, plasma biomarkers of AD pathology have become widely available and likely will form the basis of a cost-effective cohort enrichment strategy for future clinical trials, one that additionally makes serial assessments more feasible (25–27). Therefore, in this report we focus on plasma biomarkers of amyloid pathology (the Aβ42/40 ratio), both amyloid and tau pathology (p-tau217), and neurodegeneration (neurofilament light chain [NfL]) (28–31).

A secondary goal was to compare the sensitivity of plasma biomarkers to decline in cognitive composite scores that employ the different memory tests available. Cognitive composites are now accepted as primary endpoints in secondary treatment trials (19, 32), and could be optimized by including only particularly sensitive memory tests, or conversely, by excluding poorly performing measures. If plasma biomarkers are to form the backbone of clinical trial recruitment, it is essential to establish the specific cognitive tests and composite scores that are most strongly associated with these measures of AD pathology.

Methods

Participants

This study analyzed longitudinal cognitive data from participants in an ongoing study of memory and aging at the Knight Alzheimer Disease Research Center at Washington University in St. Louis. Participants are typically recruited via referrals, outreach events hosted by the study team, and word of mouth. Study volunteers can range in age and may be cognitively healthy or have varying levels of cognitive impairment, but for the current analyses we restricted the sample as described below to best meet the goals of this study. All participants provided informed consent to participate in these studies and study procedures were conducted in accordance with the Declaration of Helsinki. To be included in the present analysis, participants must have been 65 years of age or older and clinically normal at baseline. Furthermore, they were required to have had measurements of all 3 plasma biomarkers (Aβ42/40 ratio, p-tau217, and NfL) within 2 years of their baseline cognitive visit, and to have at least 1 additional follow-up cognitive assessment. To avoid our statistical models being overly influenced by a few participants with extremely long periods of follow-up, we restricted our follow-up data to a maximum of 10 years. Finally, due to COVID-era closures, the cognitive battery was disrupted between 2020 and 2022. Some measures were temporarily dropped, and others were converted to an online administration format. Thus, we limited our analyses to data collected prior to the year 2020. Our final sample consisted of 161 cognitively healthy older adults; relevant demographic information on this cohort is presented in Table 1.
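The inclusion rules in the preceding paragraph can be sketched as a simple filter. This is an illustrative sketch only, not the study's actual code: the `Visit` and `Participant` records and their field names are hypothetical, "within 2 years" is assumed to be inclusive, and it is assumed that the pre-2020 and 10-year limits are applied before counting follow-up visits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Visit:
    year: int              # calendar year of the cognitive visit (hypothetical field)
    years_in_study: float  # time since the baseline cognitive visit


@dataclass
class Participant:
    baseline_age: float
    baseline_cdr: float
    # Largest gap (in years) between any of the 3 plasma measures and the
    # baseline cognitive visit; None if any biomarker is missing (assumption).
    biomarker_gap_years: Optional[float]
    visits: List[Visit]


def usable_visits(p: Participant) -> List[Visit]:
    """Keep visits collected before 2020 and within 10 years of baseline."""
    return [v for v in p.visits if v.year < 2020 and v.years_in_study <= 10]


def is_eligible(p: Participant) -> bool:
    """Apply the stated criteria: age 65+, CDR 0, biomarkers within 2 years,
    and a baseline visit plus at least 1 follow-up among the usable visits."""
    if p.baseline_age < 65 or p.baseline_cdr != 0:
        return False
    if p.biomarker_gap_years is None or p.biomarker_gap_years > 2:
        return False
    return len(usable_visits(p)) >= 2
```

Under these assumptions, a participant whose only follow-up visit fell in 2021 would be excluded, because that visit is dropped before follow-ups are counted.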

Table 1 Participant characteristics at the baseline cognitive visit

Clinical and cognitive assessments

The presence of clinical dementia symptoms is established using the Clinical Dementia Rating® (CDR®), where a rating of 0 indicates the absence of symptoms (33). Participants in the current sample were all rated CDR 0 at their baseline assessment. A comprehensive cognitive battery is also administered annually, covering a wide range of cognitive domains including memory, attention, language, and processing speed (24). Our primary interest is in the memory tests, for which we compare three common measures: free recall (FR) using the picture version of the Free and Cued Selective Reminding Test with Immediate Recall (pFCSRT+IR) (34), paired associates (PA) recall from the Wechsler Memory Scale (35), and paragraph recall (PR) either from the Wechsler Memory Scale Revised (36) or from the Craft Story recall test (37). For our second analytical goal, we developed two cognitive composites consisting of the Digit Symbol Substitution Test, Trail Making B, Category Fluency for Animals, and either FR only (PACC-FR) or both FR and PR (PACC-FR-PR) as measures of episodic memory. The MMSE was not included in the PACCs because composites that include it are less sensitive in detecting Aβ-related cognitive decline in Aβ+ individuals (17).
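The text above lists the component tests of each composite but does not state how they were combined. A common convention for PACC-style composites is to average component z-scores after orienting them so that higher values mean better performance; the sketch below assumes that convention. The dictionary keys, the `composite` helper, and the sign flip for Trail Making B (a timed test on which higher raw scores indicate worse performance) are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean
from typing import Dict, Tuple

# Tests shared by both composites (hypothetical key names).
COMMON: Tuple[str, ...] = ("digit_symbol", "trails_b", "animal_fluency")


def composite(z: Dict[str, float], reverse: Tuple[str, ...] = ("trails_b",)) -> float:
    """Average z-scores, flipping the sign of tests where higher raw = worse."""
    oriented = [-score if name in reverse else score for name, score in z.items()]
    return mean(oriented)


def pacc_fr(z: Dict[str, float]) -> float:
    """Composite with free recall (FR) as the only memory measure."""
    return composite({k: z[k] for k in COMMON + ("fr",)})


def pacc_fr_pr(z: Dict[str, float]) -> float:
    """Composite with both free recall (FR) and paragraph recall (PR)."""
    return composite({k: z[k] for k in COMMON + ("fr", "pr")})
```

With equal weighting, adding PR reduces the share of FR in the memory component from all of it to half, which is one way a weakly associated test could dilute a composite's relationship with a biomarker.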

Plasma collection and processing

The plasma collection and processing protocol has been previously described (28). Briefly, blood was collected at 8 AM following an overnight fast. Aβ42 and Aβ40 were measured by C2N Diagnostics using an immunoprecipitation-mass spectrometry assay (38), p-tau217 was measured with the Lilly-developed assay at Lund University (31), and NfL was assessed with Quanterix Nf-Light assay kits at Washington University.

Statistical Analysis

To enable comparison across the different memory measures, all cognitive tests were z-scored to the mean and standard deviation of the sample at baseline. For each cognitive outcome, linear mixed effects models were constructed using the lme4 package (39) in the R statistical computing environment, version 4.3.1. Fixed effects were baseline age, education, gender, years in study (hereafter referred to as “time”), a plasma biomarker, and the biomarker by time interaction; random effects were intercepts and slopes of time across participants. FR, PA, PR and the two cognitive composites were used as outcomes in separate models. Separate models were generated including either the plasma Aβ42/40 ratio, p-tau217, or NfL. Outcomes are reported as a mean estimate with an associated 95% confidence interval. d-scores are provided as a measure of effect size and were calculated using the EMAtools (40) package in R. The d-scores of longitudinal change (i.e., the biomarker by time interaction) are shown in Table 2 for each plasma biomarker and each cognitive outcome (see Footnote 1).
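For reference, the standardization and model described above can be written out as follows. This is a sketch inferred from the verbal description, not the authors' stated specification; the symbols are introduced here for illustration, and an unstructured covariance between the random intercept and slope is assumed.

```latex
% Standardization: raw score x_{ij} of participant i at visit j, scaled by the
% sample mean \bar{x}_{0} and standard deviation s_{0} at baseline
z_{ij} = \frac{x_{ij} - \bar{x}_{0}}{s_{0}}

% Linear mixed effects model for one outcome and one plasma biomarker B_i,
% with t_{ij} = years in study
z_{ij} = \beta_0 + \beta_1\,\mathrm{Age}_i + \beta_2\,\mathrm{Education}_i
       + \beta_3\,\mathrm{Gender}_i + \beta_4\,t_{ij} + \beta_5\,B_i
       + \beta_6\,(B_i \times t_{ij}) + u_{0i} + u_{1i}\,t_{ij} + \varepsilon_{ij}

% Random effects and residual (covariance structure assumed)
(u_{0i},\, u_{1i})^{\top} \sim \mathcal{N}(\mathbf{0},\, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\, \sigma^{2})
```

In this notation, \beta_5 corresponds to the cross-sectional (baseline) association with the biomarker and \beta_6 to the biomarker by time interaction, that is, the difference in rate of change associated with the baseline biomarker level.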

Table 2 d-scores for longitudinal change in each outcome for each biomarker
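As a rough guide to how a d-type effect size can be obtained from a mixed model term, one widely used conversion takes the t statistic and degrees of freedom of a fixed effect and computes d = 2t / sqrt(df). Whether this matches the EMAtools calculation used for Table 2 is an assumption here and should be checked against the package documentation; the function name and the example values are hypothetical and are not taken from the study.

```python
import math


def d_from_t(t_value: float, df: float) -> float:
    """Convert a fixed-effect t statistic to a d-type effect size.

    Uses the common approximation d = 2t / sqrt(df) (assumed, not confirmed
    as the exact method behind the reported d-scores).
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return 2.0 * t_value / math.sqrt(df)


# Hypothetical example: t = 3.0 with 144 degrees of freedom
example_d = d_from_t(3.0, 144)
```

Because the conversion depends on the degrees of freedom as well as the t statistic, two terms with similar regression estimates can still yield different d-scores.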

Data availability policy

Data are available upon an approved request to the Knight ADRC (https://knightadrc.wustl.edu/Research/ResourceRequest.htm).

Results

Plasma Aβ42/40 results

A summary of the plasma Aβ42/40 models is presented in Figure 1 (full model output and additional figures are provided in the Supplement). As shown, none of the cognitive tests or composite scores were associated with plasma Aβ42/40 at baseline (top panel). However, the rate of change in FR was associated with baseline plasma Aβ42/40 (the ratio by time interaction) with a relatively large effect size (beta = 0.05, CI = [0.02, 0.08], p = 0.004, d = 0.59). In contrast, the rates of change in PR and PA were not associated with baseline plasma Aβ42/40. The rate of change in a global cognitive composite that includes FR as the only memory measure (PACC-FR) was more strongly associated with baseline plasma Aβ42/40 (beta = 0.03, CI = [0.01, 0.05], p = 0.007, d = 0.51) than a PACC score (PACC-FR-PR) that included both FR and PR (beta = 0.02, CI = [−0.00, 0.04], p = 0.105, d = 0.30). Thus, in this sample, the rate of change in FR alone was most strongly associated with baseline plasma Aβ42/40.

Figure 1

Results from the plasma Aβ42/40 models

Points are the regression estimate with a 95% confidence interval. Baseline = the cross-sectional difference associated with plasma Aβ42/40, Longitudinal = the decline associated with baseline amyloid (i.e., the plasma Aβ42/40 by time interaction). FR = Free recall from the Free and Cued Selective Reminding Test, PR = paragraph recall, PA = paired associates.

Plasma p-tau217 results

The plasma p-tau217 models are summarized in Figure 2 (full model output and additional figures are available in the Supplement). As shown in the top panel, all cognitive outcomes except the PACC-FR composite were associated with p-tau217 at baseline, with small to moderate effect sizes (ds: FR = −0.32, PR = −0.45, PA = −0.33, PACC-FR-PR = −0.32). Similarly, the rates of change in FR and PA (but not PR) were associated with baseline p-tau217 levels. This decline was largest for FR (beta = −0.08, CI = [−0.13, −0.04], d = −0.51) followed by PA (beta = −0.07, CI = [−0.11, −0.02], d = −0.42). Rates of change for the two cognitive composite scores were similarly associated with baseline p-tau217 (PACC-FR: beta = −0.07, CI = [−0.09, −0.05], d = −0.79; PACC-FR-PR: beta = −0.06, CI = [−0.08, −0.04], d = −0.76).

Figure 2

Results from the p-tau217 models

Points are the regression estimate with a 95% confidence interval. Baseline = the cross-sectional difference associated with p-tau217, Longitudinal = the decline associated with p-tau217 (i.e., the p-tau217 by time interaction). FR = Free recall from the Free and Cued Selective Reminding Test, PR = paragraph recall, PA = paired associates.

NfL results

A summary of the plasma NfL models is presented in Figure 3 (full model output is available in the Supplement). As with the plasma Aβ42/40 models, neither the memory tests nor the cognitive composites were associated with NfL at baseline. Rates of change for FR and PA had similar associations with baseline plasma NfL (FR: beta = −0.03, CI = [−0.07, 0.00], d = −0.37; PA: beta = −0.03, CI = [−0.07, 0.00], d = −0.40). Furthermore, rates of change for both cognitive composites were similarly associated with baseline plasma NfL (PACC-FR: beta = −0.02, CI = [−0.04, −0.00], d = −0.36; PACC-FR-PR: beta = −0.02, CI = [−0.04, −0.00], d = −0.38).

Figure 3

Results from the NfL models

Points are the regression estimate with a 95% confidence interval. Baseline = the cross-sectional difference associated with NfL, Longitudinal = the decline associated with NfL (i.e., the NfL by time interaction). FR = Free recall from the Free and Cued Selective Reminding Test, PR = paragraph recall, PA = paired associates.

Discussion

Due to their cost effectiveness, ease of collection, and concordance with other markers of AD pathology, plasma biomarkers are poised to become the primary cohort enrichment strategy for clinical trials on AD. Cross-sectional impairment and longitudinal decline in episodic memory are consistently among the most sensitive cognitive signals of AD. It is critical, therefore, to establish which specific memory measures are the most strongly correlated with plasma biomarkers. We organize the discussion of our findings around several key points.

First, at baseline, there was no association between plasma Aβ42/40 or NfL and any memory test. However, all three memory tests were associated with p-tau217, with the strongest effect appearing on paragraph recall. Plasma p-tau217 reflects both amyloid and tau pathology (30, 42, 43), and tau pathology is more strongly associated with cognitive impairment than amyloid pathology alone (44, 45). Second, there were clear dissociations among the memory measures when considering longitudinal change. There was a strong association between decline in FR on the picture version of the FCSRT+IR (pFCSRT+IR) and plasma Aβ42/40 (d = 0.59), which was not seen for paragraph recall and paired associates recall (ds = −0.25 and 0.30, respectively). Other studies have also reported that FR outperformed paragraph recall in predicting incident AD or biomarker profiles (21, 46, 47). Although the differences were more modest, decline in the pFCSRT+IR was more strongly associated with baseline p-tau217 than was decline in PA; the declines in FR and PA had similar associations with baseline plasma NfL. The poor longitudinal performance of the paragraph recall tests is possibly due to their pronounced practice effect relative to list learning tests (22, 48). The additional advantage of the pFCSRT+IR over PA is likely due in part to the fact that the pFCSRT+IR controls attention and semantic encoding during acquisition to maximize recall, whereas PA does not.

Although preclinical AD can be diagnosed in the absence of cognitive impairment, evidence of cognitive decline has become an important outcome for investigating the prognostic efficacy of plasma biomarkers. We focused on episodic memory because impairment on episodic memory tasks is the hallmark cognitive deficit of AD and occurs early in the disease course. When preclinical AD was first described, cognitive impairment was thought to occur after β-amyloid plaque deposition and neurofibrillary tau aggregation pathology (13). An important question is when in preclinical AD does amyloid accumulation become associated with cognitive impairment (49).

Evidence of FR sensitivity to early biomarker changes in preclinical AD can be gleaned from clinical and biomarker studies. The early emergence of FR impairment as a predictor of symptomatic AD has been observed in longitudinal cohort studies in the US and Europe. Recently, we have identified a subset of cognitively normal participants who have impaired FR in several cohort studies including the Knight ADRC (18.1%), HABS (15%), A4 (20%) and the BLSA (16%) (50–53). Using the assessment closest to death, 300+ cases from the clinicopathological series from the Knight ADRC were classified into Braak stages (54). FR scores were lower in cases at Braak stage III compared to Braak stages 0 and I (combined) while MMSE and CDR scores for individuals did not differ from Braak stages 0/I until Braak stage IV.

Of course, it is now standard practice to examine decline on cognitive composite scores as opposed to single tests. Nevertheless, the specific tests selected to comprise the final composite will have a critical bearing on the final results. For example, the pFCSRT+IR is one component of the PACC in the HABS cohort, which also included paragraph recall, digit symbol substitution, and the MMSE. When FR was included in the PACC, differences between +/− Aβ groups emerged earlier than when FR was not included over 3 and 5 years of follow-up (55). In the A4 cohort, the magnitude of the decrease in FR at subclinical levels of Aβ compared to normal levels was more than twice that of the other PACC components and with a larger effect size than the PACC (49). These results mirror the present study, where a cognitive composite that included only FR substantially outperformed the composite that included both FR and PR. This benefit of an FR-only composite was specifically associated with baseline plasma Aβ42/40, as both composite scores were similarly sensitive to cognitive decline that was associated with baseline plasma p-tau217 and NfL.

It was unexpected that the rate of change in FR was so strongly associated with baseline plasma Aβ42/40. This may reflect the temporal relationship between plasma Aβ42/40 and FR: Aβ42/40 may change shortly before FR starts to decline. In previous work, global and theoretically derived cognitive composites exhibited stronger associations with the interaction of age and plasma Aβ42/40 levels than empirically derived memory composites or raw scores from single memory tests including story recall, a list learning test, and a visual memory test (23). Both list learning (AVLT) and story recall (LM) exhibited nonsignificant biomarker associations individually, but when they were combined with a visual memory test that itself was significantly associated with the biomarker, the association of the composite was enhanced, unlike what we observed in the current study. Nevertheless, episodic memory declines assessed by FR occur earliest in preclinical AD, with executive functioning declining several years later (16). It is possible that non-memory aspects of standard cognitive composites are more strongly related to tau pathology, which occurs later and is not indexed by plasma Aβ42/40, as shown by the increased sensitivity to tau when executive function tests are combined with FR. A recent study found that plasma Aβ42/40 changed 5 years earlier than a measure of plasma p-tau217 (56). When tau pathology also begins to accumulate, as reflected by p-tau217, the predictive utility of a cognitive composite becomes greatly enhanced. We see this as another demonstration that the selection of specific tests to be used in composites can be critical to the measurement of their associations with biomarkers and clinical progression.

The PACC-FR-PR in the current study is similar to the Z-scores of Attention, Verbal fluency, and Episodic memory for Nondemented older adults (ZAVEN) composite because the component tests focus on memory and executive function (17). The ZAVEN comprises the DSST, FAS, story recall, and the CVLT instead of the pFCSRT+IR. Over 6 years of follow-up, cognitively normal participants in the AIBL cohort with high (SUVR >1.90) and intermediate (SUVR 1.50–1.90) levels of amyloid burden showed greater cognitive progression when measured by the ZAVEN than by other composites that either included the MMSE or did not include measures of executive function. Even at amyloid burden levels under CL40, a composite measure of executive functioning/processing speed and memory retrieval tasks provided the strongest prediction of decline in the HABS cohort, while the PACC score remained optimal at high levels of Aβ (>CL40) (57).

Despite the many strengths of this study, including a large, well-characterized cohort and many years of repeated testing, there are some limitations that should be noted. First, our sample is highly educated, and the majority of the sample self-identified as White. This may limit generalizability of the findings to the larger population, especially if there are differences across race in biomarker levels (58, 59) or cognitive test scores (60, 61) due to differences in social or environmental factors. It will be important in future analyses to consider social determinants of health and other factors that may modify the relationships shown in the present work. Additionally, we examined only linear rates of change, and it may be fruitful to also consider non-linear trajectories in subsequent analyses.

Conclusion

Plasma Aβ42/40, p-tau217, and NfL are strong predictors of FR decline in preclinical AD. Caution is recommended when combining components in cognitive composites, particularly when considering decline that may be associated with plasma Aβ42/40. Combining FR with story recall may weaken the association with plasma Aβ42/40, thereby reducing their prognostic value. Episodic memory and executive function are important domains to be assessed in preclinical AD; which tests are used in their measurement may affect the magnitude of associations with biomarkers and clinical progression. Our results suggest that FR may be an ideal test to consider when monitoring longitudinal changes in memory. While FR decline on the pFCSRT+IR may mark the start of episodic memory impairment in preclinical AD, other methods like those that include daily brief repeated memory testing may reveal impairment at an even earlier point (62, 63).