Introduction

Detecting changes in cognition in the earliest stages of Alzheimer disease (AD) in longitudinal studies requires that assessment measures meet several criteria. Assessments must be psychometrically valid and reliable and test relevant domains of cognition such as episodic memory, executive function, attentional control, processing speed, language, and orientation. For both overall scores and domain-specific scores, these measures should have good retest reliability and inter-rater reliability. Of particular importance, especially for studies that enroll large asymptomatic cohorts, is that the tests included are sensitive enough to detect subtle changes in cognition that characterize the preclinical asymptomatic stages of AD [1, 2]. Attempting to maximize sensitivity, AD cohort studies and clinical trials typically involve extensive cognitive testing sessions including dozens of individual measures. The comprehensiveness of a cognitive evaluation is often valued as a method to provide more nuanced and sensitive insights into cognitive changes commonly associated with AD pathology [3, 4]. However, this comprehensiveness comes at a cost as lengthier evaluations can be problematic for multiple reasons. First, longer assessments increase participant burden and require significant personnel time for test administration, scoring, quality control, and data entryFootnote 1. Second, day-to-day variability in stress and mood can significantly impact participants’ attention and fatigue, factors which have been shown to particularly influence older adults’ cognitive performance [5]. Finally, the tests which tap into the domains of cognition that decline in the earliest stages of AD, including episodic memory, attention, and working memory, are perceived as most taxing and effortful [6, 7] and often have suboptimal reliability [8].

Short-form cognitive assessment tools, sometimes referred to as cognitive screening measures, have been used for decades in clinical studies to provide a snapshot of overall cognitive and functional abilities and require a fraction of the time it takes to complete their long-form counterparts. The brevity and simplicity of these assessments make them popular tools among researchers and clinicians when assessing cognitive decline. The most common short-form measure in the AD literature is the Mini-Mental State Examination (MMSE) [9]. The MMSE requires approximately 10 min to administer and has high sensitivity and specificity for detecting symptomatic AD [10]. However, at milder stages of dementia, this test has exhibited poor sensitivity [11]. Specifically, due to ceiling effects, individuals with preclinical AD or early symptomatic AD (e.g., mild cognitive impairment [MCI] due to AD) are more likely to score within the “normal” range [12,13,14]. Beyond these validity and reliability issues, the MMSE is subject to copyright restrictions and carries fees for its use [15, 16], making it costly for large-scale studies and restricts research to investigators with sufficient resources.

For these reasons, the MMSE was not included in the National Alzheimer’s Coordinating Center (NACC) Uniform Data Set (UDS) in its third iteration (UDS 3) [17, 18] in lieu of the Montreal Cognitive Assessment (MoCA) [19], a short-form cognitive assessment similar to the MMSE with several advantages. The MoCA requires approximately 10–15 min to administer and tests seven domains of cognition: memory, visuospatial function, naming, attention, language, orientation, and abstraction. The most unique feature of the MoCA is that, apart from simple assessments of orientation and abstraction like those included in the MMSE, it uses items and test concepts drawn from classic long-form neuropsychological test batteries. The MoCA includes condensed versions, or a “sampling,” of more comprehensive measures including the trail making test, letter fluency, confrontation naming, digit span, and verbal list learning and recall. Like the MMSE, it is highly sensitive to dementia, but unlike the MMSE, it is more sensitive to early symptomatic AD [14, 20]; has shown sensitivity to AD biomarkers [21]; and is available at no cost for non-profit use, although its publisher has recently begun requiring training and certification fees (https://www.mocatest.org/training-certification/).

The goal of the current study was to determine how the short-form MoCA measure compares to standard long-form neuropsychological tests in terms of predicting the onset of symptomatic dementia in the NACC cohort. The NACC cohort is a large, well-characterized sample of older adults enrolled in ongoing studies of aging and dementia at ~ 30 Alzheimer's Disease Research Centers across the USA. Additional analyses were done using participants enrolled in studies at the Charles F. and Joanne Knight Alzheimer Disease Research Center (Knight ADRC) to determine the sensitivity of the short-form and long-form measures to AD neuroimaging biomarkers, including amyloid positron emission tomography (PET), tau PET, and magnetic resonance imaging (MRI) structural measures. We hypothesized that, in comparison with the MoCA, the long-form cognitive measures from the UDS 3 would show superior specificity and sensitivity in predicting conversion to symptomatic AD and would be more sensitive to AD neuroimaging biomarkers.

Study 1

Methods

Participants

The NACC UDS 3 is a standardized evaluation consisting of clinical and cognitive measures administered to participants enrolled in ongoing studies of aging and dementia at ~ 30 centers funded by the National Institute on Aging (NIA) Alzheimers Disease Research Center (ADRC) program. Written informed consent is obtained at the individual ADRCs and approved by individual Institutional Review Boards (IRBs). We included participant visits submitted to NACC by the ADRCs from March of 2015 to August 2019. This time period reflects when the MoCA and other UDS 3 measures were introduced into the ADRCs and submitted to NACC. Because we were interested in determining the utility of the MoCA as compared to the long-form UDS 3 cognitive battery in predicting disease progression from cognitive normality to onset of symptomatic disease, we included only participants who were cognitively normal at their first visit when MoCA was introduced into the ADRCs. Participants were required to have a Clinical Dementia Rating™ (CDR™) [22] of 0 at their first visit and have at least one follow-up visit.

Clinical and cognitive measures

Clinical status was determined with the CDR which uses a 5-point scale to characterize six domains of cognitive and functional performance that are applicable to AD and other dementias [22]. The domains include memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care. CDR scores are determined through semi-structured interviews with the participant and a reliable informant such as a family member or friend. A CDR score of 0 indicates cognitive normality, 0.5 = very mild dementia, 1 = mild dementia, 2 = moderate dementia, and 3 = severe dementia.

The UDS 3 cognitive battery includes measures of episodic memory (Craft Story 21, Benson Complex Figure Recall), language (the Multilingual Naming Test (MINT)), visuospatial functioning (Benson Complex Figure Copy), immediate attention (Trails A, Number Span Forward), working memory (Number Span Backwards), and executive functioning (Trails B; see Weintraub et al., 2018 for a detailed description and associated references for individual measures). Each test was standardized using the mean and standard deviation from the first visit for individuals who remained CDR 0 (i.e., non-converters) to form Z-scores. Z-scores from each test were then averaged together to form domain scores (i.e., memory, visual, attention, language) which were then averaged together to form a simple global composite score. Because the MoCA total score is commonly used in the diagnosis of cognitive impairment [23], our analyses focused on comparing the MoCA total score with the UDS 3 global composite score.

Statistical analyses

Time to symptomatic conversion was operationalized as the time, in years, from participants’ initial study time point to the time point in which they were first determined to have a non-zero CDR score (or, in the case of individuals who never converted to a non-zero CDR, their final time point in the study). Individuals who died during follow-up were not included in the analyses (i.e., 148 non-converters and 31 converters). Two cut points, one at − 1 SD and one at − 1.5 SDs, were used for both the MoCA (scores of 23 and 21, respectively) and the UDS 3 global composite score. Similar cut points have been used in the literature to indicate the presence or absence of cognitive impairment [24,25,26,27,28]. Log-rank tests were used to compare the Kaplan-Meier survival curves for high and low groups for each measure (i.e., the MoCA and the UDS 3 global composite). Adjusted hazard ratios (HRs) and their 95% confidence intervals were estimated using a Cox proportional hazards model with covariates including baseline age, self-reported gender, years of education, and presence of one or more apolipoprotein ε4 (APOE4) alleles with the survival package version 3.2 in the R statistical computing environment [29].

To obtain measures of the two tests’ diagnostic discrimination abilities [30], the scales were subjected to an area under the curve (AUC) analysis of receiver operating characteristics (ROC) curves using the pROC package version 1.17 in R [30]. This analysis focused on examining which score provided better diagnostic accuracy in predicting conversion status using baseline data from individuals who were CDR 0 at their initial visit. The ROC analysis results were interpreted following the diagnostic accuracy guidelines from Swets (1996) such that an AUC < 0.70 indicates low diagnostic accuracy, an AUC in the range of 0.70–0.90 indicates moderate diagnostic accuracy, and an AUC ≥ 0.90 indicates high diagnostic accuracy [30]. DeLong’s test for correlated ROC curves was used to test whether the two areas under the curve were significantly different from one another.

Finally, multiple regression model comparison analyses were used to directly compare the efficacy of the baseline, continuous MoCA, and UDS 3 global composite scores in predicting the time to symptomatic conversion. Because the MoCA and UDS 3 global composite were correlated with one another, r = 0.70, p < 0.001, a single multiple regression model including both variables would be subject to potential multicollinearity issues. Therefore, non-nested models were used, and model fit was evaluated based on the R2, Akaike information criterion (AIC), Bayesian information criterion (BIC), and deviance values. For all models, males served as the reference group for the gender variable, and APOE negative served as the reference groups for the APOE variable.

Results

Demographics

Six thousand two hundred thirty cognitively normal older adults aged 72.9 ± 10.4 years from the NACC cohort were followed for an average of 2.3 ± 1.2 annual assessments (range = 1–5 assessments). As expected, converters had more assessments than non-converters, t(746.92) = 12.90, p < 0.001, d = 0.55. Participants that enroll as cognitively normal that are observed for longer periods are more likely to convert due to advancing age (see Table 1 for additional demographic information).

Table 1 Study 1 demographic data

Survival model results

Survival analyses were conducted using outcomes from the short-form MoCA and the long-form UDS 3 global composite to predict the time to symptomatic onset (defined as a change from CDR 0 to CDR > 0). Percentages of high- and low-scoring converters and non-converters for each cut point are presented in Table 2. Outcomes were contrasted on HR magnitude and their confidence intervals. For reference, higher HRs are interpreted as representing a better prediction of future conversion and narrower confidence intervals suggest less variability in prediction. Primary analyses contrasted individuals who scored above vs. below the − 1 SD and − 1.5 SDs cut points on the MoCA and UDS 3 global composite. HRs for symptomatic conversion by the cut point criteria described above, adjusted for demographic covariates, are presented in Table 3. For all models, the adjusted cumulative incidence of time to symptomatic conversion was increased in the individuals who scored below the cut point after controlling for demographic covariates as compared to individuals who scored above the cut point, HRs > 1.57, ps < 0.001. Comparison of HRs indicated that the UDS 3 global composite was slightly better at predicting symptomatic conversion than the MoCA with the − 1 SD cut point. However, the opposite was true (MoCA HR > UDS 3 global composite HR) with the − 1.5 SD cut point (see Table 3). Overall, these results ultimately suggest that both the MoCA and the UDS 3 global composite show relatively equivalent HRs and are thus similarly, yet modestly, efficacious in predicting symptomatic conversion. Kaplan-Meier curves for the MoCA and UDS 3 global composite at the two different cut points are presented in Fig. 1.

Table 2 Study 1 cut point Ns
Table 3 HRs for time to symptomatic conversion by task score adjusted for covariates
Fig. 1
figure 1

Survival curves for the MoCA and UDS 3 global composite for the two different cut points (− 1 SD and − 1.5 SDs)

ROC results

ROC analyses in the NACC sample indicated an area under the curve (AUC) estimate for the MoCA of 0.64 and an AUC estimate for the UDS 3 global composite of 0.66. DeLong’s test for two correlated ROC curves was non-significant, Z = 1.48, p = 0.13, indicating that these measures demonstrate similarly low diagnostic accuracy in predicting conversion to symptomatic AD [31].

Model comparison results

The multiple regression models predicting the time to symptomatic conversion indicated that both the continuous MoCA and UDS 3 global composite scores significantly predicted time to symptomatic conversion after covarying out effects of age, self-reported gender, years of education, and APOE4 status. The results of the model comparison analyses suggested that the model including the UDS 3 global composite score outperformed the model including the MoCA score in predicting time to symptomatic conversion. This was indicated by the larger R2 and smaller AIC, BIC, and deviance values (see Table 4).

Table 4 Study 1 model comparisons

Study 2

Methods

Participants

Participants were community-residing volunteers enrolled in longitudinal studies of memory and aging at the Knight ADRC at Washington University in St. Louis. We were interested in determining the associations between the MoCA and long-form measures with AD neuroimaging biomarkers, so we selected participants based on the availability of at least 1 amyloid PET scan, 1 tau PET scan, and a volumetric MRI that were collected within 1 year of each other. We also sought to determine the sensitivity of the MoCA and long-form measures in tracking disease progression from the asymptomatic to symptomatic stages; thus, data were analyzed in two groups: participants who were CDR 0 and participants who were CDR > 0. All participants signed a standard informed consent document, and all procedures were approved by the Institutional Review Board at Washington University in St. Louis.

Clinical and cognitive measures

Measures included the CDR (as described in study 1), MoCA, and UDS 3 (as described in study 1). The UDS 3 global composite was calculated in the same way as described in study 1.

Neuroimaging

Amyloid PET imaging was performed with either florbetapir (18F-AV-45) or Pittsburgh Compound B (PiB) and was acquired on a Biograph mMR (Siemens Medical Solutions, Malvern, PA). All florbetapir PET scans include data 50- to 70-min post-injection, and all PiB PET scans include data 30- to 60-min post-injection. All data were converted to standardized uptake value ratios (SUVRs) with the cerebellar cortex used as a reference region before then being converted to the Centiloid scale [32, 33]. PET data were processed with an in-house pipeline using regions of interest derived from FreeSurfer (https://github.com/ysu001/PUP) [34]. This approach corrects for the spillover signal from adjacent regions of interest and non-brain tissue on the basis of the scanner point spread function and the relative distance between regions. This partial volume correction approach accounts not only for spillover from different areas in the brain but also for spillover from the non-brain regions into the brain. Amyloid deposition was quantified with the average across the left and right lateral orbitofrontal, medial orbitofrontal, rostral middle frontal, superior frontal, superior temporal, middle temporal, and precuneus regions. Tau PET imaging used the tracer 18F-AV-1451 (flortaucipir) and was acquired on a Biograph 40 PET/CT scanner (Siemens Medical Solutions). Data from the 80- to 100-min post-injection window were converted to SUVRs using a cerebellar cortex reference and partial volume corrected. Deposition was summarized with the average of the bilateral entorhinal cortex, amygdala, inferior temporal lobe, and lateral occipital cortex [35]. MRI data were acquired on a Siemens Biograph mMR or Trio 3T scanner. T1-weighted images were acquired with a magnetization-prepared rapid acquisition gradient echo sequence acquired in the sagittal orientation with a repetition time of 2300 ms, an echo time of 2.95 ms, a flip angle of 9°, 176 slices, an in-plane resolution of 240 × 256, and a slice thickness of 1.2 mm. Images underwent volumetric segmentation with FreeSurfer 5.3 (freesurfer.net) to identify the regions of interest for further analysis [36, 37]. Cortical thickness values were obtained for each hemisphere for a limited number of regions of interest (ROIs) reflecting the brain atrophy patterns in AD [38]. Cortical thickness was calculated as the shortest distance between the cortical gray/white boundary to the gray/CSF boundary [39].

Statistical analyses

The relationships between the AD neuroimaging biomarkers (at baseline) and the MoCA/UDS3 global composite in CDR 0 and CDR 0.5 individuals were examined using multiple regression models in the R statistical computing environment. As in study 1, due to the high degree of correlation between the MoCA and UDS 3 global composite scores, non-nested model comparison metrics (i.e., R2, AIC, BIC, and deviance) were used in lieu of a single, nested multiple regression model to avoid potential confounding multicollinearity issues. Separate models were run predicting each biomarker and separately for each subset of the data (i.e., CDR 0 s, CDR 0.5 s) using demographic predictor variables as the covariates and either the MoCA or the UDS 3 global composite as the key independent variable. For all models, males served as the reference group for the gender variable, and APOE negative served as the reference group for the APOE variable.

Results

Demographics

Participants included 416 mostly older adults aged 45 to 92 years old enrolled in ongoing studies at the Knight ADRC (see Table 5 for the complete demographic information) and who had completed at least one relevant biomarker scan and a clinical assessment. Most participants (n = 365) were cognitively normal with a CDR of 0.

Table 5 Study 2 demographic data

Regression results

As shown in Table 6, analyses of CDR 0 participants indicated small but significant relationships between both short- and long-form scores (i.e., the MoCA and UDS 3 global composite) and tau PET and cortical thickness (ps 0.007–0.03). Model comparison results indicated that the MoCA and UDS 3 global composite were relatively similarly sensitive in predicting biomarker values as indicated by the R2, AIC, BIC, and deviance values (see Table 6). As expected, in the CDR 0.5 subsample, there were stronger relationships between the MoCA and UDS 3 global composite and all of the neuroimaging biomarkers, including amyloid PET, as compared to the CDR 0 sample (see Table 7). Model comparison analyses indicated that the MoCA was a better predictor of biomarker levels as compared to the UDS 3 global composite in the CDR 0.5. Scatterplots showing the relationships between the MoCA and UDS 3 global composite score and the three neuroimaging biomarkers, with separate regression lines for CDR 0 and 0.5 s, are displayed in Fig. 2. As shown in the scatterplots, and as made evident by the regression models, both the MoCA and UDS 3 global composite were essentially equivalent in their associations with the neuroimaging AD biomarkers.

Table 6 Biomarker multiple regression models for CDR 0 s
Table 7 Biomarker multiple regression models for CDR 0.5 s
Fig. 2
figure 2

Biomarker correlations with MoCA/UDS 3 global composite by CDR group

General discussion

The present study compared the efficacy of a popular short-form cognitive measure (MoCA) against a global composite score created from the more comprehensive long-form UDS3 cognitive battery in predicting AD progression. AD progression was operationalized as the time to the onset of initial symptoms. Additionally, we compared the short- and long-form measures on their sensitivity to AD neuroimaging biomarkers. Using data from the NACC, we compared the survival analyses, ROC curves, and multiple regression models using the MoCA total score and a global cognitive composite to investigate whether one measure outperformed the other. In a sample of data from the Knight ADRC, which included several neuroimaging biomarkers in addition to both the MoCA and UDS 3 cognitive battery, we further investigated the predictive power of each score in predicting biomarker levels in participants who were cognitively normal (CDR 0) and in those with very mild dementia (CDR 0.5).

Our first hypothesis, that the long-form measure would outperform the MoCA in predicting disease progression, was supported, although both measures produced only modest associations. We compared commonly used standard deviation (SD) cut points, at − 1.0 SD and − 1.5 SD, for both measures. Similar cut points have been used in the literature to indicate the presence or absence of cognitive impairment [24,25,26,27,28]. HRs from the survival analyses indicated that both the MoCA and UDS 3 global composite were relatively equivalent in their sensitivity to symptomatic conversion, though ROC analyses indicated that neither reached the threshold for acceptable diagnostic accuracy. Model comparison analyses in the NACC sample indicated that the UDS 3 global composite outperformed the MoCA by only a slim margin.

It appears that longer, more comprehensive cognitive assessments may afford a marginal benefit over shorter cognitive screening measures like the MoCA when used to predict the time to symptomatic disease onset. In contrast, when stricter cut points were used (i.e., − 1.5 SDs), the MoCA outperformed the long-form composite score. These findings should be interpreted with caution, however, because there are far fewer cases at this low-performance threshold for the long-form global composite than the MoCA (see Table 2 and confidence interval magnitude in Fig. 1), which was expected given that the sample was selected based on a baseline CDR of 0. These data also corroborate existing studies which find that cutoff scores, which purportedly identify individuals at the greatest risk for cognitive decline and symptom onset [2], have unacceptably poor predictive power, even when coupled with AD biomarker information [40].

We also hypothesized that the long-form global composite would prove superior to the MoCA in associations with AD neuroimaging biomarkers including amyloid PET, tau PET, and an AD-specific ROI cortical thickness measure. However, analyses in the Knight ADRC sample did not support this hypothesis. There were only minor associations between the MoCA and long-form global composite measures and tau PET SUVR and cortical thickness in CDR 0 individuals. As expected, and consistent with prior work in this cohort [41, 42], more robust relationships were seen in CDR 0.5 s and significant effects extended to all three imaging markers, including amyloid PET. Although it may be somewhat surprising that amyloid PET was not significantly associated with either cognitive measure in the CDR 0 participants, this has been observed in previous studies. Specifically, relationships with amyloid and cognition are relatively inconsistent [43,44,45], and accumulating evidence suggests that changes in cognition in the preclinical stages of AD are predominantly driven by tau pathology and structural changes, rather than by amyloid alone [41, 46, 47].

Limitations

The findings of this study should be considered in light of a number of limitations, which may be addressed in future studies. First, although it would be optimal to directly compare the performance of the short-form and long-form measures to one another in a single multiple regression, the high degree of correlation between the measures would result in potential multicollinearity issues if included in a single model together. Thus, our analyses were restricted to non-nested model comparison techniques. Second, we only compare baseline performance of the short-form and long-form measures here; it is possible that the long-form battery is more sensitive to decline than the short-form MoCA when assessing longitudinal change. Third, only a limited number of demographic covariates were included in the models presented. It is, of course, possible that the results may differ if other covariates were included. Finally, the participants included in the NACC cohort and the Knight ADRC cohort are composed of mostly highly motivated older adults who are comprehensively phenotyped and often engaged in imaging and fluid biomarker studies and therefore are not representative of the general population. Although a primary goal of the NIA ADRC programs and the Knight ADRC is to diversify enrollment to include underrepresented persons in aging and dementia research, the participants in these studies are primarily White and highly educated (see Tables 1 and 5).

Conclusions

Ultimately, the MoCA and UDS 3 global composite exhibit relatively equivalent, but limited, sensitivity to symptomatic conversion. Thus, depending on the use case, choosing the MoCA over longer cognitive batteries may afford sufficient sensitivity while also reducing administration time, costs, and participant burden. On the other hand, longer and more comprehensive measures do provide some unique power in predicting disease progression, depending on the cutoff point used. A shortcoming of both approaches is that neither demonstrated acceptable classification accuracy nor robust biomarker relationships in cognitively normal participants. In order for researchers and clinicians to reliably detect AD pathology in preclinical individuals, novel assessment methodologies with increased sensitivity and reliability (such as ecological momentary assessment studies or paradigms which move away from the traditional “one-shot” approach [48]) are necessary.