Introduction

Alzheimer’s disease (AD) is the fifth leading cause of death for older adults and affects one out of every nine older adults in the USA (Alz.org, 2021). Older adults who at one time have normal levels of cognition but who later receive an AD diagnosis are said to be in the preclinical stage of AD. In this preclinical stage, brain pathology develops, including amyloid plaques and fibrillar tau (Jack et al., 2018). Tools to aid early detection of this preclinical stage would allow individuals to engage in lifestyle changes such as cognitive stimulation, mindfulness, and exercise (Deckers et al., 2015) and pharmaceutical treatments (Sperling et al., 2014) to lengthen their healthspan – the time during which they are cognitively healthy (Rowe & Kahn, 1987). While assessing in vivo markers of AD pathology can identify older adults at high risk of eventually being diagnosed with AD, current procedures to do so are expensive or invasive, making such detection inaccessible to much of the population such as working-class individuals or those who do not live close to a hospital. This barrier creates a clear need for more affordable techniques that identify AD risk, agree with measures that detect AD pathology, and are associated with the progression of AD symptoms.

Much research is now focused on sophisticated neuropsychological tests that can identify subtle and early objective decline in cognition (but still in the normative range of cognition relative to the population) that can predict future cognitive impairment or conversion to later AD stages (e.g., Mortamais et al., 2017; Papp et al., 2019; Thomas et al., 2018). While early research used neuropsychological tests to measure single cognitive domains like executive function or episodic memory (e.g., Glisky et al., 1995), more recent research suggests that composite scores that measure multiple cognitive domains might be more sensitive. Such multi-domain composites like the Preclinical Alzheimer’s Cognitive Composite (PACC) have been rising in popularity to serve this purpose (Donohue et al., 2014; Lim et al., 2016; Mormino et al., 2017; Papp et al., 2017). Along these lines, proposals have been made to suggest that a common factor across many cognitive domains differs between cognitively normal older adults and those with AD, with single domains like episodic memory offering only a small proportion of unique effect in AD (e.g., Salthouse & Becker, 1998).

However, the level of cognitive performance estimated through both single-domain tests (e.g., episodic memory) and multi-domain composites (e.g., fluid abilities) often differs across diverse populations, potentially influencing the interpretation of the scores. For example, some Black older adults have lower cognitive performance and a higher incidence of an AD diagnosis than non-Hispanic White Americans, but at the same time also have less cognitive decline over time (e.g., Weuve et al., 2018). Researchers have largely attributed this contradictory pattern to some Black older adults being closer to the threshold of cognitive impairment such that slight declines in cognition can increase the likelihood of an AD diagnosis (Weuve et al., 2018).

An alternative explanation is that cognitive tests favor educated, non-Hispanic White males who originally created many of the neuropsychological tests that we use today (Helms, 1992; Jones, 2003). Some studies have questioned the face validity of neuropsychological tests when used in diverse groups because of a potential for a misdiagnosis of AD, although we note not all studies have found that traditional neuropsychological tests differ in minority groups from non-Hispanic Whites (Barnes et al., 2016). One recommendation to account for such measurement differences is to use a “personally relevant” cognitive marker that calibrates test scores (Rentz & Weintraub, 2000; Weintraub et al., 2018). The present study aimed to explore how such a cognitive marker, dubbed a crystallized-fluid ability discrepancy, differs across traditionally marginalized and under-represented groups: ethnoracial minorities, women, and those with lower levels of education.

Diverse characteristics influencing an Alzheimer’s disease (AD) diagnosis

Black and Hispanic Americans are twice as likely to be diagnosed with AD and related dementias as non-Hispanic White Americans (Matthews et al., 2018; Perkins et al., 1997; Steenland et al., 2016; Tang et al., 2001). Traditional cognitive screening tests (e.g., Mini-mental State Examination (MMSE) or Montreal Cognitive Assessment (MoCA)) have well-accepted limitations due to cultural and linguistic elements (e.g., error in correctly identifying “rhinoceros” in a naming subscale). In fact, some of the increase in AD diagnoses might be exacerbated by inappropriate cut-offs used during the screening process for marginalized groups (e.g., Goldstein et al., 2014) or biases during structured interviews (Kiselica, in press). Increasing evidence suggests that ethnoracial differences in cognitive test scores can measure disparate socioeconomic characteristics and arise from differences in family income, education, learning materials, and safe physical environment, to name a few (Cottrell et al., 2015). Higher socioeconomic status (SES) – often measured by education level – has been an indicator of resources that allow one to engage in cognitively rich activities that protect against cognitive and functional decline, thus delaying a diagnosis of AD (e.g., Bennett et al., 2003; Roe et al., 2007). This notion has been formalized into the concept of cognitive reserve (Stern, 2012). Thus, minoritized groups with lower SES may have lower cognitive reserve.

Lower SES has been associated with increased stress, which both directly and indirectly leads to deterioration of brain pathways involved in attention and memory (Letang et al., 2021; Nogueira et al., 2016; Zimmerman et al., 2016). Lower SES is also associated with poorer knowledge of beneficial health behaviors (Cubbin & Winkleby, 2005), access to healthcare resources (Adler et al., 1994; Braveman et al., 2010; Kennedy et al., 1998), lower levels of green space, and increased prevalence of fast-food chains (Larson et al., 2009; Powell et al., 2007). All of these factors are sustained and exacerbated by structural institutions that maintain a hierarchy of benefits for upper class non-Hispanic White males at the top and everyone else below, thereby decreasing health generally as well as cognition for under-represented groups (e.g., Barnett et al., 2012; Pohl et al., 2021). Related research has shown that everyday discrimination, which is often higher in minoritized groups, is also related to poorer cognition, further increasing cognitive disparities (e.g., Barnes et al., 2012; Ozier et al., 2019; Sutin et al., 2015; Zahodne et al., 2019). For these reasons, among others, socioeconomic conditions outside of one’s control have been proposed as a fundamental cause of many diseases (Link & Phelan, 1995). According to this idea, SES can influence disease processes through a variety of pathways, given its link to such a wide array of financial, social, and cognitive resources.

Late-life cognition and risk for AD differ not only across ethnoracial categories and SES but also by biological sex (for review, see Buckley et al., 2018). Declines in sex hormones in later life have a direct link to AD risk in men and women. AD pathogenesis is believed to be regulated by estrogen and progesterone in women and primarily by androgens in men (Vest & Pike, 2013). The sharp reduction in sex hormones for women after menopause might then diminish the neuroprotection that these hormones have in reducing AD pathology, potentially explaining why women are 1.5 times more likely to be diagnosed with AD than men (Andersen et al., 1999; Matthews et al., 2018). Women diagnosed with AD sometimes have more neuropathology, faster rates of brain atrophy, and faster rates of cognitive decline than men (Ardekani et al., 2016; Buckley et al., 2018; Cavedo et al., 2018; Duarte-Guterman et al., 2021; Hohman et al., 2018).

Intersectionality of diverse characteristics

We have briefly summarized how diversity among older adults might lead to differential risk of late-life cognitive decline and risk for AD. Little research has focused on cognitive aging from an intersectional perspective (McDonough et al., 2021). In one example, over 5,000 non-Hispanic White, Black, and Hispanic men and women were tested on their memory and visuospatial abilities (Avila et al., 2019). Cognitive decline was greater for Black women compared to Hispanic men and non-Hispanic women, after adjusting for age and education. The Weathering Hypothesis proposes that socially disadvantaged individuals, whose disadvantage often stems from systemic discrimination, experience accelerated aging due to chronic and recurring stressors (Geronimus et al., 2006, 2015). Intersectional approaches related to cognitive decline and AD risk are important in the field of cognitive aging given that cumulative disadvantages (e.g., social factors, ethnoracial category, sex, and SES) may help explain how multiple disparities contribute to cognitive decline or “cognitive weathering” across age groups. Indeed, recent research has shown that ethnoracial disparities in a variety of cognitive domains are related to chronic stress and SES even in young adults (Letang et al., 2021).

An argument for a crystallized-fluid ability discrepancy score

The intersectionality of diverse characteristics can complicate the interpretation of low cognitive performance measures as a marker for risk for AD, especially if the measures do not account for educational and cultural biases (especially for non-Western, Educated, Industrialized, Rich, and Democratic (non-WEIRD) populations). At the same time, it may not be realistic to assume any task can be completely unbiased. Thus, a different strategy is to re-calibrate existing cognitive measures so that they can be more person-specific (Weintraub et al., 2018). Such a measure might account for an individual’s education level, linguistic ability, and lifelong intellectual ability within a cross-sectional sample to infer longitudinal declines in cognition (Deary et al., 2013). In contrast to this idea, most studies investigating cross-sectional differences in cognition confound individual differences in declining cognition with adult level or lifelong intellectual functioning that are affected by education and linguistic ability (Deary et al., 2013). Specifically, an older adult with current low levels of cognition could be at this low level because (a) they declined from previously higher levels, (b) they have always had a low but stable level of cognition performing at their own capacity (i.e., non-artificial low level), or (c) they have an artificially low level of performance because of education, linguistic, cultural, or acculturation differences due to the testing process.

A crystallized-fluid ability discrepancy score might address these issues. This ability discrepancy score is calculated by subtracting one’s fluid ability from one’s crystallized ability (Dierckx et al., 2008; Lezak, 1995; McCarthy et al., 2005), both of which derive from validated cognitive assessments with good reliability. Fluid abilities refer to cognitive domains that require online processing and do not require previous knowledge, including processing speed, executive function, episodic memory, and reasoning (Johnson et al., 2004; Lezak, 1995; Wechsler, 1944). Conversely, a crystallized ability describes knowledge that has been gained through learning or experience and has often been measured through language tasks (Blair & Spreen, 1989; Ekstrom et al., 1976; Wechsler, 1944; Zachary & Shipley, 1986). These two categories of ability often are highly correlated with one another throughout one’s adult lifespan (Cattell, 1971; Deary et al., 2013; Kaufman & Horn, 1996); however, a discrepancy, or asymmetry, begins to emerge in older adults diagnosed with AD and related dementias (O’Carroll & Gilleard, 1986; Wechsler, 1944). Specifically, fluid abilities begin to decline early, whereas some crystallized abilities are relatively spared until later stages.

Extending these ideas to people at risk for AD, McDonough et al. (2016) found that larger ability discrepancy scores were associated with in vivo estimates of amyloid plaques and neurodegeneration – two biomarkers of AD. McDonough and Popp (2020) replicated these findings in a larger, independent sample and further showed that an ability discrepancy better predicted AD biomarkers than a composite episodic memory score and single-domain discrepancy scores (e.g., crystallized-episodic memory). Such consistent associations with AD biomarkers suggest the potential for an ability discrepancy score to aid in the identification of high-risk subgroups in the cognitively normal range. Specifically, we argue that both fluid and crystallized abilities often depend on language (e.g., verbal materials and instructions) and accounting for crystallized abilities helps adjust for levels of fluid ability that might be artificially lower than expected as previously outlined. While promising, previous tests validating this measure have used majority non-Hispanic White samples with relatively high SES. Thus, it may not be a sufficient predictor of risk for AD or sensitive enough to detect intra-individual variability among the diverse aging population. Instead, it might be useful for detecting broad differences in cognitive decline and ranking individuals by their performance (e.g., large vs. small cognitive discrepancy).

Current study

The present study aimed to build upon the previous research by addressing the following questions. First, does an ability discrepancy reduce pre-existing differences in cognitive assessments often found in groups at risk for AD or other related dementias such as ethnoracial category, SES, sex, and their intersection? Second, is an ability discrepancy measure equally predictive of AD symptom severity across varying levels of diverse characteristics? Lastly, does an ability discrepancy predict AD symptom severity even when controlling for a composite score of episodic memory or a fluid composite score?

Method

Study details

Since 2005, the National Alzheimer’s Coordinating Center (NACC) has collected the Uniform Data Set (UDS) on participants from approximately 30 past and present US Alzheimer’s Disease Research Centers (ADRC). All participants entering the ADRCs follow standard protocols by a trained clinician according to NACC guidelines. The data used in these analyses spanned all three versions of the UDS from September 2005 to December 2016 and included data from 35 different ADRCs.

Participants

Participants in the final sample were included if they had all crystallized measures and at least four of the five fluid measures of interest, years of education to serve as a proxy for SES, self-reported ethnoracial category, self-reported sex, and a Clinical Dementia Rating (CDR)® dementia staging instrument (Morris, 1993). The CDR is a semi-structured interview given to both participants and informants to assess cognitive functioning and daily functions (memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care). The global CDR score groups people into five categories of dementia staging (0, no impairment; 0.5, questionable impairment; 1, mild impairment; 2, moderate impairment; and 3, severe impairment). The sum of boxes score (CDR-SOB) is obtained by summing the six domain box scores, which yields a symptom severity score ranging from 0 to 18.
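As a minimal sketch of the sum of boxes calculation described above, the function below adds the six domain box scores into a single severity score. The domain keys and the function name are hypothetical labels chosen for illustration, and the simple 0 to 3 range check is an assumption rather than the NACC scoring procedure.

```python
# Hypothetical keys for the six CDR domains listed in the text.
CDR_DOMAINS = (
    "memory",
    "orientation",
    "judgment_problem_solving",
    "community_affairs",
    "home_hobbies",
    "personal_care",
)


def cdr_sum_of_boxes(box_scores):
    """Sum the six domain box scores into a 0-18 symptom severity score."""
    missing = [d for d in CDR_DOMAINS if d not in box_scores]
    if missing:
        raise ValueError(f"missing domain score(s): {missing}")
    for domain in CDR_DOMAINS:
        # Assumption: each box score lies between 0 and 3.
        if not 0 <= box_scores[domain] <= 3:
            raise ValueError(f"{domain} score out of range: {box_scores[domain]}")
    return sum(box_scores[d] for d in CDR_DOMAINS)
```

With every domain scored 3 the function returns the maximum of 18, and with every domain scored 0 it returns 0, which matches the range given in the text.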

Exclusion criteria were a primary language self-reported as anything other than English and an “other” ethnoracial category. Participants differed in their reasons for coming to the ADRC (e.g., clinical evaluation vs. research participation), which might lead to different profiles of cognition. To reduce this variability, we also restricted participants to those whose primary reason for coming was research participation.

The final cross-sectional sample consisted of 14,257 individuals aged 60–104 years. Table S1 (see Online Supplemental Material (OSM)) includes the demographic data for cognitively normal older adults (CDR global score = 0) and those classified as on the AD spectrum (CDR global score > 0). We note that while the percentage of participants representing ethnoracial minorities and lower education backgrounds is small (which is often found even in nationally representative datasets; Zahodne et al., 2017), the large sample contains a sufficient number of participants from these backgrounds to test for differences between these categories.

Neuropsychological testing

Cognitive measures were chosen a priori from available fluid and crystallized measures based on McDonough and Popp (2020) to form the ability discrepancy scores. These measures included logical memory (immediate and delayed) that measures recall of details in two stories (Wechsler, 1997), category fluency that measures the number of animals and vegetables that can be named within 1 min (Benton, 1983), Trail Making Test that measures the speed at which letters and numbers can be sequentially connected via a pencil (Reitan, 1979), WAIS-R Digit Symbol that measures the speed of matching a symbol to a digit using a key (Wechsler, 1997), and the 30-item Boston Naming Test that measures the number of named picture objects of varying difficulty level (Kaplan et al., 1983).

For each participant, an ability discrepancy score was calculated by standardizing each measure within the cognitively normal group (CDR = 0). In the whole sample, individual task measures were sometimes missing. Because these scores do not rely on any single task or cognitive domain (cf. McDonough & Popp, 2020; McDonough et al., 2016), all measures that were available were averaged together to form fluid and crystallized ability scores. Crystallized ability was composed of the language tasks (i.e., category fluency and Boston Naming Test) and fluid ability was composed of the other measures. While the tasks used to form this crystallized ability composite are sometimes considered fluid ability scores, they were most representative of language abilities and have been validated to be used as appropriate calibration tasks for the other types of fluid abilities used here (McDonough & Popp, 2020). Notably, these language tasks load together with more traditional tasks of crystallized ability like word reading (McDonough & Popp, 2020). Moreover, this study showed that including these language tasks into a crystallized ability score to form an ability discrepancy score resulted in greater sensitivity to beta-amyloid accumulation and cortical thinning in AD signature regions than the inclusion of word reading alone.

Discrepancy scores for each subject were calculated by subtracting each fluid composite score from the crystallized composite score. Greater scores represented a larger discrepancy in ability, representing a precipitous decline in cognitive functioning (e.g., Kaufman & Horn, 1996; Matarazzo & Herman, 1985; Schretlen et al., 1994). Lower scores, on the other hand, represented more successful cognitive aging. A separate memory composite score also was calculated by averaging the immediate and delayed logical memory scores to be used as a comparison.
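The scoring steps described above can be sketched roughly as follows: standardize each measure against the cognitively normal (CDR = 0) group, average whichever standardized measures are available into crystallized and fluid composites, and subtract fluid from crystallized. All function and measure names here are hypothetical. The sketch also assumes that every measure is already oriented so that higher values mean better performance (a timed measure would need reversing first) and that the sample standard deviation is used; the text does not specify either detail.

```python
from statistics import mean, stdev


def reference_params(reference_rows, measures):
    """Mean and SD of each measure in the reference (CDR = 0) group."""
    params = {}
    for m in measures:
        values = [row[m] for row in reference_rows if row.get(m) is not None]
        params[m] = (mean(values), stdev(values))  # assumption: sample SD
    return params


def composite(row, measures, params):
    """Average the z-scores of whichever measures are available for this person."""
    z_scores = [
        (row[m] - params[m][0]) / params[m][1]
        for m in measures
        if row.get(m) is not None
    ]
    if not z_scores:
        raise ValueError("no available measures for this composite")
    return mean(z_scores)


def ability_discrepancy(row, crystallized, fluid, params):
    """Crystallized composite minus fluid composite; larger means a bigger gap."""
    return composite(row, crystallized, params) - composite(row, fluid, params)
```

Averaging only the available z-scores mirrors the statement that the composites do not rely on any single task, so a missing fluid measure does not drop the participant.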

Statistical analyses

In the first analysis, we provide a foundation for the assumption that an ability discrepancy score might minimize existing disparities in cognition between sex, ethnoracial category, and education levels. We also tested the assumption that age cohort does not strongly alter these effects by stratifying the analyses by age decade. For these analyses, we conducted linear mixed models to test the main effects of each demographic factor (between subjects) and composite measure (ability discrepancy, fluid composite, memory composite; within subjects) on performance. Explicitly testing the different composite measures provides evidence as to whether group disparities in commonly used cognitive composites (fluid and memory) are larger than any disparities in an ability discrepancy score. Additional Bayesian tests were conducted to provide evidence that groups differed in each of the composite measures.

Next, hierarchical linear regression models were conducted to investigate the impact of sex, education, Hispanic ethnicity, and race (Black, Asian) on level of ability discrepancy, and whether these measures of diversity might moderate the predictability of ability discrepancy level on dementia severity (CDR-SOB). We included all higher-order interactions of our diversity measures to assess potential differences due to intersectionality of these characteristics. Hierarchical models were used to build each regression model in sequential steps while also allowing us to see whether any missing values in the covariates led to selection bias effects. To the extent that the predictors of interest remained significant in the fully adjusted models compared with the unadjusted models, we would conclude that the results would be robust to both the included covariates and the slight change in sample characteristics due to listwise removal of participants with missing data. The first step of each regression model included only the key predictors of interest and the outcome variable. The second step included interactions between each of the key predictors (if applicable). Critically, including interactions of diversity characteristics serves as our method of investigating intersectionality. Although intersectionality is sometimes assessed by creating separate subgroups, this method provides a systematic manner to assess multiple combinations of intersecting identities through different orders of interactions. The last step included all of the available covariates that might impact the interpretation of sex, education, or ethnoracial category effects. We set the alpha level to 0.005 in accordance with recent recommendations (Benjamin et al., 2018).

The first hierarchical model tested the extent that level of ability discrepancy differed with sex, education, and ethnoracial category. In this model, each of the measures of diversity and their interactions were entered as the independent variable and ability discrepancy was entered as the outcome variable. Covariates were chosen based on their previous associations with cognitive status such as risk for comorbidity, risk for cardiovascular disease, risk for cognitive impairment other than dementia, and functional status. The specific covariates were dummy coded when appropriate and included: age; year visiting the center; permanent move to nursing home; average number of packs smoked per day; smoked cigarettes in past 30 days; total years smoked cigarettes; stroke (remote, recent, absent); Parkinson’s disease present; active depression in the last 2 years; marital status (married, widowed, divorced, separated, never married, living as married); type of residence (private residence, retirement/independent living community, assisted living); level of independence (independent, assistance with complex activities, assistance with basic activities, completely dependent); hypertension (remote, recent, absent); alcohol abuse (remote, recent, absent); and diabetes (remote, recent, absent).

The second and third hierarchical models tested the extent that the level of ability discrepancy predicted AD symptom severity and whether our measures of diversity impacted this discrepancy-cognitive status relationship. Ability discrepancy, the measures of diversity, and their interactions were entered as predictors. CDR-SOB score was entered as an outcome variable. The same covariates were used in these two models as in the first model. In the final model, an episodic memory composite score also was entered as a predictor to test whether controlling for episodic memory eliminated any of the effects. For all models, we report standardized betas and interpret values < 0.1 as very small, between 0.1 and 0.2 as small, between 0.2 and 0.3 as moderate, and greater than 0.3 as large (Gignac & Szodorai, 2016).
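The interpretation thresholds for standardized betas stated above can be written as a small helper. This is only a sketch: the function name is hypothetical, the absolute value is used so that negative betas are labeled by magnitude, and the handling of values that fall exactly on 0.2 or 0.3 is an assumption because the text does not say which label a boundary value receives.

```python
def effect_size_label(beta):
    """Label a standardized beta using the cut-offs given in the text."""
    size = abs(beta)  # assumption: the sign is ignored when judging magnitude
    if size < 0.1:
        return "very small"
    if size < 0.2:
        return "small"
    if size <= 0.3:  # assumption: exactly 0.2 and 0.3 count as moderate
        return "moderate"
    return "large"
```

Under these cut-offs a beta of 0.080 is labeled very small, whereas a beta of -0.420 is labeled large.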

Results

Consistent with prior work (e.g., Cattell, 1971; Kaufman & Horn, 1996), the crystallized ability and average fluid ability scores were positively correlated with one another in this cognitively normal sample, r(8,079) = 0.60, 95% CI [0.59, 0.61], p < 0.001. Subtracting the average fluid composite score from the crystallized ability score resulted in an ability discrepancy score for each subject that ranged from -2.11 to 2.29.
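For readers who want to see how a confidence interval of this width can arise, the sketch below applies the standard Fisher z-transformation to the reported correlation. The text does not state which method was used to compute the interval, so this is an illustration under that assumption; the function name is hypothetical and the sample size is taken as the reported degrees of freedom plus two.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson r using the Fisher z-transformation."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    low = math.tanh(z - z_crit * se)  # back-transform the lower bound
    high = math.tanh(z + z_crit * se)  # back-transform the upper bound
    return low, high


# r = 0.60 with df = 8,079, so n = 8,081
low, high = pearson_ci(0.60, 8079 + 2)
```

Rounded to two decimals, the bounds are 0.59 and 0.61, in line with the reported interval.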

Comparison of diversity characteristics on cognitive measure stratified by age decade

The first set of analyses tested the extent of group differences in cognitive performance as a function of sex, education, and ethnoracial category in cognitively normal individuals (CDR = 0) and stratified across age decade (see Figs. S1 and S2 in the OSM). For adults in their 60s (N = 2,777), we found significant interactions between cognitive measure and each of the group characteristics other than Hispanic ethnicity at the set alpha level of 0.005 (Table 1). These interactions were driven by a larger difference among sexes (female > male), education (higher > lower), identifying as Black (Black < White), and identifying as Asian (Asian > White) for the composite scores than the ability discrepancy score. Thus, group differences were largely reduced when using an ability discrepancy score compared with single and multi-domain composite scores. Congruent with these findings, Bayes factor outputs for the sixties decade suggest anecdotal evidence for no group differences in ability discrepancy scores, but moderate to “extreme” evidence for most of the group differences in fluid and memory composite scores (Table S2, OSM). The exceptions include only anecdotal evidence for group differences on the fluid composite score between Hispanic White and non-Hispanic White Americans and moderate evidence for no differences on the memory composite score between Black Americans and non-Hispanic White Americans. Similar patterns of results were found for adults in their seventies (N = 3,336), adults 80 years and older (N = 1,968), and when including the whole sample controlling for age (N = 8,081). The primary exception was that in the seventies and eighties age groups, the fluid composite did not strongly differ from ability discrepancy for Black Americans.

Table 1 Comparisons between demographic characteristics and cognitive composite score to an ability discrepancy score

Overall, most comparisons revealed group differences in fluid and memory composite scores across each of the decades consistent with prior research (e.g., Helms, 1992; Jones, 2003). Critically, nearly every group difference was smaller for ability discrepancy scores in cognitively normal older adults, regardless of age decade (Fig. 1). These results suggest that considering crystallized abilities might level the playing field for assessing baseline cognition across diverse demographic groups.

Fig. 1

Comparison between three measures of cognition (ability discrepancy, fluid composite, episodic memory composite) as a function of sex, education, and ethnoracial category. In each panel, disparities between cognition scores are reduced when using an ability discrepancy score compared to a fluid or memory composite score

Intersectional characteristics predicting ability discrepancy, fluid cognition, and episodic memory

We next tested the extent that interactions among these diverse characteristics (thus representing intersectionality) were associated with varying levels of ability discrepancy and compared it with episodic memory. The first HLM only included the main effects of these characteristics in cognitively normal older adults. Standardized beta coefficients and standard errors for each factor can be found in Table S3 (OSM). The first unadjusted model was significant, F(6, 8,074) = 3.23, MSE = 0.53, p = 0.0036. Individuals with higher education (p = 0.0032) were more likely to have higher ability discrepancy scores. The second unadjusted model included interactions among each factor to identify differences in ability score associated with intersectionality of diverse characteristics. The second model was significant, F(16, 8,064) = 2.31, MSE = 0.53, p = 0.0022. Of the main effects, years of education remained significant (p = 0.0021). Of the interactions entered into the model, only the education × sex interaction reached significance (p = 0.00066). As seen in Fig. 2, this interaction occurred because the relationship between years of education and ability discrepancy was stronger for females than males. Specifically, higher education was associated with a larger ability discrepancy score in females (ß = 0.033, SE = 0.008, p < 0.001), and no relationship was found in males (ß = -0.009, SE = 0.010, p = 0.37). The third model included our covariates and was significant, F(42, 7,468) = 2.11, MSE = 0.53, p < 0.0001. Neither the size nor the significance of the effects changed from the second to the third model, suggesting the effects from Model 2 were robust. When the episodic memory composite score was included in the model (Model 4, F(43, 7,467) = 12.91, MSE = 0.51, p < 0.0001), the education × sex interaction was still significant (p = 0.002), as was the effect of episodic memory (p < 0.0001). 
These findings suggest that the intersectionality of sex and education on the ability discrepancy measure was largely independent of traditional cognitive measures of early declines in AD found by episodic memory. Because the ability discrepancy score did, in part, consist of episodic memory, it is not surprising that the episodic memory composite score also significantly predicted ability discrepancy.

Fig. 2

Interaction of Education and Sex on Level of Ability Discrepancy. While males with lower education had higher ability discrepancy scores than females, males with higher education had lower ability discrepancy scores than females. To the extent that a greater ability discrepancy score is indicative of greater risk of developing dementia, higher education does not appear to be protective for females

In two final models, we tested the extent that intersectionality differences also might be found on fluid cognition (Model 5) and episodic memory (Model 6) composite scores after controlling for covariates. Each of the models was significant (ps < 0.001). Although no significant interactions were found, both models revealed main effects of age (ps < 0.001), sex (ps < 0.001), years of education (ps < 0.001), identifying as Hispanic ethnicity (ps < 0.005), and identifying as Asian (ps < 0.001). The episodic memory score had an additional main effect of identifying as Black (p < 0.001). While the fluid cognition and episodic memory composite scores revealed similar patterns in comparison with each other, they differed markedly from an ability discrepancy. Notably, whereas the mean composite scores showed many large group differences (as readily apparent in Figs. 1, S1 (OSM), and S2 (OSM)), the ability discrepancy score showed differences only for females with more education relative to other groups also with more education.

Impact of level of ability discrepancy on AD symptom severity as a function of diversity

The next set of analyses assessed the extent that level of ability discrepancy predicted AD symptom severity across the AD spectrum (Models 7–10). We also tested whether this discrepancy-symptom relationship was moderated by sex, education, and ethnoracial category. The first regression only included the main effect of ability discrepancy. Standardized beta coefficients and standard errors for each factor can be found in Table S4 (OSM). The first model (Model 7) was significant, F(1, 14,255) = 31.00, MSE = 2.24, p < 0.0001. A positive relationship was found between ability discrepancy and CDR-SOB score (p < 0.0001). The second model (Model 8) was also significant, F(30, 14,226) = 13.95, MSE = 2.197, p < 0.0001. The highest order interaction in this model was an ability discrepancy × sex × education × Hispanic ethnicity interaction (p < 0.001; Table S4 (OSM)). To better understand this interaction, two additional regressions were conducted separately for non-Hispanic and Hispanic White groups (see Fig. 3). For non-Hispanic White Americans (N = 12,729), a main effect of ability discrepancy was found on CDR-SOB score (ß = 0.080, MSE = 0.020, p < 0.001), but none of the ability discrepancy × sex (p = 0.033), ability discrepancy × education (p = 0.34), or ability discrepancy × education × sex (p = 0.27) interactions reached our threshold of significance. For Hispanic Americans (N = 343), we see a different pattern. Only the ability discrepancy × education × sex interaction reached significance (ß = -0.420, MSE = 0.124, p = 0.001), which was due to no significant effect of ability discrepancy on CDR-SOB score for females (ps > 0.06) but a nearly significant ability discrepancy × education interaction for males (ß = 0.537, MSE = 0.192, p = 0.0058). Hispanic males with at least a college degree did not show a relationship between an ability discrepancy and AD symptom severity.
However, for Hispanic males with a high school degree or lower, a greater ability discrepancy score was associated with less severe AD symptoms (the opposite direction to that in non-Hispanic White Americans and less educated Hispanic females). As shown in Table S4 (OSM), these interaction patterns were still significant after including covariates (Model 9; four-way interaction p < 0.001) and episodic memory (Model 10; four-way interaction p < 0.001).
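The follow-up strategy described above, probing a higher-order interaction by refitting the discrepancy-symptom relationship within subgroups, can be illustrated with a minimal sketch. This is not the analysis code used in the study: the function names, the field names (`sex`, `edu`, `discrepancy`, `cdr_sob`), and the toy records are hypothetical and were invented only to show the mechanics, and the sketch fits one unadjusted slope per subgroup rather than the full regression models reported in Table S4 (OSM).

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simple_slopes(rows, predictor, outcome, group_keys):
    """Fit a separate slope of outcome on predictor within each subgroup."""
    groups = {}
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        groups.setdefault(key, []).append(row)
    return {
        key: ols_slope([r[predictor] for r in members],
                       [r[outcome] for r in members])
        for key, members in groups.items()
    }


# Hypothetical toy records (not study data), built so the two subgroups
# show slopes in opposite directions.
toy = [
    {"sex": "M", "edu": "low", "discrepancy": -1.0, "cdr_sob": 2.0},
    {"sex": "M", "edu": "low", "discrepancy": 0.0, "cdr_sob": 1.5},
    {"sex": "M", "edu": "low", "discrepancy": 1.0, "cdr_sob": 1.0},
    {"sex": "F", "edu": "low", "discrepancy": -1.0, "cdr_sob": 0.5},
    {"sex": "F", "edu": "low", "discrepancy": 0.0, "cdr_sob": 1.0},
    {"sex": "F", "edu": "low", "discrepancy": 1.0, "cdr_sob": 1.5},
]

slopes = simple_slopes(toy, "discrepancy", "cdr_sob", ["sex", "edu"])
for key, slope in sorted(slopes.items()):
    print(key, round(slope, 3))
```

Slopes that differ in sign or size across subgroups are what a significant interaction term summarizes in a single coefficient; a full analysis would also include covariates and standard errors.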

Fig. 3

Four-way interaction between Ability Discrepancy, Sex, Education, and Hispanic ethnicity on CDR Sum of Boxes score. For non-Hispanic White Americans, a higher ability discrepancy was associated with greater Alzheimer’s disease (AD) symptom severity regardless of sex and educational level. In contrast, Hispanic males with lower education exhibited a relationship in the opposite direction and Hispanic females with higher education exhibited no relationship.

In the last two models, we tested the extent that fluid cognition (Model 11) and episodic memory (Model 12) composite scores equally predicted the CDR-SOB score across the diverse groups. Beta values and standard errors can be found in Table S4 (OSM). For the fluid composite score, the highest order interaction was a significant cognitive marker × sex × Black interaction (p < 0.001). While all groups showed a negative relationship between fluid composite and AD symptom severity, non-Hispanic White males showed a stronger relationship than non-Hispanic White females, whereas Black females showed a stronger relationship than Black males (see Fig. 4). For the episodic memory composite score, the same three-way interaction between cognitive marker × sex × Black was found (p < 0.001; see Fig. 4). However, an additional episodic memory × education interaction was significant (p < 0.001), such that older adults with lower education had a stronger negative relationship between episodic memory and AD symptoms than older adults with higher education (see Fig. 4).

Fig. 4

Interactions between Cognitive Marker, Sex, and Race and between Episodic Memory and Education on CDR Sum of Boxes score. Non-Hispanic White males showed a stronger relationship than non-Hispanic White females, whereas Black females showed a stronger relationship than Black males. Older adults with lower education had a stronger negative relationship between episodic memory and Alzheimer’s disease (AD) symptoms than older adults with higher education.

Together, these results suggest that none of the cognitive markers tested show the same relationships with AD symptom severity across all groups of older adults. However, the ability discrepancy score complemented the more traditional mean composites, with the exception of being less sensitive to AD symptom severity in older Hispanic males with lower education and older Hispanic females with higher education. In contrast, the fluid and episodic memory composite scores were less sensitive (a) in non-Hispanic White females, (b) in Black males, and (c) for more educated older adults (for episodic memory only).

Discussion

The present study investigated whether a crystallized-fluid ability discrepancy measure (a) differed across intersecting identities that have previously been shown to increase the risk for AD in late life, and (b) predicted severity of AD symptoms similarly across the subgroups. We also compared this measure to commonly used mean composite measures of fluid and episodic memory abilities.

An ability discrepancy score levels the playing field across diverse characteristics

First, we found that the fluid and memory composite scores generally exhibited large group differences by sex, education, and ethnoracial category in cognitively normal adults. In contrast, the ability discrepancy score showed much smaller group differences. These patterns were supported by Bayes factor values that consistently indicated evidence toward no group differences in the ability discrepancy score, but for moderate to extreme group differences in the mean composite scores with the exceptions of differences for Hispanic and Black Americans relative to non-Hispanic White Americans. Comparisons between the three types of measures showed that the ability discrepancy score leveled the playing field for baseline levels of cognition relative to fluid and memory composite measures. One exception was for ethnoracial category, especially in Hispanic Americans. The relatively small benefit of an ability discrepancy score for ethnoracial differences stands in contrast to the large benefits the score has in reducing differences between sexes and education levels. This pattern suggests that although a crystallized-fluid ability discrepancy score reduces baseline differences among many subgroups at risk for AD, this measure might be especially useful in reducing differences due to sex and education.
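For readers less familiar with how such a score is built, the following is a minimal sketch of a crystallized-fluid discrepancy computed from composite scores. It rests on several assumptions that are not specified in this section: each task is standardized within the sample, each composite is the unweighted mean of its task z-scores, and the discrepancy is the crystallized composite minus the fluid composite (so that higher values indicate fluid ability falling below crystallized ability). The function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one task's scores within the sample (assumed approach)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(task_scores):
    """Average z-scores across tasks; task_scores holds one list per task."""
    z_by_task = [zscores(task) for task in task_scores]
    return [mean(person) for person in zip(*z_by_task)]


def ability_discrepancy(crystallized_tasks, fluid_tasks):
    """Crystallized composite minus fluid composite for each person (assumed direction)."""
    crystallized = composite(crystallized_tasks)
    fluid = composite(fluid_tasks)
    return [c - f for c, f in zip(crystallized, fluid)]


# Hypothetical scores for three people on two crystallized and two fluid tasks.
crystallized_tasks = [[10, 20, 30], [1, 2, 3]]
fluid_tasks = [[30, 20, 10], [3, 2, 1]]

scores = ability_discrepancy(crystallized_tasks, fluid_tasks)
print([round(s, 3) for s in scores])
```

Because each person's fluid composite is anchored to that same person's crystallized composite, group differences that shift both composites by a similar amount tend to cancel in the subtraction, which is the intuition behind the smaller baseline differences described above.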

Baseline cognitive level is an important predictor of dementia risk (e.g., Weuve et al., 2018). Individuals who start out with lower levels of cognition have less room to fall until those cognitive declines begin to interfere with daily functions. For example, Weuve et al. (2018) showed that some Black Americans are at a higher risk of developing a dementia diagnosis than non-Hispanic White Americans, and they argued this pattern arose because of their lower baseline levels of cognition. From this perspective, lower levels of cognition in marginalized groups might be due to lower levels of cognitive reserve. However, this argument assumes that baseline cognition was accurately assessed across all diverse subgroups. Some researchers argue that neuropsychological tests are culturally biased (intentionally or unintentionally) to favor the creators of those tests – namely, educated non-Hispanic White and English-speaking men (Helms, 1992; Jones, 2003). These biases can be manifested in what constitutes intellectual ability, the language standards used in testing instructions and materials, the cultural relevance of the stimuli, and even what a correct answer is assumed to be. Thus, higher performance on neuropsychological tests might be more of a measure of acculturation to the dominant culture than of actual level of cognitive ability. As stated by a recent perspective paper, very little research has been conducted on cognitive reserve in ethnoracial categories (and other marginalized groups) to fully understand how to balance these different perspectives (Babulal et al., 2019). Although the link with cognitive reserve is unclear, the present findings show that an ability discrepancy score is one way to successfully reduce such baseline differences between subgroups. By using language tasks to correct for fluid ability, perhaps one can directly assess how much language is negatively impacting cognitive assessments via differences in acculturation to the testing environment or language standards.

Similar corrections to cognitive measures have been made in health disparities research. Specifically, researchers have advocated for statistically controlling for quality of education through reading level (e.g., Manly et al., 2004). Reading level has been proposed as a better indicator of education quality than years of education because many older ethnoracial minorities grew up in segregated school systems that had fewer hours in a day and fewer school days in a year than schools for non-Hispanic White Americans (e.g., Crowe et al., 2013). Example proxies for education quality include word pronunciation tasks or vocabulary tasks. Such corrections can eliminate or at least attenuate differences in mean levels of cognitive performance between subgroups (e.g., Crowe et al., 2013; Fyffe et al., 2011; Garcia et al., 2021; Manly et al., 2004). Although measures of reading level were not available in this data set, McDonough and Popp (2020) previously showed that a composite measure of multiple language abilities (including the Boston Naming Test and Category Fluency) loaded onto the same factor as word reading and was sufficient to create the crystallized portion of the ability discrepancy score that was more sensitive to AD biomarkers than word reading alone. Using objective language tasks to construct an ability discrepancy score, as was done in the present study, likely was a key factor in reducing the ethnoracial and educational level gap found in the mean composite scores.

Ability discrepancies differ at the intersection of education and sex

Our second main finding was that a greater ability discrepancy score (i.e., greater AD risk) was associated with higher education for females, whereas no such relationship was found in males. Instead of interpreting the meaning of an ability discrepancy in highly educated females, this deviation from the other groups might simply indicate that an ability discrepancy does not level the playing field for this subgroup. Females often have better language abilities than males (e.g., Asperholm et al., 2019a, b) that may have inflated their crystallized scores (and/or perhaps increased their cognitive reserve). If so, females would naturally show a greater ability discrepancy score than males. From this perspective, it is worth emphasizing that whereas only females with high education appear to differ in baseline ability discrepancy levels, the mean composite scores were much more susceptible to potentially artificial baseline differences not only in sex and years of education, but also in age and each of the ethnoracial categories assessed. Ultimately, these findings suggest that an ability discrepancy, as constructed here, should be used with caution in research and clinical settings for highly educated females.

Ability discrepancy predicts AD symptom severity

Lastly, we assessed the relationship between ability discrepancy and AD symptom severity across diverse sets of characteristics. This analysis can inform whether an ability discrepancy is sensitive to AD symptom severity across the AD spectrum and test whether this relationship is similar across diverse and intersecting backgrounds. Using the CDR-SOB score as our measure of AD symptom severity, we found that a greater ability discrepancy score was associated with more AD symptoms in unadjusted analyses. When further probing the intersecting effects of subgroups, we found an interaction between ability discrepancy, sex, and education for Hispanic Americans. Breaking down this interaction revealed that Hispanic males with at least a college level of education and Hispanic females with a high school degree or less showed similar positive relationships as the other participants in the sample (i.e., greater ability discrepancy and more severe AD symptoms). However, two subgroups deviated from these patterns: Hispanic males with a high school degree or lower showed the reverse pattern (greater ability discrepancy, fewer AD symptoms) and Hispanic women with at least a college education evidenced no relationship between an ability discrepancy score and AD symptom severity. At minimum, these findings suggest that while an ability discrepancy score is generally predictive of AD symptom severity across many diverse subgroups, this association does not hold for some intersecting identities, and should be considered cautiously by researchers and clinicians.

A broad span of literature suggests that, as a group, those with a lower education level (and lower SES, more generally) and those who identify as being Hispanic have greater prevalence and incidence rates of an AD diagnosis compared with non-Hispanic White Americans (Matthews et al., 2018; Perkins et al., 1997; Steenland et al., 2016; Tang et al., 2001). Although Hispanic Americans, as a group, have lower education levels compared to other ethnoracial categories in the USA (Braveman et al., 2010), our findings suggest that an ability discrepancy measure might be less sensitive to risk for AD in this subgroup. One reason for this decrease in sensitivity is that the cognitive profile of Hispanic Americans may differ in AD compared to non-Hispanic White Americans. Indeed, Hispanic Americans can sometimes exhibit less cognitive impairment than non-Hispanic White Americans despite having autopsy-confirmed AD (Weissberger et al., 2019). One speculation is that bilingualism among some Hispanic Americans might confer a greater degree of cognitive reserve, thereby reducing AD symptoms (e.g., Bialystok et al., 2007; Schweizer et al., 2012). The present data set did not have information about bilingualism or multilingualism to test these ideas.

Comparing ability discrepancy with other cognitive domains

We also examined whether episodic memory, a domain of cognition known to decline early in preclinical AD (e.g., Boraxbekk et al., 2015; Sperling et al., 2011; Weintraub et al., 2018), could explain the relationship between an ability discrepancy and AD symptoms. Controlling for episodic memory did not alter any of the findings. However, episodic memory was associated with AD symptom severity (as measured by the CDR-SOB score) in these models, suggesting that both an ability discrepancy score and estimates of episodic memory might contribute unique information if used to detect people at risk for AD or used as an outcome variable for potential treatments (see also Salthouse & Becker, 1998). This conclusion is supported by a study that found that greater non-memory impairments at baseline were associated with steeper annual rates of decline in CDR score than baseline memory scores (Scheltens et al., 2018). The authors suggested that non-memory impairments lead to a faster increase in AD symptom severity than memory impairments.

Although both episodic memory and fluid cognition were related to AD symptoms, these effects were qualified by interactions with ethnoracial category and sex. Specifically, the associations were relatively weaker for non-Hispanic White females and Black males. These interactions suggest that, like an ability discrepancy, the cognitive markers are not equally sensitive to AD symptoms across subgroups. The reasons, however, for these weaker associations are not clear. Regardless, it is clear that the unique relationships between episodic memory/fluid cognition and AD symptoms should be considered in the context of potential biases in measurement, which is relevant to recent recommendations to include such scores as key outcome measures in clinical trials (e.g., Chhetri et al., 2018; Donohue et al., 2014). The present study suggests that such fluid composite scores might not be equally sensitive to treatment outcomes in all individuals, especially those at most risk for AD such as ethnoracial minorities, women, and those with lower levels of cognitive reserve.

Concerns regarding the formation of an ability discrepancy score

Despite the potential of an ability discrepancy score, several concerns exist regarding its formation and implementation. The crystallized ability score used here was composed of tasks relying on language performance from which to anchor other fluid cognitive abilities. Our primary argument is that such anchoring can help correct for premorbid levels of intelligence and reduce the influence of language difficulties that can lead to an artificially lower estimation of other fluid abilities. However, such anchoring also limits the understanding of cognitive decline on the AD spectrum. Specifically, some productive language tasks including the Boston Naming Test and Verbal Fluency used here have been used as early indicators of AD-related declines in cognition (Jacobs et al., 1995), although sometimes not as early as episodic memory (Hamel et al., 2015; Howieson et al., 2008; Mistridis et al., 2015). From this perspective, one might argue that subtracting out such important cognitive indicators in the discrepancy score used here may contradict its use. Indeed, we have argued previously that multiple measures of receptive language tasks like vocabulary or word pronunciation would make for a more suitable crystallized ability composite (McDonough & Popp, 2020; McDonough et al., 2016). Unfortunately, multiple versions of such receptive language abilities often are not available in standardized neuropsychological batteries. Thus, the present study relied on tasks that greatly depend on language to form these composite scores. Although perhaps counterintuitive, we have previously validated the use of such composites by showing that a discrepancy score using the same productive language tasks was related to amyloid accumulation and neurodegeneration in AD signature regions, while a language discrepancy score using word pronunciation as the only measure of crystallized ability did not correlate with those same AD biomarkers (McDonough & Popp, 2020). This finding also is consistent with a meta-analysis that showed amyloid was not correlated with semantic memory (Hedden et al., 2013) and with autopsy measures showing no relationships between the productive language tasks and amyloid plaques, diffuse senile plaques, or limbic neurofibrillary tangles (Price et al., 2009).

If language tasks can be used as indicators of risk for AD, then how can we interpret an ability discrepancy score that uses language tasks as a method of correction? One possibility is that an ability discrepancy score measures earlier subtle declines in cognition before declines in language abilities. Indeed, multiple longitudinal studies have suggested that declines in language abilities occur after declines in other cognitive domains like episodic memory (Hamel et al., 2015; Howieson et al., 2008; Mistridis et al., 2015). A second possibility is that language deficits in AD may represent a subtype of AD. Converging evidence suggests that one infrequent AD variant (occurring in 19–22.4% of AD cases) increases atrophy primarily in the left temporal lobe and is associated with lower language performance (Vogel et al., 2021; Zhang et al., 2021). This recent research is consistent with lesion-mapping studies and volumetric studies that point selectively to a role of left lateral temporal cortex involved in productive language tasks like the Boston Naming Test (Baldo et al., 2013; Seidenberg et al., 2005). The left temporal AD variant was also associated with fewer participants with abnormal amyloid levels (Zhang et al., 2021), consistent with the lack of association between language deficits and AD pathology found in previous studies (Hedden et al., 2013; McDonough & Popp, 2020; Price et al., 2009). Thus, an ability discrepancy score, as defined in the present study, may not capture early cognitive decline in this AD subtype, but may capture early cognitive decline in typical AD expressions and perhaps other subtypes not associated with language deficits.

Another criticism is that an ability discrepancy score relies on the use of difference scores (Frazen et al., 1997; Rogosa & Willett, 1983). Difference scores can sometimes reduce between-subject differences and compound measurement error from each score that makes up the subtraction (Hedge et al., 2018). However, we propose that composite fluid and crystallized scores can be used to reduce measurement error in comparison with the single-task scores commonly criticized (McDonough & Popp, 2020). Additionally, even those who criticize the use of difference scores acknowledge that such scores might be useful when subtracting a baseline measure to control for unwanted between-subject variance (Hedge et al., 2018). In this case, a crystallized/fluid ability subtraction is theoretically motivated based on the cognitive domains that often decline early in the AD process (i.e., fluid abilities) versus later (i.e., crystallized abilities).
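The measurement-error concern can be made concrete with the classical test theory expression for the reliability of a difference score. The sketch below assumes uncorrelated measurement errors between the two component scores; the function name and the reliability and correlation values are hypothetical, chosen only to show how more reliable components (for example, multi-task composites rather than single tasks) raise the reliability of their difference.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of D = X - Y under classical test theory.

    r_xx, r_yy: reliabilities of the two component scores.
    r_xy: correlation between the two component scores.
    sd_x, sd_y: standard deviations of the components (1.0 for z-scored composites).
    Assumes measurement errors of X and Y are uncorrelated.
    """
    covariance = r_xy * sd_x * sd_y
    true_variance = sd_x ** 2 * r_xx + sd_y ** 2 * r_yy - 2 * covariance
    observed_variance = sd_x ** 2 + sd_y ** 2 - 2 * covariance
    if observed_variance <= 0:
        raise ValueError("The difference score has no observed variance.")
    return true_variance / observed_variance


# Hypothetical values: two components correlated at 0.5.
single_task = difference_score_reliability(r_xx=0.8, r_yy=0.8, r_xy=0.5)
composites = difference_score_reliability(r_xx=0.9, r_yy=0.9, r_xy=0.5)
print(round(single_task, 3), round(composites, 3))
```

With the component correlation held constant, the difference score is always less reliable than its components when they correlate positively, but the gap narrows as component reliability increases.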

Study limitations

The study should be interpreted considering its limitations. First, while the sample from the NACC is quite large and diverse, it should not be interpreted as a nationally representative sample. Participants in the sample were selected based on their interest in helping research rather than randomly selected from the community. Second, the sample also consists of residents predominantly in urban areas near medical centers. Thus, the sample contains large numbers of highly educated non-Hispanic White Americans. Third, while this study was the first to examine the predictive value of ability discrepancy scores across diverse groups of older adults, we used baseline data only for these inferences. Longitudinal investigations are needed to further study how diversity metrics influence the predictability of ability discrepancy scores on late-life cognition and conversion to AD. Fourth, several measures were unavailable in the NACC data set including word reading or vocabulary measures to compare with the other language measures, memory recognition scores to complement recall scores, and information about bilingualism or multilingualism. Fifth, all the tests used here were originally developed by educated non-Hispanic White and English-speaking men, thereby constraining how we conceptualize both fluid and crystallized abilities. Novel cognitive tests that take a broader and more inclusive perspective are needed to fully appreciate one’s abilities and how they relate to ADRD. Lastly, the effect sizes were relatively small. Other researchers have noted an apparent inverse relationship between sample size and effect size (e.g., Karlamangla et al., 2014), possibly due to an increase in noise in the measures across sites or increases in heterogeneity of larger samples.

Conclusion

As the number of ethnoracial minorities and SES gaps continues to rise in the USA, increased attention also is being paid to health disparities across these categories and how they intersect with other characteristics like sex. In the case of AD, Black Americans, Hispanic Americans, and females are more likely to be diagnosed with the disease than non-Hispanic White males, but the origins of these increased risks are still being discovered, including possible misdiagnoses underlying these apparent risks (e.g., Goldstein et al., 2014; Kiselica et al., 2021). Research also is slowly uncovering how these health disparities intersect, thus giving more importance to the notion that not all individuals can be simply lumped into one or two categories of people (McDonough et al., 2021). Rather, different types of disparities have the potential to interact and lead to poorer cognitive outcomes (Matthews et al., 2018). However, to understand the origins of these health disparities and eliminate them, a person-specific measure of cognition that accounts for biases in culture and SES is needed. The present study provided initial evidence that an ability discrepancy score might serve as such a measure that is built on the foundation of clinical neuropsychology, and may enhance existing practices by modifying the traditional inherent biases that arise from lack of access, interest, and inclusion of diverse populations.