Paired-associate learning (PAL) has long been used to assess memory performance across aging in both experimental and clinical settings (e.g., Naveh-Benjamin, 2000; Wechsler, 1945, 2009; Zaretsky & Halberstam, 1968). In a verbal PAL task, subjects are asked to memorize a list of arbitrary cue–response word pairs (e.g., sky-tea, day-box). At test they are asked to recall response words when provided with the cue words. The associative ability tested in PAL tasks is considered an essential mechanism for human memory (e.g., associating a name with a person; desRosiers & Ivison, 1986).

Many studies have shown that older adults consistently perform worse than younger adults on PAL tasks, including those with nonverbal stimuli (e.g., product-price, item-location, face-name; for a review, see Naveh-Benjamin & Mayr, 2018). Naveh-Benjamin (2000) attributed the worse performance in older adults to a deficit in their ability to bind new pieces of information together. Alternative theories have claimed that older adults’ insufficient ability in inhibitory control (Hasher & Zacks, 1988) leads to binding too much irrelevant information (hyperbinding), and consequently interferes with forming relevant associations during learning and selecting target associations during retrieval in PAL tasks (e.g., Biss, Campbell, & Hasher, 2012; Campbell, Hasher, & Thomas, 2010). These theories have been used as evidence to support the notion that memory, along with other cognitive systems (e.g., attention and executive control), declines as one ages (Park & Festini, 2017).

Recent studies by Ramscar and colleagues (Blanco et al., 2016; Ramscar, Hendrix, Love, & Baayen, 2013; Ramscar, Hendrix, Shaoul, Milin, & Baayen, 2014; Ramscar, Sun, Hendrix, & Baayen, 2017), however, have challenged this deficit perspective of cognitive aging. Ramscar and colleagues propose that the poorer PAL performance seen in older adults reflects the accumulation of linguistic knowledge over time rather than any age-related cognitive decline. As a person ages, they necessarily experience more occurrences of meaningful word sequences, which increases their knowledge of the relationship between words. The accumulation of these meaningful associations will, in turn, cause unrelated words to become more negatively associated (Ramscar et al., 2017). Thus, forming associations between unrelated word pairs in PAL will be particularly problematic for older adults, simply because “the learning of a nonsensical link between two unconnected words must increasingly compete with prior learning to the effect that this link is nonsensical” (Ramscar, Hendrix, Love, & Baayen, 2013, p. 458). We will refer to the theories put forth by Ramscar and colleagues as the information accumulation perspective of aging (see also Wulff, De Deyne, Jones, Mata, & Aging Lexicon Consortium, 2019, for a similar perspective on aging).

Based on this notion, Ramscar et al. (2017) predicted that age-matched individuals who had less experience with language should perform better on a PAL task. To operationalize language exposure, Ramscar et al. (2017) used age-matched bilinguals and monolinguals. Bilingual speakers necessarily have less linguistic experience in one language compared with monolingual speakers of that language because their language use and experience are split across two languages (Gollan et al., 2011). Thus, the information accumulation perspective of aging predicts that bilinguals should have less interference from prior linguistic knowledge. The results of Ramscar et al. (2017) bore this prediction out, with older bilinguals performing significantly better on a verbal PAL task than older monolingual participants.

However, comparing monolinguals to bilinguals introduces one potential confounding variable, namely executive control. Previous studies have found that older bilinguals perform better than age-matched monolinguals on executive control tasks, including better distraction inhibition (e.g., Bialystok, Craik, & Luk, 2008; Bialystok, Craik, & Ryan, 2006; Treccani, Argyri, Sorace, & Della Sala, 2009; for a review, see Bialystok, 2017). Although many recent studies have challenged the bilingual advantage in executive functioning (e.g., Dick et al., 2019; Lehtonen et al., 2018; Von Bastian, Souza, & Gade, 2016), most of these studies focus on young adults, a population that typically shows peak performance in various executive functioning tasks. The effects have been observed more consistently in older adults (for a review, see Antoniou, 2019). Thus, the better performance of bilinguals found in Ramscar et al. (2017) may also be taken as evidence to support the inhibitory deficit hypothesis of paired-associate learning (e.g., Campbell et al., 2010). That is, the better executive functioning of bilinguals helps them inhibit irrelevant associations, which older monolinguals may find difficult, and thus promotes PAL performance.

The goal of the current study is to remove this potential confound by using a within-subjects design to show that older adults can perform as well as younger adults when the number and strength of associations that a word has are controlled for. To accomplish this, the semantic diversity model (SDM; Jones et al., 2012) was used to quantify the associative strength that a word has to other words in the lexicon, as described below.

Semantic diversity as a measure of association

The semantic diversity (SD) of a word measures the content variability of the contexts that the word occurs in. The greater the number of dissimilar contexts that a word appears in, the higher the word’s semantic diversity. Jones et al. (2012) showed that a semantic diversity measure provided a better fit to lexical decision and naming times than either word frequency or a contextual diversity count (operationalized as the number of documents or paragraphs that a word occurs in; see Adelman, Brown, & Quesada, 2006).

Subsequent studies have further supported the importance of semantic diversity across various tasks, including spoken word recognition (Johns, Gruenenfelder, Pisoni, & Jones, 2012), natural language learning (Johns, Dye, & Jones, 2016a), and word recognition across aging and bilingualism (Johns, Sheppard, Jones, & Taler, 2016b). All these studies have demonstrated that the semantic diversity of the contexts that words appear in is an important organizational principle of our mental lexicon. Words that appear in more unique contexts are more likely to be needed in new contexts, and thus are stored more strongly in memory (for a review, see Jones, Dye, & Johns, 2017).

Additionally, contextual and semantic diversity have been shown to be important information sources in child language acquisition and usage (e.g., Hills, Maouene, Riordan, & Smith, 2010; Hsiao & Nation, 2018; Hsiao, Bird, Norris, Pagán, & Nation, 2019; Joseph & Nation, 2018).

Although semantic diversity was not originally built to measure associations between words, it captures this information indirectly. A word that is of high semantic diversity necessarily appears in more distinct contexts across learning, which means that it also co-occurs with a greater number of words. Conversely, a low semantic diversity word appears in redundant contexts, and thus co-occurs with a more limited set of words. Given two words of equal frequency, the high semantic diversity word will have a proportionally greater number of associations with other words than the low semantic diversity word. The simulation below, run with the SDM, tests these assumptions.

The SDM is a computational model that measures the semantic diversity of words within a corpus by weighting the uniqueness of information that each new context provides, using an expectancy-congruence mechanism (Jones et al., 2012). To accomplish this, words are represented in a Word × Context matrix. A context is operationalized as a group of 20 sentences from a corpus of natural language. Each time the model encounters a new context, a new column is added to the matrix. For each word that does not occur in the new context, its value for that column is zero. For each word that occurs in that context, its value for that column is determined by the similarity between the current context and the stored meaning of that word (the word’s corresponding row in the matrix). The more dissimilar the current context is from past usage, the higher the value in that column. A word’s semantic diversity value is then the sum of the values in its row (for a formal description of the model, see Jones et al., 2012).
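The update described above can be illustrated with a minimal Python sketch. It is not the published implementation: it assumes that a context is represented as the sum of the stored rows of the other words in it, and that the stored value is 1 minus the cosine between that sum and the word's own row. The names `cosine`, `run_sdm`, and `demo_contexts` are hypothetical, and the toy contexts are invented for illustration rather than drawn from the corpus.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {context_index: value}."""
    dot = sum(val * v.get(idx, 0.0) for idx, val in u.items())
    norm_u = math.sqrt(sum(val * val for val in u.values()))
    norm_v = math.sqrt(sum(val * val for val in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no history on one side: treat the context as maximally novel
    return dot / (norm_u * norm_v)


def run_sdm(contexts):
    """Simplified SDM-style update over a list of contexts (lists of tokens).

    Returns (rows, sd): rows maps word -> sparse row of the Word x Context
    matrix (missing entries are zeros); sd maps word -> sum of its row.
    """
    rows = defaultdict(dict)
    for j, tokens in enumerate(contexts):
        words = set(tokens)
        # Sum of the stored rows of every word in this context.
        total = defaultdict(float)
        for w in words:
            for idx, val in rows[w].items():
                total[idx] += val
        # Compute all new values first so every word sees the same stored state.
        updates = {}
        for w in words:
            # ASSUMPTION: the context is the summed history of the *other* words.
            others = {idx: val - rows[w].get(idx, 0.0) for idx, val in total.items()}
            updates[w] = 1.0 - cosine(rows[w], others)  # more dissimilar -> higher
        for w, value in updates.items():
            rows[w][j] = value  # new column j; absent words implicitly stay at zero
    sd = {w: sum(row.values()) for w, row in rows.items()}
    return rows, sd


# Toy contexts (hypothetical): "dog" recurs in a redundant context,
# "run" recurs with a different partner each time.
demo_contexts = [
    ["dog", "bone"], ["run", "park"],
    ["dog", "bone"], ["run", "race"],
    ["dog", "bone"], ["run", "nose"],
]
demo_rows, demo_sd = run_sdm(demo_contexts)
```

In this toy run both words occur three times, but the word that keeps meeting new partners accumulates the larger row sum, which is the frequency-matched contrast the text relies on.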

To formally establish the relationship between semantic diversity and word association, we ran the SDM on a set of words with equal frequency but different semantic diversity, and examined how frequency, semantic diversity, number of associations, and strength of associations change as a function of the number of contexts the model experiences. To measure these variables, the SDM was run on 100,000 contexts from a corpus of young adult books (see Johns, Jones, & Mewhort, 2019). Number of associations was defined as the number of unique words that the target word co-occurred with across learning (equivalent to the degree of a node in a semantic network). Average association strength was calculated by going through each word that a target word co-occurred with, summing the SD values across their joint contexts, and then dividing the total by the number of words that the target word co-occurred with. This gives the average association strength that a word has to other words in the lexicon. The target words used in the simulation are described in the Materials section below and were obtained from the young adult corpus (see Appendix Table 2).
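The two association measures can be sketched in Python as follows. This is one reading of the verbal description, not the authors' code: it assumes that the "SD values across their joint contexts" are the target word's own column values in the contexts it shares with each co-occurring word. The function name `association_stats` and the toy inputs are hypothetical.

```python
def association_stats(target, contexts, target_row):
    """Return (number of associations, average association strength).

    contexts:   list of contexts, each a list of word tokens
    target_row: {context_index: value} for the target word, i.e. its row
                in the Word x Context matrix (missing entries are zeros)
    """
    strength = {}  # co-occurring word -> summed target values over joint contexts
    for j, tokens in enumerate(contexts):
        words = set(tokens)
        if target not in words:
            continue
        for other in words - {target}:
            # ASSUMPTION: strength accumulates the target's value for context j.
            strength[other] = strength.get(other, 0.0) + target_row.get(j, 0.0)
    n_assoc = len(strength)  # degree of the target in the co-occurrence network
    avg_strength = sum(strength.values()) / n_assoc if n_assoc else 0.0
    return n_assoc, avg_strength


# Hypothetical toy input: three contexts and made-up row values for "run".
toy_contexts = [["run", "park"], ["run", "race", "park"], ["dog", "bone"]]
toy_row = {0: 1.0, 1: 0.5}
toy_n, toy_avg = association_stats("run", toy_contexts, toy_row)
```

Here "run" co-occurs with two distinct words; "park" collects strength from both shared contexts and "race" from one, and the average is taken over the two associates.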

Figure 1 displays the results of this simulation. Figure 1a shows that the two sets of words are matched on word frequency, while Fig. 1b shows that the word sets diverge on semantic diversity. Figure 1c shows that words with high semantic diversity occur with a greater number of words (i.e., these words have more associations), and, importantly, Fig. 1d shows that they also have a higher average level of association to those words. This means that words that are high in semantic diversity have both a greater number of associations to other words in the lexicon and a stronger level of association to those words.

Fig. 1

Various measurements of word strength and word association from the semantic diversity model. (a) Low and high semantic diversity word sets have equal word frequency. (b) The two sets diverge on semantic diversity. (c) High semantic diversity words have a greater number of overall associations. (d) High semantic diversity words also have a greater average strength of association. Values are means over the high and low semantic diversity words listed in Appendix Table 2 for the young adult corpus.

From a learning theory perspective, this suggests that it should be relatively more difficult to form new associations to high SD words. However, the current simulation also shows that the divergence that SD causes requires a greater amount of experience: as the model accumulates experience, the separation between high and low SD words grows. This suggests that older adults should be more affected by high SD words, given the greater level of experience that older adults have had with language. The goal of this article is to test this prediction by giving older and younger adults a paired-associate learning task in which the cue word is either low or high in semantic diversity. If the information accumulation perspective on aging is correct, then older adults should perform significantly worse when forming new associations to high SD words compared with low SD words, while younger adults should perform about equally well with low and high SD words.



Participants were younger (18–29 years old) and older (45–60 years old) native speakers of American English recruited from an online subject pool for behavioral studies (Palan & Schitter, 2018). Each participant was paid $2.71 for their participation. A sample size of at least 50 participants in each age group was decided prior to data collection based on previous PAL studies. Fifty-seven younger and 57 older participants completed the experiment. We excluded data from five younger participants and seven older participants because either their overall accuracy was less than 25% or they scored 0% accuracy in one of the experimental trials. Thus, data from 52 younger (age M = 23 years, SD = 3 years) and 50 older (age M = 52 years, SD = 5 years) participants were analyzed.
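The exclusion rule can be written out as a short Python sketch. It assumes that accuracy is available as a proportion correct for each experimental trial and that overall accuracy is the mean of those proportions, which holds when every trial has the same number of pairs. The function name `should_exclude` and its parameter are hypothetical.

```python
def should_exclude(trial_accuracies, min_overall=0.25):
    """Apply the stated exclusion rule to one participant.

    trial_accuracies: proportion correct in each experimental trial.
    A participant is excluded if overall accuracy is below min_overall
    or if accuracy is 0 in any single trial.
    """
    # ASSUMPTION: equal-length trials, so the mean of the per-trial
    # proportions equals the overall proportion correct.
    overall = sum(trial_accuracies) / len(trial_accuracies)
    return overall < min_overall or any(acc == 0.0 for acc in trial_accuracies)


# Sample sizes after exclusion, using the counts reported in the text.
n_younger_analyzed = 57 - 5
n_older_analyzed = 57 - 7
```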


Cue and response words were selected from two corpora, one composed of young adult books and the other composed of general fiction books (see Johns & Jamieson, 2019; Johns et al., 2019). Each corpus consisted of 100,000 contexts, operationalized with a moving window of 20 sentences (meaning each corpus consisted of 2,000,000 sentences). The general fiction corpus contained approximately 29.2 million words, while the young adult corpus contained approximately 25.6 million words. The size discrepancy between the two corpora is due to young adult books having shorter sentences.

Ten high semantic diversity and 10 low semantic diversity words with equal frequency were selected as cue words from each corpus. Twenty response words of similar frequency were also selected from each corpus, with semantic diversity values in between those of the high and low SD cue words. All the words were also matched on familiarity and imageability from the MRC Psycholinguistic Database (Coltheart, 1981). The detailed statistics of the stimuli are shown in Table 1, and the specific words used are listed in Appendix Table 2.

Table 1 Statistics of the stimuli (reported as mean ± standard deviation)

The experiment was a 2 (older vs. younger adults) × 2 (high vs. low SD) factorial design. For every subject, each cue word was randomly paired with a different response word. The resulting 40 word pairs were randomly assigned to four experimental lists of 10 pairs, counterbalanced across subjects. Each list contained five high and five low semantic diversity word pairs. Words from different corpora were not mixed within a list (i.e., two lists contained the word pairs from the young adult corpus and two lists contained the word pairs from the general fiction corpus).
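The list construction for a single subject can be sketched in Python as below. Several details are assumptions rather than statements from the text: cues are paired only with response words from the same corpus, list order is simply shuffled (the counterbalancing scheme is not modeled), and the stimulus labels are placeholders rather than the actual words in Appendix Table 2. The name `build_lists` is hypothetical.

```python
import random


def build_lists(stimuli, rng):
    """Build four 10-pair lists (two per corpus, five high + five low SD each).

    stimuli: {corpus: {"high": [10 cues], "low": [10 cues],
                       "responses": [20 response words]}}
    rng:     a random.Random instance, so a subject's lists are reproducible
    """
    lists = []
    for corpus, words in stimuli.items():
        # ASSUMPTION: responses are drawn from the same corpus as the cues.
        responses = list(words["responses"])
        rng.shuffle(responses)
        pairs = {}
        for level in ("high", "low"):
            cues = list(words[level])
            rng.shuffle(cues)
            # Each cue gets a different response word (pop never repeats one).
            pairs[level] = [(cue, level, responses.pop()) for cue in cues]
        for half in range(2):
            block = (pairs["high"][half * 5:(half + 1) * 5]
                     + pairs["low"][half * 5:(half + 1) * 5])
            rng.shuffle(block)
            lists.append({"corpus": corpus, "pairs": block})
    rng.shuffle(lists)  # stand-in for counterbalancing, which is not modeled here
    return lists


# Placeholder stimuli (hypothetical labels, not the real word lists).
demo_stimuli = {
    corpus: {
        "high": [f"{corpus}_high_{i}" for i in range(10)],
        "low": [f"{corpus}_low_{i}" for i in range(10)],
        "responses": [f"{corpus}_resp_{i}" for i in range(20)],
    }
    for corpus in ("young_adult", "general_fiction")
}
demo_lists = build_lists(demo_stimuli, random.Random(0))
```

Because each list is built within one corpus, pairs from the two corpora never share a list, matching the constraint stated above.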


The experiment was implemented using jsPsych (de Leeuw, 2015), and data collection was managed by JATOS (Lange, Kühn, & Filevich, 2015). There were four experimental trials, each containing a learning phase and a test phase for one word-pair list. During the learning phase, each word pair in a list appeared twice in the center of the screen in random order. Each word pair stayed on screen for 1.5 seconds. A cross was displayed between word pairs for 1 second. Participants were instructed to memorize all the word pairs. Immediately after the learning phase, there was a cued recall test, in which participants were provided with one cue word at a time in randomized order and were asked to enter the other word in that pair. Before the experimental trials, each participant also completed a practice trial with five word pairs selected from the original Wechsler Memory Scale (Wechsler, 1945) to familiarize them with the task.


There was no performance difference between the word pairs from the young adult corpus and those from the general fiction corpus, so the data from the two sets of word lists were collapsed and analyzed together.

We fit generalized linear mixed-effects models of the effects of age (younger vs. older) and level of semantic diversity (low vs. high) on the accuracy of paired-associate learning using the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) in R (R Core Team, 2018). Age and level of semantic diversity were coded with contrast coding (1, −1), and were included as fixed effects, both as main effects and as an interaction. We chose the maximal random-effects structure justified by the data that would converge (Barr, Levy, Scheepers, & Tily, 2013). Specifically, participants were included as a random intercept and a random slope by semantic diversity uncorrelated with the random intercept. Items were included as a random intercept only. The complete model was GLMM_ACC = glmer(Accuracy ~ Age * SD + (1 | Participants) + (0 + SD | Participants) + (1 | Item), family = "binomial", data = PAL_data). Significance was assessed via model comparison with an alpha of 0.05.

The data are displayed in Fig. 2. A significant effect of age was found, such that the accuracy of older participants was lower than that of younger participants, β = 0.22, SE = 0.10, χ²(1) = 4.84, p < .05. Importantly, there was a significant interaction of age and semantic diversity, β = −0.11, SE = 0.04, χ²(1) = 8.03, p < .01. For older adults, there was a significant effect of semantic diversity, β = 0.21, SE = 0.06, χ²(1) = 11.32, p < .001, meaning that their accuracy was lower in the high SD condition than in the low SD condition. However, the effect of SD was not significant within younger participants, β = −0.01, SE = 0.07, χ²(1) = 0.02, p = .88. More importantly, when comparing the two age groups, we found that there was no significant performance difference in the low semantic diversity condition, β = 0.09, SE = 0.10, χ²(1) = 0.86, p = .35. The significant effect of age was driven by the worse performance of older adults in the high semantic diversity condition, β = 0.33, SE = 0.11, χ²(1) = 8.79, p < .01.
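As a rough consistency check on the estimates above, the following Python sketch shows how, under (1, −1) contrast coding, the age effect within one SD level follows from the main-effect and interaction coefficients: the linear predictor is β0 + βAge·Age + βSD·SD + βAge×SD·Age·SD, so the age effect at a given SD code is βAge + βAge×SD·SD. The direction of the coding is not stated in the text; the sketch assumes high SD = −1 and low SD = +1 only because that choice reproduces the reported signs. The function name `simple_effect` is hypothetical.

```python
def simple_effect(main_coef, interaction_coef, other_code):
    """Effect of one +/-1 coded factor at a fixed level of the other factor."""
    return main_coef + interaction_coef * other_code


# Coefficients as reported for the full model (log-odds scale).
BETA_AGE = 0.22
BETA_AGE_X_SD = -0.11

# ASSUMPTION: coding direction inferred from the reported signs, not stated.
HIGH_SD, LOW_SD = -1, +1

age_effect_high_sd = simple_effect(BETA_AGE, BETA_AGE_X_SD, HIGH_SD)
age_effect_low_sd = simple_effect(BETA_AGE, BETA_AGE_X_SD, LOW_SD)
```

Under these assumptions the implied age effect is 0.33 in the high SD condition and 0.11 in the low SD condition, close to the reported 0.33 and 0.09. The small gap in the low SD value would be expected if the reported simple effects were estimated in separately fitted follow-up models, which is a guess on our part rather than something the text confirms.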

Fig. 2

Results of the verbal PAL task for older and younger adults with cue words that were either high or low in semantic diversity. The data indicate that older adults have greater difficulty forming new associations to high semantic diversity words. Error bars show standard errors.


The goal of this article was to further evaluate the information accumulation perspective of aging using paired-associate learning, building on the work of Ramscar et al. (2017). This perspective proposes that it is the continual accumulation of experience in memory that leads older adults to have worse performance across a variety of psychometric tests (Ramscar et al., 2014). Ramscar et al. (2017) explicitly tested this hypothesis using age-matched bilinguals and monolinguals. The goal of the current study was to remove the potential confound of executive functioning differences between monolinguals and bilinguals by using a within-subjects design. This was accomplished by selecting words that have either high or low levels of association to other words, but equivalent word frequency, through the use of semantic diversity (Jones et al., 2012). In a simulation study, it was shown that words high in semantic diversity had a greater level of association to other words, compared with words that had low semantic diversity (see Fig. 1). It was found that even though older adults had lower overall accuracy, the decrease in performance was driven by the high SD words. The performance of older adults did not differ from that of younger adults in the low SD condition. This finding suggests that it is the greater number of associations that older adults have acquired through experience that causes their poorer performance on a PAL task.

The results of this paper show the power and promise of using corpus-based models to better understand trends in development and aging, joining a growing body of literature (e.g., Hills et al., 2010; Hsiao & Nation, 2018; Johns et al., 2018; Taler, Johns, & Jones, 2019). Corpus-based models allow for a quantification of the lexical information that people have been exposed to at various stages of aging. Mapping model outputs to human behavior provides insights into how experience shapes cognition. When combined with large-scale data collection designed to estimate the quantity and type of lexical experience (e.g., Brysbaert, Stevens, Mandera, & Keuleers, 2016), these models provide a promising pathway for the development of more realistic cognitive models of aging.

The results reported in this article, as well as the work of Ramscar et al. (2014, 2017), suggest that the behavioral decrements seen in older adults are not necessarily due to structural changes in their cognitive system, but instead reflect differences in the informational difficulty of the tasks that older adults face. In a PAL task, the buildup of word associations within the lexicon impairs the ability of older adults to form arbitrary new associations. Corpus-based modeling makes it possible to estimate the buildup of lexical information across the life span, enabling a better understanding of the impact of information accumulation on older adults.

Open practices statement

The data for this experiment are available at