Age of acquisition estimates for 3,000 disyllabic words

Over the course of a lifetime, people develop a large vocabulary of many thousands of words. Age of acquisition (AoA, i.e., the age at which a word was learned) is a variable that documents this process. Many studies have shown that AoA is related to performance on a variety of tasks including picture naming (Carroll & White, 1973), recognition memory (Cortese, Khanna, & Hacker, 2010), reading aloud and lexical decision (Cortese & Khanna, 2007), and more (for a review, see Juhasz, 2005). The AoA effect in reading aloud and lexical decision remains significant even when frequency is controlled (Brysbaert & Cortese, 2011). Overall, there are very few variables that relate to word and picture processing performance as consistently as does AoA.

There has been some controversy as to the locus of AoA effects. Juhasz (2005) reviewed the results of picture naming, word naming, and lexical decision experiments and found the largest effects of AoA in picture naming tasks, followed by lexical decision, then word naming. One interpretation of this pattern is that AoA has primarily a semantic basis because tasks that place a primary emphasis on semantic information (e.g., picture naming and lexical decision) show larger effects of AoA than tasks that rely less on semantic information (e.g., naming). This pattern of results is consistent with both the semantic locus hypothesis (Steyvers & Tenenbaum, 2005) and the network plasticity hypothesis (Ellis & Lambon Ralph, 2000). The semantic locus hypothesis proposes that earlier acquired concepts provide a structure onto which later acquired words associate. Thus, early AoA concepts have more connections from other concepts than later acquired concepts. Furthermore, words with more connections are more easily retrieved, resulting in an effect of AoA. The network plasticity hypothesis predicts an effect of AoA based on how connectionist models learn. As new words are presented to a model, the resulting changes in connection weights are not constant over time. Rather, those items learned earlier result in larger changes to the connection weights than those learned later. Over time, the model loses plasticity, resulting in less efficient learning of words presented later. This cost for later acquired concepts occurs especially when the relationship between inputs and outputs is not consistent (Ellis & Lambon Ralph, 2000). For example, balk may be acquired relatively late, but the computation of its phonological code may benefit from orthographic-phonological knowledge of earlier acquired words that share a similar orthographic-to-phonological mapping (e.g., talk, walk, and chalk). In contrast, an inconsistent word such as pint will not benefit from previous knowledge of other int words (e.g., mint, hint, and lint) because they are pronounced differently.

Also controversial is the separation of the effects of AoA and imageability in predicting reading aloud performance because imageability and AoA are moderately correlated (Cortese & Khanna, 2007; Schock, Cortese, & Khanna, in press). Imageability has been found to significantly predict reading aloud performance by several researchers (e.g., Balota, Cortese, Sergent-Marshall, Spieler, & Yap, 2004) using a large-scale multiple regression method. However, when Cortese and Khanna (2007) added AoA to the set of predictor variables, the effect of imageability was no longer significant. In addition, Monaghan and Ellis (2002) reported that, when AoA was controlled, the interaction between frequency, consistency, and imageability reported by Strain, Patterson, and Seidenberg (1995) was no longer significant. So, it has been suggested that the observed effect of imageability on reading aloud performance may actually be due to a failure to adequately control for AoA (Monaghan & Ellis, 2002; Ellis & Monaghan, 2002).

In addition, partially due to the limited availability of AoA estimates, it has been somewhat difficult to separate AoA effects from word frequency effects as earlier acquired words tend to be associated with higher frequency values than later acquired words (see, e.g., Zevin & Seidenberg, 2004). As AoA estimates are becoming more readily available, the picture that emerges is that AoA influences word processing even after word frequency has been well controlled (see Brysbaert & Cortese, 2011).

Word processing studies have traditionally used factorial designs with small sets of items; however, flaws in this approach (see Balota et al., 2004) have prompted many researchers to move to a mega study approach (Balota et al., 2004; Cortese & Khanna, 2007; Chateau & Jared, 2003; New, Ferrand, Pallier, & Brysbaert, 2006; Yap & Balota, 2009). As noted by Balota et al. (2004), some of the shortcomings of the factorial approach include dichotomizing continuous variables (see, e.g., Humphreys, 1978), difficulty in controlling for all of the relevant variables (also see Cutler, 1981), problems associated with stimulus selection (Forster, 2000), and an overemphasis on determining statistical significance at the expense of assessing the relative influence of factors. The mega study (for a review, see Balota, Yap, Hutchison, & Cortese, in press) typically employs a large number of trials, and multiple regression analyses are conducted to examine the influence of predictor variables on performance measures. For example, the English Lexicon Project (ELP; Balota et al., 2007) provides reaction time estimates in the reading aloud and lexical decision tasks for over 40,000 English words. While AoA estimates are available for large numbers of monosyllabic words (see Cortese & Khanna, 2008), AoA ratings do not exist for most of the polysyllabic words in the ELP. For example, we now have obtained AoA and imageability ratings for 3,000 monosyllabic words and 3,000 disyllabic words. Of this corpus of 6,000 words, only 3.3% (2.3% of the disyllabic corpus) are represented in the Morrison, Chappell, and Ellis (1997) norms, 20.2% (14.5% of the disyllabic corpus) are represented in the Bird, Franklin, and Howard (2001) norms, and 20.0% (17.8% of the disyllabic corpus) are represented in the Stadthagen-Gonzalez and Davis (2006) norms. So, as word processing research turns more to the study of polysyllabic words, and the mega study approach increases in frequency, normative data for large sets of polysyllabic words will be required.

The current study provides AoA estimates for 3,000 disyllabic words. The procedures used to collect these estimates were very similar to those used by Cortese and Khanna (2008). It is expected that these norms will be useful to those who are analyzing performance in the ELP or those whose studies require a large number of disyllabic words.

Method

Participants

Thirty-two students enrolled in undergraduate psychology courses at the University of Nebraska Omaha (28) and Creighton University (4) participated for course credit or extra credit. The participants ranged in age from 17 to 40 (M = 20.69). Twenty-six participants were female, and six were male. Their education level ranged from first year of college to fourth year of college (M = 1.9). Based on free response, there were 28 Caucasian participants, two Asian participants, one Black participant and one Hindu participant.

Stimuli

The stimuli were 3,000 disyllabic words. Stimulus characteristics for these words are presented in Table 1. Table 2 provides a correlation matrix of the relationship among AoA and other semantic variables for a relatively large number of disyllabic words (N = 2,792). For a smaller set of words (N = 302) that were in common with the Bennett, Burnett, Siakaluk, and Pexman (2011) norms, we computed the correlation between AoA and body-object interaction (r = -.06, N.S.). In addition, for a small number of words in common with Bradley and Lang’s (1999) corpus (N = 160), we computed correlations between AoA and valence (r = -.03, N.S.) and between AoA and arousal (r = -.17, p = .03). The words examined here were mainly monomorphemic, but very common multimorphemic words were also included (e.g., awesome, baseball, bathroom). They ranged in frequency from zero to 71.21 occurrences per million, according to the Zeno et al. (1995) norms, and ranged in length from three to 11 letters. We began with 23,365 disyllabic words and narrowed this list to 3,000, as our previous research has indicated this is the number of items that can be rated in approximately 4 hours. To reduce the number of items, the list of 23,365 was divided among seven undergraduate research assistants, each of whom reviewed a section of the list and selected the words (s)he knew. All words that were not familiar to the undergraduate assistants were eliminated, leaving 15,434 words. We proceeded this way to reduce the number of words and to include mostly words that undergraduates know. To reduce the sample further, many multimorphemic words were eliminated, leaving a representative, but not exhaustive, list of disyllabic words.

Table 1 Stimulus characteristics of the 3,000 disyllabic words used in the current study
Table 2 Correlation matrix for AoA and semantic variables

Procedure

The procedures of Cortese and Khanna (2008) were followed as closely as possible. A computer was used to collect ratings in a laboratory. Participants were asked to rate words on a scale of one to seven based on their subjective estimate of how early the word was acquired (the instructions appear in the Appendix). Two sessions of 1.25 to 2.00 hours were conducted within one week of each other. Each session consisted of four blocks of 375 words each, for a total of 1,500 words per session. At the end of each block, the participant was given the opportunity to take a break. Stimuli were presented in a different random order for each participant.

Each trial consisted of a word being presented in lowercase letters in the center of the screen, while the rating scale was visible at the bottom of the screen. The ratings were entered using the number keypad on the right side of the keyboard. Responses that were faster than 500 ms were followed by the message “response too fast – slow down!” on the bottom of the screen. After 2000 ms the word then reappeared on the screen to be rerated. This delay was intended to discourage the participant from again responding too quickly. Responses that were not numbers between 1 and 7 were followed by the message “response invalid – try again” at the bottom of the screen for 2000 ms, after which the word reappeared on the screen to be rerated. The instructions and scale were the same as those given to participants in Cortese and Khanna (2008) for monosyllabic words, with two exceptions (see Appendix). First, “single syllable” was changed to “two syllable.” In addition, participants were instructed that whenever they noticed that a word had more than one meaning, they should provide the estimate for the meaning that was acquired first. This modification was based on work by Khanna and Cortese (2011) that examined AoA for ambiguous words.

Analyses, results and discussion

Data were collected from 32 participants. Participants were not individually monitored during the collection of data and so a method was needed for ensuring that each participant took the task seriously (i.e., did not make ratings by pressing buttons without consideration). We screened the data using the following procedure. The overall mean for each item was calculated by averaging the ratings given to that word by each of the 32 participants. Next, the correlation between each participant’s ratings and the overall means of all 3,000 words was calculated. From this set of correlation coefficients, a mean correlation (.677) and standard deviation (.137) were established. One participant, whose correlation coefficient was more than 2 standard deviations below the mean, was eliminated. The rationale for eliminating this participant’s responses was that either the participant did not take the task seriously, did not understand the instructions, or was not representative of the population we were interested in sampling. In fact, this participant’s correlation coefficient, r = .05, was 4.58 standard deviations below the overall mean. We then collected responses from an additional participant. Thus, the estimates provided in the current study were derived from 32 people. Of those 32 participants who remained, the correlation coefficients ranged from .33 to .83.
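The screening rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' script: the function names (`pearson_r`, `screen_raters`), the toy ratings, and the use of the sample standard deviation are all assumptions, since the text does not say which SD formula was used. The last line simply re-checks the reported distance of the excluded participant from the mean, (.05 - .677) / .137.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_raters(ratings, cutoff_sd=2.0):
    """ratings[p][i] is participant p's rating of item i.

    Returns each participant's correlation with the item means, the mean and
    SD of those correlations, and the indices of participants whose
    correlation is more than `cutoff_sd` SDs below the mean.
    """
    n_items = len(ratings[0])
    item_means = [mean(r[i] for r in ratings) for i in range(n_items)]
    rs = [pearson_r(r, item_means) for r in ratings]
    m, sd = mean(rs), stdev(rs)  # sample SD: an assumption, not from the text
    flagged = [p for p, r in enumerate(rs) if r < m - cutoff_sd * sd]
    return rs, m, sd, flagged


# Hypothetical toy data: nine raters who track a shared pattern, plus one
# rater (index 9) who alternates between two keys regardless of the word.
pattern = [(i % 5) + 2 for i in range(20)]
toy = [[v + ((i + k) % 3) - 1 for i, v in enumerate(pattern)] for k in range(9)]
toy.append([4 if i % 2 == 0 else 3 for i in range(20)])
rs, m, sd, flagged = screen_raters(toy)
print(flagged)

# Re-check of the reported figure for the excluded participant.
z_excluded = (0.05 - 0.677) / 0.137
print(round(z_excluded, 2))
```

Note that in this sketch the item means include the rater being evaluated, which matches the description above (the means were computed over all 32 participants before screening).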

To establish content validity, we calculated correlations between those items common to our data set and three others: Stadthagen-Gonzalez and Davis (2006; r = .84, N = 535), Morrison et al. (1997; r = .72, N = 68), and Bird et al. (2001; r = .78, N = 436; see Fig. 1). Interrater reliability was assessed via Cronbach’s alpha (α = .962). The Spearman-Brown coefficient was .965.
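For readers who want to compute comparable reliability indices on their own rating data, the following is a minimal sketch using the standard formulas. It treats each rater as an "item" in Cronbach's alpha and uses the two-half form of the Spearman-Brown correction. Both choices, the function names, and the toy ratings are assumptions; the text does not specify how the reported coefficients were computed.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """ratings[p][i] is rater p's rating of word i; raters play the role of items."""
    k = len(ratings)
    n_words = len(ratings[0])
    rater_vars = [variance(r) for r in ratings]
    totals = [sum(r[i] for r in ratings) for i in range(n_words)]
    return (k / (k - 1)) * (1 - sum(rater_vars) / variance(totals))


def spearman_brown(r_half):
    """Step a split-half correlation up to the full set of raters."""
    return 2 * r_half / (1 + r_half)


# Hypothetical toy data: two raters, four words.
toy = [[1, 2, 3, 4], [2, 1, 4, 3]]
alpha = cronbach_alpha(toy)
print(alpha)
print(spearman_brown(0.6))  # 0.6 is the correlation between the two toy raters
```

With two raters of equal variance, the two indices coincide, which is why the toy example yields the same value from both functions.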

Fig. 1 Scatterplots of the relationships between the AoA ratings obtained in the current study and those from Stadthagen-Gonzalez and Davis (2006), Morrison, Chappell, and Ellis (1997), and Bird, Franklin, and Howard (2001)

Combining the AoA estimates obtained in the present study with the 3,000 estimates reported by Cortese and Khanna (2008) provides researchers with AoA estimates for 6,000 words obtained via very similar procedures. We were interested in examining the relationship between the AoA value of a word and the number of semantic associates that generate that word in word association, as well as the relationship between the AoA value of a word and the number of semantic associates that the word itself produces. We assessed these relationships using our AoA estimates and the word association norms of Nelson, McEvoy, and Schreiber (2004). We found that 3,055 (1,716 of which were monosyllabic and 1,339 of which were disyllabic) of the 6,000 words that our participants and Cortese and Khanna’s (2008) participants had rated for AoA appeared both as a cue and as an associate produced by at least one other word in the Nelson et al. (2004) norms. We found that, relative to later acquired words, earlier acquired words have a relatively large number of semantic associates that produce them in word association (r = -.474, p < .001). This relationship is consistent with the idea that later acquired words are learned via association to earlier acquired words. Furthermore, the relationship was log-linear; the correlation coefficient increased in magnitude when the number of semantic associates variable was log-transformed (r = -.619, p < .001; see Fig. 2). In contrast, the AoA value of a word is not related to the number of associates it produces in word association (r = .026, p = .157). These results are remarkably similar to the relationships reported by De Deyne and Storms (2008) in Dutch. Based on analyses involving 1,117 Dutch words, De Deyne and Storms reported correlations of r = -.61 (for a word’s AoA and the frequency of being produced as an associate of other words) and r = .03 (for a word’s AoA and the number of associates it produces). We note that these values are also based on log-transformed values for the number of semantic associate variables.
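The effect of the log transformation on the correlation can be illustrated with a small Python sketch. The values below are invented so that AoA falls linearly with the logarithm of the number of cues that produce a word; they are not taken from the norms, and `pearson_r` is a helper defined here for the example. Because every word in the analysis was produced by at least one cue, the counts are at least 1 and the logarithm is defined.

```python
from math import log, sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: number of cues that produce each word, and its AoA rating.
n_cues = [1, 2, 4, 8, 16, 32]
aoa = [6, 5, 4, 3, 2, 1]

r_raw = pearson_r(aoa, n_cues)                    # raw counts
r_log = pearson_r(aoa, [log(n) for n in n_cues])  # log-transformed counts
print(r_raw, r_log)
```

The base of the logarithm does not matter here, because a change of base only rescales the variable and leaves the correlation unchanged.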

Fig. 2 Scatterplot of the relationship between the age of acquisition of the target and the number of associates that produce the target in free association

Speed of response was not emphasized in our instructions. However, reaction time data were collected. The average reaction time across all items was 2007.11 ms. Average reaction times for individual items ranged from 1140.25 ms to 6727.27 ms. Reaction time was negatively correlated with rating (r = -.141), such that words acquired earlier in life took longer to rate.

Perhaps the greatest value of these AoA ratings is that they can be used in conjunction with the imageability ratings that we have obtained previously to assess independent contributions of AoA and imageability to reading aloud and lexical decision performance for monosyllabic and disyllabic words. Previously, in their analyses of monosyllabic words, Cortese and Khanna (2007) reported that AoA accounted for unique variance in reading aloud and lexical decision reaction times, whereas imageability’s effect was limited to lexical decision (although it did account for unique variance in the accuracy of reading aloud). More recently, Schock, Cortese, and Yap (2011) looked at these relationships in 1,937 disyllabic words. More specifically, using the current set of AoA estimates and imageability estimates from Schock et al. (in press), Schock et al. (2011) entered AoA in Step 8 of a hierarchical regression analysis, after initial phoneme characteristics and numerous sublexical and lexical variables were controlled. In fact, Schock et al. examined imageability and AoA while controlling for all of the variables examined in the recent Yap and Balota (2009) studies. Schock et al. found that AoA and imageability each accounted for unique variance in reading aloud and lexical decision reaction times. Interestingly, the effects of AoA and imageability were more similar across reading aloud and lexical decision tasks than they were across these same tasks for monosyllabic words. In contrast, Cortese and Khanna (2007) found that AoA and imageability effects were much stronger in lexical decision than reading aloud. We also note that while AoA accounted for unique variance in reading aloud and lexical decision reaction times for monosyllabic words (see, e.g., Brysbaert & Cortese, 2011; Cortese & Khanna, 2007), imageability did not account for unique variance in reading aloud reaction times for monosyllabic words when AoA was controlled (Cortese & Khanna, 2007).
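The logic of entering AoA in a late step of a hierarchical regression can be sketched as a comparison of R² with and without the new predictor. The code below is a simplified illustration only: it uses a single hypothetical control variable (word length) rather than the many sublexical and lexical variables controlled by Schock et al. (2011), the data are invented, and the helper names (`solve_linear`, `r_squared`, `delta_r_squared`) are not from any published analysis script.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 from ordinary least squares of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in p] for p in predictors]
    k = len(cols)
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    y_bar = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def delta_r_squared(control_cols, new_col, y):
    """R^2 for the control-only model, the full model, and the increment."""
    base = r_squared(control_cols, y)
    full = r_squared(control_cols + [new_col], y)
    return base, full, full - base


# Hypothetical data: reaction time built from length and AoA with no noise.
length = [3, 4, 5, 6, 7, 8]
aoa = [2, 5, 1, 6, 3, 4]
rt = [500 + 10 * l + 20 * a for l, a in zip(length, aoa)]

base, full, delta = delta_r_squared([length], aoa, rt)
print(base, full, delta)
```

The increment `delta` is the share of variance in the outcome that the new predictor explains beyond the controls, which is what "unique variance" refers to in this kind of analysis.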

We maintain the idea that AoA has primarily a semantic basis, which is consistent with the aforementioned semantic locus hypothesis (i.e., AoA affects the structure of semantic associations) and the network plasticity hypothesis (i.e., AoA effects emerge more strongly when associations between inputs and outputs are more arbitrary, as they are between orthography/phonology and semantics). In addition, we hypothesize that the difference in results for imageability between monosyllabic and disyllabic reading aloud reaction times reflects the idea that disyllabic words take longer to process and allow more time for semantic information to influence processing. Specifically, Yap and Balota (2009) reported that reaction time increases with number of syllables (r = .44, p < .001).

In addition, these norms combined with our previously published norms will be useful for researchers interested in further examining effects of AoA in monosyllabic and disyllabic words. Given that Schock et al. (2011) found that AoA and imageability account for unique variance in reading aloud and lexical decision performance, it will be important to control for the influences of these variables. Also, given that AoA values now exist for a very large set of words, it will be easier to distinguish effects of AoA from other factors as well (e.g., word frequency).