Participants
Thirty college students (16 female) participated in Experiment 1. All were right-handed native English speakers with normal or corrected-to-normal vision, aged 18 to 23 years (mean = 18.59). None had been diagnosed with a learning disability. Six additional participants who did not attend all three sessions were excluded from data analysis. Participants provided written informed consent before the experiment and received course credit for their time. All experimental procedures were approved by the local institutional review board.
Stimuli
Trained words and meaning probes
Forty high frequency words (above 30 per million) and 40 low frequency words (below 1 per million) were selected from the SUBTL database of Brysbaert and New (2009; see Table 1 for examples). Twenty-three undergraduates from the same subject pool, who did not participate in any other part of the study, rated their familiarity with each word on a scale from 1 (unfamiliar) to 6 (familiar). The ratings were higher for high frequency words than for low frequency words (p < .001). In addition, high frequency words tended to be acquired earlier than low frequency words (Kuperman, Stadthagen-Gonzalez, & Brysbaert, 2012). Each of the selected words had only one meaning but could have more than one sense according to the Wordsmyth English Dictionary–Thesaurus (Parks, Ray, & Bland, 1998). High and low frequency words were matched on concreteness (Brysbaert, Warriner, & Kuperman, 2013) and number of senses (Parks et al., 1998), in addition to number of syllables, word length, and bigram frequency (Balota et al., 2007). Lexical and sublexical characteristics of the selected words are presented in Table 2. For each participant, half of the high and half of the low frequency words were paired with new meanings, and the other half were presented as exposure controls (see Appendix for the full list). The assignment of words to the two learning conditions was counterbalanced across participants.
Table 1 Examples of stimuli

Table 2 Lexical and sublexical characteristics of trained words

For each trained word, three probes that were semantically related to its original meaning were created, and one was used in each of the three semantic relatedness judgment tasks (one per session). Another 17 native English speakers who did not participate in any other part of the study rated the semantic relatedness between trained words and their probes on a scale of 1 (unrelated) to 6 (related). The ratings showed that the semantic relatedness between trained words and probes was comparable for high and low frequency words on each of the three days (overall, high frequency words = 4.802 ± .441, low frequency words = 4.768 ± .518; all ps > .37). Additionally, meaning probes of high and low frequency words were matched on number of letters (ps > .42) and number of syllables (ps > .20), although probes of high frequency words had higher word frequency than those of low frequency words (ps < .08; see Table S1 for means and SDs).
New meanings
Forty new meanings were taken from a previous study (Fang et al., 2017). These new definitions were created to allow realistic conceptual mappings but with no overlap with existing words. The pairing between trained words and definitions was counterbalanced across participants such that each definition was paired with a high frequency word for half of the participants and with a low frequency word for the other half (see Appendix for all the pairings). To assess any inadvertent relation between the new meaning of a word and its actual meaning, we carried out a term-to-document Latent Semantic Analysis (LSA, http://lsa.colorado.edu), calculating the cosine of the angle between the resultant semantic spaces (Landauer & Dumais, 1997). The results showed very low LSA cosine values for both high and low frequency words, with means of 0.003 (SD = 0.047) and 0.006 (SD = 0.054), respectively, indicating that the new meanings were not related to the original meanings.
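The relatedness check rests on a cosine similarity between vector representations of the original and new meanings. The sketch below illustrates the computation only; the five-dimensional vectors are hypothetical stand-ins, not vectors from the actual LSA space at lsa.colorado.edu.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two semantic vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical term vectors for a word's original meaning and its
# newly assigned definition (dimensions are arbitrary for illustration).
original = np.array([0.9, 0.1, 0.0, 0.2, 0.0])
new_def = np.array([0.0, 0.2, 0.8, 0.0, 0.3])

sim = cosine_similarity(original, new_def)  # near zero, like the reported LSA values
```

Values near 0 indicate unrelated meanings, consistent with the low cosines (0.003 and 0.006) reported above.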
Procedure
As shown in Fig. 1, participants learned new meanings for high and low frequency words in an associative learning paradigm on Day 1, and were tested on both the new and original meanings on each of the three days.
Learning phase
Participants had six learning trials for each word in a self-paced learning paradigm. Each trial began with a fixation for 500 ms, immediately followed by a visual presentation of a to-be-learned (trained) word. When participants were ready to learn the meaning, they pressed the space bar, which caused the appearance of either a definition (for words with new meanings) or a string of asterisks (for exposure controls). Participants pressed the space bar when they were ready for the next word. Each word was presented exactly once within a complete cycle of 80 individual trials, before the next cycle began. To facilitate learning, after the second (two learning trials for each word) and fourth cycles (four learning trials for each word), participants were tested on the new meanings of half the words from each condition by a meaning generation test (described below).
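The cycle structure of the learning phase can be sketched as a simple scheduling routine. This is an illustrative reconstruction, not the authors' experiment code; for brevity it omits the detail that each embedded test covered only half the words from each condition.

```python
import random

def build_learning_schedule(words, n_cycles=6, test_after=(2, 4)):
    """Build a schedule of learning cycles with embedded tests.

    Each cycle presents every word exactly once in a fresh random order;
    a meaning generation test follows the second and fourth cycles.
    """
    schedule = []
    for cycle in range(1, n_cycles + 1):
        order = random.sample(words, len(words))  # one trial per word per cycle
        schedule.append((cycle, order))
        if cycle in test_after:
            schedule.append((cycle, "meaning generation test"))
    return schedule

schedule = build_learning_schedule(list(range(80)))  # 80 trained words
```

With six cycles of 80 trials each plus two embedded tests, this reproduces the six learning trials per word described above.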
Tests on new meaning: meaning generation tests
Participants were presented with a trained word and asked to type its new meaning, or to type “n” (for “none”) if the word did not have a new meaning (i.e., exposure controls). To promote learning, immediate correct-meaning feedback was presented in the two tests during the learning phase; all the post-learning tests, including the last test on Day 1 and the tests on subsequent days, occurred without feedback. Participants’ responses were rated from 0 (no response or no related information provided) to 5 (the exact meaning provided), based on how close they were to the correct answers, by two trained research assistants who were blind to condition. The two raters’ scores were averaged to give final scores; when the two ratings differed by more than 1 point, the inconsistency was resolved through discussion before a final score was assigned. Only data from the post-learning tests are reported here.
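The scoring rule can be summarized as a small function. The 0-5 scale, averaging, and discussion threshold follow the text; returning `None` as a flag for discussion is an illustrative assumption.

```python
def combine_ratings(r1, r2):
    """Combine two raters' 0-5 scores for one response.

    Scores are averaged; a difference larger than 1 point flags the
    response for discussion between raters (signaled here by None).
    """
    if abs(r1 - r2) > 1:
        return None  # resolve through discussion before assigning a score
    return (r1 + r2) / 2
```

For example, ratings of 4 and 5 yield a final score of 4.5, whereas ratings of 2 and 4 would be flagged for discussion.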
Tests on new meaning: multiple-choice tests
In each trial, participants were presented with a trained word for 500 ms, followed by four options: three new meanings as the first three options (for a word that had been paired with a new meaning, two of the three had been paired with other words) and a string of asterisks as the fourth option. Participants’ task was to choose the correct meaning by pressing one of four number keys (1 to 4). Feedback indicating correctness and the correct answer was always presented after the response. Participants could study the word again and pressed the space bar to continue to the next word when they were ready. Accuracy was recorded.
Tests on original meanings: semantic relatedness judgment tasks
Participants were shown pairs of words with only one word on the screen at a time: The first word was a trained word; the second (probe) word was either related or unrelated to the original meaning of the trained word. The first word was presented for 500 ms, followed immediately (ISI = 0) by the second word. Participants’ task was to judge whether or not the two words were related based on the original meaning of the first word by pressing keys with the right or left index finger. The next trial began immediately following the response. There were 80 related word pairs and 80 unrelated word pairs in each task. The unrelated word pairs were created by shuffling the related word pairs. Participants familiarized themselves with the task through a short practice block, in which they made judgments on ten word pairs that did not include any trained words or probes. Participants performed this task first on both Day 2 and Day 8 to minimize the influence of recent retrieval of new meanings.
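One way to derive unrelated pairs by shuffling the related pairs is to re-pair each trained word with a probe from a different pair. The cyclic shift below is one simple scheme that guarantees no word keeps its own probe; it is an illustrative assumption, not necessarily the authors' exact procedure, and the word-probe pairs shown are invented examples.

```python
def make_unrelated_pairs(related_pairs):
    """Re-pair trained words with probes drawn from other pairs.

    A cyclic shift of the probe list guarantees that no word is paired
    with its own (related) probe, yielding one unrelated pair per item.
    """
    words = [w for w, _ in related_pairs]
    probes = [p for _, p in related_pairs]
    shifted = probes[1:] + probes[:1]  # shift probes by one position
    return list(zip(words, shifted))

pairs = [("stair", "step"), ("ocean", "wave"), ("candle", "flame")]
unrelated = make_unrelated_pairs(pairs)
```

This keeps the two trial types matched on the identity of both words across the full list, so only the pairing (related vs. unrelated) differs.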
Statistical analysis
Data from each task were analyzed with mixed-effects models (Baayen, Davidson, & Bates, 2008) using the lme4 package in R. Accuracy data were analyzed with logistic mixed-effects regression; other measures were log-transformed and analyzed with linear mixed-effects regression. Random-effect terms included intercepts for subject and item (trained word); a by-subject or by-item slope was added if model comparisons showed a significant contribution and the models converged. Final models are reported.
For the meaning generation and multiple-choice tests, only responses to trained words with new meanings are of interest (data for exposure controls are presented in Fig. S1); fixed effects in the models included Frequency, Day, and their interaction. The fixed effects were treatment-coded: High frequency words provided the reference level of Frequency; the reference level of Day was Day 1, and both Day 2 and Day 8 were compared to Day 1. Thus, the intercepts in the models represented performance for high frequency words on Day 1. This coding allowed us to compare the effect of Frequency on Day 1 with our previous findings (Fang & Perfetti, 2017; Fang et al., 2017) and to examine the change of patterns over one day and one week with the smallest number of models. Following significant interactions, contrast analyses were conducted to reveal the Frequency effect on Day 2 or Day 8. To better characterize the decay of memory for new meanings, we defined the retention rate over one week as participants’ performance on Day 8 relative to their performance on Day 1. The retention rate was calculated for each test for each participant and compared across high and low frequency words using a paired t test.
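The retention-rate computation and paired comparison can be sketched as follows. The per-participant scores are hypothetical, and scipy's `ttest_rel` stands in for whatever software the authors used for the paired t test.

```python
import numpy as np
from scipy.stats import ttest_rel

def retention_rate(day1, day8):
    """Day 8 performance as a proportion of Day 1 performance."""
    return np.asarray(day8) / np.asarray(day1)

# Hypothetical per-participant meaning generation scores (0-5 scale)
day1_high, day8_high = np.array([4.0, 3.5, 4.2]), np.array([2.6, 2.3, 2.7])
day1_low, day8_low = np.array([4.4, 4.1, 4.6]), np.array([2.3, 2.1, 2.4])

rr_high = retention_rate(day1_high, day8_high)
rr_low = retention_rate(day1_low, day8_low)
t, p = ttest_rel(rr_high, rr_low)  # paired t test across participants
```

A positive t here corresponds to the reported pattern of higher retention for high than for low frequency words.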
For the semantic relatedness judgment tasks, decision times from related and unrelated trials were analyzed separately, because the related trials were more comparable to the contrast used to assess the perturbation effect in our previous study (Fang & Perfetti, 2017). Because participants made “related” responses with the dominant hand and “unrelated” responses with the non-dominant hand, relatedness effects themselves would not be easily interpreted. For completeness, the results of models including both related and unrelated conditions are reported in Supplementary Tables S3 and S4. In addition to incorrect trials, trials with decision times beyond 2.5 standard deviations from the mean or shorter than 200 ms were excluded before modeling (affecting 3.1% of the remaining data). Fixed effects included TrainingType, Frequency, Day, and their full interactions. The coding of Frequency and Day was the same as in the models for tests on new meanings. The reference level of TrainingType was exposure control. Following significant interactions involving TrainingType, contrast analyses were conducted to reveal the TrainingType effect on high or low frequency words on each day. Accuracy data for the task were also analyzed but are presented in Table S2, as they were of less interest.
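The decision-time trimming criteria can be sketched with numpy. The 2.5-SD and 200-ms cutoffs follow the text; computing the mean and SD after applying the floor, and over a single pooled set of trials rather than per condition, are illustrative assumptions.

```python
import numpy as np

def trim_decision_times(rts, sd_cutoff=2.5, floor_ms=200):
    """Drop RTs below the floor or beyond sd_cutoff SDs of the mean.

    The mean and SD are computed here after applying the 200-ms floor
    (an assumption; the paper does not specify the order of exclusions).
    """
    rts = np.asarray(rts, dtype=float)
    keep = rts >= floor_ms
    mean, sd = rts[keep].mean(), rts[keep].std(ddof=1)
    keep &= np.abs(rts - mean) <= sd_cutoff * sd
    return rts[keep]

trimmed = trim_decision_times([150, 480, 500, 520, 530, 540, 560, 580, 600, 1900])
```

In this example the 150-ms trial falls below the floor and the 1900-ms trial exceeds the 2.5-SD cutoff, so both are excluded.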
Results and discussion
Retention of new meanings
Results of the meaning generation and multiple-choice tests are shown in Fig. 2 and Table 3. In the meaning generation tests, participants learned the new meanings of low frequency words better than those of high frequency words on Day 1 (t = 2.234, p = .026). However, over one day and one week, memory for the new meanings of low frequency words decayed faster than memory for those of high frequency words (ps < .05 for both interactions between Day and Frequency); high and low frequency words did not differ on Day 2 or Day 8 (Day 2: t = .161, p = .872; Day 8: t = -1.61, p = .238). The retention rate for high frequency words (64.7 ± 21.4%) was higher than for low frequency words (52.1 ± 18.6%; t(29) = 4.300, p < .001).
Table 3 Fixed effect estimates for mixed effects models of learning performance in Experiment 1

Very similar patterns were found in the multiple-choice tests: Participants were better at recognizing the new meanings of low frequency words on Day 1 (z = 2.818, p = .005), but their memory for the new meanings of low frequency words decayed faster over one day and one week (ps < .01 for both interactions). The low frequency advantage was gone by Day 2 (p > .90) and reversed, at marginal significance, by Day 8 (z = -1.728, p = .084). Again, the retention rate was higher for high frequency words (87.3 ± 16.8%) than for low frequency words (77.8 ± 16.4%; t(33) = 3.413, p = .002).
The influence of learning on existing meanings
As shown in Fig. 3 and Table 4, Day 1 showed no perturbation effect; neither TrainingType nor its interaction with Frequency was significant in either related or unrelated trials (ps > .45). Nor was there evidence for perturbation on Day 2 or Day 8, as indicated by the nonsignificant two-way interactions between Day and TrainingType in either related or unrelated trials (ps > .27) and the nonsignificant three-way interactions between Day, TrainingType, and Frequency (ps > .10). Instead, on Day 8, decision times for low frequency words with new meanings were marginally faster than those for exposure controls in related trials (t = 1.731, p = .083).
Table 4 Fixed effect estimates for mixed effects models of the semantic relatedness judgment task in Experiment 1 (SOA = 500 ms)

Generally, the perturbation effect observed in Fang and Perfetti (2017) using ERPs during an implicit task was absent in the behavioral results when participants were instructed to make semantic judgments on the original meanings. Instead, we observed a trend toward facilitation one week after learning for low frequency words that had been paired with new meanings.
It is possible that perturbation is observable only during a narrow window of word processing when selective access to the original meaning is required. The SOA of 500 ms used in Experiment 1 may have exceeded this window, providing sufficient time to stabilize the representation of the original meaning or to suppress the activation of the newly learned meaning. The nonsignificant difference between high and low frequency exposure controls also suggested that the SOA was too long, especially given that participants had had multiple exposures to both high and low frequency words prior to the task. If this is the case, then a much shorter SOA should expose a perturbation effect on the word’s connection to its original meaning. Accordingly, Experiment 2 used a shorter SOA of 200 ms, a time window during which word processing shows widespread activation prior to specific meaning selection (Kintsch & Mross, 1985; Van Petten & Kutas, 1987). Experiment 2 also aimed to replicate the advantage for high frequency words in the long-term retention of new meanings.