Experiment 1 provided evidence that retrieval generally enhances retention of pairmates from a similar temporal context, and that, even when pairmates are far apart, there is a surprising delay-dependent switch between retrieval-induced forgetting and facilitation for semantically related information. The latter effect is consistent with the hypothesis that memory consolidation can strengthen associations between memories with shared elements (e.g., Lewis & Durrant, 2011). In Experiment 1, the sleep and no-sleep groups were tested at similar times of day, but the retention interval varied. In Experiment 2, we examined whether sleep could rescue untested items from competition even when the retention interval was held constant. To test this prediction, we compared two groups with a fixed 12-h delay between study and test; the sessions were scheduled so that one group was awake during the retention interval and the other group was able to sleep during it. Experiment 3 was a pre-registered replication of Experiment 2, using identical materials, design, and procedure (https://osf.io/8nzgb).
Method
Participants
Ninety-six students (71 participants identified as female, 24 participants identified as male, and one participant selected Other; Experiment 2) and 200 students (142 participants identified as female, 53 participants identified as male, and five participants selected Other; Experiment 3) from the University of California, Davis participated in exchange for partial course credit. All reported fluency in English and normal or corrected-to-normal vision. Participants were randomly assigned to either a "sleep" or a "wake" group (n = 48 participants/group in Experiment 2; n = 100 participants/group in Experiment 3). In Experiment 2, four participants in the sleep group and three participants in the wake group were excluded due to low accuracy during retrieval practice (more than three SDs below the mean), and seven participants in the wake group were excluded for taking naps between the two sessions. In Experiment 3, nine participants in the sleep group and seven participants in the wake group were excluded due to low accuracy during retrieval practice, and nine participants in the wake group were excluded for taking naps between the two sessions.
Because Experiment 3 was designed as a replication of Experiment 2, its sample size was determined by an a priori power analysis in G*Power (Faul et al., 2009) based on the smallest effect size observed in Experiment 2 (d = .32), with power (1 − β) set at .80 and α = .05. The analysis indicated that at least 79 participants were needed to detect this effect. Because approximately 20% of participants in the wake group were excluded in Experiment 2, we planned to run 100 participants per group in Experiment 3 to ensure that at least 79 participants per group would be included in the analyses. Both Experiment 2 and Experiment 3 were conducted online to ease scheduling of the 12-h delay.
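The sample-size arithmetic above can be roughly checked without G*Power. The sketch below assumes that the d = .32 effect was tested with a two-tailed paired (within-group) t-test, which the text does not state, and it uses a normal approximation with a small-sample correction rather than G*Power's exact noncentral-t computation, so it is a plausibility check, not the authors' calculation. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_paired_n(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a two-tailed one-sample/paired t-test (assumed design).

    Normal approximation plus the small-sample correction z_{1-alpha/2}^2 / 2.
    G*Power solves the exact noncentral-t problem instead, so results can
    differ slightly for other inputs.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    n = ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


def expected_retained(planned: int, exclusion_pct: int) -> int:
    """Participants left after excluding exclusion_pct percent (rounded down)."""
    return planned * (100 - exclusion_pct) // 100


n_required = approx_paired_n(0.32)       # smallest effect in Experiment 2
n_retained = expected_retained(100, 20)  # ~20% wake-group exclusions
print(n_required, n_retained, n_retained >= n_required)  # prints: 79 80 True
```

With 100 participants planned per group and a 20% exclusion rate, about 80 participants would remain, just above the 79 required.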
Participants reported no history of neurological or psychiatric disorders, other major medical issues, or use of medication known to interfere with sleep. Participants also reported regular sleep on the night before the study and, in the sleep group, between the two sessions; regular sleep was defined as going to bed no later than 2 am, waking up no later than 10 am, and getting at least 7 h of total sleep.
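The regular-sleep definition combines three thresholds. A minimal sketch of how such a screen could be applied to self-reports, assuming the survey records bedtime and wake time as timestamps plus a separate total-sleep estimate in hours; the function and its inputs are hypothetical and are not the authors' screening code.

```python
from datetime import datetime, time


def meets_sleep_criterion(bedtime: datetime, wake_time: datetime,
                          total_sleep_h: float) -> bool:
    """Hypothetical check of the 'regular sleep' rule (bed <= 2 am, wake <= 10 am, >= 7 h)."""
    if wake_time <= bedtime:
        raise ValueError("wake_time must be after bedtime")
    # Both cut-offs are anchored to the calendar day on which the participant woke,
    # so a 23:30 bedtime on the previous day counts as earlier than 2 am.
    bed_cutoff = datetime.combine(wake_time.date(), time(2, 0))
    wake_cutoff = datetime.combine(wake_time.date(), time(10, 0))
    return (bedtime <= bed_cutoff
            and wake_time <= wake_cutoff
            and total_sleep_h >= 7)


# Example inputs (illustrative only): bed 23:30, wake 07:45, 7.5 h of sleep.
print(meets_sleep_criterion(datetime(2021, 3, 1, 23, 30),
                            datetime(2021, 3, 2, 7, 45), 7.5))
```

Anchoring both cut-offs to the wake-up date is one way to handle bedtimes on either side of midnight; the source does not say how borderline reports were treated.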
Materials, design, and procedure
The materials and procedure were identical to those of Experiment 1 except for the following changes. In Experiment 1, retrieval-induced facilitation was observed in both the adjacent and the close conditions. To simplify the design, Experiments 2 and 3 therefore included only two levels of temporal distance: adjacent and far. The number of trials in each condition is presented in Table 1. Participants in the wake group were asked to complete the first session between 8 am and 12 pm, and participants in the sleep group were asked to complete the first session between 8 pm and 12 am. For both groups, participants were instructed to wait 12 h after the first session before completing the second session. At the beginning and end of each session, participants completed the Stanford Sleepiness Scale (Hoddes & Dement, 1972), which assesses state sleepiness/alertness on a scale from 1 (extremely alert) to 7 (very sleepy). An intervening activity survey was given at the beginning of the second session to screen out participants who took naps (wake group) or did not have sufficient sleep (sleep group) between the two sessions.
Results
Vigilance
Stanford Sleepiness Scale scores did not differ between the sleep and wake groups in Session 1 (Exp. 2: sleep mean = 2.32, wake mean = 2.41, t = .43, p = .67; Exp. 3: sleep mean = 2.53, wake mean = 2.37, t = 1.17, p = .24) or in Session 2 (Exp. 2: sleep mean = 2.47, wake mean = 2.67, t = .97, p = .34; Exp. 3: sleep mean = 2.38, wake mean = 2.42, t = .31, p = .76), suggesting that there were no time-of-day differences in sleepiness between groups.
In Experiment 2, during retrieval practice, participants correctly recalled 75% (SD = 18%) of trials in the first round and 85.4% (SD = 15%) of trials in the second round. In Experiment 3, participants correctly recalled 73% (SD = 20%) of trials in the first round and 84% (SD = 19%) of trials in the second round.
Tables 3 and 4 present the means and standard deviations for final test accuracy in different conditions in Experiment 2 and Experiment 3.
Table 3 Final recall accuracy (mean percent correct) for Control, Non-target and Target trials, and accuracy difference between Non-target and control trials as a function of temporal distance, semantic relatedness and delay in Experiment 2
Effects of retrieval practice on retention of non-targets
As in Experiment 1, our primary analyses focused on recall accuracy for the non-target and control trials on the final test. A 2 (Trial Type: non-target, control) × 2 (Temporal Distance: adjacent, far) × 2 (Semantic Relatedness: related, unrelated) × 2 (Group: sleep, wake) mixed ANOVA revealed a significant four-way interaction in both Experiment 2 and Experiment 3 (Exp. 2: F(1,79) = 10.89, p = .001, ηp2 = .12; Exp. 3: F(1,173) = 22.28, p < .001, ηp2 = .11). These findings confirm that, as in Experiment 1, the effects of retrieval practice on retention of non-targets varied as a function of Group (sleep vs. wake), Temporal Distance, and Semantic Relatedness. Figures 3 and 4 show that these effects closely parallel what was observed in the immediate- and delayed-recall groups in Experiment 1. To break down this interaction, we separately examined the data for temporally adjacent and far trials.
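Partial eta squared can be recovered from each reported F and its degrees of freedom as ηp2 = F × df_effect / (F × df_effect + df_error). A minimal check against the four-way interaction values reported above (the function name is illustrative):

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """eta_p^2 = F*df_effect / (F*df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Four-way interaction F values reported above, each with df_effect = 1.
reported = {"Exp. 2": (10.89, 79), "Exp. 3": (22.28, 173)}
for exp, (f_value, df_error) in reported.items():
    print(exp, round(partial_eta_squared(f_value, 1, df_error), 2))
```

Both values round to the reported effect sizes (.12 and .11).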
Table 4 Final recall accuracy (mean percent correct) for Control, Non-target and Target trials, and accuracy difference between Non-target and control trials as a function of temporal distance, semantic relatedness and delay in Experiment 3
Regardless of sleep, retrieval practice facilitated retention for temporally adjacent non-targets
As shown in Figs. 3 and 4, for temporally adjacent trials, there were main effects of Trial Type (Exp. 2: F(1,79) = 71.36, p < .001, ηp2 = .48; Exp. 3: F(1,173) = 133.85, p < .001, ηp2 = .44), such that accuracy was higher for non-targets than for control trials, and main effects of Semantic Relatedness (Exp. 2: F(1,79) = 50.86, p < .001, ηp2 = .39; Exp. 3: F(1,173) = 144.19, p < .001, ηp2 = .46), such that performance was generally better for related than for unrelated trials. There were also interactions between Trial Type and Semantic Relatedness (Exp. 2: F(1,79) = 26.24, p < .001, ηp2 = .25; Exp. 3: F(1,173) = 62.28, p < .001, ηp2 = .27), such that the facilitation effect was larger for related items than for unrelated items. No other main effects or interactions were significant (all ps > .10).
Regardless of sleep, retrieval practice impaired recall of temporally far and unrelated non-targets
For temporally far trials, there was a significant three-way interaction among Trial Type, Semantic Relatedness, and Group (Exp. 2: F(1,79) = 11.60, p = .001, ηp2 = .13; Exp. 3: F(1,173) = 20.83, p < .001, ηp2 = .11). Follow-up analyses of temporally far, unrelated trials revealed that retrieval practice impaired recall of non-targets that were unrelated to their targets (main effect of Trial Type: Exp. 2: F(1,79) = 27.59, p < .001, ηp2 = .26; Exp. 3: F(1,173) = 57.80, p < .001, ηp2 = .25), and that this impairment was stronger in the sleep group than in the wake group (interaction between Trial Type and Group: Exp. 2: F(1,79) = 4.20, p = .044, ηp2 = .05; Exp. 3: F(1,173) = 5.65, p = .019, ηp2 = .032).
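Because every factor in this design has two levels, each interaction involving Group can be expressed as a between-group comparison of a single per-participant contrast score, and the mixed-ANOVA F(1, N − 2) then equals the squared pooled-variance t statistic on those scores. A minimal sketch for the far-trial Trial Type × Semantic Relatedness × Group interaction; the cell labels and function names are hypothetical, and this is an equivalent reformulation rather than the authors' analysis code. The error df also serve as a consistency check: F(1,173) implies N = 175, matching the 91 sleep and 84 wake participants retained in Experiment 3.

```python
import math
from statistics import mean, variance

Cell = tuple[str, str, str]  # hypothetical key: (trial_type, distance, relatedness)


def far_contrast(acc: dict[Cell, float]) -> float:
    """Per-participant score: related minus unrelated retrieval-practice effect, far trials."""
    def effect(rel: str) -> float:
        # Retrieval-practice effect: non-target minus control accuracy.
        return acc[("nontarget", "far", rel)] - acc[("control", "far", rel)]
    return effect("related") - effect("unrelated")


def group_interaction_F(sleep: list[float], wake: list[float]) -> tuple[float, int]:
    """F(1, df) for contrast x Group, computed as a squared pooled-variance t."""
    n_s, n_w = len(sleep), len(wake)
    df = n_s + n_w - 2
    pooled = ((n_s - 1) * variance(sleep) + (n_w - 1) * variance(wake)) / df
    t = (mean(sleep) - mean(wake)) / math.sqrt(pooled * (1 / n_s + 1 / n_w))
    return t * t, df
```

Subtracting the analogous contrast for adjacent trials from `far_contrast` would give the per-participant score for the four-way interaction.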
Without sleep, retrieval practice impaired recall of temporally far, related non-targets, but after post-learning sleep it facilitated their retention
Surprisingly, for temporally far, related items, there was a significant interaction between Group and Trial Type (Exp. 2: F(1,79) = 8.35, p = .005, ηp2 = .096; Exp. 3: F(1,173) = 14.00, p < .001, ηp2 = .075), such that retrieval practice impaired retention of these items in the wake group (marginally in Exp. 2: F(1,36) = 3.50, p = .069, ηp2 = .089; Exp. 3: F(1,83) = 6.84, p = .011, ηp2 = .076) but facilitated their retention in the sleep group (Exp. 2: F(1,43) = 5.01, p = .030, ηp2 = .10; Exp. 3: F(1,90) = 7.24, p = .008, ηp2 = .074).
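Each simple effect of Trial Type within one group is a single two-level within-subject contrast, so its F(1, n − 1) equals the squared paired t statistic on per-participant non-target minus control differences. A minimal sketch with hypothetical names; in Experiment 3 the reported F(1,90) and F(1,83) correspond to the 91 sleep and 84 wake participants retained after exclusions.

```python
import math
from statistics import mean, stdev


def simple_effect_F(nontarget: list[float], control: list[float]) -> tuple[float, int, float]:
    """Return (F, df_error, mean difference) for Trial Type within one group."""
    if len(nontarget) != len(control) or len(nontarget) < 2:
        raise ValueError("need paired scores from at least two participants")
    diffs = [nt - c for nt, c in zip(nontarget, control)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    # F discards direction: a positive mean difference indicates facilitation,
    # a negative one indicates retrieval-induced forgetting.
    return t * t, n - 1, mean(diffs)
```

Reporting the sign of the mean difference alongside F is what distinguishes the wake-group impairment from the sleep-group facilitation.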
Across-study comparison
Experiment 1 had a long retention interval, whereas Experiments 2 and 3 had a shorter retention interval. An exploratory analysis comparing the effects of sleep on semantically related items in the far condition between experiments revealed no significant interaction between Group (sleep vs. no sleep) and Experiment (Exp. 1 vs. Exp. 2 or Exp. 1 vs. Exp. 3) on the magnitude of retrieval-induced facilitation/forgetting (Exp. 1 vs. Exp. 2: F(1,149) = 2.11, p = .148, ηp2 = .014; Exp. 1 vs. Exp. 3: F(1,243) = 1.89, p = .171, ηp2 = .008). Thus, there is no evidence to suggest that the effects of sleep in Experiments 1–3 were moderated by retention interval.
In summary, the comparison of the sleep and wake groups mirrored the differences between the short-delay and long-delay groups in Experiment 1, suggesting that the delay-dependent switch between facilitation and competition is, in fact, sleep dependent.