Examining the effect size and duration of retrieval-induced facilitation

Under certain conditions, the retrieval of some information can increase the recall of other information, a phenomenon known as retrieval-induced facilitation. Chan (Journal of Memory and Language 61:153–170, 2009) proposed two moderating factors to account for why retrieval causes facilitation in some situations and forgetting in others: (1) integration at the time of encoding and (2) the delay between retrieval practice and final test. Chan found a 9–11% facilitation effect when the materials were well integrated and the final test occurred after a 24-h delay. Two sets of experiments sought to replicate and extend Chan’s study by examining retrieval-induced facilitation not only following a 24-h delay but after longer delays (i.e., 1 or 2 weeks). A meta-analysis including these replications and the original experiments was also conducted. The results provide additional evidence of retrieval-induced facilitation, with no evidence that the effect varies as a function of the final delay. However, the size of the effect was found to be somewhat smaller than previously observed.


Introduction
Tests have been documented to improve the later recall of tested information (i.e., the testing effect), making them a learning tool that can be implemented easily and effectively (Roediger & Karpicke, 2006a). Tests allow learners to practice retrieving information in a way that potentiates the subsequent retrieval of that information. However, it is often implausible (if not impossible) to create a practice test that includes all to-be-learned information, necessarily leaving some subset untested. The question of what happens to untested information has been studied extensively over the years, and many results point to a phenomenon known as retrieval-induced forgetting (Anderson et al., 1994).
In the paradigm typically used to study retrieval-induced forgetting (Anderson et al., 1994), participants are asked to memorize a list of category-exemplar pairs and are then asked to retrieve from memory half of the exemplars from half of the categories. This selective retrieval practice creates three types of items: (1) practiced items, Rp+, (2) unpracticed items from practiced categories, Rp−, and (3) unpracticed items from unpracticed categories, Nrp. After a brief delay, participants are tested on all the items. Due to the testing effect, Rp+ items are typically recalled best. For current purposes, however, the more important finding is that Rp− items are typically remembered worse than Nrp items (i.e., retrieval-induced forgetting). According to the inhibitory account of retrieval-induced forgetting (Anderson, 2003; Storm & Levy, 2012), Rp− items are inhibited during retrieval practice to facilitate access to Rp+ items, with this inhibitory effect persisting until the final test, when the Rp− items become the targets of recall themselves. Retrieval-induced forgetting is argued to be an adaptive phenomenon in memory, as it allows people to access relevant information more effectively by minimizing the likelihood that other, less relevant information will interfere during retrieval. Other theoretical accounts of retrieval-induced forgetting have argued that retrieval causes forgetting through non-inhibitory mechanisms, such as strength-based interference, blocking, or changes in context that make it difficult to retrieve the Rp− items (Jonker et al., 2013; Raaijmakers & Jakab, 2013; Verde, 2013).
Importantly, not all research has observed evidence of retrieval-induced forgetting. Under certain conditions, retrieval practice has been shown to enhance the subsequent remembering of related non-practiced information, a phenomenon referred to as retrieval-induced facilitation. For example, Chan et al. (2006) asked participants to study information about a particular topic (e.g., the toucan). Some participants received retrieval practice for the information, whereas others did not. After a 24-h delay, participants who received retrieval practice performed better on a final test not only for the information that was practiced, but also for related information that was not practiced. Although seemingly at odds with evidence of retrieval-induced forgetting, demonstrations of retrieval-induced facilitation fit well with the idea that retrieval should benefit memory for related non-retrieved information via a kind of spreading activation (Anderson, 1983; Collins & Loftus, 1975). Specifically, when two elements in memory are related and well integrated, the activation of one element should be expected to activate the other element, making it more recallable (not less recallable) in the future than it would have been otherwise.
Why do some studies using a retrieval-practice paradigm show evidence of retrieval-induced facilitation and not retrieval-induced forgetting? Chan et al.'s (2006) experiments differed in two critical ways from many studies showing retrieval-induced forgetting. First, the authors designed the study materials to reflect educational materials. As such, the materials were well integrated, meaning that the content and presentation style were cohesive and clear, and participants were encouraged to consider the many pieces of information in relation to one another. This format is a departure from the more typical retrieval-practice paradigm (Anderson et al., 1994; Murayama et al., 2014), which tends to use category-exemplar pairs designed to be less integrated. The integrated materials and the instructions directing participants to connect the various pieces of information may work together to prompt participants to encode the material in a way that serves as a boundary condition on retrieval-induced forgetting (Anderson, 2003; Storm et al., 2015). Indeed, there is evidence that retrieval-induced forgetting is not observed in the typical paradigm when Rp+ and Rp− items are well integrated, either because of how the materials are constructed or because of the way participants are instructed to encode them (e.g., Anderson & McCulloch, 1999; Anderson et al., 2000; Goodmon & Anderson, 2011).
The second key feature of the experiments reported by Chan et al. (2006) was the length of the delay between retrieval practice and final test. Their study used a 24-h delay, which has been argued to be enough time to allow the effects of retrieval-induced forgetting to dissipate (e.g., Carroll et al., 2007; MacLeod & McCrae, 2001; Saunders et al., 2009; Saunders & MacLeod, 2002). Although more recent work has suggested that retrieval-induced forgetting can, in some situations, persist as long as a week following retrieval practice (García-Bajos et al., 2009; Migueles & García-Bajos, 2007; Saunders et al., 2009), it stands to reason that the inhibitory effects of retrieval practice should, at the very least, become less pronounced after a long delay than after a short delay. This should make it less likely for retrieval-induced forgetting to offset or mask any co-occurring effect of retrieval-induced facilitation (that is, there is no theoretical reason to think that only one of these effects can be present at a given time, only that the two cannot be observed simultaneously). Beyond that, a delay may enhance the effect of retrieval-induced facilitation in the same way that it has been shown to enhance the testing effect more generally. Specifically, when Rp− items are coactivated during retrieval practice (owing to their relatedness or high level of integration), they may benefit from a type of covert testing effect that is more likely to enhance recall after a long delay than after a short delay (e.g., Congleton & Rajaram, 2012; Roediger & Karpicke, 2006b).
In summary, Chan et al. (2006) simultaneously implemented two distinct conditions (high integration and a long delay), the combination of which may have diminished retrieval-induced forgetting and made it more likely for retrieval-induced facilitation to be observed. In a subsequent study, Chan (2009) attempted to isolate the effects of integration and delay by manipulating them experimentally. In the first of two experiments, Chan employed the same materials as were used by Chan et al. (2006), using a 2 (Integration: low, high; between-subjects) × 2 (Delay: 20-min, 24-h; between-subjects) × 3 (Item Type: Rp+, Rp−, Nrp; within-subjects) mixed design. The to-be-learned prose passages were either presented as they would typically be structured (high-integration condition) or with the sentences within each paragraph shown in a random order (low-integration condition). In the high-integration condition, participants also received instructions to integrate the information as it was presented. In the low-integration condition, the materials were presented as a collection of facts, and participants were not instructed to integrate the information. Consistent with prior research on retrieval-induced forgetting, participants in the low-integration/20-min condition demonstrated significant retrieval-induced forgetting (M effect size = −9%). In comparison, participants in the high-integration/24-h condition (the same condition used in the 2006 experiments) demonstrated significant retrieval-induced facilitation (M effect size = 9%). Interestingly, the participants in the other two conditions (high-integration/20-min and low-integration/24-h) did not show either effect to a level of statistical significance, indicating that both high integration and a long delay may be required to observe retrieval-induced facilitation.
Chan's (2009) Experiment 2 sought to investigate the same factors with a different type of materials. Rather than prose passages, the materials were object-location pairs formed into prepositional sentences. For example, for the object-location pair "orange-fridge", participants would be presented with "The orange is in the fridge." Integration was manipulated by either presenting objects sequentially in the same location (i.e., blocked by location) and giving the high-integration instructions (high-integration condition), or by presenting the sentences in a random order and not giving the high-integration instructions (low-integration condition). A final test took place following a 20-min or 24-h delay. The pattern matched that of Experiment 1: participants in the low-integration/20-min condition showed retrieval-induced forgetting (M effect size = −7%), whereas participants in the high-integration/24-h condition showed retrieval-induced facilitation (M effect size = 11%). Participants in the other two conditions did not show evidence of either effect.
Despite the applied and theoretical importance of Chan's (2009) findings, we are not aware of any published attempts to directly replicate them. Although there is strong evidence of integration and delay acting as boundary conditions to prevent or diminish the effect of retrieval-induced forgetting, there is much less evidence of the two factors combining to produce retrieval-induced facilitation. Moreover, the original experiments reported by Chan were somewhat underpowered, making it even more important to investigate the finding's reliability and to estimate the effect size more accurately. As such, the first goal of the current study was to perform a partial direct replication of Chan's high-integration/long-delay conditions. Specifically, the current study focused on the conditions under which Chan observed retrieval-induced facilitation. The other conditions were not included, as both retrieval-induced forgetting and the mitigating effects of integration and delay are already well established (Murayama et al., 2014; Storm et al., 2015).
A second goal of the current study was to examine the durability of the retrieval-induced facilitation effect, replicating not only the 24-h delay condition used by Chan (2009) but also implementing longer delays (i.e., 1 or 2 weeks). Together, these conditions provide opportunities to replicate and extend Chan's findings, and to determine whether the relatively large effect of retrieval-induced facilitation (9–11%) that was documented after a 24-h delay would persist 1–2 weeks following retrieval practice. Achieving a better understanding of how long retrieval-induced facilitation persists will clarify the everyday implications of selective retrieval practice and provide new insight into the effect's theoretical mechanisms.
On the one hand, the retrieval-induced facilitation effect may exhibit a pattern such that it is strongest after a 24-h delay but decreases steadily after that. From a theoretical standpoint, however, this possibility seems unlikely. On the other hand, retrieval-induced facilitation may be long-lasting and perhaps persist indefinitely. Whereas retrieval-induced forgetting is considered the result of a temporary reduction in accessibility, retrieval-induced facilitation is believed to reflect the consequences of unpracticed items being activated by retrieval, thus benefiting from a type of retrieval practice effect. Therefore, just as the testing effect has been shown to persist (and even become larger) following long delays (Butler & Roediger, 2007; Roediger & Karpicke, 2006b; Wheeler et al., 2003), retrieval-induced facilitation may be expected to follow a similar trajectory. That said, with delay, overall recall rates begin to decline, which may make it more challenging to observe a large effect after a long delay than after a short delay. At a minimum, however, based on the theoretical explanation of retrieval-induced facilitation provided by Chan (2009), the effect should remain significant even after a longer final delay.

Experiments 1a and 1b
The goal of the first set of experiments was to replicate and extend the original findings of Chan (2009). Every participant took part in two separate experiments (Experiment 1a and Experiment 1b), run sequentially within a single 1-h time block. Experiment 1a was designed to replicate the high-integration/long-delay condition of Chan's Experiment 1, with participants studying and receiving retrieval practice for a subset of items from one of two prose passages. Experiment 1b was designed to replicate the high-integration/long-delay condition of Chan's Experiment 2, with participants studying and receiving retrieval practice for a subset of object-location pairs. Half of the participants were tested on both sets of materials after a 24-h delay, thus replicating the delay used by Chan. The other half of the participants were tested after a 2-week delay. Retrieval-induced facilitation was assessed by comparing the final test performance on unpracticed items from practiced passages/locations (Rp− items) to that of unpracticed items from unpracticed passages/locations (Nrp items).
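The Rp− versus Nrp comparison can be illustrated with a minimal sketch. The function name and the recall counts below are hypothetical and serve only to show how a per-participant difference score might be formed; they are not data from the study.

```python
def facilitation_effect(rp_minus_correct, rp_minus_total, nrp_correct, nrp_total):
    """Return Rp- recall minus Nrp recall, as a difference in proportions.

    Positive values indicate retrieval-induced facilitation;
    negative values indicate retrieval-induced forgetting.
    """
    rp_minus_rate = rp_minus_correct / rp_minus_total
    nrp_rate = nrp_correct / nrp_total
    return rp_minus_rate - nrp_rate


# Hypothetical participant: 6 of 12 Rp- items and 10 of 24 Nrp items recalled.
effect = facilitation_effect(6, 12, 10, 24)
print(f"Facilitation effect: {effect:+.1%}")
```

Averaging such difference scores across participants gives the kind of percentage effect size reported in this literature.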

Participants
Seventy-four University of California, Santa Cruz (UCSC) undergraduate students received partial course credit in a psychology course for their participation. Participants were assigned pseudo-randomly to participate in either the 1-day Delay condition or the 2-week Delay condition. Specifically, assignment to Delay condition was determined based on each participant's response to an arbitrary question on an online pre-screening survey completed at the beginning of the term. The question asked students to indicate whether the number on their clock was odd (e.g., 2:31 pm) or even (e.g., 2:32 pm) when they were answering the question. Students who indicated an odd number were invited to sign up for the 1-day condition. Students who indicated an even number were invited to sign up for the 2-week condition. Participants were only aware of the condition to which they were assigned and, thus, only had the opportunity to sign up for that condition.
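The pseudo-random assignment rule can be written as a short sketch. It assumes that "the number on their clock" refers to the minutes value, as the examples (2:31 pm, 2:32 pm) suggest; the function name is hypothetical.

```python
def delay_condition(clock_minute: int) -> str:
    """Map the parity of the reported clock minute to a Delay condition.

    Assumption: parity is judged on the minutes value (0-59).
    Odd -> 1-day condition; even -> 2-week condition.
    """
    if not 0 <= clock_minute <= 59:
        raise ValueError("clock_minute must be between 0 and 59")
    return "1-day" if clock_minute % 2 == 1 else "2-week"


print(delay_condition(31))  # odd minute, as in 2:31 pm
print(delay_condition(32))  # even minute, as in 2:32 pm
```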
Ten participants failed to return for the second part of the study (three participants in the 1-day condition; seven participants in the 2-week condition). These participants were not included in the analysis, leaving a total of 33 and 31 participants in the 1-day and 2-week conditions, respectively. The stopping rule, which was determined before commencing data collection, was to stop when we reached a total of 64 participants. A power analysis confirmed that this total number would be sufficient for the study to have approximately 80% power to detect a 6% effect size in the overall main effect of retrieval-induced facilitation in each of the two sub-experiments, assuming a standard deviation of the effect size to be 17%. The number of participants in the 1-day condition was also expected to be sufficient to have 80% power to detect the 9% effect size initially observed by Chan (2009) in that condition alone. For comparison purposes, Chan's study included a sample of 24 participants in each of the high-integration/long-delay conditions reported in the two experiments.
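The power figures above can be approximated with a short sketch. The source does not state how the power analysis was computed, so the code below makes its own assumptions: a two-sided test at α = .05 on the Rp− minus Nrp difference scores, with a normal approximation in place of the exact noncentral t distribution. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample test on difference scores.

    Uses a normal approximation (an assumption; an exact calculation would
    use the noncentral t distribution and give slightly lower values).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = (effect / sd) * sqrt(n)  # standardized effect scaled by sample size
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Overall effect: 6% effect, 17% SD, 64 participants.
overall = approx_power(0.06, 0.17, 64)
# 1-day condition alone: 9% effect (Chan, 2009), 17% SD, 33 participants.
one_day = approx_power(0.09, 0.17, 33)
print(f"overall: {overall:.2f}, 1-day only: {one_day:.2f}")
```

Under these assumptions both values come out at or above roughly .80, in line with the figures reported above.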

Design
The experiments were run in-person, either individually or in pairs, in a contained room with a barrier between two side-by-side computers. A research assistant was always available to answer questions and monitor and facilitate progress.
In Session 1, participants completed the study and retrieval-practice phases for Experiments 1a and 1b. In Session 2, which took place after a one-day or two-week delay, they completed the final tests for both Experiments 1a and 1b. Both Sessions took place in the same room under similar circumstances (testing was completed alone or in pairs with a research assistant on hand). Participants run in pairs were always assigned to the same Delay condition.
All materials, including verbal instructions given to participants, were constructed to follow as closely as possible the instructions used in Experiments 1 and 2 of Chan (2009), based on the materials used in Experiments 2 and 3 of Chan et al. (2006). Experiments 1a and 1b each consisted of 2 (Delay: 1 day, 2 weeks; between subjects) × 3 (Item Type: Rp+, Rp−, Nrp; within-subjects) mixed designs. Following the common practice of the retrieval-induced forgetting/facilitation literature: Rp+ items consisted of items that received retrieval practice; Rp− items consisted of unpracticed items from the practiced categories; Nrp items consisted of all items from the unpracticed categories.

Experiment 1a
Materials Prose passages: Two passages were used (received through direct communication from Dr. Chan), one on the Big Bang Theory and one on the Shaolin Temple. Both passages were approximately 1900 words and 13 paragraphs long. As in Chan (2009), the passages were divided into sentences, and individual sentences were inserted into Google Slides presentations such that each slide contained one sentence. Sentences were restructured from their original content so that they could stand on their own when presented individually (for example, changing "He" to "Arthur Eddington").
Retrieval practice questions: The questions used for the retrieval practice stage of Experiment 1a were drawn directly from Chan (2009, Appendix C). Twenty-four questions (short-answer, often fill-in-the-blank) were created from each passage, consisting of 12 related pairs. An example of a related pair: According to the flat and open models, the universe will… (answer: expand infinitely) paired with According to the oscillating closed universe model, the universe will alternate between a big bang and a… (answer: big crunch). Four subsets of items were created to counterbalance (a) which passage received retrieval practice, and (b) which item from each related pair would be used for retrieval practice.
Test questions: The test consisted of all 24 possible questions for each passage. Test items were divided into three categories: (1) 12 items that received retrieval practice, Rp+; (2) 12 items that did not receive retrieval practice but were from the passage that received retrieval practice, Rp−; and (3) 24 items from the passage that did not receive retrieval practice, Nrp. The order of questions was counterbalanced between participants such that half of the participants began the test with the Rp+/Rp− questions, and the other half began with the Nrp questions.
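The counterbalancing scheme and the resulting item counts can be sketched as follows. The pair indices and the member labels "A" and "B" are hypothetical, and the sketch assumes that within a given subset the same member of every related pair is practiced.

```python
from collections import Counter
from itertools import product

PASSAGES = ("Big Bang Theory", "Shaolin Temple")
N_PAIRS = 12          # related question pairs per passage
MEMBERS = ("A", "B")  # hypothetical labels for the two questions in a pair


def classify_items(practiced_passage, practiced_member):
    """Map each (passage, pair, member) question to its item type."""
    types = {}
    for passage, pair, member in product(PASSAGES, range(N_PAIRS), MEMBERS):
        if passage != practiced_passage:
            types[(passage, pair, member)] = "Nrp"
        elif member == practiced_member:
            types[(passage, pair, member)] = "Rp+"
        else:
            types[(passage, pair, member)] = "Rp-"
    return types


# Four counterbalancing subsets: 2 passages x 2 pair members.
for passage, member in product(PASSAGES, MEMBERS):
    counts = Counter(classify_items(passage, member).values())
    print(passage, member, dict(counts))
```

Each subset yields 12 Rp+, 12 Rp−, and 24 Nrp items, which matches the 48-question final test described above.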
Procedure: The procedure was the same as that of Chan (2009). Session 1 began with participants instructed to read the two passages for 16 min each (Big Bang Theory first, followed by Shaolin Temple) and it was explicitly noted that they should attempt to integrate the sentences as well as possible (Chan, 2009, Appendix E). They were also encouraged to read for the entire 16 min, starting again from the beginning if they reached the end before time elapsed. A research assistant informed participants when the time was up. Half of the participants completed retrieval practice for the Big Bang Theory, and the other half completed retrieval practice for the Shaolin Temple. Retrieval practice was completed immediately after reading the relevant passage.
For retrieval practice, a research assistant directed the participant to follow a URL to a Qualtrics survey with 12 questions (repeated twice) about the passage. In total, retrieval practice took 10 min. Participants had 25 s to answer each question before the computer automatically advanced to the next item. Following retrieval practice, participants were dismissed.
After their assigned delay, participants returned to the lab to complete the final test. Specifically, they were asked to answer 24 questions on the Big Bang Theory passage in the same format used for retrieval practice. The same process was repeated for the Shaolin Temple. The two tests took approximately 16 min.

Experiment 1b
Materials Object-location pairs: Twenty-four object-location pairs were presented to participants individually in cohesive sentences (e.g., The rose is in the garden.) on a computer screen using PowerPoint. The object-location pairs consisted of six individual locations (e.g., garden), and each location had four individual objects (e.g., rose) reported to be in that location. The object-location pairs were generated randomly (MATLAB 2019b) for each participant, selected from a list of 24 possible objects and 24 possible locations, drawn from Chan (2009, Appendix D). No two objects beginning with the same two letters were placed in the same location (e.g., bottle and book were never both in the nursery).
Sentences for retrieval practice: In retrieval practice, participants saw a Qualtrics survey showing a subset of 8 of the object-location pairs with all but the first two letters of the object removed from the sentence (e.g., The ro_____ is in the garden.). These were a subset of object-location pairs that were randomly generated (MATLAB 2019b) to include half of the objects (2 of 4) from 4 (of 6) locations. Participants completed retrieval practice twice.
Sentences for test: For the test, participants saw a Qualtrics survey consisting of all 24 object-location pairs in the same style used for retrieval practice. The test items consisted of three item types: (a) the 8 object-location pairs that received retrieval practice (2 objects from 4 locations), Rp+; (b) the 8 object-location pairs that did not receive retrieval practice but were from the same locations as those that did receive retrieval practice, Rp−; and (c) the 8 object-location pairs from the completely unpracticed locations, 4 objects from 2 locations, Nrp.
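The construction of the object-location pairs and item types can be sketched as follows. The authors generated the pairs in MATLAB; this Python version is an independent illustration, not their script. The word lists are placeholders (the actual items come from Chan, 2009, Appendix D), and the use of rejection sampling to enforce the two-letter constraint is an assumption.

```python
import random
from collections import Counter

# Placeholder word lists (hypothetical; not the items from Chan, 2009).
OBJECTS = ["rose", "orange", "bottle", "book", "lamp", "ladder", "candle",
           "camera", "hammer", "hat", "pencil", "pillow", "shovel", "shoe",
           "ticket", "towel", "mirror", "mitten", "basket", "radio",
           "guitar", "kettle", "wallet", "umbrella"]
LOCATIONS = ["garden", "fridge", "nursery", "attic", "garage", "office",
             "kitchen", "library", "cellar", "balcony", "hallway", "closet",
             "bakery", "museum", "theater", "stable", "bedroom", "pantry",
             "hotel", "church", "studio", "factory", "harbor", "tent"]


def make_pairs(rng):
    """Assign 4 objects to each of 6 locations; reshuffle until no location
    holds two objects that begin with the same two letters."""
    locations = rng.sample(LOCATIONS, 6)
    while True:
        objects = rng.sample(OBJECTS, len(OBJECTS))  # shuffled copy
        groups = {loc: objects[4 * i:4 * i + 4] for i, loc in enumerate(locations)}
        if all(len({obj[:2] for obj in objs}) == 4 for objs in groups.values()):
            return groups


def assign_item_types(groups, rng):
    """Practice 2 of 4 objects in 4 of the 6 locations."""
    practiced_locations = set(rng.sample(list(groups), 4))
    types = {}
    for loc, objs in groups.items():
        practiced = set(rng.sample(objs, 2)) if loc in practiced_locations else set()
        for obj in objs:
            if loc not in practiced_locations:
                types[(obj, loc)] = "Nrp"
            else:
                types[(obj, loc)] = "Rp+" if obj in practiced else "Rp-"
    return types


rng = random.Random(2019)  # seeded only to make the illustration repeatable
groups = make_pairs(rng)
item_types = assign_item_types(groups, rng)
print(Counter(item_types.values()))
```

Every run produces 8 Rp+, 8 Rp−, and 8 Nrp pairs, matching the item counts described above.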
Procedure: To begin the study phase of Experiment 1b, a research assistant opened the object-location pairs file while explaining the instructions, emphasizing the need to try to integrate the information as well as possible (Chan, 2009, Appendix E). All items in each location were presented successively before moving to the next location, increasing the ability of participants to integrate the object-location pairs. Specifically, the first four sentences described four objects in the same location; then, the following four sentences described four objects in another location, and so on. The study phase took less than 4 min (8 s per object-location pair, with 1 s between slides).
Participants completed the retrieval practice phase immediately after the study phase. Eight of the twenty-four object-location pairs were practiced (repeated twice, individually presented, and in random order) by typing the correct object name into a textbox. Retrieval practice took less than 3 min (10 s per item). After completing retrieval practice, participants were done for the day.
Participants were tested on the object-location pairs in Session 2, after their assigned delay. All 24 object-location pairs were presented in a random order using the same format used for retrieval practice, but with 15 s per item. The test took approximately 6 min.
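The phase durations reported for Experiment 1b follow directly from the trial counts and per-item timings. The sketch below reproduces the arithmetic; the helper name is hypothetical, and it assumes the 1-s gap follows every study slide, including the last.

```python
def phase_seconds(n_trials, seconds_per_trial, gap_seconds=0):
    """Total duration of a phase with fixed per-trial timing."""
    return n_trials * (seconds_per_trial + gap_seconds)


study = phase_seconds(24, 8, gap_seconds=1)  # 24 pairs, 8 s each plus 1 s gap
practice = phase_seconds(8 * 2, 10)          # 8 pairs practiced twice, 10 s each
final_test = phase_seconds(24, 15)           # 24 pairs, 15 s each

for name, secs in [("study", study), ("practice", practice), ("test", final_test)]:
    print(f"{name}: {secs} s = {secs / 60:.1f} min")
```

The totals are 216 s, 160 s, and 360 s, consistent with the reported durations of less than 4 min, less than 3 min, and approximately 6 min.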

Experiment 1b
Experiment 1b used the same analysis plan as Experiment 1a and showed a similar pattern of results.

Discussion
Although the results of Experiments 1a and 1b failed to provide significant evidence of retrieval-induced facilitation, both experiments showed some limited numerical evidence of the effect. Moreover, it is noteworthy that neither of the two experiments, which consisted of two types of materials (prose and object-location pairs) tested after two different delays (one day and two weeks), showed any evidence of retrieval-induced forgetting. This finding suggests that, at least under these testing conditions, the simultaneous implementation of high integration at the time of encoding and long delay between retrieval practice and test may be an effective tool to prevent retrieval-induced forgetting.

Experiments 2a and 2b
Although a significant effect of retrieval-induced facilitation was not observed in Experiments 1a and 1b, the small numerical effect (and general directional consistency across conditions) lends itself to the possibility that an effect exists, but that it is much smaller than that observed by Chan (2009).
To provide further clarification, Experiments 1a and 1b were replicated with the addition of a new manipulation that could potentially make the study more sensitive to the benefits of retrieval practice on unpracticed items. Specifically, participants received feedback on the final test, thus giving them an additional opportunity to study the items before giving them a second final test moments later. Presumably, although Rp− items might not be more recallable than Nrp items on the first final test, the Rp− items might be better learned or have higher storage strength than Nrp items due to their previous activation (Bjork & Bjork, 1992), thus potentiating their relearning when the correct answers are provided via feedback. Said differently, if retrieval practice activates related items in memory, then it should act to not only potentiate the future accessibility/retrieval strength of related non-target items in memory (as evidenced by observations of retrieval-induced facilitation), but it should also act to potentiate the availability/storage strength of related non-target items in memory. Although this activation might not be sufficient to make Rp− items measurably more recallable than Nrp items, especially after a long delay, it could be sufficient to put Rp− items in a position to be more effectively relearned than Nrp items. Indeed, as predicted by the New Theory of Disuse (Bjork & Bjork, 1992), items with higher storage strength are expected to benefit more from relearning than items with lower storage strength (assuming equivalent levels of retrieval strength). Thus, even if Rp− items are not recalled better than Nrp items on an initial final test, when that test is followed by feedback in which the correct items are relearned, Rp− items might become better recalled than Nrp items on a second final test.
Experiments 2a and 2b replicated Experiments 1a and 1b with the addition of a second test following the first test (with the first test including feedback in the form of the correct answers). The first test served as a close replication of the earlier experiments, with the only difference being that participants were given feedback after attempting to recall each item. The second test was added to provide what theoretically could be a more sensitive measure of retrieval-induced facilitation. Another critical difference in Experiment 2 was the number of Delay conditions and how participants were assigned to them. In Experiments 2a and 2b, participants were randomly assigned to one of three Delay conditions (1 day, 1 week, 2 weeks), which allowed the possibility of a non-linear relationship between delay and retrieval-induced facilitation to be explored.

Participants
A total of 103 UCSC undergraduate students were recruited to participate in the study and received partial course credit for their participation. Unlike in Experiment 1, participants in Experiment 2 were randomly assigned to Delay condition. Specifically, participants were informed when they signed up to participate in the study that they would be randomly assigned to one of three conditions (1 day, 1 week, or 2 weeks). They were not informed of the final delay to which they were assigned until after completing the first session of the experiment, and none of the participants dropped out of the study when they were informed of the final delay. That said, not all participants completed the second session. All participants returned in the 1-day condition, but one participant failed to return in the 1-week condition, and three participants failed to return in the 2-week condition. These participants were not included in the analysis, leaving a total of 35, 31, and 33 participants in the 1-day, 1-week, and 2-week conditions, respectively. The stopping rule, determined before commencing data collection, was to stop at 96 participants, matching the number of participants per condition in Experiment 1 (three additional participants were run by accident, and we chose to include them in the data set before analyzing the data). A power analysis confirmed that this total number would be sufficient for the study to have 80% power to detect a 5.5% effect size in the overall main effect of retrieval-induced facilitation in each of the two sub-experiments, assuming a standard deviation of the effect size to be 19%.
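The participant accounting and the stated detectable effect can be checked with a short sketch. As with any such reconstruction, the power calculation is an assumption: the source does not describe the method, so the code uses a two-sided test at α = .05 with a normal approximation and the planned sample of 96. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Participant accounting from the text.
recruited = 103
did_not_return = {"1-day": 0, "1-week": 1, "2-week": 3}
analyzed = {"1-day": 35, "1-week": 31, "2-week": 33}
assert sum(analyzed.values()) == recruited - sum(did_not_return.values())


def min_detectable_effect(sd, n, power=0.80, alpha=0.05):
    """Smallest mean difference detectable with the given power.

    Normal approximation to a two-sided one-sample test on difference
    scores (an assumption, not the authors' stated method).
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sd / sqrt(n)


mde = min_detectable_effect(sd=0.19, n=96)
print(f"Minimum detectable effect: {mde:.1%}")
```

Under these assumptions the minimum detectable effect comes out close to the 5.5% figure reported above.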

Design
Participants took part in the experiment remotely. They completed the study and retrieval practice phases for both Experiments 2a and 2b in Session 1; they completed the final test for both experiments in Session 2, after either a 1-day, 1-week, or 2-week Delay. Links and instructions were emailed to the participants on the morning of their scheduled participation days.
The materials and procedure were identical to those described in Experiments 1a and 1b, except for minor alterations necessary to add the second final test and to allow the entire experiment (Session 1 and Session 2) to be completed remotely. Changes included (1) collapsing multiple surveys (with multiple links) into one link to ensure that participants completed the tasks in the correct order, (2) adding feedback to the first final test, and (3) adding a second final test (without feedback) after the first test. Both final tests took place in Session 2 and followed the same general method described in Experiments 1a and 1b. In the first final test, participants were provided the correct answer after every question. After completing the first final test, participants were given a filled 15-min distractor task (word search) before proceeding to the second final test.
The experiment employed a 3 (Delay: 1 day, 1 week, 2 weeks; between-subjects) × 2 (Item Type: Rp−, Nrp; within-subjects) mixed design. Participants were randomly assigned to a delay of either one day, one week, or two weeks between the date of encoding/retrieval practice (Session 1) and the date of the final tests (Session 2). For this experiment, Rp+ items were not included in either test as they are not necessary for calculating retrieval-induced facilitation, and because of the possibility that testing Rp+ items on the first final test could disparately affect the recall of Rp− and Nrp items on the second final test.
The interaction between Item Type and Test was not significant. In other words, the difference in recall between Rp− and Nrp items did not vary as a function of Test: F(1, 96) = 0.69, p = 0.41, ηp² = 0.01. Finally, there was no evidence of an interaction between Item Type, Test, and Delay: F(2, 96) = 0.2, p = 0.82, ηp² = 0.00. There was no evidence that the size of the retrieval-induced facilitation effect (or the difference between Rp− and Nrp items) changed from Test 1 to Test 2 depending on Delay condition.

Discussion
The results of Experiments 2a and 2b provide some evidence of retrieval-induced facilitation, replicating the general pattern observed by Chan et al. (2006) and Chan (2009 no significant evidence of the retrieval-induced facilitation effect becoming larger on Test 2 compared to Test 1. That said, it is noteworthy that the effect was not eliminated by feedback on Test 1. Re-exposure to Rp− and Nrp items might have eliminated the effect of retrieval-induced facilitation in the same way that re-exposure to Rp− items and Nrp items has been shown to eliminate retrieval-induced forgetting (Storm et al., 2008; Storm & Soares, 2022). The current results suggest that whereas the costs of retrieval practice can be eliminated by relearning, the benefits of retrieval practice may not be, a pattern of results that makes sense given the theoretical distinction between the cause of retrieval-induced forgetting and the cause of retrieval-induced facilitation. Finally, it is unclear why evidence of retrieval-induced facilitation was observed in Experiments 2a and 2b but not in Experiments 1a and 1b. One possibility is that the effect is small and variable, and by chance it was significant in one experiment but not the other. It is worth noting that although statistically significant, the magnitude of the retrieval-induced facilitation effects that we observed was substantially smaller than the 9 and 11% effects reported by Chan (2009). Another possibility is that there were subtle differences in the experimental procedure or sample. Experiments 2a and 2b were run entirely online during the COVID-19 pandemic, for example, whereas Experiments 1a and 1b were run in the laboratory. Given the cross-experiment comparison, it is impossible to know for sure, but this difference could be something for future research to consider.

Meta-analysis
Taken together, the results of the current experiments suggest that retrieval-induced facilitation may be a real effect, just one that is somewhat smaller than previously observed. Chan's (2009) two experiments included a combined total of 48 participants in the critical condition (high integration/long delay). In contrast, the current two experiments included a combined total of 164 participants in the critical condition. Moreover, all 164 participated in two experiments designed to measure retrieval-induced facilitation, yielding 328 data points. As a result, although our individual experiments were somewhat limited in power, the current study when combined across experiments had far more power to estimate the true effect size of retrieval-induced facilitation than the study reported by Chan. Indeed, the fact that Chan's study was so underpowered was a large part of the motivation for conducting the current work.
Given the broad theoretical implications for understanding the consequences of retrieval and the potential everyday applications to contexts like the classroom, it is important to estimate the magnitude of the retrieval-induced facilitation effect more accurately. A meta-analysis was used for this purpose, including the combined data reported by Chan (2009) and the two sets of experiments reported here. Specifically, the spreadsheet provided by Neyeloff et al. (2012) was used to perform the meta-analysis calculations and construct the accompanying forest plot. Included as separate "experiments" were each of the delay conditions for both tasks from both current experiments: 4 "experiments" from Experiments 1a and 1b (Experiment 1a, 1 day and 2 weeks; Experiment 1b, 1 day and 2 weeks) and 6 "experiments" from Experiments 2a and 2b (Experiment 2a, 1 day, 1 week, and 2 weeks; Experiment 2b, 1 day, 1 week, and 2 weeks). Note that only the results from Test 1 in Experiments 2a and 2b were used, not Test 2. Chan's (2009) results from the high-integration/24-h delay conditions were also included. Thus, the meta-analysis included a total of 12 experiments (k = 12; df = 11).
Results from the meta-analysis are shown in Table 1. Sample sizes ranged from 24 to 35 (M = 32); effect sizes ranged from −0.04 to 0.11 (M = 0.04); standard errors ranged from 0.03 to 0.07 (M = 0.04). The measure of the effect size was the difference between performance on Rp− items and Nrp items (so, in the two instances showing retrieval-induced forgetting rather than retrieval-induced facilitation, the score is negative).
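As an illustration of the effect-size measure just described, the sketch below computes the mean Rp− minus Nrp difference for one sample, along with a standard error. The standard-error formula (standard deviation of the per-participant differences divided by the square root of n) is an assumption, since the text does not state how the standard errors were obtained. The function name `facilitation_effect` and the recall proportions are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def facilitation_effect(rp_minus, nrp):
    """Return (effect, standard error) for paired recall proportions.

    A positive effect indicates facilitation (Rp- recalled better than
    Nrp); a negative effect indicates retrieval-induced forgetting.
    """
    diffs = [a - b for a, b in zip(rp_minus, nrp)]
    effect = mean(diffs)
    # Assumed SE: sample SD of the differences over sqrt(n)
    se = stdev(diffs) / sqrt(len(diffs))
    return effect, se


# Made-up proportions for four hypothetical participants
rp_minus = [0.50, 0.60, 0.40, 0.70]
nrp = [0.45, 0.50, 0.45, 0.60]
effect, se = facilitation_effect(rp_minus, nrp)
print(f"effect = {effect:.3f}, SE = {se:.3f}")
```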
The measure of heterogeneity in samples, Q, was calculated to be 8.44, which was below the critical value for the size of the sample (19.68), so there was no evidence to support true between-studies variability. To quantify more specifically any heterogeneity, I² was calculated to be −30.20. Common practice dictates that this negative number can be interpreted to indicate that 0% of the variability in effect sizes that was observed in the meta-analysis was due to true heterogeneity in the included samples (Higgins et al., 2003). Thus, a fixed-effects model was used rather than a random-effects model. Taking into account the variance and sample size of each experiment, the weighted effect size of retrieval-induced facilitation was 0.05 (SE = 0.01, 95% CI = [0.03, 0.07]). The same calculations were performed with Chan's experiments removed (i.e., only including the results from our experiments), and this analysis showed a slightly smaller but still significant average effect size of 0.04 (95% CI = [0.02, 0.06]).
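The quantities in this paragraph follow the standard inverse-variance approach, which can be sketched as below. This is a generic fixed-effects calculation, not a reproduction of the Neyeloff et al. (2012) spreadsheet, and the function names and input values are hypothetical placeholders rather than the entries of Table 1. I² is computed as (Q − df)/Q × 100 and truncated at zero when negative, following the convention described above.

```python
from math import sqrt


def i_squared(q, df):
    """Raw I-squared in percent; may be negative when Q < df."""
    return (q - df) / q * 100


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance fixed-effects pooling with Q and I-squared."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    pooled_se = sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled effect
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    raw_i2 = i_squared(q, df) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "se": pooled_se,
        "ci95": (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se),
        "q": q,
        "df": df,
        "i2_raw": raw_i2,
        "i2_truncated": max(0.0, raw_i2),
    }


# Made-up effects and standard errors for three hypothetical samples
result = fixed_effect_meta([0.10, 0.02, 0.06], [0.05, 0.04, 0.03])
print(result)

# With the rounded values reported in the text (Q = 8.44, df = 11)
print(round(i_squared(8.44, 11), 1))
```

With the rounded Q reported in the text, the formula gives roughly −30.3, close to the reported −30.20; the small gap is presumably due to rounding of Q.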
Although the heterogeneity statistic did not suggest differences between delays or tasks, the effect sizes are reported for illustrative purposes. The effect size was 0.05 (95% CI = [0.02, 0.08]) in the 1-day delay condition, 0.05 (95% CI = [0.00, 0.11]) in the 1-week delay condition, and 0.04 (95% CI = [0.01, 0.07]) in the 2-week delay condition. These results suggest that retrieval-induced facilitation's effect size did not vary as a function of the delay between retrieval practice and test. It remained significant at each time point, although power is limited due to the small number of "experiments" at each Delay.

General discussion
Most research on the consequences of retrieval practice has focused on two major findings: (1) that retrieval tends to make retrieved items more recallable in the future than they would have been otherwise (i.e., the testing effect; Roediger & Karpicke, 2006a), and (2) that retrieval tends to make other, non-retrieved items less recallable in the future than they would have been otherwise (i.e., retrieval-induced forgetting; Anderson, 2003). Although these two consequences are quite robust (Murayama et al., 2014; Roediger et al., 2011; Rowland, 2014; Storm et al., 2015), there are conditions under which a different pattern of results emerges. Specifically, when to-be-learned materials are well integrated, and when the final test is delayed by 24 h, non-retrieved items have been shown to become more recallable in the future than they would have been otherwise (i.e., retrieval-induced facilitation; Chan, 2009; Chan et al., 2006).
The goal of the current study was to replicate and extend evidence of the retrieval-induced facilitation effect, providing the literature with a more precise estimate of its true effect size. Whereas the mitigating effects of integration and delay on retrieval-induced forgetting are well established, evidence pertaining to their combined effect leading to retrieval-induced facilitation is not. Indeed, only a small handful of studies have been specifically designed to study retrieval-induced facilitation. Although the current results provide some evidence of retrieval-induced facilitation, the size of the effect was substantially smaller than that which has been previously observed. Specifically, collapsed across all four current experiments, the estimated effect size was 4% (95% CI = [2%, 6%]), compared to the 7-11% effect sizes observed by Chan (2009) and Chan et al. (2006). The reduced effect size should not be surprising, however, given that replication studies overall tend to produce effect sizes smaller than those initially reported (Patil et al., 2016). Interestingly, there was neither evidence of the effect varying as a function of final delay (1 day vs. 1 week vs. 2 weeks) nor was there evidence of the effect becoming larger on a second test following feedback (Experiments 2a and 2b).
The finding of significant retrieval-induced facilitation effects after 1- and 2-week delays (at least when looking at the results of the meta-analysis) is important. It fits well with the idea that retrieval-induced facilitation is caused by the activation of non-practiced items during retrieval practice, thereby giving non-practiced items a kind of covert retrieval practice that enhances their subsequent accessibility. Just as testing effects have been shown to remain significant across a  (Roediger & Karpicke, 2006b), so too, it appears, do the effects of retrieval-induced facilitation. This finding provides an interesting juxtaposition to retrieval-induced forgetting, which has generally been observed to become diminished over time.
In the first study to show evidence of retrieval-induced facilitation, Chan et al. (2006) employed prose passages (indeed, they were the same passages as those used by Chan (2009) and the current Experiments 1a and 2a) and found 9% and 7% effects of retrieval-induced facilitation in their Experiments 1 and 2, respectively. In their Experiment 3, Chan et al. manipulated the strategy employed by participants during retrieval practice (broad vs. narrow). Specifically, participants were asked to either minimize or maximize the breadth of their retrieval efforts (i.e., the extent to which they attempted to think of related facts for each question). Participants who used a broad strategy demonstrated a 10% retrieval-induced facilitation effect, whereas participants who used a narrow strategy demonstrated a 0% retrieval-induced facilitation effect. This finding suggests that facilitation effects may be larger when participants are led to consciously think about related information during retrieval. Interestingly, it does not appear that Chan gave participants the instruction to engage in broad retrieval practice in his 2009 experiments (and the current study followed suit), choosing to measure the consequences of retrieval practice when participants are not given any specific instructions at the time of retrieval practice. It is possible that a larger and more consistent effect of retrieval-induced facilitation would have been observed if, as part of the current study, participants had been instructed to undertake broad retrieval practice. Then, they may have been more likely to activate related information from the passage.
Another potential explanation for the relatively smaller effect sizes observed in our experiments compared to those by Chan et al. may be related to differences in the samples employed and how participants engaged in the retrieval-practice tasks. If participants in the studies reported by Chan et al. were more likely to naturally engage in the kind of broad retrieval strategies that promote retrieval-induced facilitation than the participants in our experiments, for example, then such a difference could have contributed to the discrepant effect sizes observed across the two studies.
It is worth noting that several studies have used other paradigms to observe evidence of retrieval-induced facilitation. In the work by Samenieh (2010, 2012), for example, retrieval practice was shown to enhance memory for items that suffered from either directed forgetting or context-dependent forgetting, presumably because retrieval reactivated certain contextual cues and therefore facilitated access to the individual items within that context. Similar effects have been observed in studies using categorical or contextually bound materials and free recall tests (e.g., Rowland & DeLosh, 2014). For example, retrieval practice can make participants more likely to think of practiced categories than non-practiced categories on a free recall test, leading to a retrieval-induced facilitation effect that is caused not by the retrieval-induced activation of individual items, but by the increased accessibility of practiced categories at final test. The current results (combined with those reported by Chan and colleagues) are important in that they suggest that retrieval-induced facilitation can also result from the activation of individual items during retrieval practice.
From an applied standpoint, the current results suggest that retrieval does not always cause the forgetting of non-retrieved information in memory. Instead, under the right conditions, retrieval has the potential to enhance the subsequent remembering of non-retrieved information (for related evidence in a classroom setting, see Cranney et al., 2009). The information studied by participants in these experiments reflected educational materials, and the long delays between retrieval practice and test (ranging from one day to two weeks) also reflected a typical student's studying habits, making these findings particularly relevant to the classroom. With the testing effect supporting the recall of practiced information, and retrieval-induced facilitation supporting the recall of related and unpracticed information, teachers and students would be well served to take advantage of retrieval practice (for example, through practice tests) to support learning.
The knowledge that retrieval-induced facilitation is a possible outcome of retrieval practice may help to allay concerns about the potential unintended effects of retrieval-induced forgetting. Indeed, it is possible that under typical educational settings (where materials tend to be cohesive and well integrated, and where learning is assessed following delays longer than a few minutes), retrieval-induced facilitation may occur far more often than retrieval-induced forgetting. To extend this work further, future research should explore (1) the extent to which retrieval-induced facilitation is observed in actual everyday learning environments, and (2) whether there are conditions that can be implemented during learning and retrieval practice to magnify the effects of retrieval-induced facilitation that are observed.

Author contributions
Both authors contributed to all stages of the study.

Funding
No funding was received for conducting this study.

Data availability
The data can be found on the Open Science Framework at https://osf.io/d8gsh/.

Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.

Ethical approval
The study was approved through the Institutional Review Board of University of California, Santa Cruz as an exempt protocol.

Informed consent
Informed consent was obtained from all individual participants included in the study.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.