Extensive research has established that, over a broad range of conditions and populations, some learning methods are more effective than others. Educationally relevant examples include spaced rather than massed practice (Carpenter et al., 2012; Cepeda et al., 2006; Cepeda et al., 2008; Mozer et al., 2009; Rohrer, 2015) and interleaved rather than blocked practice (Foster et al., 2019; Pan et al., 2019; Rohrer, 2012; Rohrer et al., 2020). A third finding in that vein, and the one on which we focus in the current paper, is that retrieval from memory after an initial study trial can produce better learning and retention than does a restudy activity after the initial study trial. For example, answering the question “what is the powerhouse of the cell?” yields a higher probability of retrieving “mitochondria” on a later test than does restudying “the powerhouse of the cell is the mitochondria,” particularly if correct answer feedback is provided after each test question (Kang et al., 2007; Pashler et al., 2005). That finding has been variously referred to as the retrieval practice effect, test-enhanced learning, and the testing effect.
A typical testing-effect experiment involves two sessions separated by a retention interval ranging from a few minutes to several weeks. The study and training phases are conducted during session 1 and the final test phase is administered during session 2. For the case of cued-recall testing that is explored in the current work, materials such as facts or word-pairs (e.g., lime-salt) are first studied intact. During training, half of the materials are restudied and half are tested (e.g., lime-?). In each of the current experiments, correct answer feedback (henceforth, feedback) was provided after each training phase test trial. During the final test in session 2, a cued-recall test is administered for all items. On the final test in such studies, the testing effect (TE) is defined for each participant as the proportion correct for items in the test condition, minus the proportion correct for items in the restudy condition.
The testing effect has been obtained in a variety of contexts (for reviews, see Karpicke et al., 2014; Kornell & Vaughn, 2016; Rickard & Pan, 2018; Roediger & Karpicke, 2006; Rowland, 2014; van den Broek et al., 2016). It has been observed not only in the laboratory, but also in multiple applied settings, including medical resident classroom learning (Larsen et al., 2009), medical skill learning (Kromann et al., 2009), college classrooms using clickers (Lantz, 2010), children’s spelling (Pan et al., 2015), high school classrooms (Nungester & Duchastel, 1982), and university classrooms (McDaniel et al., 2007).
The testing-effect paradigm allows for multiple informative variations. Manipulations explored in the literature include test type (Carpenter & DeLosh, 2005), material type (Kromann et al., 2009; Pan et al., 2015; Roediger & Karpicke, 2006), presence or absence of feedback (Kang et al., 2007), retention interval (Carpenter et al., 2008; Kornell et al., 2011), and blocked versus random sequencing of test and restudy trials during training (Abel & Roediger, 2017), among others. However, one basic aspect of the testing-effect paradigm has rarely been manipulated; namely, the degree of learning that is achieved prior to the training phase. Exploration of that topic should be of value from both theoretical and applied perspectives. For example, in educational contexts, some students are likely to have studied their notes or read a textbook once prior to a quiz, whereas other students may have done so two or more times, presumably yielding increased episodic knowledge of the material prior to the quiz. The question addressed here is whether the effectiveness of that quiz for producing new learning is moderated by that difference in prior, study-based learning.
We investigate that question by manipulating the number of study-phase item repetitions within the cued-recall testing-effect paradigm. We assume that study phase learning will increase monotonically with increasing repetitions. Over three experiments, experimental conditions involved one study phase trial per item (1x study repetition), four study phase trials per item (4x study repetitions), or eight study phase trials per item (8x study repetitions). Across all experiments, training phase exposure was held constant at one trial per item, in both the test and the restudy conditions. Although there have been a few papers in which the number of study repetitions has been varied over conditions (e.g., Roediger & Karpicke, 2006), there appears to have been no work that directly addresses the effect of increasing the amount of prior study on the testing effect magnitude.
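The design can be sketched as a session-1 trial schedule generator. This is an illustration only: the function name `build_session1`, the reshuffling of item order on each study cycle, and the random intermixing of test and restudy trials during training are assumptions, not a description of the actual experimental software. What the sketch does take from the text is that study repetitions vary (1x, 4x, or 8x) while each item receives exactly one training trial, with half of the items tested and half restudied.

```python
import random
from typing import NamedTuple


class Trial(NamedTuple):
    phase: str   # "study" or "training"
    kind: str    # "study", "restudy", or "test"
    cue: str
    target: str


def build_session1(pairs: list[tuple[str, str]], study_reps: int,
                   seed: int = 0) -> list[Trial]:
    """Hypothetical session-1 schedule: `study_reps` intact study trials per
    pair, then exactly one training trial per pair (half test, half restudy)."""
    if study_reps < 1:
        raise ValueError("this sketch requires at least one study trial per pair")
    rng = random.Random(seed)
    trials: list[Trial] = []

    # Study phase: every pair shown intact once per cycle; order reshuffled per cycle.
    for _ in range(study_reps):
        order = pairs[:]
        rng.shuffle(order)
        trials += [Trial("study", "study", c, t) for c, t in order]

    # Training phase: a random half of the pairs is tested (cue only, then
    # feedback) and the rest restudied intact; one trial per pair for every
    # value of study_reps.
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    training = [Trial("training", "test", c, t) for c, t in shuffled[:half]]
    training += [Trial("training", "restudy", c, t) for c, t in shuffled[half:]]
    rng.shuffle(training)  # assumption: intermixed at random rather than blocked
    return trials + training


schedule = build_session1([("lime", "salt"), ("cat", "dog")], study_reps=4)
print(sum(t.phase == "study" for t in schedule), "study trials")
```

Only the length of the study phase changes across the 1x, 4x, and 8x conditions; the training phase is identical in size and composition, which is the property the experiments rely on.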
There are three exhaustive possibilities for the effect of study repetition in the current experiments. One hypothesis is that increasing that repetition, while holding training phase exposure constant, will decrease the efficacy of the training phase test relative to restudy (the attenuation hypothesis). The dual-memory model of Rickard and Pan (2018), described in the next section, predicts that outcome. An empirical phenomenon that is potentially consistent with the attenuation hypothesis is the pretesting effect (e.g., Richland et al., 2009), in which a test trial with feedback can yield substantially more learning than does a time-equated study trial, even though there is no prior study in either case. The final test performance advantage for pretesting can be as large as or larger than that of the testing-effect paradigm (Pan & Sana, 2021; cf. Latimier et al., 2019). In the current notation, pretesting constitutes the extreme case of 0x study phase repetition.
Alternatively, increased learning through study repetition may not attenuate the effect of a test, but rather enhance it (the enhancement hypothesis). That hypothesis might hold, for example, if (1) more study phase repetition yields more learning, as expected, (2) more study phase learning yields a higher proportion correct on the training test, as expected, and (3) learning through that test is greater when the correct answer is retrieved from memory than when it is provided through feedback. In that scenario, the learning advantage of a training phase test versus restudy should be enhanced with increased study phase repetition, all else held constant.
That scenario appears to be predicted by the episodic context theory of retrieval practice (Karpicke et al., 2014). That theory posits that (1) correct retrieval (but not incorrect retrieval) reinstates more of the episodic context that was encoded during an earlier study trial than does restudy, and (2) greater episodic context reinstatement on training test trials is the primary basis for the testing effect. Hence, according to that model, the higher the accuracy rate on the training test, the greater the predicted advantage for testing. Furthermore, the degree of episodic context encoding that occurs during the study phase should be an increasing function of the number of study phase repetitions, increasing the upper bound for the degree of context reinstatement that can occur on a training phase test. The episodic context account thus appears to be uniquely consistent with the enhancement hypothesis. Speaking against that possibility, however, is preliminary evidence that learning on a test trial is not causally influenced by retrieval accuracy when there is immediate feedback (Kornell et al., 2015; Rickard, 2020).
A third and final possibility is that the amount of study phase learning neither attenuates nor enhances (i.e., is independent of) the efficacy of testing relative to restudy (the no-effect hypothesis).
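The three-step enhancement scenario, and the evidence that speaks against it, can be made concrete with a toy calculation in arbitrary latent-strength units. The function and its gain parameters are hypothetical and are not estimates from any model or dataset cited here. If a correctly retrieved answer yields a larger gain than an incorrect response followed by feedback, the expected advantage of a test over a restudy trial grows with training-test accuracy, and hence with study repetition. If the two gains are equal, as the evidence from Kornell et al. (2015) and Rickard (2020) suggests, training accuracy drops out and the advantage is constant.

```python
def expected_test_advantage(train_acc: float, gain_retrieved: float,
                            gain_feedback: float, gain_restudy: float) -> float:
    """Hypothetical expected strength gain of one test trial minus that of one
    restudy trial. A correct retrieval yields gain_retrieved; an error followed
    by feedback yields gain_feedback (arbitrary latent-strength units)."""
    if not 0.0 <= train_acc <= 1.0:
        raise ValueError("train_acc must be a proportion")
    gain_test = train_acc * gain_retrieved + (1.0 - train_acc) * gain_feedback
    return gain_test - gain_restudy


# Illustrative training-test accuracies assumed to rise with study repetition.
for acc in (0.3, 0.6, 0.8):
    enhance = expected_test_advantage(acc, 1.0, 0.6, 0.5)  # retrieval > feedback
    flat = expected_test_advantage(acc, 0.8, 0.8, 0.5)     # retrieval == feedback
    print(f"acc={acc:.1f}  enhancement case={enhance:.2f}  equal-gain case={flat:.2f}")
```

The sketch treats learning as additive expected gains; it is meant only to show why the enhancement hypothesis hinges on step (3) of the scenario, not to model any particular theory.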
The hypotheses outlined above are best understood in terms of relative underlying memory strengths for restudied and tested items. However, the dependent measure in the current experiments, and in the vast majority of the literature, is proportion correct. More specifically, it is the difference in final test proportion correct in the test and restudy conditions. Therefore, ceiling effects are a potential issue with data interpretation on this topic. For example, as the proportion correct in the restudy condition approaches one, the maximum TE magnitude must drop toward zero. That fact creates a confound if final test restudy proportion correct increases to a high level with increasing study phase repetition and the TE is observed to decrease with increasing study repetition. In that case, it would not be possible, based on the proportion correct measure alone, to differentiate between the hypothesis of process-based attenuation and a mere ceiling effect on the TE.
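The ceiling problem can be illustrated by holding a latent advantage for tested items fixed and letting restudy performance rise. The sketch below assumes, purely for illustration, a probit (standard normal CDF) mapping from latent strength to proportion correct and a fixed latent test advantage of 0.5; neither assumption comes from the current experiments or from the dual-memory model. With that fixed advantage, the observed TE falls from about .19 when restudy proportion correct is .50 to under .02 when it is about .98, even though the underlying testing benefit has not changed.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def te_from_latent(restudy_strength: float, test_advantage: float) -> tuple[float, float]:
    """Map latent strengths to final-test proportions correct through an assumed
    probit link and return (p_restudy, TE)."""
    p_restudy = phi(restudy_strength)
    p_test = phi(restudy_strength + test_advantage)
    return p_restudy, p_test - p_restudy


for mu in (0.0, 1.0, 2.0):
    p_r, te = te_from_latent(mu, 0.5)
    # 1 - p_r is the largest TE that proportion correct can express at this level.
    print(f"p_restudy={p_r:.3f}  TE={te:.3f}  ceiling bound={1 - p_r:.3f}")
```

Under this assumed link, any fixed latent advantage produces a shrinking proportion-correct TE once restudy performance is above .5, which is exactly the confound between process-based attenuation and a ceiling effect described above.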
In the current experiments, we used two approaches to address that issue. First, the experimental designs were similar in most respects to those in our prior work on this topic, in which restudy proportion correct on the final test rarely exceeded .5, reducing the possibility of a large ceiling effect confound. Second and more decisively, we used the proportion correct prediction of the dual-memory model of test-enhanced learning (Rickard & Pan, 2018) for the case of 1x study repetition as a reference prediction not just for the 1x study groups of the current experiments – the case for which that model was originally developed – but also for the 4x and 8x study groups. The logic behind that approach is that the dual-memory prediction for the 1x study case has proven highly accurate across multiple cued-recall datasets with 1x study phase repetition (Pan & Rickard, 2018; Rickard, 2020). As we elaborate below, if the model prediction for the 1x study repetition case holds for all study repetition conditions in the current experiments, then the no-effect hypothesis will be supported. Alternatively, if the TE magnitude in the 4x and 8x study groups is smaller than that model predicts for the 1x study case, then the attenuation hypothesis will be supported, and if the TE magnitude in those two groups is larger than the model prediction for the 1x case, then the enhancement hypothesis will be supported. That approach does not require direct comparison of observed testing effect magnitudes across different levels of study phase repetition, and thus it circumvents the potential issue of proportion-correct scaling effects.
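The comparison logic can be sketched as a decision rule applied to one study-repetition group. The sketch is hypothetical: it takes the 1x-study reference prediction as a given number (`predicted_te`) rather than computing it from the dual-memory model, and it uses a percentile bootstrap confidence interval on the mean participant TE, which is one possible test and not necessarily the analysis reported in this paper. A confidence interval that contains the reference value is only consistent with the no-effect hypothesis; it does not establish it.

```python
import random
import statistics


def classify_against_reference(te_by_participant: list[float], predicted_te: float,
                               n_boot: int = 5000, alpha: float = 0.05,
                               seed: int = 1) -> str:
    """Compare a group's observed mean TE with a reference (1x-study) prediction
    using a percentile bootstrap confidence interval for the mean."""
    n = len(te_by_participant)
    if n == 0:
        raise ValueError("need at least one participant")
    rng = random.Random(seed)
    # Resample participants with replacement and record each resample's mean TE.
    boot_means = sorted(
        statistics.fmean(rng.choices(te_by_participant, k=n)) for _ in range(n_boot)
    )
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    if hi < predicted_te:
        return "attenuation"    # observed TE reliably below the 1x reference
    if lo > predicted_te:
        return "enhancement"    # observed TE reliably above the 1x reference
    return "no-effect"          # reference value lies inside the interval
```

Because each group is compared with the same 1x reference value rather than with the other groups, the rule mirrors the paper's strategy of avoiding direct cross-condition comparisons of proportion-correct TEs.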