Introduction

One of the most robust findings in cognitive psychology is that the practice of retrieving information leads to better learning than re-studying it (Bjork & Bjork, 1992; Carpenter, 2009, 2011, 2012; Carrier & Pashler, 1992; Dunlosky et al., 2013; Roediger & Butler, 2011). However, much of this work has involved assessments of memory performance. Recent work has attempted to extend the benefits of retrieval practice to tasks that involve problem-solving transfer, which requires more than memorization. Consider the solution strategy to the problem in Fig. 1A. Although applying this strategy to analogous problems (see Fig. 1B–D) involves memorizing it, it also involves learning to recognize when and how to use it (Corral et al., 2021). If either of these latter two processes fails, the learner will be unable to transfer a solution strategy to new situations, even if it has been memorized (Gick & Holyoak, 1987).

Fig. 1

Each problem used in Experiments 1–4 (taken from Catrambone & Holyoak, 1989), along with the corresponding correct solutions. Panel A shows Duncker’s (1945) radiation problem, Panel B shows the invading general problem, Panel C shows the fire chief problem, and Panel D shows the aquarium problem

Accordingly, learners often fail to apply solutions from previous problems to novel, analogous scenarios that differ superficially (e.g., Butler, 2010; Gick & Holyoak, 1980, 1983). However, learners can solve such problems when they are reminded to think about how a novel problem relates to previous examples. These findings suggest that learners can successfully acquire and apply to-be-learned solution strategies but struggle to recognize when to use them (formally known as the inert knowledge problem; Whitehead, 1929). Thus, for retrieval practice to aid problem-solving transfer, it must help learners (a) acquire the corresponding solution strategy and (b) recognize when to apply it.

Present theories on the benefits of retrieval practice focus on how retrieval strengthens memory of the information that is retrieved, but do not directly explain how retrieval aids the transfer of learning (Carpenter et al., 2022; Pan & Rickard, 2018). One possibility is that when learners attempt to solve a problem, it allows them the opportunity to retrieve and apply the solution strategy. This opportunity is not afforded to learners when they study worked examples, as worked examples already contain the solution strategy and its application. Based on the voluminous literature on retrieval practice, problem-solving practice should therefore produce better learning and memory of to-be-learned solution strategies, leading to superior problem solving.

It is also possible that problem-solving practice enhances memory of previous problems, which might help learners recognize the similarity among old problems and novel analogs. Critically, this recognition can help learners figure out when to use the correct solution strategy.

These hypotheses offer a mechanistic account of how retrieval practice (via problem-solving practice) can facilitate problem-solving transfer. Nevertheless, these ideas have not been empirically examined and are presently open questions.

On the other hand, research on the worked example effect has found evidence against the benefits of retrieval practice on problem-solving tasks, as subjects who study worked examples often outperform subjects who engage in problem-solving practice (Cooper & Sweller, 1987; Sweller & Cooper, 1985; Van Gog & Kester, 2012; Van Gog et al., 2011). However, these studies often do not provide feedback with problem-solving practice. Feedback is critical for learning (Benassi et al., 2014) and is necessary for complex concept acquisition (Corral & Carpenter, 2020). Thus, not presenting feedback during problem-solving practice might undermine learning. In such cases, it is unclear whether the results are due to the advantage of studying worked examples over problem-solving practice, or if those results are driven by differences in feedback presentation.

Recent studies have applied retrieval practice to problem solving by having subjects read a problem scenario and retrieve its surface details (instead of solving the problem; Hostetter et al., 2018; Peterson & Wissman, 2018). Although this approach has not yielded a clear benefit of retrieval over study, retrieving a problem’s surface features can lead to forming incorrect representations, which can inhibit learning and transfer (Anderson, 1993; Corral & Jones, 2014; Holyoak & Koh, 1987; Ross, 1987, 1989; Sweller et al., 1983).

In the present paper, we explore the effects of retrieval practice and example study on problem-solving transfer through a design that addresses the aforementioned limitations. We report four experiments that compare problem-solving practice and example study. If problem-solving practice enhances memory for a solution strategy and for when to apply it, then problem-solving practice should lead to the best performance on a posttest involving application of the solution strategy to a new problem.

Experiment 1

Subjects were presented with four analogous problems from Catrambone and Holyoak (1989). First, subjects were asked to study a single problem scenario along with its solution strategy. Subjects in the example condition were then presented with two more example problems to study, whereas subjects in the problem-solving practice condition were asked to solve two problems, each followed by correct-answer feedback. Subjects returned 1 week later and completed a posttest that consisted of a novel problem.

Methods

Subjects

One hundred thirty-eight undergraduate students from Iowa State University (ISU) participated in this experiment for course credit in an introductory psychology course. Approximately 67% of students who attend ISU are 21 years of age or younger, approximately 43% identify as female, and approximately 75% are White.

Thirty-one subjects did not return for the second part of the experiment (i.e., the posttest) and were not included in any of the analyses. All reported analyses were thus based on the remaining 106 subjects: (a) problem-solving practice (n = 54) and (b) example study (n = 52).

Experiments 1–3 were approved by the Institutional Review Board at ISU. The sample sizes for Experiments 1–3 were approximated based on an a priori power analysis with 80% power to detect a medium effect size (f = .25; α = .05).
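A minimal sketch of this computation in Python, assuming the power analysis was conducted for a between-subjects ANOVA on Cohen’s f (as in G*Power); statsmodels accepts f directly as the effect size:

```python
# Sketch: a priori power analysis for Experiments 1-3 (f = .25, alpha = .05,
# 80% power, two groups as in Experiment 1).
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(effect_size=0.25,  # Cohen's f
                                        alpha=0.05, power=0.80, k_groups=2)
print(round(n_total))  # total N across both groups (~128)
```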

Design and materials

Subjects were randomly assigned to one of two conditions: (a) problem-solving practice and (b) example study. All instructions and materials were presented on a computer monitor on a white background at the center of the screen. Subjects entered all responses using a computer keyboard and mouse.

The materials consisted of four analogous problems (see Fig. 1), which were taken from Catrambone and Holyoak (1989; also see Gick & Holyoak, 1980, 1983). These problems consisted of Duncker’s (1945) radiation problem and three analogs. The problems all differ superficially but share the same structure: there are enough resources/forces in the problem scenario to reach the problem’s goal state, but doing so requires fully concentrating those resources/forces on a given target, and an obstruction prevents this from occurring. Because of their shared structure, these problems can all be solved with the same solution strategy, wherein the resources/forces in the scenario are divided along alternate paths and then converge simultaneously on the specified target (i.e., a divide-and-converge strategy).

The correct response for each problem was broken down into three components: (a) dividing the resources/forces in the problem, (b) sending the resources/forces down different paths that surround the target location, and (c) having the resources/forces converge on the target location simultaneously (for a similar approach to scoring these types of problems, see Snoddy & Kurtz, 2021). Responses were scored on a scale of 0–1. A response was scored as fully correct if all three solution components were included. Partial credit was awarded for responses that included one (1/3 credit) or two (2/3 credit) of these components. Posttest responses were scored blind to condition; learning phase responses from subjects in the problem-solving practice condition were scored blind to the order in which the problems were presented. This same scoring scheme was used for Experiments 2–4.
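For illustration, this rubric reduces to a short function (the function name and arguments are hypothetical; only the three-component, partial-credit rule comes from the text):

```python
# Sketch: score a response by the presence of the three solution components.
def score_response(divides: bool, alternate_paths: bool, converges: bool) -> float:
    """Award 1/3 credit per solution component present (0-1 overall)."""
    return sum([divides, alternate_paths, converges]) / 3

# A response that divides the forces and sends them down different paths,
# but omits simultaneous convergence, earns 2/3 credit.
print(score_response(True, True, False))  # 0.666...
```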

Procedure

This experiment consisted of two parts: the learning phase occurred in the first part and the posttest occurred in the second part, 1 week later. During the learning phase, all subjects were presented with three of the four problems (one at a time). At the beginning of the learning phase, all subjects were presented with one of these problems along with the corresponding solution (as shown in Fig. 1) and were asked to carefully read and study them.

Subjects in the example study condition were then shown two more of these example problems (one at a time), along with their corresponding solutions. Subjects in the problem-solving practice condition, after being shown the first example problem, were asked to solve two problems (one at a time). These subjects were asked to type their solution into a textbox presented directly beneath the scenario. After responding, they were presented with correct-answer feedback (as shown in Fig. 1), wherein they were shown the solution to the problem and were asked to carefully study the problem and its solution. Subjects in both conditions were therefore presented with identical materials and feedback. Thus, the only difference between the conditions was whether subjects were asked to solve each problem prior to being shown its solution.

All aspects of the experiment (i.e., example and feedback study, problem solving) were self-paced. To move to the next problem, subjects clicked a continue button, presented with a prompt near the bottom right of the screen, which notified them that they could move on when ready. For each problem that subjects were asked to solve (i.e., problem-solving practice and posttest problems), they were required to type a response into the textbox before they were allowed to see the correct solution (for problem-solving practice problems) or move on (for posttest problems).

After completing the learning phase, all subjects were thanked for their participation and were presented with a prompt reminding them to return 1 week later for the second part of the experiment. Table 1 shows mean completion times on the learning phase partitioned by training condition for each experiment. Subjects took approximately 8 min to complete the learning phase (M = 8.03 min).

Table 1 Mean time subjects in each training condition took to complete the learning phase in each experiment

Upon returning 1 week later, all subjects were given a posttest in which they were asked to solve one problem. For all subjects, the posttest problem was the problem that was withheld during the learning phase. The order in which the problems were presented was randomized for each subject but was constrained by pseudo-counterbalancing, wherein the same presentation orders were used equally often in each condition. Thus, the order in which the problem scenarios were presented was identical across training conditions. Table 2 shows mean posttest performance for each problem scenario (collapsed across training conditions).

Table 2 Mean performance on each posttest problem scenario in each experiment (collapsed across training conditions)
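The pseudo-counterbalancing of presentation orders described above can be sketched as follows; this is a hedged illustration with made-up labels, not the authors’ actual assignment code. A pool of three-problem orders is shuffled once and then cycled through identically within each condition, so every order is used equally often across conditions:

```python
# Sketch: build a pool of presentation orders and reuse it in each condition.
import itertools
import random

problems = ["radiation", "general", "fire_chief", "aquarium"]
# Each subject sees 3 problems during learning; the 4th (withheld) problem
# serves as that subject's posttest item.
orders = list(itertools.permutations(problems, 3))
random.shuffle(orders)

def assign_orders(n_subjects):
    # Cycle through the same pool so each order appears equally often.
    return [orders[i % len(orders)] for i in range(n_subjects)]

example_orders = assign_orders(52)   # example study condition
practice_orders = assign_orders(54)  # problem-solving practice condition
```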

Results and discussion

Table 3 shows mean performance for each problem that was solved in the learning phase and the posttest across each experiment. Table 4 shows mean performance on the posttest partitioned by condition in each experiment. Because time spent on the learning phase was allowed to vary across conditions, it was included as a covariate in all experiments.

Table 3 Mean performance on learning phase and posttest problems for subjects in the problem-solving and mixed study conditions in each experiment
Table 4 Adjusted (for time on learning phase) and unadjusted mean performance on the posttest (and repeated problems), partitioned by condition, in each experiment

An analysis of covariance (ANCOVA), with time on the learning phase as a covariate, revealed no reliable posttest differences between conditions (see top row of Table 4), F(1,103) = 0.028, p = .868, MSE = .119, ηp² = .000. Furthermore, a Bayesian version of this ANCOVA found support for the null hypothesis (BF = .209), as these results were 4.80 times more likely to occur under the null model. Thus, when the problem-solving and example study conditions were carefully controlled and differed only in whether subjects attempted to solve the problems, subjects in both conditions produced similar levels of transfer on the posttest.
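The frequentist ANCOVA can be sketched in Python as follows (file and column names are hypothetical); the final line simply inverts the reported Bayes factor to express evidence for the null:

```python
# Sketch: ANCOVA on posttest score with condition as the factor and time on
# the learning phase as the covariate.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("experiment1.csv")  # hypothetical file
model = smf.ols("posttest ~ C(condition) + time", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # Type II ANCOVA table

bf10 = 0.209  # Bayes factor reported above
print(f"{1 / bf10:.2f} times more likely under the null")  # ~4.78
```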

Experiment 2

In Experiment 1, the posttest occurred 1 week after the learning phase. One possibility is that differences in learning exist between the training conditions, but that information was forgotten over the delay, which obscured differences in posttest performance. Thus, we conducted a second experiment, identical to Experiment 1, except that the posttest was immediate.

Methods

Subjects and procedure

One hundred fifteen undergraduate students from ISU participated in this experiment for course credit in an introductory psychology course. Aside from replacing the delayed posttest with an immediate posttest, the design, materials, and procedures were identical to those of Experiment 1. Subjects took approximately 11 min to complete the learning phase (M = 10.62 min; see second row of Table 1). After completing the learning phase, all subjects were notified that they would be shown one more scenario and would be required to solve it. Subjects were then given an immediate posttest, which consisted of a novel problem scenario.

Results and discussion

An ANCOVA (with learning phase time as a covariate) revealed no reliable posttest differences between conditions, F(1,112) = 3.570, p = .062, MSE = .117, ηp² = .031. A follow-up Bayesian version of this ANCOVA found support for the null hypothesis (BF = .278), as the results were 3.60 times more likely to occur under the null model. These findings replicate the results from Experiment 1 and demonstrate that when the training conditions are carefully controlled, both produce similar levels of transfer on an immediate posttest.

Experiment 3

One possibility is that the results from Experiments 1 and 2 were due to floor effects, wherein subjects failed to learn enough during the learning phase to solve novel, analogous problems. Another possibility is that for problem-solving practice to be effective, learners must have enough knowledge about the problem type before engaging in problem solving (see Corral et al., 2020). Subjects in the problem-solving practice condition were only presented one example to study, which may have been insufficient for them to learn enough about the problem type to benefit from problem-solving practice.

Thus, in addition to the problem-solving practice and example study conditions, Experiment 3 included a control condition, wherein subjects attempted to solve each of the four problems, but were not shown any examples nor provided feedback. This condition provides a baseline measure of how well subjects can solve these problems without any training, and thus allows for a direct assessment of whether the training conditions produce sufficient learning to facilitate the transfer of knowledge to analogous, novel scenarios. Furthermore, to allow subjects greater opportunity to acquire the necessary knowledge about the to-be-learned problem type before engaging in problem-solving practice, a mixed study condition was included, in which subjects studied two examples before attempting to solve a problem.

If learning occurs during training, subjects who receive training should outperform control subjects on the posttest. Furthermore, if subjects require more than one example to benefit from problem-solving practice, then perhaps studying a second example before engaging in problem solving is beneficial. If so, subjects in the mixed study condition should perform best on the posttest.

Methods

Subjects

Two hundred twenty-eight undergraduate students from ISU participated in this experiment for course credit in an introductory psychology course.

Design, materials, and procedure

Subjects were randomly assigned to one of four conditions: (a) mixed study (n = 58), (b) problem-solving practice (n = 56), (c) example study (n = 58), and (d) control (n = 56). The example study and problem-solving practice conditions were identical to those in Experiments 1 and 2.

In the mixed study condition, after being presented with the first example, subjects were shown a second example to study (as in the example study condition). Next, they were presented with a third problem and were asked to solve it. After entering a response, they were shown the correct solution and were asked to study it (as in the problem-solving practice condition). Subjects took approximately 7 min to complete the learning phase (M = 7.45 min; see third row of Table 1).

In the control condition, subjects were asked to solve four problems (one at a time); no feedback was presented on any of these problems. All other procedures and materials were identical to Experiment 2.

Results and discussion

Control versus training conditions

For the problems that control subjects completed, a repeated-measures ANOVA revealed no differences in performance between the first (M = .208, SE = .039), second (M = .220, SE = .044), third (M = .226, SE = .040), and fourth problems (M = .232, SE = .046), F(3,165) = 0.057, p = .982, MSE = 0.102, ηp² = .001. For this reason, control subjects’ mean performance across these problems was compared to training subjects’ posttest performance.
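A sketch of this repeated-measures analysis using pingouin, assuming long-format data with one row per subject–problem pair (file and column names are illustrative):

```python
# Sketch: repeated-measures ANOVA over the four control problems.
import pandas as pd
import pingouin as pg

control_df = pd.read_csv("experiment3_control.csv")  # hypothetical file
aov = pg.rm_anova(data=control_df, dv="score", within="problem",
                  subject="subject", detailed=True)
print(aov)
```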

An ANOVA revealed that subjects in the training conditions performed better on the posttest than control subjects, F(3,224) = 6.380, p < .001, MSE = 0.110, ηp² = .079; this outcome occurred for each of the training conditions (all ps < .007 and all ds > 0.611; see unadjusted means in Table 4). Thus, subjects in each training condition were able to learn and comprehend the problem solutions well enough to transfer this knowledge to novel, analogous problems.
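The control-versus-training comparison can be sketched in the same style, assuming one row per subject with a four-level condition column (names are again illustrative):

```python
# Sketch: one-way ANOVA plus pairwise follow-ups with Cohen's d.
import pandas as pd
import pingouin as pg

df = pd.read_csv("experiment3.csv")  # hypothetical file
print(pg.anova(data=df, dv="score", between="condition"))
print(pg.pairwise_tests(data=df, dv="score", between="condition",
                        effsize="cohen"))
```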

Training conditions

An ANCOVA (with time on the learning phase as a covariate) revealed no posttest differences among the training conditions, F(2,168) = 1.952, p = .145, MSE = 0.123, ηp² = .023 (see third row of Table 4). Moreover, a Bayesian version of this ANCOVA found support for the null hypothesis (BF = .297), as the results were 3.37 times more likely to occur under the null model. Given that subjects in the training conditions demonstrated better problem solving than control subjects, these findings indicate that each of the training conditions facilitated learning and transfer, but did so to a comparable degree.

Experiment 4

Experiments 1–3 show that subjects who engage in problem-solving practice do not transfer solutions to new problems better than example study subjects. Better memory of the content that learners practice retrieving is arguably the primary mechanism through which retrieval practice benefits learning (Butler et al., 2017; Pan & Rickard, 2018). Thus, if subjects who engage in problem-solving practice do not have better memory of a problem’s solution strategy than example study subjects, it would highlight a critical limitation in using retrieval practice to improve problem solving. However, Experiments 1–3 do not reveal whether problem-solving practice fails to benefit memory for the problem solution, application of that solution to a new problem, or both.

We therefore conducted a fourth experiment, which was similar to Experiment 3, but did not include a control condition. After the posttest, subjects were asked to solve the problems they were presented with during the learning phase (i.e., repeated problems). If problem-solving practice benefits memory of solution strategies, subjects in the problem-solving practice and mixed study conditions should outperform example study subjects on repeated problems.

Methods

Subjects

Three hundred forty-one undergraduate students from Syracuse University participated in this experiment for course credit in an introductory psychology course. This experiment was approved by the Institutional Review Board at Syracuse University. The experiment site was changed from the previous experiments because the first author changed institutions, which afforded the opportunity to increase the external validity of our findings and extend them to a different and more diverse population. Approximately 61% of students who attend Syracuse University are 21 years of age or younger, 54% identify as female, and approximately 56% are White.

To decrease the chance of committing a Type II error, the present experiment adopted a more conservative a priori power analysis than Experiments 1–3. We based the sample size for Experiment 4 on an a priori power analysis with at least 90% power to detect a small-to-medium effect size (f = .20; α = .05).
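Under the same assumptions as the sketch for Experiments 1–3, this computation differs only in its parameters (90% power, f = .20, three conditions):

```python
# Sketch: a priori power analysis for Experiment 4.
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(effect_size=0.20, alpha=0.05,
                                        power=0.90, k_groups=3)
print(round(n_total))  # total N across the three conditions (~321)
```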

Design, materials, and procedure

Subjects were randomly assigned to one of three conditions: (a) example study (n = 116), (b) problem-solving practice (n = 109), and (c) mixed study (n = 116). Apart from not including a control condition, the design and procedure in Experiment 4 were identical to those of Experiment 3 up through the posttest. Subjects took approximately 10 min to complete the learning phase (M = 10.34 min; see fourth row of Table 1).

After the posttest, all subjects were asked to solve the same problem scenarios that they were presented with during the learning phase. The order in which these problems were presented was randomized for each subject. No feedback was presented on repeated problems.

Lastly, to get a better sense of subjects’ knowledge about the problems, we asked them two supplementary questions: (a) whether they recognized any connection or commonality among the problem scenarios, and if so, what it was (structural similarity question), and (b) whether they used a common rule or solution for solving the problems, and if so, what it was (solution similarity question). For each of these questions, subjects were asked to type their response into a textbox, which was located directly beneath the question.

Scoring of similarity questions

Both structural and solution similarity questions were scored on a scale of 0–4. As noted earlier, the problem scenarios consist of the same relational structure, such that there are (a) sufficient resources/forces to reach the goal state, but (b) doing so requires concentrating all available resources on a given point, and (c) there is an obstacle that prevents this from happening. Accordingly, these problems can be solved using the same solution strategy, as the (a) resources/forces in the scenario must be partitioned and (b) sent along different routes and then (c) converge simultaneously on the target. Thus, there are three shared structural components in the problem scenarios and three corresponding shared components in the solution strategy.

Structural similarity questions. For the structural similarity questions, subjects received a 0 if they reported not recognizing any commonalities among the problem scenarios; subjects received a 1 if they reported noticing a commonality, but did not note any of the problems’ structural or solution components, a 2 if they noted one of the problems’ structural or solution components, a 3 if they noted two of the problems’ structural or solution components, and a 4 if they noted three of the problems’ structural or solution components.

Solution similarity questions. For the solution similarity questions, subjects received a 0 if they reported not using any similar or common solution to solve the problems; subjects received a 1 if they reported using a common or similar solution but did not note any of the problems’ solution components, a 2 if they noted one of the problems’ solution components, a 3 if they noted two of the problems’ solution components, and a 4 if they noted three of the problems’ solution components. Table 5 includes mean scores on both similarity questions partitioned by training condition.

Table 5 Mean scores on structural and solution similarity questions partitioned by training condition in Experiment 4
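The 0–4 rubric for both similarity questions reduces to a short function (name and arguments are hypothetical; only the rubric itself comes from the text):

```python
# Sketch: score a similarity response on the 0-4 rubric.
def score_similarity(reported_commonality: bool, n_components: int) -> int:
    """0 = no commonality reported; 1 = commonality reported but no shared
    components noted; 2-4 = one to three shared components noted."""
    if not reported_commonality:
        return 0
    return 1 + max(0, min(n_components, 3))

print(score_similarity(True, 2))   # noted two components -> 3
print(score_similarity(False, 0))  # reported no commonality -> 0
```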

Results and discussion

Figure 2 shows mean performance on the posttest and repeated problems for each condition. To examine performance differences among the training conditions, we conducted a mixed ANCOVA, with training condition as a between-subjects factor (mixed study vs. problem-solving practice vs. example study), learning phase time as a covariate, and test phase as a within-subjects factor (posttest vs. repeated problems). A reliable interaction was observed between training condition and test phase, F(2,337) = 5.29, p = .005, MSE = 0.056, ηp² = .030, such that performance differences among conditions depended on test phase.
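One way to sketch this analysis in Python is a linear mixed model, since pingouin’s mixed_anova does not accept a covariate; a random intercept per subject stands in for the within-subject error structure (file and column names are hypothetical):

```python
# Sketch: mixed-model approximation of the condition x phase ANCOVA.
import pandas as pd
import statsmodels.formula.api as smf

# Long format: two rows per subject (phase = posttest vs. repeated), with
# 'condition', 'time' (learning phase minutes), 'score', and 'subject'.
long_df = pd.read_csv("experiment4_long.csv")  # hypothetical file
model = smf.mixedlm("score ~ C(condition) * C(phase) + time",
                    data=long_df, groups="subject").fit()
print(model.summary())  # condition:phase terms test the interaction
```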

Fig. 2

Mean performance and standard errors of the mean on the posttest and repeated problems for each condition in Experiment 4

Specifically, no performance differences were observed on the posttest, F(2,337) = 0.476, p = .622, MSE = 0.128, ηp² = .003. Additionally, a follow-up Bayesian ANCOVA (as in Experiment 3) revealed very strong evidence for the null hypothesis (BF = .014), as the posttest results were 73.39 times more likely to occur under the null model.

However, performance differences did emerge on repeated problems, F(2,337) = 3.36, p = .036, MSE = 0.093, ηp² = .020, as subjects in the mixed study and problem-solving practice conditions performed better on repeated problems than subjects in the example study condition (both ps < .050).

Thus, subjects who engaged in problem solving during learning (i.e., mixed study and problem-solving practice conditions) were better able to retrieve and apply the solutions to those same problems later than subjects in the example study condition. Nevertheless, this benefit did not lead to better transfer on the posttest.

General discussion

Across four experiments (also see the combined analysis in the Appendix), subjects in the training conditions performed comparably on the posttest. Experiment 3 suggests that this finding is due to the training conditions producing similar levels of transfer, as subjects in all of the training conditions outperformed control subjects on the posttest. This finding indicates that the training conditions facilitated the transfer of learning (otherwise posttest differences between the training and control conditions should not have emerged). Further support for this conclusion comes from subjects in the problem-solving practice and mixed study conditions generally performing better on novel problems as the study progressed (see Table 3). Thus, the null findings reported among the training conditions were the result of these subjects being able to transfer what they learned to a comparable degree.

Furthermore, Experiment 4 showed that subjects who engaged in problem-solving practice performed better than subjects who engaged in example study on repeated problems. This outcome can be thought of as a type of testing effect, as subjects who had the opportunity to practice retrieving solution strategies during training were better able to recall those solutions than subjects who studied examples. This result is similar to recent data showing that problem-solving practice benefits subsequent performance on identical problems (Yeo & Fazio, 2019). Critically, however, this superior memory was not enough to produce differential posttest transfer among the training conditions.

These results are in line with recent work suggesting that memory of to-be-learned material is necessary but insufficient for the transfer of learning (Butler et al., 2017): memory alone does not facilitate transfer unless learners also recognize the relevance of the learned information in the current situation.

Indeed, in Experiment 4, although problem-solving practice and mixed study subjects had the best memory for the solution strategies, the similarity questions revealed no condition differences in subjects’ recognition of the relevance of these solution strategies across different problem scenarios (see Table 5). These findings might explain why transfer occurs more often when learners are provided with hints about the relevance of learned information to a current situation (Barnett & Ceci, 2002; Butler, 2010; Gick & Holyoak, 1983), as learners often do not recognize that their prior knowledge is applicable.

Thus, even when learners have superior knowledge of a problem’s solution strategy, this knowledge is not enough to facilitate the transfer of learning. This takeaway highlights that other aspects of problem solving might be particularly important for facilitating transfer (e.g., recognizing a problem type’s structure; see Corral & Kurtz, 2023). Accordingly, work in mathematics suggests that students struggle not with learning a problem’s solution strategy or with how to use it, but with recognizing when to apply it (Mayer, 1998). Therefore, although retrieval practice can aid learners’ knowledge of solution strategies, our findings suggest that additional training is needed to facilitate learners’ application of that knowledge to new situations.

These findings offer important new insights into the mechanistic relationship between retrieval practice and the transfer of learning. Relative to studying, retrieval practice appears to benefit the learning and memory component involved in the transfer of learning. However, one limitation is that this benefit might be restricted to the information that learners attempt to retrieve (e.g., solution strategies). Indeed, when compared to example study, retrieval practice does not appear to improve the recognition component of knowledge transfer.

Critically, the recognition component of transfer seems to give rise to the inert knowledge problem (Snoddy & Kurtz, 2021) and has been posited to be the most central component in the successful transfer of learning (Corral & Kurtz, 2023). It is thus particularly noteworthy that retrieval (via problem-solving practice) does not appear to improve recognition of when to apply a corresponding solution strategy any more than studying examples does. One possibility is that for problem-solving practice to produce a greater benefit to transfer than example study, it must better aid learners’ recognition of when to apply the corresponding solution strategy.

The present results differ from the multitude of studies showing advantages of retrieval practice over restudy (for reviews, see Agarwal et al., 2021; Pan & Rickard, 2018). It is worth noting, however, that most studies on retrieval practice are based on tasks involving recall of information from memory (which we also find evidence for in Experiment 4); tasks involving transfer of learning – and in particular transfer of a solution strategy to novel scenarios – have not been thoroughly explored with retrieval practice (see Carpenter et al., 2020).

The current results thus contribute critical new data showing that retrieval practice benefits memory, but not necessarily transfer. These findings point to potential boundary conditions on, and limitations to, the benefits that retrieval practice provides. These takeaways highlight a distinction, often overlooked by theoreticians, between memory and the transfer of learning. Indeed, theories of retrieval practice focus primarily on how retrieval engages mechanisms that strengthen memory for the retrieved information (Carpenter et al., 2022; Pan & Rickard, 2018). However, these theories do not directly explain how this enhanced memory might facilitate the transfer of learning, nor (perhaps more importantly) how it might aid learners in recognizing when to apply the corresponding knowledge. The present findings thus call for theoreticians to consider more careful and nuanced hypotheses about the relationship between retrieval practice and the transfer of learning.