Introduction

When learning a problem-solving strategy, learners are often first instructed about the strategy and then study worked-out solutions of problems that have been solved with the instructed strategy (VanLehn, 1996). Such exemplary solved problems can be text-based worked examples (e.g., Najar & Mitrovic, 2013) or video-based modelling examples (e.g., screencasts showing a model's actions on a computer; van Gog & Rummel, 2010). Studying worked or modelling examples frees up cognitive capacities and is thus more beneficial for learning than independently practising to apply the instructed strategy to solve problems (worked or modelling example effect; McLaren & Isotani, 2011; Renkl, 2014; Sweller, 2006; van Gog et al., 2019; van Gog & Rummel, 2010). Examples are especially beneficial for novices (see expertise-reversal effect; Kalyuga & Renkl, 2010). Regarding (video-based) modelling examples, research has focused on modelling examples illustrating rather brief and simple problem-solving strategies (e.g., Fiorella et al., 2017: assembling an electrical circuit, 90 s; Hoogerheide, 2016; Hoogerheide et al., 2018: calculating current, voltage, and resistance, 240 s). However, learners often also need to learn longer and more complex problem-solving strategies. For example, automotive apprentices need to learn how to diagnose car malfunctions (Abele, 2018; Abele & von Davier, 2019). Teaching such more complex problem-solving strategies requires more extensive modelling examples, which have rarely been studied. Consequently, the first aim of the paper was to investigate the effectiveness of more complex and, therefore, longer video-based modelling examples for teaching a complex problem-solving strategy.

It is assumed that examples are conducive to learning because they free up cognitive capacities (van Gog et al., 2019). Generative learning activities stimulated by, for example, self-explanation prompts ensure that these capacities are used for learning (Renkl & Eitel, 2019). So far, prompts have usually asked learners to explain previous examples or previous steps of an example (i.e., retrospective self-explanation prompts). Prompts targeting the upcoming contents of an example have hardly been investigated (Bisra et al., 2018). Such anticipatory self-explanation prompts are probably more cognitively demanding, but potentially also more conducive to learning. Presumably, the learners' prior knowledge is a crucial determinant of whether they can manage the more demanding anticipatory prompts. Hence, the second aim of the paper was to compare the effects of retrospective and anticipatory self-explanation prompts for learners possessing different levels of prior knowledge.

Cognitive load theory and example-based learning

The effects of worked examples and self-explanation prompts can be explained by cognitive load theory (CLT; Sweller et al., 1998, 2011). In this paper, we refer to the still widely used conception of CLT from 1998. CLT assumes that working memory capacity is limited and that learning induces three distinct types of cognitive load on working memory: germane cognitive load (GCL), intrinsic cognitive load (ICL), and extraneous cognitive load (ECL; Sweller et al., 1998). If the sum of these three load types exceeds the available working memory capacity, learning fails. GCL describes the working memory load resulting from learning-related activities. Such activities include, for example, organizing new information and integrating it with existing prior knowledge (see SOI model; Fiorella & Mayer, 2016). ICL is determined by the learning material's complexity and the learner's (prior) knowledge. That is, more complex learning materials (i.e., learning materials with higher element interactivity) induce higher ICL. However, the more prior knowledge learners have about a learning topic, the lower the ICL they experience. If learners have prior knowledge of a topic, they already possess cognitive schemas enabling them to combine multiple elements from the learning material and handle them as a single element in working memory. Element interactivity, and thus ICL, decreases. The third type of cognitive load is ECL, which is unproductive and unrelated to learning. Learning materials containing irrelevant information, redundant repetitions, or numerous cross-references induce higher ECL. Given the same task (i.e., same element interactivity) and the same learners (i.e., same prior knowledge), ICL is considered fixed. Therefore, to ensure that sufficient working memory resources are available for GCL, ECL should be minimized (e.g., Mayer & Moreno, 2003).

A well-studied method to minimize ECL is the use of worked or modelling examples (see worked example effect; Renkl, 2014; Sweller, 2006; Sweller et al., 1998): Learning with examples usually comprises an introduction of the problem-solving strategy, followed by examples demonstrating how to solve example problems (Renkl, 2014; VanLehn, 1996). Learners who have to solve the corresponding problems themselves instead of studying examples usually apply general problem-solving strategies not conducive to learning (e.g., trial-and-error, means-ends analysis); these general problem-solving strategies may sometimes lead to a correct solution but not to an understanding of the specific strategy to be learned. Hence, such general (weak) strategies can also be considered a learning-irrelevant activity inducing ECL (Renkl, 2014; Sweller et al., 1998; van Gog et al., 2019). Learners studying examples, on the other hand, can devote enough of their cognitive capacities to understanding how the problem-solving strategy is applied to the example problem(s). ECL is thus reduced and GCL increases, effects that should lead to better learning outcomes (Renkl et al., 2009). Much recent research has focused on video-based modelling examples, such as screencasts showing a model's actions on a computer (van Gog & Rummel, 2010). In the present paper, we also focused on such video-based modelling examples.

Besides beneficial effects on cognitive load and learning, studying (modelling) examples (in comparison to more open learning formats such as inventing or independent problem solving) is also known to promote self-efficacy (Glogger-Frey et al., 2015; Hoogerheide et al., 2014, 2018; van Harsel et al., 2019). Self-efficacy describes how confident learners are in performing a specific task (Bandura, 1997). Observing how a model successfully solves a task can strengthen learners' confidence that they can perform the task as well (Bandura, 1997; Schunk, 1995). For example, van Harsel et al. (2019) investigated how different sequences of studying examples and problem solving would affect various motivational aspects, including self-efficacy. They found that studying only examples resulted in greater self-efficacy than problem solving alone (van Harsel et al., 2019). Finally, self-efficacy exerts a strong influence on learning outcomes, as it positively affects academic motivation and learning behaviour, such as learning perseverance (Bandura, 1997; Multon et al., 1991; Schunk, 1995).

Modelling examples are known to benefit learning in various domains and settings. However, in most cases, the problem-solving strategies taught were comparatively simple and could be conveyed with shorter modelling examples (e.g., Fiorella et al., 2017: assembling an electrical circuit, 90 s; Hoogerheide, 2016; Hoogerheide et al., 2018: calculating current, voltage, and resistance, 240 s). We can assume that examples substantially longer than those in earlier studies, namely examples illustrating more complex problem-solving strategies, also have beneficial effects on learners' cognitive load, learning outcomes, and self-efficacy. To our knowledge, however, this assumption has hardly been investigated so far. We, therefore, aimed to replicate the worked or modelling example effect with video-based modelling examples for more complex problem-solving strategies.

Self-explanation prompts

By reducing ECL, examples liberate working memory capacities. To ensure that these capacities are used for learning (i.e., ensuring that GCL increases), learners should engage in self-explanations (Hilbert & Renkl, 2009), which can be elicited with self-explanation prompts (Atkinson et al., 2003; Renkl et al., 1998). With such prompts, learners are explicitly asked to relate the content in the illustrative example to the problem-solving strategy explained in an earlier instruction. For example, Hilbert and Renkl (2009) used two paper-based worked examples to teach students a circular, three-step process of concept mapping that had already been introduced (Hilbert & Renkl, 2008). While worked examples alone failed to promote learning (Hilbert & Renkl, 2009; experiment 1), the combination of worked examples and self-explanation prompts proved beneficial for learning (Hilbert & Renkl, 2009; experiment 2). The self-explanation prompts used in experiment 2 asked students to explain: 'To which phase of the concept mapping process can you assign what Carolin/Karsten just did? Why?' (Hilbert & Renkl, 2009, p. 271), with Carolin and Karsten being fictitious students in the examples.

In this case and in most studies, self-explanation prompts refer to aspects already shown in the corresponding examples (e.g., Berthold et al., 2009; Hilbert et al., 2008; Klein et al., 2019). We refer to such backwards-directed prompts as retrospective self-explanation prompts. Another potentially effective type of self-explanation prompt is directed forward: in Renkl (1997), successful learners were, inter alia, those who thought about a problem's upcoming solution steps (anticipative reasoning). Consequently, anticipatory self-explanation prompts, that is, prompts referring to upcoming problem-solving steps, could also be useful. Referring to the study by Hilbert and Renkl (2009), such an anticipatory prompt could be 'Which step of concept mapping comes next and what will Carolin/Karsten have to do?'

Anticipatory and retrospective prompts presumably induce different cognitive processes: When answering retrospective self-explanation prompts, learners have to consider only previous steps in the illustrated problem-solving strategy. Conversely, when answering anticipatory prompts, learners have to represent the problem-solving strategy’s next step. However, these mental processes can only take place by relying on already-completed problem-solving steps. Consequently, when learning with anticipatory prompts, more elements (i.e., the prior and subsequent step) must be considered overall, but more relevant information also has to be organized and integrated (Fiorella & Mayer, 2016).

In CLT terms, this could mean two things for learners’ cognitive load: First, anticipatory prompts might induce higher GCL than regular retrospective prompts, as learners are prompted to organize and integrate more information. On the other hand, as more information to be considered results in greater element interactivity, anticipatory prompts will likely also induce higher ICL and might therefore be more demanding. Presumably, only those learners with greater prior knowledge will successfully manage the increased demands of such anticipatory prompts while remaining able to invest considerable amounts of GCL. Learners with lower prior knowledge, on the other hand, might be overwhelmed by the increased demands of the anticipatory prompts and will thus experience higher ICL (Gerjets et al., 2006; van Merriënboer et al., 2006). Consequently, in terms of learning outcomes, only learners with higher prior-knowledge levels can be expected to benefit from anticipatory prompts.

Self-explanation prompts thus not only affect learners' cognitive processes, and thereby their cognitive load and learning outcomes, but also influence learners' self-efficacy regarding the learning topic. For example, Crippen and Earl (2007) developed a web-based learning tool that used quizzes to teach undergraduate students problem-solving skills in the domain of chemistry. Students were allocated to one of three experimental conditions: in the control condition, students learned with the quizzes only. In the other two conditions, students were also provided with worked examples for each quiz item. Additionally, in one of these conditions, students were prompted to self-explain the worked examples. Regarding self-efficacy, these authors found that worked examples alone revealed no effects on self-efficacy, but worked examples provided together with self-explanation prompts did exert a positive effect on students' self-efficacy (Crippen & Earl, 2007). The question as to whether retrospective or anticipatory prompts reveal different effects on learners' self-efficacy cannot be answered based on existing research evidence.

Taken together, the potential positive effects of anticipatory prompts supposedly depend on whether learners can cope with the increased demands. Hence, the learners’ prior knowledge likely plays an important role in the relationship between prompt type and cognitive load, learning outcomes, and self-efficacy. However, these theoretical considerations cannot be substantiated with empirical evidence, as anticipatory prompts have seldom been investigated (Bisra et al., 2018).

Present study and research questions

The present study was conducted with automotive apprentices who were taught a diagnostic strategy to diagnose complex automotive malfunctions (Abele, 2018; Abele & von Davier, 2019). Although diagnosing malfunctions is a crucial part of an automotive technician's day-to-day work (Spöttl et al., 2011), at the end of their 3-year apprenticeship, only 15% of the apprentices master the strategies required to diagnose complex malfunctions; they can thus be considered novices (Abele & von Davier, 2019). We pursued two goals: First, we investigated the use and possible limitations of longer and more comprehensive modelling examples in a screencast video format for teaching a job-relevant and complex problem-solving strategy, namely diagnosing car malfunctions. Second, we compared the effects of anticipatory and retrospective self-explanation prompts for these modelling examples. For this comparison, we considered the apprentices' general prior knowledge of car diagnoses. We examined the effects of modelling examples and self-explanation prompts on apprentices' diagnostic strategy knowledge and skills (i.e., knowledge about and application of the instructed strategy), self-efficacy, and cognitive load. Diagnostic strategy knowledge and skills and self-efficacy were measured before and after the intervention. Cognitive load was measured only after the intervention.

We investigated the following hypotheses regarding modelling examples:

  • H1: Following the worked or modelling example effect (Renkl, 2014; Sweller, 2006), we expected a greater increase in diagnostic strategy knowledge and skills from a pretest to a posttest when the apprentices learned with modelling examples than when apprentices practised applying the diagnostic strategy by solving open problems.

  • H2: We expected a greater increase in self-efficacy among apprentices learning with modelling examples than among those practising applying the strategy (Crippen & Earl, 2007; Schunk, 1995).

  • H3: Following the example-based learning literature (e.g., Renkl et al., 2009), we expected apprentices in the modelling example condition to perceive lower extraneous and higher germane cognitive load while learning than apprentices practising to apply the strategy.

Moreover, we were interested in whether the effects of different self-explanation prompts depend on prior knowledge. However, since these effects have hardly been researched so far, we formulated no specific hypotheses. Instead, we posed these three open research questions:

  • RQ1: Do anticipatory and retrospective self-explanation prompts reveal differential effects on the development of apprentices’ diagnostic strategy knowledge and skills and does their prior knowledge moderate these effects?

  • RQ2: Do anticipatory and retrospective prompts exert differential effects on the development of apprentices’ self-efficacy and does their prior knowledge moderate these effects?

  • RQ3: Do anticipatory and retrospective prompts demonstrate differential effects on apprentices' extraneous, intrinsic, and germane cognitive load while learning, and does their prior knowledge moderate these effects?

Methods

Participants

Originally, 78 apprentices participated in our experiment. Because of technical problems with the survey software, only 67 complete data sets could be analysed. On average, apprentices were 20.85 years old (SD = 2.74); 65 were male, and two were female. German was the first language of 57 apprentices, and 10 reported an additional first language. Seven apprentices had a university entrance qualification (Abitur), 55 apprentices had a secondary school leaving certificate (Mittlere Reife), and five apprentices had a lower secondary school leaving certificate (Hauptschulabschluss).

To determine the required sample sizes, we conducted two a-priori power analyses with G*Power 3.1 (Faul et al., 2007). We aimed for a power of 0.80. Based on previous studies on the worked example effect (e.g., Nievelstein et al., 2013; Schwonke et al., 2009; van Gog et al., 2011) and self-explanation prompts (e.g., Atkinson et al., 2003; Hilbert & Renkl, 2009), we expected medium effect sizes (e.g., Cohen's f > 0.25 or η² > 0.06; Cohen, 1988). For the analyses regarding hypotheses H1 and H2 and research questions RQ1 and RQ2 (i.e., repeated measures analyses of variance, RM-ANOVAs), the required sample size was N = 34 (about half of the collected sample). For the analyses regarding hypothesis H3 and research question RQ3 (i.e., analyses of variance, ANOVAs), the required sample size was N = 128. As we had to stop collecting data at an early stage because of school closures during the COVID-19 pandemic, the required sample size for the ANOVAs could not be reached. A larger sample might have enabled us to demonstrate additional effects. However, the effects we did discover can still be interpreted.
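As an illustration of the between-subjects part of this power analysis, a minimal sketch in Python is shown below. It assumes the values reported above (Cohen's f = 0.25, α = 0.05, power = 0.80, two prompt conditions) and uses statsmodels' one-way-ANOVA power routine rather than the exact G*Power module; the repeated-measures analyses additionally require an assumed pretest-posttest correlation, which this sketch does not cover.

```python
# Minimal sketch of the a-priori power analysis for the between-subjects
# ANOVAs (H3/RQ3), assuming a medium effect (Cohen's f = 0.25), alpha = .05,
# a target power of .80, and two groups (retrospective vs. anticipatory
# prompts). Not the exact G*Power routine used in the study.
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()
n_total = analysis.solve_power(
    effect_size=0.25,  # Cohen's f
    alpha=0.05,
    power=0.80,
    k_groups=2,
)
print(f"Required total sample size: {n_total:.0f}")  # approximately 128
```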

Design and procedure

The experiment comprised two sessions separated by approximately 10 days. Table 1 shows the detailed procedure. Session one included the pretests; session two comprised the intervention and the posttests. In the intervention in session two, apprentices in all conditions first learned about the diagnostic strategy with instructional videos and organizational prompts. Then, they learned according to their randomly assigned experimental condition: Two groups received modelling examples, one (n = 21) with retrospective self-explanation prompts and the other (n = 25) with anticipatory self-explanation prompts. The third group (control, n = 21) received no modelling examples and no self-explanation prompts.

The entire study took place on computers in the apprentices' schools. All learning and testing materials, which can be requested from the first author, were presented in digital form via the page-based online survey tool LimeSurvey. Once apprentices left a page, they could not go back. We told participants when we expected them to have completed a phase and to proceed to the next one, thereby ensuring equal time on task within and between conditions (see maximum durations in Table 1).

Table 1 Procedures in sessions 1 and 2

Learning materials

In the experimentally varied intervention, apprentices learned a complex diagnostic strategy intended to help them diagnose car malfunctions in a structured way. The development of this strategy and of the intervention is described in detail by Meier et al. (2022): The cyclical diagnostic strategy comprises four steps: (1) formulating a hypothesis about possible causes of a malfunction, (2) planning a measurement to test this hypothesis, (3) carrying out the measurement, and (4) evaluating the measurement results and the hypothesis. Steps one and two include additional sub-principles. Consequently, this diagnostic strategy can be considered a complex problem-solving strategy (Abele, 2018; Abele & von Davier, 2019; Meier et al., 2022).

The intervention comprised two learning phases (see Table 1). In the first learning phase, apprentices in all conditions watched five animated instructional videos explaining the strategy (16:33 min, Fig. 1). All participants also completed four practice tasks during this phase, which served as organizational prompts (Roelle et al., 2017), and received the correct solution. Learning phase one took 35 min.

Fig. 1
figure 1

Screenshots of the instructional videos (in German). Note The top left picture shows the introduction to the diagnostic strategy; the top right picture explains a sub-principle in step one; the bottom left picture explains how to plan a measurement with an electrical circuit diagram in step two; the bottom right picture gives an overview of the complete diagnostic cycle in step four

In learning phase two, we implemented both experimental variations. The first experimental variation concerned the modelling examples: Participants in the two modelling example conditions (i.e., both in the retrospective and in the anticipatory self-explanation prompt condition) received two video-based modelling examples showing an expert applying the diagnostic strategy in a computer simulation (Gschwendtner et al., 2009; Meier et al., 2022). Each diagnostic step was illustrated in a separate video. Hence, both modelling examples consisted of several videos (first example: 12 videos, 25:50 min; second example: 10 videos, 19:37 min). Figure 2 shows how the expert uses the electrical circuit diagram in the computer simulation to plan a measurement in step two. Figure 3 illustrates how the expert then executes the corresponding measurement. Participants in the control condition did not receive the modelling examples but tried to diagnose the same diagnostic problems in the computer simulation (Gschwendtner et al., 2009; Meier et al., 2022). Hence, instead of studying the worked-out solutions to the two diagnostic problems in the modelling examples, participants in the control condition were required to solve the problems independently, that is, to practise applying the diagnostic strategy on their own.

Fig. 2
figure 2

Screenshot of a modelling example showing how the expert plans a measurement

Fig. 3
figure 3

Screenshot of a modelling example showing how the expert executes a measurement

After each video of the modelling examples, participants answered the same self-explanation prompt in writing. With these prompts, we implemented the second experimental variation. Depending on the condition, the prompt differed: In the retrospective self-explanation prompt condition, the prompt read: "Which troubleshooting step was just completed? Explain how you will proceed with this step and why it is important for troubleshooting (in general)". In the anticipatory self-explanation prompt condition, the prompt read: "Which troubleshooting step comes next? Explain how you will proceed with this step and why it is important for troubleshooting (in general)". For the first four prompts, participants' answers were scaffolded by fill-in-the-blank self-explanation prompts (i.e., assisting self-explanation prompts; Berthold et al., 2009). For all following prompts, participants received suggested openings for their answers' first sentences. Participants did not receive individual feedback but were shown the correct answer for each prompt after answering it, that is, an example of how the respective prompt could have been answered correctly.

As participants in the control condition did not receive the modelling examples, they did not receive any self-explanation prompts.

Testing materials

To investigate the effects of modelling examples and of different self-explanation prompts, depending on the learners' general prior knowledge, on diagnostic strategy knowledge and skills, self-efficacy, and cognitive load, we used different tests (see Table 1): To assess general prior knowledge about car diagnoses, we used two different tests in session one. For diagnostic strategy knowledge and skills (i.e., knowledge about and application of the instructed diagnostic strategy), three tests were given both in session one (i.e., before the intervention) and in session two (i.e., after the intervention). Likewise, a questionnaire assessing the apprentices' self-efficacy in performing diagnoses was used in sessions one and two. Finally, a questionnaire assessing the apprentices' cognitive load was given after the intervention in session two. All these tests are described below. Most of them contained both closed and open items. Closed items were scored automatically. For all open items, the first author and a subject matter expert (i.e., the second author) developed a coding scheme. We developed these schemes based on ideal responses to the different tests. Ideal means that these responses were perfectly in line with the taught diagnostic strategy. In addition, we also looked for alternative solutions in the responses of all participants that could be assessed as similarly good from a subject matter perspective. A student assistant and the first author then scored 25% of all answers and adjusted the coding schemes until achieving an interrater reliability of Cohen's kappa > 0.8. Subsequently, the student assistant independently scored the remaining answers.
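To illustrate the interrater check just described, the following sketch computes Cohen's kappa for two raters' codes on a set of open answers. The example data, the three-level coding, and the variable names are hypothetical, and scikit-learn is used here merely for illustration; it is not necessarily the software with which the coding was done.

```python
# Hypothetical sketch of the interrater-reliability check on 25% of the open
# answers: two raters' codes are compared with Cohen's kappa, and the coding
# scheme would be revised and re-applied until kappa > .80 is reached.
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes (0 = incorrect, 1 = partially correct, 2 = correct)
codes_student_assistant = [2, 1, 0, 2, 2, 1, 0, 0, 2, 1, 2, 2]
codes_first_author      = [2, 1, 0, 2, 1, 1, 0, 0, 2, 1, 2, 2]

kappa = cohen_kappa_score(codes_student_assistant, codes_first_author)
print(f"Cohen's kappa: {kappa:.2f}")
if kappa <= 0.8:
    print("Agreement below threshold: revise the coding scheme and re-score.")
```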

General Prior Knowledge tests

As a first measure of general prior knowledge about car diagnoses, we selected five out of 24 items in the diagnosis-relevant reception competence (DRC) test by Norwig et al. (2021). This competence describes the ability to read various documents relevant to the diagnosis (e.g., electrical circuit diagrams) and can thus be seen as prerequisite knowledge for car diagnoses. For example, we gave participants a schematic diagram and a photo of an engine compartment and asked them to use the schematic diagram to locate a particular component in the realistic photo. We selected items with a midrange solution rate (ranging from 32 to 71% in Norwig et al., 2021) to prevent floor and ceiling effects and with the highest item-total correlation (> 0.43 for all 5 items).

Second, we selected three of seven items of a partial skills test by Abele (2014) with a high item-total correlation (between 0.48 and 0.60 in Abele, 2014). In these items, participants were instructed to perform specific measurements in the simulation and to evaluate whether the measurement results indicated a malfunction or not.

Diagnostic strategy knowledge and skills tests

We administered three different tests to measure the apprentices’ diagnostic strategy knowledge and skills in the pretest (i.e., in session one) as well as in the posttest (i.e., after the intervention in session two). First, the strategy description test measured conceptual knowledge and comprised two questions asking participants (1) to describe their troubleshooting procedure in a situation where they are given little assistance from a computer-based expert system (i.e., complex diagnostic problems), and (2) how they would narrow down which components might be responsible for a malfunction.

Second, in the strategy completion test, apprentices carried out or described (parts of) steps of the diagnostic strategy in four different scenarios. Within these scenarios, closed and open questions were used. The former dealt, for example, with which diagnostic step should be taken next in the current scenario. In the open-ended questions, the apprentices, for example, studied a circuit diagram and described an appropriate measurement.

Third, to test diagnostic skills, participants performed diagnoses in the computer simulation. They were provided with a description of the malfunction and then diagnosed it. Finally, participants described the cause of the malfunction and how it could be repaired. Participants performed the first diagnosis both in the pretest in session one and in the posttest in session two, and a second diagnosis in the posttest only.

Self-efficacy and cognitive load

In both the pretest and the posttest, before performing the first diagnosis in the computer simulation, participants rated their self-efficacy regarding this diagnosis with five items on a seven-point Likert scale (Cronbach's α = 0.89). These items were developed based on Bandura's (2006) guide for constructing self-efficacy scales (Table 2). After the intervention, participants rated their intrinsic (two items), germane (two items), and extraneous cognitive load (three items) on a seven-point Likert scale (Table 2). These items were developed and validated by Klepsch and colleagues (Klepsch et al., 2017; Klepsch & Seufert, 2020, 2021). Reliability was acceptable (intrinsic load: Cronbach's α = 0.74; germane load: Cronbach's α = 0.84; extraneous load: Cronbach's α = 0.60).
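The reliability coefficients reported above can be reproduced from the raw item responses with the standard Cronbach's alpha formula; the sketch below applies it to a hypothetical participants × items matrix and is not the analysis script used in the study.

```python
# Minimal sketch of Cronbach's alpha for a participants x items response
# matrix (e.g., the five seven-point self-efficacy items). Data are hypothetical.
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """responses: 2D array with rows = participants and columns = items."""
    k = responses.shape[1]                         # number of items
    item_vars = responses.var(axis=0, ddof=1)      # variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of the sum scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings of four participants on five seven-point Likert items
ratings = np.array([
    [6, 5, 6, 7, 6],
    [3, 4, 3, 3, 4],
    [5, 5, 6, 5, 5],
    [2, 3, 2, 3, 2],
])
print(f"Cronbach's alpha: {cronbach_alpha(ratings):.2f}")
```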

Table 2 Overview of self-efficacy and cognitive load items

Data analyses

To test the effects of modelling examples and self-explanation prompts on variables measured in the pretest (i.e., session 1) and posttest (i.e., session 2), we ran repeated measures analyses of variance (RM-ANOVAs) with timepoint (pretest versus posttest) as within-subjects variable and either example condition (modelling examples yes versus no; for H1 to H3) or self-explanation prompt condition (retrospective versus anticipatory prompts; for RQ1 to RQ3) as between-subjects variable. For the latter, the mean-centered test scores on both prior knowledge tests were included as additional continuous factors (Schneider et al., 2015). Significant three-way interactions of timepoint, prompt condition, and a moderating prior knowledge test score were further explored using the Johnson-Neyman procedure, which identifies boundaries of significance along the continuous moderating variable (i.e., the prior knowledge test scores). In other words, we identified the (mean-centered) prior knowledge test scores at which the difference in the corresponding dependent variable between the retrospective and the anticipatory prompt condition became significant at α = 0.05. For participants with (mean-centered) prior knowledge test scores beyond this boundary, the difference in the corresponding dependent variable between the retrospective and anticipatory prompt conditions was thus significant (Hayes & Matthes, 2009; Montoya, 2019).
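To make the Johnson-Neyman logic concrete, the sketch below probes a simplified version of such a moderation model in Python: the pretest-posttest difference score is regressed on prompt condition, the mean-centered DRC score, and their interaction, and the conditional effect of condition is then scanned across moderator values to see where it becomes significant at α = .05. The data and variable names are hypothetical, and the study itself used RM-ANOVAs in SPSS with moderation probing following Hayes and Matthes (2009), not this exact script.

```python
# Hypothetical sketch of the Johnson-Neyman procedure for a difference score
# (posttest - pretest) moderated by mean-centered prior knowledge (DRC score).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n = 46
df = pd.DataFrame({
    "condition": rng.integers(0, 2, n),   # 0 = retrospective, 1 = anticipatory
    "drc_c": rng.normal(0, 1.5, n),       # mean-centered DRC test score
})
df["diff_score"] = (0.2 * df["condition"]
                    + 0.3 * df["condition"] * df["drc_c"]
                    + rng.normal(0, 1, n))

model = smf.ols("diff_score ~ condition * drc_c", data=df).fit()
b, cov = model.params, model.cov_params()
t_crit = stats.t.ppf(0.975, df=model.df_resid)

# Conditional effect of condition at moderator value w: b1 + b3*w, with
# standard error sqrt(var(b1) + w^2 * var(b3) + 2 * w * cov(b1, b3)).
for w in np.linspace(df["drc_c"].min(), df["drc_c"].max(), 9):
    effect = b["condition"] + b["condition:drc_c"] * w
    se = np.sqrt(cov.loc["condition", "condition"]
                 + w ** 2 * cov.loc["condition:drc_c", "condition:drc_c"]
                 + 2 * w * cov.loc["condition", "condition:drc_c"])
    sig = "significant" if abs(effect / se) > t_crit else "n.s."
    print(f"DRC = {w:5.2f}: effect = {effect:5.2f}, t = {effect / se:5.2f} ({sig})")
```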

For variables only measured once (session 2), regular analyses of variance (ANOVAs) with either example condition (modelling examples yes versus no; H1 and H3) or self-explanation prompt condition (retrospective versus anticipatory prompts; RQ1 and RQ3) as between-subjects factors were conducted.

A significance level of 0.05 applied to all analyses. As effect size, we used partial eta squared (η²p), with values of 0.01, 0.06, and 0.14 corresponding to small, medium, and large effects, respectively (Cohen, 1988; Lakens, 2013). Analyses were conducted with IBM SPSS Statistics 27.
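For reference, partial eta squared is computed from the ANOVA sums of squares, and it relates to the Cohen's f used in the power analysis as follows (standard definitions, not specific to this study):

```latex
\eta_p^2 = \frac{SS_\text{effect}}{SS_\text{effect} + SS_\text{error}},
\qquad
f = \sqrt{\frac{\eta_p^2}{1 - \eta_p^2}}
\quad \text{(e.g., } \eta_p^2 = 0.06 \Rightarrow f \approx 0.25\text{)}.
```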

Results

Apart from two exceptions, there were no differences in demographic variables, prior knowledge tests, or the pretest measures of the repeated-measures variables between the two example conditions or the two prompt conditions (all p > .05). The first exception was age, which differed between the two example conditions, F(1, 65) = 4.227, p = .044. However, as age did not correlate with any of the dependent variables (or with the development of the dependent variables with repeated measures), this difference is negligible. Second, in the pretest, apprentices in the modelling example condition scored significantly higher in the first diagnosis in the simulation than apprentices in the control condition did (see Table 3), F(1, 65) = 7.727, p = .007. This difference must be considered in the later interpretation of our results. In the following section, we first report the effects of modelling examples. In the second section, we report the effects of retrospective versus anticipatory self-explanation prompts.

Effects of modelling examples

Table 3 illustrates descriptive data. Table 4 shows the results of the statistical tests.

Table 3 Descriptive data of dependent variables for the control condition (i.e., no modelling examples) and modelling examples condition
Table 4 Main and interaction effects of the example condition and timepoint on dependent variables

Tables 3 and 4 indicate significant main effects of timepoint on strategy description and strategy completion test scores with large effects. In both tests, participants in both conditions scored significantly higher in the posttest than in the pretest. However, there were no interaction effects of example condition and timepoint on these scores. Hence, this improvement was not larger in the modelling example condition. There were no effects on self-efficacy, neither main effects of timepoint or example condition nor an interaction effect.

We observed a significant main effect of timepoint and an interaction effect of example condition and timepoint on the score in the first diagnosis. However, both effects arose from the pretest difference in the first-diagnosis score that we already pointed out at the beginning of the results section. Thus, these effects should be disregarded.

Regarding the variables measured only once, we detected no effects of modelling examples on participants’ score on the second diagnosis, nor any effects on participants’ ICL, GCL, or ECL.

Taken together, we were unable to confirm hypotheses H1 to H3: modelling examples did not lead to a greater increase in diagnostic strategy knowledge and skills (H1) or in self-efficacy (H2), and participants in the modelling example condition did not perceive lower ECL and higher GCL (H3).

Effects of retrospective versus anticipatory self-explanation prompts

Table 5 shows the descriptive data of the dependent variables for the two self-explanation prompt conditions. Table 6 shows the results of the statistical tests of the effects of the different self-explanation prompts.

Table 5 Descriptive data of moderation variables and dependent variables for the retrospective prompt condition and anticipatory prompt condition
Table 6 Main and interaction effects of the prompt condition, the moderation variables, and timepoint on dependent variables

We noted significant main effects of timepoint on the strategy description test score and strategy completion test score with large effects. These effects correspond to the effects we had already observed when comparing the two example conditions.

Regarding research questions RQ1 and RQ2, we detected no interaction effects of timepoint and prompt condition on any of the dependent variables. When also considering participants' prior knowledge, however, our results revealed two significant three-way interactions of timepoint, prompt condition, and the DRC test score on the strategy description test score (RQ1) and self-efficacy (RQ2). Concerning strategy description test scores (see Fig. 4), the Johnson-Neyman procedure indicated that for learners with mean-centered DRC test scores larger than 1.68, that is, the higher prior-knowledge participants, retrospective prompts had detrimental effects and anticipatory prompts had beneficial effects on the pretest-posttest difference in strategy description test scores, t(42) = 2.02, p = .05.

Fig. 4
figure 4

Scatter plot of grand mean-centered DRC test scores against difference in strategy description test scores for the retrospective prompt condition and anticipatory prompt condition. Note The differential effect of retrospective and anticipatory prompts on the difference in strategy description test scores is significant right of the vertical longer dashed line (DRC test scores > 1.68)

Regarding self-efficacy (RQ2; see Fig. 5), the Johnson-Neyman procedure indicated that for participants with mean-centered DRC test scores lower than − 0.85, that is, the lower prior knowledge participants, retrospective prompts had beneficial effects and anticipatory prompts had detrimental effects in terms of difference in self-efficacy, t(42) = -2.02, p = .05.

Fig. 5
figure 5

Scatter plot of grand mean-centered DRC test scores against difference in self-efficacy for the retrospective prompt condition and anticipatory prompt condition. Note The differential effect of retrospective and anticipatory prompts on the difference in self-efficacy is significant left of the vertical longer dashed line (DRC test scores < -0.85)

Finally, regarding RQ3, we found that ECL was lower in the anticipatory prompt group. Moreover, we identified a significant two-way interaction of prompt condition and DRC test score on GCL (see Fig. 6). Following the Johnson-Neyman procedure, we found that for lower prior-knowledge participants (mean-centered DRC test scores < -0.98), retrospective prompts induced a higher GCL than anticipatory prompts, t(42) = -2.02, p = .05.

Fig. 6
figure 6

Scatter plot of grand mean-centered DRC test scores against germane load for the retrospective prompt condition and anticipatory prompt condition. Note The differential effect of retrospective and anticipatory prompts on GCL is significant left of the vertical longer dashed line (DRC test scores < -0.98)

Taken together, RQ1 to RQ3 cannot be answered unambiguously, but there was a tendency for apprentices with more prior knowledge to learn more with anticipatory prompts, whereas apprentices with less prior knowledge experienced a greater increase in self-efficacy and a higher GCL when learning with retrospective prompts.

Discussion

So far, research on modelling examples has tended to focus on brief modelling examples teaching quite simple problem-solving strategies. Moreover, it has mainly used self-explanation prompts asking learners to explain past problem-solving steps illustrated in an example. Thus, the present study had two objectives: First, we investigated the effects of longer modelling examples teaching a complex problem-solving strategy, namely diagnosing car malfunctions, on diagnostic strategy knowledge and skills (H1), self-efficacy (H2), and extraneous and germane cognitive load during learning (H3). Second, while taking into account the apprentices' prior knowledge, we compared the effects of retrospective and anticipatory self-explanation prompts on the development of diagnostic strategy knowledge and skills (RQ1), self-efficacy (RQ2), and cognitive load during learning (RQ3).

Effects of modelling examples

Contrary to H3, we observed that the modelling examples exerted no effects on the apprentices’ extraneous (ECL) or germane cognitive load (GCL). Since example-based learning’s positive effect on learning outcomes relies on reducing ECL and increasing GCL (Sweller, 2006), we would not expect the modelling examples to reveal any positive effect on learning outcomes. Accordingly, and in contrast to our H1, we detected no such effect. One interpretation of this finding is that longer modelling examples are less suitable for teaching complex problem-solving strategies. However, both text-based worked examples (e.g., Heitzmann et al., 2015; Schalk et al., 2020) and video-based modelling examples (Fiorella et al., 2017; Hoogerheide, 2016; Hoogerheide et al., 2014; Schmitz et al., 2017; van Harsel et al., 2019) have proven to be conducive to learning. We assume that we were unable to detect beneficial effects of the modelling examples because of the long instruction phase in which the diagnostic strategy was initially explained to all apprentices, that is, also in the control condition. Learning phase one took 35 min and comprised five instructional videos and four practice tasks that presumably supported knowledge organisation well. This 35-minute instruction phase is substantially longer than in other studies. Schmitz et al. (2017) tested an instruction lasting only 17 min. Some studies used no instruction at all (e.g., Hoogerheide et al., 2014). Therefore, our extensive instruction may have provided all apprentices, that is, also those in the control condition, with sufficient knowledge to begin independent problem solving (i.e., independent diagnosis in the simulation).

Regarding self-efficacy, we expected the modelling examples to promote self-efficacy, as learners would be able to see how the diagnostic process is completed (H2; Glogger-Frey et al., 2015; Schunk, 1995). However, that is not what we observed. One reason for this might be that the inexperienced apprentices could not identify with the model, who was an experienced expert. According to the model-observer similarity principle (Renkl, 2014; van Gog et al., 2019), the model should also have been an apprentice so that the learners could have identified better with it.

We therefore recommend that future studies investigating longer modelling examples for complex problem-solving strategies use a shorter instruction phase and a model with whom learners can better identify.

Effects of retrospective and anticipatory self-explanation prompts

Between the two prompt conditions, we noted different effects on learning outcomes (RQ1), self-efficacy (RQ2), and cognitive load (RQ3) depending on the apprentices' prior knowledge: We found a greater increase in declarative knowledge (i.e., in the strategy description test) among the stronger apprentices when they learned with the anticipatory prompts. Regarding self-efficacy, the weaker apprentices' self-efficacy was better supported by the retrospective prompts. Similarly, in terms of cognitive load, apprentices with less prior knowledge reported a higher GCL when learning with retrospective prompts. Overall, these effects suggest that anticipatory prompts are more beneficial for learners with more prior knowledge, whereas learners with less prior knowledge profit more from retrospective prompts. However, before we can recommend that practitioners choose anticipatory or retrospective prompts depending on their learners' prior knowledge when designing example-based learning scenarios, replication studies are necessary to ensure that these findings are reliable. Unfortunately, we could not conduct a direct replication of our study, as the study was very extensive and was conducted in the field, that is, in classrooms at schools during regular school hours. For this study, the participating schools had to adapt their curricula to allow us access. Nevertheless, a conceptual replication or an extension study (Zwaan et al., 2018) that investigates the effectiveness of similarly designed anticipatory prompts for teaching a cyclical problem-solving strategy would be helpful to examine the specific mechanisms of the anticipatory prompts for learners with different prior knowledge levels. Such an extension study could also involve think-aloud protocols.

Besides the interactions, we also found that the anticipatory prompts induced a lower ECL, that is, a lower learning-irrelevant load. This finding is rather surprising, as we expected anticipatory prompts to mainly influence element interactivity and thus ICL. The ECL items mainly addressed the learning material's (visual) appearance, that is, whether the content was easy to process. However, the two prompt conditions did not differ in their learning materials' (visual) design. Klepsch and Seufert (2020, study 1), who developed the cognitive load instruments used in the present study, found differences in learners' ECL ratings when element interactivity had been manipulated and argued that participants sometimes struggle to distinguish between ICL and ECL, which resulted in effects on both scales. However, this argument would only apply here if the anticipatory prompts had caused an increase, not a decrease, in ECL.

Limitations and implications for future research

Above, we already gave several recommendations for future research: First, future studies on longer modelling examples should use a shorter instruction phase and a model with whom learners can better identify. Second, the recurrent pattern of anticipatory prompts being more beneficial for higher prior-knowledge learners and retrospective prompts being more beneficial for lower prior-knowledge learners needs to be further investigated, possibly in a laboratory setting with think-aloud protocols.

One limitation of the current study, which future research should address, is that the self-explanation prompts were often answered sub-optimally: participants in both prompt conditions answered only about 50% of the prompts in a meaningful way. For the other half, participants often entered only single letters or blanks, or made entries without any reference to car diagnoses. If learners do not answer prompts properly, the expected effects of these prompts can also be assumed to be smaller. Table 7 gives exemplary responses to the self-explanation prompts.

Table 7 Overview of exemplary apprentices’ answers on retrospective and anticipatory self-explanation prompts

We see two potential reasons for these inadequate answers. First, the self-explanation prompts were not very specific, as they simply asked learners to name and explain the previous or subsequent diagnostic step. Accordingly, they provided little guidance to the learners. For example, Glogger et al. (2009) showed that for ninth graders prompted to apply learning strategies, specific prompts were superior to general prompts. Second, in the present paper, after each diagnostic step in the modelling example, the apprentices answered exactly the same self-explanation prompt. These prompts may have been perceived as too repetitive. The differential effects of retrospective and anticipatory prompts depending on prior knowledge may be even stronger with more specific and more engaging prompts. This possibility should be investigated in the future.

Conclusion

Even though modelling examples did not yield the desired effects in the present study, anticipatory self-explanation prompts seem to function differently from retrospective self-explanation prompts and could be a promising alternative for stronger learners. However, before educational practitioners apply such anticipatory self-explanation prompts, these prompts should be further investigated in future research.