Learning motor skills by observing a model that performs the desired actions and behavior has been a successful and well-researched instructional technique for the last 30 years (McCullagh et al. 1989; Wetzel et al. 1994; Wulf and Shea 2002). The current focus on flexibility in task performance and the mastering of complex cognitive skills (Jonassen 1999) has made modeling (i.e., observational learning) relevant to modern learning environments as well. At the same time, rapid developments in computer and software technology in the last decades have enabled the use of dynamic visualizations, such as animations and video, to illustrate abstract cognitive processes or concepts (Casey 1996; Chee 1995). We refer to animations and videos that are used to model such cognitive processes as video-based models. Moreover, researchers have advocated active learning because it enables learners to connect new information with the knowledge they already possess (Chi et al. 1989; Mayer 2001; Wittrock 1974). Interaction between the learner and the learning environment might therefore be an effective way of getting learners to engage in active processing of the learning materials.

However, in modeling, learners typically observe the behavior of a model and only afterwards engage in the actual practice of the modeled behavior. The essence is that the observer does not perform any actions during learning but instead focuses on the actions and performance of the model. The question is: How can interactivity effectively be implemented in video-based modeling? In this paper we argue that modeling should not be treated as a single, isolated instructional method, but as a first phase that learners have to go through in order to become proficient performers of the modeled behavior. The social cognitive model of sequential skill acquisition (Schunk and Zimmerman 1997; Zimmerman and Kitsantas 2002) describes how learners initially start with observing a model, but then start practicing and gradually learn how to self-regulate their own performance. Moreover, we argue that this transition from observing to self-regulated practice is accompanied by different cognitive processes and that interaction takes different forms, each of which facilitates different cognitive processes. This implies that the role of interactivity will also change when learners go from pure observation to self-regulated practice. Finally, we contend that humans’ limited cognitive capacity requires optimal support for learners in this transition process. The regulation of complex skills in particular can easily overload the cognitive system of novices, because it requires deliberate attention to one’s behavior, a comparison of this behavior with some standard, and a response following this comparison.

In the remainder of the article we will first give an outline of modeling and the social cognitive model of skill acquisition. Secondly, we will go more deeply into the cognitive processes that learners have to engage in to become proficient performers. Thirdly, we will elaborate on the concept of interactivity in electronic learning environments. The fourth section discusses design guidelines that enable learners to interact effectively with video-based models. In the final section we draw some conclusions and provide directions for further research.

Modeling and Skill Acquisition

Modeling occurs when the behavior, strategies, or thoughts of observers are molded after those of one or more models (Schunk and Zimmerman 1997). These models might be experts who perform flawlessly and without error, or peer models who make and correct errors that are typical for the modeled behavior. Three types of modeling can be distinguished: behavioral modeling, cognitive modeling, and combinations of both types. Behavioral modeling, in which the model shows the desired performance, typically involves motor skills like those applied in learning to write or in sports such as skiing, playing tennis and throwing darts (Kitsantas et al. 2000; Zimmerman and Kitsantas 2002; for a review, see Wulf and Shea 2002). Cognitive modeling, on the other hand, requires the explication of considerations, thoughts and reasons that underlie performance. Problem solving (Jonassen 1999) and cognitive behavior modification (Meichenbaum 1977) are examples of domains that involve cognitive modeling. Finally, there are domains, such as surgery, in which both behavioral modeling (e.g., the fine-grained motor skills) and cognitive modeling (e.g., considerations about the best way to proceed) are involved (Custers et al. 1999).

In behavioral modeling the performance can be observed directly by the learner. However, considerations in the cognitive domain (e.g., problem solving) are not observable by themselves, especially when abstract processes or concepts are involved, and need to be externalized. In this respect, the use of dynamic visualizations such as animations and videos may be the preferred way to externalize the considerations involved in cognitive modeling. For example, take a novice computer programmer who has to learn the skill of debugging (trying to find out what happens when an error occurs in the program code). A narrated video might show an expert programmer demonstrating his or her debugging skill. In this video, the expert points to places in the code where important changes take place and explains some considerations that have to be taken into account. Also from the perspective of controlling the costs in education, the production of copies of a videotaped or animated model is much cheaper than having an expert perform the desired actions and explain them each time that it is needed. In line with this argument, videotaped or animated models enable learners to observe the modeled performance at the time and place most convenient for them.

There are at least three reasons that explain why modeling is effective. First, while observing an expert performing a complex task, learners can construct an adequate cognitive representation of the performance that enables them to rehearse the task mentally (or physically) and by doing so refine their initial representation (Bandura 1976). Second, compared with other instructional methods like worked-out solutions, learning by observation might be more beneficial because the model not only shows what is happening, but also explains why this is happening (Collins 1991; Van Gog et al. 2004). Problem solving, for example, can be instructed as the application of several problem solving steps, but this approach does not take into account why some steps are chosen to solve the problem and others are not. Information that explains why a particular action is taken might help learners to construct more generalized schemas that can then be applied to a broader variety of contexts and problem formats. Third, for novices the performance of a complex task might impose such high demands on memory resources that essential information cannot be processed anymore. In that case observing a model that performs the complex task releases cognitive resources that can be used to process this essential information. Without first observing a model, novices might pay attention to this kind of information only when memory demands have decreased because some skills have been automated during performance (Wulf and Shea 2002). By then, however, these novices may have automated skills that later appear to be inappropriate and then are difficult to unlearn or change. From this point of view, it is more effective to have learners first observe and then perform than have them perform from the beginning.

During observation of a model the learners will construct a mental representation of the desired performance without actually performing the behavior. However, eventually the behavior has to be demonstrated by the learner in a variety of situations and contexts. So modeling cannot be treated in isolation, but must be regarded as part of a larger learning framework. In this framework, learners start with observing expert performance, but gradually become more independent and learn to rely on self-regulating mechanisms that enable them to observe their own performance, judge that performance and react to that judgment.

The social cognitive model of sequential skill acquisition (Schunk and Zimmerman 1997; Zimmerman and Kitsantas 2002) provides a learning framework that deals with the transition from observation to performance. The model contends that skills can be optimally acquired in four sequential stages. In the first stage, the observation stage, the learner observes a model performing actions, listens to the model’s considerations, and discerns the consequences of the model’s actions. In this stage models that are effective will be incorporated whereas models that are ineffective will be ignored. In the second stage, the emulation or imitation stage, the learner imitates the model’s behavior at an appropriate level without copying it exactly (e.g., using the same type of question as the model, but with a different wording). In this stage learners are primarily motivated by social guidance and feedback, that is, information from others regarding the accuracy of their performance. The third stage comprises self-control. Learners become capable of performing new tasks independently and achieve some level of skill automation. Although learners still use standards of good performance that are provided by the model, the primary source of motivation is the self-satisfaction that emanates from a match with the model’s standard. In the final stage, the self-regulation stage, learners are capable of adapting their performance to changing personal and contextual conditions. At this stage learners are primarily motivated by the belief that they are capable of performing the task independently. Initially, learners thus depend heavily on the performance and standards of the model, but gradually they have to rely more on their own performance and on self-regulation processes.

It is essential that the transition from observation alone to performance takes place at the right time. We concur with Schunk and Zimmerman (1997) that both a delayed and a premature start with actual performance might hamper learning. Novices starting too early with the imitation stage lack the prior knowledge to attend to the relevant aspects of their own performance. The integration of skills, attitudes, and knowledge involved in learning a complex skill might impose such a demand on cognitive resources that learning is not likely to occur. On the other hand, learners might become demotivated and frustrated when the transition from the observation to the imitation stage is delayed although they are ready to perform tasks themselves. A theory that has dealt with the same issue of the transition between different stages is cognitive load theory (CLT). CLT emphasizes the limited cognitive capacity of the learner as an important determinant for the effective use of instructional methods (Paas et al. 2003, 2004; Sweller et al. 1998; Van Merriënboer and Sweller 2005). According to CLT, instructional designers should take into account three kinds of cognitive load that the learner will experience when performing a task: intrinsic load, related to the number of interacting elements in the learning material that have to be processed; extraneous load, related to the cognitive activities that do not contribute to the learning process; and germane load, related to the cognitive activities that strengthen the learning process. The general goal for instructional design is to reduce extraneous load to a minimum, and to maximize germane load to a level that remains within working memory limits. The available capacity for any germane process depends on the intrinsic difficulty of the learning materials, which is mainly beyond the control of the instructional designer, as it is dependent on the learner’s prior knowledge.
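The design goal stated above can be written compactly. The following is only a sketch: it assumes that the three kinds of load simply add up, which is a common reading in the CLT literature but is not stated explicitly in the text above, and the symbols $L_I$ (intrinsic), $L_E$ (extraneous), $L_G$ (germane), and $C_{WM}$ (working memory capacity) are introduced here for illustration only.

```latex
\[
  L_I + L_E + L_G \;\le\; C_{WM}
  \qquad\Longrightarrow\qquad
  L_G \;\le\; C_{WM} - L_I - L_E .
\]
```

Under this reading, with $L_I$ largely fixed by the material and the learner’s prior knowledge, lowering $L_E$ is the designer’s main way of freeing capacity for germane processing.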

When confronted with a complex task, novices often do not know where to start. The assumptions of cognitive load theory imply that novices have to be supported when they start to learn complex skills and that this support gradually has to be faded. In the learning task sequencing approach based on CLT, novices start with worked examples in which the problem solution is already given. Initially these worked examples are process-oriented, that is, they describe the rationale for taking certain steps in the problem solving procedure (Van Gog et al. 2004). Learning from process-oriented worked examples closely resembles the observation of models in the social cognitive model of sequential skill acquisition.

However, once a learner understands the rationale behind the problem solving process, the presentation of this information becomes redundant and starts to impose an extraneous cognitive load on the learner that will hinder learning. Research based on CLT suggests that this is the moment to fade the instructional guidance and present learners with product-oriented worked examples. At the same time relevant cognitive activities (i.e., germane cognitive load) can be strengthened, for example, by having the learners explain the rationale of the problem solving procedure to themselves and in this way connect new information with prior knowledge (Atkinson et al. 2003; Renkl et al. 2004; Renkl and Atkinson 2003). The next step, from worked examples to solving problems on their own, might be too large for learners, and they may fall back on a means–ends approach that imposes a huge extraneous cognitive load (Sweller 1988). Completion problems provide a bridge between worked examples and conventional problems. In completion problems only a part of the solution is given and the learner has to complete the solution (Sweller et al. 1998). Finally, the learners engage in conventional problems and solve these on their own.
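The fading sequence described above can be sketched as an ordered set of task formats with a single step rule. This is a minimal illustration, not an implementation from the cited studies: the names `TaskFormat` and `next_format` and the `ready_to_fade` flag are hypothetical, and how readiness is assessed is deliberately left open.

```python
from enum import IntEnum


class TaskFormat(IntEnum):
    """Task formats ordered from most to least instructional guidance."""
    PROCESS_WORKED_EXAMPLE = 0   # solution plus the rationale for each step
    PRODUCT_WORKED_EXAMPLE = 1   # solution only, rationale faded
    COMPLETION_PROBLEM = 2       # partial solution, learner completes it
    CONVENTIONAL_PROBLEM = 3     # learner solves the whole problem


def next_format(current: TaskFormat, ready_to_fade: bool) -> TaskFormat:
    """Move one step toward less guidance once the learner is judged ready.

    `ready_to_fade` is a hypothetical flag; the criterion behind it
    (e.g., performance on the current format) is an assumption.
    """
    if not ready_to_fade or current is TaskFormat.CONVENTIONAL_PROBLEM:
        return current
    return TaskFormat(current + 1)
```

Guidance is faded one format at a time, so a learner never moves directly from a process-oriented worked example to a conventional problem.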

To conclude, a learning task sequencing approach based on CLT research assumes that learners should be supported in such a way that cognitive processes that do not contribute to learning (e.g., processing redundant information, means–ends problem solving methods) are avoided, whereas activities that are relevant for learning (e.g., self-explanations) are stimulated (Paas et al. 2003, 2004). The goal of reducing extraneous cognitive load and maximizing germane cognitive load can be used as a blueprint for the design of interactivity in the social cognitive model of sequential skill acquisition. In the next section we will therefore further elaborate on cognitive processes in skill acquisition that impose a germane cognitive load on the cognitive system.

Cognitive Processes in Skill Acquisition

Cognitive load theory assumes that two structures in human cognitive architecture are crucial for the processing of information. Working memory (WM), in which all conscious processing of information takes place, has only a limited processing capacity that is far from adequate for the amount and complexity of information learners have to process in modern learning environments. The second structure, long-term memory, is a knowledge base with a virtually unlimited capacity that can serve as added processing capacity by means of schemas, that is, cognitive structures in which separate elements are aggregated into one specialized element that can be processed by WM as a single element (Paas et al. 2003). In a complex skill like driving a car, more experienced drivers continuously make use of such aggregated elements (e.g., changing gears) that can be processed by WM as one element. Less experienced drivers need to bring the separate elements, such as declutching, shifting the gear and engaging the clutch, one by one into WM in order to successfully change gears. The construction and automation of such schemas, which allows them to be processed unconsciously, is important because it further optimizes the processing capacity of WM.

Schema construction requires other cognitive processes than schema automation. To construct adequate schemas, two types of cognitive processes are important: elaboration and induction (Van Merriënboer 1997; Van Merriënboer et al. 2003). In elaboration, learners have to make the new information meaningful by integrating it with the information they already possess (i.e., prior knowledge), that is, they use what they already know to structure and understand the new information. In this way the schema becomes more elaborated, with increased accessibility, better recall, and higher transfer of learning. In induction, learners have to generalize concrete learning experiences (e.g., observation of modeled behavior) into more abstract schemas. More abstract or generalized schemas can also be applied to tasks that are different from the tasks that were solved during learning (Cooper and Sweller 1987). Besides generalizing from learning experiences, induction also comprises discrimination, that is, constructing a specific schema that can be used only under restricted conditions. When learners have only little prior knowledge, elaboration and induction will be addressed in the observation and imitation stages in order to construct initial schemas. However, when learners have observed some models and have engaged in the imitation stage, their schemas may have become more sophisticated. For example, by observing and imitating performance a schema can be enriched with information about which strategy can best be used for a particular task. When these enriched schemas enable learners to engage in self-regulative activities, like comparing strategies, elaboration and induction can also be addressed in the self-control and self-regulation stages.

For the automation of schemas, compilation and strengthening are the most relevant cognitive processes (Anderson 2000; Van Merriënboer 1997; Van Merriënboer et al. 2003). Newly constructed schemas can be used to provide an initial solution for a problem. Through compilation a highly specific schema is created in which actions are directly coupled to conditions. Once a task meets the conditions in the schema, the actions are triggered automatically. Strengthening refers to the chance that an automated schema will be activated. A recently automated schema still has a weak strength, that is, a low chance that it will be activated under specific conditions. By repeatedly applying the automated schema successfully, the chance that the schema will be activated under those conditions increases. Compilation and strengthening enable learners to further automate a schema and thus release cognitive capacity for the controlling and regulating activities that are associated with the self-control and self-regulation stages of the social cognitive model of sequential skill acquisition. This implies that compilation and strengthening should already be addressed in the imitation stage and, if necessary, continued in the self-control and self-regulation stages.

Interactivity in Electronic Learning Environments

Interactivity in learning environments can take on different forms. We concur with Kennedy (2004) that the focus of interactivity research should be on the cognitive processes that learners engage in rather than the technical aspects of interaction. Depending on its specific cognitive consequences, interactivity might help but sometimes even hinder learning. This may be one of the main reasons why research on the effectiveness of learner control in computer-based instruction has resulted in such mixed findings (for reviews see Kay 2001; Lin and Hsieh 2001; Niemiec et al. 1996; Williams 1996).

Kennedy (2004) has proposed a model of interactivity that is centered around instructional events. Instructional events can be regarded as tasks within a (multimedia) program that are presented to or completed by learners with the intention of learning. In the social cognitive model of sequential skill acquisition, such an instructional event could be the observation of a model performing a task or a task that learners have to perform themselves. The interactivity model differentiates between two highly related and dependent levels of interactivity. The first level, functional interactivity, comprises the behavioral responses learners engage in when they face an instructional event. These behavioral responses can vary from moving forward in a video in which a model performs a task to a more complex response like manipulating variables in a simulation. This relation is reciprocal, meaning that not only instructional events trigger behavioral responses, but also that in turn the behavioral responses of learners determine the occurrence of specific instructional events. For example, when learners are prompted to predict the next step of the performance of a model, their answers might determine the response of the learning environment. That is, an incorrect answer will result in corrective feedback by the model whereas a correct answer will result in confirmation. The second level, cognitive interactivity, proposes that these behavioral responses mediate between the instructional events and (meta)cognitive processes. For example, a prompt from the learning environment to predict the next step of a model by answering a question might trigger cognitive processes that enable learners to select and organize information presented in the instructional event and integrate it with their prior knowledge.
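The reciprocal relation at the functional level, in which the learner’s response determines the next instructional event, can be sketched as follows. This is an illustrative assumption rather than Kennedy’s (2004) specification: the names `PredictionPrompt` and `respond` are hypothetical, and matching the answer by simple string comparison stands in for whatever answer evaluation a real learning environment would use.

```python
from dataclasses import dataclass


@dataclass
class PredictionPrompt:
    """A pause in the video that asks the learner to predict the model's next step."""
    question: str
    correct_step: str
    corrective_feedback: str  # what the model explains after a wrong prediction


def respond(prompt: PredictionPrompt, learner_answer: str) -> str:
    """Select the next instructional event from the learner's behavioral response."""
    # Assumption: answers are compared case-insensitively, ignoring outer spaces.
    if learner_answer.strip().lower() == prompt.correct_step.strip().lower():
        return "confirmation"
    return "corrective feedback: " + prompt.corrective_feedback
```

The sketch covers functional interactivity only; whether the prompt actually triggers selection, organization, and integration of information (cognitive interactivity) cannot be read off the response handling itself.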

As indicated, cognitive interactivity not only aims at processing the information, but also at metacognitive processes like monitoring. In this respect interactivity not only offers support for learning complex skills, but also might help learners to acquire self-regulation skills (Kinzie 1990). The mediating role of behavioral responses between the instructional event and the intended cognitive processes is of the utmost importance in interactivity. Merely clicking a button to get to the next screen, without the purpose of eliciting desired cognitive processes, can therefore not be regarded as an interaction relevant for learning. We concur that guidelines for interactivity in learning environments should address both the functional and the cognitive component, and describe which behavioral actions and/or responses are addressed and which cognitive processes they mediate.

In this respect the nature of the behavioral response determines at least partly the quality of the cognitive process. However, the effect of the instructional event and its behavioral response on learning might be dependent on other factors that in themselves have nothing to do with the construct of interactivity. We consider prior knowledge, which is closely related to task complexity, to be the most important factor. In terms of cognitive load theory, complexity can be defined as the number of elements that interact (Chandler and Sweller 1994). What novices perceive as complex will be simple for experts, because experts already possess aggregated schemas that help them to reduce the number of interacting elements. Recent research on cognitive load theory, for example, has shown that design guidelines that are beneficial for novice learners can be ineffective or even detrimental when applied to experts (Kalyuga et al. 2003). Novices typically lack the cognitive schemas that may reduce the burden on working memory and enable the learner to process information effectively. In the case of novices, the application of some of our design guidelines can compensate for this lack of schemas. More experienced learners, however, already possess schemas to process information effectively, and these guidelines may yield instruction that is less effective for them. Recent multimedia studies have also emphasized the importance of prior knowledge for the effectiveness of interactivity. A study by Lowe (2004) showed that novice learners found the facility to interact with animations quite unhelpful. Since they did not have sufficient background to know what aspects of the animation required further exploration, they engaged in unsophisticated interaction with the animation and did not extract the essential information. In addition, Schnotz and Rasch (2005) showed that activities that yield a germane load for low expertise students may yield an extraneous load for high expertise students and vice versa.

Guidelines for Interactivity in Modeling

As argued before, modeling and interactivity seem to be somewhat incompatible at first sight, as the essential characteristic of modeling is that learners observe and do not perform. However, with video-based models, interactivity can stimulate learners to engage in active cognitive processes during observation. Moreover, we embedded modeling within the social cognitive model of sequential skill acquisition in which it is only the first of four stages in becoming a proficient performer. From the second stage of the model onward, learners start to perform themselves and engage in behavioral actions that change the instructional event. This provides ample opportunities for interactivity, for example, by directing their behavioral actions in such a way that the intended cognitive processes will occur.

As far as we know, hardly any research has been conducted with the purpose of investigating the effect of interactivity on cognitive modeling. However, studies regarding the modeling of motor skills have investigated instructional techniques, such as working in dyads, that also fit our definition of interactivity. Moreover, multimedia learning research has produced guidelines for interactivity that can be useful for our purposes. Also, research originating from cognitive load theory has focused on cognitive processes that are invoked by interactivity. Although we will provide a guideline that addresses compilation and strengthening, these cognitive processes are mainly related to repetition, which provides little opportunity for interactivity. Therefore, the primary focus of our review is on how interactivity can address elaboration and induction in the four stages of the social cognitive model of sequential skill acquisition.

Table 1 presents a set of guidelines originating from existing research on the modeling of motor skills, multimedia learning, and cognitive load theory. For each guideline it is indicated which type of interactivity it represents, what cognitive processes are supported, and how the level of prior knowledge affects the way that the guideline should be implemented. In the following, all guidelines will be discussed one by one.

Table 1 Guidelines for Interactivity that Promote Learning from Video-based Models Classified by Type

Pacing

Pacing involves the control over the continuation of a model’s actions on video (or an animation), which can be exerted by either the learner or the system (e.g., a computer). The opportunity to pause, continue or move forward and backward enables learners to adapt the video-based model to their cognitive needs (e.g., by going back to a specific action of the model). For example, in a study by Schwan and Riempp (2004) learners saw a video about nautical knotting and could accelerate, decelerate, stop or repeat the video. This control of pacing was heavily used, especially with increasing knot difficulty. The more difficult the knots became, the more use was made of the control options, which resulted in a better understanding of the underlying processes and a reduced need for practice time. However, for learners with a low level of expertise, interactive pacing can be problematic, as they do not yet know where and when to apply learner control. Novice learners might benefit more from a combination of pacing with segmentation by the instructional designer. Evidence for this suggestion comes from multimedia studies reporting a positive effect of learner controlled pacing when the instructional material was segmented (Mayer and Chandler 2001, Experiment 2; Mayer et al. 2003, Experiments 2a and 2b).

The behavioral component of this type of interactivity consists of learners navigating to specific locations or moments in the video that are relevant for them. Inexperienced learners can be supported by giving them a segmented video that helps them to discern important moments in the video. In that case the behavioral component ranges from pausing and playing to moving forward and backward between segments. The cognitive processes facilitated by pacing comprise mainly elaboration: learners can go to information that is relevant to them and make explicit connections with their prior knowledge. For this reason, pacing can best be implemented in the observation stage.
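Learner-controlled pacing over a designer-segmented video, as described above, might be sketched as follows. The class `SegmentedVideo` and its methods are hypothetical names chosen for illustration, and the segment boundaries are assumed to be supplied by the instructional designer as a sorted list of start times in seconds beginning at 0.0.

```python
from bisect import bisect_right


class SegmentedVideo:
    """Playback position of a video with designer-defined segment boundaries."""

    def __init__(self, segment_starts, duration):
        # segment_starts: sorted start times in seconds, beginning at 0.0
        self.segment_starts = list(segment_starts)
        self.duration = duration
        self.position = 0.0
        self.playing = False

    def toggle_play(self):
        """Pause a playing video or resume a paused one."""
        self.playing = not self.playing

    def current_segment(self):
        """Index of the last segment that starts at or before the position."""
        return bisect_right(self.segment_starts, self.position) - 1

    def jump(self, offset):
        """Move `offset` segments forward (+) or backward (-), clamped to range."""
        target = self.current_segment() + offset
        target = max(0, min(target, len(self.segment_starts) - 1))
        self.position = self.segment_starts[target]
        return target
```

Restricting navigation to segment boundaries is one possible way of giving inexperienced learners control without requiring them to locate the important moments themselves.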

Cues

According to Mayer and Moreno (2003), signaling or cueing is the provision of cues to the learner on how to select and organize the instructional material. In this respect signaling covers a broad spectrum, including stressing key words in speech by a different intonation, emphasizing words in printed text by underlining them, and adding arrows to a picture to focus attention on a particular part of the instructional material. Novices in particular will have problems attending to the relevant aspects of the instructional material and will engage in extensive visual search at the cost of cognitive capacity. Some studies report that visual cues do aid learning in multimedia instructions, resulting in higher retention scores (e.g., Tabbers et al. 2004), whereas others show that cueing can be even more effective when the amount of necessary visual search, such as in complex graphics, is high (e.g., Jeung et al. 1997). Furthermore, Mautone and Mayer (2001, Experiment 3) investigated the effect of signaling in a narrated animation and found that signaling was only effective when both the animation and the narration were signaled (signaled words were spoken with a slower, deeper intonation), but not when only the animation or the narration was signaled. For more experienced learners these cues might present redundant material because they are capable of attending to the relevant parts of the instructional material themselves. Consequently, for them these cues impose an extraneous cognitive load.

We propose to present more experienced learners with uncued video-based models with the option to display cues when they need them. For less experienced learners we propose cued video-based models with the option to hide cues when they no longer need them. Obviously, displaying or hiding a cue is a behavioral response performed by the learner. The cognitive processes these responses mediate are elaboration and induction. For more experienced learners, uncued models with the option to call up cues when needed might invite them to engage in both elaboration (i.e., the cued information helps them to select relevant information and connect this with prior knowledge) and induction (i.e., the cued information points to information that is generic and thus facilitates the construction of more abstract schemas). For less experienced learners, hiding cues might engage them in elaboration because it enables them to make the uncued information meaningful by connecting it with their prior knowledge. This guideline is most appropriate in the observation stage of the social cognitive model of sequential skill acquisition.
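The proposal above amounts to an expertise-dependent default combined with a toggle operated by the learner. A minimal sketch, in which the class name `CuedModel`, the boolean `experienced` flag, and the toggle log are all hypothetical and serve only to illustrate the idea:

```python
class CuedModel:
    """Cue visibility for a video-based model, with an expertise-dependent default."""

    def __init__(self, experienced):
        # Experienced learners start uncued; less experienced learners start cued.
        self.cues_visible = not experienced
        self.toggle_log = []  # (time in seconds, new visibility) per learner action

    def toggle_cues(self, time_s):
        """Behavioral response: the learner switches the cues on or off."""
        self.cues_visible = not self.cues_visible
        self.toggle_log.append((time_s, self.cues_visible))
        return self.cues_visible
```

Logging each toggle is an added assumption; it would allow a researcher to examine when learners call up or dismiss cues, but it is not part of the guideline itself.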

Control over appearance

Rapid developments in computer and software technology in the last decades have enabled a dimension of learner controlled interactivity that goes beyond merely controlling the pace of presentation, because learners can also manipulate the appearance of the modeled performance, such as zooming in or out on a specific part of the performance or observing the performance of a model from another angle (Hegarty 2004; Schwan and Riempp 2004). For example, during the observation of a surgery by an expert surgeon, the learner can decide to zoom in on a specific act in order to get a better view of the fine-grained motor skill. Moreover, studies in medical training and visual object recognition in which learners could manipulate 3-D computer visualizations and observe the instructional material from multiple views suggest that control over appearance can be effective (Garg et al. 2001; Harman et al. 1999; James et al. 2002).

The behavioral component of this kind of interaction is the action of the learner deciding what information to observe and how to observe it. Manipulation of the appearance of the instructional event supports elaboration because it enables learners to view the model from a perspective that enables them to connect the new information with their prior knowledge. Similar to the pacing/segmentation guideline, novices might have problems with deciding where and how to engage in control of the perspective or appearance of the task or the performance of a model. Therefore, this guideline might only be effective when learners have already observed a few models or when they are in the self-control or self-regulation stage and go back to the observation stage. An alternative, emerging from research on 3-D computer visualizations in medical training and visual object recognition, is to allow learners only a limited degree of freedom in manipulating the model (Garg et al. 1999). In that case novices might also apply control over appearance in the observation stage from the beginning.

Prediction

Asking learners to explain to themselves aloud what they have understood from a learning task has proved to be an effective instructional method (Chi et al. 1989; Renkl 1997; Renkl and Atkinson 2002). A specific type of generating self-explanations is predicting the next step in a process or event, such as the performance of a model. The beneficial effect on learning of prompting learners to predict the next step in a process has been replicated in several multimedia studies (Hegarty et al. 2003; Mayer et al. 2003; Moreno et al. 2001).

The behavioral component of this type of interactivity is that learners have to respond by providing an answer to a specific question or to a prompt to predict the next step in a process (e.g., the performance of a model). If a learner provides an incorrect self-explanation, the model might adapt the performance and respond to the answer. A correct prediction may result in confirmation and approval. The cognitive processes in prediction comprise elaboration: the activation of prior knowledge and subsequent integration with newly presented information. Moreover, the prompting can be used to promote induction, for example, by having learners think about the general applicability of a model’s behavior in a specific situation. These cognitive processes make the prediction guideline relevant in both the observation and imitation stages of the social cognitive model of sequential skill acquisition.

Working in dyads

An interesting technique that is not directly associated with interactivity is the alternation between modeling and performing. A special form of this kind of alternation is working in dyads. A study with a video-game task revealed better transfer scores for dyads in which one learner performed and the other observed while switching roles than for a situation in which learners only practiced or for a control group (Shea et al. 2000). In another arrangement dyads worked on the same task but each learner only performed part of the task (e.g., using the joystick) and observed the other learner who performed another part of the task (Shebilske et al. 1992). Results showed that the demands on working memory decreased without affecting learning performance compared to learners who performed the task individually. The alternation between observing and practicing allows learners to process specific information that they would not be able to process while practicing the skill because of the high demand on cognitive resources. After initial modeling, learners have constructed a preliminary mental representation that can be further refined through the information originating from performing the skill. By alternating between observation and practice, the mental representation will be enriched (Weeks and Anderson 2000; Wulf and Shea 2002).

When dyads work on the same task simultaneously, the behavioral component of interactivity is clear: The actions taken by one learner will change the instructional event for the other learner, whose reaction will then have implications for the first learner. It is less clear, however, when the dyads work on a task in turns. In that case the dyad should be regarded as a unity: The performance of one partner of the dyad provides the instructional event that the other partner of the dyad has to observe and vice versa. Peers working in dyads are likely to make the same kind of typical errors. In a study about dart-throwing, Kitsantas et al. (2000) found that peer models made key errors but that they promptly corrected these errors. They also found that observers of peer models learned to identify and eliminate such errors, whereas observers of expert models missed the information of error correction. For learners who work in dyads, the typical errors and the information that is presented when these errors are corrected give them the opportunity for induction, that is, to classify these errors and their corrections and in this way construct more generalized or specific schemas. Moreover, the information about the correction of an error might incite elaboration, that is, activate prior knowledge of learners and integrate it with the new information. Because dyads always imply some performance, this guideline will be most effective in the imitation stage.

Personalized task selection

A promising instructional technique following from the four-component instructional design model (4C/ID: Van Merriënboer 1997) is personalized task selection, in which learners choose a task or model from a subset that is based on their recorded performance and invested mental effort (Corbalan et al. 2006; Van Merriënboer 1997). Personalized task selection requires learners to self-assess their performance and to select an appropriate task or model from a subset of tasks or models. The behavioral component of interactivity is the learner's selection of a task or model from the presented subset. Because learners self-assess their performance and level of expertise, elaboration and induction are involved as cognitive processes. This type of interactivity is only useful when learners have sufficient (prior) knowledge to assess their performance. Therefore, this guideline can be applied in both the self-control and self-regulation stages of the social cognitive model of sequential skill acquisition.
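To make the two-step character of personalized task selection concrete (the system narrows the pool to a subset, the learner makes the final choice), the following minimal sketch may help. It is an illustration only and is not taken from the cited studies: the 0 to 1 scales for performance and mental effort, the five difficulty levels, the thresholds, and the names Task, target_difficulty, and candidate_subset are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    difficulty: int  # hypothetical scale: 1 (easiest) to 5 (hardest)


def target_difficulty(current: int, performance: float, mental_effort: float) -> int:
    """Pick the next difficulty level from recorded performance and effort.

    Assumed rule (illustrative thresholds, not from the literature):
    high performance with low effort moves one level up, low performance
    with high effort moves one level down, anything else stays put.
    Both inputs are assumed to be normalized to the range 0..1.
    """
    if performance >= 0.7 and mental_effort <= 0.4:
        step = 1
    elif performance <= 0.4 and mental_effort >= 0.7:
        step = -1
    else:
        step = 0
    # Clamp to the assumed 1..5 difficulty scale.
    return max(1, min(5, current + step))


def candidate_subset(tasks, current, performance, mental_effort, size=3):
    """Return the subset of tasks offered to the learner for the final choice."""
    level = target_difficulty(current, performance, mental_effort)
    # Rank tasks by distance from the target level; sorted() is stable,
    # so tasks at equal distance keep their original order.
    ranked = sorted(tasks, key=lambda t: abs(t.difficulty - level))
    return ranked[:size]


if __name__ == "__main__":
    pool = [Task("a", 1), Task("b", 2), Task("c", 3), Task("d", 4), Task("e", 5)]
    for task in candidate_subset(pool, current=2, performance=0.9, mental_effort=0.2):
        print(task.name, task.difficulty)
```

The learner's own choice from the returned subset is deliberately left outside the code, since that choice is the behavioral component of the guideline.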

Reflection prompts

As stated earlier, the final goal of the social cognitive model of sequential skill acquisition is that learners become self-regulating, that is, capable of adapting their performance to changing circumstances. In this respect, comparing different strategies to perform a task is an important dimension of self-regulated learning. According to Borkowski et al. (1990), reflection can be conceived of as a strategy or skill that operates on other strategies, that is, a form of personal mental experiment which is conducted to compare strategies to each other. It is assumed that in electronic learning environments reflection prompts can be used to provoke learners’ reflection on the way they performed a task and to have them compare it with other strategies to perform the task (Butler 1998; Seale and Cann 2000; Winne and Stockley 1998). An indication of the practical value of reflection prompts was provided by a study of Van den Boom et al. (2004), who found that reflection prompts combined with tutor feedback had positive effects on the development of students’ self-regulated learning competence. The behavioral component of this type of interactivity is that learners have to respond to a specific question or prompt to evaluate their performance. The cognitive processes involved in reflection prompts comprise those processes that enable learners to compare strategies and connect them with specific situations. In this sense both elaboration and induction are facilitated by reflection prompts. Because reflection prompts aim to help learners to evaluate strategies to perform a task, this guideline can be applied in both the self-control and self-regulation stages of the social cognitive model of sequential skill acquisition.

Personalized task selection with part-task practice

The performance of complex tasks involves the integration and coordination of skills, knowledge, and attitudes. These constituent skills involve both recurrent skills that are highly consistent across different tasks and non-recurrent skills that need to be adapted to the specific characteristics of a task. For example, the skill ‘driving a car’ involves the recurrent skill ‘changing gears.’ In complex tasks it might be necessary to have learners engage in part-task practice, that is, to perform tasks in which they practice such recurrent skills in order to automate these skills and release cognitive resources for other aspects of the constituent skill (Van Merriënboer 1997). Personalized task selection with part-task practice enables learners to choose from a subset of recurrent part-tasks (e.g., because they feel uncertain about a particular recurrent skill) which they then practice intensively. The behavioral component of interactivity is that learners select the part-tasks themselves. The cognitive processes promoted by part-task practice comprise compilation but particularly strengthening, that is, increasing the chance that the recurrent skill will be automatically performed under specific conditions. As argued earlier, compilation and strengthening aim at releasing cognitive capacity for self-regulating activities. For this reason, personalized task selection with part-task practice is most appropriate in the imitation stage of the social cognitive model of sequential skill acquisition, although it may be used in the self-control and self-regulation stages as well.

Discussion

In the previous sections we have argued that interactivity in video-based models can be applied as an instructional method to engage learners in relevant cognitive activities. The nature of modeling, which is observing how an expert or peer performs a complex motor or cognitive skill, seems incompatible with having learners interact with the model. However, we argued that the final goal of modeling is to have learners perform the skill themselves. Therefore, we adopted the social cognitive model of sequential skill acquisition, in which observing a model is the important first stage in learning, followed by the imitation, self-control, and self-regulation stages, respectively. The model supports learners in their transition from observers to independent performers of complex skills. Moreover, we contended that interactivity should be described not only in terms of behavioral components, that is, actions and strategies from learners that affect the instructional event (i.e., the task or model), but also in terms of the cognitive processes that are mediated through the behavioral components. Cognitive load theory emphasizes that cognitive processes should be focused on the construction of schemas and the automation of these schemas in order to release cognitive resources. Elaboration and induction are cognitive processes that facilitate schema construction, whereas compilation and strengthening facilitate schema automation. We identified seven guidelines for implementing interactivity that facilitate schema construction and that can be applied across the four stages of the social cognitive model of sequential skill acquisition: pacing, cues, control over appearance, prediction, working in dyads, personalized task selection, and reflection prompts. In addition, personalized task selection with part-task practice was identified as a guideline that supports compilation and strengthening.

The review gave rise to some issues that need to be resolved by systematic research. To start with, the guidelines are presented in isolation, but it is not clear how they are related. In other words, does an ideal sequence of interactivity guidelines exist when learners go through the social cognitive model of sequential skill acquisition? For example, is it prudent to present novices with the prediction guideline right from the start, or should one begin with pacing in combination with predefined segments and apply prediction only in the imitation stage? Another interesting avenue for future research is the question whether the effectiveness of isolated guidelines can be increased by combining them. The combination of pacing and segmentation has already shown the promise of such an approach (Mayer and Chandler 2001; Mayer et al. 2003). Recently, Paas et al. (Instructional efficiency of interactivity in traced animation, in press) combined segmentation and prediction in a technique called tracing. The tracing strategy keeps information available in a sequence of key frames (i.e., the segments), rather than having it replaced by the ongoing performance. The trace thus decreases extraneous cognitive load and frees up cognitive capacity that can then be used for cognitive processes that are beneficial for learning. Two strategies were expected to stimulate the construction of schemas. Learners were prompted to either mentally reconstruct the previous segment by interpolating from a presented segment or mentally construct the following segment by interpolating from a presented segment. The results showed that learners in the tracing conditions performed more efficiently than learners in the non-interactive instructional condition.

Secondly, the level of expertise has a strong mediating influence on the effectiveness of the guidelines. A guideline effective for a low-experience learner may be ineffective for a high-experience learner. In the field of multimedia learning some recent studies have indeed revealed that interactivity can have a detrimental effect on learning, imposing extraneous cognitive load instead of germane cognitive load (Lowe 2004; Schnotz and Rasch 2005). In the social cognitive model of sequential skill acquisition, the level of expertise determines when a learner will go from one stage to another. This makes a rapid and reliable measurement of the expertise level a sine qua non for successful implementation of interactivity in video-based modeling. A method proposed by Kalyuga and Sweller (2004) requires learners to use schemas in order to recognize a problem state and retrieve the appropriate next solution step. The method has shown promising results, because the time involved in assessing expertise levels was drastically reduced compared with more traditional assessment methods. A limitation of this method, however, is that it has only been tested with relatively well-structured problems, and not with ill-structured problems in complex domains that can be solved in various ways. The importance of the level of expertise requires instructional designers to be careful when implementing interactivity. In this respect the effectiveness of modeling can benefit much from future research on instructional design in which the level of expertise is taken into account.

To conclude, the implementation of interactivity in video-based models meets two focal points of contemporary educational theory. First, the modeling of (cognitive) processes in complex tasks is in line with the current focus on lifelong learning, problem solving, and self-regulation. Second, it supports the notion that learners should actively and cognitively construct meaning from instructional material rather than passively receive it.