Digital technology offers exciting new opportunities for learning. Learners can use technology to explore virtual realities, use their tablet computers to interact naturally with their learning applications, and gain knowledge with high-fidelity simulations of the real world. However, recent research suggests that several design factors involved in digital learning can induce an increase in cognitive load. For instance, feeling completely immersed in a virtual world during a learning task can create a whole new learning experience when compared with learning with traditional media, but this immersion may also lead to a depletion of learners’ cognitive resources on that experience itself rather than contributing towards learning (e.g., Frederiksen et al., 2020). Thus, design factors that would intuitively be thought to be a positive aspect can turn out to lead to unwanted cognitive load. This type of task-irrelevant load plays an important part in cognitive load theory (Sweller et al., 1998), a model in instructional psychology that specifies different components of mental resources competing for learners’ working memory capacity. Cognitive load theory assumes that unnecessary cognitive load can be eliminated or reduced by optimizing the design of learning materials (Sweller et al., 1998). When these distractions are minimized, learners’ cognitive systems are thought to have more mental capacity available to transform the information currently being attended to into long-term memory traces (Sweller et al., 1998).

Although this fundamental principle conveyed in cognitive load theory research (i.e., avoiding unnecessary cognitive load to allow a focus on the actual learning) has been supported by numerous studies over the years (Sweller et al., 2019), recent investigations centering on the new affordances of digital and online learning have brought to light several results that can pose a challenge for the theory (e.g., Brom et al., 2018; Skulmowski et al., 2016; Skulmowski & Rey, 2020b). In a nutshell, these results indicate that perceptually rich elements, such as detailed visuals or interactive responses generated by the learning application, can induce a certain amount of irrelevant cognitive load while still fostering learning outcomes.

The aforementioned examples of using digital technology for learning have also been referred to as online learning collectively (Mayer, 2019), although the terms digital learning and technology-enhanced learning can usually be used interchangeably. In line with this definition, the range of modes for online learning can span from relatively simple web pages to highly immersive digital environments (Mayer, 2019). In this paper, we review five research areas related to online learning that have demonstrated such seemingly paradoxical findings of an improved learning performance despite additional extraneous cognitive load. These areas of research mainly consist of studies in which digital learning is investigated and their results can be used to further refine cognitive load theory.

The five challenges are interactive learning media, immersion, realism, disfluency, and emotional design. By interactive learning media, we refer to all forms of online learning that revolve around providing learners with computer-generated responses based on user input (see Domagk et al., 2010), such as simulations that allow learners to test their hypotheses in virtual counterparts of chemistry labs (for an overview, see Plass et al., 2009) and interactive presentations that feature controllable on-screen elements (e.g., Song et al., 2014). Interactive learning media often feature an immersive quality, for instance when a virtual reality head-mounted display is being used. The impression of immersion can be generated by blocking out learners’ external surroundings so that they feel as if they were “teleported” into a virtual reality (Dede, 2009). In order to achieve a believable digital environment, realistic graphics are usually employed that can include minute and possibly distracting (see Brucker et al., 2014) details. Therefore, realistic visualizations are often regarded as the opposite of simplified, schematic line drawings, with other forms (such as slightly more detailed and shaded drawings) falling in between these extremes (for overviews, see Dwyer Jr, 1969; Höffler, 2010). Immersion, interactivity, and realism all have in common that they are known, or at least theoretically supposed, to induce additional task-irrelevant cognitive load (Makransky et al., 2019; Skulmowski & Rey, 2020a, b) but also to increase learning performance under certain circumstances (Frederiksen et al., 2020; Johnson-Glenberg et al., 2014; Skulmowski & Rey, 2020b). Another research area suggesting that a slightly raised level of cognitive load may have the potential to benefit learning is disfluency. 
Disfluency is operationalized by adding slight distractions to the learning material (for instance, less readable fonts, Diemand-Yauman et al., 2011) and can therefore contribute towards a higher level of unnecessary cognitive load (Seufert et al., 2017). However, some studies show that disfluency can actually enhance learning (Eitel et al., 2014, Exp. 1). Similarly, redundant elements such as embellishments stemming from the notion of emotional design have also challenged the assumption that irrelevant cues should always be avoided (Brom et al., 2018). In numerous studies, it has been shown that introducing minor visual cues that trigger positive emotions (such as warm colors and child-like faces) can enhance learning (e.g., Brom et al., 2018; Plass & Kaplan, 2016).

Given the rapid development in digital learning, it is important to also advance the understanding of cognitive load theory in line with this growing body of findings. In the present article, we aim to propose theoretical advances regarding the theory that are based on relevant empirical developments and thus make allowance for a more comprehensive paradigm of cognitive load. In the following, we first present a brief review of the theoretical basis of cognitive load theory, and then, we summarize the five already mentioned challenges in greater detail. Based on these findings, we devise a revised theoretical model to align different types of cognitive load with specific forms of mental processing. Using this strategy, cognitive load theory can be reconciled with the contradicting results from digital learning briefly summarized above that are then elaborated in later sections of the paper.

The 1998 Cognitive Load Theory Model

Cognitive load theory (CLT) is a theoretical model of learners’ working memory and the different categories of load that can fill their memory capacity (Sweller et al., 1998). The most influential iteration of this theory, which we will refer to as the 1998 model for the remainder of this paper, partitioned the working memory demands of instructional settings into three load types, intrinsic cognitive load (ICL), extraneous cognitive load (ECL), and germane load (GCL; Sweller et al., 1998). This model assumed that learning materials have an inherent complexity that stems from the number of information units and the number of their connections, termed as “element interactivity” (Sweller et al., 1998). Although intrinsic load cannot be lowered by means of instructional design, it can vary according to the learner’s level of prior knowledge such that for the same learning task a novice will experience higher intrinsic load compared to an expert (Kalyuga, 2005; Sweller et al., 1998). In contrast, the way that the learning contents are displayed can be substantially affected through a myriad of design choices, which can induce extraneous load (Sweller et al., 1998). This type of load can be generated by inefficient instructional procedures such as unguided problem-solving, as well as unnecessary actions, gaze patterns, or mental operations that learners must perform while working with learning materials (Sweller et al., 1998). It is important to note that these early studies in the development of the theory usually dealt with extraneous load that is caused by design factors found in traditional, non-digital learning media. 
This includes redundant textual information presented alongside a diagram (Chandler & Sweller, 1991), diagram labels that are placed at an inconvenient distance to a diagram (Chandler & Sweller, 1992), and several detrimental effects related to the presentation of (mathematical) problem-solving tasks (e.g., Sweller & Cooper, 1985; van Merriënboer & Krammer, 1987). Naturally, it follows that instructional designers should minimize extraneous load in order to avoid overloading the mental capacity for dealing with the actual learning contents, i.e., the intrinsic load (Sweller et al., 1998).

The 1998 model also introduced a new component in comparison to an even earlier version of the model, which consisted only of intrinsic and extraneous cognitive load (Sweller, 1988): germane load (Sweller et al., 1998; see, e.g., Kalyuga, 2011, for an overview of the debate surrounding this component). In the 1998 model, germane load is defined as cognitive resources that may be devoted to the mental processes aimed at generating and storing newly acquired knowledge in long-term memory, for instance abstraction or the composition of task-oriented representations, so-called schemas (Sweller et al., 1998). In this model, germane load has an additive relationship with the other two cognitive load components. Thus, intrinsic load and extraneous load must be low enough to leave sufficient cognitive resources for processes associated with germane load to occur (Sweller et al., 1998). This model is summarized in Fig. 1.
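
The additive assumption of the 1998 model can be written compactly. The following is a minimal sketch rather than a formula taken from Sweller et al. (1998): the symbols $L_{\text{total}}$ (total load) and $C_{\text{WM}}$ (working memory capacity) are notation introduced here purely for illustration.

```latex
\[
L_{\text{total}} = \text{ICL} + \text{ECL} + \text{GCL} \le C_{\text{WM}}
\quad\Longrightarrow\quad
\text{GCL} \le C_{\text{WM}} - \text{ICL} - \text{ECL}
\]
```

Read this way, any reduction in ECL widens the margin that remains for germane load, which is the rationale for minimizing extraneous load.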

Fig. 1

A comparison between the 1998 model by Sweller et al., the 2019 model by Sweller et al., and a combination of various recent developments related to the cost-benefit approach to cognitive load. In the 1998 model, intrinsic load (ICL), extraneous load (ECL), and germane load (GCL) contribute towards the total mental load (visualized as the gray container). In the 2019 model, germane load is no longer regarded as a component of total load, but rather as germane processing. The cost-benefit model includes the notion that a learning task can contain multiple types of extraneous load that are associated with different degrees of germane load. In the right part of the figure, this is exemplified by two types of extraneous load that are connected to two germane processes.

The Revised 2019 Model and Other Recent Advances

Although the 1998 model was embraced by the field of instructional psychology (Ozcinar, 2009), some aspects of this iteration remained controversial. In particular, the germane load has sparked several discussions regarding its concrete definition (e.g., Debue & van de Leemput, 2014) and its theoretical relevance and basis (e.g., de Jong, 2010; Kalyuga, 2011; Leppink & van den Heuvel, 2015). As a result, calls to simplify the triarchic structure into a dyadic model consisting only of intrinsic load and extraneous load became more and more prevalent (e.g., Kalyuga, 2011; Leppink & van den Heuvel, 2015). Some authors raised the question whether germane load can simply be added towards the total cognitive load (e.g., de Jong, 2010). This is an interesting turn of events, considering that older papers by Sweller featured a two-component model of intrinsic load and extraneous load (e.g., Sweller, 1988). Jiang and Kalyuga (2020) recently presented a factor analysis of cognitive load survey responses and argue that a two-factor model consisting only of intrinsic and extraneous load is appropriate.

In response to these developments, Sweller et al. (2019) presented a revised version of CLT along with a discussion of important theoretical advances and current research topics. The major change from the 1998 model is that Sweller et al. (2019) have removed germane load from the additive equation of total load and instead treat it as a dimension now called “germane processing.” As a result, an increased germane load is no longer believed to lead to cognitive overload and can thus no longer be assumed to be detrimental to learning performance. We call this new model the 2019 model for the remainder of this article. This model is presented in the middle of Fig. 1.
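
The difference between the two models can be sketched in an illustrative notation. The symbols $L_{\text{total}}$ (total load) and $C_{\text{WM}}$ (working memory capacity) are assumptions introduced here and are not taken from Sweller et al. (1998, 2019).

```latex
\[
\text{1998 model:}\quad L_{\text{total}} = \text{ICL} + \text{ECL} + \text{GCL} \le C_{\text{WM}}
\]
\[
\text{2019 model:}\quad L_{\text{total}} = \text{ICL} + \text{ECL} \le C_{\text{WM}}
\]
```

In the second line, germane processing no longer appears as a summand, so on this reading it cannot by itself push total load beyond capacity.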

Other important recent contributions towards a more holistic description of learning settings as they pertain to cognitive load include a theoretical work by Choi, van Merriënboer, and Paas (2014; see also Paas & van Merriënboer, 1994). The model of Choi and colleagues (2014) emphasizes the interactions between learners, their learning tasks, and the learning environment as potential contributors to cognitive load. Although the authors did not specifically review research on digital learning environments, they discuss how other environmental factors such as noise or high temperatures can negatively affect learning (Choi et al., 2014). Below, we first review relevant theoretical advances pertaining to cognitive load for digital learning, and then elaborate on the role of motivation in relation to extraneous load and digital learning.

Relevant Advances in Reconceptualizing the Cognitive Load Types for Digital Learning

Authors from a wide range of disciplines have contributed to the reconceptualization of cognitive load and investigated the fundamentals of cognitive load. In this section, we review those reconceptualizations that have the most relevance for the field of digital learning. One interesting model is aimed at transferring insights from CLT into the field of ergonomics with an emphasis on extraneous load. In their paper, Hollender et al. (2010) present an adapted CLT model that integrates software demands into the extraneous load component. Hollender et al. (2010) propose to reformulate extraneous load as the sum of the cognitive load created through instructional design and the load stemming from software usability (such as demands posed by interfaces). This model introduces an important distinction, namely that there may be different load types due to digital interactions that can add to the extraneous load. A similar emphasis on the need to distinguish between different load components has also been formulated recently by Skulmowski and Rey (2020a), based on findings showing that cognitive load surveys aimed at different types of extraneous load (e.g., verbal or software interaction-based cognitive load) can lead to variations in the resulting extraneous load measurements, thus justifying the assumption of different extraneous load components. In addition to different load types, it is commonly acknowledged in the CLT literature that there can be many sources of extraneous load depending on the presentation of the task or other affordances of the learning environment (Schnotz & Kürschner, 2007), for instance visual factors (such as split-attention resulting from a spatial separation of related content; Chandler & Sweller, 1992) or different instruction approaches (Sweller & Levine, 1982). This task-dependent nature of cognitive load has led to more differentiated cognitive load measurement methods. 
As an example, Andersen and Makransky (2020) developed a cognitive load survey specifically for virtual reality learning environments that divides extraneous load into three sub-facets: instruction, interaction, and environment (see also the simulation task load index developed by Harris et al., 2020).
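
The reformulation proposed by Hollender et al. (2010) can be sketched as a simple decomposition. The subscripts below are labels chosen here for illustration and are not notation from the original paper.

```latex
\[
\text{ECL} = \text{ECL}_{\text{instructional design}} + \text{ECL}_{\text{software usability}}
\]
```

Under this reading, two learning applications with identical instructional design can still differ in extraneous load if one of them has a more demanding interface.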

There were also attempts to frame the different load types as specific processes on the neural level based on research from neuro-imaging studies (Whelan, 2007). Whelan summarizes studies that vary in task difficulty as a way to investigate the neural correlates of intrinsic load. He concludes that intrinsic load can be regarded as “maintenance and manipulation demands placed on the prefrontal cortex” (Whelan, 2007, p. 7). Extraneous load, on the other hand, is described as a “disruption in the activation of the sensory modality-specific mechanisms that underlie attentional modulation” (Whelan, 2007, p. 7). Whelan’s (2007) conceptualization links extraneous load to perceptual demands and views the effects of intrinsic load as a burden on attention and working memory. Furthermore, Whelan summarizes that germane load is often linked to motivational processes in the literature. He reviews a neuro-imaging study by Taylor and colleagues (2004) investigating the role of rewards on motivation and working memory. The overlap in brain activity found in the study supports the assumption of a connection between motivation and working memory (Taylor et al., 2004). In sum, transferred to the field of digital learning, Whelan’s (2007) reconceptualization of CLT can be interpreted as being in line with the conceptualization that extraneous load should be regarded as stemming from perceptual obstacles induced by learning materials, while germane load may be thought of as the extent to which learners are motivated to utilize their working memory resources.

A number of CLT researchers also explicitly refer to the notions of costs and benefits when discussing cognitive load (e.g., Kalyuga, 2007, 2012; Kalyuga & Singh, 2016). Such perspectives assume that the effects of different design factors can be framed as cognitive costs and benefits that need to be considered in context (e.g., Skulmowski et al., 2016). We summarize the core of this approach as considering costs (more or less synonymous with extraneous load) as linked to benefits (that would usually be referred to as germane processing). As we will discuss in the remainder of this paper, digital learning often involves a trade-off between risks of exposing learners to extraneous load and the benefit of letting them profit from novel and engaging experiences (for instance, when using virtual reality, e.g., Frederiksen et al., 2020; Makransky et al., 2019). The right panel in Fig. 1 shows this model that we will refer to as the cost-benefit model. In this fictional example, the factor of immersion may be one of the two extraneous load components shown in the cost-benefit model in Fig. 1. The extraneous load of immersion may be linked to one of the two germane load components shown in Fig. 1 that would actually benefit specific learning processes (Makransky et al., 2019). Furthermore, it should be noted that the cost-benefit model depicted in Fig. 1 integrates the idea of different types of extraneous load discussed earlier.

As can be seen from this overview of relevant literature, there has been a considerable effort to capture the nature of the different components of cognitive load, in particular extraneous load. Newer research highlights that extraneous load can be divided into different components that might require different measurement strategies (Hollender et al., 2010; Skulmowski & Rey, 2020a). Additionally, some authors emphasize a cost-benefit approach that associates different design factors with cognitive costs and benefits (e.g., Skulmowski et al., 2016). In a cost-benefit approach, extraneous load is usually seen as a cost and germane load as a benefit. It should be noted that over the years, germane load has often been associated with motivation, which contributes towards learner engagement and facilitates the “germane processes” (Sweller et al., 2019). Figure 1 summarizes the theoretical developments with the highest relevance for our strategy of cognitive load management in the context of digital learning. While the 1998 CLT model includes three components that make up the total cognitive load that learners need to hold in their limited working memory, the two-component 2019 model omits germane load from the total cognitive load. The cost-benefit model links different design factors to extraneous load types that feature associated forms of germane processing.

Motivation in the Context of Cognitive Load Theory

As we have seen in the preceding section, there are links between CLT and motivation that warrant a closer examination. The research literature already contained suggestions that CLT could benefit from a stronger inclusion of motivational aspects (e.g., Schnotz et al., 2009). Indeed, a paper by Paas et al. (2005) even included a formula that quantifies motivation (presented as a high involvement) as a combination of a higher cognitive load occurring in tandem with a higher learning performance. However, as we will discuss in later sections, specific design factors of online learning can contribute towards inducing a higher interest, but this does not necessarily entail better learning (Pedra et al., 2015). Mayer (2014) offers a pragmatic outlook on integrating motivational factors into CLT. He emphasizes that some components of learning environments that may be perceived to add unnecessary information or that increase difficulty may, in fact, enhance generative processing (i.e., germane cognitive load) by increasing motivation. Thus, Mayer concludes that such design choices may be desirable under the condition that learners are not constantly being overloaded as previously assumed based on theoretical conjecture.
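
As a sketch, the involvement measure attributed to Paas et al. (2005) is usually reported in the following form, where $z_R$ and $z_P$ denote standardized (z-scored) ratings of mental effort and performance scores. The symbols are labels used here, and the exact notation should be verified against the original paper.

```latex
\[
I = \frac{z_R + z_P}{\sqrt{2}}
\]
```

High effort combined with high performance yields a high involvement score, which matches the description of motivation as higher cognitive load occurring in tandem with higher learning performance.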

Indeed, digital learning media are often aimed at offering motivating interfaces in order to foster learners’ engagement; thus, motivation is an inherent and important factor in the efficacy of digital learning environments. Current research suggests that the interplay between motivation, cognitive load, and learning performance may be more complex than previously assumed. Xu et al. (2020) conducted a study to assess whether learning performance, as well as the subjective feeling of cognitive load, can be modulated by learners’ motivation, by using a growth mindset intervention (i.e., fostering learners’ belief that they are capable of increasing their mental capacity through practice). The results of this study revealed that the inclusion of a motivation prompt, the growth mindset intervention, lowered levels of perceived cognitive load and fostered higher retention and transfer scores. This study supports the role of motivation in learning and shows a complex relationship between motivation and perceived cognitive load. Even though this study was not focused on digital learning, the results could imply that enhanced motivation might counteract to some extent the cognitive burden imposed by the learning task and environment. In other words, the impression of cognitive load can be greatly affected by motivational interventions and could thus be somewhat independent of the actual difficulty of a learning task. In turn, the influence of cognitive load on motivation should also be considered. It has been argued that cognitive load can act as a form of “motivational cost” that may impact learners’ willingness to engage with a learning task (Feldon et al., 2019). Below, incorporating the motivational and cost-benefit perspectives, we review five research areas that utilized design factors challenging existing CLT conceptualizations specifically related to extraneous cognitive load.

Challenges for Cognitive Load Theory

As we have seen in the preceding sections, CLT has come a long way since it was first developed in the 1980s. However, despite the progress in conceptualizing the cognitive load types and their interdependence, recent research results from the field of digital learning challenge some of the assumptions of CLT. Thus, we wish to highlight a number of these results that can be used to further advance CLT. Research on multimedia learning has confirmed several design principles that aim at reducing extraneous load in order to maximize learners’ available cognitive capacity for dealing with the intrinsic load of learning materials. Among these effects is the split-attention effect, which states that related information should be placed in adjacent regions (Chandler & Sweller, 1992). Another influential effect describes that pictures containing distracting or irrelevant information should be avoided in order to steer clear of seductive details (Harp & Mayer, 1998). Furthermore, the use of animations has been discouraged from a CLT perspective, as static information was found to be easier to process than the same information presented consecutively (see Höffler, 2010, for an overview of this controversy). Although most of the aforementioned design principles have been found to be reliable across a wide range of studies, recent results have revealed exceptions in which the simple heuristic of reducing potentially distracting or irrelevant information did not lead to better learning performance, but rather had the opposite effect (e.g., Bateman et al., 2010; Eitel et al., 2014, Exp. 1; Seufert et al., 2017; Skulmowski & Rey, 2020b). Importantly, most of these exceptions arise through perceptually rich learning media in the context of online learning. In the following sections, we summarize five current challenges for CLT.

Challenge 1: Interactive Learning Media

Research on interactive learning media has resulted in several findings that appear difficult to reconcile at first glance. In this paper, we distinguish between different levels of interactivity. A low degree of user-environment interactivity offers only basic and intuitively understandable controls such as the play and stop button of video players (in line with Song et al., 2014). The highest degrees of interactivity can be found in entirely virtual environments that react and adapt to learners’ actions (in line with Johnson-Glenberg et al., 2014). Between these two extremes, there are several examples of a medium level of interactivity, such as controllable sliders (e.g., Homer & Plass, 2014) or movable elements in simulations (e.g., Kalet et al., 2012). It is generally assumed that a high level of interactivity can easily drain cognitive resources by being too distracting (e.g., Kalet et al., 2012; Song et al., 2014; for an overview, Skulmowski & Rey, 2018a). However, there is considerable disagreement regarding which level of interactivity actually benefits learners. On the one hand, some studies in which only minor forms of interactivity were used resulted in higher levels of extraneous load, a lowered learning performance, or both. These lower levels of interactivity include clicking to switch between two visualizations (Skulmowski & Rey, 2020a) and controlling movable elements in a simulation (Song et al., 2014). On the other hand, some authors argue that there is a compromise which lies in a medium degree of interactivity that allows learners to use rather simple interactions such as clicking on elements presented on their screen, but without more complex patterns such as mouse dragging interactions (Kalet et al., 2012). 
It has been suggested that (inter-)activity will be most helpful when it is deeply integrated into a learning task, for example, if the activities that need to be performed during learning are essential for understanding or remembering the learning contents (Skulmowski & Rey, 2018a). Such learning activities are often based on the ideas of embodied cognition (Wilson, 2002) or embodied learning (Skulmowski & Rey, 2018a), which emphasize the role of bodily movement in cognitive processes and learning, respectively. Moreover, it has been hypothesized that interactivity can induce interest in a learning topic, which in turn should lead to higher learning scores (Pedra et al., 2015). Studies have shown that high interactivity in a learning task can indeed increase interest, but also that this does not necessarily ensure a better learning performance than learning with a less interactive animation (Pedra et al., 2015). Furthermore, a literature review has revealed that moderate degrees of (inter-)activity tend to be the most beneficial for learning (Skulmowski & Rey, 2018a).

Online learning offers various opportunities for interactions with devices or interfaces. In some cases, simple movements, such as finger pointing or tracing on tablet screens (Agostinho et al., 2015), have been found to enhance learning compared with situations of passive learning, even when there are no changes in the displayed information. In order to explain these results, CLT theorists increasingly integrate the evolutionary distinction between biologically primary and secondary knowledge (e.g., Sweller et al., 2019). While biologically primary knowledge (such as recognizing faces) does not need to be explicitly learned, biologically secondary knowledge (such as foreign languages or knowledge from school subjects) requires effort and instruction to learn (Geary, 2008, as summarized by Sweller et al., 2019). Agostinho et al. (2015) argue that finger tracing on a tablet can be regarded as biologically primary and may help in acquiring biologically secondary knowledge, such as mathematical relations. Judging from their positive results, using simple interactions that rely on knowledge already available (e.g., tracing with fingers) may be a good guideline for introducing interactivity into a learning task. The learning gains achieved through finger tracing were recently replicated (Ginns et al., 2020). When considering how such implementations of embodied learning act on cognitive processing, it has been argued that the use of biologically primary knowledge in task design places only minor demands on learners’ working memory capacity, thereby keeping cognitive resources available for learning (Agostinho et al., 2015, citing Paas & Sweller, 2012).

One form of online learning that is usually thought to have potential for generating motivating and enjoyable learning experiences is the use of gamification (or “serious games”). Transforming learning into educational games has been a major trend in online learning research for several years (for a meta-analysis, see Girard et al., 2013). Research on serious games has revealed that enjoyment and motivation are correlated, but having fun during learning does not necessarily contribute towards an actual learning advantage (Iten & Petko, 2016). However, one study found that having a concrete learning task while playing a serious game leads to worse transfer performance and a higher subjective extraneous load than playing just to have fun (Hawlitschek & Joeckel, 2017). When comparing a serious game to a more traditional interactive simulation, Imlig-Iten and Petko (2018) found no significant differences regarding learning performance, interest, and enjoyment. Interestingly, their study found that learning with a serious game can lead to a stronger subjective impression of deep thinking compared with using a simulation. As can be seen from this short summary of selected studies on gamification, the relationships between enjoyment, motivation, and learning are not as straightforward as one might assume. The fact that there is a temporal component to the emotional states involved in learning with serious games (Cheng et al., 2020; see Knogler et al., 2015, for a more general study on the temporal dynamics of interest) further complicates conclusions regarding the use of instructional games. More generally speaking, interactive forms of learning have long been assumed to foster learner motivation and engagement (e.g., Domagk et al., 2010) and recent developments such as audience response systems have confirmed these potentials (e.g., Blasco-Arcas et al., 2013).

Another important aspect to consider is that some studies revealed that learning gains achieved through interactive learning using nonstandard input devices such as graphics tablets can best be recorded using suitable assessment methods. In one study, different levels of interactivity were compared by giving learners the opportunity to learn physics using a motion-tracking device that captured their movements and thus enabled them to interact with virtual atoms (Johnson-Glenberg & Megowan-Romanowicz, 2017). When learners had used these full-body controls instead of low-interactivity keyboard controls during the learning phase of this study, their test performance was higher if they used a highly interactive test method involving a graphics tablet instead of a mouse. This result highlights that the benefits of learning with higher levels of interactivity can be best detected using similarly interactive test methods (Johnson-Glenberg & Megowan-Romanowicz, 2017).

In summary, research on interactive learning media shows us that in some cases, the most simplified and straightforward way of presenting information may not be the optimal method. Interactivity can be used to create more engaging and motivating learning experiences that have the potential drawback of inducing unnecessary cognitive load resulting from the requirement to first learn certain interaction patterns of a simulation or becoming familiar with the rules of a serious game. However, low and middle levels of interactivity have been found to promote learning and motivation while avoiding a cognitive overload. Furthermore, the benefits of interactivity may require a similarly interactive assessment method in order to have a detectable advantage over traditional modes of learning. This nuanced pattern of results may currently be challenging for CLT to explain, as it is unclear why additional extraneous load can foster learning. However, the cost-benefit approach presented earlier offers a first step towards a more comprehensive understanding of this issue. If a form of learning includes some extraneous load as a cognitive cost, the benefit of an increased level of germane processing may exceed the cost and will therefore promote learning.

Challenge 2: Immersion

The unique selling point of virtual reality technology lies in users’ impression of being transported into a virtual world. This feeling of being moved into a believable digital surrounding that makes us forget our real environment is called immersion (Dede, 2009). Some researchers heavily emphasize the beneficial role of immersion in learning (e.g., Johnson-Glenberg et al., 2014). However, Frederiksen and colleagues (2020) recently published results that challenge the idea that immersion is always a benefit. In their study in the context of surgical training, immersion and virtual reality training methods could be identified as contributors to excessive extraneous load. Highly immersive virtual reality training in a simulator led to a decrease in various learning process variables compared with less immersive learning settings (Frederiksen et al., 2020). Another recent study sheds light on the possible mechanism behind such results. Makransky et al. (2019) found that immersive virtual reality did not significantly affect retention performance compared to written instruction, but increased transfer scores, enjoyment, and motivation. The assumption that transfer scores are more consistently (or more strongly) positively affected by immersion when compared with other assessment variables is supported by further studies (e.g., Baceviciute et al., 2021; Klingenberg et al., 2020). These results underline that the benefits of immersion are task-dependent. While immersion might not have a positive effect on the mere retention of information (Frederiksen et al., 2020), an immersive virtual reality setup offers more opportunities to deeply process behavioral routines that were particularly important for the tests in the study by Makransky et al. (2019). This result pattern is a challenge for CLT as it currently cannot explain why an instructional method associated with an increase in cognitive load can at the same time boost transfer performance as well as motivation.

Importantly, Makransky and Lilleholt (2018) found two paths over which immersion can act on learners. One path consists of an influence of immersion on positive emotions, while in a second path, immersion increased the cognitive value of tasks. Given the increasing availability of immersive learning media, emotional and motivational aspects related to immersion should receive more consideration within the framework of CLT.

In conclusion, immersion can greatly enhance enjoyment, motivation, and specific learning processes such as transfer while it may leave other outcomes unaffected. Importantly, immersion is linked to a certain degree of extraneous load, but at the same time, this small cognitive cost can lead to substantial benefits, again delivering support for a cost-benefit approach.

Challenge 3: Disfluency

As we have already discussed, perceptual demands of learning materials can be considered a major contributor towards extraneous load. A particularly counter-intuitive field of online learning research is concerned with the effects of disfluency. This term describes the phenomenon of an increased learning performance being triggered through learning material that makes it harder to learn. Examples for the induction of disfluency include the use of less readable fonts (Diemand-Yauman et al., 2011) or a degraded image quality mimicking the output of older photocopiers (Eitel et al., 2014). Usually, the rationale behind disfluency research is an attempt to bring learners to strongly concentrate on the learning materials by requiring them to carefully parse the degraded materials (Eitel et al., 2014) or by presenting them in an unusual way that otherwise requires more effort to process (Diemand-Yauman et al., 2011). As a result of being compelled to exert more effort, it is believed that learners automatically activate more mental resources to process the information (Diemand-Yauman et al., 2011; Eitel et al., 2014). This, in turn, is supposed to lead to an incidental increase in learning by generating a learning activity that prevents superficial processing (Diemand-Yauman et al., 2011). Importantly, we can assume that a slightly less readable font or the look of a degraded image will likely only introduce a slight increase in extraneous load, thereby adhering to the idea of investing a small amount of extraneous load and thus not filling up a significant part of learners’ working memory. From a standpoint of cognitive processing, it should be noted that most of the disfluency manipulations we discuss focus on perceptual aspects of the design and presentation. This is underlined by Eitel et al.’s (2014) decision to include a visual working memory capacity test in one of their studies. 
Hence, disfluency can be regarded as an example of how extraneous load can be introduced at the perceptual level.

Although several results in the field of disfluency research have been called into question (e.g., Xie et al., 2018), there are a number of results using more subtle disfluent cues that appear to more consistently lead to an increased learning performance (or, at least, not to significant disadvantages). For instance, Seufert et al. (2017) found that out of three levels of varying text legibility, a low level of illegibility (and thus, of disfluency) descriptively resulted in a higher learning performance when compared to the legible (fluent) control condition. Based on this result, a second study was conducted and revealed that two mildly disfluent conditions descriptively had an advantage in terms of learning over a fluent control group (Seufert et al., 2017). The only significant difference concerning recall performance occurred between the “medium” disfluency group and the “high” disfluency group in favor of a medium level of illegibility. This pattern of results vaguely resembles the previously discussed result that a medium level of interactivity can lead to the highest learning results (Kalet et al., 2012). Although more research on this topic is needed, we can conclude that less pronounced forms of disfluency do not harm learning and have in some instances even resulted in a higher learning performance. While it still is unclear why this result pattern is so unstable (Xie et al., 2018), these results cannot be adequately explained by a standard CLT perspective (see Seufert et al., 2017). As we have already seen in the previous challenges, a cost-benefit model may be helpful to get an understanding of the mechanism. In terms of a cost-benefit analysis, a low to medium level of disfluency induced by operationalizations such as a slight illegibility may have a low cognitive cost (in terms of working memory capacity), but potentially offers the aforementioned benefit of triggering learners to invest more effort.

In summary, disfluency may be a rather unstable effect, but still exemplifies the basic idea of all challenges discussed in this paper: Getting learners to deeply engage with learning materials may increase learning, even if the perceptual demands (and therefore, the extraneous load) of disfluent learning materials are higher than those of “fluent” materials.

Challenge 4: Realism and Detailed Visualizations

A topic that has recently gained attention due to the increasing use of virtual learning environments is realism. For a long time, realistic details in visualizations were often considered a source of extraneous load (Brucker et al., 2014; Scheiter et al., 2009). As realistic details can make the visual “disassembly” of visualizations more challenging (e.g., Berney et al., 2015; Skulmowski & Rey, 2018b), at least for learners with low spatial abilities (e.g., Huk, 2006), the standard CLT approach would be to avoid realism in favor of more simplified visualizations (Renkl & Scheiter, 2017). Recent results paint a slightly more complex picture. For instance, Skulmowski and Rey (2020b) found that when realistic and schematic visualizations are presented alongside each other in a learning task, the retention performance for the realistic parts of the learning material is higher. At the same time, one combination which contained a particularly large amount of realistic details received higher extraneous load ratings (Skulmowski & Rey, 2020b). As the association between a higher learning outcome and increased cognitive load contradicts CLT, the authors refer to this result as the “realism paradox” (Skulmowski & Rey, 2020b). Results such as these have been framed as examples of the disfluency effect (Skulmowski & Rey, 2018b, 2020b) described in detail in the preceding section. While the explanation of positive effects of realism despite an increase in extraneous load could potentially be due to a low level of disfluency that triggers more effortful processing, other explanations that assume a higher interest being generated through realistic visualizations could also be true (e.g., Goldstone & Son, 2005). 
Another study that found a specific advantage of realistic details over schematic visualizations during testing concerning performance on a retention test revealed that the concreteness of learning and testing materials should be matched in order to foster learning (Skulmowski & Rey, 2021). As in the design of assessments for interactive learning media described in an earlier section, this result further underlines that the specific design of a test can greatly affect performance. In line with this assumption, the benefits of realism were found to be best detectable with a test that is aimed at the advantages of realistic details, for example, visual retention tests (Skulmowski & Rey, 2021). It has been argued that the usefulness of realistic details in digital learning is tied to the objective of a test (e.g., Nebel et al., 2020), thereby deciding whether these details are relevant or irrelevant for the task.

Judging from the literature reviewed in this section, we can conclude that a higher amount of detail in visualizations cannot be seen as a linear contributor to cognitive load and a lowered learning performance. In fact, some of the results indicate that more details can, under very specific conditions, be helpful. As a result, realism and details should be included in a cost-benefit analysis that also considers the task. If a test requires a very fine-grained knowledge of visual details, having learned using a realistic visualization will be a benefit despite the potentially higher extraneous load during the learning phase.

Challenge 5: Redundant Elements and Emotional Design

Various studies on learning have revealed that some elements that are considered redundant or merely decorative can actually enhance learning, despite the fact that these types of additions could in most instances be categorized as extraneous forms of cognitive load. One relevant case concerns the so-called redundancy effect, which states that extraneous load can be decreased by avoiding redundant information such as repetitions in texts (Mayer et al., 2001). Interestingly, Mayer and Johnson (2008) found that when a narration is accompanied by a few keywords of that narrated text being presented on-screen, this very subtle form of redundancy can increase learning compared with learning without the presentation of the keywords. However, they explain this exception to the redundancy effect by assuming that this particular type of redundancy is too minor to cause extraneous load. In a related study, Yue et al. (2013) found that a shortened text presented together with an audio narration leads to better recall performance compared with the identical text being presented on-screen and as an audio recording. An on-screen text featuring minor changes from the narrated content resulted in better recall on a descriptive level compared to a match between the two presentations, but this result did not reach significance. We favor the interpretation that both results stem from a small level of extraneous load that triggers a stronger attentional focus during learning. It should be considered that in these two studies, recall performance was more strongly affected than transfer performance, underlining the task-specific effects of this design factor.

A major line of research in digital learning that has contributed towards a shift from a largely cognitive perspective on learning towards an approach focused on affective aspects is called emotional design (for an overview, see Plass & Kalyuga, 2019). The main idea behind emotional design is to embellish learning tasks with elements aimed at inducing positive emotions, and thereby increasing learners’ cognitive engagement (Brom et al., 2018). As a result, the use of emotional design usually involves the addition of enjoyable design elements such as pleasing, warm colors (e.g., Mayer & Estrella, 2014; Plass et al., 2014) and human, baby-like faces (often as a means to anthropomorphize non-human entities such as cells, e.g., Mayer & Estrella, 2014; Plass et al., 2014). A meta-analysis confirmed that such design choices can increase retention, transfer, comprehension, and motivation while at the same time lowering the perceived difficulty of learning materials (Brom et al., 2018). A prominent example is a study in which the learning task was focused on the immune system (Plass et al., 2014, Exp. 2). In this study, emotional design was used to enhance a grayscale schematic visualization of different cells in the human body either through anthropomorphization (achieved through adding simple cartoon facial features to the cells), the inclusion of vivid colors, or a combination of both. Comprehension performance was increased through the use of colors (Plass et al., 2014, Exp. 2). Concerning transfer performance, an interaction effect revealed that the version without colors benefits from anthropomorphization while the colorized version does not (Plass et al., 2014, Exp. 2). It must be noted that emotional design is usually achieved through the inclusion of merely decorative elements that do not transform the intrinsic contents of learning tasks.
While some authors argue that such elements do not necessarily deplete cognitive resources (e.g., Plass & Kalyuga, 2019), others perceive emotional design cues as potential sources of extraneous load and recommend that this load should be kept to a minimum in order to allow learning to occur (Brom et al., 2018). The latter perspective is at odds with CLT, as this would mean that it is possible to sacrifice a small amount of cognitive capacity in order to gain learning performance. Thus, emotional design contradicts CLT in a manner comparable to the disfluency effect described earlier.

In a similar vein as the conflict between emotional design and CLT, there exists a long-standing question of whether decorative elements or “embellishments” help or hinder learning (see, e.g., Carney & Levin, 2002). A CLT perspective would usually argue that unnecessary elements can easily become seductive details (e.g., Harp & Mayer, 1998) that induce extraneous load by distracting learners from the actual learning content. However, there are results that go against this rule. In one such study, Bateman et al. (2010) presented their participants with a minimalist version of different charts and an “embellished” counterpart. For instance, a chart entitled “Monstrous costs” could simply be presented with a simple bar graph indicating rising spending over the years. However, an embellished version used in the study depicts a cartoon monster with a large mouth in which the bars of the bar graph were presented as the teeth of the monster, with labels of the two axes printed on the lips and inside of the mouth. Long-term recall of the more detailed charts turned out to be better than for the simpler versions in that study, with participants judging the embellished charts to be easier to remember and to be enjoyable (Bateman et al., 2010). Again, we should focus on why this particular study found these effects running counter to CLT. Bateman et al. (2010) discuss that the embellishments worked so well because they immediately invoked an emotion that can potentially be memorized alongside the information included in the charts. Furthermore, it must be noted that they found no significant differences for short-term recall performance, but for tests regarding the long-term recall of topics and details (such as the general trend depicted in the illustrations). 
Hence, they discuss that the additional mental demands of the detailed visualizations may be small compared to the benefits accrued by providing learners with an emotional cue that may help to consolidate chart contents during memorization.

The bottom line of this challenge is that there are instances in which redundant or purely decorative elements that may be considered irrelevant for the actual learning task can enhance performance. The mechanism behind these effects is thought to involve a stronger cognitive engagement that can be elicited through positive emotions. As a result, CLT should be expanded to address how a small investment in irrelevant cognitive processing can yield positive rewards in terms of learning.

Summary and Implications of the Challenges

In sum, these challenges underline several issues with CLT as it is usually understood and applied. One of the main conclusions that needs to be drawn is that the concept of extraneous load should be revised to include the wide range of findings aimed at more precisely specifying the true nature of this concept. Extraneous load needs to be understood as task-specific, since not all learning tests assess the same kind of knowledge. Thus, learning realistic details or interaction patterns can be valuable or irrelevant based on the test used. Furthermore, not all increases in extraneous load necessarily induce a lower learning performance.

Current approaches such as embodied cognition highlight the action-oriented nature of cognition (e.g., Wilson, 2002), which is focused on completing specific tasks (Wilson & Golonka, 2013). Learners should therefore be regarded as being interested in attaining information that enables them to accomplish a certain task. While the notion of extraneous load is important for assessing how difficult it may be for learners to acquire the needed information, we argue that it is necessary to emphasize the contextual factors that determine learning success (for a similar approach, see Choi et al., 2014).

For the design of digital learning, these fundamental characteristics of learners and instructional materials have direct implications. As an example, we consider the field of medical education. There are several technology-enhanced methods of training surgeons using continually advancing virtual simulators that have been shown to facilitate learning (e.g., Thomsen et al., 2017). As we have reviewed, several aspects of virtual learning environments, such as immersion, realism, and interactivity, can become a source of extraneous cognitive load. Yet, we know that surgeons need practical training and cannot seamlessly graduate from learning with textbooks to operating on their patients. In this example, extraneous cognitive load arising from potentially overwhelming visual impressions and experiencing an emotionally taxing work environment may all technically be “learning-irrelevant” demands that are still essential parts of the learning task (see Szulewski et al., 2020; see also van Merrienboer & Kirschner, 2018, for an argument for realistic situations in learning).

As an interim summary, it can be noted that CLT offers the compelling notion that there is detrimental cognitive load that can be avoided through good design to put the limited working memory capacity to use for learning the actual learning content. In addition, it has been found that the extraneous load in online learning can have different components. However, the challenges presented above all highlight that there can be design choices in digital learning that induce cognitive load while raising learning performance. As a means to reconcile these results with CLT, a stronger focus on the cost-benefit nature of cognitive load in instructional design may be a remedy. Yet, a cost-benefit approach still does not suffice to explain divergent effects of digital learning settings on different outcome measures (such as retention and transfer) as well as motivational effects that may affect how strongly cognitive load influences learning results. It should be noted that the presented literature points to two mechanisms concerning the interplay between extraneous load and germane processing. The first mechanism can be described using the cost-benefit approach of investing a small amount of (perceptual) extraneous load in order to harvest advantages concerning germane processing. A second mechanism can be compared to the way learners’ expertise can determine whether information is essential or extraneous for a learning task (Kalyuga, 2007; Sweller, 2010). For learners with high expertise, being presented with previously learned information can become detrimental for their learning process, while low-expertise learners still need to learn this information and thus benefit from it (Kalyuga, 2007). Hence, the same information can either be a form of extraneous or intrinsic load depending on learners’ expertise (Sweller, 2010).

As task characteristics such as the nature of the learning test vary, a similar effect can occur in digital learning. For instance, if a learning task induces highly negative emotions that can become an obstacle during learning (e.g., Fraser et al., 2015; Kremer et al., 2019), such a learning task only offers learners an appropriate learning experience if dealing with such emotions is an integral part of the learning task (Szulewski et al., 2020). In the latter case, the negative emotions felt during learning become an intrinsic part of the learning task, while they would constitute an extraneous component otherwise. From the perspective of cognitive processing, it is important to acknowledge that the emotional impact of learning scenarios has been found to be correlated with learners’ subjective assessment of cognitive load (Fraser & McLaughlin, 2019).

Related to this issue is the idea of the information reduction hypothesis (Haider & Frensch, 1996). This hypothesis states that people get better at discriminating between information that is intrinsic or extraneous for a task over time, for instance when some of the information is irrelevant to the task (e.g., Haider & Frensch, 1999). The process of learning to distinguish between useful and irrelevant information in itself can be an important part of a learning task. This is another example in which some degree of extraneous load can end up promoting germane processing and thus becomes intrinsic to the task.

In order to better understand the complex relationship between the extraneous load induced during digital learning, germane processing, and task demands (intrinsic load), we take a look at a theory focused on outcome-based instructional design with an emphasis on the role of motivation in the following section.

From Constructive Alignment Towards Cognitive Load Alignment

From the summarized literature in the five challenges, it follows that the positive effects of certain forms of digital learning can be observed if they are assessed using appropriate methods. For instance, learning with immersive media may lead to an advantage for specific transfer tests, but could have no detectable benefit for retention performance (Makransky et al., 2019). Thus, we argue that this aspect particularly often found in digital learning needs to be better integrated in the CLT model. It needs to be noted that the specificity of instructional goals has been considered in previous discussions of CLT (e.g., de Jong, 2010; Kalyuga & Singh, 2016). In contrast to the focus on the task-appropriate use of extraneous load in this paper, previous reconceptualizations of CLT were mainly concerned with the proper choice of intrinsic load in relation to learners’ expertise and objectives (Schnotz & Kürschner, 2007). However, the substantial perceptual demands of digital learning (due to realism, interaction patterns, or similar features) require a stronger consideration of the relationship between (different types of) extraneous load, germane processing, and assessment methods. Simply put, if we risk inducing a certain level of extraneous load through the inclusion of a digital feature, we need to know which specific form of germane processing can be triggered by it. Consequently, we need to use an appropriate test that will be able to detect the effects of this germane processing.

While the cognitivist perspective of CLT naturally puts the learner and, increasingly, the interaction with their environment (Choi et al., 2014) in focus, the challenges reviewed earlier show that more attention on the targeted learning outcomes and motivation may be necessary for more accurate theory-grounded predictions. To put it another way, CLT can be regarded as an outlook aimed at removing obstacles in learners’ journeys towards meaningful understanding and memorizing. Outcome-oriented educational approaches, however, try to shape this journey by first determining the end of that route, namely by setting specific learning targets. One such approach is constructive alignment (Biggs, 1996). Constructive alignment combines a constructivist perspective on learning with principles from Cohen’s (1987) instructional alignment (Biggs, 1996). It is constructivist in the sense that it emphasizes the role of learners in developing knowledge through activities rather than relying on direct instruction by educators (Biggs, 1999). More relevant for our goals, Biggs (1996) explains that instruction needs to be geared towards specific objectives, but clarifies that these need to be broad enough to give learners an opportunity for deep forms of learning. He particularly criticizes instructional styles that have the primary aim of preparing learners for narrowly defined testing situations at a low level of cognitive processing. Instead, he emphasizes the potential of more engaging activities. The “alignment” part in the approach of constructive alignment stems from the aim to match learning activities with specific outcomes. Biggs (1999) summarizes that different learning activities lead to different ways of learning. For instance, learning in groups is said to lead to elaboration-based and problem-based learning (Biggs, 1999). In turn, different assessment methods are described to tap into knowledge gained from specific forms of learning.
As an example, performance-based assessments such as projects are thought to have a high potential for assessing how well learners can apply their knowledge (Biggs, 1999).

However, the constructivist aspect included in constructive alignment has sparked a debate, in particular when viewed through a CLT lens. Previous, more quantitatively oriented research found that generative activities in line with a constructivist or “learning-by-doing” approach can lead to impeded learning performance due to cognitive load induced by the learning activities. For instance, Stull and Mayer (2007) showed that generating graphic organizers can reduce transfer knowledge compared with the use of pre-made graphic organizers. Similarly, letting learners generate their own drawings rather than being provided with ready-made visualizations has decreased learning performance in a number of studies (e.g., Leopold et al., 2013; Schwamborn et al., 2011), although a recent review by Fiorella and Zhang (2018) still found positive effects of drawing. In line with the principle of alignment between learning and assessment discussed earlier, drawing during learning was found to be helpful for a later drawing test (Schmidgall et al., 2019).

To conclude, constructive alignment teaches us that successful learning rests on the alignment between objectives, cognitively engaging activities, and appropriate assessment methods. These three components need to be more strongly considered in CLT-based digital learning research. After having discussed CLT and the challenges posed by digital learning as well as having presented constructive alignment, we will turn towards a potential synthesis of the two approaches.

Cognitive Load Alignment

In order to facilitate the application of CLT to the instructional design of digital learning settings, in particular perceptually rich scenarios, we present the strategy of cognitive load alignment. As the basis for this approach, the cost-benefit model with differentiated extraneous load types presented in Fig. 1 is used. Cognitive load alignment emphasizes the connection between different design factors (and their associated cognitive load), the particular form of germane processing they promote, and the type of assessment they require to have a measurable impact (see Fig. 2). The model thus merges the notion of different forms of learning (reformulated as germane processing) and their relationship to specific types of assessment presented by Biggs (1999) with CLT.

Fig. 2

A fictional example analysis of cognitive load alignment in a digital learning task involving the design factor of interactive responses. In order to actually promote the desired learning outcome, the different types of cognitive load must be in alignment with the germane processing needed for a learning task (similar to the notion of forms of learning of Biggs, 1999, p. 68) and need an appropriate method of assessment (Biggs, 1999, p. 70) in order to have a detectable benefit. In the example of interactivity, a motion-based assessment type can be considered a well-aligned test type, since interactive controls may trigger procedural learning as a form of germane processing (Johnson-Glenberg & Megowan-Romanowicz, 2017). However, if the learning test is text-based, there may be a mismatch between the interactive design and the assessment method. As interactivity has been shown to shift learners’ focus away from verbal content (e.g., Song et al., 2014), the employment of a text-based test would not tap into the germane processing of procedural learning induced by interactivity. Therefore, for this type of task, interactivity would constitute a form of ECL. Learner characteristics may be an additional factor that determines whether germane processing is induced.

Let us consider the example from Fig. 2 in more detail. Figure 2 depicts how learning with interactive learning media can be modeled using cognitive load alignment. The model contains a total cognitive load capacity that can be filled by intrinsic and extraneous load. As an example of design factors associated with extraneous load, we focus on the effects of interactivity. In line with the cost-benefit approach, interactivity can have cognitive benefits and costs. When analyzing or planning digital learning settings, cognitive load alignment can be realized by determining which germane processing activity (i.e., form of learning) these design choices promote and which assessment method is most likely to detect the benefits. In the example of interactivity, procedural forms of learning may be triggered that can best be assessed using a motion-based assessment method (based on Johnson-Glenberg & Megowan-Romanowicz, 2017). In this case, the cognitive load required during learning is aligned with the desired learning outcome. If, for instance, a verbal test were to be administered as the assessment instead of the motion-based test, there would be a cognitive load misalignment. A written test could not properly measure the benefits of interactivity, but would rather only emphasize the costs (see, e.g., Song et al., 2014). This alignment between cognitive load, germane processing, and the assessment method can be considered the deciding factor in whether cognitive load becomes intrinsic or extraneous for a task (see the discussion based on the expertise reversal effect in a previous section). If there is a misalignment between the design, cognitive processing, and assessment, “germane” processing may still occur. However, if this misaligned germane processing is not conducive to the learning test, it may remain unnoticed and thus become extraneous to the task.

On a related note, some results suggest that digital learning requires a good matching of expertise in the form of prior content domain knowledge, as has been shown in a study on a game-based simulation for medical training (Dankbaar et al., 2016). In that study, medical students did not learn more with a game-based simulation than with other methods such as case-based learning. Interestingly, this game-based approach did increase the learning performance of medical residents (Dankbaar et al., 2014), underlining the importance of an expertise-appropriate design of digital learning that delivers information at an appropriate level of knowledge.
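
The alignment logic of the Fig. 2 example can be illustrated with a minimal Python sketch. It is a toy illustration under stated assumptions, not a validated model: the names `DesignFactor`, `DETECTS`, and `analyze`, the arbitrary load units, the numeric values, and the simple rule "net effect = detected benefit minus extraneous cost" are all hypothetical and are not taken from the studies cited above.

```python
from dataclasses import dataclass

# Hypothetical mapping from an assessment type to the form of germane
# processing it is assumed to detect (illustrative only).
DETECTS = {
    "motion-based": "procedural",
    "text-based": "verbal",
}


@dataclass
class DesignFactor:
    name: str
    extraneous_cost: float    # load added by the design factor (arbitrary units)
    germane_form: str         # form of learning the factor may trigger
    potential_benefit: float  # benefit if the assessment can detect it


def analyze(factor, intrinsic_load, capacity, assessment):
    """Check capacity and alignment for one design factor and one test type."""
    total_load = intrinsic_load + factor.extraneous_cost
    overloaded = total_load > capacity
    # Aligned if the test taps into the form of learning the factor promotes.
    aligned = DETECTS.get(assessment) == factor.germane_form
    # Assumption: a benefit is only detected if aligned and within capacity.
    detected = factor.potential_benefit if aligned and not overloaded else 0.0
    return {
        "total_load": total_load,
        "overloaded": overloaded,
        "aligned": aligned,
        "net_effect": detected - factor.extraneous_cost,
    }


# Hypothetical values for the interactivity example.
interactivity = DesignFactor("interactivity", 1.0, "procedural", 3.0)
print(analyze(interactivity, 5.0, 10.0, "motion-based"))
print(analyze(interactivity, 5.0, 10.0, "text-based"))
```

In this sketch, the same design factor yields a positive net effect with the motion-based test and only its cost with the text-based test, mirroring the misalignment case described above.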

Based on the research results summarized above, we hypothesize that in many instances, learning can be enhanced by the introduction of elements that may incur the cost of a slightly raised level of extraneous load, but whose benefits can be substantial. This hypothesis should not be misunderstood as an excuse to recklessly induce cognitive load through poor design choices in the misguided hope of raising learning performance. Rather, we argue that there are instances in which one may deviate from the CLT heuristic of extraneous load reduction in the case that specific design features entail cognitive, motivational, or other benefits despite (potentially) causing a low degree of additional extraneous cognitive load. In particular when using forms of digital learning such as virtual reality environments, it is nearly impossible to avoid extraneous load due to more demanding input devices or controls, the design of visualizations, immersion, or other design factors. It should be noted that this additional extraneous load involved in digital learning often stems from perceptual demands in line with Whelan’s (2007) definition of extraneous load. Instead of aiming at the avoidance of extraneous load, we argue that it is generally more promising to utilize the possibilities of online learning by maximizing the potential for motivation or other (cognitive) benefits. This strategy, however, only works if these benefits are actually relevant for the learning objectives. The second cognitive route that instantiates extraneous load is the misalignment between objectives and the design of the learning task. As discussed before, this misalignment can lead to a potentially useful learning experience involving the creation of schemas in long-term memory going unnoticed, thus transforming potentially useful cognitive processing activities into a waste of time and resources.

The benefits of digital education generally will only be noticeable when they are assessed using appropriate methods. Based on the notion of constructive alignment, the employment of design factors associated with extraneous load needs to have an evidence-based link to a specific improvement in one or more facets of learning. The assessment method must be chosen to appropriately capture this advantage. The choice of this method can be guided by considering which forms of germane processing are triggered or enabled by the given design factor and whether the type of assessment is dependent upon these particular learning processes. The major difference from standard CLT models is a shift in focus from reducing extraneous load towards creating engaging learning experiences. At the same time, the aim is to keep the extraneous load that is not connected to germane processing to a minimum. This unavoidable cognitive load needs to be aligned with the learning objectives in order to ensure optimal learning outcomes.

On a more general note, we do not wish to have our discussion of constructive alignment misunderstood as signifying that we want to import all of the assumptions of that theory into CLT. Rather, we merely want to encourage CLT researchers to gain an additional perspective that focuses not only on aspects of cognitive processing but also on the context of the learning task—an important trend in theoretical advances in CLT (e.g., Choi et al., 2014).

Future Perspectives

While the literature reviewed in this paper overwhelmingly supports the strategy of cognitive load alignment, new experiments explicitly devised to test this approach are needed, particularly in (large-scale) real-world learning settings. Yet, studies of a smaller scale could provide some initial data for a more precise computational model and validation of the approach. In response to a recent emergence of interest concerning cognitive modeling in the CLT community (e.g., Wirzberger et al., 2020), this model may present a fruitful starting point for computational investigations into the design of digital learning settings.

Related to the issue of cognitive modeling, empirical studies need to provide more information concerning the temporal development of the load induction of the different design factors. Temporal changes in cognitive load and methods to appropriately capture them have been discussed for several years (e.g., Kalyuga & Singh, 2016; Paas et al., 2003). Paas et al. (2003) use the distinctions of cognitive load concepts introduced by Xie and Salvendy (2000). They describe how, over time, the current cognitive load (referred to as instantaneous load) during a task can change and may be used to compute an average cognitive load across a particular task (Xie & Salvendy, 2000). It will be particularly useful to compare the temporal dynamics of these load measures in the context of digital learning.
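
The relation between instantaneous and average load can be made concrete with a short Python sketch. It assumes, for illustration only, that instantaneous load is sampled at equal intervals and that average load is the arithmetic mean of these samples; the function name `average_load` and the rating values are hypothetical and not drawn from any study.

```python
def average_load(samples):
    """Arithmetic mean of instantaneous load samples taken at equal intervals."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(samples) / len(samples)


# Hypothetical load ratings collected at four equally spaced points in a task.
decreasing = [8, 6, 4, 2]  # load starts high and declines over time
constant = [5, 5, 5, 5]    # load stays at the same level throughout

for label, samples in [("decreasing", decreasing), ("constant", constant)]:
    change = samples[-1] - samples[0]  # difference between last and first sample
    print(label, "average:", average_load(samples), "change:", change)
```

Both hypothetical trajectories have the same average load although their temporal course differs, which is one reason why a single averaged measure may hide the dynamics of interest.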

De Jong (2010) summarizes that many CLT-based studies use very short learning phases of only a few minutes. In addition, he notes that there often is no real incentive for study participants to learn the materials given the artificial situation of a research study. Considering these limitations, it may be more suitable to consider such studies as evaluations of an initial cognitive load. The temporal development of cognitive load during digital learning is an important aspect that should be investigated in relation to potential increases in germane processing. While it is plausible that some forms of perceptually rich learning can induce a higher initial cognitive load (for instance, during the time interactive controls need to be learned) that decreases over time, it is also possible that an initially high engagement can wear off (for a recent analysis concerning the dynamics of learning processes, see Tetzlaff et al., 2020).

Another potential example of the temporal dynamics involved in learning would be listening to background music during learning. The cognitive effects of background music have been extensively investigated (e.g., Doyle & Furnham, 2012; Moreno & Mayer, 2000), with a meta-analysis by Kämpfe et al. (2011) indicating that listening to music is disadvantageous for memory-related variables. Negative effects of background music have been explained by the idea that music can act as a form of extraneous load (Moreno & Mayer, 2000). However, it may be the case that the investment of a low degree of extraneous load induced by background music could enhance learners’ willingness to invest more time into a learning task, thereby potentially leading to a positive impact of music over the long term. In addition, learners’ intrinsic motivation might be increased by letting them exercise an autonomous decision regarding the choice of their background music (see Ryan & Deci, 2020, for an overview of the effects of autonomy on motivation). However, this hypothesis regarding the effect of a small investment of extraneous load on more long-term behavior patterns will need to be investigated empirically.

The temporal dynamics of cognitive load further complicate the choice of design factors, but empirical investigations might result in more concrete guidelines concerning the temporal aspects of digital learning. Kalyuga and Singh (2016) suggest considering cognitive load at smaller intervals, and Kalyuga and Plass (2017) present a method in which participants’ verbal reports during learning tasks are analyzed concerning expressions of their cognitive load. Using more fine-grained temporal analyses of cognitive load and germane processing, it may become easier to predict the development of these two variables.

As digital learning research is usually concerned with the effects of extraneous processing, this has been the focus of this paper. However, future research should be conducted with the aim of gaining more insight into the relationship between intrinsic load and germane processing, in particular whether combinations of different levels of intrinsic load can affect how strongly learners are influenced by extraneous load.


As we have seen in this review, digital learning presents new challenges for CLT. Interactive learning media, immersion, realism, disfluency, and emotional design all share the feature that learners are required to invest a small amount of extraneous load in order to allow certain forms of germane processing to occur. This cost-benefit approach also requires a strong focus on choosing appropriate assessment methods. Without an alignment between the extraneous load associated with some design factors, their particular form of germane processing, and appropriate test types, the benefits of digital learning may not be detected and only the cognitive costs become visible. However, the five design factors reviewed in this paper have been shown to improve learning outcomes if their use is tied to the goal of a specific form of germane processing which can then be assessed with an appropriate test. If these conditions are met, digital learning can be used to foster learning despite minor increases in extraneous load. Further research is needed to model these relationships in a more precise and perhaps even quantifiable manner. For instance, it would benefit the field to develop more precise guidelines concerning how much extraneous load is acceptable in order for specific forms of germane processing to have a net benefit.