Introduction to the Cognitive Theory of Multimedia Learning

The cognitive theory of multimedia learning (CTML) represents my continuing and evolving attempt to understand how meaningful learning works. Meaningful learning occurs when the learner engages in appropriate cognitive processing during learning, including attending to the relevant information in a lesson (i.e., selecting), mentally organizing the incoming information into a coherent cognitive structure (i.e., organizing), and connecting it with relevant knowledge activated from long-term memory (i.e., integrating; Mayer, 2021, 2022). Meaningful learning is indicated by performance on transfer tests, which involves being able to use the learned material in new situations. CTML focuses on how people learn meaningfully from academic material containing words and graphics. It focuses on techniques that prime appropriate cognitive processing during learning.

The development of new theories such as CTML is shaped by existing theories. For example, Camp et al. (2022) point out that CTML builds on classic conceptual frameworks from cognitive psychology, including dual-coding theory (Paivio, 1986), the multi-stage model of memory (Baddeley, 1986), and cognitive load theory (Sweller, 1999, in press; Sweller et al., 2011). In particular, Camp et al. (2022, p. 17) note:

The power of theories is that they have the potential to remain valid and relevant through generations, while the power of new theories is that they build upon and expand those old theories. You might say that the old theories are the giants upon whose shoulders new theories, and thus new giants, stand. The cognitive theory of multimedia learning (CTML) ...can be regarded as one such new giant.

The creation of CTML is clearly a team effort within the field of educational psychology. My modest goal in this essay is to share some introductory insights gleaned from my efforts at theory-building over the past decades, describe the evolution of CTML, summarize the current state of CTML, and speculate on possible future directions for CTML. First, in line with Greene’s (2022) call to carefully examine the process of theory development in educational psychology, I share seven insights about theory building that are exemplified in my decades-long attempts to explain how people learn from multimedia lessons.

Insight 1: Theory Building Depends on Intellectual Curiosity

CTML is a manifestation of my lifelong quest to understand how meaningful learning works. I began my journey on one of those amazing, crisp autumn days in Ann Arbor in 1969, as I started my graduate career in psychology at the University of Michigan. Early in my first year of graduate school, under the mentorship of James Greeno, I developed a deep curiosity about how meaningful learning works. How can we teach so people can take what they have learned and use it productively in new situations? This seemingly simple question about teaching for transfer has been a driving force in my lab across more than four decades. Specifically, CTML focuses on some fundamental questions about learning and instruction with multimedia materials: How do people learn academic material consisting of words and graphics (i.e., multimedia learning)? How can we help people learn academic material consisting of words and graphics (i.e., design of multimedia instruction)? My curiosity about these kinds of questions is the engine that drives the development of CTML.

How did I get started on this theory-developing journey? During my first year of graduate school, while I was struggling to build my identity as a research psychologist, I had the good fortune to take a course entitled Models of Thinking, taught by my advisor, Jim Greeno. That course got me thinking about how people are able to come up with creative solutions to problems, which led me to a basic question: “How can we help people learn in ways so that they can take what they have learned and apply it to new situations?” This question about teaching for transfer, which formed in my mind during that course, has stuck with me all these years and has driven my research. I soon discovered that this is a classic, albeit elusive, issue both in psychology and in education that dates back to the early days of research in learning and instruction.

Insight 2: Theory Building Is Grounded in Old Ideas

The search for theories of meaningful learning has a long history, including the insightful work of the Gestalt psychologists such as Wertheimer (1959) and Katona (1940); the groundbreaking work of developmentalists such as Piaget (1926) and Vygotsky (1978); and the creative work of memory researchers such as Bartlett (1932). In the field of educational psychology, a focus on instruction for meaningful learning has its roots in generative theories of learning by pioneers such as Wittrock (1974, 1989) and Ausubel (1968). Some key ideas rising from this work are meaningful learning as assimilation to schema (i.e., connecting incoming information with existing knowledge), meaningful learning as a generative activity (i.e., actively attending to relevant material, organizing it into a coherent structure, and relating it to relevant prior knowledge), and meaningful learning as knowledge construction (i.e., building mental representations in working memory). The development of CTML represents my attempts to understand and clarify these intriguing ideas based on empirical testing.

How did I get this insight? As an undergraduate, my goal was to read every classic psychology book, but in graduate school, my goal became more focused on reading every classic book related to how learning works and, particularly, books that could help me better understand my driving question of how to teach for transfer. I found amazing used bookstores in Ann Arbor, where I spent a lot of my spare time on a treasure hunt for books by the likes of Piaget (1926), Bartlett (1932), Wertheimer (1959), Katona (1940), Ebbinghaus (1913), Ausubel (1968), and many others. I was moved by the introductory quotation I found on the cover page to Ebbinghaus’ nineteenth-century book, Memory: “From the oldest subject we will build the newest science.” Even as a beginning graduate student, I felt privileged to contribute in even a small way to this newest science of how the human mind works, which in my case turned out to be the science of learning and instruction.

Insight 3: Theory Building Is Not a Straight, Planned-Out Path

My search for a theory of meaningful learning has not taken a straight, planned-out path, but rather has progressed in an irregular pattern of small steps based on fortuitous findings over decades, with invaluable contributions from a long list of collaborators and colleagues. CTML was not based on a systematic step-by-step career plan, and it certainly did not pop out of my head one day all in one nice polished package. It came out in fits and starts, with a long series of revisions, refinements, clarifications, deletions, and additions.

Where does this idea come from? I never set out to have a long-term career plan of systematic research on multimedia learning, but rather I have much shorter 2- or 3-year plans targeted on specific research questions. For example, back in the 1980s, after reading all of David Ausubel’s (e.g., Ausubel, 1968) writings, I became obsessed with advance organizers—material that is presented before a lesson that is intended to improve learning through activating relevant prior knowledge. That line of research led me to the unexpected conclusion that visual advance organizers could be effective in helping learners relate a new concept with familiar prior knowledge, which got me thinking about the power of connecting visual and verbal representations as a route to meaningful learning. As that idea incubated in my mind throughout the late 1980s and 1990s, I became interested in how to incorporate illustrations in text. That fateful exploration brought me to the formulation of what was to become the multimedia principle—people learn better from words and pictures than from words alone. In short, I often do not know where my theory-building is going but I simply follow the fruitful paths that my current research takes me.

Insight 4: Theory Building Is an Engineering Problem

Based on this journey, I have come to realize that theory building in educational psychology is much like solving a practical engineering problem. However, instead of building an increasingly better device to carry out some function, our theory-building task is to build a progressively better explanation of some educationally relevant phenomenon (such as how students develop meaningful learning that transfers to new situations). I just keep tinkering with the theory trying to make it work better based on new research evidence and logical reasoning. One indication of working better is the rate at which the work is cited by others and incorporated into their frameworks, as documented by Camp et al. (2022).

Where does this insight come from? When I look at the evolution of CTML, particularly through the ways I represented it visually, I see that I started with a core idea—there are several cognitive conditions for meaningful learning—and then continually tweaked the idea based on the research evidence that it generated and the new ideas I came across. When I read Paivio’s (1986) work on dual-coding, I realized its relevance and tried to incorporate that idea. When I read Miller’s (1956) and Baddeley’s (1986) and Sweller’s (1999) work on limited working memory capacity, I realized that fundamental idea had to be part of my theorizing. When I read Wittrock’s work on generative learning, it validated my thinking that appropriate cognitive processing during learning was a key to meaningful learning and led to my incorporating the SOI model (based on selecting, organizing, and integrating). As in engineering, my approach has been to keep redesigning what I have to make it better—which in my case means making it better able to explain a wider set of findings. In this way, my work began with what Kuhn (1962) would call a paradigm shift—a shift from behaviorist to cognitive views of how learning works—but then for all these ensuing years has involved continually improving on my original idea.

Insight 5: Theory Building Is an Iterative Process Involving the Persistent Interplay Between Research and Theory

As shown in Fig. 1, consistent with Greene (2022), I start with the kernel of a theoretical idea (i.e., theory), which leads me to a testable question (i.e., research question) that I examine in a series of experiments (i.e., research design), which generates a pattern of findings (i.e., research evidence) that helps me improve my theoretical account (i.e., theory). The development of CTML represents many turns around the circle presented in the figure (both clockwise and counterclockwise). These turns depend on valuable discussions with colleagues concerning how to frame the next version of the theory, how to generate useful research questions (and know when to give up on unfruitful ones), how to design impactful studies, and how to interpret the results. The result is a series of iterations of a theory of how meaningful learning works, as described in the following sections.

Fig. 1 The theory-research cycle

Where did I get this insight? Let me give an example. Most of the early work by my collaborators and me on instructional design principles for multimedia learning focused on the instructional goal of reducing extraneous processing, that is, reducing the learner’s cognitive processing that is not directed towards learning the content so that the learner can use cognitive resources to make sense of the material. This led, for example, to the coherence principle in which people learn better when we remove unneeded visual and verbal material from a lesson. However, for some lessons, even when we eliminated unneeded material, students still had trouble learning it, perhaps because they just did not want to put out effort to make sense of the material. This led us to develop a new kind of instructional design goal, which I called fostering generative processing, that is, motivating the learner to actively engage with the material. This goal suggested a whole new set of multimedia learning techniques based on prompts to engage in generative learning activities during learning, such as writing a summary, drawing an illustration, creating a graphic organizer, and explaining to others. In this way, the research evidence (i.e., not being able to improve learning through reducing extraneous material) led to a new theoretical idea (i.e., fostering generative processing), which led to new research avenues (i.e., incorporating generative learning activities).

Insight 6: Theory Building Depends on Persistence in Collecting New Research Evidence

Also in line with Fig. 1, building the cognitive theory of multimedia learning depends on a persistent commitment to experimental comparisons using a value-added design. In value-added experiments, we compare the learning outcomes (and, when possible, learning processes) of people who learn from a base version of a multimedia lesson with those of people who learn from the lesson with one feature added. For example, we can compare learning from a narrated animation on lightning formation versus learning from the same lesson with the words presented as printed captions at the bottom of the screen. Theory development depends on a strong foundation of research evidence, which flows from a willingness to replicate effects detected in value-added studies.
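The comparison at the heart of a value-added experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores below are invented, the function name `cohens_d` is hypothetical, and the choice of Cohen's d with a pooled standard deviation as the effect-size measure is an assumption of this sketch rather than something specified in the text above.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(added_feature, base):
    """Standardized mean difference (added-feature group minus base group),
    using a pooled standard deviation. Hypothetical helper for illustration."""
    n_a, n_b = len(added_feature), len(base)
    pooled_var = (
        (n_a - 1) * stdev(added_feature) ** 2 + (n_b - 1) * stdev(base) ** 2
    ) / (n_a + n_b - 2)
    return (mean(added_feature) - mean(base)) / sqrt(pooled_var)


# Invented transfer-test scores, NOT data from any study.
base_scores = [4, 5, 3, 6, 5, 4]            # learners given the base lesson
added_feature_scores = [6, 7, 5, 8, 6, 7]   # same lesson with one feature added

effect = cohens_d(added_feature_scores, base_scores)
print(f"Effect size d = {effect:.2f}")
```

A positive value would suggest that the added feature helped transfer performance and a negative value that it hurt; replicating the comparison across lessons and samples is what gives a principle its footing.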

Where did this insight come from? Let me give you an example. When Roxana Moreno was a graduate student in my lab, she came to my office with the preposterous (to my way of thinking) idea that using conversational language in an online science lesson would improve learning outcomes over our existing lessons using traditional, formal language. In spite of my skepticism, we worked out plans for a series of rigorous experiments, each one of which came back with strong positive results. After replicating the effect multiple times with multiple content topics, I finally allowed myself to share the personalization principle with the larger research community, that is, the idea that people learn better when instructors use conversational wording rather than formal wording. This opened my eyes to the idea that online multimedia learning depended not only on cognitive processing but also on social and affective processing, which has led to several ongoing new directions for our research (Horovitz & Mayer, 2021; Lawson & Mayer, 2022; Lawson et al., 2021).

Insight 7: Theory Building Is a Team Activity

CTML would not have happened if I were tasked with working alone. I have had the pleasure of working with dozens of collaborators over the years who have helped build CTML. The theory benefits from collaborations with students, campus colleagues, and fellow educational psychologists from near and far, including visitors from around the world who have contributed to our lab over the years. The theory has been shaped by advances and feedback from the larger community of scholars, many of whom are represented in the various editions of The Cambridge Handbook of Multimedia Learning (Mayer, 2005, 2014; Mayer & Fiorella, 2022). CTML also has benefitted from ideas from competing theories such as cognitive load theory (Paas & Sweller, 2022; Sweller, 1999, in press; Sweller et al., 2011) and the integrated model of text and picture comprehension (Schnotz, 2022, 2023). Part of theory building is being able to convince your peers and to be convinced by them.

Where did this insight come from? As I look over the hundreds of research papers on multimedia learning that have my name on them, I see that the vast majority were co-authored with graduate students or visitors to my lab. I seek to meet on a weekly basis with our lab research team of graduate students, postdocs, and visitors. I also meet regularly with students, visitors, colleagues, and anyone else I can find who is interested in talking about improving multimedia instruction. I cherish these meetings because they are essential in giving me a chance to work out new ideas about multimedia learning.

The Past of the Cognitive Theory of Multimedia Learning

I did not plan to spend a substantial portion of my academic career on developing the cognitive theory of multimedia learning. In this section, I briefly describe how it happened, including how I stumbled upon an appropriate name and a concise visual representation, how I searched for a conceptual framework, and how CTML has grown in terms of research base and design principles.

Stumbling Upon a Name for the Theory

An important challenge in theory development is to find a name that highlights the key concepts in the theory. Table 1 summarizes the names leading up to the cognitive theory of multimedia learning, which highlighted an evolving collection of inter-related concepts. I began with “model of meaningful learning” (Mayer, 1989), which focused on the external conditions for instruction for meaningful learning—namely, having potentially meaningful material, having learners who need help, having illustrations that provide help, and having a test that can detect meaningful learning outcomes. I changed the name to “model of conditions for effective illustrations” in Mayer & Gallini (1990) but retained the same set of conditions albeit in a different order.

Table 1 Name changes leading to the cognitive theory of multimedia learning and beyond

How did I progress to the next levels? In continually discussing our theoretical account with students and colleagues, it became clear that this initial model captured the external factors involved in meaningful learning but did not adequately address the internal cognitive processes involved. If I wanted to know how meaningful learning works, I would have to consider the cognitive processes during learning. I was heavily influenced by three threads of scholarship I had been reading about—dual-coding such as articulated by Paivio (1986), limited working memory capacity such as articulated by Miller (1956) and Baddeley (1986) and Sweller (1999), and active cognitive processing during learning such as articulated by Wittrock (1974, 1989). First, I focused on dual-coding.

As a result of my dissatisfaction with a focus solely on external conditions of meaningful learning, the names shifted to focus on the internal conditions for meaningful learning by considering cognitive processing in the learner’s information processing system. Different names emphasized different aspects of cognitive processing such as having dual-channels for verbal and visual material, having limited capacity for cognitive processing, and engaging in generative processing during learning. First, I emphasized the concept of dual-coding (i.e., separate information processing channels for auditory and visual material) with names like “dual-coding model” (Mayer & Anderson, 1991, 1992) and “dual-processing model of multimedia learning” or “dual-processing theory of working memory” (Mayer & Moreno, 1998). The idea of separate channels for processing visual and verbal material was to become a central feature of CTML, as represented by the two rows in the current model.

Then, the name was broadened to differentiate among three cognitive processes during learning—selecting relevant material for further processing, organizing it into a coherent cognitive representation, and integrating it with relevant prior knowledge. Mayer (1996) referred to this idea as the “SOI model,” and other papers used the term “generative theory” (Mayer, 1997; Mayer et al., 1995) or “generative theory of multimedia learning” (Mayer, 1997; Plass et al., 1998) or “generative learning theory” (Fiorella & Mayer, 2015, 2016) or “generative theory of learning” (Mayer, 2010). The SOI model was a major conceptual breakthrough for me, and has remained at the core of CTML ever since, as represented by the arrows in the current model. In short, the SOI model represents the core cognitive processes that drive CTML.

Finally, we began using the name “cognitive theory of multimedia learning” in Mayer et al. (1996, 1999), Mayer (1997), and Moreno & Mayer (2000). We also elaborated on the underlying ideas of the cognitive theory of multimedia learning in Mayer & Moreno (2003), all editions of Multimedia Learning (Mayer, 2001, 2009, 2021), and all editions of The Cambridge Handbook of Multimedia Learning (Mayer, 2005, 2014; Mayer & Fiorella, 2022). This approach included the idea of limited capacity and the distinction among extraneous, essential, and generative processing in which cognitive capacity directed at extraneous processing reduced the capacity available for essential and generative processing.

How am I progressing beyond the current model of CTML? Research evidence came pouring in that alerted me to the idea that there may be more to multimedia learning than cognitive processing. We began to find evidence for the role of social processing (e.g., how using conversational language can build social rapport) and affective processing (e.g., how the instructor’s gestures and tone of voice can affect learning), and evidence for the role of motivational factors (e.g., benefits of training for self-efficacy in multimedia lessons) and metacognitive factors (e.g., the role of individual differences in executive function in learning from distracting lessons). We are now grappling with how to represent these additions to CTML.

As we move to expand CTML, as summarized in Table 2, we have supplemented CTML with “social agency theory” (Atkinson et al., 2005; Mayer et al., 2003), which incorporates social processes during learning, and with the “cognitive-affective theory of learning with media” (Moreno & Mayer, 2007) and the “cognitive-affective model of e-learning” (Lawson & Mayer, 2022; Lawson et al., 2021), which incorporate affective processes during learning.

Table 2 Adjunct theories to the cognitive theory of multimedia learning

The name changes reflect a shift from a focus on external conditions to internal processes ranging from dual-coding processing to generative processing to social and affective processing. Although it took us more than a decade to get there, throughout the twenty-first century, we have landed on the “cognitive theory of multimedia learning” as the name of our theory.

Inching Towards a Visual Representation of the Theory

Just as it took many iterations to find a suitable name, we also struggled to find an appropriate visual representation of the theory. I have found that visual representations help me better understand and improve on the theory, so I generally start with a visual representation and then express my ideas in words. In the case of CTML, it took many tries at building a flowchart that could represent the theory concisely and accurately. Our earliest attempts are shown in Figs. 2 and 3, which depict the external conditions for effective multimedia instruction: having meaningful text, having complementary illustrations, having learners who need help, and having a test that taps meaningful learning (Mayer, 1989; Mayer & Gallini, 1990).

Fig. 2 External conditions for meaningful learning (Mayer, 1989)

Fig. 3 Alternative version of external conditions for meaningful learning (Mayer & Gallini, 1990)

Next, we shifted from a vertical flowchart depicting external conditions of meaningful learning to a vertical flowchart depicting steps in a dual-coding model (in Fig. 4; Mayer & Anderson, 1992). In a further refinement of the dual-coding model, we flipped to a horizontal flowchart involving a visual channel and an acoustic channel (in Fig. 5; Mayer & Moreno, 1998). Around the same time, we developed a more inclusive flowchart based on a generative theory that broadened the cognitive processes to include selecting, organizing, and integrating, but without the dual channels (in Fig. 6; Mayer, 1996). Figure 7 shows a flowchart version of generative theory (with selecting, organizing, and integrating) that also begins to incorporate dual channels involving text and illustrations (Mayer, 1997; Mayer et al., 1995).

Fig. 4 Dual-coding model (Mayer & Anderson, 1992)

Fig. 5 Dual-processing theory of working memory (Mayer & Moreno, 1998)

Fig. 6 The SOI model (Mayer, 1996)

Fig. 7 Generative theory of textbook design (Mayer, 1997; Mayer et al., 1995)

Finally, in Fig. 8 (Mayer, 2001; Mayer et al., 2001), we refined those previous flowcharts to include all three features of the theory: dual channels as represented by an auditory row across the top and a visual row across the bottom; limited capacity as represented by boxes for sensory memory, working memory, and long-term memory; and generative processing as indicated by arrows for selecting, organizing, and integrating. This has become the stable flowchart representation we have used to depict the cognitive theory of multimedia learning throughout the twenty-first century. It is the single most important representation of CTML, and it continually helps me think about how multimedia learning works and the implications for instructional design. On reflection, it seems fitting that the most important statement of CTML is itself a multimedia representation consisting of words and graphics.

Fig. 8 Cognitive theory of multimedia learning (Mayer, 2001; Mayer et al., 2001)

Going beyond the standard flowchart for CTML in Fig. 8, we also have added some complementary flowcharts for social agency theory (Mayer, 2009), which adds social processing, and for the cognitive-affective model (Lawson & Mayer, 2022; Moreno & Mayer, 2007), which adds affective processing. The changes in our flowchart reflect refinements and additions in the theory, as we grappled with how to integrate an inter-related set of concepts about dual channels, limited capacity, and active processing.

Adding to the Research Base

Across four decades, the research base for the cognitive theory of multimedia learning has grown substantially, which enabled further theory development. Table 3 shows the number of experimental tests conducted by my colleagues and me as well as the number of multimedia design principles we have proposed based on those studies across the three editions of Multimedia Learning (Mayer, 2001, 2009, 2021). Starting with our first multimedia learning studies in 1989, we have been able to generate 15 evidence-based principles based on more than 200 experiments conducted by my colleagues and me.

Table 3 Growth of research base across three editions of Multimedia Learning

Table 4 lists the principles that were included in each of the three editions of Multimedia Learning. As can be seen, we began mainly with principles aimed at minimizing extraneous processing—cognitive processing that does not support the instructional goal—such as eliminating unneeded material (i.e., coherence principle). Then, we added principles aimed at managing essential processing—cognitive processing aimed at representing the material in working memory—such as pausing a continuous video to create self-paced segments (i.e., segmenting principle). Lastly, we added principles aimed at fostering generative processing—cognitive processing aimed at making sense of the material—mainly with new technologies such as asking learners to summarize or explain what they are learning (i.e., generative activity principle).

Table 4 Growth of principles across three editions of Multimedia Learning

Of course, the growth of the research base supporting CTML goes far beyond what our lab produces and includes an ever expanding network of researchers around the world. As summarized in Table 5, some of this work is described in the three editions of The Cambridge Handbook of Multimedia Learning (Mayer, 2005, 2014; Mayer & Fiorella, 2022).

Table 5 Growth of research base across three editions of the Cambridge Handbook of Multimedia Learning

This growth in the research base is supported by improvements in assessment of learning outcomes, especially the development of appropriate transfer tests. I am a strong proponent of replication and of searching for boundary conditions under which the various design principles apply. Overall, theory development depends on a storehouse of research evidence generated by labs around the world.

The Present State of the Cognitive Theory of Multimedia Learning

The cognitive theory of multimedia learning is an evidence-based description of how people learn from multimedia instructional messages. A multimedia instructional message is instructional material consisting of words (e.g., printed text or spoken text) and graphics (e.g., illustrations, photos, animation, video, or immersive virtual reality) intended to foster new knowledge or skills in a learner. A multimedia instructional message can be presented in print (e.g., as a book), on a computer screen (e.g., as an instructional video or narrated animation or a simulation game), or in virtual reality via a head-mounted display (e.g., as an interactive simulation). The theory yields implications for the design of effective multimedia instructional messages, which are rendered as design principles.

Some of the advances in CTML have been fostered by advances in educational technology including instructional video, animation technology, technologies for creating onscreen agents, immersive virtual reality, and educational games, but it was not my intention to study educational technology per se. In fact, the theory began by studying learning from printed text and illustrations rendered on paper, and my focus has always been on how to design effective instruction involving words and graphics. In short, my focus is on how to help people learn academic content rather than on the capabilities of the latest educational technologies.

In this section, I summarize the current state of the cognitive theory of multimedia learning including the guiding assumptions, the memory stores, the cognitive processes, the demands on cognitive capacity, and three instructional goals. I also summarize 15 evidence-based multimedia instructional design principles based on CTML. More detailed descriptions are available in Mayer (2021, 2022).

The cognitive theory of multimedia learning is represented in Fig. 8. A multimedia instructional message enters the learner’s cognitive system through their eyes and ears. Printed words and graphics are held briefly in visual sensory memory, and spoken words are held briefly in auditory sensory memory. As these images fade, the learner can pay attention to some of the material, which is transferred to working memory for further processing. In working memory, the learner can organize the pictorial material into a pictorial model and the verbal material into a verbal model and integrate corresponding pictorial and verbal representations with each other and with relevant knowledge from long-term memory. The outcome is meaningful knowledge that is stored in long-term memory and can be applied to new situations.

Guiding Assumptions of the Cognitive Theory of Multimedia Learning

The cognitive theory of multimedia learning, as represented in Fig. 8, is based on three guiding assumptions derived from cognitive science: dual channels, limited capacity, and active processing. The dual-channels assumption is that humans have separate but interacting channels for processing auditory/verbal information and pictorial/visual information (such as narration and animation, respectively). The limited-capacity assumption is that humans can process only a few pieces of information in each channel at one time. The active-processing assumption is that meaningful learning occurs when the learner engages in appropriate cognitive processing during learning, including selecting relevant material to attend to in a lesson, mentally organizing the incoming material into a coherent representation in working memory, and mentally connecting it with corresponding representations and with relevant prior knowledge activated from long-term memory.

Three Memory Stores in the Cognitive Theory of Multimedia Learning

CTML has three memory stores, which are represented as boxes in Fig. 8: sensory memory, working memory, and long-term memory. Sensory memory holds complete visual images (in visual sensory memory) that enter through the eyes and complete auditory images (in auditory sensory memory) that enter through the ears, but the images fade rapidly within a fraction of a second. Working memory holds pictorial/visual and auditory/verbal pieces of information that the learner has attended to before they decay from sensory memory. This information can be re-arranged, but only a few pieces of information can be processed in each channel at any one time. Long-term memory is the learner’s permanent storehouse of knowledge, parts of which can be activated and brought into working memory during learning.

Five Cognitive Processes in the Cognitive Theory of Multimedia Learning

CTML has five cognitive processes, which are represented as arrows in Fig. 8: selecting words, selecting images, organizing words, organizing images, and integrating. Selecting words refers to attending to relevant parts of the printed text, and selecting images refers to attending to relevant parts of the presented graphics. Organizing words refers to arranging the relevant words into a verbal model in working memory, and organizing images refers to arranging the relevant parts of the graphics into a pictorial model in working memory. Integrating refers to making connections between corresponding verbal and pictorial representations in working memory as well as relevant knowledge from long-term memory. Meaningful learning depends on the learner engaging in appropriate cognitive processing involving selecting, organizing, and integrating. Instructional design is intended to guide these processes.

Three Demands on Cognitive Capacity

A central tenet of the cognitive theory of multimedia learning is that working memory capacity is limited, but there are three demands on that limited cognitive capacity during learning: extraneous processing, essential processing, and generative processing. Extraneous processing is cognitive processing that does not support the instructional goal; the amount of extraneous processing depends on the degree of poor instructional design, such as presenting extraneous verbal or pictorial information in a lesson. Essential processing is cognitive processing to mentally represent the presented material in working memory; the amount of essential processing depends on the inherent complexity of the material for the learner, such as presenting many inter-related concepts in a fast-paced lesson. Generative processing is cognitive processing aimed at making sense of the incoming information; the amount of generative processing depends on the learner’s level of motivation to exert effort to understand the lesson. The three kinds of processing each require some of the learner’s limited cognitive capacity, so capacity that is used for extraneous processing takes away from capacity that could be used for essential and generative processing, and cognitive capacity that is used for essential processing takes away capacity that could be used for generative processing.
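The trade-off among the three demands can be illustrated with a minimal sketch. It is not part of CTML itself and rests on simplifying assumptions that the theory does not make: that capacity and demands can be expressed in arbitrary additive units, and that capacity is consumed in strict order (extraneous first, then essential, then generative). The names `Allocation` and `allocate` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Capacity actually devoted to each kind of processing (arbitrary units)."""
    extraneous: int
    essential: int
    generative: int


def allocate(capacity: int, extraneous_demand: int,
             essential_demand: int, generative_demand: int) -> Allocation:
    """Split a fixed capacity across the three demands.

    Assumption for illustration only: extraneous processing is served
    first, essential processing draws on what remains, and generative
    processing receives whatever is left after both.
    """
    extraneous = min(capacity, extraneous_demand)
    remaining = capacity - extraneous
    essential = min(remaining, essential_demand)
    remaining -= essential
    generative = min(remaining, generative_demand)
    return Allocation(extraneous, essential, generative)


# A poorly designed lesson: extraneous demand crowds out the rest.
poor = allocate(capacity=10, extraneous_demand=8,
                essential_demand=5, generative_demand=3)

# The same lesson after unneeded material is removed.
improved = allocate(capacity=10, extraneous_demand=1,
                    essential_demand=5, generative_demand=3)

print(poor)
print(improved)
```

Under these assumptions, lowering the extraneous demand is what frees capacity for essential and generative processing, which is the reasoning behind the instructional goals discussed next.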

Three Instructional Goals

The three demands on cognitive capacity give rise to three instructional goals: minimize extraneous processing, manage essential processing, and foster generative processing. Consider a situation in which a poorly designed lesson causes the learner to allocate almost all of their cognitive capacity to extraneous processing, so they do not have adequate remaining cognitive capacity to engage in needed essential and generative processing. In this case, an important instructional goal is to minimize extraneous processing. This can be accomplished, for example, by eliminating unneeded words and graphical elements from a lesson.

Consider another situation in which extraneous processing has been reduced, but the lesson is so complex that the amount of needed essential processing exceeds the learner’s cognitive capacity. In this case, an important instructional goal is to manage essential processing. This can be accomplished, for example, by breaking a continuous lesson into manageable chunks that can be paced by the learner.

Finally, let us assume that we have minimized extraneous processing and managed essential processing, so cognitive capacity is available for generative processing, but the learner is not motivated to exert effort to understand the material. In this case, an important instructional goal is to foster generative processing. This can be accomplished, for example, by prompting the learner to engage in a generative learning activity such as writing a summary or self-testing during pauses in a lesson.

Fifteen Multimedia Instructional Design Principles

Table 6 summarizes 15 principles supported by research by my colleagues and me over the years. The first set of five principles addresses the instructional goal of reducing extraneous processing; the second set of four principles addresses the instructional goal of managing essential processing; and the third set of six principles addresses the instructional goal of fostering generative processing. Each principle is subject to boundary conditions including for whom the principle applies, for which kind of lesson the principle applies, and under what circumstances the principle applies. These are described in more detail in Multimedia Learning (Mayer, 2021). Many more principles are described in the wider literature, such as in The Cambridge Handbook of Multimedia Learning (Mayer & Fiorella, 2022).

Table 6 Fifteen principles of multimedia instructional design (from Mayer, 2021)

The Future of the Cognitive Theory of Multimedia Learning

Even when a theory reaches a somewhat stable state, such as the cognitive theory of multimedia learning, there is still room for further theory development. Today’s version of CTML focuses mainly on cognitive processing during learning (such as selecting, organizing, and integrating) as the core mechanism, along with assumptions about the architecture of the human information processing system, limited capacity of working memory, and dual channels for visual and verbal processing. In this section, I explore future directions for CTML involving the integration of new learning components that go beyond these basic cognitive processes, such as social processes, affective processes, motivational processes, and metacognitive processes. In the future, I also expect an increase in the research base, an increase in the number of evidence-based design principles, and a clearer specification of boundary conditions for design principles.

Integrating New Components into the Cognitive Theory of Multimedia Learning

Progress is being made in integrating understudied components into the cognitive theory of multimedia learning, including social, affective, motivational, and metacognitive processes. Concerning social processes, initial progress is reflected in the incorporation of social agency theory (Atkinson et al., 2005; Mayer et al., 2003), which posits that learners try harder to understand a lesson when they feel that the instructor is working with them. Concerning affective processes, initial progress is reflected in the incorporation of the cognitive-affective model of learning with media (Moreno & Mayer, 2007) and the cognitive-affective model of e-learning (Lawson et al., 2021; Lawson & Mayer, 2022), which posit that learners try harder to understand a lesson when they experience positive emotion while learning. Concerning motivational processes, initial progress is reflected in research showing students learn better from a statistics lesson when they are given prompts intended to boost their self-efficacy and decrease anxiety during learning (Huang & Mayer, 2019; Huang et al., 2020). Concerning metacognitive processes, initial progress is reflected in research assessing learners’ judgements of understanding during pauses in a multimedia science lesson (Pilegard & Mayer, 2015a, 2015b).

Overall, more work is needed to incorporate new components into the cognitive theory of multimedia learning, especially given ongoing advances in theories of self-regulated learning and motivation. For example, Kuhlmann et al. (2023) have shown how the SOI processes in CTML can be expanded through a motivational theory perspective. Clearly, self-regulation plays an important role in understanding how students learn from multimedia materials and in understanding how to design multimedia instructional materials for students with different kinds of self-regulation skills.

Expanding Research Methodologies to Monitor Learning Processes

In order to incorporate new components into the CTML, we need to expand the research methodologies used to monitor learning processes, including the use of eye-tracking, biometric, brain monitoring, and survey techniques. Concerning eye-tracking techniques, progress is being made in determining how learners allocate their attention in viewing a multimedia lesson, such as how many times their eyes move between corresponding printed words and graphical elements (Johnson & Mayer, 2012; Ponce et al., 2018) or where students look when viewing a video lecture consisting of an instructor standing next to projected slides (Stull & Mayer, 2021; Stull et al., 2018). Concerning biometric techniques, initial progress is reflected in studies examining students’ emotional arousal as measured by heart rate variability and electro-dermal activity when learning in immersive virtual reality versus with conventional media (Parong & Mayer, 2021a, 2021b). Concerning brain monitoring techniques, initial progress is reflected in studies examining students’ level of distraction as measured by a portable electroencephalogram (EEG) system when learning in immersive virtual reality versus with conventional media (Parong & Mayer, 2021a, 2021b). Similarly, functional near-infrared spectroscopy (fNIRS) technology offers a potential avenue for detecting the intensity of activity in brain areas related to cognitive, social, and affective processing (Li et al., 2022). The traditional way to measure learning activities is through self-report surveys administered after learning, but initial progress is being made in injecting brief survey items at pauses within an ongoing lesson (Pilegard & Mayer, 2015a, 2015b). Overall, I expect advances in methodologies aimed at detecting learning processes during an instructional episode, which will complement existing techniques to measure learning outcomes.

Expanding the Knowledge Base

Finally, I expect the cognitive theory of multimedia learning to be fortified with an increase in the research base, which will enable an increase and refinement in design principles and a clearer specification of boundary conditions for when a principle is most likely to apply. As the research base grows, I expect to see more meta-analyses that pinpoint the strength of key multimedia design principles as well as their moderating factors. An important direction for future research is to conduct studies in more natural learning environments such as school classrooms, online courses, and professional training. The ultimate test of the value of the cognitive theory of multimedia learning rests in its practical role in improving instruction and training as reflected in the five editions of e-Learning and the Science of Instruction (Clark & Mayer, 2003, 2008, 2011, 2016, 2024).

Conclusion

The cognitive theory of multimedia learning represents one of educational psychology’s success stories by showing progress in addressing some of our discipline’s fundamental questions about learning and instruction: How do people learn and how can we help people learn? In particular, the CTML represents our attempts to understand how meaningful learning of academic material works and how to improve the design of academic material to foster meaningful learning. For more than 100 years, our field has grappled with these questions. The development of the cognitive theory of multimedia learning provides a case example of how educational psychology can contribute to psychological theory and educational practice. I will consider this essay to be a success if it encourages you to join this worthwhile effort, such as by conducting theory-grounded studies of meaningful academic learning, contributing to research-based theories of meaningful academic learning, or developing evidence-based instruction.