Introduction

The teaching methods of Science, Technology, Engineering and Mathematics (STEM) need to change rapidly because of the speedy development of technology and the accumulation of knowledge within the applied STEM fields (Kelley & Knowles, 2016). STEM areas are therefore challenged with ensuring that graduating students have the knowledge, skills, and techniques to match the demands of the modern job market (Jelks & Crain, 2020). However, providing specialized labs, equipment, and enough experts to guarantee quality knowledge transfer is resource-demanding and not always possible (Bonde et al., 2014). Educational facilities may struggle to find the necessary staff or funding, and students may temporarily be unable to access a professional learning environment, as has been the case internationally during the current COVID-19 pandemic.

There has also been a decades-long educational challenge with students’ low motivation to engage in learning STEM subjects, and governments have made concerted attempts to change the way STEM subjects are taught (National Research Council, 2012). Pedagogical sciences have therefore developed specific STEM practices that can raise students’ awareness of the goals, applicability, and relevance of STEM subjects by combining different STEM subjects, providing situational learning, applying technology, using manipulatives and hands-on learning, applying a problem-solving approach, and implementing an inquiry approach to instruction (e.g., Kelley & Knowles, 2016; Zemelman et al., 2005). To boost students’ personal motivation, STEM education should also provide socially mediated learning contexts in which students’ sense of self-efficacy is supported (e.g., Clark & Dwyer, 1998; Foote, 1999; Pintrich & De Groot, 1990; Pintrich & Schunk, 1996, 2002; Schunk & DiBenedetto, 2016). To achieve this, Zemelman et al. (2005) advocate that teachers act as learning facilitators and that formative assessment be integrated into instruction.

STEM education, with its current challenges, may benefit from the rapid advancement of Immersive Virtual Reality (IVR) technology, which has already seen the development of various educational applications (Radianti et al., 2020; Suh & Prophet, 2018; Wu et al., 2020). IVR has been seen as an especially promising technology for adoption in K-12 and higher education settings (Makransky & Mayer, 2022; Patterson & Han, 2019; Suh & Prophet, 2018; Tilhou et al., 2020). IVR is predicted to be used broadly in classroom teaching within a few years (Freeman et al., 2017) as a cost-effective aid for teaching STEM subjects (Dalgarno & Lee, 2010; Hew & Cheung, 2010; Johnson-Glenberg et al., 2011; Kaufmann et al., 2000; Nadan et al., 2011). Compared to other knowledge transmission media, such as video, slides, or online virtual worlds, IVR has several unique affordances, including the enhancement of spatial knowledge and sense of presence, the ability to provide experiential learning and embodiment, the affordance of creating situated learning, and the capacity to increase intrinsic motivation (e.g., DiNatale et al., 2020). Previous studies have already coupled IVR technology with various learning theories, such as multimedia learning, situated learning, embodied learning, and the cognitive affective model of immersive learning (Dede, 2009; Johnson-Glenberg, 2019; Makransky & Parong & Mayer, 2018; Petersen, 2021; Petersen et al., 2022a). In the next section, we present the specific IVR affordances that support effective STEM practices, exemplified with the IVR biology simulation on next-generation sequencing (NGS) that was used in the current study.

IVR affordances in secondary and higher educational settings for STEM subjects

To show learners effectively how particular STEM knowledge or skills can be put into practice, a learning context must be created that corresponds to the conditions in which STEM knowledge is applied in real life (Putnam & Borko, 2000). IVR allows for the creation of realistic learning environments that engage the user and create a sense of presence and authenticity (DiNatale et al., 2020). In IVR, a high sense of presence is supported by the head-mounted display (HMD) through its head-position tracking, its sufficient depth perception, and its ability to isolate learners from the outside world (Baceviciute et al., 2021b; Cummings & Bailenson, 2016). Recent research has shown, for example, that a sense of presence can be especially beneficial in the exploration phases of inquiry-based science learning interventions in STEM (Makransky & Mayer, 2022; Petersen et al., 2020).

Applying a problem-solving approach to STEM lessons may promote deeper learning while also raising students’ awareness of the relevance and practicality of a STEM subject (e.g., Bonde et al., 2014; Fowler, 2015; Zemelman et al., 2005). Dede (2009) warns against separating the tasks of learning and doing, which could fragment the learning process and hinder the transfer of knowledge and skills. Unlike many other media, Immersive Virtual Reality Learning Environments (IVRLEs) allow for the seamless combination of practical, goal-directed learning assignments and additional theoretical knowledge. The simulated biology lesson used in the current study applies this IVRLE affordance by giving learners a real-life practical problem to solve: locating the gene that is linked to obesity. The students’ progress on the assignment is aided by additional theoretical information presented in a lab pad, thus tapping into many effective STEM practices.

Various STEM fields also place a high value on practical assignments, because hands-on activities help achieve better knowledge transfer than purely theoretical learning (e.g., Zemelman et al., 2005). The benefits of practical assignments stem from the phenomenon of embodied learning. Embodiment is one of the key features of any learning process (e.g., Barsalou, 2008; Gallagher, 2005; Varela et al., 1992), because the brain's sensory systems, bodily states, and actions in connection to the situation form the basis of our cognition (Barsalou, 2008). Positive findings on learning outcomes have been reported when using embodied approaches to teach STEM subjects, such as physics (Kontra et al., 2012), chemistry (Ping et al., 2011), mathematics (Howison et al., 2011; Nemirovsky & Ferrara, 2009; Nuñez et al., 1999), geology (Birchfield & Johnson-Glenberg, 2012), and neuroscience (Decety & Grezes, 2006). The virtual manifestations of the user’s own body within IVR (self-avatars) are thought to help create a sense of embodiment (Taylor, 2002). Thanks to this IVR-specific affordance, the user can achieve a sense of self-location, global motor control, and body ownership (Kilteni et al., 2012). Moreover, a high level of interactivity with the virtual environment is often mediated by physical controllers held in the users’ hands, which can support a sense of active engagement and allow users to feel in control of their own actions, thus supporting a sense of agency within the virtual environment (Makransky & Petersen, 2021; Petersen et al., 2022b; Piccione et al., 2019). The STEM practice of hands-on learning can only be replicated effectively in an IVRLE when the affordances that unite physical and embodied learning activities with the virtual sensory experience are utilized.

The biology simulation used in the current study depicts a realistic, navigable 3D virtual biology laboratory that allows students to experience the activities, manipulate the tools, and follow the safety measures that are a crucial part of a real NGS process in a biology laboratory. The user is placed in a first-person perspective and is coupled with virtual hands wearing the equipment necessary for the lab procedures. The learning simulation is accessed through an HMD, and interaction is facilitated by two physical controllers.

It is not always possible to provide students with meaningful and motivating practical STEM learning activities that encourage them to be curious and inquisitive about the learning content (e.g., National Research Council, 2012; Zemelman et al., 2005). IVRLEs, however, allow students to perform tasks that would be impossible or impractical to conduct in the real world, leading to a feeling of safety within the learning experience (Dalgarno & Lee, 2010). The biology simulation used in the current study provides a safe environment where making mistakes does not impose a high cost, allowing students to explore the learning content freely and manipulate the tools within the learning environment. This IVR affordance can, in turn, encourage users to acquire skills of scientific inquiry (e.g., National Research Council, 2012), provide the cognitive benefits of hands-on learning (e.g., Zemelman et al., 2005), and improve technological literacy (e.g., Kelley & Knowles, 2016).

Besides the design of the environment, the learning content naturally also plays an important role in learning effectiveness. Integrating STEM subjects is thought to provide students with the opportunity to learn from relevant and stimulating experiences, improve critical thinking and problem-solving skills, and boost knowledge retention (Stohlmann et al., 2012). The IVR simulation used in our learning experiment incorporates knowledge and practices from the fields of genetics, biomedicine, and molecular biology. Thus, this IVRLE dismantles the traditional educational boundaries between different STEM subjects in order to provide an experience of the realistic application of integrated STEM knowledge (e.g., Kelley & Knowles, 2016; Klingenberg et al., 2020; Zemelman et al., 2005). This, in turn, can lead to higher motivation towards STEM subjects.

Supporting students’ sense of self-efficacy is seen as another motivational factor for effective learning (e.g., Clark & Dwyer, 1998; Foote, 1999; Pintrich & De Groot, 1990; Pintrich & Schunk, 1996, 2002; Schunk & DiBenedetto, 2016). Academic accomplishments may depend on students’ self-efficacy even more than on their study motivation (Pintrich & De Groot, 1990; Pintrich & Schunk, 1996, 2002). Research points to feedback as one of the most significant sources of information for helping students achieve an optimal sense of self-efficacy (Clark & Dwyer, 1998; Foote, 1999; Warden, 2000; Zimmerman & Martinez-Pons, 1992). Feedback that emphasizes mastery, improvement, and achievement is suggested to enhance the self-efficacy of the learner (Pintrich & Schunk, 2002). Students work harder when they see themselves as competent for the task (Schunk & DiBenedetto, 2016). The STEM practices that can support students’ self-efficacy include having the teacher act as a facilitator and integrating assessment into instruction (Makransky et al., 2020b; Zemelman et al., 2005). IVRLEs allow for the utilization of virtual agents, whose form can be adapted to fit the learning demand and whose presence can mimic the educational dynamics associated with more traditional pedagogical settings (). Moreover, Parong and Mayer (2018) propose that IVR may even exceed traditional pedagogical teaching practices thanks to the possibility of personalized, immediate, and adaptive feedback to students within the virtual learning environment. The biology IVR simulation used in the current study offers the user real-time interaction with a mostly auditory pedagogical agent who provides not only factual and conceptual information and guidance in the practical assignments, but also knowledge checks and timely feedback on the student’s actions. Many IVR learning experiments have found higher self-efficacy when learning in IVR than with less immersive media (e.g., Plechata et al., 2022; Pulijala et al., 2018).

To summarize, IVRLEs have the potential to provide embodied, engaging, immersive, and empirical learning experiences, which can lead to better knowledge transfer (e.g., Makransky et al., 2019a; Meyer et al., 2019) and a higher sense of motivation for STEM learning (e.g., Makransky et al., 2020c; Parong & Mayer, 2018; Petersen et al., 2020; Stepan et al., 2017; Zhao et al., 2020).

Research on IVR in education

Luo et al. (2021) reported in their recent meta-analysis that “With the launch of commercial products such as the Oculus Rift and HTC Vive, HMD-based VR interventions have witnessed the largest increase in publication in the last decade (from 14 to 24%).” (p. 7). Existing research has revealed that IVRLEs can offer many affordances in educational settings, including several motivational and affective benefits (e.g., Lee & Wong, 2014; Makransky et al., 2020a; Merchant et al., 2014; Parong & Mayer, 2018; Zhao et al., 2020). A recent meta-analysis summarizing results from studies conducted between 2013 and 2019 found a small effect size advantage (ES = .24) of using VRLEs compared to non-immersive learning approaches. Results of newer studies comparing learning outcomes for IVRLEs with those for less immersive media in science education have been mixed (e.g., Baceviciute et al., 2021a; Fealy et al., 2019; Luo et al., 2021; Parong & Mayer, 2018; Wu et al., 2020). One challenge that has repeatedly been highlighted when applying IVR in everyday teaching practice is that lessons presented in IVR can lead to higher levels of cognitive load than non-IVR learning media (Makransky et al., 2019b; Parong & Mayer, 2020). This higher load can arise from an increased level of visual stimulation that is not always relevant for learning, or from unfamiliarity with the control devices (Makransky et al., 2019b). In some cases, the increased cognitive load has even been associated with poorer learning outcomes (e.g., Makransky et al., 2019b; Parong & Mayer, 2018, 2020).

To overcome the increased cognitive strain, several studies have investigated the role of scaffolding techniques around IVR lessons (e.g., Klingenberg et al., 2020; Makransky et al., 2020a; Meyer et al., 2019, 2020; Parong & Mayer, 2018, 2020). It has been suggested that scaffolding techniques can help learners in IVR recall the given material more efficiently, structure it better, associate the information with pre-existing knowledge, and apply it in novel situations (Parong & Mayer, 2018).

Zhao et al. (2020) investigated the effects of using the Generative Learning Strategy (GLS) of summarization after a biology lesson presented in IVR or as a video, and found that summarizing significantly reduced learners’ perceived effort in both the IVR and video conditions but did not lead to better content comprehension. The researchers reason that “writing summaries facilitated the cognitive process of selecting relevant information in the working memory, thus resulting in the reduced effort.” (p. 480). Meyer et al. (2019) investigated the effect of pre-training on learning biology in IVR and through video, and found that knowing the names and characteristics of the main concepts before the session improved both knowledge acquisition and transfer of the learned material in the IVR condition but not in the video condition. The authors suggest that without pre-training, the IVR learning experience could have been more cognitively demanding than video, leaving fewer resources available for processing the information and encoding it into long-term memory. Parong and Mayer (2018) investigated the instructional effectiveness of adding a prompt to write summaries at different points in an IVR-based biology lesson. They found that students who summarized after each IVR learning segment performed significantly better on the post-test than participants who did not summarize and who learned from the IVR lesson without segmentation. Makransky et al. (2020a) examined the efficacy of enactment, a GLS that involves engaging in task-relevant movements during learning, in combination with an IVR science simulation, and compared it with the same lesson observed as a video. They found that the IVR and enactment group had the highest performance on the procedural knowledge and transfer tests and concluded that enactment is especially relevant for learning in IVR. Finally, Klingenberg et al. (2020) investigated the efficacy of using the GLS of teaching following a higher education biology simulation presented through IVR or on a desktop computer. The authors found an interaction effect between media (IVR/desktop) and method (teaching/no GLS) for the outcomes of retention, transfer, and self-efficacy, and concluded that the GLS of teaching is especially relevant for improving self-efficacy, retention, and transfer in IVR.

Objectives of this study

The positive effects of scaffolding techniques applied to IVR science lessons on learning, motivational, and affective measures indicate that scaffolding is important when applying IVR-based lessons in STEM. However, the specific affordances of different scaffolding techniques have not been fully analyzed. To our knowledge, the GLS of self-explanation has not been investigated in the context of IVR science learning.

Moreover, even though various studies investigate the value of IVR in science education (e.g., Chittaro & Buttussi, 2015; Chittaro & Ranon, 2007; Gunn et al., 2017; John et al., 2018; Kesim & Ozarslan, 2012; Monahan et al., 2008; Pan et al., 2006; Rauch, 2007; Yang et al., 2018), many fail to disentangle the instructional design or pedagogical principles that are essential for understanding the value of this medium from an instructional science perspective. Furthermore, many of these studies are conducted in laboratory settings, which may limit their generalizability to everyday science learning settings.

The current study is a value-added study that investigates the effect of adding a supplementary feature, a self-explanation task, to an IVR-based science simulation on perceived cognitive load and learning outcomes. Testing this in an ecologically valid setting (a higher education biology course early in the students’ studies) increases the practical value of the results for science education. The self-explanation principle under scrutiny was selected based on literature suggesting that it is important to include scaffolding strategies when implementing IVR-based science lessons (e.g., Klingenberg et al., 2020; Makransky & Petersen, 2021; Parong & Mayer, 2018, 2020; Zhao et al., 2020). Self-explanation is relevant in applied contexts because it can easily be implemented in a real classroom environment alongside an IVR learning intervention. In the following sections, we introduce the theoretical background of the study and present the research questions, then explain the study methodology in detail, report the experimental results, and finally discuss the outcomes and future directions.

Theoretical background

This section concentrates on the two main theories supporting the current study. It starts with self-explanation, drawn from generative learning theory, which constitutes the background of our experimental intervention, and links it to the research goals. It then turns to cognitive load theory in relation to the cognitively demanding IVR learning environment, and ends with the study’s research questions.

Self-explanation

Self-explanation is one of eight Generative Learning Strategies (GLS) proposed by Fiorella and Mayer (2015, 2016). Self-explanation is theorized to be a process by which learners form inferences about causal connections or conceptual relationships within the learned material (Bisra et al., 2018). It is a constructive cognitive activity that can be applied at will or in response to a prompt, for understanding novel information or learning new skills (Fonseca & Chi, 2011). Self-explanation is seen as separate from other scaffolding techniques, such as summarizing, explaining to others, or talking aloud, as it is self-focused and has the purpose of making new information personally meaningful. Its process may be completely covert or, if expressed overtly, it may not be intelligible for anyone else but the learner (Chi, 2000).

Positive relationships between self-explanation and learning were found in early self-explanation research (Chi et al., 1989; Renkl, 1997). In their meta-analysis, Bisra et al. (2018) found a statistically detectable advantage (g = .35) of self-explanation over instructional explanations. They hypothesize that meaningful associations emerge when learners retrieve relevant previously acquired knowledge from memory and elaborate on it using the relevant features of the new information. Constructing the explanation is thought to engage the fundamental cognitive processes involved in understanding the explanation, recalling it later, and using it to form further inferences.
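Effect sizes such as the g = .35 above are standardized mean differences: the gap between two group means expressed in pooled standard deviation units. As a minimal sketch, assuming the common pooled-SD definition with Hedges’ approximate small-sample correction J = 1 − 3/(4·df − 1), the statistic for two independent groups can be computed as below. The function name `hedges_g` and the example numbers are hypothetical and are not taken from the meta-analysis.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference (group 1 minus group 2), Hedges-corrected.

    Hypothetical helper for illustration only.
    """
    df = n_1 + n_2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / df
    d = (mean_1 - mean_2) / math.sqrt(pooled_var)
    # Approximate small-sample correction factor; shrinks d slightly toward 0.
    j = 1 - 3 / (4 * df - 1)
    return j * d


# Hypothetical groups: means 11.0 vs 9.6, both SD = 4.0, n = 40 and 39.
print(round(hedges_g(11.0, 4.0, 40, 9.6, 4.0, 39), 3))  # 0.347
```

Here the uncorrected difference is 1.4 / 4.0 = 0.35 standard deviations; with 77 degrees of freedom the correction reduces it only slightly, which is why g and Cohen’s d are nearly identical for samples of this size.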

Previous research has also investigated the learning effectiveness of self-explanation prompts given before (e.g., Ainsworth & Burcham, 2007), during (e.g., Hausmann & VanLehn, 2010), or at the end of the learning activity (e.g., Tenenbaum et al., 2008). Evidence points towards better learning achievement when self-explanation has been taught or prompted (e.g., Lin & Atkinson, 2013; Rittle-Johnson et al., 2017). A meta-analysis by Rittle-Johnson et al. (2017) revealed that prompting students to self-explain while learning mathematics had a small to moderate effect on learning outcomes. Moreover, it is theorized that if learners are allowed to decide what to self-explain, their explanations may be better adapted to their idiosyncratic prior knowledge (Bisra et al., 2018). The researchers also claim that “the beneficial effects of self-explanation may be evident more on transfer tests than recall tests, and more on long-form test items such as essays and problems than multiple-choice questions” (p. 707).

A study by Schworm and Renkl (2006) showed that participants who received only self-explanation prompts outperformed participants who were provided with self-explanation prompts together with supplementary instructional explanations on demand. This finding may point to a boundary condition of self-explanation prompts: making relevant instructional explanations available may undermine learners’ incentive to exert effort in self-explaining.

The prompt format may affect the quality of elaboration generated in the self-explanation process (Bisra et al., 2018). For example, an open-ended prompt format, which contains fewer cues than multiple-choice and fill-in-the-blank formats, can allow for more elaborative processing that is better adapted to the learner’s unique gaps in knowledge. Alternatively, the stronger cues in a multiple-choice format may point out more effectively the specific misconceptions the learner should address. Bisra et al. (2018) found that multiple-choice prompts did not have a statistically significant effect on learning, which may mean that their greater cue strength can undermine self-explanation effects. Bisra et al. (2018) argue that optimal cue strength may depend on the students’ own ability to self-explain the concepts or procedures at hand: novices may benefit more from strongly cued self-explanation prompts, but as expertise increases, learners may benefit increasingly from less strongly cued prompts. In conclusion, research on self-explanation suggests that a successful self-explanation task should be prompted by open-ended cues, be loosely structured, and be devoid of supplementary instructional explanations. Since the students in our experiment had some previous knowledge of the subject at hand, they were given an open-ended prompt to self-explain the content of the IVR lesson in a loose format, with a time limit of 10 min. No extra material was provided to the self-explanation group. Thus, all of the recommendations from the self-explanation literature were followed.

Cognitive load theory

Cognitive load theory posits that information processing imposes a mental load that demands the allocation of limited mental resources (Kirschner et al., 2011a, 2011b; Sweller, 1988, 2010). The theory proposes that working memory load (cognitive load) is determined by the number of information elements (Cowan, 2001; Miller, 1956) that need to be processed simultaneously within a certain amount of time (Barrouillet et al., 2007). Cognitive load theory is, in essence, a theory of instructional design, which proposes that working memory limitations should be taken into account when developing learning instructions (Sweller, 2010; Sweller et al., 1998, 2011; van Merriënboer & Sweller, 2005).

Several studies have highlighted the relevance of cognitive load in IVR based STEM lessons. Parong and Mayer (2018) investigated the instructional effectiveness of IVR versus a desktop slideshow for teaching scientific knowledge. They found that students who viewed the slideshow performed significantly better on the posttest than the IVR group. The researchers argued that this effect may have been elicited by the more cognitively demanding IVRLE. Similar results were found in their later study (Parong & Mayer, 2020), where the students who viewed the IVR biology lesson as an interactive animated journey performed significantly worse on transfer tests, reported more extraneous cognitive load and showed less engagement based on EEG measures than those who viewed the slideshow lesson. The authors argue, based on mediational analysis, that IVRLEs may create an affective and cognitive distraction, which may lead to poorer learning outcomes than desktop environments. Makransky et al. (2019b) found that a science simulation presented in IVR led to more presence but less learning compared to a desktop version of the simulation. They measured cognitive load using EEG and found that students in the IVR condition had higher levels of cognitive activity compared to the desktop condition late in the session. The authors highlight the importance of being able to differentiate between different forms of cognitive load in future IVR based research. Makransky et al. (2020a) propose that due to the increased size of the visual field of view of the IVR systems compared to a monitor, the IVR users’ sense of presence is also enhanced, but that this feature also increases extraneous cognitive load because learners need to locate relevant content, and often the environment includes seductive details that are not necessary for learning.

Many authors have attempted to divide working memory load into categories depending on its function (see also Kalyuga, 2011; Paas et al., 2003, 2004; Sweller, 2010; Sweller et al., 1998; van Merriënboer & Sweller, 2005). Sweller et al. (1998) identified three types of cognitive load: intrinsic, germane, and extraneous. Intrinsic cognitive load (IL) is imposed by the basic structure (the intrinsic nature) of the information that learners are required to acquire. This type of cognitive load depends on the interaction between the nature of the material at hand and the learner’s level of relevant expertise (Sweller, 2010; Sweller et al., 1998, 2011).

Like IL, extraneous cognitive load (EL) requires working memory resources. EL is imposed by the manner in which the to-be-learned information is presented (the nature of the instructional design or procedures) or by the actions that learners have to perform. If the instructional methods are suboptimal, learners need to apply cognitive processes that do not directly contribute to the construction of their cognitive schemata and do not serve the learning goals (Brünken et al., 2003; Sweller & Chandler, 1994; Sweller et al., 1990). Therefore, EL is related to the manner in which the material is presented, whereas IL is related to the innate complexity of the information (Sweller et al., 2011). IL and EL are assumed to be additive (Sweller, 1994; Um et al., 2012): the more working memory resources have to be devoted to EL, the fewer are available for dealing with IL, thus reducing learning (Sweller, 2010). Research implies that learning in an IVRLE may lead to essential overload more quickly than learning in a less immersive format, because extraneous load is thought to be higher owing to the increased amount of sensory information in this medium (Meyer et al., 2019; Richards & Taylor, 2015).
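Read schematically, the additivity assumption can be written as a capacity constraint. This is an illustrative simplification rather than a formula from the cited sources: the symbol $C_{\mathrm{WM}}$ for total working memory capacity is a hypothetical notation introduced here, and in practice none of these quantities is measured on a common numeric scale.

```latex
\mathrm{IL} + \mathrm{EL} \le C_{\mathrm{WM}}
\quad\Longrightarrow\quad
\mathrm{IL} \le C_{\mathrm{WM}} - \mathrm{EL}
```

Under this reading, any increase in EL lowers the ceiling on the intrinsic load a learner can handle at once, which is the sense in which the extra sensory information in an IVRLE could bring on overload sooner.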

Finally, germane cognitive load (GL) refers to the working memory resources that the learner devotes to dealing with the lesson content. It results from helpful cognitive processes, such as abstractions and elaborations, that are promoted by the instructional presentation in the learning environment. An efficient learning environment should allow learners to apply available resources to the advanced cognitive processes associated with germane cognitive load while reducing extraneous cognitive load (Gerjets et al., 2004).

Research questions

The current study aims to answer two general research questions:

  • Research Question 1a: Does an IVR based science lesson significantly increase knowledge about Gene Expression when measured using a pre- to post-test design?

  • Research Question 1b: Does the addition of a self-explanation task after an IVR lesson lead to better learning outcomes compared to a control group?

  • Research Question 2: Does the addition of a self-explanation task increase students' perceived intrinsic (Research Question 2a), extraneous (2b), and germane (2c) cognitive load?
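Research Question 1a rests on a within-subject comparison of pre-test and post-test scores. One common way to test such a gain is a paired-samples t statistic computed on each participant’s difference score. The sketch below assumes that approach purely for illustration; it is not necessarily the analysis used in this study, and the function name `paired_t` and the scores are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic for the gain (post - pre).

    Returns (t, degrees_of_freedom). Hypothetical helper for illustration.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("pre and post need the same length (at least 2 pairs)")
    # Each participant contributes one difference score.
    gains = [after - before for before, after in zip(pre, post)]
    n = len(gains)
    # Standard error of the mean gain (stdev uses the n - 1 sample formula).
    standard_error = stdev(gains) / math.sqrt(n)
    return mean(gains) / standard_error, n - 1


# Hypothetical pre/post scores for five participants.
t, df = paired_t([4, 5, 6, 5, 7], [6, 7, 7, 6, 9])
print(f"t({df}) = {t:.2f}")  # t(4) = 6.53
```

Research Question 1b, by contrast, is a between-group question, so it would compare the self-explanation and control groups with each other (for example on post-test scores) rather than each student with themselves.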

Method

This section provides a detailed description of the methodology of the experiment, including the characteristics of the sample, the full experimental procedure, and an account of the study materials, measures, and apparatus.

Participants and procedure

The participants were 79 undergraduate students from the biology department at a large European university. The sample consisted of 37 males, 40 females, and two non-binary students. The mean age of the students was 23.8 years (SD = 5.5). The experiment took place in a higher education biology classroom, where the students were part of four lab groups with approximately 20 participants per group. The experimental session took place during a planned biology lesson in which students had to learn about gene transcription using IVR. The procedure in each of the four sessions was identical. The students were first introduced to the content of the lesson and randomly assigned a number that determined their placement in the self-explanation or control group. Students were seated next to each other at long study desks, at least 50 cm apart, so that they had enough space to move their upper body and hands when engaging with the learning content in the IVRLE. The first task was to respond to a web-based pre-test questionnaire that took between 5 and 10 min to complete. The students were then given the IVR headsets and controllers, along with verbal instructions on how to use the equipment, and were encouraged to start the IVR simulation when they felt ready. Three lab assistants/researchers were available to assist the students with technical questions or problems. While immersed in the IVR learning simulation, the students could either sit in their seats or stand up to allow for better movement. The mean time to complete the IVR lesson was 47 min (SD = 14).
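The random allocation described above can be sketched as a shuffle-and-split of the students in one lab session. This is a hypothetical illustration: the study does not state how the drawn numbers mapped onto conditions, so the even split, the function name `assign_conditions`, and the `seed` parameter are assumptions.

```python
import random


def assign_conditions(student_ids, seed=None):
    """Randomly split one session's students into two groups.

    With an odd number of students, the self-explanation group gets the extra one.
    Hypothetical helper for illustration.
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return {
        "self_explanation": shuffled[:half],
        "control": shuffled[half:],
    }


# A session of roughly 20 students, numbered 1-20.
groups = assign_conditions(range(1, 21), seed=2024)
print(len(groups["self_explanation"]), len(groups["control"]))  # 10 10
```

Shuffling the whole list and then splitting it guarantees near-equal group sizes within each session, which flipping an independent coin for each student would not.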

Students were individually approached by an experimenter immediately after finishing the simulation. The students either remained in their original seats (the control group) or were quietly guided to a separate table (the self-explanation group). They were then given a pen and a sheet with the appropriate instructions (see Appendix 1). All students received brief oral instructions from the experimenter to read the written instructions carefully, follow the guidelines, and avoid any contact with other students during the task. The participants were not aware of the goals of the experimental manipulation and did not discover the differences between the two conditions during the experiment. The experimenters ensured that participants engaged in the required activities by making monitoring rounds. The research assistants instructed the students to respond to a post-test questionnaire as soon as the simulation was finished.

Materials

The immersive VR simulation

The learning intervention consisted of the IVR simulation “Gene Expression Unit: Use sequencing to unveil a gene linked to obesity”, developed by the educational technology company Labster (for a preview of the simulation, see Labster, 2019). The main objective of the simulation was for students to learn how to sequence DNA, using pig samples as a model, and to identify genes in the DNA that are linked to obesity. The simulation depicted a biology laboratory with the necessary equipment, which learners could manipulate in order to complete the assignments. The main tasks in the IVR simulation were preparing samples for next-generation sequencing (NGS), understanding the principles behind the NGS technique, and performing a quantitative polymerase chain reaction (qPCR) experiment with proper controls (see Fig. 1 for screenshots of the simulation). First, the students were required to perform a reverse transcription polymerase chain reaction (RT-PCR) to generate complementary DNA (cDNA) from messenger RNA (mRNA) and to prepare the samples for NGS by adding the proper genetic labels. After sequencing, they were guided through analyzing the data to single out a gene that could be linked to obesity. To confirm the findings, they needed to design a qPCR experiment with proper controls and analyze the resulting curves. The simulation allowed students to see the process inside the NGS machine in the form of 3D close-up animations and to hear auditory explanations of the ongoing processes. During the simulation, students interacted with an auditory virtual agent and a lab pad, through which they received both oral and written instructions, as well as multiple-choice questions that served as a retrieval practice activity and provided a running score that students could use as a form of formative assessment.
The students received 2 points for every correctly answered multiple-choice question at the end of each phase of the learning simulation. Because the students were allowed to consult complementary material within the lab pad when answering the questions, the score mainly reflects how much attention the student paid to the study material at hand. Scores in the IVR simulation could range from 0 to 110 points.

Fig. 1 Screenshots of the IVR simulation “Gene Expression Unit: Use sequencing to unveil a gene linked to obesity”

Post-IVR simulation self-explanation and control instructions

The paper-based materials consisted of two instructive texts: one for the self-explanation condition and one for the control condition. The instructions were handed out to the students after the IVR learning session. Participants in the self-explanation group were asked to write their self-explanation of the learned material on the same sheet of paper that contained the instructions. Participants in the control group only needed to read the instructions on the paper and could then continue with other activities. The instructions are presented in Appendix 1.

The purpose of adding the 10-min self-explanation task to the IVR learning experience for one group of students was to help them reflect on the most critical points of the IVR lesson. They were asked to select the information from the next-generation sequencing lesson that was most relevant to them and write it down in a structure of their own choosing. Three features of the task were highly subjective: the precise IVRLE content selected for reflection, the amount of information the students chose to report, and the quality of the written assignments. Because all students were presented with the same IVR lesson, which explicitly highlighted the most important aspects of next-generation sequencing through cued prompts, participants were expected to have a fairly similar understanding of what constituted the crucial points of the lesson.

The self-explanation task was meant to be self-directed (see Bisra et al., 2018), and its written format was meant to help students structure the information they were processing. Although most self-explanation is expected to happen internally, so that there is no objective way to evaluate the true quality of an individual's self-explanation process, the written reports provided the raters with the cues needed to evaluate the content the students reflected on.

An open-ended prompt format was chosen for the post-learning self-explanation task because the students were already somewhat familiar with the learning content, which would allow for a more idiosyncratic reflection (e.g., Bisra et al., 2018). The self-explanation process was also deliberately not taught to the students, but merely prompted by a written cue (see Appendix 1), so as not to impose a cumbersome reflection format or hinder the formation of their own associations. The task was purposefully placed at the end of the learning activity to separate it temporally from the multiple-choice questions provided during the IVR learning session. Lastly, no learning materials were available during the self-explanation task, since giving one group 10 more minutes with the learning content would have made the conditions incomparable (Bisra et al., 2018), and it could also have led to the boundary condition described by Schworm and Renkl (2006), undermining the students' incentive to exert effort in self-explaining.

To score the self-explanation assignment, clear subject-centered 10-point coding criteria were developed by the biology lecturer responsible for the course. The ratings were performed by two independent raters with knowledge of the subject matter. The complete coding scheme is presented in Appendix 2. According to the coding scheme, each necessary concept from the simulation that was described (not necessarily in the correct order) earned one point, so a self-explanation could receive a maximum of 10 points. The interrater reliability of the scoring was α = .96, indicating that the criteria could be applied consistently across raters. The mean score on the self-explanation task was 1.6 (SD = 1.4; highest observed score = 6). Based on subject-matter expert judgement, a score of 2 was used as the threshold indicating that students had engaged meaningfully enough with the self-explanation task for it to make a difference relative to the control group. This meant that both independent raters had to find at least two key concepts stated correctly in the student's written self-explanation. Using this criterion, 22 of the 41 students in the self-explanation group were classified as engaging meaningfully with the task. This threshold helped us distinguish students who reflected accurately and thoroughly on the critical points of the simulation from those who misunderstood important aspects of the learned concepts, were too general, or may have been too unmotivated to fulfill the task demands.
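The engagement criterion above (both raters must credit at least two correctly stated key concepts) can be written as a small rule. The Python sketch below is illustrative only: it assumes each rater's score is the number of key concepts (0 to 10) that rater credited, and the function name and example scores are hypothetical rather than study data.

```python
from typing import Sequence

THRESHOLD = 2    # at least two key concepts stated correctly (from the text)
MAX_SCORE = 10   # the coding scheme awards at most 10 points


def engaged_meaningfully(rater_scores: Sequence[int], threshold: int = THRESHOLD) -> bool:
    """Return True only if every rater credited at least `threshold` key concepts."""
    for score in rater_scores:
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"score {score} is outside 0..{MAX_SCORE}")
    # An empty set of ratings cannot satisfy the criterion.
    return len(rater_scores) > 0 and all(s >= threshold for s in rater_scores)


# Hypothetical (made-up) ratings from two raters for three students.
examples = {"student_a": (3, 2), "student_b": (2, 1), "student_c": (0, 0)}
for student, scores in examples.items():
    print(student, engaged_meaningfully(scores))
```

Requiring agreement from both raters (rather than, say, averaging their scores) mirrors the stricter wording in the text.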

Pre- and post-test measures

Students completed a pre- and post-test on an online survey platform. The pre-test consisted of demographic questions and a 10-item multiple-choice knowledge test about gene transcription, which was the subject of the IVR learning session. The post-test included the same 10-item multiple-choice knowledge test used in the pre-test, two additional transfer test items, and a cognitive load measure (Andersen et al., 2020).

The knowledge test used at pre- and post-test was designed to assess the students' retention of conceptual and procedural knowledge (e.g., “Why do we need to reverse transcribe the mRNA to cDNA before PCR and Sequencing? (A) Because the RNA cannot be used by the Taq polymerase, (B) Because cDNA is more abundant than mRNA, (C) Because the mRNA cannot be heated, (D) Because the mRNA cannot be sequenced”). Content validity was prioritized by ensuring that the test items covered the broad content of the lesson. The test was scored by giving students one point for each correct answer. A significant positive correlation between the scores obtained in the IVR learning simulation and the retention post-test [r = .59**, n = 69, p < .001] suggests that the knowledge test captures what was learned in the IVR learning environment; that is, students who performed better in the simulation also tended to score higher on the post-test.

In addition to the retention test, we employed two open-ended transfer test items. The aim of the transfer tests was to assess the participants' ability to apply what they had learned during the lesson to new situations. The first transfer item assessed conceptual knowledge transfer (“Describe the flow cell/chip surface and explain how the DNA molecules are bound to it”). The second assessed procedural knowledge transfer (“Describe the three main steps of the bridge PCR step in NGS”). The transfer tests were graded by two biology lecturers. Each question was divided into three sub-questions, each rated 0 or 1 according to a scoring rubric agreed upon before scoring, so a student could obtain a maximum of 3 points on each test. The correlation between the raters' scores was .87, so the mean score across raters was used as the final measure of transfer. There was a significant positive correlation between the conceptual and procedural transfer tests [r = .51**, n = 69, p < .001] (see Table 3). The correlation between the simulation score and the procedural transfer test was also positive and significant [r = .27*, n = 69, p = .03]. However, the correlation between the simulation score and the conceptual transfer test was non-significant [r = .17, n = 68, p = .17].
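The transfer scoring procedure described above can be sketched in Python as follows. This is an illustration only, assuming two raters who each mark the three sub-questions of an item as 0 or 1; the function names and ratings are hypothetical.

```python
from statistics import mean
from typing import Sequence

SUB_QUESTIONS = 3  # each transfer item is split into three sub-questions scored 0 or 1


def rater_total(sub_scores: Sequence[int]) -> int:
    """Sum one rater's 0/1 rubric decisions for a single transfer item (max 3)."""
    if len(sub_scores) != SUB_QUESTIONS or any(s not in (0, 1) for s in sub_scores):
        raise ValueError("expected three sub-question scores, each 0 or 1")
    return sum(sub_scores)


def transfer_score(per_rater: Sequence[Sequence[int]]) -> float:
    """Final transfer score: the mean of the raters' item totals."""
    return float(mean(rater_total(r) for r in per_rater))


# Hypothetical example: rater 1 credits two sub-questions, rater 2 credits one.
print(transfer_score([(1, 1, 0), (1, 0, 0)]))  # 1.5
```

Averaging the two raters' totals means a student's final transfer score can take half-point values between 0 and 3.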

The post-test also included a cognitive load measure by Andersen (2020). Students responded on a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree) to three intrinsic cognitive load items (e.g., “The topic covered in the simulation was very complex”), three extraneous cognitive load items (e.g., “The instructions and/or explanations used in the simulation were very unclear”), and four germane cognitive load items (e.g., “The simulation really enhanced my understanding of the topics covered”). The three scales had Cronbach's alpha reliability coefficients of .85, .75, and .82, respectively.
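For reference, Cronbach's alpha for a scale of k items is α = k/(k − 1) · (1 − Σσᵢ²/σ²ₓ), where σᵢ² is the variance of item i and σ²ₓ is the variance of respondents' total scale scores. The minimal Python sketch below implements this standard formula; it is not the study's analysis code, and the toy ratings in the usage line are made up.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items, each given as a list of respondents' ratings.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances (n - 1 denominator) are used throughout; the ratio is the
    same with population variances as long as the choice is consistent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")
    item_var_sum = sum(variance(item) for item in items)
    # Each respondent's total score across all items of the scale.
    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Two hypothetical items rated by three respondents.
print(round(cronbach_alpha([[1, 2, 3], [1, 3, 2]]), 3))  # 0.667
```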

Apparatus

The VR simulation was displayed stereoscopically through a Lenovo Mirage Solo (2018) standalone head-mounted display (HMD) with headphones attached. The HMD has a separate lens for each eye, through which users view a high-fidelity display, and the attached headphones deliver the simulation's audio. A head-motion tracking system controls the interaction in the VR set-up, allowing users to change their field of view within the 360-degree virtual environment as they turn their heads.

Results

We first examined whether the students in the two conditions differed in the basic characteristics of age and gender. An independent samples t-test indicated that the groups did not differ significantly in age, t(77) = .25, p = .62. Furthermore, a chi-square test indicated that the groups did not differ significantly in gender distribution, χ2(2, N = 79) = 4.76, p = .09. In sum, we found no evidence that the conditions differed in these basic characteristics.
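With 2 degrees of freedom, the chi-square distribution reduces to an exponential distribution with mean 2, so its upper-tail probability has the closed form p = exp(−χ²/2). The short Python sketch below is a hand check, not the study's analysis code; it uses this identity to confirm that χ²(2) = 4.76 corresponds to p ≈ .093, consistent with the reported p = .09.

```python
import math


def chi2_sf_df2(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees of freedom.

    With df = 2 the chi-square distribution is exponential with mean 2,
    so P(X >= x) = exp(-x / 2).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-statistic / 2)


# Reported gender comparison: chi2(2, N = 79) = 4.76
print(round(chi2_sf_df2(4.76), 3))  # 0.093
```

The shortcut only holds for df = 2; other degrees of freedom require the general chi-square survival function.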

RQ 1: Did the IVR simulation increase students’ knowledge, and were there differences between the conditions on knowledge acquisition and transfer?

We used a robust Bayesian estimation model in which the outcome follows a Student's t-distribution instead of a Gaussian distribution, which makes the model less sensitive to outliers. In Tables 1 and 2, we report the mean predictions and effects of the model of the form \({y}_{ik}\sim T\left(\nu,{\mu }_{ik},{\sigma }_{ik}\right)\), where \(\nu\) is the degrees-of-freedom parameter of the t-distribution.
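The model statement above uses a Student's t likelihood with a degrees-of-freedom parameter ν (the first argument of T), location μ_ik, and scale σ_ik. The Python sketch below writes out the corresponding log-density and compares it with a Gaussian to illustrate why heavy tails make such a model less sensitive to outliers. It is only an illustration: the software, priors, and structure of μ_ik used in the study are not reported here, and the function names are hypothetical.

```python
import math


def student_t_logpdf(y: float, nu: float, mu: float, sigma: float) -> float:
    """Log-density of a location-scale Student t, i.e. y ~ T(nu, mu, sigma)."""
    if nu <= 0 or sigma <= 0:
        raise ValueError("nu and sigma must be positive")
    z = (y - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


def normal_logpdf(y: float, mu: float, sigma: float) -> float:
    """Log-density of a Gaussian, for comparison."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


# Compare a typical observation with one lying 6 scale units from the mean.
for y in (0.0, 6.0):
    print(y, round(normal_logpdf(y, 0, 1), 2), round(student_t_logpdf(y, 3, 0, 1), 2))
```

Because the t log-density falls off only logarithmically in |z| for large deviations, rather than quadratically as in the Gaussian, a single extreme score exerts much less pull on the estimated location and scale.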

Table 1 Posterior predictions and 95% confidence intervals for the dependent variables measured in the pre- and post-test
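Table 2 Posterior predictions and 95% confidence intervals for the dependent variables measured in the pre- and post-test, comparing the students who engaged meaningfully with the self-explanation task with the control group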

The results indicate a large increase in retention test scores over time, with an effect size of βPosterior = 3.29 (95% CI ranging from 1.78 to 4.80; see Table 1). This indicates that the IVR science simulation was effective in facilitating learning about next-generation sequencing.

The self-explanation group had a higher retention post-test score than the control group (βPosterior = .37); however, the 95% confidence interval for this effect ranged from −.34 to 1.06, indicating that there is not enough support to conclude that the groups differed on this outcome. The Bayesian estimation model did not converge for the conceptual and procedural transfer outcomes, so we report the results of independent samples t-tests instead. The difference between the self-explanation group (M = .46, SD = .58) and the control group (M = .50, SD = .77) was not significant for conceptual transfer, t(76) = .243, p = .808, d = .054. Likewise, the difference between the self-explanation group (M = .67, SD = .91) and the control group (M = .54, SD = .76) was not significant for procedural transfer, t(77) = −.696, p = .488, d = .16. Therefore, no significant differences were found between the self-explanation and control groups on the learning outcomes measured in this study.
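The d values above can be approximately reproduced from the reported means and standard deviations using the common pooled-SD definition of Cohen's d. The Python sketch below assumes that definition and, since the per-outcome group sizes are not given, assumes equal groups of 39 (a hypothetical value). Under these assumptions it yields about .16 for procedural transfer and about .06 in absolute value for conceptual transfer, close to the reported .16 and .054; the remaining discrepancy may reflect the actual group sizes or the exact formula used.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference (m1 - m2) / pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


# Hypothetical equal group sizes; with equal n the pooled SD is simply the
# root mean square of the two SDs, so the exact value of n does not matter.
n = 39
print(round(cohens_d(0.67, 0.91, n, 0.54, 0.76, n), 2))  # procedural transfer
print(round(cohens_d(0.46, 0.58, n, 0.50, 0.77, n), 2))  # conceptual transfer
```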

A post-hoc analysis was performed to investigate whether there were differences in knowledge acquisition, conceptual transfer, and procedural transfer when including only the students who engaged meaningfully in the self-explanation task (i.e., those who scored 2 or more points on the written self-explanation task). The first column in Table 2 shows that the students in the self-explanation group who engaged meaningfully with the task had a significantly higher knowledge retention post-test score than the control group (engaging self-explanation group: βPosterior = .64, 95% CI [.15; 1.45]; control group: βPosterior = .44, 95% CI [−.09; 1.01]); however, this difference was due to a higher pre-test score. The students who engaged meaningfully with the self-explanation task also showed a larger effect on procedural transfer than the control group, although the possibility that this effect is non-significant cannot be ruled out. Finally, the model did not converge for the conceptual transfer test, but an independent samples t-test indicated no significant difference between the students who engaged meaningfully with the self-explanation task (M = .69, SD = .61) and the control group (M = .50, SD = .77), t(60) = −1.01, p = .24, d = .27. We therefore conclude that the students who engaged meaningfully with the self-explanation task did not perform better than the control group on knowledge retention; there was a positive effect on procedural transfer, although this effect may not be significant; and there was no significant difference between the groups on conceptual transfer.

In further analyses, we investigated the correlations between students' performance in the simulation and the other outcomes in the study (see Table 3). There were significant positive correlations between the score obtained in the IVR simulation and the retention post-test score [r = .60**, n = 69, p < .001], as well as the procedural transfer test score [r = .27*, n = 69, p = .03]. However, the simulation score was not significantly correlated with the pre-to-post-test gain on the retention test [r = .22, n = 58, p = .11]. Thus, performance in the IVR lesson was significantly associated with knowledge retention and procedural transfer outcomes, but not with the size of the pre-to-post-test change in retention test scores.
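Significance of a Pearson correlation is conventionally assessed with t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. As a hedged sanity check (the software and test variant used in the study are not stated), the Python sketch below computes this statistic; for the correlation between the simulation score and procedural transfer (r = .27, n = 69), it gives t(67) ≈ 2.30, which corresponds to a two-tailed p of roughly .025, in line with the reported p = .03.

```python
import math


def correlation_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least three pairs")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Simulation score vs procedural transfer: r = .27, n = 69, so df = 67.
print(round(correlation_t(0.27, 69), 2))
```

Converting t to an exact p-value requires the t-distribution's CDF (e.g., from a statistics library), which is omitted here to keep the sketch dependency-free.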

Table 3 Summary of Intercorrelations, Means and Standard Deviations of Scores on the Conceptual Transfer, Procedural Transfer, Knowledge Retention, Intrinsic Cognitive Load, Extraneous Cognitive Load, Germane Cognitive Load and IVR simulation scores

Table 3 also shows how students' self-explanation scores were correlated with their learning outcomes. The correlation between self-explanation scores and knowledge retention test scores was positive but not significant [r = .29, n = 34, p = .09]. However, the correlation between self-explanation scores and the procedural transfer test was positive and significant [r = .47**, n = 34, p = .01]. Finally, the correlation between self-explanation scores and the conceptual transfer test was positive but did not reach significance [r = .33, n = 33, p = .07]. Thus, the quality of students' self-explanations was significantly correlated with procedural knowledge transfer, but not with the other outcomes.

RQ 2: Did the addition of the self-explanation task increase students' intrinsic, extraneous, and germane cognitive load?

Table 1 also shows that the self-explanation group reported significantly higher intrinsic cognitive load than the control group, βPosterior = .35 (95% CI [.05; .68]). Similarly, the self-explanation group reported significantly higher extraneous cognitive load than the control group, βPosterior = .37 (95% CI [.03; .73]). Finally, the self-explanation group reported significantly lower germane load than the control group, βPosterior = −.38 (95% CI [−.75; −.01]). In sum, students in the self-explanation group reported higher intrinsic and extraneous cognitive load but lower germane load than the control group. This is the third major empirical finding of this study.

Table 2 presents these results for the students who engaged meaningfully in the self-explanation task (i.e., scored 2 or more points). These results indicate that the students in the self-explanation group who engaged meaningfully with the task reported higher intrinsic load than the control group, βPosterior = .37 (95% CI [.03; .73]), whereas the results for extraneous load, βPosterior = .23 (95% CI [−.12; .58]), and germane load, βPosterior = −.22 (95% CI [−.65; .21]), were not significant.

Finally, Table 3 shows how the quality of students' self-explanations was correlated with perceived cognitive load. Students' self-explanation scores were negatively correlated with intrinsic load (IL; r = −.42) and extraneous load (EL; r = −.24) and positively correlated with germane load (GL; r = .17), but none of these correlations reached statistical significance.

Discussion

First, the current paper aimed to contribute further evidence on the effectiveness of IVRLEs in science education (e.g., Klingenberg et al., 2020; Meyer et al., 2019; Petersen et al., 2020). The results of the current study support the notion that IVR can be an effective learning medium in science education, as reflected by the significant and large improvement in knowledge retention from before to after the IVR lesson. This finding may encourage secondary and higher education institutions to adopt IVR as a science learning tool. Moreover, the finding that students' engagement with the IVRLE, as reflected in their simulation scores, was associated with later learning outcomes may motivate instructors to encourage students to interact actively with IVRLEs during the sessions.

Second, the study set out to test whether a supplemental self-explanation assignment could enhance learning. Contrary to expectations, giving students a self-explanation task did not increase knowledge retention, conceptual transfer, or procedural transfer. However, students who engaged meaningfully in the self-explanation task (i.e., achieved a score of 2 or more points) had a medium-sized but non-significantly higher procedural transfer score than the control group, and no such advantage was found for knowledge retention. These results contrast with those of most previous studies, which have found positive learning effects of scaffolding strategies. For instance, Parong and Mayer (2018) found that asking students to apply the GLS of summarization led to better conceptual knowledge retention. Meyer et al. (2019) showed that adding a pre-training task led to better knowledge acquisition and transfer in IVR than in a video condition. Makransky et al. (2020a) found that engaging in the GLS of enactment led to higher scores on procedural knowledge and transfer tests in IVR. Finally, Makransky et al. (2020a) found that assigning students the GLS of teaching increased knowledge retention and transfer in IVR.

One possible explanation for the overall non-significant effect of self-explanation is that the instructions for the self-explanation task may not have provided some students with sufficiently structured prompts for thorough engagement in the activity. Some students may not have been motivated enough to work on the topic spontaneously and with full dedication when they had to come up with their own structure. However, the results suggest that some students engaged with the task noticeably more than others and may also have benefited from it. These inconsistencies may be explained by the reasoning of Bisra et al. (2018), who propose that the optimal prompt strength may depend on students' own ability to self-explain.

What differentiates this study from previous studies is that the IVR lesson in our study lasted 47 min on average, while the IVR lessons in previous studies were much shorter. The IVR simulation used in the experiments of Makransky et al. (2020a), Meyer et al. (2019), and Parong and Mayer (2018) lasted approximately 10 min, and the IVR simulation in the Klingenberg et al. (2020) experiment lasted 36 min. A longer lesson is likely to be more cognitively demanding, which may leave learners with fewer resources for an additional task afterwards. Consistent with this, the students in the self-explanation condition in our study reported significantly higher intrinsic and extraneous cognitive load and lower germane cognitive load than the control condition. Moreover, the self-explanation scores were significantly correlated with the procedural transfer scores and were positively, albeit not significantly, correlated with both knowledge retention and conceptual transfer scores, which suggests that the amount of engagement with a GLS may affect learning outcomes.

Our finding of a higher level of cognitive load in the self-explanation group contradicts the results of Zhao et al. (2020), who reported that a conceptually similar GLS, summarization, applied at the end of an IVR biology lesson significantly reduced learners' perceived effort while not leading to better comprehension. One difference between the two studies is that our experiment used a biology IVRLE in which students could actively explore and manipulate virtual objects while solving a scientific problem, thus tapping into some of the most recognized effective practices of STEM education (e.g., Kelley & Knowles, 2016; Zemelman et al., 2005). In contrast, the biology simulation used by Zhao et al. (2020) was a virtual animation in which students were passive viewers. Therefore, the characteristics of the IVRLE may play a role in whether cognitive load is reduced or increased by a scaffolding GLS technique.

Another difference between our study and most previous studies (with the exception of Klingenberg et al., 2020) is that this study was conducted as part of a mandatory course rather than as a low-stakes lab experiment. In general, it has been argued that conducting more research in natural learning settings will increase the validity of experimental findings on IVR learning (e.g., Gerjets & Kirschner, 2009). Also, our self-explanation task was a post-intervention task applied after a long and cognitively demanding IVR learning simulation, whereas Parong and Mayer (2018) used the summarization task after individual segments of the simulation. Such a segmented design was not feasible in our setting, because the study was conducted in an applied educational context using an existing, commercially available IVRLE with approximately 20 students at a time, which would have made organizing such a procedure problematic.

Future research

Our study tested only one type of learning strategy applied after an IVR lesson and is thus inconclusive about whether similar but slightly varied strategies support learning in a virtual medium. Since the self-explanation task can be conducted in various forms, which can lead to different outcomes, future research should investigate different prompt formats (e.g., multiple-choice prompts, fill-in-the-blank prompts, or diagrams), as well as the benefits of having students self-explain at different times relative to the IVR learning simulation (Bisra et al., 2018).

Since one major difference between the current study and previous studies is the length of the IVR learning experience, it would be beneficial to investigate the value of using GLS strategies for lessons that differ in terms of length and cognitive demand. Our study also purposefully tested the students’ learning in terms of both retention and transfer, because using the GLS strategy of self-explanation may benefit one type of information acquisition more than the other. Future studies should construct tests that extensively and separately measure conceptual and procedural learning. This would allow for further scrutiny of the affordances of different types of self-explanation processes.

As our study used only the GLS of self-explanation and did not compare the effectiveness of other GLSs in combination with a cognitively demanding IVRLE, it remains unclear which GLS would lead to better STEM learning results. Future studies should therefore compare self-explanation with other GLSs such as drawing, explaining, or enacting, which have been shown to increase procedural knowledge and transfer in previous IVR research (e.g., Makransky et al., 2020a).

Conclusion

This study contributes to the evidence supporting the use of IVR as an effective learning medium for acquiring complex scientific conceptual and procedural knowledge. These findings could encourage educational institutions to offer IVR-based interventions as part of secondary and higher education STEM lessons. For educators interested in information-structuring techniques that could make learning in IVRLEs more effective, this study suggests that a self-explanation task may only benefit students who engage meaningfully with the activity. However, adding this scaffolding technique after a long and cognitively demanding IVRLE session led to higher perceived intrinsic and extraneous cognitive load and lower germane load. Finally, we strongly recommend that future research compare different prompt formats for self-explanation after an IVRLE, and compare self-explanation with other scaffolding techniques, within authentic STEM higher education settings.