Introduction

Immersive virtual reality (IVR) refers to a computer-simulated environment that is rendered in three-dimensional space, allowing users to feel like they exist in an authentic world instantly and without constraint (Chittaro et al., 2018; Rodriguez et al., 2018). Although various devices can be used to create immersive environments, by far the most common are head-mounted displays (HMDs; Jensen & Konradsen, 2018). Due to recent advances in the complexity and affordability of HMDs, it is now possible for the general public to experience events and environments that would otherwise be inaccessible. Indeed, non-specialists can now effectively ‘climb’ Mount Everest, explore the streets of an historically accurate Pompeii, tour the International Space Station, and even take Fantastic Voyage-style field trips inside the human body. Although the content of these experiences can be accurately reproduced in two-dimensional videos, the immersive nature of VR stimulates an experience that is phenomenologically different to watching a 2D video or a 3D machinima (Makransky & Lilleholt, 2018). Thus the idea that IVR can improve learning and teaching practices seems almost self-evidently true. With that in mind, the aim of the present study was threefold: first, to establish whether the effects of IVR are detectable at the level of individual cognitive abilities; second, whether the hypothesised effects manifest as improved learning outcomes; and third, whether these effects can be obtained under conditions that would make IVR useful to educators balancing differentiated learning practices with general classroom management (Roiha, 2014). Therefore, the present experiment was designed to provide the simplest possible test of IVR’s efficacy as an educational tool by comparing the recall of learners viewing a 360° video clip via a HMD or a 2D flatscreen monitor.

Comparing IVR with traditional learning methodologies

A growing number of studies have examined the use of IVR in a wide range of educational settings, assessing outcomes in terms of cognitive re-education following traumatic brain injury (De Luca et al., 2019), skills development (Howells et al., 2007; Xie et al., 2021), and health and safety training (Clevenger et al., 2015). Evidence suggests that IVR-delivered educational content can engage learners on emotional, motivational and empathetic levels (Calvert & Abadia, 2020; Makransky & Lilleholt, 2018). A typical examination of how IVR compares to other modes of learning was provided by Howells et al. (2007). They compared the performance outcomes for junior orthopaedic trainees following either a simulation-based training protocol rendered in IVR or a traditional in-theatre training method. Those in the simulation group performed virtual knee arthroscopies, while their in-theatre counterparts followed the traditional protocol of theory, observation and assistance during live surgery. Following six training sessions taken over seven days, the trainees were assessed in a live setting and rated on several competencies by experienced surgeons. Results were twofold. First, trainees in the simulator group were rated significantly higher on standardised measures of orthopaedic competency. Second, the simulator group outscored their counterparts even on kinematic measures of performance, suggesting that virtual interaction can improve real-world physical dexterity.

Although Howells et al. (2007) demonstrate the advantages of IVR over an extended training period, other studies have assessed learning outcomes following a single IVR exposure. In this regard, results are mixed. A recent study by Parong and Mayer (2021) had undergraduate students watch a video depicting a series of battles between Australian and Japanese forces in World War II. The video contained historically accurate battle simulations and narrations, as well as interviews with veterans who took part in the conflict. Importantly, participants viewed the video either on a standard 2D desktop monitor or while fully immersed using a HMD. Participants were asked to rate their subjective experience on variables such as presence, enjoyment and motivation. Learning outcomes were objectively measured using questions related to the displayed content, while the neural and physiological correlates of cognition were obtained through electroencephalography (EEG) and skin conductance responses (SCR), respectively. Results showed that although virtual immersion improved affective processing, with higher levels of enjoyment and lower levels of boredom, the benefits were not reflected in measures of cognitive processing or performance outcomes.

In contrast, Christopoulos et al. (2022) examined the learning outcomes of 10th and 11th graders taking part in an Escape Room-style game designed to facilitate learning about cell biology. The game required students to solve a series of puzzles relating to the internal structures of a cell, thus enabling them to understand the role of enzymes and catalysts in normal cell function. Students performed the activity while fully immersed in the escape room via a HMD, or accessed the machinima via a PC or portable device. Importantly, the interactive and exploratory nature of the activity was preserved in both viewing conditions, as the gameplay mechanics allowed players to respond using computer peripherals such as touchpads and touchscreens.

In accordance with Parong and Mayer’s study, Christopoulos et al. also recorded subjective and objective measures of learning, indexing knowledge acquisition and retention at different points pre- and post-intervention. However, unlike the earlier study, a positive effect of IVR was obtained on learning outcomes: Relative to students viewing the machinima, IVR learners showed significantly greater knowledge retention following a lag of one week, although this had dropped to the same level as the machinima group by the four-week mark. On the affective measures, the IVR group reported greater levels of engagement than did their machinima counterparts, experiencing greater enjoyment, motivation, and learning satisfaction. Despite these clear benefits, however, the IVR group did not rate VR as offering improved cognitive benefits or more effective learning. This is an interesting finding that raises questions about the role of IVR in the classroom; a point to which we shall return below and in our Discussion.

Empirical challenges when testing immersion effects

The above studies have been described in detail to highlight the challenges faced by researchers investigating immersion effects on performance. Indeed, one of the foremost difficulties happens to arise from IVR’s principal strength—that is, its ability to create environments in which users feel immersed. The issue is that immersion is not a single variable that can be switched on or off according to the demands of an empirical test. Rather, it is the combination of multiple variables that determine how information sampled from the external environment in analogue form by the human sensory system can be replicated in digital form. Taking examples from only the visual modality, for instance, the human visual system uses a variety of cues to construct three-dimensional models of an environment. These include important binocular cues such as ocular vergence angle (Collewijn & Erkelens, 1990; Mon-Williams et al., 2000; Richard & Miller, 1969; Ritter, 1977; Viguier et al., 2001) and retinal disparity (Bishop, 1989; Mayhew & Longuet-Higgins, 1982), but also monocular cues such as retinal blurring (Mather, 1997; O’Shea et al., 1997), ocular accommodation (Mon-Williams & Tresilian, 1999, 2000), and the surface texture, contrast and shading of observed objects (Gonzalez & Perez, 1998; Johnston, 1991; Johnston et al., 1993; O’Shea et al., 1994). Thus, the problem is that immersion is actually a composite of all of these variables, whereas a rigorous scientific test demands the manipulation of only one.

When comparing learning under varying conditions of immersion, therefore, the challenge is in finding a trade-off that will allow researchers to retain as many of the features as possible that make an IVR experience so compelling, while selecting a control that is matched closely enough to make the comparison meaningful. This trade-off is evident in all of the studies cited above. For instance, the careful stimulus matching in Parong and Mayer’s (2021) viewing conditions is laudable, as learners received the same visual content and audio narration in both IVR and 2D video. However, as the authors themselves point out, some objects were interactable in IVR but not in the 2D video. Thus the critical differences between the two treatments extended to more than just the level of immersivity—the active versus passive engagement with the content may also have impacted on the learners’ experience and outcomes (Chrastil & Warren, 2012; James et al., 2002; Minhas et al., 2012).

Christopoulos et al. (2022) also made a commendable attempt to match immersive and non-immersive escape rooms, and showed a positive effect of immersion on knowledge retention. However, their chosen task leaves one unable to establish precisely which aspect (or aspects) of cognition benefit from immersion. Indeed, one of the reasons why escape rooms are increasingly popular among educators is that they exercise a number of basic cognitive and executive functions, such as spatial, divided and temporal attention, encoding, retention and retrieval of information, numerosity, reading, pattern recognition, logic and reasoning (e.g., Makri et al., 2021; Veldkamp et al., 2020). As such, a sensitive examination of IVR’s effects on cognition requires isolation of a single ability.

A final challenge concerns how positive research findings of IVR on cognition can be successfully translated into applied use within the classroom. The research cited above is important in identifying the conditions under which IVR can be effective, but if the benefits of IVR require extended periods of training (Howells et al., 2007), improve only the affective experience of learners (Parong & Mayer, 2021), or yield short-lived performance gains (Christopoulos et al., 2022), then the day-to-day practicalities of classroom management may not allow educators to commit the resources required to make IVR-based activities a viable part of school curricula.

With those issues in mind, the purpose of the present experiment was to examine the effect of IVR on a single cognitive ability that has direct bearing on educational attainment, and in a way that has maximum utility in a classroom setting. This is in contrast with the studies cited above, whose experimental designs and selected paradigms make it impossible to isolate individual cognitive abilities. To that end, forty undergraduate learners viewed a short (three-minute) video clip depicting a fictional scene taking place in an art gallery. Participants viewed the clip via an (‘immersive’) HMD or on a (‘non-immersive’) touchscreen monitor. The clip appeared in 2D in both cases but, as it was filmed in 360°, participants were able to control where they looked by moving their head in the immersive condition or by swiping the laptop touchscreen in the non-immersive condition. Following the film’s conclusion, participants were given a surprise recall test comprising 20 questions about the clip. There is a long history of using surprise recall as a means to measure incidental learning processes (e.g., McLaughlin, 1965). Thus the strengths of the present method were threefold: first, the visual stimuli were identical in both viewing conditions; second, both devices allowed learners the same level of interaction with the content; and finally, the surprise recall test measured basic memory processes in the absence of instruction provided by the experimenter, and of preparatory and motivational differences among the learners. As such, if immersivity has a positive effect on learning, then recall rates ought to be higher for IVR learners than for their 2D counterparts.

Method

Participants

A total of 40 Psychology undergraduate participants were recruited from the University of Hull, UK. All were volunteers who responded to a research advertisement posted in the researchers’ home department, with the first eligible 20 male and 20 female respondents selected to take part. The only eligibility criterion was for participants to have (self-reported) normal or lens-corrected binocular vision. To ensure an equal number of males and females in each group, participants were pseudorandomly assigned to the two immersivity conditions. Participants in the ‘immersive’ condition had a mean age of 20.7 (SD = 0.87), and 20.9 (SD = 1.16) in the ‘non-immersive’ condition. This difference was not significant, t(38) = 0.77, p = .45.
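The allocation procedure is described only as pseudorandom and balanced for sex. As a minimal sketch of one way such an allocation could be produced, participants can be shuffled within each sex and then split evenly between the two conditions. The function name, participant identifiers and seed below are hypothetical, and this is not necessarily the procedure the researchers followed.

```python
import random


def assign_conditions(participants, seed=None):
    """Split each sex evenly between the two viewing conditions.

    `participants` is a list of (participant_id, sex) tuples.
    Returns a dict mapping participant_id to a condition label.
    """
    rng = random.Random(seed)  # seeded generator, so the allocation is reproducible
    by_sex = {}
    for pid, sex in participants:
        by_sex.setdefault(sex, []).append(pid)

    assignment = {}
    for ids in by_sex.values():
        rng.shuffle(ids)  # pseudorandom order within each sex
        half = len(ids) // 2
        for pid in ids[:half]:
            assignment[pid] = "immersive"
        for pid in ids[half:]:
            assignment[pid] = "non-immersive"
    return assignment


# Hypothetical roster: 20 male and 20 female volunteers.
participants = [(f"P{i:02d}", "male" if i < 20 else "female") for i in range(40)]
groups = assign_conditions(participants, seed=1)
```

With 20 males and 20 females, this yields 10 of each sex per condition, matching the group composition reported above.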

Apparatus and stimuli

The immersive condition was carried out using an Oculus Rift CV1 HMD. The HMD contains two OLED panels, resulting in a resolution of 1080 × 1200 pixels per eye, at a refresh rate of 90 Hz, and with a field of view of 110°. The device was powered by a Pentium PC running Windows 10. The stimulus video was presented on the Rift using Skybox (Skybox, 2019, version 0.2.0.1), a free piece of software capable of rendering and outputting 360° video to VR devices.

The non-immersive condition was carried out on a Microsoft Surface laptop also operating on Windows 10. The 13.5″ touchscreen display had a native resolution of 2256 × 1504 and a refresh rate of 60 Hz. The stimulus was presented on the laptop using the native Windows application ‘Films & TV’ (version 10.19031.1141.0), which allowed for playback and interaction with the stimulus via the touchscreen. The film was displayed at a quality of 1080s on both devices.

The stimulus used in both viewing conditions was a three-minute film entitled Do Not Touch (Leigh & Wickert, 2018; see Fig. 1). The video was shot in 360°, placing observers in the centre of the scene and allowing them to pan the video in any direction. The film depicts a fictional scenario set in an art gallery. The protagonist, after finding out he can ‘enter’ into the paintings, triggers an alarm and is subsequently pursued by two guards through the various artworks as he attempts to flee, with the visual style and editing of the characters mirroring the painting in which they move around. The viewer does not enter the paintings themselves, but rather observes them as if walking along the corridor of the art gallery.

Fig. 1

Four stills from the learning content used in the present study: a 360° three-minute short film titled ‘Do Not Touch’, which depicts two security guards chasing a protagonist through the various paintings within an art gallery. See Leigh and Wickert (2018; https://www.youtube.com/watch?v=ecajlIKwlCg) for the full clip (best viewed in context using a HMD device or laptop).

A surprise cued-recall test was also administered, comprising 20 questions based on details from the video stimulus, such as vocalizations, physical actions and observable character features. Fifteen of these questions related to events at specific timepoints within the video, with the intervals between these spaced relatively equally. An example of one such target question is “What is the name of the newspaper the protagonist pretends to read?”. The remaining five items were foils, concerning details and events that did not occur in the film, e.g., “What colour is the horse in the first 'wild-west' painting?” (with no horse being present). In these cases, responses indicating the negative were coded as correct. Questions were presented in a unique random order for each participant on the same device on which they viewed the film, with answers inputted via a keyboard. The full list of questions can be found in the “Appendix”.
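The coding rule for target and foil items can be illustrated with a short sketch. In the study itself, responses were scored by human raters; the code below only shows the logic of the rule. The item labels and the set of phrasings treated as "indicating the negative" are hypothetical, and the accepted synonyms for the shoe-colour item are those mentioned in the Results. Whether a "don't know" response to a foil should count as correct is not stated in the text, so the sketch treats it as incorrect by assumption.

```python
# Hypothetical answer key: a set of accepted answers marks a target item;
# None marks a foil (an event or detail that does not occur in the film).
ANSWER_KEY = {
    "shoe_colour": {"brown", "tan", "beige"},  # synonyms noted in the Results
    "horse_colour": None,                      # foil: no horse is present
}

# Assumed phrasings that indicate the negative (not taken from the source).
NEGATIVE_RESPONSES = {"n/a", "none", "no horse", "there was no horse"}


def score_response(item, response):
    """Return 1 for a correct response and 0 otherwise."""
    text = response.strip().lower()
    accepted = ANSWER_KEY[item]
    if accepted is None:
        # Foil item: only a response indicating the negative is correct.
        return 1 if text in NEGATIVE_RESPONSES else 0
    return 1 if text in accepted else 0


def total_score(responses):
    """Sum item scores over a dict mapping item label to typed response."""
    return sum(score_response(item, r) for item, r in responses.items())
```

For instance, `total_score({"shoe_colour": "Tan", "horse_colour": "N/A"})` counts both responses as correct, whereas answering "black" to the foil would score zero for that item.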

Design and procedure

The experiment employed a single-factor, between-subjects design comparing the effect of immersion on cued recall (see Fig. 2).

Fig. 2

The non-immersive (panel A) and immersive (panel B) viewing conditions in the experiment. Participants viewed and interacted with a three-minute film depicting events taking place in an art gallery. Following the film’s conclusion, participants completed a surprise recall test designed to measure incidental learning. Images are not shown to scale.

Participants were tested individually while seated in a dimly-lit laboratory. Following informed consent, participants were given slightly different instructions according to the viewing condition to which they were assigned. Participants in the immersive condition were instructed to put on the headset and perform an acuity check to ensure the HMD lenses were positioned correctly. This consisted of two green horizontal lines with edges that only appeared sharp when the HMD was in the optimal medial and lateral positions. When completed, the camera position was then centred, and the title scene of the stimulus film was presented. Participants were asked to read this aloud so as to verify that the acuity check was valid and, if not, the check was repeated. In the ‘non-immersive’ condition participants did not have to undergo any such calibration tasks. Instead, they were presented with the title screen of the stimulus and shown how to use the touchscreen to interact with and pan the video in different directions. Once participants were comfortable with this, the stimulus was reset, such that the title was once again in the centre of the display.

After set-up was completed, participants in both conditions were informed that they would be shown a short 360° video clip and, while free to look around (either physically or by using the touchscreen), they should keep the scene focused on the protagonist. The video was then played in its entirety, with the ability to pause, fast-forward or rewind the video disabled for participants. Once completed, participants were informed that they would be presented with 20 questions based upon what they had just seen. It was emphasized that they should attempt to answer each question as accurately as possible, but that if a particular answer could not be remembered, or if they had no memory of the event taking place, to respond with ‘don’t know’ or ‘N/A’. Participants were then presented with the recall test. Upon completion of this task, participants were given a debriefing and thanked for their time.

Results

Participant responses were coded as either correct (1) or incorrect (0), yielding a maximum correct score of 20. Some responses required a subjective judgement to ascertain their accuracy. For example, the question “What colour shoes was the protagonist’s (real) friend wearing?” was potentially answerable as any synonym of “brown”, such as “tan”, or “beige”. Therefore, all participant responses were scored independently by two different scorers who were blinded to the immersivity conditions of the participants. One author of this study served as one scorer, while the other, a doctoral student in biomedical science, had no other involvement in the research. The unambiguous design of the questions and answers led to very high inter-scorer reliability, r(38) = 0.997, p < .001.
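The inter-scorer reliability statistic is a Pearson correlation between the two scorers' totals, with degrees of freedom equal to the number of participants minus two (40 − 2 = 38). The sketch below shows the calculation using only the standard library. The two score lists are invented purely for illustration and are not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical totals (out of 20) awarded by two scorers to six participants.
scorer_a = [9, 7, 11, 8, 10, 6]
scorer_b = [9, 7, 11, 8, 9, 6]

r = pearson_r(scorer_a, scorer_b)
df = len(scorer_a) - 2  # degrees of freedom for testing r against zero
```

A single one-point disagreement in this small example already pulls r slightly below 1, which gives a sense of how closely the two scorers must have agreed to reach the reliability reported above.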

The purpose of this study was to examine the influence of IVR on basic memory processes. Cued-recall scores were submitted to an independent samples t-test which showed that recall was significantly higher in the immersive condition (M = 9.30, SD = 1.75) than in the non-immersive condition (M = 7.55, SD = 2.21), t(38) = 2.77, p = .009. As such, greater immersivity improves incidental learning and recall ability.
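As a check on the reported comparison, the test statistic can be approximately recovered from the summary statistics alone. The sketch below assumes equal group sizes of 20 and a pooled-variance (Student's) t-test, which is consistent with the reported 38 degrees of freedom. It also computes Cohen's d from the same values. This effect size is not reported in the text and is derived here for illustration only.

```python
import math


def independent_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic, degrees of freedom and Cohen's d."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / standard_error
    d = (m1 - m2) / math.sqrt(pooled_var)  # standardised mean difference
    return t, df, d


# Means and SDs as reported above; n = 20 per condition.
t, df, d = independent_t_from_summary(9.30, 1.75, 20, 7.55, 2.21, 20)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Because the inputs are rounded to two decimal places, the recomputed t (about 2.78) differs marginally from the reported value of 2.77. The derived d of roughly 0.88 should likewise be read as an approximation.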

Discussion

The present study sought to examine the effect of IVR on recall using an incidental learning paradigm. In the context of learning and education, previous research has demonstrated greater efficacy for IVR interventions over traditional training methods, at least in the case of extended training programmes (e.g., De Luca et al., 2019; Howells et al., 2007). In the case of single exposures, however, evidence suggests that while IVR improves affective feelings of engagement and enjoyment, it offers no improvement (Parong & Mayer, 2021) or relatively short-lived improvement (Christopoulos et al., 2022) on learning outcomes. The present study examined the effect of immersion on incidental learning and recall memory. Participants viewed a short film via either an immersive HMD or a non-immersive 2D touchscreen, and were able to interact with the film using head movements or touch gestures, respectively. Following this, they were given a surprise cued recall task covering events in the film, and results showed that learners in the immersive condition recalled significantly more content than did their non-immersive counterparts. These results suggest a useful role for IVR as a way of delivering educational content, and further that learners naturally engage in VR-delivered content without any special instruction or preparation.

Our findings raise interesting questions about the mechanisms through which IVR can facilitate human cognition, and candidates are likely to include both arousal and attention. The ability of IVR to elicit physiological arousal cannot be overstated. Even though HMDs capable of displaying VR content have been in use for several decades, it is only within the last few years that their cost-per-unit has made them accessible to casual users (Hodgson et al., 2015). Thus for most people, IVR remains a highly novel experience and one that is eminently capable of increasing arousal relative to experiencing the same content on other devices (see Tian et al., 2021). The relationship between arousal and performance is famously depicted as an inverted U-shaped curve (Teigen, 1994; Yerkes & Dodson, 1908): Performance is maximal with moderate levels of arousal, but quickly drops off when arousal is too high or too low. Easterbrook (1959) further advanced the arousal-performance model by positioning attention as an intermediary factor. In other words, optimal attentional focus is maintained during moderate levels of arousal, but becomes narrowed when arousal passes thresholds at either end of the inverted curve. Thus, by controlling the gatekeeper to incoming information, arousal can modulate the amount of information that is encoded, stored and subsequently retrieved (see Storbeck & Clore, 2008, and Checa et al., 2021, for reviews on the contrasting effects of arousal on performance).

Because the present study focused on the effect of IVR on only cognitive performance, we elected not to measure levels of subjective affect. However, participants in both Parong and Mayer’s (2021) and Christopoulos et al.’s (2022) studies did show IVR-induced improvements in the psychological correlates of increased arousal, namely higher enjoyment, motivation and learning satisfaction, and lower levels of frustration and boredom. Hence we may be seeing a picture in which IVR reliably improves the subjective and affective experience of learners, while stimulating only subtle and possibly limited effects on objective learning outcomes. Thus we return to a point raised in our Introduction, which we pose as an open question: What expectations do we have of IVR technology in the classroom? Would it be considered useful if it improves feelings of affect, but has only small, transient effects on knowledge retention?

While the use of IVR in education may ultimately rest on matters of school funding and finance, we suggest the following steps in order to submit the question to full scientific scrutiny. First, a proper empirical test requires a consensus of terms and concepts to be agreed. As Gonçalves and colleagues point out, we currently lack an accepted taxonomy of the various terms and definitions used in VR research, which can hinder clear communication (Gonçalves et al., 2022). Immersion, presence and fidelity are related concepts that are fundamental to creating a believable VR experience, and are widely used in the literature (e.g., Agrawal et al., 2020; Howard, 2019; Slater, 2018). However, many authors identify other concepts such as coherence (Skarbez et al., 2017), authenticity (Malliet, 2006), credibility (Boucaud et al., 2019), and realism (Ferwerda, 2003). While the definitions of the latter concepts seem to provide some overlap with the generally adopted terms of immersion, presence and fidelity, we need some consensus on whether they are merely synonyms to be used interchangeably, or independent constructs that are best measured subjectively or objectively.

Second, having agreed these concepts, education researchers need to pursue two separate lines of enquiry: (i) whether IVR improves learning in practice; and, if so, (ii) how IVR improves learning. These different questions require comparison treatments to be set at different levels of control. In the former, a macro-level of control is sufficient to compare the efficacy of IVR interventions with traditional methods of learning (De Luca et al., 2019; Howells et al., 2007; Xie et al., 2021). Such studies can examine IVR operating at its maximal capacity; that is, exploiting its capacity to create genuinely novel and even fantastical learning environments without any constraint whatsoever. In this context, the question of IVR efficacy is a simple empirical question that asks whether learners undertaking an IVR-based programme are more successful than are those following a traditional programme. Thus, matching IVR and control treatments requires only that the training content of the two interventions is valid for the abilities under investigation. The second line of enquiry, understanding how IVR works at the micro level, must examine the effect on individual cognitive processes, but doing so demands that researchers set out precisely which elements of IVR are hypothesised to drive an effect. If it is immersion, the objectives of this line of research have to be to identify which subvariables constitute a sense of immersion, to reduce them to a manageable number via a process of factor analysis, and to manipulate immersivity within a HMD to maintain a very high level of experimental control. With that in mind, the present findings offer an early foray into that line of enquiry, and seek to offer suggestions for how we might match modes of delivery, manipulate immersivity, and isolate individual cognitive abilities of interest. We note that other researchers are making similar attempts and showing promising results (e.g., Sauter et al., 2020; Schubring et al., 2020).

This study has several limitations which should be noted. First, immersion was manipulated coarsely, and on a binary level. As discussed above, immersivity is supported by multiple constituent elements that can be difficult to separate, and the design of the present study does not allow us to identify which aspect(s) of immersion gave rise to the observed differences in recall performance. In accordance with our proposed ‘roadmap’ for how we think immersion effects can be empirically manipulated and tested, we are currently undertaking work in our laboratory in which we are attempting to manipulate immersivity at a more granular level.

Second, the study assessed only the immediate effects of immersivity on passive learning. As such, it was not possible to determine if the observed results occurred solely due to increased arousal resulting from the novelty of the experience, the effects of which we would argue are likely to be short-term in nature. Future research will explore the longevity of the facilitatory effects of IVR on cognition, whether resulting from a single-shot exposure, as in the present experiment, or from a longer-term training programme involving repeated exposures.

Finally, there are limitations relating to how closely the two presentation modes can be compared. The overall resolution and effective screen size of the laptop were different to those of the HMD, and of course the source of content interaction was different (touch vs. head movement). While this may elicit caution when interpreting the results from a strictly empirical position, such presentation media are indicative of what would normally be found in the classroom. Therefore, if such research findings are to translate well into use in the classroom, they ought to be achievable using equipment that is currently in use in school settings.

In conclusion, the contribution of the present study is threefold. First, it seeks to identify the challenges faced by researchers investigating the use of IVR in education and pedagogy. Second, it offers a ‘roadmap’ for how researchers might approach investigations into the effects of IVR on learning. Finally, it provides new evidence that IVR can facilitate learning and memory in just a single exposure, and without special instruction from the instructor or preparation from the learner.