1 Introduction

Immersive virtual reality (iVR) has the potential to give learners access to a complex, multimodal form of meaning-making that integrates aural, embodied, and spatial components in addition to visual elements in an interactive environment [32, 44, 73]. Learners have the opportunity to experience and explore complex (learning) spaces, such as the human body or laboratories, while feeling physically present in and interacting with the immersive learning environment. In addition, learners often receive auditory information simultaneously, i.e., specific learning content that is coherent with the visually conveyed spatial environment. The potential of iVR lies in its ability to present multiple types of information (e.g., auditory and spatial representations) simultaneously and in a coherent combination that the learner can perceive and mentally integrate in a sustainable way. However, processing multiple types of information and creating coherence between them is often a challenge for learners [86], especially in stimulus-rich iVR environments, which have been shown to be cognitively demanding, if not overwhelming [82]. It follows that active engagement in the learning process is crucial for learners. At the same time, teachers need an understanding of the medium and its multifaceted effects on cognitive, emotional, and motivational processes, along with their impact on the learner's development.

The potential of iVR is evidenced by previous studies suggesting that immersive virtual environments are effective learning tools owing to positive effects on learner motivation, interest, and engagement [3, 33, 58, 76], and presence [59, 99]. Similarly, some studies report positive learning gains, such as procedural knowledge acquisition (iVR: [28, 45, 106]; iVR/Desktop-VR: [95]; Desktop-VR: [67]) and increased semantic knowledge acquisition compared to learning with less immersive media ([2, 101]; systematic review of iVR applications in education: [80]). In contrast, other studies have not been able to empirically substantiate the learning potential attributed to iVR, partly because the multiplicity of sensory inputs to which learners are exposed makes it increasingly difficult to fully capture the cognitive processes taking place. This is reflected in studies that found either no effects or even negative effects on learning outcomes [49, 59, 71, 76, 82]. One potential explanation for these findings is the complexity of the stimuli presented in iVR, which can overwhelm and distract learners and consequently hinder their learning (e.g., [57]). Learners appear reluctant to organize detailed sensory input, to integrate knowledge information cognitively, and to direct their attention in a manner conducive to learning [36].

Frameworks such as the Meaningful iVR Learning framework (M-iVR-L) by Mulders et al. [72] emphasize the need to support learners in successfully constructing meaningful knowledge in highly immersive environments, thereby integrating the knowledge into existing mental schemas [64] and initiating deep learning processes in an iVR learning environment [11, 107]. Existing research on effective supplemental learning strategies focuses primarily on selecting and organizing knowledge, that is, on cognitively processing the essential knowledge information in working memory [22, 60, 76]. However, there is a paucity of research on effective complementary strategies for scaffolding the learning process in virtual environments that focus on the integration and deeper processing of knowledge in iVR. In particular, adding generative learning strategies appears to help learners engage actively in meaningful learning (in non-immersive settings), as these strategies foster the learner's generative processing and thus a deeper understanding of knowledge information through integrative cognitive processing [22, 57, 76]. Generative cognitive processing aims not only to reorganize the presented knowledge stimuli but also to integrate them with existing mental schemas and prior knowledge [60]. To facilitate this cognitive processing specifically in iVR, the generative learning strategy of imagination seems promising.

Imagination, one of the defined key features of immersive virtual reality [46, 72] and a well-studied learning strategy (e.g., [13, 26, 42, 70]), is a promising approach to scaffold meaningful learning by helping learners integrate knowledge information for deeper learning in ephemeral learning environments [22, 27]. Imagination strategies potentially allow learners to integrate external stimuli, such as auditory knowledge information and visual-spatial knowledge, into a coherent mental image and to conceptualize knowledge information internally. Spatial knowledge is the understanding of the spatial relationships that exist between objects within an environment. It enables individuals to mentally map the positions and relationships of objects within a particular space. In virtual learning environments, spatial knowledge often involves recognizing how different objects are arranged in a virtual space, enabling learners to navigate and interact effectively [79, 89]. This contrasts with semantic knowledge, which encompasses general knowledge about concepts or meanings [77]. When visual representations are incomplete or lacking, learners can supplement them by imagining auditory information that aligns with the visual context. This process involves mentally visualizing auditory information within the existing visual framework, effectively filling in gaps in understanding. By combining these auditory cues with the visual elements, learners create a more comprehensive mental model of the environment, which enriches their internal representations and helps them better understand and remember the material. Auditory imagery prompts learners to connect auditory information to visual elements, thus expanding their mental representations and enhancing comprehension [22]. Research has examined imagination training in various contexts. For instance, Cooper et al. [13] and Ginns et al. [27] studied how imaginative techniques improve the integration of sensory information and lead to better learning outcomes. Cooper et al. found that providing structured imaginative strategies before the main learning session enabled students to better grasp abstract concepts. Similarly, Ginns et al. observed that imaginative techniques helped learners bridge gaps between different sensory inputs and improved comprehension. Fiorella and Mayer [22] emphasized that learners are often unaware of the benefits of imaginative strategies, which makes structured training crucial for their effective use. Their work highlighted how preparatory guidance on imaginative strategies helps learners convert external stimuli into structured mental representations, which ultimately fosters better cognitive engagement and knowledge retention.

Overall, imagination bridges the gap between different sensory inputs, such as auditory and visuospatial input, and internal knowledge construction. Research suggests that learners often do not use learning strategies effectively [13, 21, 83]; hence, the strategy needs to be trained before learning in an immersive virtual environment. Pre-training imagination might help learners integrate coherent auditory knowledge information with the perceived spatial virtual environment by means of their own imagination, thus potentially improving semantic and spatial knowledge acquisition.

This gives rise to the following research questions: (1) Does a pre-training of imagination strategies promote the acquisition of semantic knowledge in an immersive virtual environment? (2) Does a pre-training of imagination strategies promote the acquisition of spatial knowledge in an immersive virtual learning environment?

We hypothesize that pre-training learners in the use of imagery strategies will engage them more in meaningful information processing, i.e., in effectively combining the semantic and spatial knowledge presented in the immersive virtual environment. This potentially addresses the issue of iVR learning environments overwhelming learners and preventing them from integrating the types of information presented. Accordingly, we investigate in an experiment whether pre-training of imagination promotes the acquisition and effective integration of semantic and spatial knowledge when learning in an immersive virtual environment [26, 29, 36].

2 Theoretical background

Mayer’s Cognitive Theory of Multimedia Learning (CTML) [61], initially developed to explain learning processes in hypermedia environments, provides the theoretical basis for addressing our research questions. CTML has been adapted for use in more immersive environments, where it considers how learners process multiple sensory stimuli and knowledge information, such as visual and auditory information, and sustain their learning in a meaningful way [65]. According to CTML, learners have to select multiple sensory stimuli, organize them, and finally integrate all the knowledge information into a coherent mental model [63]. To integrate knowledge stimuli sustainably while being exposed to an iVR learning environment, learners must achieve deep processing of knowledge information; they may therefore benefit from generative learning strategies such as imagination [21, 22]. Imagination acts as a bridge between visual, auditory, and spatial inputs within a learning situation and therefore potentially supports sustainable learning in an iVR environment by transforming multiple pieces of knowledge information into a coherent mental model [104].

Closely related to CTML, dual coding theory proposes that learners process information through two distinct channels in working memory, the auditory and the visual channel, and is therefore highly relevant to sensory-rich learning environments [74, 75, 83]. This processing mechanism is particularly important when knowledge is presented auditorily in a highly visual and spatial medium such as iVR, where the visual and spatial components often also contain relevant knowledge that must be encoded. This challenges the learner's attention and thus the meaningful integration of all knowledge units required for deep learning.

Because the auditory and visual channels have limited capacity, targeted regulation of the learning process becomes relevant when learners are engaged in highly stimulating learning environments [1, 10, 17, 94]. The premise is that human memory, particularly working memory, has limited capacity, so learners need learning environments that reduce external interfering elements or regulate the complexity of the learning task by providing additional support [10, 94]. According to dual coding theory, learning environments such as iVR require learners to consciously focus their attention on multiple (sensory) inputs, which subsequently need to be cognitively processed to construct lasting knowledge and mental schemas [48, 97].

According to Cognitive Load Theory [10], limited working memory capacity directly affects learners' information processing and needs to be addressed through instructional principles. The concept of element interactivity is used to assess the degree of informational complexity that learners experience during knowledge processing; unnecessary complexity can then be reduced by applying instructional design principles [92, 93]. Element interactivity can be related to the three types of cognitive load: intrinsic, extraneous, and germane load [10, 47]. Intrinsic load concerns the internal properties of the information to be processed, whereas extraneous load is concerned with the instructional procedures; both types of load relate to the learning design and material. In contrast, germane load is directly related to the learners’ working memory capacity that is available for processing the information and to the effort it takes to build task-relevant knowledge representations [92]. In particular, iVR can impose cognitive load on the learner through the immersive environment itself, in addition to the information that needs to be processed in such learning settings. Studies have shown that the load placed on the learner by the interactivity and immersion of the iVR environment (extraneous load) and high intrinsic load are negatively related to learning performance [1, 23, 37, 38]. Given the diverse information channels through which learners must process information in iVR, it is crucial to identify the source of mental overload by differentiating between the three types of load in order to provide learners with appropriate support [93].

Hence, learners’ cognitive processing needs to be guided by reducing extraneous processing and managing essential processing through augmentative learning strategies [62, 72]. Imagination strategies appear promising for helping learners process the information provided by iVR, as imagination can help them focus on the essential information of the learning situation. Research such as Ignatova's [42] on the imagination effect in learning linguistic material, within the framework of cognitive load theory, suggests that specific imagination instructions can significantly mitigate cognitive load. This is consistent with the concept of germane load: imagination strategies can be used to enhance learners' cognitive processing by facilitating the integration of new information into existing knowledge structures, thereby optimising learning in iVR environments. Beyond that, the imagination strategy can support learners in organizing the various sensory knowledge information and, above all, in integrating it into existing knowledge schemata when learning in iVR, which has so far received little attention in research. An advantage of promoting imagery as a strategy in iVR environments is that it allows learners to process information in iVR independently and actively. Another option is to reduce information or emphasize relevant information (see “Cuing” or “Signaling”: [96]). However, a study by Albus et al. [1] shows that signaling in VR does not improve learners' deep processing in terms of comprehension and transfer. We assume that imagery strategies may instead support deep processing. This corresponds to generative cognitive processing, which in turn is part of generative learning theory, as discussed in more detail in the following section.

2.1 Generative learning theory

Generative learning theory posits that learners actively make sense of their experiences and integrate newly constructed knowledge into mental schemas [105]. Wittrock’s [105] model of meaningful learning consists of the components generation, motivation, attention, and memory. These components refer to linking new units of information (generation), the willingness to actively make sense of them (motivation), the conscious direction of attention (attention), and the incorporation of prior knowledge (memory). In particular, linking new units of information can be a challenge for learners when they are confronted with multiple sensory stimuli containing units of knowledge that need to be processed in an immersive learning situation.

This theory is strongly consistent with Mayer’s [60] select-organize-integrate (SOI) model, which is closely related to Wittrock’s [105] model and a subcomponent of CTML [64]. The model suggests that learners select relevant sensory input (e.g., visual and auditory) that is briefly held in sensory memory. The selected elements are then coherently organised into a mental representation for subsequent integration with existing schemas and prior knowledge. However, learners tend to have difficulties organising and particularly integrating knowledge units, especially when these are delivered through various sensory channels [59, 107]. Difficulties in these organising and integrating processes, which are classified as generative processes [22], can be counterbalanced by applying complementary strategies. To engage in such strategies, learners need a certain level of motivation to sustain the willingness to invest cognitive effort and persist with the learning strategy while being exposed to the learning content [21]. These strategies aim to enhance learners’ active role in sense-making by reflecting on prior knowledge and integrating it with newly acquired information to create a coherent mental schema [22, 57].

Fiorella and Mayer [21, 22] postulate eight learning strategies that promote generative learning and thus help learners actively make sense of knowledge information in a coherent mental model: enacting, self-explaining, teaching, self-testing, drawing, mapping, summarising, and imagining. Research has shown that the value of generative learning strategies is evident in highly stimulating iVR learning environments. For example, Makransky et al. [57] investigated the effect of a generative learning strategy in an experiment in which high school students were presented with a forensic analysis of a collected DNA sample in a science simulation, either in iVR or as a video. Participants in both conditions were either instructed to use enactment as a learning strategy by using concrete stimuli after the learning session or received no instruction. Students who were instructed to use enactment showed significantly better learning outcomes in both procedural knowledge and transfer than students who were not, but only when learning with iVR and not when learning with video. A similar positive effect of applying a generative learning strategy in iVR was shown by Parong and Mayer [76]. In their study, the generative learning strategy of summarizing was implemented: learners were presented with learning units in iVR and asked to write a summary after each unit. The group of students who summarised outperformed the group who did not summarise in terms of learning performance. This demonstrates how beneficial it is to support learners in selecting and organizing information in a directed learning process.

The aforementioned results exemplify the effectiveness of generative learning strategies for learners in highly immersive iVR learning environments and show how learners can be supported in organising and integrating knowledge units. However, more research is needed on the effective use of generative learning strategies to support learning in iVR. First, it remains to be shown for further learning strategies (imagination, self-explanation, etc.; cf. [21, 22]) to what extent they can effectively support learning with iVR. Second, previous studies (e.g., [57, 76]) have mainly focused on strategies in which learners were supported after the iVR lesson or iVR segments, but not before or while being engaged in iVR. While engaged in iVR, learners need a strategy that allows them to generate a meaningful internal image of the different (sensory) knowledge information and thus integrate it into a mental model, particularly when knowledge consists not only of visual and auditory information but the space and spatiality experienced by the learners also serve as complementary, coherent input. In addition to learning strategies that are employed after the learning session, such as enactment and summarizing, the generative learning strategy of imagination can be used as a bridging factor. Imagination can be utilized as a learning strategy to further stimulate learners to integrate coherent knowledge information from various sensory channels, such as visual and auditory. Particularly in spatial-situational iVR environments, it is essential that learners link auditory information with the corresponding visual information in order to integrate them internally and stimulate deep learning processes.

Imagination as a generative learning strategy in iVR environments potentially acts as a bridge between semantic knowledge (auditorily presented information) and spatial knowledge (visually presented information) by supporting learners in integrating both forms of knowledge into a comprehensive mental model. As addressed in Research Question 1, visualising semantic information can help learners process auditory information in a more sophisticated way, i.e., by activating and elaborating on their own prior knowledge as part of the imaginative process. Furthermore, as addressed in Research Question 2, applying the imagination learning strategy while learners are present in iVR can also lead to more distinct spatial knowledge. By internally integrating the visually experienced space with the auditory knowledge information through imagination, the learner can process both knowledge sources in a more comprehensive mental model, so that deeper learning of the presented material can occur. Imagination can thus promote both semantic knowledge (see Research Question 1) and spatial knowledge (see Research Question 2). These two ideas are discussed in more detail below, starting with the acquisition of semantic knowledge.

2.2 Imagination as generative learning strategy in an iVR: semantic knowledge

The importance of imagination as a generative learning strategy in an iVR environment arises from two distinct considerations. As a generative learning strategy, imagination is a well-studied approach to supporting learners in creating a mental representation of knowledge information [6, 13, 27]. Research has used imagination as a learning strategy in various forms, from imagining procedural processes [13, 53, 54] to imagining content-specific processes described in writing or presented aurally [55], in order to help learners organise knowledge information into a mental representation, owing to the strategy's high adaptivity [64]. Hence, imagination enables learners to form the subject matter into an internal image and eventually integrate it into a long-term mental schema [6]. Studies support the positive effect of the imagery strategy on learners' recall of facts across all ages [24, 25]. In one such study [78], schoolchildren were taught to form mental images through pre-training. The students were then instructed to read a story and mentally visualize its content. The group instructed to visualize the content recalled the facts of the story better than the control group, which was not instructed to visualize. This study underscores the importance of pre-training imagination so that learners can successfully use the strategy to integrate knowledge information. This applies especially to iVR learning environments, where learners need to select and organize knowledge information from various sensory channels and imagination can thus be exploited to promote the integration of this knowledge information.

Accordingly, imagination can be seen as one of the critical features of learning in immersive virtual reality [33, 72]. Imagination can be described as a process of “seeing with the mind's eye” ([55], p. 2); it is therefore highly adaptive, as it evolves from learners’ individual mental schemas, and it is purely internal and subjective. iVR environments may trigger cognitive processes through the visualization and sensory input they provide. Evidence suggests that imagined representations and perceived input follow similar processing mechanisms, indicating that learning processes can be stimulated not only by external stimuli but also internally by learning strategies such as imagination [8, 55]. Therefore, immersive learning environments can be combined with imagination strategies to support learners in internally conceptualizing knowledge information [102]. Imagination can be considered to blur the boundary between reality and the virtual world because it generates an internal perception of the external stimuli (real world/virtual world) perceived through different senses, which in turn can influence the learner's external perception [91]. Specifically, this means that learners in iVR environments are exposed to spatial-visual stimuli and (inter-)actively engage with them. When learners receive additional auditory knowledge information to process, imagination as an integrating learning strategy can help them process this semantic knowledge more deeply. The learner is asked to imagine the auditory material concretely in the virtual environment and thus not only to organize the knowledge but also to process it actively and reconstruct it in the "mind's eye", which stimulates existing mental structures and thereby supports integration and deep learning.

The rationale behind CTML, specifically behind its generative learning processes, is the deep processing of the connection between aural and visual input, which leads to their coherent organization and integration into a mental schema [55, 60, 65, 85]. If the virtual learning environment provides the visual input, imagination can act as an integrative component that bridges the gap between external auditory and visual input and the internal mental representation. Considering imagination both a fruitful learning strategy and a key component of iVR underscores the importance of investigating its effect on cognitive learning processes [22, 38, 72].

Therefore, with regard to our first research question, we hypothesise that the use of imagery as a pre-trained generative learning strategy in a situated immersive learning environment will enhance the learner's acquisition of semantic knowledge.

2.3 Imagination as generative learning strategy in an iVR: spatial knowledge

In addition to serving as a generative learning strategy that actively promotes the acquisition of semantic knowledge, imagination may also support learners in forming an internal mental image of the spatial-situational learning environment. This becomes relevant when the learning environment itself contains visual and spatial information that is necessary for comprehensive understanding and deep learning, that is, when the learning environment visually conveys additional, complementary, or even fundamental knowledge information. This could be the meaning of the arrangement of objects in a room, specific details of the room, or the meaning of different colors in the context of the room. Imagination may enable learners to internally augment and enrich the externally perceived environment and to integrate it with prior knowledge or pre-existing mental models for deeper integration and learning. A feature of iVR environments is that learners experience an episodic learning situation within a spatial context [88]. Spatiality is one of the key features of episodic memories [30, 84]. Episodic memories allow learners to relive what they have experienced, including auditory, visual, or spatial sensory stimuli [81]. Studies suggest that the neural regions activated during episodic memory are the same as those activated for input memorised in a spatial context in iVR environments [9]. This is consistent with embodied cognition, which postulates that “cognition is a situated activity” ([103], p. 626), referring to the premise that situated cognition occurs when learners are confronted with input that is relevant to the learning task. Robin et al. [81] demonstrated that learners recall events more accurately and in greater detail with spatial context than without. Similarly, Van Helvoort and colleagues [98] investigated the effects of participants navigating through museum spaces in iVR on spatiotemporal associative recall in episodic memory. Their results suggest that the spatiotemporal context of an iVR environment enhances episodic memory performance owing to the highly immersive nature of the medium.

To perceive spatiality in immersive virtual reality, learners have to perceive themselves as present in the virtual environment [43], which is referred to as a “sense of being there” ([104], p. 495). This sense of being there is determined by the immersive nature of the virtual environment [104], which can lead to increased motivation and, consequently, higher levels of perceived learning [58]. Cummings and Bailenson’s [16] meta-analysis revealed that high immersion enhances the perception of presence within a virtual space. According to Wirth et al. [104], the learner first creates an internal spatio-visual mental model of the virtual environment. This mental model, in turn, leads learners to locate themselves, in terms of their own actions, emotions, and cognitions, in the virtual environment rather than in the real world. In this process, visual imagery allows the learner to fill in missing information about the perceived space from memory within the mental model [18]. The learner’s visual spatial imagery can be considered an individual factor in constructing an inner spatial model of the situation. It enhances the cognitive relevance of the experienced spatiality and thus leads learners to create vivid spatial structures [18, 51, 104]. These particular properties of iVR may enable learners to cognitively organise additional sensory input, such as auditory knowledge information, in a meaningful way and to integrate it purposefully into a mental schema, especially when supported by a facilitating learning strategy such as imagination.

The beneficial relationship between spatiality and the acquisition of knowledge information is further demonstrated by studies on virtual memory palaces, in which learners are trained to associate items (semantic knowledge items) with the virtual space and thus to retrieve them more sustainably [40, 41, 52]. The findings are consistent with further evidence that learners retrieve information better in an immersive virtual reality than in a desktop version [30]. In other words, learners are taught to organise and integrate semantic knowledge information presented visually with the spatial context into a complex internal mental schema. When knowledge information is provided through auditory rather than purely visual input, another source of sensory input is added that needs to be organised into such a mental representation. The learner is required to integrate auditory information as well as visuospatial input from the iVR environment [19], whereby additional auditory stimuli have been shown to improve source memory when added to an immersive virtual environment [4]. It is crucial for the sustained integration of both external sources that the auditory knowledge information is semantically coherent with the visual-spatial input in which the learner is immersed [69].

The learner's imagination can be used to bridge the various external stimuli the learner is confronted with, consisting of spatial-situational input and auditory knowledge information, with internal perception, and thus to process them in depth. For example, in an iVR learning environment, learners could be immersed in a realistic historical VR site. They receive additional auditory knowledge information about the historical context and the events surrounding the site. The learners can not only perceive the site visually and spatially, and thus in an episodic manner, but also listen to the auditory knowledge information. This learning situation requires learners to repeatedly direct their attention to, and select, the currently relevant information (alternately visual and auditory) and to organize the information from both sensory channels. Integrating this knowledge information is crucial for a sustainable and deep learning process of spatial knowledge. Imagination can help integrate the external (auditory) knowledge information with the visual-spatial knowledge information, one's own mental schemas, and existing prior knowledge. This highly adaptive strategy supports integration through an (inter-)active cognitive process [11, 56, 63].

However, learners rarely use specific learning strategies spontaneously. To promote successful integration of learning content through imagination, learners therefore need to be introduced to and familiarised with the strategy beforehand. Accordingly, it seems plausible to support learners' use of imagination by training it before they engage with the iVR learning environment.

2.4 Pre-training of imagination in iVR environments

Existing research in non-immersive learning settings suggests that individuals use imagination learning strategies more successfully while processing knowledge information when they have been pre-trained to use the strategy [22, 26, 27]. For instance, Leopold and Mayer [55] used pre-training of an imagination learning strategy. Participants read on-screen paragraphs about the human respiratory system on a desktop computer in order to form internal mental representations of the written content. They were either shown a blank area next to the text, which acted as a prompt to form an imaginary representation of certain parts of the content, or asked to read the texts without any pre-trained strategy. The pre-trained imagination condition outperformed the control condition in retention and transfer of the learning material, suggesting that a learning strategy should be pre-trained before it is employed. Further studies of non-immersive learning situations suggest that learners can be effectively pre-trained to focus better on, and process, learning material presented later. It seems plausible that this also holds for iVR environments, as mental readiness appears essential for successful learning in sensory-rich and complex learning environments such as iVR [59]. Introducing the imagination strategy before learners are exposed to the learning material, so that they are able to apply it appropriately, therefore echoes findings from other studies of imagination strategies in non-immersive learning contexts that emphasize mental readiness for successful implementation (e.g., [13, 26]).

If learners are supported in actively constructing a mental representation of the iVR environment, deepening it with coherent auditory knowledge information, and purposefully integrating both through imaginative learning strategies, a more comprehensive mental model might emerge, and with it deeper learning of the presented material. This leads to the assumption underlying the second research question: that the use of imagination, supported by a pre-trained learning strategy, will increase learners' spatial knowledge and their subjective mental model of the learning environment.

2.5 Research questions and hypotheses

Research Question 1. Our first research interest was to investigate whether a pre-training of imagination, as a bridging factor between external auditory and visual input and internal mental representation, increases the effectiveness of iVR on semantic knowledge acquisition compared to the control condition.

Hypothesis 1. This led to the assumption that pre-training of the imagination results in a higher acquisition of semantic knowledge compared to the control condition.

Research Question 2. Our second research interest was to determine whether pre-training of imagination enhances the construction of a spatial situation model in iVR environments compared to the control condition.

Hypothesis 2.1. We assume that learners' spatial knowledge is more pronounced in the pre-training group than in the control group.

Hypothesis 2.2. Learners' subjective perception of their mental model of the iVR environment is higher in the pre-training group than in the control group.

2.5.1 Exploratory analysis

After addressing the research questions and testing the hypotheses, we explored the data inductively to gain more detailed insight into which variables, such as motivation and cognitive load, might have influenced the learning process in terms of semantic and spatial knowledge.

3 Method

3.1 Experimental design and participants

The study used a pre-test/post-test between-subjects design in which learners either received the pre-training of imagination or no pre-training. We assessed learners' semantic knowledge, spatial knowledge, and subjectively perceived spatial mental model of the iVR environment (subjective mental model), along with variables such as self-perceived visual spatial imagery ability, cognitive load, and enjoyment during the iVR learning session.

The overall sample consisted of N = 60 students (25 female, 35 male) from German universities. Participants' ages ranged from 18 to 55 years (M = 27.20, SD = 5.99).

3.2 Procedure

Participants were tested individually in a lab at our university (cf. Fig. 1). First, they received a detailed explanation of the study and gave written informed consent. They were then randomly assigned to the experimental group with pre-training (n = 30) or the control condition without pre-training (n = 30). All participants then completed a pre-test and received a technical introduction to, and habituation with, the head-mounted display (Valve Index VR HMD) and controllers. Next, participants were given five minutes to familiarize themselves with the handling and the virtual environment; they were specifically instructed to explore the virtual environment as if they had to explain it to a friend. Depending on their group, participants then completed either the pre-training or the control task, which took around ten minutes. After a short break, the learning session took place in two ten-minute blocks separated by a five-minute break. During the learning blocks, participants listened to an audio guide while moving freely around the iVR learning environment. After the learning session, participants removed the headset and took a five-minute break, followed by the post-test, which took around 20 min.

Fig. 1 Experimental design and procedure

4 Materials

4.1 IVR learning environment and audio lesson

For participants in both groups, the learning session took place in the freely accessible app “IL DIVINO: Michelangelo’s Sistine Ceiling in VR” [20]. The app offers a high-fidelity, true-to-scale virtual reality experience of the Sistine Chapel (see Fig. 2). Learners could move around freely, either on the floor (during habituation and the pre-training or control task) or on a raised platform (during the learning session), but were instructed not to interact with items in the environment. While immersed in the iVR, learners could perceive the proportions of the room, the objects in it, and the wall and ceiling paintings.

Fig. 2 Virtual learning environment: The Sistine Chapel [20]

During the twenty-five-minute learning session, learners were on the platform listening to an audio guide with information about the Sistine Chapel. The content covered the ceiling and altar paintings, painting techniques, Michelangelo, and the history, architecture, and restoration and cleaning of the Sistine Chapel. Participants were asked to memorize as much of the learning content as possible.

4.2 Pre-training of imagination learning strategy

The pre-training for the experimental group took place before the learning session, within the iVR environment of the Sistine Chapel (Fig. 2). Its structure was closely linked to the content structure of the audio information in order to mentally prepare learners to use the imagery strategy as effectively as possible during the learning session. To prompt learners on what specifically to imagine [21], the pre-training was divided into three parts: Structural Anchoring, Chronological Integration, and Symbolic Association.

Structural Anchoring refers to content that is directly linked to the physical structure of the Sistine Chapel and emphasizes anchoring knowledge in the physical structure of the environment (e.g., “The structure of the ceiling is curved like a barrel and looks like a tunnel”). Chronological Integration reflects the integration of historical and temporal information into the learning process and covers knowledge items such as years (e.g., “It is believed that it was built between 1473 and 1475”). Symbolic Association includes more abstract, episodic content and describes associating knowledge with symbolic or metaphorical elements (e.g., “From now on, he decided to paint the rest of the ceiling himself, without any help or guidance”). To prompt learners to imagine content of these three kinds, a common topic was chosen that was broadly related to the virtual environment, easy to grasp, and not cognitively overloading. The benefit of imagining knowledge information within the space was explained to the learners, who were then asked to imagine the following:

1. Structural anchoring (content that is directly connected to the structure of the chapel): Choose a spot in the room and imagine that an ivy plant is growing up the wall from the ground. Ivy is a climbing plant with heart-shaped, dark green leaves. However, ivy is not a parasite that feeds on the plants or walls it grows up; it just uses them as scaffolding. In autumn it grows light green berries that develop into blue-black poisonous berries by spring.

2. Chronological integration (knowledge information such as years): Some ivy is between 400 and 500 years old, and it is a symbol of life. Although its berries are poisonous, ivy was chosen as the medicinal plant of the year in 2010 by the University of Würzburg.

3. Symbolic association (more abstract, episodic content): Ivy has a long history. In ancient Egypt it was dedicated to Osiris, the god of fertility and ruler of the realm of the dead. With this meaning, ivy was also adopted in Christian symbolism; ivy images can already be found on stone coffins from early Christianity.

After each part, the experimenter asked learners to describe in great detail how and where they imagined the information in the virtual space. This was intended to prevent learners from disengaging from the task and, if they had difficulty imagining the information, to help them create the inner picture with the experimenter's guidance (e.g., by asking for a more detailed description of the imagined scenery). Finally, learners in the pre-training group were reminded of the benefits of the imagination learning strategy and encouraged to apply it in the subsequent learning session to achieve a higher learning outcome. The audio guide paused for around 12 s after each information cluster; during these pauses, learners in the pre-training group were asked to imagine the content they had just heard [36]. The control group was not given any specific instruction on how to use the pauses.

4.3 Pre- and post-test instruments

All instruments described below are listed in Table 1 with their number of items, answer format, M, SD, and Cronbach’s alpha. The pre-test measured demographic data, self-reported prior knowledge, and visual spatial imagery. The post-test consisted of scales measuring motivation, cognitive load, spatial presence, spatial situation model, spatial knowledge, and semantic knowledge.
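Table 1 reports Cronbach’s alpha as the internal-consistency estimate for each scale. As a reminder of what that coefficient summarizes, the following Python sketch computes alpha from a respondents × items score matrix with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores), using sample variances. It is an illustrative sketch, not the analysis code used in the study, and the ratings are hypothetical rather than study data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    with sample variances (n - 1 denominator) throughout.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each respondent's total score (row sum).
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point ratings: 4 respondents x 3 items (not study data).
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(ratings), 3))  # 0.875
```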

Table 1 Descriptive statistics of pre- and posttest instruments

4.3.1 Prior knowledge

Learners' prior knowledge of the Sistine Chapel was measured by self-assessment, asking whether they had been exposed to information about the Sistine Chapel or had visited it. We purposefully did not administer a knowledge test, to prevent learners from directing their attention to specific spatial and auditory knowledge information as soon as they entered the iVR environment.

4.3.2 Visual spatial imagery

The visual spatial imagery scale was adapted from Vorderer et al. [100] (e.g., “When a picture shows only part of a space, I can clearly imagine the rest of the space”).

4.3.3 Enjoyment

To measure participants’ interest, the ‘Interest/Enjoyment’ subscale of the Intrinsic Motivation Inventory [66] was used. One item was added to evaluate how autonomous participants felt in the virtual reality learning environment and how this affected their motivation (“I had the feeling that I could do anything I wanted in the learning environment”).

4.3.4 Cognitive load

To assess learners’ cognitive load (extraneous, intrinsic, and germane load), the scale of Klepsch et al. [50] was used and adapted to the study by adding the phrase ‘in the virtual reality’ to items where necessary.

4.3.5 Subjective mental model of the iVR

The spatial presence scale was composed of four items of ‘Spatial Presence: Self Location’ and three items of ‘Spatial Presence: Possible Actions’ from the Spatial Presence instrument by Vorderer et al. [100]. Additionally, Vorderer et al.’s [100] Spatial Situation Model (SSM) subscale was included. The scale ‘Spatial Situation Model – audio’ (SSMaudio) was adapted from the SSM subscale [100] and the Object-Spatial Imagery Questionnaire [7]; it assesses whether participants can imagine the audio content while being present in the virtual environment. Together, the Spatial Presence, SSM, and SSMaudio scales represent learners’ self-perceived, subjective mental model of the spatiality of the iVR environment. These instruments capture how learners themselves assess their construction of the mental model of the iVR environment and are therefore a subjective assessment, as opposed to the objective spatial knowledge instrument described in the next section. This subjective self-assessment is particularly important with regard to imagination as a learning strategy, as the learner internally and subjectively transforms the objective external iVR stimuli into an internal subjective mental model.

The experimental group received an additional item on how useful they found the pre-training (“The training prior to the learning session helped me to imagine the content of the audio guide in the virtual environment more easily”).

4.3.6 Spatial knowledge

To capture more comprehensively the internal representation of the iVR environment and its objects that learners had developed, and its objective correctness, spatial knowledge was measured with four subscales (Room-Objects, Room-Size, Wall-Paintings, Ceiling-Paintings), which were analyzed separately to depict the effect of the pre-training more precisely. The Room-Objects items asked learners to specify the correct number of windows, doors, candleholders on the marble fence, stairs, balconies, and candles on the altar in the Sistine Chapel. All Room-Objects items are relevant because only a limited number of objects is present in the room. As learners were specifically asked to memorize the room, the accurate rendering of the objects represents the objective correctness of their inner spatial mental model. Further, learners’ perception of the room’s scale was measured (Room-Size). This subscale has a semantic knowledge component, as the proportions of the room were explicitly mentioned in the audio guide (“Both buildings are twice as long as they are high and three times as long as they are wide.”). To obtain a more comprehensive impression of the mental model of the learning environment, learners’ knowledge of the sequences of paintings on the side walls (Wall-Paintings) and on the ceiling (Ceiling-Paintings) was tested. The paintings were split into two subscales because the ceiling paintings were explicitly described and spatially located by the audio guide. All spatial knowledge subscales used an open answer format, which was coded on a percentage scale.

4.3.7 Semantic knowledge

The semantic knowledge test consisted of 34 true–false statements based on the knowledge information in the audio guide (e.g., “One of the first two paintings Michelangelo painted in the Sistine Chapel was ‘the story of Noah and the Great Flood’”). Four items were removed because all learners answered them correctly, leaving 30 items (maximum score = 30).
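The item screening described above can be sketched as follows: items that every learner answered correctly carry no information about individual differences and are dropped before the remaining items are summed into a score. This is a minimal Python illustration assuming responses coded 1 (correct) and 0 (incorrect); the function names and the small response matrix are hypothetical. Applied to the study’s 34 items, removing the four always-correct items leaves a maximum score of 30.

```python
def informative_items(responses: list[list[int]]) -> list[int]:
    """Indices of items that were NOT answered correctly (1) by every learner."""
    n_items = len(responses[0])
    return [j for j in range(n_items) if not all(row[j] == 1 for row in responses)]


def sum_scores(responses: list[list[int]], keep: list[int]) -> list[int]:
    """Each learner's number of correct answers on the retained items."""
    return [sum(row[j] for j in keep) for row in responses]


# Hypothetical true-false results (1 = correct, 0 = incorrect): 3 learners x 5 items.
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1],
]
keep = informative_items(responses)  # items 0 and 3 were solved by everyone
print(keep, sum_scores(responses, keep), "max =", len(keep))  # [1, 2, 4] [2, 1, 3] max = 3
```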

5 Results

Before the main analyses, we checked for pre-existing differences between the pre-training and control groups. The randomized groups did not differ significantly in age, t(58) = 0.60, p = 0.551, or in gender proportion, χ²(1, N = 60) = 0.07, p = 0.793. Self-reported prior knowledge about the Sistine Chapel, its creation, and its architecture indicated that learners had hardly engaged with the topic before (M = 0.47, SD = 1.01). Neither this low level of prior knowledge, t(58) = 0.70, p = 0.485, nor learners’ self-reported visual spatial imagery, t(58) = 0.53, p = 0.600, differed significantly between the groups.
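The gender comparison above is a Pearson χ² test of independence on the group × gender contingency table. The Python sketch below computes the statistic and its degrees of freedom without a continuity correction; whether the original analysis applied a correction is not stated, so this is an assumption. The per-group gender counts are not reported in the text, and the table used here is purely hypothetical.

```python
def chi_square_independence(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic (no continuity correction) and its df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group membership and gender were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts (not study data): rows = groups, columns = female, male.
table = [[10, 20], [15, 15]]
chi2, df = chi_square_independence(table)
print(f"chi2({df}) = {chi2:.3f}")  # chi2(1) = 1.714
```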

Our first research question asked whether pre-training of imagination, as a bridging factor between external auditory and visual input and internal mental representation, increases the effectiveness of iVR for semantic knowledge acquisition compared to the control condition. Both the pre-training group (M = 23.70, SD = 2.54) and the control group (M = 23.53, SD = 3.59) performed well on the semantic knowledge post-test (max = 30), but the groups did not differ significantly, t(58) = 0.207, p = 0.418 (one-sided), d = 0.05. Accordingly, our data do not support Hypothesis 1, that pre-training of imagination leads to significantly higher semantic knowledge acquisition than in the control group.
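To make the reported statistics easier to check, the following Python sketch recomputes Student’s independent-samples t (equal variances assumed) and Cohen’s d from the group means and standard deviations above, both based on the pooled standard deviation. Because the inputs are the rounded values printed in the text, the result (t ≈ 0.21) differs slightly from the reported t(58) = 0.207. A one-sided p-value would additionally require the t distribution’s survival function (for example `scipy.stats.t.sf(t, df)`), which is left out to keep the sketch free of third-party dependencies. The function name is ours and hypothetical.

```python
from math import sqrt


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t (equal variances) and Cohen's d.

    Both use the pooled SD:
        s_p = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    """
    df = n1 + n2 - 2
    s_p = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    diff = m1 - m2
    t = diff / (s_p * sqrt(1 / n1 + 1 / n2))  # mean difference / standard error
    d = diff / s_p  # mean difference in pooled-SD units
    return t, d, df


# Rounded summary statistics reported for the semantic knowledge post-test.
t, d, df = pooled_t_and_d(23.70, 2.54, 30, 23.53, 3.59, 30)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # t(58) = 0.21, d = 0.05
```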

Our second research question asked whether pre-training of imagination enhances the construction of a spatial situation model in iVR environments compared to the control condition. We therefore examined the effect of the pre-training on learners’ spatial knowledge and on the construction of their subjective mental model of the iVR environment, using one-sided t-tests (Table 2). Learners in the pre-training group performed descriptively, but not significantly, better on two subscales of the spatial knowledge test: wall-paintings, t(58) = 0.96, p = 0.171, d = 0.25, and ceiling-paintings, t(58) = 1.19, p = 0.120, d = 0.30. Contrary to our expectations, learners in the control group performed descriptively better on the room-size subscale, t(58) = −1.93, p = 0.971, d = −0.50, and the room-objects subscale, t(58) = −0.79, p = 0.782, d = −0.20. Overall, these results do not support Hypothesis 2.1, that learners in the pre-training group acquire more spatial knowledge than the control group.

Table 2 Descriptive and test statistics for semantic and spatial knowledge and subjective mental model

Regarding the subjective mental model of the virtual learning environment, spatial presence was rated slightly, but not significantly, higher by the control group, t(58) = −0.58, p = 0.717, d = −0.15. There was no statistically significant difference between the groups in how well they imagined the information presented by the audio guide, t(58) = −0.33, p = 0.630, d = −0.09. However, the pre-training group tended to perceive the spatial situation model more clearly, t(58) = 1.37, p = 0.088, d = 0.35; this non-significant trend is at most partially consistent with Hypothesis 2.2, that learners in the pre-training group would have a more pronounced subjective mental model.

5.1 Further analysis

As our data do not support the assumption that pre-training of imagination affected students’ learning with iVR, we explored the data further to identify challenges or effects not covered by our hypotheses. In particular, we aimed to explore patterns inductively to gain more detailed insight into which factors might have influenced learners’ semantic knowledge and spatial knowledge of the learning environment.

Existing studies on the effectiveness of iVR learning environments have highlighted the importance of learners’ enjoyment and interest (e.g., [57, 68]). Our data suggest that learners who enjoyed engaging with the iVR environment tended to focus their attention on what was relevant to learning and thus may have minimized distracting stimuli, as indicated by the negative correlation between enjoyment and extraneous load, r(58) = −0.36, p < 0.01. This in turn could have increased learners’ mental capacity to process semantic knowledge information, as suggested by the positive correlation between semantic knowledge and germane load, r(58) = 0.29, p < 0.05. Consistent with this, extraneous cognitive load was not significantly correlated with semantic knowledge, r(58) = −0.03, p = 0.84, suggesting that learners were not cognitively overloaded by the semantic knowledge content. Notably, however, extraneous cognitive load was significantly negatively correlated with the imagination of auditory knowledge content, r(58) = −0.48, p < 0.001. This may indicate that learners who imagined the auditory knowledge content focused their attention more strongly on internally integrating the auditory knowledge than on extraneous details.
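The notation r(58) gives the degrees of freedom of the significance test for a Pearson correlation, df = n − 2 = 58 for N = 60. A minimal Python sketch of the conversion from r to the corresponding t statistic, t = r·√(df/(1 − r²)), applied to the enjoyment–extraneous load correlation reported above; the helper name `r_to_t` is hypothetical.

```python
from math import sqrt


def r_to_t(r: float, n: int) -> tuple[float, int]:
    """t statistic for testing a Pearson r against zero, with df = n - 2."""
    df = n - 2
    t = r * sqrt(df / (1 - r ** 2))
    return t, df


# Enjoyment vs. extraneous load, r(58) = -0.36 with N = 60 (values from the text).
t, df = r_to_t(-0.36, 60)
print(f"t({df}) = {t:.2f}")  # t(58) = -2.94
```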

The correlation between semantic and spatial knowledge reached statistical significance only for the ceiling-paintings subscale, r(58) = 0.24, p < 0.05, which may be explained by the nature of the items: learners were located closest to the ceiling, and some of the audio information related directly to it. Directing attention to the ceiling paintings and connecting them to the semantic audio information may therefore have been more obvious to learners. Also noticeable are the positive correlations between extraneous load and the room-objects subscale, r(58) = 0.32, p < 0.05, and the wall-paintings subscale, r(58) = 0.26, p < 0.05. This could indicate that working out the correct arrangements and numbers of objects was mentally demanding and might have overloaded learners.

In contrast to the objectively measured spatial knowledge, the self-reported spatial mental model, especially the imagination of auditory information in the virtual environment, correlated negatively with extraneous load, r(58) = −0.48, p < 0.001, and positively with students’ enjoyment, r(58) = 0.32, p < 0.05. This might indicate that learners who enjoyed the session more devoted greater attention to integrating auditory information into the perceived learning environment and thus formed a more comprehensive mental model. The self-reported spatial mental model is further reflected in positive correlations with students’ spatial situation model of the audio content, r(58) = 0.38, p < 0.01, and their perceived spatial presence, r(58) = 0.44, p ≤ 0.001.

The correlations among the self-reported mental model scales raised the question of whether learners’ visual spatial imagery is a relevant personal prerequisite for constructing a vivid and meaningful internal mental model, possibly by enhancing the cognitive significance of spatiality. This is suggested by its correlations with the spatial situation model audio, r(58) = 0.38, p < 0.01, and the spatial situation model, r(58) = 0.56, p < 0.001. One might question, however, to what extent visual spatial imagery contributed to the creation of a vivid and internally consistent mental model. To obtain preliminary indications, learners were divided into two groups by a median split of their self-reported visual spatial imagery: learners with a score above the median of 3.6 were assigned to the high visual spatial imagery group and learners below it to the low group. This resulted in 32 learners in the high group and 28 learners in the low group.
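The median split described above can be sketched in a few lines of Python. How scores exactly equal to the median were assigned is not stated in the text, so the `ties_high` flag below is a hypothetical parameter, and the scores are invented for illustration.

```python
from statistics import median


def median_split(scores: list[float], ties_high: bool = True) -> tuple[float, list[str]]:
    """Label each score 'high' or 'low' relative to the sample median.

    ties_high (hypothetical parameter) decides which group receives scores
    exactly equal to the median.
    """
    cut = median(scores)
    if ties_high:
        labels = ["high" if s >= cut else "low" for s in scores]
    else:
        labels = ["high" if s > cut else "low" for s in scores]
    return cut, labels


# Hypothetical imagery scores (not study data) with ties at the median.
scores = [3.2, 3.6, 3.6, 4.0, 2.8, 4.4]
cut, labels = median_split(scores)
print(cut, labels.count("high"), labels.count("low"))  # 3.6 4 2
```

Because several learners can share the median score, a median split need not produce equal groups, which is consistent with the unequal 32/28 split reported above.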

5.2 Differences between high and low visual spatial imagery

Table 3 shows semantic and spatial knowledge, as well as subjective mental model scores, for both groups. Of all outcomes, only semantic knowledge, t(58) = 2.76, p = 0.008, d = 0.71, and the spatial situation model, t(58) = 3.74, p < 0.001, d = 0.97, differed significantly between the groups, while the difference for the spatial situation model audio narrowly missed significance, t(58) = 1.98, p = 0.053, d = 0.51. In addition to the correlations between visual spatial imagery, spatial situation model, and spatial situation model audio discussed above (Tables 4 and 5), the correlation patterns showed a positive relationship between visual spatial imagery group (high vs. low) and semantic knowledge acquisition, r(58) = 0.34, p < 0.01.
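The effect sizes in this comparison can be recovered from the t values and group sizes with d = t·√(1/n₁ + 1/n₂), the pooled-SD Cohen’s d for two independent groups. The Python sketch below applies that formula with n₁ = 32 and n₂ = 28; it reproduces the reported d = 0.71, 0.97, and 0.51 to two decimals, which suggests (though the text does not say so) that d was computed from the pooled standard deviation. The helper name `d_from_t` is hypothetical.

```python
from math import sqrt


def d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d (pooled SD) recovered from an independent-samples t statistic."""
    return t * sqrt(1 / n1 + 1 / n2)


# t values reported for the high (n = 32) vs. low (n = 28) imagery groups.
reported_t = {
    "semantic knowledge": 2.76,
    "spatial situation model": 3.74,
    "spatial situation model audio": 1.98,
}
for outcome, t in reported_t.items():
    print(f"{outcome}: d = {d_from_t(t, 32, 28):.2f}")
# semantic knowledge: d = 0.71
# spatial situation model: d = 0.97
# spatial situation model audio: d = 0.51
```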

Table 3 Descriptive and test statistics for semantic and spatial knowledge and subjective mental model for high vs. low visual spatial imagery
Table 4 Correlations for semantic and spatial knowledge and subjective mental model, n = 60
Table 5 Correlations for semantic and spatial knowledge, subjective mental model and high (n = 32) vs. low visual spatial imagery group (n = 28)

Because these correlational patterns suggested that the variables might be closely related, a multiple regression analysis was conducted. As the spatial situation model and spatial situation model audio are theoretically strongly interdependent, they were treated as interacting covariates, in addition to visual spatial imagery group (high vs. low). The analysis confirmed the dependence between the spatial situation model and spatial situation model audio (cf. Table 7). The results support the assumption that the interaction between the spatial situation model and the spatial situation model audio, together with high self-perceived visual spatial imagery, predicted learners’ semantic knowledge acquisition in the iVR learning environment, R² = 0.26, F(4, 55) = 6.13, p ≤ 0.001 (Tables 6 and 7). These results suggest that learners with high self-perceived visual spatial imagery who perceive themselves as having constructed a stronger subjective mental model of the iVR environment learn auditory semantic knowledge more effectively.
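For an ordinary least squares model with an intercept, k predictors, and n cases, the overall F test and R² are linked by F = (R²/k) / ((1 − R²)/(n − k − 1)), and the adjusted R² is 1 − (1 − R²)(n − 1)/(n − k − 1). The Python sketch below encodes these identities. It assumes k = 4 predictors (presumably the spatial situation model, the spatial situation model audio, their product term, and the visual spatial imagery group), which matches the reported degrees of freedom (4, 55) for n = 60. For F(4, 55) = 6.13, these identities give R² ≈ 0.31 and an adjusted R² ≈ 0.26. The function names are hypothetical.

```python
def r2_from_f(f: float, k: int, n: int) -> float:
    """R^2 implied by the overall F(k, n - k - 1) of an OLS model with intercept."""
    df_resid = n - k - 1
    return f * k / (f * k + df_resid)


def f_from_r2(r2: float, k: int, n: int) -> float:
    """Overall F statistic for an OLS model with k predictors and n cases."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def adjusted_r2(r2: float, k: int, n: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Assumed k = 4 predictors and n = 60 cases, matching the reported F(4, 55).
r2 = r2_from_f(6.13, k=4, n=60)
print(f"R2 = {r2:.2f}, adjusted R2 = {adjusted_r2(r2, 4, 60):.2f}")  # R2 = 0.31, adjusted R2 = 0.26
```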

Table 6 ANOVA table of multiple regression model on semantic knowledge acquisition, n = 60
Table 7 The result of multiple regression analysis on semantic knowledge acquisition, n = 60

6 Discussion

The aim of the study was to investigate the role of the spatial-situational context of iVR in learning, the extent to which pre-training the generative learning strategy of imagination supports learners in their learning process, and which underlying processes are involved when learning in iVR settings.

With regard to research question 1, our results did not support the assumption that pre-training of imagination would enhance learners’ acquisition of semantic knowledge in iVR; the pre-training group was only slightly, and not significantly, ahead of the control group. To explain these unexpected findings, we consider whether the pre-training actually led to better strategy use and whether learners used the pre-trained strategy consistently throughout the session. First, learners were taught a new learning strategy in a relatively short period of time and were expected to use it throughout the learning session. The high adaptivity of the imagination strategy plays an important role here: applying the strategy is entirely up to the learner, and it can only be prompted externally, not controlled. Given the (particularly extraneous) load imposed by the new learning strategy, learners may initially have made an effort to use imagination, and this may have focused their attention; however, they may not have been able to sustain it throughout the learning session, rendering the strategy largely ineffective. This interpretation is supported by learners’ feedback in transcribed interviews, in which they indicated that they fell back on their familiar strategies over time. This phenomenon is known as the mathemathantic effect [12], which describes how learning is hindered when learners are asked to replace pre-existing learning strategies with dissimilar alternatives. In high-performing learners, as our participants appear to be given their high learning scores, it takes the form of novel strategy substitution [12].

Novel strategy substitution occurs when a new learning strategy is introduced to high-achieving learners, whose learning is then suppressed while they attempt to use the new strategy. It arises particularly when different units of knowledge must be integrated by a learning strategy such as imagination, in this case auditory semantic knowledge with visuospatial input [14]. To counteract novel strategy substitution and thus facilitate rather than hamper the learning process, learning strategies need to be trained sufficiently extensively, especially in new and complex learning environments such as immersive virtual reality [15]. One way to do so would be to administer a knowledge test before the pre-training to determine whether learners are high- or low-achieving, and then to adapt the pre-training accordingly. A more adaptive and individual pre-training would also correspond to the adaptivity of imagination as a learning strategy. In this way, individual learners’ needs could be addressed while ensuring that the training has the intended effect and that the strategy is applied in a way that promotes learning.

In addition to the need for thorough training in the strategy, imagination as a learning strategy in iVR environments may be more effective when cognitive schemas are already in place. As imagination itself requires considerable cognitive capacity and attentional focus, it is challenging to maintain when the learner is simultaneously confronted with many new pieces of information, such as the multiple sensory and semantic inputs during the iVR learning session [13, 27]. Although the learners in our study performed well on the semantic knowledge post-test, they indicated that they largely lacked prior content knowledge, which may have led to a lack of focused attention, resulting in novel strategy substitution and reduced learning effectiveness. This idea is consistent with Wittrock’s [105] model of meaningful learning, which emphasizes the importance of attention in effectively selecting, organising, and integrating sensory input into schemas [60]. Learners’ attention could therefore be supported in parallel with the use of imagination, for example through signaling, which aids selection and organization and could free capacity for integrating the learning material through the learning strategy. The VR environment was designed to enhance learning by integrating visual and auditory information, moving beyond mere memorisation towards deeper, multi-sensory engagement. Despite the environment’s visual richness, the auditory descriptions, which constantly referred back to the visual environment, were necessary to fully grasp the historical and artistic contexts, requiring learners to link the audio imaginatively with visual cues to construct a comprehensive understanding. However, the effectiveness of this integrative approach varied between learners.

The lack of significant differences between conditions suggests that some learners may have struggled with the cognitive demands of processing complex multimodal information in such an iVR environment. This points to a potential challenge for cognitive processing and attention in immersive environments, where learners may focus on either visual or auditory information to the detriment of the integrative task. These findings highlight the need for instructional designs that provide more effective cognitive support in iVR, in addition to a carefully designed learning environment. Future research should explore different levels of guidance to optimise cognitive integration and improve learning outcomes in visually stimulating iVR environments. Addressing these instructional challenges can help to increase the educational potential of immersive technologies.

Regarding research question 2, our results did not support the assumption that pre-training imagination as a learning strategy strengthens the learner's mental model. One possible explanation is that learning in an iVR environment may be highly sensitive to attentional processes. The pre-training group performed slightly better on the spatial knowledge subcategories of wall and ceiling paintings, possibly because these were linked to the auditory knowledge information. Here, the not-yet-consolidated learning strategy may have limited learning gains and diffused the learners' attentional focus. That the pre-training group exercised different attentional and regulatory control than the control group is suggested by the finding that the control group slightly outperformed the pre-training group in remembering spatial details of the iVR environment, such as room size and room objects. While the control group seemed to pay more attention to spatial details, the pre-training group may have been engaged in alternative processing, such as trying to apply the new learning strategy. The different attentional processes of the two groups are also reflected in the results of the subjective mental model, where the pre-training group outperformed the control group in the spatial mental model. However, the control group outperformed the pre-training group in their perception of imagining the auditory content and in their spatial presence. The pre-training group's attention, shifting between imagination and their own accustomed learning strategies as well as between auditory and visuospatial input, may have reduced their spatial presence while increasing their sense of experiencing the spatiality of the virtual environment. In neither condition were attentional processes sufficiently conducive to the acquisition of semantic and spatial knowledge.
The intended effect of the pre-training was to merge visual and auditory information so that learners could meaningfully integrate all knowledge information internally, thus promoting deep learning. However, this may not have been effective, as learners seem to have had difficulty fully focusing their attention on integrating the knowledge information through imagination. This suggests that pre-trained imagination strategies may need to be accompanied by attentional guidance for students, such as prompts for maintaining the learning strategy or support for prior schema construction [22]. Learners may need to have automated and internalised the imagination strategy in order to apply it consistently and thus integrate auditory knowledge with spatial information into a coherent mental model. The potential interaction between attentional processes and the effectiveness of pre-training as a learning strategy may require special consideration in iVR, especially compared to less immersive media, where learners can process environmental stimuli more naturally. As proposed earlier, this could be addressed through supplemental support that specifically directs attention.

The subsequent exploratory analysis illustrated the attentional control required and the mental load imposed by the different information channels that learners have to process, indicating how cognitively complex the input is when learning in an iVR environment [104]. The focus was on self-reported visual spatial imagery as a central prerequisite for constructing a meaningful mental model and thus facilitating learning in an immersive virtual environment. The results indicate that learners with higher self-perceived visual spatial imagery who also perceived the learning environment and the imagined auditory information in the room as interacting showed greater semantic knowledge acquisition. This suggests that coherent spatiality may have a positive effect on the learning process if certain learner characteristics are present. However, a similar effect was not found for spatial knowledge. This may be due to the different channels through which the information was conveyed. Semantic knowledge was presented auditorily, and learners had the opportunity to focus their attention on both visual and auditory input. Thus, coherent auditory knowledge could be linked to the visual-spatial input of the situational environment, especially by learners with high visual-spatial imagery abilities. Spatial details, by contrast, could only be captured incidentally, because no attentional guidance directed learners toward them. Learners had to identify and capture these additional visual spatial details themselves, which may have required additional cognitive capacity for further selection and organization, rather than following naturally from the rich visual input of the chapel.
Even if learners did not perform well on spatial knowledge, they might have constructed a meaningful mental model that, while not necessarily objectively accurate, is useful for integrating auditory semantic knowledge into a more comprehensive internal schema; this in turn highlights the adaptive character of imagination as a learning strategy. As learners struggled to acquire spatial knowledge incidentally, it is important to test the effectiveness of various signaling or auditory cue strategies [1] that help learners focus their attention on learning-relevant spatial details. Such research would aim to determine which cues best help learners attend to and integrate various spatial details, thus enhancing their overall spatial understanding, and how this matters for learning the information presented in iVR. According to Wirth et al. [104], visual spatial imagery is the component that ensures a vivid spatial image of the environment is created in the learner by supporting the cognitive meaning of the spatial structure. Given the impact of visual spatial imagery on learning outcomes, future studies should explore targeted training programs designed to enhance this ability. This could involve controlled experiments measuring the efficacy of such training in improving iVR learning experiences and supporting learners with low visual spatial imagery in their learning process. With respect to an imaginative learning strategy, the learner's visual spatial imagery ability might act as a counterweight that compensates for, or suppresses, the learning strategy when the strategy has not yet been internalised. In addition to more extensive training of imagination to ensure that the integration of knowledge information can take place, more emphasis could be placed on training visual spatial imagery and on the meaningfulness of embodied space as a learning environment.
The adaptive nature of imagination strategies points to the need for iVR learning environments that dynamically adjust to individual learner profiles and needs. Further research could develop and test environments that modify instructional support based on real-time assessments of learners' cognitive load and visual spatial imagery abilities. Spatiality training may facilitate the formation of a stable mental model of the immersive virtual environment, which can subsequently be enriched with semantic knowledge to enhance learning and strengthen schemata [35, 39, 41, 52, 63]. Our study underscores the importance of considering individual differences when designing iVR content. Future work should examine in more depth how pre-assessment of visual spatial imagery and other cognitive skills can guide the creation of personalized learning paths that adjust content complexity and instructional strategies accordingly [93]. In particular, the long-term effects of imagination and visual spatial imagery training on learners' ability to process complex multimodal information in iVR should be assessed. Such research could explore how sustained training interventions influence learning effectiveness and cognitive engagement over time, taking into account learners' individual profiles and learning strategies [87].

7 Limitations and future directions

One limitation is that learners' prior knowledge was self-reported rather than measured by a knowledge test, so knowledge gain cannot be determined. Although self-reported prior knowledge provides insight into the extent to which learners had encountered the learning content or visited the chapel in person, self-reports sometimes diverge from objective test results. To counteract this, future studies could build students' prior content knowledge before they enter the iVR learning session, which could also benefit the successful implementation of a learning strategy. Similarly, learners' visual spatial imagery ability was self-reported. Although subjective, this assessment may nonetheless be appropriate here, since it is the subjective construction of a spatial mental model that might benefit the learning of knowledge information. Future work should include additional measures, such as knowledge tests, mental rotation tests, or physiological measurements.

Another aspect to discuss is the internal consistency of the knowledge test, as well as the differing internal consistency scores across the cognitive load types. One potential explanation for the lower values for intrinsic and germane load compared to extraneous load is that the items measuring extraneous load are likely to reflect a single, similar concept. Extraneous load often includes more homogeneous indicators that consistently reflect similar types of distraction in the learning environment, potentially leading to higher inter-item correlations. This implies that changes in one item will likely predict changes in another. In contrast, germane and intrinsic load often encompass a wider range of cognitive processes that depend heavily on the specific characteristics of the material being learned and on the learning strategies employed [87]. According to Stadler et al. [90], germane and intrinsic load may reflect a formative rather than a reflective measurement model, in which different questions assess diverse learning strategies or types of learning activities. Similarly, knowledge test performance is formative in nature, as each question provides unique information that contributes to the overall knowledge score. The items therefore contribute uniquely to the overall construct and are less directly interchangeable, which leads to lower internal consistency scores [90].
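The argument about homogeneous versus heterogeneous items can be made concrete with Cronbach's alpha, a common internal consistency index, α = k/(k−1) · (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜₒₜₐₗ the variance of learners' sum scores. The minimal sketch below assumes alpha as the index and uses two small, entirely hypothetical score matrices (five learners, three items); these are not the study's data. It shows that when items move together, as in a reflective scale, alpha is high, whereas items that each capture a different facet, as in a formative construct, yield a low alpha even though every item has the same variance.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a learners x items score matrix.

    scores: list of rows, one row per learner, one column per item.
    Sample variances are used throughout; the divisor cancels in the ratio.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])  # variance of sum scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical "reflective" scale: items rise and fall together.
homogeneous = [
    [1, 1, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 5, 4],
    [5, 5, 5],
]

# Hypothetical "formative" scale: each item captures a different facet,
# so the items are only weakly related to one another.
heterogeneous = [
    [1, 2, 4],
    [2, 4, 1],
    [3, 1, 3],
    [4, 5, 2],
    [5, 3, 5],
]

print(f"homogeneous items:   alpha = {cronbach_alpha(homogeneous):.3f}")
print(f"heterogeneous items: alpha = {cronbach_alpha(heterogeneous):.3f}")
```

With these toy matrices, alpha is roughly 0.98 for the homogeneous items and below 0.1 for the heterogeneous ones. The low value does not mean the heterogeneous items are poor measures; it reflects that alpha presupposes interchangeable indicators, which is why it may understate the quality of formative scales.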

A further limitation is that we did not assess whether learners effectively carried out the imagination pre-training or to what extent they applied the learning strategy during the learning session. During pre-training, speaking the presented scenarios aloud prompted all participants to engage with the pre-training content. However, future research could include more time-intensive and extensive pre-training to ensure that learners fully process, adopt, and internalise the learning strategy. Learners could, for example, practise the strategy by sketching their imagined ideas after each segment, or by linking them more closely to personal experience and then expressing them aloud, until the strategy is internalised. Furthermore, the learning strategy could be guided by scaffolds during the learning session, such as auditory or visual attention cues.

The study suggests that iVR environments can be highly sensitive to attentional processes, that learners are challenged by the amount of sensory input they have to process cognitively, and that learning strategies such as imagination need to be carefully implemented to effectively support learners. Because imagination is highly adaptive and difficult to control externally, its use ultimately remains the learner's own responsibility. To fully exploit the potential of imagination as a learning strategy, and thereby stimulate learners to integrate the learning material and engage in deep learning, a more individualized implementation of this strategy should be considered. To use learning strategies profitably in educational settings, both the characteristics of the learners and the effects of different learning supports must be considered, so that appropriate adaptations and extensions can be provided. Learners who need further support in selecting and organizing information could receive additional aids, such as signaling, alongside the use of imagination. High-achieving learners, on the other hand, could benefit from more intensive and longer pre-training. Our contribution highlights the need for further in-depth research to understand the impact of emerging technologies such as VR on the learner and how they can be transferred into settings that promote learning and meet learners' needs. The focus in this regard should be on harnessing the potential of iVR and on seeing the learner as a self-efficacious individual equipped with existing resources and mental models that emerging technologies can further strengthen and activate.
In future research, the potential interaction between attentional processes and learning strategies could be examined by adding measures such as eye tracking, enabling a better understanding of learners' attentional shifts and of the underlying complexity of cognitive, affective and motivational processes during learning in immersive virtual environments. Learners' personal characteristics, learning processes and capacities should be given greater consideration when designing learning settings, so that individual learners' needs are addressed more directly.

8 Theoretical and educational practice implications

This study contributes to the understanding of constructivist learning within iVR environments. As an interesting avenue for future research, and as already applied in other learning contexts [31], the ideas and results presented here can be used to rethink constructivist ideas in the context of immersive multimedia learning. From a theoretical perspective, our study contributes to the constructivist framework by suggesting that active engagement through imagination must be carefully embedded in order to facilitate the integration of multimodal information and thus promote deeper learning. This aligns with and extends cognitive load theory by highlighting how strategies such as imagination and abilities such as visual spatial imagery may modulate cognitive load, particularly germane and intrinsic load, to enhance learning outcomes. In practical terms, our findings have important implications for the design of educational settings in iVR. The study highlights the need to consider cognitive factors such as prior knowledge, attention, cognitive load, strategy use, and individual differences in visual-spatial imagery abilities when developing iVR learning scenarios. By focusing on these elements, educators and researchers can create more effective learning environments that not only engage learners but also substantially improve their ability to acquire and integrate knowledge. Moreover, our research encourages the development of personalized learning interventions that cater to individual learner profiles. This could lead to more adaptive learning systems that dynamically adjust the complexity and type of content presented to optimize the learning process for each learner. As a data basis for such adaptation, iVR technology captures rich information about actual learning behaviour (e.g., eye movements), which can be explored through more fine-grained process analyses [5, 34].
This information can be used to identify features of iVR environments that enhance or impede learning and to modify the environments accordingly. Such advancements could change how educational content is delivered and experienced in iVR, making learning both more effective and more enjoyable. In conclusion, this study provides insights that contribute to both theoretical and practical advancements in the use of immersive technologies for education. It paves the way for future research to further explore and refine the techniques and strategies that best support learning in complex digital environments. In this manner, the learner is encouraged to assume responsibility for their own learning process.