1 Introduction

The purpose of this study was to examine the challenges that adult participants experienced in immersive virtual reality (I-VR, referring to systems with high-end technical properties). The recent global COVID-19 (coronavirus disease) pandemic directed attention and interest towards alternative ways to secure workers' hard skills development remotely. At the same time, virtual reality (VR) technologies have emerged with promises to deliver autonomous skills training within emulations of true-to-life scenarios. However, best practices for VR application development remain largely tacit knowledge, and theory-based instructional principles are scattered across various publications. To advance the autonomous training capabilities of VR, we believe that it is necessary to study the factors that might hinder the self-study of skills using VR and to start building towards a unified model of immersive learning design.

As the availability and stability of “immersive interfaces” have increased and costs have decreased, companies from various fields have shown interest in adopting them for employees' skills training (Sagnier et al. 2021). According to Agrawal et al. (2020, p. 404), immersion is “a state of deep mental involvement in which the individual may experience disassociation from the awareness of the physical world due to a shift in their attentional state.” Extended reality technologies that enable immersion have offered access to moderated spaces and scenarios that surpass obstacles related to accessibility, time, costs, resources, and safety (Vasarainen et al. 2021). Researchers in the field of VR training have pointed out that industries are rapidly adopting VR technologies as part of the tools to deliver various learning content, including skills training (Carruth 2017; Radhakrishnan et al. 2021; Radianti et al. 2020; Xie et al. 2021). Through successful simulations of true-to-life environments and situations, VR training shows promise for facilitating knowledge and skills transfer from simulations to the real world (Dobrowolski et al. 2021; Ricca et al. 2021; Waller et al. 1998). VR training has been adopted in various domains, such as first-responder training, medical training, transportation, military training, interpersonal skills training, and workforce training (Carruth 2017; Xie et al. 2021).

Many review studies have offered comprehensive assessments of the efficiency of I-VR training applications in diverse fields such as education (Jensen and Konradsen 2018; Radianti et al. 2020), rehabilitation (Elor and Kurniawan 2020), and industrial skills training (Radhakrishnan et al. 2021; Xie et al. 2021). Skills training in I-VR learning environments (I-VRLE) can provide flow-inducing opportunities for trainees in various domains to practice typical situations and complex tasks in safe and controlled environments. However, researchers have raised concerns regarding the lack of explicit use of learning theories in the design of VR learning applications (Radhakrishnan et al. 2021; Radianti et al. 2020). Furthermore, most applied theories were general rather than medium-specific; that is, they rarely considered the use of immersive interfaces and the presentation of immersive media for the creation of learning experiences and spaces (Radhakrishnan et al. 2021).

In the future, workforce training conducted in I-VRLEs is expected to offer self-regulated autonomous training activities within emulations of real-world scenarios that are modifiable, controllable, and evaluative (Korhonen et al. 2022). In fact, a recent review of industrial skills training by Radhakrishnan et al. (2021) claimed that 77% of the reviewed applications were already applicable for autonomous remote training because they featured built-in multimedia instructions. However, the inclusion of virtual instructions does not necessarily guarantee a seamless learning experience, and adaptive intelligent tutoring within I-VR hard skills training software can still be considered a novelty (Laine et al. 2022). Furthermore, some industrial training situations may require the presence of a trainer to offer early feedback on task performance (Xie et al. 2021) and to oversee simulation issues. Immersion and a desirable transfer of training are not guaranteed if interaction with the interface is unsuccessful, if the learner encounters cybersickness symptoms, or if the learner is unable to perform the simulation's training tasks because of inadequate psychological fidelity (i.e. improper mapping of basic psychological theories to produce the intended behaviours and processes) (Ho 2020). This could be especially problematic if a trainee had to discontinue distributed or autonomous training without reaching its intended goals. To inform the design and development of training, tutorials, or even adaptive systems, recent research has suggested that the factors that hinder immersive professional training should be identified and studied further (Obukhov et al. 2023).

Increasingly, VR content for distributed skills training has been developed to offer autonomous training within the privacy of a mistake-tolerant personal space (Radhakrishnan et al. 2021; Xie et al. 2021). To investigate autonomous immersive professional training of complex skills and develop supportive tools for VR training, we joined forces with a Finnish I-VRLE development company, Upknowledge (Upknowledge.com), in a research–practice partnership (RPP; Coburn and Penuel 2016) project. The company explored the possibility of gathering deviant behavioural data for the development of an artificial intelligence (AI) tutor for VR-based training applications, while we investigated the structure and constraints of immersive learning. Together, we aimed to improve the autonomous capabilities of VR-based training of complex skills for industrial maintenance and assembly (IMA) workers. We defined the complex skills for IMA as procedural skills (i.e. mastery of processes and sequences of actions related to operational and assembly tasks or safety procedures) and decision-making skills (i.e. realisation and use of relevant actions or abilities when attempting to perform a task). In our initial RPP discussions, the VR trainers explained that, regardless of their background or profession, trainees repeatedly encountered certain challenges while training within I-VRLEs that required guidance, even when built-in instructions were included.

Overall, VR application research has typically been conducted with fully developed simulations to assess their usability or user experiences (Radianti et al. 2020). Some studies have measured their efficiency in delivering learning outcomes (e.g. Johnson-Glenberg et al. 2021) or transfer of training (e.g. Abidi et al. 2019). However, to meet both the needs of the RPP project (gathering evidence and data on deviant behaviours for machine learning) and our research interests (investigating the structure of immersive learning and comprehending the challenges it might present for autonomous training), we decided to run an early-development self-study I-VRLE with automated instructions, practical problem-solving tasks, and self-discovery of game mechanics.

The purpose of this study was to evaluate the experiences of immersed learners and to detect the challenges that users may experience during adaptation to skills training using I-VR technology. We gathered and analysed multifaceted data from 168 university students and staff members who played an I-VR simulation game in its early development to answer the following research questions (RQ):

  • RQ1. What challenges did the participants experience regarding the I-VR system?

  • RQ2. What challenges did the participants experience regarding the immersive learning situation?

The simulation featured five scenes of practical problem solving with self-discovery of VR functions and game mechanics, and one scene with a basic procedural assembly task. We sought to identify the various digitally created elements, actors, and actants involved in the immersive learning entanglement and to examine the immersed learners' relationships with them. We analysed the participants' self-reported problem statements from the surveys and video-stimulated recall (v-SR) interviews to derive the factors and components present in immersive learning with this I-VR system. Our findings indicated that immersive learning with I-VR consists of at least five principal factors that may present challenges for the learner during training. We present a thorough qualitative description of the experienced challenges in Sect. 6 and five main implications for the design and development of VR skills training applications in Sect. 8. By exploring the structure of immersive learning, introducing a repository of potential challenges, and offering design implication insights, we hope to contribute to a future where research findings and theory-based instructional principles are more readily available for the development of VR training applications. Post-evaluation of the presented findings and their implications is required to determine their significance for I-VR-based skills training development.

2 Related work

In this section, we review works related to skills training with VR simulations to understand contemporary research and development needs and the requirements that I-VR technology places on training providers and participants. Furthermore, we review medium-specific learning theories that apply to I-VR simulation games and suggest design and development principles for immersive learning. Our particular focus is on complex skills training (i.e. the practice of procedural skills and decision-making skills) with VR in the field of IMA. Systematic reviews of industrial skills training have disclosed that although some researchers have used different learning theories to design VR skills training applications, the explicit use of learning theories, especially medium-specific theories for immersive learning, remains scarce (Radhakrishnan et al. 2021; Radianti et al. 2020). In the following subsection, we review the literature regarding the current state of the transition of skills training towards the inclusion of VR simulations.

2.1 Skills training transition

Complex skills training aims to improve workers' overall efficiency through the practice of the necessary procedures, equipment, teamwork, skilled actions, safety regulations, and mental readiness. Wong et al. (2023), among others, detailed how large investments in immersive training technology can be expected in the near future. Through skills training, employees develop the necessary autonomous work habits and procedures, whereas I-VRLEs offer modifiable, controllable, and evaluative platforms to support their iterative practice (Korhonen et al. 2022). According to Radianti et al. (2020), I-VR applications have also been used to teach various subjects in higher education, but most often procedural and practical skills. Their use for skills training has been met with positive attitudes, engagement, and high expectations (Wong et al. 2023). The shift from physical mock-ups to psychologically relevant virtual simulations is underway (Kozlowski and DeShon 2004).

Several industrial domains apply I-VR simulations as part of their training packages, such as first-responder training, medical training, transportation, military training, interpersonal skills training, and workforce training (Xie et al. 2021). By comparison, according to Carruth (2017), hands-on and “on-the-job” workforce training methods in the real world can be costly, risky, and logistically heavy. For instance, human–machine system operators are expected to master various devices, and their diversifying knowledge and skill requirements have led to longer training periods (Petukhov et al. 2017). Furthermore, even though firefighters' work environments consist of a variety of situational threats and patient encounters, firefighters have trained in benign or staged physical simulations that are unable to emulate the risk or elicit the physiological sensations involved in true-to-life high-pressure situations (Steffen et al. 2019; Wheeler et al. 2021).

The skills training that could benefit from I-VRLEs ranges from surgical procedures (Huber et al. 2017) to IMA training. For instance, in traditional maintenance training, demonstration sessions must be set up with experts, whose time resources and costs determine their availability (Gutiérrez et al. 2010). However, I-VRLEs can deliver emulations of real-world equipment, spaces, and events whenever they best suit the trainee's schedule. In fact, Radhakrishnan et al. (2021) found that 77% of their reviewed studies (N = 60) featured autonomous training capabilities (namely virtual multimedia instructions). When meticulously designed, adaptive environments can afford opportunities to practice procedures and tasks repetitively without delay or risks of injury or environmental harm, thus offering a viable substitute for training in complex and high-risk situations (Zahabi and Abdul Razak 2020). However, VR training practitioners have indicated that even with autonomous training capabilities, trainees continue to encounter challenges in VR applications that may require guidance, instructions, troubleshooting, and so on.

The use of I-VRLEs and I-VR devices places specific skill demands on trainees. They are expected to master the controller functions to perform the game mechanics (Kao et al. 2021); use visual–spatial abilities to skilfully navigate the virtual space, recognise distances, sizes, and depth, and handle virtual objects (Radhakrishnan et al. 2021); and apply bodily motor skills to perceive their surroundings and activate virtual–physical affordances (Ho 2020). Hence, we can expect that, without practice in these skills, novice trainees will experience some challenges related to the skilful use of the I-VR system (see RQ1). Little work has previously concentrated on identifying the spectrum of potential challenges to inform the design of VR skills training applications. Most of all, I-VR technology is a medium that provides access to situated information within simulated experiences where learning can take place (Jensen and Konradsen 2018). Thus, we review related learning theories and design recommendations for immersive learning situations in the following subsection.

2.2 Immersive learning design and development principles

Mastering complex skills requires practice and repetition. Immersive learning situations for IMA skills training within I-VRLEs offer simulations of authentic work scenarios in which trainees may practice procedural skills and decision-making under realistic conditions. To ensure effective psychological fidelity in skills training simulations, developers could benefit from instructional principles regarding the design of simulation features, capabilities, and surroundings based on basic psychological theories (Kozlowski and DeShon 2004). In a recent review of VR applications for skills training, Xie et al. (2021, pp. 3–4) discussed and detailed a process for developing realistic and targeted skills training scenarios: (1) identify training objectives, (2) design learning scenarios, and (3) implement with I-VR systems.

During the first phase, developers use and combine task analysis methods and frameworks to identify the learning objectives for the training that guide the simulation construction. They could aim to derive all the information needed and factors involved in completing a profession's tasks (task analysis, Xie et al. 2021). On the other hand, they could aim to identify the units and structures of the desired goals and how best to achieve them (hierarchical task analysis, Salmon et al. 2010). Alternatively, to identify the relevant parameters necessary to emulate authentic work and tasks, they might aim to describe the working conditions, systemic constraints, cognitive requirements, and the ways in which functions and purposes may be achieved through behavioural performance (cognitive task analysis, Kozlowski and DeShon 2004; cognitive work analysis, Salmon et al. 2010).

According to Xie et al. (2021, p. 3), learning scenarios (i.e. tasks, surroundings, baseline configurations, milestones, and randomness level) are then designed during the second development phase based on “detailed descriptions of how people accomplish a task” or a set of tasks to be trained. Kozlowski and DeShon (2004, p. 12) emphasised that learning scenarios do not necessarily have to mimic true-to-life events as long as they elicit previously identified “theoretically based constructs and processes” and include “measurement systems to track those constructs and processes as they unfold during the simulation experience.” In other words, the learning scenario's fidelity should be sufficiently high to allow learners to manipulate key elements and gain meaningful lived experiences (Korhonen et al. 2022). In their review of educational and VR training applications, Radianti et al. (2020) derived several design elements that had previously been implemented in procedural-practical knowledge training. The first two development phases can inform which basic interactions and realistic surrounding factors should be included in the learning scenario. Furthermore, they can inform what virtual object assembling is required, where and when to offer immediate multisensory feedback and instructions, whether the users should receive virtual rewards or knowledge tests during the simulation, and whether there are moments when interaction with others, passive observation, or moving around are applicable. Descriptions of these and other derived design elements for different types of VR training applications can be found in the extensive work of Radianti et al. (2020, p. 14).

The third VR application development phase is to apply proper hardware and generate the necessary software elements for the designed learning scenario (Xie et al. 2021). This is where medium-specific learning theories and prior research can inform developers about applicable devices and content generation. Regarding learning within three-dimensional (3D) virtual environments, Fowler (2014) introduced the idea of pedagogical immersion and suggested ways to choose technological elements based on pedagogical affordances. They suggested that developers (and organisations) should design for learning by (1) identifying and defining learning requirements and intended learning outcomes, (2) matching task affordances (i.e. functional properties of training methods or educational technologies, such as I-VR) with the learning requirements, and (3) specifying the appropriate learning objectives and activities. For a general affordance framework of virtually assisted activities, Steffen et al. (2019, p. 721) provide information on how to match immersive technologies with certain learning requirements.

According to Xie et al. (2021), the procedural generation of virtual content for a learning scenario is often a demanding process with which game engine tools and 3D modelling software may assist. Overall, the effectiveness of I-VR use depends significantly on the presented environmental characteristics and the organisation of the learning material (Zinchenko et al. 2020). According to the cognitive theory of multimedia learning (Mayer 2014), humans actively engage in cognitive processing of auditory and visual material through separate information processing channels with limited simultaneous processing capacity in order to construct mental representations of their experience. Following Mayer's (2014) design implications, the generated virtual content for skills training should have a coherent structure and offer guidance on how to build the intended knowledge structures. Improper instructional and content design can overload a learner's working memory capacity and lead to less effective model building and more extraneous processing (Mayer 2014; Zinchenko et al. 2020). Of course, I-VR skills training experiences are not only audio-visual presentations but also engage a person's body, mind, and self (for a comprehensive collection of learning theories, see Illeris 2018) in socially constructed situations through interactions with aspects of the environment. To that end, Johnson-Glenberg (2018) suggested numerous general guidelines and gesture-rich design principles for the creation of embodied learning experiences within educational I-VR applications. For instance, the use of hand controllers for active and body-based learning and the proper mapping of gestures could support the transfer of training from practice to targeted tasks (Ho 2020). Korhonen et al. (2022) provided a more theoretical work that combines I-VR-specific training and tutoring design ideas with embodied cognition.

Lastly, research can also help guide developers' choices by providing information regarding the challenges and negative effects of previously implemented design attributes. For instance, Radhakrishnan et al. (2021) reviewed literature in which challenges in certain design attributes hindered the effectiveness of skills training. Researchers have suggested apt animations and improved haptic feedback as viable solutions to improve the realism of object interaction (Barkokebas et al. 2019). Researchers have also called into question whether more immersive features automatically lead to better knowledge gains, specifically when they may also lead to more extraneous cognitive load (Makransky et al. 2019). Regarding hardware ergonomics, researchers expect lighter, wireless head-mounted displays (HMDs) with higher resolution to increase user-friendliness (Huber et al. 2018). Finally, each VR application development phase could benefit from co-design with subject matter or process experts and from incorporating playtesting to recognise breakpoints (Johnson-Glenberg 2018). In this context, breakpoints are aspects of the experience that perplex or preoccupy users and pause or hinder their gameplay progress.

The reviewed literature indicates that I-VRLEs could offer effective environments for autonomous and distributed skills training in emulations of real-life spaces and scenarios, so long as the development of the VR simulations is guided by robust instructional principles. Hence, we can expect immersed learners to experience challenges during immersive learning situations (see RQ2) when developers have not applied task analysis methods or considered contemporary theories of learning, motivation, and performance during the simulation's development process. However, as VR training practitioners have noticed, even then trainees may require assistance with their experiences. In Sect. 3, we detail how we set out to derive the breakpoints (i.e. experienced challenges that might preoccupy users in VR simulations).

3 Methods

To access a wide range of experienced challenges and capture participants’ deviant behavioural data we ran gameplay sessions on an early-development I-VRLE. First, to elicit and study challenges regarding the I-VR system (RQ1), the simulation featured self-discovery of VR functions and game mechanics through the first five scenes. Second, to elicit and study challenges regarding immersive learning situations (RQ2), the simulation featured practical tasks in mundane spaces and based the sixth scene’s assembly task on a manual instead of contemporary design recommendations. In this section, we describe the research setting, data acquisition setup, and data analysis methods in more detail.

4 Research setting

4.1 Research laboratory and hardware

We organised a research laboratory at the University of Helsinki in November 2020. In accordance with the RPP, we collaborated with Upknowledge (Upknowledge.com), which designed and built the applied I-VR software: Funland. Together, we set up the I-VR research laboratory with two web cameras, microphones, a custom-assembled gaming PC, and HTC Vive Pro Eye devices (Fig. 1). The setup enabled us to gather video data on immersed learners' deviant behaviour when they encountered challenges within the I-VRLE. The company used the data to develop an AI assistance tool that could recognise struggling trainees and provide apt assistance. Simultaneously, it enabled us to collect data on and trace the participants' experienced challenges.

Fig. 1

Illustration of the data collection setup in the research laboratory. In the middle of the play area, the participants' voice and gestures were recorded using two video cameras and the head-mounted display's microphone. A facilitator observed the simulation from their desk and took field notes

4.1.1 Participants

We advertised the opportunity to participate in the research through various university channels, such as e-mail lists, internal communication boards, pamphlets around the campus, and university course lectures. Altogether, we ran 184 gameplay sessions. However, we excluded from this study participants who misunderstood the post-survey's open-ended question, left it unanswered, or indicated that they had not experienced any challenges. Hence, we conducted the present study on the basis of the gameplay sessions and the complete survey data of 168 participants. These participants were mainly University of Helsinki students who majored in an education degree program (77.4%). Most of them studied in the General and Adult Education program (N = 85). Participation in the research was voluntary and based on informed consent. Each participant received a unique overall patch for participating in the study. The relevant characteristics are presented in Table 1.

Table 1 Overview of the participant characteristics

4.1.2 Immersive virtual reality software

In the present case, the I-VR software was designed to challenge the participants with problem-solving tasks that gradually increased in difficulty, including a basic do-it-yourself (DIY) assembly task. The participants accessed Funland through an I-VR system that largely excluded the outside world, surrounded the participant with a realistic virtual simulation through an HMD, and engaged two or more sensory modalities. Following the suggestions of Laine et al. (2023), we can define Funland as linear interactive-active software. Its users were free to move around in the meaningful parts of the simulation and could interact with various virtual objects, tools, and mechanisms available in the different scenes of the program, which mimicked generic rooms with objects of interest scattered over work surfaces.

The problem-solving tasks measured individual users' initiative to resolve problems and their skilled intentionality (Rietveld et al. 2018) regarding the use of I-VR capabilities in basic everyday activities, such as watering a plant. To solve these tasks, the participants needed to discover the game mechanics and take actions that are common across I-VR skills training platforms, such as grabbing and placing virtual objects, equipping and using a virtual tool, and navigating the virtual–physical space (Fig. 2). Task performance was continuously assessed by the program logic and the facilitator by measuring the time required to complete individual actions (e.g. approaching a work surface and grabbing the correct virtual object).

Fig. 2

An immersed learner examined their controller functions

Despite being updated four times during the project, Funland remained a stable early-development simulation designed to elicit the usual challenges regarding immersive learning within self-study-oriented I-VRLEs. The first five rooms served as a medium for discovering the mechanics and functions of the simulation. From its second update onwards, it also included a sixth scene with an assembly task (Table 2). The program automatically offered pre-coded instructions and hints to the participants at predetermined intervals. After completing a task, the program congratulated the user and warned them before automatically transporting them to the next scene. Each scene added new practical skills and variations of the game mechanics to the mix. The various scenes are portrayed in Fig. 3.

Table 2 Overview of the simulation scenes
Fig. 3

Snapshots of the immersed learner’s point of view from scenes 2–6 from left to right. In scene 2, the learner observes a puzzle box. In scene 3, the learner looks along the narrow corridor before them. In scene 4, the learner holds a spray bottle and looks at a plant in the corner of the room. In scene 5, the learner attempts to use buttons with directional arrows to bring a sign closer to them. In scene 6, the learner examines a partly assembled chair on a table
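The automated guidance described above (timed hints, completion checks, and automatic scene transitions) can be summarised in a minimal sketch. The Python class and names below are our illustrative assumptions, not the actual Funland implementation.

```python
import time


class SceneGuidance:
    """Illustrative sketch of timed, pre-coded hints and completion checks
    for a single simulation scene (not the actual Funland code)."""

    def __init__(self, hints, hint_interval_s=30.0):
        self.hints = list(hints)              # pre-coded hints, in order
        self.hint_interval_s = hint_interval_s
        self.scene_start = time.monotonic()
        self.last_prompt = self.scene_start
        self.action_log = []                  # (action name, seconds from scene start)

    def log_action(self, action_name):
        """Record how long an individual action took from the start of the scene."""
        self.action_log.append((action_name, time.monotonic() - self.scene_start))

    def update(self, task_completed):
        """Called on every tick: play the next hint at a predetermined interval,
        or congratulate the user and trigger the transition to the next scene."""
        if task_completed:
            return "congratulate_and_load_next_scene"
        now = time.monotonic()
        if self.hints and now - self.last_prompt >= self.hint_interval_s:
            self.last_prompt = now
            return f"play_hint: {self.hints.pop(0)}"
        return None
```

A fixed-interval scheme such as this, with no way for the user to request or repeat instructions, is consistent with the instruction-timing challenges reported in the Results.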

5 Research procedure

We conducted the study during the COVID-19 pandemic. Permission to perform on-site data collection was granted by the university's pandemic monitoring group. To obtain permission, we devised and applied thorough hygiene measures during the research procedure, such as wearing disposable masks and gloves and regularly disinfecting the devices. For their gameplay sessions, the participants entered the research laboratory one at a time. They filled out an online informed consent form and a pre-survey regarding prior I-VR and gaming experience beforehand. A facilitator escorted them into the laboratory and monitored the task progress, signs of cybersickness, and the program flow. Before the simulation, the facilitator informed them of the gaming area and its boundaries, potentially incompatible health conditions, the controllers and their functions, and the possibility of cybersickness and ways to prevent it, along with the way to exit the simulation (for a baseline of I-VR ethics-in-practice, see Southgate et al. 2019). The facilitator asked the participants to think aloud during the simulation and to attempt to resolve each scene and task on their own before consulting the facilitator. They would intervene and aid the participant when no progress was made, the program had a bug, or any signs of cybersickness appeared. After the simulation, the participants filled out a post-survey regarding the challenges they had experienced and the support they would have preferred to receive during the simulation.

5.1 Data acquisition

We gathered data in three cycles (Table 3). The first author functioned as the facilitator during Cycles 1–3, and the second and third authors supported facilitation during Cycle 3. During the data gathering cycles, we took field notes, recorded the simulations, and asked the participants to complete pre- and post-surveys. In addition, during the third cycle, we conducted remote v-SR interviews with selected participants, primarily on the day after the gameplay experience. The total active time for data gathering was 6 months.

Table 3 Overview of the data gathering cycles

We used the open-ended responses from the post-surveys as the primary source for the data analysis. The v-SR data supplemented the findings by offering content-rich information and diverse frames of reference. We consulted the video recordings to support deductions from these two primary qualitative data sources. During Cycles 1 and 2, the first author kept a field note journal. To reflect on the observed immersive learning actions and choices, the facilitator asked the participant quick questions and displayed scenes from the video recording after the gameplay session. For Cycle 3, the facilitators devised an observation matrix for challenges based on shared observations, discussions, and preliminary challenge categories derived from the participants' responses to the post-survey. We took notes and logged first impressions into the matrix whenever the participants struggled during the simulation. We used it to scout out and select individuals and episodes to discuss in the remote v-SR interviews. We chose a diverse and representative group of participants so that the interviews would cover the varied challenges observed. Overall, we interviewed 13 participants during the third cycle. The interviewed participants' characteristics are presented in Table 4.

Table 4 Overview of the interviewee characteristics

We developed the v-SR interview protocol in accordance with the best practices of stimulated recall methods (Dempsey 2010; Nguyen et al. 2013; Pitkänen 2015). The interviews consisted of structured general questions and a semi-structured recall phase. During the recall phase, the interviewees watched video episodes of the observed challenges that the facilitator had selected from their gameplay. We asked the participants to describe in their own words what happened during the video episodes, what they were thinking during the events, the challenges they had met, and the support they would have wanted to receive. The v-SR recordings lasted 26.29 min on average. We conducted and recorded them remotely with the videoconferencing application Zoom. Finally, we transcribed the interview discussions from the videos for analysis.

5.2 Data analysis

We analysed the materials in three phases: (1) initial mixed content analyses, (2) triangulation, and (3) quantified data analysis. In this section, we describe how we prepared the data for the analyses and how the survey, v-SR, and quantified data were organised. Working with these varied materials on the same phenomenon supported the researchers' immersion in the challenges experienced during immersive learning.

5.2.1 Content analysis and triangulation

To answer research questions 1 and 2, we applied the methodological standards of Elo and Kyngäs (2008) for inductive and deductive content analyses to the multifaceted qualitative data gathered from the surveys and v-SR interviews. We analysed the materials systematically and carefully considered their categorisation through multiple iterative and triangulating analysis steps. The analyses led us to develop and apply a four-level categorisation: (1) The principal factors were the actors and actants present in the immersive learning entanglement. (2) These actors and actants comprised multiple significant components that may or may not hinder the immersive learning experience in some way. (3) The components had specific problematic component features, which were qualities that the participants referred to in their responses. (4) The participants' responses consisted of problem statements regarding the experienced challenges that addressed their relationships towards the component features (i.e. what made the specific component features challenging for them).
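To make the four-level categorisation easier to follow, it can be expressed as a simple nested data structure. The sketch below is an illustrative Python rendering with one example entry adapted from the Results; the names and the example statement are ours, not part of the coding scheme itself.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentFeature:
    name: str                                              # level 3, e.g. "teleportation function"
    challenges: list[str] = field(default_factory=list)    # derived experienced challenges
    statements: list[str] = field(default_factory=list)    # level 4: raw problem statements


@dataclass
class Component:
    name: str                                              # level 2, e.g. "use of controllers and functions"
    features: list[ComponentFeature] = field(default_factory=list)


@dataclass
class PrincipalFactor:
    name: str                                              # level 1, e.g. "hardware"
    components: list[Component] = field(default_factory=list)


# Illustrative fragment of the coding scheme
hardware = PrincipalFactor(
    name="hardware",
    components=[Component(
        name="use of controllers and functions",
        features=[ComponentFeature(
            name="teleportation function",
            challenges=["directing the teleportation beam properly"],
            statements=["I kept ending up in the middle of a table."],
        )],
    )],
)
```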

In the post-survey, the participants responded to the following open-ended question: “What challenges did you face during the simulation?” We prepared the research material by repeatedly reading their responses and extracting the problem statements into a Microsoft Excel sheet. The participants' responses contained varying numbers of reported and expressed problem statements. We repeatedly evaluated the responses and problem statements throughout the data analysis process as our understanding of the immersive learning experience expanded.

In the beginning, we derived the initial component features via a systematic inductive grouping of problem statements that referred to the same issue. To test these initial categories, we applied deductive analysis steps. Two independent reviewers assigned each problem statement to the initial component feature categories. We combined the two researchers' categorisations into a single matrix to detect any disagreements. We then compared the reviewers' correspondence by calculating the inter-rater agreement coefficient with Cohen's kappa for each initial component feature category. The resulting agreement was not satisfactory, and we decided to discuss and resolve the discovered differences. We used negotiation, problem statement separation, and video inspections as tools to assign the disputed problem statements.
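For transparency, the inter-rater check can be reproduced with standard tooling. The following sketch assumes the two reviewers' category assignments are available as parallel lists; the data shown are placeholders, not our actual codings.

```python
from sklearn.metrics import cohen_kappa_score

# Placeholder data: for each problem statement, the initial component feature
# category assigned by each reviewer (not our actual codings).
reviewer_a = ["teleportation", "controllers", "instruction timing", "realism", "controllers"]
reviewer_b = ["teleportation", "controllers", "instruction content", "realism", "realism"]

# Kappa per initial component feature category, treating each category as a
# binary "assigned / not assigned" decision for every problem statement.
for category in sorted(set(reviewer_a) | set(reviewer_b)):
    a_binary = [int(label == category) for label in reviewer_a]
    b_binary = [int(label == category) for label in reviewer_b]
    print(category, round(cohen_kappa_score(a_binary, b_binary), 2))
```

Disagreements surfaced in this way were then resolved through negotiation, statement separation, and video inspection, as described above.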

Next, we implemented an iterative cycle of systematic inductive analysis. We closely inspected the problem statements within the context of their groups. First, we re-organised and divided the component features into themes, which we later turned into the principal factors. Then, we re-assessed each problem statement on the basis of the material in its component feature group and grouped similar statements to form separate experienced challenges. After deriving the experienced challenges for each component feature, we re-assessed and re-organised the experienced challenges and component features. Through systematic iterative analysis steps and research group discussions, we also derived new component features and re-named the various categories. After rigorous inspection and re-assignment, we sorted 460 problem statements and devised a preliminary experienced challenge matrix. However, we decided to investigate the phenomenon from a more holistic perspective and triangulated the survey findings with the v-SR interview material.

In the v-SR interviews, the participants answered general questions regarding their immersive learning experience and discussed their challenges during the recall phase. We used the Atlas.ti software (atlasti.com) for the content analysis of the v-SR material. Before importing interview segments into the software, we re-listened to the interviews and corrected any mistakes in the initial transcripts. For the content analysis of the v-SR material, we used three of the four levels of the coding system developed through the content analyses of the survey data (principal factors, component features, and experienced challenges) to denote problem statement passages. We read the material repeatedly and coded all segments with references to experienced challenges. The problem statement passages offered detailed descriptions of the experienced challenges from the interviewees' perspective, which allowed us to derive new challenges and component features. We then combined the v-SR findings with the survey framework, on which we based the re-assessment of each post-survey response. After repeating the inductive analysis steps, we further abstracted the component features into components of the principal factors. On the basis of these iterative analysis steps, we finally assigned 481 problem statements to 89 experienced challenges within 22 component features pertaining to 11 components under the five principal factors of immersive learning.

5.2.2 Experienced challenges’ relative proportions

Our preliminary analyses indicated that some participants immediately attributed the challenges of using I-VR to their lack of digital gaming experience. Thus, we decided to investigate the matter further by comparing the relative proportion of the challenges experienced between groups of gamers and non-gamers. We identified these participant groups from the pre-survey. On the basis of the discussions and field notes, we expected that the participants who did not play digital games would experience significantly more problems with the I-VR system than those who played digital games.

To the best of our knowledge, this is the first time that challenges and their relative proportion have been studied in the context of immersive learning. Thus, we relied on conventional statistical methods to interpret and visualise the data. We examined the relative proportion of experienced challenges by comparing between-group differences in relation to gaming experience using an independent samples t test. We completed these computations using the IBM SPSS 28.0 (Statistical Package for the Social Sciences) software for Windows (ibm.com/spss). In the next section, we present the statistically significant results to partially address the first research question.
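The same comparison can also be run outside SPSS. The sketch below assumes a per-participant table in which each participant's relative proportion of, for example, hardware-related problem statements has already been computed (their hardware-related statements divided by all their statements); the column names and values are hypothetical.

```python
import pandas as pd
from scipy import stats

# Hypothetical layout: one row per participant, with a gaming-background flag
# and the participant's relative proportion of hardware-related problem statements.
df = pd.DataFrame({
    "plays_digital_games": [True, False, True, False, True, False],
    "hardware_proportion": [0.20, 0.40, 0.10, 0.35, 0.00, 0.50],
})

gamers = df.loc[df["plays_digital_games"], "hardware_proportion"]
non_gamers = df.loc[~df["plays_digital_games"], "hardware_proportion"]

# Independent samples t test assuming equal variances, as in the SPSS analysis.
t_stat, p_value = stats.ttest_ind(gamers, non_gamers, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```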

6 Results

In this section, we present the results of our examination of the challenges that the participants experienced during the immersive practical problem-solving gameplay sessions. We address the participants' self-reported challenges with the I-VRLE in the following order: (1) those regarding the I-VR system and (2) those regarding the immersive learning situation. The former involves an examination of the relative proportion of experienced challenges as a function of the participants' digital gaming experience. We divided the subsections according to the derived principal factors of immersive learning: hardware, software, learner, learning activity, and virtual–physical space. Each subsection begins with an overview of the components, component features, and experienced challenges, along with samples from the post-survey responses. We then examine the challenges more closely and enrich the examination with interview excerpts.

6.1 Experienced challenges regarding the I-VR system

To answer the first research question, we present the participants' experienced challenges regarding the applied I-VR system. We considered the applied hardware and software to be two principal factors of immersive learning. Through several iterative analysis steps, we derived the components, component features, and challenges that fell under these factors; we present them in the following subsections, alongside the results regarding the relative proportion of experienced challenges.

6.1.1 Challenges related to hardware

Overall, 99 of the 168 participants reported or expressed 137 challenges regarding the hardware. The participants' responses shed light on the following hardware-related component features: controllers, teleportation function, trigger function, release function, accessibility, and cybersickness (Table 5).

Table 5 Overview of hardware-related challenges

The use of controllers and functions received the most mentions among all the components derived in this study. Hand-held controllers prevented some participants from fully realising their abilities within the I-VRLE. It took some time for the participants to become accustomed to the controllers and their interaction depth. Some participants were unsure of the button meanings, and some would have preferred to use their hands instead of the counterintuitive controllers. One interviewee reflected on their experienced challenges with hand-held controllers:

Since I indeed do not regularly play anything, when my partner would tell me to perform a specific action on a console controller, I cannot remember which button out of the millions of buttons to press. It was somewhat similar with this system, that I have just not memorised them yet; what functions to apply and when. (I11)

The participants experienced challenges in operating the teleportation function. The most specific issue with teleportation was how to direct it properly; in the end, the operation and aiming of the teleportation function were integral parts of the same challenge. Some participants grew accustomed to the gameplay functions more quickly than others. Aiming challenges became obvious from descriptions in which the participants mentioned ending up in the middle of a table; that is, the participants intuitively approached teleportation with the intention of teleporting to a location of interest by directing the teleportation beam towards its centre, whereas the beam was designed to be directed towards the floor space. One interviewee explained the reasoning behind their experienced teleportation challenge:

I did not pick up between red and green, and what that meant. And that you needed to point towards the floor where you want it to be. I was thinking about Google Maps. You just click forward, like straight forward, and it takes you one way or another. You cannot point down or up on Google Maps. I mean, that is my only previous experience, I would say, with like a teleporting sort of thing. (I4)

Other function-specific challenges were related to handling the trigger function and using the release function. The participants had trouble rotating, moving, and grabbing virtual objects or equipping virtual tools. Moreover, two participants expressed that they could not fluently remember which button to press to let go of a virtual tool. The essential difference between the virtual object and virtual tool designs was that the tools had additional functions that could be activated. Therefore, they were “equipped” by tapping the trigger, activated by holding the trigger, and then released by grasping the grip buttons. The participants did not go through formal training to become accustomed to the use of the controllers and functions. Furthermore, in some instances, we observed a behaviour in which participants would seemingly attempt to grab objects remotely with the teleportation beam. An interviewee's description of events indicated that their experienced challenge stemmed either from an incorrect recollection of the button meanings and their functions or from misclicking:

The first thing I will always do is a trigger with this hand, because I am right-handed. So, when I pulled the trigger, it automatically does the laser, so I thought maybe this is the way to move things: you need to just point out a laser. (I6)

The participants experienced challenges regarding the technology's accessibility. Some parts of the device and its operations were described as clumsy, the floor level was uncalibrated, the HMD felt uncomfortable, the cable got in the way, or the volume was too low. Furthermore, we associated the rare instances of cybersickness with the applied technology. The participants mentioned feeling nauseous, dizzy, or as if their balance was off. One interviewee mentioned experiencing eyestrain the next day. In one way or another, many aspects of these challenges culminated in the HMD, which could be regarded as the gateway to the VR and to the constraints, affordances, and capabilities it may offer.

But the real thing, that is kind of bothering me, that the headset was heavy on my neck. I know it is not for play, but it is not something that you would want to use for hours upon hours. Or even advice kids to use [over] a long period of time. (I5)

As the above example depicts, there is still work to be done to improve the usability of I-VR devices. Furthermore, we acknowledge that the current technology is not available and accessible to all. Alongside the research procedure, we held demonstrations for university students and staff who had reason to believe that they might not be able to access the I-VR technology (physical restrictions, fear of sickness, or bad prior reactions). Those who dared to try the technology did so initially sitting on a chair to alleviate potential cybersickness symptoms. To summarise, the experienced challenges regarding the hardware had to do either with issues pertaining to the use of controllers and functions or with the devices' usability.

6.1.2 Challenges related to software

Overall, 80 of the 168 participants expressed 108 problem statements regarding their relationship with the software. The participants' responses shed light on the following software-related component features: instruction timing, instruction content, feedback, realism, virtual objects, and system stability (Table 6).

Table 6 Overview of software-related challenges

The first component of the software factor included features related to reciprocal interaction in the program. The participants felt that the program's instruction timing was off and lacked interactivity. Overwhelmingly, the participants considered the instructions too slow for them. Sometimes, overlapping instructions interfered with participants who were preoccupied with other matters. Furthermore, the participants felt that the program's instruction content was inadequate; that is, the instructions were considered vague because they lacked precision, visual aids, or functional support. This vagueness was often brought up when the participants could not comprehend the given instructions or felt that they were incorrect in some way. Even repeating the same instructions did not help. Overall, the participants had no apparent means of influencing the instruction timing or requesting additional hints or instruction specifications.

It felt like instructions came slowly. They did not feel at all interactive and came automatedly. You would have wished to be able to ask it to repeat them or to give them as written text. (I11)

The participants experienced challenges regarding the feedback offered by the program. Before we added animations to the fifth scene, the participants noticed a lack of direct environmental feedback on the virtual mechanism. Throughout the simulation, some felt that they did not receive enough feedback when succeeding or failing at an action or lacked the desired haptic feedback and necessary controller information. The latter came up specifically during the v-SR interviews when the interviewees explained that they were missing vital information regarding the available functions in the virtual simulation. The feedback-related challenges indicated that the simulation could have been more interactive and utilised various multisensory means to provide feedback.

For example, I try to press a button, but did not recognise that the system was activated. Had I received instructions to keep pressing the button, I do not think I would have struggled. (I1)

The second component included features related to the program's fidelity. The realism of the program felt off to some participants. Collectively, the participants noticed various aspects that differed markedly from reality. Some noticed that one could walk through structures, such as walls and tables. Moreover, it took time for the participants to get used to the simulation's physics, specifically how virtual objects reacted to one another. The participants had no virtual embodiment (i.e. virtual avatar), which felt strange to them, as it was considered to have limited the perception and use of their bodies and hands. The participants also mentioned several minor incidents regarding the program's fidelity: they felt that the virtual objects had peculiar interactive boundaries, movement within the simulation was counterintuitive, some elements of the simulation were artificial, it was brighter in the simulation, or the virtual scene did not adjust well to the perceived boundaries of the physical space.

I am seeing these realistic things in front of me, and I feel I should climb over it, because in the real world I can, but then I cannot. I must walk through; then I feel like a spirit, so I cannot even go through it. I just felt restricted. (I6)

I was really shocked when the two pieces of chair got tangled. How did it even happen? I cannot flip the table, but the chair can get tangled? That is crazy. And I cannot pick up the chair, but I can throw the pieces across; it was very weird. (I2)

A few participants experienced challenges with the clarity of the program's virtual objects in the final DIY assembly scene. They could not differentiate holes from the chair's texture or tell two different yet similar chair parts apart from one another. One interviewee hinted that the lack of virtual embodiment hindered their ability to perceive certain details of virtual objects and progress in the task:

I could not tell where each part belonged. Because I was not able to truly touch them nor feel their shapes. And since I could not handle them as fluently, I could not distinguish where they had holes and pegs. (I13)

Lastly, the participants noticed various system stability-related issues, or program bugs, that prevented them from progressing in the simulation. These included abruptly ending instruction feeds, the program failing to register a successful action, objects colliding and flying out of reach, simulation lag due to stacked objects, virtual tool gimmicks, the cable coming loose and disconnecting the simulation, and disappearing virtual objects. Together with the RPP company, we continuously monitored, addressed, and resolved these program bugs throughout the project.

6.1.3 Significance of prior digital gaming experience

We performed two-sample t tests to examine whether the relative proportion of the experienced challenges differed significantly between the participant groups according to sex, prior VR experience, or digital gaming background. We detected significant differences only regarding digital gaming background. Regarding the use of the I-VR system, the digital gamer group (n = 79) accounted for 113 of the 244 overall notions of experienced challenges, whereas the non-gamers (n = 88) accounted for 131. One participant had missing information related to their gaming habits; thus, we did not factor the challenges they reported into these calculations. As shown in Table 7, when we compared the relative proportions of challenges at the principal factor level, there was a significant difference in hardware-related challenges between the digital gamer group (M = 0.2519, SD = 0.30256) and the non-gamer group (M = 0.3608, SD = 0.33036); t(165) = −2.213, p = 0.028, when equal variances were assumed. Furthermore, when we inspected the relative proportions of challenges at the component level, we found a significant difference in the challenges concerning the use of controllers and functions between the digital gamer group (M = 0.1881, SD = 0.28415) and the non-gamer group (M = 0.3081, SD = 0.32514); t(165) = −2.527, p = 0.012, when equal variances were assumed.

Table 7 Descriptive statistics of differences in the relative proportion of experienced challenges

On the basis of the two-sample t tests, there was a statistically significant difference in the relative proportion of experienced challenges only in the hardware factor, specifically regarding the use of controllers and functions. Hence, the results did not fully meet our expectations, as the groups did not differ significantly in the relative proportion of software-related challenges. To summarise, according to these results, the participants without prior digital gaming experience expressed a significantly greater relative proportion of challenges related to the use of controllers and functions.

6.2 Experienced challenges with the immersive learning situation

To answer the second research question, we examined the participants' experienced challenges regarding the factors of the immersive learning situation. From a learning process point of view, immersive learning situations with I-VR technology are set in a virtual–physical space where a learner engages in one or more learning activities. We have divided this section into subsections to examine the challenges regarding (a) the learner, (b) the learning activity, and (c) the virtual–physical space. Here, we present the derived components, component features, and experienced challenges related to these principal factors.

6.2.1 Challenges related to the learner

Overall, 64 participants expressed 92 problem statements that we considered to be related to the learners. The participants' responses shed light on the following learner-related component features: spatial reasoning, inexperience, language proficiency, state of mind, and agency (Table 8).

Table 8 Overview of learner-related challenges

The participants experienced challenges related to their capabilities, some of which were closely connected to the simulation's final task. Nevertheless, we considered spatial reasoning, language proficiency, and prior I-VR experience to be capabilities applicable in various I-VRLEs. The challenge with spatial reasoning occurred when the participants had difficulty mentally rotating and imagining the assembled final product in the DIY assembly task.

Because the part did not detach, it was extremely difficult to visualise and imagine the spatiality and measures. That was frustrating. (I7)

The participants also struggled with their language proficiency when attempting to comprehend the program’s instructions. They attributed difficulties specifically to the English language and some of the key concepts applied in the DIY assembly task. On the basis of the interviews, it appeared that inadequate English comprehension led to challenges and could sometimes be a by-product of other challenges for non-native speakers.

I think that by far my weakest part was when I got nervous, the English language. So, there was something that I could not, that I should have listened to more closely. (I8)

Some participants indicated inexperience with the technology as a reason for struggling with the use of VR devices and adaptation to the simulation, for losing concentration, for their low self-expectations, and for their need for guidance. During the data collection, some participants used the novelty of the technology to explain why some aspects of the generated experience felt more challenging.

And of course, it must be related to the devices and my lack of experience with gaming. It just does not come naturally to me, and I press the wrong things a lot. (I7)

That was my first time experiencing the VR environment. Any obstacles around me make me a little bit scared and even though I am surrounded by walls, the physical walls are not in my eyesight. (I1)

The participants experienced changes in their state of mind during the simulation. Besides confusion, the participants expressed impatience, frustration, insecurity, tension, disappointment, restlessness, distrust, and irritation. We viewed confusion as a positive and engaging emotion that may support learning. Thus, we did not count it as a separate experienced challenge. However, we considered the remaining expressions negative and more likely to hinder the learning process. The participants typically reported that other challenges influenced changes in their states of mind. Occasionally, a negative state of mind was the cause of other challenges. An interviewee explained how the program's slow instructions made them feel impatient and how those feelings manifested in their actions:

I need to make something happen, so I am just sort of testing everything out. Because I feel like I have received a lack of instructions and do not know what is supposed to be going on. So yeah, I am going to click away. I think it goes back to what I have said about the time in between directions, like, “I am ready, make something happen.” (I4)

Many participants experienced challenges regarding their agency. The distribution of regulation confused them; that is, they were unsure whether they were expected to wait for instructions or whether they could act of their own volition in the simulation. Some were apologetic for their self-agency, that is, for acting of their own volition before receiving the program instructions. Some were disappointed with the degree of regulation imposed by the program; i.e. they wanted more control over their own experiences and decisions.

I felt somewhat frustrated that I was forced to place the bars first, because I thought that the task was simple and that I could perceive how to build the chair on my own. But then it kept repeating the instructions and I gave up on assembling the chair according to my own plan. (I12)

To summarise, we identified capabilities, state of mind, and agency as learner components; we use the term “component” here for categorical consistency. The participants brought these parts of themselves forth in their responses regarding the challenges in this precise immersive learning situation.

6.2.2 Challenges related to learning activity

The learning activity referred to the program's problem-solving tasks, and the experienced challenges hindered the execution of those tasks in one way or another. Overall, 62 of the 168 participants reported 77 problem statements associated with the learning activity. We categorised their responses according to the following component features: task execution, affordance perception, and objectives (Table 9).

Table 9 Overview of learning activity-related challenges

The participants experienced challenges in task execution. Some related specifically to the DIY assembly task's execution, assembly order, or object placement. The participants also experienced challenges due to incoherent performance requirements, missing virtual objects, or the inconvenience of the task layout. In the applied simulation, the problem-solving tasks were scripted; that is, the participants were expected to perform a series of actions in a certain order to solve the tasks. As expected, some participants remarked on even slight inconsistencies in the actions required to perform the tasks. In one of the scenes, a relevant virtual object was deliberately positioned out of sight to capture the physical reactions of immersed learners seeking a missing object, for the purposes of the RPP project's AI tutor development. Finally, the inconvenience of the task layout manifested in various elements: the transparency of the virtual glass windows, the table size, restricted access to the chair's side frame, and task resources that were spread out of view.

It was annoying that they kept repeating the instructions. I was going to get there after I got all the pieces. It could have said nothing until you bring all the pieces to the same place. (I2)

The participants had difficulties in completing certain learning activities because of limited affordance perception; that is, they did not notice or realise the relevant assembly, virtual mechanism, or virtual environment options at first or without hints during the various scenes. Sometimes, they also forgot to utilise the optimal functions or both controllers to solve problems successfully. The latter was evident in the fourth scene, where the participants were expected to make use of both controllers: they had to equip a virtual tool with one controller and use the teleportation function with the other. Until that point, we noticed that the participants had exclusively used the controller in their dominant hand. Furthermore, some of the terms used to describe the virtual objects did not align well with the participants' experiences. The interviewees explained that they struggled to identify certain objects:

I did not really understand what pieces the two bars or the sidebar were. (I5)

Lastly, the participants experienced challenges related to the learning task objectives. At times, they could not comprehend what they were supposed to do. We expected this, as the problem-solving goals were not stated up front if the participant began engaging with the task layout right away. However, some participants simply forgot the goal during the task. The participants typically received only auditory instructions and hints.

I missed the beginning of the instructions. And then I became a bit confused since I had not paid attention to them, and then I had no clue as to what I ought to be doing. (I10)

To summarise, the components of the learning activity factor concerned either the learning activity's objectives or relevance realisation, that is, recognising which actions and objects were most relevant for progressing and achieving the task goals.

6.2.3 Challenges related to virtual–physical space

The virtual–physical space refers to the intersection of the immediate physical space and the accessible virtual environment. Within I-VRLEs, while technology extends learners' senses and abilities in the virtual space, their perception of the physical space becomes limited, which creates multiple potential challenges for unaccustomed users. Altogether, 51 participants expressed 65 problem statements regarding the virtual–physical space. Their responses concerned the following component features: locomotion and navigation, and space and boundary perception (Table 10).

Table 10 Overview of virtual–physical space-related challenges

The participants experienced challenges with navigation and locomotion within the virtual–physical space. They took some time to get used to moving around and realising the various possibilities for movement. Sometimes, they felt it was counterintuitive to use the controllers for movement. Some participants mentioned running into the chaperone (i.e. the play area's safety boundaries). One interviewee explained why they would instinctively try to move around the space physically:

For me, the most challenging thing was to move around there. I was not thinking about how to move in the physical room, but how to progress and move in that [virtual] world and do so as realistically as possible. (I9)

The participants experienced challenges with the composition of their virtual–physical space and its boundaries. At times, they struggled to comprehend its dimensions. The virtual space seemed more boundless than the physical room, and some participants lost their sense of the combined space. Sometimes, the participants ended up in situations where the boundaries of the virtual and the physical formed a narrow space. The possibility of running into the physical walls worried some participants; others forgot about the walls as the simulation progressed and they became immersed in the scenes and tasks.

You get into it quite easily. However, when you receive a notice that there is a wall right in front of you, then you suddenly realise that you must concentrate on where to move and what is going on. (I3)

The virtual–physical space factor included the component of spatial and navigational constraints. In fact, the physical space in the research laboratory was noticeably smaller than the explorable virtual space. Combined with the participants' need to adapt to new forms of controller-based navigation, the constraints imposed by the virtual–physical space were encountered more frequently than we would expect of more experienced I-VR users. In the next section, we summarise the present study and discuss its results.

7 Discussion

This study originated from an RPP project in which we cooperated with Upknowledge (Upknowledge.com) to gather data related to immersed learners' behaviour during adversity and challenges in an I-VRLE. The project's end goal was to develop AI assistance tools for immersive self-study and the training of complex skills for IMA workers. To examine immersed learners' needs for assistance, we collected data on the participants' experienced challenges during immersive learning of practical task completion with I-VR by conducting surveys and v-SR interviews. Through iterative and triangulated steps of content analysis, we analysed and categorised experienced challenges regarding various factors of immersive learning. Our aim was to arrive at a compact yet ample description of the various challenges that may arise during gameplay within self-study I-VRLEs. For this purpose, we ran an early-development linear interactive-active I-VRLE. It featured practical problem-solving tasks with delayed automated instructions, a scripted task path, bounded environmental interactions, and exploration of the virtual space.

The 168 participants, who were primarily female university students in education degree programmes, had little prior experience with VR: 79 had no prior experience, 8 were not at all confident in their I-VR fluency, and 54 reported that they had only tried it before (Table 1). According to immersive training experts from the RPP company, they therefore represented the general I-VR experience level of today's workforce in IMA professions. Moreover, Johnson-Glenberg (2018) proposed that educational applications should always be developed under the assumption that their users will be VR novices. Although most participants identified as female, we found no straightforward differences in the distribution of the female and male participants' experienced immersive learning challenges.

Altogether, we retrieved 481 problem statements associated with the I-VR system (RQ1) and the immersive learning situation (RQ2). We derived 89 separate challenges related to 22 component features under 11 components belonging to the five principal factors of immersive learning. An overview of the factors and components is depicted in Fig. 4. An immersive learning entanglement comprises a learner using an I-VR system to participate in learning activities within a virtual–physical space. Our findings regarding the structure of the entanglement align closely with Beck et al.'s (2020) definition of immersive learning environments, which include both virtual and physical settings and in which the state of immersion is facilitated by the technical system, narrative content, and engaging challenges.

Fig. 4 A framework of the principal factors and components of immersive learning entanglement

The various challenges portrayed the participants' relationships with several aspects of the immersive learning entanglement. It should be noted that the challenges are also intertwined with one another. For instance, some learner-related component features, such as agency, spatial reasoning, and language proficiency, were closely connected with the software's design choices regarding the assembly task layout, restricted virtual object interactions, instruction timing, and the concepts applied in the instructions. Furthermore, some of the challenges experienced under the spatial and navigational constraints component, such as familiarisation with movement and counterintuitive movement, were closely connected with the requirement to use controllers and functions for navigation. Hence, alterations to one aspect could have profound effects across the entanglement network and on the proportions in which learners experience and report challenges. For instance, on the basis of our findings, applying automated voice instructions does not necessarily suffice to develop autonomous VR training applications for remote training. We observed many challenges regarding the lack of reciprocity in the instructions and the virtual environment. The participants failed to establish a reciprocal and personalised relationship with the automated voice instructions: they felt that the instructions were too slow, non-interactive, and content deficient. Furthermore, the participants reported that the I-VRLE did not provide sufficient direct feedback cues in response to their actions or needs. In their VR application development recommendations, Johnson-Glenberg (2018) suggested that users deserve guidance (e.g. pacing, signposting, and object highlighting) and unobtrusive, immediate, and actionable feedback during gameplay. Furthermore, Mayer (2014) implied that the purpose of multimedia design is to guide learners in their mental model-building process. Thus, the importance of further empirical studies on the effects of various design choices and the use of established instructional principles in VR application development cannot be overstated.

In this study, we found a statistically significant difference (at the 95% level) between digital gamers' and non-gamers' relative proportions of experienced challenges regarding the hardware principal factor and, more specifically, the use of controllers and functions component. The non-gamers reported statistically significantly more problem statements regarding the component's experienced challenges. Thus, when arranging immersive learning situations, it would be reasonable to account for trainees' digital gaming proficiency. However, we obtained these results in an I-VRLE where participants had to self-discover the correct functions to operate the game mechanics; prior digital gaming experience might have cultivated behaviours that facilitated the in-game discovery of mechanics. Typically, VR games include an in-game training sequence to introduce game mechanics (Ho 2020). To that end, Kao et al. (2021) compared three modalities of in-game controller tutorials in different types of I-VR games and found that the differences between tutorial modalities became more pronounced in VR games with higher control complexity, with tutorials featuring spatial cues and textual instructions showing clear advantages. Furthermore, they showed that the positive influence of in-game tutorials may extend beyond the mastery of the controls into game performance, enjoyment, and engagement (Kao et al. 2021). In our study, two challenge component categories corresponded to these findings: missing controller information and visual feedback. From the v-SR interviews, we learned that some participants had hoped that information regarding the game mechanics and functions would appear near their controllers and that they would receive instructions and information on the task objectives in text form.

Overall, few previous studies have examined the factors and constraints of immersive learning of complex skills as exhaustively as the present study of an early-development I-VRLE. In the industrial skills training field, researchers have typically concentrated on comparative studies that evaluate time- or score-based measures, and some studies have also measured usability, cybersickness, task load, or immersion (Radhakrishnan et al. 2021). Much like our study, previous studies have discussed hardware-related component features regarding cybersickness and accessibility (Martirosov et al. 2022; Radianti et al. 2020). It has been deemed inevitable that, with VR, some users will experience symptoms of cybersickness (Jerald 2015); more recently, Obukhov et al. (2023) offered a well-rounded review of cybersickness symptoms, their plausible causes, and ways to reduce their probability. In reviewed papers on higher-education I-VR implementations, challenges regarding accessibility included difficulties in device control due to poor display resolution or cable disturbances (Radianti et al. 2020).

Moreover, previous studies have discussed software-related component features regarding feedback and realism. For instance, expert participants suggested the introduction of haptic feedback and more realistic virtual tool interactions to a surgery simulation in a study by Pulijala et al. (2018). More broadly, Li et al. (2020) categorised narratives regarding immersion in a VR operating room (VOR) via semi-structured interviews. They specifically inquired about the uncompelling and unrealistic factors of the simulation and derived four principal narratives for experienced challenges in fidelity (Li et al. 2020): user interfaces (trocar and headset), VOR environment (OR setup, surgery steps, and sounds), team interaction (instructions, camera assist, and mood), and personalisation. In the present study, we considered fidelity to be a component of the software. The experienced challenges were typically related to the designed environment, apart from movement fidelity, lack of virtual embodiment, and unrealistic physics; nonetheless, we consider that they all result from design choices regarding the program and its contents.

In this study, we also derived and discussed previously unaccounted-for component features and experienced challenges related to I-VR systems, such as instruction timing, instruction content, and system stability. Furthermore, we examined challenges pertaining to immersive learning situations related to the learner (capabilities, state of mind, and agency), the learning activity (relevance realisation and objectives), and the virtual–physical space (spatial and navigational constraints). Our study brought forth new inquiry areas and post-evaluation needs related to the study and design of immersive learning. The assessment of the challenges and the development of innovations that pertain to the technology are ongoing. For instance, even though some of our study participants were concerned with the lack of virtual embodiment, the experimental results of Ricca et al. (2021) suggested that the partial embodiment of virtual hands had no impact on performance during motor training compared with visualising only the equipped tools. Furthermore, Chen and Chen (2022) began developing technologies to recognise HMD users' emotions from their facial expressions, which could prove beneficial for interpreting the learner's state of mind during immersive learning. Finally, the visual behaviours of I-VR users could be assessed through eye tracking, and the resulting insights incorporated into innovations to support immersed learners (Pastel et al. 2023).

Lastly, the analysis of the v-SR data enriched our understanding of the tendency of experienced challenges to form clusters; that is, certain issues tended to follow one another or occur in groups. Some survey responses also featured interlaced attributions between the expressed challenges. For instance, one participant reported that they became frustrated when they could not execute the DIY task and speculated that their changed state of mind further impaired their logical thinking. These observations might indicate that 1) experienced challenges have priorities, and 2) unresolved primary challenges might lead to further experienced challenges. If so, the recognition and treatment of primary challenges becomes vital in I-VRLEs to enhance their autonomous training capabilities. These assumptions should be evaluated further in experimental settings in the future.

8 Summary of design implications

The purpose of educational VR is to offer spaces for learners to embed new concepts in their mental models (Johnson-Glenberg 2018; Mayer 2014). Training of complex skills (i.e. procedural skills and decision-making) requires attentive and active practice of relevant tasks under sufficiently realistic conditions and with suitable guidance. Three VR application development phases (1. identify training objectives, 2. design learning scenarios, and 3. implement with I-VR systems [Xie et al. 2021]) were introduced as part of the related work in this study. Furthermore, we reiterated that it is important that these phases are informed by psychological theories of learning, motivation, and performance (Kozlowski and DeShon 2004). In this section, we highlight what we consider to be the main implications of this study for the development and investigation of VR-based skills training simulations. For a concise summary, we close with five implications, one for each of the principal factors of immersive learning in I-VR.

  • Self-discovery of mechanics. Regarding the implementation of the I-VR system, the study findings imply that the self-discovery of game mechanics and VR functions (i.e. how to make something happen with the controller) led to challenges when no in-game support for it was included. With VR training simulations, it might be more beneficial to offer, for instance, spatial controller tutorials (Kao et al. 2021) instead, allowing learners to concentrate on practicing how, when, and where they should apply game mechanics (read as “relevant actions”) during simulation training. Indeed, developers might want to build up the complexity of the game and phase in game mechanics accordingly when designing learning scenarios (Johnson-Glenberg 2018). In the future, adaptive systems could assist in the regulation of such scaffolds and fade them away once learners display sufficient skills.

  • Reciprocal interaction. The participants regarded the automated voice instructions as too slow, non-interactive, and content deficient. Furthermore, the simulation did not offer sufficient feedback cues to meet each participant's needs. Overall, some participants appeared to have been unable to establish a reciprocal relationship with the I-VRLE. From the ecological-enactive cognition perspective, skilled intentionality, actions, and responsiveness manifest within affordance-rich sociomaterial landscapes (Rietveld et al. 2018). In the early-development simulation, the social and environmental cues available for the participants' actions were too scarce and limited. It follows that simply adding automated voice instructions to a VR skills training application might not provide sufficient autonomous capabilities for remote training.

  • Competence-supportive design. The attribution of challenges to one's inexperience, the challenges pertaining to the learner's state of mind, and the shortcomings of the software's feedback and instruction content indicated that the learners' need for competence (i.e. the psychological need to exert a meaningful reciprocal effect on their surroundings [Legault 2017]) was also likely at play during immersive learning. Thus, we encourage researchers to contemplate future enquiries into competence-supportive designs in I-VRLEs to measure their impact on and variations in immersed learners' engagement, effort, and learning outcomes. We also encourage VR application designers and researchers to consider competency-related principles, theories, and valid assessment measures (e.g. optimal experience [flow] in adult learning, EduFlow-2 scale [Heutte et al. 2021]) when designing the learning scenario and generating software content, for instance, by incorporating dynamic assessment of competency and increasing the complexity and challenge as learners' skills grow (Johnson-Glenberg 2018).

  • Relevant tasks and objectives. In this study, the participants experienced challenges regarding the relevance realisation and objectives of the simulated tasks. These observations mostly pertained to the final DIY assembly task and some of them could be related to the novelty of conducting tasks within I-VR (e.g. object placement and perceiving functions) or to the perceived difficulty of the task (e.g. assembly task, assembly order, and assembly options). However, the incoherency of the performance requirements, inconvenience of the task layout, and unclear objectives should be addressed during the development process. The implications are three-fold: First, user testing could help discern whether task performance requirements seem consistent throughout the simulation. Second, when it is not crucial to the trained task, the virtual environment should not add complications to the task (e.g. having the chair parts scattered out of sight added excessive and irrelevant complexity to the DIY assembly task). Third, clear and goal-directed tasks should follow from a thorough identification of relevant work parameters and training objectives. Displaying those clear and professionally meaningful goals along with the trainee's progress in them might help direct their attention towards the completion of the task.

  • Spatial limitations. Lastly, the results of this study brought forth quite a few spatial and boundary challenges regarding the virtual–physical space. The idea behind applying an I-VRLE for training is to enable learning, not to complicate it. Developers should consider whether navigating around the I-VRLE is a necessary and relevant part of the trained task and learning objectives. Some training simulations feature arm-scale spaces in which learners remain seated or standing without the possibility or need to move around (Radhakrishnan et al. 2021). If the skills training requires movement, developers can implement VR treadmills (see Xie et al. 2021) or introduce the game mechanics of teleportation early on and make sure trainees master them, so that trainees transition from physical modes of movement (walking and running) to virtual ones (teleportation).

As Overton (2023) pointed out, the effectiveness of I-VR for impacting learning outcomes depends on developing and providing medium-specific and personalised simulations with clear objectives, active and engaging learning strategies, feedback, and possibilities for interaction. In a recent randomised controlled trial, Johnson-Glenberg et al. (2021) reported comparable results with added emphasis on design factors supporting embodiment and agency. In general, the research and development of VR skills training applications could benefit from a clear and compact set of instructional principles of immersive learning design informed by basic psychological theories. We derived the experienced challenges and design implications from findings regarding the constraints and breaking points of immersive learning in an early-development I-VRLE. More research is required to assess the development processes, design elements, and actor networks of effective I-VR training applications. Such applications could be used as testbeds to study and evaluate various design implications, such as those provided in this work.

9 Limitations

This study has several limitations. First, we conducted the study in experimental settings in a research laboratory. Without the global pandemic, we could have gathered data at a workplace with an I-VRLE designed specifically for the occupation's complex skills training. Going forward, the demand for longitudinal experiments and the evaluation of I-VR-based skills training in authentic settings remains high (Jensen and Konradsen 2018). Such studies could improve the comprehension of immersed learners' potential challenges and facilitate better design and development of VR applications. Furthermore, the identified challenges are likely to be closely connected with the applied program, Funland. The no-frills simulation was still under development and did not include some well-established features and visual aids of I-VRLEs, such as spatial highlighting. Moreover, the participants received only a partial controller briefing before the simulation, whereas VR games typically offer in-game practice and tutorials (Kao et al. 2021). Nonetheless, the early development phase of the program served the purposes of the RPP project well, as many types of deviant behavioural data were captured, and we received many self-reported problem statements from which to derive the structure and potential challenges of the immersive learning phenomenon. Indeed, the goal of the RPP project was to elicit as many diverse challenges within an early-development self-study I-VRLE as possible without risk to the participants' health. However, the investigation of the self-reported survey statements somewhat detached the experienced challenges from the situations in which they occurred. In future inquiries, we encourage researchers to consider how the immersive environment and training situation may have factored into the challenges of embedded immersed learners.

A strength of this study was the large amount of mixed-methods data that we managed to collect. Furthermore, the collected material was content-rich and multifaceted, which afforded the triangulation of the results. The survey data provided specific mentions of experienced challenges, while the interviews enriched them with contextual knowledge. The various video recordings (gameplay and interviews) and the facilitators' field notes supported our deductions. However, we advise against drawing conclusions by assigning significance to the detected challenges based solely on the number of problem statements they generated; one must consider the circumstances (e.g. an early-development simulation intended to generate deviant behavioural data) and the self-reported nature of the data analysed in this study. Thus, further comparative research is required to determine the significance of the derived potential challenges. This study contributed a categorisation of challenges that may occur during the product testing and early development of a VR application, a set of design implications, and a mapping of the actants in an immersive learning entanglement.

10 Conclusion

We conclude that the factors and challenges derived from the participants' responses were those that arose in this precise research-setting context, which examined challenges in an early-development simulation. However, it is not difficult to imagine that other factors (e.g. workplace environment and training setup) and components (e.g. trainers, co-trainees, attitudes, training time, group dynamics, automated tutor intelligence and aptness, and simulation creation and maintenance expertise) could be identified in different immersive learning settings (see, for example, Li et al. 2020). The proof-of-concept simulation served the purpose of the study well by providing an environment in which the participants experienced various challenges regarding the I-VR system and the immersive learning situation. The extent of the immersive learning entanglement should become clearer as more studies arrange enquiries into its constraints and challenges in other settings (e.g. factories, offices, higher education, and vocational schools). Naturally, each element and ability added to the environment presents the possibility of trade-offs. We value the study of the constraints of immersive learning and suggest that future work should explore them in the aforementioned true-to-life immersive training settings. With further enquiries into immersive learning and its constraints, we may begin to develop and design appropriate immersive learning settings, task layouts, and digital assistance for the technologically inclined youth and workforce.