1 Introduction

In this article, we describe the use of extended reality (XR) in higher military education at the Norwegian Defence University College (NDUC) to address the teaching and learning of a complex concept. NDUC provides study programs at the bachelor's and master's levels, covering defence and security policy, military leadership, military history, and military operations. Although military studies follow the same pedagogical levels as civilian education, there are distinct differences. Military education is becoming more multidisciplinary, involving weapons training, tactics, organisation, leadership, negotiations, and cultural awareness, to name a few areas. A core challenge is to transfer knowledge and real-life experiences from warzones to the classroom. As in civilian education, military educators use pedagogical and didactic tools as well as modern technology, such as XR, to better achieve the learning goals of their programs. Like their civilian counterparts, they might also find it difficult to explain some complex concepts to their students. An example of such a concept is the threat-based approach to the protection of civilians in armed conflicts.

This concept was developed to help military planners understand the role, utility, and limitations of force in protecting civilian populations from physical violence, given the need for guidance tailored to militaries [6, 7]. However, few policies, doctrines or guidelines tell military planners and practitioners when to do what to protect civilians from different types of threats. The threat-based approach provides concrete threat scenarios and tools to assist military students in understanding and discussing the role, utility, and limitations of force in different situations. It captures most types of violent situations facing civilians in the past 30 years.

The foundational building block of the threat-based approach is a deeper and more systematic understanding of the perpetrators of violence based on empirical studies of past wars. Only when military planners understand why, how, and to what effect perpetrators target civilians is it possible to design timely and tailored military protection responses [32]. The concept analyses threats to civilians along five parameters: actor type, rationale, strategies and tactics, relevant military capabilities, and the anticipated outcome in terms of human suffering. Depending on the answers to these five parameters, it is possible to categorise the types of threats to civilians with the help of eight generic scenarios, each describing a fundamentally different threat situation, demanding different military responses.

First of all, this concept is a thinking tool for planners, making them better informed about how force may be used effectively to protect civilians in particular situations. However, it also redefines the role of armed forces as protectors beyond the traditional understanding found in International Humanitarian Law. It therefore represents “troublesome knowledge”, which arises when the learning topic contradicts the student’s previous knowledge or understanding of the world. This makes it challenging to convey the new understanding effectively. Some possible reasons for these challenges include:

  • protection of civilians is considered as “too soft” for armed forces, something typically left to humanitarian organisations and development actors.

  • protection of civilians might challenge deeply founded concepts of what the role of military force is and should be and represents a “threshold” for students to cross intellectually, professionally, and culturally.

  • the pedagogical challenge might be connected to whom officers are meant to protect, meaning a distant stranger vs. own country or family.

  • the threat-based approach to civilian protection is too complex a concept, with a range of questions and scenarios, representing another “threshold” for additional learning to occur.

These challenges motivated NDUC lecturers to start a project to address the “troublesome knowledge” and develop better teaching tools for the threat-based approach. The initial review of pedagogy, technology and learning theory led to identifying three key components of the project that could potentially help transform students’ understanding of the topic: Immersive Extended Reality (XR) technology, threshold concepts and empathy, as explained below:

XR is an umbrella term for immersive technologies, covering virtual, augmented and mixed reality (VR/AR/MR) [41], that offer computer-generated multimodal experiences. The affordances and identified benefits of XR in learning, teaching and training [19, 36] make it a good technology choice for creating computer-supported experiences that would be difficult to re-create or revive in reality [10, 49]. Educational programmes in XR can stimulate the senses and mirror behaviours, thereby increasing empathy and understanding of various perspectives. In this way, students are made aware of several viewpoints and contextual factors before making decisions. This is relevant for understanding the threat-based approach.

Threshold concepts can be defined as those concepts that are central to the mastery of a subject, requiring a new understanding by the learner [11]. Threshold concepts relate to certain aspects of a learning trajectory that are troublesome to the learner, require a shift in perception of the world (i.e., an ontological shift), and also require the learner to let go of previously accepted conceptions [11]. This pedagogical framework is therefore well suited to address the “troublesome knowledge” inherent in the new approach to protection of civilians, which contradicts the students’ previous knowledge and understanding of the field of study. Thinking in terms of threshold concepts can improve understanding of the reasoning behind the knowledge and affect the attitudes that learners bring to their working practices, potentially provoking a shift in mindset. When learners are able to grapple effectively with troublesome knowledge and integrate it into their existing understanding, they are able to deepen their understanding of the subject. In other words, they could be motivated to reflect critically or even alter their patterns of thinking and acting. The process of leaving the well-known behind places the learner in a state of uncertainty before the new understanding is accepted. Such a shift has implications for professional and personal identities, since professions tend to form both [17, 28, 56].

The aspect of identity leads us to the necessity for acknowledging the perspectives of others, captured by the third pillar of our approach: empathy. In order to understand the phenomenon at the heart of this topic, it is essential to understand the perspectives of victims, the perpetrators of violence, and the military officers set to protect civilians from such violence. Empathy focuses not only on recognizing another person's emotions but also understanding the underlying reasons for those emotions, such as their experiences, cultural background, and personal ethics [48]. In this project, embodiment [40] was used to enhance the understanding of various vantage points and to shift from one role to another.

The major research question of this paper is as follows: How can XR (Extended Reality) aid military staff officers in learning troublesome knowledge about new ways of using force in civilian protection? To answer this question, the XR-based approach, underpinned by threshold concepts and empathy, was tested in a quasi-experimental setting. To overcome the identified knowledge threshold, NDUC, together with the Norwegian University of Science and Technology (NTNU) and industrial partners Fynd Reality and Try, developed an educational programme in VR/XR, consisting of cutting-edge technological solutions and pedagogical approaches, including combined 360° videos and digital embodiment, dialogue with a virtual perpetrator, and collaborative VR landscapes to encourage immersive peer-to-peer learning. The innovative XR approach assists military students in appreciating crucial factors involved in protecting civilians by force in armed conflict.

The article is structured as follows. The following sections present related work and the design of the pedagogically based XR application, and then report the results from the evaluation of the prototype XR program conducted in May–June 2022.

2 Related work

Civilians continue to bear the brunt of negative consequences of war and armed conflict [14, 60]. Since the end of the 1990s, military forces, most prominently in UN peace operations, have been tasked with protecting civilians from deliberate violence perpetrated by armed actors [58]. Lately, NATO has also developed policies and guidelines that expect a more forward-leaning approach towards protecting populations under threat of violence [38]. It is no longer enough to avoid unnecessary and unlawful civilian suffering caused by military operations as described in International Humanitarian Law (IHL). Military forces must move beyond mere compliance with IHL and intervene actively to protect civilians from armed groups that deliberately target them. This is a new and poorly understood expansion of military tasks worldwide.

While IHL provides the most central rules and regulations for the conduct of warfare and armed conflict, it fails to address a major gap: providing physical protection against armed groups that intentionally target civilians. The threat-based approach to protection of civilians aims to bridge this gap from a military point of view. According to Beadle, who has developed the only holistic theory to explain the utility of force to protect civilian populations, it is of primary importance to understand why and how perpetrators attack civilians before designing proper responses [31]. The threat-based approach to protection of civilians encourages a deep understanding of threats to civilians and provides specific ideas on “when to do what” to increase the utility of force to protect. While tailored to modern warfare, Beadle’s insights reflect age-old ideas about the role and utility of force. Sun Tzu famously stated that “[i]f you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle” [57]. However, after almost a decade of teaching this new way of thinking to military planners worldwide, the threat-based approach seems to represent “troublesome knowledge”, demanding new pedagogical approaches.

XR prototypes and applications may answer this pedagogical dilemma, as they may shift the perspectives of military officers if they are exposed to real-life incidents in virtual environments. Access to real-life case studies in a safe virtual environment is critical for military educators when the risks, costs or ethical conditions make in-situ learning impossible. The literature on military education suggests that XR technologies could be very effective in training and meta-analysis of the conflict parameters, e.g., as reported by the US military [47, 53] and other international studies [62]. VR tools are already deployed in a broad spectrum of teaching in military disciplines, and new application areas are being explored [45, 52]. The Ukrainian defence forces use VR for training on different weapons systems such as MANPADS [59]. Other examples are countermeasure training [55], observation training in artillery [23] and tank command [29], among others [1]. The US Army has created the Synthetic Training Environment Cross-Functional Team, which incorporates AR and VR for military training [51]. While VR/AR/XR technologies are widely used in military education [25, 37], there are few examples of implementing XR to teach so-called soft skills to military officers.

Effective XR-based educational tools demand careful tailoring to inspire the right learning. One of the soft skills related to protecting civilians is empathy triggered by drama-based scenarios and embodiment. Embodiment in VR refers to using an avatar’s body that “is apparently spatially coincident with their real body” [40]. To make the user feel embodied, motion capture creates the illusion of body transfer and could stimulate a perception shift that could lead to behavioural change [40]. Behavioural modification [3] is crucial in understanding the threat-based approach because it presupposes the learner understands the feelings, rationale, and motivational drive of all the characters involved in the conflict scenarios.

Empathy is a controversial term with positive and negative angles, interpreted through different disciplinary lenses [9, 63]. It is also greatly influenced by contextual factors [64]. Recent studies conducted by the US Army provided evidence that cognitive and affective empathy could play an important role in the military context [46]. Work by Fernandez et al. [18] claims that exposure to immersive VR could help increase affective and cognitive empathy. VR can be useful in educational settings to supplement competency training by allowing students to observe the situations other people endure [12].

The definition of empathy provided in Army Doctrine Publication 6-22 [61] is a first attempt to understand the term in military contexts: “Identifying and understanding what others think, feel, and believe”. According to McDougall [46], empathy could be taught and assessed in different cases [61].

That is why we employed three major categories of empathy to further explain, implement, and evaluate how military students may learn the threat-based approach better: the affective, cognitive, and associative aspects [48]. Affective empathy refers to one’s personal emotional reactions to others or crises. Cognitive empathy embraces diverse perspective-taking as a mechanism to understand individual motivations and actions. Associative empathy is the sense of proximity with another person [15] related to the military officer’s experiences and motivational drive. All in all, empathic concern is the desire to understand feelings, thoughts, and actions to promote other people’s sense of well-being or alleviate their suffering [16]—the priority of the threat-based approach.

As military operations usually require collaborative work, it was essential to include peer learning in the development of the XR experience. According to Harvard Business Review [42], the peer-to-peer approach can make the learning process more effective and enjoyable for the students/officers.

The overarching pedagogical framework for the XR project, the threshold concepts framework, as briefly presented in the introduction, is part of a broader perspective on transformative learning [17]. As the name suggests, this pedagogical approach is inherently about ‘crossing a threshold’: letting go of previous certainties, i.e., your knowledge base, your skills honed over time, and also your values and beliefs, and embracing a new understanding and mindset. Most learners experience the “Einstellung effect”, i.e., the reluctance and resistance to leave the safety of the known [13]. The totality of “a way of thinking and practising in the discipline” is questioned, which challenges “how to think like an officer” (or medical doctor, architect, engineer, etc.). Once this questioning is initiated, the process is irreversible: once a new understanding or mindset is obtained, there is no going back to the old ways of thinking. A threshold concept is characterised as transformative (changes the way a profession is understood and performed), integrative (connects aspects of the profession previously not connected), discursive (adds to and expands the professional vocabulary and the way the profession is described and talked about) and troublesome (involves the parts of the profession not immediately in view, and that represents “the underlying game” of the profession) [56]. The transition period might be compared to a rite of passage, a liminal stage, and is a stage of uncertainty but also of affordances and expectations [56]. The threshold concept framework and its focus on liminality sit well with so-called VUCA conditions, i.e., social events characterised by Volatility, Uncertainty, Complexity, and Ambiguity, and with the increased multi-disciplinary interest in “wicked” problems [28].

Prior to designing how to implement the project, data was collected to see what possible threshold concepts could be identified from interviews with students, teachers, and subject specialists, and in dialogue with the research group [11]. We established four main areas that resonated and reverberated throughout the interviews and discussions, and that emerged as gravitating points in the narratives presented. These preliminary threshold concepts were (1) liminality, as reflected in the scenarios and areas of conflict, (2) the military perspective in situational awareness (how to read the world), and (3) a revised emphasis on the perception of Otherness or understanding the Other, i.e., a renewed focus on understanding the diversity of actors involved in a conflict (hence the focus on empathy). Finally, (4) a sensitivity to student difficulties, i.e., where and how students may be stuck, is reflected in the identification of what constitutes the “Einstellung effect” for the students.

In the next section, we present the design of an XR experience that has been inspired by the pedagogical theories.

3 Extended reality (XR) multiuser application for human security teaching

3.1 XR application structure

The requirements for the XR application were derived from the learning goals for teaching the threat-based approach. As mentioned in the previous sections, the approach to protection of civilians covers the whole spectrum of violence against civilians. This spectrum is divided into eight categories ranging from genocide to mob violence [31]. Two categories, ethnic cleansing and insurgency, were selected for implementation in the XR application. The application designers judged this to be the best way to include enough variation and complexity of threats whilst still keeping the task simple.

The course requires a combination of individual and group work. Therefore, the XR app needed to support multiple users.

The XR application was developed based on Fynd CORE (see Fig. 1).

Fig. 1 Modular schema of the XR application

Fynd CORE (see footnote 1) is a platform developed in Norway to facilitate learning and training in virtual environments. Fynd Reality has been developing the application over several years, working with partners in the public sector, industry, and education. The guiding principle of CORE is to make modern learning technology not only easily available but also accessible, since VR applications often have a high barrier to entry. By offering a shared experience via either VR or desktop PC, it is possible to connect teachers and students in a more practical way.

Fynd CORE is built on a customized network backend, running on Microsoft’s Azure servers and designed for security, scaling, and functionality across continents, areas in which many of the popular Unity networking solutions are limited. This means that the application can be used in a wide variety of settings, with newer versions of the program also working offline and on LAN networks.

A custom feature, simple progress tracking, was added to Fynd CORE during the development of the prototype. Because the evaluation included metrics such as heart rate and stress, it was important to be able to map events in the program to physical responses in the participants. For each user, a log file is generated with timestamps for key events in the program, for example when the 360° videos are played and during dialogue segments in the dialogue simulation. A video presenting the project and the application can be found at https://youtu.be/DKulNaGoEBc
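The progress tracking described above can be pictured as a per-user list of timestamped events that is later aligned with physiological samples. The following is a minimal Python sketch under stated assumptions: the `EventLog` class, the event names, and the heart-rate sample format are hypothetical, the numbers are invented for illustration, and none of it is taken from Fynd CORE.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical per-user log of timestamped programme events."""
    user_id: str
    events: list = field(default_factory=list)  # (timestamp_s, name) tuples

    def record(self, timestamp_s: float, name: str) -> None:
        # Events are assumed to arrive in chronological order.
        self.events.append((timestamp_s, name))

    def event_at(self, timestamp_s: float):
        """Return the most recent event at or before the given time."""
        times = [t for t, _ in self.events]
        i = bisect_right(times, timestamp_s)
        return self.events[i - 1][1] if i > 0 else None


def label_samples(log: EventLog, samples):
    """Attach the active event name to each (timestamp_s, bpm) sample."""
    return [(t, bpm, log.event_at(t)) for t, bpm in samples]


# Illustrative data only: event names and values are made up.
log = EventLog("participant_01")
log.record(0.0, "video_360_civilian_start")
log.record(480.0, "video_360_civilian_end")
log.record(600.0, "dialogue_segment_1")

heart_rate = [(10.0, 72), (500.0, 80), (610.0, 95)]
labelled = label_samples(log, heart_rate)
```

Labelling each sample with the most recent event is only one possible alignment; windowed averages around each event would be an equally plausible choice.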

3.2 Design of the threat-based experience

The XR application consists of three parts: (1) Combined 360° videos and embodiment; (2) Dialogue with virtual perpetrator; and (3) Collaborative map exercise.

(1) Combined 360° videos and embodiment: In this first section the user is placed ‘in the shoes’ of two different characters, a civilian and a perpetrator (Figs. 2, 3, 4).

Fig. 2 Screenshot of a combined 360°-video and embodied experience as a civilian

Fig. 3 Screenshot of a session in front of a virtual mirror to facilitate the embodiment illusion, embodied as a perpetrator

Fig. 4 360° video with an embedded avatar of the user embodied as perpetrator

While embodying these virtual characters, the learners observe two different 360° videos with a scene unfolding in front of them. To facilitate the embodiment illusion [43], the learner is first placed in a scanned environment matching the scene in the upcoming movie, in front of a virtual mirror, in the virtual body of a civilian or a perpetrator. The learner’s movements are mimicked by his/her avatar in the mirror (Fig. 3).

The virtual avatar was set up to mirror the users’ real movements using Inverse Kinematics (IK), a common approach in VR development. The FinalIK plugin for Unity was used to implement this [https://assetstore.unity.com/packages/tools/animation/final-ik14290]. This method of synchronizing a user’s movements is not without faults, especially when it comes to compensating for different limb lengths in different people (the height of users varied greatly during the evaluation). To mitigate IK issues as much as possible, the embodiment section was performed in a seated position for both characters.
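To illustrate the general idea of inverse kinematics, the sketch below solves a planar two-bone arm analytically with the law of cosines. It is a textbook simplification written in Python and is not how FinalIK works internally; the function names and the segment lengths are assumptions chosen for illustration.

```python
import math


def two_bone_ik(l1: float, l2: float, x: float, y: float):
    """Return (shoulder, elbow) angles in radians for a planar two-bone arm.

    The shoulder sits at the origin; the elbow angle is the bend relative to
    the upper arm. Targets out of reach are clamped to the reachable range.
    """
    d = math.hypot(x, y)
    d = max(abs(l1 - l2), min(l1 + l2, d))  # clamp to reachable distance
    # Law of cosines gives the elbow bend.
    cos_elbow = (d * d - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))
    # Shoulder angle: direction to target minus the offset caused by the bend.
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow


def hand_position(l1, l2, shoulder, elbow):
    """Forward kinematics, used here only to check the IK result."""
    ex, ey = l1 * math.cos(shoulder), l1 * math.sin(shoulder)
    return (ex + l2 * math.cos(shoulder + elbow),
            ey + l2 * math.sin(shoulder + elbow))


# Same controller position, two hypothetical arm lengths (metres).
target = (0.3, 0.4)
short_arm = two_bone_ik(0.25, 0.25, *target)
long_arm = two_bone_ik(0.35, 0.35, *target)
```

For the same hand target, the two arms yield different elbow angles, which is one way to picture why an avatar with fixed proportions can look wrong for users whose limbs are shorter or longer than the model's.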

To strengthen the embodiment illusion further, the surrounding virtual environment was populated with realistic props from the movie set. To obtain these props, the movie set was scanned after filming by a professional photogrammetry crew and processed during development. Due to the possible constraint of having to run on a Snapdragon XR2 mobile chip, the fidelity of the scan was reduced to the point where it can run without issue even on a Meta Quest 2. Because the focus of the user is supposed to be their own presence in the scene, the lighting reflects this and leaves most of the virtual room in darkness.

In the following 360° video scenes, the users were still able to see their virtual bodies by looking down and observing their lower body and hand movements, which followed the movement of the VR controllers, to sustain the embodiment illusion (Figs. 2, 3, 4). At the same time, while watching a 360° video with a full-body avatar in VR, the users might still feel that their body is disconnected from the filmed environment around them, which has no depth. Scans were utilised again here to replace the immediate surroundings of the user with scanned props that were removed from the set while filming. In this way, the users can see and ‘feel’ the furniture (tables) in front of them while watching the movie, bridging the gap and serving as a layer between the two-dimensional (360° video) and three-dimensional (avatars) elements. Since the props were scanned on set, their lighting matched the lighting seen in the video. The 3D elements therefore feel well integrated in the full picture, and the result is an experience divided into three layers. The layer closest to the user is the virtual body, corresponding to their movements and visible when looking down. The second layer is the area surrounding the user, with the scanned props acting as a bridge between the virtual body and the recorded movie set. The final layer is the stereoscopic 360° video. This method of making the 360° video more interactive through embodiment is rather novel and constitutes one of the major contributions of this project.

Both 360° videos had a duration of around 8 min. All actors had directions on how to behave throughout the scene, since 360° video by definition captures everything that is happening and all actors are always visible. The script for the video sequence also had to take into account the embodiment of the user, so that the user would feel integrated in the scene and included in the actions taking place. This was achieved by letting the actors make eye contact with the camera at certain points and give subtle acknowledgements such as nods.

For filming, a Z CAM v1 PRO camera was used, consisting of 9 lenses at f/2.8, with a dual-lens setup for stereoscopic video, shooting in 7K at 30 fps. Although rather powerful, this setup had certain drawbacks that influenced the way the production had to be performed. While planning the shots, the camera's near limit of 1.20 m meant that the scene had to be played out at a distance of at least 1.20 m from the camera, which was placed in the centre. This constrained how close to the camera the actors could act. The f/2.8 aperture of the camera required the whole set, including all corners, to be very well lit in order to avoid noise. The darker tone of the final 360° video was instead achieved with colour grading in post-production. Finally, the camera did not include an internal cooling system, which limited run-throughs to about 5 sessions before the production had to take a break to allow the camera to cool down [22].
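The near-limit constraint lends itself to a simple planning check. The Python sketch below flags floor marks that fall inside the 1.20 m radius stated in the text; the mark names, coordinates, and the `too_close` helper are hypothetical and serve only to illustrate the arithmetic.

```python
import math

NEAR_LIMIT_M = 1.20  # minimum subject distance stated in the text


def too_close(marks, camera=(0.0, 0.0), limit=NEAR_LIMIT_M):
    """Return the names of floor marks closer to the camera than the limit."""
    return [name for name, (x, y) in marks.items()
            if math.dist((x, y), camera) < limit]


# Hypothetical blocking marks in metres, with the camera at the origin.
marks = {"table_edge": (0.9, 0.5), "doorway": (2.5, 1.0), "sofa": (1.5, 0.0)}
flagged = too_close(marks)
```

Here only `table_edge` is flagged, since it lies roughly 1.03 m from the camera.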

(2) Dialogue with a virtual perpetrator: The second part of the XR experience places the students in a conversation with a virtual perpetrator. This scene takes place in a war-torn urban environment, using assets designed by NATO Allied Command Transformation in another project, with changes to better fit the context of our experience. The scene was designed for the user to stand stationary in front of the virtual perpetrator. Various guns and military equipment were scattered throughout the environment, and virtual militia soldiers were patrolling the scene and inspecting the user (Fig. 5). The intention of this set design and these background characters was to give the user the feeling of being present in a real conflict zone, at the mercy of the perpetrator that the user was supposed to interview.

Fig. 5 The dialogue with the virtual perpetrator

In contrast to the 360° video, however, this environment was fully modelled in 3D, which allowed the user to move around and explore it if desired.

The virtual perpetrator was based on an existing character model by Fynd Reality, designed for use in procedure training and dialogue simulation on the multiplayer platform Fynd CORE. This character has previously served as a parent in a parent-teacher conference, as a traffic accident victim, and as a figure dropping off a suspicious package at the Akershus Fortress gate (headquarters for NDUC) in Oslo, Norway. For this specific case, the virtual human was given a new set of clothes and modified to fit the setting. The user is also represented by an avatar in this experience: a more generic male avatar with no backstory known to the user and without the preceding embodiment procedure in front of a mirror.

While conversing with the virtual perpetrator, the user was presented with dialogue options in the form of text buttons in front of the 3D character (Fig. 5). The learners could use their hand controllers to point and select between different responses and questions. The virtual perpetrator responded with a simple AI created by Fynd, set up in a predetermined dialogue tree, but with certain parameters influencing the outcome at any given point. For example, the AI system includes a feedback loop where the user’s dialogue choices influence the mood of the virtual perpetrator. This affects the conversation in such a way that some dialogues can only be reached if the character is in a certain mood when reaching that part of the conversation. In other cases, there is a random element to the dialogue branches, to create variation in the resulting dialogue upon multiple playthroughs. The virtual character has a full set of emotions and facial features created through the use of blend shapes in Unity that have been connected to the AI system. The result is that the simulated mood of the virtual character is reflected in its facial expression, and all lines of dialogue are lip synced during runtime. The character shows hospitality and friendliness but can also react with hostility or appear insulted by some questions. This method of reacting to the player’s actions and input is a major factor in the immersion of this part of the experience.
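The combination of a predetermined dialogue tree, mood feedback, and random branching described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class names, the integer mood scale, and the sample lines are hypothetical and do not reproduce Fynd's AI system.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Option:
    text: str             # line the user can select
    mood_delta: int       # how the choice shifts the character's mood
    next_nodes: list      # candidate follow-up node ids (one picked at random)
    min_mood: int = -999  # option is hidden unless mood >= min_mood


@dataclass
class Node:
    reply: str            # what the character says at this node
    options: list = field(default_factory=list)


class Dialogue:
    def __init__(self, nodes, start, seed=None):
        self.nodes, self.current, self.mood = nodes, start, 0
        self.rng = random.Random(seed)

    def available(self):
        """Options reachable given the character's current mood."""
        return [o for o in self.nodes[self.current].options
                if self.mood >= o.min_mood]

    def choose(self, option):
        """Apply the mood feedback, then move to a (possibly random) branch."""
        self.mood += option.mood_delta
        self.current = self.rng.choice(option.next_nodes)
        return self.nodes[self.current].reply


# Hypothetical content, for illustration only.
NODES = {
    "greeting": Node("Welcome. Sit down.", [
        Option("Thank you for receiving me.", +1, ["small_talk"]),
        Option("Why are your men armed?", -1, ["small_talk"]),
    ]),
    "small_talk": Node("What do you want to know?", [
        Option("Tell me about your plans.", 0, ["plans"], min_mood=1),
        Option("I have no more questions.", 0, ["end_a", "end_b"]),
    ]),
    "plans": Node("Since you ask politely, I will explain."),
    "end_a": Node("Then we are done."),
    "end_b": Node("Leave now."),
}
```

In this sketch, a polite opening raises the mood and unlocks the question about plans, whereas a confrontational opening leaves only the closing option, whose ending is drawn at random from two branches.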

(3) Collaborative map exercise: The last part of the experiment was also created using Fynd CORE, building upon its multiplayer functionality. Each user was represented in the experience by a small “robot” avatar with disconnected hands (Fig. 6). Through CORE’s networking functionality, this part of the experience could have been run completely distributed, with the participants collaborating over voice communications. During the evaluation, this feature was disabled since the participants were all in the same room.

Fig. 6 Screenshot of users in the collaborative map experience

This part of the XR application was very different from the previous ones and was designed as a bird’s-eye view of the same conflict scenario. A virtual map of the fictitious country of Suania was laid out on a table approximately 3 m across, including 18 multimedia clues that the students could engage with and inspect. In total, 40 clues were created, with the remainder to be implemented in a later version.

The room was designed to accommodate 20–30 simultaneous users, should a group of that size ever be necessary. The clues on the map were spaced out across the whole area, and some clues were placed on the walls around it. This was done to encourage the users to move around the space and engage more physically with the map, the clues, and each other. Early drafts of the environment included more tables and chairs, but since they served no practical purpose in VR, they were removed. The map was also scaled up significantly while prototyping, moving from the size of a regular dining table to something much bigger. This means that instead of standing around the map, the users walked on it, changing their perspective considerably during the experience (Fig. 6).

The clues present on the map pointed to the two threat scenarios that run through the whole experience, insurgency and ethnic cleansing. The content was presented in the form of images, press releases, videos, audio clips, drone footage, and more (Fig. 6).

Users interacted with the map through the laser pointer on their controller. This was used both to activate clues, and to point while communicating with other participants. When moving between clues, the active selected clue was synchronized across all users, posing an interesting challenge to maintaining shared workspace awareness while collaborating.
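The shared selection described above can be pictured as a single piece of state that every client observes. The Python sketch below uses a last-selection-wins rule and tells each client who made the change; the class, the clue identifiers, and the participant names are hypothetical, and the actual synchronisation in Fynd CORE may work differently.

```python
class SharedClueState:
    """Hypothetical shared state: one active clue for the whole group."""

    def __init__(self):
        self.active_clue = None
        self.selected_by = None
        self._listeners = []

    def join(self, on_change):
        # Each client registers a callback and receives the current state.
        self._listeners.append(on_change)
        on_change(self.active_clue, self.selected_by)

    def select(self, user, clue_id):
        # Last selection wins; every client is told who changed the clue,
        # which is one simple aid to workspace awareness.
        self.active_clue, self.selected_by = clue_id, user
        for notify in self._listeners:
            notify(clue_id, user)


# Two hypothetical participants, each keeping a history of what they saw.
views = {"anna": [], "ben": []}
state = SharedClueState()
for name in views:
    state.join(lambda clue, user, name=name: views[name].append((clue, user)))

state.select("anna", "drone_footage_03")
state.select("ben", "press_release_01")
```

Because one user's selection replaces the clue that everyone else is viewing, broadcasting the name of the selecting user is a small design choice that may help the group keep track of who is steering the shared view.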

To conclude, during the three parts of the XR application, the students could experience the situation from the perspectives of civilians and perpetrators and collaborate with peers to gain an empathetic overview of the conflict.

4 Evaluation of the XR application

A study was designed and carried out to evaluate the use of Extended Reality for teaching the threat-based approach to protection of civilians. The study received ethical approval from the Norwegian Centre for Research Data (NSD). A mixed-methods approach was chosen.

4.1 Evaluation methodology

4.1.1 Instructional design

A 5-step instructional design was created to support both the learning process and learners in an immersive setting. The sequential order of the five steps using various media created a learning experience that challenged the students' critical and reflective thinking. The core of the learning experience was a fictional threat scenario in Suania. It was developed to enable students (military officers) to experience a conflict from the perspectives of civilians and perpetrators and collectively analyse the existing threats to civilians in Suania while using the guidelines of the theoretical framework [11] in the following steps:

Step one: Since the threat-based approach can be perceived as complex at first glance, it was necessary to “get everyone onboard” before the exploratory journey began. Therefore, the educational program started with creating a common intellectual platform through a lecture. The lecture took approximately 30 min in total and introduced the ideas of the threat-based approach to protection of civilians to refresh what the students had already explored in an e-learning course (https://www.forsvaret.no/en/courses-and-education/human-security-and-the-military-role/).

Step two: The students used VR headsets and controllers and jumped into a civilian family's living room, embodied as the character Uncle George, representing an ethnic minority in the capital of Suania. He came to celebrate his niece's birthday but instead observed the implications of threats (insurgency and ethnic cleansing) against the family (Fig. 2) and the despair of the little girl. This part targeted the emotional empathy of the students through a drama-based, experiential approach.

Using the same equipment, each student embodied a member of a perpetrator group involved in ethnic cleansing (see Figs. 3, 4). This experience focused on cognitive empathy and aimed to mirror the perpetrator's rationale in planning his military strategy. After the end of this step, students had to take a break to avoid cognitive overload caused by the extensive use of controllers and headsets. The break also offered the opportunity to discuss their first impressions with their colleagues and educators.

Step three: This phase was designed to engage the students more with the perpetrator's way of thinking and target students' cognitive empathy. This step could be regarded as a motivational drive that would later trigger their analysis and decisions. Playing the role of an intelligence officer, each student had a dialogue with a virtual perpetrator and chose questions from a digital list (see Fig. 5). The questions indicated concerns about territorial conflicts and civilians' security. After the end of this session, students had a conversation to dig deeper into what they had experienced.

Step four: The focus of this step was on working as a joint forces team on a 3D map of Suania, as military officers usually do in operational planning. Peer learning and listening to the opinions of fellow officers enhanced their analysis. The clues on the map gave them further information via short videos, newspaper headlines, social media posts, and animations. The students could mark their presence as generic avatars and communicate with each other (Fig. 6).

Step five: The final step focused on honing associative empathy via in-depth and reflective meta-dialogues about the scenarios, the people's perspectives and their roles in the scenarios. Associative empathy is about connecting the dots with experiences, knowledge, skills and beliefs that have the potential to act as a motivational drive and urge participants to change roles or make a decision. At this step, students could identify the threats they had dealt with (ethnic cleansing and insurgency), analyse contextual factors, feelings, and arguments, and possibly envision their role in such a conflict.

Each step was designed based on specific learning goals and a specific form of empathy to address ‘troublesome knowledge’ and enable ‘threshold crossing’ through XR storytelling.

4.1.2 Technical setup

The evaluation was carried out at NDUC in Oslo. Reverb G2 headsets were used with VR-ready laptops. The laptops were placed on tables in the same room. Cameras and microphones were used to capture the participants' activities and conversations. Physiological data were recorded for a subset of 15 participants using a Biopac MP150 system (Biopac System Inc., CA, USA), a BioNomadix amplifier, and AcqKnowledge 5.0.6 (sampling frequency of 2 kHz). Our goal was to use a photoplethysmogram (PPG) to index cardiac activity. In addition, electrodermal activity (EDA) was used to index the activity of the sympathetic nervous system. Our hypothesis was that these indexes would provide an indication of stress [34], arousal [24], and empathy [39].

4.1.3 Participants

Nineteen participants, 17 male and 2 female, were recruited. We are aware of the gender imbalance in the sample; however, both the Norwegian Armed Forces and military institutions in general traditionally have far fewer women than men in their ranks. The sample is therefore reasonably representative of our intended target group. All participants were enrolled in the master's programme at NDUC. The military branches of the participants were as follows: Army (6 participants), Air Force (6), Navy (4), Logistics (1). Two participants did not state their branch. The ages of the participants ranged from 34 to 47 with a mean of 39.6 and a standard deviation of 3.4. One participant did not specify their age. The participants' military experience ranged from 14 to 26 years with a mean of 20.3 and a standard deviation of 2.9. Prior to the trial, they received information about the study and signed a consent form. Enrolment in the master's programme was a requirement for taking part. No additional exclusion criteria were defined. No vision correction apart from basic headset adjustment was performed.

4.1.4 Questionnaires

To answer the research question of learning troublesome knowledge (along the dimensions of understanding the theory behind the threat-based approach, empathy and threshold concepts as behavioural shift), an evaluation process was designed as follows. The participants were asked to fill in questionnaires at different stages of the pilot evaluation. Demographics were collected from participants, including sex, age, branch of armed forces, years of service and previous experience with XR. The participants answered pre- and post-questionnaires on the constructs of self-efficacy in analysing threats to civilians in armed conflicts [4], empathy [43], motivation to learn more about the threat-based approach and attitude towards using military force for protecting civilians. The participants were also asked to rate the three modules of the XR application (embodiment/360, dialogue-based simulation, collaborative map exercise) on constructs such as perceived learning [35], cognitive aspects, assessment of the application and general technical aspects. At the end, the participants also answered post-questionnaires on the constructs perceived learning, behavioural change, reflective thinking, general satisfaction, value [2], perceived usefulness [54], and technical aspects.

4.1.5 Procedure

The procedure involved taking part in a lecture and participating in discussions. Participants were welcomed and took part in a lecture of approximately 30 min. The data collection started with the participants filling in a self-report pre-questionnaire. Selected participants were fitted with sensors for collecting physiological data. Then the participants were invited to use the XR application, starting with the 360/embodiment experience, followed by the dialogue-based simulation and the collaborative map modules. They were asked to fill in questionnaires after each experience. The participants had breaks after each module. At the end, they filled in a questionnaire regarding the whole experience. The whole procedure took approximately 5.5 h.

4.2 Results of the quantitative evaluation

Each participant answered five questionnaires during the evaluation. The results of those questionnaires are presented as follows: We first present results comparing answers to the pre- and post-test questionnaires on constructs Self-efficacy, Motivation, Attitude and Empathy. These are followed by listing the results of Questionnaires 2 to 4 that focus on constructs associated with the different modules in the XR application and the overall evaluation of the XR application.

4.2.1 Results from pre-and post-questionnaires

The participants filled in pre- and post-questionnaires on the Human Security and Threat-based approach (construct self-efficacy), Attitude to using military force in protecting civilians (construct attitude), Motivation to learn about Human Security and Threat-based approach (construct motivation) and Empathy. The method employed measures changes in empathetic rationale, behavioural changes, and potential threshold crossing.

First, the participants were asked to rate their confidence in their knowledge (self-efficacy) regarding human security and threat-based approach to protection of civilians. For this part of the study the participants used a 7-Point Likert scale ranging from 1 = strongly disagree to 7 = strongly agree.

The results cluster around positive values for most questions (see Table 1).

Table 1 Modes for questions on self-efficacy

The results suggest that the participants rated their abilities positively, especially in the post-survey, which can be interpreted as an enhanced understanding of the main principles of the threat-based approach. The fact that they did not use the highest values suggests that there is room for improvement.

The results for the statements “I am confident that I am able to identify the threats to civilians in contemporary wars and armed conflict” and “I am confident that I am able to apply the knowledge of threat analysis to protect civilians in military planning processes” illustrate the changes in mode between pre- and post-questionnaires, indicating that the overall pedagogical experience contributed to better understanding of the topic (Fig. 7).

Fig. 7 Results on identifying threats and applying knowledge of threat analysis

The study also collected data on attitude to using military force in protecting civilians before (Table 2) and after (Table 3) the participants experienced the XR application. The participants rated their attitude on seven-point scales ranging from 1 = No responsibility to 7 = Very high responsibility and from 1 = not at all to 7 = to a very large extent. The results cluster towards positive ratings both before and after the XR experience. The change in Attitude is related to a shift in mindset in the context of the threshold concepts framework.

Table 2 Results of Likert scale on Attitude pre-test
Table 3 Results of Likert scale on attitude post-test

The results for the question “What responsibility do you think military forces have for actively protecting civilians from armed groups that attack civilians during armed conflict?” were as follows. Pre-test: some responsibility, 4; high responsibility, 7; very high responsibility, 8. Post-test: some responsibility, 3; high responsibility, 9; very high responsibility, 7.
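As a small illustration of how such response tallies can be summarised, the following Python sketch stores the counts reported above and extracts the modal category for each test. The dictionary layout and the helper name `modal_category` are our own illustrative choices and are not part of the study's analysis pipeline.

```python
# Response counts for the responsibility question, as reported in the text.
pre_test = {
    "some responsibility": 4,
    "high responsibility": 7,
    "very high responsibility": 8,
}
post_test = {
    "some responsibility": 3,
    "high responsibility": 9,
    "very high responsibility": 7,
}


def modal_category(counts):
    """Return the response category with the highest count (hypothetical helper)."""
    return max(counts, key=counts.get)


# Both tallies should account for all 19 participants.
n_pre = sum(pre_test.values())
n_post = sum(post_test.values())

pre_mode = modal_category(pre_test)
post_mode = modal_category(post_test)
```

A check that both tallies sum to the number of participants is a cheap way to catch transcription errors before interpreting any shift between categories.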

The main difference between pre- and post-test results for the Attitude questions is that those who chose a neutral answer tended to choose a more positive rating after the intervention. A possible interpretation of these results is that the XR experience concretized the expectations and views of some participants concerning the obligations and responsibilities of military forces, resulting in a more positive rating. The differences, however, are not statistically significant.

The participants were asked to rate questions related to their motivation to learn more about the human security concept and to use it more actively in their own working practice.

Four statements were presented regarding Motivation. Each statement began with “I am motivated to…”, followed by the text of the statement. The results from pre- and post-tests are presented in Table 4a and b.

Table 4 Results to (a) motivation statements pre-test, (b) motivation statements post-test, (c) empathy statements pre-test and (d) empathy statements post-test

The results for motivation are mostly clustered towards positive, with very few neutral ratings. It is interesting that for the statement “learn more about the variation of threats to civilians in armed conflicts” some participants seem to have changed their rating from the pre-test to the post-test towards a more neutral rating. A possible explanation is that the experience made the topic more concrete and understandable for the students.

The participants also graded four statements regarding Empathy towards participants in the conflict in pre- and post-test (Table 4c and d).

For empathy, there are negative ratings in the pre-test for the statement “I understand the motivation of different perpetrators in an armed conflict”. That is the only statement with negative ratings. This could be partly attributed to the fact that the importance of understanding the motivation of perpetrators, as opposed to victims, is not yet commonly acknowledged. In the post-test there are no longer negative ratings for Empathy, which might imply that the experience highlighted the importance of understanding the motivations of perpetrators in armed conflicts for some of the participants.

In the post-test, there are more answers in the ‘Strongly agree’ category, but the difference is not significant. Some participants might have given socially acceptable answers, felt that their expectations had been confirmed, or been unsure whether their empathy level had changed.

If we analyse the results as interval data, we can compare pre- and post-test constructs such as self-efficacy (Tables 5, 6), attitude, motivation, and empathy.

Table 5 Paired samples statistics for self-efficacy
Table 6 Paired sample test for self-efficacy

The pre-test and post-test means were M = 4.79 (SD = 0.67) and M = 5.56 (SD = 0.70), respectively. As shown in Table 6, the difference is statistically significant, t(18) = −5.22, p < 0.001 (one-sided p = 0.00003; two-sided p = 0.00006).

This difference between pre- and post-test may be due to the shift of ratings from clustering around the “somewhat agree” option to the “agree” option. This could indicate that the pedagogical experience contributed to a better understanding of the Human Security and Threat-based approach among the students. At the same time, the students had taken part in the XR experience consisting of 3 modules when they filled in the post-test questionnaire. They had also had group discussions about the threat-based approach before filling in the questionnaires, which was an integral part of the combined pedagogical design. The students were thus exposed to several factors during the data collection, which makes it difficult to isolate the individual effects that could explain the pre- and post-test difference.
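For readers less familiar with the paired-samples test reported above, the following minimal Python sketch shows how the t statistic and its degrees of freedom are obtained from per-participant construct scores. The scores in the usage example are invented for illustration only; they are not the study data, and the function name `paired_t` is our own. With 19 participants, the degrees of freedom are 18, as in the reported t(18).

```python
import math
import statistics


def paired_t(pre, post):
    """Paired-samples t statistic for pre minus post scores.

    Returns (t, degrees_of_freedom). A negative t means that
    post-test scores are higher than pre-test scores on average.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Hypothetical scores for five participants (NOT the study data).
example_pre = [4, 5, 4, 5, 5]
example_post = [5, 6, 5, 5, 6]
t_value, dof = paired_t(example_pre, example_post)
```

The sign convention (pre minus post) is chosen here so that an improvement yields a negative t, matching the sign of the reported statistic; this convention is an assumption about how the original analysis was set up.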

A Wilcoxon signed-rank test on the constructs attitude, motivation, and empathy provides the descriptives presented in Table 7.

Table 7 Descriptives for the constructs attitude, motivation and empathy

The ranks are given in Table 8. For attitude, there are 2 negative ranks, 3 positive ranks and 14 ties. Motivation has 10 negative ranks, 5 positive ranks and 4 ties, while empathy has 3 negative ranks, 6 positive ranks and 10 ties. The changes between pre- and post-test are not significant for the constructs attitude, motivation, and empathy.

Table 8 Ranks for Wilcoxon signed-rank test (attitude, motivation, empathy)

This was expected given the modes for these constructs. The results suggest that not enough participants changed their rating of statements between pre- and post-test. In most cases we can see ratings changing from neutral to a positive rating such as somewhat agree or agree. However, ratings of disagree or strongly disagree are a minority in all cases, and in many cases all values moved towards positive in a way that appears to keep the same spread.
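To make the rank counts above easier to interpret, the sketch below shows how negative ranks, positive ranks and ties are counted in a Wilcoxon signed-rank test, and how the two rank sums are formed. It assumes that differences are computed as post minus pre, so that a negative rank means a lower post-test score; the original analysis may have used a different convention. The data are invented for illustration and the function names are our own.

```python
def sign_counts(pre, post):
    """Count (negative, positive, ties) for post minus pre differences."""
    negative = sum(1 for a, b in zip(pre, post) if b < a)
    positive = sum(1 for a, b in zip(pre, post) if b > a)
    ties = sum(1 for a, b in zip(pre, post) if b == a)
    return negative, positive, ties


def signed_rank_sums(pre, post):
    """Return (sum of positive ranks, sum of negative ranks).

    Zero differences are dropped; equal absolute differences share
    the average of the ranks they occupy.
    """
    diffs = sorted((b - a for a, b in zip(pre, post) if b != a), key=abs)
    w_pos = 0.0
    w_neg = 0.0
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for d in diffs[i:j]:
            if d > 0:
                w_pos += average_rank
            else:
                w_neg += average_rank
        i = j
    return w_pos, w_neg


# Hypothetical ratings for six participants (NOT the study data).
example_pre = [4, 4, 5, 3, 4, 5]
example_post = [5, 4, 4, 5, 4, 6]
counts = sign_counts(example_pre, example_post)
rank_sums = signed_rank_sums(example_pre, example_post)
```

When most participants keep the same rating, as with the 14 ties for attitude, very few differences remain to be ranked, which limits the ability of the test to detect a change.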

4.2.2 Evaluation of the modules

Participants were presented with questionnaires after they had experienced each of the 3 modules of the XR application.

4.2.3 Embodiment/360 experience

In this module, the participants experienced a fictitious threat situation from the perspectives of civilians and perpetrators through combined embodiment and 360 video exposure (Figs. 2, 3, 4, see also description in the previous sections).

After the experience, the participants were asked to rate perceived learning, cognitive and general aspects, and to provide a general assessment of the module. The ratings for perceived learning were mostly in agreement with the statements (see Fig. 8).

Fig. 8 Results for perceived learning after embodiment

These results can be interpreted as the participants understanding what the goal of the XR application was and therefore having a positive opinion in terms of feeling they got something out of the learning activities.

The majority of participants rated the contribution of the embodiment experience positively (see Fig. 9).

Fig. 9 Results of cognitive aspects after embodiment

The interesting feedback regarding cognitive aspects is the number of neutral answers to the statement “The 360/embodiment exercise helped me to have a better overview of the educational content”. It is possible that participants were more willing to accept the role that embodiment played in the analysis, but some might have been more hesitant to accept a role of embodiment as an educational tool.

The participants rated the technical aspects generally in a positive way. Nonetheless it is interesting that in the statement: “I was completely captivated by the virtual world” there was a high number of neutral ratings (see Fig. 10).

Fig. 10 Results of general technical aspects after embodiment

It is ambitious to expect participants to be fully unaware of their surroundings. Having several participants in the same physical space, together with the investigators, probably did not contribute to an environment where users would disconnect from their surroundings. Nonetheless, the sense of presence seems to have been largely convincing, which might be associated with the appreciation of the visual quality of the scene, judging by the similar ratings. The majority considered the quality of graphics good enough for focusing on the task, with those in disagreement roughly matching the number of negative ratings for the previous statements related to visual quality. This could be a case of some being more practical whilst a handful might have higher expectations of computer graphics. In addition, some of the students experienced technical problems and lagging due to the laptops heating up, which might have influenced their experience. The rating regarding motion sickness effects is not surprising, since usually only a handful of users experience any effects.

The assessment of the module (Table 9) was largely positive, as suggested by the mode (all statements had a mode of 4). Ratings for the statement “I felt that the virtual body I saw when looking at myself in the mirror was my own body” suggest that some participants were unsure about being fully embodied and therefore did not commit to a more positive rating. It is interesting to note that there was considerable variation in the individual responses when it comes to the perceived sense of presence, the value of the experience as an educational tool and the statements regarding perceiving the virtual body as one's own body. This might be attributed to differences in personalities and professional background.

Table 9 Assessment of embodiment module

4.2.4 Dialogue-based simulation

After working with the second module of the XR application (Fig. 5), the participants were asked to rate their experience of interacting with a virtual character representing a commander of a militia group. As with the previous module, the participants were asked to rate perceived learning, cognitive and general aspects, and to provide a general assessment of the module.

The statements on Perceived learning mostly received positive ratings. The statements “What I learned during this XR experience, I can apply in a real context” and “I gained a lot of factual information on the threat-based approach and protection of civilians during this XR experience” have a higher number of neutral ratings (see Fig. 11).

Fig. 11 Results of perceived learning after the dialogue experience in XR

The statements related to cognitive aspects were also rated mostly positively (see Fig. 12).

Fig. 12 Results of cognitive aspects after the dialogue experience in XR

The rating of the general technical aspects was mostly positive (see Fig. 13). It is encouraging that participants did not seem affected by cybersickness. There was a high neutral rating regarding visual quality of the scene, as it might have been perceived as ‘cartoonish’ by some of the participants.

Fig. 13 Results of technical aspects after dialogue

The module mostly received positive assessments, with all statements having a mode of 4 (as shown in Table 10). The ratings for statement “The perpetrator had a sense for what is right and wrong” might be influenced by the difficulty in assessing the virtual perpetrator and acknowledging its agency.

Table 10 Assessment of dialogue-based simulation

4.2.5 Collaborative map exercise

The participants experienced the previous interventions in XR individually using a mixed reality headset. In the collaborative map experience, they interacted simultaneously with other members of the course in groups (Fig. 6). They were again asked to rate statements associated with the experience. The ratings for perceived learning were mostly clustered on “Agree” and “Strongly agree” (see Fig. 14). The last two statements had more spread in the ratings. This could be because the applicability of the insights from this experience might vary for participants from different branches of the Armed Forces, such as Navy vs Air Force.

Fig. 14 Results of perceived learning after the collaborative map exercise

The cognitive aspects were rated mostly positively (see Fig. 15). The number of neutral ratings for the second statement warrants follow-up in the future.

Fig. 15 Results of cognitive aspects after the collaborative map exercise

There was mostly agreement in the ratings of the statements regarding technical aspects (see Fig. 16). It is difficult to interpret the neutral rating for cybersickness, but it might be associated with issues that some participants commented on regarding movement quality during navigation.

Fig. 16 Results of technical aspects after collaborative map exercise

The results from the assessment of the module are presented in Table 11. The mode was 4 for all statements except the statement “There was a sufficient number of clues on the map to facilitate analysis of threats to civilians”, where the mode was 3. The assessment of the module was largely positive, as suggested by the mode values. The ratings for the statement “It was easy to interpret and read the clues on the map” match comments made by some participants who wanted improved clues.

Table 11 Assessment of map module

The experiences in the 3 modules were different from each other; therefore, we are not comparing them directly. The diversity of the experiences was meant to address different pedagogical approaches and aspects of the threat-based approach and to complement each other. No statistically significant difference was found in perceived learning and cognitive aspects between modules. When the students were asked to compare the modules explicitly, embodiment and 360° was considered slightly more useful and engaging than the other two modules.

4.3 General evaluation after the XR experience

4.3.1 Perceived learning, behavioural change, and reflective thinking

After the participants had completed all interventions, they were asked again to rate their Perceived learning (Table 12).

Table 12 Perceived learning combined

Only 18 participants responded to these statements (and only 14 answered the last question). It is interesting to note that there are some small differences between perceived learning after the collaborative map exercise and the combined values. In the question “I gained a good understanding of the threats presented during this XR experience” “Agree” went from 12 after collaborative map exercise to 14 after combined. In the statement “What I learned during this XR experience, I can apply in a real context”, there is also a change in values to a more positive rating. Nonetheless, since the number of participants is different, we cannot make any definite conclusions from this comparison.

Table 13 shows the results for the questions on behavioural change. The results clustered around neutral ratings in three questions, which is reflected by the modes. While the participants did not acknowledge a subjective behavioural change, the majority of them expected to be “better at analysing threats to civilians in armed conflict” after the experience and generally perceived the combined XR experience as “worthwhile”.

Table 13 Results to behavioural change statements

The participants rated the statements associated with Reflective Thinking (Table 14) largely positively. The mode for the statements was 4, except for the statement “After going through the XR experience, I was able to become a better learner”, where the mode was 3. These results suggest that the XR application to a certain extent helped participants to reflect on their understanding of the topic and connect it to their previous experiences.

Table 14 Results to reflective thinking statements

4.3.2 General satisfaction, perceived usefulness and value

Twenty statements were used to assess general satisfaction, perceived usefulness and value using a 5-point Likert scale (Table 15a, b).

Table 15 Statements (a) 1 to 10 for general satisfaction, perceived usefulness and value, (b) 11 to 20 for general satisfaction, perceived usefulness and value

4.3.3 System usability and general technical aspects

The System Usability Scale (SUS) [5] yielded a score of 79.74. The outcome is consistent with the overall positive ratings observed on the Likert scales. Overall, the application seems to be usable for the activities it was designed for. The limited number of participants prevents generalisation based on SUS alone, but the score provides a basis for further improvement.

The additional technical aspects were rated mostly positively (Table 16). The ratings for these statements are consistent with ratings in other constructs and suggest that most users could use the XR application and experienced the benefits of an immersive interactive experience, and therefore rated the statements positively.
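For reference, the standard SUS scoring procedure can be sketched as follows: each of the ten items is rated from 1 to 5, odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5 to give a score between 0 and 100. A study-level score such as the one reported above is typically the mean over participants. The function name and the example ratings below are illustrative assumptions, not the study data.

```python
import statistics


def sus_score(ratings):
    """Compute the SUS score (0-100) for one participant.

    `ratings` holds the ten item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(ratings) != 10:
        raise ValueError("SUS requires exactly ten item ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if item_number % 2 == 1:
            total += rating - 1  # positively worded (odd) items
        else:
            total += 5 - rating  # negatively worded (even) items
    return total * 2.5


# Hypothetical participants (NOT the study data).
participants = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
mean_sus = statistics.mean(sus_score(p) for p in participants)
```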

Table 16 Results from assessment of technical aspects

4.4 Qualitative comments from participants

On each questionnaire there was space for participants to provide voluntary comments. Not all participants commented. The comments are summarized below.

4.4.1 Comments after embodiment/360

One participant commented “The table I was sitting at was too high; it should relate to my real wood table so that my virtual arms are not under the virtual table.” The comment is interesting feedback related to potential mismatch in room calibration or not anticipating a coincidence between real and virtual furniture. The possibility that there were issues with room calibration is reinforced by this other comment “Some issues regarding elevation of my avatar made me look unnaturally small compared to the surroundings”. These aspects might have influenced the effect of embodiment on the participants.

Some comments suggested cognitive overload, for example “Maybe too much specific information regarding places, names, etc. It made me focus on trying to remember names vs getting the overall picture. Maybe actors could be named in advance”.

Other comments revealed some confusion about the context of the experience, as in this one: “Some kind of factual reference i.e. a map or something could be helpful to reference facts, i.e. names and location. Could be factual map, the desk in front of the perpetrator, could be typical "holiday postcard" in vicinity of the …. TV could have been closer in the first scene.”

4.4.2 Comments after dialogue-based simulation

One of the comments questioned the credibility of the dialogue: “The dialogue options were very far from how I would picture a conversation like this playing out”. Another participant pointed out usability issues: “Sometimes a bit difficult to hit the "clicker". Need to press to aim then click and by the time I click it is "deactivated"”. There is an interesting comment reflecting on the options selected in the dialogue: “A lot of 'direct' questions. I was trying to have a friendly and understandable approach but in hindsight I think I should have taken the more direct questions in order to get an understanding on how the perpetrator acts”.

There were suggestions for improving the XR application: “My own reply should be read out loud in order to create better dynamic in the conversation”. A comment provided explicit feedback on the main purpose: “This has potential as an educational tool”.

4.4.3 Comments after map experience

There were some sharply contrasting comments after this experience: “This exercise is tricking me into learning, very nice!” versus “Expectations not met!”. The following comment suggested there are some technical aspects to review: “Would love a much higher details of clues, dots to look at, actions etc. on the map. It is an amazing solution. Would like for it to be easier to remove "pop down"/"get up close" (options on the clues) to read quickly. Mix of audio, video, image and text is very nice”.

4.4.4 Comments after completing all the parts of the experience

The comments emphasised the need to have an instructor supporting the discussion “For learning on the protection of civilians to take place…. a more guided approach is needed.” Other comments included: “Experience was good in getting me to relate to the hardships of the civilian populace”, “Good learning experience, but I would like some more information regarding the background of the conflict. It made it a bit confusing analysing the threat to civilians, and one had to make a lot of assumptions.” One of the comments was very positive “I do think the XR strongly could contribute to enhanced understanding and provide high value. I also think exposure can be increased to challenge and provide learning and understanding.”

4.5 Physiological data collection

EDA (electrodermal activity) data were analysed in terms of tonic and phasic activity. Tonic activity (SC) provides insights into prolonged influences, such as ambient conditions, mood, or overarching arousal levels. Conversely, phasic activity, manifested as the Skin Conductance Response (SCR), pertains to the transient, rapid fluctuations in skin conductance following stimulus presentation. The SCR thus reflects more immediate and transient physiological reactions to specific stimuli.

EDA data revealed that the tonic activity was lower during the dialog module (mean ± STD: 4.6 ± 1.7 μS) than during the other phases of the experience (baseline 8.0 ± 4.3 μS; 360 video 7.0 ± 3.9 μS). Tonic changes occur in the absence of external stimuli and reflect psychological state and autonomic regulation [8]. A lower tonic activity during the dialog module suggests that participants were excited at the beginning of the XR experience and that their emotional arousal then decreased. Another explanation, suggested by post-experiment feedback, could be that the embodiment part was the most engaging for the participants, as they did not know what to expect.

The phasic activity, which changes because of external stimuli [63], was highest at baseline (baseline 0.14 ± 0.07 μS; 360 video 0.07 ± 0.04 μS; dialog 0.07 ± 0.07 μS). It is likely that participants were more stressed at the beginning of the experiment, and that their stress level then decreased [44]. Such a decrease could be explained in two ways that are not mutually exclusive. On the one hand, as the experiment proceeded, participants became more familiar with the situation, and stress levels decreased. On the other hand, XR might play a role in reducing stress [30]. The data collected so far are not sufficient to test our hypothesis regarding indications of stress, arousal and empathy.
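To illustrate the distinction between tonic and phasic activity, the following Python sketch separates a skin conductance trace into a slowly varying component and a fast residual using a centred moving average. This is a deliberately simple stand-in chosen for clarity and is an assumption on our part: it is not the decomposition performed in this study or by AcqKnowledge, and the window length and the synthetic trace are hypothetical.

```python
def moving_average(signal, window):
    """Centred moving average; near the edges only available samples are used."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        start = max(0, i - half)
        stop = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[start:stop]) / (stop - start))
    return smoothed


def split_tonic_phasic(signal, window):
    """Return (tonic, phasic) estimates of a skin conductance trace.

    tonic  : slow baseline, estimated by the moving average
    phasic : fast residual (signal minus tonic), where SCR-like peaks appear
    """
    tonic = moving_average(signal, window)
    phasic = [s - t for s, t in zip(signal, tonic)]
    return tonic, phasic


# Synthetic trace in microsiemens (NOT study data): flat level with one brief peak.
trace = [5.0] * 20
trace[10] = 6.0
tonic, phasic = split_tonic_phasic(trace, window=5)
peak_index = phasic.index(max(phasic))
```

In this toy trace the residual peaks at the sample where the brief response was inserted, while the smoothed component stays close to the 5 μS level, mirroring the way phasic and tonic measures are interpreted in the text.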

5 Discussion

Officers today need new theoretical and practical guidance to envision and experience their potential role in armed conflicts. The major challenge in understanding the threat-based approach lies in seeing through the lenses of all the people involved (civilians, perpetrators, protectors/military forces etc.) and taking into account the specific contextual factors of each scenario of the threat-based approach.

To address this challenge, experiential learning methods and a threshold concepts approach were deployed with XR to engage students with the context via realistic cases. The suggested threshold concepts theoretical framework entails a shift in mentality through reflecting on roles and seeing the consequences of conflicts for civilians in a new light, thus crossing thresholds. This behavioural change seems to be effected through emotional, rational and motivational drives (affective, cognitive and associative empathy). Experiential media such as 360° videos and XR environments can provoke such a shift in mentality to enable officers to recognise the implications of violence for civilians and perpetrators from an empathetic point of view [33]. Understanding the threat-based approach from an empathetic angle means that the students can identify the emotional factors, the various rational or irrational perspectives, and the motivational drives of all people involved, including themselves as joint operation military officers. Of course, a behavioural shift takes time and repetition. Still, immersive technologies allow students to experience virtual environments that may directly impact cognition in several ways. These include perspective-taking [26], embodied experience, mirroring behaviours (under the impact of mirror neurons) and emotions [27]. That is why we adopted an innovative approach in our project, where a “traditional” embodiment in a VR world was integrated with drama-rich 360° videos in the second step to target affective empathy. The subsequent steps were designed to affect cognitive empathy through dialogue-based simulations with virtual humans (perpetrators) and through contextualized peer learning [42]. Therefore, our approach is both novel and firmly based on modern pedagogical theories.

The research question this paper attempts to address is whether, and how, XR (Extended Reality) can aid military staff officers in learning troublesome knowledge. Despite the limited data and convenience sampling, there are some signs that the XR approach could be a useful addition to military education on the threat-based approach.

Learning troublesome knowledge within the field of Human Security is defined as a three-fold approach: XR, Empathy and Threshold Concepts. The preliminary results from our research could indicate that:

(1) XR could enhance understanding of the threat-based approach for the protection of civilians with the intervention of educators who could moderate discussions, run debriefs and provide an explanation when needed.

(2) Although clear conclusions could not be drawn concerning support for empathy, understood as taking the perspectives of others (the civilians’, the perpetrators’ or the military officers’ rationale), the 360° videos and embodiment approach seem to aid the process.

(3) Willingness to think or act differently in war and conflicts did not seem to change significantly, but the improvement in self-efficacy indicates that the participants might be better able to deal with such situations in the future.

The quantitative data indicate that the students' perceptions of learning in the five steps were positive and that they were satisfied with the educational intervention. In brief, the students consider the XR-based educational program helpful, report that it contributes to a better understanding of the threat-based approach to protection of civilians, and support the value of the experience for adopting different perspectives. They also highlighted the importance of the XR experience as an integrated part of an educational program. The intervention seemed to affect students' emotional reactions when watching the 360° videos while being embodied as different participants in the conflict (Step 2). Responses about the role of embodiment were conflicting, for several possible reasons. One reason could be that the students did not become properly accustomed to their avatar body before and while engaging with the plot of the video. This might be attributed to the fact that the participants (partly due to the heightened viewing angle during the 360° sequence) needed to keep looking down deliberately to see their virtual hands and body during the combined embodiment/video experience. The scenario was designed as a passive experience, and the lack of agency (that is, of any need to participate actively in the scenario, such as interacting with the documents or other objects on the virtual table) did not provide natural reasons for the participants to look down. This, together with the limited gaze contact and other interactions with the actors in the video, might have disrupted the participants’ sense of presence and the overall experience. In addition, as the virtual bodies in our case were not ethnically and professionally very different from the bodies of the participants (i.e. white men being embodied as a Caucasian-looking civilian, officers being embodied as generic military-clad perpetrators), it would be interesting to perform further research on the role of embodiment vs context and storytelling.

The first trial of the application with students was important for evaluating the design concepts and identifying areas for improvement. Technically, the XR experience was user-friendly, despite the heating of the computers, which resulted in some performance issues.

The usability results are encouraging, as the SUS score was slightly above average and therefore acceptable. It is worth mentioning that the prototype is rather extensive and was developed under a strict schedule. Based on the overall analysis, the system will be redesigned. The preliminary study shows the potential of the XR project in the field of teaching protection of civilians. It could also be adapted to other areas linked to behavioural changes via virtual experiences.
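For readers unfamiliar with how a SUS score is obtained, the standard scoring rule of the System Usability Scale can be sketched as follows. The sketch assumes the usual ten-item questionnaire answered on a 1 to 5 scale; the function name `sus_score` and the example responses are hypothetical and are not data from this study.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant.

    `responses` holds the ten answers in questionnaire order, each from
    1 (strongly disagree) to 5 (strongly agree). Odd-numbered items are
    positively worded and even-numbered items negatively worded, as in
    the standard SUS questionnaire.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += answer - 1   # positively worded item
        else:
            total += 5 - answer   # negatively worded item
    return total * 2.5


# Hypothetical participant: agrees with positive items, disagrees with negative ones.
example = sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])
```

A study-level score would then be the mean of the individual participant scores.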

The study has a number of limitations. It focuses on examining attitudes and perceptions of the threat-based concept and how students understand it. The threat-based approach is aimed at a rather limited population, i.e. active military personnel with experience and responsibility in planning operations. The Norwegian armed forces had approx. 27,557 active personnel in 2021 (without home guard) [20]. This represents 0.5% of the population (5,425,270) [50]. The approach is taught as part of a master’s program that requires a very specific education and professional experience. As in armed forces in general, in Norway and internationally, there is a significant gender imbalance in the student population taking this master course, something that is reflected in our sample. This imbalance is, however, representative of our main target group. The usability evaluation of the XR application was also carried out from the perspective of understanding the threat-based approach. For these reasons, it is not possible to recruit participants without a military background and then attempt to extrapolate the results.

To summarize, the sample reflects the limited number of available participants. The availability of the participants was rather limited owing to their current work and study commitments. As a result, the convenience sample was relatively modest in size, and it was not practically possible to design this study to include a control group with randomised assignment. While this convenience sample could argue against the validity (internal and external) of the study, it is rather representative of our target group and therefore suitable for a pilot study. Also, due to some technical problems with the processing of sensor data and associated delays, only a limited analysis has been possible at this stage. On the other hand, in this work we have adopted an exploratory approach to evaluate a pilot of an XR application for teaching protection of civilians, something that, to our knowledge, has not been done before. The presented quantitative data, triangulated with qualitative data from interviews and observations and with physiological data, provide important insights for future work and additional studies to come.

6 Conclusions and future work

This article extends a preliminary report of our work [21] and provides additional content on evaluation and results. The presented study indicates that the innovative approach of teaching civilian protection with XR, storytelling and interactive features makes the students more engaged and motivated to learn. This approach was recognised by being selected as the best demo for the 29th ACM [22].

The goal of this paper has been to report the evaluation of a pilot XR application for teaching protection of civilians to military officers, in order to investigate the suitability of the XR medium for the purpose and to inform the design of the advanced version of the application, to be used for teaching at the Norwegian Defence University College and ultimately as a part of continuous education at United Nations Peacekeeping operations. The primary evaluation method has been questionnaires to get feedback from the participants on the design of the XR application, its perceived value for teaching protection of civilians, its impact on their understanding of the threat-based approach, and other aspects. The final goal of teaching protection of civilians to military officers is to contribute to a change in their mindset. The ultimate objective measure of the success of the intervention is therefore the extent to which participants adopt a more active approach to protection of civilians in their future operation planning. Obviously, measuring this is extremely complicated due to time constraints (as we only have access to the participants for a very limited time due to their busy schedules, and as actual operation planning might happen several months in the future), privacy and, especially, security issues, as military operation planning is normally classified information. Also, every military operation is different, and different measures for protection of civilians might be needed.

The XR project is highly cross-disciplinary and involved experts in pedagogy, technology-enhanced learning, sensors, XR software development and human security in military operations. The adopted holistic design seems to have much potential for further exploration. Along with the qualitative analysis and sensor data, impact studies will be orchestrated to discover more about the project's effects on understanding of the topic and behavioural changes. In particular, a new and lighter version of the prototype is being developed for stand-alone headsets (Oculus Quest), simplifying the technical setup and facilitating easier access to the educational application. The new version of the XR app will be tested with a larger sample at NDUC and internationally during 2024. We also plan to work more on comparing the sensor and questionnaire data to find possible evidence of emotional and empathic responses from the participants. A study investigating the role of embodiment vs context and story for empathy and attitude change would be an interesting direction for future research. Finally, we are working on generative AI-powered virtual perpetrators to enhance students’ understanding of perpetrators’ motives and cognitive empathy.

In the future, the results of the project will likely be shared with international partner institutions of the Norwegian Defence University College, including the United Nations, NATO, and others interested in teaching protection of civilians. The instructional design and the XR learning experience might also be adapted to other disciplines. The authors are not aware of any previous work where XR was used in this way for teaching protection of civilians in a military context, especially a combined XR experience consisting of several elements such as embodiment/360° video, dialogue with virtual humans and a collaborative map exercise. This novel approach constitutes our main contribution to the field of educational XR.