1 Introduction

Technological advances in the field of virtual reality (VR) have created new opportunities for collaborative learning, allowing students to interact with each other in immersive learning environments. Research suggests that collaborative learning promotes student engagement, critical thinking, and knowledge acquisition (Hmelo-Silver 2013) through pedagogies such as problem-based learning (Kolmos et al. 2009) and theoretical concepts such as group cognition (Stahl 2006). In this review we understand collaborative learning as a social process shaped and reshaped by the participants using talk, embodied resources and mediational tools (Goodwin 2003), building upon concepts such as mediated activity (Vygotsky 1978) and situated learning processes (Lave and Wenger 2003). In the literature there is a distinction between collaborative learning and cooperative learning (Dillenbourg 1999). The latter concept implies a division of labour between the participants (each person working on a subtask without necessarily knowing what the others are doing), whereas the former suggests a mutual focus on the activity from the participants (each person is engaged in the task, focusing on each other’s actions). In this paper we investigate collaborative learning as a mutual engagement towards a shared problem, rather than just a divided form of labour (Roschelle and Teasley 1995). There is then a central interest in how participants actively create and attend to shared problems, orientations, and understandings.

VR is most commonly defined through a continuum, more recently referred to as extended realities (XR), in which a variety of technologies appear, ranging from real environments to fully virtual (Scavarelli et al. 2021). In this continuum, mixed realities appear such as Augmented Reality (AR) and VR (Milgram and Kishino 1994). While AR creates immersion by blending the real world with mediated objects (Azuma 1997), VR creates immersion by fully immersing the user in a virtual environment which can either be programmed (Jensen and Konradsen 2018), or 360-degree video recordings from real situations (Pirker and Dengel 2021). This type of VR distinguishes itself from other types of VR such as desktop VR, where a virtual world is inhabited on a flat screen (Hindmarsh et al. 2006), resulting in a different form of immersion. Mixed Reality (MR) is seen as an overarching spectrum in which the physical and virtual world blend together, e.g. through watching real world video in VR, or by overlaying virtual objects to the real world, such as through AR (Milgram and Kishino 1994). Going forward in this paper, MR will be seen as a type of VR, if the basis for VR is a fully immersive mediation, in which both computer-generated material and real-world objects are present. The specific sub-field that the review then orients itself towards, Immersive Virtual Reality (IVR), has especially gained popularity through the advances in Head Mounted Displays (HMD), such as the Meta Quest (Meta, n.d.) or HTC Vive (HTC, n.d.). The immersive use of virtual reality in educational settings is commonly defined through terms such as immersion, presence and interactivity (Radianti et al. 2020; Walsh and Pawlowski 2002) or immersion, presence and embodiment (Scavarelli et al. 2021) with a clear focus on the immersive mediation as the unique affordance of the learning environment that is created. 
While immersion can be afforded through other technologies, such as flat computer screens, we argue that these are distinctly different experiences from those mediated through an HMD. This difference has also been the subject of several studies comparing desktop-VR and IVR (Kosko et al. 2021; Makransky et al. 2019; Prasolova-Førland et al. 2018; Wu et al. 2020).

IVR is, however, primarily used as an individual technology (Enyedy and Yoon 2021), hailing from areas such as social psychology (Blascovich et al. 2002). While the interest in using VR for collaborative learning seems to be growing (Makransky and Petersen 2023; Paulsen et al. 2023), there is a need to investigate how the environment and activities surrounding the IVR can be designed for collaborative learning settings (Fortman and Quintana 2023). This paper presents a systematic review of empirical studies on collaborative learning in VR, exploring the pedagogical concepts and the design of environments and activities in educational and professional learning settings.

Previous literature reviews have explored the use of immersive learning environments aimed at specific settings such as remote learning (Chen and Konomi 2022) and healthcare education (Liaw et al. 2018), at specific skills such as spatial skills (Huang et al. 2022), or with specific theoretical positions such as deep and meaningful learning (Mystakidis et al. 2021). Broader reviews into collaborative activities in VR have also been conducted, albeit not with a specific focus on IVR, which is the focus of this review. Two clusters can be identified in these broader reviews.

The first cluster is aimed at a broader scope of immersive technologies, encompassing multiple types of immersive technology. Ali et al. (2019), in their mapping of AR, VR and MR for collaborative learning, conclude that several types of collaboration are present across the technologies, but without a clear distinction between which types occur in AR, VR and MR, making it difficult to transfer findings to a specific technology. Scavarelli et al. (2021) present a qualitative overview of AR and VR based on a set of prior reviews, concluding that there is a major challenge in determining how to use these technologies in a way that does not merely recreate the physical classroom. This suggests a need to study how VR is applied in collaborative learning settings.

The second cluster is VR-specific but operationalised through a broader VR-definition as discussed earlier. Zheng et al. (2018) conduct a meta-analysis of 87 VR-prototypes for collaborative learning, albeit with no limitation to IVR, meaning that most studies can be assumed to be desktop-VR solutions. They conclude that there is a need to propose design strategies that are grounded in pedagogy, in order to inform technology and activities. The most recent review on collaborative learning in VR (van der Meer et al. 2023) supports the notion that desktop-VR is prominent, with only 10.8% of the 139 included studies making use of an HMD. This could indicate that IVR may be under-researched in comparison to desktop-VR when it comes to collaborative learning. Van der Meer et al. (2023) also reiterate that there is a lack of guidelines as to how IVR should be applied in educational settings, possibly indicating that IVR is currently not used to its full potential for collaborative learning.

The present review contributes to this body of knowledge by providing insights into the environments and activities designed for IVR-mediated collaborative learning, identifying and synthesising current designs, moving the field towards design guidelines for facilitating collaborative learning in immersive learning environments.

1.1 Research question

In order to guide the review, a general research interest and series of research questions have been outlined through exploring the related work within the field:

Which design recommendations can be constructed through synthesising and conceptualising the current state of IVR-mediated collaborative learning?

  a. Which pedagogical concepts are informing the design of IVR-mediated collaborative learning?

  b. What types of environments are created for IVR-mediated collaborative learning, and through which affordances is learning made possible?

  c. What types of collaborative learning activities are conducted in and around IVR-mediated collaborative learning?

  d. What are the current potentials and limitations surrounding IVR-mediated collaborative learning?

2 Review methodology

A systematic review with a textual narrative synthesis (Lucas et al. 2007) has been conducted in order to investigate the outlined research question. In systematically building the data corpus, inspiration has been taken from the PRISMA 2020 framework, ensuring a systematic and transparent approach throughout the data selection process (Page et al. 2021).

2.1 Identification

In order to address collaborative learning in IVR, a single broad search string was initially developed using Boolean operators (Gough et al. 2017). During exploratory searches of the field, an apparent mismatch was discovered between addressing collaborative immersive learning through the collaborative activity, e.g., “collaborative learning”, and through the collaborative learning environment, e.g., “social virtual reality”. After consultation with information seeking specialists at Aalborg University Library, Aalborg, Denmark, two search strings were developed, which can be seen in Fig. 1.

Fig. 1 Search strings

In the first string, the first block addresses the collaborative learning setting, while the second block addresses the technology. In the second string, the first block addresses the collaborative use of technology, while the second block addresses the learning setting. After performing both searches, results were exported and combined in the reference management tool Zotero (Corporation for Digital Scholarship, n.d.). The use of both strings allows for capturing the interdisciplinary nature of the field, as well as the unclear terminologies used across the MR field (Ali et al. 2019), while maintaining the precision of the search (Buckland and Gey 1994). A broader search string using just “collaboration” was originally tried out, but it increased the number of retrieved papers and reduced the precision of the documents retrieved. The search was carried out in a range of interdisciplinary and domain-specific databases (Table 1). Scopus and ProQuest were chosen for a broad search of the literature, while ACM, IEEE Xplore and ERIC focus on technology, education, or both.

Table 1 Number of search results in databases

The search results were limited to the time period 2016–2023, in order to exclude studies appearing before consumer HMDs were available. The authors are aware of the fact that the Oculus Rift DK1 and DK2 were released in 2013 and 2014, but a preliminary search in the period from 2013 to 2015 shows that studies from this period are mostly concerned with Desktop-VR rather than IVR. Results were also limited to peer-reviewed material. The initial search was performed on 27th of April 2023 and resulted in 1214 results (805 after removing duplicates).
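The reduction from 1214 to 805 results rests on merging exports from several databases and removing duplicates. The review reports doing this in Zotero; the following is only a minimal sketch of the general idea, not the tool's actual matching logic. The record fields (`title`, `doi`, `source`), the example records and the matching rule (DOI when present, otherwise a normalised title) are all assumptions made for illustration.

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace,
    so that trivially different exports of the same paper match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first occurrence of each record.

    Assumed matching rule: compare on DOI when one is present,
    otherwise fall back to the normalised title.
    """
    seen = set()
    unique = []
    for record in records:
        doi = (record.get("doi") or "").strip().lower()
        key = ("doi", doi) if doi else ("title", normalise_title(record["title"]))
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


# Hypothetical records standing in for database exports.
records = [
    {"title": "Collaborative Learning in VR", "doi": "10.1000/example.1", "source": "Scopus"},
    {"title": "Collaborative learning in VR.", "doi": "10.1000/EXAMPLE.1", "source": "ProQuest"},
    {"title": "Social Virtual Reality for Teams", "doi": "", "source": "ERIC"},
    {"title": "Social virtual reality for teams", "doi": None, "source": "ACM"},
]
unique_records = deduplicate(records)
```

A rule this simple would miss a duplicate pair in which only one export carries a DOI, which is one reason a manual check of the merged library remains advisable.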

2.2 Screening & eligibility

A series of inclusion criteria were created in order to address the relevance of the 805 results (Table 2). These criteria acted as a code book, increasing the accuracy and transparency of the screening process (Belur et al. 2021). During the initial stages of the screening, 30 randomly selected abstracts were individually coded by Author 1 and Author 2. The authors then met and compared their coding in order to iterate on the wording of the criteria in relation to the results. The criteria were further iteratively refined throughout the coding process, attending to arising issues and unclear exclusions.
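The review describes the comparison of the two coders' decisions qualitatively and does not report an agreement statistic. For readers who wish to quantify such a calibration round, one common option is Cohen's kappa; the sketch below is an illustration under that assumption, and the 30 include/exclude decisions in it are invented, not the authors' data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected by chance from each coder's label rates.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical decisions for 30 abstracts (not the review's data):
# the coders disagree on exactly one abstract.
author_1 = ["include"] * 6 + ["exclude"] * 24
author_2 = ["include"] * 5 + ["exclude"] * 25
kappa = cohens_kappa(author_1, author_2)
```

With these invented decisions the coders agree on 29 of 30 abstracts, and chance agreement is 0.7, so kappa comes out at roughly 0.89.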

Table 2 Inclusion criteria

The review is limited to empirical descriptions, as the review is aimed at the application of designed environments and activities for collaborative learning in IVR. This excluded purely technical studies, frameworks that have not yet been empirically tested, review articles, and preliminary studies. During the coding process, studies were found where the VR-environment was aimed at a relevant setting, but evaluated with other users, e.g., students from a different educational setting. Only studies conducting learning activities with the intended learner (student/professional) of the VR collaborative learning scenario have been included. The collaborative learning activity must then take place in a post-secondary education or a professional learning setting. The IVR criterion limits the immersive technologies to headset-mediated IVR, as we have argued that this type of immersion is distinct from other types of immersion (Kosko et al. 2021; Makransky et al. 2019; Prasolova-Førland et al. 2018; Wu et al. 2020). While other forms of collaborative learning around VR exist, e.g. individual VR followed by group discussions (Pieterse et al. 2023) or collaborative learning across realities (Paulsen et al. 2022; Steier 2020), the focus has been narrowed to collaborative learning in VR, in order to ensure that findings and design recommendations are transferable and actionable within this setting.

Initial coding was performed at abstract level by a single researcher. In cases of unclear inclusions, the paper was discussed with another researcher in order to determine the criteria of inclusion or exclusion. After screening abstracts, 102 results remained, with 97 of them being retrieved. The same set of criteria was then applied to the full texts, following the same coding process, resulting in 11 studies being included. A full overview of the process can be seen in Fig. 2.
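The counts reported in Sects. 2.1 and 2.2 can be checked against each other by computing how many records leave the corpus at each stage. The sketch below uses only the figures stated in the text (1214, 805, 102, 97 and 11); the stage labels and the function name are illustrative choices, and the per-stage exclusion reasons shown in Fig. 2 are not reproduced.

```python
# Stage counts as reported in the text of Sects. 2.1 and 2.2.
stages = [
    ("records identified", 1214),
    ("after duplicate removal", 805),
    ("after abstract screening", 102),
    ("full texts retrieved", 97),
    ("studies included", 11),
]


def flow_losses(stages):
    """Return (stage label, records dropped on entering that stage)."""
    losses = []
    for (_, previous_count), (label, count) in zip(stages, stages[1:]):
        if count > previous_count:
            raise ValueError(f"Count increases at stage '{label}'.")
        losses.append((label, previous_count - count))
    return losses


losses = flow_losses(stages)
for label, dropped in losses:
    print(f"{label}: {dropped} records dropped")
```

Read this way, 409 duplicates were removed, 703 records were excluded at abstract level, 5 full texts could not be retrieved, and 86 were excluded at full-text level.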

Fig. 2 PRISMA flow diagram

2.3 Textual narrative synthesis

As the included studies contain both quantitative and qualitative approaches, a textual narrative synthesis has been applied, making the diversity in study designs and contexts explicit (Lucas et al. 2007). This approach allows synthesizing studies by grouping them together, and creating a textual narrative (Barnett-Page and Thomas 2009). The analysis consisted of three stages (Lucas et al. 2007). First, the included studies were grouped into categories and sub-themes according to the research questions. Secondly, key aspects of the studies were summarized with relation to the theme. Lastly, these themes were synthesized in relation to each other. Steps 1 and 2 were iteratively conducted using an Excel spreadsheet. Throughout reading the included studies, four themes were identified in relation to the overall categories derived from the research questions: Pedagogical concepts, environments, activities, and potentials and limitations. The key aspects of each study were then coded in relation to the emergent themes.

3 Study characteristics

Before synthesising the results, the included studies were mapped according to authorship, country of main author affiliation, publication type, hardware, software, and domain (Table 3).

Table 3 Mapping of key information from included studies

Looking at the mapping in Table 3, an initial understanding of the corpus can be obtained. When mapping out the number of publications over time (Fig. 3), the field seems relatively stable over the past 5 years, with only a handful of publications appearing each year.

Fig. 3 Included publications mapped out over time

A rise in publications was expected, but this is not the case. Collaborative learning in VR is not new, as VR has been used for collaborative learning since the 1990s (Dede et al. 1996; Jackson and Winn 1999). IVR-mediated collaborative learning is however very much a niche genre of VR-mediation (van der Meer et al. 2023). This is further supported by the fact that about a third of the screened abstracts were excluded for using ‘VR’ to describe ‘virtual worlds’ or ‘virtual environments’ inhabited on a flat screen. During the full-text coding it seemed that ‘VR’ has become a very broad umbrella term, with many different types of immersive mediation being categorised under it, ranging from Desktop-VR using a flat computer screen to IVR using HMDs. It is also worth noting that 4 studies come from the same group of researchers (Lerner et al. 2020; Schild et al. 2018, 2021, 2022), with 2 studies coming from another group (Prasolova-Førland et al. 2018, 2021), showing that at least two research groups have identified a potential in this use of IVR for collaborative learning.

Regarding publication type, the final corpus is fairly evenly distributed between conference papers (6) and journal articles (4), with 1 book section. This shows that while the field may be niche, some of the work has matured enough to be published in journals. The main authors of the included publications are predominantly affiliated with either German (5) or Nordic (5) institutions, with the last publication’s main author being affiliated with a Turkish institution. A potential reason for this may be a long-standing tradition for collaborative learning in Nordic educational systems, and an educational economy capable of supporting investments in experimental technologies such as HMDs. The application domains seem to be more widespread, albeit with a dominance of studies published within medicine and medical education, accounting for 7 of the 11 studies. Other domains include architecture, design, engineering, and pedagogy. Medicine being dominant is no surprise, as the medical field has long been regarded as a frontrunner when it comes to adopting new educational technologies (Helle and Säljö 2012).

In terms of hardware, all included studies make use of PC-based HMDs. This is interesting, as Mobile-VR has gained popularity as a more accessible entry into using IVR as an educational technology. Mobile VR has however in some cases shown major limitations, leading to a reduced user experience (Alamäki et al. 2021). These limitations may then be why all systems are based on PC-HMDs. Regarding controls, 10 studies use the VR-controllers that accompany the used HMDs. Mouse/Keyboard has been mapped twice, as one study allows both control-types (Gong et al. 2020), with another study only allowing controls through mouse/keyboard (Prasolova-Førland et al. 2018). Two different studies also include other interactable hardware, such as surgical controllers (Chheang et al. 2020) and physical mannequins (Schild et al. 2022).

When looking at the software used, EPICSAVE appears twice (Lerner et al. 2020; Schild et al. 2018), with the iterated version of it, VITAWIN, also appearing twice (Schild et al. 2021, 2022). Both of these software packages are developed based on the published research. CAVA360VR (Davidsen et al. 2022) and VRArchEducation (Özacar et al. 2023) are also software packages that are tied to the ongoing research published in the included publications. Two studies use commercial VR-software, with Prasolova-Førland et al. (2018) using SecondLife in VR-mode, and Yu & Khalid (2019) being the only study using multiple software packages, namely SculptrVR, Google Blocks and Sketchbox.

4 Synthesis

After mapping, emerging themes and key aspects were coded according to the method described in Sect. 2.3. For sub-question a, the key themes around pedagogical concepts were coded. For sub-question b, the key themes around the content, the environment, and the mediation were coded. For sub-question c, the key themes around the activities performed in and around VR were coded. For sub-question d, results, potentials and limitations were coded. The following sections will present a textual narrative synthesis (Lucas et al. 2007) of the themes related to the four research questions.

4.1 Pedagogical concepts

To address RQ1.a, the pedagogical concepts of the studies have been coded (Table 4).

Table 4 Pedagogical concept(s) of included studies

Before attending to the identified pedagogies, it is noteworthy that 4 studies did not state or reference any type of pedagogy or pedagogical principles (Chheang et al. 2020; Gong et al. 2020; Schild et al. 2021, 2022). This may be due to an increased focus on usability (e.g., Schild et al. 2021, 2022), on feasibility (Chheang et al. 2020), or on interaction design (Gong et al. 2020). While these areas are important to consider when designing for collaborative learning in IVR, they are also present in many other studies in the included corpus, which still focus on pedagogy.

In the corpus, 7 out of 11 studies are linked with socially oriented pedagogies, with some studies incorporating multiple pedagogical concepts. During coding, two clusters were identified. The first identified cluster contains different principles for the interaction with the world around us, containing ‘authentic learning’ (Yu & Khalid, 2019), ‘active learning’ (Schild et al. 2018) and experiential learning (Schild et al. 2018; Özacar et al. 2023). The second cluster can be seen as operationalisations of the principles outlined in the first cluster. The most frequently coded pedagogy is problem-based learning (PBL) (Davidsen et al. 2022; Prasolova-Førland et al. 2018; Yu & Khalid, 2019). Other pedagogical frameworks include ‘small group tutorial learning’ (Prasolova-Førland et al. 2018), ‘simulation based training’ (Lerner et al. 2020), and a broad reference to ‘collaborative learning’ (Prasolova-Førland et al. 2021).

This theme then shows that while about a third of the included studies do not state any pedagogical concepts or principles, the rest seem grounded in relevant pedagogies that orient themselves towards the social nature of collaborative learning in IVR. The presence of pedagogically oriented studies may in part be due to the inclusion criteria, which focus on empirical studies. This excluded many of the work-in-progress studies from the initial corpus, where there presumably is a larger focus on technical usability rather than pedagogy.

4.2 Environments for IVR-mediated collaborative learning

To address RQ1.b, the themes surrounding the environment in which the collaborative activities occur have been coded (Table 5). Within this theme, three subthemes appeared during coding: environment, mediation in the environment, and affordances of the environment.

Table 5 Environment for IVR-mediated collaborative learning

The environment is in this case understood as the scene or world that the users are jointly immersed in. Two main types of environments are identified: video-based and programmed. Davidsen et al. (2022) is the only included study where multiple users are immersed in a 360-degree video recording, in this case of a knee examination, as opposed to a programmed environment. 360VR has recently seen a growth within educational fields, giving access to situated real-life scenarios (Pirker and Dengel 2021; Rosendahl and Wagner 2023). The programmed environments can be further clustered into two sub-categories: abstract and reality-oriented. Some studies make use of abstract environments, which are not designed to emulate a real environment, but can be seen as blank canvases. In the study of Prasolova-Førland et al. (2021), students are tasked with designing their own collaborative learning environment on the basis of a blank canvas. In the study of Yu & Khalid (2019), VR is seen as an open sandbox, where students design the environment from scratch. Gong et al. (2020) compare a modelled manufacturing hall, akin to the one that the engineers would usually reside in, with a sandbox-like environment without any environmental details for design reviews. While participants found that the modelled environment improved realism, the authors raise the point that modelling real-life environments in programmed VR takes both time and effort and should therefore be carefully considered. In terms of reality-oriented environments, there are two further subcategories: digital twins and loosely modelled environments. Digital twins are digital replicas of a real-life environment (Barricelli et al. 2019). Özacar et al. (2023) are interested in comparing building surveying in a physical space and a virtual environment, and therefore create a digital twin of the environment, an empty room. Prasolova-Førland et al. (2018) create a digital twin of a Norwegian hospital, under the assumption that the direct mirroring will lead to a better transfer into the physical space after VR. The environments in which users are immersed can also be realistic, but not a direct replica of a real-life space. Chheang et al. (2020) create a virtual operating room and Lerner et al. (2020) create an indoor amusement park. The VITAWIN software is the only one in which users move between different environments (a terrace, an ambulance, and a trauma room), creating a narrative approach to the VR experience (Schild et al. 2021, 2022).

Mediation in the environment varies across studies, both in terms of how users are mediated and the level of customisation made possible. Lerner et al. (2020) mention avatars with no further description; however, a set of mediated hands with gloves can be seen in the included figures of the original paper. At the lower end of the mediation-realism scale, Chheang et al. (2020) mediate users using floating 3D models of VR-headsets and VR-controllers. The authors point out that this type of mediation made it difficult for users to identify each other in the virtual environment. Across the included studies, the most common approach seems to be floating heads, either as grey heads with hands (Davidsen et al. 2022), coloured heads with controllers (Yu & Khalid, 2019), or coloured heads with hands (Prasolova-Førland et al. 2021; Schild et al. 2018). Some go beyond floating heads. Gong et al. (2020) use customised avatars from the shoulder and up, allowing users to more easily identify each other. Schild et al. (2021, 2022) use upper-half body mediation, as do Özacar et al. (2023), who add lip sync based on microphone activity. Prasolova-Førland et al. (2018) is the only study using full-body mediation, in part because the controls are input through a mouse and keyboard, removing the need for tracking positions from the physical space. A central issue is then that due to hardware limitations, only head and position can be tracked, greatly reducing the level of mobility which can be mediated.

Affordances, or perceived affordances, are the properties of the system that suggest how users may interact with it, e.g., both looking at something and interacting with a programmed object. They are a key term when synthesising digital learning environments (Dau 2014; Norman 1999). There is a clear dominance of interaction opportunities in programmed VR. Before attending to these, it is relevant to take a look at the 360-video-based environment of Davidsen et al. (2022). This environment is based on a 360-video, and therefore does not allow interaction with objects in a programmed world. Instead, the video can be interacted with through media controls (play, pause, scrub), and annotated through a combined laser pointer/drawing tool. This type of VR then does not depend on interactivity with programmed virtual objects for supporting learning, but rather on the immersion and presence in the video-based mediation of an authentic, situated setting. In terms of programmed affordances, there is a clear reflection of the dominant medical domain in the corpus. Most affordances are centred around virtual patients (Chheang et al. 2020; Lerner et al. 2020; Prasolova-Førland et al. 2018; Schild et al. 2018, 2021, 2022). These can exhibit symptoms that can be monitored through e.g., virtual respirators (Chheang et al. 2020) or interactive tablets (Schild et al. 2022). Virtual patients are also interactable, allowing users to e.g., administer muscle relaxant through a programmed syringe (Chheang et al. 2020), change posture (Lerner et al. 2020), or order tests (Prasolova-Førland et al. 2018). In Schild et al. (2022) the virtual patient is linked to a physical mannequin, creating a Mixed Reality scenario, although with mixed user feedback. Looking beyond the medical domain, annotation objects are common, e.g., sketching tools (Özacar et al. 2023) or blackboards (Prasolova-Førland et al. 2021). Within engineering the most common affordance is manipulating imported or created 3D objects (Gong et al. 2020; Yu & Khalid, 2019). This theme then shows that affordances are very much domain-specific.

4.3 Activities in IVR-mediated collaborative learning

To address RQ1.c, the collaborative activities have been coded. During coding, four subthemes appeared: aims, activities, scaffolding and evaluation (see Table 6).

Table 6 Learning goal(s) of IVR-mediated collaborative learning

Before addressing the subthemes, some more general reflections can be synthesised around the activities. The number of users engaged in collaborative learning activities in VR varies from pairs (Schild et al. 2022) to groups of 6 (Prasolova-Førland et al. 2018). Davidsen et al. (2022) seem to have the biggest spread in groups (3, 6, and 5 users). The authors found no difference between groups in time spent in VR or in the post-test. The length of VR-learning also varies, from an average of 11.2 min reported by Schild et al. (2021) to the longest session of 49 min in Davidsen et al. (2022). The long immersion time may be due to the self-directed nature of problem-based learning.

The aims of collaborative VR centre around professional skills and competencies such as practicing interprofessional communication (Chheang et al. 2020; Prasolova-Førland et al. 2018; Schild et al. 2021, 2022) and improving teamwork (Davidsen et al. 2022; Lerner et al. 2020; Prasolova-Førland et al. 2018, 2021; Yu & Khalid, 2019). The dominance of the medical field also shows here through aims such as improving clinical reasoning (Lerner et al. 2020), preparing for clinical examinations (Davidsen et al. 2022), and improving procedural knowledge (Schild et al. 2018). Other aims include supporting project work (Prasolova-Førland et al. 2021), supporting design review tasks (Gong et al. 2020), building surveying (Özacar et al. 2023) and designing outdoor areas (Yu & Khalid, 2019).

In order to fulfil these aims, a series of activities have been identified taking place both inside and outside VR. In the medical domain these include scenarios requiring communication between different professions (Chheang et al. 2020; Prasolova-Førland et al. 2018; Schild et al. 2021, 2022), making diagnoses and treating patients (Lerner et al. 2020; Schild et al. 2018), and self-directed analysis of a 360-learning video (Davidsen et al. 2022). In the other domains the activities within VR include developing collaborative tools (Prasolova-Førland et al. 2021), design reviews (Gong et al. 2020), building surveying (Özacar et al. 2023), and co-creating and presenting design (Yu & Khalid, 2019).

The collaborative learning activities inside VR are supported through a series of scaffolds. Davidsen et al. (2022) use prompts with embedded reflection questions in their 360-degree video, with a positive response from participants. Prasolova-Førland et al. (2018) make use of role-cards, helping participants become immersed in their roles when roleplaying communication scenarios. The most common scaffold, however, seems to be having a trainer/teacher/researcher present inside or outside VR, monitoring and intervening if necessary, or controlling virtual patients (Gong et al. 2020; Lerner et al. 2020; Özacar et al. 2023; Schild et al. 2018, 2021, 2022).

Outside VR, the most common activity is either a briefing, or debriefing (Davidsen et al. 2022; Lerner et al. 2020; Prasolova-Førland et al. 2018; Schild et al. 2018, 2021). In two studies a secondary condition is tested. Prasolova-Førland et al. (2018) compare desktop-VR and IVR for team communication, with Özacar et al. (2023) comparing the building surveying across IVR and an identical physical space. Davidsen et al. (2022) include a transfer of knowledge from VR to practice by having students perform the clinical examination that they have prepared for in VR. The authors show that students actively perform the clinical examinations despite having only analysed them in VR, and not tried them out before. This shows that IVR should in some cases not be viewed as an all-encompassing tool, but as part of a wider range of tools for facilitating effective collaborative learning.

In terms of evaluating the activities, interviews and different types of questionnaires seem to be the most prevalent. Six of the included studies make use of interviews, either with single users (Chheang et al. 2020) or focus groups due to the collaborative setting (Schild et al. 2022). Nine of the included studies make use of one or more questionnaires or post-tests. While many different questionnaires are present, some of the more commonly used ones cover general evaluations (Prasolova-Førland et al. 2021), knowledge-oriented tests (Lerner et al. 2020), cognitive load (Özacar et al. 2023), and usability (Schild et al. 2022). While Prasolova-Førland et al. (2018) mention screen capturing the session in VR, only Davidsen et al. (2022) provide analysis of the collaborative learning process within VR, as opposed to focusing on pre- and/or post-measures. This shows that while work has been done outlining the outcomes of collaborative learning in VR, research is needed on what happens within the collaborative learning processes that take place in IVR.

4.4 Current potentials and limitations

To address RQ1.d, the results, potentials and limitations of the 11 included studies have been coded.

Across the studies, the learning environment is seen as immersive (Davidsen et al. 2022; Özacar et al. 2023; Prasolova-Førland et al. 2018; Yu & Khalid, 2019), interactive (Prasolova-Førland et al. 2021), and efficient (Yu & Khalid, 2019). Moreover, it is flexible (Lerner et al. 2020; Prasolova-Førland et al. 2018) and independent of time and location, allowing users to repeatedly practice at their own pace in a safe environment (Davidsen et al. 2022; Özacar et al. 2023; Prasolova-Førland et al. 2018; Schild et al. 2021). Users perceive collaborative learning in VR as engaging and motivating (Chheang et al. 2020; Özacar et al. 2023; Prasolova-Førland et al. 2021; Yu & Khalid, 2019), and experience a high degree of presence (Davidsen et al. 2022; Lerner et al. 2020; Schild et al. 2018, 2022) and social presence (Chheang et al. 2020; Özacar et al. 2023; Prasolova-Førland et al. 2021). The experience leads to outcomes that are beneficial for communication (Chheang et al. 2020; Davidsen et al. 2022; Prasolova-Førland et al. 2018), collaborative learning (Davidsen et al. 2022; Prasolova-Førland et al. 2018, 2021; Schild et al. 2021; Yu & Khalid, 2019), problem solving (Davidsen et al. 2022), emotionalisation (Schild et al. 2018) and creativity (Yu & Khalid, 2019). This shows the wide range of potentials for collaborative learning when facilitated through IVR. In a flexible environment that is rich and immersive, students can feel present and repeatedly practice realistic scenarios without affecting real outcomes, enhancing a wide variety of skills and competencies.

There are, however, also areas where the potentials are less clear. Lerner et al. (2020) and Schild et al. (2022) report a user-friendly experience. Most studies, however, list a variety of technical usability problems impacting the learning experience (Chheang et al. 2020; Gong et al. 2020; Lerner et al. 2020; Prasolova-Førland et al. 2018; Schild et al. 2018, 2021, 2022; Yu & Khalid, 2019). On top of this, Prasolova-Førland et al. (2021) also report the time-consuming setup as a negative aspect of using IVR.

When attending to collaborative learning as the setting, the reported outcomes should also be considered. While Davidsen et al. (2022) show a transfer of knowledge from preparing clinical examinations in VR to performing them in the physical space, studies comparing pre-test and post-test find either no knowledge gain (Lerner et al. 2020; Schild et al. 2021) or similar results to traditional methods (Özacar et al. 2023).

Another issue raised across the literature is realism. While Davidsen et al. (2022), Lerner et al. (2020), Prasolova-Førland et al. (2018) and Yu & Khalid (2019) describe realism as a positive for collaborative IVR learning, most studies report low realism (Chheang et al. 2020; Schild et al. 2018, 2021, 2022; Yu & Khalid, 2019) and average to low involvement (Schild et al. 2021, 2022). While the issue of realism should theoretically be improved by using 360VR, Davidsen et al. (2022) report on the lack of movement in 360VR, limiting the available information and interaction with the environment. Limited navigation is commonly reported as a limitation (Davidsen et al. 2022; Özacar et al. 2023; Prasolova-Førland et al. 2018; Schild et al. 2018). In relation to realism, the mediation is also questioned. This concerns impersonal avatars (Chheang et al. 2020; Prasolova-Førland et al. 2021; Yu & Khalid, 2019), the lack of visual cues for speaking and body language (Prasolova-Førland et al. 2018, 2021; Schild et al. 2018), as well as the lack of haptics (Chheang et al. 2020; Schild et al. 2018) – showing that there is still quite a way to go before collaborative learning in IVR can truly emulate the realism of real learning and training scenarios. While immersion in IVR is often at risk of creating physical discomfort (Chang et al. 2020; Jensen and Konradsen 2018), only Prasolova-Førland et al. (2018) and Yu & Khalid (2019) report having experienced issues with this. Another commonly found issue with both programmed VR and 360VR is that increased cognitive load results in less learning than with other types of immersion (Makransky et al. 2019; Pirker and Dengel 2021). Only Schild et al. (2018) and Yu & Khalid (2019) touch on cognitive load, both identifying high cognitive load, which may hinder learning compared to real-world settings.

5 Discussion

5.1 Implications

Looking across the synthesised sub-questions, some more general statements about collaborative learning in IVR can be outlined. Synthesising the knowledge produced from the sub-questions is needed for theorising the findings, which is the first step in moving towards design principles. The findings of the systematic review suggest that collaborative learning in IVR can, in its current state, be conceptualised as ‘a shared experience in an immersive, virtually mediated space, where there is a shared goal/problem which learners must attend to collaboratively’. Further, this activity may be supported through human and/or digital scaffolds, as well as activities outside VR before and after the collaborative VR activity. Collaborative VR is used for collaborative learning across domains, environments, and through different activities. While there is no consensus on VR improving knowledge gains, it creates a space for practice and reflection which is free of the constraints of physical space and time, where collaborative learning is perceived as engaging and motivating, without the risk of interfering with actual practices. Current designs, however, struggle with usability, the complexity of designing and developing environments, as well as realism and facilitating social interaction. Returning to the individual sub-questions, these conceptualisations can be further explored.

5.1.1 Pedagogy

The results of this systematic review imply that while some studies are focused purely on technical usability, most of the collaborative learning designs in VR are grounded in socially oriented pedagogies such as experiential and problem-based learning. We argue that a focus on the technical user experience of collaborative learning should not be disconnected from the pedagogical principles that are the basis of the collaborative learning activity. While the discussion of whether technology or pedagogy should go first in the field of educational technologies has a long history, there has been an increasing call for seeing technology and pedagogy as mutually informing, or entangled (Fawns 2022). This is then not to discredit the focus on technical usability, but to argue that technical usability and pedagogical principles cannot be separated but must be attended to in relation to each other.

When looking at the coded pedagogies, these are mainly oriented towards socially constructed learning. This contrasts with existing conceptualisations of VR learning, where learning is conceptualised as an individual, cognitive process (Enyedy and Yoon 2021). While the use of VR for collaborative learning should naturally call for socially oriented pedagogies, we still argue that this is an important finding, as it shows the feasibility of applying these kinds of pedagogies for VR-learning.

The shift towards collaborative learning is, however, not without pedagogical problems of its own. The social nature of collaborative learning implies that participation and social interaction should be designed for. This especially seems to be an issue when it comes to mediation of participants in VR. The current mediation of participants in VR raises issues such as difficulties identifying participants (Chheang et al. 2020), lack of visual cues and bodily conduct (Prasolova-Førland et al. 2021), and low realism (Yu & Khalid, 2019); all issues that can possibly result in diminished interaction and lessened participation. While some designs account for this by offering customisation (Gong et al. 2020) and lip-sync (Özacar et al. 2023), there is still a need to explore how avatar-mediation can be designed in order to facilitate participation and social interaction.

5.1.2 Environment

Our results also have implications for designing VR environments. A major difference between environments seems to be the use of 360-video in Davidsen et al. (2022) compared to the programmed environments present in the rest of the corpus. A core argument for adaptation and implementation of VR is that educators and learners must be given access to design processes, and not just be an afterthought once development is complete (Scavarelli et al. 2021). This ties into the pedagogical implications of viewing pedagogy and technology as entangled (Fawns 2022). Educational environments and activities must be designed in coordination with technical development. Jensen and Konradsen (2018) argue for using 360VR, as it reduces the barrier of entry into the design process, allowing educators and learners to become designers themselves using a video camera. 360VR and programmed VR, however, come with a series of trade-offs when it comes to designing for collaborative learning in IVR, which we will try to outline.

The primary potential of 360-video lies in giving immersive access to authentic settings (Rosendahl and Wagner 2023). This ties into pedagogies focusing on authentic learning (Yu & Khalid, 2019) and experiential learning (Özacar et al. 2023; Schild et al. 2018). This review has identified four different types of authenticity when it comes to environments for collaborative learning in IVR, with the programmed ones being: programmed (abstract), programmed (realistic), and programmed (digital twin). These programmed environments are, however, still a computer-generated interpretation of reality when compared to re-mediating practice through 360-degree video recordings.

While 360VR focuses on authenticity, this typically also comes at a cost in terms of the interactional possibilities presented to users (Pirker and Dengel 2021). In individual use in educational settings, 360VR is mostly used as a passive tool, offering no interaction for participants (Paulsen et al. 2023). This is, however, not the case with Davidsen et al. (2022), where participants are able to use media controls and collaboratively annotate the 360-video. Although there is then interaction, this interaction differs from the interactional affordances presented in the programmed-VR studies, where users can have rich interactions with virtual patients (Gong et al. 2020; Lerner et al. 2020; Özacar et al. 2023; Schild et al. 2018, 2021, 2022) and manipulate 3D-objects (Gong et al. 2020; Yu & Khalid, 2019).

The discussion of interaction and authenticity is especially interesting as the lack of realism has been identified as a key limitation of collaborative learning in VR (Schild et al. 2022). Interactivity, and the realism of it, is often cited as a prerequisite for feeling present (Radianti et al. 2020). Our results, however, imply that realism is not only a function of interactivity. In comparing programmed-VR and 360VR for collaborative learning we can then distinguish between (1) realism as having interactional opportunities that reflect the real world, and (2) realism as being immersed in an authentic re-mediation of the real world. This leads into a discussion of terminology which is outside the scope of this paper. This distinction, however, implies that design needs to attend to realism and presence from a duality-perspective, focusing not only on the interactional affordances of the environment, but also the authenticity and situatedness of the environment in which these interactions are made possible. When specifically looking at collaborative learning, interaction design should not only focus on interaction with the environment, but also on interaction with other users – a key implication of using socially oriented pedagogies.

5.1.3 Activities

The results of the review also have implications for design of activities in and around IVR. Activities in IVR are aimed at practicing team communication and shared understandings of practical skills. It however seems unclear what the intention of learning in IVR is, with some studies focusing on practicing skills and competences, but evaluating through measures such as the amount of knowledge gained (Schild et al. 2021). There is then a need to decide whether the learning activity should be designed for knowledge retention or for practicing collaborative skills and competences.

We argue that the collaborative use of VR-learning lends itself better towards being a reflective space for collaboratively practicing skills and developing competencies. This is also in alignment with the outcomes of most of the included studies, which cite the learning as beneficial for communication (Chheang et al. 2020; Davidsen et al. 2022; Prasolova-Førland et al. 2018), collaboration (Davidsen et al. 2022; Prasolova-Førland et al. 2018, 2021; Schild et al. 2021; Yu & Khalid, 2019), problem solving (Davidsen et al. 2022), emotionalisation (Schild et al. 2018) and creativity (Yu & Khalid, 2019). While IVR learning shows similar results in comparison with face-to-face learning (Özacar et al. 2023), we still find it relevant to outline the unique potential of IVR in comparison to traditional learning and training methods: practicing in collaborative IVR allows for utilising an environment which is freed from time and space constraints and doesn’t interfere with practice. Users are then both free to participate when and where they want, and don’t have to manage mistakes that affect real outcomes, such as real patients. With this level of freedom, an increased emphasis is put on designing activities that both accommodate this freedom and ensure a shared orientation towards a set of learning goals. The most common approach found in the included studies is using a scaffold, either through pedagogies such as PBL (Prasolova-Førland et al. 2018), digital scaffolds such as prompts (Davidsen et al. 2022), or human trainers monitoring the session and intervening when needed (Schild et al. 2022).

While the work around these scaffolds begins to conceptualise how shared knowledge construction may be designed for within IVR, there is also a need to design for transfer, allowing participants to re-construct their knowledge in a different setting outside of IVR (Dohn and Markauskaite 2019). While collaborative learning in IVR can pre-qualify actions in the physical space, it cannot replace them, but should rather be seen as a supplement to traditional learning activities.

5.2 Design recommendations

Our goal with this review has been to (1) conceptualise collaborative learning within IVR and (2) use this conceptualisation to move the field towards design principles, tying into calls from other reviews (Scavarelli et al. 2021; Zheng et al. 2018). In the following we have attempted to convert our conceptualisation into a series of broad design recommendations for designing for collaborative learning in IVR. These are not to be seen as universal guidelines. As the results of this review imply, the field, despite its small size, contains many different approaches to each aspect of the learning design, with many of these approaches being grounded in the specific domain in which the learning occurs. The aim of these recommendations is to provide a starting point, based on the literature review, taking first steps towards concrete design principles. Further research is needed to evaluate these recommendations empirically.

  • Pedagogy and technical usability should be viewed as entangled, designing for both social knowledge construction as well as user friendly experiences with a focus on collaborative learning. It is then important to e.g.,

    • Align pedagogical goals and technical features.

    • Include relevant stakeholders in the design and evaluation of immersive environments and activities.

  • Immersive environments for collaborative learning should be designed for participation and social interaction through e.g.,

    • Mediating avatars which are recognisable / customisable while striving to mediate as much bodily conduct as possible – e.g., through different coloured avatars (Yu & Khalid, 2019) or lip-synced avatars (Özacar et al. 2023).

    • Creating methods for establishing spatial awareness, joint attention and perspective-taking, such as bodily orientation, and tools for marking objects as relevant – e.g., through laser pointers (Davidsen et al. 2022).

  • An active choice should be made regarding the degree of realism when designing the immersive environment for collaborative learning, whether programmed or 360-video based.

    • In programmed environments emphasis should be on creating presence through designing interactions that emulate the real world – e.g., through virtual patients (Chheang et al. 2020).

    • In 360VR environments emphasis should be on creating presence by being immersed in an authentic re-mediation of practice – e.g., through videos of non-scripted interactions (Davidsen et al. 2022).

    • In both types, design should not only focus on affording interaction with the environment, but also interaction with other users creating co-presence.

  • Activities should be centred around a shared goal or problem that must be attended to collaboratively and aimed at developing and practicing skills and competencies. This can be done through e.g.,

    • Defining clear learning objectives that support a collaborative learning process.

    • Ensuring that learning objectives, immersive environments, and activities are in alignment with each other and pedagogical concepts and goals.

  • Activities should be supported by human or digital scaffolds that can e.g.,

    • Guide learners in supporting their collaborative learning processes – e.g. through role cards (Prasolova-Førland et al. 2018).

    • Support learners in reflecting – e.g., through video prompts (Davidsen et al. 2022).

    • Give learners feedback on their actions – e.g., monitoring and evaluating the process (Schild et al. 2022).

  • Knowledge constructed in IVR should be made transferable or actionable outside of IVR through e.g.,

    • Using IVR as a tool to pre-qualify face-to-face learning/training – see e.g., (Davidsen et al. 2022).

    • Facilitating debriefing or group discussion in order to reflect on the experience – see e.g., (Lerner et al. 2020).

    • Supporting reification, allowing participants to take the constructed knowledge with them across realities, e.g., through exporting annotations or video-clips of the session.

As a general note, we recommend involving relevant stakeholders, pilot testing, and iterating different design propositions before conceptualising a final design decision regarding environments and activities for collaborative learning using IVR.

5.3 Limitations

Despite the systematic nature of the inclusion process, and the added transparency of the codebook, there are still some limitations which may affect the conclusions that can be drawn from the results presented in this review. The search string and inclusion criteria greatly limit the generalisability of the results. The findings are then only valid for collaborative learning within IVR which is aimed at a post-secondary educational or professional collaborative learning setting and evaluated with intended learners. This excludes other relevant learning and training settings, other types of collaborative learning such as asynchronous and across worlds, as well as designs that have only been evaluated with experts, or other users.

Another limiting factor is the number of studies and the dominance of the medical domain within it. With only 11 included studies, authorship from 5 countries, and multiple author groups being included with multiple studies, the results come from a narrow scope and may result in the design recommendations not being applicable to other settings and domains. We again stress the need to empirically evaluate our proposed recommendations before moving further towards actual design principles.

A last limiting factor which needs to be addressed is the narrative approach used in the synthesis. The subjective coding and grouping of categories and themes are not objective statements about the designed VR environments and activities, but our interpretation of the original authors' reporting of them. The subjective nature of these codes may then also change the implications of our presented results.

5.4 Future work

Looking ahead, we have identified two main areas of future work, which we deem important to address in order to increase the adaptation and implementation of IVR for collaborative learning.

(1) Improving participation and social interaction in IVR is crucial for facilitating effective collaborative learning among participants. Future research should investigate ways to enhance non-verbal cues and bodily conduct, such as facial expressions, gestures, and body language, to make interactions feel more natural. This should hopefully increase the feeling of social presence in IVR, enhancing the usability and experience of engaging in collaborative learning in IVR.

(2) Exploring different evaluation measures. The results of the review imply that IVR should be viewed as a reflective space for developing and practicing skills and competencies. Most studies, however, evaluate through post-tests, rather than looking into what is happening within IVR-collaborative learning. In order to effectively design for collaborative learning in IVR, we need to open up the black box, and start analysing what is happening inside IVR, and not just the experience and outcomes that the activity results in. The actual practice in which technology is used is often overlooked in design processes (Tatar 1989). A starting point for this could be attending to key terminology within fields such as computer-supported collaborative learning (CSCL) (Cress et al. 2021), looking at how collaborative problem solving (Roschelle and Teasley 1995), perspective-taking (Paulsen et al. 2022), and joint attention (Luff et al. 2003) are constructed and organised. This may also aid in figuring out which elements of designs increase e.g. discomfort and cognitive load, which may hinder learning. Further, the collaborative learning activities in IVR could be measured using other methods like biometrics, eye-tracking and HMD tracking. Coupling these quantitative measures with the interactional data could provide deeper understandings of collaborative learning in IVR environments, supporting future designs (Järvelä and Rosé 2023).

6 Conclusion

This systematic review aimed to explore the design and empirical application of immersive virtual reality (IVR) in collaborative learning in educational and professional learning settings. The review identified 11 relevant studies. Through a textual narrative synthesis, the review examined the pedagogical foundations of learning design, the design of IVR environments, and the collaborative activities in and around IVR. The presented findings indicate that collaborative learning in IVR can be conceptualised as a shared experience in an immersive, virtually mediated space, where the activity is centred around a shared goal/problem which learners must attend to collaboratively. This experience is mostly used as a shared reflective space for the development and practice of skills and competencies through domain-specific affordances, particularly in the realm of professional communication. Based on the outlined conceptualisation, the review presents a series of design recommendations that move towards establishing design principles for collaborative learning in IVR. These recommendations highlight the need for attending to both pedagogy and technical usability while designing for participation and social interaction.

The review also highlights important gaps in the current literature. While many studies have focused on post-VR evaluations, there is a need for future research to explore and evaluate what occurs within the virtual reality experience itself. Understanding the construction and organisation of collaborative learning processes within IVR environments will provide much-needed insights into the learning activity.

In conclusion, this systematic review contributes to the understanding of designing for collaborative learning in virtual reality by synthesising existing empirical studies. The findings emphasise the importance of designing IVR environments and activities that support the development and practice of skills and competencies by designing for participation and social interaction. By addressing the identified gaps and pursuing future research that examines the intricacies of the IVR experience, educators, researchers, and designers can further enhance the adaptation and implementation of collaborative learning in virtual reality settings.