Introduction

The anatomical sciences are foundational to health sciences education [1]. For centuries, gross anatomy education has been reliant on cadavers, which provide students with an accurate three-dimensional (3D) representation of human anatomical structures [1,2,3]. Not only are cadavers effective gross anatomy educational tools [4, 5], cadaveric dissection is considered important for teaching humanistic professional competencies, beyond anatomy knowledge, including teamwork, patient interactions, and an understanding of medical ethics [4, 6, 7]. Despite this, the use of cadavers as teaching tools has been in decline throughout the twenty-first century due to the time-consuming and costly upkeep of cadavers, alongside reduced time dedicated to teaching gross anatomy [8,9,10,11,12]. More recently, distance learning measures, including online and virtual classes, required to combat the COVID-19 pandemic have also proven to be a significant barrier to meaningful in-person anatomy education [13]. Thus, modern anatomy education is turning to alternative anatomy learning resources to adapt to these changes.

Alternative resources have been limited primarily to two-dimensional (2D) assets, including digital screen-based anatomical images and models. With technological advances, however, novel ways to digitally present 3D anatomical structures, which were previously confined to physical environments, are emerging. These include autostereoscopic displays, augmented reality (AR), mixed reality (MR), and immersive virtual reality (VR). Autostereoscopic display allows viewing of stereoscopic images without the need for additional equipment (e.g., 3D headset) whereas stereoscopic AR and MR superimpose computer-generated images on the real world. While definitions are conflicting, it is generally understood that AR only allows for viewing using a camera or glasses, whereas MR uses a headset, such as the HoloLens (Microsoft Corp., Washington, USA), allowing the viewer to interact with the virtual and physical environment. Immersive VR also employs a headset, but immerses the user into an entirely computer-generated world, isolating them from their surroundings. VR can also be used as an umbrella term for virtual environments, encompassing AR and MR as well. For the purpose of this review, VR will be used to refer to immersive VR.

Common immersive VR headsets include the Vive (High Tech Computer Corp., New Taipei City, Taiwan), the Oculus (a division of Meta Platforms, Menlo Park, USA), the PlayStation VR (Sony Group Corp., Tokyo, Japan), and the Valve Index (Valve Corp., Washington, USA) products. VR is an attractive option for learning anatomy outside of a gross anatomy cadaver laboratory, as it allows understanding of spatial relationships, environmental manipulation including virtual dissection, and recreation of pathologies and complex anatomical structures, which can be prepared by using 3D-scanning technologies or by creating entirely synthetic materials [14,15,16]. Moreover, head-tracking technology and controllers permit the user to interact with their environment, facilitating remote “hands-on” learning.

Immersive VR has already proven to be a versatile tool in health sciences education due to its customizable and “hands-on” capabilities. Uses include procedural simulation, surgical skill development, surgical planning, and gross anatomy learning [17, 18]. While its use has historically been limited by cost, recent hardware and software developments have resulted in more affordable access to VR [19]. Accordingly, many educational institutions have begun implementing immersive VR in their anatomy programs [20,21,22]. Despite these seemingly promising developments, the literature regarding the efficacy of VR as an anatomy learning tool is largely mixed and remains contentious. Some studies have shown that VR improves post-intervention anatomy test scores and long-term retention of anatomy knowledge as compared to traditional learning methods, including dissection, textbooks, and 2D virtual counterparts [23, 24]. On the other hand, a systematic review and meta-analysis by Moro et al. found that VR does not significantly enhance anatomy learning but is a viable alternative [18]. Evidence also exists that suggests digital learning in general may interfere with learning ability [25,26,27,28], and that learning anatomy using immersive VR is inferior to using physical 3D models [29, 30]. This draws attention to a gap in current knowledge regarding the specific factors associated with the use of immersive VR that affect the user’s ability to learn anatomy.

While prior reviews have been conducted to explore the efficacy of VR as an anatomy learning tool [21, 29, 30], this scoping review is the first to identify and compile potential factors that affect the learner’s ability to acquire anatomical knowledge in an immersive VR environment. Through this exploration, the benefits and limitations of using VR as a learning tool can be further elucidated, and either exploited or improved, respectively. Moreover, the gaps in the literature regarding these factors will be clarified, identifying avenues for future research to maximize the potential of VR as an anatomy learning tool.

Materials and Methods

A scoping review was conducted to explore the determinants of learning anatomy in VR. The five-stage scoping review framework by Arksey and O’Malley [31] was followed: identifying the research question, identifying relevant studies, selecting eligible studies, charting the data, and collating, summarizing, and reporting the results.

Identifying the Research Question

The research question used was “What are possible determinants of learning gross anatomy in an immersive VR environment?” Determinants are defined as factors, related to the learner or the learning environment, that may influence learning ability. The immersive VR environment refers to the use of a head-mounted display which visually isolates the learner from their surroundings. This review focuses specifically on immersive VR, omitting AR and MR, in order to specify the determinants of learning while immersed in an entirely separate, digitally rendered environment.

Identifying Relevant Studies

An electronic search was conducted using four databases (MEDLINE, Embase, Web of Science, and PsycINFO). Due to the relatively new introduction of VR technology, studies published in the last 20 years (March 2002 to February 2022) that were available in English were included. Review articles, commentaries, editorials, letters, and academic theses were excluded.

Selecting Eligible Studies

The screening and eligibility process was conducted in two stages: title and abstract screening and full-text screening. The eligibility criteria for both stages are summarized in Table 1. Search terms were developed by two researchers and optimized in collaboration with McMaster University Health Sciences librarians. Studies retrieved using the search strategy (ESM Appendix) underwent title and abstract screening according to a priori eligibility criteria (Table 1). Included studies proceeded to full-text screening by two separate researchers using additional eligibility criteria (Table 1). Disagreements at both stages were resolved by discussion with a third researcher when required. The reason for exclusion was recorded at the full-text screening stage. Although all authors had institutional access and other methods were used to try to obtain full texts, some studies were omitted because their full texts could not be accessed.

Table 1 Eligibility criteria for article selection

The selection process followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram (Fig. 1).

Fig. 1 PRISMA flow diagram of the article screening process. Of 4523 studies identified, 924 duplicates and 3574 studies were removed during the screening process, leaving 25 included studies. Reasons for article exclusion are listed for both title and abstract and full-text screening stages

Charting the Data

A data extraction spreadsheet was developed to chart information from the selected articles as follows: paper title, authors, publication date, study locale, participant experience level (education level, program, and/or prior anatomy experience when noted), learning instrument used (VR headset and software when noted), comparison modality (when applicable), learning determinant (outcomes measured other than anatomy knowledge test scores), subthemes (i.e., through which narrower components of the learning determinants are measured, when applicable), and the relationship between immersive VR and the learning determinant identified in the study. The data from each included article was extracted by two independent researchers to ensure all relevant information was included. Any opposing opinions were re-evaluated by the entire team, and a unanimous consensus was reached regarding the final decision.

Collating, Summarizing, and Reporting the Results

Descriptive statistics about the methods and results from included studies were collected. Post hoc categorization of non-knowledge test outcomes was performed by three independent researchers based on the specific factors that were being measured. All measures were sorted into six common learning determinant categories: cognitive load, cybersickness, student perceptions, stereopsis, spatial understanding, and interactivity.

Results

Using the search strategy outlined in ESM Appendix, 4523 articles were identified. In total, 924 duplicate articles were removed, leaving 3599 citation titles and abstracts for screening. Of those, 458 articles underwent full-text screening (Fig. 1). Each article’s full-text screening was independently conducted by two separate researchers. Disagreements were resolved by a third researcher when required. Cohen’s kappa coefficient was 0.74 for full-text screening, indicating substantial agreement between reviewers. Twenty-five final articles were included for data extraction. Included study characteristics are summarized in Table 2. The search included studies that have been published from March 2002 to February 2022; 1 article (4%) was published prior to 2017, and 24 articles (96%) were published between 2017 and 2022.
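The screening counts and the agreement statistic reported above can be cross-checked with a short calculation. The following is a minimal Python sketch, not the authors' analysis code. The stage-by-stage exclusion counts are derived here by subtraction rather than reported in the text, and the 2×2 agreement table passed to the hypothetical `cohen_kappa` helper is purely illustrative, because the reviewers' actual decision table is not reported.

```python
# Screening counts as reported in the Results text and the Fig. 1 caption.
identified = 4523
duplicates = 924
full_text_screened = 458
included = 25

# Derived by subtraction (these intermediate values are not stated in the text).
screened = identified - duplicates                        # titles/abstracts screened
excluded_title_abstract = screened - full_text_screened   # removed at stage 1
excluded_full_text = full_text_screened - included        # removed at stage 2
excluded_total = excluded_title_abstract + excluded_full_text


def cohen_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two raters making include/exclude decisions.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    n = both_include + only_a + only_b + both_exclude
    observed = (both_include + both_exclude) / n
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL table for illustration only; NOT the review's data.
example_kappa = cohen_kappa(both_include=20, only_a=5, only_b=5, both_exclude=70)

print(screened, excluded_title_abstract, excluded_full_text, excluded_total)
print(round(example_kappa, 2))
```

Under these assumptions, the derived total number of exclusions agrees with the 3574 studies given in the Fig. 1 caption. On the commonly used Landis and Koch scale, kappa values between 0.61 and 0.80 are described as substantial agreement, which matches the wording used for the reported value of 0.74.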

Table 2 Characteristics of studies included in scoping review

Included studies each reported outcome measures other than knowledge test scores. These measures were sorted into six possible learning determinant categories: cognitive load, cybersickness, student perceptions, stereopsis, spatial understanding, and interactivity. Although a determinant would be considered even if it was presented in only one study, each of the determinants identified was assessed in two or more studies. These determinants and associated study findings are summarized in Table 3. Many outcomes outlined in the table are not exhaustive in detail and are generalized based on themes — further detail on specific results can be obtained from the referenced papers.

Table 3 Possible learning determinants identified by included studies. Qualitative and quantitative outcomes reported by included studies are listed according to the learning determinant category. Results from these studies are summarized in “Statistical results” if they are supported by statistical analysis, or “Non-statistical results” if they are not

Cognitive load refers to the amount of information one can process at once, and encompasses ideas including intrinsic, extraneous, and germane load, and perceived mental effort. Cybersickness refers to the adverse symptoms experienced when using virtual screens, including nausea, oculomotor disturbance, and disorientation, among others. Student perceptions included concepts such as motivation and perceived efficacy of instruments used. Stereopsis refers to the ability to perceive depth and is mainly due to binocular vision, allowing individuals to perceive the relative distance of objects in real or virtual space. Papers describing spatial understanding discussed whether a particular tool was able to provide adequate levels of understanding of the positioning of landmarks relative to one another. Finally, interactivity refers to levels of user control in immersive VR.

As outlined in Table 3, student perceptions were assessed in 84% of the selected publications, cybersickness in 32%, spatial understanding in 24%, interactivity in 12%, and cognitive load and stereopsis each in 8% of papers. Quantitative measures used included emetic response and dropout rate for cybersickness, the Revised Purdue Spatial Visualization Test, Mental Rotations Test, and the ability to label anatomical landmarks for visuospatial ability, degrees of freedom of user control for interactivity, and monocular or binocular vision for stereopsis. No quantitative measures were used to assess cognitive load or student perceptions. Qualitative measures were used to assess student perceptions, primarily in the form of self-reported Likert scales, as well as the Instructional Materials Motivation Survey for assessing motivation, and the Motivated Strategies for Learning Questionnaire for assessing perceived value of VR. Cybersickness was qualitatively measured using the Simulator Sickness Questionnaire, in addition to self-reported incidence of symptoms. These symptoms were grouped into general symptoms (nausea, dizziness, disorientation, headaches, etc.), oculomotor symptoms (tired eyes, double or blurred vision, aching eyes, etc.), and flashbacks (false sensation of movement). Cognitive load was measured qualitatively using previously reported questionnaires, including the “Questionnaire to assess your activity in the virtual order-processing environment” [32] and an additional cognitive load questionnaire assessing intrinsic, extraneous, and germane load. Spatial understanding was assessed qualitatively through questions evaluating perceived efficiency of VR for spatial understanding. Stereopsis was also assessed qualitatively, using a question regarding perceived depth perception. Finally, interactivity was assessed qualitatively using a question regarding perception of hand–eye coordination and a System Usability Scale form.
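Because the percentages above are expressed relative to the 25 included studies, they can be converted back to study counts. The sketch below is an illustrative Python calculation, not part of the original analysis; the dictionary and the `percent_to_count` helper are hypothetical names introduced here.

```python
TOTAL_STUDIES = 25  # number of included studies reported in the Results

# Percentage of included studies assessing each determinant, as reported above.
reported_percent = {
    "student perceptions": 84,
    "cybersickness": 32,
    "spatial understanding": 24,
    "interactivity": 12,
    "cognitive load": 8,
    "stereopsis": 8,
}


def percent_to_count(percent, total):
    """Convert a percentage of `total` into a whole number of studies."""
    return round(percent * total / 100)


study_counts = {
    determinant: percent_to_count(percent, TOTAL_STUDIES)
    for determinant, percent in reported_percent.items()
}

for determinant, count in study_counts.items():
    print(f"{determinant}: {count} of {TOTAL_STUDIES} studies")

# The counts sum to more than 25 because a single study can assess
# several determinants.
print(sum(study_counts.values()))
```

The resulting counts are consistent with statements elsewhere in the text, such as the eight papers reporting cybersickness and the two studies assessing cognitive load.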

Of the 25 included studies, one study compared two VR systems [35], and five provided no comparator groups [32, 35, 36, 38]. Nineteen studies provided comparisons between immersive VR and other learning methods, which included tablet AR [40], tablet [22], AR [22], mixed reality [41], 3D-printed and plastic physical models [15, 40, 41], desktop displays [40, 43,44,45,46,47], online lecture and online textbooks [48], textbooks [49, 50], cadavers [49, 51], lecture [51, 52], periapical radiographs and cone-beam computed tomography [53], cross-sectional viewings in a picture archiving and communication system interface [54], annotated magnetic resonance imaging scans [55], and finally, independent study/conventional content that was otherwise not specified [21].

Discussion

Understanding anatomy is important across healthcare disciplines; however, given the decline in access to cadavers, students may not have adequate access to necessary anatomy learning tools [8,9,10,11,12,13]. VR is a modern, rapidly changing technology that shows potential as an alternative to traditional cadaver-based learning, particularly in the absence of physical laboratory space and model availability. In order to effectively use this modality, however, it is important to understand how it may affect a learner’s ability to learn anatomy. Although many studies have attempted to assess the efficacy of VR as an anatomy learning tool [18, 31, 33, 34], the mechanisms underlying the differences between anatomy learning ability across different learning modalities remain unclear. Thus, the knowledge required to maximize the educational potential of VR is limited. This review sought to compile non-knowledge test outcomes measured in the literature, and synthesize them to identify potential determinants of learning in the immersive VR environment. Six possible determinants of learning were derived from these outcomes: cognitive load, cybersickness, interactivity, student perceptions, stereopsis, and spatial understanding.

Cognitive Load

The cognitive load theory outlines three types of cognitive load that strain resources used for information processing. Intrinsic load refers to the innate difficulty of a particular task or concept. Extraneous load is associated with content delivery and can be inadvertently imposed by poorly designed instructional materials. The integration of learned content into existing knowledge for storage in long-term memory is referred to as germane load. Choosing a method of delivery that limits extraneous cognitive load is essential to allow dedication of sufficient intellectual resources to learning and integrating the content [56, 57]. Previous literature has indicated that immersive VR may place a heavy burden on working memory, which increases extraneous load [58,59,60]. Moreover, several studies included in this review [22, 35, 38, 43, 44, 46, 49, 52] highlighted cybersickness as a factor in VR anatomy learning, a limitation associated with content delivery that may contribute to increased extraneous load. However, the two studies assessing cognitive load noted that there was no difference across modalities [32] and that participants did not feel cognitively overloaded [55]. It is possible, however, that the intrinsic difficulty of the learning material presented in these studies was low enough that the load associated with design flaws (e.g., cybersickness, inadequate stereopsis) was negligible due to sufficient remaining cognitive resources. It may also be that unspecified benefits of VR counter the hypothesized increase in cognitive load. In all, the extraneous load associated with VR requires further objective investigation to determine how it can be limited if necessary.

Cybersickness and Interactivity

Cybersickness is akin to motion sickness, except that its symptoms arise from viewing electronic screens rather than from physical movement. Cybersickness is among the most well-known limitations of VR use [61,62,63]. Indeed, the papers included in this review have confirmed this as a limitation, albeit to varying degrees. Multiple hypotheses have attempted to explain these adverse effects, with much of the evidence indicating that conflict between the visual and vestibular sensory systems is causal [61, 64]. Research investigating a solution to this issue is ongoing [65,66,67]. There is evidence that cybersickness levels are positively correlated with duration of VR use [38, 68] and with degree of interactivity or navigational control [38], as was found in this review. Both of these factors can be limited when developing VR programs. Students may benefit from study sessions shorter than 30 min with breaks to mitigate fatigue and sickness [38, 69]. Moreover, reducing the number of axes that models can be rotated upon or disabling head-tracking technologies may mitigate sickness associated with navigational control. Nonetheless, the incidence of cybersickness may deter students from using immersive VR for anatomy learning. Thus, it is important to supplement the use of VR as an educational tool with other options.

Student Perceptions

Students generally reported that the immersive VR environment had a positive influence on their willingness and ability to learn and explore anatomy in comparison to other methods. Interestingly, none of the included studies reported statistically significant negative student perceptions, nor was it found that most participants in any study shared a negative perception of VR. However, transcribed student comments presented in some papers were omitted from this analysis due to their specific nature, and these may include some negative opinions. There are a number of possible explanations for the positive perception of VR. The self-guided nature of immersive VR technology supports a learner-centered approach to learning [70], a method that is highly validated and effective. Moreover, the first-person perspective increases motivation, which allows students to learn new information with increased ease [71,72,73]. Accordingly, most articles stated that students had increased motivation and found immersive VR to be engaging and interesting. However, as many of these studies were not associated with course credit, it is important to assess motivation in a real-world context by integrating VR into anatomy courses, where there are stakes associated with the learning outcome. Furthermore, it is possible that VR constitutes a novel learning environment for many participants given that it is a relatively new technology, which may result in increased engagement. This presents a possible limitation of these data, as the novelty of the platform may mask the true effect of VR on learning ability by increasing engagement and augmenting performance to a level that may not persist in the long term. It is also possible that the novelty of the experience increases cognitive burden and decreases learning performance, but the true effect of novelty under these circumstances is unclear. Moreover, whether VR remains as engaging and motivating over time, as is reported in included studies, remains to be elucidated.

Stereopsis

Stereopsis refers to the visual ability to perceive the world in three dimensions, allowing one to distinguish the relative distance of objects in space. This ability is mainly due to binocular disparity, though numerous monocular cues, such as parallax, interposition, and relative size, are also significant [74]. One unique feature of immersive VR is its creation of binocular disparity, which facilitates true stereopsis in a virtual world, something which is absent from textbooks, 2D videos, and all but autostereoscopic displays. Stereopsis is enabled through binocular vision, which facilitates the perception of depth. There are other monoscopic cues to stereopsis that have not been studied in any detail from an educational perspective. Wainman et al. [41] found that learning anatomy with a VR headset with monocular vision significantly reduced anatomy test performance, in the same fashion as monocular vision inhibits learning from physical models, whereas using monocular vision in an AR headset did not reduce performance. This finding is supported by similar studies which suggest that stereopsis significantly improves anatomy performance while using digital learning resources [75], and it further suggests that the AR headset did not provide meaningful stereopsis. In summary, stereopsis is an important feature of VR that may play a role in maximizing its potential as an anatomy learning tool. This observation is borne out by the meta-analysis of Bogomolova et al. [76], who showed a clear educational benefit with stereoscopic displays of anatomical material.

Spatial Understanding

Spatial understanding in this study refers to the capacity of the learner to understand, and mentally manipulate, the spatial orientation of anatomical structures. This concept is closely linked to stereopsis, mentioned above, as perception of depth contributes to the ability to understand spatial relationships. Accordingly, the beneficial effect of stereopsis as previously described results in further improvement for learners with low visuospatial ability [14, 75, 77], a term which refers to one’s individual ability to understand spatial relationships among objects. The relationship between visuospatial ability and anatomy performance is well established in the literature [22, 58, 59, 78,79,80,81]. Generally, those with high visuospatial ability tend to perform better on anatomy examinations [81]. In this review, Wainman et al. found that those with low visuospatial ability performed worse when learning with VR; however, other studies found no correlation [44, 52, 55]. One may consider that increased user manipulation of the model might allow for enhanced learning for those with low visuospatial ability, as it would theoretically minimize the need to visualize the rotation of the model; however, studies show that manipulating models to angles outside of key views (i.e., orientations in which an object is best visualized and least obscured) slightly benefits high visuospatial ability learners, but significantly hinders the learning of individuals with low visuospatial ability [58]. This suggests that limiting user control to key views may be optimal, as was shown by Garg et al. [78, 79] for two-dimensional display of three-dimensional objects. However, this requires further investigation in immersive VR.

Summary of Key Findings

The two studies addressing cognitive load found that the load imposed by the VR learning environment did not differ significantly from that imposed by comparator modalities [55], and that participants did not feel cognitively overloaded [37]. However, clarifying the degree of extraneous load imposed by immersive VR requires additional data. Secondly, cybersickness has been noted by all eight associated papers as an occurrence in immersive VR [21, 35, 38, 43, 44, 46, 49, 52]. Specifically, it was found that both duration of use and degrees of freedom of control over manipulation of the anatomical model were significantly correlated with incidence of symptoms [38]. Thus, limiting the duration of VR use to a maximum of 30 min [38, 69] and limiting the degree to which the model can be manipulated may help to mitigate this issue. Uncovering additional ways to limit cybersickness is imperative to enhancing learning experiences and limiting the cognitive burden imposed by immersive VR. Thirdly, while students generally perceive immersive VR as either superior or equal to other modalities in terms of ease of use, engagement, motivation, and efficiency for learning anatomy among other factors, it is important to continue assessing student perceptions over long-term use. It is possible that the novelty of the platform either falsely enhances performance due to increased engagement or may even reduce performance due to a lack of familiarity. Additionally, stereopsis has been shown to be an important feature of immersive VR, with students performing worse when using monocular vision [41]. Thus, stereopsis should be maintained. Finally, whether a student’s visuospatial ability affects their learning ability in VR is unclear, with three studies indicating that visuospatial ability does not affect anatomy learning ability in VR [44, 52, 55], and one finding that lower visuospatial ability students perform worse after learning in VR [15]. Nonetheless, limiting the orientation of the model to key views may improve learning outcomes for low visuospatial students [58] and may reduce occurrences of cybersickness due to decreased navigational control [38].

Limitations

Articles that focused on learning or exploring surgical skills did not meet the eligibility criteria of this review unless the authors specifically assessed anatomy learning. It is possible, however, that some studies that were focused primarily on procedural skills may have minimally explored anatomy learning in VR but did not include this detail in the title or abstract. Moreover, these studies may have explored relevant factors that could influence anatomy learning in a VR environment despite focusing on a procedural skill. An additional limitation is that many of the included studies reported specific student comments derived from interviews or open-ended comments. However, due to the specific nature of their content, they were omitted from the review. Nonetheless, they may provide further insight into possible avenues for research. Additionally, due to the exploratory nature of this scoping review, statistical analyses were not conducted to compare the determinants identified against learning performance. Many of these determinants of learning are being actively researched, and there may soon be sufficient data for meta-analysis such as that carried out by Bogomolova et al. [76], who investigated stereopsis specifically. Further research is needed to clarify such relationships, which will enhance understanding of how the possible determinants identified in this review affect student performance.

Future Directions

This review attempted to identify the determinants of learning anatomy in an immersive VR environment by analyzing the current scientific literature. However, it is possible that additional determinants of learning exist, but have not yet been explored or measured. To explore this, it is important to analyze student perceptions and comments, and further investigate what makes a learning object ideal for the VR environment. Moreover, it is important to continue assessing the determinants identified in relation to both learning and testing performance while incorporating further objective measures. For example, cognitive load can be measured using dual-task paradigms, and cybersickness can be measured using galvanic skin response [82] and electroencephalogram [83, 84]. VR technology is constantly evolving, and current factors that may hinder an individual’s ability to learn in VR, such as cybersickness and interactivity, could be rendered negligible in the coming years. It is therefore important to continually understand how developments in VR technology affect its capacity as an anatomy learning tool.

Conclusion

This paper reviewed the past twenty years of anatomy education literature, with the majority of included studies having been published in the last five years, to identify possible determinants of learning gross anatomy in an immersive VR environment. Non-knowledge test outcomes reported in the included studies were collated and, using post hoc categorization, six possible determinants of learning were derived from these secondary outcomes: cognitive load, cybersickness, student perceptions, stereopsis, spatial understanding, and interactivity. While VR is generally positively perceived and does not seem to cause cognitive overload, symptoms of cybersickness were reported which may impair some users’ ability to learn anatomy. Moreover, other modifiable VR design factors, such as degree of interactivity, can be manipulated to improve VR as an anatomy teaching tool. However, as it stands, each determinant of learning must be further assessed using objective measures in order to elucidate more ideal ways to design and implement immersive VR as a tool for learning anatomy. Further research is needed to manipulate these factors and determine the associated impact on anatomy learning.