Background

Although anatomy serves as the basis for other medical courses for medical students [1], universities have decreased the hours allocated to anatomy education in favour of applied clinical work [2]. Medical students need to supplement their anatomy education with plenty of traditional resources, including cadaveric dissection, preserved specimens, and various 2-dimensional (2D) image representations (e.g. textbook illustrations, atlases, and tomographic scans) [3]. Recent advancements in computer technology have led to many different forms of digital anatomy simulations [4]. Among them, virtual reality (VR) technology is one of the most promising teaching tools in medical education. VR can be used to deliver a highly immersive experience through head-mounted displays (HMDs) and a less immersive experience through a desktop system [5]. A wide range of virtual learning resources (VLRs) have been developed that use 3-dimensional (3D) visualization technologies to supplement and even replace traditional instructional materials such as cadaver dissections [3]. Users can interact with vivid imagery for an active and self-directed learning experience without the limitations of ethical concerns and donation shortages [6, 7] or having to enter an anatomy laboratory [4]. In a few studies, the educational value of VLRs has been compared with that of conventional methods, and the results are generally inconsistent. When used either alone or to complement traditional written and online materials, VLRs showed better or similar effectiveness in terms of enabling students to learn anatomy [2, 8,9,10,11]. More importantly, VLRs are rated as more interesting and engaging [8], enjoyable [8, 9, 11], motivating [8, 12,13,14] and useful for understanding spatial relationships [11, 13, 15] than traditional tools. There is an inherent appeal in these newer and more advanced visualizations in addition to their novelty [4]. However, studies have shown that compared with cadaver dissection and physical modes, VLRs are less effective in improving learning outcomes [16, 17]. The lack of tactile experience is regarded as a disadvantage.

Although current studies suggest that VLRs cannot replace cadaver and physical models, they are perceived as promising supplementary resources in anatomy education. It is therefore important to evaluate the evidence from different aspects. However, current researches are largely focused on the comparison between VLRs and 2D textbooks, online materials or physical models. Petersson et al. [16] and Codd et al. [17] both compared VLRs with cadaver dissection, but neither study used VLRs to deliver a highly immersive experience. Recent studies by Birbara et al. [18] and Shao et al. [10] provide a fully immersive experience, but neither of the research groups compared VLRs with cadaver dissection, and the VLR group used only perception questionnaires. Objective assessments are crucial to evaluating participants’ performance, in which the identification test is considered to be a predictor of improved learning outcomes following 3D learning [19, 20]. Therefore, immersive VLRs and traditional teaching modalities, such as cadaver dissection and 2D atlases, should be further compared, and the different aspects of the assessments should also be considered to evaluate the impact of VLRs on anatomy education.

Aim and hypothesis

Neurosurgery comprises some of the most challenging surgical procedures, and mastering the intricacies of cranial anatomy is a career-long endeavour for every neurosurgeon [10]. In this study, a coloured and detachable skull VLR was constructed. The aim of this study was to compare the results of the skull VLR with the cadaver skull and a 2D atlas for anatomy education by administering objective questionnaires and perception surveys. Our hypothesis was that the skull VLR and cadaveric skull groups might have similar performance in the objective tests, and that students would show a more positive attitude towards their learning material than the atlas group.

Materials and methods

3D skull model based on VR technology

The virtual 3D skull model used in this study was constructed from computed tomography (CT) scans of a human skull from the Peking Union Medical College (PUMC) Anatomy Teaching Collection (Fig. 1). The CT scans were imported into Mimics 17.0 (Materialise NV, Leuven, Belgium) and converted into stereolithography (STL) files. The method used to create a 3D model from CT scans was previously published by Shui et al. [21]. Several defective structures (ethmoid plate, crista galli, anterior clinoid process and inferior orbital fissure) on the 3D skull model were modified by using 3D Studio Max 2016 (Autodesk Inc., San Rafael, CA). In addition, each bone was isolated from the entire skull and painted in a different colour (Fig. 2c & d). The model was then imported into the Unreal Engine VR platform (Fig. 2a) through the HTC VIVE software development kit (High Technology Computer Corporation, Taiwan) and Unreal Engine 4.15 (Epic Games Inc., Cary, NC), which is compatible with HTC VIVE CE (High Technology Computer Corporation, Taiwan), a VR HMD with a resolution of 2160 × 1200. Users could rotate and scale the model through handheld controllers. In addition, each cranial bone could be isolated from all other bones, which allows the user to view an individual selected bone and its position in space relative to the other bones. When the isolated structure was placed back in its original position, the model was reset.

Fig. 1
figure 1

Photos of the cadaveric skull and skull VLR. a From left to right, the cadaveric skull is shown in the frontal, right, superior and inferior views. b From left to right, the skull VLR is shown in the frontal, right, superior and inferior views

Fig. 2
figure 2

Photos of the simulation classroom and the skull VLR. a The entire classroom, in which a skull is placed on a table in the front of the classroom, the other skull is placed on a table in the middle of the classroom, and pictures of the human skeleton are placed in front of the window. b The skull VLR and projection screen. c The frontal bone separated from the entire skull. d All the bones separated, with the bright white ball representing the center of the original skull

Participants

Seventy-four clinical undergraduates from PUMC who had just finished a 2.5-year pre-medical programme at Tsinghua University were recruited. These students would begin their undergraduate stage of medicine in the subsequent 5.5 years, from the basic study of anatomy to clinical internships. The anatomy course combines regional and systematic anatomy and requires 144 study hours for each student. Every theoretical lecture is followed by a cadaver dissection teaching of equivalent time. There are a theoretical test and an identification test for objective assessment at the end of the course.

The students were randomly divided into three groups, namely, the skull VLR group (VR group, n = 25), cadaveric skull group (cadaver group, n = 25), and 2D atlas group (atlas group, n = 24). Seventy-three participants completed the trial, while one participant in the atlas group dropped out of the study for personal reasons before the pre-intervention test.

Ethical approval

This study was approved by the Institutional Review Board of the Institute of Peking Union Medical College Hospital (PUMCH) (Project No: ZS-1724).

Design

A flowchart of the study design is displayed in Fig. 3. All participants finished the pre-intervention tests. Then, they attended a 30-min PowerPoint-based introductory lecture on cranial anatomy, which included the characteristics of each cranial bone, feature structures and spatial relationships. The lecture was taught by a teacher from PUMC whom the students had not met before. During the lecture, each participant received a single sheet of paper with the teaching outline, which could also be used for note-taking. Afterwards, the three groups were assigned to three separate rooms for a 30-min self-directed learning session that used skull VLR, cadaveric skulls, and 2D atlases. The students in the VR group received a 2-min instruction about the manipulation of the VR equipment before learning. Study mentors were assigned to each room to prevent intragroup communication and were forbidden to answer questions related to anatomy. The participants took turns so that each participant had 7.5 min to manipulate and observe the model in the first perspective, and they observed the 3D model on the computer screen for the remaining 22.5 min. The participants in the cadaver and atlas groups also had the same amount of time to hold the cadaver skull or atlas, while the other participants could only observe, without manipulation. To compensate for the inability to view the teaching outline on paper in the simulated environment, a projector was used to project the teaching outline on a screen (Fig. 2b). A post-intervention test was conducted immediately after the learning session to evaluate the educational efficacy of each model. Finally, each participant completed a perception survey.

Fig. 3
figure 3

Flowchart of the study design

The pre- and post-intervention tests comprised the same set of theory tests and identification tests (Supplementary file 1.1 & 1.2). The theory test consisted of 18 multiple-choice questions that mainly covered basic knowledge on the skull. Each correct answer was awarded 1 point, and the examination lasted 15 min. The identification test consisted of 25 fill-in-the-blank questions on labelled anatomical structures of the skull. All structures were labelled on the cadaveric skulls. The participants had 45 s to observe each structure and write down its name. Each correct answer was awarded 1 point. The content was based on the syllabus from the PUMC anatomy course, and all the test questions are available in Supplementary file 1.

To assess the potential efficacy of the teaching tools, in addition to the objective learning efficiency determined by the test scores, a perception survey was designed (Supplementary file 1.3). The questions were based on those included in several previous studies conducted to evaluate the efficacy of other 3D models [8, 19, 22]. The perception survey used in this study consisted of five parts that addressed the participants’ enjoyment, learning efficiency, attitude, intention to use, and the tool’s authenticity, and a standard five-point Likert scale was used to quantify the responses (1-strongly disagree, 5-strongly agree with the statement).

Data collection and marking

Demographic information, including each participant’s age, sex, self-reported VR headset experience and video game experience, was collected during the trial. The participants recorded their group and individual identification numbers on the sign-in sheet. The previous grade point average (GPA) of each participant was obtained from the grade counsellor. The demographic and grouping information were hidden from the test mentor, study mentor and study staff until the trial was completed. The study staff scored each answer sheet, and the results were reviewed by the investigators (Zhu J and Cheng C) twice.

By using the Chen et al.’s [19] mean total scores and the variance data of the post-intervention test, power calculations were performed for this study. The calculations revealed that 26 students were required per group (78 students total) to achieve 80% power to detect a 10% change in the post-intervention total scores at an alpha level of 0.05.

Statistical analysis

The previous GPA, test scores and perception survey scores were expressed as medians (interquartile ranges (IQRs)), and the categorical variables were expressed as numbers (%). The participants’ ages were expressed as means (±SDs). A p-value of < 0.05 was considered to indicate statistical significance. Statistical analysis was performed with SPSS 23.0 (IBM Corp, Armonk, NY).

The data distributions were assessed with the Kolmogorov-Smirnov test. The between-group differences in the pre- and post-intervention test scores, changes in the scores, and perception survey scores were assessed with the Kruskal-Wallis H test because they were found to be non-normally distributed. If there was a significant difference with the Kruskal-Wallis H test, the Mann-Whitney U test was employed for pairwise comparisons. The participants’ ages were compared with ANOVA. The categorical variables, except for video game experience, were compared with the chi-square test; video game experience was compared with Fisher’s exact test.

Results

Participant demographics

A total of 73 third-year medical students (39 females, 53.42%) were included in the study (Table 1). Most participants were 20 or 21 years old. There were no statistically significant differences across the 3 groups in terms of gender, age, previous GPA in the pre-med programme at Tsinghua University, VR experience or video game experience (all p > 0.05).

Table 1 Demographic information in the three groups. aChi-square test. bANOVA. cKruskal-Walis H. dFisher’s Exact test

Comparison of the test scores across groups

The scores for the theory test and identification test were included in the total scores. The maximum scores of the theory test, identification test, and both tests together were 18, 25, and 43 points, respectively. A within-subject analysis showed overall improvement in the test scores from before to after the intervention, and the magnitude of improvement was significantly different across the three groups (p < 0.001). Table 2 displays the results of the pre- and post-intervention tests.

Table 2 Pre- and post-intervention tests score in the three groups. Full scores of theory test, identification test, and total score were 18, 25, and 43 points, respectively. The median and quartiles of the total scores were not simply equal to the sum of the theory score and the identification score. cKruskal-Walis H

No statistically significant difference was revealed across the three groups in the pre-intervention tests (p > 0.05 for the total score, theory score, and identification score). In terms of the post-intervention test, there were no statistically significant differences across the three groups in either the total score or theory score (all p > 0.05). The participants in the VR group performed better in the identification test than the cadaver and atlas groups (Fig. 4), although with no significance (VR: 15 [IQR: 10–18], cadaver: 12 [IQR: 8–15.5], atlas: 13 [IQR: 8–18]; p > 0.05).

Fig. 4
figure 4

Comparison across the three groups in the post-intervention test scores and changes in scores. There were no statistically significant differences across the three groups in the post-intervention test scores and changes in scores

The differences between the pre- and post-intervention test scores of the individual students were considered by using the change in scores. Kruskal-Wallis H analysis revealed that the changes in the scores were not significantly different across the three groups (p > 0.05 for the changes in the total, theory, and identification scores), as shown in Fig. 4.

Results of the perception survey

Comparisons of the results of the perception survey are shown in Table 3. Overall, the participants in the VR and cadaver groups found their assigned learning models to be more enjoyable (VR: 4 [IQR: 3–5], cadaver: 4 [IQR: 3–5], atlas: 2 [IQR: 1–3]; p < 0.001), more interesting (VR: 4 [IQR: 3–5], cadaver: 4 [IQR: 3–4.5], atlas: 2 [IQR: 1–3]; p < 0.001), more authentic (VR: 4 [IQR: 3–5], cadaver: 4 [IQR: 3–5], atlas: 2 [IQR: 1–3]; p < 0.001), and more efficient for memorization (VR: 3 [IQR: 2–4], cadaver: 3 [IQR: 3–4], atlas: 2 [IQR: 1–4]; p < 0.001) and spatial understanding (VR: 4 [IQR: 4–5], cadaver: 4 [IQR: 3–4.5], atlas: 1 [IQR: 1–3]; p < 0.001). The VR and cadaver groups reported a higher intention to include the study material for use in standard anatomy education (VR: 3 [IQR: 2–4], cadaver: 4 [IQR: 2.5–4], atlas: 1 [IQR: 1–2]; p < 0.001).

Table 3 Results of perception survey in the three groups. Full score of perception survey is 35. cKruskal-Walis H. *p < 0.05

Discomfort during the learning session

During the learning process, discomfort including headache, blurred vision and nausea were evaluated in the three groups (Supplementary file 2). Although the participants in the VR group exhibited these adverse effects more frequently than in the other 2 groups, no significant difference was found (VR group: 24%, cadaver group: 12%, atlas group: 8.7%, p > 0.05). Additionally, in the VR group, the total scores of the post-intervention tests did not vary in the participants with and without discomfort (30 [IQR: 19.25–32.5] vs. 30 [IQR: 22–34], p > 0.05).

Discussion

This is the first study to compare VLR with two different traditional teaching methods in a randomized controlled study design, including both objective assessments and perception surveys, namely, “various question types” in previous research [19]. The results of the objective assessment demonstrated that the skull VLR had the same efficiency as the cadaver skull and atlas in enabling students to learn anatomy, despite the relative simplicity of the model used in this study. The post-intervention identification scores were higher in the VR group, although not significantly, compared with the other two groups. This result was consistent with the advantages of VR in stereoscopic observation and operation, which incorporated the intrinsic spatial relationships of the anatomical sites studied and may thus confer a spatial knowledge advantage [11]. The results of the perception survey in the VR and cadaver groups also showed a more positive attitude towards the learning models than the results of the 2D atlas group, which indicated that the VR and cadaver groups had similar levels of enjoyment, learning efficacy and authenticity. Novel interventions usually spark participants’ curiosity and lead to better results [4], and all participants in the VR group were highly enthusiastic to promote the use of this skull VLR in anatomy education. Similarly, previous studies that have compared a 3D VR model with traditional 2D materials also reported that VR was considered to be a more enjoyable and useful educational tool [8, 11, 23].

Cadavers offer high levels of realism, haptic feedback, and the opportunity to use real instruments and tools, which was found to be superior to atlas models in several previous studies [24,25,26], particularly in surgery [27]. In our trial, the scores of the cadaver skull group showed no statistically significant differences from the scores of the atlas group. This discrepancy might partially result from structural variations and small damages of the cadaveric skulls and the negative psychological reactions from participants triggered by the cadaveric skulls [28, 29]. In addition, we combined lecture and model learning to simulate the real learning process. The lectures allowed the participants to learn important information for the tests and narrowed the differences across the three groups. Moreover, exposure to the pre-intervention test will affect performance on an identical post-intervention test through familiarity with the questions and may also influence learning during the intervention [30].

3D VLRs, which provide rapid and feedback-based modifications, offer an opportunity for repetitive practice [23]. Another advantage of VLRs is that students can observe and receive instant visual feedback based on predefined practical tasks [31]. Critics have argued that this approach lacks expert guidance during the learning process, which plays an important role in forming the basic framework [32]. In fact, teachers can also assess students’ learning performance and mistakes through digital reports to further improve students’ skills. Moreover, the participants in our trial conducted self-learning in the absence of guidance and gained substantial progress in learning anatomical knowledge, which is consistent with the results in a previous study [33]. It has been suggested that self-learning in private places is a feasible way to implement VR simulation learning without constraints of the place or time provided for learning [33]. In addition, 3D models are likely to enhance rather than replace lecture-based teaching by experts [23]. Individuals can first practice with a VR simulation rather than with cadavers so that they can repeat the procedures and acquire basic skills before using expensive laboratory facilities [33]. Our study incorporated a room-scale HMD unit, which was available for the individuals. With this unit, many more manipulations can be easily achieved, such as rotating to a suitable view, segmenting a single bone and scaling up the model, which makes it a better tool for understanding difficult anatomical structures. This HMD unit provides a completely immersive experience with a high-resolution display, a high refresh rate, and highly precise, low-latency constellation head tracking.

However, adverse effects caused by the highly immersive experience and the device being positioned in front of the eyes were regarded as negative aspects of the VLRs [18]. Adverse effects, including headaches, dizziness, sore eyes, blurred vision and motion sickness, have been reported [2, 34, 35]. A previous study reported a high adverse effect rate in a VR group (headaches 25%; blurred vision 35%) [2], but this rate was lower in our trial (headaches 20%, blurred vision 4%). It can be inferred that the discomfort caused by the virtual environment could be relieved with increased resolution and lower latency. Newer VR designs are being designed to overcome motion sickness and other VR-associated adverse effects. These designs include grounding the user by allowing their eyes to fix on a constant object such as a virtual nose or hand and decoupling the axes of movement from the visual plane [18].

In addition to physical discomfort, the high degree of immersion in a stereoscopic environment and the novel experience of an immersive learning might make the learning process more mentally taxing [11, 15]. Khot et al. suggested that the extraneous load of the virtual delivery modalities might contribute to the worse performance of students who used these modalities than students who used physical models [36]. Birbara et al. also found that students with minimal prior anatomy knowledge in a stereoscopic cohort had a higher cognitive load [18]. With no prior knowledge, the participants in our study might have experienced a greater mental burden, which undermined their learning efficiency in the learning session.

Although commercially available and low-cost HMDs have further enhanced the opportunities for a truly interactive virtual experience [37], the cost of VLR as a supplement should be noted. Each set of VR equipment cost approximately 700 dollars in our study, and it requires related software, computer hardware and technicians for its maintenance [38]. Considering the benefits of improved learning, repeated use, and versatility, VR can reduce the costs of laboratory material, supervisor staff and even simulated patients. In general, the cost-benefit trade-off for these expensive technologies is likely to vary based on institutional goals and resources [39].

Limitations

Our study had several limitations. First, this is a single-institution study with a small sample size that included only 73 participants, which was close to but smaller than the expected sample size. Further studies with larger sample sizes are needed to examine a broader spectrum of medical personnel including resident physicians, nursing students and related educators. Second, the self-directed learning session was limited to 30 min in our study. The participants in the VR group only received a 2-min instruction for the VR equipment manipulation before the learning session. To reduce the mental burden, prior acquaintance with a virtual environment may be essential in future study designs. Third, as the pre- and post-intervention tests were identical, probability existed that the participants might have purposely focused on the questions in the pre-intervention tests during the 30-min learning session, which may have influenced their performance in the post-intervention tests. Longer learning sessions should be considered in future studies. Finally, this study directly compared VLR with other resources and only combined VLR with traditional lecture. Further studies are needed to identify the optimal combination of VLRs with various teaching methods such as cadaver dissection to fully reflect their effectiveness as a supplementary aid in medical teaching.

Conclusion

In this study, the skull VLR was equally efficient with cadaver skull and atlas in teaching anatomy structures. Such a model can aid individuals in understanding complex anatomical structures with a high level of motivation and tolerable adverse effects. Advances in 3D digital technology have enabled the development of more sophisticated and realistic VLRs, which provide opportunities for its use in traditional anatomy teaching settings as a powerful supplement. Future generations of medical students may benefit from these technologies at the earliest stages of their learning, from VR anatomy models to patient-specific VR simulations, as needed. Additional studies with larger sample sizes are required not only to evaluate the teaching effectiveness of VLRs in a more comprehensive frame but also to investigate the optimal combinations of VLRs with traditional medical education.