Spatial cognition has been studied as a potential predictor of student success in higher education and careers in science, technology, engineering, and mathematics (STEM) since the 1950s, when Super and Bachrach published Scientific Careers and Vocational Development Theory, a report that identified common characteristics of the most successful contemporary scientists and engineers (Super and Bachrach 1957). Comprehensive studies by organizations such as Project Talent have consistently shown that high spatial cognition skill is a predominant characteristic among students who go on to pursue higher education and careers in STEM (Humphreys and Lubinski 1996; Lohman et al. 1987; Wai et al. 2009). In particular, high spatial cognition skill is involved in success in research careers, both in academia and in industry, that rely on knowledge of physics, biology, chemistry, or genetics (Siemankowski and MacKnight 1971), as well as in computer science and programming careers (Jones and Burnett 2007).

Spatial cognition encompasses the ability to recognize the dimensional properties of objects, individually and with respect to other objects. The spatial properties of an object include its location in space, occupation of space, and trajectory of movement in space (Newcombe 2010). The ability to understand objects spatially is important in daily tasks like navigating streets and handling the placement of objects (Wai et al. 2009). It is important to note that despite its predominance in daily life, spatial thinking is not a substitute for verbal or mathematical thinking. All three kinds of thinking are important for success in STEM careers (Newcombe 2010).

The association between high spatial cognition and success in STEM careers and academics has motivated many to study the development of spatial thinking in children and young adults (Wai et al. 2009). Recent studies have shown that spatial thinking is malleable, as elementary school children’s spatial cognition improved more significantly over the school year than over the summer months (Huttenlocher et al. 1998). Similar results were also seen in a study among undergraduate students who practiced various spatially oriented tasks for several months (Terlecki et al. 2008). These findings show that improvement in spatial cognitive abilities is dependent on consistency and routine, rather than a particular age group. The results also demonstrated that it took longer for low-ability participants to start observing significant improvement, whereas high-ability participants showed consistent improvement throughout the same number of training sessions (Terlecki et al. 2008; Wright et al. 2008). This led to studies on understanding effective methods for training general spatial skills using traditional two-dimensional techniques (Wright et al. 2008). The efficacy of training spatial skills in a three-dimensional environment has not yet been investigated.

Virtual reality (VR) is a computer-generated environment that allows the user to interact in simulated three-dimensional situations using specialized technology that integrates the user into the simulated space. The three-dimensional nature of VR experiences and the agency users have when interacting in virtual environments make it a cognitive behavioral tool (Schultheis et al. 2002). The third dimension of VR adds depth information to traditional spatial cognitive exercises that have previously been performed only in two dimensions; thus, the hypothesis is that training in traditional spatial cognitive tasks in three dimensions will show equal or greater improvement in spatial cognitive skill compared to training in the same tasks in two dimensions.

Methods

Participants

Thirty-three undergraduate student participants (15 males, 18 females), each between the ages of 18 and 22, were recruited from different disciplines at Vanderbilt University and consented (IRB#170264). Participants were randomly assigned to a non-training control group, a traditional two-dimensional training group, or a three-dimensional VR training group. Figure 1 summarizes the testing schema for this study.

Fig. 1

a Cognitive tasks. In the mental rotation task (MRT), participants compare two Shepard-Metzler structures to determine whether the shapes are the same or different. The Shepard-Metzler structures are offset by either 50° (easy), 100° (medium), or 150° (hard). In the cube rotation task (CRT), participants compare two cubes with letters, numbers, or symbols on each face to determine whether the cubes are potentially the same or different. The cubes are offset by either no rotation (easy), rotation around one axis (medium), or rotation around two axes (hard). In the verbal analogies task (VAT), participants compare the relationships between two sets of words to determine whether the relationships are the same or different. The analogy questions were obtained from the Read Theory Team and are at either sixth-grade (easy), eighth-grade (medium), or tenth-grade (hard) reading levels. b Training schedule. All participants were given a pre- and post-test with a 7-week period in between. The control participants had a 7-week no-intervention period during which they were asked to maintain their regular lifestyle with limited use of recreational video games, while the two-dimensional and three-dimensional training groups had a 7-week training period consisting of 21 15-min sessions.

Cognitive Tasks

Participants were evaluated on improvement in reaction times and accuracy from the pre- to post-test based on three different cognitive tasks: the Shepard-Metzler mental rotation task (MRT), cube rotation task (CRT), and verbal analogies task (VAT). In MRT, participants compared two Shepard-Metzler structures to determine whether the shapes were the same or different. The Shepard-Metzler structures were offset by either 50° (easy), 100° (medium), or 150° (hard) (Research: Kit of Factor-Referenced Cognitive Tests 2016). In CRT, participants compared two cubes with letters, numbers, or symbols on each face to determine whether the cubes were the same or different. The cubes were offset by either no rotation (easy), rotation around one axis (medium), or rotation around two axes (hard). In VAT, participants compared the relationships between two sets of words to determine whether the relationships were the same or different (e.g., “paint” and “house” share the same relationship as “fur” and “bear”). The analogy questions were obtained from the Read Theory Team and are at either sixth-grade (easy), eighth-grade (medium), or tenth-grade (hard) reading levels (Read Theory Team 2012). The VAT was included to ensure that no issues arose from using either the two-dimensional or the three-dimensional training format.

Training and Evaluation

All participants were given a pre- and post-test consisting of five warm-up questions and 15 randomly ordered questions of each of the three cognitive tasks, with an equal number of questions in each difficulty level. The tests were provided in a two-dimensional format (computer monitor, keyboard, and mouse) to individual participants in a private office space with a proctor accessible. Five warm-up questions gave participants the opportunity to familiarize themselves with the task and ask questions or clarify instructions with the proctor. The 15 questions were used to determine the reaction times and accuracy differences between the pre- and post-test.

All participants had a 7-week period between the pre- and post-test during which the control group participants were asked to maintain their regular lifestyle with limited use of recreational video games, while participants in the two training groups received 21 sessions of training. The two-dimensional training group received training in a traditional environment with a computer monitor, keyboard, and mouse. The three-dimensional training group received training in a virtual environment using a VR headset and controllers (HTC Vive). During each 15-min training session, participants were presented with unlimited, randomized MRT and CRT questions. To address practice effects on performance, participants were not trained in VAT.

Statistical Analysis

The pre- and post-test results for reaction times and accuracy were obtained. The reaction times corresponding to questions that the participants answered incorrectly were removed from the data set to focus on the improvement associated with correct responses. The reaction time values for easy, medium, and hard questions were then each normalized using the mean and standard deviation of their distribution. An unequal variance t test was conducted on the data set to assess the improvement from pre- to post-test for participants in each group: control, two-dimensional training, and three-dimensional training. The t-ratio and one-tail p values of the results were recorded. An unequal variance t test was then conducted to determine whether the improvement in reaction time of one group was statistically greater than that of another group.
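The normalization and unequal variance (Welch) t test described above can be sketched in a few lines of Python. This is a minimal illustration only: the analysis in this study was run in Microsoft Excel, the function names `z_scores` and `welch_t` are hypothetical, the reaction times are made-up values, and treating the normalization as a z-score is an assumption. The sketch returns the t-ratio and the Welch-Satterthwaite degrees of freedom; obtaining the one-tail p value additionally requires the t distribution, which is not implemented here.

```python
from math import sqrt
from statistics import mean, variance


def z_scores(values):
    """Standardize values with their own mean and sample standard deviation.

    Assumption: the paper's "normalization" is read here as a z-score.
    """
    m = mean(values)
    s = sqrt(variance(values))
    return [(v - m) / s for v in values]


def welch_t(a, b):
    """Return (t, df) for an unequal variance t test of mean(a) versus mean(b).

    A positive t means mean(a) is larger than mean(b).
    """
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical reaction times in seconds (correct responses only).
    pre = [4.1, 3.8, 5.0, 4.6, 4.3, 3.9]
    post = [3.2, 3.5, 3.9, 3.1, 3.6, 3.4]
    t, df = welch_t(pre, post)  # positive t: post-test responses were faster
    print(f"t = {t:.3f}, df = {df:.1f}")
```

A positive t-ratio with `pre` as the first argument corresponds to shorter post-test reaction times, which is the direction a one-tail test of improvement would examine.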

The accuracy improvement rates were calculated as the percent difference between the total correct post-test questions and the total correct pre-test questions over the total correct post-test questions. All analysis was completed in Microsoft Excel.
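As a small worked illustration of the accuracy improvement rate defined above, the following Python sketch computes the percent difference relative to the post-test total. The function name `accuracy_improvement` and the example counts are hypothetical, and the study itself used Microsoft Excel. Note that the denominator is the post-test count, as stated in the text, rather than the pre-test count used in a conventional percent change.

```python
def accuracy_improvement(pre_correct, post_correct):
    """Percent difference between post- and pre-test correct answers,
    taken over the post-test total, as defined in the text."""
    if post_correct == 0:
        raise ValueError("post-test total must be non-zero")
    return 100.0 * (post_correct - pre_correct) / post_correct


# Hypothetical totals of correct answers for one group on one task.
print(accuracy_improvement(120, 150))  # 20.0
```

A negative value indicates that the group answered fewer questions correctly on the post-test than on the pre-test.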

Results

The pre- and post-test results demonstrated that there was no significant difference in reaction times and accuracy rates among the difficulty levels. Therefore, the analysis of participant improvement focuses solely on the difference in reaction times and accuracy rates between the pre- and post-test results. As shown in Fig. 2a, the results of the control group showed no significant improvement in reaction times in MRT, CRT, or VAT between the pre- and post-tests. The results of the two-dimensional (MRT: t = 2.663, p < 0.05; CRT: t = 1.668, p < 0.05) and three-dimensional (MRT: t = 1.557, p < 0.05; CRT: t = 1.006, p < 0.05) training groups showed significant improvement between the pre- and post-tests. As shown in Fig. 2b, the improvement in reaction times in MRT and CRT of the three-dimensional training group was not significantly different from the improvement of the two-dimensional training group, while the improvements of the three-dimensional and two-dimensional training groups were both significantly greater than that of the control group (p < 0.1 and p < 0.05, respectively). Lastly, only the two-dimensional training group had significant improvement in VAT from the pre- to post-test (t = 2.125, p < 0.05).

Fig. 2

a Comparison within training groups. The improvement of reaction times in participants completing MRT, CRT, and VAT from pre- to post-test was statistically analyzed by performing an unequal variances t test. The table shows the t-value (p value) of the reaction times. The single-starred cells (*) represent a p value < 0.05, demonstrating statistically significant improvement in reaction times from pre- to post-test. b Comparison between training groups. The average improvements of reaction times from pre- to post-test in completing MRT, CRT, and VAT are shown. A single-starred bracket (*) represents a p value < 0.05 and a double-starred bracket (**) represents a p value < 0.1. A lack of significance is denoted by ns. The error bars represent the standard error. Improvements in the two-dimensional experimental group over the control group and in the two-dimensional experimental group over the three-dimensional experimental group are shown.

The accuracy rates (percent correct answers) for the control group were 75.2% (pre) to 78.8% (post), 73.9 to 64.2%, and 84.2 to 73.9% for the MRT, CRT, and VAT, respectively. The accuracy rates for the two-dimensional group were 81.3 to 88.6%, 84.0 to 87.3%, and 89.3 to 82.3% for the MRT, CRT, and VAT, respectively. The accuracy rates for the three-dimensional group were 76.6 to 87.2%, 75.0 to 80.0%, and 76.6 to 68.3% for the MRT, CRT, and VAT, respectively. Statistical testing on the accuracy rate results showed that the control group did not improve in any exercise from pre- to post-test (p > 0.1, data not shown). The two-dimensional and three-dimensional training groups only improved in MRT (p < 0.05). The three-dimensional training group had greater improvement in MRT and CRT than the two-dimensional training group, but the improvement was not statistically significant (p > 0.1). Two-dimensional and three-dimensional groups both showed greater improvement in MRT and CRT as compared with control, though neither was statistically significant (p > 0.1).

Discussion

The results demonstrate the effectiveness of the two training methods in comparison to control. While both training methods showed improvement in reaction times in MRT and CRT, the two-dimensional group showed statistically significant improvement in VAT over both the control group and the three-dimensional group. These findings are consistent with previous studies that demonstrate the effectiveness of training in spatial cognitive tasks in improving spatial cognition (Wright et al. 2008). We hypothesized that training in virtual reality with an additional dimension would yield at least as much improvement as traditional two-dimensional training. Our results showed no statistically significant difference between the two- and three-dimensional training groups. While we hoped that training in three dimensions alone would result in greater improvement in spatial cognition, the equivalence of the two methods is also encouraging as it means that future test setups can be done entirely in VR.

The study limitations that may have contributed to our results include small sample size, unmatched participant fields of study between groups, and unmatched participant year of study (or age) between groups. Due to the length of the study, we were unable to sequester participants to control for other potential influences on spatial cognition or disruptions to the training/testing. Our virtual environment design could have also been a limitation to the study as we recreated the two-dimensional training in three dimensions without giving the participant the ability to move relative to the object in question, whether MRT or CRT. We did not survey the participants for their satisfaction with the type of training, which could have been useful to determine their preference for technology platforms. Lastly, having the study design focus on repetition over trial and error as its training method could have been another limitation to the study. Concealing the accuracy of their performance from the participants prevented them from identifying their mistakes and, consequently, from adapting their spatial cognitive thinking to understand the correct response. Future studies could include more elaborate training regimens to take advantage of the entire three-dimensional environment and could provide more feedback to the participants during the training period, while using our methodology as a control.

In addition to our study limitations, known factors such as a training phenomenon called “dimensionality crossing” may play a role in spatial cognition training and thus impact our results. Dimensionality crossing is the ability to take two-dimensional objects and visualize them as three-dimensional objects by manipulating the shape in the mind, and then relate it back into a two-dimensional space (Neubauer et al. 2010). The two-dimensional group, which was trained in the two-dimensional environment, completed 21 sessions training their dimensionality crossing abilities. This phenomenon may have contributed to the improvement in cognitive spatial abilities for this group. Future studies could give subjects the ability to manipulate the objects in VR to potentially strengthen the visualization of such manipulations.

Similar findings have been described in medical training studies exploring options for efficiently training novices in laparoscopic skills. One study observed the impact of training in two-dimensional versus three-dimensional laparoscopic systems and found that the results were comparable between the two systems even though the three-dimensional laparoscopic system was predicted to be more effective (Noureldin et al. 2016). This suggests that, while training in a VR environment can be effective in improving spatial cognitive abilities, the cost and complexity of acquiring the materials may not be worth the effort. The dimensionality crossing phenomenon also raises the possibility that traditional, two-dimensional environments may even be more effective in training spatial cognitive abilities than the same training in virtual, three-dimensional environments. A study controlling for dimensionality crossing would be illuminating. Since two-dimensional and three-dimensional training have been shown not to differ significantly both in our study and in those of others, it is possible that the ideal use case for training in a virtual environment has yet to be found (or investigated). The use of VR technology may also increase the desire to train, as subjects may be more excited to use it than to perform the same exercises in 2D.