1 Introduction

Cybersickness (CS) (McCauley and Sharkey 1992), also known as virtual simulator sickness (Howarth and Costello 1997), visually induced motion sickness (Kennedy et al. 2010), or virtual reality-induced symptoms (Cobb et al. 1999), refers to the negative effects users experience during or after immersion in virtual reality (VR) (Kim et al. 2015; Stanney et al. 2003; Merhi et al. 2007). CS is similar to motion sickness and is commonly associated with vection (Hettinger and Riccio 1992; LaViola 2000; James Smart et al. 2002), but the exact relation between the two is unknown. Recent studies suggest that vection is a necessary, but not sufficient, prerequisite for CS (Keshavarz et al. 2015; Kennedy and Fowlkes 1992). CS is not restricted to VR, however, and also occurs in other visual display systems, such as large screens, curved screens, and CAVEs (Rebenitsch and Owen 2016). In this systematic review, we focus only on CS in current-generation immersive VR head-mounted displays (HMDs), such as Oculus Rift and HTC Vive.

Empirical evidence shows that 60–95% of participants experience some level of CS during exposure to a virtual environment, whereas 6–12.9% of the participants prematurely end their exposure (Stanney et al. 2003; Arns and Cerney 2005; Roberts and Gallimore 2005; Regan 1995a, b; Kim et al. 2005). CS usually appears approximately 10–15 min after immersion (DiZio and Lackner 1997, 2000; Lampton et al. 1994), although cases have been reported after shorter exposure (So and Lo 1999). Once the user leaves the VR environment, symptoms usually disappear in around 15 min (DiZio and Lackner 1997, 2000), but reimmersion causes them to reappear abruptly and severely (DiZio and Lackner 2000). The duration of this susceptibility is unclear (Viirre and Ellisman 2003), and some studies suggest that aftereffects can persist for hours (Johnson 2005).

Numerous potential solutions to CS have been discussed, e.g., a virtual nose (Whittinghill et al. 2015; Wienrich et al. 2018) or motion sickness medication (Chen et al. 2015). Many previous studies suggest different methods to reduce the discrepancy between virtual and real movements, such as restricting movement to instantaneous locomotion (Christou and Aristidou 2017), manipulating the limited physical VR space with acoustic redirected walking (Nogalski and Fohl 2016), or extrapolating and filtering head movements (Garcia-Agundez et al. 2017). Including environmentally meaningful features such as sound and vibration when operating a virtual vehicle (Sawada et al. 2020) or snapping the viewpoint in sections of significant movement (Farmani and Teather 2018) have also been mentioned. However, user adaptation is widely regarded as the best option to address CS at the moment (Johnson 2005; Golding and Gresty 2015).

The etiology of CS is undetermined and overlaps with the possible causes of common motion sickness. The three most common theories are: (1) the sensory conflict theory, which is based on a discrepancy between the visual, vestibular, and proprioceptive senses, as well as on expectation and past experience (Reason and Brand 1975), (2) the postural instability theory, which describes a physiological response to the inability to maintain bodily postural control (Riccio and Stoffregen 1991), and (3) the eye movement theory (Ebenholtz et al. 1994). However, even though several studies support these theories, other studies present competing hypotheses for the cause of motion sickness. For example, Bos (2011) reviewed these findings and found negative correlations between postural instability and CS. Similarly, Lubeck et al. (2015) conclude that postural sway is not necessarily increased by visual motion. Another possible cause is the vergence-accommodation conflict, i.e., the mismatch between the distance at which the eyes converge on a 3D object and the distance at which they must focus, as discussed by Kramida (2016). Further research suggests a subjective vertical mismatch theory: when the subjective vertical cannot be determined, this causes motion sickness in general (Bles et al. 1998) and CS in particular (Bos et al. 2008). Additional studies also mention the poison theory as a possible cause of CS (Treisman 1977); however, as already argued by LaViola (2000), this theory is substantially different from the others mentioned before and difficult to verify.

Various questionnaires exist to assess CS, e.g., the Motion Sickness Susceptibility Questionnaire (Golding 1998) or the Fast Motion Sickness Scale (Keshavarz and Hecht 2011). However, the gold standard is the Simulator Sickness Questionnaire (SSQ) (Kennedy et al. 1993). Its suitability for VR has been studied explicitly (Bruck and Watters 2011), considering the differences between CS and simulator sickness (Stanney et al. 1997). In the SSQ, the possible symptoms of CS are evaluated on a scale from zero (none) to three (severe) and then grouped into three blocks: nausea, oculomotor, and disorientation. Finally, a total SSQ score is derived from these three blocks and the simulation is then categorized depending on its score: negligible (SSQ lower than 5), minimal (5–10), significant (10–15), concerning (15–20), and bad simulator (20 or higher) (Stanney et al. 1997). Disorientation symptoms are predominant in CS (Stanney et al. 2003; Kim et al. 2005; Lampton et al. 1994; So and Lo 1999; Lo and So 2001), whereas oculomotor ones are typical of simulators (Stanney et al. 1997) and nausea, as well as emesis, are predominant in motion sickness. Moreover, the incidence of CS has been reported to be 2.5–3 times higher than that of simulator sickness (Kennedy and Fowlkes 1992; Roberts and Gallimore 2005; Stanney et al. 1997).
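The categorization of total SSQ scores described above can be sketched in a few lines of Python. This is only an illustration: the function name `categorize_ssq` is hypothetical, and because the quoted ranges share their end points, the sketch assumes that each boundary value belongs to the higher category.

```python
from bisect import bisect_right

# Category boundaries as quoted in the text from Stanney et al. (1997).
# Assumption: a boundary value falls into the higher category, i.e. the
# bands are [0, 5), [5, 10), [10, 15), [15, 20) and [20, infinity).
BOUNDS = [5.0, 10.0, 15.0, 20.0]
LABELS = ["negligible", "minimal", "significant", "concerning", "bad simulator"]


def categorize_ssq(total_score: float) -> str:
    """Map a total SSQ score to the simulator category named in the text."""
    if total_score < 0:
        raise ValueError("SSQ scores cannot be negative")
    # bisect_right counts how many boundaries are <= total_score,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(BOUNDS, total_score)]


if __name__ == "__main__":
    for score in (1.87, 7.5, 12.78, 19.45, 31.84):
        print(score, categorize_ssq(score))
```

A different convention for the shared end points (for example, treating a score of exactly 10 as minimal) would only require changing `bisect_right` to `bisect_left`.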

Although the amount of research into CS is, as shown, quite extensive, user complaints are still fairly common (Rangelova et al. 2020), so CS remains an issue in current HMDs. Current-generation HMD devices, such as the Oculus Rift, have already proven to cause CS (Kim et al. 2015; Garcia-Agundez et al. 2017; Gavgani et al. 2017). Many previous publications explored different factors affecting CS, e.g., individual (age, gender, illness, posture), device (lag, flicker, calibration, ergonomics), and task factors (control, duration) (LaViola 2000; Davis et al. 2014). Similarly, Chang et al. (2020) recently surveyed the causes of CS and identified three major factors (hardware, content, and human factors). Additionally, Rebenitsch and Owen (2016) provide an excellent review of CS. However, these publications usually focus on determining the factors affecting CS and do not compare different VR HMDs, such as the HTC Vive.

Hence, in contrast to related work, we conducted a meta-analysis, comparing different HMDs (e.g., Oculus Rift vs. HTC Vive) and stimuli (matched vs. mismatched). The goal of this systematic review is to provide further insight into CS in current-generation immersive VR HMDs from three perspectives. Firstly, we aim to determine whether there are significant differences in the intensity and patterns of CS, as measured by the SSQ, among different VR HMDs (see Sect. 3.1). Secondly, we explore the nature of movement that causes sensory mismatch (see Sect. 3.2). Finally, we determine how biosignals may detect CS (see Sect. 3.3). This may help researchers to draw more definite conclusions as to whether these factors have a significant impact on CS.

2 Methods

In order to conduct the study, we performed a systematic search with the following keywords: virtual reality AND (cybersickness OR simulator sickness OR motion sickness) AND (application OR game), published since January 1, 2013 (release of the first current-generation HMD, Oculus Rift DK1). The search was performed on July 17th, 2019. Inclusion criteria were: (1) the immersive VR application uses a current-generation HMD, (2) the study measures CS with a standard questionnaire, and (3) the study reports some degree of CS among its participants. Exclusion criteria were: (1) non-HMD-based VR applications (e.g., large screens or CAVEs), (2) the study does not specify the immersion time, (3) reviews, (4) keynotes, and (5) books.

Fig. 1 PRISMA (Moher et al. 2009) flowchart of study selection

Figure 1 shows an overview of the study selection, as proposed by Moher et al. (2009). In total, we identified 1358 articles through database searching and three further articles through other sources that met our requirements. After the preliminary screening (i.e., screening of title and abstract), we excluded 1219 records because they did not rely on an immersive VR system (i.e., they did not use an HMD). Afterward, we applied the eligibility criteria, excluding a further 93 articles. This reduced our corpus to a total of 49 articles.
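The counts in the selection process can be checked with simple arithmetic. The short sketch below uses only the numbers stated in the paragraph above; the variable names are illustrative.

```python
# Counts taken from the study-selection paragraph above.
identified_db = 1358        # records found through database searching
identified_other = 3        # records found through other sources
excluded_screening = 1219   # removed after title/abstract screening
excluded_eligibility = 93   # removed after applying eligibility criteria

total_identified = identified_db + identified_other
after_screening = total_identified - excluded_screening
included = after_screening - excluded_eligibility

print(total_identified, after_screening, included)  # 1361 142 49
```

The result matches the 49 articles retained in the final corpus.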

For the analysis, we extracted important information, e.g., number of users, total immersion time, version of HMD used in experiments, type of application, condition (sitting or standing), and locomotion technique. Then we grouped the publications based on the HMD version and stimuli. Furthermore, for each publication we obtained the total SSQ score (mean and standard deviation). All resulting data for the studies can be consulted in Table 1. The data refer exclusively to post-immersion, non-normalized SSQ scores since normalization was not always performed.

Table 1 Average SSQ values for different HMDs and stimuli

We furthermore conducted a meta-analysis using the “meta” package in R statistical software (Schwarzer et al. 2015). We calculated the overall effect (mean differences), its confidence interval, and p-values to compare SSQ values for different types of VR HMDs (e.g., Oculus Rift vs. HTC Vive) and different stimuli (matched vs. mismatched stimuli). For this purpose, we filtered the research corpus again and excluded articles that (1) did not employ the SSQ or (2) did not provide explicit SSQ data. The results of the meta-analysis of the 32 remaining articles are detailed in Tables 2 and 3.
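To illustrate the kind of computation involved, the following Python sketch pools per-study SSQ means with fixed-effect inverse-variance weights and compares two groups with a normal approximation. It is a simplified stand-in under stated assumptions, not the procedure of the R “meta” package and not necessarily the model used for Tables 2 and 3. The `Study` class, the function names, and all numbers are hypothetical and were invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    n: int        # number of participants
    mean: float   # mean total SSQ score
    sd: float     # standard deviation of the total SSQ score


def pool(studies):
    """Fixed-effect inverse-variance pooled mean and its standard error."""
    # The variance of a study mean is sd^2 / n, so its weight is n / sd^2.
    weights = [s.n / s.sd ** 2 for s in studies]
    pooled = sum(w * s.mean for w, s in zip(weights, studies)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def compare(group_a, group_b):
    """Mean difference (a - b), 95% confidence interval, two-sided p-value."""
    mean_a, se_a = pool(group_a)
    mean_b, se_b = pool(group_b)
    diff = mean_a - mean_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under a normal model
    return diff, (diff - 1.96 * se, diff + 1.96 * se), p


# Hypothetical studies, invented for illustration only (not from Table 1).
mismatched = [Study(20, 30.0, 15.0), Study(25, 35.0, 20.0)]
matched = [Study(18, 12.0, 9.0), Study(22, 14.0, 11.0)]

diff, ci, p = compare(mismatched, matched)
print(f"MD = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.4g}")
```

A random-effects model, which additionally accounts for between-study heterogeneity, would typically widen the confidence interval; the fixed-effect version is used here only because it is the shortest to write down.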

Table 2 SSQ scores regarding different HMDs and stimuli
Table 3 Meta-analysis results for data comparison

3 Results

3.1 On the differences in CS patterns with different HMDs

In order to determine if there are differences in the CS patterns based on different HMDs, we obtained the total SSQ score and subscores (nausea, oculomotor, and disorientation scores) if available. The results of this analysis are presented in Fig. 2.

Several observations can be drawn from these data. The HTC Vive clearly shows the lowest SSQ values in total score as well as in nausea and disorientation subscores. Regarding oculomotor scores, Oculus Rift DK1 and HTC Vive have similar results. However, this may be due to the limited sample size since there are only four available subscores for DK1 (in contrast, for Vive, nine studies reported subscores, including nausea, oculomotor, and disorientation). As detailed in Table 3, there are also significant differences between the HTC Vive and the Oculus Rift DK1 as well as DK2 for all stimuli (\(p<0.0001\)). Unfortunately, there are not enough studies or information to allow a comparison between the Oculus Rift CV1 and HTC Vive. In any case, future studies would likely increase statistical power.

Considering that the differences between the Oculus Rift DK1 and the DK2 are not significant (\(p=0.4764\)), we suspect that changes in resolution and refresh rate have had only a marginal impact on CS when compared to changing the means of locomotion and environment interaction, which is the essential difference between the presented HMDs (see also Sect. 3.2). In fact, we observed higher scores in the Oculus Rift DK2 in comparison to the DK1 (see Fig. 2). However, as mentioned before, this may be due to the limited sample size. Nevertheless, as can be seen in Table 1, Oculus Rift DK1 causes a higher withdrawal rate (\(24.50\%\)) compared to DK2 (\(9.60\%\)). Since the production of Oculus Rift DK1 and DK2 has been stopped, it is unlikely that more results will arise in the future.

Fig. 2 SSQ score results classified by HMD

3.2 On the nature of movement

3.2.1 Sensory mismatch

Many studies have drawn attention to the fact that mismatched stimuli cause subjects to experience CS. In particular, CS occurs when the users perceive self-movement in the virtual environment while actually remaining stationary, e.g., different kinds of driving or flying simulators (see Sect. 3.2.2) and different locomotion techniques (see Sect. 3.2.3).

Therefore, to further investigate the effect of sensory mismatch, we conducted a meta-analysis on different VR HMDs and different stimuli. The data in Table 2 and Fig. 3 show lower SSQ scores for virtual environments with matched stimuli (average SSQ values of 12.78) compared to virtual environments with mismatched stimuli (average SSQ values of 31.84). Furthermore, on average, \(7.11\%\) of the subjects dropped out when perceived and real motions do not match (see Table 1). In contrast, studies with matched stimuli show lower withdrawal rates (on average, \(3.21\%\) of the subjects dropped). These results show that the SSQ scores and the withdrawal rate are higher for mismatched stimuli. Moreover, the meta-analysis results in Table 3 show that total SSQ scores between mismatched and matched stimuli are significantly different (\(p<0.0001\)).

We furthermore analyzed the SSQ scores of matched and mismatched stimuli depending on different HMDs. As detailed in Table 3, there is a significant difference between matched and mismatched scores for Oculus Rift DK1 (\(p<0.0001\)) and HTC Vive (\(p=0.0169\)). In fact, we found a significant difference for HTC Vive compared to other HMDs for mismatched stimuli, but not for matched stimuli. For example, for mismatched stimuli we found a significant difference between HTC Vive and Oculus Rift DK1 (\(p<0.0001\)) as well as between Vive and DK2 (\(p<0.0001\)); for matched stimuli, however, the difference between Vive and DK1 was not significant (\(p=0.0944\)). Unfortunately, we could not compare HTC Vive and Oculus Rift DK2 for matched stimuli because our analysis does not include any study reporting SSQ values for matched stimuli for DK2. Nevertheless, because there is a significant difference between matched and mismatched stimuli independently of the HMD, we can assume that mismatched stimuli are one of the main causes of CS.

Fig. 3 SSQ scores classified by type of stimuli

Additionally, excessively high latency can cause a mismatch between the perceived and real motions. Thus, the latency of an HMD can contribute to CS, especially when the users can perceive a lag between the head movements and the corresponding visual feedback on the HMD. Latency jitter seems to significantly affect CS (Stauffert et al. 2018). Delays above 40 ms already evoke CS (DiZio and Lackner 2000), although significant CS symptoms appear from 75 ms upwards (Caserman et al. 2019). Many researchers suggest keeping the latency below 20 ms; however, latency does not appear to be the main cause of CS, and a significant decrease in the overall system’s latency will not eliminate CS (Fuchs 2017).

3.2.2 Perceived motion in VR simulators

Some studies explicitly investigated the severity of CS in VR simulators, e.g., roller coasters or other driving and flying simulators. Roller coasters usually cause participants to terminate the ride prematurely due to nausea. For example, researchers report a withdrawal rate of up to \(92.86\%\) (Nesbitt et al. 2017; Nalivaiko et al. 2015; Gavgani et al. 2017). These kinds of driving or flying simulators provoke CS because subjects usually sit in a stationary chair while exposed to visually simulated linear and angular accelerations. Furthermore, several studies show that SSQ scores are generally higher in VR-HMD conditions compared to non-VR conditions. For example, Walch et al. (2017) studied the intensity of CS while the participants drove a car displayed either on a flat screen or via an HMD. In this study, the participants reported higher CS symptom scores in the VR-HMD setup (SSQ scores of 29.09) compared to the screen setup (SSQ scores of 16.41). Weidner et al. (2017) reported similar findings: in the VR-HMD condition, the participants reported significantly higher CS symptoms (SSQ scores of 30.91) than in the stereoscopic 3D condition (SSQ scores of 13.49). As discussed by Kramida (2016), the vergence-accommodation conflict could be an explanation for this difference.

These results suggest that immersive VR applications such as virtual roller coasters (Nesbitt et al. 2017; Nalivaiko et al. 2015; Sra et al. 2019; Gavgani et al. 2017; Jin et al. 2018; Onuki et al. 2017), car simulations (Walch et al. 2017; Weidner et al. 2017; Rietzler et al. 2018; Oishi et al. 2016), bike driving simulators (Tran et al. 2018; Mittelstädt et al. 2019), flight simulations (Garcia-Agundez et al. 2019a; Mirhosseini et al. 2017) and wheelchair simulators (John et al. 2018; Chowdhury et al. 2017b) cause higher SSQ scores due to sensory mismatch and, in particular, due to perceived self-motion while remaining stationary. Some studies suggest that air cushions (Onuki et al. 2017) or vibro-kinetic seats (Gardé et al. 2018) can mitigate CS symptoms while driving. Sra et al. (2019) furthermore found a similar result and reported that galvanic vestibular stimulation could also reduce CS.

3.2.3 Locomotion techniques

Many studies investigated the effects of different locomotion techniques (see Table 4). The SSQ scores reported by Christou and Aristidou (2017) and Frommel et al. (2017) indicate that artificial continuous locomotion techniques are generally more likely to cause CS than discrete locomotion techniques. These results are also in agreement with Habgood et al. (2018). The authors did not provide explicit SSQ values; however, their results show that locomotion techniques with continuous movements, such as free joystick-based movements, cause significantly higher nausea and oculomotor scores than discrete movements, such as teleportation. Furthermore, Tregillus et al. (2017) compared different continuous locomotion techniques; according to the scale of Stanney et al. (1997), in which scores higher than 20 indicate a bad simulator, all three locomotion techniques seem to elicit CS.

Table 4 Summary of studies investigating different locomotion techniques

Additional studies compared different artificial locomotion techniques with natural walking. In general, the results in Table 4 show low SSQ scores (on average, below 20) for natural walking. For example, Christensen et al. (2018) investigated two different conditions. The participants reported negligible symptoms when they could physically walk in the real world (SSQ scores of 1.87) and concerning symptoms when using a joystick (SSQ scores of 19.45). Similarly, the work of Llorach et al. (2014) shows that a position estimation system (SSQ scores of 15.93) induces less CS than game controllers (SSQ scores of 32.27). Furthermore, research by Wilson et al. (2018) suggests that natural walking without translation gain (one-to-one mapping of virtual space to physical space) causes only minimal CS symptoms (SSQ scores of 10.0). In contrast, additional translation gain causes significant CS symptoms (SSQ scores of 21.0). Moreover, Krekhov et al. (2018) pointed out that natural walking as a giant causes lower SSQ scores (SSQ scores of 10.47) than teleportation (SSQ scores of 19.95).

3.3 On biosignal-based alternatives to the SSQ

As already stated in Sect. 1, the SSQ is a commonly used method to quantify CS. Another option to explore CS is to study its physiological effects. These effects can be measured via the analysis of different biosignals, as an alternative to subjective measures such as the SSQ. For example, CS has been reported to increase cortisol levels in saliva (Kennedy et al. 2010) or to cause tachycardia (Hu and Stern 1999; Imai et al. 2006). Furthermore, it seems that CS often correlates with facial pallor, sweating, and respiration rate variations (Johnson 2005) as well as with heart rate variability (Rieder et al. 2011). Unfortunately, individual differences in autonomic regulation, and variations caused by the experience itself rather than by a negative reaction to it (sickness), make it challenging to predict CS based on variables such as the heart rate or respiration rate (Kiryu and Iijima 2014).

Thus, in this section, we aim to determine how reliable biosignals are in detecting CS. The corpus was again filtered and all articles that did not use biosignals were excluded. A total of six articles fulfilled this criterion, which are presented in Table 5.

Table 5 Summary of studies using biosignals

From these results, several observations can be drawn. Firstly, changes in galvanic skin response seem to be the most reliable method to detect CS. Four out of six studies used galvanic skin response to detect CS and also reported an increase or at least changes of statistical significance. Secondly, although CS clearly elicits changes in heart rate and heart rate variability, it does not seem that an increase or decrease can be explicitly linked to higher rates of CS. For example, an increase in heart rate can occur due to physical activity, whereas a decrease in heart rate variability can indicate a higher stress level. Thus, the heart rate does not seem to increase only due to CS. In this sense, heart rate alterations may be used to verify CS that has already been detected by a different method, such as galvanic skin response, changes of in-game behavior, or verbal communication.

Results also suggest that increases in the respiratory rate, basal finger temperature, or tachygastria may be used to predict CS. In general, we believe these results confirm the advantage of biosignals for determining CS without using subjective questionnaires, such as the SSQ.

4 Discussion

In this paper, we analyzed 49 studies that investigated CS among participants while they were immersed in a virtual environment using a current-generation HMD. We focused on determining whether there are differences in the CS patterns based on different HMDs, whether CS increases when stimuli do not match, and how biosignals can detect CS. A summary of the statistical conclusions based on the meta-analysis is provided in Table 6.

Table 6 Summary of the meta-analysis

Although more studies on the Oculus Rift CV1 are required, it is clear that the latest generation of HMD devices has significantly fewer problems with CS, although CS is still present. A comparison between different HMDs showed that the HTC Vive, with its accurate positional tracking, causes significantly fewer nausea and disorientation symptoms, but oculomotor ones remain at similar levels (see Sect. 3.1). More particularly, a meta-analysis revealed significantly lower SSQ scores (\(p<0.0001\)) for HTC Vive compared to Oculus Rift DK1 and DK2 (see also Table 6). Nevertheless, given the expansion of the VR market, many more studies are required to test all new HMDs reaching the market (Valve Index, Oculus Rift S, Oculus Quest, Samsung Odyssey, and Microsoft Augmented Reality devices).

Regarding the nature of movement, the meta-analysis revealed significantly higher SSQ scores (\(p<0.0001\)) for studies with mismatched stimuli compared to the studies with matched stimuli (see Sect. 3.2.1). In particular, VR simulations that force movements upon the users, e.g., virtual roller coasters, driving, and flying simulations, are more likely to induce CS (see Sect. 3.2.2). Additionally, the results show that locomotion techniques (see Sect. 3.2.3) with mismatched stimuli (e.g., joystick-based locomotion techniques) cause significantly more severe symptoms compared to locomotion techniques with matched stimuli (e.g., continuous movements such as natural walking or discrete movements such as teleportation). In particular, joystick-based locomotion techniques that allow the user to sit on a stationary chair and explore the virtual environment through continuous movements significantly increase the probability that CS occurs. In contrast, natural walking results in lower SSQ scores, indicating that it is less likely to provoke CS. This evidence complements related work (Rebenitsch and Owen 2016), showing that navigation is still strongly correlated with CS. Recent research suggests that adding motion cues to the visual stimuli may not necessarily reduce CS (Klüver et al. 2015). Interestingly, there is evidence opposed to this conclusion in the case of simulators (Gardé et al. 2018).

Concerning biosignals, galvanic skin response is by far the best signal to predict and detect CS. More studies exploring galvanic skin responses in users experiencing CS may provide sufficient information to develop a reliable linear regression algorithm that predicts CS more accurately in real time.
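As a rough illustration of what such a linear regression could look like, the sketch below fits an ordinary least squares line that maps a single galvanic skin response feature to a total SSQ score. Everything in it is an assumption made for illustration: the function name `fit_line`, the choice of a single predictor, and the synthetic placeholder values, which are not measurements from any of the reviewed studies.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long samples with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic placeholder values, not measurements from any study.
gsr_change = [0.1, 0.4, 0.7, 1.0, 1.3]      # hypothetical GSR increase per user
ssq_total = [8.0, 14.0, 20.0, 26.0, 32.0]   # hypothetical total SSQ scores

intercept, slope = fit_line(gsr_change, ssq_total)
predicted = intercept + slope * 0.85        # estimate for a new GSR reading
print(f"SSQ ~ {intercept:.2f} + {slope:.2f} * GSR, prediction = {predicted:.2f}")
```

In practice, such a model would have to be fitted on real recordings and validated per user, since the individual differences in autonomic regulation mentioned in Sect. 3.3 may limit how well a single line generalizes.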

In summary, based on the meta-analysis, it appears that the nature of movement is the main reason for CS. To prevent CS from occurring in immersive VR applications and games, developers should avoid sensory mismatch, i.e., they should provide room-scale environments where users can naturally walk in the physical and virtual world. If the direct mapping of movements in VR is limited by the physical tracking space, then the game should instead employ discrete locomotion techniques, such as teleportation. Modern HMDs with accurate head tracking can also help to prevent CS.

In general terms, it seems the traditional evaluation scales for the SSQ, with scores over 20 describing “a bad simulator,” are outdated. Significantly higher values can be expected across all HMDs and a new evaluation scale should be implemented. Considering that in our corpus an SSQ score of 40 or higher usually means a withdrawal rate of approximately one third, we suggest 40 would be a more precise value to describe “a bad simulator,” meaning an outdated HMD or a VR scenario design that causes abnormally high levels of CS by current standards. This is not a critique of the SSQ itself, but rather an observation of the differences between current VR and the simulators the SSQ was initially designed for.

Finally, we believe it is also important to restate some general remarks on how to conduct CS or SSQ-based studies in the future: samples should be larger than 15 users (Kennedy and Fowlkes 1992) and immersion times should be at least 20 min (Wilson 1996). During the evaluation, early indicators of CS may be atypical eye movements, sweating, or fiddling with the HMD (Cobb et al. 1999). In general terms, the more correlated virtual and real movements are, the less CS is to be expected (LaViola 2000; Regan 1995a). Finally, SSQ values of up to 40 can be expected.

4.1 Limitations

We aimed to identify the main causes of CS; however, this article has several limitations. Firstly, the number and nature of studies on the Oculus Rift CV1 are insufficient to draw any significant conclusions on the impact of CS on this device. The same can be said for other HMDs released during 2019. In general terms, the number of studies on which we performed our meta-analysis was limited. Secondly, authors of VR-based CS studies tend not to provide full SSQ results or information on the dropout/CS complaint rates. However, despite these limitations, several conclusions based on statistically significant differences can be drawn from our meta-analysis.

5 Conclusion

In this paper, we have reviewed the state-of-the-art research on CS with current-generation VR HMDs. We included 49 studies that met our criteria. To discuss the main causes of immersive VR-caused CS, we conducted a meta-analysis and compared different HMDs (e.g., Oculus Rift vs. HTC Vive) and stimuli (matched vs. mismatched). Firstly, the meta-analysis results show that the latest generation of HMD devices has significantly fewer problems with CS, although these are still present. Secondly, mismatched stimuli cause a significant increase in CS compared to matched stimuli. In particular, VR flying or driving simulators as well as continuous locomotion techniques with mismatched stimuli (e.g., joystick-based movements) cause higher SSQ scores. In contrast, discrete locomotion techniques, such as teleportation, or room-scale setups enabling natural walking cause significantly lower CS symptoms. Finally, concerning biosignals, galvanic skin response seems by far the best signal to predict and detect CS in real time. We hope that this survey will encourage game developers and researchers in the future to study the causes and solutions regarding CS using modern HMDs.