1 Introduction

Cobb et al. (1999) introduced the term virtual reality induced symptoms and effects (VRISE) as a way of addressing multiple health and safety related issues that commonly occur in users when they are subjected to headset-based virtual reality (VR). Regarding the etiology of these issues, the authors raised concerns about a variety of potential factors, from virtual environment (VE) design and task configuration to individual characteristics of each VR user, calling for further research into understanding the causality of individual experiences with VRISE, as well as aspects of VR hardware and software design that may influence their intensity. Almost 25 years and several headset generations later, researchers across multiple disciplines are still attempting to answer the same questions.

Ranging from perceptual and cognitive effects to changes in affective state, physiological functions, postural instability, and ergonomic symptoms, there is a multitude of ways in which VRISE can manifest in a user, but the one that has arguably garnered the most attention is the state referred to as cybersickness. Usually characterized by symptoms such as disorientation, nausea, and oculomotor difficulties, cybersickness is most often thought to arise in response to a sensory conflict between the visual and the vestibular sense (Reason and Brand 1975), although other theories are also discussed across literature (LaViola 2000; Rebenitsch and Owen 2016).

Over the years, researchers studying cybersickness have considered various application-related factors as potential contributors to its occurrence and intensity. As discussed in Chang et al. (2020), cybersickness often increases with a pronounced sensation of vection (the so-called illusion of self-motion) arising as a result of optical flow as the user traverses the virtual scene. Thus, the likelihood that a particular VR application will provoke cybersickness in its users depends on different aspects of the virtual environment, such as its level of visual detail or the implementation of the locomotion method used.

Drawing upon this knowledge, industry leaders, such as Meta (Meta 2023b), as well as individual Internet marketplaces dedicated to immersive content, for example, The VR Shop (2023), are increasingly opting to disclose comfort ratings for individual applications to inform customers of the possibility of cybersickness onset during use. While commendable and useful in providing some level of distinction between different types of content, at the time of this writing, these rating systems have not yet been standardized, nor are their criteria fully transparent to the consumer base. Furthermore, presenting what are essentially cybersickness ratings as comfort ratings significantly oversimplifies the diverse issues that may arise from the use of VR technology. Lacking transparency and failing to address major contributors to VR discomfort, these ratings are inadequate for most users and arguably offer little assistance to a broad range of users with disabilities and health issues that may be exacerbated by VR use.

The overwhelming focus on cybersickness is also present in the academic community. In their efforts to research this condition and its potential causes, researchers have been utilizing a broad inventory of tools, from questionnaires to physiological measures. In the decades since its conception, the Simulator Sickness Questionnaire (SSQ) (Kennedy et al. 1993)—initially developed for measuring simulator sickness—has not only become the most popular measure of VR-induced cybersickness, but also the most frequently used tool for measuring VRISE (Kourtesis et al. 2019; Somrak et al. 2021), despite covering only a small subset of potential symptoms and effects that commonly arise in VR users.

The prevalence of SSQ as a primary, or sometimes even solitary, measure of VR-induced discomfort is confirmed by the comprehensive literature review by Hirzle et al. (2021), as the authors criticize the comparative lack of studies addressing discomfort that arises from other potential causes, such as unoptimized ergonomic design of head-mounted displays (HMDs). The issue of this disparity is further highlighted by the results of a fairly large-scale study (N=352) presented by the authors in the same publication. Despite the aforementioned focus on cybersickness, the majority of experienced VR gamers who participated in the study rated both ergonomic factors and digital eye strain as more relevant contributors to VR-induced discomfort (Hirzle et al. 2021), which calls for an urgent re-evaluation of common practices in VR research.

Similarly, the recent literature review by Souchet et al. (2022), centered around the topic of VR ergonomics and resulting risks at the workplace, extended its focus to include not only cybersickness, but also visual fatigue, muscular fatigue, stress, and mental overload. The authors noted that visual fatigue is often confused with cybersickness, while the remaining issues are rarely even considered. The research agenda proposed as a conclusion to this narrative review lists several research issues to be explored, including a more holistic examination and modeling of relationships between cybersickness, visual and muscular fatigue, stress, and mental overload, as opposed to previous approaches to exploring VRISE, which generally focused on only one effect at a time. Moreover, the authors call for research into interaction techniques that reduce muscular fatigue, while also providing an incentive for the inclusion of novel measures of cybersickness in place of outdated questionnaires that are often used in cybersickness research.

While Souchet et al. (2022) address VRISE in the workplace setting, our focus is on VR gaming. Unlike applications used for utilitarian purposes, which are streamlined and usually designed to require the least amount of effort necessary for the completion of a given task, games are intended to challenge the user’s cognitive and motor abilities. In VR gaming specifically, movement plays an important role, contributing to a positive user experience (Yoo et al. 2018). Although VR games tend to be rated higher in terms of appeal (Pallavicini et al. 2017), enjoyment (Shelstad et al. 2017), and presence (Pallavicini et al. 2019) compared to the same game played on non-immersive platforms, research analyzing the Steam game market shows that players tend to rate VR games lower than non-VR games (Foxman et al. 2020). Moreover, there appears to be a preference towards VR games that are shorter in duration, which the authors theorize could be due to VRISE, highlighting eye strain, physical fatigue, thermal discomfort, and cybersickness as potential culprits (Foxman et al. 2020).

In addition to being physically uncomfortable, common symptoms arising from VR use, such as fatigue, disorientation and oculomotor issues, raise further concerns regarding impaired cognitive ability and reduced attention following exposure to VR (Varmaghani et al. 2022). Such effects are especially problematic when considering the increased risk of operating vehicles and machinery soon after VR use. While researchers have long warned about the dangers of driving immediately after experiencing VR content (e.g., LaViola (2000)), and drivers are generally aware of cognitive impairment risks from other causes (e.g., alcohol, sleep deprivation), the potential impact of VRISE on cognitive performance may not yet be fully ingrained in the public consciousness. At the time of this writing, research on the impact of VR use on cognitive performance is mixed (as discussed in Varmaghani et al. (2022)), but multiple sources (Nalivaiko et al. 2015; Nesbitt et al. 2017) have identified some level of correlation between cybersickness symptoms and increased reaction time (RT). It is important to consider, however, that such claims often stem from research conducted on rollercoaster simulators or other vection-producing materials that are known to provoke cybersickness, which may not be representative of VR content that employs other locomotion and interaction techniques. For example, such findings may not be generalizable to highly interactive applications, standing or Room-Scale VR applications, VR applications involving significant physical workload and/or VR applications requiring precise manipulation of virtual objects, i.e., applications with elements commonly found in VR gaming.

In this article, we present the results of two user studies conducted with the objective of exploring the frequency and intensity of various symptoms and effects (excessive workload, cybersickness, pain and muscle fatigue, device-related symptoms, negative impact on cognitive performance) that plague users as they experience different mechanics (slash, pick-and-place, shoot) and genres (action, shooter, cooking simulator, rhythm-based game) commonly encountered in VR gaming. With these studies, we aimed to address the variety of standing/Room-Scale VR games, as well as the diversity of the resulting VR user experience, stressing the need for VR research approaches and content design paradigms that recognize and account for different ways in which application design choices may affect different users. Therefore, the main contribution of our article is the overview of VRISE experienced by young, healthy participants after experiencing multiple 20-minute VR gaming sessions/episodes (each for a different game).

Addressing the research objective, we define the main research question (MRQ) of the article as follows: What are the short-term effects of popular VR game mechanics on physical discomfort and reaction time? To explore this further, we devised specific research questions (RQs) aimed at addressing the impacts of each individual VR game:

  • RQ1: How heavy was the workload for the player during this VR game?

  • RQ2: How did this VR game influence the intensity of pain and muscle fatigue?

  • RQ3: To what extent was device-related discomfort experienced during this VR game?

  • RQ4: To what extent did this VR game influence the intensity of cybersickness symptoms?

  • RQ5: What was the overall prevalence of individual VRISE during this VR game?

  • RQ6: Which individual VRISE was the most bothersome for this VR game?

  • RQ7: To what extent did this VR game influence simple reaction time?

To collect this information, we used established tools for the evaluation of discomfort (the Borg Category-Ratio (CR-10) scale; Borg (1998)) and simple reaction time (i.e., the time of reaction to a single stimulus measured in the Deary-Liewald RT task; Deary et al. (2011)), while including the state-of-the-art VR-specific variations (Cybersickness Questionnaire—CSQ (Stone III 2017), Virtual Reality Sickness Questionnaire—VRSQ (Kim et al. 2018), Simulator Task Load Index—SIM-TLX (Harris et al. 2020)) of widely used measures of cybersickness (SSQ) and workload (NASA Task Load Index—NASA-TLX (Hart and Staveland 1988)). In light of recent critiques of the SSQ, this paper also provides a discussion on the suitability of SSQ and its novel alternatives for the evaluation of cybersickness in standing and Room-Scale VR games.

After collecting relevant information for each individual episode of gaming, we were able to compare cybersickness symptoms, pain and muscle fatigue, and reaction time before and after exposure in VR for each individual game. For measures that were taken only following VR exposure (workload and device-related discomfort), we focused on comparing results between games of different genres and mechanics.
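The before/after comparison described above can be illustrated with a minimal sketch. It only computes per-participant changes (post minus pre) and their mean and standard deviation; the function name `paired_change` and the sample values are hypothetical and are not taken from the studies, and the sketch makes no claim about the statistical tests actually applied in this article.

```python
from statistics import mean, stdev


def paired_change(pre, post):
    """Return per-participant differences (post - pre), their mean and sample SD."""
    if len(pre) != len(post):
        raise ValueError("pre and post must hold one value per participant")
    diffs = [after - before for before, after in zip(pre, post)]
    return diffs, mean(diffs), stdev(diffs)


# Hypothetical simple reaction times in milliseconds (illustrative, not study data).
pre_rt = [250.0, 262.0, 241.0, 275.0]
post_rt = [258.0, 260.0, 255.0, 281.0]

diffs, mean_change, sd_change = paired_change(pre_rt, post_rt)
print(f"mean change: {mean_change:.1f} ms, SD: {sd_change:.2f} ms")
```

A positive mean change in this sketch would correspond to slower reactions after exposure; the same helper could be applied to symptom ratings collected before and after a session.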

This article extends our previously published conference paper (Vlahovic et al. 2021), where we reported the results of a study (referred to as Study 1 for the remaining part of this paper) focused on exploring VRISE resulting from playing VR games with slash, pick-and-place, and shoot mechanics. In this article, we provide a more thorough analysis of the results of Study 1 (N=20). Additionally, we present the results of a new study (referred to as Study 2) which utilizes the methodology presented in the original paper with slight modifications. Study 2 incorporates a different population of participants (N=20) and a separate set of games chosen based on the criteria of sharing mechanics with the games in Study 1 (i.e., slash, pick-and-place, shoot).

The paper is organized as follows. Section 2 provides an overview of related work on the topics of cybersickness, ergonomics, device-related discomfort, and cognitive impact related to VR use. Section 3 presents the methodology of both studies, including details related to game choice, hardware choice, methods, procedure, and participants. The results are listed in Sect. 4. Each subsection of Sect. 4 can be mapped to specified research questions, providing the analysis of workload (RQ1), pain and muscle fatigue (RQ2), device-related discomfort (RQ3), cybersickness (RQ4), overall prevalence and ranking of reported VRISE (RQ5, RQ6) and measured reaction time (RQ7). By comparing results between games, each of these subsections provides input for the MRQ. While a discussion regarding each individual RQ/VRISE category is presented in corresponding subsections, Sect. 5 provides a high-level discussion on the overarching themes of the paper, along with an overview of limitations and ideas for future work. Concluding remarks are summarized in Sect. 6.

2 Background and related work

Hirzle et al. (2021) conducted an online user study with 352 participants who reported symptoms of VR-induced discomfort after a 30 min period of participating in a VR experience of their choice. The study targeted fairly experienced VR users, the majority of whom reported using VR technology on a weekly basis. Based on the study’s results, the authors extracted a factor model of VR-induced discomfort, which was divided into six factors: one for digital eye-strain, one for simulator sickness, and four relating to ergonomic symptoms. Among these factors, the most pronounced symptoms related to sweating and “perceived change” that occurs as the user is wearing the device (i.e., feeling physically different and perceiving differences in their movement, which becomes hindered as a result of wearing the HMD). Overall, digital eye-strain symptoms were less pronounced than ergonomic symptoms, with simulator sickness symptoms being the least pronounced. When asked to rate the perceived relevance of the symptom groups, participants rated simulator sickness symptoms as the least relevant, with digital eye-strain and ergonomic symptoms receiving similar ratings. Results of this study reaffirm the need for embracing more comprehensive methodological approaches for evaluating VR-induced discomfort.

2.1 Simulator sickness and cybersickness

Simulator sickness, the term used by Hirzle et al. (2021) to refer to symptoms measured by the Simulator Sickness Questionnaire (SSQ) (Kennedy et al. 1993), is often used interchangeably with the term cybersickness, which we chose to use in this article. However, other researchers take care to distinguish between the two, as simulator sickness occurs during simulator use, while cybersickness arises from experiencing virtual environments. Furthermore, according to Stanney et al. (1997), the two distinct types of sickness differ in terms of symptomatology. Cybersickness has been shown to cause symptoms of higher intensity compared to simulator sickness, as measured by the SSQ. Additionally, in the case of cybersickness, scores for nausea (SSQ-N) and disorientation (SSQ-D) symptom groups were higher compared to oculomotor symptoms (SSQ-O), whereas the opposite was true for simulator sickness.

While the work presented by Stanney et al. (1997) was conducted over 25 years ago, the topic of using simulator sickness measures, such as the SSQ, in the context of evaluating cybersickness still interests researchers. The Virtual Reality Sickness Questionnaire (VRSQ) (Kim et al. 2018) and the Cybersickness Questionnaire (CSQ) (Stone III 2017) are fairly recent variations on the SSQ aimed specifically at measuring cybersickness rather than simulator sickness. Both questionnaires include only a subset of symptoms measured in the original questionnaire. Contrary to the aforementioned findings of Stanney et al. (1997), symptoms excluded from both VRSQ and CSQ, such as sweating, increased salivation, stomach awareness, and burping, primarily belong to the nausea category of symptoms. Both VRSQ and CSQ include symptoms such as fullness of head, eye-strain, difficulty focusing and blurred vision, vertigo, and dizziness with eyes closed. We note the following differences between the two questionnaires:

  • certain symptoms are included in one of the SSQ variants but missing from the other,

  • general discomfort and fatigue are included only in the VRSQ, and

  • nausea and dizziness with eyes open are only included in the CSQ.

The two questionnaires also differ regarding the scoring system and highlighted dimensions of cybersickness, with VRSQ including oculomotor symptoms and disorientation as separate dimensions (referred to later as VRSQ-O and VRSQ-D, respectively), and CSQ including difficulty in focusing and dizziness (referred to as CSQ-DF and CSQ-D, respectively). Unlike SSQ and VRSQ, which provide an overall (total) sickness score (referred to as SSQ-T and VRSQ-T, respectively), CSQ does not provide a way to calculate such a score.
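The overlap and differences described above can be made explicit with a small set comparison. This is only a sketch built from the symptoms named in this section; the sets are not presented as the complete item lists of either questionnaire, and the variable names are our own.

```python
# Symptoms named in the text as present in both questionnaires.
SHARED = {
    "fullness of head",
    "eye strain",
    "difficulty focusing",
    "blurred vision",
    "vertigo",
    "dizziness with eyes closed",
}
# Symptoms named in the text as specific to one questionnaire.
VRSQ_ONLY = {"general discomfort", "fatigue"}
CSQ_ONLY = {"nausea", "dizziness with eyes open"}

vrsq_items = SHARED | VRSQ_ONLY
csq_items = SHARED | CSQ_ONLY

# Subscale labels as used in this article, plus whether a total score exists.
subscales = {
    "VRSQ": {"dimensions": ["VRSQ-O", "VRSQ-D"], "has_total": True},
    "CSQ": {"dimensions": ["CSQ-DF", "CSQ-D"], "has_total": False},
}

print("Only in VRSQ:", sorted(vrsq_items - csq_items))
print("Only in CSQ:", sorted(csq_items - vrsq_items))
print("In both:", sorted(vrsq_items & csq_items))
```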

Sevinc and Berkman (2020) found that both VRSQ and CSQ outperformed the SSQ in terms of psychometric qualities for evaluating headset-based VR applications. Both VRSQ and CSQ were shown to provide a valid, reliable measure of cybersickness, although it remains unclear which of the two performs better. However, the authors also reflect on different approaches taken by the developers of VRSQ and CSQ regarding statistical methods and sample sizes, as well as the limitations pertaining to test material and hardware choices used to validate the questionnaires, noting that both questionnaires were developed in conditions that may not have been fully representative of the use cases they were intended for.

2.2 Ergonomics and device-related discomfort

Considering that VR experiences which utilize either hand tracking technology or handheld controllers tracked in six degrees of freedom (6DoF) usually rely on mid-air interactions, it is vitally important to consider ergonomics in application design. The so-called gorilla arm syndrome refers to muscle fatigue and perceived heaviness in the arms that occur following prolonged mid-air interaction (Boring et al. 2009). Souchet et al. (2022) list examples of user-related factors (age, body mass index), hardware-related factors (headset weight and fastening straps, used interaction devices, errors in position tracking, display resolution), and software-related factors (duration of VR use, required head and body rotation, required general posture, amplitude of gestures, task repetition, body part representation and feedback) that contribute to VR-induced muscular fatigue.

Certain movements and postures are more likely to cause significant muscle fatigue. For example, interacting with the system with the arm fully extended at shoulder height is more fatiguing than lowering the arm with a bent elbow (Hincapié-Ramos et al. 2014; Hansberger et al. 2017). Previous work has shown that placing targets in VR 15 degrees or more above eye level or 30 degrees or more below eye level leads to greater discomfort in neck and shoulder muscles (Penumudi et al. 2020). Thus, VR application designers can reduce the user’s physical workload by mindful placement of target objects, avoiding positions that would impede prolonged use.
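As a rough illustration of how such placement guidelines might be checked during design, the sketch below computes the vertical angle of a target relative to the user's eye level and flags targets at or beyond the thresholds cited above (15 degrees above, 30 degrees below). The function names, the geometry (eye and target heights plus horizontal distance, in meters), and the sample values are assumptions made for this example and do not come from the cited work.

```python
import math


def elevation_deg(eye_height_m, target_height_m, horizontal_distance_m):
    """Vertical angle of the target relative to eye level, in degrees.

    Positive values are above eye level, negative values below.
    """
    return math.degrees(
        math.atan2(target_height_m - eye_height_m, horizontal_distance_m)
    )


def outside_comfort_band(angle_deg, upper_deg=15.0, lower_deg=-30.0):
    """True if the target sits 15 deg or more above, or 30 deg or more below, eye level."""
    return angle_deg >= upper_deg or angle_deg <= lower_deg


# Hypothetical targets: (label, target height in m, horizontal distance in m).
eye_height = 1.6
targets = [("shelf", 2.6, 1.0), ("counter", 1.1, 1.0), ("floor item", 0.2, 1.0)]

for label, height, distance in targets:
    angle = elevation_deg(eye_height, height, distance)
    print(f"{label}: {angle:.1f} deg, flagged={outside_comfort_band(angle)}")
```

The thresholds are exposed as parameters so that a designer could substitute other limits if different evidence applies to a given posture or task.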

While the aforementioned findings can serve as a useful rule of thumb for applications that focus on efficiency rather than entertainment, there is a gap in research regarding the design of interaction mechanics for VR games (Vlahovic et al. 2022), as decreasing physical workload in VR gaming may actually backfire. For example, Yoo et al. (2018) have shown that players prefer VR games that require light physical exertion as opposed to those that require no exertion. Similarly, Evans et al. (2021a) suggest that players prefer games that require light physical exertion with mechanics that predominantly involve arm movements. Conversely, there is such a thing as too much exertion, as players wish to avoid sweating and excessive physical demands (Yoo et al. 2018). While temporary physical exhaustion and discomfort detract from the user experience, potentially affecting player retention, this topic has not yet garnered a lot of interest (or raised serious concern) among the research community, at least compared to efforts invested toward investigating and mitigating application design factors that aggravate cybersickness symptoms.

Unfortunately, considering that wearing a VR headset was shown to impair user posture, stressing the musculoskeletal system (Knight and Baber 2007), wearing the headset while performing energetic, repetitive movements may lead to serious consequences, even if the game of choice does not seem too demanding regarding cardiovascular load. Baur et al. (2021) describe a recent case of a healthy 31-year-old who fractured a vertebra as a result of playing “a VR video game involving combinations of shoulder, arm and head movements to rhythmic visual and musical triggers” (Baur et al. 2021, p. 2). Stressing that energetic movements performed during VR gaming should not be underrated as a potential cause of injury, the authors predict further occurrences of such traumatic injuries as VR technology continues to grow in popularity.

While manufacturers are attempting to fix ergonomic issues of VR hardware by eliminating cables and reducing the size and weight of HMDs, other conditions of HMD use may be strong determining factors in whether a particular HMD is considered safe and comfortable. In addition to the time spent using the HMD, both speed and range of movement, as well as target placement, may impact the physical workload imposed on the musculoskeletal system, increasing energy expenditure and potential for injury. For example, Chihara and Seo (2018) examined the effects of HMD mass and center of mass position on physical workload in different body postures. They stress the necessity of considering physical workload in the neck when designing HMDs, further noting that different neck postures (dependent on target placement) would benefit from different center of mass positions. The authors called for further research into HMD ergonomics using different test conditions.

As discussed by Iskander et al. (2018), experiencing virtual environments for prolonged periods of time may result in visual fatigue (e.g., double vision, eye strain) as well as symptoms related to the ocular surface (e.g., stinging, watery eyes), as digital screen use tends to reduce users’ blink rate (Patel et al. 1991). The occurrence and intensity of ocular symptoms in VR may be affected by different device-related factors, such as Field of View (FoV), latency, and various optical aspects (e.g., contrast and luminance) (Yuan et al. 2018). Visual fatigue may occur as a result of the vergence-accommodation conflict (caused by the proximity of the HMD) and motion during simulation (Iskander et al. 2018). It may also be affected by the brightness of the displayed graphics, as the darker color mode was shown to be a less fatiguing option (Erickson et al. 2020). Higher resolution and dynamic lens focus are also expected to benefit visual comfort during VR use (Kim and Shin 2021).

2.3 Cognitive impact

Even with display improvements in refresh rate and resolution, visual design choices made during software development may produce oculomotor symptoms, while strobing lights used in special effects could trigger migraines or seizures. However, potential negative outcomes of VR use involve more than physical discomfort and fatigue. Experiencing distressing content in VR was shown to produce high levels of emotional stress (Martens et al. 2019; Fadeev et al. 2020). Some users may even experience depersonalization and derealization as a consequence of VR use (Aardema et al. 2010; Peckmann et al. 2022; Barreda-Ángeles and Hartmann 2023). Moreover, as previously discussed in the Introduction, VR use may cause temporary cognitive impairment. Circling back to the concern of VR-induced cognitive impairment as a potential hazard to user safety, in the remaining part of this article we further focus on implications of VR use on cognitive performance—more specifically, reaction time, as this metric was shown to be the most sensitive to VR use (Mittelstaedt et al. 2019). It is important to consider that distinct VR aftereffects are not completely separate from each other, as they often share a common cause or show a significant level of correlation. As such, articles addressing VR-induced changes in reaction time often consider this effect in relation to cybersickness.

In their article on the effects of VR-induced cybersickness on heart rate, cutaneous vascular tone, and reaction time (RT), Nalivaiko et al. (2015) found that simple reaction time (SRT) increased by 20 to 50 ms following exposure to VR content (rollercoaster simulators). This increase in SRT was primarily attributed to nausea; however, the authors also noted that the intensity of such effects was greatly influenced by the specific characteristics of chosen VR content, such as visual motion cues. Nevertheless, in light of this discovery, the authors expressed their concern about the safety of VR technology. The use of rollercoaster simulators in the exploration of the impact of VR use on reaction time and cybersickness can also be found in the work of Nesbitt et al. (2017). Similar to Nalivaiko et al. (2015), their results indicated that immersion in VR may produce a significant increase in SRT (M = 26.25 ms, SD = 39.92), which correlates with the increase in cybersickness-related symptoms such as nausea.

The effects of VR usage on various measures of cognitive performance were investigated by Mittelstaedt et al. (2019), along with their relationship to cybersickness. While VR exposure did not impair performance in certain cognitive tasks—e.g., Mental Rotation Task and Corsi Block Task—the authors noted a significant change in reaction time, with simple reaction time (SRT) increasing by 17–29 ms after being exposed to VR. The authors offered a number of alternative explanations for this effect, rejecting the notion that cybersickness is the only cause behind the increase in SRT (even though a level of correlation between increased SRT and cybersickness was found). Possible explanations included visuomotor adaptation (based on the work by Fernandez-Ruiz et al. (2011)) and cognitive adaptation to the slight latencies stemming from the use of I/O devices.

Compared to Nalivaiko et al. (2015) and Nesbitt et al. (2017), the VR content used by Mittelstaedt et al. (2019) granted a higher level of control to the user, but all of the above examples display visual motion cues that are likely to produce a strong sensation of vection. In contrast, VR table tennis used in Szpak et al. (2019) and Beat Saber used in Szpak et al. (2020) are both based on mechanics that are strongly reliant on physical movement (namely gross motor movement of the arms), but without artificial self-motion. Using the CANTAB five-choice RT task instead of the commonly used SRT tasks, the authors of Szpak et al. (2019, 2020) were able to distinguish between decision time and motor movement time. No significant differences between post-VR and pre-VR decision or motor movement times were found in Szpak et al. (2020). VR use did not produce a significant change in overall RT in Szpak et al. (2019), although there was a general increase in decision time (but not movement time), and there was no correlation between RT and cybersickness.

3 Methodology

3.1 Game choice

Table 1 Distinguishing characteristics of chosen games

Fig. 1 Snapshots of the chosen games for Study 1 (a–c) and Study 2 (d–f): a Beat Saber; b Order Up VR; c Serious Sam VR: The Last Hope; d Fruit Ninja VR; e Dungeon Brewmaster; f Space Pirate Trainer

For our test material, we chose to narrow our focus on commercial VR games with a first-person perspective, as well as standing or Room-Scale gameplay (i.e., no in-game locomotion methods). While researchers exploring VRISE often employ test material characterized by vection-producing elements (e.g., rollercoaster simulators), considering that this article aims to shed light on VRISE other than cybersickness, we chose to focus on games that are less likely to induce cybersickness symptoms. Thus, games that produce significant levels of optical flow were excluded as a way of mitigating the potential confounding influence on ambiguous VRISE that are otherwise often attributed to cybersickness, but may actually stem from other causes (e.g., sweating, fatigue, changes in reaction time). On the flip side, games that omit in-game locomotion in favor of standing or Room-Scale gameplay are often made to be highly interactive, leveraging gross physical movement for their core game mechanics. Therefore, they are likely conducive to musculoskeletal pain and discomfort, as well as some device-related effects, i.e., issues which demand further exploration, but are consistently being overlooked in the context of comfort ratings (Vlahovic et al. 2023).

Based on research regarding user preferences in VR gaming (Foxman et al. 2020), we chose games that belong to the most frequently downloaded genres (action, shooter, and simulation) while making sure to also include an example of a music/rhythm game, as the authors note that this particular type of game (in addition to the action genre) tends to be especially well-received by VR gamers. Our choice to include games that belong to different genres corresponded with our decision to choose games that differ in terms of interaction mechanics (Vlahovic et al. 2022). Based on these criteria, we included an example of pick-and-place mechanics, as well as two different types of tool-mediated mechanics: a swordplay-based one with targets closer to the player, and a projectile-based one with mostly distant targets. Moreover, considering the relatively short duration of each gaming session, all chosen games had to be simple and beginner-friendly, with easy-to-grasp rules and mechanics.

When choosing games for Study 2, we deliberately focused on games that were similar in genre and/or core interaction mechanics to the games chosen for Study 1. However, we also focused on choosing games that differed in terms of aesthetics, considering whether each game had a colorful, diurnal setting, or a darker, nocturnal setting contrasted with emissive materials and/or gleaming particle effects. Screenshots of the games we chose for Study 1, namely Beat Saber (BS), Order Up VR (OU), and Serious Sam VR: The Last Hope (SS), and for Study 2, namely Fruit Ninja VR (FN), Dungeon Brewmaster (DB), and Space Pirate Trainer (SPT), can be seen in Fig. 1, while Table 1 presents their characteristics. Because they lack vection-producing elements, and according to the industry comfort ratings available at the time of this writing (Meta 2023b; The VR Shop 2023), all chosen games fall into the most comfortable green category of VR games.

3.2 VR hardware choice

The ergonomic effects of VR use are highly dependent on headset design, although the impact of hardware design factors is beyond the scope of this paper. However, the intensity of those effects can be significantly compounded by movements and postures required by the chosen application. Therefore, focusing on the impact of such software-related factors, we decided to use an identical hardware setup across all gaming sessions. The PC-tethered headset we used, the HTC Vive Pro, is a relatively high-end commercial product with a dual AMOLED display at a combined resolution of 2880 × 1600 pixels, a 110-degree field of view (FOV), and a 90 Hz refresh rate. Its design incorporates adjustable straps and headphones, and provides the option of adjusting both lens distance and interpupillary distance. Despite its fairly large size and weight, the headset outperforms a number of other VR HMDs based on a combination of relevant ergonomic factors, as presented in more detail by Mehrfard et al. (2019), and can thus be expected to provide a reasonably comfortable experience. The HMD was used along with its accompanying Vive controllers.

3.3 Measures

Studies were conducted using a combination of different measures, both subjective and objective, administered before (i.e., baseline measures), during, and immediately following immersion in VR. When choosing methods, we considered the most widely used tools for the evaluation of investigated symptoms.

As noted by Evangelista Belo et al. (2021), the NASA-TLX (Hart and Staveland 1988) and the Borg CR-10 scale (Borg 1998) are among the most relevant tools for the subjective assessment of ergonomic issues (e.g., fatigue, exertion, discomfort) in immersive environments. Even though both tools are general purpose (i.e., can be used for different tasks and in different contexts), the Borg CR-10 scale can easily be adapted to different sensations and contexts, while NASA-TLX requires significant alterations if used for very specific contexts.

The original NASA-TLX has recently been adapted into the SIM-TLX (Harris et al. 2020), a questionnaire used to obtain evaluations of different dimensions of workload required to perform the given task specifically in the context of simulated environments such as VR. While adopting relevant items of the questionnaire it was based on (e.g., mental, physical, and temporal demand, frustration, task performance), SIM-TLX additionally provides the means to measure dimensions such as perceptual load, task complexity, and task control, which we deemed highly relevant to our test material and research objective. Although the full version of this questionnaire provides a way to obtain the overall workload score using pair-wise comparisons between different dimensions of workload, we chose to omit this weighting process and focus on individual dimensions of workload, as is often done for NASA-TLX (Hart 2006). Considering we used multiple questionnaires over the course of both studies, we decided on this SIM-TLX modification because it reduced the overall number of evaluations the participants would have to perform.

While SIM-TLX was used to measure subjective workload, the Borg CR-10 scale (Table 2) was used to report different sensations of pain and muscular fatigue in the arms (i.e., shoulders to fingertips) and different regions of the back, as depicted on a simplified anatomical diagram given to each participant as a reference. While corresponding to the areas surrounding cervical, thoracic and lumbar spine, these regions were colloquially described as neck, upper back, and lower back with the aim of facilitating user comprehension. These regions of the body were chosen based on the interaction mechanics of the chosen games, which predominantly required the activation of muscles in the upper body. Furthermore, the Borg CR-10 scale was also used to report the level of discomfort attributed to different aspects of the headset device, with separate items pertaining to discomfort resulting from the headset fit (i.e., the feeling of tightness or looseness), weight, temperature, display quality, and annoyance with the headset cable.

Table 2 The Borg CR-10 rating scale (Borg 1998), as used in this study, taken from Vlahovic et al. (2021)

As discussed in the Introduction and further explored in Sect. 2, the SSQ (Kennedy et al. 1993) is currently the most popular measure of VR-induced cybersickness, as well as VRISE in general. However, recent critiques (Sevinc and Berkman 2020; Hirzle et al. 2021) have highlighted the shortcomings of its use. In our studies, we employed the SSQ to align with the most established practices in the field. However, we also decided to extend our analysis to include state-of-the-art alternatives to the SSQ—CSQ and VRSQ—both of which demonstrated superior validity and reliability compared to the original questionnaire, as shown by Sevinc and Berkman (2020). As previously discussed in Sect. 2, both of these questionnaires encompass a particular subset of SSQ items and their scores can be calculated based on the results of the SSQ, which is what we did in this article.

In terms of evaluating cognitive impairment, we used a desktop computer to conduct the Deary-Liewald RT (DLRT) task (Deary et al. 2011), a free tool which measures simple (SRT, i.e., reaction to a single stimulus) and four-choice (CRT) visual reaction times. Widely employed for this purpose, the same tool has been utilized in several other studies addressing VR-induced changes in reaction time (Nalivaiko et al. 2015; Nesbitt et al. 2017; Mittelstaedt et al. 2019), providing a useful benchmark for interpreting our results in a broader context.

Items pertaining to the overall prevalence of VRISE and the most bothersome symptom (i.e., items pertaining to RQ5 and RQ6) were not included in Study 1, but were added in Study 2 to provide additional context for the analysis of our results. To avoid exhausting our participants with an extensive list of all possible VRISE, while still opting for pre-defined options to facilitate subsequent analysis, we provided a shortlist of symptoms that seemed to be most pronounced based on a short pilot study. However, participants were allowed to add additional symptom(s) of their choice. Participants were also asked about their willingness to continue playing each game. For Study 2, we asked those inclined to terminate the experience to provide a qualitative explanation for their choice.

It is important to note that there are several measures that were used in one or both studies, but were eventually omitted from the paper. This includes the results of both studies pertaining to the Deary-Liewald CRT task, the Game Experience Questionnaire (GEQ) and the overall Quality of Experience ratings, given on a 5-point Absolute Category Rating scale. Moreover, for Study 2, we collected heart rate data using the Fitbit Charge 3 fitness tracker device. Given the large number of collected measures, we focused on the results we consider to be most relevant to the research objective. The results included in this article are therefore summarized in Table 3, along with their mapping to individual research questions.

Table 3 VRISE and measures addressed in the article

3.4 Procedure

Fig. 2 Flow diagram depicting the complete study procedure for each participant. The same core methodology was followed for both studies, but several additional steps were used only in Study 2

To prevent symptoms from building up over time, each participant was asked to engage in three separate gaming sessions (one per game, in randomized order), with each session scheduled for a different day. The entire process is outlined in Fig. 2. All sessions took place in a laboratory and were supervised by an administrator. Participants completed a pre-study questionnaire at the start of the first session, providing the necessary personal information (age, sex, level of education, etc.), as well as self-assessments of their experience with VR and gaming in general.

In each of the three episodes, prior to entering VR, participants completed the SSQ and provided ratings of their current levels of muscle fatigue and pain. Baseline measurements of RT and heart rate were also collected at this time. Subsequently, participants received instructions on using the HTC Vive Pro system, from adjusting the headset straps and the inter-pupillary distance, to handling the Vive controllers.

Following these procedures, participants were warned about the possibility of cybersickness and other uncomfortable symptoms that may appear as a consequence of engaging in a VR session. They were encouraged to pause or completely terminate the experiment if they experienced significant discomfort. Participants were then familiarized with the rules and control mappings of the specific game they were about to play and entered VR for a brief tutorial session. After grasping the game mechanics, participants began the 20-minute test session. This duration was chosen to reflect a realistic episode of VR gaming based on the reported average duration of VR use (Statista 2019). Moreover, episodes of this length remain within the recommended time limits for uninterrupted VR use (Meta 2023a) and VR user testing sessions (Drachen et al. 2018).

Participants’ heart rate was continuously recorded throughout the entire test session, while the remaining measures were collected immediately after the session ended. Upon exiting the session, participants were first instructed to retake reaction time tests. Subsequently, they were asked to fill out the SSQ and report their post-VR evaluations of pain and muscular fatigue, as well as their discomfort levels attributed to different aspects of the headset device. Finally, they answered questions regarding their willingness to continue playing, the prevalence of VRISE, and the most bothersome VRISE experienced.

It is important to acknowledge that both studies were conducted amid the COVID-19 pandemic. To ensure participant safety throughout the process, we implemented stringent hygiene and safety protocols. Before each session, participants were required to provide written consent in which they expressed compliance with the enforced safety measures. Both participant and administrator were required to wear masks and maintain appropriate interpersonal distance at all times. To alleviate physical discomfort, participants were permitted to remove their masks during the gaming session, which took place in a spacious laboratory designed to facilitate safety during the physical activities required by the games. During this time, the administrator would exit the room and continue to supervise the player through see-through glass walls of the laboratory. All experiments were conducted in a well-ventilated space, with all surfaces and devices thoroughly cleaned and disinfected between sessions. Additionally, participants were provided with single-use headset covers.

3.5 Participants

Studies have indicated that the degree of VR-induced discomfort may vary between users of different demographics. For instance, a user’s age may impact their susceptibility to sickness (Reason 1968; Stanney et al. 2002; Brooks et al. 2010), as well as the overall workload necessary to keep up with the game, along with a potential increase in pain and muscle fatigue that may come as a result of a more advanced age. One-size-fits-all approaches to game design may overlook the needs of individuals with disabilities, and a non-flexible approach to in-game object sizing could pose challenges for users whose dimensions deviate from the average. While our studies primarily focused on the impact of game factors on player discomfort—thereby largely excluding an in-depth exploration of issues related to VR hardware design and sizing—it is important to note that, broadly speaking, headset design may also be better suited to large males (Stanney et al. 2020), compared to smaller adults or children. Additionally, research suggests that female users may exhibit greater sensitivity to cybersickness (Stanney et al. 2020) compared to their male peers.

Initial sample size estimates were calculated using the G*POWER software (Faul et al. 2007). We assumed a large effect size (r = 0.60) with 80% power, at a significance level of α = 0.05. Based on these parameters, a total sample size of 20 participants is required for the Wilcoxon signed-rank test with matched pairs. The decision to consider a large effect size was made with the goal of prioritizing effects that are not only statistically significant, but also hold practical significance in terms of impact on user experience. However, it is important to note that the scale of our studies was also constrained by the specific conditions of the COVID-19 pandemic. Given the overall diversity of the broad population and the limited samples in our studies, we decided to adopt a homogeneous sampling approach, as it would provide clearer generalizability (Jager et al. 2017). By intentionally selecting participants who were alike in terms of age, health, and disability status across both studies, we are better able to compare their results and draw more specific conclusions. The participants in our studies were young adults, healthy, and without disabilities, representing a demographic expected to be least vulnerable to the negative effects of VR use. It is likely that any negative impacts of VR gaming would be even more pronounced in virtually any other demographic. Therefore, because our study participants did not encounter the obstacles faced by many others regarding VR device design and motor requirements of VR gaming, the extent of VR gaming-related health and safety issues as described in this paper can arguably be considered the “best” case scenario. The characteristics of participant samples across both studies are presented in Table 4.

Table 4 Characteristics of participant samples across both studies

4 Results

Due to the volume of conducted tests, we primarily focused on reporting results that were found to be statistically significant. The analysis of all quantitative data is reported as follows. For measures administered only following VR use (i.e., those pertaining to workload and device-related discomfort), we report the results of a within-subjects test (Friedman test followed by a post hoc analysis with Wilcoxon signed-rank tests and a Bonferroni correction) with the game choice as an independent variable with the following levels: BS, OU and SS for Study 1, and FN, DB, and SPT for Study 2 (e.g., in Study 1, we tested for differences between physical workload scores following BS, OU and SS). For each factor that was measured both before and after VR exposure (pain and muscle fatigue, cybersickness, simple reaction time) we report the results of a within-subjects test (Wilcoxon signed-rank test) with exposure to a specific game as an independent variable with pre-VR and post-VR levels (e.g., we tested for differences between muscle fatigue in BS pre-VR and muscle fatigue in BS post-VR).
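The omnibus step of this analysis can be illustrated with a minimal sketch of the tie-corrected Friedman statistic for three related samples. This is not the software used in the studies (which the text does not name); the function names and the example scores are hypothetical, and the closed-form p-value used here only holds for two degrees of freedom, i.e., when exactly three games are compared.

```python
import math

def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def friedman(rows):
    """Return (chi-square, df) for rows of k related scores, one row per participant.

    Assumes at least one row contains non-identical scores.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):
            t = row.count(v)
            tie_term += t ** 3 - t
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    return stat / correction, k - 1

def chi2_sf_df2(stat):
    """Upper-tail chi-square probability; exact only for df = 2."""
    return math.exp(-stat / 2.0)

# Hypothetical scores of four participants for three games (not study data).
scores = [(30, 45, 60), (25, 40, 70), (35, 50, 55), (45, 40, 65)]
stat, df = friedman(scores)
p_value = chi2_sf_df2(stat)

# Bonferroni correction for the three pairwise post hoc comparisons.
alpha_post_hoc = 0.05 / 3
```

With three games there are three pairwise Wilcoxon comparisons, so under a Bonferroni correction each post hoc p-value would be compared against 0.05 / 3 (or, equivalently, multiplied by three before comparison with 0.05).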

4.1 Workload

4.1.1 Results

The SIM-TLX questionnaire was used to assess the levels of Mental demand (MENT), Physical demand (PHYS), Temporal demand (TEMP), Frustration (FRUS), Task complexity (CMPX), Situational stress (STRS), Distractions (DIST), Perceptual load (PERC), Task control (CONT), and Total workload (SUM) after each game. We collected assessments from all participants in both studies. However, due to data corruption issues, we were only able to analyze workload data reported by 12 out of 20 participants in Study 2. Note that no corruption occurred for Study 1 (i.e., data from all 20 participants was included in the calculations), and all other data (aside from corrupted SIM-TLX scores) was collected and calculated for all 20 participants in Study 2.

As explained in the previous section, to simplify the scoring process, we omitted the weighting method explained in Harris et al. (2020). Instead, we focused on individual dimension ratings reported on the 21-point Likert scale, transforming them to a 0–100 range with 5-point steps, as is often done with NASA-TLX (Hart 2006). To calculate the SUM score we aggregated the number of points on the Likert scale (i.e., points in the 0–21 range as opposed to the 0–100 range) for all individual dimensions of workload, resulting in the overall score of up to 189 points.
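The scoring described above can be sketched as follows. The text gives the raw range as 0–21; this sketch assumes instead that raw ratings run from 1 to 21, which is the reading under which 21 scale points map onto 0, 5, ..., 100 and nine dimensions sum to at most 189. The function and variable names are illustrative and do not come from the original analysis.

```python
# The nine SIM-TLX dimensions rated after each game (SUM is derived from them).
DIMENSIONS = ("MENT", "PHYS", "TEMP", "FRUS", "CMPX", "STRS", "DIST", "PERC", "CONT")

def to_percent(point):
    """Map a raw rating (assumed to lie in 1..21) to the 0-100 range in 5-point steps."""
    if not 1 <= point <= 21:
        raise ValueError("raw rating must be between 1 and 21")
    return (point - 1) * 5

def sum_score(ratings):
    """Unweighted SUM score: raw scale points added over all nine dimensions."""
    return sum(ratings[d] for d in DIMENSIONS)

# Hypothetical participant who selects the highest point on every dimension.
highest = {d: 21 for d in DIMENSIONS}
max_sum = sum_score(highest)
```

Because the pairwise weighting step is omitted, every dimension contributes equally to SUM, which is why the maximum is simply nine times the highest scale point.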

When comparing the mean SUM score, the Friedman test did not show a significant difference (p > 0.05) between BS (M = 52.00, SD = 21.57), OU (M = 65.25, SD = 23.32), and SS (M = 56.40, SD = 25.24) in Study 1. Likewise, the same test did not identify any significant differences (p > 0.05) between FN (M = 68.17, SD = 23.06), DB (M = 61.83, SD = 23.27), and SPT (M = 70.42, SD = 31.30) in Study 2.

Fig. 3 Heat map depicting the mean SIM-TLX scores (and their standard deviations) obtained for individual dimensions of workload (different background shades represent mean score categorization)

Mean SIM-TLX scores for both studies are presented in Fig. 3. In line with the NASA-TLX interpretation scale reported by Prabaswari et al. (2019), we categorized the mean scores as either Low (0–9), Medium (10–29), Somewhat high (30–49), High (50–79), or Very High (80–100).
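The categorization above amounts to a simple threshold lookup, sketched below. How fractional means between the listed integer bounds (e.g., 9.5) were assigned is not stated in the text; the sketch assumes each category starts at its lower bound, and the function name is hypothetical.

```python
def categorize(mean_score):
    """Assign a 0-100 mean workload score to one of the five categories.

    Assumes a category begins at its lower bound, so 9.5 counts as Low.
    """
    if not 0 <= mean_score <= 100:
        raise ValueError("mean score must lie in the 0-100 range")
    if mean_score < 10:
        return "Low"
    if mean_score < 30:
        return "Medium"
    if mean_score < 50:
        return "Somewhat high"
    if mean_score < 80:
        return "High"
    return "Very High"
```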

Comparing individual dimensions of workload for Study 1, the Friedman test followed by a post-hoc analysis with Wilcoxon signed-rank tests and Bonferroni correction identified the following significant findings.

PHYS (\(\upchi ^{2}\)(2) = 10.50, p = 0.005, W = 0.27) was significantly higher in BS compared to OU (Z = \(-\)2.50, p = 0.01, r = \(-\)0.56). TEMP (\(\upchi ^{2}\)(2) = 12.67, p = 0.002, W = 0.32) was significantly higher in OU compared to both BS (Z = \(-\)2.72, p = 0.006, r = \(-\)0.61) and SS (Z = \(-\)2.41, p = 0.02, r = \(-\)0.54). FRUS (\(\upchi ^{2}\)(2) = 10.63, p = 0.005, W = 0.27) was significantly higher in OU compared to BS (Z = \(-\)3.14, p = 0.002, r = \(-\)0.70). DIST (\(\upchi ^{2}\)(2) = 8.19, p = 0.02, W = 0.20) was significantly higher in OU compared to SS (Z = \(-\)2.45, p = 0.01, r = \(-\)0.55). CONT (\(\upchi ^{2}\)(2) = 12.97, p = 0.002, W = 0.32) was significantly higher in OU compared to both BS (Z = \(-\)3.16, p = 0.002, r = \(-\)0.71) and SS (Z = \(-\)2.40, p = 0.02, r = \(-\)0.54).

Comparing individual dimensions of workload for Study 2, the Friedman test followed by a post-hoc analysis with Wilcoxon signed-rank tests and Bonferroni correction identified the following significant findings. PHYS (\(\upchi ^{2}\)(2) = 14.44, p < 0.001, W = 0.60) was significantly lower in DB compared to both FN (Z = \(-\)2.55, p = 0.01, r = \(-\)0.74) and SPT (Z = \(-\)2.98, p = 0.003, r = \(-\)0.86). TEMP (\(\upchi ^{2}\)(2) = 13.31, p = 0.001, W = 0.55) was significantly higher in FN compared to DB (Z = \(-\)2.94, p = 0.003, r = \(-\)0.85). CMPX (\(\upchi ^{2}\)(2) = 8.97, p = 0.011, W = 0.37) was significantly higher in SPT compared to FN (Z = \(-\)2.68, p = 0.007, r = \(-\)0.77). STRS (\(\upchi ^{2}\)(2) = 9.52, p = 0.009, W = 0.40) was significantly higher in OU compared to SS (Z = \(-\)2.45, p = 0.01, r = \(-\)0.81).
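The text does not state how the effect sizes were computed. Most of the reported values are, however, consistent with the common conventions r = Z/√N for the Wilcoxon signed-rank test and Kendall's W = χ²/(N(k − 1)) for the Friedman test, with N = 20 for Study 1 and N = 12 for the Study 2 workload data. The sketch below, which assumes these conventions and uses illustrative function names, reproduces a few of the reported figures.

```python
import math

def wilcoxon_r(z, n):
    """Effect size r = Z / sqrt(N) for a Wilcoxon signed-rank comparison."""
    return z / math.sqrt(n)

def kendalls_w(chi2, n, k):
    """Kendall's W = chi-square / (N * (k - 1)) for a Friedman test over k conditions."""
    return chi2 / (n * (k - 1))

def friedman_p_df2(chi2):
    """Upper-tail chi-square probability; exact only for df = 2 (three games)."""
    return math.exp(-chi2 / 2.0)

# Study 1, TEMP (N = 20): reported W = 0.32, p = 0.002, and r = -0.61 for OU vs. BS.
w_temp_s1 = kendalls_w(12.67, n=20, k=3)
p_temp_s1 = friedman_p_df2(12.67)
r_temp_s1 = wilcoxon_r(-2.72, n=20)

# Study 2, PHYS (N = 12): reported W = 0.60 and r = -0.86 for DB vs. SPT.
w_phys_s2 = kendalls_w(14.44, n=12, k=3)
r_phys_s2 = wilcoxon_r(-2.98, n=12)
```

Small differences in the last reported digit can arise because the published test statistics are themselves rounded to two decimals.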

4.1.2 Discussion

Statistically significant differences in SUM scores were not found between games in either study. When we observed the results for separate dimensions of workload, the mean ratings did not exceed a High level of workload for any game. Overall, the majority of mean ratings could be described as either Somewhat high or Medium with regard to intensity of workload.

All games except for BS and SS (games that did not receive any High ratings) obtained a High score in a single dimension: PHYS for SPT, TEMP for OU and FN, and CONT for DB. While both OU and FN were rated as High in TEMP, based on their gameplay we can conclude that they received those scores for different reasons—whereas FN required a series of quick, precise, discrete movements, OU demanded a number of more complex actions (i.e., cooking and assembling food) to be performed in a relatively short period of time. MENT was the only dimension for which all games received a similar mean score, suggesting that manual, isomorphic, controller-based games tend to produce a Somewhat high level of mental workload regardless of their core mechanics.

In both studies, pick-and-place games scored significantly lower on the PHYS dimension compared to swordplay-based games, while DB also scored significantly lower compared to SPT, likely due to the more modest requirements in terms of gross motor movement. Although both swordplay-based games achieved a very similar PHYS score, the fact that the mean scores of OU (High) and DB (Medium), as well as SS (Somewhat high) and SPT (High), fall into different categories may suggest that games with similar core mechanics should not necessarily be expected to require similar levels of physical exertion. The imposed workload likely varies based on factors such as target placement, temporal constraints, or secondary mechanics. However, in our case, it is important to note that participants in Study 1 and Study 2 were different, which may have influenced their scores more than these hypothesized factors.

Interestingly, all games in Study 1 were deemed similar in terms of CMPX. In Study 2, SPT was rated as more complex compared to the other games (although this was only statistically significant when compared to FN, not DB), likely because of the frequency of offensive and defensive actions required from the player and the accompanying level of physical activity.

It is also worth noting that OU and DB both received the highest mean CONT score in their respective studies (although this difference compared to the other games was only statistically significant for OU). While games with pick-and-place mechanics may be less demanding in terms of gross motor movements compared to more action-packed games, they are more demanding in terms of fine motor movements, specifically wrist movements necessary for orienting and aligning virtual objects. Furthermore, these mechanics are non-mediated (Vlahovic et al. 2022). With tool-mediated mechanics, VR controllers serve as a physical, tangible representation of virtual tools. With both sword-like weapons and guns, both orientation and girth of handheld controllers generally match the orientation and girth of virtual tool handles to a sufficiently realistic degree, meaning that the player’s physical hands correspond very well with their virtual hands. The positioning and behaviour of virtual hands in the case of non-mediated mechanics is not as straightforward, requiring substantial adaptation to different angles on the player’s part, which is further complicated by the properties of virtual objects that need to be grabbed, repositioned, and interacted with.

4.1.3 Key findings

  • Tested games predominantly received Medium and Somewhat high scores across various dimensions of workload, although occasional High scores were also noted;

  • Games that share genres and mechanics should not be expected to necessarily produce similar degrees of workload;

  • All tested games produced Somewhat high levels of Mental workload regardless of core mechanics;

  • Pick-and-place games are the least physically demanding of the three compared mechanics;

  • While less physically demanding, pick-and-place games scored higher in terms of task control (although this was only statistically significant in Study 1) compared to other tested games, likely because of the fine motor movements they require.

4.2 Pain and muscle fatigue

4.2.1 Results

Table 5 Wilcoxon Z-score comparing post-VR and pre-VR pain and muscle fatigue scores (* p < 0.05, ** p < 0.005, *** p < 0.001; presented alongside accompanying effect sizes)

We conducted Wilcoxon signed-rank tests to determine if there were significant differences between post-VR results and baseline measurements. Neck pain was the only symptom that did not increase significantly for any game. All other results are shown in Table 5, with arm fatigue and arm pain scores reaching statistically significant increases after every game except for OU.

On average, playing each of the VR games resulted in an increase in pain and muscle fatigue. However, the mean post-pre difference was less than one point on the Borg CR-10 scale for all body parts except for arms. Overall, the mean increase in arm fatigue in Study 1 was 1.5 (SD = 1.40) for BS, 0.25 (SD = 1.48) for OU, and 1.2 (SD = 2.16) for SS. In Study 2, it was 1.95 (SD = 1.56) for FN, 0.85 (SD = 1.15) for DB, and 1.75 (SD = 2.19) for SPT. The mean increase in arm pain in Study 1 was 0.95 (SD = 1.40) for BS, 0.05 (SD = 0.87) for OU, and 0.95 (SD = 1.60) for SS. In Study 2, it was 1.25 (SD = 1.55) for FN, 0.65 (SD = 0.79) for DB, and 1.15 (SD = 1.35) for SPT.

It is important to note, however, that participants’ experiences with fatigue and pain were quite varied and inconsistent. A significant percentage of participants did not report any changes in the intensity of symptoms, and a minority even experienced improvements—for Study 1, the number of participants (out of 20) who indicated a decrease in symptom intensity ranged from 0 to 6, depending on the symptom/game. For Study 2, improvements were noted in up to 3 participants (out of 20) per symptom/game. However, when observing only those participants who did experience an increase in symptoms (more detail on the frequency of occurrence of particular symptoms in Sect. 4.5.1), the reported intensity of symptoms increased by 2 or more points on the Borg scale (on average) for all games except for DB and SPT. While the majority of participants only experienced increases of 1 to 2 points, for each game except for OU we found a small number of individuals who reported worrying levels of musculoskeletal discomfort and fatigue. For example, following the FN gaming session six different participants (30%) reported increases of 4–5 points on the Borg scale for at least one of the symptoms in this symptom group. Likewise, five participants (25%) experienced increases of at least 4 and up to 7 points following the SPT scenario.

4.2.2 Discussion

Arm fatigue and pain were the only significantly increased symptoms for SS and DB. The only symptom that significantly increased for OU was neck fatigue, likely due to participants needing to turn their head more, considering the 360-degree horizontal play angle. Although both swordplay-based games are very similar in terms of mechanics, which caused significant increases in arm pain and fatigue for both, BS resulted in significantly increased upper back pain, while FN caused significantly increased lower back fatigue. A possible explanation for this result lies in differences in target behavior and positioning. In FN, the player is expected to hit multiple targets in a single swing of the blade, with target objects spawning sporadically along the circular arc on the floor before the player. Expelled target objects shoot up vertically and fall under the effect of gravity, each of them moving along its individual path. This means that different objects often happen to be placed at different heights at the same time. The comparatively wider horizontal play angle and diagonal slashes necessary to hit multiple targets at once may require rotations that start from the waist and therefore affect the muscles of the lower back, as opposed to BS, in which targets move in a more predictable manner and are more constrained in terms of horizontal and vertical positioning. With BS, however, the rhythmic nature of the game calls for a higher frequency of upper body movement, possibly affecting upper back muscle pain. The game that seemed to trigger the most diverse range of ergonomic symptoms was SPT, which produced statistically significant increases in arm fatigue, arm pain, upper back fatigue, lower back fatigue, and lower back pain. To avoid enemy attacks, users had to frequently bend over or crouch down, stressing their lower back. Continuously keeping their arms raised at an awkward angle whilst shooting at enemies positioned above eye level is the likely cause behind increased upper back fatigue.

The minority of participants who reported some level of slight musculoskeletal discomfort even prior to VR use, but later noted a decrease in the intensity of pain and muscle fatigue, may have experienced analgesic effects that were shown to occur with VR use (Triberti et al. 2014). This effect may have been compounded by the fact that the games we used (especially swordplay-based games and SPT) can be considered active VR games due to their physical demands, which likely enhances any analgesic effects of VR alone (Evans et al. 2021b). Nevertheless, some participants still experienced significant increases in pain and fatigue in different muscle groups. Studies have shown that light exertion in VR games is preferable to passive gameplay (Yoo et al. 2018), and there are many benefits to VR design that promotes physical activity. However, despite only several participants reporting significant worsening of musculoskeletal discomfort for each game, it is important to consider the small scale of our study. In our studies, just 20 min of gameplay (played at a non-challenging level) produced musculoskeletal pain/fatigue increases of 4–5 points on the Borg CR-10 scale (e.g., causing discomfort to increase from non-existent to somewhat strong or strong) in 30% of young, healthy FN players, and increases of 4 up to 7 points (e.g., from non-existent to somewhat strong or very strong) in 25% of SPT players. If these results are in any way representative of a wider range of existing or potential consumers, then this is not only a health-and-safety issue that needs to be communicated to potential consumers, but also a serious problem for the VR gaming industry, implying potential losses in terms of both player base and revenue. It is important to note that muscle fatigue following physical activity may not necessarily be perceived as an undesirable effect, as we are not able to discern whether the perception of fatigue in our studies was positive or negative based on the employed measures. However, when talking about pain, even increases as low as 1–2 points indicate the need for further improvements in ergonomic design of VR games and reconsideration of existing comfort-rating systems.

4.2.3 Key findings

  • Significant increases in muscle pain and/or fatigue were noted for all games, with the highest number of significantly increased symptoms reported for SPT;

  • Games with similar mechanics are likely to trigger muscle fatigue and pain in different muscle groups in cases of different target placement;

  • Analgesic effects of VR gaming may have been experienced by some participants in both of our studies;

  • 25% of SPT players and 30% of FN players experienced worrying symptom increases of 4 or more points on the Borg CR-10 scale after 20 min of gameplay;

  • Our results indicate the need for further improvements in VR gaming ergonomics and new comfort-rating guidelines.

4.3 Device-related discomfort

4.3.1 Results

Figure 4 presents the mean Borg CR-10 scores for discomfort caused by weight of the HMD, its temperature, fit (i.e., HMD feeling too tight or too loose) and display quality, as well as annoyance with the HMD cable, considering we were using a tethered headset.

Participants struggled with the adjustable straps of the HMD, with HMD tightness appearing to be a more bothersome issue compared to HMD looseness. However, the scores for fit-related issues remained below the moderate category for both studies and there were no significant differences in fit-related scores between the games.

For both studies, the shooter game scenario produced the highest level of weight-related discomfort, although Friedman test only showed significant differences between games for Study 2 (\(\upchi ^{2}\)(2) = 6.82, p = 0.03, W = 0.17), as post hoc analysis with Wilcoxon signed-rank tests and Bonferroni correction showed a statistically significant difference between SS and BS (Z = \(-\)2.69, p = 0.007, r = \(-\)0.60).

A significant difference between games (\(\upchi ^{2}\)(2) = 10.05, p = 0.006, W = 0.25), more specifically between SS and BS (Z = \(-\)2.75, p = 0.006, r = \(-\)0.62), was found for HMD temperature in Study 1. For the same study, Friedman test also showed a statistically significant difference in discomfort attributed to display quality (\(\upchi ^{2}\)(2) = 8.60, p = 0.01, W = 0.22), as post hoc analysis confirmed a significant difference between OU and BS (Z = \(-\)2.43, p = 0.02, r = \(-\)0.54), as well as SS and BS (Z = \(-\)2.96, p = 0.003, r = \(-\)0.66).

Annoyance with the HMD cable was the only device-related VRISE that was shown to significantly differ between games for both studies. For Study 1, the Friedman test (\(\upchi ^{2}\)(2) = 21.03, p < 0.001, W = 0.53) followed by post hoc analysis showed significant differences between OU and BS (Z = \(-\)3.13, p = 0.002, r = \(-\)0.70), as well as SS and OU (Z = \(-\)3.56, p < 0.001, r = \(-\)0.79). For Study 2, the Friedman test (\(\upchi ^{2}\)(2) = 10.44, p = 0.005, W = 0.26) followed by post hoc analysis confirmed statistically significant differences between DB and FN (Z = \(-\)2.48, p = 0.01, r = \(-\)0.55), as well as between SPT and FN (Z = \(-\)3.14, p = 0.002, r = \(-\)0.70).

Fig. 4 Mean Borg CR-10 ratings (95% CI) for discomfort and annoyance triggered by different device factors (Study 1: left, Study 2: right)

4.3.2 Discussion

Even though identical hardware was used for the entirety of both studies, the intensity of device-related symptoms varied between games, with differences in experiencing particular symptoms even reaching statistical significance. This is to be expected as the overall experience of device-related VRISE occurs as a result of combining specific hardware features with specific motor requirements of the particular application.

Annoyance with the cable was the only device-related VRISE that significantly differed between games in both studies. Playing a game with a 360-degree play angle (as is the case with OU) may result in players getting tangled in the long cable, an effect that does not occur with purely front-facing games. Similarly, the footwork-intense secondary mechanics of avoiding incoming projectiles in SPT occasionally resulted in users tripping and stepping on the cable. The safety implications of these findings are relevant for game developers, hardware manufacturers and user experience researchers alike. However, the rapidly growing popularity of portable HMDs suggests that cable-related hazards can be expected to become less of an issue for future generations of VR devices.

While this difference between games was only statistically significant for Study 2, HMD weight was most bothersome in the shooter gaming scenarios of both studies. This is likely due to target positioning and behaviour—targets in shooter games are more dynamic, characterized by appearing and disappearing, moving in unpredictable ways, and attacking the player. As a result of their dynamics, the player is constantly scanning the environment, which often includes not only ocular movements, but head/neck movements as well. Furthermore, in both SS and SPT—but especially SPT—targets are placed above the player’s eye-level, requiring the player to adjust the pitch orientation of their head. Thankfully, this did not seem to produce significant increases in reported neck pain, but we consider it to be the most fitting explanation for the increase in perception of HMD weight and related discomfort.

Generally speaking, although many physical features of VR HMDs can be assessed in a fully objective manner (e.g., weighing the HMD, noting its resolution, measuring the range of adjustable parts), our results highlight the importance of including subjective metrics and different scenarios of usage when attempting to evaluate the quality of specific devices. Likewise, our results show that it makes sense to include device-related metrics of discomfort in studies evaluating different tasks or applications, even if they are using the same hardware setup. Analyzing VRISE that occur using different combinations of HMDs and software, especially active games, is likely to be beneficial to hardware manufacturers, software developers, and—above all—to users. Such approaches may serve as a potential step toward preventing injuries resulting from the unbalanced combination of heavy devices and in-game requirements for fast-paced movements, which is considered to be the likely cause behind the fractured vertebra in the case described by Baur et al. (2021).

4.3.3 Key findings

  • Despite using the exact same hardware, device-related symptoms occurred at varying intensities depending on the game, even resulting in statistically significant differences between games;

  • Annoyance with the HMD cable was the only symptom that significantly differed between games in both studies, possibly due to different play angles and footwork requirements;

  • Discomfort attributed to HMD weight was most pronounced in shooter games, presumably because of target placement (above eye-level) and dynamic target behaviour;

  • Measures of device-related discomfort should be included when evaluating VRISE during various scenarios of HMD usage—insights gained during such experiments may contribute toward preventing VR-induced injuries.

4.4 Cybersickness

4.4.1 Results

Table 6 Mean post-pre differences (PPD) in symptoms (presented alongside accompanying standard deviations) and Wilcoxon signed-rank test (WSRT) Z-scores comparing post-VR and pre-VR symptom scores (* p < 0.05, ** p < 0.005, *** p < 0.001; presented alongside accompanying effect sizes)
Table 7 Mean post-pre differences (PPD) in symptom group scores (presented alongside accompanying standard deviations) and Wilcoxon signed-rank test (WSRT) Z-scores comparing post-VR and pre-VR symptom group scores (* p < 0.05, ** p < 0.005, *** p < 0.001; presented alongside accompanying effect sizes)

To gain a better understanding of the impact of tested VR games on the intensity of cybersickness, we chose to analyze VR-induced differences in individual symptoms before moving on to further calculations. Mean differences between post-VR and pre-VR ratings for individual SSQ symptoms, as well as Wilcoxon signed-rank test results comparing post-VR and pre-VR ratings, are presented in Table 6. Overall, games in Study 2 produced larger increases in reported symptoms than games in Study 1, with more of them reaching statistical significance. While statistically significant increases in Study 1 were found only for 2 to 3 symptoms per game, in Study 2 they were found for 7 (both FN and DB) and 10 (SPT) out of 16 symptoms in the SSQ.

None of the symptoms showed a statistically significant post-VR increase for every single tested game. Statistically significant increases, however, were found following the majority of tested games (i.e., 4 out of 6 games) for these particular symptoms: general discomfort, eye strain, difficulty focusing, and blurred vision. Sweating significantly increased following 3 games, while statistically significant increases were found for 2 out of 6 games for fatigue, headache, fullness of the head, and dizziness with eyes open as well as closed. No significant increases were found for salivation, nausea, stomach awareness, and burping.

We further calculated the scores for different symptom groups. SSQ scores were calculated as explained in Kennedy et al. (1993) (note that the formula we used to calculate SSQ-T included the brackets that were missing from the original publication, as described by Bimberg et al. (2020)), CSQ scores were calculated as explained in Stone III (2017), and VRSQ scores were calculated based on Kim et al. (2018). As we calculated different dimensions of cybersickness for the three questionnaires, post-VR increases in individual symptom ratings were compounded as multiple symptoms were joined together and multiplied with weighting factors. This resulted in statistically significant increases in calculated SSQ dimensions, as well as CSQ and VRSQ dimensions, especially for games in Study 2. Mean differences between post-VR and pre-VR ratings for calculated symptom groups, as well as Wilcoxon signed-rank test results comparing post-VR and pre-VR scores, are presented in Table 7. It is worth noting that both total cybersickness scores (SSQ-T and VRSQ-T) showed statistically significant increases following VR gameplay for all tested games across both studies.
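To illustrate how individual ratings are compounded into SSQ dimensions, the following minimal sketch computes SSQ-N, SSQ-O, SSQ-D, and SSQ-T from a set of 0 to 3 symptom ratings. The item-to-subscale mapping and the weights (9.54, 7.58, 13.92, and 3.74) follow the commonly cited scoring scheme of Kennedy et al. (1993) and should be verified against the original publication. The function name, the dictionary-based input format, and the example ratings are assumptions made for illustration, not the analysis code used in our studies.

```python
# Items contributing to each SSQ subscale (some items count toward two subscales).
SUBSCALES = {
    "N": ["general discomfort", "increased salivation", "sweating", "nausea",
          "difficulty concentrating", "stomach awareness", "burping"],
    "O": ["general discomfort", "fatigue", "headache", "eye strain",
          "difficulty focusing", "difficulty concentrating", "blurred vision"],
    "D": ["difficulty focusing", "nausea", "fullness of the head",
          "blurred vision", "dizziness with eyes open",
          "dizziness with eyes closed", "vertigo"],
}
WEIGHTS = {"N": 9.54, "O": 7.58, "D": 13.92}
TOTAL_WEIGHT = 3.74


def ssq_scores(ratings):
    """Return weighted SSQ scores for a dict mapping symptom name -> rating (0-3)."""
    raw = {scale: sum(ratings.get(item, 0) for item in items)
           for scale, items in SUBSCALES.items()}
    scores = {f"SSQ-{scale}": raw[scale] * WEIGHTS[scale] for scale in raw}
    # The brackets matter: the raw subscale sums are added first, then weighted.
    scores["SSQ-T"] = (raw["N"] + raw["O"] + raw["D"]) * TOTAL_WEIGHT
    return scores


# Hypothetical participant who only reports sweating and mild general discomfort.
example = {"sweating": 2, "general discomfort": 1}
for name, value in ssq_scores(example).items():
    print(name, round(value, 2))
```

In this hypothetical case, SSQ-N rises mainly because of sweating, even though no gastrointestinal item is rated above zero.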

4.4.2 Discussion

A higher number of significant increases in reported symptoms for all games in Study 2 compared to Study 1 is likely attributable (at least in part) to different participant populations. Although similar in terms of gender and age distribution, according to self-reported data, participants in Study 2 reported greater susceptibility to motion sickness and cybersickness compared to participants in Study 1. Moreover, participants in Study 1 were more experienced gamers, and even slightly more experienced with the use of VR, which could have led to cybersickness adaptation.

In addition to general discomfort (a rather vague symptom), symptoms that increased significantly for the highest number of games were predominantly those related to eyes and vision (eye strain, difficulty focusing, blurred vision). Consequently, all six of the tested games produced statistically significant increases in all individual questionnaire dimensions that used at least three of these four symptoms in their calculation: SSQ-O, VRSQ-O, and CSQ-DF. Two of these symptoms were also present in the calculation for SSQ-D, which yielded statistically significant increases for 5 out of 6 games. Notable increases in intensity following VR gameplay were found for the majority of remaining symptoms included in the calculation of the SSQ-D score, although they only reached statistical significance for either one or two out of the six games, depending on the symptom.

An equally high number of statistically significant increases (i.e., 5 out of 6 games) was found for the SSQ-N dimension. Based on this score, one would expect that participants in our studies struggled with gastrointestinal symptoms—nausea, burping, salivation, and stomach awareness. However, it was quite the contrary, as these symptoms happened to be the least prominent of all symptoms measured by the SSQ. Instead, increases in SSQ-N scores can be attributed to significant increases in general discomfort and sweating, with both symptoms being somewhat open to interpretation.

Considering that the original questionnaire was designed to be used with simulators and not active games, the inclusion of sweating in the SSQ presumably pertains primarily to diaphoresis, i.e., increased sweating which is not caused by physical activity or environmental factors (temperature, humidity). In the case of an otherwise stationary VR game that has vection-producing locomotion mechanics and is being played in comfortable environmental conditions, a notable increase in sweating could indeed be classified as diaphoresis, and as such it would indicate the onset of cybersickness. However, with physically active standing/room-scale games, determining whether this symptom comes as a result of exertion, sickness, or some combination of the two is not as straightforward. Moreover, sweating may also occur as a normal physiological response to increased HMD temperature during a prolonged period of usage.

As for our results, we found that the post-VR increase in sweating did not correlate significantly with total cybersickness (calculated according to the VRSQ-T formula because of its independence from the influence of sweating) for either study, but a positive correlation was found between sweating and PHYS (SIM-TLX) for both Study 1 (rs = 0.25, p = 0.046) and Study 2 (rs = 0.46, p = 0.004). A significant positive correlation between sweating and discomfort related to HMD temperature was found only for Study 2 (rs = 0.43, p < 0.001). With this in mind, it can be assumed that, in our case, the post-VR increase in the SSQ-N score was likely a result of increased physical activity or headset temperature rather than cybersickness.
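The rank correlations reported above can be illustrated with a minimal sketch of Spearman's coefficient, computed as the Pearson correlation of average ranks (so that tied ratings are handled). The data below are hypothetical and do not come from our studies; the function names are illustrative, and the p-value computation is omitted.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical post-pre changes in the sweating item and PHYS workload ratings.
sweating_change = [0, 1, 0, 2, 1, 3]
phys_demand = [3, 5, 2, 8, 6, 9]
print(round(spearman_rs(sweating_change, phys_demand), 2))
```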

Both CSQ and VRSQ omitted the ambiguous sweating item present in the SSQ, which makes both of them more resistant to the confounding influence of physical exertion that occurs during active VR gaming, and thus possibly more suitable for use in studies such as ours. However, as explained by Yoo et al. (2018), sweating has a notable negative impact on user experience with active VR games, and as such we still feel that including it as an item in questionnaires evaluating VRISE could provide valuable information. A potential solution to the ambiguity of this symptom is to separate it into different symptoms, similarly to the distinction made in the Motion Sickness Assessment Questionnaire (MSAQ) (Gianaros et al. 2001), which separates the feelings of being hot and sweaty from cold sweat/clamminess. Similarly, finding alternative ways to better define other ambiguous symptoms such as fatigue or general discomfort—which could also be influenced by physical exertion alone—may aid researchers with identifying the etiology of certain symptoms.

As previously mentioned, sweating is one of the SSQ symptoms that were excluded from both CSQ and VRSQ. Both questionnaires also exclude increased salivation, stomach awareness, and burping (symptoms that did not increase significantly for any of the tested games), as well as difficulty concentrating, which was only significant for SPT. Based on our results, these items did not contribute to valuable findings regarding VRISE during active VR games with no in-game locomotion, so we do not consider their inclusion necessary, although we note that their importance would likely increase in the case of more vection-provoking games. Although there are some differences between the two SSQ variations, the majority of symptoms that were most influenced by VR gaming were included in both versions.

4.4.3 Key findings

  • Games in Study 2 produced a larger number of significant increases in cybersickness symptoms, likely due to a more sensitive population of participants;

  • Certain symptoms measured in the SSQ may increase in intensity because of reasons other than cybersickness and their inclusion in further calculations may produce misleading results (e.g., sweating caused by physical exertion increases the SSQ-N score);

  • The majority of symptoms with the most notable post-VR increases (general discomfort, eye strain, difficulty focusing, and blurred vision) are included in the CSQ, as well as the VRSQ;

  • Gastrointestinal symptoms (nausea, burping, increased salivation, stomach awareness), which were for the most part omitted from both SSQ variations, were also the least prominent symptoms in both of our studies.

4.5 Overall prevalence and ranking of reported VRISE

4.5.1 Results

Although we evaluated the occurrence of different VRISE through individual specialized questionnaires, we wanted to provide participants with the opportunity to evaluate their experience as a whole. Therefore, for Study 2, we asked participants to report all VRISE experienced during gameplay, as well as the single most bothersome VRISE for the particular session. While a pre-defined list of VRISE was provided for participants to choose from, they were also encouraged to add their own answers in case they experienced other VRISE. Only one participant decided to add a non-predefined option (“feeling uncomfortable and uneasy because of cable-related issues”), and only for DB. The overall prevalence of VRISE (reported as the overall percentage of participants) for each session is depicted on the heat map in Fig. 5. Nausea was the only symptom that was not reported by any participant for any game. Broadly speaking, the most prevalent VRISE were muscle fatigue in the arms and back, eye strain, thermal discomfort, and HMD tightness. However, in certain cases the prevalence of individual symptoms was highly dependent on the game (e.g., arm muscle fatigue ranging from 5% for DB to 70% for FN).

Figure 6 presents percentages of participants who reported each symptom/effect as the most bothersome of all experienced VRISE. Except for eye strain, symptoms commonly attributed to cybersickness were generally not considered to be the most bothersome for any of the three games tested in this study. Instead, the largest number of participants (6) reported arm muscle fatigue as the most bothersome symptom for FN, with the majority of other participants choosing either HMD tightness, eye strain, or arm pain. Similar results were obtained for SPT, although a larger percentage of participants reported thermal discomfort, while none opted for arm pain. The results for DB were more distinct, with the majority of participants choosing eye strain, followed by HMD tightness and back pain.

Fig. 5 Heat map depicting the overall prevalence of VRISE during gameplay (darker color indicates higher percentage)

Fig. 6 Heat map depicting the percentage of participants who reported a particular symptom/effect as the most bothersome of all VRISE (darker color indicates higher percentage)

Furthermore, one of our research goals in Study 2 was to examine whether VRISE influenced the participants’ intention to continue playing. An overview of participants’ willingness to terminate the gaming session or continue playing is presented in Fig. 7. Participants who reported they would not be willing to continue playing after the 20 min session, as well as those who stated they wished that the session was terminated sooner, were asked to specify the reasons behind their preference (i.e., reasons for game termination). The collected qualitative data was analyzed and each answer was coded as belonging to one or more of the three broad categories: VRISE, playability issues (PI) and technical issues (TI).

Among the five answers collected after the FN session, one included only PI, two included a combination of VRISE and PI, and two included only VRISE as a reason for game termination. Thus, the mention of VRISE was present in four out of five answers. For SPT, the mention of VRISE was present in all four obtained answers, two of which also included PI. The results differed for DB. Five out of 10 answers only mentioned PI, a single answer only mentioned TI, and two answers mentioned only VRISE. The remaining two answers involved a combination of TI and VRISE. Thus, the mention of VRISE was present in four out of ten answers. Combining all 19 answers across the three games, we found 18 distinct mentions of different VRISE as reasons for game termination. We categorized these mentions as follows (the number in parentheses represents the number of mentions): muscle fatigue/muscle pain (5), overall fatigue (4), headache (3), eye strain (3), general discomfort (2), and disorientation (1).
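The tallies described above can be reproduced with a short sketch in which each answer is represented by the set of categories it was coded into. The counts are taken from the paragraph above, whereas the data structure and variable names are assumptions made for illustration.

```python
from collections import Counter

# Each collected answer is the set of categories assigned during coding.
coded_answers = {
    "FN": [{"PI"}] + [{"VRISE", "PI"}] * 2 + [{"VRISE"}] * 2,
    "SPT": [{"VRISE", "PI"}] * 2 + [{"VRISE"}] * 2,
    "DB": [{"PI"}] * 5 + [{"TI"}] + [{"VRISE"}] * 2 + [{"TI", "VRISE"}] * 2,
}

# Share of answers per game that mention VRISE at least once.
for game, answers in coded_answers.items():
    with_vrise = sum("VRISE" in answer for answer in answers)
    print(f"{game}: {with_vrise}/{len(answers)} answers mention VRISE")

# Individual VRISE mentions pooled across the three games.
vrise_mentions = Counter({
    "muscle fatigue/muscle pain": 5,
    "overall fatigue": 4,
    "headache": 3,
    "eye strain": 3,
    "general discomfort": 2,
    "disorientation": 1,
})
total_answers = sum(len(answers) for answers in coded_answers.values())
print(total_answers, "answers,", sum(vrise_mentions.values()), "VRISE mentions")
```

A single answer may mention more than one VRISE, which is why the number of mentions (18) exceeds the number of answers that mention VRISE (12 out of 19).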

4.5.2 Discussion

When examining the overall prevalence of VRISE, we can conclude that VR gaming triggered a diverse set of uncomfortable symptoms, although many were experienced only by a minority of participants. Certain symptoms exhibited similar overall prevalence regardless of the game in question, such as back muscle fatigue, neck pain, and general discomfort, while others varied significantly between games depending on their mechanics. For instance, arm muscle fatigue was reported by a single player in DB, whereas 14 out of 20 players in FN experienced it.

Regarding the most bothersome symptom, there was less diversity in the results, as several symptoms—such as neck muscle fatigue, nausea, headache, and general discomfort—were either highlighted by a single participant or none at all. However, we did not identify a specific symptom that was overwhelmingly chosen as the most bothersome. Instead, our results varied between games and individual participants, highlighting the diversity of the VR experience. It is still worth mentioning that HMD tightness, arm muscle fatigue, and eye strain stood out as more frequently reported compared to other symptoms.

Similarly to Hirzle et al. (2021), our results indicated that symptoms of cybersickness (aside from eye strain, which is included in the SSQ) did not stand out as either the most frequent or the most bothersome VRISE for the tested games. Since cybersickness is often associated with vection, it is not unusual for it to be less pronounced in games with no in-game locomotion, such as those used in our study. However, knowing that the SSQ is often used as a primary measure of VRISE (Hirzle et al. 2021)—even when evaluating games without in-game locomotion (e.g., Szpak et al. (2020, 2022))—we found it important to highlight that other VRISE (which are generally less researched) may pose a larger threat to the comfort and safety of VR gamers compared to cybersickness. Therefore, these should be included in studies addressing VRISE, particularly when examining user experience with physically active games.

Reported reasons for game termination included both VRISE and playability issues for all three games, while technical issues were only reported for DB. According to our results, VRISE were the leading (i.e., most frequently reported) cause behind the participants’ desire to terminate the gaming experience for the two more physically demanding games (FN and SPT). Based on their frequency of occurrence in the pooled results (i.e., combined reports from all three games), musculoskeletal symptoms (muscle pain and fatigue) and general fatigue were the main culprits. The intensity, as well as the prevalence, of the aforementioned VRISE is expected to increase as play time surpasses 20 min (i.e., gaming session duration across both studies), but even at such a short interval, these findings support the need for further advancements. However, a clear majority of participants in FN and SPT did not feel the need to terminate the game following the 20 min gaming session, which is encouraging.

4.5.3 Key findings

  • Playing a VR game with manual, isomorphic, and controller-based mechanics for 20 min may produce a wide variety of symptoms and effects, but their overall prevalence varies between games;

  • With regard to the most bothersome symptoms, results differ between games as well as players, but arm muscle fatigue, HMD tightness, and eye strain appeared to be among the most dominant VRISE;

  • Cybersickness symptoms (as measured by the SSQ) do not stand out as either most frequent or most bothersome—other VRISE may be more disruptive to the player experience, indicating the need for further research;

  • VRISE (especially musculoskeletal symptoms and fatigue) are a common reason behind the desire to terminate a 20 min gaming session.

Fig. 7 Willingness to terminate game/continue playing following each 20-min gaming session

4.6 Reaction time

4.6.1 Results

For Study 1, all three games resulted in a longer mean post-VR SRT, with average post-pre differences of 14.95 ms (SD = 27.44) and 22.53 ms (SD = 25.67) for BS and OU, respectively, while the mean difference for SS was minimal (M = 1.15 ms, SD = 20.16). However, the only game that resulted in a statistically significant increase in SRT was OU (Z = \(-\)3.17, p = 0.002, r = \(-\)0.71). According to the Friedman test, no statistically significant difference was found in post-pre differences for the three games (\(\upchi ^{2}\)(2) = 3.90, p = 0.14, W = 0.10).

Similarly, for Study 2, two of the games resulted in a notable mean post-pre increase in SRT, namely DB (M = 29.98 ms, SD = 28.87) and SPT (M = 18.68 ms, SD = 35.85), with a negligible mean difference for FN (M = 1.65 ms, SD = 27.81). Again, only the pick-and-place game (DB) produced a statistically significant increase in SRT (Z = \(-\)3.51, p < 0.001, r = \(-\)0.78). Comparing the calculated post-pre differences between games, the Friedman test (\(\upchi ^{2}\)(2) = 6.7, p = 0.04, W = 0.17) followed by post hoc analysis with Wilcoxon signed-rank tests and Bonferroni correction showed a statistically significant difference between DB and FN (Z = \(-\)2.98, p = 0.003, r = \(-\)0.67).
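For readers unfamiliar with the between-game comparison used here, the sketch below computes the Friedman chi-square statistic from a participants-by-games table of post-pre SRT differences. It is a simplified illustration: the data are hypothetical, the function names are assumptions, and no tie correction is applied (statistical packages typically include one), so it is not a reproduction of our analysis.

```python
def rank_row(row):
    """Rank one participant's values (1 = smallest); ties share the mean rank."""
    ranks = []
    for value in row:
        smaller = sum(other < value for other in row)
        equal = sum(other == value for other in row)
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


def friedman_chi_square(data):
    """Friedman statistic without tie correction.

    data: list of rows, one per participant, each holding one value per game.
    """
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for game_index, rank in enumerate(rank_row(row)):
            rank_sums[game_index] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Hypothetical post-pre SRT differences (ms) for five participants and three games.
post_pre_differences = [
    [2, 25, 30],
    [-5, 10, 40],
    [1, 15, 12],
    [3, 20, 35],
    [-10, 5, 22],
]
chi_square = friedman_chi_square(post_pre_differences)
n, k = len(post_pre_differences), len(post_pre_differences[0])
print(round(chi_square, 2))                # test statistic with k - 1 = 2 degrees of freedom
print(round(chi_square / (n * (k - 1)), 2))  # corresponding Kendall's W
```

A significant Friedman result would then be followed by pairwise Wilcoxon signed-rank tests, with the significance threshold adjusted for the number of comparisons.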

It is important to note that changes in reaction speed vary across players, and an individual player’s cognitive performance may also vary greatly after exposure to different games/mechanics. Overall, the majority of participants experienced an increase in RT after VR. This effect was more pronounced for pick-and-place games, as 80% of users experienced varying increases in RT after OU, and 90% after DB. For some individuals, VR exposure resulted in an RT increase of up to 93 ms for DB and 112 ms for OU and SPT. However, a considerable number of participants showed faster responses after certain game scenarios. Post-VR RT decreases were noted in 45% of participants following SS and 50% of participants following FN, and certain individuals experienced post-VR RT improvements of up to 40–50 ms following SS, FN, and SPT.

Table 8 presents Spearman correlation coefficients between post-pre differences in SRT and workload and its individual dimensions. Several statistically significant moderate to strong correlations were found across different games and different dimensions of workload. However, BS was the only game for which we found statistically significant correlations between post-pre SRT difference and multiple dimensions of workload, as well as the total workload score. After calculating Spearman correlation coefficients between post-pre differences in SRT and post-pre differences in overall cybersickness and its subscales (calculated for SSQ, VRSQ, and CSQ), the only statistically significant correlation (rs = 0.58, p = 0.007) was the one between post-pre SRT difference and post-pre SSQ-N difference for the OU scenario.

Table 8 Spearman correlation coefficients (r) between post-pre difference in SRT and different dimensions of workload for each game

4.6.2 Discussion

The overall magnitude and statistical significance of RT changes varied based on genre/mechanics. Mittelstaedt et al. (2019) listed adaptation to latency introduced by the system as one of the potential reasons for VR-induced impairment of reaction speed, which may explain why playing pick-and-place games resulted in slower reaction times. Zindulka et al. (2020) have shown that throwing an object in the real world is superior to throwing it in VR in terms of both precision and accuracy. Although the motion of the ejected projectile was found to be physically plausible, participants in their user study reported problems with the timing of releasing the projectile from the virtual hand, which likely affected their throwing performance. This slight delay can be attributed to the action of releasing the grip of the controller trigger, as it takes some time for this action to be physically performed by the user and subsequently registered by the system. Even though participants in our studies did not perform throw mechanics, both pick-and-place games we used were similar in terms of controller mappings for grabbing and releasing the target (i.e., pulling and releasing the controller trigger, respectively). As opposed to pick-and-place games, the use of trigger press and release in shooters is more in line with the corresponding real-life action of shooting, while playing swordplay-based games did not involve the use of discrete controls (i.e., triggers and buttons) at all. Moreover, both shooters and swordplay-based games used mediated interaction mechanics, with the controller serving as a physical substitute for a tool/weapon. With pick-and-place mechanics as realized in our setup, the controller was used for tracking the virtual hand which is supposed to represent the player’s own hand, but their alignment is far from perfect. 
In their attempt to control the movement of their virtual hand despite this visuomotor disturbance, participants needed to alter their reaching movements to compensate for the misalignment, which could have resulted in prolonged RT (Fernandez-Ruiz et al. 2011; Mittelstaedt et al. 2019).

While these theories provide possible explanations for the increase in RT following pick-and-place simulation games, highlighting the genre’s effect on cognitive performance compared to other genres, we are unable to provide a cohesive explanation for cognitive performance changes (or lack thereof) following exposure to other tested mechanics. The comparatively unimpaired post-SS reaction time can potentially be explained by previous work showing that playing shooter games during training sessions causes a significant reduction in reaction time (RT) compared to playing a control game (Dye et al. 2009), although any effects noted in our work were measured immediately following short-term exposure, as opposed to a long multi-episodic training period.

Another aspect that may have influenced our results is the temporary acceleration of reaction speed which was shown to occur shortly after a period of exercise or active VR gaming (Audiffren et al. 2008; Szpak et al. 2020), potentially explaining why BS and FN caused smaller impairments compared to OU and DB. Both theories, however, fail to explain why partaking in SPT, a shooter as well as a physically demanding game, produced a fairly large mean increase in RT. Nevertheless, our results indicate that VR-induced changes in reaction speed cannot be explained only by cybersickness/visual motion cues, but are rather diverse in magnitude and dependent on factors such as interaction mechanics, control modality, and workload imposed by the particular VR application. Moreover, the finding that pick-and-place tasks produce the most significant changes in RT can be explored in future research, and potentially be used as a benchmark for the naturalness and the overall quality of a particular implementation of pick-and-place mechanics. It would also be interesting to see whether using hand tracking or a controller that facilitates more natural grasp/release actions (e.g., Valve Index) would lead to improvements in RT for games with pick-and-place mechanics.

Although changes in reaction time following VR exposure are commonly attributed to cybersickness (Nalivaiko et al. 2015; Nesbitt et al. 2017), this explanation may be better suited for VR applications that present the users with very obvious motion cues and less control over their locomotion. As we used games based on frequent manual interaction but without in-game locomotion, our results are more in line with sources that highlight the influence of other factors (Mittelstaedt et al. 2019; Szpak et al. 2020). According to our results, workload measurements obtained using the SIM-TLX questionnaire show more statistically significant relationships with changes in SRT than the different SSQ-based measures of cybersickness.

4.6.3 Key findings

  • Overall, the majority of participants experienced an increase in RT after VR;

  • Playing shooters and physically active VR games may result in decreased SRT in some users;

  • In both studies, only games with pick-and-place mechanics resulted in a statistically significant increase in SRT;

  • While we list several possible theories behind SRT increases that occur after pick-and-place games, we are not able to provide a full explanation for this effect;

  • Changes in SRT show more statistically significant correlations with SIM-TLX workload scores than with different SSQ-based measures of cybersickness.

5 Discussion

5.1 Overview of key findings

When observing the mean ratings of workload, we found that tested games predominantly received scores that can either be categorized as Moderate or Somewhat high across different dimensions of workload. While certain trends can be noticed, such as pick-and-place games receiving lower PHYS scores and higher CONT scores compared to other games, any differences between games are likely due to a broader range of game-related factors, as opposed to just being explained as a definite effect of game genre and primary game mechanics. However, notably high workload scores obtained in both studies suggest the importance of measuring this aspect of user experience.

Our results reveal that only 20 min of gameplay may be enough to cause significant increases in muscle pain and fatigue, especially in the arms. Notably, this increase was most pronounced in SPT, which caused significant increases in all measured symptoms except for neck fatigue and upper back pain. Contrary to our initial expectations, games sharing primary game mechanics may not trigger muscle fatigue and pain in the same muscle groups. This suggests that factors beyond primary game mechanics, such as secondary mechanics and target placement, may serve as more significant predictors of these symptoms. Notably, certain participants actually experienced a decrease in symptoms following VR use, which may be explained by analgesic effects of VR use (Triberti et al. 2014). However, a significant percentage of SPT (25%) and FN (30%) players experienced worryingly high increases in symptom intensity. These effects are particularly concerning when considering that the participants in question were young, healthy individuals without situational or permanent disabilities. Moreover, these effects occurred as a result of a fairly short VR exposure, i.e., a period of time that was considerably shorter than the recommended episode duration of 30 min (Meta 2023a). Thus, our results indicate the need for further improvements in VR gaming ergonomics and new comfort-rating guidelines.

Although the same hardware was used for all games in both studies, device-related symptoms occurred at varying intensities, with differences between games occasionally reaching statistical significance. This was especially evident in the case of annoyance with the HMD cable, as play angles and footwork requirements varied between games, putting the participants in potentially hazardous situations of getting tangled or almost tripping over the cable. We theorize that target placement and behaviour may have been the reason for HMD weight being perceived as more uncomfortable in shooter games compared to other games due to specific neck postures, which ties back to the works of Chihara and Seo (2018) and Penumudi et al. (2020). Therefore, we stress that measures of device-related discomfort should be included when evaluating VRISE during various scenarios of HMD usage, even if the HMD itself remains the same between scenarios. In general, results obtained for ergonomic symptoms, whether pertaining to muscle pain and fatigue or device-related discomfort, signal the need for further exploration of these factors, especially with the use of objective measures, such as the XRgonomics toolkit for the facilitation of ergonomic 3D interfaces (Evangelista Belo et al. 2021).

Participants in Study 2 may have been more susceptible to cybersickness compared to participants in Study 1, despite being comparable in age and gender. Although the games used in our studies were chosen based on their lack of vection-inducing elements, participants still experienced considerable increases in multiple symptoms. However, it is possible that some of the more ambiguous symptoms were caused by factors other than cybersickness. For example, physical exertion may have caused excessive sweating, which would misleadingly inflate the SSQ-N score even though participants did not report significant increases in gastrointestinal symptoms. Our findings seem to be more in line with the rationale behind novel measures of cybersickness, which eliminated several symptoms present in the SSQ. In fact, the majority of symptoms with the most notable post-VR increases in our studies (general fatigue, eye strain, difficulty focusing, and blurred vision) were included in both CSQ and VRSQ, while both omitted the symptoms that were found to be the least prominent in our studies (i.e., nausea, burping, increased salivation, stomach awareness).
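The way sweating alone can raise the SSQ-N score can be illustrated with a short worked example. The sketch below assumes the standard SSQ scoring (seven nausea-subscale items rated 0 to 3, with the sum multiplied by 9.54); the two participants and their ratings are hypothetical and are not data from our studies.

```python
# Items contributing to the SSQ nausea subscale (SSQ-N) under the standard
# scoring assumed here; each item is rated from 0 (none) to 3 (severe).
SSQ_N_ITEMS = (
    "general_discomfort",
    "increased_salivation",
    "sweating",
    "nausea",
    "difficulty_concentrating",
    "stomach_awareness",
    "burping",
)
SSQ_N_WEIGHT = 9.54  # standard subscale weight (assumed scoring scheme)


def ssq_nausea(ratings):
    """Return the weighted SSQ-N score; items missing from `ratings` count as 0."""
    return SSQ_N_WEIGHT * sum(ratings.get(item, 0) for item in SSQ_N_ITEMS)


# Hypothetical participant: moderate sweating after active play, nothing else.
sweating_only = {"sweating": 2}
# Hypothetical participant: slight gastrointestinal symptoms, no sweating.
gastro_mild = {"nausea": 1, "stomach_awareness": 1}

print(ssq_nausea(sweating_only))
print(ssq_nausea(gastro_mild))
```

Both hypothetical participants receive the same SSQ-N score of about 19.08, even though only one of them reports any gastrointestinal symptoms. The subscale score by itself therefore cannot separate exertion-related sweating from nausea.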

In general, playing standing or Room-scale games with manual, isomorphic, and controller-based mechanics may produce a wide range of symptoms and effects, even with only 20 min of VR use. However, the overall prevalence of individual symptoms tends to vary between games. When participants were asked to report the most bothersome VRISE, they mentioned varying symptoms, but arm muscle fatigue, HMD tightness, and eye strain appeared to be among the most dominant. These findings support the results of Hirzle et al. (2021), who found that both digital eye strain and ergonomic symptoms tend to be more pronounced compared to VR-induced symptoms, at least for the gaming context. Moreover, our results indicate that VRISE are a common reason behind the desire to terminate a 20-minute gaming session. In addition to the already highlighted need for a more in-depth investigation of ergonomic symptoms, these results call for a more detailed analysis of ocular symptoms occurring following VR gaming, as our methodology did not include a broad range of measures pertaining to ocular symptoms. For example, our methodology could be extended to include measures such as saccadic activity and blink rate, realized with the use of eye-tracking capabilities of contemporary HMDs.

Finally, the majority of participants experienced an increase in RT following VR; however, this increase was not statistically significant for either shooters or swordplay-based games. In fact, some participants even experienced a slight decrease in RT following VR use, possibly due to temporary acceleration of reaction speed following physically demanding activities. In both studies, the only game that produced a statistically significant increase in SRT was the game with the pick-and-place mechanics. The etiology of this effect remains unclear. However, our results indicate that changes in SRT correlate more strongly with subjective measures of workload than with cybersickness, and are possibly also dependent on factors such as interaction mechanics and control modality. Even though VR-induced changes in reaction time remain negligible for most participants, it is important to note that some experienced post-VR increases as high as 93–112 ms.
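The contrast between a small average change and large individual increases can be made concrete with a minimal sketch. The reaction times below are hypothetical values chosen for illustration, not measurements from our studies, and the function name `srt_changes` is likewise an assumption.

```python
from statistics import mean

# Hypothetical pre- and post-VR simple reaction times in milliseconds,
# one pair per participant; these are NOT values from the studies.
pre_ms = [251.0, 263.5, 240.2, 275.8, 258.1]
post_ms = [259.4, 260.1, 334.0, 281.3, 256.9]


def srt_changes(pre, post):
    """Return the per-participant change in reaction time (post minus pre), in ms."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one value per participant")
    return [after - before for before, after in zip(pre, post)]


changes = srt_changes(pre_ms, post_ms)
summary = {
    "mean_change_ms": mean(changes),        # group-level change
    "slower": sum(c > 0 for c in changes),  # participants with increased SRT
    "faster": sum(c < 0 for c in changes),  # participants with decreased SRT
    "largest_increase_ms": max(changes),    # worst individual case
}
print(summary)
```

Reporting the largest individual increase next to the mean keeps outlying participants visible, which a group-level significance test alone would not do.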

5.2 Limitations and Future Work

Both of our studies have several clear limitations. Concerning participant recruitment, both studies were limited in terms of sample size. Regarding the conducted power analysis, the deliberate choice to focus on large effect sizes was driven by our intention to ensure robust and easily detectable findings within the limited sample size. However, we acknowledge that it may limit the generalizability of our findings, as smaller effects may not have been fully captured in our analysis. Therefore, some of the null findings may reflect insufficient power to detect smaller effects rather than a true absence of effect, a limitation that needs to be considered when interpreting our results.
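The relationship between sample size, effect size, and power can be illustrated with a rough calculation. The sketch below is not the power analysis conducted for our studies. It uses a normal approximation for a two-sided paired comparison, assumes 20 participants per study (an even split of the 40 participants, which is an assumption), and uses Cohen's conventional thresholds for small, medium, and large effects.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided paired test via the normal approximation.

    effect_size is a standardized mean difference (Cohen's d for paired data),
    n is the number of participants, alpha is the significance level.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability of exceeding the critical value in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


N_PER_STUDY = 20  # assumption: 40 participants split evenly over two studies

for d in (0.2, 0.5, 0.8):  # conventional small, medium, and large effects
    print(f"d = {d}: power ~ {approx_power(d, N_PER_STUDY):.2f}")
```

Under these assumptions, power is high only for large effects and falls well below the conventional 0.8 threshold for small and medium effects, which is why null findings for subtler effects should be read with caution.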

Most participants in our studies were inexperienced with VR; our results are therefore more representative of the ways in which VR beginners, rather than frequent users, experience VRISE in the short term. Playing each game for a longer duration would likely increase the intensity of most symptoms, while playing each game over multiple sessions could potentially lead to a decrease in certain symptoms as participants acclimate to the platform. Additionally, while we deliberately chose participants based on their demographics, their experience with VR is not representative of the experiences of other demographics, such as children, the elderly, and people with disabilities, injuries or other health issues that put them at an increased risk of VRISE. Furthermore, while we aimed to enroll similar demographics of participants in both studies, it is possible that different groups of participants differed in terms of cybersickness susceptibility, pain tolerance, personality traits and personal preferences, which is why it was not possible to confidently compare the results of the two studies.

In terms of game choices, the titles used as test material during our research represent only a small fraction of VR games. There are many other genres to explore, as well as a variety of games that incorporate different tasks, interaction mechanics, and methods of in-game locomotion. Other potentially relevant aspects, such as aesthetics, social presence, and narrative, were not explored in our paper. Additionally, compared to the remaining four games, both Order Up VR and Dungeon Brewmaster were more prone to technical issues and minor bugs that, although not game-breaking, may have further impaired user experience during gameplay.

Another limitation that we need to acknowledge is the VR hardware used in both studies. Despite its well-balanced weight distribution, the HTC Vive Pro is still a fairly heavy headset, tethered to a desktop computer with a long cable that further contributes to its weight while also posing a safety hazard. Furthermore, according to Mehrfard et al. (2019), this device can reach a very high temperature, and is therefore likely to produce significant thermal discomfort. A lighter, portable headset such as the Oculus Quest 2, which also features a better display resolution, may result in a more comfortable experience, at least in terms of device-related factors. Moreover, considering that the Oculus Quest 2 is by far the most commonly used headset at the time of this writing (according to the Steam Hardware & Software Survey: January 2023 (Steam 2023)), it may be considered a representative choice for future experiments.

Lastly, it is important to note the large number of questionnaires participants were asked to complete, which could have led to fatigue. Using a single questionnaire which examines multiple dimensions of the user experience would thus be preferable to our approach. Unfortunately, while there are questionnaires that encompass multiple aspects of the user experience with VR applications, e.g., (Tcha-Tokey et al. 2016), as well as those that are geared more toward VR games, e.g., (Kourtesis et al. 2019), the ones we considered did not incorporate a broad spectrum of VRISE we wanted to include in our investigation. Therefore, further efforts could be made toward developing different versions of questionnaires to use with different types of applications. In the context of designing such questionnaires, we believe that further studies investigating a wider range of VRISE (such as the studies reported in this paper) may aid in separating items that are relevant to the user from those that could be excluded if necessary.

As for future work, our goal is to analyze the remaining data that has not been reported in this article, as described in Sect. 3. We are further interested in exploring SRT as a measure that could potentially be used to evaluate the fidelity of pick-and-place game mechanics implementations. Finally, we aim to leverage the findings from this article to further specify individual parameters of game mechanics (e.g., play angle, target placement, pacing, mismatch between real and virtual hand) that may influence player behaviour in a manner that could reduce their comfort and safety. Once these parameters are specified, we plan to conduct additional user studies to acquire more specific insights that could be beneficial to content creators. We intend to streamline this process by using simplified questionnaires and recording task performance measures.

6 Conclusion

Since the conception of VR technology, its manufacturers have faced challenges related to its impact on user comfort. One of the biggest and most persistent challenges is preventing cybersickness, a state characterized by symptoms such as disorientation, nausea, and oculomotor difficulties. A substantial body of work has focused on understanding the causes of cybersickness and proposing solutions, which have been incorporated into guidelines and best practices for modern VR developers to follow. However, the heightened focus on cybersickness and its underlying factors has overshadowed research into other symptoms and effects that may arise from VR use. These include musculoskeletal discomfort, issues with headset design, and impact on cognitive performance, all of which are particularly relevant for VR scenarios involving significant physical activity.

While numerous articles in the field have focused on individual symptoms of VR use, our aim was to paint a more complete picture of the user experience by extending our focus to explore a number of different VRISE at once, from workload and musculoskeletal issues, to device-related discomfort, cybersickness, and changes in reaction time. When collecting and analyzing data, we chose to incorporate state-of-the-art VR-specific measures (SIM-TLX, CSQ, VRSQ) as opposed to the more frequently used questionnaires (NASA-TLX, SSQ). Our methodology was used to evaluate a total of six games (three per study) that fit a pre-defined set of standards (i.e., standing or Room-scale games with controller-based isomorphic controls, belonging to popular genres with commonly used mechanics).

Conducting two studies with a total of 40 participants, we discovered that 20 min of VR gameplay led to statistically significant increases in various VRISE. Specifically, VR gaming resulted in a moderate to somewhat high workload, contributing to muscle fatigue and/or pain in the arms and back. Different game mobility requirements led to varying levels of device-related discomfort, even with the same Head-Mounted Display (HMD) used in both studies. Post-VR increases in some of the symptoms related to cybersickness were notable (e.g., oculomotor symptoms), but ambiguous effects like sweating raise concerns about the suitability of the original SSQ for evaluating active VR gaming scenarios. Overall, exposure to VR was shown to increase reaction time by a small margin in the case of pick-and-place games, suggesting a possible link with workload.

Results presented herein were collected during short VR sessions, using a small sample of fairly homogeneous and inexperienced individuals, and a limited set of games. Extending beyond those individual limitations in our future work would likely lead to more generalizable conclusions. However, findings obtained in such a controlled setting helped us articulate the possible connections between specific game elements and VR-induced discomfort, to be explored in future work. Further dissecting each of these elements (e.g., play angle, target placement, pacing, mismatch between real and virtual hand) and evaluating their parameters in future studies may provide insights that could assist developers in designing experiences that strike a balance between fun and challenging while also prioritizing the comfort and safety of their users.