1 Introduction

Virtual reality (VR) use has become widespread in the last decade thanks to technological advances that have brought new levels of realism and enabled its use in multiple fields of application (Servotte et al. 2020; Stanney et al. 2020). For example, in construction engineering VR has been used for the visualization of designs and architecture, training in health and safety in construction, training in equipment and operational tasks, as well as in structural analysis (Wang et al. 2018). Similarly, medical students can perform surgical practices in a safe virtual space where they can interact with various anatomical structures (Li et al. 2017). Also, VR-based exposure techniques have proven helpful in the treatment of phobias such as agoraphobia, acrophobia, claustrophobia, and social phobia, among others (Botella et al. 2017; García-Batista et al. 2020). Along these lines, two recent meta-reviews explored the use of VR in clinical psychology and found evidence of the long-term effectiveness of VR for the treatment of anxiety disorders, pain management, and weight and eating disorders (Riva et al. 2016, 2019).

As Glaser and Schmidt (2021) point out in their literature review, there is no established and field-recognized operationalization of what defines VR. This is further compounded by the fact that many studies do not define the term at all, creating confusion among researchers. For this study, we will focus on the use of VR through head-mounted displays (HMDs) and will employ the definition provided by Glaser and Schmidt (2021), which integrates previous definitions of the term to provide a clear and detailed operationalization:

VR is “a model of reality with which a human can interact, getting information from the model by ordinary human senses such as sight, sound, and touch and/or controlling the model using ordinary human actions such as position” (Hale and Stanney 2014, p. 34) and typically includes a digitally simulated three-dimensional space that can induce sensations of telepresence (Miller and Bugnariu 2016) including both the physical sensations delivered through computer generated sensory stimuli and the psychological sense of feeling ‘there’ within a computer-generated virtual environment (Slater et al. 2009; Steuer 1992). (p. 2)

The potential applications and contributions of VR, as well as the mass market adoption of HMDs, however, are currently being hindered by temporary side effects in its users, such as nausea, dizziness, and headaches (Teixeira and Palmisano 2020). These side effects, generally known as cybersickness, can have a negative impact on the user’s well-being due to the discomfort they cause (Rebenitsch and Owen 2016). According to the empirical evidence, 60–95% of the people who are exposed to VR environments through HMDs experience some level of cybersickness, with approximately 5–13% ending their exposure prematurely due to the intensity of symptoms (Caserman et al. 2021; Sharples et al. 2008; Stanney et al. 2020), and with abandonment rates exceeding 50% in some cases (Dennison et al. 2016; Martirosov et al. 2021).

Given that cybersickness remains a common user problem even with current-generation HMDs (Caserman et al. 2021; Yildirim 2020), Stanney et al. (2020) proposed an updated cybersickness research and development (R&D) agenda aimed at reducing the cybersickness problem and accelerating the mass adoption of immersive technologies. This “2020 cybersickness R&D agenda” recommended, among other items, that for the short to medium term cybersickness research should aim to develop a better understanding of the magnitude of the cybersickness problem and how it manifests and varies across individuals, of its implications for the mass adoption of VR technology, and to create predictive models of cybersickness. At the same time, researchers have raised concerns that the current knowledge of cybersickness is based on a large proportion of underpowered studies with very small samples (de Araújo et al. 2019; Lanier et al. 2019; Weech et al. 2019). For example, Caserman et al. (2021) performed a meta-analysis of cybersickness with modern HMDs and selected 49 publications containing 57 samples. Of these, more than half (57.9%) had samples of 25 or fewer participants, and the great majority (87.7%) had sample sizes of fewer than 50 cases. Indeed, only one sample was sufficiently powered to detect medium-sized correlations, as recommended in the cybersickness literature (≥ 85; Weech et al. 2019). This is problematic because studies with small samples are more prone to be rejected for publication or not even submitted if they produce negative results, exacerbating the publication bias in the literature (Button et al. 2013; Ferguson and Heene 2012). Also, small-sample studies have more ‘vibration effects’ (i.e., different analytical decisions by the researcher can result in large changes in the estimated effects), are likely to have poorer design quality, use poorer or more convenient (i.e., to obtain a desired positive result) analytical strategies, and have less quality control than studies with larger samples (Button et al. 2013; Friese and Frankenbach 2020). Thus, knowledge on cybersickness needs to be updated with studies based on more robust samples.

Aside from the problems related to small samples, another issue undermining the cybersickness literature is the limited number of studies with modern consumer-oriented HMDs, which have been in use since 2013 (Caserman et al. 2021; Grassini and Laumann 2020a, b). For example, in the literature review conducted by Weech et al. (2019) to examine the relationship between presence and cybersickness, out of the 19 studies identified, comprising 21 samples, only one used a modern HMD. In fact, many of the studies identified by Weech et al. (2019) did not even use HMDs, instead using projection screens and LCD screens, among other technologies. These factors, combined with small samples, could help explain the large variability in results found by Weech et al. (2019), with multiple studies finding large positive correlations, large negative correlations, and null correlations between presence and cybersickness. Further, cybersickness research has been conducted on very heterogeneous VR environments/programs, with many constituting games with fast-moving objects or sequences such as rollercoasters (e.g., Davis et al. 2015). Less is known about cybersickness in psychotherapeutic environments where users do not experience these extreme conditions.

Another area of cybersickness research that can be advanced is related to the experience of cybersickness during the VR immersion. In the last decade, the advent of validated single-item cybersickness scales has facilitated the study of the trajectories or temporal evolution of cybersickness during the VR immersion (Keshavarz and Hecht 2011; Teixeira and Palmisano 2020; Weech et al. 2019). Although this research has found an overall positive relationship between cybersickness and exposure time (Keshavarz and Hecht 2011; Teixeira and Palmisano 2020), the shape and variability of the cybersickness trajectories across individuals have yet to be explored using appropriate multilevel or growth curve models that allow the parameters to vary across participants. Thus, using these models would provide a greater understanding of the cybersickness phenomenon and how it varies across individuals during the VR immersion. Importantly, these multilevel and growth curve models can accommodate sets of covariates that are posited to explain the individual trajectories, which is vital information for developing effective screening measures and preventive protocols, particularly those aimed at preventing the premature termination of the VR immersion due to cybersickness.

Based on Stanney et al.’s (2020) “2020 cybersickness R&D agenda” as well as the outlined limitations of the current cybersickness literature, we designed the present study with three main objectives: First, to estimate the pervasiveness of cybersickness and its latent trajectory during the VR immersion. Second, to examine the relationship between cybersickness and key variables of the VR experience such as virtual presence, perceived enjoyment, and the user’s behavioral intention of using VR in the future. Third, to determine if susceptibility to motion sickness and other physical and mental health variables could predict the onset and trajectories of cybersickness. To address these objectives, two virtual environments designed to treat obsessions and compulsions related to cleaning were employed using Oculus Rift VR devices. This was done in the context of a sufficiently large sample size in line with current recommendations in the literature (de Araújo et al. 2019; Lanier et al. 2019; Weech et al. 2019). To contextualize the present research, the rest of the introduction is organized as follows: first, we define cybersickness and outline its possible causes. Second, we describe the cybersickness symptomatology. Third, we summarize the effects of cybersickness on the virtual reality experience. Fourth, we examine cybersickness susceptibility. Finally, we enumerate the hypotheses tested in the present study.

1.1 Definition and causes of cybersickness

Cybersickness has been present since the beginning of VR use (Davis et al. 2014). It can be defined as a “constellation of symptoms of discomfort and malaise produced by VR exposure” (Weech et al. 2019, p. 4). Cybersickness is typically categorized as a form of visually induced motion sickness, which constitutes any sickness produced by the observation of visual motion (Weech et al. 2019). Although cybersickness can be symptomatically similar to motion sickness and simulator sickness, these phenomena are caused by different types of exposure. Motion sickness occurs when people travel in moving vehicles (e.g., cars, airplanes, boats, etc.). Simulator sickness, on the other hand, results from experiences in simulators (e.g., flight simulators) that map virtual movements in the simulator to actual movements of the simulation platform (Davis et al. 2015). Thus, while there might be a perceived discrepancy between the simulator’s motion and that of the virtual vehicle, it is much smaller than the discrepancy related to cybersickness, where the person is observing movements in the virtual world while remaining stationary (Davis et al. 2015).

Although the exact causes underlying the appearance of cybersickness are unknown, there are different theories that try to explain this phenomenon. Currently, the most widely accepted are the sensory conflict theory, the posture instability theory, and the poison theory (LaViola 2000; Mousavi et al. 2013; Stanney et al. 2020). Briefly, the sensory conflict theory holds that cybersickness arises because the exposure to VR causes a conflict between the vestibular system of the inner ear and the other senses, mainly sight (Davis et al. 2014). The posture instability theory, in turn, proposes that the visual alterations produced by VR, such as accelerations and rotations, and the contrast between the virtual environment and the real one in which the user is located, reduce postural stability. According to this theory, postural stability is essential for human beings, and its disruption gives way to cybersickness (Davis et al. 2014; Dennison and D’Zmura 2017; LaViola 2000). The poison theory is based on the idea that experiences with VR influence the visual and vestibular systems in a similar way to the ingestion of a toxic substance. This confuses the brain into concluding that the body is being intoxicated and triggers a bodily process aimed at eliminating the supposed substances, which results in cybersickness (Davis et al. 2014; Mousavi et al. 2013).

Different characteristics of HMDs have been found to be related to the experience of cybersickness. Field of view (FOV) measures the extent of the observable world that is seen at any given moment of the VR experience. A wide or unrestricted FOV may provide a greater sense of immersion and presence, but can increase cybersickness (Saredakis et al. 2020; Teixeira and Palmisano 2020). In this regard, dynamic FOV restriction (contraction of the FOV when the user moves or is simulated to move) has been found to reduce cybersickness (Fernandes and Feiner 2016; Teixeira and Palmisano 2020). Framerate refers to the rate at which the frame is changed in a video (which consists of a collection of still pictures), while latency or lag is the delay between the user input and the displayed output by the VR device (Lee et al. 2020). A low framerate can cause flickering, which can lead to eye fatigue and headaches, whereas a large latency can increase visual-inertial sensory conflicts, leading to more intense levels of cybersickness (Lee et al. 2020; Kim et al. 2021; Palmisano et al. 2017). Finally, vection is the ability of HMDs to generate illusions of self-motion by stimulating either the visual or non-visual senses (Kim et al. 2021). Previous studies have found a positive relationship between vection and cybersickness, although the mechanisms for this relationship appear to be complex, with negative relationships emerging in some cases (Palmisano et al. 2017).

1.2 Cybersickness symptomatology

A variety of research studies indicate that visual fatigue, headaches, dizziness, nausea, and disorientation are the most prevalent symptoms of cybersickness (Davis et al. 2014; McHugh 2019; Nesbitt et al. 2017). Although these symptoms are temporary, some effects may linger for two or more hours (Rebenitsch and Owen 2016). Although current-generation HMDs such as the Oculus Rift and HTC Vive have significantly fewer problems with cybersickness than earlier devices, it remains an important issue (Caserman et al. 2021; Yildirim 2020). Additionally, HMDs produce more cybersickness than VR experiences through other mediums such as desktop and projection display systems (Sharples et al. 2008; Yildirim 2020). Regarding the onset of symptoms, some VR studies with HMDs indicate that users begin to experience them after 10–15 min of exposure (Caserman et al. 2021). However, the literature shows that this time can vary, as in the cases of Dennison et al. (2016) and Martirosov et al. (2021), where half of the participants using HMDs had to abandon the VR immersion before 10 min due to the symptoms of cybersickness. Additionally, Keshavarz and Hecht (2011) measured cybersickness at constant one-minute intervals and found a positive relationship between VR exposure time and the intensity of cybersickness, which increased from the first minutes of VR use. Rebenitsch and Owen (2021) later showed with data from several studies that cybersickness followed a linear trajectory across VR exposure time, although the slope of the line varied considerably across experiments.

Even though certain symptoms seem to be common among individuals, in general the symptoms of cybersickness are wide-ranging and can vary from person to person, a factor that, together with other variables such as the diversity of technologies and the personal characteristics of users, can make cybersickness difficult to study (Rebenitsch and Owen 2016; Saredakis et al. 2020). In this regard, instruments such as the Simulator Sickness Questionnaire (SSQ; Kennedy et al. 1993), which has been frequently used to measure the symptoms of cybersickness, may present limitations due to a possible confounding with anxiety levels prior to the virtual reality immersion. This makes it difficult to determine whether a user is experiencing symptoms of anxiety unrelated to the VR experience or actual symptoms of cybersickness resulting from the immersion in the virtual environments (Bouchard et al. 2009; Pot-Kolder et al. 2018). Due to this confounding, measuring cybersickness before and after VR immersion has been recommended (Pot-Kolder et al. 2018).

1.3 Effects of cybersickness on the virtual reality experience

Virtual presence is a term used to describe the subjective and psychological impressions experienced by VR users of transporting themselves out of reality and into a virtual environment (Kober and Neuper 2013; Schuemie et al. 2001; Servotte et al. 2020; Weech et al. 2019). Cummings and Bailenson (2015) explain that the formation of virtual presence is a process in which the user must first be able to perceive the virtual space as admissible, and then be able to experience it with the feeling of being located within that space. For years, achieving a good level of virtual presence has been considered the main objective of VR developers, since it allows the experience to feel real and credible to the users (Grassini and Laumann 2020a, b; Weech et al. 2019). However, cybersickness may act as a barrier to achieving this sense of virtual presence due to symptoms such as nausea, eyestrain, headaches, disorientation, etc., that can make users lose their concentration and engagement in the VR experience (Grassini and Laumann 2020a, b; Melo et al. 2017). Along these lines, Weech et al. (2019) concluded from their literature review that cybersickness and virtual presence were negatively related.

Within the VR field, perceived enjoyment indicates how pleasant, pleasurable, and satisfactory the experience of using VR is to the user (Balog and Pribeanu 2010; Shen and Eder 2009). Regarding this variable, the literature indicates that achieving a good degree of virtual presence is related to greater perceived enjoyment and makes the experience more attractive (Sylaiou et al. 2010; Tussyadiah et al. 2018). Perceived enjoyment, in turn, has a positive relationship with the user’s behavioral intention (Karjaluoto and Leppaniemi 2013; Wu and Liu 2007), which can be defined as the degree to which a person has conscious plans to carry out or not carry out a specific action in the future (Warshaw and Davis 1985). Indeed, a person will be more motivated to repeat an activity that they have enjoyed (Norazah and Norbayah 2011). Regarding the role of cybersickness in relation to these variables, Lin et al. (2002) and Yildirim (2019) reported a negative relationship between the appearance of cybersickness symptoms and the perceived enjoyment of VR. Similarly, it has been found that experiencing cybersickness can negatively affect the user’s behavioral intention to use VR in the future, even when the experience has been enjoyed (Hildebrandt et al. 2018).

1.4 Cybersickness susceptibility

An individual differences variable that can predict the experience of cybersickness is susceptibility to motion sickness (Golding 2006; Golding et al. 2021; Howard and Van Zandt 2021; Pot-Kolder et al. 2018; Rebenitsch and Owen 2021). In this regard, Mazloumi Gavgani et al. (2018) point out that, in advanced stages, cybersickness and motion sickness can be considered the same from a medical perspective. This suggests that motion sickness susceptibility can be used to predict which users will suffer from cybersickness. Indeed, Nesbitt et al. (2017) in their review of the literature found that having a history of suffering from motion sickness (e.g., on boats, planes, trains, roller coasters, etc.) increases the chance of experiencing cybersickness. Further, Golding et al. (2021) found that susceptibility to motion sickness was a unique predictor of cybersickness in a multivariate model that considered a variety of individual differences variables.

Other risk factors related to the physical and mental health of users have also been identified as predictors of cybersickness. Among these are propensity to migraines, general poor health, sleep problems, anxiety, and having a phobia, among others (Bockleman and Lingum 2017; Golding et al. 2021; Howard and Van Zandt 2021; McHugh 2019). Additionally, in their systematic review and meta-analysis, Saredakis et al. (2020) found that older samples (≥ 35 years) reported lower scores of cybersickness than younger samples, although they noted that there were limited studies available with older users. Also, Saredakis et al. (2020) did not find differences in cybersickness across sex, a finding that was also in line with the conclusions of Stanney et al. (2020). In contrast, the meta-analysis of Howard and Van Zandt (2021), which used different methodologies, did not find a relationship between age and cybersickness, and found that women experienced greater cybersickness than men.

1.5 The present study

Based on the findings presented in the previous sections of the manuscript, we postulated seven relevant hypotheses related to the cybersickness phenomenon. As recommended in the literature (e.g., Weech et al. 2019), we aimed to collect a sufficiently powered sample to detect effects that can be considered of medium size in the behavioral sciences. It is important to point out that we were interested not only in determining whether these hypotheses were supported by the data but also, if they were, in ascertaining the magnitude of the effects to better assess their relevance. The seven hypotheses we postulated were:

H1: Cybersickness scores after VR immersion will be higher than at baseline.

H2: Cybersickness will be positively related to exposure time in the virtual environments.

H3: Cybersickness will be negatively related to virtual presence.

H4: Cybersickness will be negatively related to the perceived enjoyment of the VR experience.

H5: Cybersickness will be negatively related to the behavioral intention of using VR in the future.

H6: Cybersickness will be positively related to the susceptibility to motion sickness.

H7: Cybersickness will be positively related to poorer general health.

2 Materials and methods

2.1 Sample size calculation

Because hypotheses H2 to H7 implied bivariate relationships, we estimated the sample size required to detect a medium-sized correlation (.30, Cohen 1992) with a type I error rate of 5%, a type II error rate of 20% (i.e., 80% power), and for a two-tailed analysis. According to the results provided by the G*Power 3 software (Faul et al. 2007), the sample size required to detect a .30 correlation with these specifications would be 84. This suggested sample size for correlations is in line with the recommendations of Weech et al. (2019) in their literature review of the relationship between cybersickness and virtual presence. Hypothesis H1, in contrast, implied differences in the cybersickness scores before and after VR immersion. In this case the G*Power 3 software indicated that the sample size necessary to detect mean differences of medium size (d = 0.50, Cohen 1992) using a dependent samples t test, with a type I error rate of 5%, a type II error rate of 20% (i.e., 80% power), and for a two-tailed analysis, would be 34. As this sample size is lower than the one needed to detect medium-sized correlations, the minimum sample size needed to test hypotheses H1 to H7 was set at 84.
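
The two sample size figures can be checked approximately with closed-form formulas. The sketch below is not the G*Power procedure, which relies on exact distributions. It assumes the Fisher z approximation for the correlation, and a normal approximation with Guenther's small-sample correction for the dependent samples t test, where d is taken as the standardized mean of the paired differences. The function names are illustrative. Because of these approximations, the results may differ from the G*Power values by about one participant.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a correlation r (two-tailed).

    Uses the Fisher z approximation, not G*Power's exact routine.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


def n_for_paired_t(d, alpha=0.05, power=0.80):
    """Approximate number of pairs needed to detect effect size d (two-tailed).

    Normal approximation plus Guenther's correction term z_alpha**2 / 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 2)


n_corr = n_for_correlation(0.30)   # medium-sized correlation (H2 to H7)
n_t = n_for_paired_t(0.50)         # medium-sized mean difference (H1)
n_required = max(n_corr, n_t)      # the larger requirement sets the minimum
print(n_corr, n_t, n_required)
```

Under these approximations the correlational hypotheses impose the larger requirement, which is consistent with setting the minimum sample size from the correlation analysis.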

2.2 Participants

The sample consisted of 92 adults aged 18 years and older residing in the Dominican Republic, who were invited to participate through a snowball non-probabilistic sampling strategy. The age of the participants was between 18 and 52 years (M = 26.22, SD = 7.40). Of the total number of participants, 50 (54.3%) were female and the remaining 42 (45.7%) were male. Most of the sample (95.7%) had Dominican nationality, with 2.2% from the USA, 1.1% from Haiti, and the remaining 1.1% from other countries. Regarding the maximum educational level reached, 1.1% had a primary level, 27.2% had a secondary level, 2.2% had a technical degree, 46.7% had obtained a university degree, and 22.8% had completed postgraduate studies. Regarding marital status, 84.8% reported being single, 12.0% married, and 3.3% in a common-law union. Regarding familiarity with VR, 53.3% reported not having used VR devices in the last three years, 27.2% had used them once, 7.6% twice, 9.8% three to five times, and the remaining 2.2% six or more times. It should be noted that the original sample was composed of 94 participants; however, two of them responded incorrectly to two or more of the four attention checks (directed questions) and were removed from the database, reducing the sample size to 92.

2.3 Measures

2.3.1 Cybersickness

To measure cybersickness before and after the VR experience, the Virtual Reality Sickness Questionnaire (VRSQ; Kim et al. 2018) and the Motion Sickness Assessment Questionnaire (MSAQ; Gianaros et al. 2001) were used. The VRSQ is an adaptation of the Simulator Sickness Questionnaire (SSQ; Kennedy et al. 1993) that retains a selection of the items most representative of cybersickness. The VRSQ has 9 items divided into the dimensions Oculomotor (4 items), for example “eyestrain”, and Disorientation (5 items), for example “vertigo”. Kim et al. (2018) reported alpha internal consistencies of .85 and .89 for the Oculomotor and Disorientation scales, respectively. A total score for the VRSQ can also be obtained by averaging the Oculomotor and Disorientation scale scores. The VRSQ items were answered using a 4-point Likert scale, from 0 (not at all) to 3 (severe). The MSAQ is made up of 16 items corresponding to 4 dimensions: Gastrointestinal (4 items), for example “I felt sick to my stomach”, Central (5 items), for example “I felt dizzy”, Peripheral (3 items), for example “I felt sweaty”, and Sopite-related (4 items), for example “I felt drowsy”. The MSAQ items were answered using a 9-point Likert scale, which ranged from none (0) to severe (8). Kousoulis et al. (2016) reported alpha internal consistencies of .95 for the scores on the Gastrointestinal scale, .79 for the Central scale, .83 for the Peripheral scale, and .73 for the Sopite-related scale. As with the VRSQ, a total score for the MSAQ may also be obtained.
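
As an illustration of how a VRSQ total could be derived from the two subscales, the following minimal sketch averages the Oculomotor and Disorientation scores. The item counts and the 0 to 3 response range come from the description above. Expressing each subscale as a percentage of its maximum possible score before averaging is an assumption made here for illustration, and the function names and example responses are hypothetical.

```python
def subscale_percent(responses, n_items, max_per_item=3):
    """Return a subscale score as a percentage of its maximum possible sum.

    Assumption: percent-of-maximum scaling; the text above only states that
    the total is the average of the two subscale scores.
    """
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(r not in range(max_per_item + 1) for r in responses):
        raise ValueError("responses must be integers from 0 to 3")
    return 100 * sum(responses) / (n_items * max_per_item)


def vrsq_scores(oculomotor, disorientation):
    """Score the 4 Oculomotor and 5 Disorientation items and average them."""
    o = subscale_percent(oculomotor, n_items=4)
    d = subscale_percent(disorientation, n_items=5)
    return {"oculomotor": o, "disorientation": d, "total": (o + d) / 2}


# Hypothetical responses from a single participant.
scores = vrsq_scores(oculomotor=[1, 0, 2, 0], disorientation=[0, 1, 0, 0, 0])
print(scores)
```

Averaging the two subscale scores, rather than summing all nine items, gives the 4-item and 5-item dimensions equal weight in the total.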

To measure cybersickness during the VR experience, the Fast Motion Sickness Scale (FMS; Keshavarz and Hecht 2011) was used. This instrument has a single item that assesses the symptoms of nausea, upset stomach and general discomfort felt by the person during the VR immersion. The FMS item is answered on a scale ranging from 0 (no discomfort) to 20 (extreme discomfort). In their validation study, Keshavarz and Hecht (2011) found very high correlations of .79 and .77 between total SSQ scores and the maximum and final FMS scores, respectively.

2.3.2 Virtual presence

To measure virtual presence, we employed the Multimodal Presence Scale (MPS; Makransky et al. 2017) and the Igroup Presence Questionnaire (IPQ; Schubert et al. 2001). For the MPS, we used the Physical Presence scale (5 items), for example “I had a sense of acting in the virtual environment, rather than operating something from outside”. The MPS items were answered using a 5-point Likert scale, from strongly disagree (1) to strongly agree (5). Makransky et al. (2017) reported an alpha internal consistency of .86 for the scale scores. The IPQ is composed of 14 items, 13 of which are divided into three first-order dimensions: Spatial Presence (5 items), for example “I did not feel present in the virtual space”, Involvement (4 items), for example “I was not aware of my real environment”, and Realness (4 items), for example “How real did the virtual world seem to you?”. Additionally, item 14 of the IPQ, “In the computer generated world, I had a sense of “being there””, measures General Presence as a second-order dimension alongside the three first-order dimensions. The IPQ items were answered using a 7-point Likert scale from -3 to 3 adjusted to the statement of each item. The official webpage of the IPQ reported alpha internal consistencies of .80/.77 for Spatial Presence, .76/.76 for Involvement, .68/.70 for Realness, and .85/.87 for General Presence (igroup n.d.).

2.3.3 Perceived enjoyment and behavioral intention

The Technology Acceptance Model 3 (TAM3; Venkatesh and Bala 2008) was used to measure perceived enjoyment and behavioral intention related to VR. The scales of Perceived Enjoyment and Behavioral Intention of the TAM3 are composed of 3 items each, which are answered through a 7-point Likert scale from totally disagree (0) to totally agree (6). An example of an item on the perceived enjoyment scale adapted for virtual reality is “I find using virtual reality to be enjoyable”, and of the behavioral intention scale is “Given that I had access to virtual reality, I predict that I would use it”. Venkatesh and Bala (2008) found alpha internal consistencies of .89 and .88 for the scores on the perceived enjoyment and behavioral intention scales, respectively.

2.3.4 Motion sickness susceptibility

To assess the susceptibility to motion sickness, the short version of the Motion Sickness Susceptibility Questionnaire was used (MSSQ-Short; Golding 2006). This instrument has two parts that measure how often the person has experienced discomfort or nausea when using various means of transportation. Part A of the MSSQ-Short measures the frequency of discomfort before the age of 12, while part B does so for the last 10 years. Both parts contain 9 items. The responses are presented on a 4-point Likert scale, with options ranging from never felt sick (0) to frequently felt sick (3). For those cases where the person had not used a given means of transportation in the last 10 years, the option “not applicable - never traveled” was available. An example of an item is “small boats”. Golding (2006) reported an alpha internal consistency of .87 for the MSSQ-Short scores. Only part B of the MSSQ-Short was used for the present study.

2.3.5 Cybersickness health risk factors

To measure the general health and well-being of the participants we employed five scales from the Copenhagen Psychosocial Questionnaire version III (COPSOQ III; Burr et al. 2019), which were adapted and translated into Spanish by Moncada i Lluís et al. (2021). The COPSOQ III scales used in this study were: Self Rated Health (2 items), for example “In general would you say your health is:”; Sleep Troubles (4 items), for example “How often have you found it hard to go to sleep?”; Stress (3 items), for example “How often have you had problems relaxing?”; Somatic Stress (4 items), for example “How often have you had stomach ache?”; and Cognitive Stress (4 items), for example “How often have you had problems concentrating?”. All items of the COPSOQ III were answered for the period of the last four weeks, and all except those of Self Rated Health using the following Likert scale: all the time (4), a large part of the time (3), part of the time (2), a small part of the time (1) and not at all (0). In the case of the Self Rated Health items, the first was answered on a Likert scale with options of excellent (4), very good (3), good (2), fair (1) and poor (0), and the second on a scale of 0 (worst conceivable health) to 10 (best conceivable health).

2.3.6 Insufficient effort responding

To detect insufficient effort responding we administered four directed items (DeSimone et al. 2015; Kung et al. 2018), two of which were inserted into the questionnaire completed prior to the VR immersion and two into the one completed after the exposure. Each directed item instructed subjects to leave the question blank if they were reading carefully. An example of one of the directed items is “If you are reading carefully, leave this question blank”. Each response was coded as one if the item was left blank and as zero if any response option was chosen.
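
The coding rule for the directed items, combined with the exclusion criterion reported in the Participants section (incorrect responses to two or more of the four attention checks), can be sketched as follows. This is an illustrative sketch rather than the script used in the study: the function names, the participant identifiers, and the representation of a blank answer as `None` or an empty string are assumptions.

```python
def code_directed_item(response):
    """Code a directed item: 1 if left blank (correct), 0 if any option was chosen."""
    is_blank = response is None or str(response).strip() == ""
    return 1 if is_blank else 0


def failed_checks(responses):
    """Count the directed items that were answered instead of being left blank."""
    return sum(1 - code_directed_item(r) for r in responses)


def should_exclude(responses, max_failures=1):
    """Flag a participant who failed two or more of the directed items."""
    return failed_checks(responses) > max_failures


# Hypothetical raw answers to the four directed items (two pre, two post).
participants = {
    "p01": [None, "", None, None],   # all left blank
    "p02": ["3", None, "1", None],   # two answered: flagged
    "p03": [None, "2", None, None],  # one answered: retained
}
excluded = [pid for pid, answers in participants.items() if should_exclude(answers)]
print(excluded)
```

Keeping the failure threshold as a parameter makes the exclusion rule explicit and easy to vary in a sensitivity analysis.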

2.4 Procedure

The present study was approved by the National Council of Bioethics (CONABIOS) of the Dominican Republic (No. 032-2015). To participate in the study, it was necessary to be of legal age (18 years or older), not to be pregnant or breastfeeding, and not to have suffered from vertigo. Potential participants were identified through snowball sampling, and they were contacted prior to the application. During the initial contact the nature of the research was explained to the potential participants, and they were provided with the written informed consent form in digital format. The form described the activities they would do, their rights and possible risks, among other things. Specifically, the informed consent form explained that participation would be anonymous and that they could withdraw at any time without any repercussions. Also, no monetary compensation was offered. An appointment was arranged for those people who agreed to be part of the study and had signed the form.

All the tests used, except the COPSOQ III, were translated from English to Spanish using the parallel-blind technique (Behling and Law 2000). Three bilingual persons, including a professional translator and two psychologists, translated the instruments individually from the original language (English) to the target language (Spanish). The three translations were then compared, and any discrepancies were resolved to create the final Spanish versions of the instruments.

Since the data collection took place during the COVID-19 pandemic in November of 2020, additional actions were taken to prevent contagion between participants and researchers. These protocols are detailed in the Supplemental materials. No participant expressed or showed signs that these protocols impacted their VR experience. A pilot study was carried out with 10 volunteers in order to evaluate the understanding of the tests, the cleaning and sanitizing protocols, and the operation of the VR equipment and applications. As a result of this pilot study, minor changes were made to the administration protocols to improve the experience of the participants.

The data collection phase was executed via the Qualtrics web application in three phases: (1) pre-immersion, (2) VR immersion, and (3) post-immersion. In the pre-immersion testing phase, the participants completed the VRSQ, MSAQ, MSSQ-Short, COPSOQ III, and the socio-demographic questions. After completing the baseline battery, the participants went to the second phase of VR immersion. The immersion was done using an Oculus Rift, a current-generation VR HMD device that has been widely used in cybersickness research (Caserman et al. 2021). The participants explored two virtual environments designed for obsessive–compulsive disorder (OCD) therapy: a kitchen and a public bathroom (see Figs. 1, 2). Participants were instructed to explore the environments, noting details, and manipulating available objects. These environments can be considered typical for OCD VR therapy, and they are presented at various degrees of cleanliness, from completely clean to completely filthy (Inozu et al. 2020; Laforest et al. 2016). Additionally, the experience of moving around in these rooms should be comparable to those of any VR program where the user can explore spaces. In terms of propensity to cybersickness, these environments do not contain any specific features that would cause cybersickness. As such, they are expected to provide a lower bound estimate of the levels of cybersickness experienced with HMDs (Saredakis et al. 2020). The VR exposure lasted 10 min, which can be considered a typical length of time for VR psychotherapeutic interventions (e.g., Chan et al. 2020; Chasson et al. 2020; Culbertson et al. 2012; Owens and Beidel 2015). Also, 10-min VR sessions have been employed to study the emergence of cybersickness (e.g., Martirosov et al. 2021; Teixeira and Palmisano 2020).

Fig. 1 Kitchen virtual environment

Fig. 2 Bathroom virtual environment

Starting from the moment that the participants had the Oculus Rift device correctly positioned on their heads and functioning, the FMS test question was administered (minute 0), and then again at each subsequent minute of the VR immersion (minutes 1–10), for a total of 11 measurements. Following the recommendations of Hutton et al. (2018), if at any time the participant provided an FMS score of 15 or more, the VR immersion was stopped, and the necessary measures were taken to minimize the symptoms of cybersickness that the participant was experiencing. In the last phase of the application process, the post-immersion phase, the VRSQ, MSAQ, MPS, IPQ, and TAM3 tests were applied. After completing the post-immersion battery, the application was considered finished, and the participants were thanked for their involvement.

To augment the ecological validity of the experiments, the levels of dirtiness in the psychotherapeutic environments were increased during the VR immersion. Several steps were taken to eliminate or minimize potential confounding effects that this action could have had on the experience of cybersickness. First, before immersion the participants were asked if they had any phobias or problems related to dirt or dirty environments. Two participants said they did, and they were only presented with the completely clean environments for the duration of the VR immersion. Second, the reactions of the participants to the increases in the dirtiness of the environments were closely monitored so that appropriate and timely action could be taken if needed. Two participants showed notable aversion and anxiousness as the levels of dirt were increased, and for them the environments were immediately returned to their clean state and kept there for the remainder of the VR immersion. No other incidents related to the dirtiness of the environments were observed.

2.5 Statistical analyses

The scale scores for all the instruments were computed by averaging the items that composed them. This was done to keep the scale scores in the same plausible range of values as the individual items, thus making their interpretation easier. The total scale scores of the VRSQ and MSAQ were computed by averaging their respective subscale scores. In the case of the COPSOQ III Self Rated Health scale, the two items that composed it did not have the same number of response options. Therefore, the scores on the item “If you evaluate the best conceivable state of health at 10 points and the worst at 0 points: how many points do you then give your present state of health?”, which ranged from 0 to 10, were divided by 2.5 so that they ranged from 0 to 4, like the other item in the scale, “In general, would you say your health is:”.
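
The scoring rules above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this illustration; the sketch is not the authors' scoring script.

```python
from statistics import mean


def rescale(value, old_max, new_max):
    """Linearly map a score from the range 0..old_max onto 0..new_max."""
    return value * new_max / old_max


def self_rated_health(item_0_to_4, item_0_to_10):
    """Average the two Self Rated Health items after placing both on 0-4.

    Dividing the 0-10 item by 2.5 is the same as multiplying it by 4/10.
    """
    return mean([item_0_to_4, rescale(item_0_to_10, old_max=10, new_max=4)])


def total_score(subscale_scores):
    """Total score as the mean of the subscale scores (as for VRSQ and MSAQ)."""
    return mean(subscale_scores)


# Hypothetical respondent: "very good" (3) on the first item, 7.5 on the 0-10 item.
example_health = self_rated_health(3, 7.5)
```

Because every score is an average, each scale stays within the range of its items (0 to 4 in this example).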

The internal consistency reliability of the scale scores was computed using McDonald’s omega coefficient (ω; McDonald 1999). The omega coefficient is generally recommended over the widely used alpha coefficient because it provides a more precise estimate of reliability (Hayes and Coutts 2020). This is mainly because, unlike alpha, omega does not assume tau equivalence (i.e., that all items on a scale contribute equally to the total scale score) (McNeish 2018). Reliability values above .70 are commonly considered acceptable for research purposes, although a lower limit of .60 may be extended for exploratory research (Hair et al. 2014, p.123). For the present study, scale scores with reliabilities below .60 were not considered for inferential analyses due to poor reliability. In the case of the total scores for the VRSQ, MSAQ, and IPQ the reliabilities were computed using the subscale scores as indicators.

To ascertain whether cybersickness increased after the virtual reality exposure we employed paired samples t tests with the pre-immersion and post-immersion scale scores. Because of the central limit theorem, the distributions of the means for the differences in cybersickness (post-immersion–pre-immersion) were expected to be normally distributed for our sample size of 92, even if the cybersickness variables were not normally distributed themselves (Lumley et al. 2002). Therefore, a paired samples t test was deemed appropriate. As a measure of effect size, we used Cohen’s d, with values of 0.20, 0.50, and 0.80 interpreted as representing small, medium, and large effects, respectively (Cohen 1992).
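
As a hedged illustration of this analysis, the sketch below computes the paired samples t statistic and one common paired-design variant of Cohen's d (the mean difference divided by the standard deviation of the differences). The text does not state which variant of d was used, so that choice is an assumption, and the data are invented for the example.

```python
import math
from statistics import mean, stdev


def paired_t_and_d(pre, post):
    """Return (t, degrees of freedom, d) for paired pre/post scores.

    d is the mean of the differences divided by their standard deviation,
    which is only one of several definitions used for paired designs.
    """
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff
    return t, n - 1, d


# Invented scores for five hypothetical participants.
pre_scores = [0, 1, 0, 2, 1]
post_scores = [1, 3, 1, 2, 3]
t_value, df, d_value = paired_t_and_d(pre_scores, post_scores)
```

The p value would then be obtained from a t distribution with n − 1 degrees of freedom, a step that statistical packages such as Jamovi perform automatically.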

To estimate the monotonic relationships between the scale scores we used Spearman’s correlation coefficient, as the distribution of most of the variables departed markedly from normality and in the case of cybersickness were heavy-tailed (Bishara and Hittner 2014, 2017). The size of the correlation coefficients was interpreted according to Cohen’s (1992) guide, which suggests that correlations of .10, .30, and .50 can be considered as small, medium, and large, respectively. Path analysis models were employed to determine the conditional relationships between multiple predictors and a dependent variable. The path models were estimated using maximum likelihood estimation with robust standard errors (MLR) to account for the non-normality of the variables. As the number of predictors was generally small (≤ 8), we used backward selection to establish groups of significant predictors (Sauerbrei et al. 2007). Initially, all predictors with significant bivariate correlations with the dependent variable were entered into the path model. Then, nonsignificant predictors were eliminated one at a time from the path model, with the predictor having the highest nonsignificant p value removed at each step. This process was repeated until all predictors in the path model had a significant (< .05) regression coefficient. Sauerbrei et al. (2007) recommend that the predictors not be very strongly correlated (e.g., bivariate correlation coefficients below .70) and that the sample size contains at least 10 observations per variable in the model. For the current study all the correlations between the predictors were lower than .70 and the sample size (92) was higher than 10 times the largest number of variables in an estimated path model (8 × 10 = 80).
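
The backward selection procedure described above can be sketched as follows. The `fit` argument is a hypothetical callback standing in for whatever software estimates the model (Mplus in this study); it is assumed to return one p value per predictor currently in the model. The toy p values are invented and, unlike in a real analysis, do not change when the model is re-estimated.

```python
def backward_select(predictors, fit, alpha=0.05):
    """Drop the least significant predictor until all remaining are significant.

    predictors: list of predictor names to start from.
    fit: hypothetical callback, fit(names) -> {name: p_value}.
    """
    kept = list(predictors)
    while kept:
        p_values = fit(kept)  # re-estimate the model with the current set
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining predictor is significant
        kept.remove(worst)  # remove only one predictor per step
    return kept


# Toy stand-in for model estimation, with invented and fixed p values.
TOY_P_VALUES = {"MSSQ": 0.01, "Age": 0.03, "Sleep": 0.40, "Stress": 0.20}


def toy_fit(names):
    return {name: TOY_P_VALUES[name] for name in names}


selected = backward_select(["MSSQ", "Age", "Sleep", "Stress"], toy_fit)
```

Re-estimating after each removal matters in practice because the p values of the remaining predictors generally change once a correlated predictor leaves the model.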

To examine the latent trajectories of cybersickness during the VR immersion we analyzed the FMS scores using latent growth curve models (LGCMs; Curran et al. 2010) and the MLR estimator to account for the non-normality of the FMS scores. The LGCMs were specified to have random effects. To compare LGCMs with different growth functions we used the Satorra–Bentler chi-square difference test (Satorra and Bentler 2010), as well as information criteria indices (Vrieze 2012), including the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These were complemented with the traditional structural equation modeling fit indices (Leite and Stapleton 2011), including the Comparative Fit Index (CFI), the Tucker–Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR).

Even though the Qualtrics web application used to collect the data was programmed to not allow missing responses, there were two groups of items that had missing data for specific reasons. First, in the MSSQ there was a response option labeled “not applicable—never traveled” which was coded as missing for the inferential statistical analyses. Second, the participants who reached or exceeded the maximum allowable FMS score (15) were prevented from continuing with the VR immersion, with their answers for the rest of the administration points (up until 10 min) also being coded as missing. In the case of the MSSQ, 13.0% of the cells were missing, with a range from 1.1 to 46.7% for individual items. Because a scale score is computed for the MSSQ, we followed the recommendations in the literature (Eekhout et al. 2014; Gottschall et al. 2012; Graham et al. 2007) and imputed the item scores across 20 multiply imputed datasets. Once missing data for the MSSQ items was imputed, the scale scores were computed in the typical way by averaging the MSSQ item scores. The missing data for the MSSQ items was imputed using the latent variable approach (Asparouhov and Muthén 2010), with the values of the variables constrained to their natural range according to the response options of the items. Regarding the missing data for the FMS, the total amount of missing information was 7.2%, with rates ranging from 0.0 to 23.9% for individual items. As with the MSSQ items, the missing values for the FMS variables were multiply imputed across 20 datasets using the latent variable approach.
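
The FMS missing data rates can be checked arithmetically against the stopping points reported in Section 3.1 (two participants stopped at minute 3, one at minute 4, seven at minute 5, one at minute 6, three at minute 8, and eight at minute 9). The sketch below assumes that the score at the stopping minute was recorded and that every later minute was coded as missing; the variable names are illustrative only.

```python
N_PARTICIPANTS = 92
N_MEASUREMENTS = 11  # minutes 0 through 10
LAST_MINUTE = 10

# Stopping minute -> number of participants who stopped there (from the text).
STOPS = {3: 2, 4: 1, 5: 7, 6: 1, 8: 3, 9: 8}

# A participant who stops at minute m has (10 - m) missing measurements.
missing_cells = sum(count * (LAST_MINUTE - minute) for minute, count in STOPS.items())
total_cells = N_PARTICIPANTS * N_MEASUREMENTS
overall_rate = 100 * missing_cells / total_cells

# At minute 10, every participant who stopped early has a missing score.
minute_10_rate = 100 * sum(STOPS.values()) / N_PARTICIPANTS

print(round(overall_rate, 1), round(minute_10_rate, 1))
```

Under this assumption the calculation gives 73 missing cells out of 1012, which corresponds to the overall rate of 7.2% and the maximum item-level rate of 23.9% reported above.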

It should be noted that chi-square difference testing to compare LGCMs with different growth functions cannot be performed for multiply imputed datasets. Multiple imputation was preferred for this study because one of the predictors for the LGCMs was the MSSQ scale scores, which are sum scores obtained from the MSSQ items that had the missing data. In these cases, multiple imputation at the item-level is recommended to compute the scale scores (Gottschall et al. 2012; Lang and Little 2018). Thus, to formally compare unconditional (without predictors) LGCMs with different growth functions using the chi-square difference test we used full information maximum likelihood (FIML) to treat the missing data in the FMS item scores (Lang and Little 2018).

Data handling, descriptive statistics, and Spearman’s correlation coefficients for multiply imputed datasets were computed using IBM SPSS Statistics (Version 25). Omega internal consistency reliabilities, paired samples t tests, and Shapiro–Wilk normality tests were computed using Jamovi (Version 1.6.15.0). Latent growth curve and path models for multiply imputed datasets were computed using Mplus (Version 8.3).

3 Results

According to the omega coefficient the reliabilities of the scale scores were generally adequate. For the cybersickness scales it is important to note that lower pre-immersion reliabilities were expected due to diminished variability in the item scores. Specifically, the reliabilities (pre/post) for the cybersickness scale scores were: .82/.75 for VRSQ Oculomotor, .62/.67 for VRSQ Disorientation, .90/.87 for the VRSQ Total, .84/.89 for the MSAQ Gastrointestinal, .45/.89 for the MSAQ Central, .68/.86 for the MSAQ Peripheral, .78/.73 for the MSAQ Sopite-related, and .64/.85 for the MSAQ Total. Regarding the reliabilities for the virtual presence scores, they were as follows: .81 for MPS Physical presence, .77 for IPQ Realness, .57 for IPQ Involvement, and .73 for IPQ Spatial presence. Due to a reliability lower than .60, the IPQ Involvement scale was removed from subsequent analyses. Additionally, the reliability of the IPQ General presence scores (without Involvement) was .85. For their part, the reliabilities for Perceived enjoyment (ω = .93) and Behavioral intention (ω = .95) were notably high. Similarly, the reliability for the MSSQ susceptibility scores was also high at .81. As for the COPSOQ III health-related scales, the reliabilities were generally adequate, with .91 for General health, .86 for Sleep problems, .75 for Stress, and .75 for Cognitive Stress. However, the reliability for Somatic Stress was only .56, and therefore this scale was not included in the inferential analyses. Because the four items that composed this scale (stomach aches, headaches, palpitations, and tension in the muscles) tapped into relevant content not captured elsewhere, they were included in the inferential analyses as separate variables. Shapiro–Wilk normality tests showed that all the scores were non-normal (p < .05). Density plots (Supplemental Figure 1) and descriptive statistics indicated that in general the non-normality was non-trivial.

3.1 Pervasiveness of cybersickness

Cybersickness in this study was measured in two ways. First, we used the FMS to measure the overall cybersickness levels that the participants were feeling during the VR immersion. Second, we administered the VRSQ and MSAQ scales to measure cybersickness before and after the VR immersion. Regarding the FMS scores, there were 11 measurements, starting at minute zero (when the virtual reality headset was in place and functioning) and continuing every minute until minute ten when the immersion ended. A total of 21 participants (22.8%) could not complete the 10-min immersion because their FMS score reached 15 or higher, while another participant decided to stop due to the severity of the symptoms they were experiencing. Specifically, two participants had to stop the immersion at minute 3 (2.2%), one at minute 4 (1.1%), seven at minute 5 (7.6%), one at minute 6 (1.1%), three at minute 8 (3.3%), and eight at minute 9 (8.7%).

The final FMS scores given by the participants, at minute 10 for those who completed the VR immersion and at the stopping point for those who could not, can be used to create a rough estimate of how many experienced cybersickness and to what extent. The FMS response scale, which went from 0 (no discomfort) to 20 (extreme discomfort) was divided into four sections of approximately equal size: those who gave a final FMS score in the range of 0–4 were considered as having not suffered from cybersickness, from 5 to 9 was considered as mild cybersickness, from 10 to 14 as moderate cybersickness, and from 15 to 20 as severe cybersickness. Using this guide, the final FMS scores show that 34.8% did not suffer from cybersickness, while 28.3%, 13.0%, and 23.9% experienced mild, moderate, and severe cybersickness, respectively.
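
The classification rule just described can be written as a short function. This is a sketch of the cut-offs stated in the text; the function name and the category labels are chosen for illustration.

```python
def fms_category(final_score):
    """Map a final FMS score (0-20) to the severity bands used in the text."""
    if not 0 <= final_score <= 20:
        raise ValueError("FMS scores range from 0 to 20")
    if final_score <= 4:
        return "none"
    if final_score <= 9:
        return "mild"
    if final_score <= 14:
        return "moderate"
    return "severe"


# Hypothetical final scores for four participants.
example_categories = [fms_category(score) for score in (0, 7, 12, 16)]
```

Note that the top band (15 to 20) spans six scale points while the other three span five, which is why the sections are described as only approximately equal in size.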

Regarding the cybersickness scale scores, we first computed the correlations between the post-immersion scores to assess their level of similarity. As can be seen in Table 1, the two VRSQ first-order scales, Oculomotor and Disorientation, had a high correlation (rho = .69, p < .001). In the case of the four MSAQ first-order scales, the scales had correlations ranging from .39 to .49 (p < .001), except for Central and Sopite-related, which exhibited a notably high correlation of .75 (p < .001). The correlations between the VRSQ and MSAQ scales ranged between .35 (p < .01; MSAQ Peripheral with both VRSQ Oculomotor and Disorientation) and .69 (p < .001; VRSQ Total with MSAQ Total). The high correlation between the VRSQ and MSAQ Total scale scores suggests a notable degree of convergence, but also enough room for discriminant validity. Table 1 also shows the correlations between the FMS Last scores (at minute 10 or at the stopping point) of the participants and the cybersickness scale scores. These results indicated that the FMS Last score had a notably high correlation with the MSAQ Total (rho = .69, p < .001), while the rest of the correlations ranged from .29 (p < .01, with the MSAQ Peripheral) to .65 (p < .001, with the MSAQ Central).

Table 1 Spearman correlation coefficients between the Post-Immersion Cybersickness Scores

Hypothesis H1 stated that cybersickness scores after the VR immersion would be higher than at baseline. To test this hypothesis, we conducted paired samples t tests for the pre- and post-immersion cybersickness scale scores, which are shown in Table 2. The results of the t tests support hypothesis H1, as all the post-immersion scores were significantly higher than the pre-immersion scores (p = .013 for MSAQ Sopite-related and < .001 for the rest). Cohen’s d measure of effect size indicated that the effects ranged from small to large (d = 0.27–1.03).

Table 2 Paired samples t tests for the pre- and post-immersion cybersickness scale scores

Even though these pre-post mean differences are evidence that the participants experienced cybersickness because of the VR experience, it is important to note that the cybersickness scores were not zero at baseline (see the pre-immersion mean scores in Table 2). These results suggest that the cybersickness scales measured traits partly independent of the VR experience. This is also corroborated by the reliability levels for the pre-immersion scores, which were generally above .60, suggesting that there was notable trait variability in the scale scores prior to the VR experience. Supplemental Figure 1 shows density plots for the pre- and post-immersion cybersickness scores, which further illustrate the point.

3.2 Latent trajectories of cybersickness

The observed trajectories of cybersickness (FMS scores) for each participant across the length of the VR immersion are shown in Fig. 3. The trajectories in Fig. 3 are organized according to the final FMS score provided by the participants, from highest to lowest. Thus, the trajectories at the top correspond to those participants that did not finish the VR immersion due to severe cybersickness, and those at the bottom correspond to those participants who at the end of the immersion were not experiencing symptoms of cybersickness. For example, Fig. 3 shows that the participant with ID#2 started experiencing cybersickness symptoms from the very first minute of immersion and that these symptoms quickly rose in intensity, leading the participant to abandon the VR immersion at the third minute. In contrast, participant with ID#85 never developed any symptoms of cybersickness across the length of the VR immersion. Yet another case was participant with ID#73, who experienced greatly fluctuating levels of cybersickness during the immersion.

Fig. 3 Longitudinal trajectories of cybersickness during the virtual reality immersion

Table 3 shows the descriptive statistics for the FMS scale at each time point. As can be seen in the table, the mean values of cybersickness according to the FMS generally increased as time passed in the VR immersion. In line with this, hypothesis H2 stated that cybersickness would be positively related with exposure time in the virtual environments. To formally test this hypothesis and assess the trajectory of cybersickness during the VR immersion we fitted the multiply imputed FMS data to latent growth curve models (LGCM) with random effects. We first fitted a linear LGCM, which produced the following means for the fit indices: AIC = 4661.63, BIC = 4701.98, CFI = 0.86, TLI = 0.87, RMSEA = 0.15, and SRMR = 0.13. Next, we proceeded to fit an LGCM with quadratic trajectories, and this model produced moderate improvements in fit over the linear model: AIC = 4607.57, BIC = 4658.010, CFI = 0.89, TLI = 0.90, RMSEA = 0.13, and SRMR = 0.07. Additionally, using FIML we performed a Satorra–Bentler chi-square difference test to compare the linear and quadratic LGCMs. The linear LGCM produced a chi-square of 136.18 with 61 degrees of freedom and a scaling correction factor of 1.35, while the quadratic LGCM produced a chi-square of 103.96 with 57 degrees of freedom and a scaling correction factor of 1.28. Based on these values, the Satorra–Bentler chi-square difference test produced a chi-square value of 21.10 with 4 degrees of freedom, which is associated with a p value < .001. These results indicate that the quadratic LGCM fits the data significantly better than the linear LGCM. Based on these combined results, we chose the quadratic LGCM as optimal.
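
For readers who wish to see how a scaled chi-square difference is obtained from the reported quantities, the sketch below applies the widely used scaled difference formula for robust (MLR) chi-squares. It is an assumption that this simpler formula approximates the computation performed here, since the text cites the Satorra and Bentler (2010) variant, which requires fitting an additional model. The function name is hypothetical.

```python
def scaled_chi_square_difference(t0, df0, c0, t1, df1, c1):
    """Scaled chi-square difference between two nested models.

    t0, df0, c0: scaled chi-square, degrees of freedom, and scaling
                 correction factor of the more restricted (nested) model.
    t1, df1, c1: the same quantities for the less restricted model.
    """
    df_diff = df0 - df1
    # Scaling correction for the difference test.
    cd = (df0 * c0 - df1 * c1) / df_diff
    # Multiplying each scaled chi-square by its factor recovers the
    # unscaled statistic before the difference is rescaled by cd.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, df_diff


# Values reported in the text: linear LGCM (nested) vs quadratic LGCM.
trd, df_diff = scaled_chi_square_difference(136.18, 61, 1.35, 103.96, 57, 1.28)
```

With the rounded inputs given in the text this formula yields a value of roughly 21.6 on 4 degrees of freedom, close to but not identical with the reported 21.10. A gap of this size is within what rounding of the scaling correction factors to two decimals could produce, although the use of the 2010 variant may also contribute.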

Table 3 Descriptive statistics for the FMS, MPS, IPQ, TAM3, MSSQ, and COPSOQ III Scale Scores

The quadratic LGCM produced the following (unstandardized) means for the random effects: 0.65 (p = .003) for the intercepts, 0.83 (p < .001) for the linear effects, and − 0.01 (p = .329) for the quadratic effects. These results imply that at minute 0 the model estimated that the participants had a mean score of 0.65 on the FMS. Because the FMS score at minute 0 was obtained as soon as the VR headset was properly set on the person’s head and functioning, most participants had not experienced discomfort at this point. The value of 0.83 indicates the amount of growth expected in the FMS scores for an increase of 1 min of virtual exposure according to the linear term. In terms of the quadratic term, a positive sign indicates that the curve is convex and a negative sign that the curve is concave. In this case, the quadratic term was nonsignificant (not different from zero), which indicates that the mean trajectory for all participants was essentially linear. However, because the quadratic model provided better fit than the linear model, and because the quadratic term had a significant variance (p = .003) of 0.005, it follows that the individual trajectories of some participants could be better approximated by a convex curve (e.g., ID40, Fig. 3), while for other participants a concave curve provided better fit (e.g., ID32, Fig. 3). These opposite sign trajectories would cancel each other out, producing a mean quadratic parameter estimate not different from zero.
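
The mean trajectory implied by these estimates can be computed directly. The sketch assumes that time was coded in minutes from 0 to 10, which is consistent with the interpretation of the intercept and slope given above; the function name is illustrative.

```python
INTERCEPT = 0.65   # estimated mean FMS score at minute 0
LINEAR = 0.83      # mean linear growth per minute
QUADRATIC = -0.01  # mean quadratic effect (nonsignificant)


def implied_mean_fms(minute):
    """Model-implied mean FMS score at a given minute of immersion."""
    return INTERCEPT + LINEAR * minute + QUADRATIC * minute ** 2


# Model-implied means for each of the 11 measurement occasions.
trajectory = [round(implied_mean_fms(minute), 2) for minute in range(11)]
```

Because the mean quadratic effect is so small, the implied curve departs only slightly from a straight line: at minute 10 the quadratic term subtracts 1.0 point from the 8.95 that the intercept and linear term alone would give.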

It is also worth noting that the variances for the intercepts (1.541, p = .033) and linear effects (1.071, p < .001) were also significant, supporting the estimation of random effects across participants. Especially relevant in these results is the size of the variance of the linear effects, which implies a standard deviation (SD) of 1.15 for the slope estimates across individuals. This indicates, for example, that whereas for the whole sample the slope parameter indicated an increase of 0.83 FMS points per minute, for some individuals, these values could be 1.98 FMS points (0.83 + 1SD) or 3.13 FMS points (0.83 + 2SD) per minute. In terms of the correlations between the effects in the LGCM, the linear and quadratic effects correlated negatively at − 0.73 (p < .001), while neither the intercepts and the linear effects (0.24, p = .265) nor the intercepts and the quadratic effects (− .26, p = .287) correlated significantly. This negative correlation between the linear and quadratic effects indicates that the participants with higher overall rates of growth in cybersickness in the beginning of the immersion experienced larger decreases in their rates of growth as the VR immersion advanced. Conversely, those participants who had the lowest rates of growth in cybersickness initially had the largest increases in their rates of growth at the latter stages of immersion. Finally, the observed and model-implied mean cybersickness trajectories are shown in Fig. 4. In all, these results indicate a positive relationship between cybersickness and VR exposure time, thus supporting hypothesis H2. They also underscore the large variability in the cybersickness immersion trajectories across individuals.

Fig. 4 Observed and latent growth curve model-implied mean trajectories of cybersickness. FMS, Fast Motion Sickness Scale

3.3 Effects of cybersickness on the virtual reality experience

Cybersickness was expected to have a deleterious impact on the VR experience. These expected impacts were stated in hypotheses H3 to H5. Specifically, hypothesis H3 postulated that cybersickness would be negatively related with virtual presence. The results in Table 4 partially support this hypothesis, as the VRSQ Oculomotor and Total scale scores were negatively associated with all the presence scales, with the correlations ranging from − .23 (p < .05, VRSQ Total with IPQ Spatial presence) to − .41 (p < .001, VRSQ Oculomotor with IPQ General presence). Additionally, the MSAQ Peripheral scale correlated negatively with the IPQ Spatial presence scale (rho = − .25, p < .05), while the MSAQ Sopite-related scale correlated negatively with the MPS Physical presence scale (rho = − .23, p < .05). The other correlations between the cybersickness and presence scales were not statistically significant.

Table 4 Spearman correlation coefficients between immersion/post-immersion cybersickness and the virtual reality experience

Two hypotheses were proposed regarding the relationship between cybersickness and technology acceptance. Hypothesis H4 stated that cybersickness would be negatively related with the perceived enjoyment of the VR experience, while hypothesis H5 stated that cybersickness would be negatively related with the intention of using VR in the future. As Table 4 shows, both hypotheses were supported. In the case of perceived enjoyment, all cybersickness scales correlated negatively with it, with the correlations ranging from medium to large size (− .33 ≤ rho ≤ − .50, p < .01). Similarly, all the cybersickness scales correlated negatively with the intention of using VR in the future, with the correlations ranging from small to medium size (− .21 ≤ rho ≤ − .33, p < .05). The strongest correlations with the two technology acceptance variables were obtained for the cybersickness scales of VRSQ Oculomotor, VRSQ Total, and MSAQ Total.

It is important to note that, as theoretically expected, virtual presence correlated positively with both perceived enjoyment and behavioral intention. MPS Physical presence had the highest correlations with perceived enjoyment (rho = .44, p < .001) as well as behavioral intention (rho = .31, p < .01). Also, perceived enjoyment correlated very highly with behavioral intention (rho = .81, p < .001). Further, there was a high degree of convergence between the MPS physical presence scale and the IPQ General presence scale (rho = .76, p < .001). Descriptive statistics for the virtual presence and technology acceptance variables are shown in Table 3.

3.4 Cybersickness susceptibility

This study sought to identify variables that could predict the amounts of cybersickness experienced by the participants (see Table 3 for descriptive statistics). In this regard, hypothesis H6 stated that cybersickness would be positively related with the susceptibility to motion sickness, while H7 postulated that cybersickness would be positively related with poorer general health. These two hypotheses, which involved a set of nine variables (one for motion sickness susceptibility and eight for general health) were tested in two ways: (1) by using the predictor variables to try to explain the latent trajectories of cybersickness during the VR immersion, and (2) by assessing the correlations between the predictor variables and the cybersickness last immersion score and the post-immersion scores. We also included in these analyses the variables of sex and age.

The first step in the cybersickness susceptibility analyses involved exploring potential explanatory variables of the LGCM effects. For this, we incorporated in the model as covariates motion sickness susceptibility (MSSQ scale), cybersickness health risk factors (COPSOQ III scales), sex, and age. We first introduced each of the variables separately, and those that had significant relationships with the effects were then introduced together in a subsequent model. Then, the covariates in the LGCM were reduced using the backward selection method. When introduced separately, no covariates were significantly associated with the intercepts or quadratic effects. In contrast, five variables correlated with the linear effects: the MSSQ (.40, p = .001), COPSOQ III General health (− .26, p = .033), COPSOQ III Stress (.29, p = .008), COPSOQ III Cognitive stress (.28, p = .016), and COPSOQ III Headaches (.36, p < .001). These results indicate that those with higher motion sickness susceptibility and worse health tended to have higher rates of increase in cybersickness during the VR exposure. Finally, using backward selection a final LGCM model was obtained with two predictors, producing the following fit indices: AIC = 4589.89, BIC = 4655.46, CFI = 0.90, TLI = 0.90, RMSEA = 0.12, and SRMR = 0.06. The two predictors of this model were significantly associated with the linear effects of the LGCM, with one being the MSSQ, which had a standardized regression coefficient of .31 (p = .023), and the other being COPSOQ III Headaches with a coefficient of .25 (p = .028). According to this model, the covariates were able to explain 21.4% of the variance of the linear effects.

The second step in the evaluation of cybersickness susceptibility involved the computation of the pairwise Spearman correlation coefficients between the predictor variables and the cybersickness scores. As shown in Table 5, hypothesis H6 was supported as the MSSQ scale correlated positively with all but one of the cybersickness scales (MSAQ Peripheral, rho = .02, p = .86). The MSSQ obtained the highest correlations with the VRSQ scales (.35 ≤ rho ≤ .39, p < .05) and the FMS Last score (rho = .32, p < .01). In addition to the MSSQ, Table 5 shows support for H7, as the health problem predictors, such as recent experiences of cognitive stress, stomach aches, headaches, and stress, correlated consistently and positively with the cybersickness scores, with correlations as high as .44 (p < .001, Cognitive stress and VRSQ Oculomotor). Also, it is worth noting that self-rated general health obtained a medium negative correlation with the FMS Last score (rho = − .32, p < .01). In general, these results indicate that people with poorer health tended to experience higher levels of cybersickness during and after the VR immersion. Another important predictor of cybersickness was age, which correlated negatively with six of the nine cybersickness scale scores, with a maximum correlation of − .38 (p < .001, with VRSQ Disorientation). Sex, for its part, correlated negatively with the FMS Last score (rho = − .23, p < .05) and the MSAQ Central (rho = − .26, p < .05), indicating that males experienced less cybersickness.

Table 5 Spearman correlation coefficients between immersion/post-immersion cybersickness and its predictors

The final step in the cybersickness susceptibility analyses involved the estimation of path regression models to identify the groups of variables that could, together, best predict the last cybersickness immersion score and the post-immersion cybersickness scale scores. Initially, all variables with significant correlations with the target cybersickness scales were entered into the path model. Then, variables with nonsignificant regression weights were removed using the backward selection method. The final models for each cybersickness scale are shown in Table 6, except for the MSAQ Gastrointestinal scale, which did not have any significant predictors in its path model. The results in Table 6 indicate that across cybersickness scales the most consistent and strongest predictor in the path models was motion sickness susceptibility (as measured by the MSSQ). The MSSQ was a significant predictor in all the path models contained in Table 6 and had the highest standardized regression coefficient in each model as well. Other important predictors were age (in 6 of 7 models), COPSOQ III Cognitive stress (in 4 of 7 models), and COPSOQ III Headaches (in 3 of 7 models). Overall, the predictive models of cybersickness were able to explain between 10.7% (MSAQ Total) and 41.3% (VRSQ Oculomotor) of the cybersickness scale score variances.
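The backward selection logic described above can be sketched as follows. This is a simplified stand-in and not the procedure used in this study: it relies on ordinary least squares instead of a path model, and on a normal-approximation cutoff of |t| ≥ 1.96 instead of exact p values. All function names, variable names, and data are hypothetical. The toy data are constructed so that one predictor overlaps with the other and correlates with the outcome on its own, yet contributes nothing unique once both are in the model.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_values(columns, y):
    """Regress y on an intercept plus the columns; return one t value per column."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual variance
    return [beta[a] / (sigma2 * inv[a][a]) ** 0.5 for a in range(1, p)]


def backward_select(predictors, y, t_crit=1.96):
    """Drop the weakest predictor until every remaining |t| reaches t_crit."""
    kept = dict(predictors)
    while kept:
        names = list(kept)
        t_vals = ols_t_values([kept[name] for name in names], y)
        weakest = min(range(len(names)), key=lambda k: abs(t_vals[k]))
        if abs(t_vals[weakest]) >= t_crit:
            break
        del kept[names[weakest]]
    return list(kept)


# Hypothetical toy data for eight participants (not study data).
susceptibility = [1, 2, 3, 4, 5, 6, 7, 8]
overlapping = [3, 0, 1, 6, 7, 4, 5, 10]  # correlated with susceptibility
outcome = [2.5, 3.5, 6.5, 7.5, 9.5, 12.5, 13.5, 16.5]
selected = backward_select(
    {"susceptibility": susceptibility, "overlapping": overlapping}, outcome
)
print(selected)
```

The example mirrors the distinction drawn in the text between a variable that correlates with an outcome and one that remains a unique predictor once the other variables are controlled.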

Table 6 Immersion/post-immersion cybersickness predictive path models

4 Discussion

Virtual reality technology has been steadily gaining popularity in the last two decades and is currently being used in a wide array of fields, such as education, psychotherapy, medical training, and tourism, among many others (García-Batista et al. 2020; Li et al. 2017; Loureiro et al. 2020; McHugh 2019; Riva et al. 2016, 2019; Servotte et al. 2020). However, the adoption of this technology has been hampered by sensations of discomfort characterized by symptoms such as nausea, visual fatigue, headaches, disorientation, and dizziness that are collectively known as cybersickness (Rebenitsch and Owen 2016; Weech et al. 2019). Although numerous studies of cybersickness have been carried out in recent years, many of these have been conducted on very small samples that lacked statistical power to detect realistic effects for the social sciences (Weech et al. 2019). In addition, developing a better understanding of the magnitude and effects of cybersickness, as well as creating predictive models of user susceptibility to cybersickness and its aftereffects, have been deemed a priority in the 2020 cybersickness R&D agenda (Stanney et al. 2020). The present study thus attempted to address these issues by broadly examining the cybersickness phenomenon on an adequately powered sample, with three main objectives: (1) to estimate the incidence of cybersickness and its evolution across the length of the VR immersion, (2) to assess the effects of cybersickness on the sense of virtual presence and the adoption of the VR technology, and (3) to identify sets of variables that could predict user susceptibility to cybersickness.

4.1 Main findings

4.1.1 Pervasiveness and longitudinal trajectories of cybersickness

The findings of this study indicate that cybersickness due to VR experiences on HMDs is a pervasive phenomenon. The emergence of cybersickness was measured in two ways: as the level of symptoms reported during the immersion, and as pre- and post-immersion cybersickness scale score comparisons. In terms of the cybersickness levels reported during the 10-min VR immersion, the results of the FMS instrument showed that approximately 65% of the participants experienced some form of cybersickness, with about 24% experiencing severe cybersickness. The proportion of participants experiencing cybersickness found in this study is in line with the estimates presented by Stanney et al. (2020) in their literature review, which indicated that more than 60% of users experienced cybersickness during their first VR exposure. Also, this proportion of users experiencing cybersickness and the intensity of their symptoms is noteworthy considering that the psychotherapeutic VR environments employed in this study contained no features particularly conducive to cybersickness.

The emergence of cybersickness was also corroborated through mean comparisons of the pre- and post-immersion cybersickness scale scores. According to six subscale scores and two total scores from the VRSQ and MSAQ instruments, the post-immersion cybersickness mean scores were significantly higher than the pre-immersion means, thus supporting hypothesis H1 of the study. Regarding the magnitude of the differences, the majority could be characterized as being of medium size, except for the Central scale of the MSAQ, which achieved a large difference. The Central scale measures sensations of dizziness, disorientation, lightheadedness, spinning and being faint-like. Another scale that showed mean differences of notable size was the MSAQ Gastrointestinal scale, which assesses symptoms such as nausea, upset stomach, and inclination to vomit. These results are congruent with the literature, which indicates that symptoms such as dizziness, nausea, and disorientation are the most characteristic of cybersickness (Davis et al. 2014; McHugh 2019; Nesbitt et al. 2017). On the other hand, the scale with the smallest pre- and post-immersion mean differences was Sopite-related from the MSAQ, which assesses symptoms such as irritation, fatigue, uneasiness, and drowsiness. It should be noted that while the Central scale had mostly null cybersickness pre-immersion scores, most of the scales had non-zero scores with non-trivial levels of variance before the VR exposure. This indicates that these scales measure symptoms and feelings that are not uncommon in normal life, such as having headaches, being fatigued, or having eyestrain, and that to better understand the impact of using VR technology it is necessary to obtain pre-immersion baseline estimates of these symptoms (Pot-Kolder et al. 2018).

In general, most of the subscales from the VRSQ and the MSAQ were highly related, with correlations typically ranging between .40 and .70. The subscale that was the most weakly correlated with the rest was Peripheral from the MSAQ, which measures feeling sweaty, clammy/cold sweat, and hot/warm. There was a high level of convergence between the total scores of the VRSQ and the MSAQ, with a large correlation of .69. This suggests a high level of congruence between the total scores of these two instruments, but with enough room for discriminant validity. Furthermore, the last FMS score of the participants at minute 10 of the immersion correlated very highly with the MSAQ Total score (.69), and somewhat lower but still highly with the VRSQ Total score (.55). These results provide additional evidence of the strong relationship between the FMS immersion cybersickness scores and the post-immersion cybersickness scale scores (Keshavarz and Hecht 2011).

Hypothesis H2 of this study posited that there would be a positive relationship between cybersickness and exposure time in the VR environments. Results from latent growth curve modeling (LGCM) supported this hypothesis by showing a significant and positive relationship between the FMS scores and exposure time. A quadratic LGCM fit the data better than the linear model and was deemed optimal. At the average level, the trajectory of cybersickness across the VR exposure was close to linear (the quadratic component was nonsignificant), showing a consistent and important increase in cybersickness with each minute of VR exposure. The variances of both the linear and quadratic components of the LGCM were significant, indicating that the longitudinal trajectories varied across participants. Indeed, a look at the trajectories for each of the participants revealed large variability, going from participants who experienced a very fast increase in cybersickness and had to stop the VR immersion to others who did not experience cybersickness symptoms at any point of the immersion. These results corroborate and extend previous findings in the literature regarding the positive relationship between VR exposure time and cybersickness (Keshavarz and Hecht 2011; Rebenitsch and Owen 2016; Teixeira and Palmisano 2020), while highlighting vast individual differences in the experiences of cybersickness.
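The idea of a mean trajectory with participant-level variation around it can be illustrated with a simple two-stage approximation: a separate quadratic curve is fitted to each participant's ratings by least squares, and the resulting coefficients are then summarized across participants. This is not an LGCM, which estimates the growth factors and their variances simultaneously as latent variables. The measurement times, ratings, and function names below are hypothetical and are not taken from this study.

```python
from statistics import mean, variance


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(times, scores):
    """Least-squares fit of scores = b0 + b1*t + b2*t^2 via the normal equations."""
    powers = [[t ** k for k in range(3)] for t in times]
    xtx = [[sum(row[i] * row[j] for row in powers) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * s for row, s in zip(powers, scores)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical measurement minutes and ratings for three participants.
minutes = [0, 2, 4, 6, 8, 10]
ratings = {
    "no_symptoms": [0, 0, 0, 0, 0, 0],
    "steady_increase": [0, 2, 4, 6, 8, 10],
    "accelerating": [0, 0.4, 1.6, 3.6, 6.4, 10],
}

fits = {name: fit_quadratic(minutes, scores) for name, scores in ratings.items()}
linear_terms = [coeffs[1] for coeffs in fits.values()]
mean_linear = mean(linear_terms)      # average linear change per minute
var_linear = variance(linear_terms)   # between-participant spread in that change
print(mean_linear, var_linear)
```

A nonzero variance of the linear terms in this toy example plays the same role as the significant growth-factor variances reported above: it signals that participants do not share a single trajectory.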

Although the exact causes of cybersickness are currently unknown, as mentioned previously, theories such as the sensory conflict, postural instability, and poison theories are generally the most accepted (LaViola 2000; Mousavi et al. 2013). In this regard, the recent review by Stanney et al. (2020) and the results of Kim et al. (2020) highlight the multisensory reweighting/integration hypothesis, which proposes that susceptibility to cybersickness may be related to the speed at which individuals are able to balance the conflicting multisensory signals produced by the virtual environment. Indeed, the meta-analysis of Caserman et al. (2021) showed that mismatched stimuli (e.g., joystick-based movements) cause a significant increase in cybersickness compared to matched stimuli.

4.1.2 Effects of cybersickness on the virtual reality experience

The second objective of this study was to assess the effects of cybersickness on the sense of presence in the virtual environments and on the adoption of the VR technology. Achieving virtual presence is one of the main objectives of VR developers, since a good sense of presence means that the user has felt that the experience was real and credible (Grassini and Laumann 2020a, b; Weech et al. 2019). However, studies have found that cybersickness can make users lose concentration, leading them to feel less present (Grassini and Laumann 2020a, b; Melo et al. 2017; Weech et al. 2019). Taking this into account, Hypothesis H3 posited that there would be a negative relationship between the intensity of cybersickness and virtual presence. The results obtained supported this hypothesis, as all the significant relationships between cybersickness and the virtual presence variables were negative. In particular, the VRSQ Oculomotor cybersickness scale was the most relevant, as it had medium negative correlations with all the virtual presence scales. The Oculomotor scale includes the symptoms eyestrain, difficulty focusing, general discomfort, and fatigue. It is likely that those participants who experience these symptoms could become more internally focused and less able to process the features of the virtual environment, thus limiting the sense of presence (Weech et al. 2019).

Cybersickness was also posited to be negatively correlated with two key variables related to technology acceptance, perceived enjoyment (hypothesis H4) and intention to use VR in the future (hypothesis H5). The results from this study showed that the two variables correlated negatively with all the cybersickness scores, therefore supporting both hypotheses. In the case of perceived enjoyment, the correlations ranged between medium and large size (− .33 to − .50), with the largest correlations obtained for the VRSQ and MSAQ Total scores, the VRSQ Oculomotor scores and the FMS scores. These negative associations are expected, as the feelings of discomfort due to cybersickness would interfere with the enjoyment of the VR experience, a finding documented in the literature (e.g., Lin et al. 2002; Yildirim 2019). Additionally, the results from this study show that this association is strong and that the overall feelings of cybersickness, considering all the symptoms, are generally the most predictive of the levels of enjoyment experienced by the users. In the case of intention to use VR in the future, the results were similar to those of perceived enjoyment, as all the cybersickness scores correlated negatively with it, albeit with smaller correlations that ranged from small to medium size (− .21 to − .33). Here, the scale scores that had the strongest negative correlations with the intention to use VR in the future were the Oculomotor and Total VRSQ scores, and the MSAQ Gastrointestinal scores. The recent studies conducted by Sagnier et al. (2020) and Hildebrandt et al. (2018) also found similar negative relations. In all, these results indicate that users who suffer more intensely from the symptoms of cybersickness will tend to enjoy the VR experience less and will be less likely to use VR in the future, a finding that underscores the problems created by cybersickness in relation to the mass adoption of VR with HMDs.

Although not an explicit focus of this study, some additional results are worth mentioning as they pertain to the VR experience. In this regard, it should be noted that the sense of virtual presence was positively related with the perceived enjoyment of the VR experience and the intention to use VR in the future. Indeed, it is known that a heightened sense of presence will make the VR experience more attractive and enjoyable (Sylaiou et al. 2010; Tussyadiah et al. 2018). Also, a very strong positive relationship was found between perceived enjoyment and intention to use in the future, a result in line with previous findings (e.g., Karjaluoto and Leppaniemi 2013; Wu and Liu 2007). This result is expected, as the people who enjoy an activity more are more likely to be willing to repeat it (Norazah and Norbayah 2011).

4.1.3 Cybersickness susceptibility

The final objective of this study was to identify groups of predictor variables that could explain the cybersickness symptomatology. In relation to this objective, hypothesis H6 posited that cybersickness would be positively related with motion sickness susceptibility and hypothesis H7 that cybersickness would be positively related with poorer general health. The results supported hypothesis H6, as motion sickness susceptibility was significantly and positively related with all but one of the cybersickness scale scores, with correlations that ranged from small to medium (.24 to .39). Moreover, in the predictive path models of cybersickness this variable was always a significant and positive predictor after statistically controlling for the rest of the predictors. These results are congruent with previous findings indicating that motion sickness and cybersickness share similar symptomatology and are positively related (Golding et al. 2021; Mazloumi Gavgani et al. 2018; Nesbitt et al. 2017; Rebenitsch and Owen 2021).

The path analyses also revealed that cognitive stress, past headaches, and stomach aches were unique predictors positively related with the post-immersion cybersickness scale scores, thus supporting hypothesis H7. Another predictor variable that was consistently and uniquely related with cybersickness was age, with older participants experiencing less cybersickness, a finding consistent with the meta-analysis of Saredakis et al. (2020), but not with the meta-analysis of Howard and Van Zandt (2021), which found no relationship between cybersickness and age. However, the latter authors argue that further research is needed with adequate age ranges to establish the relationship (or lack thereof) between age and cybersickness more firmly. In this regard, there is the possibility of a nonlinear relationship between these variables across the lifespan, with children and elderly adults experiencing greater cybersickness than young and middle-aged adults (Brooks et al. 2010; Howard and Van Zandt 2021; Park et al. 2006). On the other hand, the path analyses revealed that sex was not a unique predictor of cybersickness, although negative correlations between sex and two cybersickness scores (FMS Last score and MSAQ Central) were found, indicating that women experienced more cybersickness than men. This finding is congruent with the meta-analysis of Howard and Van Zandt (2021). It should be noted that sex and motion sickness susceptibility also correlated negatively (i.e., women had higher motion sickness susceptibility scores), so it is possible that the relationship between sex and cybersickness could be mediated by this predictor variable. Overall, the predictors were able to explain between 11% (MSAQ Total) and 41% (VRSQ Oculomotor) of the variance of the post-immersion cybersickness scale scores, with the exception of the MSAQ Gastrointestinal scale, which did not have significant predictors.

A novel finding of this study was the identification of variables that could help explain the latent trajectories of cybersickness during the VR immersion. In this regard, the results showed that steeper increasing trajectories of cybersickness could be found for those with greater susceptibility to motion sickness, poorer general health, and recent experiences of physical stress, cognitive stress, and headaches. A predictive model with motion sickness susceptibility and recent headache history was able to explain 21% of the variance of the linear effects in the latent growth trajectories. In combination, these susceptibility analyses shed light on variables that can help in identifying individuals at greater risk of suffering from high levels of cybersickness.

4.2 Limitations and recommendations

A limitation of this study is that the sample was collected through a non-probabilistic convenience sampling strategy. Because of this, caution should be taken when generalizing to the Dominican population. On the other hand, the size of the collected sample could be considered large for VR studies, making it possible to detect effects of medium magnitude with sufficient statistical power (Weech et al. 2019). Another limitation of note is that, although we used a current-generation VR HMD (Oculus Rift), last-generation devices such as the HTC Vive have been shown to produce lower levels of some cybersickness symptoms (Caserman et al. 2021). Regarding this issue, we support recommendations in the literature for more studies of cybersickness that compare different types of VR HMDs (Caserman et al. 2021; Howard and Van Zandt 2021). Additionally, we recommend more cybersickness research on computer-aided virtual environments (CAVEs), which consist of multiple large screens rendered from the point of view of a person with head tracking (Rebenitsch and Owen 2016). Because CAVEs are expensive, difficult to calibrate and maintain, and require large rooms to implement, they are the least common VR display system in studies of cybersickness (Rebenitsch and Owen 2016).

4.3 Practical implications

The results from this study have shown that cybersickness resulting from VR immersions with HMDs is a common phenomenon that negatively impacts the user experience and the intentions to adopt the VR technology. As there is a strong relationship between intention and actual behavior (Norazah and Norbayah 2011; Sheeran 2002), these results suggest that users who suffer from intense cybersickness could be prevented from enjoying the diverse benefits that the VR technology has to offer. In areas such as education and medical training, for example, these users would be discriminated against if they could not partake in the same formative activities as the other students or trainees. Similarly, users more prone to cybersickness would find it more difficult to start or continue VR psychotherapy treatments due to the negative experiences with this technology. In this line, Stanney et al. (2020) point out that while the application possibilities of VR are unlimited, it may never reach its potential if the problems related to cybersickness cannot be solved. Further, they argue that this could create divisions between susceptible and non-susceptible individuals that could have an impact in many areas of their lives, including job opportunities and career advancement. Thus, research directed toward finding technological solutions to the problem of cybersickness is especially important (Stanney et al. 2020). In this line, there is a need to assess and consider potential adverse effects of VR with HMDs for individuals with autism spectrum disorders and sensory processing disorders (Newbutt et al. 2016; Schmidt et al. 2021). Although the emerging literature of VR interventions for these populations has shown promising results (Bradley and Newbutt 2018; Glaser and Schmidt 2021; Newbutt et al. 2016), there is a lack of understanding regarding how these interfaces and environments might lead to cybersickness when experienced in HMDs by these individuals (Schmidt et al. 2021).

The findings of this research also indicate that there are several predictor variables, and combinations of them, that can explain the intensity of the cybersickness symptoms. Variables such as motion sickness susceptibility, cognitive stress, past headaches, and age were able to explain a notable amount of variance of the various cybersickness scores we examined. These variables could be used to create a screening model of user susceptibility that could identify those individuals at greater risk of suffering high levels of cybersickness. Indeed, there is research that has attempted to find generalizable predictive models of cybersickness (Bockleman and Lingum 2017; Rebenitsch and Owen 2021). If those users that are highly susceptible to cybersickness could be identified with sufficient accuracy, then special VR immersion protocols such as those proposed by Rebenitsch and Owen (2021) could be created for them in order to mitigate as much as possible their experiences of cybersickness. In this regard, the variables identified as predictors of the latent trajectories of cybersickness across the VR immersion are especially relevant, as they are directly related to the feelings of discomfort that can lead users to prematurely terminate the VR experience. Other measures, such as dynamic field-of-view restriction or discrete viewpoint control (Farmani and Teather 2020; Teixeira and Palmisano 2020), can also be implemented to reduce the symptoms of cybersickness.

5 Conclusion

The present research had three main objectives: to estimate the pervasiveness and latent trajectories of cybersickness, to ascertain the impact of cybersickness on the VR experience, and to identify groups of variables capable of predicting the cybersickness symptomatology. Based on a robust sample and in the context of a psychotherapeutic VR environment for HMDs, we were able to determine that cybersickness is a common phenomenon that severely affects an important proportion of VR users. Additionally, the findings showed that cybersickness trajectories across the VR immersion are highly variable across individuals, but that at the mean level there is a strong near-linear increase of symptom severity with added time of VR exposure. Further, the current findings indicate that cybersickness has a notable negative impact on the sense of virtual presence, the enjoyment of the VR experience, and the intentions of future usage, thus highlighting the relevancy of the problem for mass adoption of HMDs and VR programs. Finally, we identified groups of predictor variables that were able to substantially explain the different cybersickness symptoms, as well as explain the latent trajectories of cybersickness during the VR immersion. The variables that uniquely predicted greater severity of cybersickness were motion sickness susceptibility, cognitive stress, recent headaches, and age (being younger). Overall, this study presents a broad overview of the occurrence, consequents, and antecedents of cybersickness that can help guide future research and practice related to VR experiences with HMDs.