1 Introduction

The recent emergence of virtual reality (VR) has contributed immensely to the advancement of technology and increased economic activity in this area. It has also shifted the focus of industry worldwide. In the past two decades, VR has been used in areas such as architecture, education and training, mobile technology, medical visualization, user interfaces, entertainment, and manufacturing (Azuma 1997; Höllerer et al. 1999; Zyda 2005; Pan et al. 2006; Van Krevelen and Poelman 2010; Kesim and Ozarslan 2012; Rodríguez et al. 2015; Stolz et al. 2019). These integrations of technology have enhanced the efficiency and accuracy of tasks in the workplace and daily activities. VR technologies have been combined to create a hybrid of virtual and augmented experiences (Raajan et al. 2012). The technology is still undergoing rapid development and has attracted the most attention in all fields of work and study, particularly in creating virtual content for entertainment, therapies, training practices, applications, and so on (Zyda 2005; Kesim and Ozarslan 2012).

VR appears to have a positive effect on people because the technology improves the experience of immersion, realism, interactivity, and co-existence (Steuer 1992; Psotka 1995; Ryan 1999; Lambooij et al. 2007, 2009; Bailenson et al. 2008; Clemente et al. 2014). However, there are also side effects, especially from head-mounted displays (HMDs) with stereoscopic views, which prevent daily or long-term use of the devices. The technology is known to cause visually induced motion sickness (VIMS) (Hettinger and Riccio 1992; Kennedy et al. 2010; Naqvi et al. 2013) and symptoms such as visual fatigue, anxiety, nausea, disorientation, abdominal discomfort, and oculomotor symptoms (Mon-Williams et al. 1993; Lambooij et al. 2007, 2009; Diels et al. 2007; Bouchard et al. 2011; Carnegie and Rhee 2015). The severity of these adverse effects ranges from mild (some discomfort) to significant (distressing enough to prevent the user from carrying on with the experience). Some studies have even investigated the use of drugs such as hyoscine hydrobromide to relieve the associated motion sickness for a short period (Regan 1995; Regan and Ramsey 1996). This human factor may ultimately hinder the advancement and popularization of the VR industry. To eliminate this possibility, human factors such as VIMS need to be identified carefully and resolved (Bos et al. 2008). Numerous factors may trigger VIMS, including postural instability, such as body swaying, and viewing conditions, such as gaze angle, fixation, retinal slip, and HMD fields of view (Smart et al. 2002; Yokota et al. 2005; Merhi et al. 2007; Diels et al. 2007; Bos et al. 2010; Moss and Muth 2011; Kim et al. 2018). The likelihood that each of these triggers causes VIMS needs to be verified, and guidelines for minimizing VIMS are required to inform developers and users. Therefore, studies on standardized indicators that quantitatively measure VIMS must be conducted.

Previous studies have measured motion sickness using self-reporting, behavior, and physiological responses to improve the viewing experience of VR content. Self-reporting-based studies demonstrated that subjective rating scores from a simulator sickness questionnaire (SSQ) (Merhi et al. 2007; Sharples et al. 2008; Kiryu et al. 2008; Palmisano et al. 2017; Mazloumi Gavgani et al. 2018) and motion sickness susceptibility questionnaire (MSSQ) (Yokota et al. 2005; Kim et al. 2005; Nalivaiko et al. 2015; Chuang et al. 2016; Mazloumi Gavgani et al. 2018) increased when viewers experienced motion sickness. Other researchers measured motion sickness using the Coriolis test (Zuzewicz et al. 2011; Malinska et al. 2015) and the Graybiel and Hamilton questionnaire (Ohyama et al. 2007). In measuring motion sickness through observations of the viewer’s behavior, including head, body, and eye movements, Merhi et al. (2007) reported that movements measured from six positions in the X, Y, and Z axes of the head increased significantly when the subjects experienced motion sickness. In the study by Yokota et al. (2005), body sway (in the X and Y axes) during the presentation of a motion sickness-inducing stimulus was measured and was found to increase. Interestingly, another group reported that the area of the center of gravity (COG) tends to change from being elliptical to circular in shape when the subject experiences visually induced motion sickness (Chardonnet et al. 2015) and Kim et al. (2005) revealed that the rate of blinking is significantly positively correlated with the MSSQ score. Lastly, other researchers measured physiological responses to motion sickness such as heart rate, autonomic balance, respiration, and electroencephalogram (EEG) spectrum.

In terms of cardiac response, many previous studies found that an increased heart rate (Kim et al. 2005; Zuzewicz et al. 2011; Nalivaiko et al. 2015; Malinska et al. 2015) and activated sympathetic nervous system (i.e., increased low frequency (LF) and decreased high frequency (HF) on the heart rate variability (HRV) spectrum) (Uijtdehaage et al. 1992; Gianaros et al. 2003; Yokota et al. 2005; Ohyama et al. 2007; Kiryu et al. 2008; Zuzewicz et al. 2011; Malinska et al. 2015) are associated with motion sickness. In view of the activated sympathetic nervous system in response to motion sickness, one report mentions that skin temperature decreases and the galvanic skin response rises (Kim et al. 2005). Additionally, researchers have shown that respiration increases (Kim et al. 2005; Kiryu et al. 2008) and the spectral power of respiratory and blood pressure is activated (Kiryu et al. 2008) when motion sickness is being experienced. In studies related to brain response, motion sickness was assessed by brain activity observed on an EEG spectrum. Chuang et al. (2016) reported that increased motion sickness is correlated with the activation of alpha and gamma bands in motor, parietal, and occipital areas. The demonstrations of these phenomena have revealed an increase in the neurophysiological demand to process information from multi-modal sensory systems. Lin et al. (2013) showed that brain areas in the left and right motor, parietal, lateral, and midline occipital lobes are activated more than other brain areas while subjects are experiencing motion sickness. Additionally, the relative delta power increases and the relative beta power decreases in F3 and T3 regions (Kim et al. 2005). A summary of the literature associated with measuring motion sickness is given in Table 1.

Table 1 The summary of literature for the motion sickness measurement

The evaluation of motion sickness has thus far been scored subjectively according to predefined questionnaires and described using interview techniques. However, these questionnaires depend on personal interpretation and experience (Cain 2007). Since there are individual differences in interpretation, the repeatability and validity of subjective evaluations are often unclear and uncertain (Annett 2002; Cain 2007). Therefore, subjective rating is limited in that it cannot be used to assess motion sickness quantitatively. Moreover, the measurement of behavior using cameras or other devices is strongly influenced by noise from surrounding lights and movement. Also, the results of these measurements do not consider physiological mechanisms sufficiently to enable the phenomenon of motion sickness to be interpreted accurately. Motion sickness is strongly related to the subject’s capacity to process information and resources for interpreting multi-sensory stimuli that are produced by VR content (Lin et al. 2007, 2013; Chen et al. 2010; Chuang et al. 2016). The cause of VIMS needs to be interpreted in terms of cognitive load, i.e., the large amount of information from VR content that the brain needs to process compared with that from 2D display content. Thus, psychophysiological measures offer a more objective method of assessing motion sickness, through consideration of psychological processes, than behavioral and subjective rating measures. However, the cognitive process is not controlled by brain function alone but also involves the phenomenon of heartbeat evoked potential (HEP). The vagus nerve in the heart and major organs communicates sensory information from the external environment to the brain through efferent and afferent pathways (Davis and Natelson 1993; Porges 1997, 2007). The heart’s response to sensory input has an effect on brain sensory systems such as emotional state, cognitive function, and performance (Hansen et al. 2003; McCraty et al. 2009; Park et al. 2014, 2015). Thus, interpreting the phenomenon of VIMS from fragmentary responses in either the heart or the brain alone is limited.

The purpose of this study was to determine a method for measuring motion sickness from VR content in HMD based on the HEP phenomenon and to propose new indicators for the evaluation of motion sickness. To assess motion sickness, HEP measurements (defined in Figs. 1, 5) taken from ECG and EEG signals collected while subjects viewed HMD content were compared with the same measurements made while subjects viewed 2D content. The new indicators proposed in this study (latency and amplitude of HEP) were compared with one used in a previous study (alpha power of HEP; Park et al. 2015) by employing the multitrait-multimethod (MTMM) matrix to evaluate the reliability of the variables. Finally, the classification features that gave significant results in the HEP measurements were selected, and the motion-sickness and normal states were distinguished by various classifiers: linear support-vector machine (SVM), radial basis function (RBF) SVM, and logistic regression with elastic net, L1 (LASSO model), and L2 (Ridge model) regularization. The accuracies, F1 scores, precisions, recalls, and areas under the curve (AUCs) were compared among the results of the various classifiers.

Fig. 1

Overview for heartbeat evoked potential (HEP)

2 Heartbeat evoked potential

The heartbeat evoked potential (HEP) is a change in alpha brain waves that conveys changes in cardiac output, such as blood pressure, heart rhythm, and variability, from major organs such as the heart to the brain (Schandry and Montoya 1996). The communication occurs through a visceral nerve, the vagus nerve, which transmits visceral-afferent information to various parts of the brain: the hypothalamic and thalamic nuclei, amygdalae, hippocampus, cerebellum, somatosensory cortex, prefrontal cortex, and insula (Warner and Cox 1962; Montoya et al. 1993; Davis and Natelson 1993; Drew et al. 2008). First, the visceral nerve transmits the information to the nucleus tractus solitarius (NTS) in the brainstem. The joint information at the NTS is then sent to the mid-brain, such as the hypothalamus and thalamus, and subsequently arrives at the cortex (Janig 1996). The mid-brain exchanges information with the neocortex, especially the frontal and prefrontal cortices. Some researchers have focused their studies on the premotor and orbital areas of the frontal lobe because of the direct processing of information through visceral-afferent pathways from the mid-brain (hypothalamus and thalamus) to the prefrontal and frontal cortices (Nauta et al. 1986; Fuster 1988; Nieuwenhuys et al. 2007). These areas are related to attention and mental processes (Boussaoud 2001; Hartikainen and Knight 2003; Villena-Gonzalez et al. 2017), thus signifying a relationship between heartbeat and attention. Based on this relationship, a significant post-R-wave response (250–450 ms) was found in the Fz brain region (Schandry et al. 1986). Also, a negative shift was found in the 250–400 ms range of the waveform at the Fz, F7, F8, and Cz brain regions (Schandry and Weitkunat 1990). This significant change in HEP can serve as an indicator for assessing cortical activity such as attention and mental state by measuring brain and heart synchronization (Pollatos and Schandry 2004; Fukushima et al. 2011; Lechinger et al. 2015).

Synchronization of the heart and brain causes the brain to display a corresponding negative peak on the electroencephalogram (EEG) simultaneously with the R-peak of the electrocardiogram (ECG) signal. This synchronized response has previously been characterized as an event-related potential (ERP) and is known to be related to the first and second periods of the HEP (McCraty et al. 2009; Park et al. 2015). The first period of HEP, 50–250 ms after the R-peak, reflects the time interval of the rate of change from heart to brain along the afferent pathway. When the brain has more information to process, heart–brain communication along the afferent pathway increases, and the synchronization in the alpha band also increases (Wölk et al. 1989; McCraty et al. 2009; Park et al. 2015). The second period of HEP, 250–600 ms after the R-peak, indicates communication of both afferent signals and the hydraulic blood pressure wave from heart to brain, as well as alpha synchronization. An increased synchronization of the alpha wave in the first period of HEP indicates an active state of cardiovascular information processing in the afferent pathway (Wölk et al. 1989; McCraty et al. 2009; Park et al. 2015). As shown in Fig. 1, HEP is a characteristic phenomenon arising from the synchronization of heart and brain through afferent pathways in the vagus nerve.

3 Methods

3.1 Participants

A total of 48 undergraduate volunteers of both genders (24 females), aged between 21 and 30 years (mean age, 24.95 ± 2.69 years), participated in this study. First, 28 participants (mean age, 25.04 ± 2.22 years) of both genders (14 females) were recruited for the statistical analysis; the classifiers were trained on the statistically significant features from this group. Then, 20 participants (mean age, 24.86 ± 3.16 years) of both genders (10 females) were recruited as test samples to validate the classifiers. All participants were right-handed and had no family or medical history of cardiovascular, autonomic, or central nervous system disorders. Every participant was asked to abstain from alcohol, cigarettes, and caffeine for 24 h prior to the experiment and to sleep normally. All participants were notified of the restrictions and requirements, and informed consent was obtained from each participant. This research complied with the tenets of the Declaration of Helsinki and was approved by the Institutional Review Board at Sangmyung University (No. BE2018-46).

3.2 Experimental procedure

A pre-task was conducted to measure the sensitivity of participants to motion sickness prior to the main task. The participants were required to view the VR content “Ultimate Booster Experience” (GexagonVR, 2016) through an HTC VIVE device (HTC Inc., Taiwan & Valve Inc., USA) for 10 min and to report their motion sickness using a subjective rating and an interview. The subjective rating was the well-known SSQ (Kennedy et al. 1993), which is composed of three factors: nausea (7 items), oculomotor symptoms (7 items), and disorientation (7 items). The participants were asked to self-report subjective motion sickness on a four-point scale (0–3) for 16 items both before and after viewing the VR content, with the following specifications: (1) nausea (general discomfort, increased salivation, sweating, nausea, difficulty concentrating, stomach awareness, and burping); (2) oculomotor symptoms (general discomfort, fatigue, headache, eyestrain, difficulty focusing, difficulty concentrating, and blurred vision); and (3) disorientation [difficulty focusing, nausea, fullness of head, blurred vision, dizzy (eyes open), dizzy (eyes closed), and vertigo]. Because some items contribute to more than one factor, the 16 items yield 7 items per factor. The subjective rating of motion sickness was calculated as the SSQ score using Eq. (1) (Kennedy et al. 1993):

$${\text{SSQ }}\;{\text{score}} = \left\{ {\left( {N \times 9.54} \right) + \left( {O \times 7.58} \right) + \left( {D \times 13.92} \right)} \right\} \times 3.74$$
(1)

where the values of N, O, and D were calculated by summing the item ratings for nausea, oculomotor symptoms, and disorientation, respectively. Participants who felt no motion sickness, or who felt it severely enough to discontinue the experiment, were excluded from the main experiment. Forty subjects participated in the pre-task; twenty-eight subjects were included in the main experiment.
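As a worked illustration of Eq. (1), the following minimal sketch computes the score from item ratings. The function name `ssq_score` and the example ratings are hypothetical; only the weights and the structure of the formula are taken from Eq. (1).

```python
def ssq_score(nausea, oculomotor, disorientation):
    """SSQ score following Eq. (1).

    Each argument is a list of item ratings (0-3) for one factor.
    """
    n = sum(nausea)          # N: summed nausea items
    o = sum(oculomotor)      # O: summed oculomotor items
    d = sum(disorientation)  # D: summed disorientation items
    return ((n * 9.54) + (o * 7.58) + (d * 13.92)) * 3.74


# Hypothetical ratings for one participant (seven items per factor).
example = ssq_score(
    nausea=[1, 0, 0, 1, 0, 0, 0],
    oculomotor=[1, 1, 0, 0, 0, 0, 0],
    disorientation=[0, 1, 0, 0, 1, 0, 0],
)
```

With these made-up ratings, N = O = D = 2, so the score is (19.08 + 15.16 + 27.84) × 3.74.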

The participants watched the 2D version of the VR content on the first day and the HMD version on the following day, or vice versa. Because this study used a within-subject design, the order of the tasks (i.e., 2D and HMD) was randomly assigned based on a counterbalanced repeated-measures design to minimize sequence/order effects. The content “No Limits 2 Roller Coaster Simulation” (Ole Lange, Mad Data GmbH & Co. KG, 2014) was used in the experiment to induce motion sickness in participants; the 2D and VR versions differed only in display type, and the scene itself was identical. Participants watched the 2D and HMD versions using an LED monitor and an HMD device, respectively. The VR contents used for the pre- and main tasks are shown in Fig. 2.

Fig. 2

The VR contents used for the pre- and main tasks. Top image: VR content used in the pre-task (“Ultimate Booster Experience,” GexagonVR, 2016). Bottom image: VR content used in the main task (“No Limits 2 Roller Coaster Simulation,” Ole Lange, Mad Data GmbH & Co. KG, 2014)

The 2D and HMD versions were presented on a 27-inch LED monitor (27MP68HM, LG) and an HTC VIVE device (HTC Inc., Taiwan & Valve Inc., USA), respectively. The participants were required to self-report subjective motion sickness on a 7-point scale (1–7) in the SSQ both before and after the main experiment. A 5-min reference period was included before and after the VR content. ECG and EEG signals were measured both before and after each VR content viewing period. The experimental procedure and environment are shown in Figs. 3 and 4, respectively.

Fig. 3

The experimental procedure

Fig. 4

The experimental environment and equipment

3.3 Data acquisition and signal processing

EEG signals were recorded at a 500 Hz sampling rate from three channels on the scalp at positions FP1, FPz, and FP2 based on the international “10–10” system (ground: FAz, reference: average between electrodes on the two ears, amplitude: 70 μV, and DC level: 0–150 Hz) and using a Mitsar-EEG 202 machine (Mitsar Inc., Russia). The electrode impedance was kept below 3 kΩ. The FP1, FPz, and FP2 regions were measured because these regions were strongly related to the HEP phenomenon (Montoya et al. 1993; McCraty et al. 2009). ECG signals were recorded at a 500 Hz sampling rate using an amplifier system (ECG 100C amplifiers in BIOPAC system Inc., USA) based on the Lead-I method. The ECG signal was digitized with the DAQ-Board (NI-DAQ-Pad9205 in National Instrument Inc., USA) and MP100 power supply (BIOPAC Systems Inc., USA).

The processing of HEP signals was as follows: (1) ocular and muscular artifacts were removed from the EEG signals by artifact subspace reconstruction (Mullen et al. 2013); (2) R-peaks were detected in the ECG signals using the QRS detection algorithm (Pan and Tompkins 1985); (3) the artifact-free EEG signals were segmented into epochs spanning 50 to 600 ms after each R-peak; (4) these epochs were averaged by the “grand average technique,” and the resulting signal was defined as the HEP signal in this study; (5) the HEP signal was then divided into the two components of interest: the first period of HEP (50–250 ms after the R-peak) and the second period of HEP (250–600 ms after the R-peak). The indicators of amplitude, latency, and alpha power of HEP were defined as follows. The amplitude of HEP was defined as the difference between the dominant positive and negative peaks of the HEP signal in the range 50–600 ms. The latency of the first and second components of HEP was defined as the location (time value) of the dominant positive peak in each period. The alpha power of the first and second HEP components was defined as the relative power of the alpha band in each period, based on the fast Fourier transform (FFT) (Wölk et al. 1989; McCraty et al. 2009; Park et al. 2019). Signal processing and the definitions of the indicators are shown in Fig. 5.

Fig. 5

Examples of signal processing for HEP measurements (alpha power, latency, and amplitude of HEP waveform). A Removing the eye movement and blinking artifacts from EEG signals. B Detecting the R-peak from ECG signals. C EEG signals after removing the artifacts. D Data separation (trial) in EEG signals based on R-peak from ECG signals. E Grand average signal for all trials in EEG signals. F Definition for latency of HEP first and second components and amplitude of HEP. G Definition for alpha power of HEP first and second components
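The epoching, averaging, latency, and amplitude steps described in Sect. 3.3 can be sketched as follows. This is a minimal illustration on synthetic data, not the processing code used in the study: the function names, the half-open window boundaries, and the toy signal are assumptions, artifact removal and R-peak detection are taken as already done, and the FFT-based alpha power is omitted.

```python
def epoch_after_r_peaks(eeg, r_peaks, fs, start_ms=50, end_ms=600):
    """Cut one EEG segment per R-peak, covering start_ms..end_ms after the peak."""
    start = int(start_ms * fs / 1000)
    end = int(end_ms * fs / 1000)
    epochs = []
    for r in r_peaks:
        if r + end <= len(eeg):  # skip peaks too close to the end of the record
            epochs.append(eeg[r + start:r + end])
    return epochs


def average_epochs(epochs):
    """Point-by-point mean across epochs (the averaged HEP waveform)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def peak_latency_ms(hep, fs, win_start_ms, win_end_ms, epoch_start_ms=50):
    """Time (ms after the R-peak) of the largest value inside a window."""
    lo = int((win_start_ms - epoch_start_ms) * fs / 1000)
    hi = int((win_end_ms - epoch_start_ms) * fs / 1000)
    window = hep[lo:hi]
    idx = max(range(len(window)), key=window.__getitem__)
    return win_start_ms + idx * 1000 / fs


def hep_amplitude(hep):
    """Difference between the dominant positive and negative peaks."""
    return max(hep) - min(hep)


# Synthetic single-channel record sampled at 500 Hz (toy values, assumed).
FS = 500
eeg = [0.0] * 2000
r_peaks = [100, 600, 1100]   # R-peak sample indices, assumed already detected
for r in r_peaks:
    eeg[r + 100] = 2.0       # positive peak 200 ms after the R-peak
    eeg[r + 200] = -1.0      # negative peak 400 ms after the R-peak
    eeg[r + 250] = 1.0       # smaller positive peak 500 ms after the R-peak

hep = average_epochs(epoch_after_r_peaks(eeg, r_peaks, FS))
first_latency = peak_latency_ms(hep, FS, 50, 250)    # first HEP period
second_latency = peak_latency_ms(hep, FS, 250, 600)  # second HEP period
amplitude = hep_amplitude(hep)
```

Averaging across epochs time-locked to the R-peak is what suppresses EEG activity unrelated to the heartbeat; on real recordings many more epochs than the three used here would be needed.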

3.4 Statistical analysis

This study was designed to test and compare the viewer’s experience of motion sickness with both 2D and HMD contents in a within-subject design. Therefore, a paired t-test was performed on the sample data after a normality test. In addition, because the independent t-test could not confirm the viewer’s state before watching the VR content, an analysis of covariance (ANCOVA) was also applied in this study. The ANCOVA compared dependent variables between groups after the VR content with the pre-VR content baseline as a covariate (Keselman et al. 2016; McGibbon and Krebs 2004; Park et al. 2014; Mun et al. 2014). Statistical significance was controlled by the Bonferroni correction, which addresses the multiple-comparison problem by dividing the significance level by the number of individual hypotheses (i.e., α = 0.05/n) (Dunnett 1955). For this experiment, the significance level of the HEP measures was set to 0.0033 (HEP indicators: alpha power (6), amplitude (3), and latency (6), α = 0.05/15). The effect size based on Cohen’s d (Morris et al. 2014) and the partial eta-squared value (\(\eta_{p}^{2}\)) (de Morree et al. 2014) were calculated to confirm not only the statistical significance, but also the effect size. In the case of Cohen’s d, standard values of 0.10, 0.25, and 0.40 for effect size were generally regarded as small, medium, and large, respectively. In the case of the partial eta-squared value, standard values of 0.01, 0.06, and 0.14 for effect size were generally regarded as small, medium, and large, respectively (Huck et al. 1974). Also, the MTMM matrix was applied to verify test–retest reliability, convergent validity, and discriminant validity among various motion sickness indicators such as SSQ score, amplitude, alpha power, and latency of HEP in the FP1, FPz, and FP2 regions. When data samples involve multiple traits and multiple methods, the MTMM matrix evaluates the relationships among the multiple measures. This study defined the multi-trait and the multi-method as HEP measures and display types (2D and HMD), respectively. By confirming the monomethod-monotrait (reliability diagonal), monomethod-heterotrait, and heteromethod-monotrait, the test–retest reliability, discriminant validity, and convergent validity were tested and verified (Campbell and Fiske 1959). All statistical data analysis (i.e., subjective ratings, HEP measures, and MTMM matrix) was conducted using IBM SPSS Statistics 21.0 for Windows (SPSS Inc., USA).
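The Bonferroni-corrected threshold for the 15 HEP indicators can be reproduced with a short calculation. The helper names below are hypothetical, and the two p-values passed to `is_significant` are illustrative inputs chosen to fall on either side of the threshold.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, threshold):
    """A test is significant when its p-value is below the corrected level."""
    return p_value < threshold


# 15 HEP indicators: alpha power (6), amplitude (3), and latency (6).
N_HEP_TESTS = 6 + 3 + 6
threshold = bonferroni_alpha(0.05, N_HEP_TESTS)  # 0.05 / 15, about 0.0033

just_below = is_significant(0.00329, threshold)  # passes the corrected level
above = is_significant(0.0108, threshold)        # below 0.05, but fails it
```

The second case shows why the correction matters: a p-value that would pass the conventional 0.05 level is not treated as significant once 15 comparisons are accounted for.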

3.5 Classification

The classification into motion-sickness and normal states was performed by five classification algorithms: linear SVM, RBF–SVM, elastic net regularization, the LASSO model, and L2 (Ridge model) regularization. These algorithms were chosen for their popularity in biomedical data classification (Zhou et al. 2010; Herrera et al. 2013; Li et al. 2016). Fifteen HEP features (six alpha power features and six latency features of the HEP first and second components in the FP1, FP2, and FPz regions; three amplitudes of HEP waveforms in the FP1, FP2, and FPz regions) were extracted from our experimental data, and the ten features showing statistically significant results were used to train the five classification algorithms on a 28-subject dataset. The classification performances of the five algorithms were evaluated by their accuracies, F1 scores, precisions, recalls, and AUCs (James et al. 2013; Saito and Rehmsmeier 2017) on a new dataset of 20 subjects. The classification measures are defined below:

  • Accuracy: proportion of correct predictions among the total number of predictions.

    $${\text{Accuracy}} = \left( {{\text{TP}} + {\text{TN}}} \right)/\left( {{\text{TP}} + {\text{FN}} + {\text{TN}} + {\text{FP}}} \right)$$
    (2)
  • Recall: ratio of correctly predicted positive observations to all observations in the actual class.

    $${\text{Recall}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FN}}} \right)$$
    (3)
  • Precision: ratio of correctly predicted positive observations to all predicted positive observations.

    $${\text{Precision}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FP}}} \right)$$
    (4)
  • F1 Score: harmonic mean of Precision and Recall. Note that this score accounts for both false positives and false negatives.

    $${\text{F1 Score}} = 2*\left( {{\text{Recall}}*{\text{Precision}}} \right)/\left( {{\text{Recall}} + {\text{Precision}}} \right)$$
    (5)
  • AUC: area under the receiver operating characteristic (ROC) curve. An AUC of 0.5 corresponds to a chance-level (bad) classifier and an AUC of 1 to an excellent classifier.

In the above expressions, TP and FN denote the numbers of correctly classified and incorrectly classified motion-sickness instances, respectively, TN is the number of true negative classifications (normal-state instances classified as normal), and FP is the number of false positive classifications (normal-state instances classified as motion sickness).
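Equations (2) to (5) can be checked numerically with a small helper. The function name and the confusion-matrix counts below are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, recall, precision, and F1 score from confusion-matrix counts.

    tp / fn: motion-sickness instances classified correctly / incorrectly.
    tn / fp: normal-state instances classified correctly / incorrectly.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)            # Eq. (2)
    recall = tp / (tp + fn)                               # Eq. (3)
    precision = tp / (tp + fp)                            # Eq. (4)
    f1 = 2 * (recall * precision) / (recall + precision)  # Eq. (5)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for 20 test samples.
metrics = classification_metrics(tp=8, fn=2, tn=7, fp=3)
```

For these counts the F1 score equals 2TP/(2TP + FP + FN) = 16/21, which lies between the precision (8/11) and the recall (0.8), as expected for a harmonic mean.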

3.5.1 Logistic regression: Lasso, Ridge, and ElasticNet

Logistic regression is an appropriate analysis technique for dichotomous (binary) dependent variables. This linear classifier sums the weighted polynomials of features (Pereira et al. 2016). Logistic regression models can be trained with different objective functions. For this purpose, we selected three objective functions: Lasso (Zhang et al. 2012), Ridge (Cessie and Houwelingen 1992), and ElasticNet (Zou and Hastie 2005).
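The three objective functions differ only in the penalty added to the logistic loss. The sketch below shows one common textbook form; the function names, the regularization strength `lam`, the mixing parameter `l1_ratio`, and the scaling of each term are assumptions, since conventions vary between software packages and the exact form used in this study is not stated.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, X, y):
    """Mean negative log-likelihood of a logistic model (labels 0 or 1)."""
    total = 0.0
    for features, label in zip(X, y):
        p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(y)


def penalty(weights, kind, lam=1.0, l1_ratio=0.5):
    """Regularization term: L1 (Lasso), L2 (Ridge), or their mix (ElasticNet)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    if kind == "lasso":
        return lam * l1
    if kind == "ridge":
        return lam * l2
    if kind == "elasticnet":
        return lam * (l1_ratio * l1 + (1 - l1_ratio) * l2)
    raise ValueError(f"unknown penalty: {kind}")


def objective(weights, bias, X, y, kind, lam=1.0, l1_ratio=0.5):
    """Penalized objective that a training routine would minimize."""
    return log_loss(weights, bias, X, y) + penalty(weights, kind, lam, l1_ratio)
```

The L1 term tends to drive some weights exactly to zero, the L2 term shrinks all weights smoothly, and the elastic net interpolates between the two.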

3.5.2 Support vector machine: linear and RBF

SVM is widely used in physiological and biomedical data classification (Diykh and Li 2016). The SVM classifier finds the optimal hyperplane that maximizes the margin between two groups. The hyperplane is determined by a decision function f(x) (Lima et al. 2009), and the classifier assigns the prediction class from its value (positive if f(x) exceeds 1, negative if f(x) is less than −1). We applied a linear SVM with no kernel (Chang et al. 2010) and an SVM with an RBF kernel (Zhao et al. 2011).
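The decision function itself is not reproduced in this text. As a hedged illustration only, the sketch below uses the standard kernel form f(x) = Σ c_i K(s_i, x) + b, where s_i are support vectors and c_i their signed coefficients; all names and numeric values are hypothetical. In the standard formulation, training points outside the margin satisfy f(x) ≥ 1 or f(x) ≤ −1, and a new sample is assigned by the sign of f(x); the sketch uses the sign rule, which is an assumption about how predictions were made here.

```python
import math


def linear_kernel(u, v):
    """Plain dot product (the 'no kernel' case)."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, coefs, bias, kernel):
    """f(x) = sum_i coefs[i] * K(support_vectors[i], x) + bias."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefs)) + bias


def predict(x, support_vectors, coefs, bias, kernel):
    """Assign +1 or -1 from the sign of the decision function."""
    return 1 if decision_function(x, support_vectors, coefs, bias, kernel) >= 0 else -1


# Toy model with two support vectors (hypothetical values, not fitted to data).
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
coefs = [0.5, -0.5]
bias = 0.0

linear_pos = predict([2.0, 0.0], support_vectors, coefs, bias, linear_kernel)
linear_neg = predict([-2.0, 0.0], support_vectors, coefs, bias, linear_kernel)
rbf_pos = predict([1.0, 1.0], support_vectors, coefs, bias, rbf_kernel)
```

Swapping the kernel is the only difference between the linear and RBF variants in this form, which is why the two classifiers can share the same decision and prediction code.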

4 Results

4.1 Subjective rating

For the HMD viewing condition, a paired-samples t-test showed significant differences between the pre- and post-viewing conditions for the total SSQ score (t[54] = −10.801, p < 0.0001, with large effect size [Cohen’s d = 2.940]). However, no significant differences were found between the ratings obtained pre- and post-viewing in the 2D viewing condition (t[54] = 0.050, p = 0.9604, Cohen’s d = 0.014), as shown in Fig. 6. ANCOVA was also performed to further compare the differences in the total SSQ scores between the 2D and HMD viewing conditions. There was a significant difference in the post-viewing total SSQ score, with the pre-viewing total SSQ score as a covariate (F[1, 54] = 127.989, p < 0.0001, with large effect size [\(\eta_{p}^{2}\) = 0.707]), as shown in Fig. 6.

Fig. 6

Average subjective rating for motion sickness in the 2D and HMD conditions. There was a significant difference in the total SSQ score between the 2D and HMD groups based on a paired t-test and ANCOVA (*p < 0.05, **p < 0.01, ***p < 0.001)

4.2 Alpha power, latency, and amplitude of HEP

In assessing the results from the HMD viewing task, a paired-samples t-test showed that the alpha power of the first HEP component post-viewing was significantly higher than that found before the viewing task in FP1 (t[54] = −4.197, p = 0.0001, with large effect size [Cohen’s d = 1.142]), FPz (t[54] = −4.296, p < 0.0001, with large effect size [Cohen’s d = 1.169]), and FP2 (t[54] = −4.258, p < 0.0001, with large effect size [Cohen’s d = 1.159]). No significant differences at the Bonferroni-corrected level (α = 0.0033) were found in the alpha power of the second HEP component in FP1 (t[54] = −2.536, p = 0.0141, with large effect size [Cohen’s d = 0.690]), FPz (t[54] = −2.454, p = 0.0174, with large effect size [Cohen’s d = 0.668]), or FP2 (t[54] = −2.289, p = 0.0260, with large effect size [Cohen’s d = 0.623]). For the 2D viewing tasks, a paired-samples t-test showed no significant pre- and post-viewing difference in the alpha power of either the first or second component in any region, as shown in Fig. 7.

Fig. 7

Average alpha power of the first and second HEP components for the 2D and HMD viewing conditions in the FP1, FPz, and FP2 regions. There was a significant difference in the alpha power of the first HEP component between the 2D and HMD groups in all regions, but the difference in the alpha power of the second HEP component was not significant, based on a paired t-test and ANCOVA (*p < 0.05, **p < 0.0033, ***p < 0.001)

When comparing the 2D and HMD groups based on ANCOVA, the alpha power of the first HEP component in the HMD group was significantly higher than that in the 2D group in FP1 (F[1, 54] = 151.753, p = 0.0001, with large effect size [\(\eta_{p}^{2}\) = 0.254]), FPz (F[1, 54] = 154.865, p = 0.0001, with large effect size [\(\eta_{p}^{2}\) = 0.258]), and FP2 (F[1, 54] = 149.505, p = 0.0001, with large effect size [\(\eta_{p}^{2}\) = 0.257]) with the alpha power of the HEP first component adjusted in the pre-viewing condition as a covariate. No significant differences were found in the alpha power of the second HEP component in regions FP1 (F[1, 54] = 6.326, p = 0.0150, with medium effect size [\(\eta_{p}^{2}\) = 0.107]), FPz (F[1, 54] = 5.957, p = 0.0179, with medium effect size [\(\eta_{p}^{2}\) = 0.101]), and FP2 (F[1, 54] = 5.123, p = 0.0277, with medium effect size [\(\eta_{p}^{2}\) = 0.088]), with the adjusted alpha power of the second HEP component in the pre-viewing condition as a covariate, as shown in Fig. 7.

As an example, one participant’s changes in HEP waveform in the FP1, FPz, and FP2 regions before and after viewing the 2D and HMD content are shown in Fig. 8. There were only minute differences in the dominant positive peak of the first (50–250 ms) and second (250–600 ms) HEP periods before and after viewing in 2D. Interestingly, the dominant peak occurred earlier after viewing the HMD content than before, in both the first and second HEP components. Also, the amplitude (difference between the dominant positive and negative peaks) of the HEP waveform showed only small differences before and after viewing in 2D, whereas the amplitude was lower after viewing the HMD content than before watching it.

Fig. 8

An example of the changes (before and after viewing) in HEP latency and amplitude for both the 2D and HMD viewing conditions in the FP1, FPz, and FP2 regions for Participant 11. The left and right sides display the results for the 2D and HMD conditions, respectively. The top, middle, and bottom rows show the FP1, FPz, and FP2 regions, respectively. The changes in HEP latency and amplitude (before–after values) in the FP1, FPz, and FP2 regions are as follows: latency of the first HEP component (2D: 222–206, 206–208, and 224–222 ms; HMD: 244–220, 246–218, and 242–224 ms), latency of the second HEP component (2D: 548–546, 520–542, and 532–550 ms; HMD: 448–396, 558–390, and 560–394 ms), and amplitude of the HEP (2D: 1.208–1.190, 0.947–0.920, and 3.399–3.223 μV; HMD: 2.907–2.326, 3.226–2.895, and 3.205–2.721 μV)

In analyzing the results of the HMD viewing task, a paired-samples t-test showed that the latency of the first HEP component post-viewing was significantly lower than that for the pre-viewing condition in FP1 (t[54] = 5.990, p < 0.0001, with large effect size [Cohen’s d = 1.630]), FPz (t[54] = 5.886, p < 0.0001, with large effect size [Cohen’s d = 1.602]), and FP2 (t[54] = 6.342, p < 0.0001, with large effect size [Cohen’s d = 1.726]). The latency of the second HEP component post-viewing was also significantly lower compared with the pre-viewing condition in FP1 (t[54] = 4.154, p = 0.0001, with large effect size [Cohen’s d = 1.131]), FPz (t[54] = 4.250, p = 0.0001, with large effect size [Cohen’s d = 1.157]), and FP2 (t[54] = 4.317, p = 0.0001, with large effect size [Cohen’s d = 1.175]). For the 2D viewing tasks, a paired-samples t-test showed no significant pre- and post-viewing difference in the latencies of the first and second HEP components in any region, as shown in Fig. 9.

Fig. 9

Average latency of the first and second HEP components for the 2D and HMD conditions in the FP1, FPz, and FP2 regions. There were significant differences in the latencies of the first and second HEP components between the 2D and HMD groups in all regions based on a paired t-test and ANCOVA (*p < 0.05, **p < 0.0033, ***p < 0.001)

When the 2D and HMD groups were compared using ANCOVA, with the latency of the first HEP component in the pre-viewing condition as a covariate, the adjusted latency of the first HEP component in the HMD group was significantly lower than that for the 2D group in FP1 (F[1, 54] = 47.625, p < 0.0001, with large effect size [\(\eta_{p}^{2}\) = 0.473]), FPz (F[1, 54] = 44.355, p < 0.0001, with large effect size [\(\eta_{p}^{2}\) = 0.456]), and FP2 (F[1, 54] = 57.567, p < 0.0001, with large effect size [\(\eta_{p}^{2}\) = 0.521]). Likewise, with the latency of the second HEP component in the pre-viewing condition as a covariate, the adjusted latency of the second HEP component in the HMD group was significantly lower than that for the 2D group in FP1 (F[1, 54] = 16.236, p = 0.0002, with large effect size [\(\eta_{p}^{2}\) = 0.235]), FPz (F[1, 54] = 17.278, p = 0.0001, with large effect size [\(\eta_{p}^{2}\) = 0.246]), and FP2 (F[1, 54] = 17.919, p = 0.0001, with large effect size [\(\eta_{p}^{2}\) = 0.253]), as shown in Fig. 9.
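
Partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity η_p² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies it to two of the F values above. The function name is hypothetical, and the error degrees of freedom of 53 used in the example are an assumption (56 observations minus two groups minus one covariate), chosen because they reproduce the printed effect sizes; the text itself prints F[1, 54].

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumption: df_error = 53 (not stated in the text, which prints F[1, 54]).
eta_fp1 = round(partial_eta_squared(47.625, 1, 53), 3)
eta_fp2 = round(partial_eta_squared(57.567, 1, 53), 3)
print(eta_fp1, eta_fp2)
```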

For the HMD viewing task, a paired-samples t-test showed that the amplitude of the HEP post-viewing was significantly lower than that pre-viewing in FPz (t[54] = 3.076, p = 0.00329, with large effect size [Cohen’s d = 0.837]) and FP2 (t[54] = 3.557, p = 0.0008, with large effect size [Cohen’s d = 0.968]). The pre- and post-viewing difference in the amplitude of the HEP in FP1 did not reach the corrected significance threshold of p < 0.0033 (t[54] = 2.639, p = 0.0108, with large effect size [Cohen’s d = 0.718]). For the 2D viewing condition, a paired-samples t-test showed no significant pre- and post-viewing differences in the amplitudes of the HEP in any region, as shown in Fig. 10.

Fig. 10

Average amplitude of the HEP for the 2D and HMD conditions in FP1, FPz, and FP2. There was a significant difference in the amplitude of the HEP between the 2D and HMD groups in FP2 but no significant difference in the other regions (FP1 and FPz) based on a paired t-test and ANCOVA (*p < 0.05, **p < 0.0033, ***p < 0.001)

In the ANCOVA comparison between the 2D and HMD groups, with the amplitude of the HEP in the pre-viewing condition as a covariate, the adjusted amplitude of the HEP in the HMD group was significantly lower than that of the 2D group in FP2 (F[1, 54] = 12.475, p = 0.0009, with large effect size [\(\eta_{p}^{2}\) = 0.191]). However, in the other regions, the differences in the amplitude of the HEP did not reach the corrected significance threshold of p < 0.0033: FP1 (F[1, 54] = 6.9976, p = 0.0108, with medium effect size [\(\eta_{p}^{2}\) = 0.116]) and FPz (F[1, 54] = 9.384, p = 0.0034, with medium effect size [\(\eta_{p}^{2}\) = 0.150]), as shown in Fig. 10.
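
The decisions above depend on the corrected threshold of 0.0033 given in the captions of Figs. 9 and 10. A minimal sketch of that decision rule follows. It assumes the threshold arises from a Bonferroni correction of α = 0.05 over 15 comparisons; the count of 15 is an assumption that happens to give 0.0033 and is not stated in this section.

```python
ALPHA = 0.05
N_COMPARISONS = 15  # assumption: yields the 0.0033 threshold used in Figs. 9 and 10
threshold = ALPHA / N_COMPARISONS

# ANCOVA p-values for the HEP amplitude, as reported above.
ancova_p = {"FP1": 0.0108, "FPz": 0.0034, "FP2": 0.0009}
significant = {region: p < threshold for region, p in ancova_p.items()}
print(round(threshold, 4), significant)
```

Only FP2 passes the corrected threshold, even though all three p-values would pass an uncorrected 0.05 level.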

4.3 Correlation analysis

As seen in Fig. 11, we plotted the residuals of the SSQ scores against the residuals of the significant HEP features (AP(1)FP1, AP(1)FPz, AP(1)FP2, L(1)FP1, L(1)FPz, L(1)FP2, L(2)FP1, L(2)FPz, L(2)FP2, AFP2) with linear regression lines. The correlation coefficients between the SSQ scores and each HEP feature in the post-viewing condition, taking the pre-viewing condition into account, were statistically significant (AP(1)FP1: r = 0.531, p < 0.05; AP(1)FPz: r = 0.564, p < 0.05; AP(1)FP2: r = 0.542, p < 0.05; L(1)FP1: r = 0.642, p < 0.05; L(1)FPz: r = 0.625, p < 0.05; L(1)FP2: r = 0.683, p < 0.05; L(2)FP1: r = 0.642, p < 0.05; L(2)FPz: r = 0.643, p < 0.05; L(2)FP2: r = 0.628, p < 0.05; AFP2: r = 0.497, p < 0.05).
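
One common way to correlate post-viewing measures while accounting for the pre-viewing condition is to regress each post-viewing variable on its pre-viewing value and correlate the residuals. The sketch below illustrates that idea. It is an assumption about the procedure, not a description confirmed by the text, and it simplifies by using a single pre-viewing variable for both regressions. All names and the toy numbers are hypothetical.

```python
from statistics import mean


def residuals(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def pearson_r(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5


# Toy values for four hypothetical participants (not study data).
pre = [1.0, 2.0, 3.0, 4.0]
post_ssq = [3.0, 3.0, 5.0, 9.0]
post_feature = [14.0, 9.0, 10.0, 17.0]

r = pearson_r(residuals(post_ssq, pre), residuals(post_feature, pre))
print(round(r, 3))
```

In the toy data both residual series share the same pattern, so the residual correlation is 1 even though the raw post-viewing values are dominated by the pre-viewing trend.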

Fig. 11

Results of correlation analysis among HEP features and SSQ score (p < 0.05 [n = 56])

4.4 MTMM matrix

In our research, the multi-method dimension was defined by the 2D and HMD viewing conditions, and the multi-trait dimension was defined by the measurements selected on the basis of statistical significance: the alpha power of the first HEP component (FP1, FPz, and FP2 regions), the latencies of the first and second HEP components (FP1, FPz, and FP2 regions), and the amplitude of the HEP (FP2 region). The detailed results of the MTMM analysis are shown in Table 2.

Table 2 Multitrait-multimethod matrix

Firstly, test–retest reliability was defined by the main diagonal of the MTMM correlation matrix. The HEP measures (alpha power, latency, and amplitude of the HEP) showed good reliability, in the range of 0.752 to 0.785 (over 0.700), in both the 2D and HMD viewing conditions. The alpha power of the first HEP component in the FP1, FPz, and FP2 regions revealed good reliability in both the 2D (0.785, 0.785, and 0.785) and HMD (0.785, 0.785, and 0.785) viewing conditions. The latencies of the first and second HEP components in FP1, FPz, and FP2 revealed good reliability in both the 2D (first component: 0.761, 0.762, and 0.759; second component: 0.757, 0.762, and 0.758) and HMD (first component: 0.759, 0.790, and 0.761; second component: 0.759, 0.761, and 0.753) viewing conditions. The amplitude of the HEP in FP2 revealed good reliability in both the 2D (0.782) and HMD (0.782) viewing conditions. The SSQ score showed moderate reliability in both the 2D (0.461) and HMD (0.472) viewing conditions, which was lower than that of the HEP features.

Secondly, discriminant validity was defined by the heterotrait–monomethod triangles. The correlation coefficients between the SSQ score and the latency of the second HEP component (FP1, FPz, and FP2 regions) showed low-to-medium negative correlations ranging from −0.241 to −0.448, and the SSQ score showed no significant correlations with the other HEP features. The SSQ score therefore had discriminant validity with respect to the HEP features. The correlation coefficients between the alpha power and the latency of the first HEP component (FP1, FPz, and FP2 regions) revealed a strong negative correlation in the range of −0.914 to −0.928. In contrast, the corresponding coefficients for the latency of the second HEP component demonstrated a medium negative correlation ranging from −0.354 to −0.422, and those for the amplitude of the HEP showed a medium negative correlation ranging from −0.321 to −0.416. The latency of the first HEP component therefore did not have discriminant validity with respect to the alpha power of the first HEP component, whereas the other measures did.

Lastly, convergent validity was defined by the monotrait–heteromethod correlations (the validity diagonal). The HEP latency measures showed higher correlations (0.529–0.686) than the other measures (SSQ score: 0.246; alpha power of the first HEP component: 0.451–0.471; amplitude of the HEP: −0.367). In particular, the correlation coefficients for the latency of the first HEP component showed the highest positive correlation (0.639–0.686).
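
The block structure used in this analysis can be sketched as follows. Given one series of per-participant values for each (trait, method) pair, every pairwise correlation falls into one of three MTMM blocks: monotrait–heteromethod (convergent validity), heterotrait–monomethod (discriminant validity), or heterotrait–heteromethod. The function names and the toy data are hypothetical, and the reliability diagonal is omitted because it requires repeated measurements that this sketch does not model.

```python
from itertools import combinations
from statistics import mean


def pearson_r(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5


def mtmm_blocks(measures):
    """Group pairwise correlations of (trait, method) measures by MTMM block."""
    blocks = {
        "monotrait-heteromethod": {},
        "heterotrait-monomethod": {},
        "heterotrait-heteromethod": {},
    }
    for (key_a, a), (key_b, b) in combinations(measures.items(), 2):
        if key_a[0] == key_b[0]:
            kind = "monotrait-heteromethod"    # same trait, different method
        elif key_a[1] == key_b[1]:
            kind = "heterotrait-monomethod"    # different trait, same method
        else:
            kind = "heterotrait-heteromethod"  # different trait and method
        blocks[kind][(key_a, key_b)] = pearson_r(a, b)
    return blocks


# Toy values for five hypothetical participants (not study data).
measures = {
    ("latency", "2D"): [220.0, 230.0, 210.0, 240.0, 225.0],
    ("latency", "HMD"): [200.0, 212.0, 188.0, 221.0, 204.0],
    ("ssq", "2D"): [5.0, 3.0, 8.0, 2.0, 6.0],
    ("ssq", "HMD"): [30.0, 22.0, 41.0, 18.0, 35.0],
}
blocks = mtmm_blocks(measures)
for kind, pairs in blocks.items():
    print(kind, {k: round(v, 3) for k, v in pairs.items()})
```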

4.5 Classification

In this experiment, the motion-sickness and normal groups were classified by linear SVM, RBF–SVM, elastic net regularization, the LASSO model (L1 regularization), and the Ridge model (L2 regularization). Ten features were used for classification because they were statistically significant (see Figs. 7, 9, and 10): three alpha powers of the first HEP component in the FP1, FPz, and FP2 regions, six latencies of the first and second HEP components in the FP1, FPz, and FP2 regions, and the amplitude of the HEP waveform in the FP2 region. The classification algorithms were trained on the 28-subject dataset, and their performance was evaluated on a new 20-subject dataset. The results of the comparison between the 2D and HMD conditions for the significant features in the 20-subject dataset are shown in Table 3.
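
The train-then-validate procedure described above can be sketched with one of the five model families, a Ridge (L2) regularized logistic regression. The sketch is fitted by plain batch gradient descent, which is an assumption: the text does not say which solver or library was used. The two standardized features, the toy samples, and the hyperparameter values are likewise hypothetical and much smaller than the ten-feature datasets in the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_ridge_logistic(X, y, l2=0.1, lr=0.1, epochs=500):
    """Fit logistic regression with an L2 penalty on the weights (bias unpenalized)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Predict 1 (motion sickness) when the linear score is positive, else 0."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]


# Toy standardized features [latency_z, amplitude_z]; label 1 = motion sickness.
X_train = [[-1.2, -0.8], [-0.9, -1.1], [-1.0, -0.5], [1.1, 0.9], [0.8, 1.2], [1.0, 0.6]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[-0.7, -0.9], [0.9, 0.7]]  # held-out samples, never seen in training
y_test = [1, 0]

w, b = fit_ridge_logistic(X_train, y_train)
y_pred = predict(X_test, w, b)
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(y_pred, accuracy)
```

Keeping the test samples out of the fitting step mirrors the separation between the 28-subject training data and the 20-subject validation data.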

Table 3 The analysis results of HEP measurements for validation data (n = 20) between 2D and HMD conditions

The logistic regression classifiers (elastic net regularization and logistic regression with L1 and L2 regularization) clearly distinguished motion sickness from the normal state, with classification accuracies of 0.850, 0.900, and 0.898, F1 scores of 0.870, 0.870, and 0.851, precisions of 0.769, 0.769, and 0.741, and AUCs of 0.895, 0.900, and 0.898, respectively. The recall was 1.0 for all three algorithms (Table 4). The regularization parameters were α = 4.8 for LASSO, α = 3.5 for Ridge, and α = 2.7 and γ = 0.98 for elastic net. The linear and RBF–SVM classifiers also distinguished between motion sickness and the normal state, with classification accuracies of 0.850 and 0.875, F1 scores of 0.870 and 0.865, precisions of 0.769 and 0.941, recalls of 1.0 and 0.8, and AUCs of 0.898 and 0.963, respectively (Table 4). The regularization parameters were α = 1.7 and γ = 9.4 for RBF–SVM, and α = 3.5 for linear SVM. The ROC curves used to evaluate classification performance and the t-distributed stochastic neighbor embedding (t-SNE) plot used for vector visualization are shown in Fig. 12.
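
The relationship among accuracy, precision, recall, and F1 score can be made concrete with a small sketch that derives all four from confusion-matrix counts. The counts below are hypothetical: they assume 40 labeled test samples (20 subjects in two viewing conditions) and were chosen because they are consistent with the RBF–SVM figures quoted above. The text reports only the summary metrics, not the underlying counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (assumption): 20 motion-sickness and 20 normal samples.
rbf_svm = {k: round(v, 3) for k, v in classification_metrics(tp=16, fp=1, fn=4, tn=19).items()}
print(rbf_svm)
```

A recall of 1.0 with lower precision, as reported for the logistic regression models, corresponds to no missed motion-sickness samples at the cost of some normal samples being flagged.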

Table 4 Comparison of the performance of different types of classifiers

Fig. 12

ROC curves (left) for five classification methods and t-SNE data plot (right)

5 Discussion

VIMS is a major obstacle to the development of the VR industry and of HMD devices in particular. Many previous studies have tried to measure motion sickness in order to resolve this problem. However, the previously proposed methods had limitations, and no standardized method has yet been established. The aim of this study was to develop an advanced method for measuring motion sickness based on cognitive function using heart–brain synchronization by studying HEPs. This study proposed new indicators, namely the latency and amplitude of the HEP, to assess motion sickness and compared them with the alpha power of the HEP from a previous study using the MTMM matrix. The SSQ was used to confirm whether the 2D and HMD viewing conditions caused motion sickness. According to the subjective ratings obtained from the SSQ, participants experienced motion sickness after the HMD viewing task, but not after the 2D viewing task. This result is consistent with previous studies (Kennedy et al. 1993; Merhi et al. 2007; Sharples et al. 2008; Kiryu et al. 2008; Chardonnet et al. 2015; Palmisano et al. 2017).

Overall, our research yields three significant findings. Firstly, the alpha powers of the first HEP components in the FP1, FPz, and FP2 regions were significantly lower when motion sickness was being experienced. In previous studies, brain sensory processing was found to be influenced by changes in heart rhythm via afferent and efferent pathways, which are related to cognitive functions (Hansen et al. 2003; McCraty et al. 2009; Park et al. 2014, 2015). An increase in the alpha power of the first HEP component is related to the time interval in which “rate of change” information is transmitted from the heart to the brain through afferent pathways in the vagus nerve (Wölk et al. 1989; McCraty et al. 2009). Park et al. (2015) reported that the alpha power of the first HEP component increased during cognitive loading, and that result is consistent with this research. If information about cardiac rhythm is transmitted rapidly to the brain (increasing the alpha power in the first HEP component), the brain requires information rapidly through sensory input to activate cognitive processing. Thus, as determined by this study, the increasing alpha power of the first HEP component can be interpreted as showing that cognitive load is the cause of motion sickness. Many previous studies have also demonstrated that motion sickness is strongly related to the cognitive load caused by experiencing VR content (Lin et al. 2007, 2013; Chen et al. 2010; Chuang et al. 2016). An increase in the alpha power of the second HEP component is related to the time taken for the pulse wave from the heart to be transmitted to the brain (Wölk et al. 1989; McCraty et al. 2009). If the pulse wave is rapidly transmitted to the brain, the brain requires blood flow to achieve increased information processing. The alpha power of the second HEP component rose during cognitive loading (Park et al. 2015). 
In this study, the alpha power of the second HEP component in all brain regions tended to decrease during the experience of motion sickness, but this change was not statistically significant after the Bonferroni correction.

Secondly, the latencies of the first and second HEP components in FP1, FPz, and FP2 were significantly lower during motion sickness. This result can also be interpreted in terms of cognitive load. As mentioned above, the first and second components of the HEP are strongly related to the rate of information transfer from the heart to the brain (Wölk et al. 1989; McCraty et al. 2009; Park et al. 2015). An increase in the transfer rate is highly correlated with cognitive load, based on the alpha power (Park et al. 2015). Decreases in the latencies of the first and second HEP components are likewise related to the rate of information transfer from the heart to the brain. The first and second components represent the average time in which cardiac rhythm information is transferred from heart to brain (range 50–250 ms and 250–600 ms after the R-peak) (Wölk et al. 1989; McCraty et al. 2009; Park et al. 2015). Moreover, brain waves in prefrontal and frontal areas are influenced by information about the cardiac rhythm (Schandry et al. 1986; Wölk et al. 1989; Schandry and Weitkunat 1990; McCraty et al. 2009; Park et al. 2015). In this study, the dominant response (peak) of the HEP waveform was extracted from the micro-response in the brain wave caused by the heartbeat using signal averaging techniques. The location of the dominant peak (latency) revealed the time taken for information to be transmitted from the heart to the brain through afferent pathways in the vagus nerve. If the latencies of the first and second HEP components decrease, the transfer rate from the heart has increased in response to the requirements of the brain. Thus, decreases in the latency of the HEP can be interpreted as reflecting cognitive load and can therefore be used to assess motion sickness quantitatively.

Lastly, the amplitude of the HEP in FP2 was significantly reduced during motion sickness and revealed a medium negative correlation (−0.321 to −0.416) with the alpha power of the first HEP component, which was the indicator of cognitive load.

The HEP waveform is the potential evoked by the heartbeat, in a similar way to the ERP response evoked by an event stimulus. The amplitudes of ERPs are strongly related to high-level cognitive processing measures such as task difficulty, selective attention, and mental workload (Friederici et al. 1993; Uetake and Murata 2000; Kok 2001; Murata et al. 2005; Cheng et al. 2007; Li et al. 2008; Kato et al. 2009; Miller et al. 2011; Mun et al. 2012, 2014; Kathner et al. 2014; Park and Mun 2015; Chang et al. 2017; Getzmann et al. 2018). ERP amplitude reflects the inhibition function in the brain. For low-level external stimuli, the brain reduces inhibition to process the information efficiently, which results in a large ERP amplitude (Polich 2007). Thus, a decrease in ERP amplitude is closely related to cognitive load, and a decrease in the amplitude of the HEP waveform can be interpreted in the same context.

The HEP measures such as alpha power, latency, and amplitude found in this study were significantly different when comparing the HMD (motion sickness inducing) and 2D viewing conditions. These measures are strongly related to the cognitive mechanisms underlying the mental workload. Thus, the phenomenon of motion sickness can be interpreted as the degradation of the human visual system due to sensory overload. This is similar to the 3D visual fatigue found in previous studies (Li et al. 2008; Lambooij et al. 2009; Mun et al. 2012; Park et al. 2014, 2015). The results presented here enable the quantitative assessment of motion sickness and will assist with the establishment of guidelines regarding HMD-based viewing of VR content.

The MTMM matrix showed that all HEP measures were considerably more reliable than the SSQ score and were internally consistent in both the 2D and HMD conditions. The HEP measures therefore provided reliable repeated measurements and consistent, high correlations across methods (2D and HMD). In terms of discriminant validity, the alpha power of the first HEP component could not be discriminated from the latency of the first HEP component but could be discriminated from the other measures, such as the latency of the second HEP component and the amplitude of the HEP. HEP alpha power is a validated indicator of cognitive load (Park et al. 2015), and the correlation coefficient between the alpha power and the latency of the first HEP component revealed a strong negative correlation. Generally, electrophysiological features show higher reliability than non-electrophysiological measures such as subjective ratings (Park et al. 2015). Consistent with this, we found that the reliability of the SSQ score differed markedly from that of the electrophysiological measures. Convergent validity (monotrait–heteromethod) was defined by the correlation between two measures of the same trait obtained with two different methods (2D and HMD). Because the two measures are of the same trait, they are expected to be strongly correlated. The latency of the first HEP component had a higher correlation across methods than the other measures. In summary, the latency of the first HEP component was highly correlated with HEP alpha power, which is well known to be associated with cognitive load, and had higher test–retest reliability and convergent validity than the other measures. This measurement is therefore recommended as providing a better quantitative evaluation of motion sickness and cognitive load than alpha power and the other measures.

Among the algorithms for classifying the motion-sickness and normal groups, the RBF–SVM achieved the highest recognition accuracy (0.964 on the training set and 0.875 on the test set). Hence, RBF–SVM is a promising classifier for motion sickness. To put these findings in context, we compared our methods and results with those of similar studies. In previous studies, the accuracy of recognizing motion sickness was 0.793–0.996 on the training set and 0.721 on the test set (one example), as shown in Table 5. In terms of accuracy, sample size, and validation results, our methods outperformed the existing state-of-the-art classification methods for motion-sickness detection.

Table 5 Performance comparison of our method and previous methods in motion-sickness recognition

6 Conclusion

The aim of this study was to determine a method for measuring the motion sickness experienced while watching VR content on an HMD using the HEP phenomenon and to propose a new indicator for evaluating motion sickness (cognitive function). This study confirmed that motion sickness leads to a decay in cognitive processing in the brain caused by multi-sensory input, as demonstrated by reductions in the alpha power of the first HEP component in regions FP1, FPz, and FP2. The proposed indicators, namely the latency (first component in FP1, FPz, and FP2) and amplitude (FP2) of the HEP waveform, were also significantly different when participants experienced motion sickness and showed high correlations with alpha power measures (cognitive load). In particular, the latency of the first HEP component is recommended as a better quantitative evaluator of motion sickness (cognitive load) than alpha power and the other measures, based on the test–retest reliability, discriminant validity, and convergent validity verified by the MTMM matrix. Because the HEP measures were extracted from the HEP waveforms evoked by the heartbeat, the proposed method is more flexible than offline methods such as the ERP method, which requires specific tasks. In addition, our proposed method implemented with RBF–SVM classified the motion-sickness state more successfully than state-of-the-art recognition methods for motion sickness, demonstrating higher performance than previous studies. The proposed HEP measurement method is useful for quantifying motion sickness and determining the optimal viewing parameters of VR content, including viewer characteristics, viewing environment, content, and device factors. These results will support the popularization of VR and invigorate the development of future VR with reduced negative side effects.