1 Introduction

Virtual reality (VR) has gained popularity as a clinical intervention and assessment tool, for example as VR exposure therapy (VRET) for treating PTSD (Deng et al. 2019), anxiety disorders (Carl et al. 2019) and multiple phobias (Wechsler et al. 2019). The advantage offered by VR over traditional technology is a greater sense of presence, which is thought to enhance the effectiveness of exposure-based interventions (Cummings and Bailenson 2016). Presence represents the extent to which an individual experiences the digital environment as the one in which they are consciously present (Cummings and Bailenson 2016). Presence is directly related to immersion, which is the perception of being enveloped by an environment that provides a continuous flow of sensory information (Witmer and Singer 1998). The immersive capabilities of a system, such as the ability to isolate the user from their physical environment, are key factors in generating a greater sense of presence (Witmer and Singer 1998). However, there are different hypotheses as to how these immersive capabilities of VR systems generate greater presence at basic processing levels (Freeman et al. 2005; Wirth et al. 2007). Moreover, existing theoretical frameworks that have so far been overlooked in this research field may allow us to further tease apart the different psychological mechanisms involved in those processes (cf. Bradley and Lang 2000). Understanding how immersive capabilities underpin this sense of presence in VR relative to standard technologies would not only be theoretically useful but may also provide important insight into refining VR-based exposure therapies.

Presence is widely regarded as a key factor in facilitating real emotional responses to virtual environments (Parsons and Rizzo 2008; Price et al. 2011). Indeed, presence has been associated with elicitation of greater levels of fear (Price et al. 2011) and anxiety (Ling et al. 2014), which are essential for exposure therapy to be successful (Benito and Walther 2015). Moreover, virtual threat in VRET generates as much distress, anxiety and physiological arousal as in-vivo exposure to threat (Owens and Beidel 2015; Kishimoto and Ding 2019). While the sense of presence appears to be directly associated with the effectiveness of VRET, little research has been conducted to systematically identify the fundamental factors that influence presence in VR compared to standard modes of presentation (such as computer monitors/screens). These factors can broadly be divided into two groups: system-driven factors of the VR hardware and media-driven factors of the virtual environment. Therefore, the aims of the current study were to examine (1) the system-driven factors that are exclusive to VR devices, while controlling for general factors such as field of view and image quality; (2) the media-driven factors of the virtual environment that elicit motivational salience through different levels of arousal and valence (relaxing, exciting and fear evoking stimuli); and (3) the effects of presence on magnifying affective response.

1.1 System-driven factors in the elicitation of presence

Although VR hardware can take many forms, the most common is the head-mounted display (HMD), a goggle-like device that displays stereoscopic images coupled with rapid head tracking and a large field of view. HMDs elicit higher levels of presence than standard viewing modes (Cummings and Bailenson 2016), which is thought to lead to an increase in engagement with the displayed media compared to standard viewing modes (Buttussi and Chittaro 2017). The greater sense of presence experienced in VR compared to standard viewing is likely due to the superior immersive capabilities of HMDs (Diemer et al. 2015). A meta-analysis of research using immersive technology suggests that overall, system-driven immersive factors (including stereoscopic vision, tracking level such as head tracking or controller-driven tracking, image quality, field of view, sound quality and update rate) have a medium effect size on presence. In particular, head tracking, stereoscopic vision and large field of view (FoV) have the largest effect sizes (Cummings and Bailenson 2016). Of these immersive factors, only stereoscopic vision and head tracking are exclusive to HMDs when compared to standard viewing modes. However, previous studies examining elicitation of presence between HMDs and non-immersive technologies have not controlled for the system-driven immersive capabilities that are not exclusive to HMDs (Cummings and Bailenson 2016). The superiority of VR over other techniques may simply be due to the large FoV that HMDs offer. Indeed, larger FoV has been associated with larger effect sizes in anxiety generation in VRET studies (Ling et al. 2014). However, large FoVs can be achieved with standard viewing modes, which may thus be just as immersive.
Therefore, the current study aims to test the ability of HMDs to elicit higher levels of presence than standard viewing mode, through careful control of non-HMD specific immersive factors (FoV, image quality, sound quality and update rate), allowing for a systematic examination of the role of the HMD exclusive combination of head tracking and stereoscopy on presence.

1.2 Media-driven factors in the elicitation of presence

The media viewed can also affect users’ sense of presence. For example, first-person point of view (Cummings and Bailenson 2016; Ling et al. 2013a, b) and use of emotion eliciting narrative, such as narratives that promote a sense of urgency (Gorini et al. 2011), have consistently been found to elicit greater presence. Consequently, eliciting intense emotional reactions is often the aim when developing entertainment in virtual worlds. One important media-driven factor affecting presence is thought to be the level of arousal elicited by the stimuli (Freeman et al. 2005). It has been argued that increased arousal causes users to become more alert, ready to respond to events that require action and attentive towards the presented stimuli (Carretié 2014). The arousal theory fits with models of the formation of presence that propose that attention needs to be directed to the virtual environment, rather than the actual environment, to achieve a sense of presence and resolve conflict between spatial situational models (Wirth et al. 2007). However, the arousal theory does not account for stimulus valence which is central to generating motivational salience in emotion processing (Bradley and Lang 2000).

Theoretically, emotion is fundamentally organised around two motivational systems: appetitive and defensive (Bradley et al. 2001). The appetitive system is activated in positive contexts that are likely to provide a feeling of pleasure. The defensive system is activated in response to threat and a feeling of displeasure. Both systems have common outputs mediating physiological systems involved in attention (Davis 2000; Davis and Lang 2001; LeDoux 1990). Importantly, valence and arousal are two independent dimensions contributing to motivational salience: the valence of a stimulus activates the motivational system, whereas arousal indicates the intensity of that emotion. In line with this theory, any appetitive or aversive stimuli have some motivational salience (independent of level of arousal) and would activate the motivational systems that mediate attention. Hence, presence should still be achieved in low arousal, motivationally salient environments (i.e. relaxing ones). Indeed, presence tends to increase for emotional stimuli with negative valence, for example fear (Alsina-Jurnet et al. 2011; Bouchard et al. 2008; Price and Anderson 2007; Riva et al. 2007) and sadness (Baños et al. 2004), as well as for stimuli with positive valence, such as joy and relaxation, compared to neutral environments (Baños et al. 2008; Riva et al. 2007). Findings that presence is elicited by relaxing stimuli at similar levels to anxiety (Riva et al. 2007) are seemingly in contrast to the arousal theory (Freeman et al. 2005), according to which relaxing stimuli should not elicit a strong sense of presence as they are less arousing compared to anxiety evoking stimuli. Nevertheless, even within the motivational salience account, it would still be the case that greater arousal (as an independent dimension) generates more intense motivational salience of aversive or appetitive stimuli, leading to a greater sense of presence.
The current study aims to test the motivational salience proposition by systematically examining the role of different levels of arousal and valence in the generation of presence.

1.3 Subjective and objective assessment of presence

Subjective measures have been typically used to quantify user experience of presence (Buttussi and Chittaro 2017; Makransky et al. 2019). With these, sense of presence can be broken down into three components: sense of physical space, ecological validity, and engagement (Freeman et al. 2005). Sense of physical space is the users’ perception of being located in the virtual environment and is mostly determined by system-driven factors. Ecological validity refers to how real the user feels the virtual environment seems and is determined by a mixture of system- and media-driven factors. Engagement is the users’ interest in interacting with the content of the environment and is mostly determined by media-driven factors. Thus, by examining both system- and media-driven factors, the current study will examine their individual and combined effects on these aspects of presence.

The objective assessment of presence, and thus its underpinning psychophysiological markers, is less clear. Previous studies have used heart rate (HR) and galvanic skin response (GSR) as indicators of presence (Kobayashi et al. 2015). However, reliance on HR and GSR to measure presence due to arousal (cf. Freeman et al. 2005) presents a method artefact, as HR and GSR are indicators of arousal itself (Bradley et al. 2001). Indeed, previous research linking presence to HR and GSR has been limited to arousing stimuli, such as rollercoaster simulations or urgent situations (Baumgartner et al. 2006; Gorini et al. 2011). Though HR and GSR are useful indicators of arousal and have been widely applied in emotion research (Burriss et al. 2007; Lang et al. 1998), they are less direct indicators of presence due to motivational salience.

Electroencephalography (EEG) offers a direct assessment of brain function and can be used as an indicator of presence. EEG studies have found that greater frontal alpha and reduced parietal alpha were indicative of presence (Kober et al. 2012; Baumgartner et al. 2006), which fits with theoretical models of presence generation (Wirth et al. 2007). Alpha band power (8–13 Hz) purportedly reflects active neuronal inhibition and is inversely related to cortical activity (Pfurtscheller 1989). The parietal lobe is active in spatial processing and attention (Shomstein and Gottlieb 2016), and thus, reduced parietal alpha would indicate greater attention to the virtual environment, which is key in the formation of presence (Wirth et al. 2007). The dorsolateral prefrontal cortex (DLPFC) is involved in executive control and conflict adaptation (Mansouri et al. 2017), and thus, higher frontal alpha may indicate less conflict between spatial situational models, which needs to be resolved to generate presence (Wirth et al. 2007). Indeed, this connection is supported by fMRI studies showing that activity in the DLPFC is inversely related to parietal activity and subjective presence ratings (Baumgartner et al. 2008). Another potential EEG marker of presence is the frontal theta/beta ratio (TBR). TBR reflects top-down processing that is mediated by the DLPFC, such as attentional control (van Son et al. 2019a, b). TBR has specifically been found to be a reliable biomarker of attentional control, with lower TBR representing greater attentional control (Angelidis et al. 2016; Putman et al. 2010). TBR is also positively correlated with mind wandering, which represents a reduction in attention to the current task (van Son et al. 2019a, b).
According to theoretical models, presence relies on acceptance of the virtual environment's spatial situational model. Lower TBR should therefore indicate greater attention allocation to the virtual environment, which is more likely to lead to acceptance of that spatial situational model and thus be indicative of greater presence (Wirth et al. 2007). Logistical constraints on using EEG within a VR environment include poor compatibility between large EEG arrays and most HMDs. Nevertheless, novel low-density EEG devices can be worn alongside the HMD to test brain mechanisms underpinning presence in VR.

1.4 Research aims

The aims of the current study were to, firstly, examine whether HMD specific immersive factors (head tracking and stereoscopy) elicit greater presence, as measured by objective EEG indicators and subjective ratings of presence, in VR compared to standard viewing modes, while controlling for non-HMD specific immersive factors, specifically field of view. Secondly, this study aimed to examine the effect of media-driven factors, specifically the role of motivational salience, on the elicitation of presence based on theoretical frameworks of emotion processing (Bradley and Lang 2000). Accordingly, a sense of presence will be generated for any valenced stimuli, including relaxing stimuli, as it would lead to activation of motivational systems that mediate attention; however, presence will be greater for more arousing stimuli due to intensifying the activation in the motivational systems. Research to date has not examined the effects of valence and arousal on presence. The final aim is to examine to what extent more immersive technologies and sense of presence magnify user effects (Cummings and Bailenson 2016), and thus, influence affective experience in VR as compared to standard viewing mode. Therefore, it was hypothesised that:

  • 1. HMD viewing mode will elicit greater presence than a projected viewing mode when controlling for all non-HMD specific immersive factors.

  • 2. Motivational salience elicits the sense of presence such that:

    • a. Appetitive and aversive stimuli elicit sense of presence due to their motivational salience independent of their level of arousal.

    • b. Nevertheless, more arousing stimuli will elicit a greater sense of presence.

  • 3. As a greater sense of presence is proposed to magnify user effects in VR (Cummings and Bailenson 2016), we hypothesise that the respective emotional responses will be greater in HMD viewing mode, such that:

    • a. Subjective ratings of fear when viewing fear evoking stimulus will be greater in HMD viewing mode.

    • b. Subjective ratings of excitement when viewing exciting stimulus will be greater in HMD viewing mode.

    • c. Subjective ratings of serenity when viewing relaxing stimuli will be greater in HMD viewing mode.

2 Method

2.1 Design

This study used a within-participants 2 (display type) × 3 (stimulus type) factorial design to assess the effect of system- and media-driven immersive capabilities on presence and affective experience. The independent variables were display type (HMD or monoscopic wall projection) and stimulus type (fear evoking, relaxing and exciting). The dependent variables were subjective (Presence, SAM and PANAS questionnaires) and objective measures (alpha power and theta/beta ratio) of presence and affective experience. Full ethical approval was obtained from the local ethics committee following British Psychological Society guidelines.

2.2 Participants

Participants were 14 students (10 males, 4 females, mean age = 22.9, SD = 1.71) studying at Nottingham Trent University. All participants had normal or corrected-to-normal vision. Participants were recruited through an on-going psychometric validation study for immersion in entertainment media and were offered research credits for their participation.

2.3 Procedure

The study was conducted in a VR laboratory using the following experimental protocol. On arrival at the laboratory, EEG hardware was attached to the participants. Participants completed initial psychometric baseline assessments. Two 1-min baseline EEG recordings (eyes open and eyes closed) were then performed. Participants were then randomised to perform either the HMD or projection mode first, followed by the alternate display-type condition. In each case, three videos were presented (one of each affective category: 1 × fear evoking, 1 × relaxing and 1 × exciting). The relaxing video was always displayed in the middle, and the order of the aversive and appetitive videos was counterbalanced. After each video, participants completed psychometric measures of presence, arousal and affect. Upon completion of each display-type condition, participants had a short 5-min break. Upon completion of all the videos, participants were thanked and debriefed.
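The ordering constraints in this protocol can be made concrete with a short sketch. It is illustrative only: the function name `build_session`, the use of participant index parity to counterbalance the fear evoking and exciting videos, the reuse of the same video order in both display conditions, and the fixed random seed are all assumptions rather than details reported for the study.

```python
import random

DISPLAYS = ("HMD", "projection")

def build_session(participant_index, rng):
    """Return the ordered (display, stimulus) trials for one participant."""
    # Display order is randomised for each participant.
    displays = list(DISPLAYS)
    rng.shuffle(displays)
    # The relaxing video always sits in the middle; which arousing video
    # comes first is counterbalanced (assumed here: by index parity).
    if participant_index % 2 == 0:
        stimuli = ["fear evoking", "relaxing", "exciting"]
    else:
        stimuli = ["exciting", "relaxing", "fear evoking"]
    return [(display, stimulus) for display in displays for stimulus in stimuli]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sessions = [build_session(i, rng) for i in range(14)]
```

Each entry of `sessions` is a list of six trials, three per display type, with the relaxing video in second position within each display block.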

2.4 In-session psychometrics

These measures were recorded at baseline before exposure to stimuli and immediately after each video presentation. Data were collected using Bristol Online Survey software (University of Bristol 2017).

Positive and negative affect scale (PANAS-X; Watson and Clark 1999): a 60-item scale with the two higher order scales and 11 specific affects. For the purpose of this study, 14 specific negative affect items relating to 2 specific affects (fear and fatigue) and 16 specific positive affect items relating to 4 specific affects (excitement, serenity, attentiveness and surprise) were selected. The instructions were worded to measure affect at the exact moment of completion. The PANAS-X uses a 5-point Likert scale (1 = not at all, 5 = extremely). All subscales used in this study have shown high internal consistency in the initial paper for exact moment (all Cronbach’s alphas > 0.72). All subscales used also show high convergent validity with their equivalent subscales of the Profile of Mood States (POMS; McNair et al. 1971) with all correlations > 0.85.

Self-Assessment Manikin (SAM; Lang 1980): a non-verbal pictorial assessment technique that directly measures the perceived pleasure and arousal associated with a person’s affective reaction to stimuli. Both dimensions are rated using a 9-point Likert scale, each point having its own pictorial representation. The scale shows high internal consistency in adults (Cronbach’s alpha = 0.82 for pleasure and 0.98 for arousal; Backs et al. 2005).

Presence questionnaire: adapted from the igroup presence questionnaire (IPQ; Schubert et al. 2001). Questions are based on theoretical models of presence that indicate three measurable dimensions of presence: sense of physical space, ecological validity and engagement. Adaptation was necessary as the IPQ targets stimuli where the participant has control of their location in the virtual world. In this study, participants remained stationary, which meant that questions had to be adapted or removed. Six questions were selected to measure presence. Two questions targeted sense of physical space (“How aware were you of the real world while navigating the virtual world (i.e. its sounds, other people, temperature etc.)?”, “How much did you feel the virtual world surrounded you?”), one question targeted ecological validity (“How real did the virtual world seem to you?”) and three questions targeted engagement (“Did you have a sense of being in the virtual world (i.e. “of being there”)?”, “How much attention did you pay to the real environment?”, “How captivated were you by the virtual world?”). The scale uses a 5-point Likert scale (1 = not at all, 5 = extremely).
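As a hedged illustration of how the six items could be combined into the three subscales, the sketch below averages item responses within each dimension. The item keys are hypothetical labels, and two choices are assumptions that the text does not state: that the two items asking about the real world are reverse-keyed, and that overall presence is the mean of all six items.

```python
from statistics import mean

# Item keys are hypothetical labels for the six adapted questions.
SUBSCALES = {
    "physical_space": ["aware_real_world", "surrounded"],
    "ecological_validity": ["seemed_real"],
    "engagement": ["being_there", "attention_real_env", "captivated"],
}
# Assumption: items about the real world are reverse-keyed on the 1-5 scale.
REVERSED = {"aware_real_world", "attention_real_env"}

def score_presence(responses):
    """Return subscale means and an overall mean from 1-5 item responses."""
    keyed = {
        item: (6 - value if item in REVERSED else value)
        for item, value in responses.items()
    }
    scores = {name: mean(keyed[i] for i in items) for name, items in SUBSCALES.items()}
    scores["overall"] = mean(keyed.values())  # assumption: unweighted item mean
    return scores

# Hypothetical responses from one participant after one video.
example = {"aware_real_world": 2, "surrounded": 4, "seemed_real": 3,
           "being_there": 4, "attention_real_env": 1, "captivated": 5}
scores = score_presence(example)
```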

2.5 Stimuli

Six spherical 360° videos were used as stimuli and split into three affective categories: fear evoking, relaxing and exciting (see Fig. 1 for screenshots of the stimuli). Two stimulus groups were created, each containing one fear evoking, one relaxing and one exciting video. Fear evoking videos incorporated imminent human threat, which has been classed as highly arousing with negative valence. For relaxing stimuli, scenes containing natural environments were selected as these tend to be the lowest arousal stimuli with a positive valence. Stimuli containing upbeat dance sequences were selected as exciting stimuli, combining high arousal with positive valence. All stimuli were presented from a first-person point of view and were free from narrative, both of which are media factors that have been found to influence presence (Cummings and Bailenson 2016; Gorini et al. 2011; Ling et al. 2013a, 2013b; Weech et al. 2020). Both relaxing videos had to have music dubbed to remove narrative. Narrative, in this sense, does not mean spoken words, but words used to elicit an emotional response, such as the telling of a story. All videos were from a stationary point of view to reduce the possibility of cybersickness, which may arise from differences between perceived and physical movement (Davis et al. 2014) and is negatively related to presence (Weech et al. 2019). Each video lasted 150 s, broken down into 10 s of black screen, 120 s of 360° video and 20 s of black screen.
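The trial timeline described here (10 s black screen, 120 s video, 20 s black screen) maps directly onto sample indices of a continuous recording. The sketch below shows that mapping for a given sampling rate. The segment names and the helper `segment_samples` are hypothetical, and whether analysis was restricted to the video segment is not stated in the text, so this is only one way such a timeline might be used.

```python
# Segment durations in seconds, taken from the stimulus description.
SEGMENTS = [("black_pre", 10), ("video", 120), ("black_post", 20)]

def segment_samples(fs_hz):
    """Map each segment name to (start, stop) sample indices, stop exclusive."""
    bounds, elapsed = {}, 0
    for name, duration in SEGMENTS:
        bounds[name] = (elapsed * fs_hz, (elapsed + duration) * fs_hz)
        elapsed += duration
    return bounds

# 128 Hz is the sampling rate reported for the EEG system in this study.
bounds = segment_samples(128)
# bounds["video"] == (1280, 16640)
```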

Fig. 1 Line graph depicting the effect of mode and stimulus type on SAM arousal

2.6 Experimental set-up

For the 3D (HMD) condition, stimuli were displayed using an Oculus Rift DK2 connected to a computer with an Intel Core i7-5930K CPU running at 3.5 GHz, 32 GB RAM, an Nvidia GeForce GTX 980 Ti graphics card and a Windows 10 Pro x64 operating system. The stereoscopic DK2 has a 100° field of view, 1080p resolution, 60 Hz refresh rate and 1000 Hz internal head tracking update rate. To limit movement artefacts, participants were fitted with a neck support cushion throughout the experiment that limited their head movement, and were encouraged to use the swivel on the chair to move their perspective in the video when wearing the HMD. The 2D (large projected viewing mode) condition was displayed using an Epson projector, capable of displaying 1080p resolution at 60 Hz, connected to the same computer as the HMD condition. Stimuli were projected onto a white wall. The screen width was 241.5 cm and the participant was positioned 84.5 cm from the wall, achieving a 100° field of view. Participants navigated the 360° video using a mouse on a pad that rested over the chair. The stimuli were displayed at 1080p resolution and 30 frames per second. Sound was delivered to the participant using Sennheiser around-ear headphones in both conditions. This set-up controlled for field of view, image quality, sound quality and refresh rate, with the only differing immersive factors being stereoscopy and tracking method. The temperature-controlled laboratory was set at a constant temperature of 20 °C. The windowless laboratory also allowed for complete darkness in the room to minimise distractions and avoid light affecting the quality of the projected images. Though measures of cybersickness were not collected, the researcher observed participants and checked on their wellbeing throughout the experiment, being specifically mindful of cybersickness. None of the participants reported any symptoms of cybersickness.
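The relationship between screen width, viewing distance and horizontal field of view can be sketched with basic trigonometry. The code below assumes an idealised geometry that the text does not spell out: a flat wall, a viewer centred on the image, and distance measured perpendicular from the eye to the wall. The function names are hypothetical.

```python
import math

def horizontal_fov_deg(screen_width_cm, viewing_distance_cm):
    """Horizontal field of view of a flat screen viewed from its centre line."""
    half_angle = math.atan((screen_width_cm / 2) / viewing_distance_cm)
    return 2 * math.degrees(half_angle)

def width_for_fov_cm(fov_deg, viewing_distance_cm):
    """Screen width needed to subtend a given horizontal field of view."""
    return 2 * viewing_distance_cm * math.tan(math.radians(fov_deg / 2))

fov = horizontal_fov_deg(241.5, 84.5)      # dimensions reported in the text
needed = width_for_fov_cm(100, 84.5)       # width for 100 degrees at 84.5 cm
```

Under these idealised assumptions, the reported width and distance give roughly 110°, and a 100° field at 84.5 cm corresponds to a width of about 201 cm. The gap between this and the stated 100° may reflect how distance or usable image width was measured, which the text does not specify.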

2.7 EEG recording

EEG was recorded using an Alpha-Active Ltd HeadCoach EEG system (Alpha Active 2017). This system consists of 2 channels (attached at AF7 and AF8; referenced to ipsilateral mastoids, with an Fpz ground; sampling rate, 128 Hz). Electrode sites were chosen based on Baumgartner et al. (2006) and on compatibility with HMD systems. Impedance was maintained below 5 kOhms. Signal processing of EEG data was performed using CURRY 7.12 software (Compumedics Neuroscan 2017). Data were filtered offline between 0 and 30 Hz. Ocular and muscle artefacts were removed. Alpha power (8–12 Hz) and TBR were measured and averaged across AF7 and AF8.
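To illustrate what band power and TBR mean computationally, the sketch below estimates mean spectral power in a frequency band from a plain discrete Fourier transform and forms the theta/beta ratio. It is not the CURRY pipeline used in the study. The theta (4–7 Hz) and beta (13–30 Hz) band edges are assumptions, since the text defines only alpha (8–12 Hz); the dependency-free DFT stands in for the FFT or Welch routines one would normally use; and the synthetic sine-wave channels are invented test signals, not data.

```python
import cmath
import math

FS = 128  # sampling rate in Hz, as reported for the recording system

def band_power(signal, fs, low_hz, high_hz):
    """Mean periodogram power of `signal` over DFT bins in [low_hz, high_hz]."""
    n = len(signal)
    offset = sum(signal) / n
    centred = [x - offset for x in signal]  # remove the DC component
    powers = []
    for k in range(1, n // 2):
        if low_hz <= k * fs / n <= high_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(centred))
            powers.append(abs(coeff) ** 2 / (fs * n))
    return sum(powers) / len(powers)

def theta_beta_ratio(signal, fs):
    # Band edges are assumptions; the text does not define theta or beta.
    return band_power(signal, fs, 4, 7) / band_power(signal, fs, 13, 30)

def synth(amplitudes, fs=FS, seconds=2):
    """Sum of sine waves; `amplitudes` maps frequency in Hz to amplitude."""
    return [sum(a * math.sin(2 * math.pi * f * t / fs) for f, a in amplitudes.items())
            for t in range(fs * seconds)]

# Two synthetic channels standing in for AF7 and AF8.
af7 = synth({6: 1.0, 10: 2.0, 20: 0.5})
af8 = synth({6: 1.0, 10: 2.0, 20: 0.5})

# Average each measure across the two frontal channels, as described above.
alpha = (band_power(af7, FS, 8, 12) + band_power(af8, FS, 8, 12)) / 2
tbr = (theta_beta_ratio(af7, FS) + theta_beta_ratio(af8, FS)) / 2
```

Because power is averaged over the bins in each band, the resulting values are relative measures; absolute units would depend on the normalisation chosen by the analysis software.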

2.8 Statistical analysis

Data cleaning and statistical analysis were conducted using RStudio (RStudio Team 2020). Firstly, manipulation checks were conducted using 2 × 3 ANOVAs on SAM scores to ascertain whether the conditions differed significantly in terms of arousal and valence. Next, 2 × 3 ANOVAs were conducted to examine the effect of viewing mode and stimuli on presence, using all presence questionnaire subscales as well as frontal alpha power and TBR. Paired samples t tests were used to examine the effects of mode on magnifying user effects. Fear evoking stimuli were compared using PANAS fear items, relaxing stimuli using PANAS serenity items and exciting stimuli using PANAS excitement items. Finally, a 2 × 3 ANOVA was conducted on PANAS attentiveness to examine the effect of mode and stimuli on attention.
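For readers unfamiliar with fully within-participants designs, the sketch below shows how the F ratios of a 2 × 3 repeated-measures ANOVA can be computed from first principles. Each effect is tested against its own interaction with participants. This is a generic textbook decomposition written in Python, not the R code used in the study; the function name and the randomly generated ratings are hypothetical, and sphericity corrections, p values and effect sizes are deliberately left out.

```python
import random
from statistics import mean

def two_way_within_anova(y):
    """F ratios for a fully within-participants A x B design.

    y[s][i][j] is the score of participant s at level i of A and level j of B.
    """
    n, a, b = len(y), len(y[0]), len(y[0][0])
    S, A, B = range(n), range(a), range(b)
    g = mean(y[s][i][j] for s in S for i in A for j in B)
    m_a = [mean(y[s][i][j] for s in S for j in B) for i in A]
    m_b = [mean(y[s][i][j] for s in S for i in A) for j in B]
    m_s = [mean(y[s][i][j] for i in A for j in B) for s in S]
    m_ab = [[mean(y[s][i][j] for s in S) for j in B] for i in A]
    m_sa = [[mean(y[s][i][j] for j in B) for i in A] for s in S]
    m_sb = [[mean(y[s][i][j] for i in A) for j in B] for s in S]

    ss_a = n * b * sum((m_a[i] - g) ** 2 for i in A)
    ss_b = n * a * sum((m_b[j] - g) ** 2 for j in B)
    ss_ab = n * sum((m_ab[i][j] - m_a[i] - m_b[j] + g) ** 2 for i in A for j in B)
    # Error terms: each effect's interaction with participants.
    ss_as = b * sum((m_sa[s][i] - m_a[i] - m_s[s] + g) ** 2 for s in S for i in A)
    ss_bs = a * sum((m_sb[s][j] - m_b[j] - m_s[s] + g) ** 2 for s in S for j in B)
    ss_abs = sum(
        (y[s][i][j] - m_ab[i][j] - m_sa[s][i] - m_sb[s][j]
         + m_a[i] + m_b[j] + m_s[s] - g) ** 2
        for s in S for i in A for j in B
    )

    def f_ratio(ss_effect, df_effect, ss_error, df_error):
        return {"F": (ss_effect / df_effect) / (ss_error / df_error),
                "df": (df_effect, df_error)}

    return {
        "A": f_ratio(ss_a, a - 1, ss_as, (a - 1) * (n - 1)),
        "B": f_ratio(ss_b, b - 1, ss_bs, (b - 1) * (n - 1)),
        "AxB": f_ratio(ss_ab, (a - 1) * (b - 1), ss_abs, (a - 1) * (b - 1) * (n - 1)),
    }

# Hypothetical ratings: 14 participants x 2 display types x 3 stimulus types.
rng = random.Random(1)
y = [[[3 + 0.5 * i + rng.gauss(0, 1) for j in range(3)] for i in range(2)]
     for _ in range(14)]
result = two_way_within_anova(y)
# result["A"]["df"] == (1, 13); result["B"]["df"] == (2, 26)
```

With 14 participants, the degrees of freedom come out as (1, 13) for display type and (2, 26) for stimulus type and for the interaction.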

3 Results

3.1 Manipulation checks for motivational salience of stimuli and differences between display

Initially, to examine the motivational salience of the stimuli and whether this differed between mode, a 2 (viewing mode: HMD or projector) × 3 (stimulus type: fear evoking, relaxing or exciting) within subjects ANOVA was performed on scores of SAM valence and SAM arousal. For SAM valence, there was a significant effect of stimulus type (F(2,26) = 11.9, p = 0.001, η2 = 0.32). Post hoc Bonferroni comparisons revealed that both relaxing (t(27) = 6.23, p < 0.001, d = 1.52) and exciting stimuli (t(27) = 4.1, p = 0.001, d = 1.33) had significantly greater valence compared to fear evoking stimuli, but did not differ from each other (t(27) = 0.61, p = 1.00, d = 0.16). For SAM arousal, there was a significant effect of stimulus type (F(2,26) = 17.75, p < 0.001, η2 = 0.36). Post hoc comparisons revealed that arousal was significantly lower for relaxing stimuli compared to both fear evoking (t(27) = 6.24, p < 0.001, d = 1.99) and exciting stimuli (t(27) = 5.13, p < 0.001, d = 1.47). There was no significant difference between fear evoking and exciting stimuli (t(27) = 1.55, p = 0.4, d = 0.39). The manipulation checks confirm that relaxing and exciting stimuli differ in arousal but not valence, allowing for a direct examination of the role of arousal in presence. There was no significant effect of viewing mode for either SAM arousal (F(1,13) = 0.506, p = 0.489, η2 = 0.004) or SAM valence (F(1,13) = 1.358, p = 0.265, η2 = 0.002). There were no significant interactions between mode and stimuli for SAM arousal (F(2,26) = 3.25, p = 0.071, η2 = 0.036) or SAM valence (F(2,26) = 2.49, p = 0.118, η2 = 0.021). The descriptive statistics for each condition and all dependent variables for all analyses can be seen in Table 1 and line graphs can be seen for SAM arousal (Fig. 1) and SAM valence (Fig. 2).
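The Bonferroni adjustment applied to the post hoc comparisons can be sketched in a few lines: each unadjusted p value is multiplied by the number of comparisons and capped at 1, which is consistent with adjusted values of 1.00 appearing for clearly non-significant contrasts. The function name and the example p values are hypothetical and are not taken from the study.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical unadjusted p values for three pairwise contrasts.
adjusted = bonferroni([0.0004, 0.01, 0.55])
```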

Table 1 Descriptive statistics for each condition and all dependent variables

Fig. 2 Line graph depicting the effect of mode and stimulus type on SAM valence

3.2 Effects of HMD exclusive immersive capabilities and stimuli motivational salience on presence

3.2.1 Subjective measures of presence

To examine the effects of HMD exclusive immersive capabilities on subjective presence, 2 (viewing mode: HMD or projector) × 3 (stimulus type: fear evoking, relaxing or exciting) within subjects ANOVAs were performed on the overall presence scale and its subscales. Significant effects of mode were found for overall presence (F(1,13) = 4.890, p = 0.046, η2 = 0.16), physical space (F(1,13) = 21.564, p < 0.001, η2 = 0.39) and ecological validity (F(1,13) = 20.717, p < 0.001, η2 = 0.22). Results showed greater subjective overall presence, physical space and ecological validity in the HMD condition compared to the projector condition. However, no significant effects of stimulus type were found for overall presence (F(2,26) = 0.007, p = 0.989, η2 < 0.001), physical space (F(2,26) = 0.38, p = 0.680, η2 = 0.003) or ecological validity (F(2,26) = 0.389, p = 0.671, η2 = 0.005). No significant effects of mode (F(1,13) = 0.0003, p = 0.99, η2 < 0.001) or stimuli (F(2,16) = 0.304, p = 0.641, η2 = 0.002) were found for the engagement subscale. There were no significant interactions. A line graph representing overall presence can be seen in Fig. 3.

Fig. 3 Line graph depicting the effect of mode and stimulus type on subjective presence

3.2.2 Objective measures of presence

To examine the effects of HMD exclusive immersive capabilities on objective presence, 2 (viewing mode: HMD or projector) × 3 (stimulus type: fear evoking, relaxing or exciting) within subjects ANOVAs were performed on frontal alpha power and frontal TBR. A significant effect of mode was found on overall alpha power (F(1,13) = 7.597, p = 0.016, η2 = 0.16), with mean alpha being higher in the HMD condition. There was no significant effect of stimulus type on alpha power (F(2,26) = 0.270, p = 0.747, η2 = 0.003) and no significant interaction (F(2,26) = 0.542, p = 0.535, η2 = 0.006). A line graph representing overall alpha power can be seen in Fig. 4.

Fig. 4 Line graph depicting the effect of mode and stimulus type on frontal alpha power

For TBR, a significant effect of stimulus type was found (F(2,26) = 3.424, p = 0.048, η2 = 0.02); however, post hoc comparisons did not reach significance for any contrasts (for all contrasts p > 0.1). There was no significant effect of viewing mode on TBR (F(1,13) = 2.286, p = 0.154, η2 = 0.033). There was a significant interaction between viewing mode and stimulus type for TBR (F(2,26) = 4.633, p = 0.019, η2 = 0.023); however, post hoc comparisons did not reach significance (for all contrasts p > 0.2; detailed statistics can be found in supplementary materials Tables 1 and 2). A line graph representing TBR can be seen in Fig. 5.

Fig. 5 Line graph depicting the effect of mode and stimulus type on TBR

3.3 Effects of HMD exclusive immersive capabilities and motivational salience on affective experience

To examine the effects of HMD exclusive immersive capabilities on affective experience for each motivationally salient stimulus, paired t tests were carried out comparing viewing modes for each stimulus type and its associated affect. For the fear evoking stimulus, a paired t test was conducted to examine the effect of mode on PANAS Fear. Results showed that subjective fear was significantly greater in the HMD condition (t(13) = 2.58, p = 0.023, d = 0.48). For the exciting stimulus, a paired t test was conducted to examine the effect of mode on PANAS Excitement. Results showed no significant difference between conditions (t(13) = 1.33, p = 0.207, d = 0.36). For the relaxing stimulus, a paired t test was conducted to examine the effect of mode on PANAS Serenity. Results showed no significant difference between conditions (t(13) = 0.56, p = 0.583, d = 0.09).
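A paired-samples comparison of this kind reduces to a one-sample test on the within-participant differences. The sketch below computes the t statistic, its degrees of freedom and one common standardised effect size (d_z, the mean difference divided by the standard deviation of the differences). The ratings are invented for illustration, and the choice of d_z is an assumption: the text does not state which variant of Cohen's d was reported, so values from this sketch are not directly comparable with the d values above.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic, degrees of freedom and d_z effect size."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    t = mean(diffs) / (sd / math.sqrt(n))
    d_z = mean(diffs) / sd  # assumption: the source's d may be defined differently
    return t, n - 1, d_z

# Hypothetical fear ratings for five participants (not study data).
hmd = [3.0, 2.5, 4.0, 3.5, 2.0]
projector = [2.5, 2.5, 3.0, 3.0, 1.5]
t, df, d_z = paired_t(hmd, projector)
```

Note that t and d_z are linked by t = d_z × √n, so for a fixed effect size the t statistic grows with the number of participants.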

3.4 Subjective attention by mode and stimulus type

To examine the effect of viewing mode and stimulus type on subjective levels of attentiveness, a 2 (viewing mode: HMD or projector) × 3 (stimulus type: fear evoking, relaxing or exciting) within subjects ANOVA was performed on PANAS attentiveness. There was a significant effect of stimulus type (F(2,26) = 6.443, p = 0.005, η2 = 0.065). Post hoc comparisons revealed that mean scores were higher for fear evoking (t(27) = 2.55, p = 0.05, d = 0.56) and exciting stimuli (t(27) = 3.49, p = 0.005, d = 0.65) compared to relaxing stimuli. There was no significant difference between fear evoking and exciting stimuli (t(27) = 0.59, p = 1.00, d = 0.11). This suggests that, subjectively, participants felt less attentive when viewing relaxing stimuli. There was no significant effect of mode (F(1,13) = 1.597, p = 0.229, η2 = 0.014), which suggests that attentiveness experienced in response to stimuli is not significantly affected by the viewing mode used. There was no significant interaction (F(2,26) = 1.48, p = 0.249, η2 = 0.015).

4 Discussion

Previous research has highlighted that HMDs elicit greater presence than standard viewing modes (Cummings and Bailenson 2016; Diemer et al. 2015). The current study extended this by conducting the first systematic test of HMD exclusive immersive properties on presence (stereoscopy and head tracking), while controlling for the factors of FoV, image quality and refresh rate. It also examined media-driven factors in the generation of presence and the role of presence in magnifying user effects.

4.1 System-driven immersive capabilities and presence

In line with our hypothesis (Hyp 1), the HMD viewing mode elicited a significantly greater level of subjective presence than large projected images of equal FoV, image quality, sound quality and refresh rate. This was found for overall presence ratings and for the previously suggested sub-factors of physical space and ecological validity (Freeman et al. 2005). Both overall presence and ecological validity had medium effect sizes, while physical space had the largest effect size. Interestingly, the engagement subscale showed no significant difference between conditions, which is in line with suggestions that engagement is more related to media-driven factors (Freeman et al. 2005). The significant differences in subjective measures of presence were partially supported by objective measures. Greater alpha power in the frontal areas was found in the HMD condition with a medium effect size. This is in line with previous research showing increased alpha in frontal lobes and decreased alpha in parietal lobes related to presence generation (Baumgartner et al. 2006; Kober et al. 2012). Although TBR was proposed as another objective measure of presence due to its relationship with attentional control (Angelidis et al. 2016), there were no differences in TBR as a function of viewing mode. This suggests that attentional control was not greater in the HMD condition, despite the greater sense of presence indicated by alpha power and subjective ratings. This is surprising given the proposed role of attention in the generation of presence (Wirth et al. 2007). Nevertheless, the findings across previously used subjective and objective measures of presence support the role of the HMD exclusive immersive factors, stereoscopy and head tracking, in increasing presence when all other known immersive factors are controlled.
This addresses flaws in previous research that did not control for field of view (Buttussi and Chittaro 2017), which is known to have a large effect on presence (Cummings and Bailenson 2016). These findings provide support for the continued use of HMDs to deliver stimuli that require presence to achieve their desired results, for example, VRETs (Bouchard et al. 2017; Miloff et al. 2016; Pitti et al. 2016).
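To make the objective measures discussed above more concrete, the sketch below shows one simple way that band power and a theta/beta ratio (TBR) could be derived from a single EEG channel. It is an illustration under stated assumptions, not the processing pipeline used in the study: the band edges (theta 4 to 8 Hz, alpha 8 to 13 Hz, beta 13 to 30 Hz), the sampling rate and the synthetic signal are all assumed for the example, and a plain DFT periodogram stands in for the windowed spectral estimates and artifact handling that real EEG data would need.

```python
import cmath
import math


def periodogram(signal, fs):
    """Return (freqs, power) from a plain DFT; adequate for short epochs."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coef) ** 2 / n ** 2)
    return freqs, power


def band_power(freqs, power, low, high):
    """Sum spectral power over low <= f < high (Hz)."""
    return sum(p for f, p in zip(freqs, power) if low <= f < high)


# ASSUMED band edges in Hz; this section does not list the ones used.
THETA, ALPHA, BETA = (4, 8), (8, 13), (13, 30)


def theta_beta_ratio(signal, fs):
    """Theta band power divided by beta band power for one channel."""
    freqs, power = periodogram(signal, fs)
    return band_power(freqs, power, *THETA) / band_power(freqs, power, *BETA)


# SYNTHETIC 2 s epoch at an assumed 128 Hz: a 6 Hz (theta) component with
# amplitude 2 plus a 20 Hz (beta) component with amplitude 1.
fs = 128
epoch = [2 * math.sin(2 * math.pi * 6 * i / fs) + math.sin(2 * math.pi * 20 * i / fs)
         for i in range(2 * fs)]

tbr = theta_beta_ratio(epoch, fs)
freqs, power = periodogram(epoch, fs)
alpha = band_power(freqs, power, *ALPHA)
print(f"TBR = {tbr:.2f}, alpha power = {alpha:.3g}")
```

Because power scales with squared amplitude, the synthetic epoch has a TBR of about 4. In practice, frontal alpha power would be computed per condition and per participant and then compared across viewing modes.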

4.2 Media-driven factors and presence

Alongside system-driven immersive capabilities, this study examined media-driven factors in the elicitation of presence, specifically arousal and valence. Due to their motivational salience, all of the selected stimuli were expected to generate some degree of presence independent of their level of arousal (Hyp 2a); however, fear evoking and exciting stimuli were expected to induce the greatest sense of presence due to their increased arousal and its proposed role in presence (Hyp 2b; c.f. Freeman et al. 2005; Wirth et al. 2007). Users indeed experienced a sense of presence even in low arousal environments (relaxing condition). However, despite the greater levels of subjective arousal seen in the manipulation checks, the fearful and exciting stimuli did not generate a greater sense of presence than the relaxing stimuli for any of the subjective or objective indicators, including the engagement subscale of the IPQ, which is proposed to be related to the presented media (Freeman et al. 2005). This is inconsistent with the arousal theory of presence (Freeman et al. 2005), which proposes that greater arousal leads to greater presence, potentially due to the increased attention allocation to the stimulus caused by the elevated arousal. In relation to theories of presence formation that suggest attention is key in the elicitation of presence (Wirth et al. 2007), motivational salience might be sufficient for attention allocation, independent of arousal (c.f. Bradley and Lang 2000). Both fear evoking and exciting stimuli scored significantly higher on the PANAS subscale for attentiveness than relaxing environments with medium to large effect sizes, suggesting that subjectively more attention was paid in these conditions, which should have increased the sense of presence. However, this was not supported by the objective measures.
The analysis of EEG data showed no effect of stimulus type on frontal alpha power, and although there was an effect on TBR, the effect size was small and post hoc tests revealed no significant differences between conditions. This suggests that attentional control was similar across stimulus types, hence leading to similar levels of presence. Together, this suggests that presence is not completely limited to arousing stimuli, but may still depend on attentional demands, which would be determined by motivational salience. This is in line with more complex theoretical frameworks, such as the motivational activation theory (Bradley and Lang 2000), which suggests that valence and arousal are the main but independent components of emotional response. The activation of a motivational system (triggered by valence) may be a precursor for the generation of presence. This would still explain why previous research found greater presence in response to emotionally laden stimuli, which activate a motivational system, compared to neutral stimuli (Baños et al. 2004, 2008; Bouchard et al. 2008; Riva et al. 2007). Nevertheless, the intensity of response (triggered by arousal) did not directly translate into greater presence. Thus, the role of arousal in presence generation may be more complex than initially proposed (c.f. Freeman et al. 2005), and presence generation may not be directly mappable to a motivational activation framework (c.f. Bradley and Lang 2000). However, Bradley and Lang (2000) do specify that increased arousal does not always lead to increased physiological and behavioural response (e.g. attention), and that this can be context dependent and based on the behaviour selected. Indeed, it has been found that although relaxation is associated with reduced arousal, it is not linked with reduced attention (Scheufele 2000).
A more refined paradigm that also includes completely neutral/non-arousing and motivationally non-salient stimuli is needed to further study the interaction of arousal and motivational salience in motivational activation, and how/when arousal affects the generation of presence depending on the salience elicited.

4.3 Presence and user effects

Previous research suggests that an increased sense of presence magnifies user effects (Cummings and Bailenson 2016) and is key in producing emotional responses to virtual environments (Parsons and Rizzo 2008; Price et al. 2011). Hence, emotional response should be greater in the HMD condition, due to an increased sense of presence related to its superior immersive capabilities (Hyp 3). However, this study had mixed results. Viewing mode did not affect subjective scores for serenity in response to relaxing stimuli or excitement in response to exciting stimuli (Hyp 3b & 3c). Thus, HMD mode did not magnify the respective emotional reaction to exciting and relaxing stimuli. However, the HMD mode elicited a greater subjective fear response to fear evoking stimuli than the projected media mode, with a medium effect size (Hyp 3a). This is in line with research suggesting that fear and presence are mutually dependent, with an increase in one leading to an increase in the other (Alsina-Jurnet et al. 2011; Price and Anderson 2007; Riva et al. 2007). The consistent replication of this finding calls for further research into why fear responses seem to be the most enhanced by the greater presence that HMDs offer. Threat response theories (Öhman and Mineka 2001) suggest that fear evoking stimuli automatically receive attentional resources due to their evolutionary significance. As such, the motivational salience of fear evoking stimuli would be greater and would prioritise attention allocation, potentially leading to a greater sense of presence and the magnification of user effects. Understanding this link is vital due to the importance of eliciting a fear response in VRETs (Benito and Walther 2015).

4.4 Limitations and future work

Because EEG recording was necessarily restricted to two frontal sites while participants wore an HMD, activity in the parietal lobes was not measured. The parietal lobe may be important in presence research, and further work with larger arrays is needed to support findings from Baumgartner et al. (2006) and Kober et al. (2012). As technology develops that integrates HMD hardware with higher density EEG systems, this will become possible, allowing examination of cortical activation across other pertinent scalp regions related to attention and visual processing, such as the parietal (Baumgartner et al. 2006; Kober et al. 2012; Howard et al. 2015, 2019) and occipital lobes (Ko et al. 2017; Howard et al. 2017). Constant technological advancement may also lead to the development of more HMD exclusive immersive capabilities. There has already been advancement in devices for interacting with the virtual environment, such as handheld controllers and motion detection systems, that play a role in user experience, immersion, and presence (De Paolis and De Luca 2020). With the current trajectory of technological advancement, it is likely that the gap between HMDs and standard viewing modes will only widen in terms of immersive capabilities and users' sense of presence. Future research should focus on testing these new modalities.

This study also found differences between stimuli on subjective attentiveness, but not objective attention. Given the role of attention in theoretical models of presence (Freeman et al. 2005; Wirth et al. 2007), as highlighted by the current findings, future research could systematically test the role of attention and the motivational salience of different stimulus inputs using eye-tracking and attentional distraction tasks to further examine attention's role in the generation of presence. As highlighted above, future research should also include a wider array of stimuli manipulating valence and arousal, including completely non-salient/non-arousing stimuli. Moreover, the stimuli used here covered both high and low arousal positive stimuli but only high arousal negative stimuli. To provide a better examination of the role of fear as a unique motivational network, future work may benefit from including low arousal negatively valenced conditions as well.

Finally, this study suffered from a small sample size (N = 14), which limited its statistical power. Although this research indicated magnified fear responses when comparing HMD to standard viewing modes, the lack of statistical power did not allow for an examination of the potential mediating effect of presence. From the results of this study, it can be assumed that presence was generally greater in HMD viewing modes; however, it cannot be assumed that the magnified fear response seen in HMD viewing mode was mediated by increased presence. When a VR-compatible, large array EEG system is created, these findings must be replicated in a much larger sample to explore the effects of system- and media-driven factors in presence generation and their neural markers in much more detail. The study protocol could also be extended to examine more media-driven factors such as point of view, narrative and player movement, which can impact the sense of presence (Clemente et al. 2014; Ling et al. 2013a, 2013b; Weech et al. 2019). This could inform the best media-driven set-ups for a successful VRET.
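The sample size limitation noted above can be illustrated with a rough power calculation for a paired design. The sketch below uses a normal approximation to the paired t test, which tends to be slightly optimistic for small samples. The standardised effect size d_z = 0.5 is a purely illustrative assumption, not an estimate taken from this study, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def approx_power(d_z, n, alpha=0.05):
    """Approximate two-sided power of a paired t test (normal approximation)."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    shift = d_z * math.sqrt(n)  # expected test statistic under the alternative
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


def approx_required_n(d_z, power=0.80, alpha=0.05):
    """Approximate number of pairs from the usual closed-form formula."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    z_pow = _Z.inv_cdf(power)
    return math.ceil(((z_crit + z_pow) / d_z) ** 2)


# ILLUSTRATIVE effect size only; replace with a justified estimate.
assumed_d_z = 0.5
print(f"Approximate power with n = 14: {approx_power(assumed_d_z, 14):.2f}")
print(f"Approximate n for 80% power: {approx_required_n(assumed_d_z)}")
```

An exact calculation based on the noncentral t distribution would give a somewhat larger required sample, so figures from this approximation are best read as a lower bound when planning a replication.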

5 Conclusion

In conclusion, this study systematically tested the HMD exclusive combination of stereoscopy and head tracking while controlling for all other known immersive factors, specifically field of view (Cummings and Bailenson 2016). Results were in line with existing literature regarding the ability of HMDs to elicit greater subjective (Diemer et al. 2015) and objective measures of presence (Baumgartner et al. 2006; Kober et al. 2012) than standard viewing modes. However, this was not linked with the increased attention proposed by theoretical models of presence (Wirth et al. 2007). More systematic research is needed into the physiological markers of presence through the implementation of more advanced EEG systems alongside HMDs. Finally, this research examined the role of presence in magnifying affective experience. Results suggest that increased subjective presence only has a significant effect on fear in relation to fear evoking stimuli. Overall, the results highlighted a much more complex pattern of interaction between the roles of arousal and attention in the formation of presence than previously suggested (Freeman et al. 2005; Wirth et al. 2007). Here, an explanation involving the motivational salience of stimuli has been put forward, in line with theories of motivational activation (Bradley and Lang 2000), which had not previously been investigated in the context of presence formation research but seems to show some promise. More nuanced experimental paradigms, however, are needed to fully understand the relationship and independent contributions of specific emotions and arousal in relation to presence.