Introduction

Virtual reality (VR) is currently being used and studied as a tool for psychological intervention in various ways (Carl et al. 2019; Pizzoli et al. 2019). Virtual relaxation, especially exposure to virtual natural environments, is a more recent development in this field (Browning et al. 2020a; Orr et al. 2021; Riches et al. 2021). The theoretical framework for the efficacy of virtual natural environments is grounded in the Attention Restoration Theory (Kaplan 1995) and/or Stress Reduction Theory (Ulrich 1981), which were originally developed for real nature environments. Although the specific efficacy factors of both theories have not yet been conclusively confirmed empirically (Corazon et al. 2019; Ohly et al. 2016), the stress-reducing and mood-enhancing effects of natural environments on mental and physical well-being are generally accepted (White et al. 2018; World Health Organization 2016).

However, access to real natural environments is limited (e.g., in big city areas, during pandemic-associated lockdowns, in isolated and confined environments; Anderson et al. 2017). Furthermore, certain health conditions may make accessing real natural environments more challenging (e.g., inpatients, older adults in nursing homes, long-term care; Mostajeran et al. 2021; Orr et al. 2021). Therefore, over the past years, the research focus has shifted to virtual natural environments presented via immersive technology and to the question of whether they have a relaxing effect on recipients (Browning et al. 2020a).

Immersion is a technical continuum that is realized to varying degrees by different hardware and software (Slater and Sanchez-Vives 2016; Slater and Wilbur 1997). In general, immersion describes the capability to deliver “an inclusive, extensive, surrounding and vivid illusion of reality” (Slater and Wilbur 1997, p. 605). Inclusiveness defines the degree to which external stimuli from the physical/real environment are substituted by the technology. It is applicable for all sensory modalities, although it often focuses on vision, as this sense is perceptually dominant for many people (Slater and Sanchez-Vives 2016). In practice, contemporary head-mounted displays (HMDs), in combination with headphones, already achieve audiovisual substitution. Extensiveness describes the number of senses addressed by the technology. The field of view operationalizes the surrounding criterion, with a large panoramic field of view positively influencing the immersion level. Finally, further technical features that help to create a deceptively real illusion are summarized under the term vividness, e.g., stereoscopy, high resolution, degree of realism, spatial audio, kind of head and hand tracking (three-degrees-of-freedom, 3DoF; six-degrees-of-freedom, 6DoF), and low latency of head and hand tracking (Cummings and Bailenson 2016; Slater and Sanchez-Vives 2016; Slater and Wilbur 1997).

Regarding virtual natural environments, a key assumption is that if they are presented with immersive technology, they are more likely to mimic real natural environments. Therefore, similar relaxation effects are expected (Browning et al. 2020b; Chirico and Gaggioli 2019). Herein, the degree of immersion in virtual natural environments depends on the output device (hardware) and the associated software (e.g., a three-dimensional [3D], computer-programmed virtual natural environment, or a 360° capturing of a real natural environment). Previous studies examining the relaxation effects of viewing virtual natural environments with immersive technology have used a 3DoF HMD (Browning et al. 2020b; Calogiuri et al. 2018; Chirico and Gaggioli 2019; Schutte et al. 2017; Villani et al. 2007; Yin et al. 2018) or a 6DoF HMD (Anderson et al. 2017; Liszio et al. 2018; Mostajeran et al. 2021; Nukarinen et al. 2020; Yeo et al. 2020) as well as computer-programmed 3D natural environments (Liszio et al. 2018; Nukarinen et al. 2020; Villani et al. 2007; Yeo et al. 2020) or monoscopic 360° videos of real natural environments (Anderson et al. 2017; Browning et al. 2020b; Calogiuri et al. 2018; Chirico and Gaggioli 2019; Mostajeran et al. 2021; Nukarinen et al. 2020; Schutte et al. 2017; Yeo et al. 2020; Yin et al. 2018).

Current computer-programmed 3D natural environments comprehensively use the immersive features of contemporary HMDs by showing a stereoscopic 3D virtual natural environment. Moreover, the recipients can actively interact with the virtual environment by using the 6DoF head and hand tracking. For example, Liszio et al. (2018) and Yeo et al. (2020) used TheBlu. Here, recipients can interact with fish or corals using the HTC Vive handheld controllers. However, the degree of realism of TheBlu, and computer-programmed natural environments in general, is limited, as the computational costs for modeling and rendering photorealistic computer-programmed environments are still high (Ritter and Chambers 2021).

In contrast, previous studies have also used monoscopic 360° nature videos. They offer a low-cost, easy-to-capture alternative with a high degree of realism (Ritter and Chambers 2021). However, the same footage is rendered for both eyes. Therefore, monoscopic 360° videos lack 3D depth sensation (Huang et al. 2017). In addition, they can only respond to rotational motion (3DoF), although contemporary HMDs support full 6DoF. There are promising stereoscopic 360° cameras (e.g., Insta360 Pro 2.0, Insta360 Titan, or KanDao Obsidian). Nonetheless, they are considerably more expensive, and their post-production workflow is more complicated than that of monoscopic cameras (Huang et al. 2017). The easy-to-use and true-to-reality footage provided by monoscopic 360° videos may explain their growing popularity.

In essence, 3D computer-programmed natural environments take full advantage of the immersive features of contemporary HMDs, although their degree of realism is limited. Conversely, 360° videos offer a high degree of realism, are easy to capture, and are low-cost, although their degree of immersion is limited.

Previous studies can be divided into three main categories based on their aims: (1) efficacy studies examining the relaxation effects of immersive virtual natural environments when compared to real natural environments; (2) efficacy studies comparing immersive virtual natural environments with immersive control conditions; and (3) efficacy studies exploring the relaxation effects of more versus less immersive virtual natural environments.

Regarding the first category, a recent meta-analysis demonstrated that real natural environments are more effective than virtual ones (Hedges’ g: 0.87, included studies: k = 6, N = 296) in changing positive affect, but there were no significant differences for negative affect (Hedges’ g: − 0.28, p = 0.10; Browning et al. 2020a). However, these results should be interpreted with caution due to small sample sizes and varying degrees of immersion across studies. For instance, three studies applied monoscopic 360° videos via 3DoF HMDs (Browning et al. 2020b; Calogiuri et al. 2018; Chirico and Gaggioli 2019), one study used several nature images which were presented on a laptop screen (Brooks et al. 2017), one study presented a 40-min nature video via a TV screen (Olafsdottir et al. 2018), and another one used a monoscopic 360° nature video and a 3D computer-programmed natural environment via 6DoF HMD (Nukarinen et al. 2020).

The results are less conclusive when considering only those studies that used an HMD. Browning et al. (2020b) reported no significant differences between virtual and real natural environments in terms of negative and positive affect and skin conductance levels (SCL). Similarly, Chirico and Gaggioli (2019) also found no significant differences between virtual and real natural conditions in terms of negative and positive affect. Calogiuri et al. (2018) showed that real natural environments performed better than virtual ones with regard to changes in both positive and negative affect measures. Nevertheless, the results should be interpreted with caution due to the small sample size and order effects (all participants completed the real natural environment first, followed by the virtual ones in a randomized order). Moreover, the inconsistent results may be attributed to whether the participants walked (Calogiuri et al. 2018) or sat down (Browning et al. 2020b; Chirico and Gaggioli 2019) in the real or virtual natural environments. Nukarinen et al. (2020) showed that the real natural environments were superior to 3D natural environments in terms of heart rate variability (HRV) and heart rate (HR), but they found no significant differences in SCL and positive and negative affect. In addition, they showed that a real natural environment was superior to a monoscopic 360° nature video in terms of impacting HRV, HR, and negative affect, but there were no significant differences for positive affect and SCL. Yet, the small sample size (allocated to real nature vs. 3D nature: N = 8, allocated to real nature vs. 360° nature: N = 8) limits the results’ interpretation (Nukarinen et al. 2020). Although not included in the meta-analysis, Yin et al. (2018) found no significant differences between the real exposure to a biophilic environment and a monoscopic 360° biophilic video in terms of HR, SCL, and cognitive performance indicators.

In short, prior results are heterogeneous and often non-significant. There is a clear need for a non-inferiority study with good statistical power to compare the efficacy of a monoscopic 360° nature video with a real natural environment.

Regarding the second main category, previous studies examined whether there are significant differences between monoscopic 360° videos of nature and of an indoor classroom via 6DoF HMD (Anderson et al. 2017), as well as between a monoscopic 360° video of nature and of an urban environment via 3DoF HMD (Schutte et al. 2017). In addition, another study compared the efficacy of a monoscopic 360° nature video and a nature photo slideshow as well as a monoscopic 360° urban environment video and an urban photo slideshow via 6DoF HMD (Mostajeran et al. 2021). Anderson et al. (2017) found no significant differences in negative and positive affect between the conditions at post-measurement. However, they found a significant reduction of negative affect from pre- to post-measurement for both nature conditions, but not for the control condition. In contrast, the control condition did not lead to significant reduction of positive affect from pre- to post-measurement. There was a significant decrease over time in SCL for all conditions. However, only the nature conditions reached SCL values below baseline, which was not found in the control condition. The HRV measurements were inconclusive. Meanwhile, Schutte et al. (2017) showed that positive affect was significantly higher in those who saw the 360° video of nature than in those in the control condition. However, there was no significant difference between conditions in terms of negative affect, probably caused by a floor effect.

Mostajeran et al. (2021) revealed that the natural context was superior to the urban one regarding the profile of mood states. In contrast to their expectations, there was neither a significant main effect of how the content was presented (360° vs. photos) nor an interaction effect. Regarding SCL, the authors found a significant main effect of the presentation type with the photo condition leading to a greater reduction in SCL than the 360° videos. There were no significant differences for the main factor content (nature vs. urban) or a significant interaction. Regarding HR, they found no significant main or interaction effects between the conditions.

Besides employing small sample sizes, the previous studies did not clarify whether more or less immersive technology yields different relaxation effects. This question was to some degree addressed by Mostajeran et al. (2021); however, their immersion manipulation consisted of presenting the environment via a monoscopic 360° video or photos. The hardware (6DoF HMD) was the same for both conditions. Therefore, it is likely that the manipulation was not strong enough to detect differences between the conditions.

The third main category of studies examined the relaxation effects of virtual natural environments, which were presented with more versus less immersive technology. Villani et al. (2007) compared a 3D computer-programmed beach presented via 3DoF HMD with a joystick controller, a 2D beach video via a TV screen, and a control condition in terms of respiration rate, HR, SCL, and self-reported relaxation ratings. Although the degree of immersion was manipulated via the hardware (e.g., inclusiveness) and software (3D vs. 2D), the authors found no significant differences between the conditions. A possible explanation could be the limited technological capabilities at that time.

In contrast, a more recent study demonstrated that a 3D computer-programmed virtual natural environment (TheBlu) presented via 6DoF HMD with headphones was superior to a 2D video recording of TheBlu presented via PC screen with desktop speaker, along with a control condition with no intervention in terms of positive and negative affect and HRV (Liszio et al. 2018). However, no significant differences were found for cortisol levels. Although the psychophysiological results are less clear, it can be argued that virtual natural environments presented with more immersive technology (6DoF head and hand tracking, active interactions with the virtual environments, audiovisual substitution, stereoscopy) resulted in more relaxation than those presented with less immersive technology. Still, it remains unclear whether this finding can be generalized to monoscopic 360° nature videos.

This was one of the main research questions in Yeo et al. (2020). The authors examined the effect of the type of presentation of a virtual underwater tropical coral reef scene on self-reported mood changes. The virtual scene was presented either via 6DoF HMD with integrated headphones (TheBlu), a monoscopic 360° video (an unseen scene of the BBC documentary The Blue Planet II) via 6DoF HMD with integrated headphones, or regular video footage from The Blue Planet II via a TV screen (control condition). Each condition displayed slightly different underwater sounds and documentary narratives. Before the virtual natural exposure, there was an experimental induction of boredom. The results showed that the computer-programmed 3D underwater condition evoked a significantly stronger positive mood than the monoscopic 360° video and the control condition. Contrary to the authors’ expectations, there were no significant differences between the 360° video and TV condition. Furthermore, there were no significant differences between the three recovery conditions in terms of negative moods and post-boredom scores. The authors concluded that the 360° video added little value compared to the TV condition, whereas the computer-programmed 3D natural environment with 6DoF HMD “appears to offer a qualitatively different experiences” (Yeo et al. 2020, p. 11).

Formulation of hypotheses and exploratory questions

In summary, monoscopic 360° nature videos are gaining popularity. They benefit from true-to-reality footage material, they are easy-to-capture, the post-production workflow is less complex, and the hardware costs are low. However, the degree of immersion is limited (2D, 3DoF). Therefore, a key question is whether monoscopic 360° nature videos presented with different immersive hardware provide relaxation benefits. In this regard, Yeo et al. (2020) showed that a monoscopic 360° video presented via 6DoF HMD did not significantly differ from a similar nature video presented via TV screen. However, it remains unclear whether the different film materials and/or documentary narratives may have biased the results. Furthermore, this study did not include a control condition or physiological outcome measures. Accordingly, the authors encouraged future studies to replicate and complement previous findings by addressing their shortcomings (Yeo et al. 2020).

Therefore, the main goal of this study is to examine whether a monoscopic 360° beach video presented via a 6DoF HMD with headphones is significantly more relaxing than the same video presented via a normal PC screen with headphones or a control condition without an intervention. Using the same video enhances internal validity but limits the degree of immersion manipulation. Nevertheless, we assume that manipulating the immersion level in the HMD condition should be sufficient (inclusiveness, 360° viewing, and changing the viewpoint with head movements) to detect differences in relaxation across conditions. Furthermore, the PC condition should lead to more relaxation than the control condition.

Psychophysiological relaxation can be operationalized by changes in the sympathetic and parasympathetic nervous systems (Ben-Shakhar 1985; Malik et al. 1996). SCL is exclusively sympathetically innervated (Boucsein 2012). HR is innervated both sympathetically (inhalation) and parasympathetically (exhalation; Malik et al. 1996). Previous studies showed that SCL was more sensitive than HRV (Anderson et al. 2017) or HR (Mostajeran et al. 2021) measures, although SCL is very susceptible to movement artifacts. In the present study, we set SCL as the primary outcome variable, but HR was also measured to provide a more detailed assessment of physiological relaxation response. Finally, it is also possible that the perceived relaxation can be derived independently from parasympathetic and sympathetic activity (Anderson et al. 2017; Lazarus 1994; Valtchanov et al. 2010). Therefore, we also collected perceived relaxation ratings (with a self-reported single-item). In sum, we proposed the following hypotheses:

1. The monoscopic 360° beach video presented via 6DoF HMD leads to a significantly larger decrease in SCL and HR and is perceived as significantly more relaxing than the same video presented via PC screen and no video (control condition).

2. The monoscopic 360° beach video presented via PC screen leads to a significantly larger decrease in SCL and HR and is perceived as significantly more relaxing than no video (control condition).

Furthermore, it has not been sufficiently clarified which variables moderate the relaxation effects of monoscopic 360° nature videos, although previous work has examined whether an immersive experience differs depending on participants’ age and gender (Felnhofer et al. 2013, 2014; Kothgassner et al. 2018; Kothgassner and Felnhofer 2014; Yu et al. 2020). Moreover, it has been discussed whether attitudes toward new technologies influence the immersive experience (Felnhofer et al. 2013; Kothgassner et al. 2013). For example, someone who is generally rather skeptical and apprehensive of new technology may be less susceptible to immersive natural environments than someone who is more open-minded toward technology (Heerink et al. 2010; Kamal et al. 2020; Kothgassner et al. 2013; Manis and Choi 2019). Therefore, we performed exploratory analyses to examine whether participants’ age, gender, and/or technology anxiety moderated the relaxation effects across the three conditions (HMD, PC, control).

Methods

This experiment was conducted in accordance with the Declaration of Helsinki. All participants provided signed, informed consent prior to their participation and were debriefed after finishing the study. Preregistration can be found at https://aspredicted.org/22b5x.pdf.

Sample

The experiment was conducted at the Bundeswehr Hospital, Hamburg. The recruitment followed an ad-hoc convenience sampling with healthy adults voluntarily participating in the experiment. A quota scheme was used to systematically ensure that the ratio between men and women was evenly distributed and that a wide age range was included. We chose a convenience sample accompanied by a quota scheme to enhance the feasibility of acquiring a larger and well-distributed sample, although the generalizability of the results is limited.

In total, 102 (41 females, 61 males) healthy adults participated. They were informed about the study via e-mail, leaflets, and announcements in team meetings. Participants with insufficient German language skills were excluded from the current study. No further inclusion or exclusion criteria were defined.

The participants’ age ranged from 19 to 62 years (M = 36.52, SD 12.63), and an average of 15.54 years (SD 12.93) of working experience was reported. In addition, 28 (27.5%) participants reported that they had graduated from middle school, 44 (43.1%) from high school, and 28 (27.5%) had a university degree. The average technology anxiety rating was 2.34 (SD 1.13, range = 1–7, see measures). When asked about prior experiences with HMDs, 56.9% reported no previous experience, while 43.1% reported at least one experience with HMDs. Table 1 summarizes the participants’ characteristics separated by gender.

Table 1 Demographic characteristics, previous HMD experience, and technology-related anxiety separated by gender

Design

Our experiment followed a counterbalanced, randomized, controlled, within-subject design. The participants completed a mental arithmetic task followed by one recovery condition. They had to complete all three mental arithmetic tasks and all three recovery conditions. To control for carryover effects (e.g., accumulation and practice effects), the order of mental arithmetic tasks and recovery conditions was randomized (Fig. 1).

Fig. 1 Overview of the study design. The order of the mental arithmetic tasks (Phases 2, 4, 6) was randomized. Additionally, the order of the recovery phases (Phases 3, 5, 7) was randomized
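The randomization described in this section could be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function name `draw_session_order`, the phase labels, and the seeded generator are hypothetical, and the source states only that both orders were randomized.

```python
import random

# Labels are illustrative; the three subtraction tasks and the three
# recovery conditions are taken from the Procedure section.
TASKS = ["1022-13", "1687-19", "2043-17"]
CONDITIONS = ["HMD", "PC", "control"]


def draw_session_order(rng):
    """Return one participant's seven phases: baseline, then task/recovery pairs."""
    tasks = rng.sample(TASKS, k=len(TASKS))                  # random task order
    conditions = rng.sample(CONDITIONS, k=len(CONDITIONS))   # random recovery order
    phases = ["baseline"]                                    # Phase 1
    for task, condition in zip(tasks, conditions):
        phases.append(f"task:{task}")                        # Phases 2, 4, 6
        phases.append(f"recovery:{condition}")               # Phases 3, 5, 7
    return phases


rng = random.Random(2021)  # seed chosen arbitrarily (assumption) for reproducibility
for phase_number, phase in enumerate(draw_session_order(rng), start=1):
    print(phase_number, phase)
```

Simple randomization of this kind does not by itself guarantee that every order occurs equally often. A fully counterbalanced scheme would instead assign the six possible condition orders in equal proportions.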

Measures

Skin conductance level and heart rate

We measured SCL and HR as psychophysiological indicators of relaxation (Ben-Shakhar 1985; Khazan 2013; Malik et al. 1996; Schwartz and Andrasik 2017). Both parameters quantify changes in the autonomic nervous system (ANS) and have been used in previous studies; thus, they are considered as valid, practicable, and comprehensive measures of ANS activity (Anderson et al. 2017; Blum et al. 2019; Calogiuri et al. 2018; Liszio et al. 2018; Mostajeran et al. 2021; Rockstroh et al. 2019, 2020).

We used the NeXus Mark II System (Mind Media B. V., Netherlands) and Biotrace+ software (version 20.13) to record SCL and HR. Two Ag/AgCl electrodes were attached to the index and ring fingers of the non-dominant hand and measured SCL in microsiemens (µS). HR was measured in beats per minute (bpm) by attaching a blood volume pulse sensor to the index finger of the dominant hand. Both SCL and HR were recorded at 32 samples per second (SPS).

Participants were instructed to sit during the entire experiment. The NeXus Mark II was set up on a side table next to the chair. During the test, participants could choose whether to place their hands on the chair’s arms or on their thighs. After finding the most comfortable position, they were instructed to reduce movements with their arms and hands as much as possible to avoid motion artifacts.

Perceived relaxation

The within-subject design increased statistical power but extended the procedure by about one hour. Furthermore, the within-subject design makes the use of validated scales to assess relaxation/mood changes (e.g., Positive and Negative Affect Schedule, PANAS, composed of 20 items, Watson et al. 1988; or Scale of Positive and Negative Experience, SPANE, composed of 12 items, Diener et al. 2010) challenging. Each participant would have to complete either 20 or 12 items for each condition, probably leading to fatigue or boredom effects and further prolonging the already rather long experimental procedure. Therefore, to counteract fatigue or boredom effects, we used a single item, designed to be clear and transparent, to evaluate perceived relaxation. However, single-item reliability is debatable, and the results should be interpreted with caution (Egleston et al. 2011; Loo 2002; Postmes et al. 2013).

Perceived relaxation was assessed on a 7-point Likert scale (1 = “strongly disagree,” 7 = “strongly agree”) in each recovery condition (control condition: “To me, not seeing any video (PC or VR) was relaxing;” PC condition: “To me, the video on the PC monitor was relaxing;” HMD condition: “To me, the VR video was relaxing”).

Technology anxiety

Participants’ technology-related anxiety was measured with the Technology Usage Inventory (Kothgassner et al. 2013). Based on four items, the technology anxiety subscale assesses the level of stress and anxiety about technological devices in general and about making mistakes when using them. Anxiety was assessed on a 7-point Likert scale (1 = “strongly disagree,” 7 = “strongly agree”). In this study, the internal consistency was α = 0.79, indicating good reliability (Cronbach 1951).
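For readers unfamiliar with the internal consistency coefficient reported above, the following minimal sketch computes Cronbach's α from item-level ratings using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the sum score). The ratings are invented for illustration and are not study data; the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one rating per participant."""
    k = len(item_scores)
    # Sum score per participant across all items.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up 7-point ratings for four items and six participants (not study data).
items = [
    [1, 2, 2, 4, 5, 3],
    [2, 2, 3, 4, 6, 3],
    [1, 3, 2, 5, 5, 2],
    [2, 1, 3, 4, 6, 4],
]
print(round(cronbach_alpha(items), 2))
```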

Technological equipment

For the more immersive condition, a 6DoF HMD with an integrated head-tracking system (HTC Vive, https://www.vive.com/de/product/) was connected to a computer (Windows 7, Intel(R) Core(TM) i7-3820 CPU, 64-bit, NVIDIA GeForce GTX 660, 8 GB RAM). A typical desktop PC monitor (EV2750) was connected to the same computer for the less immersive condition. On-ear headphones (JBL JR300) were connected to the 6DoF HMD or PC monitor, depending on which condition was tested. Thus, the sound quality and audio substitution were kept constant across both conditions.

Based on Anderson et al. (2017), we chose a monoscopic 360° beach video (https://www.sphaeresvr.com/experience/vr-nature/dream-beach-mallorca) showing a sheltered cove looking toward the ocean, accompanied with ocean sounds (see Fig. 2 for a screenshot; permission granted from Atmosphaeres). The video lasted five minutes and had no scene changes. It was played once via the 6DoF HMD and once via the PC screen, depending on the condition.

Fig. 2 A screenshot from “Dream Beach Mallorca” (permission granted from Atmosphaeres)

Procedure

After signing the informed consent, participants were connected to the NeXus Mark II, and the two-minute baseline SCL and HR measurements were obtained. Then, participants went through alternating phases of stressors and recovery until all phases were completed.

Following the example of the Trier Social Stress Test (TSST; Kirschbaum et al. 1993), all participants had to solve three arithmetic tasks, each of which required them to mentally subtract a number (e.g., 13) from a larger one in successive steps (1022-13, 1687-19, and 2043-17). Participants subtracted the numbers out loud in front of the experimenter with a three-minute time limit for each task. The experimenter increased participants’ stress level by continuously watching them with a neutral facial expression, noting their answers on a sheet, instructing participants to start over if they made a mistake, and calling out time markers if the participants paused.
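To check answers during such a task, an experimenter needs the correct sequence at hand. The sketch below generates an answer key for the three start/step pairs named above. The helper `subtraction_key` and the choice of how many answers to list are assumptions made for illustration; the source does not describe how answers were verified.

```python
def subtraction_key(start, step, n_answers=10):
    """Return the first n_answers correct responses for a serial subtraction task."""
    answers = []
    value = start
    for _ in range(n_answers):
        value -= step
        if value < 0:  # stop before the sequence turns negative
            break
        answers.append(value)
    return answers


# Start values and subtrahends as reported in the Procedure section.
for start, step in [(1022, 13), (1687, 19), (2043, 17)]:
    print(start, step, subtraction_key(start, step, n_answers=5))
```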

During one recovery phase, the subjects watched the 360° beach video via a 6DoF HMD (HMD condition). During the others, they watched the same video using a standard computer monitor (PC condition) or completed a control condition without a beach video.

Participants could control their perspective via head movements in the HMD condition. In the PC condition, participants could set their perspective with a computer mouse before watching the video. Afterward, no further interactions with the computer mouse were allowed, to avoid artifacts in recording HR.

In the control condition, participants were instructed to sit quietly for five minutes without listening to any sounds. Since participants were seated in front of a partition wall, the experimenter was not visible to them during any condition. The computer screen was turned off during the control condition.

After the last recovery condition, the SCL and HR recordings ended. Next, a questionnaire was handed out to assess socio-demographic information, technology usage (Technology Usage Inventory), and self-reported relaxation for the three recovery conditions.

Preprocessing SCL and HR data

The raw SCL and HR data were preprocessed in the following four steps. In the first step, we averaged the 32 SPS to one data point per second. In the second step, we removed all time intervals that did not directly belong to one of the seven experimental phases (e.g., time intervals during which the participants put on the 6DoF HMD). In the third step, this dataset was z-transformed to minimize interindividual differences in SCL and HR (Ben-Shakhar 1985). In the fourth step, we calculated 60 s intervals for SCL and HR by averaging the 60 data points (Birkett 2011). As a result, we arrived at five measurements of SCL and HR (min 1, min 2, min 3, min 4, and min 5) for each recovery phase.
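The four preprocessing steps can be sketched in a few lines. This is a hedged illustration, not the authors' pipeline: the function names, the dictionary layout, the synthetic signals, and the use of the population standard deviation within each participant are all assumptions, since the source does not specify these details.

```python
import math
from statistics import fmean, pstdev

SAMPLES_PER_SECOND = 32    # recording rate reported in the Measures section
SECONDS_PER_INTERVAL = 60  # length of one analysis interval


def block_means(values, block_size):
    """Average consecutive, non-overlapping blocks; an incomplete tail is dropped."""
    n_blocks = len(values) // block_size
    return [fmean(values[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def preprocess(raw_by_phase):
    """raw_by_phase: dict of phase name -> raw samples at 32 SPS for one participant.

    Step 2 (dropping intervals outside the experimental phases) is assumed to
    have happened already, because only phase data are passed in.
    """
    # Step 1: one data point per second.
    per_second = {p: block_means(x, SAMPLES_PER_SECOND) for p, x in raw_by_phase.items()}
    # Step 3: z-transform over all retained seconds of this participant.
    pooled = [v for series in per_second.values() for v in series]
    mean, sd = fmean(pooled), pstdev(pooled)
    z_scored = {p: [(v - mean) / sd for v in s] for p, s in per_second.items()}
    # Step 4: 60 s means (min 1 ... min 5 for a five-minute recovery phase).
    return {p: block_means(s, SECONDS_PER_INTERVAL) for p, s in z_scored.items()}


# Synthetic signals (not study data): a 3 min task and a 5 min recovery phase.
raw = {
    "task_1": [5.0 + 0.01 * math.sin(i / 50) for i in range(32 * 180)],
    "recovery_HMD": [4.0 - 0.0001 * i for i in range(32 * 300)],
}
minutes = preprocess(raw)
print(len(minutes["recovery_HMD"]))  # five one-minute values for the recovery phase
```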

Results

Data were analyzed using IBM SPSS Version 26 (SPSS Inc. Chicago, USA). To analyze differences in SCL, we conducted a two-factorial repeated-measure ANOVA (rmANOVA) with time (min1, min2, min3, min4, min5) and recovery conditions (HMD, PC, control). In case of an interaction effect of time × condition, subsequent contrast analysis with t-tests for dependent samples and a Bonferroni correction were computed to examine whether the reduction difference in SCL (min1–min5) was significantly higher in the HMD than in the PC and control condition. Furthermore, contrasts examined whether the reduction difference was significantly higher for the PC condition than for the control condition. Exploratory three-way interactions were conducted to determine whether the reduction of SCLs was dependent on participants’ gender, age, and technology anxiety. We applied the same procedure for HR.
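The planned contrasts can be illustrated with a short sketch. It computes the reduction difference (min 1 minus min 5) per participant, the paired t statistic, and an effect size for one contrast. The data are invented, the function names are hypothetical, and the effect size shown (mean difference divided by the standard deviation of the differences, often written d_z) is only one common convention; the source does not state which variant of d was used. Dividing α = 0.05 by three contrasts is likewise an assumption, chosen because it agrees with the α = 0.017 reported for the sensitivity analysis.

```python
from math import sqrt
from statistics import fmean, stdev


def reduction(minute_means):
    """Reduction difference: first minute minus last minute of a recovery phase."""
    return minute_means[0] - minute_means[-1]


def paired_contrast(reduction_a, reduction_b):
    """Paired t statistic, degrees of freedom, and d_z for two within-subject conditions."""
    diffs = [a - b for a, b in zip(reduction_a, reduction_b)]
    n = len(diffs)
    mean_diff, sd_diff = fmean(diffs), stdev(diffs)
    t = mean_diff / (sd_diff / sqrt(n))
    d_z = mean_diff / sd_diff  # one common effect-size convention for paired data
    return t, n - 1, d_z


# Hypothetical z-scored minute means (min 1 ... min 5) for four participants.
hmd = [[0.6, 0.1, -0.2, -0.4, -0.5], [0.4, 0.0, -0.3, -0.4, -0.6],
       [0.7, 0.3, 0.0, -0.2, -0.2], [0.5, 0.2, -0.1, -0.3, -0.6]]
control = [[0.5, 0.2, 0.0, -0.1, -0.2], [0.4, 0.1, -0.1, -0.2, -0.3],
           [0.6, 0.4, 0.2, 0.0, -0.1], [0.5, 0.3, 0.1, 0.0, -0.3]]

t, df, d_z = paired_contrast([reduction(m) for m in hmd],
                             [reduction(m) for m in control])
alpha_per_contrast = 0.05 / 3  # Bonferroni correction for three contrasts (assumption)
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}, alpha per contrast = {alpha_per_contrast:.3f}")
```

The sketch stops at the test statistic. A p-value would be obtained from the t distribution with the stated degrees of freedom, for example in a statistics package, and compared with the corrected α.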

To assess differences in perceived relaxation, we conducted a one-way rmANOVA with the three relaxation ratings as a within-subject factor. In case of a main effect, subsequent contrast analysis followed the same procedure as mentioned above. Exploratory two-way interactions were calculated to determine whether the relaxation ratings were dependent on participants’ gender, age, and technology anxiety.

For each independent variable, a sensitivity analysis for dependent-samples t-tests (one-sided, α = 0.017, power 1 − β = 0.80, N = 102) indicated that we could detect differences if the effect size was d ≥ 0.29 (0.20 = small effect, 0.50 = medium effect, 0.80 = large effect; Cohen 2013). Sensitivity analysis was performed with the program G*Power (Faul et al. 2007).
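The order of magnitude of this sensitivity analysis can be checked with a normal approximation, in which the smallest detectable effect for a one-sided paired test is roughly (z for 1 − α plus z for the power) divided by √N. The sketch below is an approximation under that assumption and is not a reimplementation of G*Power, which works with the noncentral t distribution. Its result should therefore land close to, but not necessarily exactly on, the reported value.

```python
from math import sqrt
from statistics import NormalDist


def minimal_detectable_d(n, alpha, power):
    """Normal approximation to the smallest effect a one-sided paired test can detect."""
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha) + z.inv_cdf(power)) / sqrt(n)


# Inputs as reported in the text: N = 102, one-sided alpha = 0.017, power = 0.80.
print(round(minimal_detectable_d(n=102, alpha=0.017, power=0.80), 3))
```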

All reported results were corrected by the Greenhouse–Geisser procedure whenever assumptions of sphericity were violated.
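As a rough illustration of what this correction does, the sketch below estimates the Greenhouse-Geisser ε from a covariance matrix of the repeated measures and scales the degrees of freedom accordingly. The covariance matrices are invented, the function names are hypothetical, and n = 101 is inferred from the reported t(100) values rather than stated in the text. SPSS performs the equivalent computation internally.

```python
def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon from a symmetric k x k covariance matrix."""
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    grand_mean = sum(row_means) / k
    # Double-center the covariance matrix.
    centered = [[cov[i][j] - row_means[i] - row_means[j] + grand_mean
                 for j in range(k)] for i in range(k)]
    trace = sum(centered[i][i] for i in range(k))
    sum_of_squares = sum(value ** 2 for row in centered for value in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


def corrected_df(epsilon, k, n):
    """Corrected numerator and denominator df for a one-way within-subject effect."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


# Invented covariance matrices for three repeated measures (not study data).
spherical = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]      # compound symmetry
non_spherical = [[4.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

for name, cov in [("spherical", spherical), ("non-spherical", non_spherical)]:
    eps = greenhouse_geisser_epsilon(cov)
    print(name, round(eps, 3), corrected_df(eps, k=3, n=101))
```

With ε = 1, three conditions, and 101 participants, the degrees of freedom remain at the uncorrected (2, 200). Any violation of sphericity pushes ε below 1 and shrinks both values proportionally, which is why corrected F-tests carry fractional degrees of freedom.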

Manipulation check

For the manipulation check, we averaged the SCL and HR for all the three mental arithmetic tasks to one SCL and HR value. The same procedure was performed for all three recovery conditions. Dependent t-tests revealed that the SCL was significantly higher during the mental arithmetic tasks (M = 0.74, SD 0.31) than during the recovery condition (M = − 0.20, SD 0.23), t(100) = 17.69, p < 0.001, d = 1.74, or during baseline (M = − 1.82, SD 0.57), t(100) = 48.87, p < 0.001, d = 5.14. Similarly, the HR was significantly higher during the mental arithmetic tasks (M = 0.59, SD 0.26) than during the recovery condition (M = − 0.32, SD 0.14), t(100) = 23.08, p < 0.001, d = 2.19, or during baseline (M = − 0.26, SD 0.32), t(100) = 16.88, p < 0.001, d = 1.67. This suggests that the mental arithmetic tasks did, in fact, cause psychophysiological activation.

Hypotheses testing for SCL

The two-factorial rmANOVA revealed a significant main effect of time, F(1.35, 135.29) = 286.03, p < 0.001, ηp² = 0.741; no significant main effect of recovery condition, F(2, 200) = 0.04, p = 0.958, ηp² < 0.001; and a significant interaction of time × recovery condition, F(3.84, 384.83) = 3.69, p = 0.006, ηp² = 0.036 (see Fig. 3). As expected, contrasts revealed that the reduction difference of the HMD condition (M = 0.97, SD 0.64) was significantly higher than that in the control condition (M = 0.79, SD 0.69), t(100) = 2.43, p = 0.008, d = 0.24. In addition, results revealed that the reduction difference of the PC condition (M = 0.98, SD 0.57) was significantly higher than that in the control condition, t(100) = 2.70, p = 0.004, d = 0.28. Contrary to expectations, there was no significant difference between the HMD and PC conditions, t(100) = 0.08, p = 0.465, d = 0.017.

Fig. 3 Z-transformed skin conductance level per minute, separated by condition

Exploratory moderation analyses for SCL

With regard to the exploratory questions, we found no significant three-way interactions for gender, F(3.89, 377.51) = 0.76, p = 0.544, ηp2 = 0.008, age, F(3.89, 377.51) = 0.79, p = 0.528, ηp2 = 0.008, or technology anxiety, F(3.89, 377.51) = 1.25, p = 0.289, ηp2 = 0.013.

Hypotheses testing for HR

The two-factorial rmANOVA revealed a significant main effect of time, F(2.85, 285.75) = 5.03, p = 0.002, ηp2 = 0.048; no significant main effect of recovery condition, F(2, 200) = 1.20, p = 0.303, ηp2 = 0.012; and no significant interaction of time × recovery condition, F(6.71, 671.17) = 0.60, p = 0.744, ηp2 = 0.006. Contrary to expectations, we did not find significant differences among the HMD, PC, and control conditions. Since there was no significant interaction or main effect for the recovery condition, the contrast and exploratory analyses were not performed.

Hypotheses testing for relaxation ratings

The one-way rmANOVA revealed a significant main effect of recovery condition, F(1.82, 182.33) = 30.53, p < 0.001, ηp2 = 0.234. As expected, contrasts revealed that the HMD condition (M = 5.40, SD 1.42) was perceived as significantly more relaxing than the PC condition (M = 4.07, SD 1.46), t(100) = 6.83, p < 0.001, d = 0.62, and the control condition (M = 3.98, SD 1.64), t(100) = 6.12, p < 0.001, d = 0.60. Contrary to expectations, contrasts found no significant difference between the PC and control conditions, t(100) = 0.54, p = 0.293, d = 0.036.

Exploratory moderation analyses for relaxation ratings

With regard to the exploratory questions, we found no significant two-way interactions for gender, F(1.79, 174.34) = 0.10, p = 0.878, ηp2 = 0.001, age, F(1.79, 174.34) = 2.03, p = 0.134, ηp2 = 0.021, or technology anxiety, F(1.79, 174.34) = 0.19, p = 0.973, ηp2 < 0.001.

Discussion

Summary of main findings and relation to previous research

The overarching goal of the present study was to examine whether a monoscopic 360° beach video presented via an HMD was significantly more relaxing after an acute stressor than the same video presented via a PC monitor or no video (control condition). By adding a control condition with no video, a valid acute stressor before the recovery conditions, and psychophysiological measures of relaxation (SCL and HR), the study complements previous research.

In line with the hypotheses, results showed that the HMD and PC conditions led to significantly larger decreases in SCL than the control condition. Contrary to our expectations, no significant differences were found between the HMD and PC conditions. Furthermore, the HR analyses showed no significant difference between the recovery conditions, which may be due to a floor effect. Figure 4 illustrates that the average z-transformed bpm reached the baseline level in the first minute and remained there. Moreover, the results showed that the monoscopic 360° beach footage presented via 6DoF HMD was perceived as significantly more relaxing than the PC and control conditions. However, contrary to expectations, the PC condition was not perceived as significantly more relaxing than the control condition.

Fig. 4 Z-transformed beats per minute for the baseline and all recovery conditions over time

Exploratory analyses revealed that participants’ age, gender, and technology anxiety did not significantly influence the SCL and perceived relaxation results. Nevertheless, women generally reported more technology anxiety, so we reran our analyses and included a gender × technology anxiety interaction term for the SCL and perceived relaxation analyses (see Table 2). There were no significant differences. This lack of differences could be explained by the generally low level of technology anxiety in the present sample.

Table 2 The influence of the interaction term gender × technology anxiety and previous HMD experience regarding SCL and perceived relaxation

Our main results can be discussed and interpreted in light of those of Yeo et al. (2020). Although potential confounding effects cannot be ruled out (e.g., different video footage, different documentary narratives, a mere-exposure effect of Blue Planet II), Yeo et al. (2020) found no significant differences between the monoscopic 360° underwater video presented via 6DoF HMD and an underwater video presented via a TV screen in terms of self-reported mood changes. Psychophysiological relaxation effects were not measured. Our analyses showed no significant differences between the HMD and PC conditions regarding SCL and HR. In line with Yeo et al. (2020), the results indicate that presenting a monoscopic 360° nature video with more immersive hardware (6DoF HMD) rather than less immersive hardware (PC/TV screen) offers only a slight relaxation advantage. This raises the question of whether the characteristics of monoscopic 360° videos (2D, no active interactions with the virtual environments, and reducing 6DoF to 3DoF) may have made the different hardware conditions, in fact, more similar than expected.

However, our results also showed that the HMD condition was perceived as significantly more relaxing than the PC and control conditions. There are several explanations for this finding. Most likely, our results differ from Yeo et al.’s (2020) due to the differences in study designs. We chose a within-subject design with good statistical power to address the shortcomings of previous research with small sample sizes. However, we cannot rule out a halo or “jaw-dropping” effect in the HMD condition, which may have biased the self-reported relaxation ratings. To address this concern to some degree, we reran our analyses and included previous HMD experience as an independent variable (see Table 2). There were no significant differences. Nevertheless, a potential halo effect can ultimately only be controlled in a between-subject design, as conducted by Yeo et al. (2020).

Interestingly, incongruence between perceived relaxation and psychophysiological response is not uncommon and is not necessarily a contradiction (Anderson et al. 2017; Mostajeran et al. 2021). Despite methodological limitations, perceived relaxation can be derived independently from ANS arousal (Lazarus 1994; Scherer and Moors 2019). Based on the approach to social perception (Bruner and Postman 1948), it is possible that perceived relaxation is not a direct reflection of ANS activity. Instead, it is a multifactorial process in which participants compare their expectations with relaxation experiences gained in the experiment. Therefore, the monoscopic 360° beach video presented via HMD may be more congruent with participants’ expectations of a relaxation effect, which is why it was retrospectively evaluated as more relaxing.

Limitations and alternative explanations

Regarding SCL results, Fig. 3 illustrates an almost identical decrease in all three recovery conditions until the third minute. This decrease is most likely explained by a general habituation effect, because participants were no longer doing math tasks. From minute 4 onward, the SCL decrease was larger for the HMD and PC conditions than for the control condition. A fatigue/boredom effect probably occurred over time in the control condition and was prevented by a general distraction effect in both the HMD and PC conditions.

A general distraction effect can be attributed to nature sounds (Alvarsson et al. 2010), which existed in both conditions, and is in line with Conditioned Restoration Theory (CRT; Egner et al. 2020). CRT postulates a two-step model for the relaxation effects of natural environments. The first step involves associating nature with relaxation, and the second step consists of experiencing relaxation when presented with an associated stimulus (e.g., auditory stimuli; Egner et al. 2020). Therefore, it is possible that auditory rather than visual stimuli influenced the habituation speed of SCL. Thus, future studies should systematically control the presentation of natural sounds.

Moreover, it is possible that differences between the HMD and PC monitor conditions would have required more time to affect the results. One argument to support this alternative is that SCL is exclusively sympathetically innervated, and as such it constitutes a direct indicator for physiological arousal (Boucsein 2012). Parasympathetic antagonisms are excluded for SCL (Boucsein 2012; Schwartz and Andrasik 2017). Therefore, it is likely that the habituation of SCL requires more time. Furthermore, this alternative explanation is also relevant to the question of whether the more immersive features of the HMD condition (visual substitution, 360° view, and changing the viewpoint through head movements) can trigger more attentional resources or capture them over a longer time than the PC condition, which should influence the physiological relaxation response, either in strength or over time. Since our SCL results did not find significant differences between the HMD and PC monitor conditions within the first five minutes and none of the recovery conditions reached baseline level (see Fig. 5), we consider the assumption that the more immersive features of the HMD can capture attentional resources over a longer time to be promising. Nevertheless, our design cannot verify this assumption. Therefore, further research is needed to confirm it empirically by extending the presentation time of a monoscopic 360° nature video via an HMD or a normal screen.

Fig. 5 Z-transformed skin conductance level for the baseline and all recovery conditions over time

In contrast to our expectations, the HR analyses showed no significant difference between the recovery conditions, which may be due to a floor effect. Figure 4 illustrates that the average z-transformed bpm reached the baseline level at the first minute and remained there. Previous studies reported the average HR over several minutes or throughout the entire intervention but did not report the minute-by-minute HR (Anderson et al. 2017; Annerstedt et al. 2013; Blum et al. 2019; Rockstroh et al. 2019; Wang et al. 2019). Therefore, this fast habituation was difficult to anticipate.

Physiologically, HR can be innervated both sympathetically (inspiration) and parasympathetically (expiration; Blum et al. 2019; Schwartz and Andrasik 2017; Wang et al. 2018). During the mental arithmetic task, the participants had to verbalize the results. This verbalization may have caused a change in the breathing cycle, increasing the breathing frequency and HR. In the subsequent recovery phases, the participants did not speak, and the breathing rhythm normalized to the baseline level within one minute (Shaffer and Ginsberg 2017; Tursky et al. 1969). Thus, the floor effect can be explained by physiological mechanisms, which may have prevented the detection of further relaxing effects between the recovery conditions.

In addition, other psychophysiological parameters with higher temporal resolution might have shown differences between the recovery conditions, such as HRV (Anderson et al. 2017; Liszio et al. 2018) or electroencephalography (EEG; Bilgin et al. 2019; Vaquero-Blasco et al. 2021). In contrast to HR (beats per minute), HRV measures specific changes in time (or variability) between the R-R intervals in milliseconds (Malik et al. 1996). Thus, HRV may be more likely to reveal differences in shorter experimental phases due to its higher temporal resolution. In addition, HRV offers several analysis options to examine ANS activity in more detail (e.g., low frequency, LF; high frequency, HF; the ratio of LF/HF; root mean square of successive differences, RMSSD; and standard deviation of successive differences, SDSD). Empirically, results are mixed. For instance, Anderson et al. (2017) could not detect clear differences with HRV (operationalized by LF, HF, and LF/HF). In contrast, Liszio et al. (2018) were able to demonstrate that HRV (operationalized by SDSD) was significantly higher with TheBlu presented via 6DoF HMD compared to a video recording of TheBlu presented via PC screen and a control condition. Thus, it is possible that HRV would have detected differences in the present study.
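To make the two time-domain HRV indices concrete, the following sketch computes RMSSD and SDSD from a short series of R-R intervals. The interval values and the function names `rmssd` and `sdsd` are hypothetical. HRV was not analyzed in the present study, and the use of the sample standard deviation for SDSD is an assumption of this sketch.

```python
from math import sqrt
from statistics import stdev


def successive_differences(rr_ms):
    """Differences between adjacent R-R intervals (in milliseconds)."""
    return [b - a for a, b in zip(rr_ms, rr_ms[1:])]


def rmssd(rr_ms):
    """Root mean square of successive R-R interval differences."""
    diffs = successive_differences(rr_ms)
    return sqrt(sum(d * d for d in diffs) / len(diffs))


def sdsd(rr_ms):
    """Standard deviation (sample, n - 1) of successive R-R differences."""
    return stdev(successive_differences(rr_ms))


# Hypothetical R-R intervals in milliseconds (not data from the study).
rr = [800, 810, 790, 805]
print(f"RMSSD = {rmssd(rr):.2f} ms, SDSD = {sdsd(rr):.2f} ms")
```

Because both indices are built from beat-to-beat differences rather than a per-minute count, they can change within a recording window in which the mean HR stays flat.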

Furthermore, the present study focused exclusively on the peripheral nervous system. It remains unclear whether changes in the central nervous system would have shown differences between the recovery conditions (Amores et al. 2018; Bilgin et al. 2019). Therefore, future studies should explore EEG activity during exposure to a virtual natural environment. Recent research demonstrated that EEG can be used to quantify physical relaxation responses during a monoscopic 360° nature video presented via 6DoF HMD (Vaquero-Blasco et al. 2021). Moreover, the authors argued that EEG is less susceptible to artifacts compared to SCL and HR measures and benefits from high temporal resolution. Therefore, future studies should also use EEG measures to operationalize psychophysiological relaxation.

In addition, the two-minute baseline interval may have been too short to detect a valid baseline level. Although the manipulation check demonstrated clear changes between baseline, acute stressor, and recovery conditions and although previous studies have also chosen relatively short baseline intervals (e.g., 2 min baseline interval in Anderson et al. 2017), 10–15-min intervals are recommended for baseline measuring (Boucsein et al. 2012).

Furthermore, head movements in the HMD condition might have diminished the effect of the immersive presentation on arousal reduction because this condition included greater body movements, which influence both SCL and HR. Similarly, it is also possible that a small portion of the current sample did not exhibit significant skin conductance changes, e.g., due to certain low-threshold medications such as antihistamines. However, we did not systematically collect data on participants’ drug treatments. Hence, we cannot rule out that this may have biased the psychophysiological results. Therefore, future studies should systematically collect information on participants’ drug treatments.

Conclusions

In essence, our study found that: (1) watching a monoscopic 360° beach video via 6DoF HMD or PC monitor after being experimentally stressed was more relaxing, according to SCL measures, than doing nothing; (2) no significant difference was found between the HMD and PC conditions regarding SCL; (3) there were no significant differences among the three recovery conditions regarding HR, probably caused by a floor effect; (4) the HMD condition was nevertheless perceived as significantly more relaxing (single-item) than the PC and control conditions; (5) the PC condition was not perceived as significantly more relaxing than the control condition; and (6) exploratory analyses revealed that the findings were not influenced by participants’ gender, age, or technology anxiety.

Monoscopic 360° videos do not comprehensively use all the immersive characteristics of contemporary 6DoF HMDs (no stereoscopy, no active interactions, 6DoF is reduced to 3DoF). Therefore, the degree of immersion is limited when presenting monoscopic 360° videos via 6DoF HMDs, making the HMD and PC conditions more similar than expected. Similar results were found by Yeo et al. (2020). Therefore, future studies should systematically examine how much immersion manipulation (hardware and software) is needed for the presentation of monoscopic 360° nature videos to provide the best relaxation effects.