Background

A general shift to digital work over the last decades and recent events such as the COVID-19 pandemic have contributed to the increasing prevalence of stress in our society [1,2,3,4,5,6]. While acute stress is mostly innocuous, as it reflects an adaptive response to demands and challenges [7, 8], repeated or longer periods of stress (i.e., chronic stress) have been shown to harm physical and mental health [9,10,11]. Chronic stress is associated with anxiety disorders [12], depression [13], vital exhaustion [14], pain and fatigue syndromes [9], and cardiovascular problems [15], including hypertension [9], among other symptoms and disorders, and negatively affects an individual’s psychological well-being and quality of life [16, 17]. The alarming rise of stress and associated symptoms and disorders calls for effective prevention and treatment options. One increasingly recognized stress management method is heart rate variability biofeedback (HRV-BF) [18, 19]. Based on the Association for Applied Psychophysiology and Biofeedback’s general definition of biofeedback [20], HRV-BF can be defined as a process of teaching individuals to regulate cardiac activity to improve health-related measures. Immediate and precise feedback on users’ cardiac activity, measured using electrocardiography or photoplethysmography devices, combined with changes in thoughts, emotions, and behavior, promotes the desired physiological and psychological improvements. With practice, these changes can become lasting even without the ongoing use of a device [20]. HRV-BF has been shown to reduce chronic stress and to improve various stress-related physiological and psychological outcomes, such as hypertension, cardiovascular disease, asthma, pain, sleep disturbances, cognitive performance, depression, anxiety, and emotional states more generally (for reviews and meta-analyses, see [19, 21,22,23,24,25,26]). In their meta-analysis on the effectiveness of HRV-BF, Lehrer et al. [22] found medium effect sizes for reducing anxiety and depression and small effect sizes for decreasing perceived stress and improving cardiovascular measures in studies conducted up until 2018. Pizzoli et al. [23] found even larger effect sizes for reducing depressive symptoms, a degree of effectiveness comparable to cognitive behavioral therapy. They note that these differences in meta-analytic effect sizes compared to Lehrer et al. [22] might be due to the inclusion of more recent studies (2019-2020) and the use of newer biofeedback (BF) devices that provide more sophisticated visual feedback, making HRV-BF more effective [23].

Heart rate variability (HRV) reflects the variability of time intervals between two heartbeats [27]. HRV at rest indicates the balance of the parasympathetic and sympathetic branches of the autonomic nervous system (ANS) [28]. High resting HRV, indicating high parasympathetic control, has been linked to psychological resilience and the capacity to adapt to changing demands, while low resting HRV, indicating low parasympathetic control, has been linked to chronic stress, psychopathology and increased mortality [28,29,30,31,32,33,34]. During HRV-BF, HRV is intentionally maximized through breathing. These increases in HRV are associated with increased vagal afferent transmission to prefrontal regions involved in executive control [35]. When practiced over a longer period, HRV-BF is believed to ameliorate heart-brain connectivity and improve self-regulation, allowing individuals to better cope with stressful situations [30, 36,37,38,39,40]. Indeed, studies have shown that HRV-BF interventions lead to increases in resting HRV [37, 41,42,43], as well as a buffered decrease in HRV during stress and a greater increase of HRV during recovery compared to control groups [41, 42, 44]. Generally, individuals with higher HRV have also been observed to recover more swiftly from stress regarding cortisol levels [45]. Cortisol indicates hypothalamic-pituitary-adrenal (HPA) axis re/activity and affects several bodily processes important to overcoming stressful situations [46]. The ANS and the HPA axis are anatomically and functionally connected, whereby the ANS has been found to moderate the reactivity of the HPA axis [45, 47,48,49,50]. According to Porges et al. [29], the parasympathetic nervous system regulates an individual’s response to stress by modulating the activity of the sympathetic nervous system and HPA axis.

The intentional increases in HRV during HRV-BF are brought about through slow and paced breathing at resonance frequency (RF) [38, 39]. RF can be identified as a high-amplitude peak in the low-frequency (LF; 0.04-0.15 Hz) band of the power spectrum of HRV [51]. Resonant breathing stimulates the baroreflex and maximizes respiratory sinus arrhythmia [51]. The RF of an individual lies between 4.5 and 6.5 breaths per minute and is determined before HRV-BF training [38, 52]. Studies have shown that resonant breathing leads to higher increases in HRV than breathing at an average rate of 6 breaths per minute [53,54,55]. Initially, clients are often guided by an auditory or visual pacer set to their RF. In time, clients learn to maximize their HRV based on the provided visual feedback of, for example, heart rate (HR) fluctuations or shifts in the power spectrum of HRV. Ultimately, the goal of HRV-BF is to increase parasympathetic and dampen sympathetic activation without the continuous help of supervision, BF, or technological devices [30, 38, 56]. Most HRV-BF intervention programs include daily resonant breathing of at least 10 minutes to reinforce the effects on the baroreflex, parasympathetic nervous system, and emotion-modulating regions of the brain [22,23,24], and to support the transfer of skills into everyday life [56]. Typically, these HRV-BF intervention programs have been based on simple HR monitors (e.g., cardio tachometer) or other HR tracing devices on computer screens [22, 57].

More recently, the use of virtual reality (VR) technology for BF has been proposed as a novel form of delivery for treating different symptoms and disorders such as pain, post-traumatic stress disorder, or anxiety [58,59,60], but also for stress management purposes [57, 61]. VR has been investigated because it enables high levels of agency and presence for users in virtual environments by eliminating external distractions and increasing ease of use, especially in combination with head-mounted displays (HMDs) [62]. According to the Cognitive Affective Model of Immersive Learning (CAMIL; [62]), these two main affordances of VR with HMDs (i.e., high presence and agency) positively influence factors such as motivation, self-efficacy and self-regulation, which, in turn, foster the effectiveness of learning and training in general [62]. Similarly defined constructs, such as motivation, or related constructs, such as autonomy, mastery and learnability, have also been identified as key properties of effective biofeedback training by the psychoengineering paradigm [63]. In line with CAMIL [62], initial empirical evidence indicates that using VR with an HMD for HRV-BF might be especially effective. Studies investigating the feasibility of VR-supported HRV-BF for stress management found short-term improvements in stress-related measures and high levels of motivation, involvement, and self-efficacy [53, 61, 64,65,66,67,68,69,70]. Blum et al. [65] found that using an HMD for an HRV-BF session led to significantly higher levels of mindfulness and attentional resources and to less mind-wandering than using a two-dimensional screen. In our work [53], we compared standardized paced breathing to RF HRV-BF delivered both with an HMD and a desktop screen in a single-session study. Results showed that using HMDs for HRV-BF and paced breathing reduced psychological stress as much as using desktop screens. Moreover, increases in the coherence ratio (CR; a measure derived from LF) and immersion adaptation were significantly higher while performing HRV-BF and paced breathing in VR than when using a desktop screen. Independent of technology, HRV-BF led to higher increases in LF power and was associated with higher degrees of presence than paced breathing. We also found that participants felt competent and autonomously motivated to adopt both technologies and techniques. According to Self-Determination Theory (SDT) [71], satisfying these two basic psychological needs, an individual’s competence and autonomy, is important for overall health and well-being, both on the level of technology and in life in general [71, 72]. Health interventions that are able to satisfy basic psychological needs have been shown to promote autonomous motivation [73], improve well-being, and reduce stress [74].

However, the mentioned VR-supported HRV-BF interventions for stress management have so far mostly been investigated in single-session studies. Only two studies have investigated the effects of VR-supported HRV-BF interventions over multiple sessions. One study by Gaggioli et al. [64] compared a VR-based intervention program using an HMD that combined stress inoculation training and relaxation with BF to an active control group (i.e., a cognitive behavioral technique intervention) and a wait-list control (WLC) group over 5 weeks. Both interventions reduced perceived stress compared to the WLC, but only the VR-based intervention program significantly reduced anxiety. The study, however, assessed neither lasting effects in a follow-up assessment nor the user experience (UX) during the intervention itself. Furthermore, due to the combination of both relaxation and stress inoculation, it is not possible to disentangle the specific effects of either method. Lastly, HRV-BF was only used during relaxation in VR to enhance the environment rather than to guide participants’ breathing. In another study, Maarsingh et al. [69] tested the effectiveness and usability of a VR-based stress reduction game using an HMD. Specifically, they assessed the game’s usability in a group of healthy participants in a single session and the effectiveness over three sessions with a clinical sample. They found that their application effectively decreased participants’ stress levels and was user-friendly with good involvement. The study, however, did not include a follow-up assessment or a control group, and the time between the sessions was not reported.

In summary, initial results look promising and have shown that VR-based HRV-BF interventions delivered with HMDs are feasible. However, existing research is missing the long-term evaluation of effectiveness and the assessment of changes in UX. Investigating specific and lasting psychological and biological effects in longitudinal randomized controlled studies is important to evaluate whether such an intervention would be effective enough to be offered as a program outside of research, for example, for private use at home, at the workplace, or in different healthcare settings. Besides investigating a range of stress-related psychological and psychobiological outcomes, it is also essential to investigate potential changes in UX (e.g., motivation, involvement, and usability) towards the technology over time (e.g., a novelty effect that wears off) as they influence the training outcome [62]. Furthermore, sustained good UX, such as high motivation, is important for adherence to an intervention program [75].

To our knowledge, this is the first study to investigate the effectiveness of using HMD technology, besides the support of mobile technology, to deliver a breathing-based HRV-BF intervention program for stress management. To evaluate this intervention, we aimed to study changes in primary psychological and psychobiological outcomes, secondary psychological outcomes, and UX measures:

  1) Primary outcomes: Can a VR-supported HRV-BF intervention program improve psychological and psychobiological indicators of stress (re/activity)?

  2) Secondary outcomes: Can a VR-supported HRV-BF intervention program improve levels of anxiety, depression, health-related quality of life, fatigue, mindfulness, and psychological well-being?

  3) User experience: What are the UX measures of a VR-supported HRV-BF intervention program, and how do they change with repeated use?

Regarding primary outcomes, we hypothesized that self-reported stress, resting heart rate, and blood pressure would decrease, whereas measures of resting heart rate variability would increase in the intervention compared to the WLC group from pre- to post-intervention and from pre-intervention to follow-up. In addition, we hypothesized that the intervention would lead to a more adaptive stress reactivity in terms of a less heightened and prolonged response compared to a WLC. Regarding secondary outcomes, we hypothesized that self-reported anxiety, depression, fatigue, and frustration of basic psychological needs would decrease while health-related quality of life, mindfulness, psychological well-being, and satisfaction of basic psychological needs would increase in the intervention compared to the WLC group from pre- to post-intervention and from pre-intervention to follow-up. The analysis of UX measures was exploratory, but based on our previous study [53], we expected the user experience to be high at the beginning and to either remain at that level or even improve throughout the study.

Methods

Study design

In this study, healthy university students were randomly assigned, stratified by biological sex, either to an intervention group (INT), which received HRV-BF training in VR and used a smartphone application for at-home exercises over four weeks, or to a WLC group (refer to Fig. 1). Both groups came to the lab for three measurements: in week 1 for pre-intervention measurements (Pre), in week 5 for post-intervention measurements (Post), and in week 9 for follow-up measurements (Follow-up). At each measurement (i.e., Pre, Post, Follow-up), participants filled in the same psychological trait questionnaires, had their blood pressure (BP) measured, and had their cardiac activity continuously monitored. During Pre, all participants additionally answered control and sociodemographic questions and questions on their experience with technology and relaxation methods to assess the characteristics of both groups. Immediately after the Post measurements, both groups underwent a psychosocial stress test. Participants in the INT additionally came to the lab once a week from weeks 1 to 4 to undergo the VR-supported HRV-BF intervention. In between lab sessions, participants of the INT performed exercises at home. The first VR-supported HRV-BF session in the lab took place immediately after Pre. The WLC received the HRV-BF intervention after Follow-up.

Fig. 1 Study protocol. Abbreviations: INT = Intervention group; WLC = Wait-list control group

Participants and procedure

Participants were recruited using the University Registration Center for Study Participants [76]. Participants were compensated with 25 Swiss francs per hour of participation (INT: 9.75 h; WLC: 3.75 h). Potential participants’ eligibility regarding the inclusion criteria was assessed during enrolment using an online screening questionnaire. Participants were required to be fluent in German, be between 18 and 40 years of age, have normal or corrected-to-normal vision, have no disability of arms or hands, and have at least a secondary school diploma. Furthermore, we excluded potential participants who reported acute or chronic somatic diseases, psychiatric disorders, regular use of medication (e.g., antidepressant, antipsychotic, antihypertensive), the consumption of psychoactive substances in the last three months, or heavy drinking (\(\ge\)15 and \(\ge\)8 drinks per week for men and women, respectively [77]). The exclusion criteria also included regular tobacco consumption (>5 cigarettes per week) unless the consumption was limited to the weekends. Women who had an irregular menstrual cycle (i.e., not between 27-32 days ±4 days), used hormonal contraceptives, or were pregnant or lactating were also excluded from the study due to the respective effects on the ANS and, in particular, on HRV (see, e.g., [78,79,80]).

Due to a lack of consensus and well-established power analysis methods for the statistical models used in the analysis (i.e., multilevel models), we determined the sample size based on propositions concerning the number of participants for randomized controlled trial pilot studies [81] and studies analyzed with multilevel models, which recommend a minimum total sample size of 70 participants [82,83,84]. Furthermore, we increased the number of participants to account for possible no-shows, dropouts, and missing data. Thus, we aimed to enroll a total of 100 participants (50% female).

To accommodate the planned number of participants, we organized multiple session slots in week 1 of, at most, 15 participants per slot. Participants were free to select a series of slots that best fitted their schedule for the duration of the study on a first-come-first-served basis; that is, they were asked to return to the lab on the same weekday and time for each week of measurements and/or HRV-BF training. Before participants chose their slot, each slot was randomly assigned a condition (i.e., INT or WLC). To ensure the randomization of participants, this information was concealed and not disclosed to participants before they chose their slots. After choosing a series of slots, participants were informed about their assigned condition. At this point, participants in the WLC were informed that they would only participate in the measurement sessions, and their schedule was adapted accordingly (i.e., pre-scheduled VR-supported HRV-BF sessions were canceled).

Intervention

The four-week intervention consisted of four weekly VR-supported HRV-BF training sessions at the lab, which were accompanied by psychoeducational videos and breathing exercises outside of the lab, which were guided by a smartphone application. In the first HRV-BF session (S1), participants spent 26 minutes in VR, whereas in sessions 2 to 4 (S2, S3, S4), they spent 20 minutes in VR (refer to Table 1 for detailed information on the session timings). For their homework, participants were asked to perform the breathing exercises for at least 10 minutes on days when they were not in the lab (i.e., six days per week).

Table 1 Timings for each intervention session at the lab

Virtual reality heart rate variability biofeedback

Lehrer et al.’s manuals informed the number and content of intervention sessions [38, 85]. The more recent manual proposes a training period in the laboratory of around 4-5 weekly sessions, including psychoeducation, accompanied by at-home exercises between sessions. Sessions usually last around 25 minutes with 2-minute breaks in between biofeedback blocks. In our study, each HRV-BF session also started with a psychoeducational video that participants watched on a computer screen. The videos explained the causes and effects of stress on mental and physical health and how BF may help to improve autonomic functioning, gas exchange efficiency of the lungs, and stress management skills. In addition, the videos introduced the individual components of HRV-BF and the meaning of the different visual elements shown in VR and gave instructions on how to breathe. Refer to Table A4 in the supplementary information for a more detailed overview of the content of the videos. Following each psychoeducational video, participants put on the HMDs to start the HRV-BF training. In S1, participants first completed a six-minute block of paced breathing at 0.25 Hz, from which their individual RF was determined following Sakakibara et al.’s protocol [52]. This protocol offers a more precise and time-efficient estimation of an individual’s RF than the protocol proposed in Lehrer et al.’s manuals [38, 86]. Sakakibara’s protocol uses the peak frequency in the LF region of HRV under respiratory control at 0.25 Hz as an estimate for individual RF. RF is calculated as the argmax in the region of 0.075 Hz to 0.10833 Hz (i.e., between 4.5 and 6.5 breaths per minute) of the power spectral density from the last 5 min of collected inter-beat intervals (i.e., R-R intervals; RRi). Following RF determination, Lehrer et al. [86] suggest instructing clients to follow a pacer set to their RF in the first two sessions. From the third session onward, clients are instructed to follow the fluctuations in their heart rate to guide RF breathing and only use the pacer when needed. During the HRV-BF training, participants in our study were asked to breathe slowly and regularly at the pace of their determined RF. In S1, they were guided by an auditory and visual pacer, which was placed as a three-dimensional object on a rock in the field in front of the participants in the virtual environment (VE; refer to Fig. 2). After acclimatizing to the VE and the pacer, real-time HR feedback was introduced during the second block of S1. Specifically, the HR values of the last 25 seconds were shown using a line graph, which was in a fixed position in the VE (i.e., to the right of the pacer; refer to Fig. 2). In S2 and S3, the audiovisual pacer was only present for two minutes at the beginning of each block. In S4, the pacer was omitted completely. In the sessions where the pacer was no longer present, participants were instructed to exhale when the HR reached its highest value and inhale at its lowest value, thus amplifying the sinusoidal HR curve caused by RF breathing. The aim of gradually removing the pacer was to increase participants’ self-awareness, self-regulation, feelings of autonomy, and competence and to shift from external to increasingly internal feedback.
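The RF determination described above (argmax of the power spectral density between 0.075 Hz and 0.10833 Hz over the last 5 minutes of RRi) can be illustrated with a minimal sketch. This is not the script used in the study: the function names, the 4 Hz linear resampling, the direct evaluation of the periodogram on a 0.001 Hz grid, and the removal of the mean before the transform are all assumptions made for illustration.

```python
import math


def last_window(rri_ms, seconds=300.0):
    """Return the trailing RR intervals (ms) that together span about `seconds`."""
    total, start = 0.0, len(rri_ms)
    while start > 0 and total < seconds * 1000.0:
        start -= 1
        total += rri_ms[start]
    return rri_ms[start:]


def resample_rri(rri_ms, fs=4.0):
    """Linearly interpolate an RR series onto an evenly spaced grid at fs Hz."""
    # Beat times (s) are the cumulative sum of the intervals.
    times, t = [], 0.0
    for rr in rri_ms:
        t += rr / 1000.0
        times.append(t)
    n = int((times[-1] - times[0]) * fs) + 1
    out, j = [], 0
    for i in range(n):
        ti = times[0] + i / fs
        # Advance to the pair of beats that brackets the grid time ti.
        while j < len(times) - 2 and times[j + 1] < ti:
            j += 1
        w = (ti - times[j]) / (times[j + 1] - times[j])
        out.append(rri_ms[j] + w * (rri_ms[j + 1] - rri_ms[j]))
    return out


def power_at(x, fs, f):
    """Periodogram power of the zero-mean series x at frequency f (Hz)."""
    re = sum(v * math.cos(2 * math.pi * f * k / fs) for k, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * f * k / fs) for k, v in enumerate(x))
    return (re * re + im * im) / len(x)


def estimate_rf(rri_ms, f_lo=4.5 / 60, f_hi=6.5 / 60, step=0.001, fs=4.0):
    """Return the frequency (Hz) with the highest power inside [f_lo, f_hi]."""
    x = resample_rri(last_window(rri_ms), fs)
    mean = sum(x) / len(x)
    x = [v - mean for v in x]  # remove the DC component before the transform
    n_steps = int((f_hi - f_lo) / step + 1e-9)
    freqs = [f_lo + i * step for i in range(n_steps + 1)]
    return max(freqs, key=lambda f: power_at(x, fs, f))
```

Multiplying the returned frequency by 60 gives the pacer rate in breaths per minute. A production implementation would more likely use an established spectral estimator and artifact filtering before this step.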

Fig. 2 The virtual environment from the user’s perspective. The pacer was placed above a stone on the left side of the meadow. The blue graph in the center was used for immediate feedback, visualizing the heart rate fluctuations of the last 25 seconds. The flowers in the meadow and the sun in the background were used for the heart rate variability cumulative feedback.

The virtual environment (VE) used in both the HRV-BF training and the RF determination was an improved version of our previously developed and tested VE [53]. The VE depicted an open alpine region on a summer day and was experienced from the perspective of sitting on a large tree trunk and overlooking a flat, green field (refer to Fig. 2). Bird songs and the sounds of water flowing in a tiny creek made up the soundscape. With respect to the version used in our previous study [53], we improved the environment by adding new elements wished for by a majority of participants (i.e., dynamic clouds and changing sky color over time), polishing the soundscape (i.e., removing sounds that were perceived as annoying by participants and adjusting the volume), moving elements such as the pacer to increase their visibility and optimizing visual performance.

In addition to the immediate feedback, participants received cumulative feedback during the HRV-BF blocks. The cumulative feedback was used to provide participants with easy-to-understand feedback on their overall progress and for positive reinforcement [63, 68]. Based on participants’ HRV, their coherence ratio (CR) was computed every 10 seconds from the preceding 90 seconds of data. The CR is based on the relative power of the main peak in the low-frequency band of the HRV signal (i.e., CR = peak power / (total spectral power - peak power)) [36]. If the computed CR was \(> 1\), the cumulative feedback score was increased by one point; otherwise, it remained the same. The VE changed in response to increases in this score: blossoms grew in the green meadow, the sun rose further above the horizon, and the natural soundscape became more intense. The cumulative score and associated environmental changes were present during all sessions. At the end of each session block, participants were informed about their achieved cumulative score reflecting their progress.
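The scoring rule above can be sketched as follows. The sketch takes an already computed power spectrum (lists of frequencies and powers) and is not the study’s implementation: how much of the spectrum counts as “peak power” is not specified in the text, so the ±0.015 Hz window around the highest LF bin (`half_width`) is an assumption, as are the function names.

```python
def coherence_ratio(freqs, psd, lf=(0.04, 0.15), half_width=0.015):
    """CR = peak power / (total spectral power - peak power).

    The peak is the highest bin inside the LF band; its power is summed over
    bins within +/- half_width Hz of that bin (an assumed window width).
    """
    lf_idx = [i for i, f in enumerate(freqs) if lf[0] <= f <= lf[1]]
    peak_i = max(lf_idx, key=lambda i: psd[i])
    peak_f = freqs[peak_i]
    peak_power = sum(p for f, p in zip(freqs, psd) if abs(f - peak_f) <= half_width)
    rest = sum(psd) - peak_power
    return float("inf") if rest <= 0 else peak_power / rest


def update_score(score, cr, threshold=1.0):
    """Add one point when CR exceeds the threshold; otherwise keep the score."""
    return score + 1 if cr > threshold else score
```

In the setting described in the text, `coherence_ratio` would be evaluated every 10 seconds on the spectrum of the preceding 90 seconds of RRi, and `update_score` would be applied to each result.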

The virtual environment was created using Unity (Unity Technologies, San Francisco, CA, USA) and delivered using an HMD (i.e., Oculus Quest 1, Reality Labs, Menlo Park, CA, USA). The virtual soundscape was heard directly through the HMD’s built-in speakers. A wearable electrocardiogram device, the Polar H10 chest belt (Polar Electro Oy, Kempele, Finland), was used to record inter-beat intervals (i.e., R-R intervals; RRi). The recorded RRi values were streamed from the Polar devices to a desktop computer in the lab, using a custom-made Windows application (based on the Polar software development kit [87]). The RRi data were subsequently sent from the desktop computer to a centralized database, where all of the participants’ information was stored. The RF and cumulative feedback score were then calculated from these RRi values using a self-developed Python script. The results were stored in the centralized database, from which the VR application regularly polled the latest score and RRi values to effectuate changes in the VE and to compute and visualize the HR graph.

Smartphone homework

A self-developed smartphone application supported participants’ daily practice outside of the lab setting (refer to Fig. 3). Daily exercises outside of the laboratory are an integral part of the majority of HRV-BF interventions [88] and are meant to increase the intervention effects and support skill transfer into daily life [38]. In the app, participants could start a breathing exercise session featuring the same audiovisual breathing pacer from the VE, set to their individual RF. Before each breathing exercise session, participants could choose to practice for either 10 or 20 minutes. Lehrer et al.’s manual [38] recommends asking participants to practice for 20 minutes twice a day. To counteract dropout due to potentially high perceived workload and to increase motivation, we asked participants to at least practice for 10 minutes daily but ideally for longer and more than once a day. Adherence was further fostered by instructing participants to set a daily reminder on the smartphone, asking them how the at-home practice was going after each VR session at the lab, and by an in-app progress tracker (refer to Fig. 3). The app recorded the start and end times of each exercise locally, allowing the participants to keep track of their exercises. White tick marks appeared for each day where participants had practiced for at least 10 minutes. Additionally, the data was sent to a database for later analysis. The application was developed for both iOS and Android using Unity.

Fig. 3 a The pacer shown in the mobile application was used for breathing exercises outside of the lab. b The progress tracker page of the mobile application allowed participants to keep track of their breathing exercises outside of the lab

Stress test

After the regular measurements during the Post assessment in week 5, participants of both groups additionally underwent the Trier Social Stress Test for Groups (TSST-G; [89]) to assess their stress reactivity. Specifically, participants were subjected to a mock job interview followed by a mental arithmetic task. Both tasks were conducted in the presence of two judges in lab coats and two mock cameras, as well as the other participants in that slot. The judges were instructed to behave in a reserved and neutral manner. The tasks were preceded by a preparation phase of five minutes. In this anticipation phase, participants were informed of the task using a prerecorded audio message and asked to prepare themselves for the mock job interview. During the mock job interview (12 minutes), each participant was asked to present themselves to the judges and speak freely about why they would be a good candidate (up to two minutes each). In the following arithmetic task (eight minutes), each participant was asked to count backward from 2043 in steps of 17 in front of the judges and the other participants (60 seconds per participant; the next participant continued from the number where the previous participant had left off to avoid learning effects; no group reached numbers below 1000). A research assistant was present during the stress test, assisting the panel of judges with the timing and the participants with the second saliva sampling. The stress test was followed by a recovery period, during which participants watched a nature documentary for 30 minutes. All stress tests took place in the afternoon, starting either at 13:30 or at 16:30.

Outcomes

The primary outcomes of the study were psychological and psychobiological measures of stress: self-reported levels of stress (i.e., periods of one month and one week), HR, HRV, and systolic (SYS) and diastolic (DIA) blood pressure at rest. Additionally, the following stress reactivity measures were used as primary outcomes: self-reported psychological state, levels of salivary cortisol (sCort) and alpha-amylase (sAA), HR, and HRV. Secondary outcomes included levels of anxiety, depression, mindfulness, psychological well-being, health-related quality of life (HRQoL), and fatigue.

Psychological assessment

For the psychological assessment, we used the following questionnaires during Pre, Post, and Follow-up. The Perceived Stress Scale (PSS-10; for which the required license was obtained; [90, 91]) was used to assess levels of chronic stress during the last month, with scores ranging from 0 to 40. Stress levels during the last week were assessed using the stress scale of the Depression Anxiety and Stress Scales (DASS-21; [92, 93]), while the remaining scales were used to determine levels of anxiety and depression, with scores of all scales ranging from 0 to 42. The Mindful Attention Awareness Scale (MAAS; [94]) was used to measure levels of mindfulness, with scores ranging from 1 to 6. The World Health Organisation-Five Well-Being Index (WHO-5; [95, 96]) was used to measure psychological well-being, with scores ranging from 0 to 100, and the Short Form Health Survey (SF-36; [97]) was used to determine HRQoL. The SF-36 consists of eight scales (i.e., general health, mental health, bodily pain, vitality, social functioning, physical functioning, role-physical, and role-emotional), with scores ranging from 0 to 100. Higher scores indicate higher HRQoL. The scales role-emotional and role-physical describe role limitations while performing daily activities and work due to difficulties with mental or physical health. In addition, the Multidimensional Fatigue Inventory (MFI-20; [98]) was used as a measure of fatigue, including scales on general fatigue, mental fatigue, reduced activity, and reduced motivation, with scores ranging from 4 to 20. Lastly, the Basic Psychological Need Satisfaction and Frustration Scale (BPNSFS; [99]), based on SDT [71], was used to study whether the training led to satisfaction and/or frustration of basic psychological needs (i.e., autonomy, competence, and relatedness), resulting in six scales with scores ranging from 4 to 20. In addition, the BPNSFS comprises two composite scores reflecting general satisfaction and frustration with these needs, with scores ranging from 12 to 60.

Physiological assessment

During Pre, Post, and Follow-up, participants’ cardiac activity was measured. From the recorded R-R intervals (RRi), we derived HR and HRV following established measurement standards [100]. For the analysis, we used 10 minutes of the data recorded while participants filled in questionnaires (starting one minute after they began the questionnaires to allow for acclimatization). The raw RRi were filtered for motion artifacts and ectopic beats using the threshold-based filtering method from the Python package hrv, resulting in the exclusion of three measurements (from three different participants) with >10% noisy data. From the filtered data, we extracted the mean HR, the root mean square of successive differences (rMSSD), the percentage of successive normal-to-normal intervals differing by >50 ms (pNN50), and the standard deviation of normal-to-normal heartbeat intervals (SDNN). Moreover, we extracted the power in the low- and high-frequency bands (LF and HF) of the power spectral density, and the ratio of LF to HF (LF/HF ratio). The power spectral density was estimated via Fast Fourier Transform from the RRi series interpolated at 4 Hz with cubic splines. Baseline SYS and DIA were measured using a BP monitor (HBP-1120, Omron, Kyoto, Japan) whilst participants filled in questionnaires during Pre, Post, and Follow-up (i.e., three measurements per participant). Due to a technical issue, the cardiac and BP measurements of three participants were not recorded at Follow-up.
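For reference, the time-domain measures named above can be computed from a filtered RRi series as in the following minimal sketch. It is not the analysis code of the study: the function name is hypothetical, mean HR is approximated as 60000 divided by the mean interval in milliseconds, and SDNN uses the sample standard deviation (n - 1 in the denominator); both choices are assumptions.

```python
import math


def hrv_time_domain(rri_ms):
    """Compute mean HR, SDNN, rMSSD, and pNN50 from RR intervals in ms."""
    n = len(rri_ms)
    mean_rr = sum(rri_ms) / n
    # Successive differences between neighbouring intervals.
    diffs = [b - a for a, b in zip(rri_ms, rri_ms[1:])]
    return {
        "mean_hr": 60000.0 / mean_rr,  # beats per minute
        "sdnn": math.sqrt(sum((x - mean_rr) ** 2 for x in rri_ms) / (n - 1)),
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }
```

For example, the series 800, 860, 800, 860, 800 ms has successive differences of ±60 ms throughout, so rMSSD is 60 ms and pNN50 is 100%.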

Stress reactivity

Psychological state measures were repeatedly assessed before, during, and after the stress test (-5, 0, +25, and +55 minutes) with the Multidimensional Mood State Questionnaire (MDMQ; which was obtained from the questionnaire’s publisher; [101]) and a self-developed Visual Analog Scale (VAS) asking how stressed the participants felt (slider from 0-100). The MDMQ includes the scales \(\text {MDMQ}_{\text {mood}}\) (“good mood–bad mood”), \(\text {MDMQ}_{\text {calmness}}\) (“calmness–nervousness”), and \(\text {MDMQ}_{\text {wakefulness}}\) (“wakefulness–sleepiness”), with scores ranging from 4 to 20. Lower scores on the MDMQ scales indicate bad mood, nervousness, or sleepiness.

Cardiac measures were continuously recorded during both Post and the subsequent stress test, allowing us to analyze the cardiac measures at different intervals before, during, and after the stress test. Specifically, data were extracted in five-minute intervals at baseline before the stress test (\(\text {TSST}_{\text {BL}}\), starting six minutes after participants began answering the Post measurement questionnaires), during the mock job interview (\(\text {TSST}_{\text {I}}\), i.e., +1 to +6 minutes from the start of the stress test), during the arithmetic task (\(\text {TSST}_{\text {M}}\), +13 to +18 minutes), and twice during the recovery phase (\(\text {TSST}_{\text {REC1}}\) and \(\text {TSST}_{\text {REC2}}\), i.e., +24 to +29 and +39 to +44 minutes). The filtering of the raw cardiac data and the extracted measures were the same as described in the Physiological assessment section above.
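The window definitions above can be sketched as a simple lookup that assigns each RR interval to one of the four windows timed relative to the start of the stress test. The names `TSST_WINDOWS_MIN` and `slice_windows` are hypothetical, and treating each window as half-open (start included, end excluded) is an assumption. The baseline window is left out because it is timed relative to the start of the questionnaires rather than the stress test.

```python
# Window boundaries in minutes relative to the start of the stress test.
TSST_WINDOWS_MIN = {
    "TSST_I": (1, 6),       # mock job interview
    "TSST_M": (13, 18),     # arithmetic task
    "TSST_REC1": (24, 29),  # recovery phase 1
    "TSST_REC2": (39, 44),  # recovery phase 2
}


def slice_windows(beat_times_s, rr_ms, stress_test_start_s, windows=TSST_WINDOWS_MIN):
    """Group RR intervals by the five-minute window their beat time falls into.

    beat_times_s: beat timestamps in seconds, parallel to rr_ms.
    stress_test_start_s: timestamp (seconds) at which the stress test began.
    """
    grouped = {name: [] for name in windows}
    for t, rr in zip(beat_times_s, rr_ms):
        minutes = (t - stress_test_start_s) / 60.0
        for name, (start, end) in windows.items():
            # Assumption: windows are half-open, [start, end).
            if start <= minutes < end:
                grouped[name].append(rr)
    return grouped
```

Each resulting list could then be passed to the same HRV extraction used for the resting measurements.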

Biochemical measures were collected through saliva samples using SaliCaps® (IBL-Tecan, Hamburg, Germany) at four time points (-5, +12, +25, +55 minutes) to analyze sCort (as an indicator of the HPA axis) and sAA levels (as an indicator of the ANS) before, during, and after the stress test. Fresh samples were stored at \(-20^{\circ }\) Celsius until shipment to the biochemical laboratory of the Faculty of Psychology at the University of Vienna (Vienna, Austria). There, saliva samples were stored further at \(-30^{\circ }\) Celsius until analysis. Salivary cortisol concentration was assessed using a commercial luminescence immunosorbent assay (IBL-Tecan, Hamburg, Germany). Salivary alpha-amylase activity was determined using a kinetic colorimetric test and reagents obtained from DiaSys Diagnostic Systems (Holzheim, Germany) after saliva samples were diluted at 1:400 using 0.9% saline solution [102]. Intra- and inter-assay coefficients of variation were \(<10\%\) for both sCort and sAA. For one sample, the concentration of sAA was below the detection threshold of 3 U/mL.

User experience

Participants in the INT were asked to assess the UX of the intervention each time they used the VR application in the lab (S1-S4). Specifically, they completed the Presence Questionnaire (PQ; [103]), the Flow Short Scale (FSS; [104]), the System Usability Scale (SUS; [105]), four sub-scales of the Unified Theory of Acceptance and Use of Technology questionnaire (UTAUT; [106]), and the Simulator Sickness Questionnaire (SSQ; [107]) at the end of each HRV-BF session. The PQ assesses participants’ involvement (score range: 1 to 77), the sensor fidelity (1 to 35), participants’ ability to immerse and adapt (1 to 56), and the degree of negative effects of the interface quality (1 to 21). The FSS assesses participants’ flow experience (10 to 70) and the degree of anxiety (i.e., fear of failure while using a system; score range 1 to 21). The UTAUT assesses effort expectancy, facilitating conditions, behavioral intention, and hedonic motivation (all scores ranging from 1 to 7). The SSQ assesses the degree of nausea (1 to 467), its effects on the oculomotor systems (1 to 371), and the amount of disorientation felt by participants (1 to 682). Participants were also allowed to give free text answers to address any issues or suggestions not covered by the questionnaires. During Post (i.e., after the intervention), participants were asked a set of self-developed questions about using the mobile application for the at-home exercises. In addition, participants were asked to fill in the Autonomy and Competence in Technology Adoption questionnaire (ACTA; [72]) during Pre and Post, also a measure rooted in SDT [71]. The ACTA addressed how much a person thinks using a certain technology will increase their sense of autonomy and competence using four indices. 
The Autonomy Regulation Score (\(\text {ACTA}_{\text {ARS}}\)) captures intrinsic (“It is going to be fun to use”) and identified regulation (“I believe it could improve my life”), and the Controlled Regulation Score (\(\text {ACTA}_{\text {CRS}}\)) captures introjected (“I want others to know I use it”) and external regulation (“I feel pressured to use it”). Both indices have a score range from 6 to 30. The Relative Autonomy Index (\(\text {ACTA}_{\text {RAI}}\)) subtracts \(\text {ACTA}_{\text {CRS}}\) from \(\text {ACTA}_{\text {ARS}}\). Finally, the Perceived Competence Score (\(\text {ACTA}_{\text {PCS}}\)) measures perceived competence towards a technology, with a score ranging from 3 to 15. Higher scores on all scales reflect higher values of the measured construct.

Statistical analysis

Data analysis was performed using R (version 4.2.1) and RStudio. A significance level of .05 was used for all tests, and Pearson’s r was used to quantify the effect sizes of planned contrasts (small: 0.1 to 0.3; medium: 0.3 to 0.5; large: 0.5 or greater; [108]). To investigate the general effectiveness of the intervention concerning the psychological and psychophysiological outcomes, we ran linear multilevel models with the R package nlme with the between-subjects factor Group (two levels: INT vs. WLC) and the within-subject factor Time (three levels: Pre, Post, and Follow-up). Specifically, we were interested in the interaction effects of Time \(\times\) Group on psychological and physiological outcomes. To capture any significant changes in the INT from Pre to Post and from Pre to Follow-up compared to the WLC, we specified planned contrasts for the factor Time using dummy-coding (Pre [0,0], Post [1,0], and Follow-up [0,1]). Using planned contrasts counters the inflation of the Type 1 error [109]. Psychological and psychobiological stress reactivity assessed after the Post measurements was also analyzed using a multilevel model with the between-subjects factor Group (two levels: INT vs. WLC) and the within-subject factor Time (four levels for psychological state and biochemical measures and five levels for cardiac measures). UX measures assessed at the end of S1-S4 (PQ, FSS, SUS, UTAUT, SSQ) were analyzed using multilevel models with the within-subject factor Time (four levels: S1, S2, S3, and S4). Post hoc tests were run to capture the effects of time (i.e., comparing S1 to S4 and S2 to S3) and changes to the feedback (i.e., comparing S1 to S2, S2 to S4, and S3 to S4). Pairwise comparisons for these measures were computed using the R package emmeans and corrected using the Benjamini-Hochberg method.
For the ACTA, which was only assessed at Pre and Post, we ran two-tailed dependent samples t-tests to explore whether scores had significantly changed from Pre to Post in the INT. All specified multilevel models were checked for level-one heteroscedasticity, level-one and level-two residuals normality, and multicollinearity (where applicable). We found that the normality assumption for level-one and level-two residuals was violated (i.e., longer tails) by some models. However, it has been demonstrated that fixed effects estimates, which were used for our hypothesis tests, are robust to violations of the normality assumption [110]. Similarly, we found that level-two residuals for random intercepts suggested slight deviations from normality for some models. Regardless, it has been shown that estimates of fixed effects and their standard errors are robust to non-normal level-two residual errors when sample sizes are as large as the one used in this study [111]. Additionally, we checked for influential data points using the influence.ME package in R. Here, results revealed that no participant had an undue influence on the model (all Cook’s Distances \(< 1\)).
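To illustrate the Benjamini-Hochberg correction applied to the pairwise comparisons, the following sketch computes adjusted p-values in the standard step-up fashion. The study itself ran the correction in R via emmeans; the Python function `benjamini_hochberg` and the example p-values are hypothetical and serve only to show the procedure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for five pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005, 0.20]
adjusted = benjamini_hochberg(raw)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw p-value never ends up with a larger adjusted value than a bigger one.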

Results

Study sample

In total, two hundred and three (\(n = 203\)) participants were assessed for eligibility. One hundred and four (\(n = 104\)) were excluded, resulting in 99 participants who were randomly assigned to either the INT (\(n = 54\)) or the WLC (\(n = 45\)) by blinded slot series sign-up. Of participants assigned to the INT, 44 attended the first session and received the intervention. In the WLC, 43 participants showed up to the first session. Thus, a total of twelve participants were lost before Pre measurement. Across both groups, three participants dropped out before Post measurements and two dropped out before the Follow-up measurement. Incomplete data of seven participants were included in the final analysis. Participants in the INT who took part in all the lab sessions and practiced the breathing exercises at home at least \(50\%\) of the time were included in the analysis. Therefore, six participants of the INT were excluded from all analyses, except the analysis of UX measures, because they either did not practice the exercises often enough (\(n = 3\)) or called in sick (\(n = 3\)). Finally, the data from 81 participants were analyzed. Refer to Fig. 4 for a participant flow diagram.

Fig. 4

Diagram of participant flow. Abbreviations: INT = Intervention group; WLC = Wait-list control group. * Incomplete data of seven participants were included in the analysis

The mean age of participants was 22.88 (\(SD = 4.02\)) (refer to Table 2 for detailed sample characteristics per group). Most participants were university students and had at least a Baccalaureate degree. The majority reported being physically active \(\le 5\) hours a week, did not smoke, and had no children. A little less than half of the participants reported being in a relationship at the time (33/87). Furthermore, most had either never used an HMD (47/87) or used it less than once a month (37/87). Similarly, participants had no experience with BF except for two people in the INT. Experience with breathing exercises or other exercises for stress management and relaxation was more common in both groups. Analysis after four weeks of intervention showed that participants in the INT practiced 7.7 min/day (\(SD = 1.9\), range = 0 to 20 min) at home on average across all weeks. The log data also revealed that there was a decrease in the average number of exercises recorded per week from 5.5 (\(SD = 1.32\)) in the initial week to 4.5 (\(SD = 1.04\)) in the final week. Of a total of 888 possible in-app exercises, the 37 participants of the INT initiated 732 exercises (\(82.4\%\)) throughout the four-week intervention. Of these 732 exercises, \(92.1\%\) (674) were completed (\(\ge\) 10-minute practice), and \(7.9\%\) (58) were aborted (< 10-minute practice).
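The adherence percentages reported above follow directly from the log counts, as the short check below shows. The variable names are illustrative; all numbers are taken from the paragraph above.

```python
participants = 37
possible = 888    # possible in-app exercises across the INT
initiated = 732
completed = 674   # at least 10 minutes of practice
aborted = 58      # less than 10 minutes of practice

# Possible exercises per participant over the four weeks.
per_participant = possible / participants

initiated_pct = round(100 * initiated / possible, 1)
completed_pct = round(100 * completed / initiated, 1)
aborted_pct = round(100 * aborted / initiated, 1)

print(per_participant, initiated_pct, completed_pct, aborted_pct)
```

The counts are internally consistent: completed and aborted exercises sum to the number initiated, and 888 possible exercises correspond to 24 per participant.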

Table 2 Sample characteristics of the intervention group (INT) and wait-list control group (WLC)

Intervention effects

Psychological outcomes

Following up on significant interaction effects, planned contrasts revealed a small yet significant decrease of chronic stress (i.e., PSS-10) from Pre to Follow-up, and of anxiety and mental fatigue from Pre to Post and from Pre to Follow-up in the INT compared to the WLC. Moreover, there was a small but significant increase in mindfulness, role-emotional, and social functioning from Pre to Post and from Pre to Follow-up in the INT compared to the WLC. No other significant effects were found (refer to Table 3 for detailed results and Tables A1-A2 in the supplementary information for means and standard deviations).

Physiological outcomes

Planned contrasts following significant interaction effects for cardiac measures showed a small yet significant increase in LF from Pre to Post in the INT compared to the WLC and a small yet significant increase in pNN50 from Pre to Follow-up in the INT compared to the WLC. No other significant effects were found (refer to Table 3 for detailed results and Table A3 in the supplementary information for means and standard deviations).

Table 3 Effect of intervention on stress and stress-related symptoms: Interaction effect of Time \(\times\) Group and planned contrasts following significant interaction effects

Stress reactivity

With regard to psychological state measures, results of the stress test (after Post) showed significant main effects of Time on \(\text {MDMQ}_{\text {mood}}\), \(\chi ^2(3) = 102.42\), \(p < .001\), \(\text {MDMQ}_{\text {calmness}}\), \(\chi ^2(3) = 159.64\), \(p < .001\), and \(\text {VAS}_{\text {stressed}}\), \(\chi ^2(3) = 143.50\), \(p < .001\), but not on \(\text {MDMQ}_{\text {wakefulness}}\), \(\chi ^2(3) = 2.23\), \(p = .53\). Refer to Fig. 5 for plots of mean changes of \(\text {MDMQ}_{\text {mood}}\) and \(\text {MDMQ}_{\text {calmness}}\). As for cardiac measures, there was also a significant effect of Time on HR, \(\chi ^2(4) = 346.91\), \(p < .001\), rMSSD, \(\chi ^2(4) = 120.43\), \(p < .001\), SDNN, \(\chi ^2(4) = 129.84\), \(p < .001\), pNN50, \(\chi ^2(4) = 118.07\), \(p < .001\), HF, \(\chi ^2(4) = 61.55\), \(p < .001\), and LF, \(\chi ^2(4) = 60.52\), \(p < .001\), but not on LF/HF, \(\chi ^2(4) = 3.42\), \(p = .49\). Finally, analysis of biochemical measures also revealed a significant main effect of Time on both sCort, \(\chi ^2(3) = 113.74\), \(p < .001\), and sAA, \(\chi ^2(3) = 109.58\), \(p < .001\). No other significant interaction effects Time \(\times\) Group were found (all \(p > .05\)). Refer to Fig. 5 for plots of mean changes of HR, LF, sAA, and sCort. Taken together, the TSST induced a temporary worsening of mood while increasing subjective stress, HR, sCort, and sAA during the stress test.

Fig. 5

Group means of repeated psychological and biological measures of stress reactivity. Error bars indicate 95% confidence intervals. Abbreviations: INT = Intervention group; WLC = Wait-list control group; TSST-G = Trier Social Stress Test for Groups; TSSTi = Mock job interview; TSSTm = Mental arithmetic task; Rec1 = Recovery phase 1; Rec2 = Recovery phase 2; A. = Anticipation; T. = Trier Social Stress Test for Groups; MDMQ = Multidimensional Mood Questionnaire; VAS = Visual Analog Scale; bpm = beats per minute; LF of HRV = Low-frequency power of heart rate variability; nmol/L = Nanomole per liter; U/ml = Units per milliliter

User experience

Analysis of user experience, which was only assessed for the INT, revealed a significant and moderate decrease in the scores of \(\text {ACTA}_{\text {ARS}}\) from Pre (\(M = 23.51\), \(SD = 2.94\)) to Post (\(M = 21.87\), \(SD = 3.71\)), \(t(36) = 2.57\), \(p = .01\), \(r = .39\), and a significant and moderate increase in \(\text {ACTA}_{\text {CRS}}\) from Pre (\(M = 9.19\), \(SD = 2.41\)) to Post (\(M = 10.60\), \(SD = 3.76\)), \(t(36) = -2.13\), \(p = .04\), \(r = .33\). As a result, there was also a significant and large decrease in the scores of \(\text {ACTA}_{\text {RAI}}\) from Pre (\(M = 14.32\), \(SD = 3.75\)) to Post (\(M = 11.27\), \(SD = 4.17\)), \(t(36) = 4.41\), \(p < .001\), \(r = .59\). Finally, the analysis revealed a significant and moderate increase in \(\text {ACTA}_{\text {PCS}}\) from Pre (\(M = 10.73\), \(SD = 2.10\)) to Post (\(M = 11.68\), \(SD = 1.55\)), \(t(36) = -2.86\), \(p = .01\), \(r = .43\).
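The effect sizes reported for these four t-tests are consistent with a common conversion from a t statistic to r, namely \(r = \sqrt{t^2 / (t^2 + df)}\). Whether this exact formula was used in the analysis is an assumption; the sketch below only shows that it reproduces the reported values, and that the reported \(\text {ACTA}_{\text {RAI}}\) means equal the difference between the \(\text {ACTA}_{\text {ARS}}\) and \(\text {ACTA}_{\text {CRS}}\) means. The function name `r_from_t` is hypothetical.

```python
import math


def r_from_t(t, df):
    """Convert a t statistic into the effect size r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))


# Reported ACTA t-tests (df = 36): index -> (t, reported r).
reported = {
    "ARS": (2.57, 0.39),
    "CRS": (-2.13, 0.33),
    "RAI": (4.41, 0.59),
    "PCS": (-2.86, 0.43),
}
recomputed = {name: round(r_from_t(t, 36), 2) for name, (t, _) in reported.items()}

# The RAI subtracts CRS from ARS, so the mean scores should line up as well.
rai_pre = round(23.51 - 9.19, 2)
rai_post = round(21.87 - 10.60, 2)
```

Because r is computed from the squared t value, it is always non-negative; the direction of the change has to be read from the sign of t or from the means.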

Concerning changes across all four intervention sessions (S1-S4) of the INT (refer to Table A4 in the supplementary information for detailed results), the analysis revealed a significant main effect of Time on \(\text {PQ}_{\text {Invo}}\), \(\chi ^2(3) = 25.63\), \(p <.001\), \(\text {PQ}_{\text {SensFi}}\), \(\chi ^2(3) = 10.87\), \(p = .01\), \(\text {PQ}_{\text {ImrsAdpt}}\), \(\chi ^2(3) = 10.46\), \(p = .02\), \(\text {UTAUT}_{\text {HedMotv}}\), \(\chi ^2(3) = 13.57\), \(p = .004\), SUS, \(\chi ^2(3) = 9.65\), \(p = .02\), \(\text {SSQ}_{\text {Ocu}}\), \(\chi ^2(3) = 16.02\), \(p = .001\), and \(\text {SSQ}_{\text {Dis}}\), \(\chi ^2(3) = 28.45\), \(p <.001\). Pairwise comparison revealed significant increases in \(\text {PQ}_{\text {Invo}}\), \(t(125) = 4.37\), \(p < .001\), \(r = .36\), \(\text {PQ}_{\text {ImrsAdpt}}\), \(t(125) = 2.82\), \(p = .03\), \(r = .25\), \(\text {UTAUT}_{\text {HedMotv}}\), \(t(126) = 2.56\), \(p = .02\), \(r = .22\), SUS, \(t(126) = 2.93\), \(p = .02\), \(r = .25\), and significant decreases in \(\text {SSQ}_{\text {Ocu}}\), \(t(126) = -3.71\), \(p = .002\), \(r = .31\), and \(\text {SSQ}_{\text {Dis}}\), \(t(126) = -4.04\), \(p < .001\), \(r = .34\), from S1 to S2. Similarly, there were significant increases in \(\text {PQ}_{\text {Invo}}\), \(t(125) = 3.98\), \(p < .001\), \(r = .34\), \(\text {PQ}_{\text {ImrsAdpt}}\), \(t(125) = 2.43\), \(p = .04\), \(r = .25\), and significant decreases in \(\text {SSQ}_{\text {Ocu}}\), \(t(126) = -3.13\), \(p = .005\), \(r = .26\) and \(\text {SSQ}_{\text {Dis}}\), \(t(126) = -4.35\), \(p < .001\), \(r = .36\), from S1 to S4. Furthermore, there were significant decreases in \(\text {UTAUT}_{\text {HedMotv}}\) from S2 to S3, \(t(126) = -2.80\), \(p = .01\), \(r = .24\), and S2 to S4, \(t(126) = -3.46\), \(p = .004\), \(r = .22\). There were no significant differences between individual sessions for \(\text {PQ}_{\text {IntQual}}\) (all \(p > .05\)).

Discussion

To our knowledge, this is the first randomized controlled study to investigate the psychological and psychophysiological effects of an HRV-BF intervention program supported by laboratory sessions in VR and at-home exercises using a mobile phone. Our results show that the four-week-long intervention significantly reduced chronic stress, anxiety, and mental fatigue and improved mindfulness and HRQoL (small effects). Specifically, all improvements in psychological traits post-intervention remained significant until follow-up, although chronic stress only significantly decreased when comparing levels at pre-intervention to follow-up. The intervention was also able to significantly increase measures of HRV, specifically pNN50, LF, and LF/HF ratio compared to the WLC. Differences in LF and LF/HF ratio between groups did not last until follow-up, while differences in pNN50 were only observed when comparing changes from pre-intervention to follow-up. Although changes in resting HR were significantly different in the two groups over time, planned contrasts revealed no significant differences between measurements. There was only a visual trend for a decrease in resting HR from pre-intervention to follow-up in the INT compared to the WLC. No differences in terms of resting BP or psychological and psychobiological stress reactivity, however, were found between the two groups. As for UX measures, results show that feeling autonomously motivated towards using the technology and the intervention significantly decreased, while perceived competence significantly increased over time. There was also a considerable rise from the first to the last of four sessions in both involvement and immersion. Finally, hedonic motivation was highest in the second session before progressively reverting to initial values.

Mean baseline values of psychological outcomes were mostly in line with other samples [53, 99, 112, 113]. Specifically, long-term stress levels, as indicated by PSS-10 and \(\text {DASS-21}_{\text {stress}}\), were not elevated [91, 92, 114]. The sample, however, had mild depressive symptoms according to the \(\text {DASS-21}_{\text {depression}}\) categories [92] and “fair” psychological well-being as defined by the WHO-5 benchmarks [115]. Baseline values of physiological outcomes such as mean HR and LF can be situated within the range of short-term HRV norms [116,117,118]. Concerning SYS, baseline values were slightly elevated in both groups for the age group of our sample, whereas baseline DIA was comparable to guidelines and norm values [119, 120].

The findings indicate that using VR to support the delivery of HRV-BF improves a range of indicators of stress and stress-related symptoms to the same extent as HRV-BF interventions delivered on two-dimensional screens (see [21,22,23]). We observed small improvements across a range of psychological traits (i.e., chronic stress, anxiety, mental fatigue, mindfulness, HRQoL) and small changes in measures of cardiac activity (i.e., pNN50, LF, and LF/HF ratio) in the intervention compared to the WLC. This pattern and extent of changes in primary and secondary outcomes are comparable to those found in Lehrer et al.’s [22] meta-analysis. Since effect sizes are modest and similar in size across measures, Lehrer et al. [22] conclude that HRV-BF may be useful for targeted and non-targeted measures and serve as a complementary treatment method.

Because the study lacked an active control group, improvements in psychological outcomes in the intervention group could reflect placebo effects, that is, participants’ expectation to improve and their higher motivation compared to the WLC owing to the multifaceted, novel, and engaging nature of the intervention. In addition, experimenter demand effects could have played a role, such that participants consciously or unconsciously aimed to produce effects they believed the experimenters expected. While the absence of an active control group prevents a definite conclusion on effect sizes, the data provide evidence suggesting that placebo or experimenter demand effects may be somewhat negligible.

First, there is the observation of significant differences between groups in some physiological measures, especially the significant increase in resting LF, the prime target measure of HRV-BF, from pre- to post-intervention in the intervention group. This increase in the LF band presumably reflects increased baroreflex activity during resting conditions [29, 51, 121]. There was also an increase in pNN50 from pre-intervention to follow-up, and the visual trend toward an attenuated endocrine stress reactivity is a pattern observed in other SMI studies [122,123,124].

Second, self-reported chronic stress did not decrease until follow-up, although the expectation of experiencing stress-reducing effects would have been most salient to participants immediately after the intervention. Moreover, participants reported decreases in autonomous regulation and increases in controlled regulation toward using the technologies and intervention over time. This pattern of results challenges the notion of a pervasive positive bias in evaluating intervention features and outcomes.

Third, research on HRV-BF for stress management with active control groups provides evidence that HRV-BF is indeed effective in reducing a variety of symptoms. De Bruin [125] and van der Zwan [126] both report on the same study but with different outcomes. The randomized controlled trial compared the effects of HRV-BF (\(n=25\)) to mindfulness meditation (\(n=27\)) and physical activity (\(n=23\)) for stress reduction in a stressed sample of young adults. The interventions included daily exercises for five weeks, similar to our intervention in terms of effort and duration. The articles report significant improvements in measures of stress, anxiety, worrying, mindful awareness, psychological well-being, and self-compassion, among others, with small to moderate effect sizes. These improvements were observed from pre- to post-intervention and from pre-intervention to follow-up six weeks post-intervention in all groups. There were no significant between-group differences, however, leading the authors to conclude that HRV-BF is not more effective than, but as effective as, the other methods for stress and anxiety [125, 126]. This pattern is also backed by meta-analytic computations by Lehrer et al. [22]. Compared to active controls (e.g., EEG biofeedback, physical exercise, cognitive behavioral therapy, skill training, mindfulness meditation, monitoring HRV, progressive muscle relaxation), HRV-BF does not lead to significantly better outcomes. In contrast, compared to inactive controls (e.g., WLC, no treatment, sitting quietly, watching a video), HRV-BF achieves significant small to moderate effect sizes [22].

In conclusion, the results of the present study support the notion that HRV-BF supported by VR and mobile technology is also able to reduce stress and stress-related symptoms and modulate emotion-regulating networks of the brain through the baroreflex system (bottom-up processes) [22, 51] and that it is also able to train prefrontal regions of the brain to exert regulatory control over areas that are involved in perceiving and experiencing emotions (top-down processes) [33, 37, 127]. The observation that dimensions of HRQoL, such as role-emotional and social functioning, improved might be explained by the positive effects of vagal activation on self-regulatory processes in the brain. According to Porges [29], healthy vagal functioning inhibits sympathetic outflow to the heart and allows humans to self-regulate, feel safe, and show adaptive social behavior.

We assume that the repeated, near-daily RF breathing during the laboratory sessions in VR and the mobile-supported exercises mainly drove the intervention effects. According to CAMIL [62] and empirical findings, different technologies can be particularly conducive to teaching and learning a method or a skill, but they are not the main drivers of effects. Our previous work [53] showed that standardized slow and paced breathing is the main contributor to short-term increases in LF and CR, while RF breathing coupled with biofeedback is able to significantly amplify this effect. Similarly, both HMDs and two-dimensional screens can be used for both breathing types, but using an HMD was able to significantly amplify the effects on CR. Nevertheless, the individual contributions of VR-supported HRV-BF and mobile-supported breathing at RF cannot be discerned.

Some of the assessed psychological and psychobiological outcomes did not differ between conditions. For example, although HRV-BF has been shown to improve BP control modulated by the baroreflex [85] and recent meta-analyses have confirmed that HRV-BF interventions significantly reduce SYS and DIA [22, 26], we did not observe any differences between the INT and the WLC. However, in our study, BP levels also decreased in the control group from Pre to Post. Moreover, although research assistants interacted only minimally with participants during the BP assessments, we cannot rule out a white coat effect (not only, but especially during the Pre measurement), which might also explain the slightly elevated levels of SYS throughout the study in this otherwise healthy sample. In addition, the VR-supported HRV-BF intervention at hand did not significantly affect psychological and psychobiological stress reactivity, except for the visual trends in LF, sCort, and sAA. Changes in mean values of LF seem to suggest greater power in the LF band from peak reactivity (i.e., mental arithmetic task) throughout recovery in the INT compared to the WLC, perhaps reflecting a more adaptive recovery of the autonomic nervous system (ANS). Notably, the values during recovery are higher than at baseline in the INT, which might suggest that some participants in the INT actively used slow and paced breathing to recover from the psychosocial stress test. Similarly, the trends of changes in mean levels of sCort and sAA could reflect a less pronounced stress response of the HPA axis and the sympathetic nervous system in the INT compared to the WLC. Participants in the INT might have also been particularly acclimatized to the laboratory setting, familiar with the research assistants present in the room and the participants in their slots. A sense of familiarity associated with increased resilience [128] in the INT might have led to a less pronounced stress response.
Alternatively, one could also have expected that improved resting parameters are accompanied by stronger stress reactivity, possibly explained by the potential of showing a greater net reduction in vagal activity and a more solid increase in sympathetic activity in response to challenging demands [129, 130]. Nonetheless, the TSST-G in this study was able to induce significant psychological and psychobiological stress responses in both groups, although the increases in sCort were not as high as in other studies employing the same protocol (e.g., [89, 113]). According to a meta-analysis [131], having three members, instead of one or two, on the panel of judges and using the number 13 instead of 17 in the arithmetic task can increase the cortisol response. The time at which this study was conducted (May to June 2022) might also have contributed to the comparatively small improvements in measures in the INT and the slight worsening of some outcomes in the WLC. During this period, students typically transition into the end-of-semester examination phase, preparing for or already taking exams. Study participation might thus have attenuated improvements in the INT or even contributed to stress and stress-related symptoms in both groups. Moreover, the daily workload in the INT and having to come to the lab once a week over a month might have further increased pressure and stress in this group, potentially adding to diminished effectiveness. Log data of the mobile application also suggest that adherence to the exercises was high overall but did decrease slightly from the start to the end of the program, potentially contributing to the attenuated effectiveness.

The decrease in autonomous regulation and the increase in controlled regulation towards the adoption and use of the technology from pre- to post-intervention also support this explanation. Having to do the exercises with the mobile application every single day without any BF might have left participants feeling less in control and without agency, and therefore stressed and frustrated. The observation that levels of chronic stress only showed improvements at follow-up once the intervention was really over might also back this explanation. Being autonomously motivated to use technology and participate in health interventions is crucial to achieving better health and well-being outcomes as well as to experiencing overall less stress and negative affect (e.g., [73, 74, 132,133,134,135,136]). The fact that participants were remunerated for their participation may have also contributed to feeling more controlled and less autonomous. Years of research have shown that tangible rewards such as money or trophies feel controlling and thereby undermine intrinsic motivation [137]. On the contrary, rewards in the form of positive feedback that carry information about performance, such as a cumulative score during HRV-BF, for instance, have been shown to increase intrinsic motivation [137]. However, this is the case only when individuals’ perceived competence is accompanied by some degree of perceived autonomy, an internal locus of control [137]. Autonomy is considered to be an important principle from the perspective of biomedical ethics [138], which should be safeguarded and fostered throughout the design of any technology-supported stress management intervention [139]. Participants’ free text feedback indicated that they wished for the ability to personalize the virtual environment and add BF to the mobile application. 
Moreover, considering that 10 participants who were allocated to the INT did not show up for the first session (compared to two participants in the WLC), the schedule of the intervention program might have been viewed as too intense, stressful, and inflexible already in advance. Regardless of the decreases in autonomy, the increase in perceived competence over time shows that the system did become easier to use, and participants felt more effective at using it and less challenged over time. This might entail that participants got more confident in using the technology and devices and in navigating the virtual environment and mastering the breathing exercises and the BF task. Experiencing increased competence in a task can ultimately benefit intrinsic motivation [71].

The remaining UX assessment of the intervention during the four lab sessions of the INT yielded overall positive results across all dimensions. The mean values of UX measures were in the upper third of the scale ranges, except for interface quality, involvement, and sensor fidelity, which were all in the upper half. Due to the lack of long-term studies, we can only compare our results to previous research using either the mean values of UX measures from the first session or the mean values across all four sessions. Here, the means of both the first session and across sessions are similar to or higher than values reported in previous research using VR for BF [65, 68, 69, 140,141,142]. Furthermore, we found that the UX measures of the first session were overall similar to or higher than values measured for HMDs in our previous work [53], indicating the improvements made to the VE had the intended effect. CAMIL proposes that using HMDs for VR experiences can improve learning outcomes due to higher values of affective and cognitive factors (e.g., motivation, embodiment, and self-efficacy) [62]. The fact that the UX values stayed high throughout all four sessions thus indicates that the use of HMDs could be beneficial in comparison to a screen not only in a single session but also in the long run.

As for changes in UX measures between sessions, we found significant increases from the first to the second session, after which the measures either stayed at the same level (i.e., involvement and immersion) or returned to the initial values of the first session (i.e., motivation and usability). These results indicate that the participants did not experience a novelty effect surrounding our system (i.e., initial excitement towards new technologies that wears off with repeated use). The absence of a novelty effect is important since sustained high UX is expected to positively affect the intervention outcome [62]. Furthermore, adherence to an intervention program relies on sustained motivation [75]. The lack of further improvements in UX measures after the second session could also be related to the corresponding changes in the interface designed to support the shift from external to internal feedback (i.e., removing the pacer and adding real-time feedback).

Although participants seemed to adapt to these changes, as indicated by the moderate increases in perceived competence, this pattern of results might also suggest that participants would require slightly more time to adapt to the changes or that BF should be included in the mobile application for homework. This is further supported by participants’ free text feedback, indicating that it became more difficult to perform the training in later sessions (i.e., the third and fourth sessions). Finally, we observed overall low levels of simulator sickness symptoms and a small but significant reduction (medium effect size) of symptoms over time, indicating that the technology used for the intervention did not hinder participants’ user experience or knowledge acquisition.

Limitations, future research, and implications

This study has several limitations. First, a major limitation is that it did not include an active control group with comparable expectations of improvement (e.g., sham biofeedback, EEG feedback, cognitive behavioral therapy, mindfulness meditation, or physical exercise), which could have controlled for potential placebo effects in the intervention group. Second, although some measures worsened in the WLC over time, some psychological and psychophysiological outcomes indicate improvements (e.g., HRQoL, physical fatigue, DIA, and SYS). Even more pronounced positive effects in control groups relative to treatment groups have been reported in other HRV-BF intervention studies [143, 144]. Possible treatment diffusion [145] may explain these positive effects. All participants were informed about the general nature of the study (e.g., HRV-BF in VR, breathing exercises) before giving informed consent. Therefore, participants in the WLC may have felt inspired to practice breathing exercises or other stress management skills during the time in which the other participants received the intervention. Third, the sample consisted of healthy young students, limiting the generalisability to older, highly stressed individuals and clinical populations. Offering the intervention to populations with elevated stress levels might have led to more and greater improvements across measures. Fourth, the study only included one follow-up assessment four weeks post-intervention. The lack of later follow-up assessments at three, six, or perhaps twelve months does not allow conclusions on the long-lasting and sustained effects of this intervention beyond a month. Lastly, the significant changes in primary and secondary outcomes should be interpreted with caution, and implications drawn from them carefully, given the small effect sizes detected with the sample size at hand.

Based on the results of this study, we also see several avenues for future research and practical implications. First, introducing different and greater degrees of personalization of the virtual environment and BF might increase participants’ autonomy. In the context of our VR-supported HRV-BF protocol, autonomy might have been higher had BF been integrated into the mobile application and adapted to individual progress. Using BF during homework exercises with the mobile application could be tested in combination with the VR application in the lab, or BF could perhaps be provided as an entirely autonomous system at users’ disposal outside of a lab setting. The more control participants are given over time, day, place, and circumstances, the higher their perceived autonomy and adherence might be and, thus, the greater the beneficial effects of the intervention. Second, the intervention program could be tested over a longer period, as this could further improve the effects observed in the current study and lead to more pronounced differences compared to a control group. Third, it might be worth investigating whether different durations and frequencies of homework exercises impact the effectiveness of the intervention program. Fourth, as the technology is ever-evolving and improving, further research could test both more advanced and simpler VR devices. This could include comparing HRV-BF in VR to standardized paced breathing in VR as an active control group. Alternatively, affordable VR devices could be handed out to participants, moving the intervention from the laboratory to the field. In such a field experiment, a standalone VR-supported HRV-BF setup could be compared to an entirely mobile-supported HRV-BF setup. This would reduce the complexity of the interventions and help isolate the effects of the different technologies. 
Fifth, the intervention program could be tested with adolescent populations, which have a high prevalence of stress and mental health issues [146] and might have a particular affinity for and interest in new technologies. Using new technologies might foster the education of young people about stress and teach them strategies to cope with stressful situations and challenging life events. As shown in our previous work [53], VR seems especially effective in practicing HRV-BF or paced breathing exercises and thereby teaching stress management skills. Lastly, the current findings need to be replicated and shown to be robust in randomized clinical trials in various settings and populations with adequate controls and larger sample sizes before the intervention is offered as a treatment. Assuming effectiveness and higher degrees of independence and automatization of the psychoeducational content, system, and data processing, we can envision the practical use of such a VR-supported HRV-BF intervention program in the near future. This could include private use, use at the workplace as part of a wider stress management offer, or use in clinical settings such as psychiatric clinics, nursing homes, rehabilitation centers, or pain management centers to support the treatment of stress-related disorders and the management of chronic diseases. To facilitate this, full integration of the BF system and psychoeducational content into a stand-alone application, easy connectivity to wearable devices, and guidance through a week-long program must be guaranteed. In a clinical setting, such a standalone VR-supported HRV-BF application could be offered as a complementary intervention alongside standard treatments. Such a standalone offer would not require much time and effort from healthcare personnel to introduce the intervention to patients. It could be easily integrated into a patient’s treatment plan, both for in- and outpatients. 
Patients could use such an application autonomously in their own time, even beyond the hospitalization period, supported by the mobile application. This program could also be offered to individuals wanting to prevent or alleviate chronic stress and improve their stress management and emotion regulation skills.

Conclusion

The VR and mobile-supported HRV-BF intervention program presented in this work improved various stress indicators and stress-related symptoms compared to a WLC group. The findings of this study provide preliminary evidence for the suitability of using VR technology for laboratory sessions of HRV-BF training. The sizes of beneficial effects are comparable to the effects reported in studies on HRV-BF interventions delivering laboratory sessions on two-dimensional screens. Considering the decreased autonomous regulation regarding the technology and intervention over time, an emphasis should be placed on autonomy-supportive interventions that include higher degrees of flexibility and adaptability, which should ultimately benefit long-term health and well-being. The good to very good UX ratings support the future use of HMDs for HRV-BF and continued investigations in this field. The absence of a novelty effect regarding technology adoption further supports using such devices for an HRV-BF intervention program, as a high UX throughout the program can lead to higher adherence and yield more beneficial outcomes. Concerning future research, the long-term psychological and psychobiological benefits of VR delivery compared to different screen modalities remain to be studied. The suitability and effectiveness of a VR-supported HRV-BF intervention for highly stressed individuals and clinical samples also require further investigation. Future studies should address potential placebo effects by comparing VR-supported HRV-BF to an active control group with equal expectations of improvement. All in all, this work contributes to a growing field of research leveraging new and increasingly affordable digital technologies, including wearables, VR, and smartphones, to promote health and transform healthcare.