1 Introduction

Virtual reality (VR) offers users the opportunity to experience different situations without being physically present. This makes virtual excursions with head-mounted displays (HMDs) possible, simulating existing places, past events or even purely virtual spaces in applications such as education, architecture, tourism or urban planning [22]. It has been shown that real walking in these environments has a strong positive effect on the subjective presence in virtual reality [25]. Unfortunately, the real physical spaces for such VR walkthroughs are usually limited in size and do not match the dimensions of the virtual spaces. Omnidirectional treadmills can be used to address this challenge. However, they are bulky, expensive and hence hardly available, so the costs of a real excursion might often be more attractive. Moreover, due to their mechanics, they can reduce the feeling of presence.

Another solution to this challenge is to manipulate the user’s movement in VR so that s/he does not notice it. The user’s movement can be influenced in VR by amplifying or attenuating the user’s rotation or translation [15], so that the user compensates with weaker or stronger movements. In this way, users can be manipulated to walk in circles [22] while thinking that they walk a straight line. This is, in short, the basic idea behind redirected walking (RDW). With this technique, it is possible to make the user believe that s/he is walking freely in VR while the actual physical space is relatively small. Most approaches that realize RDW rely on visual manipulation, e.g., by rotating the viewpoint during the blink of the eyes or by applying curvature gains by scaling any rotation or translation of the user when rendering the view in the HMD.

In this paper, we propose a novel auditory approach to RDW that we call Auditory Step Feedback Redirected Walking (ASRDW). The basic idea is to use different surface sounds, appearing as step noises, that lead to a compensatory movement of the user without this being noticed as an unnatural deviation. This RDW technique has the advantage that it can be combined with traditional visual RDW (VRDW) approaches and thus increase the amount of redirection.

We have formulated the following research questions to investigate our ASRDW approach:

Research questions:

  • R1 Can we influence the walking direction with our auditory step feedback?

  • R2 Is an effective redirection via auditory step feedback possible without recognition by the user?

  • R3 Is it possible to amplify the redirection of visual RDW approaches with our auditory step feedback?

We have conducted a user study to evaluate our research questions (Fig. 1). The results show that our ASRDW approach achieves all supposed properties while not inducing simulator sickness, which indicates the safety of our approach.

Fig. 1

A run illustrated with partially superimposed recordings from the perspective of one of the cameras, with a grid of the visible measurement area drawn in

2 Related work

Users can be manipulated by different effects so that they ultimately walk in circles while still thinking that they are following a straight line [22]. This can be achieved by visual or auditory manipulations or combinations of both. In the following, we briefly discuss some of these methods, starting with visual manipulation techniques.

One principle to achieve a redirection is the compensatory movement that a person performs when rotated around the body axis in the direction opposite to the one in which the person would like to move. This is called subliminal reorientation [22]. This manipulation can be applied at the moment of a user’s eye blink [10]. At this moment, the confidence value of the pupil detection drops to almost 0%, which allows an eye blink to be detected [10]. If the user’s virtual perspective rotates at this moment, s/he will not notice the rotation in either direction, with a point of subjective equality (PSE) of \(0.495^{\circ }\) on the body axis [10]. In contrast to this path manipulation at discrete points in time, there also exist visual methods that use continuous path manipulation. For instance, [18] presented a dynamic adjustment of the curvature to continuously manipulate visible paths.

Compared to visual RDW approaches, there is relatively little previous work on acoustic redirection methods. [20] investigated the rotation and curvature gains in acoustic redirection. They measured a maximum PSE of 5; however, the experiments did not include any actual movement of the users. Another approach to acoustic redirection aims to manipulate the users’ paths by applying unpleasant dynamic noises that follow the user [4]. The authors achieved a rerouting of up to an average of 6 m in a 20-m-long tracking area, whereby they found differences between men and women. Instead of a completely dark atmosphere, a simple landscape was used to distract visually as little as possible. In the experiment, a red dot was shown as a point of orientation that moved along with the right vector of the user. We use a similar setup in our experiment. Instead of a simple point, we faded in a house with a light source and used a bare black landscape with a starry sky as a background scene (see Fig. 4). Our 20-m walking space was also inspired by this experiment.

Finally, there also exist some works that have investigated multimodal redirections combining visual and auditory cues. Gao et al. [5] investigated incongruent visual–auditory feedback in VR environments and proposed a method that uses visual noise and incongruence between visual and auditory cues when applying curvature manipulation. Rekowski et al. [17] presented a distractor-based framework for RDW that also includes auditory distractions, which have been shown to be very promising in navigational tasks. The rotation and curvature gains when combining acoustic and visual RDW were examined by Meyer et al. [12] with a wave field synthesis (WFS) system [2]. They discovered that the detection thresholds (DTs) with audiovisual RDW are higher than with pure acoustic or visual RDW. We therefore assume this in our experiments. However, all these works rely on external noises, i.e., artificial noises that are not an actual part of the virtual environment and that are either static, i.e., fixed at a certain position in the scene, or dynamically moving. In contrast, we use a sound source that always follows the user: the user’s own step noises. Like a dynamic sound source, they are always close to the user and can suggest a deviation from the path by means of unpleasant noises, such as stepping into a muddy puddle.

3 Redirected walking with acoustic step feedback

The basic idea of our RDW approach is to induce a subliminal manipulation via audio stimuli. The idea was inspired by the experiments of Feigl et al. [4]. However, their sound sources were artificial. We decided to use a more natural approach by using body-centric sounds that typically occur in virtual environments when walking. Hence, the sound of footsteps is a natural choice.

In reality, walking also gives haptic feedback, which can be applied to RDW. However, we avoided haptic stimuli at the feet because they are inferior to the acoustic ones [24]. Translational gains that are obtained via footstep manipulation [21] can be improved when the participants’ feet are represented visually [9]. However, we avoided this visual representation because we want to isolate visual and auditory effects as well as possible, in order to limit the effects of the visual VR scene and emphasize the acoustic environment.

A typical task in a virtual environment is to walk straight toward a visible target. In our approach, we aim at manipulating this straight line without the user noticing this, mainly with auditory cues. Actually, we use different kinds of sounds to indicate whether the user is still walking on the line or aside. By bending this “audible line,” we achieve the curvature gain.

3.1 Sound representation

In our example implementation, we represent the path to the target acoustically by a gravel path. Left of the path is a strip of water followed by a muddy lawn environment. We recorded sound samples for all three acoustic environments. In our implementation, we simply play the respective sample according to the position of the user’s avatar or, more precisely, its center. In our preliminary experiments, we also tested scenarios with dynamic blending between the samples. However, this led to much slower reactions of the users. The respective sounds are played using the Unity spatial audio system. Moreover, we used permanent wind as background noise to cover the real step noises, similar to [15]. Since we want to manipulate the participants to walk in only one particular direction, similar to Feigl et al. [4] (in our experiment, we chose the right side), we only have to add the water and lawn strips on the left side of the gravel path.

The choice of these samples has some advantages: they are easily distinguishable, they can be easily included in a story (e.g., it has rained and hence, the person should try to reach the target dry-shod), and finally, people try to avoid stepping into water with their shoes in the real world; hence, the sample of the water acts as a natural alarm signal.

3.2 Path manipulation

To manipulate the straight path, we do not simply consider a linear deviation, but we define a function based on observations from previous RDW experiments. In general, we can define the deviation as a function D(x) with respect to the actual distance to the target \(d_{actual}=d_{t}-x\) where \(d_{t}\) is the starting distance to the target point. The user approaches this invisible target point, which s/he perceives as the end of the straight path. In our test scene, the actual visual target was placed beyond the physical target line, which was located at \(x = 20\) m.

As the participants get closer and closer to the house, this can lead to a discrepancy in the acoustic perception of the path and the visually perceived position of the target point. Therefore, the steepness of D(x) should decrease with a decreasing distance to the target. Actually, it should be almost 0 close to the target. Hence, we basically define the \(C^{\infty }\) function \( E(x) := 1+ \frac{-1}{1+(\frac{d_t-x}{3})^{10}}\) that is almost 1 for \(x=0\) and 0 for \(x=d_t\). Another advantage of the function is that it starts relatively slow: at the first few meters, the deviation from the ideal path is usually close to 0, since it can be assumed that the participants are walking straight toward the target in the first few meters. These meters without manipulation are used to get the participants used to walking in VR and are inspired by Steinicke’s curvature gain scenario [23].

To add some freedom in the maximum deviation, we simply scale E(x) linearly with a constant impulse factor c. Overall, \(D(x)=c \cdot E(x)\) defines the distorted path.
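The shape of E(x) and of the scaled deviation D(x) can be checked numerically. The following is a minimal Python sketch, not the Unity implementation used in the experiment. It assumes \(d_t = 20\) m (the target line of the test scene) and, purely for illustration, an impulse factor \(c = 2\); the function names are hypothetical.

```python
def envelope(x, d_t=20.0):
    """E(x) = 1 - 1 / (1 + ((d_t - x) / 3)^10), as defined in the text."""
    return 1.0 - 1.0 / (1.0 + ((d_t - x) / 3.0) ** 10)


def deviation(x, c=2.0, d_t=20.0):
    """D(x) = c * E(x); c = 2.0 is an assumed value for the impulse factor."""
    return c * envelope(x, d_t)


if __name__ == "__main__":
    # Print a few sample points along the 20 m walking distance.
    for x in (0.0, 10.0, 17.0, 19.0, 20.0):
        print(f"x = {x:5.1f} m  E(x) = {envelope(x):.6f}  D(x) = {deviation(x):.6f}")
```

Under these assumptions, E(x) stays close to 1 over most of the distance, equals 0.5 three meters before the target line and is exactly 0 at \(x = d_t\), so the transition is concentrated in the last few meters.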

Around the ideal path D(x), we define equidistant curves that mark where the sound changes. Basically, these are punishment functions \(P_i(x)=D(x)-d_i\) with \(i = 1, \ldots , n\) and \(d_{i+1}>d_i\), where \(d_i>0\) defines the distance from D(x) in the case that we want to manipulate the user to walk in the positive y-direction. In the case that the user should walk in the negative y-direction, we have \(d_i<0\).

This allows us to evaluate for each distance x how far away from the ideal path the user is located and to play the appropriate sound \(s_i\) corresponding to the punishment function \(P_i(x)\). In other words, we compute the orthogonal distance of the user’s actual position to the ideal line D(x) and check in which of the intervals \([0,d_1], \ldots , [d_{n-1}, d_n]\) this distance is located. This is equivalent to the \(\overrightarrow{x} \)-function by Steinicke [22].

As a concrete example, we present the values used in our experimental application. We used three different sounds: a step sound on a gravel surface for walking on the correct line, a sound of footsteps in water for \(P_1\) and a sound of footsteps on a wet lawn for \(P_2\). This results in the two interval bounds \(d_1\), which was set to 0.4, and \(d_2\), which was set to 0.8. This means that if the user’s distance to the ideal line D(x) at position x is larger than 0.8, the sound of the wet lawn is played, which may motivate him/her to go back to the gravel area.
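The mapping from the user's offset to a step sound can be sketched as a simple interval lookup. The following Python sketch is illustrative only and makes several assumptions that are not stated in the text: it uses the lateral offset at the current x instead of the true orthogonal distance, it treats every position on or to the right of D(x) as gravel, and the sound labels and the function name are hypothetical. The bounds 0.4 and 0.8 are the values of \(d_1\) and \(d_2\) given above.

```python
from bisect import bisect_left

SOUNDS = ("gravel", "water", "wet_lawn")  # illustrative labels for the three samples
BOUNDS = (0.4, 0.8)                       # d_1 and d_2 from the text


def select_step_sound(user_y, path_y, bounds=BOUNDS, sounds=SOUNDS):
    """Pick the step sound from the lateral offset to the audible path.

    user_y: lateral position of the user at the current x.
    path_y: D(x), the lateral position of the audible path at the same x.
    Assumes redirection toward positive y, so only positions left of the
    path (user_y < path_y) are penalized.
    """
    offset = path_y - user_y  # > 0 means the user is left of D(x)
    if offset <= 0:
        return sounds[0]      # assumption: everything right of D(x) is gravel
    # bisect_left keeps an offset of exactly d_i in the inner interval, so the
    # wet lawn is only selected for offsets larger than d_2.
    return sounds[bisect_left(bounds, offset)]
```

For example, with the path at `path_y = 1.0`, a user at `user_y = 0.5` would hear the water sample and a user at `user_y = 0.0` the wet lawn sample.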

According to our research questions, we are at this stage mainly interested in whether it is possible to manipulate the walking direction with our ASRDW approach, rather than in finding the minimum or maximum deviation we can achieve. Hence, we fixed the maximum deviation to a relatively conservative value of 2 m, which we derived from pretests to our experiment. However, higher deviations may occur due to changes in orientation, i.e., rotations around the body axis, since our path manipulation only considers translational deviations.

In our experiment, people do not walk in the dark but try to reach a target point. Hence, in addition to the acoustic deviation, we also have to move the visual target to avoid a discrepancy between the perceived visual and acoustic cues. Consequently, we added a visual deviation of at most 3.5 meters in case of a positive trend. In this case, the user walks to the right (see Fig. 2). This movement of the target is deactivated in the last 2.5 m, analogously to the fading out of the acoustic deviation.

Fig. 2

Sample scene with starting point at (0,0) and the initial target point at (20,20). The colored lines illustrate the acoustic path. The function D(x) causes a compensatory movement to guide the user to a positive deviation from the initial target point (please note that we have mirrored and turned the coordinate system so that the positive x-axis is pointing up and the positive y-axis is pointing to the right side. This best matches the notion of user movements that are deviated to the right, even if it is counter-intuitive when considering D(x)). The gray area represents gravel, the blue area water and the green area the muddy grass. If the user simply walks a straight line and does not apply any compensatory movements, s/he will first walk through the water at approx. 2 m and then walk (and stay) in the muddy grass (after 4 m) until s/he reaches the finishing line at 20 m. However, the sounds should motivate the user to compensate when s/he first enters the water at 2 m, which will redirect him or her to the safe gravel path. The proportions are not drawn to scale in order to simplify the representation

4 User study

We have implemented our ASRDW approach using the Unity engine (Version 2019.2.15f1) in combination with the Arduino-based step detection described in this section. In order to answer our research questions formulated in Sect. 1, we conducted a user study. First, we derived hypotheses from our research questions, and then, we designed an experiment to test these hypotheses. In the following, we will detail these steps.

4.1 Research questions

Following our research questions from Sect. 1, we formulated eight hypotheses for their evaluation.

Research question R1 directly gives rise to the following hypothesis:

  • H1 The ASRDW used in the experiment achieved on average a higher positive deviation, i.e., a deviation to the right side, from the target point than with no manipulation applied.

The user should not notice that s/he is redirected, hence, to answer R2, we can formulate the hypothesis:

  • H2 The participants do not notice that they did not move straight toward the target point during the ASRDW scenario.

To answer the research question R3, we have implemented a traditional visual RDW method based on eye-tracking. The following two hypotheses will be checked to guarantee a correct implementation:

  • H3 The VRDW used remains unnoticed by the participants during the run.

  • H4 The applied VRDW manipulation achieves a higher positive deviation from the target point, on average, than the scenario without any manipulation.

Finally, to answer R3, we want to check whether our ASRDW approach can be used together with the VRDW approach, which can be reformulated into the following 4 hypotheses:

  • H5 The combination of the ASRDW and VRDW used here achieves on average a higher positive deviation from the target point than the scenario without manipulations.

  • H6 The combination of ASRDW and VRDW achieves on average a higher positive deviation from the target point than the pure ASRDW scenario.

  • H7 The combination of ASRDW and VRDW achieves on average a higher positive deviation from the target point than the VRDW scenario.

  • H8 In most cases, the users do not notice that they do not move straight toward the target point during the combined scenario.

4.2 Visual RDW competitor

To test research question R3, we additionally implemented a traditional visual RDW approach. Since we mainly wanted to focus our investigation on our new audio-based RDW method and its combination with visual RDW approaches, we wanted to keep the visual cues relatively minimal in our experiments. Since our ASRDW manipulates the path continuously, it would seem fair to use a continuous visual path manipulation technique. However, methods like the dynamic continuous path manipulation proposed by Sakano et al. [18] require a visible path, which we want to avoid in our experimental setup. Consequently, we chose a visual RDW method without this prerequisite and reimplemented the eye-blink-based method described by [10] in Unity. Even though this method uses a discrete scene rotation, it can deal with limited visual cues. Moreover, it can be easily combined with our ASRDW approach.

From the literature [10], it is known that manipulations of up to \(3^{\circ }\) per eye blink are not noticeable by 75% of the users. However, since we are not interested in a maximum deviation but rather in an unrecognizable manipulation, we chose the deviation angle on the conservative side. Specifically, we used a value of \(0.6^{\circ }\), which is just above the minimum noticeable difference according to [10]. This also coincides with the literature [1], in which a maximum rotation angle of \(0.82\pm 0.31^{\circ }\) on the horizontal axis was not noticeable to the users. Considering four eye blinks over a distance of 20 m, this should result in a deviation of about 0.84 m.
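The estimate of about 0.84 m can be reproduced with a short calculation. The sketch below assumes that the four rotations of \(0.6^{\circ }\) add up to \(2.4^{\circ }\) and take effect over the full 20 m; this reading is an assumption, since the calculation is not spelled out in the text, and the function name is hypothetical.

```python
import math


def lateral_deviation(rotation_deg, blinks, distance_m):
    """Lateral offset after distance_m if all rotations act from the start.

    Assumes the per-blink rotations simply accumulate to one heading change.
    """
    total_angle = math.radians(rotation_deg * blinks)
    return distance_m * math.tan(total_angle)


if __name__ == "__main__":
    print(round(lateral_deviation(0.6, 4, 20.0), 2))  # prints 0.84
```

Blinks that occur later during the run would act over a shorter remaining distance, so this value is best read as an upper estimate for four blinks.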

4.3 Sensors

An important factor is the detection of the steps and the eye blinks. In this section, we briefly sketch our implementation and describe the sensors we used in our experiment.

4.3.1 Footstep detection

To detect the footsteps, we decided to use an instrumented-shoes approach instead of an instrumented floor, following the notion of [14]. We did this mainly because of the large area to be covered. We attached two pressure sensors to the shoe soles and connected them to an Arduino device. This setup guarantees a latency of at most 15 ms between the measurement and the playback of the sound.

In contrast to Turchet et al. [24], we decided to use soles instead of sandals. The soles can be placed in slippers for walking. With this approach, it is easier to cover different foot sizes than by integrating the sensors directly into the shoe [24] (see Fig. 3). We compared the measured steps with the steps seen in a video recording from a pretest and did not detect any difference between the number of steps measured by our sensors and the steps counted manually. Moreover, we asked the participants of our user study whether they experienced unexpected step noises that did not match their actual steps. Only one participant reported such problems, which were caused by a twisted sensor.
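The detection logic running on the Arduino is not described in detail here. As a purely hypothetical illustration of how a pressure signal can be turned into step events without double triggering, the following Python sketch uses a threshold with hysteresis. The class name, the normalized pressure range and both threshold values are assumptions and do not reflect the actual firmware.

```python
class StepDetector:
    """Reports a step when the pressure rises above `press` and re-arms
    only after it has fallen below `release` (hysteresis)."""

    def __init__(self, press=0.6, release=0.3):
        self.press = press      # assumed trigger level (normalized pressure)
        self.release = release  # assumed re-arm level (normalized pressure)
        self.armed = True

    def update(self, pressure):
        """Feed one normalized reading in [0, 1]; return True on a new step."""
        if self.armed and pressure >= self.press:
            self.armed = False
            return True
        if not self.armed and pressure <= self.release:
            self.armed = True
        return False


def count_steps(readings, detector=None):
    """Count the step events in a sequence of pressure readings."""
    detector = detector or StepDetector()
    return sum(detector.update(p) for p in readings)
```

With two separate thresholds, small fluctuations while the foot is on the ground do not trigger a second step sound; a step is only counted again after the sole has been unloaded.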

Fig. 3

Underside of the sole on which the sensors are attached (left) and the upper side (right)

4.3.2 Eye blink detection

In order to detect eye blinks reliably, we used an eye tracker from Pupil Labs [6] that was integrated into an HTC Vive Cosmos HMD. It monitors each eye with 120 Hz infrared cameras. We use the same method as described by Langbehn et al. [10] to detect whether both eyes are closed and thus to detect a blink. The detection method is relatively conservative; thus, it is possible to miss actual eye blinks. However, in the case that an eye blink is detected, it is safe to rotate the scene. We did not perform our own experiments to measure the actual accuracy of the blink detection, but as the method is the same as described in [10], it should be similar. A false positive blink detection would presumably have caused the users to notice the manipulation of the scene during our experiment. As we did not observe this, the blink detection seems to work properly.
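The idea of detecting a blink from the pupil-detection confidence of both eyes can be sketched as follows. This is a hypothetical illustration in Python, not the method of [10] or the Pupil Labs API: the threshold of 0.05, the sample format and the function names are assumptions.

```python
def is_blink(conf_left, conf_right, threshold=0.05):
    """True if the pupil-detection confidence of both eyes is close to zero."""
    return conf_left < threshold and conf_right < threshold


def blink_onsets(samples, threshold=0.05):
    """Yield the sample indices at which a blink starts.

    samples: iterable of (conf_left, conf_right) pairs, one per tracker frame.
    Only the first frame of each closed-eye phase is reported, so a scene
    rotation would be triggered at most once per blink.
    """
    closed = False
    for i, (left, right) in enumerate(samples):
        now_closed = is_blink(left, right, threshold)
        if now_closed and not closed:
            yield i
        closed = now_closed
```

Requiring a low confidence for both eyes makes the detection conservative: a single closed eye does not count as a blink and therefore would not trigger a rotation.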

4.4 Experimental setup

Fig. 4

The upper area shows the scene before the start (path faded in) and the lower area after the start (path faded out)

We have implemented both methods, the step-detection-based ASRDW and the eye-blink-based VRDW, using the Unity game engine. We set up a scene in Unity where the user has to reach a target, represented by a simple house, while the background was completely dark (see Fig. 4). This allows us to motivate the scenario with a short storyline: it has just stopped raining during the night; hence, the lawn around the gravel road to the house is wet. Under these conditions, the user tries to reach the house without being able to see the environment. The conditions were the same for both the acoustic and the visual scenario. The light sources and shadows were static in the environment, and no particularly high-resolution textures or computationally intensive effects were used in order to guarantee stable FPS values.

To cover a wide path in a controlled environment, we set up our experiment in a gym with a \(30\times 20\) m playing field. The hall is equipped with ceiling lamps that are sufficient to illuminate the surroundings evenly and brightly. This is essential to guarantee a stable functionality of the inside-out tracking of the HTC Vive Cosmos that we used to recognize the space. To align and calibrate the HMD initially with minimal variation, we used a laser measuring device with a range of 50 m and a measurement deviation of \(\pm 1\) cm at 50 m.

In our preliminary tests, we tried to put the complete hardware (Arduino, laptop, cables, etc.) in a backpack that the user can wear. It turned out that this led to problems with space, weight and heat. Finally, we decided to place the laptop that runs the scene in Unity on a serving trolley. The trolley was pushed behind the participants during the run and caused no noise that was heard by the participants. The step detector was mounted on the participants together with the soles, which were cut for EU shoe sizes 37–48. For the sound, we used the closed-back Sony WH-1000XM3 headphones with noise-canceling activated.

4.5 Protocol

At the beginning of the experiment, we determined whether the participants were able to carry out the experiment and whether they had hearing or seeing impairments. They were also asked to what extent they have experience with video games and VR [10]. A preliminary questionnaire was developed for this purpose to gather some demographic data. Similarly to [10], we used the SSQ (Simulator Sickness Questionnaire) before and after the experiment to measure possible simulator sickness symptoms [3]. We expected symptoms of simulator sickness in this experiment because the participants are rotated in the blinking scenario, which increases the frequency of these symptoms [11].

The actual experiment started with a reference run. However, we did not use the result of the reference run to identify user-specific deviation patterns and to take them into account when evaluating the data or, as with Feigl et al. [4], to normalize the user-specific deviations in the other scenarios. On the one hand, this would no longer have been a randomized process and, on the other hand, a possible learning effect after this initial run could no longer have been ruled out. Since we are mainly interested in the general applicability of our step-based RDW technique and less in the actual DTs, we decided to use a one-sided trial, i.e., all participants are manipulated to be redirected to the right side (similar to [4]). After the reference run, the participants performed the runs for the four conditions, i.e., ASRDW, VRDW, combined and without manipulation, in a randomized order.

Studies that measure the detection thresholds (DTs) of certain RDW methods typically ask the participants directly in which direction they were, for example, turned more or less. In contrast, we did not reveal that any manipulation took place at all, so that the participants did not concentrate on it [13]. To determine whether the test participants noticed any manipulations in the scenarios, we added a questionnaire between all runs. However, we simply asked whether or not the participants recognized something unusual and did not ask specifically about the manipulation.

In order to measure and evaluate the feeling of presence of the participant in the simulation afterward, the Igroup Presence Questionnaire (IPQ) was used [16]. This is recommended as a standardized questionnaire to evaluate the presence in the VR simulation [19]. Like the other questionnaires, the IPQ was filled out in paper form, as it has no influence on the average result, whether it is filled out in VR or in real life [19].

The experiment ended with answering the final SSQ questionnaire, to measure possible simulator sickness after the experiment.

5 Results

Overall, 20 participants took part in an experiment with a within-subject design. They randomly went through a scenario without manipulation, a scenario with only acoustic manipulation, a scenario with manipulation through blinking and the combination scenario of the two RDW methods.

5.1 Demographic data

Age groups between 14 and 55 years were represented in the selection of the participants. The genders of the participants are male and female. The ratio is 35% women to 65% men, corresponding to 7 women and 13 men. The participants had no knowledge of the content of the experiment and they were forbidden to share their findings after the run-through until the entire experiment was over. With 60%, the majority of the test participants had no experience with virtual reality.

The average age of the participants was M = 29.75 years (SD = 13.4 years). The youngest participant was 14 years old and the oldest 55. The average age of the men was M = 29.54 years and that of the women M = 30.14 years. With 65%, a clear majority of the participants showed an interest in the technology area. Two of the women chose “Does not really apply” and one “Does not apply.” With seven female participants, this corresponds to 42.86%. On average (M = 1.8, where 1 = “Applies” and 2 = “Tends to apply”), the respondents answered “Tends to apply” (SD = 1.3).

Thirty percent of the participants said they played video games every day. The majority, at 85%, said they played video games at least infrequently. Eight participants stated that they had experience with VR and 12 did not. A slight majority of the participants therefore had no experience with VR. The participants who had experience with VR stated on average (\( M = 2.14 \), \( SD = 0.9 \)) that they use VR devices irregularly. One participant reported hearing problems that were not compensated for by a hearing aid. With 12 participants, the majority reported having problems with vision (\( M = 0.6 \)), of which four had to use glasses in the experiment (\( M = 0.4 \)) to see clearly. The participants were on average 1.76 m tall (\( M = 1.76 \), \( SD = 0.07 \)), with a minimum height of 1.63 m and a maximum of 1.88 m.

5.2 Qualitative data

The results of the preliminary survey using the SSQ showed an average total score of 8.6 (SD = 11.6) before the experiment. The weighting of the subscales given by Kennedy et al. [7] was used. Nine of the 20 participants had no symptoms. Four participants had a total score \(>20\) before the start of the experiment. The reason could be the high temperature of more than \(30^{\circ }\) Celsius in the gym during the experiment. Nevertheless, the participants felt good enough to conduct the experiment. Seventy-five percent of the participants reported no symptoms in the oculomotor subscale, 55% for disorientation and 55% for nausea. The feeling of nausea was most strongly represented at the beginning. After the experiment, the participants scored on average 3.4 (\( SD = 5.9\)) in the SSQ. Hence, simulator sickness did not worsen during the experiment.

During the runs, the data was mainly collected from the hardware used. The results are summarized for the individual scenarios in order to be able to compare them directly.

5.3 Quantitative data

First, we measured the differences in the duration (in seconds) of the runs (see Fig. 5). The scenario without manipulation had the shortest duration (\( M = 36.25\), \( SD = 7.45\) and \( Mdn= 34.5\)). The shortest run took 24 seconds and the longest 52 seconds. In the VRDW scenario, the duration (\( M = 37.5\), \( SD = 8.04\) and \( Mdn = 35\)) was higher than in the scenario without manipulation. The minimum duration in the VRDW scenario was 27 seconds and the maximum 61 seconds. The result of a Mann–Whitney U (MWU) test was not statistically significant, \( p = 0.292\). The second longest scenario was the ASRDW scenario (\( M = 44.35\), \( SD = 18.46\) and \( Mdn = 38.5\)), which lasted on average 8 seconds longer than the scenario without manipulation. Again, the difference in duration between the ASRDW scenario and the scenario without manipulation was not statistically significant according to the MWU test. The combined scenario took the most time (\( M = 48.75\), \( SD = 18.46\) and \(Mdn=44\)) (Fig. 6).
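For readers unfamiliar with the MWU test, the following Python sketch shows how the U statistic is obtained from the pooled ranks of two samples. It is a generic textbook illustration, not the analysis script of this study; the function name is hypothetical and the numbers in the usage note are toy values, not data from the experiment.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two samples, using midranks for tied values."""
    # Pool both samples and remember the group (0 = a, 1 = b) of each value.
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    return u_a, len(a) * len(b) - u_a
```

For the toy samples `[1, 2, 3]` and `[4, 5, 6]`, the function returns `(0.0, 9.0)`, i.e., no value of the first sample exceeds any value of the second. The p-value is then derived from the smaller U, typically with a statistics package.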

Fig. 5

Average walked real path for each scenario: a Without manipulations, b VRDW, c ASRDW, d Combined

Fig. 6

Average duration per scenario. All times are measured in seconds. Without manipulation, the participants walked fastest, while the combination of the audio and eye-blinking redirection required the most time. This can be due to the longer paths in these scenarios

We measured the average number of steps with the step detector (see Fig. 7). In this case, the data were not clearly normally distributed, which is why we again used a nonparametric test. It should be noted that in the scenario without manipulation, there were two cases in which double steps were partially triggered because the sensors were not ideally placed under the sole. According to the participants, these occurred only a few times (between 1 and 5 times). Analogously to the duration of the scenarios, there was no significant difference between the scenario without manipulation (\( M = 48.4 \), \( SD = 7.33 \) and \( Mdn = 46.5 \)) and the blinking scenario (\( M = 48.3 \), \( SD = 8.50\) and \( Mdn = 44\)) according to the MWU test with \( p = 0.641\). This did not apply to the ASRDW scenario (\( M = 59.2 \), \( SD = 24.1 \) and \( Mdn = 50 \)), which had statistically significantly more steps than the scenario without manipulation (MWU test, \( p = 0.038 \)). The combination scenario (\( M = 48.75\), \( SD = 19.94\) and \( Mdn = 53\)) also had significantly more steps than the scenario without manipulations (MWU test, \( p = 0.026\)). The difference between the combination scenario and the ASRDW scenario, on the other hand, was not statistically significant according to the MWU test with \( p = 0.41\). The minimum number of steps was 36 in the combination scenario and the maximum was 148 in the ASRDW scenario.

Fig. 7

Average number of steps per scenario. In both scenarios with audio, the participants required more steps than in the pure visual scenarios. However, the differences are not statistically significant according to an MWU test

In addition to the data from the sensors of the HMD and the step detector, the blinks of the participants were detected via eye tracking. Only blinks that actually led to a manipulation were evaluated; cases in which, for example, only one eye was closed were not counted. In the VRDW scenario (\( M = 2.95\), \( SD = 3.80\) and \( Mdn = 1.5\)), the number of blinks was not significantly higher than in the scenario without manipulations (\( M = 3.8 \), \( SD = 3.76 \) and \( Mdn = 3 \)) according to the MWU test, \( p = 0.823\); the maximum number of detected blinks was 13 and the minimum was 0. In the ASRDW scenario (\( M = 3.85\), \( SD = 3.8 \) and \( Mdn = 4 \)), there was also no significant difference compared to the scenario without manipulations (\( p = 0.431 \)) or to the combination scenario (\( M = 3.85 \), \( SD = 3.86 \) and \( Mdn = 2.5 \)), \( p = 0.431 \), according to the MWU test. In all four scenarios, the largest number of blinks occurred within the first half meter. The mean number of blinks in the VRDW scenario was approximately one blink below the means of the other three scenarios, which lay between 3.8 and 3.9 blinks.

Fig. 8

The average deviations for each condition in centimeters. We achieved the largest deviation in the combination scenario of ASRDW and eye blinking. The condition without any manipulation and the eye-blinking scenario alone led to almost no deviation

In contrast to the previous sensor-based measurements, the absolute deviations from the target point were obtained from an on-site optical measurement and from the coordinates determined by the Vive Cosmos. Unlike the Cosmos coordinates, the on-site values were measured with an accuracy of 10 cm and rounded in case of doubt. Across all scenarios, the deviations ranged from \(-90\) cm to 510 cm. The standard deviations are higher in the scenarios with ASRDW than in those without. The deviations per scenario are shown as box plots in Fig. 8. The combination scenario has the largest range of values (\( M = 249.5\), \( SD = 126.9\) and \( Mdn = 235.0\)), and its deviation was positive in every case. The medians of the ASRDW (\( M = 167.5\), \( SD = 91.4\) and \( Mdn = 205.0 \)) and combination scenarios were 30 cm apart. The values of the scenario without manipulation (\( M = 10.5\), \( SD = 53.8\) and \( Mdn = 10.0\)) ranged from \(-90\) cm to \(+110\) cm and thus overlap with the value ranges of the other three scenarios. The medians and means of the scenario without manipulation and the VRDW scenario (\( M = 2\), \( SD = 38.5\) and \( Mdn = 0.0\)) are closest to each other and lie between 0 cm and 10.5 cm. On average, men showed higher deviations than women in all scenarios, in line with Feigl et al. [4]; we examine this more closely in the discussion.

Figure 5 shows the average paths taken. Analogously to the representation of the blinks per half meter, the coordinates were averaged per half meter, and the resulting points were linearly interpolated to form a path. The paths shown illustrate the characteristics of the scenarios that are taken up in the discussion. All paths start at x = 0 m and y = 0 m and end at y = 20 m with a variable x value. The combination scenario was the only scenario with a clear positive deviation within the first 2 meters; in all other scenarios, the deviation started between 2 and 3 m. In the blinking scenario, the path formed a curve which, after a positive deviation on the x axis, tended back into the negative range halfway through the route.
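The averaging per half meter described above can be sketched as follows. This is only an illustration, not the evaluation code used in the study: the function names, the `(x, y)` sample layout, and the choice to bin by the forward coordinate y (rather than by walked path length) are assumptions.

```python
from collections import defaultdict

def average_path(samples, bin_size=0.5):
    """Average the lateral position x within bins of forward distance y.

    samples: iterable of (x, y) positions in meters, pooled over all runs
             of one scenario (hypothetical input layout).
    Returns a list of (bin_center_y, mean_x), sorted by y.
    """
    bins = defaultdict(list)
    for x, y in samples:
        bins[int(y // bin_size)].append(x)
    return [((i + 0.5) * bin_size, sum(xs) / len(xs))
            for i, xs in sorted(bins.items())]

def interpolate_x(path, y):
    """Linearly interpolate the averaged path at forward distance y."""
    for (y0, x0), (y1, x1) in zip(path, path[1:]):
        if y0 <= y <= y1:
            t = (y - y0) / (y1 - y0)
            return x0 + t * (x1 - x0)
    raise ValueError("y lies outside the averaged path")

# Toy data: two samples fall into the first half meter, two into the second.
path = average_path([(0.0, 0.1), (0.2, 0.3), (0.4, 0.6), (0.6, 0.9)])
print(path)
print(interpolate_x(path, 0.5))
```

Binning by y keeps the averaged paths of different scenarios comparable at the same distance from the start, which is what the comparison of "first 2 meters" versus "2 to 3 m" relies on.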

6 Discussion

In the discussion, we interpret the findings from this experiment, test the hypotheses, and summarize exploratory and empirical results.

6.1 Research questions

To test the null hypothesis H0 of H1, the experiment included a randomized run without manipulation, in which the deviations from the target point were measured. Since the deviations were not normally distributed, statistical significance was calculated using the nonparametric Mann–Whitney U test. We chose the usual significance level of \( \alpha = 0.05 \) in all tests. In the scenario without manipulation, the participants had lower deviations (\( Mdn = 10 \)) than in the ASRDW scenario (\( Mdn = 205 \)). An MWU test showed this difference to be statistically significant, \( U = 37\), \( p < 0.001\), \( r = 0.698\). It is therefore likely that the ASRDW method used in this experiment achieves, in most cases, a higher positive deviation from the target point than no manipulation. We therefore assume that ASRDW works.
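As a plausibility check, the reported effect size can be related to the reported U statistic with the normal approximation of the MWU test. The sketch below assumes 20 observations per condition (inferred from the participant counts given for H2 and H8) and the common convention \( r = |Z| / \sqrt{N} \); neither the sample sizes nor the effect-size formula are stated in this section, and the function name is hypothetical. No tie or continuity correction is applied.

```python
import math

def mwu_z_and_r(u, n1, n2):
    """Normal approximation of the Mann-Whitney U statistic.

    Returns the z score and the effect size r = |z| / sqrt(n1 + n2).
    Assumption: no tie correction and no continuity correction.
    """
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    r = abs(z) / math.sqrt(n1 + n2)
    return z, r

# U = 37 as reported; n1 = n2 = 20 is an assumption, see the lead-in.
z, r = mwu_z_and_r(37, 20, 20)
print(f"z = {z:.3f}, r = {r:.3f}")
```

Under these assumptions, r comes out at roughly 0.70, which is consistent with the reported \( r = 0.698\); the small remaining difference would be expected from a tie correction.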

H2: 16 participants did not notice the manipulations in the ASRDW scenario. The binomial test with \(P (X = 16)\) resulted in \( p = 0.0046 \), so the null hypothesis is rejected. The ASRDW method used in this work is therefore probably not noticed by most users if they are not aware that it exists or is being used.
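The reported value can be reproduced with a few lines of code. The sketch assumes 20 participants and a chance probability of 0.5 for noticing the manipulation; both are assumptions inferred from the counts in this section, and the helper names are hypothetical. It computes the point probability \(P (X = 16)\) named in the text and, for comparison, the upper tail \(P (X \ge 16)\) that a one-sided binomial test would normally use.

```python
from math import comb

def binom_point(k, n, p=0.5):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_upper_tail(k, n, p=0.5):
    """Probability of k or more successes in n Bernoulli trials."""
    return sum(binom_point(i, n, p) for i in range(k, n + 1))

# n = 20 and p = 0.5 are assumptions, see the lead-in.
print(f"P(X = 16)  = {binom_point(16, 20):.4f}")       # about 0.0046
print(f"P(X >= 16) = {binom_upper_tail(16, 20):.4f}")  # about 0.0059
```

Both values lie well below \( \alpha = 0.05 \), so the conclusion does not depend on which of the two probabilities is used.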

H3: The visual manipulation in the blinking scenario and in the combination scenario was not noticed by any participant; consequently, the null hypothesis can be rejected.

H4: In the VRDW scenario, the deviation values were lower (\( Mdn = 0 \)) than those in the scenario without manipulation (\( Mdn = 10 \)). The null hypothesis was tested via an MWU test. The result was not statistically significant, \( p = 0.718\), so the null hypothesis is retained. One reason for this could be the conservative values set for the blink detection to avoid false positives. This is reflected in the relatively small number of detected blinks reported in the previous section. Factors such as hygiene regulations due to the pandemic (e.g., all users had to wear a protective mask under the HMD) and the relatively high number of participants wearing glasses could have further reduced the accuracy of the eye tracker. Nevertheless, on average almost 3 blinks were detected, or even 5 if we exclude the runs in which the tracking did not detect any blink, which should have resulted in a deviation. Our value of \(0.6^{\circ }\) rotation per eye blink was chosen, based on the literature, close to the point of subjective equality [10]. These literature values were obtained from artificial experiments in which the users did not actually walk. Perhaps the respective values in scenarios that include actual walking differ due to effects of cross-modulation. This remains an open question and should be further investigated in the future.

H5: In the scenario without manipulation, the deviations were lower (\( Mdn = 10 \)) than in the combination scenario (\( Mdn = 235 \)). The result of the MWU test was statistically significant, \( p < 0.001\). The null hypothesis is thus rejected. It is therefore likely that the combination scenario can also be used for RDW. This also strengthens the assumption that ASRDW works, since the VRDW scenario used here, with a gain very close to subjective equality [10], had no demonstrable influence on the deviation of the participants on its own.

H6: This hypothesis assumed that the participants obtain higher deviation values in the combination scenario than in the ASRDW scenario. It can no longer be interpreted conclusively in this work, as H4 was rejected. With \( Mdn = 205 \), the ASRDW scenario had a lower median deviation than the combination scenario (\( Mdn = 235 \)). The MWU test rejected the null hypothesis with \( p = 0.028 \), so the result was statistically significant. However, if one excludes the participants for whom no blinks were detected by the eye tracker in some scenarios because of a slipping face mask or glasses, the VRDW scenario would yield an average of at most five blinks. This concerned a total of six participants in the combination scenario and in the blinking scenario. In the combination scenario, the corresponding average would be at most six blinks, so that the two scenarios would differ by 17% in this average maximum number of blinks. Whether an unknown independent variable is responsible for this could not be determined in this work, which indicates the need for further research. It is also possible that this difference of one blink caused the \(+30\) cm deviation in the median, but this cannot be substantiated from our data. It is worth mentioning that the combination scenario showed consistently positive deviations with a minimum of 80 cm, which is notable in comparison to the pure ASRDW scenario with a minimum of 0 cm. If this could be replicated, the combination scenario could possibly result in a higher minimum deviation on average than pure ASRDW.

H7: The median deviation in the VRDW scenario (\( Mdn = 0 \)) was lower than in the combination scenario (\( Mdn = 235 \)), and the MWU test showed this difference to be statistically significant, \( p < 0.001 \). Consequently, the null hypothesis can be rejected.

H8: In contrast to the pure ASRDW scenario, only two participants noticed the manipulation in the combination scenario. Analogously to H2, the same binomial test was applied, in this case with \( x = 18\), and resulted in \( p = 0.0001 \), so the null hypothesis was rejected. Most of the participants did not notice that they had been manipulated in the combination scenario. This is another indication that the combination scenario should be considered further.

In summary, each hypothesis on the research questions was either confirmed as probably correct or rejected. ASRDW was found likely to work, and it should be examined more closely in combination with other methods.

6.2 Additional findings

When comparing the results of the total scores of the SSQ questionnaires [7], a lower score was calculated in the questionnaire after the experiment. This does not mean that the experiment improved the health of the participants. If the SSQ score after the experiment was lower, it can be assumed that the experiment did not have any negative effects on the health condition of the participants [3].

We also investigated the speed of the participants, i.e., the distance divided by the time, which we derived from the tracking information of the headset. This allows us to compute not only the overall speed but also the speed on individual parts of the paths. We observed that in the scenarios with ASRDW the speed decreased between around 2.5 m and 3 m. It then remained below 0.7 m/s for the next 4 meters, up to at least 6.5 m. This could be due to the water noises, which made the participants walk more cautiously. The fact that this slowdown also appeared in the other scenarios could indicate a learning effect, i.e., that participants had already stepped into the water at this distance in other scenarios, or that this happened in the scenario itself. The drop in speed within the last few meters before the target point could also be due to a learning effect, since the participants had to cover the same distance in each scenario.
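A minimal sketch of such a per-segment speed analysis is given below. The `(t, x, y)` sample layout and the function names are assumptions for illustration and do not reflect the actual tracking export of the headset; only the 0.7 m/s threshold is taken from the text.

```python
import math

def segment_speeds(track):
    """Speed between consecutive tracking samples.

    track: list of (t, x, y) with t in seconds and x, y in meters,
           sorted by time (hypothetical input layout).
    Returns a list of (y_at_segment_end, speed_in_m_per_s).
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order samples
        speeds.append((y1, math.hypot(x1 - x0, y1 - y0) / dt))
    return speeds

def slow_sections(track, threshold=0.7):
    """Segments in which the speed falls below the threshold (m/s)."""
    return [(y, v) for y, v in segment_speeds(track) if v < threshold]

# Toy track: the participant slows down in the second segment.
track = [(0, 0.0, 0.0), (1, 0.0, 1.0), (2, 0.0, 1.5), (3, 0.0, 2.5)]
print(slow_sections(track))
```

In practice, the raw samples would likely be smoothed or averaged per half meter first, since frame-to-frame head motion makes instantaneous speeds noisy.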

In the VRDW scenario, the PSE of \(0.495^{\circ }\) for the rotation around the up axis reported by Langbehn [10] was confirmed in practice: the rotation used in this work, \(0.6^{\circ }\), was just above this value, and the participants did not deviate statistically significantly from the line even though they were rotated when blinking. The blinks occurred most frequently at the beginning of each run. This may be because the path was faded out shortly before the start, so that the participants blinked afterward when focusing on the house. However, this cannot be precisely determined. Knowledge about this would be relevant for achieving a higher deviation, since a rotation at the beginning of the run results in a higher potential deviation than one later in the run.
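The last point can be illustrated with a simple geometric estimate. Under the simplifying assumption that a participant continues straight along the rotated heading without correcting, a single rotation by \( \theta \) applied at distance \( d \) from the start of a path of length \( L \) yields a lateral offset \( \Delta x \) at the end of the path; the symbols \( \Delta x \), \( d \) and \( L \) are introduced here only for illustration:

\[ \Delta x \approx (L - d)\,\tan \theta , \qquad \theta = 0.6^{\circ },\; L = 20\ \text{m}: \quad \Delta x(d = 0) \approx 0.21\ \text{m}, \quad \Delta x(d = 10\ \text{m}) \approx 0.10\ \text{m} \]

In this idealized picture, a blink-triggered rotation at the very start of the run has about twice the potential effect of one applied halfway through.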

The mean duration of the scenarios was \( M = 41.7\) seconds, and the average number of blinks across all scenarios was \( M = 4\). This means that fewer blinks occurred on average in this experiment than the average blink frequency of 13 per minute reported by [13]. This could be due to the way the blinks were detected and to blinks that were missed. It is also consistent with the results of previous work showing that the number of blinks is reduced by higher concentration [8].
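Converted to a rate, the two reported averages correspond to

\[ \frac{4\ \text{blinks}}{41.7\ \text{s}} \times 60\ \frac{\text{s}}{\text{min}} \approx 5.8\ \text{blinks per minute}, \]

which is less than half of the 13 blinks per minute cited above. This is only a rough estimate, since it divides the average count by the average duration instead of averaging the rates of the individual runs.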

Regarding the deviations in the scenarios with ASRDW, it was noticeable that in the pure ASRDW scenario the deviation from the path begins after the first 2 meters, whereas in the combination scenario it already starts within the first meter. This could indicate a compensatory movement caused by blinking, but this is not supported by the results of the hypothesis tests.

The male group had higher deviations in the ASRDW scenario (\( Mdn = 230 \)) than the female group (\( Mdn = 100 \)). The null hypothesis that the group of men had a smaller or equal median deviation compared to the group of women was tested with an MWU test. The result was statistically significant, \( p = 0.028 \). This means that men were significantly more influenced by the acoustic manipulations than women. This coincides with the results of Feigl et al. [4].

6.3 Limitations

Even though our experiments have shown that ASRDW works, it also has its limitations. For instance, our scenario contains only very few visual cues, which is very unfavorable for visual methods such as the blink-based method in our example. This could have influenced the results for the eye-blinking approach. Moreover, the scene was relatively dark. It is possible that the effectiveness of our method changes in bright VR environments. In addition, our method requires an appropriate surface. The mud and water samples give a direct warning sign to the users, who try to avoid them. However, suitable sounds for other scenarios could be more challenging to identify. In our experiment, we only measured the translational deviation, i.e., we did not consider the rotation of the participants. Hence, it remains an open question whether our method can actually produce circular paths. However, even an area as large as 10 m × 20 m seems to be too small to actually let people walk in circles, at least at the speed they achieved in our experiment. This raises the question of whether RDW methods are suitable for living-room-scale VR. On the other hand, it remains an open question whether our ASRDW can also be scaled to smaller room sizes. Finally, we did not measure the leg lengths of the participants. It is possible that participants with legs of unequal length tend to drift in a specific direction. However, this is independent of the actual redirection method and should be investigated in a dedicated user study.

7 Conclusions and future work

We have presented a new method for RDW via auditory step feedback. Our method is easy and cheap to implement, and it can be combined with traditional visual methods for RDW. Moreover, we have presented a methodology to test RDW methods in large areas without the need for expensive special hardware like laser trackers. We have also conducted a user study to evaluate the applicability of our auditory step feedback. The results show that we are able to achieve a significant translational manipulation of more than 2 m on a path of length 20 m without the manipulation being recognized by the users. Additionally, our method seems to be safe with respect to simulator sickness, and it significantly amplifies visual redirection based on eye blinking by 30 cm.

However, this first attempt at demonstrating the applicability of our new method also opens several avenues for future work. First, it would be interesting to find the boundaries of possible manipulation. So far, we chose the maximum deviation relatively conservatively based on the pretests, so that almost no user was able to recognize the redirection. Moreover, we would like to investigate the respective parameters like DTs and PSE according to [23]. The scalability of our method should also be researched further, e.g., whether it also works in smaller spaces like living rooms, and we would like to incorporate rotational manipulations as well. Currently, we have mainly concentrated on relatively alarming, unpleasant sounds in the case of “wrong” footsteps. In the case of external sound sources, the type of sound has an influence on the acoustic manipulation [4]. However, it remains an open question whether this also holds for self-generated acoustic feedback via steps on different surfaces. The nature and material of the environment could thus be used to increase or decrease the diversion if necessary. This could be regulated by synthetically generated step noises, as proposed by Turchet et al. [24], even if the surface is made of the same material but the step noise changes slightly on it. It would also be interesting to test other RDW methods with our large-field methodology and, finally, we would like to further research the combination of different RDW techniques. Our experiments with combined visual and acoustic manipulation seem to indicate that there could be an influence on the manipulation perception curve, because the combination led to an unexpected amplification of the deviation.