1 Introduction

1.1 Sense of agency in remote control

An autonomous robot is “a system that operates in an unpredictable and partially unknown environment” and has “little or no human intervention for its movement” (Alatise and Hancke 2020, p. 39,830). In recent decades, autonomous mobile robots have become a prominent technology with applications in numerous spheres of life, including cleaning services, customer support, and education (Alatise and Hancke 2020, p. 39,832). These applications include the use of robots in challenging environments where human presence is difficult or dangerous, for example, deep-sea (Picardi et al. 2020) and space exploration (Thoesen and Marvi 2021), or search and rescue missions (Habibian et al. 2021).

The use of autonomous mobile robots and self-driving vehicles is growing rapidly and holds great promise given the pace of development of artificial intelligence. However, this growth raises ethical issues, especially those related to the safety priorities of robots and the allocation of moral responsibility. Ethical concerns regarding mobile robots primarily revolve around their decision-making processes when operating autonomously (see Winfield et al. 2014). Our focus, by contrast, lies in scenarios where human intervention in the robot's operation may occur. According to modern legal theories, only a human agent can be responsible for their acts, and at least part of the responsibility for the actions of a robot falls on its manufacturer (Miller 2011). On one view, it is the manufacturer's task to implement morality into the robot, which will eventually lead to the emergence of the robot's own moral agency (Dodig Crnkovic and Çürüklü 2012). Finally, given the complexities of control and agency studies, responsibility may be shared in some way between robots and humans (Gunkel 2020). This position combines the advantages of instrumentalism with respect to robots with an accommodation of their increasing agency.

Responsibility for driving a vehicle or controlling a robot can be partially assigned to the human driver or operator. First, the operator may be given the possibility of intervening in the ethical design of a robot, something that has been objected to on moral grounds (Gogoll and Müller 2017). Second, the operator may have the ability to seize control of a robot. Wen et al. (2019) propose that the possibility of human intervention is important in self-driving vehicles for safety reasons. In the case of mobile robots, similar issues could be even more pressing. High-risk applications such as search and rescue missions or exploration may require serious decision-making, as a miscalculation could result in the destruction of a robot or even loss of human life. The use of robots in high-risk situations might be based on human–robot interaction, even if the majority of robot operations are executed autonomously. Human agents should be able to intervene and take over control of the robot. The difference from conventional automated driving here is that a mobile robot is handled by a professional operator and not by the end user of the product. This does not mean that the operator bears full responsibility for everything that happens to the robot. Rather, the operator should make a positive contribution by ensuring that the robot works correctly and safely.

In order for a person to feel involved and have a sense of responsibility for the operations of a robot, they need to have a sense of agency over the unfolding events (Haggard and Tsakiris 2009). The sense of agency (SoA) is the sense that the agent is "the one who is causing or generating an action"; it includes the sense of control over ongoing actions (Pacherie 2007). Note that SoA is distinct from agency or control itself: an agent can act without SoA or, conversely, have SoA regarding something they do not control (Sato and Yasuda 2005). The agent does not have to be alone in performing actions: the literature allows for the possibility of joint agency (Bratman 2009; Jenkins et al. 2021), in which the activities of a fellow co-agent can be perceived as one’s own and generate first-person SoA (Pacherie 2014; Sahaï et al. 2017). Action without the sense of control may rely on practical knowledge and luck, but this approach is at best sub-optimal. Rule 7 of The Eight Golden Rules of Interface Design (Shneiderman 1987) states that an interface should support the operator's internal locus of control, and studies of SoA may provide the key to satisfying this rule.

A number of papers have tackled the relationship between SoA and automation. SoA naturally weakens with growing system automation (Berberian et al. 2012), and in vehicle control it weakens significantly even in assisted driving mode (Yun et al. 2019). “Disengaged” drivers are less proficient at regaining control over the vehicle during automated steering (Navarro et al. 2016). To solve the problem of SoA dissolution in automated driving, a variety of strategies have been proposed (Wen et al. 2019; Ueda et al. 2021), but the effects of automation are not the only issue when we address remote control of mobile robots.

In remote control scenarios, the operator is deprived of direct sensory feedback and relies solely on information transmitted from the mobile robot. One of the mechanisms responsible for the emergence of SoA relies on an implicit comparison between the sensory outcomes of actions and sensory predictions based on motor commands (Frith et al. 2000; Zito et al. 2020) or, as others have suggested, on the motor signals themselves (Christensen and Grünbaum 2018). This mechanism grounded in motor control is not the only one responsible for the emergence of SoA, but it plays a significant role in bringing about the experience of agency (Synofzik et al. 2013; Zito et al. 2020). However, if an agent acts using an external device, the action ceases to be a bodily one, up to the complete absence of overt movement in the case of brain-computer interfaces (Vidal 1973; Kawala-Sterniuk et al. 2021). The case of remote control is further complicated by the fact that the effects of the action occur outside the operator's immediate environment. Of course, ordinary people successfully learn to control characters in video games, but it would be useful to quantify SoA in similar situations even without taking into account the prospects of robot automation.

As we have shown, there is considerable value in studying SoA in the context of remote robot control despite the advancing autonomy of robotic devices. Since remote control has unique characteristics as a method of human–robot interaction, the applicability of the available SoA indicators should be evaluated for it in dedicated studies.

1.2 Implicit correlates of SoA

The intensity of SoA can be studied by examining agents' explicit judgments and putative implicit indicators of SoA. The use of explicit judgments is informative, but it has its limitations. Two-level theories of SoA (Bayne and Pacherie 2007; Synofzik et al. 2008) make a distinction between an immediate low-level variety of SoA (“feeling of agency”) and a reflexive high-level variety expressed in judgments (“judgment of agency”). The coincidence between these components is not guaranteed, and some authors have argued for the possibility of dissociation between the two (e.g. Braun et al. 2014; Dewey and Knoblich 2014; Van den Bussche et al. 2020). When we act in normal settings and things go as planned, we do not formulate judgments about our causal role in the ongoing events; such judgments arise when we doubt our performance. Moreover, judgments can be influenced by participants' beliefs and prejudices, especially regarding differences between the experimental conditions (Wen 2019). These disadvantages of explicit judgments could be overcome by using implicit indicators of SoA, which presumably correlate with low-level SoA.

Probably the best-known implicit indicator of SoA is intentional binding (IB): the phenomenon of subjective temporal convergence of a voluntary action and its sensory effect (Haggard et al. 2002). Tanaka et al. (2019) list five methods of registering IB. Let us consider these options and their applicability in remote control. In the first method, an experimenter asks a participant to register the moment of action or its effect on the Libet clock (Libet et al. 1983), as in the original experiment by Haggard et al. (2002). This option is far from ideal for remote control, as it requires attending to the clock, which could distract the participant from robot control. The second option is to ask a participant to estimate the temporal interval between an action and its effect (e.g. Engbert et al. 2008). This task requires the participant to estimate time intervals in ms, which could seem impossible and counterintuitive to some participants. The third option is to ask a participant to reproduce the time interval by holding a button (e.g. Dewey and Knoblich 2014). This option could be the most suitable, though it has not been studied thoroughly. The remaining two options require reconstructing the sequence of events (e.g. Haering and Kiesel 2012) or judging their simultaneity (e.g. Cravo et al. 2011), but we are unsure whether these options could be effectively integrated into remote control.
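To make the interval-estimation logic concrete, below is a minimal Python sketch of how an IB index can be computed from per-trial data; the delays, estimates, and noise model are simulated for illustration and are not taken from any of the cited studies.

```python
import numpy as np

# Simulated per-trial data: real action-effect delays (ms) and a
# participant's interval estimates in two conditions.
rng = np.random.default_rng(0)
actual = rng.choice([250, 450, 650], size=30)               # real delays, ms
voluntary_est = actual * 0.8 + rng.normal(0, 40, size=30)   # compressed intervals
baseline_est = actual * 1.0 + rng.normal(0, 40, size=30)    # veridical estimates

def estimation_error(estimates, actual_ms):
    """Mean signed error (estimate minus real interval), in ms."""
    return float(np.mean(np.asarray(estimates) - np.asarray(actual_ms)))

# Intentional binding: intervals following voluntary actions are judged
# shorter, so the voluntary-condition error is more negative than baseline.
ib = estimation_error(voluntary_est, actual) - estimation_error(baseline_est, actual)
print(f"IB effect: {ib:.1f} ms (negative values indicate binding)")
```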

The main weakness of many IB studies is the ambiguity of their results due to the coincidence of SoA and high predictability of events. Despite a small variation in the delay between an action and its sensory consequence, the timing of the result is predictable: otherwise, the agent ceases to consider their action its cause. In contrast, an involuntary action evoked with TMS (transcranial magnetic stimulation) or an exoskeleton is unpredictable. IB is modulated by participants' beliefs: it decreases if they do not think that subsequent events are caused by their actions (Moore and Haggard 2008; Desantis et al. 2011). It has been argued that IB is simply a subjective convergence between a cause and its effect (Buehner and Humphreys 2009; Buehner 2012), and that the voluntariness of the action contributes nothing beyond predictability. It has still not been determined to what extent IB is affected by the predictability of events and by the perceived causal connection between them (Hughes et al. 2013; Tanaka et al. 2019).

We believe that in some cases distance estimates can be more effective than time estimates in implicit SoA assessment. Estimation of traveled distances fits organically into various vehicle control situations. However, it is important to note that in the context of assessing the presence of SoA in a driver or robot operator, distance estimates serve as merely an indirect measure, albeit one tailored to the specific circumstances of control. The spatial variety of intentional binding was first described by Buehner and Humphreys (2010) and later by Kirsch et al. (2016). Notably, they framed this phenomenon as causal binding, criticizing the connection of IB with voluntary movement. In our previous study (Yashin et al. 2022), we considered a phenomenon somewhat similar to, but still different from, spatial IB. In that study, participants steered a robotic wheelchair and estimated the distance it traveled in each trial. We found that participants' subjective distance estimates were significantly larger when they controlled the wheelchair than when they passively experienced its automatic movement. Although this pattern is the inverse of normal IB, we believe that the change in distance perception during vehicle control relies on similar mechanisms. Importantly, in that design we separated the factors of control and predictability.
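The measure at stake can be stated compactly: the implicit correlate is the relative difference between distance estimates under active control and during passive observation. A minimal sketch, with hypothetical estimates for one participant:

```python
import numpy as np

def distance_binding(active_estimates, passive_estimates):
    """Relative difference in median distance estimates: positive values
    mean distances are judged longer under active control."""
    med_act = np.median(active_estimates)
    med_pas = np.median(passive_estimates)
    return (med_act - med_pas) / med_pas

# Hypothetical per-trial estimates (cm) for one participant and one real distance
active = [150, 162, 147, 158, 166]     # estimates while steering
passive = [132, 140, 128, 137, 143]    # estimates during automatic movement
print(f"relative difference: {distance_binding(active, passive):+.2f}")
```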

1.3 SoA in virtual reality

VR technology is a good candidate for improving remote control (or teleoperation) systems. Seeking to enhance the user's sense of immersion and situational awareness, Stotko et al. (2019) implemented a VR system for live exploration with a mobile robot. Compared to a simple setup with a monitor, VR provided a more positive user experience: participants rated their navigation skills higher in VR. Earlier, Martins and Ventura (2009) obtained similar results: VR enhanced situational awareness, depth perception, and performance in a simulated search-and-rescue mission. Jankowski and Grabowski (2015) used a VR interface for the control of an inspection robot, finding that it improved participants' performance and user experience in a task where they picked up and delivered objects using the robot. Whitney et al. (2020) showed that dexterous manipulation tasks can be completed more effectively with a VR interface.

It is likely that shifting an agent’s point of reference could make remote robot control more direct, moving it closer to motoric manipulation of objects. Presumably, VR provides the agent with a greater sense of presence. Does this shift of perspective translate into increased SoA, and, in practical terms, can IB be applied to studying SoA in virtual reality? There have been studies of the temporal variety of IB in VR, although not for teleoperation tasks. In the experiment by Kong et al. (2017), participants learned to control a virtual avatar in VR, and IB was reproduced in this environment. After participants became fully accustomed to VR control, the strength of IB in VR did not differ significantly from IB during ordinary bodily movements. In a study by Winkler et al. (2020), the use of VR did not significantly affect IB in interaction with virtual objects. IB was not affected by an increased latency between an action and its virtual outcome, although SoA scores (quantified judgments of agency on a 10-point scale) decreased. Jeunet et al. (2018) examined judgments of agency by manipulating SoA in VR. They obtained a set of significant effects while intervening in participants’ actions in different ways; in particular, they increased latencies and made unpredictable changes to action outcomes. This, too, led to a decrease in SoA scores. Taken together, these studies show that implicit correlates of SoA are applicable in virtual reality and confirm that participants can have low-level and high-level SoA in VR.

Though not in the context of SoA studies, distance perception has been investigated in VR. In general, participants tend to underestimate distances when using head-mounted displays (Cutting and Vishton 1995), but as they interact with the virtual environment, their estimates become more accurate (Richardson and Waller 2007). Distance estimates are not affected by such interface parameters as field of view and binocular viewing restrictions (Creem-Regehr et al. 2005) or graphics quality (Thompson et al. 2004). Different methods have been used in studies of distance perception in VR, including reports after blind walking (Thompson et al. 2004; Richardson and Waller 2007) or instant teleportation (Keil et al. 2021), and bisection without moving (Bodenheimer et al. 2007). In all these methods, what is estimated is the distance to a certain object in the subject's field of view. Despite the manipulation of various interface parameters, it remains unknown why humans chronically underestimate distance in VR (Jamiy and Marsh 2019).

In this study, we aimed to test whether the implicit correlate of SoA, measured as the difference in distance estimates between active and passive conditions, is reproducible in remote control of a robot. We were equally interested in how this putative implicit correlate of SoA and explicit SoA scores would be affected by the use of a VR headset for visual feedback compared to a monitor. According to our hypothesis, the difference in distance estimates between the active and passive conditions should be reproduced in remote robot control. We formulated this hypothesis for the conditions where the participant received feedback via a monitor screen (H1) and for the conditions where they used VR goggles (H1’).

H1:

When teleoperation is conducted via a monitor screen, the mean subjective distance estimates of the path traveled by a teleoperated mobile robot are greater when the operator actively controls the robot compared to when they passively observe its movement.

H1’:

When teleoperation is conducted using virtual reality goggles, the mean subjective distance estimates of the path traveled by a teleoperated mobile robot are greater when the operator actively controls the robot compared to when they passively observe its movement.

We also anticipated that the use of VR would create a greater relative difference between the passive and active conditions than a monitor would (H2).

H2:

The relative difference between the mean subjective distance estimates of the path traveled by a teleoperated mobile robot in the VR conditions is greater than the analogous difference in the conditions with a monitor.

We formulated H2 for relative differences, as distances may be perceived differently in VR regardless of the operator's control over the robot. It is important to note that confirmation of both H1 and H1’ was a necessary condition for H2. The rejection of either H1 or H1' would suggest that distance estimates cannot serve as an implicit correlate of SoA in our design, or that participants lacked SoA in the teleoperation process. H2 was formulated as a way to test variations in the expression of SoA by means of distance estimates.

In addition, we believed that ratings of comfort would be significantly higher in the active condition with VR than in the active condition with a monitor (H3).

H3:

Subjective comfort during mobile robot control, as rated on a Likert scale, is higher when the operator uses VR goggles than when they use a monitor.

2 Materials and methods

2.1 Participants

In total, 36 naïve right-handed healthy volunteers (21 males and 15 females, aged 24.3 ± 3.8 years, M ± SD) participated in this study. All participants gave informed consent prior to their involvement in the experiment. The experimental procedures were approved by the local ethics committee and were in agreement with the institutional and national guidelines for experiments with human participants, as well as with the Declaration of Helsinki. The minimum sample size of 36 participants was determined using G*Power 3.1 (Faul et al. 2009): we calculated it for a one-group ANOVA with four repeated measures, effect size f = 0.25, error probability α = 0.05, and power (1 − β) = 0.95.
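This calculation can be approximated in Python. The sketch below follows G*Power's repeated-measures power conventions and assumes its defaults, a correlation of 0.5 among repeated measures and no nonsphericity correction (ε = 1); these defaults are our assumption for illustration, not settings reported above.

```python
from scipy.stats import f as f_dist, ncf

def rm_anova_power(n, m, eff_f=0.25, alpha=0.05, rho=0.5, eps=1.0):
    """Power of a one-group repeated-measures ANOVA (within factor)."""
    lam = eff_f ** 2 * n * m * eps / (1 - rho)   # noncentrality parameter
    df1 = (m - 1) * eps                          # numerator df
    df2 = (n - 1) * (m - 1) * eps                # denominator df
    f_crit = f_dist.ppf(1 - alpha, df1, df2)     # critical F under H0
    return 1 - ncf.cdf(f_crit, df1, df2, lam)    # P(reject H0 | H1 true)

# Smallest sample with power >= 0.95 for f = 0.25 and 4 repeated measures
n = 2
while rm_anova_power(n, m=4) < 0.95:
    n += 1
print(n, round(rm_anova_power(n, m=4), 4))       # expected: n = 36, power ~0.96
```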

2.2 Equipment

For the experiment, we developed a setup modeling remote control of a robot (Fig. 1). We decided to use a real robot rather than simulation software in order to train the participants to control a real device. We also wanted the participants to perceive a real environment when operating the robot with a VR headset.

Fig. 1 a A sketch of the robot; b the robot on the rail track

In our study, a robot of the YARP series was used. The robots of this series were created to work with architecture, interaction, and behavior models for swarm robotics. This architecture allows one to collect and process data from several independent controllers, take inputs from different sensors, and control the robot from a single control panel, in our case a computer. One of the main features of this architecture is hardware extensibility under the I2C protocol. This series of robots has a flexible structure, extensive tools, and custom software that allow them to be used for various experiments with mobile robots.

The robot consisted of two parts: the moving chassis and the housing with controller electronics. A USB camera (Canyon CNS-CWC6N; Canyon, Netherlands) was mounted at the top of the platform. We used a wired rather than a wireless camera to reduce the delay in image transmission, which benefited the visual experience. The cord did not impede robot movement, as the robot only traveled along the track. For consistency of movement and repeatability of the experimental conditions, the robot was equipped with a rail movement system, and a special rail track was created for it. The chassis had a double front-wheel drive, two passive wheels rotating around the Y axis, and four wheels that rested on both sides of the rail track due to the spring suspension.

The robot was controlled from a computer using ROS (Robot Operating System; Open Robotics, USA) software. For convenient data collection, processing, and control, the Rosserial protocol was used, which allows several independent processes to run simultaneously. On-board control of the robot relied on an ATmega328-based microcontroller with an expansion board, which drove the motors through a motor driver. Communication between the controller and the application (control system) on the computer was carried out via Bluetooth. In the user application, the experimenter selected the robot's mode of movement (manual or automatic), and the participant controlled the robot and entered distance estimates.
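As an illustration of this architecture, here is a minimal sketch of a PC-side teleoperation node in rospy. The /cmd_vel topic name, the speed value, and the source of the button state are illustrative assumptions, not the actual interface of our YARP-series robot.

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist

def teleop_loop(button_held, forward_speed=0.2):
    """Stream velocity commands: move forward while the button is held."""
    pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
    rate = rospy.Rate(20)                      # 20 Hz command stream
    while not rospy.is_shutdown():
        cmd = Twist()
        cmd.linear.x = forward_speed if button_held() else 0.0
        pub.publish(cmd)                       # rosserial relays the command
        rate.sleep()                           # to the on-board controller

if __name__ == '__main__':
    rospy.init_node('robot_teleop')
    # Stand-in for the real GUI mouse-button state
    teleop_loop(lambda: rospy.get_param('~button_held', False))
```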

To implement the virtual reality conditions (as an alternative to the monitor), we used an Oculus Rift CV1 headset (Oculus, USA). The viewing angle of the headset was 110°. When a participant was using the headset, the active window occupied their entire field of view; only by turning their head by 90° could the participant see the boundaries of the image. As the robot moved in a straight line, there was no need to turn one’s head.

2.3 Experiment design

A participant was seated in a comfortable armchair in front of a table with a computer monitor, a keyboard, and a mouse on it. On the right side of the table we placed a laptop that the experimenter used to control the course of the experiment. The virtual reality headset was also placed on the table when not in use. In the experiment, the participant watched the movement of the robot along a rail track (Fig. 2A). Depending on the experimental condition, the image from the robot's camera was displayed either in the VR headset or on the monitor. The sound was likewise transmitted either through the headphones of the VR headset or through the built-in laptop speaker. The visualization of the video stream in VR was based on a known approach to remote control of a mobile robot, in which the video stream is displayed in its original form in a graphical window in front of the user (e.g. Jankowski and Grabowski 2015; Stotko et al. 2019). In our case, when studying SoA, it was necessary to remove distracting factors, such as additional indicators in VR or visible boundaries of the graphical window, to maintain the participant's concentration on the experimental task and, as a result, their deeper immersion in the control process.

For this purpose, the settings were adjusted so that the camera image occupied the participant's entire visual field. At the same time, to maintain the illusion of perspective shift, we adjusted the curvature of the VR window as if the participant were positioned in front of a huge curved screen. If the participant could see the edge of the screen when moving their head, this would also harm their immersion in robot control, so the size of the graphical window was increased to the point where the participant could see only 30% of the entire window in the VR headset, with the image centered on the rail track (Fig. 2B). In this case, the boundaries of the window could only be seen with a significant turn of the head.

Fig. 2 Camera image of robot movement on the computer monitor (a) and the part of the image in VR that a participant can see without turning their head (b)

The robot and the track were in the same room as the participant but outside their field of view. In two experimental conditions, the participant tracked the camera image by looking at the monitor, and in the other two, they put on the headset to see the camera image.

In half of the conditions the participant controlled the robot using the mouse (“active” conditions), and in the other half they passively observed the automatic movement of the robot (“passive” conditions). In total, the experiment included four conditions, independently combining two factors (Fig. 3).

  • VRPas—The robot moves automatically; the image is displayed in the VR headset.

  • MPas—The robot moves automatically; the image is displayed on the computer monitor.

  • VRAct—Participant controls the robot; the image is displayed in the VR headset.

  • MAct—Participant controls the robot; the image is displayed on the computer monitor.

Fig. 3 Types of experimental conditions

All aforementioned activities were perfectly safe for participants and required minimal physical effort. Every condition included 30 trials. To become acquainted with the procedure, the participant completed 2 or 3 trials from every condition. The order of conditions was random for each participant with a restriction: two conditions with the same mode of presentation (VR/Monitor) were never placed consecutively. For example, if MAct was the first condition, it could be followed either by VRPas or VRAct condition, as they include a different mode of presentation (Fig. 4). This restriction left us with eight possible condition orders.
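This counting claim is easy to verify: of the 24 possible orderings of the four conditions, exactly eight alternate between presentation modes, as the following sketch shows.

```python
from itertools import permutations

mode = {'VRPas': 'VR', 'VRAct': 'VR', 'MPas': 'M', 'MAct': 'M'}

# Keep only orders in which consecutive conditions differ in presentation mode
valid = [order for order in permutations(mode)
         if all(mode[a] != mode[b] for a, b in zip(order, order[1:]))]

print(len(valid))                  # 8 admissible condition orders
for order in valid:
    print(' -> '.join(order))
```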

Fig. 4 An example of condition sequences

2.4 Procedure

The structures of trials in the active and passive conditions are shown in the flowcharts (see Fig. 5). At the beginning of each trial, the camera image appeared, and a metronome sound (“tick”) informed the participant that robot movement could occur. In the active conditions (Act), it notified the participant that they could initiate robot movement by pressing the “8” key on the keyboard or the left mouse button. According to the instructions, the “tick” was not a command to initiate movement, but merely a prompt. In the passive conditions (Pas), the “tick” sound informed the participant that the robot could start moving at any time; in reality, the robot started moving 1.5–3 s after the sound (the interval was random). To sustain movement at a constant velocity, the participant was asked to hold the left mouse button. If they released the button too early, an error message was displayed on the screen, the robot returned to its initial position, and the trial was run again. This issue could not arise in the case of automatic movement.

In the Act conditions, the task of the participant was to cover a certain distance and stop the robot by releasing the left mouse button. The participant did not know the distance in advance. A short sound (a standard Microsoft Windows (Microsoft, USA) sound called by the MessageBeep function) informed the participant that they had covered the necessary distance and needed to stop the robot. We told the participant that the distances at which the signal sounded were completely random, but in reality there were six possible distance values: 1, 1.2, 1.4, 1.6, 1.8, and 2 m, with a random value out of these six picked in every trial. After the sound, the participant released the button. However, the robot was preprogrammed to decelerate after the sound in all conditions and covered a certain braking distance (see details in the Results section). Several participants noticed that the robot stopped automatically even in the Act conditions. We explained to them that automatic braking was a safety measure, as theoretically someone could drive the robot to the end of the rail track (which was partially true).
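For concreteness, the timing of a passive trial can be sketched as follows; the robot speed is an illustrative assumption, as the real velocity of the setup is not specified here.

```python
import random
import time

DISTANCES_M = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]      # six possible target distances

def passive_trial(speed_mps=0.25):
    """Simulate the timing of one Pas trial."""
    print("tick")                                  # movement may start soon
    time.sleep(random.uniform(1.5, 3.0))           # random onset delay, 1.5-3 s
    target = random.choice(DISTANCES_M)            # picked anew in every trial
    print("robot starts moving")
    time.sleep(target / speed_mps)                 # constant-velocity travel
    print(f"stop sound at {target:.1f} m")         # automatic braking follows
    return target

passive_trial()
```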

The distances in our experiment were relatively large, which arguably goes against the literature on temporal IB. In studies with a Libet clock (Haggard et al. 2002; Ruess et al. 2017), IB weakened as action-effect latencies grew. However, in studies using temporal interval estimates, the magnitude of IB did not decrease (or even increased) with latency (Wen et al. 2015), with the exception of the study by Imaizumi and Tanno (2019). Humphreys and Buehner (2009) registered IB for temporal intervals of 4 s.

As in our previous study (Yashin et al. 2022), we tried to separate the factors of control and predictability. The participant did not generate the sound directly by a key press, but in the Act conditions the sound was not presented unless the participant pressed the key and thus moved the robot. In this way, we tried to reduce the effect of predictability on the results by sacrificing the immediacy of the causal link between the action and its result. Since voluntary actions are more predictable than other events, but agency cannot be reduced to predictability, this separation is important for studying the sense of agency. As the design had been validated using distance estimates, we asked participants to estimate distance. Estimates of time intervals given in ms could perhaps be used as effectively, but this procedure requires validation, and, as we pointed out in the Introduction, estimating distance is a natural activity that is quite compatible with vehicle control.

After the robot stopped, the image of the track became blurred, and a dialog box appeared on the screen. The box contained an input field and number buttons. In the input field, the participant entered an estimate of the distance (in cm) covered by the robot from the start of movement till the sound signal. Participants estimated the distance in every trial; estimates were entered via the Numpad or by clicking the number buttons on the screen with the mouse, which many participants found more comfortable in the VR conditions. We asked participants to give honest estimates and at least try to determine the distance accurately. They did not know the range of distances, but we told them that the distance could be greater than 1 m, even though we asked them to give estimates in cm.

Before the experiment, we asked participants not to use spatial references to estimate distance, but to focus primarily on the internal sense of movement. When the participant entered the estimate, the camera image would disappear, and the robot would automatically move back to its initial place.

After the four conditions, the participant was asked to fill out a questionnaire. There, the participants answered several questions by using a 10-point scale (from 0 to 9). The questions were asked in Russian. Below we provide their translation into English (Table 1); see the original formulations in Russian in “Appendix 1”.

Table 1 The questionnaire presented after the experimental conditions

In the literature, several questionnaires designed for measuring SoA have been proposed (e.g. Polito et al. 2013; Tapal et al. 2017). However, we opted not to use them due to their complexity. We required subjective judgments of control and causation to validate the design. Overall, we assumed that participants would perceive the differences between conditions with and without control as transparent.

2.5 Statistical analysis

We used repeated measures ANOVA to analyze median distance estimates and braking distances. Pairwise comparisons of subjective scores were made using the Wilcoxon signed-rank test. We counteracted the multiple comparisons problem by running the Benjamini–Hochberg (Benjamini and Hochberg 1995) procedure (FDR) with a false discovery rate Q = 0.05. Statistical analysis was performed using Statistica software (Statsoft, USA) and the Statsmodels 0.14.1 Python package.
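Since Statsmodels was part of our toolchain, the pairwise-comparison step can be illustrated directly; the scores below are simulated stand-ins, not our data, and the comparison labels are only examples.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

# Simulated paired survey scores (0-9 scale) for 36 participants
rng = np.random.default_rng(1)
comparisons = {
    'control: VRAct vs VRPas': (rng.integers(5, 10, 36), rng.integers(0, 5, 36)),
    'control: MAct vs MPas': (rng.integers(5, 10, 36), rng.integers(0, 5, 36)),
    'comfort: VR vs Monitor': (rng.integers(3, 8, 36), rng.integers(4, 9, 36)),
}

p_values = [wilcoxon(a, b).pvalue for a, b in comparisons.values()]

# Benjamini-Hochberg FDR correction across the family of comparisons
reject, q_values, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
for name, p, q, r in zip(comparisons, p_values, q_values, reject):
    print(f"{name}: p = {p:.4f}, q = {q:.4f}, significant: {r}")
```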

3 Results

3.1 Distance estimates

To assess the effect of the Activity factor on distance estimates, we conducted two-way repeated measures ANOVAs for the MAct and MPas conditions and for the VRAct and VRPas conditions. We analyzed medians of distance estimates, which are less sensitive to statistical outliers than mean values. In this analysis, the estimates are equivalent to differences between the estimates and the real distance values, since the Distance factor was present. We performed two analyses rather than one three-way ANOVA because we were not interested in the difference between the estimates in VR and with the monitor: the estimates could have varied simply because of the difference in perspective. We wanted to know whether the Activity factor influenced the estimates in each perspective. Had there been an effect in both, we would have compared the strength of the effect in the two cases using other methods.
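Such an Activity × Distance analysis can be run with Statsmodels' AnovaRM; the long-format data below are simulated (general underestimation plus a small active-condition boost), not our recorded estimates.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated long-format data: one median estimate (cm) per participant,
# Activity level (Act/Pas), and real distance
rng = np.random.default_rng(2)
rows = []
for subj in range(33):
    for activity in ('Act', 'Pas'):
        for dist in (100, 120, 140, 160, 180, 200):
            est = 0.8 * dist + (8 if activity == 'Act' else 0) + rng.normal(0, 10)
            rows.append({'subject': subj, 'Activity': activity,
                         'Distance': dist, 'estimate': est})
df = pd.DataFrame(rows)

# Two-way repeated measures ANOVA: Activity (2) x Distance (6), both within
print(AnovaRM(df, depvar='estimate', subject='subject',
              within=['Activity', 'Distance']).fit())
```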

While analyzing the data, we realized that not all participants had been presented with distances of all six values in every condition. Because of this, the number of subjects in the analysis of distance estimates was reduced to 33. We could not run additional experiments due to time constraints.

As in our previous study (Yashin et al. 2022), participants on average underestimated the distances. In the monitor conditions, the two-way repeated measures ANOVA showed that the Distance (F(5, 160) = 37.329, p < 0.00001) and Activity (F(1, 32) = 5.2679, p = 0.0284) factors were significant. The interaction between the factors was not significant (F(5, 160) = 1.9343, p = 0.0915) (Fig. 6), which precluded post-hoc analysis. The significance of the Distance factor indicates the overall plausibility of the estimates: the farther the robot traveled, the higher the estimates given by the participants. The significant effect of the Activity factor confirms our hypothesis (H1): the distance estimates were greater when the participants controlled the robot.

In the VR conditions, the two-way repeated measures ANOVA also showed the significance of the Distance factor (F(5, 160) = 35.136, p = 0.00001), but not of the Activity factor (F(1, 32) = 0.3492, p = 0.5587) (Fig. 7). Thus, the effect of control on distance estimates was not reproduced in VR, and our hypothesis H1’ was not confirmed. Since H1’ was not confirmed, we did not compare the pairs of VR and Monitor conditions, as confirmation of H1’ was necessary for H2 (Figs. 6, 7).

Fig. 5 Flowcharts of the experimental procedure in different conditions: a procedure in the Act conditions; b procedure in the Pas conditions

Fig. 6 Group means of median subjective distance estimates in the Monitor conditions. Vertical lines denote 95% confidence intervals

Fig. 7 Group means of median subjective distance estimates in the VR conditions. Vertical lines denote 95% confidence intervals

3.2 Braking distances

In the Materials and Methods section, we mentioned that after the stop signal the robot traveled a certain braking distance. The deceleration of the robot was slow in order to conceal the fact that it stopped automatically, even if a participant released the mouse button in the Act conditions. However, participants could still stop the robot before the sound, which led to a repetition of the trial. To assess the stability of the braking distance, we measured it after different trial distances (1–2 m). Below we present a table (Table 2) of group average braking distances and standard deviations in the four conditions.

Table 2 Group means of robot braking distances in four experimental conditions after all trial distances
Fig. 8 Group means of braking distances, averaged over six trial distances. Vertical lines denote 95% confidence intervals

Fig. 9 Box-and-whisker plot of video delays in the VR and Monitor conditions

In this table, one can see that the braking distances in the MAct condition appear to be smaller than in the other conditions. To test this, we performed a two-way repeated measures ANOVA. For every participant, we calculated the mean braking distance in each of the four conditions and used these values in the analysis (Fig. 8). The factors in the analysis were Activity (Act/Pas) and Presentation (VR/M). The analysis showed significance of the Activity factor (F(1, 35) = 10.659, p = 0.002) and the Presentation factor (F(1, 35) = 113.91, p < 0.00001); the factor interaction was also significant (F(1, 35) = 22.15, p = 0.00004). Post-hoc analysis (Fisher LSD) showed a difference between the MAct condition and all other conditions: in the MAct mode, the braking distance was significantly shorter (p < 0.00001 for all comparisons). On average, it differed from braking distances in the other conditions by 2.25 ± 0.39 cm.

Fig. 10 Control scores in different experimental conditions

Here, we should note that this effect is the opposite of the effect for the subjective estimates: in this presentation mode, participants' distance estimates were significantly greater in the MAct condition than in the MPas condition. The detected difference between the braking distances could only weaken the effect for the estimates, since the estimates were predictably lower for smaller distances. But why were the braking distances different in the MAct condition? We speculate that it was due to the timing of mouse button release. Apparently, in the MAct condition participants released the button quickly enough that braking after the sound started earlier than it would have automatically. This detail is a limitation of our current method, although an interesting one, as we will see in the Discussion.

3.3 Video delay

To evaluate the difference between the presentation of the robot's camera video in VR and on the monitor screen, we performed the following test. As an indicator, we measured the delay between a key press and the start of the actual movement of the robot in both cases. A square mark that changed color when the forward-movement key was pressed was added to the robot control GUI. The video image was recorded during successive test runs of the robot using Open Broadcaster Software (OBS).

In the case of the monitor, the final video recording was done at 60 Hz. To record the image from the VR headset, an additional monitor was connected, on which the image from the headset was duplicated; the video was recorded from it at 60 Hz. The recording procedure consisted of the following steps. The experimenter turned on the video recording, then pressed the key to move the robot forward for approximately 3–4 s. This procedure was repeated 105 times each for the monitor and VR, as 105 is the required sample size for a two-sample t-test when a medium-size effect is expected.
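The figure of 105 per group corresponds to a standard power calculation for an independent-samples t-test with a medium effect (Cohen's d = 0.5), α = 0.05 (two-sided), and power 0.95; these exact settings are our assumption for illustration and can be checked as follows.

```python
from statsmodels.stats.power import TTestIndPower

# n per group for a two-sample t-test: d = 0.5, alpha = 0.05, power = 0.95
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.95)
print(round(n))   # ~105 test runs per display mode
```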

In the final video recordings, we calculated the difference in frames between the recoloring of the square and the moment the remote robot started moving. For convenience in tracing the start of movement, visual landmarks were placed next to the robot. The frame difference was then converted to milliseconds.

A two-sample independent t-test (Fig. 9) showed no significant difference between the VR and monitor delays (t = −0.712, p = 0.478). Following Quertemont (2011), in order to verify the absence of an effect, we performed an equivalence test. Assuming that a 20-ms difference in delays would have been consequential in our setup, our null hypothesis was that the mean delays of the monitor and the VR headset differed by more than 20 ms. The equivalence test rejected the null hypothesis (t = −2.158, p = 0.016), meaning that the delay difference between the VR headset and the monitor did not (on average) exceed 20 ms.
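The frame-to-milliseconds conversion and the equivalence test can be sketched with Statsmodels' two-one-sided-tests (TOST) routine; the frame counts below are simulated placeholders, not our measurements.

```python
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

FRAME_MS = 1000 / 60                              # both recordings ran at 60 Hz

# Simulated frame counts between the mark recoloring and the first visible
# robot movement (105 test runs per display mode)
rng = np.random.default_rng(3)
monitor_ms = rng.integers(5, 9, 105) * FRAME_MS   # frames converted to ms
vr_ms = rng.integers(5, 9, 105) * FRAME_MS

# TOST equivalence test with +/- 20 ms bounds: rejecting the null means
# the mean delays do not differ by more than 20 ms
p_overall, _, _ = ttost_ind(vr_ms, monitor_ms, low=-20, upp=20)
print(f"TOST p = {p_overall:.4f}")
```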

It is important to note that during recording from the VR headset, the recording program caused a near-limit load on the GPU. This was visually noticeable when inspecting the final recordings. Accordingly, the delays in the real experiment were probably shorter, and the spread of delay values lower.

3.4 Survey

In a survey after the experimental conditions, the participants assessed their sense of control over the robot in all four modes on a 10-point scale from 0 to 9. We posed this question as a means of validating the system: we anticipated that participants would attribute control of the robot to themselves, despite the randomness of the sound presentation. The Wilcoxon signed-rank test (Fig. 10) showed a significant difference between perceived control in the VRAct and VRPas conditions (Z = 4.7326, p < 0.00001), as well as between the MAct and MPas conditions (Z = 4.8403, p < 0.00001). There was no significant difference between the VRAct and MAct conditions (Z = 1.1075, p = 0.268).

For the purposes of validation, we were also interested in causation scores in the four conditions. The difference between perceived causality (Fig. 11) in the VRAct and VRPas conditions was significant (Z = 3.2942, p = 0.001), as was the difference between the MAct and MPas modes (Z = 3.2958, p = 0.001). Once again, the difference between the VRAct and MAct conditions was not significant (Z = 1.3416, p = 0.1797). The effects observed for the control and causation scores validated our system: participants in the active conditions believed they were controlling the robot and contributing to its stopping, in contrast to the passive conditions.

Fig. 11 Causation scores in four experimental conditions

Participants also evaluated the overall difference between conditions with and without the VR headset. The comfort scores in the VR and monitor conditions differed significantly (Z = 2.0421, p = 0.0411), with higher scores for monitor control (Fig. 12a). However, after adjustment for multiple comparisons using the Benjamini–Hochberg procedure, this result was not significant. We attach a table with p-values and calculated q-values for the survey results in “Appendix 1”. There was no significant effect (Z = 0.0309, p = 0.9753) between the conditions with respect to the comfort of distance estimation (Fig. 12b). Hypothesis H3 was not confirmed: robot control using VR was not more comfortable than control via a monitor.

Fig. 12 a Comfort scores in the VR and Monitor conditions; b distance estimation comfort scores in the VR and Monitor conditions

4 Discussion

4.1 Estimating distance in VR and with a monitor

In our experiment, participants interacted with a remote-controlled robot that moved along a straight rail track. In half of the experimental conditions, a participant controlled the movement of the robot, while in the other half, they watched the robot move automatically. The modes of image and sound presentation also differed: the participant received the image and sound either through a VR headset or via the monitor and built-in laptop speaker. In all conditions, participants subjectively estimated the distance traveled by the robot before a sound, which notified the participant that the required distance had been covered. The sound occurred after the robot traveled one of six random distances ranging from 100 to 200 cm in steps of 20 cm. The sound was presented randomly to separate the factor of control from the predictability of action outcomes. The coincidence of these factors in most experiments calls into question the implicit indicators of the sense of agency (SoA), which might arise solely due to stimulus predictability.

We anticipated that the distance estimates would be significantly greater in the conditions where participants controlled the robot (Active conditions) than in the conditions where the robot moved automatically (Passive conditions). This expectation applied both to the conditions where the participant used a monitor (H1) and to those with VR goggles (H1’). We made this assumption based on the results of our previous experiment, where a similar effect was observed when participants were driving a robotic wheelchair. In that experiment, a participant moved with the vehicle rather than remaining in one place, which is the main difference between that method and the current one.

Apart from confirming the effect for mobile robot control, we expected it to be more pronounced in the VR conditions. According to our second hypothesis, the relative difference in distance estimates between the Active and Passive conditions in VR should have been greater than the analogous difference between conditions with the monitor. This hypothesis (H2) was based on speculations about greater immersion in conditions with the VR headset: the participant's perspective coincided with that of the robot, whereas in conditions with the monitor, the operator observed its movement in a detached manner. Given the results obtained by other authors (Kong et al. 2017), we believed that this shift of perspective would enhance SoA.

Overall, the distance estimates made by participants were valid: on average, they grew with the real distances. At the same time, the results of the experiment confirmed H1 but not H1’, and consequently not H2. We reproduced the difference in distance estimates in the conditions with the monitor, but the effect was not observed in the conditions with VR, which made further comparison of the presentation modes meaningless.

Since the effect we reproduced has not been widely studied, we are currently unable to explain why estimates of covered distance are larger when an agent controls a vehicle rather than observes its movement. Importantly, the estimates are also more accurate in the active conditions, as the traveled distances overall appear shorter to the participants. Nevertheless, we refer to these estimates as the larger ones, not the more accurate ones, since we believe that the discussed effect is related to temporal binding. In temporal binding, the estimates of time intervals are less accurate in the active conditions, so we hesitate to associate estimation accuracy with SoA. When it comes to temporal binding, the explanatory models of the effect are still being developed, though they appear in the more recent literature (e.g. Lush et al. 2019). Probably, as models of human perception of time and distance evolve, it will become clear whether the difference between estimates in a design similar to ours has some connection to estimation accuracy.

4.2 Judgments of control and causation

To interpret the distance estimate results more accurately, we sought to validate our system through a survey of participants. We asked the participants about their perceived control over the robot and their understanding of the causal connection between their actions and the robot's stopping. We assumed that participants would attribute control to themselves and believe that they were responsible for stopping the robot. We sought this validation because of the unpredictability of the sound that signaled the participant to stop the robot. The participant controlled the movement of the robot, but not the moment the sound was presented.

The scores validated our experimental design. In the subjective scores, participants acknowledged control over the robot in the active conditions but not in the passive ones. Additionally, a significant difference was observed between the scores of apparent causation. The participants generally believed their actions to be the cause of the sound, despite the lack of direct control. In this regard, it is worth mentioning the reports of several participants after the experiment. Some participants reasoned about the cause of the sound in accordance with the counterfactual understanding of causation (Lewis 1973): if they had not been pressing the button, the robot would not have traveled the required distance, and the sound would not have occurred. This intuition about the nature of causation contrasts with the understanding of causality as production, a direct generation of one event by another (Hall 2004). Thus, the participants did not produce the sound by pressing the keys, but performed actions without which the sound would not have occurred. The effect obtained for distance estimates in the active and passive conditions with the monitor is consistent with the scores of control and causation. However, differences between control and causation scores in the active and passive conditions were also seen in the VR conditions.

The lack of an effect for distance estimates in the VR conditions is not in line with the survey results. Nevertheless, research on implicit indicators of SoA provides a line of reasoning for handling such cases. Some authors believe that the immediate low-level SoA and explicit judgments of agency do not have to match (Dewey and Knoblich 2014). This position has been questioned (Imaizumi and Tanno 2019), yet it is consistent with our results: high-level differences were not accompanied by low-level differences.

4.3 Subjective rating of comfort

According to our third hypothesis (H3), using VR for robot control should have been more comfortable for the participants. This hypothesis was not confirmed, as there was no significant difference between the comfort scores in the conditions with VR and the monitor: on average, participants considered these conditions similar in terms of overall comfort and ease of making distance estimates. We should note that participants reported lower comfort scores in the VR conditions, but this difference was not statistically significant after correction for multiple comparisons. Had it been significant, this result would have been in line with the results obtained for the distance estimates.

4.4 Distance estimates and SoA

How could one explain the fact that no difference between the distance estimates was obtained for the VR conditions? In terms of experimental procedure, they did not differ from the conditions with the monitor. It is possible that robot control in VR was perceived as uncomfortable by the participants. The scores of comfort may not have reflected this fact vividly enough because the use of VR technology was novel and interesting for the participants. Of course, it is unlikely that VR per se suppresses SoA: other authors concluded that the use of VR did not lead to a decrease in temporal IB (Winkler et al. 2020). In our setup, VR technology failed to provide immersion and may have induced VR sickness in the participants. It is known that VR use can lead to motion sickness, although its occurrence is difficult to predict (Chang et al. 2020; Chattha et al. 2020). In turn, susceptibility to motion sickness is known to increase the likelihood of VR sickness (Llorach et al. 2014).

VR sickness has been widely studied in recent decades, but its causes are still somewhat elusive (Chang et al. 2020): for example, it is not determined by the quality of graphics or the parameters of the FOV. On the other hand, lack of experience with VR (Freitag et al. 2016) is a factor contributing to VR sickness. Most of our participants had no prior VR experience and did not have much time to get accustomed to it. Unnatural depth-of-field processing can also contribute to VR sickness (Carnegie and Rhee 2015). The image presented in VR typically does not support the visual difference between in-focus and out-of-focus objects that creates depth of field in human vision; the presented image is uniform. This factor could play a significant role in our design, since we asked the participants to estimate distances. Keshavarz et al. (2011) observed that motion sickness was induced by the presence of a visual surround: objects that are irrelevant to the task and can be seen around the FOV. This phenomenon was observed both when participants used a VR headset and when they watched a video projected on a wall. In our implementation of VR, a participant could still see the edges of the image if they turned their head far enough. Keshavarz et al. propose that a visual surround can create a conflict between the visual and vestibular systems.

Another possible contributor to VR sickness is likewise related to a discrepancy between vision and other modalities, which overall may be one of the main causes of VR sickness (Chang et al. 2020). In driving and flying simulations, sickness can result from the absence of the vestibular and proprioceptive sensations associated with motion (Rangelova and Andre 2019). In the VR conditions of our study, image and sound were transmitted through the headset, and the participants’ perspective was transferred to the robot. However, the visual presentation of motion in VR was not accompanied by the appropriate vestibular and proprioceptive sensations. In scenarios involving anthropomorphic virtual avatars, proprioceptive and tactile cues play crucial roles in achieving high immersion and adaptability of the user. Congruent visuotactile stimulation enables manipulation of the user's sense of ownership and self-location (Maselli and Slater 2014). Proprioception exhibits resilience to changes in avatar behavior: despite frequent variations in avatar behavior, the sense of embodiment with respect to the virtual avatar persists, even though this may complicate adaptation (Soccini et al. 2022).

Our participants could have experienced the so-called space motion sickness, which is observed in spaceship control (Heer and Paloski 2006). The participant’s action (moving forward) was not consistent in terms of its sensory consequences. As of today, there are no reliable techniques that could deal with motion sickness in driving simulations in VR (Rangelova and Andre 2019).

The explanation of our results that involves motion sickness is directly connected to SoA. It is known that one of the conditions for the emergence of SoA is congruence between intention and the sensory results of an action (Synofzik et al. 2013). We believe that deterioration of this relationship in the vestibular and proprioceptive modalities led to a weakening of the immediate low-level sense of agency, but not to the extent that it affected explicit judgments. If this line of reasoning is correct, our results speak in favor of exploring implicit correlates of the sense of agency. Incongruent proprioceptive cues could be less damaging to SoA than to, for example, the sense of ownership. In their study on alien motion in anthropomorphic VR avatars, Soccini et al. (2019) report a significant loss of the sense of ownership in participants without a significant reduction of SoA. Since we did not attempt to induce an illusion of ownership in the participants (in addition to a shift in self-location), we can only focus on SoA.

We should also note the difference between the braking distances in the MAct condition and all other conditions. We did not formulate any hypotheses about braking distances, and yet this effect could be useful for explaining other results. The difference in braking distances arises from the variation in the moments when a participant released the button after the sound was presented. In the MAct condition, participants reacted to the sound faster than in the VRAct condition. But why was this the case? Perhaps the participant reacted to the sound with a longer delay in VR because they could not see their hand on the mouse. However, the participant did not have to look for the button: all they had to do was release it. We believe the most probable explanation involves weakened SoA when using VR, likely due to the mismatch of visual and vestibular sensations and other factors contributing to VR sickness, which caused participants to experience less control over the robot and to react more slowly to events accompanying robot control. Despite the slightly reduced distance (the mean difference was 2.3 cm), the subjective estimates were significantly greater in the MAct condition than in the MPas condition. We believe that this unintended difference in braking distance further supports our explanation of the effect.

4.5 Application and limitations

The results we obtained have several potential applications. Primarily, the replication of the difference between distance estimates in the Active and Passive conditions suggests that this phenomenon can be utilized to study the sense of agency in vehicle remote control. SoA research in this area is important, since it could allow us to find solutions for SoA enhancement in robot operators. The problem of SoA reduction is especially pressing when a robot is partially autonomous or even fully autonomous with the possibility of human intervention. The preservation of SoA could improve the user experience of the operator and encourage them to take responsibility for the events happening to the robot.

Secondly, if our explanation of the lack of effect in the conditions with VR is correct, this negative result has implications for organizing remote vehicle control with VR technologies. VR sickness could be more than an inconvenience for the user: it could also reduce the implicit sense of agency, which may have serious consequences in high-risk environments. This, in turn, implies that the causes of VR sickness and the ways of preventing it should be studied further.

Finally, let us consider the limitations of our study. Although we asked the participants not to use spatial landmarks, we could not objectively control their compliance with this instruction. We also could not check whether they estimated distance or time, which they could convert into distance. These limitations are largely inherent in the procedure itself: we conducted the experiment in the real world, where the environment usually contains eye-catching features. Time estimates could not be ruled out because the movement of the robot was temporally extended. In this sense, any experimental procedure relying on time or distance estimates in vehicle control has the same weakness: the passage of time goes hand in hand with covering distance, so spatial and temporal dimensions cannot be completely separated.

While a participant controlled the robot in the active conditions, the robot would stop automatically after the sound even if the participant continued to hold the key. Participants tended to release the key before the automatic braking, but in some trials the robot stopped while the key was still being pressed. We explained automatic braking to the participants as a safety feature, but it could have influenced their sense of control. Automatic braking guaranteed that the distance covered by the robot would not be significantly larger in the Active conditions. Moreover, as the track had a finite length, it was reasonable to implement automatic braking one way or another, and so the “safety” explanation was valid.

It should also be noted that we used the VR headset in a very straightforward way, without capitalizing on some of its possibilities. For example, we could have upgraded the robot so that the camera would rotate as the operator turned their head. This feature might have enhanced the operator's experience by reconciling visual and proprioceptive cues, at least during camera rotation. However, implementing this feature would have required a different experimental procedure: if a participant could rotate the camera, they would be more inclined to use landmarks to estimate the distance.

5 Conclusion

In this study, we investigated participants' subjective distance estimates in remote control of a robot. Participants operated the robot using either a VR headset or a monitor and estimated the distance traveled by the robot before a signal. In the conditions where the image was transmitted through the monitor, we obtained a significant difference in estimates between the condition where a participant controlled the robot and the condition where they passively observed the robot's movement. This result suggests that distance estimates can serve as an implicit indicator of the sense of agency in the control of a remote vehicle. However, the effect was not reproduced in the VR conditions, which did not differ from the monitor conditions in terms of procedure. We explain this result by appealing to VR sickness in driving simulation, which could arise from the sensory discrepancy between visual, vestibular, and proprioceptive cues in VR. At the same time, VR sickness can be induced by other factors, which need to be studied further. We believe that VR sickness could have suppressed the sense of agency during VR control in our experimental design.