1 Introduction

Post-disaster scenarios arise from natural events such as earthquakes and hurricanes, or from attacks, all of which leave devastated areas. These events also cause severe economic losses and, above all, the death of people trapped under collapsed structures. The initial explorations that search and rescue brigades carry out in these environments to locate trapped victims pose a high risk to their physical integrity (Wannous and Velasquez et al. 2017).

Recent advances in robotics and locomotion systems (legged-manipulator robots) have shown significant progress in addressing unstructured terrain and carrying out explorations in harsh conditions (Biswal and Mohanty 2021). In this context, search and rescue robotics has played a leading role over the last two decades, supporting rescue brigades in catastrophic events such as the Twin Towers attack in the United States (2001) (Blackburn et al. 2022; Murphy 2014), Fukushima in Japan (2011) (Eguchi et al. 2012), Amatrice in Italy (2016) (Kruijff et al. 2016), and the Mexico City earthquake (2017) (Whitman et al. 2017).

As the next phase in this context, immersive technologies, such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR), have shown great potential as human–robot interfaces in recent years. VR immerses users in virtual environments, while AR enhances the real world with digital elements. MR seamlessly integrates virtual and physical elements, enabling virtual objects to interact with and respond to the real world. VR disconnects users from reality, AR enhances it, and MR combines the two, providing a diverse range of immersive experiences (Milman 2018). These technologies have allowed operators to manage processes and control complex robotic systems in remote missions (Peppoloni et al. 2015; Walker et al. 2019; Makhataeva and Varol 2020).

Within the state of the art, few solutions have been proposed for tele-operation of legged-manipulator robots through immersive interfaces, owing to their kinematic complexity and high number of degrees of freedom (n ≥ 18 in most cases); notable exceptions include the work of research teams such as Klamt et al. (2019) and Zhou et al. (2022), as well as that developed by the authors (Ulloa et al. 2023; Ulloa et al. 2022). Full autonomy for these robotic systems still poses a series of challenges, especially for tasks involving contact with and movement of elements in the environment. Immersive interfaces therefore present a viable alternative for high-level management.

The main contribution of this work is its comparison of MR- and VR-based technologies for teleoperating legged-manipulator robots. Beginning with an experimental phase for data collection involving various operators, the study analyzes the context both at the mission level and from the operator’s perspective. The analysis compares advantages and disadvantages qualitatively and quantitatively through metrics proposed by the authors to assess the performance of each type of interface. Additionally, a discussion on threats to validity identifies, discusses, and mitigates the possible factors that could compromise the validity of the results and conclusions of this work (Wohlin et al. 2012; Romano et al. 2020).

An experimentation phase has been developed to carry out this comparison using the two types of interfaces, MR (with the Hololens2) and VR (with the HTC-Vive), and the ARTU-R robot (A1 Rescue Task UPM Robot), a quadruped robot equipped with a 6-DoF manipulator. In a previous phase of this development, the kinematic study of ARTU-R was performed, as well as its tele-operation using the MR methodology; this methodology and its results are available in (Ulloa et al. 2023; Ulloa et al. 2022). In this complementary phase, an immersive interface based on VR has been developed, seeking to cover manipulation scenarios that were not achieved in the first remote operations.

The test phase was carried out on reconstructed indoor and outdoor stages with ten participants, who evaluated both interfaces. It should be noted that, within the state of the art, no comparison between MR and VR interfaces focused on robotics has been found, much less on legged-manipulator robots, which is why the proposal in this article is novel.

The main results have shown that this type of interface is versatile for controlling complex robotic systems with a high number of degrees of freedom, mainly thanks to the immersive experience provided to the operator, which increases their confidence level for decision-making. On the other hand, each interface has been shown to stand out in particular situations: the VR-based interface performs better in remote tele-operation, while MR is better suited to on-site operations.

The document is structured as follows: Sect. 2 reviews related previous works; Sect. 3 details the materials and methods used; Sect. 4 presents the main results; finally, Sect. 5 draws the conclusions and outlines future work.

2 Related works

Immersive interfaces (VR–MR) have shown relevant advances in the last decade. They have demonstrated great potential to improve the user experience in process control and remote tele-operation of robotic systems, allowing users to command complex robots in a versatile way alongside other modalities such as natural language or joysticks (Walker et al. 2019; Makhataeva and Varol 2020; Martín-Barrio et al. 2020; Hönig et al. 2015).

Most robotics-oriented developments have used immersive interfaces to provide first-person views (remote cameras) and, above all, to program sequences in the industrial field (Stadler et al. 2016; Akan et al. 2011; Pettersen et al. 2003) or to train operators (Roldán et al. 2019; Hidalgo et al. 2020).

Within the state-of-the-art, taxonomies and comparisons focused on human–robot interaction have been developed (Walker et al. 2022; Chang et al. 2022; Williams et al. 2018). However, relevant works that focus on analyzing and comparing VR and MR related to field robotics have not been found, much less on legged-manipulator-type robotics.

This section analyzes previous studies on tele-operation of legged-manipulator robots using MR–VR interfaces and their approach to field applications.

2.1 Mixed and virtual reality for robotics control

The interaction between mixed reality systems and robots is usually developed through the Robot Operating System (ROS) due to its versatility for node and topic management, which allows commercial 3D software such as Unity3D or Unreal Engine to be combined with ROS-based development (Wassermann et al. 2018).

One of the most recent and relevant works in tele-operation using a mixed reality system presents a framework to control robots with the Hololens. This work is limited to setting destination points in an indoor, structured laboratory environment (Ostanin et al. 2020).

Mixed reality techniques applied to mobile-robot control are mainly based on interactive hand pointers (Chakraborti et al. 2018; Krupke et al. 2018), eye gestures, and head tilting to define waypoints (Park et al. 2021; Liang et al. 2019). Other applications are limited to simulating the manipulator, considering its kinematic and dynamic parameters (Sha et al. 2019). In the area of mixed reality path planning, the proposed methods develop visual interfaces on the Hololens that display 2D maps of the environment and trajectories without direct interaction with the robot (Mulun et al. 2018, 2020). There are few developments with direct interaction through the assignment of waypoints.

Some approaches have focused on the selective presentation of data to the operator to facilitate the tele-operation of the robot (Livatino et al. 2021). However, they are limited to presenting RGB images (Kot et al. 2018).

Currently, technologies such as VR are used in a similar way for field developments. The main advantage of this type of technology is that it keeps the operator in a safe area. However, in high-precision operations, execution times are usually higher due to the uncertainty experienced by an operator who is not in the field of operation (Milman 2018). In (Martín-Barrio et al. 2020; Martín-Barrio et al. 2020), it is shown how interfaces based on virtual reality allow tele-operation tasks to be executed with complex robots (high number of degrees of freedom).

2.2 Legged-manipulator robots tele-operation using MR–VR interfaces

Legged robots are becoming increasingly popular for search and rescue applications due to their ability to navigate rugged terrain and their versatility in various environments. One of the main advantages of quadruped robots is their ability to maintain stability and balance even when moving over uneven or unstable surfaces. This makes them particularly useful for search and rescue operations, where they can access areas that may be too dangerous or difficult for human rescuers to reach (Biswal and Mohanty 2021). In addition, immersive interfaces such as VR/MR can have a positive influence by better contextualizing the mission details, allowing operators to experience the environment from the robot’s perspective and make more informed decisions about the best course of action (Ulloa et al. 2023; Humphreys et al. 2022).

The most notable developments within the state of the art include the tele-operation of a centaur-type robot through VR and a complex and expensive feedback system that allows full-body telepresence of the operator (Klamt et al. 2019). In the same sense, Zhou et al. (2022) highlight the importance of tele-operating this type of robot through VR to keep the operator safe.

Regarding MR interfaces, the developments focus on high-level control of robotic assemblies, as in Ulloa et al. (2023) and Quesada and Demiris (2022). Other work analyzes the Hololens 1 and 2 glasses against conventional interfaces for carrying out manipulation tasks, as in the study performed by the authors (Ulloa et al. 2022; Quesada and Demiris 2022).

Table 1 compares the main developments within the state of the art related to the proposed work. It should be noted that most of these developments focus on the applicability of robots for inspection and remote teleoperation of the environment. In the work by Martin et al., a preliminary analysis of such interfaces against conventional interfaces is proposed, but a comparison between VR and MR is not established.

Table 1 Comparison of the main developments in the state of the art for managing robotic missions using immersive technologies

Moreover, these technologies have significant applicability in the industrial domain, targeting repetitive programming of industrial robots confined to a work cell (Zhang et al. 2023; Luu et al. 2024).

3 Proposed methods

3.1 Virtual and mixed reality setup

3.1.1 Human–robot interaction elements

Specific equipment has been used to evaluate the versatility of both types of immersive technologies. The HTC-Vive (VR) glasses and the Hololens2 (MR) glasses have been used as human–robot interaction elements in the hardware system, while the test robot was ARTU-R (built on a Unitree quadruped), equipped with a 6-DoF manipulator. The selection of this robot type was predicated on the locomotion capabilities intrinsic to quadrupeds, which allow them to traverse complex scenarios across diverse terrains. This choice was favoured over conventional locomotion systems such as wheels or tracks; Table 2 provides a qualitative overview of the key comparative metrics, highlighting the advantages of these robots (Bruzzone and Quaglia 2012; He et al. 2019).

Table 2 Description of the locomotion systems parameters for unmanned ground robots (UGVs)

Figure 1a shows the operator with the Hololens glasses and the ARTU-R quadruped robot in an outdoor setting. Figure 1b shows part of the virtual environment in which the ARTU-R model has been placed, from the operator’s perspective, with the virtual hand grasping the end of the manipulator; this figure also shows the operator wearing the HTC-Vive glasses in the control room (while the robot is outdoors). The experimental development was conducted at the Centre for Automation and Robotics (UPM facilities), where different objects are available to recreate a post-disaster scene. The stage was reconstructed with obstacles, mannequins, and people pretending to be victims. During the tests, the people remained partially covered, showing only some limbs, heads, or legs.

Fig. 1
figure 1

Operator using the VR–MR glasses next to the ARTU-R robot

Executing tasks through a virtual environment has clear disadvantages, such as loss of visibility due to occlusions and the limited field of view.

For this reason, the auxiliary robot Wall-e (tracked type) has been used, which provides an external image perspective covering part of the environment and ARTU-R. This view makes it easier for the operator to execute tasks and make decisions from an external point of view. It is also noteworthy that, in the case of MR, this problem may arise when direct line-of-sight visibility is lost. The rest of the components and sensors used in this work are detailed in Table 3.

Table 3 Elements and robots used for development

3.1.2 Virtual and mixed reality interfaces

The interfaces were developed using Unity3D 2020.3, the HoloToolkit packages (MR system), SteamVR (VR system), and the RosBridge libraries for communication with the robotic systems.

Both interfaces have been designed around the simulated ARTU-R robot as a common element, so that the operability and the operator’s familiarization with the virtual elements are the same in both cases, allowing the interfaces to be compared under reasonably similar conditions.

In this way, the virtual robot commands the real robot’s target positions through manipulation of its virtual end effector. For this purpose, a blue ball has been incorporated into the virtual robot at the level of the arm gripper. In the virtual manipulation phase, the hand positions are captured by the Leap Motion (VR case). Interaction (grasping and collisions) between the virtual hand and other elements of the virtual environment is handled through the Interaction Behaviour libraries. In the case of MR, interaction between the hands and the virtual elements is performed through the gesture called Airtap, captured by the HoloLens, which involves closing the index finger and thumb while positioning the central projection point of the HoloLens (referred to as the Raycast) on the virtual object. Based on each end-effector position given by the blue ball, the inverse kinematics of the manipulator is computed, defining the position of each degree of freedom, which is subsequently sent to the real robot controller through RosBridge. The operator can move the end effector to spatial positions to drive the robotic arm (limited by the kinematics and the robot’s workspace). The base of the quadruped robot, on the other hand, is tele-operated by a joystick.
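As an illustration of this pipeline (a minimal sketch, not the authors’ implementation), the following code shows how a target position taken from the virtual blue ball could be converted into joint commands and forwarded through RosBridge. The URDF file, IP address, and topic name are hypothetical placeholders, and the ikpy solver stands in for whichever inverse-kinematics routine is actually used.

```python
# Sketch of the blue-ball -> inverse kinematics -> RosBridge pipeline described above.
# Assumptions: "artur_arm.urdf" describes the 6-DoF manipulator and
# "/arm_joint_targets" is the topic consumed by the on-board controller.
import roslibpy
from ikpy.chain import Chain

# Kinematic chain of the manipulator, loaded from its URDF description
arm_chain = Chain.from_urdf_file('artur_arm.urdf')

# Connect to the rosbridge server running on the robot's on-board computer
ros = roslibpy.Ros(host='192.168.1.10', port=9090)
ros.run()
joint_pub = roslibpy.Topic(ros, '/arm_joint_targets',
                           'std_msgs/Float64MultiArray')

def send_target(ball_position_xyz):
    """Solve the inverse kinematics for the blue-ball position and publish it."""
    joint_angles = arm_chain.inverse_kinematics(ball_position_xyz)
    # Drop the fixed base link returned by ikpy and publish the actuated joints
    joint_pub.publish(roslibpy.Message(
        {'data': [float(a) for a in joint_angles[1:]]}))

# Example: blue ball moved 0.35 m forward and 0.20 m up from the arm base
send_target([0.35, 0.0, 0.20])
ros.terminate()
```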

The first interface developed for this work is shown in Fig. 2a, which presents the operator’s first-person view through the HTC-Vive glasses. The interface is designed to provide the operator with all the elements for tele-operation, together with front screens that deliver visual information from the robots.

Fig. 2
figure 2

Interfaces and elements used in the tele-operation of the robotic set

Figure 2a shows the VR interface elements: A. ARTU-R first-person view screen. B. External view screen from the Wall-e robot. C. Virtual model of ARTU-R; the manipulator has the described blue ball at the end effector, which the operator can move to send target positions to the real robot. D. Virtual hands that reflect the behaviour of the real hands, captured by the Leap Motion sensor attached to the HTC-Vive glasses. E. A virtual control panel used to carry out actions such as opening/closing the gripper or moving to predefined positions such as Home.

On the other hand, Fig. 2b shows the first-person view captured from the Hololens2 glasses in an indoor setting. Here, both the real robot (in black) and the superimposed virtual robot (in grey) can be seen, together with virtual buttons for executing actions and elements of the environment, such as a mannequin on the ground.

The MR interface and previous work on kinematic modelling and control of the robotic assembly have been used to contrast the proposed method and establish the comparison; they are detailed further in (Ulloa et al. 2023; Ulloa et al. 2022).

3.1.3 Connection between subsystems

Figure 3 shows the connection diagram between subsystems. This figure schematizes, at the macro level, the three groups involved in tele-operation. The first corresponds to both types of glasses, which enable the human–robot interaction. The exchange between the HTC-Vive and the Hololens has been represented with a switch, indicating that either can be used to tele-operate the robotic set.

Fig. 3
figure 3

Subsystem connection layout

The second group corresponds to the command station, which links the VR–MR environments developed in Windows with the control signals sent to the robot. It consists of an MSI computer with a GTX 1660 Ti GPU, a 10th-generation Intel i7 processor, and Windows 10, running Unity 2020.3.

The third group corresponds to the robotic set, where the position and speed commands for executing movements are received. It is controlled by an Nvidia Jetson Xavier NX board with Ubuntu 18 and ROS Melodic.

The communication between groups two and three uses RosBridge to manage the bidirectional flow of information between Windows (Unity) and Ubuntu (ROS).
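As a sketch of this bridge (topic names and message types are illustrative assumptions, not the authors’ exact configuration), the command-station side can exchange data with the ROS Melodic stack on the robot as follows:

```python
# Illustrative sketch of the Unity (Windows) <-> ROS (Ubuntu) exchange over RosBridge.
# Topic names, message types, and the host address are assumptions for this example.
import roslibpy

ros = roslibpy.Ros(host='192.168.1.10', port=9090)   # rosbridge_server on the robot
ros.run()

# Downlink: velocity commands from the joystick towards the quadruped base
cmd_vel = roslibpy.Topic(ros, '/cmd_vel', 'geometry_msgs/Twist')
cmd_vel.publish(roslibpy.Message({
    'linear': {'x': 0.2, 'y': 0.0, 'z': 0.0},
    'angular': {'x': 0.0, 'y': 0.0, 'z': 0.1},
}))

# Uplink: robot state fed back to the VR/MR interface
def on_state(message):
    print('Joint positions:', message['position'])

joint_states = roslibpy.Topic(ros, '/joint_states', 'sensor_msgs/JointState')
joint_states.subscribe(on_state)
```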

3.2 Evaluation–performance metrics

3.2.1 Experiments and participants

The experimental phase consisted of completing a pick-and-place task, moving a first aid kit from point A to point B using the ARTU-R robot and the MR–VR interfaces.

The stages considered relevant during the testing process are:

  • Movement of ARTU-R towards point A.

  • Manipulation of the robotic arm to capture the object through the interface elements.

  • Displacement of ARTU-R towards an area close to point B.

  • Placement of the object at point B.

Ten operators participated in the testing phase, each repeating the test fifteen times with each interface, which allowed high-confidence results to be obtained. After completing the tests, the participants filled in a survey in which they rated the variables (described in the next section) on a scale from 0 to 10. A randomized order of modality presentation was used to mitigate potential order-effect bias: each participant performed the task multiple times with both the MR and VR modalities, with the order of presentation randomized per participant. This balances out any systematic effects related to the order of modality exposure, helping to ensure that the comparative results are valid and reliable.
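A minimal sketch of this counterbalancing scheme is shown below (illustrative only; the exact randomization procedure used in the study is not specified, and the participant labels are placeholders):

```python
# Sketch of per-participant randomization of modality order (MR vs. VR).
import random

participants = [f'P{i:02d}' for i in range(1, 11)]   # ten operators
orders = {}
for p in participants:
    order = ['MR', 'VR']
    random.shuffle(order)        # each participant gets an independent random order
    orders[p] = order

print(orders)   # e.g. {'P01': ['VR', 'MR'], 'P02': ['MR', 'VR'], ...}
```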

The participants, who had an average age of 28, consisted of men and women with knowledge of immersive interfaces and medium-sized robotic systems.

3.2.2 Evaluation metrics

Two types of evaluation have been carried out: one related to operator performance and another related to the performance of the interfaces. In the first case, different variables related to the user experience during mission execution have been evaluated based on the metrics of the NASA-TLX workload questionnaire, which is used in the context of robotic missions (Hart 2006). The variables related to the interfaces, in turn, are measured in percentage terms with respect to different aspects of their use. The measured variables are shown in Table 4.

Table 4 Variables for analysis of operator and interface performance

Equations (1) and (2) are specifically aimed at evaluating operator performance. Equation (1) calculates a score representing the operator’s performance when using the VR interface, while Eq. (2) computes a similar score for the MR interface. These scores are derived from a combination of different variables such as user preference, safety feeling, confidence for decision-making, immersive experience, physical effort, frustration experienced, and others, as outlined in Table 4.

Equations (3) and (4), on the other hand, are proposed for analyzing the score obtained for the VR and MR interfaces, respectively. These equations assess the interfaces’ performance based on specific criteria such as latency, interaction with virtual objects, autonomy, equipment weight, space required, covered field of vision, and security for the operator, among others.

$$Score_{op}(VR) = \beta_{1} + \gamma_{1} + \delta_{1} + \varepsilon_{1} + \theta_{1} + (100 - \phi_{1}) - 10\,\alpha_{1}$$
(1)
$$Score_{op}(MR) = \beta_{2} + \gamma_{2} + \delta_{2} + \varepsilon_{2} + \theta_{2} + (100 - \phi_{2}) - 10\,\alpha_{2}$$
(2)
$$Score_{int}(VR) = \alpha'_{1} + \beta'_{1} + \gamma'_{1} + \delta'_{1} + \varepsilon'_{1} + \theta'_{1} + \phi'_{1}$$
(3)
$$Score_{int}(MR) = \alpha'_{2} + \beta'_{2} + \gamma'_{2} + \delta'_{2} + \varepsilon'_{2} + \theta'_{2} + \phi'_{2}$$
(4)

The relationships described by Eqs. (1) and (2) establish a direct proportionality between their terms, where the coefficients are rated on a scale from 0 to 100. However, it is important to note an exception with α, which represents the number of training sessions required to achieve a task execution success rate exceeding 95%; consequently, it is scaled by a factor of 10 to underscore its significance. As for the frustration parameter ϕ, its quantification is based on its complementary value relative to 100. This approach ensures a comprehensive assessment of both the training requirement and frustration levels within the context of the study.

The proposed equations were established through a systematic process aimed at quantitatively assessing the performance of operators and interfaces in tele-operating legged-manipulator robots using MR and VR technologies. The first step was identifying the relevant variables, especially those directly involved in evaluating operator performance and interface effectiveness in tele-operation tasks. A literature review was also conducted to understand the factors influencing operator performance and interface usability in such tasks. Finally, the equations were formulated by combining the selected variables with their assigned weights to create a quantitative framework for evaluating operator performance and interface effectiveness in tele-operation tasks with legged-manipulator robots using MR and VR technologies.
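The following sketch illustrates how Eqs. (1) and (2) can be evaluated. The ratings used are hypothetical examples on the 0–100 scale described above (only the training counts, seven and eleven sessions, echo the averages reported later); they are not data from the study.

```python
# Worked illustration of Eqs. (1)-(2) with hypothetical ratings (0-100 scale);
# alpha is the number of training sessions needed to exceed 95% task success.
def score_op(beta, gamma, delta, epsilon, theta, phi, alpha):
    return beta + gamma + delta + epsilon + theta + (100 - phi) - alpha * 10

# Hypothetical operator ratings for each interface (not the study's data)
vr = dict(beta=85, gamma=90, delta=80, epsilon=88, theta=70, phi=20, alpha=7)
mr = dict(beta=60, gamma=55, delta=75, epsilon=70, theta=65, phi=45, alpha=11)

print('Score_op(VR) =', score_op(**vr))   # 423
print('Score_op(MR) =', score_op(**mr))   # 270
```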

3.2.3 Proposed analysis for use in search and rescue missions

The previous section allows a general comparison of the VR–MR interfaces; however, for more specific cases, such as search and rescue scenarios, greater weight should be given to the variables that concern the operator’s integrity. Thus, in percentage terms, the weights of variables such as γ (safety feeling), δ (confidence for decision-making), and θ (physical effort) are increased, whereas variables like α (training evolution) are not considered influential in these cases. In this context, the new Eqs. (5) and (6) are proposed for analyzing operator performance in tele-operation tasks in search and rescue environments applied to robotic systems.

$$Score_{op}(VR) = \beta_{1} + 2\,\gamma_{1} + 2\,\delta_{1} + \varepsilon_{1} + 1.5\,\theta_{1} + (100 - \phi_{1})$$
(5)
$$Score_{op}(MR) = \beta_{2} + 2\,\gamma_{2} + 2\,\delta_{2} + \varepsilon_{2} + 1.5\,\theta_{2} + (100 - \phi_{2})$$
(6)

The weights assigned to each variable were determined based on their perceived importance in influencing operator performance and interface effectiveness, drawing on the literature review and the empirical data collected during the experimental phase; variables deemed more critical or with a greater impact on the overall evaluation were assigned higher weights. Overall, these equations provide a systematic and quantitative framework for evaluating the performance of operators and interfaces when tele-operating legged-manipulator robots with the proposed method.
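Continuing the hypothetical example above, the SAR-oriented weighting of Eqs. (5) and (6) can be applied as follows (again, the ratings are illustrative placeholders, not measured values):

```python
# Worked illustration of the SAR-weighted Eqs. (5)-(6); alpha is dropped and
# gamma, delta and theta are emphasised. Ratings are hypothetical (0-100 scale).
def score_op_sar(beta, gamma, delta, epsilon, theta, phi):
    return beta + 2 * gamma + 2 * delta + epsilon + 1.5 * theta + (100 - phi)

vr = dict(beta=85, gamma=90, delta=80, epsilon=88, theta=70, phi=20)
mr = dict(beta=60, gamma=55, delta=75, epsilon=70, theta=65, phi=45)

print('Score_op_SAR(VR) =', score_op_sar(**vr))   # 698.0
print('Score_op_SAR(MR) =', score_op_sar(**mr))   # 542.5
```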

4 Experiments and results

4.1 Experiments

4.1.1 VR–MR task execution

Results of the testing phase are available in the videos linked in Footnote 1 and Footnote 2.

Figure 4 shows part of the manipulation process carried out by both interfaces to take the objects of interest at point A.

Fig. 4
figure 4

Manipulation and delivery tasks experiments using VR–MR interfaces

Figure 4a and b show the process corresponding to manipulation through the VR interface. Figure 4a corresponds to the first-person view of the HTC-Vive headset, showing the operator’s hand taking the blue ball to position and orient the ARTU-R manipulator. In this case, the operator has visual feedback through the front screen, thanks to the perspective provided by the Wall-e robot shown in Fig. 4b.

Figure 4c and d correspond to the interaction with the MR interface. Figure 4c shows the first-person view through the Hololens2 glasses and the gesture of the operator holding the interaction element (blue ball) and moving it towards the object of interest. Figure 4d, on the other hand, shows an external view of the scene: ARTU-R, the operator, and several victims.

4.1.2 Events evolution

Figure 5 illustrates the progression of states and events during execution of the task outlined in the experiments. The graphical representation (red and black) depicts the evolution from the initial positioning of ARTU-R in zone A, near the target object, to its placement in zone B. Two graphs are presented: one in red (scale: left vertical axis) representing the arm’s distance from the first aid object, and another in black (scale: right vertical axis) showing the distance travelled by the robot in the environment.

Fig. 5
figure 5

Representative temporal evolution of the states to execute the task of handling and delivering an object

Fundamental interface interactions are highlighted in blue, emphasizing events that trigger the extrapolation of actions from the virtual world to the Robot in physical space. Within the first five seconds, the first contact of the operator’s hand with the manipulation element (blue ball) is established to move the arm towards the area near the object of interest. Subsequently, at t = 25 s, the gripper is actuated to close and pick up the object. The following interaction occurs at t = 63 s, where the gripper opens to release the object. During these two stages, the quadruped robot is in the transportation phase. The time scale in this Figure represents the average time intervals for these events across both interfaces.

4.2 Operator evaluation

Figure 6 shows the training curve as the percentage of task completion success versus the number of attempts made by the operators for both types of interfaces. As expected, when the operators have not had any training, the task is not completed 100%, and as the operators’ skill develops, this percentage increases remarkably. This growth is not uniform across the two cases: with the VR interface, the operators can complete the task with a success rate above 95% after an average of seven attempts, whereas with the MR interface this process takes longer, requiring an average of eleven attempts.

Fig. 6
figure 6

Evolution of the task completed percentage as a function of the number of operator training sessions

This preamble (having trained operators) has been taken as the starting point for evaluating the different variables when executing the proposed manipulation task with a legged-manipulator robot. Figure 7 shows the boxplots of the different variables analyzed to evaluate operator performance.

Fig. 7
figure 7

Evaluation of operator metrics for VR–MR systems based on experimentation

Among the most notable differences between the variables is an explicit user preference for the VR-based interface. This is directly related to the immersive experience this type of interface provides and, inversely, to the frustration experienced.

In both cases, the levels of effort and of confidence in decision-making are similar. However, the feeling of security that the VR interface provides by keeping the operator in a safe area is very noticeable in the evaluation, with a large difference of around 44%.

Statistical analysis revealed significant differences in several key metrics between the VR and MR interfaces. Firstly, the task completion success rates differed significantly between the two interfaces, with the VR interface exhibiting a higher success rate on average (VR: mean = 95%, MR: mean = 80%, p < 0.05). Additionally, user preferences for the VR interface over the MR interface were statistically significant, indicating a clear preference among participants for the immersive experience offered by VR (VR: mean = 8.5, MR: mean = 4.2, p < 0.001).

Moreover, no significant differences were found in effort levels and confidence levels between the two interfaces (Effort: p = 0.312, Confidence: p = 0.215), suggesting comparable user experiences in terms of mental and physical workload. However, the feeling of security provided by the VR interface was statistically significantly higher than that of the MR interface (VR: mean = 9.2, MR: mean = 6.7, p < 0.01), highlighting the importance of this aspect in operator satisfaction and performance.
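Since each operator rated both interfaces, a paired comparison is the natural choice; the exact test used in the study is not restated here, and the sketch below assumes a paired t-test with invented placeholder ratings purely to illustrate the procedure with SciPy.

```python
# Sketch of a paired significance test between VR and MR ratings for one metric.
# The data are invented placeholders; the study's raw ratings are not reproduced here.
from scipy import stats

vr_security = [9.5, 9.0, 9.2, 8.8, 9.4, 9.1, 9.3, 9.0, 9.6, 8.9]   # 0-10 scale
mr_security = [6.5, 7.0, 6.8, 6.2, 7.1, 6.6, 6.9, 6.4, 7.2, 6.3]

t_stat, p_value = stats.ttest_rel(vr_security, mr_security)   # paired t-test
print(f't = {t_stat:.2f}, p = {p_value:.4f}')
```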

The results of the analysis of Eqs. (1) and (2) have shown values of Score_op(VR) = 450 and Score_op(MR) = 245, which indicates that interfaces based on Virtual Reality are better suited than those based on Mixed Reality for tele-operating robotic systems.

On the other hand, and more specifically, the analysis of Eqs. (5) and (6), whose variables are oriented towards prioritizing the operator’s integrity in a case of robot tele-operation in post-disaster environments, has yielded Score_op(VR) = 747.5 and Score_op(MR) = 624.5, respectively. These results show that, when critical variables such as operator safety and confidence in decision-making are prioritized, the operator can perform the task better with the VR-based interface.

4.2.1 Usability evaluation

To evaluate the usability of Mixed Reality (MR) and Virtual Reality (VR) interfaces for teleoperating legged-manipulator robots in the SAR context, methodologies outlined by Bradley and Lang (1994) and Brooke et al. (1996) were employed to measure affective responses and perceived usability by users.

Using a validated scale to measure critical parameters such as latency, physical effort, immersive experience, and decision-making confidence, the study obtained usability scores of 78.5 for VR and 72.3 for MR. These results indicate higher usability for the VR interface, particularly in remote operations, due to its effectiveness and safety, while the MR interface demonstrated good usability in on-site operations due to better visualization and support in decision-making.

Additionally, the Self-Assessment Manikin (SAM) was utilized to evaluate operators’ emotional responses during their interaction with the MR and VR interfaces. SAM, a nonverbal pictorial assessment technique, revealed a high correlation between rating methods in terms of experienced pleasure and felt arousal. This suggests that the VR interface is not only more usable but also more pleasant and less stressful for operators compared to the MR interface.

Statistical analysis revealed significant differences in physical effort and immersive experience metrics (p < 0.05), validating the superiority of VR in critical teleoperation environments. These differences were also reflected in emotional evaluations, where the VR interface scored higher in terms of pleasure and lower in stress.

4.3 Interfaces evaluation

Figure 8 shows a radial diagram of the average values obtained for the different items referring to the evaluation of the interfaces. The most notable items are those related to non-in-situ operation, such as autonomy. On the one hand, the HTC-Vive glasses, being connected directly to mains power, have no usage-time limitation compared to the Hololens2. Likewise, the security provided to the operator when working remotely is more noticeable with this type of interface. Parameters such as latency or interaction with virtual objects are quite similar.

Fig. 8
figure 8

Parameter comparison between VR–MR systems focused on system performance

According to Eqs. (3) and (4), proposed for the interrelation of these variables, the values Score_int(VR) = 645 and Score_int(MR) = 597 have been obtained. The difference between the two types of interfaces is not very wide, since both allow the robotic manipulation task to be carried out efficiently. However, remote scenarios are a turning point to be highlighted and a clear advantage of the VR interface.

The statistical analysis shows significant differences between the two interfaces in terms of equipment weight (p < 0.05) and security provided for the operator (p < 0.01), with MR scoring higher in equipment weight and VR performing better in security provision. However, no statistically significant differences were found for latency, interaction with virtual objects, space required for use, or covered field of vision (p > 0.05).

4.4 Lessons learned

Given the precarious conditions and terrain uncertainty of a post-disaster scenario, tele-operating this complex legged-manipulator robotic platform is usually the best option. It has been shown that immersive interfaces are a viable and sufficiently mature alternative for controlling complex robotic configurations (Martín-Barrio et al. 2020; Martín-Barrio et al. 2020; Ulloa et al. 2023). Based on the experience gained from work carried out with this type of robots and interfaces, several points can be raised that may be of interest for the state of the art and, above all, that should be considered to avoid both human and robot damage.

The first and most noteworthy point is to prevent the robot from colliding. As shown in the bar diagrams of Fig. 9, inexperienced operators tend to generate a high number of collisions regardless of the type of interface, so it is advisable to recreate a complete simulation and mission environment (including a digital twin) to prevent crashes and economic losses. The figure also shows that, despite specific training, the robot may still fall due to variables of its own or of the environment that often cannot be controlled. On the other hand, Fig. 9 shows a blue curve, which corresponds to the mission execution time as a function of operator training.

Fig. 9
figure 9

Evolution of falls and time required to complete the task based on the training performed

Other interesting points that can be listed to achieve adequate functionality of the mission are:

  • Carry out very smooth hand movements during tele-operation, moving the virtual model very slowly, to avoid target jumps that strain the controller and generate overcurrents.

  • Limiting the linear and angular velocities of the robotic assembly is a relatively simple solution to unforeseen sudden movements.

  • The autonomy of this type of quadruped robot is usually limited, even more so when a robotic arm is incorporated as payload, for which it is advisable to work down to 25% of the total battery charge; below this percentage, it is better to change the batteries.

  • Train operators in mission simulation and recreation environments, considering the greatest possible number of parameters or eventualities that may arise in an actual situation.

Table 5 shows a thorough comparative analysis of related works on immersive interface technologies, particularly Mixed Reality and Virtual Reality. These studies examine the effects of diverse display configurations, user interactions, and performance assessments across various applications, including collaborative tasks, surgical simulation, and engineering design reviews; applications outside robotics have been included given the lack of comparative works relating robotics and immersive technologies. By outlining each study’s principal findings and methodological frameworks, together with empirical data, this presentation aims to elucidate the contemporary research landscape in immersive interface technologies. Such insights help discern the current state of the art as well as the prospective applications and implications of these emerging technologies.

Table 5 Comparison of Immersive Interface Studies in MR and VR Technologies

4.4.1 Threats to validity discussion

This work addressed potential threats to validity in the comparative usability analysis of MR and VR interfaces for teleoperating legged-manipulator robots in the SAR context.

Internal validity threats were addressed by selecting participants with equivalent levels of expertise and providing standardized training sessions. This approach ensured consistent baseline competencies. The time required to achieve a 95% task success rate averaged seven sessions for the VR interface and eleven sessions for the MR interface, demonstrating controlled and uniform learning processes across participants.

External validity was enhanced by involving a diverse group of ten participants and utilizing industry-standard hardware (HTC-Vive for VR and Hololens2 for MR). The study’s experiments were conducted in both indoor and outdoor reconstructed environments, closely simulating real-world SAR conditions. Consequently, the findings, including usability scores of 78.5 for VR and 72.3 for MR, are robust and applicable to various SAR contexts.

Construct validity was ensured through validated measurement scales such as the NASA-TLX and the Self-Assessment Manikin (SAM) to accurately gauge usability and operator performance. The SAM results, indicating higher pleasure and lower stress for VR, were corroborated by objective task performance metrics, reinforcing the reliability of the findings.

Conclusion validity was secured through rigorous statistical analyses, identifying significant differences in key metrics such as physical effort and immersive experience (p < 0.05). The sample size was adequate to provide statistical power and robust conclusions. User preference data (VR: mean = 8.5, MR: mean = 4.2, p < 0.001) were strongly supported by comprehensive analysis.

5 Conclusions

This article compares virtual and mixed reality systems for controlling legged-manipulator robots and presents the lessons learned. It addresses experiences and use cases for each system, focusing on search and rescue missions.

Emerging technologies in the field of quadruped robots and virtual–mixed reality systems have shown themselves capable of addressing and providing solutions to complex problems, such as manipulation in unstructured environments, through the hardware–software integration of quadruped robots and immersive interfaces.

The Virtual Reality-based interface has proven to be the best option (effective and providing security) for remote operations such as search and rescue missions. This kind of mission implies a high degree of risk for the operator due to the instability of the terrain and structures, a risk that the VR interface mitigates by keeping the operator at a remote location. One of the VR system’s strong points is its interaction with virtual objects: it has a dedicated system for reading the operator’s gestures and fixed anchors for precisely tracking the operator’s movements.

The interface based on mixed reality has shown itself to be better at helping the user with decision-making and at reducing stress levels, since the operator is in situ next to the robot in the field, which provides a greater field of visibility and increases trust. However, its use is limited to operations within limited distance ranges.

The success rate of the test missions has shown a clear improvement with operator training, reducing both the task execution time and the number of robot falls and collisions.

In future work, it is proposed to carry out tests in areas located at a greater distance from the operator and the remote virtual reality system to evaluate the use of a 5G network.