1 Introduction

Augmented Reality (AR), as defined by Azuma [7], is a computer-based technology that simultaneously combines real and virtual objects which are registered in 3D and are interactive in real-time. This widely accepted three-feature definition is directly related to the three major technical challenges that AR faces. The first one is the meaningful combination of virtual and real objects, which is called the visualization problem. Solving it requires building display devices that are able to draw virtual objects over a view of the real world (or onto the real world itself, as in spatial AR [9]), producing a realistic and seamless combination of virtual and real objects. The second challenge is the registration of virtual objects within a real 3D world, which is called the registration problem. It requires tracking the user's movements so that the application can identify where the objects of the real world are located with respect to the user. The last challenge is achieving an interactive application. This is closely related to the interaction problem, which implies that the user should - ideally - be able to perform a seamless interaction with both real and virtual objects. Interfaces are said to have seams if there is a functional, temporal, spatial or cognitive element that forces the user to change the way in which he/she interacts with the interface [23].

None of these problems is easy to solve. The visualization problem, for instance, also includes other sub-problems, such as the occlusion problem [19, 59]. There are several technical approaches to this integration of virtual and real objects, ranging from camera-based solutions (as in mobile AR [13]) and light projection (as in spatial AR [29]) to optical solutions based on half-silvered mirrors [46] or on see-through Head-Mounted Displays (HMDs) [18]. It is also possible to use a non-see-through HMD to create an AR application [65].

The registration and tracking problem has been a major focus of AR research over the last twenty years. Although outdoor tracking is still an open issue [31], there are currently good solutions for indoor tracking that are quite reliable [32].

With respect to human interaction in AR systems, many interaction modalities can be considered natural, such as voice recognition, eye/gaze tracking, body movement, facial expression and, of course, hand and gesture tracking. In this area of multimodal interaction, there are still many unsolved problems, such as tactile feedback, device calibration or the decoupling between real and virtual actions. For these reasons, the goal of achieving a seamless natural interaction between humans and virtual objects has not been attained yet. However, interaction systems based on hand and gesture tracking are not uncommon in AR environments.

Natural hand interaction systems can be classified into metaphoric and isomorphic systems [35]. A metaphoric hand interaction system is defined as one that “bases input actions on image schemas - mental models formed from repeated patterns in everyday experiences - and system effects on related conceptual metaphors”. In this regard, since the HoloLens only detects a limited set of gestures representing common actions, it can be considered a metaphoric system. In contrast, isomorphic interaction systems can be defined as those that perform “one-to-one literal spatial relations between input actions and resulting system effects”. The Meta 2 and the Magic Leap One could fall into this category.

Understanding the benefits of using either of these two types of interaction techniques is key to developing successful AR applications. For this reason, in this research we compare these two natural hand interaction paradigms. The analysis is performed using two AR devices - from two different manufacturers -, a Microsoft HoloLens (released in 2016) and a Magic Leap One (released in 2018), that offer similar solutions to the visualization problem (by means of an optical see-through HMD) and the registration problem (using Simultaneous Localization and Mapping (SLAM) algorithms). These two devices are representative of the two aforementioned interaction techniques: the former uses a metaphoric hand-gesture tracking system, whereas the latter relies on an isomorphic approach. In addition, although this is not the goal of the experiment, quantifying the improvement in natural hand interaction that the newer AR device (the Magic Leap One) achieves with respect to the older one (the Microsoft HoloLens) could serve as an indirect indicator of the maturity of this technology, since gesture tracking is one of the features in which the two systems differ most.

The rest of the paper is organized as follows. Section 2 reviews related works using the HoloLens and the Magic Leap, as well as studies on AR interaction, with special focus on hand interaction. Section 3 describes the materials and methods utilized to develop the tools employed in the experiments. Section 4 details the experimental study. In section 5, the results of these experiments are presented and discussed. Finally, section 6 presents the conclusions of the paper and outlines future work.

2 Related work

Several applications and research works using the HoloLens have been published. A sizeable amount of this research is concentrated in the medical field, in which many applications have recently been proposed using this optical AR device [1, 2, 14, 20, 26, 30, 36, 47, 50, 67]. The fact that this device allows physicians to experience virtual content without having their hands busy holding hardware makes it very convenient for medical applications, and explains why it is so widely studied in this field. There exist, however, many other applications of this optical AR system [5, 6, 22, 41, 55, 64, 66].

In addition, some research works have analyzed the performance of this optical AR device. For instance, in [33] a comprehensive technical evaluation of the HoloLens’s performance is reported. The authors of this work analyze head tracking, environment reconstruction, spatial mapping and speech recognition. They find that head tracking (using an OptiTrack Flex V100 [42] as ground truth) performs better at low head-movement speeds, that the 3D reconstruction of the environment is most precise for flat surfaces under bright conditions, and that speech commands have correctness rates of 74.47% and 66.87% for user-defined and system-defined commands, respectively. Spatial mapping and tracking accuracy are also evaluated in [16] for an AR assembly application. The authors find that the HoloLens system was not accurate enough to create a mesh of some intricate parts and, thus, a Vuforia marker-based tracking plug-in was necessary. Therefore, they conclude that the device is not yet ready for deployment in a factory assembly setting where high precision is necessary. A different analysis is performed in [11]. This work focuses on gesture tracking, and on bimanual gestures in particular. The authors implement and evaluate five techniques for rotation and scale manipulation gestures on the Microsoft device, conducting a study with 48 users. They find that hand-tracking losses occur frequently in two-handed gestures due to the limited field of view (FOV) of the device. In fact, the manufacturer discourages the development of two-hand gestures [39]. Nevertheless, this work also reports that certain bimanual techniques “perform comparatively to the one-handed baseline technique (the one-handed wireframe cube technique currently in use on standard HoloLens applications shipped with the device) in terms of accuracy and time”.

Similar problems are found in a recent paper comparing a Microsoft HoloLens and a Virtual Reality (VR) Table [54]. This study analyzes the responses of 82 participants using virtual cadavers for anatomy training in both the AR setup (with the HoloLens) and the VR setup. Some of these participants commented on the limited FOV (around 30°) of the HoloLens. Others indicated that they expected higher accuracy both in head tracking (especially when fast head movements are performed) and in hand gesture recognition. These and other problems caused a preference for the setup based on the VR Table. The results of this previous study encouraged us to perform the work presented here, since the HoloLens hand interaction system seemed to be one of the key points that favored the VR-based setup in [54].

As can be seen, there is a substantial amount of research involving or using the HoloLens. In contrast, there are very few academic works using or dealing with the Magic Leap One, since it is a newer device. However, an interesting application can be found in [51], where the Magic Leap is combined with an Apple Watch and a touchless ultrasonic haptic device in order to allow the user to experience a holographic view of his/her own heart beating. The touchless haptic device combined with the Magic Leap allows an untethered natural hand interaction with some tactile feedback. As in the case of the HoloLens, the medical field seems to be especially attractive, such as in [58], where a remote medical monitoring system using the Magic Leap is presented, or in [21], where an AR system based on the Magic Leap is implemented for people with Autism Spectrum Disorder (ASD). Other works using or dealing with this device can be found in [8, 15, 56, 57]. However, to the best of our knowledge, no academic work has yet reported a detailed scientific analysis of the technical features of this device, nor a comparison between a Magic Leap One AR application and a HoloLens-based one. For this reason, we believe our contribution is meaningful.

Regarding interaction in AR/VR systems, it is generally accepted that natural interaction techniques increase the level of fidelity of AR/VR applications. However, there can be cases where non-natural interaction systems are preferred over semi-natural or natural interfaces [37, 48]. Furthermore, it has been hypothesized that interaction fidelity may follow a U-shaped curve [38], where mid-fidelity techniques perform worst and systems with low or high fidelity are preferred by users, because, unlike semi-natural interfaces, both natural and non-natural interfaces feel familiar. This uncanny valley [40] also appears in the perception of the aesthetics of robots, where it has been widely studied.

With respect to natural hand interaction in AR/VR systems, many solutions have been proposed. Two main technologies can be identified in the implementation of hand gesture recognition [61]: those based on wearables such as data gloves [34, 44] and those relying on sensors such as video cameras, depth cameras or infrared sensors [27, 53, 60]. A third category combines both methods [24, 62]. Glove-based interaction systems were first developed in the 1980s, but today there is a preference towards the use of sensors and cameras for hand tracking, due to the freedom of movement they allow. Wearables, on the other hand, can provide higher accuracy and reliability. In addition, they can provide haptic cues [10], which is an advantage over vision-based systems.

With respect to comparisons of natural interaction for AR systems, some authors have compared different interaction systems. For instance, [45] compares a free-hand gesture-based interaction technique with a multimodal gesture-speech interaction system. The results show that both systems have strengths and weaknesses. Similarly, [28] compares a multimodal interaction system combining free-hand gesture and speech input with speech-only and 3D-hand-gesture-only interaction conditions (across a series of object manipulation tasks). The results show that the multimodal interface (MMI) is more usable than the gesture-only version. Nevertheless, the MMI was neither more effective nor more efficient than the speech-only system. Another experiment in the same line can be found in [12]. The results show that speech outperforms gesture in terms of accuracy, but the simplicity of gestures compensates for this lower accuracy with speed. Therefore, the authors conclude that both gesture and speech are effective interaction modalities for performing simple tasks in AR applications. A somewhat similar experiment is performed in [52] using a HoloLens. The authors conclude that users were faster using the hands-free approach (voice control) than using the manual (gestural) interaction.

Other authors focus their research on the hand gesture interaction system itself. For instance, in [43] two gestural interaction mechanisms are evaluated and compared. However, the experiment is not performed for AR applications and uses a data glove. In a more recent work, a Leap Motion Controller and a smartphone are used to implement an AR application to interact with 3D models from a museum collection [27]. The results show that the system is well accepted by the museum visitors. Another interesting experiment is shown in [4], where the authors analyze freehand grasping in exocentric Mixed Reality using a Microsoft Kinect. Other authors focus on the learning effects in the use of gesture interaction in AR, such as in [49]; this study shows that users learn how to use hand-based gesture control in a short time.

Finally, a few works focus specifically on comparing two or more hand interaction devices/mechanisms for virtual environments. Three different hand gesture recognition devices - Leap Motion, Microsoft’s Kinect, and Intel’s RealSense - are analyzed in [25]. The authors conclude that the Leap Motion is the best device in the context of game design. In [3], an experiment is designed to compare two gesture-based approaches (with high and low levels of naturalness) in AR, using a Leap Motion Controller, for three basic actions: translation, rotation and scaling. The authors conclude that participants struggled with the highly natural approaches. Therefore, although these approaches were more enjoyable, participants would choose the less natural approaches for future interactions.

A much more rigorous approach to this matter is shown in [17]. This paper is especially interesting because it shares some similarities with our approach. It presents an experimental comparative evaluation between a metaphoric hand interaction system (represented by a Microsoft HoloLens) and an isomorphic hand interaction system (represented by a Meta 2 device) for three different types of tasks: moving, scaling and rotating a virtual cube. The results show that both interaction systems present strengths and weaknesses. The isomorphic paradigm is perceived as more natural and usable (with a higher usability score and a lower task-load index) and is more accurate for the displacement task, whereas the resize task is more accurate under the metaphoric paradigm.

There are several differences between the work shown in [17] and our approach. First, an obvious difference is that we compare different devices. Second, the tasks that we compare in this work are more complex. Finally, the methodology is different, since [17] uses objective measurements and two generic subjective evaluation systems - the System Usability Scale (SUS) and the NASA Task Load Index (NASA-TLX) -, whereas we analyze many more objective indicators and use subjective questionnaires designed specifically for AR/VR applications.

3 Materials and methods

As previously mentioned, two different hardware setups are used in this research in order to assess the differences regarding selection and displacement of objects in AR: a HoloLens-based system and a Magic Leap-based system. Both devices use the same optical AR paradigm, use similar head-tracking procedures and run the same software application in these experiments. These two setups represent two of the most common see-through HMDs used in AR applications.

3.1 HoloLens-based system

The first hardware station was set up by means of a table and a Microsoft HoloLens (version 1) device. This device consists of a pair of Mixed Reality smart glasses developed and manufactured by Microsoft. It is a see-through HMD with a built-in computer running Windows 10, a FOV of approximately 30 × 17 degrees, a display with 1268 × 720 pixels per eye, a two-hand gesture-recognition system with limited hand tracking, and a time-of-flight depth sensor. Figure 1 shows a user testing the HoloLens-based station.

Fig. 1 A person using the HoloLens-based application at UALR

3.2 Magic Leap-based system

The second hardware station was set up using a table and a Magic Leap One device (see Fig. 2). The Magic Leap One is a three-piece system that includes a headset called the Lightwear, a small wearable computer called the Lightpack, and a handheld controller. The headset is studded with tracking cameras for mapping the environment, as well as inward-facing eye-tracking cameras. The darkened lenses are inset with small glass waveguides, which the manufacturer calls photonics chips. The sensors are located directly on the headset.

Fig. 2 Schema of the Magic Leap-based setup. The schema of the HoloLens-based setup is similar: the AR glasses change, but the rest of the elements (table, software, tasks, etc.) remain the same

The device displays are based on virtual retinal display (VRD) technology, which draws a raster image directly onto the retina of the eye. Pupil-tracking technology is also incorporated in the headset. Thus, it is possible to know exactly where a user is looking, although this feature is not used in our experiments.

This device provides a FOV of 40 × 30 degrees, a display with 1280 × 960 pixels per eye, a controller with 6 degrees of freedom (not used in the experiments), a two-hand gesture-recognition system including finger tracking (with 3 joints per finger), and a time-of-flight depth sensor. It sells at $2300 and is a serious competitor of the Microsoft HoloLens, as it provides improved graphics and improved hand-movement tracking at a lower price. Figure 3 shows a user testing the Magic Leap-based station.

Fig. 3 A person using the Magic Leap-based application at UALR

3.3 Software

The software application used in the experiments was implemented using Unity3D 2019.1.0f2. This tool was selected because it makes it easy to deploy the same software on different devices. C# was chosen to program the scripts in Unity, using Visual Studio 2010 to code and debug them. 3DS Max and Adobe Photoshop were used to create the 3D models of the objects in the experiment. The software runs on the computers integrated in both the HoloLens and the Magic Leap, which, unlike other AR platforms, come with dedicated computers to host and run all the software.

The AR application used in the experiment is composed of several modules (see Fig. 4):

  • The Manipulation Controller manages the location and displacement of the objects in the scene. For example, when a user moves his/her hand to grab an object, an action is triggered and the object is moved by this component according to the corresponding movement of the hand.

  • The Tracking Controller is responsible for identifying each action that the user performs during the session, so each time the user moves or drops an object, the action is recorded with a specific codification.

  • The AR Camera Controller manages the cameras of the application. A camera in the application represents the user’s point of view. Therefore, it is important to be able to control the rotation and translation of the camera depending on the device that is being used. Since we define two different instances for the two setups (HoloLens and Magic Leap), the AR Camera Controller is a dual component.

  • The Input Controller is also a dual component. Its two instances handle the input from the two interaction systems used by the two setups: the first instance handles the data collected by the HoloLens’ sensors, and the second one handles the inputs of the Magic Leap device. Both instances translate these inputs into the events of the application. Gestures and motion are the input elements utilized in this setup.

  • The Feedback Controller is responsible for the activation of visual and sound feedback according to the user’s interaction with the application. For example, two different sounds are used to identify when an object has been grabbed or dropped, respectively.

  • The Task Manager is responsible for initiating and controlling the execution of the tasks defined in the application.

  • The Data Manager is responsible for storing the data generated by each user while completing the assigned tasks. We record several objective datasets that represent the actions performed by the user. These datasets are saved in a CSV file for later analysis. The list of datasets is detailed in the next section.

  • Finally, the Application Manager is responsible for coordinating all the other components. In other words, it communicates the status of the application to the other components and triggers the events or elements that must be raised to respond to the actions defined in the system.

Fig. 4 Scheme of the AR application designed for the experiments
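To make this architecture more concrete, the following C# sketch illustrates how the event-driven coordination between the Application Manager and, for instance, the Feedback Controller could be wired in Unity. All class, event and field names are our own illustration and do not correspond to the actual source code of the application.

```csharp
using System;
using UnityEngine;

// Hypothetical sketch of the module wiring described above; all names are illustrative.
public enum AppEvent { ObjectGrabbed, ObjectDropped, TaskStarted, TaskCompleted }

public class ApplicationManager : MonoBehaviour
{
    // Central event channel: the other components subscribe and react to status changes.
    public static event Action<AppEvent, GameObject> OnAppEvent;

    public static void Raise(AppEvent e, GameObject target)
    {
        OnAppEvent?.Invoke(e, target);
    }
}

[RequireComponent(typeof(AudioSource))]
public class FeedbackController : MonoBehaviour
{
    public AudioClip grabSound;   // played when an object is grabbed
    public AudioClip dropSound;   // played when an object is dropped
    private AudioSource source;

    void Awake()     { source = GetComponent<AudioSource>(); }
    void OnEnable()  { ApplicationManager.OnAppEvent += React; }
    void OnDisable() { ApplicationManager.OnAppEvent -= React; }

    // Two different sounds identify the grab and drop actions, as described above.
    void React(AppEvent e, GameObject target)
    {
        if (e == AppEvent.ObjectGrabbed)      source.PlayOneShot(grabSound);
        else if (e == AppEvent.ObjectDropped) source.PlayOneShot(dropSound);
    }
}
```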

4 Experimental study

As previously explained, the goal of the experiment is to analyze the use of natural hand interaction in AR systems by comparing the same application under two hardware setups that represent two different hand-tracking paradigms: a HoloLens-based system (representing a metaphoric interaction) and a Magic Leap-based system (representing an isomorphic interaction). The study tests whether there are statistically significant differences in the use of hand interaction for two different tasks. Our hypothesis is that there would be differences, in terms of both objective and subjective measures regarding the use of hand interaction between these two AR setups, in favor of the isomorphic setup. Thus, several research questions are formulated and tested in this research in order to assess this initial hypothesis. These seven questions are stated next:

  • Research Question 1 (RQ1): Which system will allow a more accurate completion of the tasks?

  • Research Question 2 (RQ2): With what system will users complete the tasks with fewer mistakes?

  • Research Question 3 (RQ3): Which system will allow a faster completion of the tasks?

  • Research Question 4 (RQ4): Will the size of the virtual elements affect the performance of the users?

  • Research Question 5 (RQ5): Which system will users consider more useful for these tasks?

  • Research Question 6 (RQ6): Which system will users prefer?

  • Research Question 7 (RQ7): Which system will users recommend?

4.1 Participants and procedure

In order to answer these questions and analyze the differences between the two setups regarding hand interaction, we designed a set of task-based experiments. We ran our experimental study in the Emerging Analytics Center (EAC) of the University of Arkansas at Little Rock (UALR). Since the goal of this research is to study the use of natural hand interaction in AR systems, we decided to recruit participants who did not have any previous experience using AR technologies with natural interaction, in order to avoid skilled users, who could benefit from their earlier experience. The experiment was announced in every department of UALR. As a result, 45 people volunteered and registered to perform the experiment by signing up online. Of the 45 participants, 18 were women (40%) and 27 men (60%), with ages ranging from 18 to 61 (mean 34.67 ± 12.40).

In order to avoid biases in favor of either system, users were randomly assigned to two groups of roughly equal size: group A (24 participants) and group B (21 participants). Users in group A tested the HoloLens-based system first and then the Magic Leap, whereas users in group B tested the Magic Leap-based system first and then the HoloLens. The age, gender and group distribution of the participants are shown in Table 1.

Table 1 Age, gender and group distribution of the participants of the experiment

The experiment was divided into two similar phases, one for each hardware setup. Users first tested one system and filled in a questionnaire about it, then tested the other system and filled in a similar questionnaire about it, and finally filled in a comparative questionnaire. Figure 5 summarizes the experimental protocol.

Fig. 5 Graphical description of the experimental protocol

When a person arrived at EAC to perform the experiment, we carried out the following 8-step protocol, which includes the aforementioned two-phase design.

  • 1 - Presentation and description. Before proceeding with the experiment, users were provided with a description of the tasks they had to complete and the maximum time available to complete the experiment (40 min in total, more than enough to complete all the required tasks). Then, users were required to sign a compulsory informed consent form, in which they declared that they agreed with the terms of the experiment, and to fill in a short questionnaire providing some basic demographic information (gender, age and profession). They were informed that the application records performance data and that the experiment was completely anonymous.

  • 2 - Instruction and practice. Before the start of the experiment, users received a short briefing on how to use the HoloLens or the Magic Leap (depending on which device they would test first). In both cases, a 5-minute free practice session was carried out covering three main actions: grab, drag and drop.

  • 3 - Experiment. The experiment consisted of two different tasks, each one repeated three times (with three different conditions that will be explained in section 4.2), that the participants needed to complete in the corresponding hardware setup. Each of the three trials of each of the two tasks had to be completed in no more than 120 s. Therefore, the maximum testing time was 12 min (two tasks, three times, 120 s each) per setup. During the experiment, user events were monitored in such a way that every meaningful action was measured and recorded. The tasks and the list of the datasets gathered by the application are explained in section 4.2.

  • 4 - Setup evaluation. After users finished the tasks in the first setup, they were prompted to complete Questionnaire 1. Table 2 lists the questions asked in this subjective questionnaire. These were presented as 7-point Likert questions with 1 meaning strongly disagree, 2 disagree, 3 somewhat disagree, 4 neutral, 5 somewhat agree, 6 agree and 7 strongly agree, except for the last three questions, where 1 means poor, 2 bad, 3 somewhat bad, 4 neutral, 5 positive, 6 good and 7 excellent. Instead of analyzing the results of each question individually, the questions were grouped into six factors, as in [54]: sensory factors (SF), control factors (CF), distraction factors (DF), ergonomic factors (EF), realism factors (RF) and other factors (OF). These factors are adapted from the work described in [63]. There were also three additional questions about depth perception, usefulness and a global score. Therefore, nine datasets with subjective data, shown in Table 3, were created from the answers to this questionnaire.

  • Steps 5–7. Once a participant finished the experiment with the first hardware setup, all the previous steps of the process were repeated (except for the presentation step) using the other hardware setup.

  • 8 - Final comparative evaluation. When the tasks in the second hardware station had also been completed and users had filled in Questionnaire 1 about the second setup, they were asked to fill in Questionnaire 2, shown in Table 4, about user preference and recommendation regarding these two setups. This two-choice questionnaire also included open-ended “Additional comments and explanations” questions for the users to leave their impressions about the two setups or comments about the answers given to the three two-choice questions of this questionnaire.

Table 2 Questionnaire 1
Table 3 Datasets generated in the experiment from the answers to Questionnaire 1
Table 4 Questionnaire 2

4.2 Tasks and objective datasets

As previously mentioned, two tasks were used to compare the two setups. Following the conclusions drawn in [11] and trying to avoid hand-tracking losses, we opted for one-handed tasks.

The first task is a pick-and-place task with three different levels of complexity. The main objective of this task is to evaluate how accurately users can grab, move and place objects using natural interaction (see Figs. 6 and 7). At the beginning of this task, each user was provided with eight virtual cubes placed on a real table, and was expected to stack at least five of them vertically to complete the task. The cubes need to be stacked on top of a blue-colored cube (see Figs. 6 and 7) that the system places automatically on the table at the beginning of the task. The cubes are physically simulated. Therefore, they collide (both with the real table and with other cubes) and fall if they are released too early or placed misaligned. For this reason, the participants had to be very careful when placing the cubes on the stack, because the cubes could fall down. Participants also had to avoid hitting the cubes that were already stacked when placing a new one on top of them. Participants were instructed to stack the cubes as perfectly aligned to the center of the stack - represented by the system-introduced blue cube - as possible. The level of complexity of this task is defined by the size of the cubes. For this reason, users need to complete the exercise three times with different cube sizes: small (S) cubes of 7 cm, medium (M) cubes of 11 cm, and large (L) cubes of 15 cm. Therefore, we perform three different experiments with this task.

Fig. 6 Object picking (Task 1) from the user’s point of view: metaphoric approach

Fig. 7 Object picking (Task 1) from the user’s point of view: isomorphic approach
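As an illustration of how stacking accuracy can be quantified in this task, the following sketch computes the mean horizontal misalignment of the stacked cubes with respect to the system-placed blue cube. The class name and the exact metric are hypothetical; the accuracy measures actually recorded are those listed in Table 5.

```csharp
using System.Linq;
using UnityEngine;

// Hypothetical sketch of a stacking-accuracy metric for Task 1: the mean horizontal
// offset of the stacked cubes with respect to the blue anchor cube placed by the system.
public class StackAccuracyMeter : MonoBehaviour
{
    public Transform anchorCube;  // the blue cube that marks the center of the stack
    public Transform[] cubes;     // the eight cubes the user can manipulate

    public float MeanMisalignment()
    {
        // Consider only cubes resting above the anchor, i.e., those already stacked.
        var stacked = cubes.Where(c => c.position.y > anchorCube.position.y);
        if (!stacked.Any()) return 0f;

        return stacked.Average(c =>
        {
            Vector3 offset = c.position - anchorCube.position;
            offset.y = 0f;            // ignore the vertical component
            return offset.magnitude;  // horizontal distance to the stack axis
        });
    }
}
```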

The second task is a precision-movement drag-and-drop task, also with three levels of complexity, where users are required to grab a virtual sphere placed on a real table and move it along a static virtual pink curved path located just above the table (see Fig. 8). Participants were instructed to move the ball along the path - starting from the right end and keeping it in contact with the path - and then drop it when the left end of the path was reached. If the ball goes off the path, it is automatically detached from the participant’s hand by the application and falls down. Users can also drop the ball (intentionally or unintentionally) by releasing it. In all these undesired cases, participants can pick the ball up from the table and try again to complete the task, although the elapsed time is not reset. The purpose of this task is to evaluate how accurately the sphere can be displaced - using natural interaction - along a curved path defined in 3D space, which also allows assessing depth perception. The level of complexity of this task is defined by the size of the virtual sphere. For this reason, each user repeats this task three times with different ball sizes: S (5 cm in diameter), M (8 cm) and L (11 cm). The width of the curved path is fixed at 3 cm.

Fig. 8 Precision movement task (Task 2), from the user’s point of view
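The automatic detachment rule of this task can be implemented, for instance, with a simple trigger-based contact test. The following sketch is a hypothetical illustration (the path is assumed to be composed of trigger colliders tagged "Path"), not the actual project code.

```csharp
using UnityEngine;

// Hypothetical sketch of the detachment rule for Task 2: if the grabbed ball stops
// touching the curved path, the joint holding it to the hand is destroyed and it falls.
// Assumption: the path is composed of trigger colliders tagged "Path".
[RequireComponent(typeof(Rigidbody))]
public class PathContactMonitor : MonoBehaviour
{
    private int pathContacts; // number of path segments the ball currently overlaps

    void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("Path")) pathContacts++;
    }

    void OnTriggerExit(Collider other)
    {
        if (!other.CompareTag("Path")) return;
        if (--pathContacts > 0) return;

        // The ball has gone off the path: detach it from the hand and let it fall.
        var joint = GetComponent<FixedJoint>();
        if (joint != null) Destroy(joint);
    }
}
```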

The advantage of these two tasks is that they are replicable, scalable, fast to test and can be analyzed with objective measures. They can also be considered generic tasks, yet sufficiently representative so that their results can be extrapolated to many specific AR-based contexts.

Since the HoloLens follows a metaphoric interaction paradigm, a pinch gesture is used to grab objects in the HoloLens-based setup. Thus, when the user’s hand makes a pinch gesture close to a virtual object, this object is attached to the hand with a virtual fixed joint. The object is released once the pinch gesture ends. It is important to point out that the pinch gesture is signaled by the HoloLens Application Programming Interface (API), and our software does not have access whatsoever to the position of the fingertips in this setup. In the Magic Leap-based setup, however, the device API does provide access to the location of the fingertips. We use this information to grab objects using a more realistic isomorphic approach. In this setup, objects are grabbed by detecting the collision of two of these fingertips (thumb and index) with the geometry of the virtual objects. Once this collision occurs for both fingers, a virtual fixed joint is created between these two fingertips and the object, which is considered grabbed until the fingertips move away from the virtual object’s surface. The middle finger is also tracked in order to infer the orientation of the hand, but is not used for picking virtual objects. With this technique, the physics simulation is more stable. This kind of simulation becomes very unstable when a rigid virtual object is compressed by two or more virtual fingers (representing real fingers), since the absence of tactile or pressure feedback does not allow users to reduce the applied force once the object is firmly grasped. Therefore, we decided not to use a physics-based picking mechanism. Figure 6 (metaphoric, HoloLens) and Fig. 7 (isomorphic, Magic Leap) show these two different approaches from the user’s point of view.

As can be seen, the HoloLens-based system uses a simpler and less natural hand interaction system, but both systems are somewhat semi-natural, since a real action with real objects would be performed in a similar, but nevertheless different, manner.
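A simplified reconstruction of the isomorphic grabbing logic described above could look as follows. This is our own sketch based on the description, not the actual project code: when both the thumb and index fingertip colliders overlap the object, a FixedJoint binds the object to the hand until one fingertip moves away.

```csharp
using UnityEngine;

// Our own reconstruction (not the actual project code) of the isomorphic grab: the
// object is grabbed while both the thumb and index fingertip colliders, driven by
// the Magic Leap hand tracking, touch its surface.
[RequireComponent(typeof(Rigidbody))]
public class IsomorphicGrabbable : MonoBehaviour
{
    public Collider thumbTip;   // fingertip colliders placed at the tracked finger joints
    public Collider indexTip;

    private bool thumbTouching, indexTouching;
    private FixedJoint joint;

    void OnTriggerEnter(Collider c) { UpdateContact(c, true); }
    void OnTriggerExit(Collider c)  { UpdateContact(c, false); }

    void UpdateContact(Collider c, bool touching)
    {
        if (c == thumbTip)      thumbTouching = touching;
        else if (c == indexTip) indexTouching = touching;

        if (thumbTouching && indexTouching && joint == null)
        {
            // Both fingertips touch the object: attach it to the hand with a fixed joint.
            joint = gameObject.AddComponent<FixedJoint>();
            joint.connectedBody = thumbTip.attachedRigidbody;
        }
        else if (!(thumbTouching && indexTouching) && joint != null)
        {
            // A fingertip moved away from the surface: release the object.
            Destroy(joint);
            joint = null;
        }
    }
}
```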

A large number of datasets are generated and stored automatically by the application while the users complete these tasks, so that it is possible to analyze their performance in each experiment. Table 5 lists the eleven datasets recorded automatically by the application for Task 1, and Table 6 shows the seven datasets recorded for Task 2. It is important to highlight that each dataset (of length 45) includes three subsets corresponding to the three sizes of each task.

Table 5 Objective datasets generated in the experiment for Task 1
Table 6 Objective datasets generated in the experiment for Task 2
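As a rough sketch of how the Data Manager could serialize these records to a CSV file (the field names below are hypothetical; the actual fields correspond to the datasets in Tables 5 and 6):

```csharp
using System.Globalization;
using System.IO;
using UnityEngine;

// Rough sketch of the CSV logging performed by the Data Manager; field names are
// hypothetical (the actual fields correspond to the datasets in Tables 5 and 6).
public class DataManager : MonoBehaviour
{
    private StreamWriter writer;

    void Start()
    {
        string path = Path.Combine(Application.persistentDataPath, "session.csv");
        writer = new StreamWriter(path);
        writer.WriteLine("timestamp,task,size,event,x,y,z");
    }

    // Called by the Tracking Controller every time a meaningful user action occurs.
    public void Record(string task, string size, string evt, Vector3 position)
    {
        writer.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "{0:F3},{1},{2},{3},{4:F4},{5:F4},{6:F4}",
            Time.time, task, size, evt, position.x, position.y, position.z));
    }

    void OnDestroy()
    {
        writer?.Dispose(); // flush and close the file when the session ends
    }
}
```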

5 Results and discussion

This section presents the results of the analysis performed using the data obtained in the different experiments. The first four research questions can be answered with the objective data gathered during the experiments. The remaining research questions are analyzed with subjective data and with the two-choice comparative evaluation.

5.1 Statistical analysis of objective data

In order to compare the two systems and test whether there are statistically significant objective differences between them, we performed a statistical analysis of the data collected in the experiments, using IBM SPSS 26. All the statistical tests were two-tailed and were conducted at the 0.05 significance level.
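For reference, the unpaired comparisons reported below use the standard two-sample t statistic and Cohen's d effect size, shown here in their pooled-variance form (SPSS also provides a Welch-corrected variant when equal variances cannot be assumed):

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad d = \frac{\bar{x}_1 - \bar{x}_2}{s_p} \]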

First, we analyze the datasets measuring objective information about the users’ performance when completing the two tasks included in the experiments. Table 7 shows the analysis of objective data (averaged over all the participants), for Task 1, for those participants who tested each of the systems (HoloLens or Magic Leap) first. These participants are therefore not influenced by previous experience with the other system. Table 8 shows the same analysis for Task 2. No statistically significant differences are found for Task 1, whereas only one of the seven datasets (2.4, total time to complete the task) reveals a significant difference (in favor of the Magic Leap) for Task 2.

Table 7 Study of statistically significant differences in objective data between the two setups, for Task 1 (averaged over the three sizes). Means, standard deviations (SD), unpaired t-test (t and p) and Cohen’s d effect size (d)
Table 8 Study of statistically significant differences in objective data between the two setups, for Task 2 (averaged over the three sizes). Means, standard deviations, unpaired t-test (t and p) and Cohen’s d effect size (d)

In order to dig deeper into this question, we also compare the results by group. Table 9 (for group A) and Table 10 (for group B) show the results, for Task 1, of comparing the objective data within the two groups. Interestingly enough, four datasets show statistically significant differences in one direction (favoring the Magic Leap) for group A, whereas two datasets show differences for group B but in the exact opposite direction.

Table 9 Study of statistically significant differences in objective data between the two setups, for group A and Task 1 (averaged over the three sizes). Means, standard deviations, paired t-test (t and p) and Cohen’s d effect size (d)
Table 10 Study of statistically significant differences in objective data between the two setups, for group B and Task 1 (averaged over the three sizes). Means, standard deviations, paired t-test (t and p) and Cohen’s d effect size (d)

Similarly, in the case of Task 2, shown in Table 11 (group A) and Table 12 (group B), four datasets show statistically significant differences in favor of the Magic Leap-based setup (group A) and three in favor of the HoloLens-based setup (group B). Although these paired comparisons are less relevant than those shown in Tables 7 and 8, since a certain learning effect can occur, the results show that the setup tested second gets better values. This is an indication that there are no important differences between the two setups.

Table 11 Study of statistically significant differences in objective data between the two setups, for group A and Task 2 (averaged over the three sizes). Means, standard deviations, paired t-test (t and p) and Cohen’s d effect size (d)
Table 12 Study of statistically significant differences in objective data between the two setups, for group B and Task 2 (averaged over the three sizes). Means, standard deviations, paired t-test (t and p) and Cohen’s d effect size (d)

With respect to research questions RQ1, RQ2 and RQ3, we need to analyze specific datasets for each task. Regarding RQ1, neither of the systems is more accurate than the other for either of the tasks, since none of the datasets that measure accuracy (1.10, 1.11, 2.6, 2.7) show statistically significant differences in the unpaired comparison (Tables 7 and 8). A similar situation appears with respect to RQ2, since none of the datasets related to mistakes (1.4, 2.2, 2.3) show statistically significant differences between the two systems. The situation is different for RQ3, since dataset 2.4 shows that Task 2 is completed faster with the Magic Leap. However, Task 1 is not completed faster in the Magic Leap-based setup.

Regarding RQ4, the previous statistical analyses are not appropriate for answering this question. Since the question concerns the effect of size on the performance of the users, a one-way analysis of variance (ANOVA) across the three sizes of virtual objects (S, M, L) is performed on the total-time datasets (1.6 and 2.4) for those participants who tested each of the systems (HoloLens or Magic Leap) first. The results of the ANOVA test are presented in Table 13.

Table 13 One-way ANOVA for unpaired data. Sizes S, M, L are compared for the total time datasets (1.6 for Task 1, 2.4 for Task 2)
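For reference, the one-way ANOVA F statistic compares the between-group and within-group variability of the completion times across the k = 3 size conditions, where N denotes the total number of observations and n_g the observations in size condition g:

\[ F = \frac{MS_{\mathrm{between}}}{MS_{\mathrm{within}}} = \frac{\sum_{g=1}^{k} n_g (\bar{x}_g - \bar{x})^2 / (k - 1)}{\sum_{g=1}^{k} \sum_{i=1}^{n_g} (x_{gi} - \bar{x}_g)^2 / (N - k)} \]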

According to these results, no statistically significant difference in the time necessary to complete Task 2 can be attributed to the size of the balls. The results for Task 1 are different. For the HoloLens-based setup, no differences can be found. However, in the case of the Magic Leap, there are statistically significant differences (although the p value is just 0.049) for Task 1 that can be explained by the size of the virtual cubes. Therefore, the answer to RQ4 is that the size of the virtual elements does not play a central role in the performance of the users. There are differences, but they are limited to only one of the tasks and just one of the setups.

5.2 Statistical analysis of user responses

Next, we analyze the datasets of the responses to Questionnaire 1, which report the perceptions of the participants of the study, grouped by factors, in Table 14 (unpaired t-test comparing users who tested each system first), Table 15 (paired t-test for participants in group A) and Table 16 (paired t-test for group B).

Table 14 Study of statistically significant differences in user responses between the two setups. Means, standard deviations, unpaired t-test (t and p) and Cohen’s d effect size (d)
Table 15 Study of statistically significant differences in user responses between the two setups, for group A. Means, standard deviations, paired t-test (t and p) and Cohen’s d effect size (d)
Table 16 Study of statistically significant differences in user responses between the two setups, for group B. Means, standard deviations, paired t-test (t and p) and Cohen’s d effect size (d)

As can be seen, no statistically significant differences can be identified when comparing users testing each system first (Table 14). In addition, only two measures (RF, US) present statistically significant differences in the comparison within group A, and just one factor (OF) in the case of group B. Unlike the results obtained for the paired t-tests on objective data, in both cases (Tables 15 and 16) the differences reveal higher factor values in favor of the Magic Leap-based setup, which suggests a slight subjective preference for this setup. However, the control factors (CF) and ergonomic factors (EF), which are the ones related to interaction, do not show any significant differences.

5.3 Statistical analysis of two-choice questions

In this section, we analyze the responses obtained from Questionnaire 2. First, we analyze the two-choice questions (Q1, Q3, Q5), in which users are prompted to decide between the two setups regarding usefulness, preference and recommendation. As depicted in Table 17, the setup using the Magic Leap is perceived as more useful (60% vs 40%), preferred (62.22% vs 37.78%) and recommended (64.44% vs 35.56%) over the HoloLens-based setup. Within groups, the differences are similar, except in the case of Q5, in which 70.8% of the participants in group A recommend the setup they tested second (i.e., the Magic Leap-based setup).

Table 17 Study of statistically significant differences in user responses for the two-choice questions

A binomial test, however, reveals that these differences, despite being noticeable, are not statistically significant, although in the case of Q5 only by a very small margin. Therefore, research questions RQ5, RQ6 and RQ7 do not have a clear answer. Nevertheless, it seems that the Magic Leap-based setup receives more favorable opinions than the HoloLens-based system. The analysis of the open-ended questions (Q2, Q4, Q6) of Questionnaire 2 adds some details to this matter. In question Q4, several users affirm that the Magic Leap was “more natural” and that they “felt more in control”. In this question, those who opted for the HoloLens commented that it was “easier to use”. The responses to Q2 do not offer much clarification, since similar comments (such as “more responsive” or “easier to use”) are found justifying either choice. However, some users complained about the limited FOV of the HoloLens and about hand-tracking losses with this setup. Finally, question Q6 also offers mixed comments, and opposite choices are justified with similar arguments (such as “more precision” or “easier to use”). Some users commented, however, that the HoloLens provides clearer visuals, something that is not found in the comments on the Magic Leap, despite its wider FOV (something that some users also emphasize). The general conclusion that can be extracted from these comments is that there is a small preference for the Magic Leap, but the differences are small and different subjects experience dissimilar, even contradictory, perceptions.
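For reference, the two-sided exact binomial test applied here assumes no preference (p = 0.5) under the null hypothesis and computes, for the majority count k out of n responses,

\[ p\text{-value} = 2 \sum_{i=k}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n} \]

As an illustrative check, for Q5 the overall majority is k = 29 of n = 45 (64.44%), which yields a two-sided p-value slightly above the 0.05 threshold, consistent with the small margin mentioned above.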

5.4 Statistical analysis of correlation and sources of variation

We have also analyzed the correlation between the factors extracted from the subjective responses (Questionnaire 1) given by the participants of each group. The importance of this type of test is that it allows measuring the consistency of the responses given by the participants. The results of this analysis include the significance levels and the correlation coefficients, which are shown in Figs. 9 and 10 (only those that are statistically significant) as colored circles with numbers. These numbers and colors represent the Pearson correlation coefficients scaled to 0–100. As can be seen, the degree of correlation between the different factors and measures is very high, with correlation coefficients above 0.6 between most measures. The correlation between the final score and most individual factors is also quite high. This means that the answers of the participants are consistent and reliable. Correlations are especially high in group B (with statistically significant correlations between every possible pair of factors, except for one).

Fig. 9 Correlation plot for the questions in Questionnaire 1 - Group A

Fig. 10 Correlation plot for the questions in Questionnaire 1 - Group B
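The coefficients displayed in these plots follow the standard Pearson definition (here scaled to 0–100 for display), computed for each pair of factor datasets x and y over the n participants of the group:

\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]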

Finally, we analyze whether there is a significant interaction between the different features of the population and the data gathered for both tasks. In particular, we have considered the factors of gender, age, profession and tested system, for each dataset. A multifactorial ANOVA test revealed that there are no statistically significant differences between the two setups for these factors in Task 1. However, there are significant differences, in Task 2, between the five age groups considered (F(19, 37) = 20.769, p = .01, η² = 0.922). This is explained by the fact that older people tend to underperform with respect to younger people. The fact that this effect only appears in Task 2 is probably caused by the higher complexity of this task.

This age effect - and the small effect of the size of the virtual elements revealed by the one-way ANOVA in section 5.1, Table 13 - are the only meaningful sources of variation in these experiments.

6 Conclusion and future work

Recent technological advances in AR, mainly regarding tracking and visualization with HMDs, have reshaped the field completely, closing the gap between research and the mass-market adoption of these devices. However, it is still unclear whether more natural hand interaction systems provide benefits for AR applications. For this reason, in this research article we compare two optical-based AR systems, a HoloLens and a Magic Leap One, focusing on natural hand interaction. The former uses a metaphoric approach and represents the first of a series of optical devices capable of delivering AR experiences with head and hand tracking. The latter relies on an isomorphic approach and represents the evolution of this type of system, since it provides more sophisticated hand-tracking capabilities.

We raise several research questions regarding these two setups, which we answer after performing a series of task-based experiments using virtual elements that are moved by means of natural hand interaction. For this purpose, we collect objective data about user performance when completing these tasks. Surprisingly, the results show that there are very small differences in the use of hand interaction between these two systems, and that the more recent one (the Magic Leap One) is not significantly better in terms of accuracy and mistakes for the selected tasks. Only for one of the tasks do users need more time to complete it using the HoloLens. There are also small differences with respect to the size of the virtual elements, which seems not to play a central role in the complexity of the tasks.

Besides the analysis of objective data, we also analyze several factors by means of a subjective questionnaire of user responses. No statistically significant differences can be found between users testing each of the systems for the first time, although a small edge in favor of Magic Leap can be identified in paired (within-groups) comparisons.

We also analyze the opinions of each of the users in terms of usefulness, preference and recommendation, by means of two-choice questions and open-ended questions where users can explain their choices. The results show that, although the Magic Leap-based system gets more support, the differences are not statistically significant under a binomial test.

The reason for this seemingly surprising result may lie in the uncanny valley of natural interaction systems described in [38]. According to this idea, low-fidelity interaction systems that do not resemble real-world interactions can provide user performance that is better than or similar to that of more realistic interaction techniques. This might be the case in these experiments.

In order to reinforce this conclusion, further experiments with other setups should be performed. In addition, this work presents some limitations that should be taken into consideration. First, bimanual gestures are not considered in the experiments, even though they are allowed in both setups. Second, although the tasks have been carefully selected to be general and representative of real tasks in AR applications, the study is limited to two tasks, and it is possible that deeper differences could be found using more specific tasks.

Nevertheless, the results obtained already provide useful information, since we honestly expected to find many more differences between the two setups (whose releases are separated by more than two years) in this particular area, given the technological differences between the two devices. In fact, our initial hypothesis, which is not confirmed by the results, was that there would be differences (in both subjective and objective measures), in favor of the isomorphic and newer setup (the Magic Leap), in the use of hand interaction in the two AR tasks presented in the experiments.

Future work includes additional experiments with other optical-based AR solutions, such as the HoloLens 2, and with more sophisticated tasks, such as bimanual tasks, when devices with wider FOVs allow doing so. It is also worth extending this research to other natural interfaces, such as voice, eye tracking or body movement, and of course to the other two main problems in AR - visualization and tracking - in order to get a clearer picture of the maturity and practical usability of this technology.