
1 Introduction

When developing mid-air hand gestures from user-defined gestures, many factors must be considered in selecting the right set of gestures for a group of tasks. These factors include the complexity and ergonomics of a single gesture, occlusions among fingers, the natural mapping or compatibility between tasks and gestures, the differentiation among gestures for different tasks, and the repetition of pose or motion across gestures. Since the participants of gesture elicitation experiments may not have knowledge of ergonomics, gestures with high acceptability still need to be analyzed carefully to avoid cumulative trauma disorders resulting from repetitive hand poses or motions. Therefore, the objective of this research is to explore possible ergonomic issues and to develop a systematic method of behavior analysis and gesture evaluation.

2 Literature Review

Since mid-air hand gesture controls are natural, intuitive, and sanitary [13], the number of applications has increased significantly. The contexts include interactive navigation systems in museums [4], surgical imaging systems [1, 2], interactive public displays [5], and 3D modelling [6]. Based on the number of hands and the trajectory, mid-air gestures can be classified as one- or two-handed, with linear or circular movements, and with different degrees of freedom in the path (1D, 2D, or 3D) [7]. Independent of context, mid-air gestures can be pointing, semaphoric, pantomimic, iconic, or manipulative [8]. The types of control tasks include select, release, accept, refuse, remove, cancel, navigate, identify, translate, and rotate [8]. Since the characteristics of the context can influence the gesture vocabulary [9], the gestures reported for short-range human-computer interaction [10] and for TV control [11, 12] differ. When choosing a set of intuitive mid-air gestures for a specific group of tasks, it is therefore necessary to consider and analyze the ergonomic problems of user-defined gestures.

3 Experiment

In order to explore the ergonomic issues of mid-air hand gestures, a pilot experiment was carried out. In product design education, studying the features of classic products and analyzing their evolution are common and basic training exercises. Therefore, the authors considered the context of an interactive exhibition system for demonstrating product evolution. The digital information content of such a system can be decomposed into two levels of abstraction. At the overview level, the images of representative products across different stages of a timeline were displayed in a tile menu (Fig. 1). At the product information level, the detailed image of a specific product on the timeline was displayed; users could access this level through the link of a menu item shown at the overview level. Within these levels, participants performed user-defined gestures for a set of tasks: (1) moving the main menu panel to the left/right (to reveal hidden items); (2) targeting an item on the main menu; (3) confirming the selection of a menu item; (4) zooming (enlarging/shrinking) the image of a product; (5) panning the image of a product; and (6) returning to the main menu.
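To make the task structure concrete, the following minimal Python sketch encodes the two information levels and the six elicitation tasks described above. The enum and variable names are illustrative assumptions, not identifiers from the prototype.

```python
# A possible encoding of the two information levels and the six tasks.
from enum import Enum, auto

class Level(Enum):
    OVERVIEW = auto()       # tile menu of representative products on a timeline
    PRODUCT_INFO = auto()   # detailed image of one selected product

class Task(Enum):
    MOVE_MENU = auto()      # (1) move the main menu panel left/right
    TARGET_ITEM = auto()    # (2) target an item on the main menu
    CONFIRM_ITEM = auto()   # (3) confirm the selection of a menu item
    ZOOM_IMAGE = auto()     # (4) enlarge/shrink the product image
    PAN_IMAGE = auto()      # (5) pan the product image
    RETURN_TO_MENU = auto() # (6) return to the main menu

# Which tasks belong to which level, as described in the text:
TASKS_BY_LEVEL = {
    Level.OVERVIEW: [Task.MOVE_MENU, Task.TARGET_ITEM, Task.CONFIRM_ITEM],
    Level.PRODUCT_INFO: [Task.ZOOM_IMAGE, Task.PAN_IMAGE, Task.RETURN_TO_MENU],
}
```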

Fig. 1. Experiment setup

In a laboratory with illumination control, each participant stood at a marked spot 200 cm in front of a 50-in. TV. During the experiments, images simulating the tile menu of product information were displayed on the TV, which was controlled by a laptop computer with a mouse. In order to capture gesture characteristics, the motions of the body and hand joints were recorded by one overhead camera and two 3D depth cameras. Each participant performed two trials, offering self-defined gestures in each. In the first trial, a Microsoft Kinect for Windows (v2) sensor was mounted on top of the TV. The sensor can extract 25 body joints per person. The motion of the arms and hands was recorded by the Kinect Studio program running on a desktop computer, and the body-tracking images were displayed on a 23-in. monitor placed to the right of the TV. In the second trial, an Intel RealSense 3D Camera (F200) was used to extract the position and orientation of 22 joints on a hand. It was placed between the participant and the TV; its distance to the participant was adjusted according to arm length, and its height was adjusted to each participant's shoulder height. The motion of each hand gesture was recorded by the Hands Viewer program, running on a laptop computer with a 15-in. display placed to the lower right of the TV. Therefore, each participant performed the tasks of user-defined gestures facing two 3D depth cameras at different distances. In addition, participants were encouraged to offer different gestures in the two trials.
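Because the later analysis centers on wrist dorsiflexion, the following sketch illustrates how an extension angle could be estimated from recorded 3D joint positions. It is a minimal example assuming generic elbow/wrist/finger-base coordinates; the joint names and values are assumptions for illustration and are not taken from the Kinect or RealSense data formats.

```python
# Sketch: estimating wrist dorsiflexion (extension) from 3D joint positions.
import numpy as np

def angle_between(u: np.ndarray, v: np.ndarray) -> float:
    """Return the angle in degrees between two 3D vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def dorsiflexion_angle(elbow: np.ndarray, wrist: np.ndarray,
                       middle_base: np.ndarray) -> float:
    """Angle between the forearm (elbow -> wrist) and the hand (wrist -> base
    of the middle finger). 0 deg means the hand is in line with the forearm;
    larger values indicate stronger wrist extension."""
    forearm = wrist - elbow
    hand = middle_base - wrist
    return angle_between(forearm, hand)

# Example frame (coordinates in metres, made up for illustration):
elbow = np.array([0.00, 1.10, 2.00])
wrist = np.array([0.00, 1.10, 1.75])
middle_base = np.array([0.00, 1.17, 1.72])   # hand bent upward, palm forward
print(f"dorsiflexion ~ {dorsiflexion_angle(elbow, wrist, middle_base):.1f} deg")
# Prints roughly 67 deg of wrist extension for this made-up palm-forward pose.
```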

4 Results and Discussions

Twenty students enrolled in the Master Program of Industrial Design were invited to participate in the experiment. From the two trials of user-defined gestures, forty gestures were recorded. For the task of moving the main menu panel to the left/right, seven different gestures were identified (Table 1). The second-ranked gesture (M-02), i.e., open palm facing forward and moving to the right/left, seemed to be easier for a 3D depth sensor to recognize. However, hand dorsiflexion might introduce discomfort after a prolonged open-palm, palm-forward posture. Although the first-ranked gesture (M-01), i.e., one hand with the open palm facing and swiping to the right/left, might be more difficult for 3D sensors to recognize, it was the dominant gesture, offered by more participants. However, a swipe motion is not appropriate for continuously controlling the precise movement of a tile menu. Therefore, a stepwise movement of one column of the tile menu per swipe would be the necessary response to swipe motions, as sketched after Table 1.

Table 1. Moving the main menu panel to left/right
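The following is a minimal sketch of the stepwise response suggested above, assuming a hypothetical swipe-event callback and illustrative column counts: each recognized swipe shifts the tile menu by exactly one column.

```python
# Sketch: stepwise tile-menu movement, one column per recognized swipe.
TOTAL_COLUMNS = 12      # columns on the full timeline (illustrative value)
VISIBLE_COLUMNS = 5     # columns visible on the display at once (illustrative)

class TileMenu:
    def __init__(self) -> None:
        self.first_visible = 0   # index of the left-most visible column

    def on_swipe(self, direction: str) -> None:
        """Shift the menu by exactly one column per swipe event."""
        step = 1 if direction == "left" else -1   # swiping left reveals items on the right
        max_first = TOTAL_COLUMNS - VISIBLE_COLUMNS
        self.first_visible = min(max(self.first_visible + step, 0), max_first)

menu = TileMenu()
menu.on_swipe("left")
menu.on_swipe("left")
print(menu.first_visible)   # -> 2: the menu has advanced by two columns
```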

For the task of targeting an item on the main menu, one hand with a D handshape (American Sign Language, ASL), facing the target and making a single tap motion, was the most popular gesture (T-01) (Table 2). For the task of confirming the selection of an item on the main menu, one hand with a D handshape and a double tap motion was the most popular gesture (C-01) (Table 3). Single and double tapping with a D handshape appeared to be legacy gestures carried over from touchscreens. However, tapping toward a target from a distance and in mid-air remains a great challenge for gesture recognition.

Table 2. Targeting an item on the main menu
Table 3. Confirming the selection of a menu item

For the task of zooming (enlarging/shrinking) the image of a product, two hands with open palms were the popular hand poses (Table 4). However, there were no significant differences in the frequencies of the hand orientations, i.e., facing forward (Z-01) or facing each other (Z-02). Similarly, there were no significant differences in the frequencies of the hand motions and trajectories, i.e., moving or swiping apart from/toward each other. Since zooming is a continuous control that requires precise movement, a swipe motion was not appropriate (a possible continuous mapping is sketched after Fig. 2). On the other hand, open palms facing forward (Z-01, Z-03, and Z-04) resulted in hand dorsiflexion. Therefore, alternative gestures, such as Z-05 and Z-10, could be considered (Fig. 2).

Table 4. Zooming (enlarging/shrinking) the image of a product
Fig. 2. Alternative gestures for zooming (Z-05 and Z-10) (taken from the Intel RealSense F200 3D Camera and the Hands Viewer program).
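As a sketch of the continuous control suggested above, the following hypothetical mapping derives a zoom factor from the distance between the two tracked hands, relative to the distance at the start of the gesture. The function name, coordinates, and reference value are assumptions for illustration.

```python
# Sketch: continuous zoom driven by the inter-hand distance.
import numpy as np

def zoom_scale(left_hand: np.ndarray, right_hand: np.ndarray,
               reference_distance: float) -> float:
    """Map the current inter-hand distance to a zoom factor relative to the
    distance measured when the zoom gesture started."""
    current = float(np.linalg.norm(right_hand - left_hand))
    return current / reference_distance   # >1 enlarges, <1 shrinks

ref = 0.30   # inter-hand distance (metres) when the gesture began (made up)
scale = zoom_scale(np.array([-0.25, 1.2, 1.0]), np.array([0.20, 1.2, 1.0]), ref)
print(f"scale factor ~ {scale:.2f}")   # hands 0.45 m apart -> ~1.5x enlargement
```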

For the task of panning the image of a product, an open palm facing forward was still the most popular hand pose. In order to avoid hand dorsiflexion (P-01 or P-03) and gestures similar to those of the aforementioned tasks (P-02 or P-05), "open palm, then grab and move in the desired direction while grabbing" (P-04) had the benefit of avoiding a static hand pose and could be the alternative gesture (Table 5); a minimal sketch of such a grab-and-drag control follows Table 5.

Table 5. Panning the image of a product
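The following is a minimal sketch of a grab-and-drag pan control in the spirit of P-04, assuming a hypothetical tracking loop that reports a pose label ("grab"/"open") and a 2D hand position each frame; none of these identifiers come from the systems used in the experiment.

```python
# Sketch: pan offset accumulated while the hand is in a "grab" pose.
from typing import Optional, Tuple

class PanController:
    def __init__(self) -> None:
        self.anchor: Optional[Tuple[float, float]] = None  # hand position at grab start
        self.offset_x, self.offset_y = 0.0, 0.0             # committed image offset

    def update(self, pose: str, x: float, y: float) -> Tuple[float, float]:
        """Call once per tracked frame with the recognized pose and hand position."""
        if pose == "grab":
            if self.anchor is None:            # closing the hand starts a drag
                self.anchor = (x, y)
            ax, ay = self.anchor
            return self.offset_x + (x - ax), self.offset_y + (y - ay)
        if self.anchor is not None:            # opening the palm ends the drag
            ax, ay = self.anchor
            self.offset_x += x - ax
            self.offset_y += y - ay
            self.anchor = None
        return self.offset_x, self.offset_y

# Hypothetical frames: grab at (0.10, 0.00), move to (0.25, 0.05), then release.
pan = PanController()
pan.update("grab", 0.10, 0.00)
print(pan.update("grab", 0.25, 0.05))   # ~ (0.15, 0.05) while dragging
print(pan.update("open", 0.25, 0.05))   # ~ (0.15, 0.05) committed offset
```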

For the task of returning to the main menu, the most dominant hand pose and orientation was the D handshape (ASL), facing toward the return button. The hand motion and trajectory was a single tap toward the direction of the return button (R-01). However, this gesture was similar to the most popular gesture for targeting an item on the main menu. In addition, one hand with the open palm facing and swiping to the right/left (R-02) was similar to the most popular gesture for moving the main menu panel to the left/right. Therefore, pinching (R-03) or patting twice (R-04) toward the return button had the benefit of avoiding a static hand pose and seemed to be alternative options (Table 6).

Table 6. Returning to the main menu

5 Conclusion

In this research, a systematic behavior coding scheme was developed to analyze user-defined gestures for six tasks of an interactive product exhibition. The results indicated that hand dorsiflexion, caused by the posture of an open palm facing forward, was the common ergonomic issue. In order to reduce the discomfort of prolonged gesture control, alternative combinations of gestures for accomplishing these tasks were determined based on ergonomic limitations and considerations of vision-based hand gesture recognition.
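As an illustration only, the following sketch shows a minimal record structure for the kind of behavior coding used above, in which each elicited gesture is coded by handedness, handshape, palm orientation, and motion/trajectory. The field names and example values are assumptions, reconstructed from the gesture descriptions in Sect. 4.

```python
# Sketch: one possible record structure for behavior coding of elicited gestures.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GestureCode:
    task: str                       # e.g. "moving the main menu panel to left/right"
    code: str                       # e.g. "M-02"
    hands: int                      # one- or two-handed
    handshape: str                  # e.g. "open palm", "D handshape (ASL)"
    orientation: str                # e.g. "palm facing forward"
    motion: str                     # e.g. "moving to right/left", "single tap"
    frequency: Optional[int] = None # number of participants offering the gesture

# Example entry reconstructed from the description of gesture M-02
# (assumed one-handed; frequency not given in the text):
m02 = GestureCode(
    task="moving the main menu panel to left/right",
    code="M-02",
    hands=1,
    handshape="open palm",
    orientation="palm facing forward",
    motion="moving to right/left",
)
```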