The first step in designing gestures for an interactive system is to understand the design space of possible gestures. Various design methods, such as bodystorming, role-playing, personas and image boards, were used in the early stages of the project to explore possible avenues for gestural interaction with the marionette. These methods gave the design team the opportunity to explore the possibilities of both the hardware and software technologies that could be used in the development phase. In bodystorming and role-playing, members of the design team played the role of either a human participant or the marionette, and acted out gestural dialogues. This enabled the design team to better understand how an embodied gestural interaction with a marionette could proceed.
Based on the initial prototypes and the results of bodystorming, a set of preliminary gestures was selected and implemented for the marionette. Each gesture was defined by specifying each motor’s movement over time. This definition allowed us to write and store gestures for later selection by the cognitive system, i.e. when a marionette gesture is selected in response to a perceived human gesture.
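As a concrete illustration, a gesture defined this way can be stored as an ordered list of timed motor targets. The following sketch shows one possible representation; the motor names, timings and the “wave back” gesture itself are hypothetical and are not taken from the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Keyframe:
    """One motor target at a point in time (seconds from gesture start)."""
    time: float       # when the target position should be reached
    motor: str        # hypothetical motor identifier, e.g. "head_pitch"
    position: float   # normalised target position in [0.0, 1.0]

# A gesture is an ordered list of keyframes; the cognitive system stores
# these definitions and replays one when the corresponding gesture is selected.
WAVE_BACK = [
    Keyframe(0.0, "right_arm", 0.2),
    Keyframe(0.5, "right_arm", 0.8),
    Keyframe(1.0, "right_arm", 0.2),
    Keyframe(1.0, "head_pitch", 0.6),  # lift the head to suggest eye contact
]
```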
After the selected gestures had been implemented on the marionette, a preliminary evaluation study was performed to assess and refine them. The study used the Wizard of Oz technique, since the perceptual system that would sense human gestures and map them to marionette gesture responses had not yet been developed: a human operator decided which marionette gesture to execute based on participants’ gestures. The think-aloud method was used to gather further data on how participants perceived the interaction to be progressing, and the reasons behind their gestural choices.
For this preliminary study we recruited twelve students as participants. Each participant was asked to interact with the marionette spontaneously, without being given a specific task, and to think aloud while doing so. After each session, an interview was conducted to collect additional insight into participants’ gestural behavior. Each session was video recorded to capture every performed gesture, and the design team took notes throughout.
One of the biggest challenges that distinguishes Little Bill from other gesture-based interaction systems is that participants in this context are not given instructions or tasks. The most difficult moment is the “cold start”: participants initially have no conception of the scope of the marionette’s ability to perceive or respond to them. Seeing the marionette respond to their presence typically gives participants the confidence to initiate gesturing towards it. Participants were more willing to continue the dialogue once they noticed that they had Little Bill’s “attention” (i.e. eye contact). Based on this feedback we designed the interaction such that Little Bill “makes the first move” and directs its attention to the new participant. To achieve this, we added an “approach” gesture that the marionette perceives when a participant is approaching it. A corresponding “retreat” gesture was added, so that walking away signals to the system that the participant has lost interest in Little Bill.
The full list of participant gestures elicited by the preliminary study was: waving, bending over, approaching, walking away, getting too close, and going behind the marionette. The last two gestures are detected from the participant’s distance and angle relative to the marionette. Results of the gesture elicitation study and the interviews revealed that lifting the marionette’s head to make eye contact, turning the marionette’s body to follow the participant, and the marionette raising its hands were the three gestures that evoked the strongest emotional responses among participants.
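A minimal sketch of this distance- and angle-based detection is shown below. The thresholds, coordinate convention and gesture labels are assumptions made for illustration; the text does not specify the actual values.

```python
import math
from typing import Optional

TOO_CLOSE_M = 0.5        # assumed distance below which a participant is "too close"
BEHIND_DEG = 90.0        # assumed angle beyond which a participant is "behind"
APPROACH_DELTA_M = 0.15  # assumed change in distance that counts as approaching/retreating

def classify_position(x: float, z: float, prev_distance: Optional[float]) -> str:
    """Classify a participant position (x, z), in metres, relative to the marionette.

    x is the lateral offset and z the forward distance; positive z is in front.
    """
    distance = math.hypot(x, z)
    angle = math.degrees(math.atan2(x, z))  # 0 degrees = directly in front

    if distance < TOO_CLOSE_M:
        return "too_close"
    if abs(angle) > BEHIND_DEG:
        return "behind"
    if prev_distance is not None:
        if prev_distance - distance > APPROACH_DELTA_M:
            return "approaching"
        if distance - prev_distance > APPROACH_DELTA_M:
            return "walking_away"
    return "in_front"
```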
4.1 Gesture Implementation
The marionette gestures were designed to convey emotions of different kinds and were divided into the following five categories:
- Complex gestures: A subset of the marionette gestures involve large body movements that are intended to convey that the marionette is experiencing a strong emotional response. These gestures are implemented as a quickly executed series of movements across many degrees of freedom. Examples of the marionette’s complex gestures include “surprised” and “scared”.
- Subtle gestures: Other, smaller gestures are intended to encourage continued interaction with the marionette. For instance, simply lifting the head of the marionette gives an impression of eye contact and, in our experiments, engaged participants and made them more likely to continue interacting. Similarly, a series of movements that conveys a “quizzical look” from the marionette while participants wave at it can be intriguing and encourages continued interaction.
- Attentive gestures: These gestures are a direct response to participants’ movement. For example, the marionette turns so that it tracks a participant, or turns its head such that it faces them. As another example, when participants walk away from the marionette, the marionette might shake its head as an attentive gesture to draw their attention back.
- Living gestures: These gestures are designed to convey the impression that the marionette is alive, and involve movement that is not a direct response to perceived participant action. For example, the marionette possesses a motor behind its eyes that can execute a blinking gesture, which is performed at random times. Another example of this type of gesture is a “breathing” gesture, which moves the back of the marionette up and down very slowly such that it looks like it is breathing. These gestures prevent the marionette from being completely still.
- Restorative gestures: After performing some gestures the marionette might not be in a natural pose, or may have lost track of its exact pose due to technical limitations. To accommodate this, a restorative gesture was designed to return the marionette to its initial pose. One such gesture slightly lifts the marionette off the ground and returns it to its default position.
The cognitive model for the marionette has three components: participant gesture detection, marionette gesture selection and marionette gesture execution. Participant gesture detection uses the Microsoft Kinect and its SDK to detect human gestures and send them to the marionette gesture selection program. The selection component selects the most relevant marionette gesture to execute, and sends the corresponding action to the gesture execution component.
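The flow between the three components can be sketched as a simple detect–select–execute pipeline. The class and method names below are illustrative, and the detector is reduced to a stub because the Kinect SDK details are outside the scope of this section.

```python
import random

class GestureDetector:
    """Stand-in for the Kinect-based participant gesture detection."""
    def next_gesture(self) -> str:
        # The real component is driven by the Kinect SDK; here we return
        # a placeholder human gesture for illustration.
        return random.choice(["waving", "bending_over", "approaching",
                              "walking_away", "too_close", "behind"])

class GestureSelector:
    """Chooses a marionette gesture in response to a detected human gesture."""
    def __init__(self, mapping: dict):
        self.mapping = mapping  # human gesture -> list of possible responses
    def select(self, human_gesture: str) -> str:
        return random.choice(self.mapping.get(human_gesture, ["restore"]))

class GestureExecutor:
    """Stand-in for the motor-control layer that plays back a gesture."""
    def execute(self, marionette_gesture: str) -> None:
        print(f"executing: {marionette_gesture}")

# Wiring the pipeline together: detect -> select -> execute.
detector = GestureDetector()
selector = GestureSelector({"waving": ["quizzical_look", "raise_hands"],
                            "approaching": ["lift_head"]})
executor = GestureExecutor()
executor.execute(selector.select(detector.next_gesture()))
```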
The next challenge was how to model the selection of a marionette gesture as a response to a human gesture. We developed a set of guidelines based on observations of human movement, particularly during dialogue:
- people are always moving;
- different people respond differently, particularly to a repeated event;
- people may respond by starting a conversation with another person;
- people shift their attention to a different object even when there is no obvious event.
To keep the marionette’s responses from becoming predictable, each participant gesture is mapped to a set of possible marionette responses from which a single response is stochastically chosen. This one-to-many relationship is used because the goal of the interaction is not to generate an expected response, but to encourage and provoke continued interaction. This is in contrast to the typical interaction design goal of learnable and predictable interaction between user and interface. The perceived autonomy of this simple random behavior is also intended to lead participants to perceive intelligence in the marionette: human social interaction is not predictable, and systems intending to provoke dialogic interaction should be similarly opaque.
To define a set of appropriate marionette response gestures to participant gestures, the design team envisioned the set of probable emotional states that could cause the participant to perform each gesture. Since the gesture selection process cannot interpret participants’ emotional states, the set of possible response actions was designed to cover all emotional states (such as surprised, shy or shocked) that were deemed probable causes. The cognitive model assigns a probability to the participant being in each emotional state, and selects a random gesture weighted by this probability distribution. This allowed the marionette to respond in a manner suited to the most likely emotional state of the participant. As an example, a person who is bent over near the marionette is probably displaying curiosity or interest in it, which puts the marionette in the surprised, shocked or shy state. Table 1 shows the mappings from each human gesture that the marionette can recognize to a list of possible marionette response gestures. Each gesture in Table 1 represents an emotional state and refers to one or more implementations on the marionette.
Table 1. The mapping from human to marionette gestures.
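The emotion-weighted selection described above can be sketched as follows. The emotional states, probabilities and gesture names are hypothetical placeholders in the spirit of Table 1, not its actual entries.

```python
import random

# Hypothetical model: for each recognised human gesture, a distribution over the
# emotional states deemed probable causes, and for each state the marionette
# gestures that express it.
STATE_PROBABILITIES = {
    "bending_over": {"surprised": 0.5, "shocked": 0.3, "shy": 0.2},
    "waving":       {"curious": 0.6, "surprised": 0.4},
}
STATE_GESTURES = {
    "surprised": ["jump_back", "raise_hands"],
    "shocked":   ["freeze", "shake_head"],
    "shy":       ["look_down"],
    "curious":   ["quizzical_look", "lift_head"],
}

def select_response(human_gesture: str) -> str:
    """Pick a marionette gesture weighted by the inferred emotional state."""
    states = STATE_PROBABILITIES[human_gesture]
    # Sample an emotional state according to its probability ...
    state = random.choices(list(states), weights=list(states.values()))[0]
    # ... then choose uniformly among the gestures that express that state.
    return random.choice(STATE_GESTURES[state])

print(select_response("bending_over"))
```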
The marionette gesture selection component is also responsible for deciding which participant is the current focus of the marionette’s attention. Participant interestingness is based on continued engagement (measured by amount of body movement) and the order in which people approached the marionette. The marionette attends to the participant it perceives to be the most interesting, and rotates to follow their position. From the gesture elicitation study and participant interviews, it was determined that eye contact was the most important feature to participants, and resulted in the highest level of engagement. If a person is behind the marionette and no one is present in front, the marionette rotates to face them, allowing gestural interaction to continue.
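One way to express this attention policy is sketched below; the relative weighting of body movement against arrival order is an assumption, since the exact scoring is not specified.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TrackedParticipant:
    arrival_order: int   # 0 for the first person to approach, 1 for the next, ...
    movement: float      # accumulated body movement reported by the sensor

def interestingness(p: TrackedParticipant) -> float:
    # Hypothetical scoring: continued engagement (movement) dominates,
    # with earlier arrivals preferred as a tie-breaker.
    return p.movement - 0.1 * p.arrival_order

def focus_of_attention(participants: List[TrackedParticipant]) -> Optional[TrackedParticipant]:
    """Return the participant the marionette should rotate towards, if anyone is present."""
    return max(participants, key=interestingness, default=None)
```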
The marionette’s gestures were divided into two categories based on how they are triggered. The first category is a set of “regular” gestures that are selected in response to participants’ gestures, and the second is a set of “idle” movements that are selected when no one has interacted with the marionette for a defined amount of time. If no participants are detected, the gesture selection component triggers an idle state, during which the marionette performs subtle gestures in an attempt to engage anyone present but not detectable by its perceptual system (due to the limited range or field of view of the Kinect sensors). Idle movements are short, subtle actions designed to invite people to interact, and they ensure the marionette is not still for lengthy periods.
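The switch between the regular and idle categories can be sketched as a simple timeout check; the timeout value and gesture names below are illustrative assumptions.

```python
import random
import time
from typing import Optional

IDLE_TIMEOUT_S = 30.0  # assumed time without a detected participant before idling

# Short, subtle movements played while idle (cf. the "living" and "subtle" gestures above).
IDLE_GESTURES = ["blink", "breathe", "lift_head_slightly"]

def choose_gesture(last_detection_time: float, detected_gesture: Optional[str]) -> Optional[str]:
    """Return a gesture to perform, or None if the marionette should simply wait."""
    if detected_gesture is not None:
        return f"respond_to:{detected_gesture}"   # regular gesture path
    if time.time() - last_detection_time > IDLE_TIMEOUT_S:
        return random.choice(IDLE_GESTURES)       # idle movement path
    return None
```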