1 Introduction

In human–robot interaction (HRI), a robot’s expressions facilitate human understanding of the robot’s behavior, affects (e.g., emotions and moods), rationale, and motives, and are known to increase the perception of a robot as trustworthy, reliable, and life-like [1]. To participate in affective interaction, robots must be able to communicate their affective state to others [2]. Among the many channels for showing affect, such as speech, voice, facial expressions, bodily expressions, color, and lights, we are interested in the bodily expressions of humanoid robots. Studies have shown that a considerable portion of communication in human–human interaction occurs through body language [3]. People have sophisticated skills at interpreting meaning from body cues, and expressing robot affect through the body enables people to use those skills to better understand robots. Moreover, a study showed that bodily expressions, in addition to facial expressions, improved the recognition of affect [4]. Making the robot body expressive may therefore improve people’s understanding of robot affect. Physically, the body is also a large part of many humanoid robots, and many robot behaviors involve the body. For humanoid robots that lack facial features, such as the NAO, ASIMO, and QRIO, the body is a particularly important channel for expressing affect nonverbally.

This study investigates how a social robot can express affect through body language during task execution in the context of a dyadic human–robot interaction. More specifically, we want to determine how robot affect can be shown through body language while the robot is simultaneously performing the body actions required by the interaction. Our motivation stems from a game between a humanoid robot and a child, in which the robot performs gesture sequences and the child imitates them. This imitation game has been developed to foster the relationship between a personal robot assistant and children with diabetes [5]. For the better part of the interaction, the robot is performing gestures (see Sect. 4.2 for details), and children pay attention mainly to these gestures.

Before introducing our work, we first briefly discuss the concepts of affect, emotion, and mood. Affect is an umbrella term in psychology that refers to the experience of feelings, emotions, or moods. Our work focuses on mood. Distinctions between affect, emotion, and mood are explained in [6–10]. Here, we highlight the distinctions between mood and emotion that relate to expression: emotion is a short-term, intense affective state associated with specific expressive behaviors, whereas mood is a long-term, diffuse affective state without such specific behaviors. Mood emphasizes a stable affective context, while emotion emphasizes affective responses to events.

Expressing affect through ongoing functional behavior, as opposed to expressing affect with explicit categorical expressions, is relevant for the following reasons. First, expressions based on explicit body actions show affect only for a brief period of time and interrupt functional behavior. Examples include raising the arms akimbo to display anger [11], covering the eyes with the robot’s hands to display fear [12], and raising both hands to display happiness [13]. Although clearly recognizable, such explicit gestures cannot be used when a robot is, e.g., carrying a box that requires the use of both arms and hands. To express affect through ongoing functional behavior, the expression needs to be integrated into the robot behavior in a more or less continuous fashion, which is quite different. In this paper, we used the imitation game as an interaction scenario and studied the use of body language for expressing mood. We extend previous work reported in [14–16] on a parameterized behavior model for expressing mood. The model is adapted here to enable the continuous display of mood through game gestures (see Sect. 4 for more details).

Second, affect expression that is integrated with the functional behaviors of robots provides a way of expressing mood. Bodily expression of emotion has been studied extensively, whereas bodily mood expression has yet to be explored. Compared to emotion, which is a short-term and intense affective state, mood is a longer-lasting and less intense affective state. An individual is at any given time in a more or less positive or negative mood. Integrating mood into the body language of a robot may therefore provide a robot with an alternative, more stable but less specific, affective communication channel. This may also contribute to the believability, reliability, and lifelike quality of a robot, since robots are enabled to show another form of affect, mood, and with mood expression robots can show affect more often and continuously over time.

Our research questions in this study are (1) whether people, while interacting with a robot, can recognize mood from positively versus negatively modulated robot behaviors and (2) how this influences a person’s own affective state and interaction behavior. For example, it is well known that mood can transfer between persons and has specific effects on behavior [17], and it is useful to gain insight into the effects and possible transfer of mood from a robot to an individual.

The remainder of this chapter is organized as follows. Section 2 discusses related work. In Sect. 3, we elaborate the parameterized behavior model for mood expression and the modulation principles, and we explain why expression by means of behavior modulation is suitable for expressing mood. We also briefly describe the evaluation of the model in a recognition task. Section 4 describes the interactive game we used in our study and the integration of the behavior model into the game gestures, and motivates our investigation of bodily mood expression in an interaction scenario. In Sect. 5, we formulate our main research questions and hypotheses. Section 6 discusses the experimental setup and Sect. 7 presents the results. We discuss these results in Sect. 8. Finally, Sect. 9 concludes the chapter and discusses future work. In addition, we provide examples of how to construct parameterized behaviors computationally in Appendix A.

2 Related work

The affective states of a robot or a virtual agent can be expressed nonverbally by poses and movements of facial and body components. Facial expressions have been used in embodiments such as Kismet [18], iCat [19], Greta [20], and Max [21], while bodily expression has been used for ROMAN [11], NAO [12, 22], KOBIAN [13], Greta [20], and Max [21]. In these studies, it has been experimentally demonstrated that people generally are capable of recognizing the affective states that are expressed. Furthermore, [11, 13] showed that bodily expression combined with facial expression may significantly enhance the recognition of a robot’s emotion expression.

Bodily expression can be generated by directly simulating human static postures and movements, as done in, e.g., [13, 22]. A more generic approach for generating expressive behaviors, however, is to modify the appearance of a behavior via the modulation of parameters associated with that behavior. Wallbott [23] investigated whether body movements, body posture, gestures, or the quantity and quality of movement in general allow us to differentiate between emotions. This study found that qualities of movement (movement activity, spatial extension, and movement dynamics) and other features of body motion can indicate both the quality of an emotion and its quantity. Laban movement analysis (LMA) [24] models body movements using four major components: body, space, effort, and shape, characterized by a broad range of parameters. Based on LMA, Chi et al. [25] developed the EMOTE framework, which uses post-processing of pre-generated behaviors to generate expressive gestures for virtual agents. The model developed by Pelachaud et al. [20] modifies gestures before generating the actual movements. This model distinguishes spatial, temporal, fluidity, power, overall activation, and repetition aspects of behavior. It has been applied to the Greta virtual agent [26] and the NAO robot [27] for communicating intentions and emotions. These methods can be applied to functional behaviors in order to express the affect of a robot while it is performing a task. In our model, behavior parameters are defined when the behavior profile is synthesized. One advantage of doing so is that we can model the physical constraints of the robot body at the same time: the ranges of the behavior parameters are determined when the parameters are defined, to make sure that modulation will not cause collisions with other parts of the robot body. Another approach is to use the body resources that are not required by functional behaviors to express affect (e.g., [28]). In our model, when head movement is not part of a functional behavior, it can be used for expressing mood if needed.

Affect expression of robots has many positive impacts on human–robot interaction, including the way of interacting with a robot, the attitude towards a robot, and the effectiveness of assistive tasks. A long-term field study showed that facial expression of robot mood influenced the way and the amount of time that people interacted with a robot [29]. Emotional behaviors made elderly participants perceive a robot as more empathic during their conversation [30]. Emotional gestures improved participants’ perception of the expressivity of a NAO robot during a story-telling scenario [31]. In an application of a robot companion capable of playing chess with children [32], robot emotion expression that varied with the state of the game was used to help children better understand the game state; a preliminary evaluation also suggested that the emotional behavior of the robot improved children’s perception of the game. In another study [33], this robot responded empathically to children’s affective states, and the results suggested that the robot’s empathic behaviors enhanced children’s attitude towards the robot. Adaptive multimodal expression was studied with children using a quiz game [34]: expressive behaviors were selected based on events in the environment and internal parameters, and the study showed positive effects of the adaptive expression on children as well as the children’s preference for bodily expression. In a personal assistant application for children [35], robot emotion expression was shown to improve the effectiveness of the robot when used as companion, educator, and motivator. Robots equipped with minimally expressive abilities were developed to help children with autism develop their social abilities [36]. Facial and bodily expressions of the robot were used to help children learn to recognize these expressions and to use their own expressions by imitating those of the robot. These robot expressions were found to attract children, improve and maintain engagement in the interaction, and evoke emotional responses [37].

Affect expression also influences users who interact with virtual agents (see [38] for a review). The review focused on the effects of affective expression of virtual agents on users’ perception of and attitude towards the agent (e.g., likeability, trustworthiness, and warmth), users’ behavior (e.g., attention, concentration, motivation), and users’ task performance in the interaction. Most studies suggested that people perceive agents more positively when they display emotions. More importantly, we would like to highlight the studies that suggested effects on users’ (affective) states and performance, since they are closely related to our study. Several studies showed that affective agents were able to reduce negative affective states of users. Prendinger et al. [39] investigated the effect of a virtual agent with affective behavior on a user in a mathematical game scenario. Participants who interacted with the agent displaying empathy were significantly less stressed according to physiological measurement; a similar effect was also found in a virtual job interview scenario. Klein et al. [40] and Hone [41] reported that an interactive affect support agent was able to alleviate frustration in games that were designed to frustrate players on purpose. Hone found that an embodied agent was more effective in reducing frustration and that a female embodied agent was more effective than a male one. Similar results were obtained in Burleson and Picard’s study [42]: agents with affective support were reported to reduce participants’ feelings of frustration in a learning context, and this affective intervention was found to be more effective in girls.

Several studies also reported effects of affective virtual agents on performance. In Klein’s study [40], participants who interacted with the affective support agent played the game significantly longer. Maldonado et al. [43] found that participants who interacted with the emotional agent performed better in a test in a language learning context. Berry et al. [44] studied the effects of the consistency between emotion expressions and persuasive messages about healthy diet using the GRETA agent; results showed that GRETA with consistent emotion expression led to better memory recall performance. Emotion expression was also reported to affect users’ affective states and behaviors. Tsai et al. [45] found that happy expressions of both still images and virtual agents can induce an increase in users’ happiness; interestingly, when cognitive load is increased by decision-making, this emotion induction is dampened. Okonkwo and Vassileva [46] found that agents with facial expressions improved subjects’ concentration and motivation. In Gong’s study [47], a talking head agent presented happy and sad novels with either a happy or a sad facial and vocal expression; the happy agent elicited greater intent to buy the books and more positive evaluations of the books and the book reviews. All these studies suggest that affective expressions of virtual agents have effects on users during interaction. Our study investigated whether affective expressions of robots have similar effects on users. In particular, [45] also looked at the mediating effects of task load; we likewise studied the effect of task load by varying game difficulty.

In previous work, a parameterized behavior model for expressing mood using body language while performing (functional) behaviors was proposed [14]. We have adapted this parameterized behavior model for this work. The model is based on a set of generic parameters that are associated with specific body parts and that are inherently part of related body movements. These parameters are subsequently modulated in order to express various moods. This model allows us to integrate mood into functional behaviors in a manner that does not interfere with the functions of these behaviors. The model was validated by evaluating whether users could recognize robot mood in a recognition experiment. The results obtained showed that participants who were asked to rate valence and arousal were able to differentiate between five valence levels and at least four levels of arousal [16].

In this paper, we ask whether a robot’s mood can be transferred to a human. Some evidence that supports this has been found by Tsai et al. [45], who showed that even still images of virtual characters can induce mood. Their study also revealed an interaction effect between cognitive load and contagion in a strategic game: the contagion effect was reduced by the mobilization of the additional cognitive resources required for the decision-making task. The application of robot bodily expression in an HRI scenario and its effects on the interaction, however, are still largely unexplored. To investigate these effects, the study reported in this paper uses bodily mood expression that can be displayed simultaneously with functional behaviors. In particular, we address the question whether these bodily expressions can produce a well-known psychological effect, emotional contagion (in our case, robot mood transferring to humans), during human–robot interaction.

3 Parameterized behavior model for mood expression

3.1 Model concept

To enable a robot to express a mood, a long-lasting affective state, during task execution, we applied a previously developed model for integrating affect expression with functional behaviors (e.g., task behaviors, communicative gestures, and walking). In this model, behaviors are parameterized (see Fig. 1), and different moods can be expressed by varying the behavior parameters. The set of parameters is generic and can be used to modulate arbitrary behaviors. Example parameters include the speed and the amplitude of a movement. A parameter may also be associated with a particular body part of the robot (e.g., head, hand palm, and finger). For a specific behavior, one only needs to specify which parameters should be varied to express mood while performing that behavior. Moreover, by varying these parameters the “style” of executing a particular functional behavior can be modified without changing the function of that behavior. Different styles can thus be used to express a range of affective states. This way, affect can be displayed throughout a series of behaviors.
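To make the idea of behavior parameterization concrete, the following sketch shows how a functional behavior might carry a set of style parameters that can be varied without changing the behavior’s function. It is a minimal illustration under our own assumptions; the class and parameter names (e.g., BehaviorParameters, WaveBehavior, motion_speed) are hypothetical and do not correspond to the actual implementation described in Appendix A.

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorParameters:
    """Generic style parameters; each is normalized to [0, 1] (hypothetical)."""
    motion_speed: float = 0.5   # temporal quality of the movement
    amplitude: float = 0.5      # spatial extent of the movement
    head_up_down: float = 0.5   # vertical head position (if the head is free)

@dataclass
class WaveBehavior:
    """A functional waving behavior whose style is set by its parameters."""
    params: BehaviorParameters = field(default_factory=BehaviorParameters)

    def keyframes(self):
        # The function of waving (the hand swinging between two positions)
        # is preserved; only the extent and timing of the swing vary.
        extent = 0.2 + 0.6 * self.params.amplitude        # hypothetical joint range (rad)
        duration = 1.5 - 1.0 * self.params.motion_speed   # seconds per swing
        return [("swing_out", extent, duration), ("swing_in", -extent, duration)]

# Varying the parameters changes the style, not the function:
happy_wave = WaveBehavior(BehaviorParameters(motion_speed=0.9, amplitude=0.9))
gloomy_wave = WaveBehavior(BehaviorParameters(motion_speed=0.2, amplitude=0.2))
```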

The parameterized behavior model (Fig. 1) consists of three layers: (1) a drive layer; (2) a behavior parameter layer; and (3) a joint configuration layer. The drive layer contains the task scheduler (the task part) and the affect generator (the affect part). The robot’s affective state can be determined by, for instance, appraisal models; this affective state controls the behavior parameters. The task scheduler decides which behavior should be performed at each moment according to the task requirements. In the top layer, the task scheduler and the affect generator work simultaneously and independently (without interfering with each other).
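The separation between the task part and the affect part in the drive layer can be sketched as follows. This is an assumption-laden illustration of the layering in Fig. 1, not the actual controller; the function names (select_next_behavior, mood_to_parameters) are invented, and plain dictionaries are used instead of the dataclass of the previous sketch for brevity.

```python
def select_next_behavior(pending):
    # Drive layer, task part: choose the next behavior from task requirements only.
    return pending.pop(0)

def mood_to_parameters(valence):
    # Drive layer, affect part: map mood valence in [-1, 1] to style parameters,
    # independently of which behavior was selected.
    return {
        "motion_speed": 0.5 + 0.4 * valence,
        "amplitude": 0.5 + 0.4 * valence,
        "head_up_down": 0.5 + 0.4 * valence,
    }

def wave(params):
    # Behavior parameter layer: a parameterized behavior yields joint targets.
    extent = 0.2 + 0.6 * params["amplitude"]  # hypothetical joint range (rad)
    yield {"shoulder_roll": extent}
    yield {"shoulder_roll": -extent}

def execute(behavior, params):
    # Joint configuration layer: joint targets are sent to the robot's motors.
    for joint_targets in behavior(params):
        pass  # e.g., hand joint_targets over to the motor controller

pending = [wave]
execute(select_next_behavior(pending), mood_to_parameters(valence=0.8))
```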

Fig. 1 General parameterized behavior model

3.2 Mathematical representation of pose modulation

This section focuses on the modulation of behavior poses. The modulation of motion dynamics is straightforward, so it is not covered here; details can be found in [14]. A behavior in this study is defined as a sequence of movements of effectors transiting from one pose to another. A behavior profile describes the behavior function that conforms to social conventions or fulfils certain physical operations on objects. For example, we define the profile of the waving behavior as one hand swinging repeatedly between two horizontally aligned positions, with the palm always facing forward. Taking pointing as another example, we define it as the arm stretching out from the preparation pose to the pointing pose. Put differently, a behavior profile defines the set of poses in a behavior and the order of transitions between poses. Note that a pose of a behavior is not fixed but can vary within a certain range. The following equation depicts the set of poses in one behavior, while the transitions between poses form the movement.

$$\begin{aligned} Behavior=(\Sigma , \left\{ {Pose}_{1}, {Pose}_{2}, \ldots , {Pose}_{k} \right\} ) \end{aligned}$$

\(\Sigma \) defines the order of the poses in the movement. A pose is a set of joint variables of an effector.

$$\begin{aligned} {Pose}_{i}=\left\{ j_{i}^{1}, j_{i}^{2}, \ldots , j_{i}^{n} \right\} \end{aligned}$$

where \(i=1,2,\ldots ,k\); \(j\) denotes a joint; and the \(i\)-th pose contains \(n\) joints. The poses that correspond to a particular behavior must meet certain conditions that represent the behavior function. Put differently, for each pose some of the joints should satisfy the requirements specified by a certain formula. We use \(B\) to denote, for example, a linear function that represents the behavior function. Hence, \(\exists \{j_{i}^{m}\}\subset {Pose}_{i}, m\le n\), such that

$$\begin{aligned} B\left( j_{i}^{m} \right) =0\qquad \hbox {OR}\qquad B\left( j_{i}^{m} \right) >0\qquad \hbox {OR}\qquad B\left( j_{i}^{m} \right) <0 \end{aligned}$$

The solution (the value of the joint variable \(j_{i}^{m}\)) to the above equations or inequalities is usually not unique. This allows pose parameters to be used to control a part of the joints, \(j_{i}^{r}\in \{j_{i}^{m}\}\), while at the same time making sure that these joint variables still satisfy the required equations. Note that we also use pose parameters to control the joints \((j_{i}^{ur}\in {Pose}_{i}, j_{i}^{ur}\notin \{j_{i}^{m}\})\) that are not related to behavior functions. We use \(M\) to denote modulation formulas that relate the pose parameters \(p_{t}\) to joints.

$$\begin{aligned} j_{i}^{r}=M_{i}^{r}(p_{t})\qquad \hbox {OR}\qquad j_{i}^{ur}=M_{i}^{ur}(p_{t}) \end{aligned}$$

As a result, different behavior patterns can be achieved without violating the behavior function. An example can be found in Appendix A.
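As a rough, self-contained illustration of this idea (complementary to Appendix A), the sketch below modulates the free joints of a pose through a pose parameter while checking that the constraint \(B\) encoding the behavior function still holds. The joint names, numeric ranges, and the constraint itself are invented for the example.

```python
import math

def B(pose):
    # Hypothetical behavior-function constraint for a pointing pose:
    # the elbow must stay straight so that the arm remains extended.
    return pose["elbow_roll"]  # B(pose) == 0 means "arm extended"

def modulate(pose, amplitude):
    # M: map the pose parameter 'amplitude' in [0, 1] onto the free joints.
    modulated = dict(pose)
    modulated["shoulder_pitch"] = -0.5 - 0.8 * amplitude  # vertical extent (rad)
    modulated["shoulder_roll"] = 0.1 + 0.4 * amplitude    # horizontal extent (rad)
    # 'elbow_roll' is left untouched, so the behavior function is preserved.
    return modulated

neutral_pose = {"shoulder_pitch": -0.9, "shoulder_roll": 0.3, "elbow_roll": 0.0}
for amplitude in (0.1, 0.5, 0.9):
    pose = modulate(neutral_pose, amplitude)
    assert math.isclose(B(pose), 0.0), "modulation must not violate the behavior function"
```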

3.3 Modulation principles

To evaluate the feasibility of the mood expression model, we initially applied the model to two typical behaviors in HRI, waving and pointing, and we defined parameters for these behaviors based on findings about human behavior from the literature. Our aim was to determine which parameters can be modulated to express mood and how to modulate them to express different moods. Instead of applying modulation principles from the literature directly to the robot behaviors, we decided to conduct a user study [14] to collect data from users. Participants were asked to set a value for each parameter of the robot behaviors to match a given mood (i.e., a given valence level). A graphical user interface was provided so participants could set the parameter values and play the resulting behavior on a real robot.

One advantage of this approach is that we obtain opinions from ordinary end-users of the robot about what the behaviors should look like when expressing a specific mood, that is, how users think the parameters should be modulated to express a mood. We also expected the user-designed expressions to result in a higher recognition rate. Although expert designers (actors or researchers on human behavior modeling) used in some studies (e.g., [13]) can produce more versatile expressions, these expressions are sometimes not interpreted as intended by laypeople, perhaps because laypeople do not have the same expertise in recognizing behavioral affective cues that experts do. Moreover, in this way we can test whether robot mood can be expressed by parameter modulation at all. More details about the user study setup can be found in [14].

Results showed that participants created different parameter settings for different valence levels. This supports the feasibility of using behavior parameter modulation to express mood. We also found that the spatial extent parameters (hand-height and amplitude), the head vertical position, and the temporal parameter (motion-speed) are the most important parameters. These parameters are “global” features that shape the overall quality of behaviors. Moreover, multiple parameters were found to be interrelated; modulating these parameters in combination provides particular affective cues. More details of the analysis and a discussion of the relations between parameters can be found in [15].

3.4 Bodily mood expression

We consider expression by means of behavior parameter modulation to be particularly suitable for expressing mood. First, the expression extends over time, since it can be used even while a robot is performing tasks, making it suitable for a long-term affective state. Second, the expression does not show a particular action tendency: behaviors are triggered by the task scheduler, not by the affect, and the affect only influences the “style” of the behaviors. Third, the expression relies on the behavioral cues that result from behavior modulation. Compared to the meaning or function of the behavior, the affect in the behavior is more implicit and less intense, and mood is likewise a less intense affective state than emotion. Therefore, we believe that this way of expressing affect is suitable for expressing mood.

3.5 Expressing mood without a context

To validate the modulation principles obtained from the user study [14], we first conducted a recognition experiment in a laboratory setting using the mood expressions resulting from the user study. This was a pure perceptual task without an interaction context. We adopted a paired comparison approach: five mood levels were presented to participants in pairs, and participants were asked to judge which of the two robot behaviors had higher valence and arousal. Paired comparison gave us a more accurate picture of whether participants could distinguish these mood levels, especially adjacent ones. We tested recognition under three conditions: modulating all parameters, only the important parameters, and only the unimportant parameters, as the user study suggested that each parameter contributes differently to the mood expression [15]. Although in our model mood is characterized by valence, we also tested whether the perceived arousal changed with valence. The results showed that valence and arousal can be recognized well as long as the important parameters are modulated; modulating only the unimportant parameters might be useful for expressing weak moods. We also found that speed parameters, repetition, and head-up-down correlate with arousal. Thus, the modulated behaviors display not only the valence of the robot mood but also its arousal. More details about the recognition experiment can be found in [16].

4 Expressing mood in an interaction context

The main contribution of this work is that we investigated mood expression in the context of an actual HRI task. We now describe the task, the gestures used, and the rationale behind our hypotheses.

4.1 Imitation game

The interaction scenario we used in this study is an imitation game, in which the humanoid robot NAO performs a sequence of gestures and a human player is asked to imitate the gestures in the same order. Eight gestures were used to form the sequences in the game. Four are single-arm gestures: the left arm pointing to the robot’s left and upward, the left arm pointing left and downward, the right arm pointing right and upward, and the right arm pointing right and downward (see Fig. 2b). The left and right arm movements were also performed at the same time, resulting in four more gestures: both up, both down, slope left (left up, right down), and slope right (right up, left down). Left and right were mirrored between participants and the robot: for example, when the robot performs a left-arm gesture, the participant should perform a right-arm gesture with the same up or down direction.

Fig. 2 Modulated gestures for the imitation game: a shows the four elementary gestures modulated for a positive mood; b shows the four mirrored elementary gestures for a neutral mood; c shows the slope-right gesture modulated for a negative mood. Pose parameters (amplitude-vertical, amplitude-horizontal, palm-direction, and finger-rigidness) are annotated on the figure

The classification of participants’ gestures into one of the eight types of gestures was done by one of the experimenters. Using this input, the robot system evaluated whether the participant’s gestures correctly replicated its own gestures in the right order and provided feedback by means of speech. The feedback text was selected randomly from a predefined list of sentences, e.g., “Yes, those were the right gestures” for a correct imitation, or “No, those were not the right moves” for an incorrect imitation.
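A minimal sketch of such a correctness check, assuming gestures are encoded by side and direction and that the mirroring rule is applied per gesture (the encoding below is invented for illustration):

```python
# Hypothetical encoding: a gesture is (side, direction); slope gestures use the
# sides "slope_left"/"slope_right" with direction None.
MIRROR = {"left": "right", "right": "left", "both": "both",
          "slope_left": "slope_right", "slope_right": "slope_left"}

def mirror(gesture):
    side, direction = gesture
    return (MIRROR[side], direction)

def imitation_correct(robot_sequence, participant_sequence):
    # Correct if every gesture is the mirrored counterpart of the robot's
    # gesture, in the same order.
    return [mirror(g) for g in robot_sequence] == list(participant_sequence)

robot_seq = [("left", "up"), ("slope_left", None), ("both", "down")]
print(imitation_correct(robot_seq,
                        [("right", "up"), ("slope_right", None), ("both", "down")]))  # True
```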

To make the game more entertaining and keep the human player engaged, the system adaptively changes the difficulty of the gestures to be imitated according to the estimated level of the participant. Each gesture has an associated difficulty rating that has been defined based on studies with the Glicko system [48]. Each participant starts with an average difficulty level. When a participant correctly imitates a gesture, the participant’s level goes up, and the system selects a next gesture with a slightly higher difficulty rating. When a participant incorrectly imitates a gesture, the participant’s level goes down, and the system selects a next gesture with a slightly lower rating. For stability of the participant’s level, in practice the participant has to succeed or fail twice in a row before the level changes (see Fig. 3).
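The selection strategy can be sketched roughly as follows. This is a simplified illustration, not the Glicko-based implementation used in the game; the rating step, thresholds, and item list are invented.

```python
import bisect

def select_item(items, target_rating):
    # items: list of (difficulty_rating, gesture_sequence), sorted by rating.
    # Pick the item whose rating is closest to the participant's current level.
    ratings = [rating for rating, _ in items]
    i = bisect.bisect_left(ratings, target_rating)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(items)]
    return items[min(candidates, key=lambda j: abs(ratings[j] - target_rating))]

def update_level(level, history, step=100):
    # Change the participant's level only after two successes or two failures
    # in a row, which keeps the estimated level stable.
    if history[-2:] == [True, True]:
        return level + step
    if history[-2:] == [False, False]:
        return level - step
    return level

items = [(300, "seq_a"), (900, "seq_b"), (1500, "seq_c"), (2100, "seq_d")]
level, history = 1200, []
for correct in [True, True, False, True]:
    rating, sequence = select_item(items, level)
    history.append(correct)
    level = update_level(level, history)
```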

Fig. 3 Item selection strategy of the imitation game

4.2 Mood expression in the gestures of the imitation game

One goal of this study is to apply and evaluate the mood expression model in a more interactive scenario, as a step towards its application in real-life contexts. To this end, we used the imitation game introduced above. The robot gestures used in this game were adapted using the design principles (Table 1) gained from previous studies [14–16] in order to express robot mood while the robot is playing the game, i.e., performing the various gesture sequences that are to be imitated.

Table 1 Design principles for mood expression

The robot arm movements are the primary relevant movements for the imitation game. Three pose parameters, amplitude, palm-direction, and finger-rigidness, were used for the arm. The amplitude relates to three aspects: vertical extent, horizontal extent, and arm extension; these are controlled individually by the joints shoulder-pitch, shoulder-roll, and elbow-roll (see Fig. 2a). We also used two pose parameters for head movement (see Fig. 2c). Two motion parameters, motion-speed and hold-time, were used to modulate the motion dynamics. Decay-speed was used in [14] to control the speed of movements when the robot actuators return to their initial poses; in this study, we used motion-speed as decay-speed, because decay-speed was found to correlate with motion-speed in [15]. The resulting gestures for positive and negative moods are illustrated in Fig. 2a, c. A video clip of the gestures used in this study, and of gestures modulated by mood on a continuous scale, is available online (see Footnote 1). The concrete modeling of the game gestures can be found in Appendix A.
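As an illustration of how a single mood value could drive the parameters named above, the sketch below linearly interpolates each parameter between a hypothetical negative-mood and positive-mood setting. The concrete numbers and endpoint values are invented; the actual modulation principles are those of Table 1 and Appendix A.

```python
# Hypothetical endpoint settings for each parameter at the most negative and
# the most positive mood; the real settings follow Table 1 / Appendix A.
NEGATIVE = {"amplitude_vertical": 0.2, "amplitude_horizontal": 0.2,
            "palm_direction": 0.0, "finger_rigidness": 0.1,
            "head_up_down": -0.3, "motion_speed": 0.3, "hold_time": 0.8}
POSITIVE = {"amplitude_vertical": 0.9, "amplitude_horizontal": 0.9,
            "palm_direction": 1.0, "finger_rigidness": 0.9,
            "head_up_down": 0.3, "motion_speed": 0.9, "hold_time": 0.2}

def gesture_parameters(valence):
    """Interpolate the game-gesture parameters for a mood valence in [-1, 1]."""
    w = (valence + 1.0) / 2.0  # 0 = most negative, 1 = most positive
    return {name: (1 - w) * NEGATIVE[name] + w * POSITIVE[name] for name in NEGATIVE}

print(gesture_parameters(-1.0))  # negative-mood style of the game gestures
print(gesture_parameters(0.8))   # clearly positive-mood style
```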

4.3 Rationale for studying mood expression during an interaction

Our ultimate goal is to apply robot mood expression to daily human–robot interaction. Unlike in the recognition experiment, in which participants were explicitly asked to recognize the mood from the robot behaviors, in daily interaction people will not be asked to do so. Expression based on behavior modulation is implicit (see Sect. 3.4), and people may not even pay attention to the affective cues in the robot behaviors. However, it is not uncommon for people to spontaneously recognize mood from the behavior of other people. We are thus interested in whether (a) people can spontaneously recognize the robot mood from behavioral cues, and (b) the expression has any (positive) effects on the interaction and on users, more specifically, on the users’ affective states (affective effects) and task performance (cognitive effects).

To answer these questions, we used a gesture-based game in this study and applied the mood expression model to its gestures. Instead of explicitly asking users to recognize mood, we asked them to play a simple imitation game with a robot and to try to achieve a high score. Hence, there is a chance that people will ignore the affective behavioral cues, since they need to focus on the game to achieve a high score.

We briefly discuss here the effects that varying task difficulty (i.e., game difficulty) might be expected to have on the recognition and effects of an expression. For the same task, increasing the difficulty mobilizes more attention and effort. For instance, the difficulty of the imitation game was controlled by manipulating the sequence length and gesture combination. As the difficulty of the gesture sequence goes up, human players focus more attention and effort on remembering the sequence, and thus may pay less attention to the details of the robot behaviors. As a result, they may be less capable of recognizing the robot mood and thus less influenced by it. However, it is known from psychology that cognitive load should not influence the recognition accuracy of emotion [49], and since we aim in the long term at a model that generates robot moods that observers recognize in a similar fashion as moods expressed by humans, it would be good if mood recognition results did not depend on the difficulty of the interaction task. A second reason to study task difficulty is that we want to be able to replicate mood effects on task performance [50–54], as a behavioral measure of mood contagion (in addition to self-reported mood). Thus, we also studied how task difficulty influences people’s perception of the robot mood and how it influences the aforementioned affective and cognitive effects of the mood expression on the interaction.

5 Research questions and hypotheses

As discussed in Sect. 4.3, the main questions addressed in this study are

Q1:

Can participants differentiate between positive and negative robot mood expressed in gestures during an interaction scenario, rather than in a pure recognition task?

Q2:

Can mood expressed by a robot induce mood contagion effects in human observers?

Q3:

Can the mood expression of a robot influence the performance of a human in an interaction task?

As a result, in this study we looked at the effect of robot mood (positive versus negative) and task difficulty (difficult versus easy sequences to imitate) on three constructs: observed robot mood (participant-reported robot valence and arousal), observers’ own mood (self-reported valence and arousal), and task performance (percentage of correctly imitated sequences). We formulated the following hypotheses:

H1:

Participants rate the robot mood as more positive when the robot behavior is modulated to display a positive mood than when it is modulated to display a negative mood. This effect should not depend on the easy versus difficult task conditions.

H2:

Participants’ affective self-reports are more positive in the positive robot mood condition than in the negative robot mood condition.

H3:

Participants’ task performance is better in the negative robot mood condition than in the positive robot mood condition.

The latter hypothesis needs some explanation. If robot mood influences participant mood, then we should be able to observe mood effects on task performance. The imitation game is a detail-oriented game that requires bottom-up attention, because the goal is to watch and repeat robot movements exactly. It is well known that orientation towards details and bottom-up attention are favored in neutral-to-negative mood states, as opposed to the creative, out-of-the-box thinking favored in positive mood states [52–54]. Therefore, if mood contagion happens, we would expect to see higher task performance in the negative mood condition than in the positive mood condition.

6 Experimental setup

6.1 Experimental design

We used a mixed \(2\times 2\) design with game difficulty (easy / difficult) as a between-subject factor and robot mood (positive / negative) as a within-subject factor. Each participant played with the robot in only one game difficulty condition (easy or difficult) and in both robot mood conditions (positive/active and negative/passive) over two sessions. Each session took between 6 and 10 minutes and involved 10 imitations. Game difficulty was manipulated by restricting the gesture sequences that the Glicko rating system could select (see Sect. 4.1): for the easy condition, item ratings ranged from 300 to 1500; for the difficult condition, from 1501 to 2800. Task difficulty thus varied with the length of the sequences and the variation of the gestures within them. Mood was manipulated by modulating the behavioral parameters as explained in Sect. 4.2. Participants were randomly assigned to the two difficulty groups (Table 2), and the order of the mood conditions was counterbalanced. After the two sessions, participants were asked to fill out questionnaires.

Table 2 Experiment conditions and participant groups

6.2 Measures

Both the recognition of the robot mood (H1) and the participants’ affective states (H2) were measured in terms of valence and arousal after the two game sessions, using the Self-Assessment Manikin (SAM) questionnaire [55] on a 9-point Likert scale (see Appendix B). To gain more insight into how participants perceived the robot mood (related to H1), participants were asked to describe how they thought the robot mood related to the behavior parameters listed in Table 1; this question was placed at the end of the questionnaire. Participants’ game performance (H3) was assessed as the percentage of correct imitations in each session (the participant’s score for that session), where correct versus incorrect was a binary judgment made by the Wizard operator as explained above.

6.3 Materials

A Wizard-of-Oz method (Fig. 4) was used in this experiment for the recognition of the participants’ gestures. The operator sat in the room next to the experiment room and could see and hear the participants via a webcam and microphone. His task was to judge the correctness of the participants’ responses: following procedural instructions, he classified every gesture made by a participant as one of the eight gestures the robot displayed. In the event that he could not classify a gesture (usually caused by a participant’s hesitation), he was told to ignore that particular gesture and to check whether the participant’s next gesture was correct. The operator had been trained before the experiment to minimize the chance of mistakes during operation.

Fig. 4 The Wizard-of-Oz setup: the wizard recognized the participant’s gestures and entered them into the system; the system selected the next gesture sequence and the robot generated the mood-modulated gestures automatically

A screen (Fig. 4) was placed on the wall just behind the robot so that participants believed that the “robot” could see their gestures. Participants were told that the screen was used to facilitate the robot’s recognition of gestures, while in fact it showed the operator’s view. A grey NAO robot (NaoQi version 1.14; head version 4.0; body version 3.3) was used with its LED lights switched off. The robot provided oral feedback on the participant’s imitation performance by indicating whether a sequence of gestures performed by the participant correctly reproduced the gestures performed by the robot. The robot accompanied its gestures with speech (e.g., “Left up.” “Both down.”). The robot voice and texts were affectively neutral; that is, phrases such as “Excellent!” or “Very good!” were avoided. The robot (58 cm tall) was placed on a desk (Fig. 4) to ensure that participants could see it by facing it and looking straight ahead.

6.4 Participants

36 students (25 males and 11 females) aged 19 to 41 \(({Mean}\,=\,26.6, {SD}\,=\,4.1)\) were recruited from Delft University of Technology for this experiment. They came from nine different countries, but most of them were Dutch (\(N\,=\,13\)) or Chinese (\(N\,=\,13\)). A pre-experiment questionnaire confirmed that the participants had little expertise in the design of gestures or behaviors for robots or virtual agents. As compensation, each participant received a gift after the experiment.

6.5 Task

Participants were asked to use a thumbs-up gesture to instruct the robot (actually the “Wizard”) to start the game. While the robot was performing gestures, the only task for participants was to watch the robot and remember the sequence; they were asked to repeat the sequence after the robot had finished it. In addition, participants were asked to act slowly to ensure that the robot could recognize their gestures, and they were told that they did not need to mimic the exact movements of the robot, but only to imitate the correct direction (out of four possible directions). They were also asked to hold their hands in front of their belly when not imitating gestures and to make no other gestures, to avoid misrecognition. Participants were encouraged to achieve a high score: they were told beforehand that the winner would receive a prize.

6.6 Procedure

Before the experiment, each participant filled in a demographics form, a general questionnaire about previous experience with robots, and a consent form covering the general information about the experiment. Participants were told that the robot was autonomous (as is common in a Wizard-of-Oz setup). Participants were told to pay attention to the game in general; we did not emphasize mood or behavior, in order to avoid a demand effect (participants rating what they think we want them to feel or see). They were informed that the experiment consisted of two sessions with different experimental conditions.

The robot started the interaction when the participant was ready. After the participant finished an imitation (a sequence of movements), the robot said whether it was correct or not, and the participant’s score was updated in the system but not shown to the participant. The robot then started the next turn and performed the next gesture sequence. Each session contained 10 turns. There was no break between the two sessions, but participants were clearly informed about the session switch.

After the two sessions, the participants filled in the SAM affect self-report (Appendix B) and the post-experiment questionnaires. The experiment took about 30 minutes on average. After the experiment, participants were fully debriefed, and each participant signed a consent form with regard to the video recording.

7 Results

7.1 Manipulation check

Task difficulty was effectively manipulated. The average difficulty rating of the gesture sequences used in the easy condition was 1229 (\(SD\,=\,100\)) and in the difficult condition 1555 (\(SD\,=\,51\)). An independent-samples t test showed that the difference in correctness between the easy (\(Mean\,=\,72\,{\%}, SD\,=\,10\,{\%}\)) and difficult (\(Mean\,=\,33\,{\%}, SD\,=\,18\,{\%}\)) conditions was significant (\(t(34)\,=\,8.121, p\,<\,0.001\)). In addition, we asked participants after the experiment to rate on a 5-point Likert scale (\(-2\) to 2) to what extent they thought the game was challenging. Participants in the difficult-game group considered the game more challenging than those in the easy-game group (\(t(34)\,=\,2.428, p\,<\,0.05\)).
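A manipulation check of this kind can be computed directly from the per-participant proportions of correct imitations. The sketch below uses SciPy with simulated data matching the reported group means and is not the original analysis script.

```python
import numpy as np
from scipy import stats

# Simulated per-participant proportions of correct imitations (18 per group);
# the actual data are summarized in the text above.
rng = np.random.default_rng(0)
easy = np.clip(rng.normal(0.72, 0.10, 18), 0, 1)
difficult = np.clip(rng.normal(0.33, 0.18, 18), 0, 1)

# Independent-samples t test for the difficulty manipulation check.
t_stat, p_value = stats.ttest_ind(easy, difficult)
print(f"t({len(easy) + len(difficult) - 2}) = {t_stat:.3f}, p = {p_value:.4f}")
```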

7.2 Participants consistently differentiate between positive and negative robot mood

Participants were able to distinguish between positive and negative robot mood, and this distinction was consistent across the two task difficulty conditions, as evidenced by a mixed (doubly) MANOVA with robot mood and difficulty as independent factors and perceived valence and arousal of the robot mood as dependent variables. This analysis (see Fig. 5) shows that robot mood had a significant effect on participants’ perception of the robot mood: \(F(2,33)\,=\,23.597\), \(p\,<\,0.001, \eta ^{2}\,=\,0.588\). The perceived valence and arousal were significantly different between the positive and negative conditions: \(F(2,33)\,=\,27.008, p\,<\,0.001, \eta ^{2}\,=\,0.443\) for valence; \(F(2,33)\,=\,44.222, p\,<\,0.001, \eta ^{2}\,=\,0.565\) for arousal. In addition, task difficulty did not significantly influence mood perception (\(F(2,33)\,=\,1.589, p\,=\,0.219, \eta ^{2}\,=\,0.088\)). These results directly support our first hypothesis (H1). Moreover, participants rated the positive robot mood as positive (one-sample t test on the valence measure, \(t(35)\,=\,8.620, p\,<\,0.001\)) and as active during the interaction (one-sample t test on arousal, \(t(35)\,=\,8.544, p\,<\,0.001\)), and rated the negative robot mood as passive (one-sample t test on arousal, \(t(35)\,=\,-2.086, p\,<\,0.05\)) but not significantly more negative than neutral (\(t(35)\,=\,-0.435, p\,=\,0.666\)). This further supports our first hypothesis (H1), as it shows that the arousal manipulation was in the right direction for both the positive and negative conditions, and that the valence of the positive mood was perceived as more positive than neutral.

Fig. 5 The participants’ perceived valence and arousal of the robot mood during the interaction

7.3 Participants’ mood depends on robot mood

Participants’ affective states were influenced by the robot mood in the expected directions, supporting our second hypothesis (H2) that robot mood has a contagion effect on human observers. A mixed (doubly) MANOVA with robot mood and difficulty as independent factors and self-reported participant mood valence and arousal as dependent variables showed that both mood (\(F(2,33)\,=\,8.379, p\,=\,0.011, \eta ^{2}\,=\,0.337\)) and task difficulty (\(F(2,33)\,=\,4.397, p\,<\,0.05, \eta ^{2}\,=\,0.210\)) influenced participants’ self-reported mood. Post hoc analyses without adjustments showed that participant arousal (\(F(1,17)\,=\,20.302, p\,<\,0.001, \eta ^{2}\,=\,0.544\)) and participant valence (\(F(1,17)\,=\,10.000, p\,<\,0.01, \eta ^{2}\,=\,0.370\)) were significantly influenced in the easy task condition, but not in the difficult task condition (see Fig. 6). This suggests that we were able to measure mood contagion effects with self-reported mood only for the easy task. In the difficult task, no contagion effect seems to be present.

Fig. 6 The participants’ affective states

Post hoc tests of the game difficulty factor without adjustments showed that in the positive robot mood condition participants’ valence was significantly higher in the easy game than in the difficult game (\(t\,=\,4.049, p\,<\,0.0005\)), while arousal approached significance (\(t\,=\,1.809, p\,=\,0.079\)). Moreover, correlations were observed between the perceived valence of the robot mood and the valence of the participants’ moods: \(r\,=\,0.418, p\,=\,0.011\) in the negative condition and \(r\,=\,0.520, p\,=\,0.0012\) in the positive condition. The perceived arousal of the robot mood was also found to correlate with the arousal of the participants’ moods: \(r\,=\,0.335, p\,<\,0.05\).

7.4 Task performance depends on robot mood

Participants’ game performance was influenced by the robot mood (H3). A mixed ANOVA showed that participants’ scores (percentage of correct imitations) differed significantly between the robot mood conditions (\(F(1, 34)\,=\,7.335, p\,=\,0.011, \eta ^{2}\,=\,0.177\)). Post hoc tests without adjustments showed that participants’ scores differed significantly between the robot mood conditions only in the difficult game condition (\(F(1,17)\,=\,6.608, p\,<\, 0.05, \eta ^{2}\,=\,0.280\)), not in the easy game condition (see Fig. 7). The direction of the mood effect on task performance is exactly as one would expect based on psychological findings [52–54]: a neutral-to-negative mood state favors orientation towards details and bottom-up attention, as opposed to a positive mood state, and this type of processing is needed to perform well on the imitation task.

Fig. 7 The participants’ game performance

7.5 Qualitative analysis of perceived affective behavioral cues

To investigate exactly which affective behavioral cues participants perceived, we asked at the end of the post-experiment questionnaire how they recognized the robot’s mood in general and what, according to them, the relations are between the robot mood and the following behavioral features (parameters): amplitude, palm direction, finger straightness, motion speed, hold time, head-up-down, and head-left-right. Participants were allowed to leave a particular behavioral feature blank if they did not notice a relation with robot mood, and to fill in “not related” if they considered that a particular feature did not contribute to the robot mood. The number of participants who left a comment, the frequency of “not related” answers, and the extracted adjective keywords are summarized in Table 3.

Table 3 Perceived affective behavioral cues from behavior parameters

The results show that the most noticeable behavior parameters related to robot mood are motion speed, amplitude, and head-up-down, while parameters like head-left-right, finger-straightness, and palm direction are less noticeable, although they still make a weaker contribution to the expression. We took the number of participants leaving a comment as an indicator of a parameter’s importance for mood display. This is generally consistent with our previous findings regarding parameter importance [15, 16]: motion speed and amplitude are “global” parameters that change the overall quality of the behavior, whereas finger-straightness and palm direction are “local” parameters that change the behavior quality of only a small area of the body. This result suggests that participants’ perception of the affective behavioral cues was not influenced (at least not much) by the interaction task.

Moreover, the parameters hold-time and head-left-right became more important in this scenario, compared to our previous findings [15, 16]. Our explanation is that the hold-time changed the overall dynamics of the gesture sequence: although a single gesture of the imitation game contains only one stroke, gestures are displayed in sequences, so the effect of the hold-time on the fluency or smoothness of the sequence is more noticeable. With regard to the head-left-right, participants commented that more movement made the head display more affective cues. In previous studies, the head only turned to a certain direction and then held that position until the end of a behavior. In contrast, in this scenario the head continuously turned in the direction in which the arm was moving when the robot displayed a positive mood. As a result, the head performed more movement and thus displayed more affective cues.

From the comments about the relations between the parameters and valence and arousal, we gain insight into how participants interpreted the affective behavioral cues. We separated the adjectives that participants used into valence-oriented words (a large absolute valence value but a smaller absolute arousal value) and arousal-oriented words (a large absolute arousal value but a smaller absolute valence value) according to the word distribution in Russell’s circumplex affect space [56]. Based on the number of valence-oriented or arousal-oriented words (Fig. 8) used to describe a parameter, we determined whether that parameter is more likely to be perceived as showing valence or arousal.
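One simple way to operationalize this classification is sketched below; the word coordinates are illustrative placeholders rather than the actual values from Russell’s circumplex [56].

```python
# Illustrative (made-up) circumplex coordinates: (valence, arousal) in [-1, 1].
WORD_COORDS = {
    "cheerful": (0.8, 0.4),
    "sluggish": (-0.3, -0.8),
    "excited": (0.3, 0.9),
    "gloomy": (-0.8, -0.3),
}

def orientation(word):
    valence, arousal = WORD_COORDS[word]
    return "valence-oriented" if abs(valence) > abs(arousal) else "arousal-oriented"

for word in WORD_COORDS:
    print(word, orientation(word))
```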

Fig. 8 Number of adjectives that participants used to describe the relations between parameters and valence and arousal

The motion speed seems to have strong relations to both valence and arousal, and so does the amplitude: the motion-speed contributes slightly more to the display of arousal, and the amplitude contributes more to the display of valence. These results are consistent with the findings in [23, 57]: fast speed and large spatial amplitude usually show positive valence, while slow speed and small spatial amplitude usually show negative valence. The result for motion speed also confirms the findings in [58–60]: varying movement speed influences the recognition of emotion intensity. The head-up-down seems to contribute mainly to the display of valence, since most participants commented on it using valence-oriented words. This confirms the findings in [61] that head position plays an important role in displaying valence and arousal. The hold-time influences the fluency of the movement and hence the perceived speed of the movement; thus, the hold-time contributes mainly to the display of arousal. There are two interpretations of the head-left-right: when it is interpreted as a posture, e.g., looking at the moving arm or not, or looking at the participant or not, it is perceived as displaying valence; when it is interpreted as head movement, it increases the movement intensity or the overall activation of the behavior, and is thus perceived as displaying arousal instead. The finger-straightness was perceived as showing arousal, since this parameter controls the finger stiffness and shows the force of the finger. The palm-direction was described using only valence-oriented words.

In sum, parameters like motion-speed and hold-time that control the dynamics of a behavior, parameters like finger-straightness that convey the force or stiffness of a body part, and parameters like head-left-right (movement interpretation) that change the overall intensity of movement are usually interpreted as showing arousal. Parameters like amplitude, head-up-down, finger-straightness, and head-left-right (posture interpretation) that control the posture and spatial extent of a behavior are usually interpreted as showing valence. These results are generally consistent with our previous findings [16], except that previously the head-up-down was also found to correlate with arousal to a large extent. In addition to our previous findings, the amplitude was perceived to correlate with arousal to a certain extent in this study.

8 Discussion

First and foremost, this study showed that our model for bodily mood expression of a humanoid robot successfully generalized to the behaviors needed in the imitation game: we applied the parameter modulation principles obtained in [14] directly to the imitation gestures (see Sect. 4.2), and the results show that participants distinguished between positive and negative robot mood, even when faced with a high task load. Moreover, the recognition of valence and arousal is consistent with the findings in [16]: modulating these behavior parameters varied both valence and arousal in the same direction. We would like to stress that this is an important contribution to the ability of appearance-constrained robots lacking facial expression capabilities to express affective signals. Furthermore, it is an important step towards the expression of affect during task execution of a robot, something humans do automatically (e.g., walking in a sad, happy, or angry way looks very different).

Our aim in this study has been to use bodily mood expression that does not interfere with the behavioral functions of body movements and to study the effects of mood expression. This has been achieved by using a parameterized behavior model, but it does not necessarily mean that no effects besides the mood expression have been introduced into the interaction scenario. More specifically, effects on the game itself may have been introduced: mood expression potentially influenced game difficulty. For example, one participant reported that the use of head movements for expressing mood distracted attention and thus made it more difficult to remember the exact sequence. Another participant reported that the slow speed of the gestures in the negative mood condition increased the duration of the sequence and, consequently, the time over which the sequence had to be remembered. On the other hand, slower movement may also make the gestures easier to remember. Because mood and difficulty level are not entirely independent factors, we cannot fully rule out the possibility that the performance difference within a difficulty condition was partly caused by the slight variation in game difficulty introduced by the gesture modulation. Formally, then, it is unclear whether the performance difference between mood conditions on the difficult task is influenced only by the induced mood. To obtain a more reliable conclusion, further study is needed to investigate the effects of the participants’ mood and the game difficulty on game performance separately. To be able to claim that mood contagion happened and that the effect on performance is due to the mood, a follow-up priming study should be done in which participants are mood primed using prior robot gestures as primes (with a manipulation check afterwards), after which they perform a task at two difficulty levels.

We asked participants to report their own mood only after the two sessions, because we wanted to avoid introducing a demand effect in the second session. This may have influenced the self-reported mood, because of mood decay effects or because of the different robot mood in the second session. In a mixed (doubly) MANOVA we found a significant interaction effect between mood condition and mood order on self-reported valence and arousal (\(F(2,33)\,=\,3.507, p\,<\,0.05, \eta ^{2}\,=\,0.175\)), primarily caused by a decay in self-reported arousal for the mood condition that was presented first. This indicates that the presentation of the second session indeed diminished the self-reported contagion effect of the first session.
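For readers who wish to reproduce this kind of analysis, the sketch below shows one way to test the mood × order interaction on the two self-report measures. It is a minimal illustration, not our original analysis script: for a 2 (mood, within-subject) × 2 (order, between-subject) design, the interaction can be tested as a between-group MANOVA on the within-subject difference scores. The file name, column names, and condition labels ("positive", "negative") are assumptions about the data layout.

```python
# Illustrative sketch (assumed data layout, not the original analysis):
# test the mood x order interaction on (valence, arousal) via a MANOVA on
# within-subject difference scores.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Assumed long format: one row per subject per mood condition,
# with columns: subject, mood, order, valence, arousal.
df = pd.read_csv("self_reported_mood.csv")

# Pivot to wide format and compute positive-minus-negative difference scores.
wide = df.pivot(index="subject", columns="mood", values=["valence", "arousal"])
d_val = wide[("valence", "positive")] - wide[("valence", "negative")]
d_aro = wide[("arousal", "positive")] - wide[("arousal", "negative")]
order = df.groupby("subject")["order"].first()   # between-subject factor

diff = pd.DataFrame({"d_valence": d_val, "d_arousal": d_aro, "order": order})

# The multivariate effect of 'order' on the difference scores corresponds to
# the mood x order interaction in the mixed design.
print(MANOVA.from_formula("d_valence + d_arousal ~ order", data=diff).mv_test())
```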

The results on the perceived behavior cues in Sect. 7.5 indicate that the participants consciously recognized the robot mood. Although some parameters were more noticeable than others, every parameter received attention, which means that modulating these parameters did change the perceived quality of the robot's movement. The results also help us to identify the role of each parameter in the mood expression in terms of showing valence or arousal, which will help us to improve our behavior model. That is, it may be possible to use arousal as a second variable in our model to control the modulation of the parameters. Additional work is needed to establish the modulation principles once arousal is introduced into the control mechanism of our model.
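As an illustration of that direction, the sketch below extends the modulation to two input dimensions, with per-parameter weights for valence and arousal. The baselines and weights are hypothetical placeholders chosen to mirror the cue roles reported in Sect. 7.5; they are not derived from our data or implemented in the current model.

```python
# Illustrative sketch (not part of the current model): drive each behavior
# parameter from both valence and arousal, weighting the two dimensions per
# parameter. All baselines and weights below are hypothetical placeholders.

# Per-parameter (baseline, valence_weight, arousal_weight).
PARAMS_2D = {
    "motion_speed":        (0.7, 0.0, 0.3),    # mainly arousal-driven
    "hold_time":           (1.0, 0.0, -0.5),   # shorter holds with higher arousal
    "amplitude":           (0.8, 0.15, 0.05),  # mainly valence, some arousal
    "finger_straightness": (0.6, 0.2, 0.2),    # interpreted as both
    "head_up_down":        (0.0, 0.2, 0.0),    # mainly valence (posture)
}

def modulate_2d(valence: float, arousal: float) -> dict:
    """Combine valence and arousal linearly per parameter (illustrative only)."""
    clamp = lambda x: max(-1.0, min(1.0, x))
    v, a = clamp(valence), clamp(arousal)
    return {name: base + wv * v + wa * a
            for name, (base, wv, wa) in PARAMS_2D.items()}
```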

Participants' assessment of the robot mood is a comprehensive affective appraisal over all aspects on display, including the robot's body movements, its speech, game events, etc. In line with this, the attribution of a mood was explained differently by different participants even though only body language was varied between the two sessions (see Sect. 6 for the experimental setup). Some participants thought the robot mood changed because of their performance within a session; for example, one participant said "the robot's mood was negative because I always made mistakes." Additional evidence that the robot mood was consciously recognized is that one participant indicated the robot was happy because it did not display a negative mood even when she made many mistakes, whereas another participant indicated the robot was not so happy because it did not praise and encourage him when he made a correct imitation. Some participants also said they recognized the mood from the robot's voice, even though no changes were made to the voice between the two sessions. This again indicates that participants were consciously aware that the robot mood changed. In addition, participants could interpret the same behavior parameter differently. For example, the head-left-right movement can be interpreted either as looking away (thus showing a negative mood) or as following the arm movement (thus showing more excitement). This variation in interpretation may depend on people's personality, their own behavioral habits, or the scenes they imagine.

In this study, the bodily expression of robot mood produced a contagion effect on the participants: 1) explicitly, participants' self-reported valence and arousal were significantly influenced by the robot mood in the easy game condition; and 2) implicitly, participants' game performance was significantly influenced by the robot mood in the difficult game condition, suggesting that participants' actual mood may have been influenced by the robot mood during task execution even though they did not report it afterwards. We have no clear explanation for the absence of an influence on self-reported mood in the difficult condition, apart from the following two possibilities. Tsai et al. [45] proposed that the contagion effect of a still image of a virtual character was hindered when decision-making occupied cognitive resources; it could be that in our study self-reported mood was similarly hindered by cognitive load. An alternative explanation is that participants' mood in the difficult task was more negative by default because the task was difficult. The fact that this negative mood was not rated even more negatively could then be due to a floor effect, since one does not typically get into a very bad mood because of a game in an experiment; hence, no effect of negative mood induction by the robot was measured. The same kind of explanation would hold for why we did not find an effect of robot mood on participants' task performance in the easy task: here we probably had a ceiling effect, because the easy imitation game is so easy that participants could perform almost perfectly regardless of their own mood. Finally, we cannot completely rule out alternative explanations for our findings, e.g., that participants were entertained more in the positive condition and for this reason somehow performed worse. Even so, such explanations would still suggest that some kind of mood transfer took place.

We used an imitation task in this study: participants were asked to reproduce sequences of arm movements made by a robot, and the robot's arm movements expressed different moods in the two conditions. Although participants were not asked to reproduce the exact "moody" movements, the recorded video shows that some participants still mimicked the movements to some extent. There is evidence that expressing nonverbal behavior associated with affective communication can cause the experience of the corresponding affect [62–64]. Moreover, the "motor mimicry" theory states that people catch others' feelings by unintentionally imitating others' expressions [17, 65, 66]. Thus, the imitation game context of our study may have enhanced the mood contagion. We believe, however, that mood contagion would also have occurred if the participants had not imitated the movements; that is, imitation is only part of the causal chain of mood contagion, not the main factor, and it merely enhanced the contagion. It remains an important question for future work to verify whether the mood contagion effect observed in this study generalizes to scenarios in which users do not perform actions that are directly related to the robot's body language.

We recorded video of each participant during the game. The videos were meant to be analyzed for more objective evidence of mood contagion. We ran a pilot of the video annotation in which two coders performed event-based annotation on the videos. No significant results were found, because the participants' body actions and facial expressions did not provide enough cues to allow for an interpretation of their emotions or moods. One explanation for the lack of cues may be that participants were instructed not to make extra movements (to avoid misrecognition of their gestures), so the expressivity of their body movements was somewhat constrained. Facial expressions also did not vary much; the only evident facial expression in the videos was smiling. Participants mostly smiled when they made mistakes, but it remains difficult to interpret the relation between these smiles and the robot's expression.

9 Conclusion and future work

This study shows that it is feasible to use parameterized behavior to express a robot's mood in an actual HRI scenario. The results show that participants are clearly able to distinguish between positive and negative robot mood and are able to recognize the parameters we manipulated during the interaction. The importance of each parameter appears to be consistent with previous results in [15]. Our results also suggest that mood contagion takes place between the robot and the human. We have evidence for this contagion effect in two forms: 1) participants' self-reported mood matched the robot mood, and 2) participants' task performance was lower in the positive robot mood condition than in the negative robot mood condition, replicating a well-known mood-contagion effect.

To the best of our knowledge, this study is one of very few in which a robot mood expressed through bodily expression was clearly distinguished by participants and had an effect on them, which we interpret as mood contagion. Our study is unique in that a) robot mood expression was evaluated and investigated in a real HRI scenario, b) mood expression was realized by integrating robot body language into the functional behaviors required by a task, and c) participants were not primed to pay attention to any form of affective expression.

Our work provides an alternative way of expressing affect through robot body movement. The study presented in this paper shows the effectiveness of modulation-based expression in terms of recognition and influence on users, indicating that our model generalizes successfully to the imitation game behaviors. We believe that our behavior model can be applied to a wide range of applications, since modulation-based expression interferes less with functional behaviors than expressions based on additional body actions. One of our long-term goals is to apply the model to more behaviors that are frequently used in HRI. One of our studies in this direction has focused on the design and evaluation of the behaviors of a robotic tutor [67], where our model has been applied to the tutor's co-verbal gestures and to its idle movements. As discussed above, the imitation of movements may contribute to mood contagion; in the robotic tutor scenario, students do not imitate the robot's gestures, so it is important to examine whether mood contagion still occurs there.

Moreover, we believe that our work contributes not only to the field of robotics but also to the field of virtual agents. For virtual agents, most work on affective expression based on behavior modulation concerns the communicative gestures of conversational agents (e.g., [20, 68]). Our method is similar to existing parameter-based approaches in constructing communicative gestures; a difference is that our model goes a step further in modelling the poses related to behavior functions for more complex behaviors such as waving (see Appendix A). There are also scenarios in which virtual agents perform body actions that are constrained by functional requirements and by the dimensions of the virtual environment; for example, virtual agents in training systems need to demonstrate standard operations (e.g., [69–71]). Our model can be used to parameterize these behaviors for modulation-based expression while also modelling their functional and spatial constraints. Moreover, our work shows mood contagion between a robot and a human via affective body language. This supports the view that affective body language can produce a mood contagion effect between agents and humans in general, and thus suggests that body language can also induce mood contagion between virtual agents and humans.

In this experiment, the robot mood condition was designed as a within-subject factor and presented in successive sessions. Participants were therefore able to compare the differences in the robot's behavior between the sessions. This differs from genuine recognition, which requires people to identify the robot mood without a reference. One way to test whether people can actually "recognize" robot mood is to treat the independent variable (i.e., the robot mood) as a between-subject factor and ask people to rate the robot mood on scales (i.e., assigning values for valence and arousal). This is challenging in itself, since humans are not good at absolute scaling and may not give accurate ratings. Nevertheless, this study is a first step toward the "recognition" of robot mood from its behavior.

An interesting topic would be to make the mood expression a response to the human player's task performance. Put differently, the robot would change its mood according to whether the human player imitates correctly or not. In this way, the functions of bodily mood expression in HRI can be explored. For example, we could test an empathy effect by comparing the robot displaying a positive mood versus a negative mood in response to an incorrect imitation by the human player. Moreover, we expect the effect of the bodily mood expression on the interaction to be strengthened, since mood can then be expressed through behavioral cues more often or even continuously. Another example is to use the mood expression as an indicator of the stage of goal achievement in a learning-by-demonstration scenario in which humans teach a robot to do things; it would be interesting to see whether mood expressed simultaneously during the task makes the learning more efficient.
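As a minimal illustration of such a responsive mood, the sketch below updates a valence value as an exponential moving average of the player's recent imitation outcomes. The update rule, the `alpha` smoothing factor, and the class name are hypothetical design choices, not part of the study.

```python
# Illustrative sketch of a possible extension (not implemented in this study):
# track the robot's mood valence as an exponential moving average of the
# player's imitation outcomes, then re-modulate the next gesture accordingly.
class ResponsiveMood:
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha      # how quickly mood tracks recent performance
        self.valence = 0.0      # neutral starting mood in [-1, 1]

    def update(self, imitation_correct: bool) -> float:
        """Nudge valence towards +1 after a correct imitation, -1 after a mistake."""
        target = 1.0 if imitation_correct else -1.0
        self.valence += self.alpha * (target - self.valence)
        return self.valence

# Possible usage: after each round, feed the outcome into update() and
# re-parameterize the next gesture with a valence-to-parameter mapping like
# the one sketched earlier.
```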

One additional interesting finding in our study is that participants attributed the robot mood to various factors that were not manipulated. In a complex interaction scenario such as the imitation game, participants may believe that the affective state of a robot is shaped by the events that happen during the game, by the objects present in the interaction scenario, or, for example, by the (performance of) participants themselves. It would be interesting to explore this conscious attribution of mood and its causes to a robot in more detail in future work. Moreover, when other modalities of expression are used alongside modulation-based expression, it is interesting to study how the modalities interact. For example, a robot may change its tone to express mood while talking and may also perform co-verbal gestures at the same time; an interesting question is whether modulated co-verbal gestures (for expressing mood) can enhance the overall mood expression alongside the vocal expression. It has been shown in [6, 8] that body-action-based emotion expression can significantly enhance the recognition of a robot's emotion when combined with facial expression; it is also interesting to test whether modulation-based expressions can produce a similar enhancement.

Finally, whether expression is universal or culturally specific is another important question. Culture may influence the recognition of affect expression. Ekman demonstrated the existence of universal facial expressions [72]. For body language, Kleinsmith et al. showed that cultural differences exist in the recognition of affect from body postures [73], while many studies have also found universal aspects of bodily expression (see [74] for an overview). Cultural differences in recognition may also influence the contagion process, and they may do so indirectly as well. For example, there is evidence that cultural background significantly influences attitudes towards interaction with robots, including attitudes towards emotions in such interaction [75], and it has been shown that attitudes influence the emotional contagion process [76]. Perhaps attitudes similarly affect the mood contagion process between humans and robots. Thus, it is important and interesting to validate our mood expression cross-culturally. Taking a step further, it would be useful to identify which parameters can be modulated to produce universal mood expression or, just as important, culturally specific mood expression.