1 Introduction

This paper reports on a study of how humans convey emotions via touch to a social humanoid robot, in this case the Nao robot.Footnote 1 As a foundation for this study, we have, as far as possible, replicated a human–human interaction (HHI) experiment conducted by Hertenstein et al. [25] so as to validate our work within the context of natural tactile interaction. The purpose of our work is twofold: (a) systems design: to inform future development of tactile sensors, concerning where they need to be located, what they should be able to sense, and how human touch should be interpreted in terms of emotions, and (b) furthering scientific understanding of affective tactile interaction: to draw conclusions regarding whether theories and findings of emotional touch behaviours in HHI research can be transferred to the field of human–robot interaction (HRI) and vice versa. In addition, potential gender differences were investigated. Previous studies show that gender, in terms of human gender, robot gender, computer voice gender, and gender typicality of tasks, can have an influence on the human experience and perception of the interaction as well as on human behaviour [32, 38, 40, 41, 45]. The application of robots in different social domains, e.g. for teaching, companionship, and assistive living, may predominantly affect one gender or the other and, therefore, we consider analysis of affective tactile interaction between genders to be of critical importance to HRI.

1.1 Socially Interactive Robots

Socially interactive robots are expected to have an increasing importance for a growing number of people in the coming years. Robotics technology is increasingly being applied in commercial products [44], but for socially interactive robots to be accepted as part of our everyday life, it is critical for them to have the capability of recognizing people’s social behaviours and responding appropriately. The diversity of applications for socially interactive robots includes military (cf. [11]), service-based (cf. [7, 27]), assistive/health care-based (cf. [10]), industry-based (cf. [26]) and robotic companion-based (e.g. [18, 29]) domains. Common to all domains is the need for communication between human and robot: “[Human–Robot] [i]nteraction, by definition, requires communication between robots and humans” [23]. Communication can take the form of verbal linguistic, verbal non-linguistic, or non-verbal communication (see [24]). Critical to naturalistic interaction is the role of affect and the ability of the inter-actor to perceive affective states (including intentionality) in the other.

The non-verbal communication domain of affective touch, as fundamental in human communication and crucial for human bonding [21, 31], is typically expressed in the interaction between humans and social robots (see e.g. [13, 50]) and should therefore be considered important for the realization of a meaningful and intuitive interaction between human beings and robots. Social robots, designed to socially interact with human beings, need to act in relation to social and emotional aspects of human life, and be able to sense and react to social cues [19]. As interaction between humans and robots has become more complex, there has been an increased interest in developing robots with human-like features and qualities that enable interaction with humans to be more intuitive and meaningful [17, 46, 49]. Touch, as one of the most fundamental aspects of human social interaction [37], has started to receive interest in HRI research (for an overview of this work see e.g., [16, 47]) and it has been argued that enabling robots to “feel”, “understand”, and respond to touch in accordance with expectations of the human would enable a more intuitive interaction between humans and robots [47]. To the present date, the work regarding the modality of touch in HRI has mainly revolved around the development of tactile sensors for robotics applications (e.g., [36]). Generally, these sensors measure various contact parameters and enable the robot to make physical contact with objects and provide information such as slip detection and estimation of contact force [16, 47]. However, studies on how people interact with robots via touch are still to a large degree absent from the literature, especially in terms of affective interaction.

1.2 Touch and Social Human–Robot Interaction

Concerning the role of touch as a means for social interaction between humans and robots, several studies have revealed that people seek to interact with robots through touch and spontaneous exhibitions of affective touch such as hugging (see the Telenoid of [43]) or stroking (see Kismet of [9, 50]). This implies that physical touch plays an important role also in human–robot interaction. Lee et al. [33] show that physically embodied robots are evaluated as having a greater social presence, i.e., a simulation of intelligence successful enough for the human not to notice the artificiality, than disembodied (i.e. simulated) social robots. However, when participants were prohibited from touching the physically embodied robot, they evaluated the interaction and the robot’s social presence more negatively than when they were allowed to interact with the robot via touch. This suggests that physical embodiment alone does not cause a positive effect in the human inter-actor and that tactile communication is essential for a successful social interaction between humans and robots [33]. Clearly, the fundamental role of tactile interaction in interpersonal relationships goes beyond HHI and extends also to HRI.

Some attempts to increase the knowledge about how people touch robots have been made. For example, Yohanan and MacLean [54, 55] developed the Haptic Creature, an animal shaped robot with full body sensing and equipped with an accelerometer, which allow the robot to sense when it is being touched and moved. Yohanan and MacLean studied which touch gestures, from a touch dictionary, participants rated as likely to be used when communicating nine specific emotions to the Haptic Creature [55]. Regarding the humanoid robot form, Cooney et al. [12] studied how people touch humanoid robot mock-ups (mannequins) when conveying positive feelings of love and devotion and identified twenty typical touch gestures. Hugging, stroking, and pressing were rated by the participants as the most affectionate; patting, checking, and controlling were neutral touch gestures; hitting and distancing were considered unaffectionate. Focus here was on classification of affectionate gesture types rather than the encoding of specific positive and negative emotions. Typically, such HRI-relevant studies have not been compared to human–human empirical set-ups and findings such as the Hertenstein et al. [25] study mentioned above.

In general, aside from a few exceptions such as those mentioned above, affective touch in HRI is an understudied area. The fundamental role of touch for human bonding and social interaction suggests that people will similarly seek to show affection through touch when interacting with robots, especially social robots designed for social–human interaction. Improving the understanding of the mechanisms of affective touch in HRI, i.e., where and how people want to touch robots, may shorten the communicative distance between humans and robots. It may also have implications for the design of future human–robot interaction applications.

The field of HHI research is an appealing source of information for the development of tactile sensors for affective HRI. It has been suggested that valuable contributions to the field of HRI research can be derived from interaction studies in Ethology, Psychology, and the social sciences [17]. For example, App et al. [1] show that people tend to prefer tactile interaction over other modalities when communicating intimate emotions critical for social bonding. Gallace and Spence [22] argue, based on their review of interpersonal touch, that tactile interaction provides an effective way of influencing people’s social behaviors, for example, increasing their tendency to comply with requests. Studies like these are potentially very informative for the design of social robots, both in terms of how users may communicate social information to the robot, and in providing input on how robots may act towards human users in order to increase positive user experience. However, to effectively draw from studies on HHI, we need to understand to what extent such theories apply to interactive situations in which one of the parties is a robot instead of a human.

In order to test to what extent results from studies on interpersonal tactile communication generalize to HRI, we take inspiration from the HHI study of tactile interaction conducted by Hertenstein et al. [25]. In that study, participants were paired into dyads and assigned either the role of emotion encoder or decoder. The encoder was instructed to convey eight emotions (anger, fear, happiness, sadness, disgust, love, gratitude, and sympathy), one by one, via touch to a blindfolded decoder. The emotion words were displayed serially to the encoder, who was asked to make physical contact with the decoder’s body using any type of touch he or she considered appropriate for communicating the emotion. Duration, location, type of touch, and intensity were recorded. After each tactile interaction, the decoder was asked to choose from a forced-choice response sheet which emotion was being communicated.

As with Hertenstein et al. [25], the work reported here is exploratory in nature with regard to gender differences. However, as alluded to above, gender differences are found in a number of areas in human–robot (and machine) interaction and on that basis we hypothesize that there will be differences in some aspects of performance between males and females, though we do not make specific predictions concerning either the direction of the differences or the specific aspects in which differences may lie.

The results of the Hertenstein et al. study showed systematic differences in where and how the emotions were communicated, i.e., which touch locations and types of touch were used for the different emotions. The main result was that all eight emotions were decoded at greater than chance levels and without significant levels of confusion with other emotions (for further details, see [25]).

The remainder of the paper is organised as follows: Sect. 2 describes the methodology of the experiment. In Sect. 3, the analysis and results are reported. Section 4 provides a discussion of the research results, making explicit reference and comparison to the work of Hertenstein et al. [25] and concludes by outlining projected future work.

2 Method

2.1 Participants

The sample comprised sixty-four participants (32 men and 32 women), recruited via fliers and mailing lists, from the University of Skövde in Sweden. The majority of the participants were undergraduate students in the age range of 20–30 years. Each participant was compensated with a movie ticket for their participation. No participant reported having previous experience of interacting with a Nao robot.

Participants were randomly assigned to one of two conditions that concerned the robot wearing, or not, tight-fitting textile garments over different body parts. Gender was balanced across the two groups (16 males and 16 females in each condition). The results concerning the analysis of the effects of the robot wearing (or not) the textile garments are to be reported elsewhere [34]. The use of Nao in a clothed interface is here considered a controlled variable since the effects of interacting with a ‘naked’ versus an ‘attired’ robot are not clear or well documented in the HRI literature. In this paper, we pooled the 16 male and 16 female subjects of the clothed and non-clothed conditions into groups based solely on gender. This was done to enable a comparison with the study presented by Hertenstein et al. [25], in which gender differences in the communication of emotions were compared.

2.2 Procedure and Materials

Methodologically, we replicated the experiment conducted by Hertenstein et al. [25] in relation to encoder instructions and overall task (see the description of Hertenstein’s work in Sect. 1.2). Instead of pairing the participants into dyads, the ‘decoders’ were replaced with the Nao robot. Because the robot was unable to decode the emotions, no decoding of the conveyed emotions took place during the experiment.

Fig. 1 Experimental set-up where the participant interacts with the Nao in the Usability Lab. This participant interacts with the Nao by touching left and right arms to convey a particular emotion. Camera shots are displayed and analyzed using the ELAN annotation tool

For each participant, the entire procedure took approximately 30 min to complete and took place in the Usability Lab. The lab consists of a medium-sized testing room furnished as a small apartment and outfitted with three video cameras, a one-way observation glass and an adjacent control room. The Lab, and the experimental set-up, is displayed in Fig. 1. The control room is outfitted with video recording and editing equipment and allows researchers to unobtrusively observe participants during studies. The participants entered the testing room to find the robot standing on a table. Nao is one of the most common robotic platforms used in research and education and was thus considered an appropriate model on which to focus our human–robot interaction study. During the experiment, the robot was running in “autonomous life” mode, a built-in application of the Nao robot designed to simulate life-like behavior. We considered this more naturalistic setting preferable to a motionless Nao (switched off) for promoting interaction. As a result, the robot at times turned its head, giving the impression of establishing eye contact with the human participant, and also showed slow micro-motions including simulated breathing and some arm motion. The robot did not, however, move around freely, and all joints were configured with high stiffness, meaning that the participant could only induce minor movement of the arms and other body parts of the robot. It may be argued that such an autonomous life setting compromises the controlled nature of our investigation. We viewed this as a trade-off between having a static robotic agent that may constrain the extent to which a human would wish to interact emotionally, and having a non-controlled ‘naturalistic’ interaction. In general, however, the robot did not give specific reactions to the different emotions, so while this setting may potentially increase inter-subject variability, it is less obvious that it would have gender-specific or emotion-specific effects, i.e. effects on the two variables under investigation.

Following Hertenstein et al. [25], eight different emotions were presented one at a time on individual cards in a random order. The participants were instructed to convey each emotion to the robot via touch. A set of five primary emotions (anger, disgust, fear, happiness, and sadness) and three pro-social emotions (gratitude, sympathy, and love) were used [25].

Participants were required to stand in front of the table on which the robot was placed. They faced the robot and were instructed to read the eight emotions written on the paper cards, one at a time, and for each emotion think about how to communicate that specific emotion to the robot via touch. The instructions stated that, when they felt ready, they should make contact with the robot’s body using any form of touch they found appropriate. Participants were not time-limited in their interactions, as this was considered to impose a constraint on the naturalness or creativity of the emotional interaction. While the study was being conducted, one of the experimenters was present in the room with the participant and another experimenter observed from the control room. All tactile contact between the participant and the robot was video recorded. At the end of the experimental run, the participant answered a questionnaire regarding his or her subjective experience of interacting with the robot via touch (the results concerning the analysis of this questionnaire are reported in [2]).

2.3 Coding Procedure

The video recordings of tactile displays were analyzed and coded on a second-by-second basis using the ELAN annotation software.Footnote 2 During the coding procedure, the experimenters were naïve to the emotion being communicated but retroactively labelled annotation sets according to each of the eight emotions. Following Hertenstein et al. [25], four main touch components were evaluated by the experimenters: touch intensity, touch duration, touch location and touch type.

Each touch episode was assigned a level of intensity, i.e., an estimation of the level of human-applied pressure, from the following four-interval scale [25]:

  • No interaction (subjects refused or were not able to contemplate an appropriate touch),

  • Low intensity (subjects gave light touches to the Nao robot with no apparent or barely perceptible movement of Nao), 

  • Medium intensity (subjects gave moderate intensity touches with some, but not extensive, movement of the Nao robot),

  • High intensity (subjects gave strong intensity touches with a substantial movement of the Nao robot as a result of pressure to the touch).

While the tactile expression of a given emotion could involve several intensities, the annotation of intensity recorded the intensity type that was expressed most within that emotion’s tactile expression episode. While an objective measure of touch intensity is difficult to achieve without the use of tactile force sensors, the investigators made an effort to increase inter-rater reliability by carrying out parallel annotations. In pilot studies and over initial subject recordings, for any given subject, two investigators compared annotations for the emotion interactions. This comparison was based on 5 recordings from the pilot study and 4 subject recordings from the experimental run, annotated by both investigators and used as material to reach agreement on the coding practice. Once this was done, all video recordings were divided between the two investigators and annotated on the basis of this agreed coding practice, and the initial annotations, used mainly as practice material, were replaced by final annotations, which are the ones reported here. There were a few cases of equivocal touch behaviours that required the attention of both investigators to ensure appropriate coding. These instances were resolved through consultation, so separate annotations were not part of the work procedure. This approach was also applied to the annotation of touch type and location.
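
For illustration, the following minimal Python sketch (not part of the original coding workflow) shows how the dominant intensity of an emotion episode could be derived under the rule above, i.e. by taking the most frequent of the per-touch intensity judgements; the function and label names are illustrative assumptions.

```python
from collections import Counter

# Four-interval intensity scale used in the coding procedure
INTENSITIES = ["no interaction", "low", "medium", "high"]

def dominant_intensity(per_touch_intensities):
    """Return the intensity expressed most within one emotion episode.

    `per_touch_intensities` is a list of intensity labels, one per touch
    observed between the start and end of the episode. An empty list is
    treated as 'no interaction'.
    """
    if not per_touch_intensities:
        return "no interaction"
    counts = Counter(per_touch_intensities)
    # most_common(1) returns [(label, count)] for the modal category
    return counts.most_common(1)[0][0]

# Example: an anger episode annotated with mostly high-intensity touches
print(dominant_intensity(["medium", "high", "high"]))  # -> "high"
```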

Touch duration was calculated for each emotion over the entire emotion episode, i.e. from initial tactile interaction to end of the tactile interaction. A single interaction comprising, for example, two short strokes separated by a longer interval without contact was hence coded as a single (long) duration. As such, duration should be seen as a measure of the length of tactile interaction, not as a direct measure of the duration of physical contact between human and robot. We adopted this approach as a result of ambiguity as to when to objectively measure the point at which a touch interaction had started or ended, e.g. certain touch types like pat or stroke entail touching and retouching with variable time delays.
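
A minimal sketch of this duration measure is given below, assuming ELAN-style touch segments exported as (start, end) times in seconds; the helper name is hypothetical.

```python
def episode_duration(touch_segments):
    """Duration of an emotion episode: first touch onset to last touch offset.

    `touch_segments` is a list of (start_s, end_s) tuples for the physical
    contacts annotated within one emotion episode (e.g. exported from ELAN).
    Gaps without contact inside the episode are deliberately included, so
    two short strokes separated by a pause count as one long duration.
    """
    if not touch_segments:
        return 0.0
    first_onset = min(start for start, _ in touch_segments)
    last_offset = max(end for _, end in touch_segments)
    return last_offset - first_onset

# Two short strokes (1 s each) separated by a 3 s pause -> 5 s episode
print(episode_duration([(0.0, 1.0), (4.0, 5.0)]))  # 5.0
```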

Fig. 2 Diagram over body regions considered in the coding process for location of touch. Colors indicate unique touch locations. (Color figure online)

In order to analyze touch location, a body location diagram of the robot (Fig. 2) was created and used during video annotation. Sixteen unique body locations were considered: back, below waist, chest, face, left arm, left ear, left hand, left shoulder, left waist, occiput, right arm, right ear, right hand, right shoulder, right waist, and scalp. Each location was coded as touched at most once during each interaction, meaning that locations touched several times during the same interaction were only counted once.

Following the methodology of Hertenstein et al. [25], type of touch was coded using the following 23 touch types: Squeezing, Stroking, Rubbing, Pushing, Pulling, Pressing, Patting, Tapping, Shaking, Pinching, Trembling, Poking, Hitting, Scratching, Massaging, Tickling, Slapping, Lifting, Picking, Hugging, Finger interlocking, Swinging, and Tossing. Annotating each touch type only once per emotion is a rather coarse approach; however, it meant that our evaluation did not count multiple touches of the same type for a given emotion, which might otherwise have introduced a source of strong variance into the data.
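
Taken together, the four coded components could be represented per emotion episode roughly as follows; this Python sketch is illustrative only, with hypothetical field names, and uses the location vocabulary listed above (locations are stored in a set so that repeated touches of the same region count once).

```python
from dataclasses import dataclass, field

BODY_LOCATIONS = {
    "back", "below waist", "chest", "face", "left arm", "left ear",
    "left hand", "left shoulder", "left waist", "occiput", "right arm",
    "right ear", "right hand", "right shoulder", "right waist", "scalp",
}

@dataclass
class EmotionEpisode:
    emotion: str                 # one of the eight emotion words
    intensity: str               # no interaction / low / medium / high
    duration_s: float            # first touch onset to last touch offset
    locations: set = field(default_factory=set)  # each location counted once
    touch_type: str = ""         # single dominant type from the 23-type list

    def touch_location(self, location: str) -> None:
        """Record a touched location; repeated touches collapse to one entry."""
        if location in BODY_LOCATIONS:
            self.locations.add(location)

# Example: a 'love' episode involving a hug over several upper-body regions
ep = EmotionEpisode("love", "medium", 12.5, touch_type="hugging")
for loc in ["chest", "back", "left arm", "left arm"]:
    ep.touch_location(loc)
print(len(ep.locations))  # 3 -- 'left arm' counted once
```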

Fig. 3 Intensity ratings over emotions and genders. The stacked bar plots show female (F) and male (M) ratings over the different intensity intervals per emotion. The x-axis shows total number of ratings per emotion as well as mean ratings over all emotions (right-most plot) for comparison. It can be seen that with the exception of anger (both male and female) and disgust (males) medium intensity ratings were highest

Hertenstein et al. [25] make reference to their use of the Tactile Interaction Index (TII) for attempting to provide objective standards for annotation. It has been described as using: “a complicated scoring system to measure, among other factors, the actual number and duration of touches, the location of touch and whether the areas touched are densely packed with nerve pathways [...], the degree of pressure on the skin and the specific type of action used in touching.”Footnote 3 The TII is not publicly accessible and was, moreover, developed specifically for human–human interaction. Touch type, like touch intensity, was therefore evaluated in the present study according to inter-rater agreement on annotation in a pilot phase and in initial subject evaluations in the experimental phase.

3 Results

Results were analyzed according to the four criteria with which Hertenstein et al. [25] evaluated emotion encoding in their HHI study: intensity, duration, location, and type. It should be borne in mind that, unlike in the Hertenstein et al. [25] experiments, the Nao robot is not able to decode the emotions being conveyed by the humans; we therefore did not have an a priori measure of successful decoding of the emotions. However, we evaluated tactile dimensions along which, in principle, the Nao robot might be able to distinguish among the different conveyed emotions, i.e. to decode them.

3.1 Encoding Emotions

Intensity

The number of ratings in each of the four intervals (no interaction, low intensity, medium intensity, and high intensity) for each emotion is displayed in Fig. 3, separated for male and female participants. The plots concern total numbers of ratings over the participants. Mean numbers of ratings per emotion were not analyzed as only one touch intensity per emotion was recorded by the experimenters. What is observable is a general tendency for emotions to be rated as of medium intensity. However, it is also salient that this is not the case for Anger, in particular for males, who were rated as showing predominantly High Intensity touch interactions.

Fig. 4 Mean durations of tactile interaction from initial to final touch over each emotion. Females interact with the Nao for longer durations over all emotions (means) and differences are greatest (non-overlapping standard error bars) for sadness, love, disgust and fear emotions

Only Anger (both males and females) and Disgust (males) were predominantly rated by the experimenters in an intensity category other than Medium Intensity. In these cases, High Intensity ratings were most frequent. It can also be observed that the High Intensity rating was more frequently applied to interactions by male participants, compared to females, for the primary emotions (the opposite being true for the pro-social emotions).Footnote 4 Tables of results (Tables 1, 2, 3) giving the totals of the different intensity ratings for each of the 8 conveyed emotions for (a) the female participants, (b) the male participants, and (c) all participants are given in “Appendix A”.

The tendency for experimenters to predominantly rate intensities as Medium may owe to experimenter bias in rating or to participant bias similar to a non-committal central tendency bias (as is common for 5-point Likert scales). We carried out a chi-squared test comparing frequencies of the four intensity categories over the two genders. Our value of \(\upchi ^{2}(3,\hbox {N}=64) = 2.141,\, p > 0.05\) showed that there was no significant difference between the genders regarding recorded intensity of touch.
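
For reference, a gender-by-intensity comparison of this kind can be reproduced with a standard contingency-table test; the sketch below uses scipy with placeholder counts (the actual frequencies are those reported in Appendix A).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: female, male; columns: no interaction, low, medium, high.
# The counts below are placeholders -- the real frequencies are in Appendix A.
observed = np.array([
    [10, 60, 150, 36],   # female ratings pooled over the eight emotions
    [12, 55, 145, 44],   # male ratings pooled over the eight emotions
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.3f}")  # dof = (2-1)*(4-1) = 3
```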

Duration

The duration of tactile interaction for a given emotion was recorded from the initial touch to the final touch before the participant turned over the next card (signalling the next emotion conveyance episode). Figure 4 plots the means of these durations (emotion conveyance episodes) for each emotion, for both males and females.

It can be observed from Fig. 4 that females interact with the Nao robot for longer durations on average than males over all emotions. This is most evident for sadness and love. Using a two-way (mixed design) ANOVA with independent variables of gender (between subjects), and emotion type (within subjects) and WinsorizationFootnote 5 of 90% we found a significant main effect of gender: \(\hbox {F}(1,64) = 4.228,\, p = 0.0485\). There was no significant interaction effect between the two independent variables: \(\text {F}(7,64) = 0.877, p = 0.5259\); but there was a significant main effect for emotion type: \(\hbox {F}(7,64) = 10.838,\, p < 0.01\). See Table 4 for details.
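
A sketch of this analysis pipeline is given below, assuming a long-format table with one duration per participant and emotion; it uses scipy and the pingouin package rather than the software actually used for the reported statistics, and interprets 90% Winsorization as clamping the lowest and highest 5% of values. File and column names are hypothetical.

```python
import numpy as np
import pandas as pd
import pingouin as pg
from scipy.stats.mstats import winsorize

# `df` is assumed to be long-format: one row per participant x emotion,
# with columns: participant, gender, emotion, duration_s.
df = pd.read_csv("touch_durations.csv")  # hypothetical export of the annotations

# 90% Winsorization: clamp the lowest and highest 5% of durations.
df["duration_w"] = np.asarray(winsorize(df["duration_s"].to_numpy(),
                                        limits=[0.05, 0.05]))

# Two-way mixed ANOVA: gender between subjects, emotion within subjects.
aov = pg.mixed_anova(data=df, dv="duration_w", within="emotion",
                     between="gender", subject="participant")
print(aov[["Source", "F", "p-unc"]])

# Bonferroni-corrected pairwise comparisons between emotions
# (named pairwise_ttests in older pingouin releases).
posthoc = pg.pairwise_tests(data=df, dv="duration_w", within="emotion",
                            subject="participant", padjust="bonf")
print(posthoc)
```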

Therefore, female participants tended to have longer-duration tactile interactions with the Nao robot than male participants. Two-tailed post hoc tests (with Bonferroni correction) were carried out to test differences between the emotions conveyed. Only Sadness, Love and Sympathy yielded significant differences with respect to other emotions. Sadness was conveyed with significantly longer duration than all other emotions except love and sympathy (at the \(p < 0.05\) level; see “Appendix B” for details). Sympathy was conveyed for significantly longer duration than Anger and Disgust, and Love for longer than Disgust.

In summary, differences in duration of tactile interaction for the different emotions could be observed with Sadness being the dominant emotion in regard to duration of tactile interaction. Gender differences were also found (over all emotions) with females spending significantly longer to convey emotions than males. Data for this dimension showed a large degree of variability such that outliers were required to be dealt with (Winsorization was used). The reason for this was that the instructions vocalized by the experimenters did not request time-limited responding from participants regarding the conveyed emotions. Time-limitation was considered to be constraining on the modes of interaction conveyed and was thus avoided.

Location

Figure 5 displays the mean number of touched locations during interaction, separated for each emotion and for gender, where individual touched regions per emotion were recorded at most once. As is visible in the figure, Disgust yielded the most limited interaction for both genders, with a mean of fewer than two locations touched. Love resulted in the most extensive interaction overall, with a mean for females of more than 5 regions involved in each interaction. The exact number of touches for each location is found in “Appendix A” (Table 5).

Fig. 5 Mean number of touched locations during interaction. The mean values represent the number of touches per participant for each gender

Using a two-way (mixed design) ANOVA with independent variables of gender (between subjects) and emotion type (within subjects), we found a significant main effect of gender: \(\hbox {F}(1,64) = 13.05,\, p <0.01\) (females touched more locations), and also of emotion type: \(\hbox {F}(7,64) = 11.512,\, p < 0.01\). However, there was no significant interaction effect between the two independent variables: \(\hbox {F}(7,64) = 1.4,\, p = 0.2024\). Bonferroni-corrected post hoc tests found: Love > Fear, Love > Anger, Love > Disgust, Love > Happiness, Love > Gratitude, Love > Sympathy, Happiness > Disgust, Sadness > Disgust, all at \(p<0.01\).

In summary, females showed a tendency to touch the Nao over more areas, particularly with respect to Love, while Love per se prompted subjects to touch more areas than most other emotions.

Fig. 6 Heat maps depicting touch distribution over gender, averaged over all emotions. The different locations on Nao are visualized according to amount of red in relation to numbers of touches. Darker red indicates a higher number of touches over all the participants. The percentage of all touches is in brackets for each touch location. Sc scalp, Fa face, RS right shoulder, LS left shoulder, RA right arm, LA left arm, RH right hand, LH left hand, BW below waist, Ch chest, Oc occiput, LE left ear, RE right ear, Ba back, LW left waist, RW right waist. (Color figure online)

Figures 6 and 7 present the frequencies of touched locations for gender and emotion, respectively. The difference between male and female participants described above is here reflected in a larger involvement of the head for female participants. Both male and female participants, however, touch the arms and hands most frequently and involve feet and legs to a very small extent.

Looking at the touch frequencies for each emotion (Fig. 7), Gratitude corresponds to a high percentage of right-hand touches. This correlates with the large amount of hand-shaking exhibited by participants in the experiment and is corroborated by the touch type data (see the Type section). Disgust is characterized by chest poking or pushing actions (see the Type section); in general, participants minimized the number of touches when conveying this emotion. Anger (and to some extent Fear) was focused on the upper torso. Love and Sadness shared a profile of more distributed touch. Sympathy and Happiness were focused more on the arms and hands of the Nao.

Fig. 7 Heat maps depicting touch distribution over emotions (both male and female participants). The different locations on Nao are visualized according to amount of red in relation to numbers of touches. Darker red indicates a higher number of touches over all the participants. The percentage of all touches is in brackets for each touch location. Sc scalp, Fa face, RS right shoulder, LS left shoulder, RA right arm, LA left arm, RH right hand, LH left hand, BW below waist, Ch chest, Oc occiput, LE left ear, RE right ear, Ba back, LW left waist, RW right waist. (Color figure online)

Fig. 8 Touch type over emotions and gender. The seven most common touch types are presented individually for each emotion

Type

The seven most frequently used types of touch are presented in Fig. 8. On average, these seven touch types constitute 85% of all tactile interaction. Participants use squeezing (29%), stroking (16%), and pressing (14%) most frequently. Pulling, trembling, and tossing are never observed during any interaction. Happiness stands out by involving a relatively large proportion (12%) of swinging of the robot’s arms, not observed for other emotions. Male participants show a general tendency to predominantly use squeeze for each conveyed emotion; only in the case of Disgust is another touch type dominant (Push). By contrast, female participants use squeeze as the dominant touch type for only 3 of the 8 emotions: Fear, Happiness, and Gratitude. Push (Anger, Disgust), Stroke (Sadness, Sympathy) and Hug (Love) are the other dominant touch types expressed. Overall, females thereby appear to show a greater variety of tactile interactions. However, gender differences did not reach significance when applying the \(\upchi ^{2}\) test to type-of-touch patterns for the individual emotions (see “Appendix B”).

In summary, when encoding emotions from human to robot, the following results stand out:

  1. Sadness was conveyed for a longer time than all the other ‘basic’ emotions and for the longest duration overall (independent of gender);

  2. Females tended to touch (convey emotions to) the Nao robot over a longer duration than Males;

  3. Love was the emotion that evoked the highest number of locations touched on the Nao;

  4. Females tended to touch more locations than Males;

  5. Females showed a greater variety of touch types than Males (though results were not significant).

The results suggest that female participants were typically more emotionally engaged with the Nao robot than were male participants, in support of our hypothesis that there would be differences between males and females in interaction behaviour with the Nao. The pro-social emotions of Love and Sadness were expressed more extensively, although based on these results this could signify either greater uncertainty of expression or greater engagement in relation to these emotions.

As a final point, evaluating single, rather than multiple, touch types per emotion, and giving one intensity rating over the whole emotion interval, may have pulled intensity values closer to medium ratings. However, we observed that intensities of interaction typically did not vary greatly, particularly across multiple touches of the same type.

3.2 Decoding Emotions

Unlike the Hertenstein et al. [25] experiment upon which our HRI study was methodologically based, the Nao robot was a passive recipient of touch, i.e. lacking the sensory apparatus to decode the emotions conveyed. Nevertheless, the patterns of affective tactile interaction observed during experimentation provide clues as to the critical dimensions of touch requisite to disambiguating the emotional or affective state of the encoder. This in turn can inform robotics engineers as to which types of sensors, and their locations, are most suitable for a Nao robot seeking to interpret human affective states. It can also inform as to the types of post-processing (e.g. classification algorithms and input dimensions) that are most relevant for decoding emotions. Therefore, here we derive Systems Design based insights from our study.

Figure 9 visualizes a Support Vector Machine (SVM) classification of emotional valence, specifically the valence of emotional conveyance. We used Matlab for the 2-dimensional SVM classification. We analyzed mean values for the two dimensions (number of different locations touched and duration of touch) in order to classify the emotions. The 2-dimensional classifications separated according to gender can be seen in “Appendix C”.

The SVM classification here effectively uses the data provided by the participants in our experiment as a training set. In principle, newly conveyed emotions could be classified into one of the two valenced affective states, such that the Nao robot would have a fundamental affective understanding of the meaning of the human’s tactile interaction. Nevertheless, individual variance is such that any affective tactile interaction would have to be calibrated (require some re-training) on a case-by-case basis.
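
As an illustration of this classification step, the following sketch uses scikit-learn (rather than the Matlab tooling used for the reported analysis) on placeholder per-emotion means of the two dimensions; the feature values are illustrative, with the valence labels following the assignment in Fig. 9 (sadness grouped with the positively conveyed emotions).

```python
import numpy as np
from sklearn.svm import SVC

# Per-emotion mean features: [number of locations touched, duration in s].
# Values are illustrative placeholders, not the measured means.
emotions = ["anger", "disgust", "fear", "happiness",
            "sadness", "love", "gratitude", "sympathy"]
X = np.array([
    [2.5,  6.0],   # anger
    [1.5,  4.0],   # disgust
    [2.0,  6.5],   # fear
    [3.0,  8.0],   # happiness
    [4.0, 16.0],   # sadness (conveyed consolingly -> positive side)
    [4.5, 14.0],   # love
    [3.0,  9.0],   # gratitude
    [3.5, 12.0],   # sympathy
])
# Conveyance valence: 0 = negative (anger, disgust, fear), 1 = positive.
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)            # the circled points in Fig. 9
print(clf.predict([[4.2, 15.0]]))      # a new, long, widely distributed touch
```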

Fig. 9 A support vector machine classification of emotional conveyance valence by number of locations and duration of touch. Emotion mean values are classified according to their positive or negative meaning (either side of the hyperplane). Note, sadness here is classified as an emotion that is conveyed positively (for consoling the agent). Circled are the support vectors. (Color figure online)

The results described above have shown that emotions (primary and pro-social) of the type used by Hertenstein et al. [25] in their human–human study are conveyed differentially along a number of dimensions: intensity, duration, location and type. Along with specific differences found regarding the emotions being conveyed, it was found that classifying emotional tactile interaction according to valenced emotional conveyance provides a useful means by which emotions may also be decoded by robots (specifically the Nao robot here). The two dimensions of number of different locations and duration of touch provide hints as to the types of sensors, and their distributions over the robot, needed for emotional intention to be interpreted by the robot (see the Discussion section).

Fig. 10 A support vector machine classification of rejection versus attachment affective state conveyance by number of locations, duration of touch and intensity of touch. Here is depicted training data (50% of all data used) for each gender for Rejection emotions (disgust and anger) and Attachment emotions (love and sadness). Left: female SVM classification. Right: male SVM classification. Support vectors are not depicted here for purposes of clarity of visualization

Fig. 11 Confusion matrices for females (left) and males (right). The matrices were calculated using the SVM hyperplanes in Fig. 10. Female data, overall, were more accurately classified using this approach, especially with respect to the rejection emotions

In Fig. 9, a clear separation of means can be seen between Anger and Disgust on the one hand, and Love and Sadness on the other (the separation is even greater in Fig. 12, where Anger is the most intensely expressed emotion). We therefore decided to pool data for all female and male subjects over Anger and Disgust (Rejection emotionsFootnote 6) and over Sadness and Love (Attachment emotionsFootnote 7, where Sadness appears to be expressed in a consolatory manner). As can be seen in Fig. 10, most data for both females and males for Rejection emotions are clustered around high-intensity, low-duration and low-location-number touch, whereas Attachment emotions are more distributed, with typically higher duration and location number. Figure 11 shows, based on the (linear) decision hyperplanes generated in the respective SVM training phases for females and males, the classification accuracy for the remainder of the data points. Note that the partitioning of data into training and test/classification sets was arbitrary, and we ran 25 such tests, selecting the partitioning (model) that provided the greatest accuracy for Rejection–Attachment classification.
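
A sketch of this rejection-versus-attachment step is given below, using scikit-learn with synthetic stand-in features (number of locations, duration, intensity) in place of the recorded episodes; the repeated 50/50 partitioning and selection of the best-performing split mirror the procedure described above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(0)

# Synthetic stand-in for per-episode features:
# [number of locations, duration (s), intensity code 0-3].
# Rejection episodes (anger, disgust): short, focused, intense.
X_rej = np.column_stack([rng.poisson(2, 60), rng.gamma(2, 2, 60), rng.integers(2, 4, 60)])
# Attachment episodes (love, consoling sadness): longer, more distributed.
X_att = np.column_stack([rng.poisson(4, 60), rng.gamma(5, 3, 60), rng.integers(1, 3, 60)])
X = np.vstack([X_rej, X_att]).astype(float)
y = np.array([0] * 60 + [1] * 60)            # 0 = rejection, 1 = attachment

best_acc, best_cm = 0.0, None
for seed in range(25):                       # 25 random 50/50 partitions
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=seed)
    clf = SVC(kernel="linear").fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    acc = accuracy_score(y_te, pred)
    if acc > best_acc:                       # keep the best-performing partition
        best_acc, best_cm = acc, confusion_matrix(y_te, pred)

print(best_acc)
print(best_cm)   # rows: true rejection/attachment, columns: predicted
```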

In summary, our results for decoding emotions suggest that affective tactile interaction may be classified according to:

  1. valence: positively and negatively conveyed emotions seem amenable to classification, provided that individual calibration is accounted for by the robot (there is much inter-individual variance);

  2. rejection versus attachment: these two particularly important social affective types appear amenable to classification based on touch alone.

Much research has highlighted the benefits of having multiple modalities of sensory input for decoding affective states (e.g. [5]), including with reference to decoding tactile (gestural) inputs [12]. Furthermore, Hertenstein et al. [25] found a lower mean percentage of correct classification/decoding for the Rejection (64%) and Attachment (59%) emotions identified above than we did in our study, although in their case each emotion was decoded in reference to all other emotions. The fact that we obtained reasonable classification accuracy using tactile interaction as the sole sensory modality for conveying emotion to the robot indicates that there is some potential to use the encoder results as a basis for Nao to decode emotions according to the touch properties of duration, location number and intensity. Imbuing the Nao with appropriately placed sensors and perception/learning algorithms would thus potentially allow the robot to perceive the affective state (e.g. valence, rejection versus attachment) of the interacting human by touch alone, particularly when the robot is calibrated to the individualFootnote 8.

4 Discussion

In this article, we have reported and analyzed findings of an experiment detailing how humans convey emotional touch to a humanoid (Nao) robot. The experiment closely followed the methodological procedure of Hertenstein et al. [25] and compared touch behaviour between male and female participants. Our main findings are as follows:

  1. Females convey emotions through touch to the robot for longer durations than do males.

  2. Females convey emotions over a larger distribution of locations than do males.

  3. Females show a greater variety of touch types over all emotions compared to males (but not significantly so).
Thus, we found females were more emotionally expressive than males when conveying emotions by touch to the Nao robot. This is consistent with our hypothesis that we would find differences between female and male robot interaction behaviours.

Additionally:

  4. Sadness is the emotion that is conveyed for the longest duration over both genders.

  5. Love is the emotion that is conveyed over the largest distribution of locations.

  6. Emotions may be classified by conveyance valence, and decoded (by a Nao robot), according to location number and duration of touches.

  7. Emotions may also be classified in relation to location number, duration and intensity, when pooled into Rejection (Disgust and Anger) and Attachment (Love and Sadness/Consoling) based affective states.

Evidence for a number of other emotion-specific findings was also found: (i) anger was the most intensely expressed emotion, (ii) anger, disgust and fear were expressed for the shortest time and over the fewest locations. In general, we found that negatively conveyed emotions (anger, disgust, fear) were typically less conducive to expressivity than positively conveyed emotions (happiness, sadness, gratitude, sympathy, love), where sadness was seen to be expressed as a gesture of consolation not dissimilar to sympathy and love.

Despite subjects being instructed to “imagine that you have these emotions and that you want the robot to understand how you feel by touching it in ways that you feel are relevant for conveying each specific emotion”, sadness, apparently, was interpreted more as a pro-social emotion. It could be considered in terms of conveying empathy or sympathy to the robot. Expressing sadness as a pro-social emotion (i.e. being responsive to another’s sadness) versus as empathy (i.e. ‘feeling’ another’s pain), however, may be different. For example, Bandstra et al. [4] found that children were more behaviourally responsive when expressing pro-social sadnessFootnote 9 than empathy. One might feel another’s pain but not be unduly worried about it! On this reading, sadness, as expressed by subjects in this experiment, was a pro-social emotion that did not necessarily entail an empathic component.

4.1 Human–Robot and Human–Human Interaction: Scientific Implications

As alluded to throughout the article, our HRI research has taken strong inspiration from the work on HHI of Hertenstein et al. [25]. While there are some methodological differences between the present work and the replicated study on human–human tactile communication, we see several strong similarities in the results. Intensity of touch and duration of touch, for example, followed similar patterns of interaction in our HRI investigation, as can be seen in “Appendix D”. The (three) most and least categorized emotions according to the four annotated intensity types (no interaction, low, medium, high) are observably comparable in our investigation and in Hertenstein’s. For example, Anger and Disgust are similarly annotated as being of high intensity (or involving no interaction), whereas pro-social emotions (Love, Gratitude, Sympathy) are more typically conveyed through low or intermediate intensity touch. In relation to duration, many emotions are similarly conveyed in both the human–human and the human–robot investigations. For example, Sadness and Sympathy are of relatively long duration both in our results and in the study by Hertenstein et al. Interestingly, Fear and Love are conveyed differently in the two studies. In Hertenstein’s HHI study, Fear is of the longest duration, whereas in our HRI study it constituted one of the shortest-duration emotions conveyed. Love is conveyed with the second shortest duration in the Hertenstein study, while in our HRI study it is one of the emotions conveyed over the longest duration. Low-duration conveyance of Love, to our understanding, is not an intuitive result. A possible explanation for Hertenstein’s finding is that humans find it awkward to convey such an intimate emotion as Love to another human who is a stranger, while conveying such an emotion to a small robot is less intimidating. Such a divergence in our results might even indicate that there is an important scientific role for artificial systems to play in understanding emotional tactile interaction. This interpretation gains weight when we consider the results of our questionnaires regarding the ease and confidence with which the subjects perceived their conveyance of Love: Love was perceived to be expressed more easily and confidently than all other emotions (see [2]).

In relation to type of touch, there are also comparable findings across the HHI and HRI investigations. Hertenstein et al. [25] report that “fear was communicated by holding the other, squeezing, and contact without movement, whereas sympathy was communicated by holding the other, patting, and rubbing” (p. 570). We observed a similar pattern, with squeezing and pressing being the dominant touch types used for communicating Fear, while stroking was most frequently used when communicating Sympathy. Furthermore, in line with Hertenstein et al., we found several significant gender differences regarding how emotions are communicated using touch. Male participants appear to use high intensity interaction when communicating primary emotions to a larger degree than female participants, but for a shorter duration. Female participants are more varied in their interaction, touching more locations on the robot and using a larger set of different types of touch, compared to male participants.

Going beyond a comparison with Hertenstein’s study, it is noticeable that similar results have been found in Psychology research and in investigations of HHI in relation to touch and gender differences. One of the most well-known studies [28] shows that, in their tactile interaction, females touch other people, both females and males, on more regions of their body than do males. The most frequently touched body parts in HHI are hands, arms (forearms and upper arms), shoulders, and head [15, 42], which is consistent with our study, in which both male and female participants most frequently touched the robot’s arms and hands.

There are some other notable differences between the results observed in the present study and those reported by Hertenstein et al. [25]. Firstly, Hertenstein et al. reported no significant main effects of gender in terms of decoding accuracy, that is, the perceiving person’s ability to identify the communicated emotion. In the present study, we do not have a measure of decoder accuracy but, as discussed above, several other effects of gender were found. It should be noted that Hertenstein et al. only tested gender effects in relation to male–female combinations of dyadic (encoder–decoder) interactions and the accuracy of decoded emotions, and not with respect to the properties of emotion communication. This opens up at least two possible interpretations: (1) that the gender differences found in the present study are not present in HHI, or (2) that our results also apply to HHI but that the observed gender differences in how emotions are communicated via touch do not affect the accuracy of the communicated emotions. Furthermore, while there is high consistency regarding most types of touch over all communicated emotions, Hertenstein et al. report more frequent use of lift, shake, and swing than observed in the present study. This may be a result of the robot being configured with stiff joints, making it difficult for the participant to use touch types involving movement of the robot.

4.2 Human–Robot Tactile Interaction Systems Design

From the perspective of Systems Design and HRI, it is worth noting that three touch types, squeezing, stroking, and pressing, constituted more than half (59%) of all tactile interaction in the study. While a detailed analysis of the information content in each touch component is beyond the scope of the present work, the present findings suggest that encoding and decoding of these three touch types are critical for successful human–robot tactile interaction. Furthermore, as presented in Sect. 3.2, the number of different locations touched and the duration of touch proved to be particularly informationally critical in the decoding of emotions from tactile interaction. Somewhat surprisingly, the intensity of touch appears less informative for decoding emotional content. There was a predominance of intermediate intensity encodings, which may reflect a central tendency bias in either or both annotator and participant behaviour. Hertenstein et al. [25] refer to the use of the Tactile Interaction Index (TII) of Weiss [52], but we were unable to adopt this approach to intensity annotation in our investigation as we were unable to obtain the TII. Consequently, we relied upon inter-rater agreement regarding estimations of touch intensity (and type).

The present findings can also be viewed in relation to the existing positioning of tactile sensors on the Nao robot. The Nao has seven tactile sensors: three on the scalp, two on the back of the hands, and two bump sensors on the feet. While the hands are frequently involved in tactile interaction, the scalp accounts for less than two percent of all tactile interaction in the present study. No tactile sensors are placed on the arms, which are the most frequently touched locations.

The fact that our HRI study found a general tendency for females to be more expressive than males suggests that the positioning/distribution of sensors may need to account for the particular application domains in which the robot (specifically Nao in this case) is used. Robots in HRI domains are often used for teaching assistance, elderly care/assistive living, and companionship. If the primary users are predominantly of one gender, sensor positioning and number may need to be considered accordingly.

Further design considerations concern the use of fabrics embedded with sensors that provide a robot wearable/interface (see Lowe et al. [34]). Such wearables must not only have the sensors appropriately distributed on the robot’s body, but should also allow the sensor properties to be exploited. Sensors may be sensitive to pressure for registering touch types such as squeeze and press. They may also be implemented as arrays to record stroke or rub touch types. Wearables embedded with smart sensors exist (cf. Maiolino et al. [35], Yogeswaran et al. [53]) that serve as effective suits, but their primary role is to provide the robot with tactile information for its own safety and to provide a softer interactive surface for facilitating human safety. In relation to affective-based interactions, if the wearable materials are not conducive to such interactions, e.g. do not visibly afford touch, the sensors will not be well exploited. Of further relevance is how the Nao (or a given robot) should perceive and respond to affective touch. Our results (Sect. 3.2) indicate that affective valence (positively conveyed versus negatively conveyed emotions) may be detectable according to the dimensions of duration of touch and distribution of locations touched. Such perception naturally requires calibration to individual humans.
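
As a rough illustration of how an array of pressure sensors in such a wearable could separate a static press from a stroke, the sketch below tracks whether the location of peak pressure drifts over time along a strip of sensors; all names and thresholds are illustrative assumptions, not calibrated values or an implemented system.

```python
import numpy as np

def classify_touch(frames, move_threshold=2, pressure_threshold=0.1):
    """Very rough press-vs-stroke heuristic for a 1-D pressure sensor array.

    `frames` is a (time, sensors) array of pressure readings from a strip of
    sensors embedded in the wearable. If the peak-pressure sensor drifts by
    more than `move_threshold` positions over the touch, call it a stroke;
    otherwise a press. Thresholds are illustrative, not calibrated values.
    """
    frames = np.asarray(frames, dtype=float)
    active = frames[frames.max(axis=1) > pressure_threshold]  # frames with contact
    if active.size == 0:
        return "no touch"
    peak_positions = active.argmax(axis=1)
    drift = peak_positions.max() - peak_positions.min()
    return "stroke" if drift > move_threshold else "press"

# A touch whose pressure peak travels along the array -> "stroke"
t = np.linspace(0, 1, 8)
frames = np.array([np.exp(-((np.arange(10) - 8 * ti) ** 2)) for ti in t])
print(classify_touch(frames))
```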

While the similarity between the present results and the results reported by Hertenstein et al. [25] is notable, it is still unknown to what extent these results hold also for other robot models, including non-humanoid robots. Evaluating different morphological properties of robots and other artificial systems would be requisite to furthering understanding of the factors that influence human conveyance of emotion-based touch. The present results are likely to be dependent on the appearance and shape of the robot, and to what extent people see the robot as another agent, or merely an artefact.

Results may also be dependent on the interaction context and the placement of the Nao. For example, placing the Nao on the floor is likely to change the interaction pattern, at least in terms of where the robot is touched. Another limitation of this study lies in the age of participants. It would, for example, be interesting to compare these results to children interacting with the robot. Children constitute one important target group and may be less constrained by social conventions. This may be particularly relevant to furthering the understanding of tactile conveyance of intimate emotions such as love where adults may feel comparatively inhibited.

4.3 Human–Computer Tactile Interaction Systems Design

Tactile interaction for use in human–computer interaction (HCI) is of growing interest with increasingly broad applications [48]. The nature of differentiated affective or emotional tactile interaction in artificial systems has the most obvious application to physically embodied agents with morphologies comparable to humans. However, affective tactile interaction may have more general application to HCI, for example in the shape of digitally mediated emotions with the use of haptic devices [3], or as a facilitator of social presence in relation to virtual agents (cf. [51]). It has been found that hand squeezes, delivered using an air bladder, improve human relations with virtual agents [6]. In the context of tactile interaction that may be informative both for virtual agent technology and for HRI, Cooney et al. [12] investigated how people conveyed affectionate touch (types) to two different types of humanoid (adult-sized) motionless mannequins. They found that the touch types could be accurately decoded using a classification algorithm. They did not, however, evaluate how well people interacted with a real, moving robot, nor did they look at the conveyance of specific emotions (including negative emotions). The domain of non-humanoid artificial agents also provides an application area for affective tactile interaction. An example of an artificial creature (robot) designed to encourage haptic (tactile and kinesthetic) interactions is the Haptic Creature of Yohanan and MacLean [55]. This creature is simultaneously able to (i) sense touch and movement using an array of touch sensors and an accelerometer, respectively, and (ii) display its emotional state by adjusting the stiffness of its ears, modulating its breathing and producing a (vibrotactile) purring sound. Use of such non-humanoid robots, however, may ultimately be limiting with respect to the types of affective touch interactions that are permissible and natural.

4.4 Further Study

Follow-up studies are envisioned to take the form of revisiting our human–robot tactile interaction scenario using different robots and also different subject groups, e.g. children. Present work concerns an ongoing investigation using Aldebaran’s PepperFootnote 10 robot. In general, a different robot morphology may afford some touch types more than others. We also plan to utilize smart textile sensors [14, 30] on the robot (e.g. Nao), distributing the sensors on a wearable (Wearable Affective Interface, or WAffI; see Lowe et al. [34]) in accordance with our findings. Different textiles may also affect the extent to which human subjects utilize particular touch types, e.g. squeeze or press, as a function of the elasticity of the material. Further studies are also required to take into account the mitigating effects of environmental settings on the HRI. Nevertheless, we believe that our findings, presented in this article, as well as those in Lowe et al. [34], can directly influence the positioning, selection, and development of tactile sensors for robots, and possibly other artefacts. Finally, we see potential to investigate interaction regarding specific emotions in more depth. This is particularly relevant where the use of a robot may make participants feel more comfortable when communicating some emotions (such as Love) than when communicating the same emotion to another human stranger.