1 Introduction

Touch plays a crucial role in building emotional connections during human development and human-human interaction [22, 29]. One natural touch mechanism is lighthearted hand-to-hand contact, from hand-clapping games between children to high fives among teammates or coworkers. Accordingly, as robots enter more human-populated environments and seek to connect with people, we anticipate that physical human-robot hand-to-hand touch will be an important future source of mutual human-robot learning and bonding (Fig. 1). The widely varying kinesthetic and tactile aspects of these interactions result in a rich but complicated design space for human-robot interaction (HRI) researchers. Previous social-physical HRI (spHRI) studies have compared human-robot handshake behavior [1, 3], object handover modes [8, 18], and other human-robot tactile interactions further described in Sect. 2.

Our previous work details how we created a safe and capable hand-clapping robotic system by developing hand-clapping trajectory models and hand-contact detection strategies for the Rethink Robotics Baxter Robot [15], leading to the prototype used in the reported study. Here, the term “hand-clapping” refers to tempo-matching hand-to-hand contacts between two agents. In the style of early research steps in the aforementioned related literature, this article focuses on the thorough analysis of an initial human-robot experiment. After testing with one user in [15], we designed this experiment to expose many more users to our new hand-clapping robot behavior while investigating the implications of different robot contact acknowledgment modes, trajectory variability modes, and stiffness settings during interactions at various tempos.

Preliminary results from this study were reported in [16], looking only at users’ overall qualitative reactions to the robot. This article augments these initial findings by examining the effects of the different tested conditions and by analyzing additional data gathered throughout the investigation. This controlled initial experiment is an ideal springboard for understanding more complex future human-robot hand-clapping game interactions like “Pat-a-cake” and “Slide.” Other ongoing aspects of this work explore the usefulness of these hand-clapping games in interaction scenarios like promoting bonding before human-robot teamwork [13] and encouraging older adults to stay active by carrying out light exercise in their homes [12].

After summarizing related work (Sect. 2), this article describes how we tested our hand-clapping robot’s abilities by conducting a user study (Sect. 3), extracting results from the experiment data (Sect. 4), considering the findings alongside the limitations of this research (Sect. 5), and summarizing overarching lessons (Sect. 6). We believe this research will aid and inform others who are interested in designing complex spHRI.

2 Related Work

Our work builds on two main areas: social robotics and physical human-robot interaction (pHRI). Combining these two fields creates the potential for emotional interaction augmented by a direct physical connection, as surveyed in [2]. Classic examples of effective social-physical robots are the robotic creature [33], the Huggable [30], and Paro [28]. In its various possible application areas, spHRI can serve functions from enhancing mental healthcare treatment [25] to helping a robot more fully comprehend human intention [19]. spHRI can even encourage communication from developmentally delayed children who otherwise would be unlikely to interact [26]. Our work looks to leverage the advantages of spHRI through human-robot hand clapping.

Fig. 1

The Baxter robot clapping hands with a person

As a close parallel to our empirical investigation of hand-clapping HRI, research on human-robot handover tasks has illustrated that human users prefer minimum-jerk trajectories along with other human-inspired robot motion controllers [8, 18]. Human-robot handshake experiments have demonstrated ways to shape human-like robotic handshake algorithms using pure imitation or machine learning [3] and by leveraging affective interaction design [1]. Although one study of robot touch in a medical setting indicates that users prefer pragmatic robot touch (washing the skin) to affective robot touch [7], we believe well-designed playful robot touch in tempo-matching clapping games may be a more natural-feeling interaction than a comforting pat from a robot in a mock medical setting. Like our work, all of these projects used empirical HRI experiments to compare subjective user responses to different robotic interaction modes.

Our experiment explores several interaction modes akin to those investigated in [1], a similar study of affective spHRI that focused on human-robot handshakes rather than hand clapping. In combination with investigations from the field of haptics, the affective handshake experiment inspires us to push the bounds of spHRI. Some haptics research delineates the perceived human- or robot-likeness of different robot control methods [7], including human motion data-driven strategies like the one used by our robot. Other work related to altering human-robot hand contact sensations includes studies on realistic contact rendering in virtual environments [11] and haptic illusions of softness [31]. This article pursues the understanding of haptic interactions with hand-clapping robots by investigating variables such as arm stiffness and system behavior during hand contact.

Although we believe our work is the first research focused on human-robot hand-clapping games, some past studies explored similar human-human activities. For example, investigations of human-human social motor coordination reveal that people are likely to engage in hand-clapping games throughout many stages of life and are skilled at synchronizing during this type of activity [27]. The study of jointly improvised motion lends similar insights about human-human synchronization and additionally reveals that expert improvisers may outperform others in some social motor coordination tasks [24]. Separately, [4] proposes a video game that uses electrodermal activity-sensing controllers to detect hand-to-hand contact between players for more enjoyable gameplay. In a similar way, [21] outlines the design and testing of an electrodermal activity-sensing wrist-worn watch designed to increase intimacy in a workplace environment.

HRI studies on other play applications also shaped our research. In educational environments, playful social robots have been shown to improve English language learning and help researchers estimate human friendship [20]. Qualitative evaluations helped researchers design affective understanding and reactions for the Haptic Creature [34]. Researchers have also observed natural human-robot play for social machine-learning applications using a small humanoid [9]. Practices in these previous studies indicate that we should collect qualitative data alongside quantitative subjective and sensor data in our initial hand-clapping HRI investigations.

3 Methods

We ran an experiment with the Rethink Robotics Baxter Research Robot to explore how human users perceive robotic hand-clapping playmates. Here, we were especially interested in how various robot interaction modes would affect impressions of robot safety and attributions of emotion to the robot. Accordingly, we gathered users’ subjective responses to different styles of robot behavior while recording Baxter’s sensor data and full-face video of the participant throughout the study. In this initial work, we selected a coarse sampling of conditions over the design space of possible robot interaction modes to gain a general understanding of this previously unexplored area. The Penn IRB approved all experimental procedures under protocol 823886. The video included as supplementary material complements this Section’s written explanation of the study methods.

3.1 Robotic Platform

We conducted this experiment using Baxter, a human-sized humanoid robot designed for interactive factory tasks. Baxter has a torso with two 7DOF arms, interchangeable grippers, and a panning head screen, as seen in Fig. 1. Although its intended use is pHRI in a factory setting, Baxter’s humanoid design makes it an ideal candidate for spHRI. The robot’s series elastic actuators, impact-absorbing shells, and fully backdrivable joints allow it to safely contact people in social scenarios. Further advantages stem from its standard Robot Operating System (ROS) framework and relatively affordable price (\(\sim \)$25,000).

Fig. 2

Upper left: Baxter’s built-in finger alignment rails with regularly spaced threaded holes. Lower left: our fabricated Baxter hand with \(\hbox {M}4\times 0.7\) screws compatible with the alignment rails. Right: the custom end-effector mounted on Baxter’s arm

The commercially available Baxter parallel-jaw end-effectors proved unsuitable for hand-clapping interactions, so we developed custom 3D-printed non-articulating hands with inlaid silicone rubber contact pads, as shown in Fig. 2 [15]. Part files for our custom hands are available under a Creative Commons license at http://www.thingiverse.com/thing:2286104. These end-effectors are the size of an average human hand to facilitate comfortable handclaps. The inlaid rubber protrudes beyond the hard plastic of the 3D-printed hand core so that hand-to-end-effector contact feels comfortable and mimics the sound of a human handclap.

3.2 Experiment Setup

Twenty participants (12 male and 8 female) enrolled in our study, gave informed consent, and successfully completed the experiment. Participants ranged from 19 to 38 years (\(\hbox {M} = 27.2\,\hbox {years}\), \(\hbox {SD} = 5.0\,\hbox {years}\)) and were mostly technical students (17 students, 1 educator, 1 postdoctoral researcher, 1 homemaker). Each person came to the lab for a single session that lasted about 1 h. The participant stood facing Baxter throughout the experiment and engaged in repeated palm-to-palm contacts between their left hand and the robot’s right end-effector, in the style of hand-clapping games. We designed this left-handed interaction to increase participant mental load during the experiment; all but one of the participants were right-handed. To keep participants as safe and comfortable as possible, the human-robot clapping location remained consistent and therefore predictable to users throughout the entire study. Because of this consistent clap location, it was easy for participants to step away from the claps or exit Baxter’s workspace if they ever wanted a break for any reason. This experiment examined only one type of hand-clapping motion to increase the likelihood of successful execution by participants, but we are exploring other motions in ongoing work.

At the beginning of each session, the experimenter introduced Baxter to the participant and led the user in a practice human-human round of the hand-to-hand contact involved in experiment trials. Next, in each of 24 randomly ordered human-robot interaction trials, Baxter began to move its right end-effector along a hand-clapping trajectory at one of three possible hand-clapping frequencies (60, 110, or 160 beats per minute (BPM)). We chose these specific tempos based on our past human-human hand-clapping study [15]. In that previous work, the fastest two tempos were so stressful that the temporal demand of keeping up dominated the interaction experience. Accordingly, we chose the three remaining tempos from that work that were neither boring nor unduly stressful.

The participants started to play each trial’s clapping game by contacting the robot after they believed they understood the robot’s intended tempo and continued hand clapping for about 20 s, after which Baxter returned to its starting pose. We asked participants to try to maintain the constant tempo set by the robot at the start of each interaction. A final human-robot interaction trial allowed users to verbally select their favorite robot behavior mode and clap hands with the robot in that mode for as long as they desired.

3.3 Conditions

This experiment was designed to explore what people think about hand-clapping robots and quantify how different visual and haptic variables affect how users feel about the interaction. Our previous observations of human-human hand-clapping interactions focused on arm behaviors [15] and overall human emotional reactions [16]. Accordingly, to keep participants engaged and explore themes similar to the foci of our human-human observations, we designed different styles of Baxter facial animation, arm trajectory, and arm control. The following subsections detail how we designed each independent variable. To the best of our ability, we kept all other aspects of Baxter’s behavior the same from trial to trial.

Fig. 3

Left: the default mildly positive Baxter facial expression. Right: the responsive facial expression used to animate the facially reactive robot

3.3.1 Facial Animation

Facial expressiveness can greatly affect human perception of a robot [14]. To begin exploring how differences in Baxter’s face affect users, we designed facially reactive and facially nonreactive robot modes. It is important to note that this experiment considered facial responsiveness mainly as a way to signal human hand impact awareness and did not attempt to explore the large design space of Baxter facial expressiveness. Our work in [14] more rigorously assesses Baxter face image effects. Figure 3 illustrates the two expressions used to create the different robot face modes. These Baxter facial animation frames are consistent with several principles of robot face design discussed in [10]: wide facial proportions, more detail in the eyes than any other facial feature, and presence of a mouth. The face design is intended to convey mild, but not uncanny, humanness.

In facially nonreactive mode, the robot’s screen remained on the mildly positive face image at all times. In facially reactive mode, the mildly positive face was the default screen animation, and the responsive face appeared for 0.2 s after each detected hand contact. We identified hand impacts using the accelerometer in Baxter’s moving hand [15]. Specifically, we thresholded the result of filtering the x-axis of the right wrist accelerometer using a discrete-time first-order Butterworth high-pass filter with a cutoff frequency of 25 Hz. The threshold was set to 4.0, 5.0, and \(5.5\,\hbox {m/s}^2\) for the clapping tempos of 60, 110, and 160 BPM, respectively, to balance false positives with false negatives.
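The contact detector described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code that ran on the robot. The sampling rate passed in as `sample_rate_hz` is a placeholder because the text does not state the accelerometer rate, the filter coefficients come from a standard bilinear-transform design of a first-order Butterworth high-pass filter, and thresholding the magnitude of the filtered signal on its rising edge is an assumption about how the comparison was applied. The function names are hypothetical.

```python
import math

# Thresholds in m/s^2 for each tempo in BPM, as reported in the text.
CONTACT_THRESHOLDS = {60: 4.0, 110: 5.0, 160: 5.5}


def highpass_coefficients(cutoff_hz, sample_rate_hz):
    """First-order Butterworth high-pass via the bilinear transform.

    Returns (b0, a1) for y[n] = b0 * (x[n] - x[n-1]) - a1 * y[n-1].
    """
    k = math.tan(math.pi * cutoff_hz / sample_rate_hz)  # prewarped cutoff
    b0 = 1.0 / (1.0 + k)
    a1 = (k - 1.0) / (1.0 + k)
    return b0, a1


def detect_contacts(accel_x, tempo_bpm, sample_rate_hz, cutoff_hz=25.0):
    """Return sample indices where the filtered signal first exceeds the threshold.

    Assumption: the comparison uses the magnitude of the filtered x-axis
    acceleration, and only the rising edge of each excursion counts as a contact.
    """
    b0, a1 = highpass_coefficients(cutoff_hz, sample_rate_hz)
    threshold = CONTACT_THRESHOLDS[tempo_bpm]
    contacts = []
    # Start from the first sample so a constant offset causes no start-up transient.
    prev_x = accel_x[0] if accel_x else 0.0
    prev_y = 0.0
    above = False
    for i, x in enumerate(accel_x):
        y = b0 * (x - prev_x) - a1 * prev_y
        is_above = abs(y) > threshold
        if is_above and not above:
            contacts.append(i)
        above = is_above
        prev_x, prev_y = x, y
    return contacts
```

In this sketch, a constant acceleration offset is removed by the high-pass stage, so only abrupt changes such as an impact can cross the tempo-specific threshold.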

Fig. 4

Illustrative examples of physically unreactive versus physically reactive 60 BPM trajectories in Baxter W1 joint space. Each plot includes markers indicating hand contacts and the resulting trajectory after hand contact

3.3.2 Trajectory Variation

A robot that can dynamically vary its hand trajectory based on human interaction behaviors may be perceived differently than a fixed-trajectory robot. Accordingly, another salient haptic interaction aspect to explore was the physical arm trajectory logic. Our previous investigation informed the employed strategies of hand-clapping robot movement and contact detection [15]. We found human motion to be generally sinusoidal and designed our robot to move with this same default behavior, as depicted in the top subplot of Fig. 4. Our previous work revealed a negative correlation between human hand-clapping frequency and amplitude [15], which we used here to determine appropriate amplitude values for each clapping frequency. Throughout experiment trials, only the wrist pitch (W1) joint of Baxter’s right arm was commanded to move. The other joints were commanded to stay stationary, but their series elastic actuators are always soft, so the wrist motion caused passive natural-looking motion from the rest of the arm. The motion equation for the nonreactive mode of the robot’s only active W1 joint was as follows, using variables for desired joint angle (\(\theta _{d}\)), amplitude of motion (A), frequency of hand contacts in Hz (f), time (t), initial joint angle (\(\theta _\mathrm {init}\)), and a factor to keep handclap location the same regardless of tempo (\(a_\mathrm {comp})\):

$$\begin{aligned} \theta _{d} = A\sin (2\pi f t + (\pi /2))-A+\theta _\mathrm {init}+a_\mathrm {comp} \end{aligned}$$
(1)

The \(\pi /2\) shift causes Baxter’s wrist to start at the farthest retreat location compared to the human partner.
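As a small worked example, Eq. (1) can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the amplitude and initial angle used in the usage lines are made-up values, since the text does not list the amplitudes chosen for each tempo.

```python
import math


def desired_w1_angle(t, tempo_bpm, amplitude, theta_init, a_comp=0.0):
    """Desired W1 joint angle from Eq. (1) for the fixed-trajectory mode.

    t          -- time in seconds since the start of the motion
    tempo_bpm  -- hand contacts per minute; converted to f in Hz
    amplitude  -- A, in radians (hypothetical value supplied by the caller)
    theta_init -- initial joint angle, in radians
    a_comp     -- offset that keeps the handclap location fixed across tempos
    """
    f = tempo_bpm / 60.0  # frequency of hand contacts in Hz
    return (amplitude * math.sin(2.0 * math.pi * f * t + math.pi / 2.0)
            - amplitude + theta_init + a_comp)


# Illustrative values only (not taken from the study).
start = desired_w1_angle(0.0, 60, amplitude=0.2, theta_init=0.5)
half_cycle = desired_w1_angle(0.5, 60, amplitude=0.2, theta_init=0.5)
```

With these inputs, the angle at \(t = 0\) equals \(\theta _\mathrm {init} + a_\mathrm {comp}\) (the retreat extreme), and half a period later it has moved by \(2A\) to the opposite extreme of the cycle.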

In contrast, the robot’s variable trajectory mode control strategy reacted dynamically. Whenever Baxter detected a handclap in this mode, its governing algorithm fit a cubic polynomial trajectory back to the extreme retreat location, as shown in the lower portion of Fig. 4. The start and end points of the trajectory were known because the start was simply the position and time of contact detection and the end was the farthest retreat location at a time calculated in order to maintain a constant robot hand-clapping frequency; the desired start and end velocities were zero because they represented instants of motion direction change. After achieving the retreat position, the robot returned to a sinusoidal approach trajectory until the next handclap. Later discussion will note that hand contacts very early in a motion cycle mandated a small cubic polynomial slope and thus slowed Baxter motion to maintain a constant clapping frequency.
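The retreat segment can be illustrated with a cubic that has zero velocity at both ends. This is a sketch under stated assumptions, not the study's implementation: the function names are hypothetical, and the rule used here for the end time (the next multiple of the clapping period, which is when the sinusoid of Eq. (1) would next reach its retreat extreme) is one plausible reading of "a time calculated in order to maintain a constant robot hand-clapping frequency."

```python
import math


def next_retreat_time(t_contact, tempo_bpm):
    """Assumed end time: the next multiple of the clapping period after contact."""
    period = 60.0 / tempo_bpm
    return (math.floor(t_contact / period) + 1) * period


def cubic_retreat(t, t_contact, theta_contact, t_retreat, theta_retreat):
    """Cubic from (t_contact, theta_contact) to (t_retreat, theta_retreat).

    Start and end velocities are zero. Returns (angle, angular_velocity).
    """
    duration = t_retreat - t_contact
    s = min(max((t - t_contact) / duration, 0.0), 1.0)  # normalized time in [0, 1]
    blend = 3.0 * s ** 2 - 2.0 * s ** 3                  # 0 at s = 0, 1 at s = 1
    blend_rate = 6.0 * s * (1.0 - s) / duration          # zero at both ends
    delta = theta_retreat - theta_contact
    return theta_contact + delta * blend, delta * blend_rate
```

The peak speed of this cubic occurs at its midpoint and equals \(1.5\,|\Delta \theta |/T\), where \(T\) is the segment duration. An early contact lengthens \(T\) and shortens the remaining distance, which is consistent with the slowed motion noted above.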

3.3.3 Stiffness

Stiffness has been shown to influence how people perceive social-physical robots [1]. To explore the effects of robot arm stiffness in spHRI and keep users engaged with a diversity of interaction experiences, the final haptic condition that we varied was the stiffness of Baxter’s arm throughout gameplay. We employed different proportional gains in Baxter’s control law to accomplish varying stiffness. As developed previously [15], the overall time-domain control law of the robot’s motion in this experiment is:

$$\begin{aligned} \varvec{\tau }_{\mathrm {cmd}} = {\mathbf {K}}_{d}(\dot{\varvec{\theta }}_{d}-\dot{\varvec{\theta }}) + {\mathbf {K}}_{p}(\varvec{\theta }_{d}-\varvec{\theta }) - {\mathbf {K}}_{f}\varvec{\theta }_{d} + \varvec{\tau }_{\mathrm {gc}} \end{aligned}$$
(2)

where \(\varvec{\tau }_{\mathrm {cmd}}\) is a vector of torques commanded to each Baxter arm motor, \({\mathbf {K}}_{d}\) is a diagonal matrix of derivative gains, \(\varvec{\theta }_{d}\) is a vector of desired arm joint angles, \(\varvec{\theta }\) is a vector of actual joint angles, \({\mathbf {K}}_{p}\) is a diagonal matrix of proportional gains, \({\mathbf {K}}_{f}\) is a diagonal matrix of feedforward gains, and \(\varvec{\tau }_{\mathrm {gc}}\) is a vector of gravity compensation torques.

To maintain a consistent presented motion trajectory regardless of the trial conditions, we always used the same proportional gain (\(30\frac{\mathrm {Nm}}{\mathrm {rad}}\)) for the active W1 joint. For all other arm joints, we selected a lower proportional gain (\(15\frac{\mathrm {Nm}}{\mathrm {rad}}\)) to accomplish more compliant passive joint behavior and a higher proportional gain (\(60\frac{\mathrm {Nm}}{\mathrm {rad}}\)) for a more stiff arm joint behavior. In the equation above, \({\mathbf {K}}_{p}\) is the element that changes depending on the trial stiffness mode.
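To make the role of \({\mathbf {K}}_{p}\) concrete, the following sketch evaluates Eq. (2) joint by joint, which is valid because all gain matrices are diagonal. It is a minimal illustration, not the controller that ran on the robot: the function names are hypothetical, the joint ordering is an assumption, and the derivative gains, feedforward gains, and gravity compensation torques must be supplied by the caller because their values are not given in the text.

```python
# Assumed ordering of the seven arm joints; W1 is the only actively driven joint.
JOINTS = ["s0", "s1", "e0", "e1", "w0", "w1", "w2"]

KP_ACTIVE_W1 = 30.0                        # Nm/rad, same in every trial
KP_PASSIVE = {"low": 15.0, "high": 60.0}   # Nm/rad, set by the stiffness mode


def proportional_gains(stiffness_mode):
    """Diagonal of K_p for the given stiffness mode ('low' or 'high')."""
    passive = KP_PASSIVE[stiffness_mode]
    return [KP_ACTIVE_W1 if joint == "w1" else passive for joint in JOINTS]


def commanded_torques(theta_d, theta, dtheta_d, dtheta, kp, kd, kf, tau_gc):
    """Eq. (2) evaluated per joint; every argument is a list with one entry per joint."""
    return [
        kd_i * (dqd - dq) + kp_i * (qd - q) - kf_i * qd + g
        for qd, q, dqd, dq, kp_i, kd_i, kf_i, g
        in zip(theta_d, theta, dtheta_d, dtheta, kp, kd, kf, tau_gc)
    ]
```

For the same position error on every joint, the two modes differ only in the restoring torque of the six passive joints, while the W1 torque is unchanged.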

We investigated two levels of stiffness in our experimental design because of the clear role stiffness can play in strong and weak human-human hand-to-hand interaction; however, it is not certain that the trend between stiffness and user perception is monotonic. We thus discuss overall stiffness trends in this article using the assumption that the relationship between stiffness and participant perception is monotonic for the range of stiffness values tested. More evaluation would be needed to rigorously characterize this relationship.

3.4 Hypotheses

We selected the experimental conditions in anticipation that they would significantly affect the way users feel about clapping hands with Baxter. In particular, we hypothesized that:

  • H1: Users will perceive a robot mode with responsive facial animation to be more pleasant than a facially unresponsive robot.

  • H2: Users will perceive a variable trajectory robot to be less energetic, less dominant, and more safe than a robot that does not respond to their impact.

  • H3: Users will perceive a stiffer robot to be more dominant and less safe than a robot with lower arm stiffness.

Fig. 5

Adjectives used by participants in describing the hand-clapping experience, grouped by synonym. The size of the word reflects the frequency of use

3.5 Data Collection

During every human-Baxter interaction, we recorded all available data from the robot’s accelerometer, endpoint state, joint state, and face display ROS topics. We also recorded desired position, velocity, and feedforward torque at every timestep. The experiment was videotaped to enable retrospective review of the participants’ facial expressions and any other notable experiment events.

Participants completed several surveys: (1) a robot evaluation survey after hearing introductory information about Baxter, (2) a hand-clapping game evaluation survey after practicing the game with the experimenter, (3) a subjective perception survey after each of the randomly ordered 20-s trials, (4) a concluding survey after the final unlimited interaction trial, and (5) a basic demographic survey after the concluding survey. The first two surveys involved only slider-type parametric questions. Survey (3) is analyzed extensively in Sect. 4, and thus the complete survey questions appear below:

  • Please rate where the robot falls on this safety scale: (slider, “very unsafe to use” to “very safe to use”)

  • Please rate where the robot falls on this pleasingness scale: (slider, “very displeasing” to “very pleasing”)

  • Please rate where the robot falls on this energeticness scale: (slider, “not energetic” to “very energetic”)

  • Please rate where the robot falls on this dominance scale: (slider, “very submissive” to “very dominant”)

  • Why did you select these ratings? (extended text response)

The concluding survey contained a combination of slider and extended response questions.

Surveys for the experiment were carefully designed based on precedents in HRI research. The subjective trial perception survey leveraged the questionnaires used in [1] to evaluate robot engagingness based on the PAD (Pleasure, Arousal, Dominance) emotional state model [23] and also assessed perceived robot safety. Questionnaires (1), (2), and (4) were adapted from the Unified Theory of Acceptance and Use of Technology (UTAUT) and other metrics employed in [32] and [17].

4 Results

Qualitative and quantitative human user responses to interacting with Baxter, combined with recordings from robot sensors and experiment video footage, help us answer the question of how it feels to clap hands with a robot. The results from each type of data are detailed in this Section.

4.1 Qualitative Results

We first analyzed the extended text responses to each Baxter experiment trial, as well as other experience-related metrics.

4.1.1 Promising Aspects

Study participants supplied us with a wealth of descriptive responses to the randomly ordered interaction trials. The word cloud in Fig. 5 presents all adjectives used by participants in the trial survey free-response field after interactions during which no errors occurred. (Errors are described in more detail in Sect. 4.1.2.) Since each participant expressed a clear like and dislike of different trials, we find a balance of positive and negative descriptors in the word cloud. It is important to note that the large word “slow” always referred to the trial tempo, never the robot motion or responsiveness capabilities. Every user identified at least one interaction mode they enjoyed, as well as some that they did not like. Their preferences were not uniform. Some polarized opinions indicated that individualized interaction models, particularly with respect to tempo, may work best for different users.

Table 1 Excerpts of each extended response entry linked to each observed type of experiment error

Users frequently remarked positively on Baxter’s facial reactivity in survey responses and verbal commentary, labeling it with descriptors like “cute,” “funny,” “friendly,” “pleasant,” and “personable.” Conversely, a lack of animation was often described as less pleasant, less sentient, and more robotic. This result was not completely universal; one user believed Baxter’s responsive face to be an expression of pain, and another liked the facial animation at first but labeled it as “silly” near the end of the experiment.

In the final free interaction trial, users interacted with Baxter for eighteen or more seconds (\(\hbox {M} = 43.95\,\hbox {s}\), \(\hbox {SD} = 29.17\,\hbox {s}\)). Each free interaction thus lasted approximately as long as, or longer than, the fixed-length trial interactions of about 20 s. Based only on brief verbal descriptions of the options, without linking the words to particular observed behaviors, users selected the following customized robot behaviors for their free interaction trials: 20/20 selected facially animated, 15/20 selected variable trajectory, and 9/20 selected high stiffness. For tempo, 1 selected 60 BPM, 14 selected 110 BPM, and 5 selected 160 BPM. This variety of choices indicates that we aptly designed diverse robot interaction behaviors.

Fig. 6

Survey responses to analogous pairings of robot-related questions on the initial robot evaluation and concluding surveys. In each plot, the top box plot represents the participant responses to the question on the robot evaluation survey, and the bottom box plot represents the responses on the concluding survey. The center box line represents the median, and the box edges are the 25th and 75th percentiles. The whiskers show the range up to 1.5 times the interquartile range, and outliers are marked with a “\(+\)”. The question coding abbreviations stand for cultural context (CC), forms of grouping (GR), self-efficacy from UTAUT model (SE), and attachment (ATT)

4.1.2 Error Accounts

Although eight of the 20 study participants reported never perceiving Baxter to have made any error, other participants encountered some challenges and errors during the 500 total trials of the experiment, as summarized in Table 1. In the fastest-tempo physically reactive mode, seven participants responded to Baxter’s retreat reaction by sometimes contacting the robot’s end-effector earlier and earlier in its motion cycle (12 total trials). Because the retreat motion was capped at the far retreat position, this user strategy resulted in a decrease in robot motion amplitude and, in some extreme cases, momentary periods of Baxter stillness, which sometimes perplexed users. Reactions to the robot stopping varied from satisfaction and acceptance to displeasure and distrust.

Next, thresholding the accelerometer data produced six observable false positive hand contact detections, most often in trials with a physically responsive trajectory and low arm stiffness. In these cases, vibrations from the robot’s own movement made it think the user had contacted its hand when they had not. This error bothered some users, but not others. Perhaps because this error is a natural-feeling and humanlike mistake, only two of the false positive incidents resulted in negative user commentary.

A final robot behavior problem stemmed from controller gains; five participants were able to exert enough axial torque on Baxter’s end-effector to cause surprising, “jazz hands”-like oscillations in Baxter’s wrist roll (W2) joint motion (15 total trials). Users disliked this error type; the mechanical “jazz hands” behavior seemed unusual and was almost unanimously described as problematic in survey comments.

Human users also made sixteen errors throughout the experiment, namely failing to match Baxter’s fastest clapping frequency or misunderstanding Baxter’s slower frequencies. Users had a balance of positive and negative responses to these trials as well. The various human errors may be avoided in the future with better experiment design.

Table 2 Responses occurring at least twice in concluding survey essay responses

4.1.3 Overall Impressions

Despite these occasional errors, participants’ opinions of Baxter did not change significantly in a positive or negative way over the course of the experiment, as illustrated in Fig. 6. The statements in the subplot titles appear exactly as shown to participants; no additional explanations were provided. Paired t-tests reveal no significant difference between each survey question’s pair of before and after responses (all \(p > 0.23\)). A comparison of before and after survey responses grouped by general category (cultural context, forms of grouping, self efficacy, and attachment) similarly yielded no statistically significant differences (all \(p>0.27\)), although the median response in all categories except self-efficacy increased in the post-interaction survey.

We also see some consensus in positive and negative remarks in the concluding survey essay responses. Table 2 illustrates a balanced set of experiment feedback, including praise of interaction modes alongside critiques indicating that we need more customizability in the robot behaviors. In a final essay question, nineteen of the 20 users identified a personal interest in interacting with Baxter in some way. Use ideas included experiment-like tasks such as playing more complicated hand-clapping games, collaboratively manipulating objects, and doing arm exercises; chore-like tasks such as cooking, washing dishes, doing laundry, and cleaning bathrooms; and social tasks like performing music/dance, playing sports/board games, drinking beer, and socializing.

4.2 Trial Survey Results

We next focused on analyzing the subjective trial questionnaire results to understand how specific interaction factors affected the hand-clapping gameplay experience. Our main tool throughout this process was repeated measures analysis of variance (rANOVA), a statistical method that enables us to determine whether the presented visual and haptic conditions affected user perceptions of Baxter’s pleasantness, energeticness, dominance, and safety.

The intended within-subject factors for our rANOVA were presence or absence of facial reactivity, presence or absence of physical reactivity, and low or high arm stiffness (a \(2 \times 2\times 2\) design space). While testing for other significant conditions, we discovered that tempo was also highly influential, although the three different tempos used in the experiment were originally intended to serve as three repetitions of each experimental condition. After making this discovery, we included tempo as a within-subject factor, leading us to carry out a \(2\times 2\times 2\times 3\) four-factor rANOVA.
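The four within-subject factors above span the 24 trial conditions that each participant experienced. The sketch below enumerates that design space and produces one shuffled trial order. It is illustrative only: the factor labels and function name are hypothetical, and the study does not describe how its random ordering was generated, so the seeded shuffle is an assumption.

```python
import itertools
import random

# Levels of each within-subject factor, as described in the text.
FACTORS = {
    "facial_reactivity": (False, True),
    "trajectory_variation": (False, True),
    "stiffness": ("low", "high"),
    "tempo_bpm": (60, 110, 160),
}


def trial_order(seed=None):
    """Return all 2 x 2 x 2 x 3 conditions as dicts, in a shuffled order."""
    names = list(FACTORS)
    trials = [dict(zip(names, combo))
              for combo in itertools.product(*FACTORS.values())]
    random.Random(seed).shuffle(trials)  # seeded so an order can be reproduced
    return trials


trials = trial_order(seed=1)
n_conditions = len(trials)  # 2 * 2 * 2 * 3 = 24
```

Because every participant sees every combination exactly once, each cell of the four-factor rANOVA contains one rating per participant for each survey question.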

Table 3 The p values returned from rANOVAs on the safety and affect ratings
Fig. 7

Participant responses to trial survey questions, separated by all conditions except tempo. The center box line represents the median, and box edges are the 25th and 75th percentiles. The whiskers show the range up to 1.5 times the interquartile range, and outliers are marked with a “\(+\)”

The overall rANOVA statistical results appear in Table 3, and the box plots in Figs. 7 and 8 illustrate the different data partitions that result from sorting data by each within-subject factor. We fill in any boxes representing a pairing of data with significant differences, as determined by examining p values from our rANOVA tests at an \(\alpha = 0.05\) significance level. We report effect sizes for significant effects using \(\eta ^2\). These differences are referenced throughout the following discussion of condition results, which includes the testing of our hypotheses.

4.2.1 Facial Animation

Facial reactivity had significant positive effects on two ratings (Fig. 7). As predicted in H1, participants did perceive a facially responsive robot as more pleasant than a facially static one (\(\hbox {F}(1,19) = 9.17\), \({p} = 0.0069\), \(\eta ^2 = 0.0183\)). Users also saw the facially reactive robot as more energetic (\(\hbox {F}(1,19) = 6.06\), \(p = 0.0235\), \(\eta ^2 = 0.0036\)). Facial reactivity did not affect perceived safety or dominance.

4.2.2 Trajectory Variation

Physical reactivity (trajectory variation) significantly affected three ratings, meeting some of our expectations and overturning others (Fig. 7). We did not expect this condition to affect perceived pleasantness of the robot, but users reported that the variable trajectory robot was less pleasant than a robot that maintains the same trajectory regardless of perceived hand impacts (\(\hbox {F}(1,19) = 7.14\), \({p} = 0.0151\), \(\eta ^2 = 0.0234\)). Conversely, we correctly predicted in H2 that a physically reactive robot would appear less energetic (\(\hbox {F}(1,19) = 14.12\), \({p} = 0.0013\), \(\eta ^2 = 0.0223\)) and less dominant (\(\hbox {F}(1,19) = 7.65\), \({p} = 0.0123\), \(\eta ^2 = 0.0212\)). Trajectory variation did not affect perceived robot safety.

4.2.3 Stiffness

The results contradicted our predictions of stiffness effects on user perception (Fig. 7). Although we hypothesized in H3 that a stiffer robot would appear less safe and more dominant, participant responses revealed that users perceived a stiffer robot to be safer (\(\hbox {F}(1,19) = 6.93\), \({p} = 0.0164\), \(\eta ^2 = 0.0043\)) and less dominant (\(\hbox {F}(1,19) = 7.24\), \({p} = 0.0145\), \(\eta ^2 = 0.0055\)). Stiffness did not affect pleasantness or energeticness.

4.2.4 Tempo

As noted previously, tempo had significant effects on the results, evoking some of the strongest trends in user response (Fig. 8). Participants found robots interacting at different tempos to have significantly different energy levels (\(\hbox {F}(2,38) = 144.01\), \({p} < 0.0001\), \(\eta ^2 = 0.4215\)) and dominance levels (\(\hbox {F}(2,38) = 24.86\), \({p} < 0.0001\), \(\eta ^2 = 0.1340\)). A post-hoc Tukey multiple comparisons test shows that at each increasing tempo, Baxter seems significantly more energetic and more dominant. Tempo did not affect robot safety or pleasantness.

4.3 Robot Recording Results

Although the self-reported participant responses were informative, we were also interested in examining some more objective measures to see if they could help elucidate what happened during each trial. The data recorded from Baxter’s wrist accelerometer in particular allowed us to examine errors between the actual and intended tempo during each trial (a metric reflecting synchronization) and the peak contact acceleration during each hand impact (a proxy for peak contact force). As in our analyses of the trial survey responses, we use a \(2\times 2\times 2\times 3\) four-factor rANOVA at an \(\alpha = 0.05\) significance level for these tests. The overall rANOVA statistical results appear in Table 4, and the box plots in Figs. 9 and 10 illustrate the different data partitions that result from sorting recorded data by each within-subject factor.

Fig. 8 Participant responses to trial survey questions, separated by tempo condition. The center box line represents the median, and box edges are the 25th and 75th percentiles. The whiskers show the range up to 1.5 times the interquartile range, and outliers are marked with a “\(+\)”

Table 4 The p values returned by our objective measure rANOVAs

4.3.1 Synchronization Analysis

Synchronization metrics can help evaluate how well Baxter and the experiment participant were working together to accomplish the target clapping tempo during each interaction trial, revealing additional interactional effects of the different hand-clapping conditions. Accordingly, we calculated the error between each trial’s target inter-clap time interval (known from the trial tempo) and the actual intervals (found by computing the difference between recorded times of hand contact) for each experiment trial. To encompass a variety of descriptive information, we extracted the mean squared error (MSE) in timing for each trial and also gathered the statistical measures of median and standard deviation in timing errors per trial.
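The timing metrics described above can be sketched as follows. This is an illustrative reconstruction rather than the analysis code used in the study: the function name `timing_error_metrics` is hypothetical, the target interval is assumed to be one clap per beat (60 divided by the tempo in BPM), and errors are kept signed (actual minus target), which is also an assumption.

```python
import statistics


def timing_error_metrics(contact_times_s, tempo_bpm):
    """Per-trial timing error metrics from hand-contact timestamps.

    contact_times_s: detected hand-contact times in seconds, in order.
    tempo_bpm: trial tempo; assumes one clap per beat.
    """
    target_interval = 60.0 / tempo_bpm
    # Actual inter-clap intervals: differences between successive contacts.
    intervals = [b - a for a, b in zip(contact_times_s, contact_times_s[1:])]
    # Signed error: positive means the clap came later than intended.
    errors = [interval - target_interval for interval in intervals]
    return {
        "mse": statistics.fmean([e * e for e in errors]),
        "median": statistics.median(errors),
        "std": statistics.stdev(errors),
    }


# Hypothetical contact times for a 60 BPM trial (target interval 1.0 s).
metrics = timing_error_metrics([0.0, 1.0, 2.1, 3.0], tempo_bpm=60)
print(metrics)
```

Each trial yields one value per metric, so the three metrics can be entered into the rANOVA as separate dependent variables.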

The rANOVA on each timing error metric reveals that synchrony varied over some trial conditions (Fig. 9). Facial responsiveness did not affect trial timing. Physical responsiveness resulted in a greater median timing error (\(\hbox {F}(1,19) = 5.72\), \({p} = 0.0273\), \(\eta ^2 = 0.0137\)). A stiffer robot arm led to a greater timing MSE (\(\hbox {F}(1,19) = 5.79\), \({p} = 0.0265\), \(\eta ^2 = 0.0109\)) and also a greater standard deviation of clap timing errors (\(\hbox {F}(1,19) = 12.49\), \({p} = 0.0022\), \(\eta ^2 = 0.0218\)). Tempo also affected clap timing errors; median timing error dropped for higher tempos (\(\hbox {F}(2,38) = 11.23\), \({p} < 0.0001\), \(\eta ^2 = 0.0617\)). A post-hoc Tukey multiple comparisons test shows that the median timing error value was significantly lower for the 110 and 160 BPM tempos than for the 60 BPM tempo.

Fig. 9 Comparison of inter-clap interval timing errors for different trial conditions

4.3.2 Contact Accelerations

Differences in contact acceleration could indicate user comfort with the robot, but forces that are too high could become painful after prolonged exposure. Thus, observing how hard participants clapped with Baxter under different conditions can help us compare interaction modes. We located the times of hand contact in experiment recordings using the same accelerometer filtering technique described in Sect. 3.3. The acceleration readings at all the times of hand impact were extracted from each experimental trial recording. We then computed the median and standard deviation of the peak accelerations from each trial.
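As a rough illustration of the per-trial summary described above, the sketch below takes contact locations that have already been detected and extracts a peak acceleration for each one. The contact detection itself (the filtering of Sect. 3.3) is not reproduced here. The function name `contact_acceleration_metrics`, the use of sample indices, and the small search window around each contact (`half_width`) are assumptions introduced for this example.

```python
import statistics


def contact_acceleration_metrics(accel, contact_indices, half_width=2):
    """Median and standard deviation of peak contact accelerations.

    accel: accelerometer samples for one trial (one axis or magnitude).
    contact_indices: sample indices of detected hand contacts.
    half_width: hypothetical window (in samples) searched on each side
        of a contact index for the largest absolute reading.
    """
    peaks = []
    for i in contact_indices:
        lo = max(0, i - half_width)
        hi = min(len(accel), i + half_width + 1)
        peaks.append(max(abs(a) for a in accel[lo:hi]))
    return {
        "median": statistics.median(peaks),
        "std": statistics.stdev(peaks),
        "peaks": peaks,
    }


# Hypothetical trial with three contacts.
samples = [0, 0.1, 3.0, 0.2, 0, 0, -5.0, 0.1, 0, 4.0, 0]
summary = contact_acceleration_metrics(samples, [2, 6, 9], half_width=1)
print(summary)
```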

A rANOVA on each contact acceleration metric shows some statistically significant differences (Fig. 10). Facial responsiveness did not affect hand contact acceleration. Physically reactive trials were accompanied by a decreased median contact acceleration (\(\hbox {F}(1,19) = 7.38\), \({p} = 0.0137\), \(\eta ^2 = 0.0086\)). Trials with a stiffer robot arm displayed a decreased standard deviation in peak contact acceleration (\(\hbox {F}(1,19) = 7.37\), \({p} = 0.0137\), \(\eta ^2 = 0.0116\)). Trials with faster clapping tempos showed a higher median contact acceleration (\(\hbox {F}(2,38) = 16.58\), \({p} < 0.0001\), \(\eta ^2 = 0.0862\)). A post-hoc Tukey multiple comparisons test shows that the median acceleration for the 110 and 160 BPM tempos was significantly higher than for the 60 BPM tempo.

Fig. 10 Illustration of acceleration at hand contact over different types of trial

4.4 Video Recording Results

The main user behavior that we quantified via the experiment videos was the number of times the user looked at Baxter’s face throughout the trials. These glances may be a good proxy for social referencing, a phenomenon that occurs when one person looks to another person during an unfamiliar situation for guidance or reassurance [6]. Face referencing occurred in different ways from user to user, but the overall trends can help us understand the influence of Baxter’s face. An experienced rater watched all of the experiment video footage and tallied every event where the participant looked at the robot’s face. Figure 11 illustrates the distribution of this occurrence over different trial conditions. A statistical test reveals that only the facial responsiveness condition affects the facial reference count. Participants looked at Baxter’s face more often when it was facially animated (\(\hbox {F}(1,18) > 1000\), \(p < 0.0001\), \(\eta ^2 = 0.9999\)).

Fig. 11 Visualization of the number of face referencing occurrences between the participant and Baxter during each trial under different conditions. The circles indicate the positions of the medians

Although self-reported responses to the robot and hand-clapping experience were fairly positive throughout the study, we also sought a more objective measure of participant experience through the study videos. We analyzed the participant emotions exhibited in each video frame using the facial Action Unit (AU) extraction abilities of the OpenFace tool [5]. Figure 12 illustrates the total time that each participant exhibited all of the AUs indicative of each of five main human emotions: happiness, sadness, surprise, fear, and anger. The high occurrence of happiness supports the conclusion that participants enjoyed the study. A rANOVA on facial expression durations across emotions revealed that there is a significant difference in how long participants displayed different types of emotion (\(\hbox {F}(4,72) = 18.60\), \({p} < 0.0001\), \(\eta ^2 = 0.4542\)). A post-hoc Tukey multiple comparisons test revealed that participants displayed happy expressions significantly more than any other expression.
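The duration computation described above can be sketched as follows: a frame counts toward an emotion only when every AU associated with that emotion is active, and frame counts are converted to seconds using the video frame rate. The AU sets in `EMOTION_AUS` follow a commonly cited mapping from the facial action coding literature and are an assumption here, since the text does not list which AUs were used. The function name and the per-frame input format (a set of active AU numbers) are likewise hypothetical and would need to be derived from the AU presence output of the extraction tool.

```python
# Assumed AU combinations per emotion (not taken from the study).
EMOTION_AUS = {
    "happiness": {6, 12},
    "sadness": {1, 4, 15},
    "surprise": {1, 2, 5, 26},
    "fear": {1, 2, 4, 5, 7, 20, 26},
    "anger": {4, 5, 7, 23},
}


def emotion_durations(frames, fps):
    """Total seconds each emotion was displayed.

    frames: one set of active AU numbers per video frame.
    fps: video frame rate in frames per second.
    """
    counts = {emotion: 0 for emotion in EMOTION_AUS}
    for active in frames:
        for emotion, required in EMOTION_AUS.items():
            # Count the frame only if all required AUs are active.
            if required <= active:
                counts[emotion] += 1
    return {emotion: n / fps for emotion, n in counts.items()}


# Hypothetical five-frame clip at 10 frames per second.
clip = [{6, 12}, {6, 12, 25}, {1, 2, 5, 26}, set(), {4, 5, 7, 23}]
print(emotion_durations(clip, fps=10))
```

Under this mapping, a frame that satisfies the fear combination also satisfies the surprise combination, so a real analysis would need a rule for such overlaps.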

Fig. 12 Amount of time each participant displayed the analyzed core human emotions

5 Discussion

The results of this human-robot experiment begin to answer the question of how it feels for human users to clap hands with Baxter. Our analyses focused especially on the effects of Baxter’s facial responsiveness, physical responsiveness, and stiffness during interaction trials. Although the results confirmed some of our hypotheses, other expectations were incorrect, and the data revealed several unpredicted effects.

User impressions of Baxter, as indicated by the responses to the UTAUT-inspired survey, did not change significantly from before to after the experiment. We interpret this consistency to signify that although this version of our hand-clapping robot made some errors, it is worth continuing to develop improved system behaviors. The fact that we purposely tested some “bad” robot modes may have also contributed to the lack of change between the before and after UTAUT responses.

Our findings upheld H1’s prediction that a facially animated robot would appear more pleasant. The additional effects of facial responsiveness outside of those predicted by H1 appear logical; increased energeticness of a facially animated robot may stem from extra dynamism and appearance of social intelligence. When asked if Baxter’s facial expression changed throughout the experiment, three of the 20 participants reported never noticing a change. Nevertheless, the statistical results with or without these three individuals remain the same. Users also glanced more at the responsive face.

For the physically responsive robot, H2 correctly forecasted a decrease in energeticness and dominance ratings. We believe the lower pleasantness ratings for the physically reactive robot arose from some confusion about how to interact with this robot mode. In the fastest-tempo physically reactive mode, several participants responded to Baxter’s retreat reaction by contacting the robot’s end-effector earlier and earlier in its motion cycle, which caused slow motion and some momentary periods of Baxter stillness. Some users also felt confusion due to the unannounced switch from robot leadership (in defining the interaction tempo initially) to human leadership (in defining contact position during physically reactive trials). Some users seemed perplexed as a result, so the overall safety ratings did not match the positive safety trend predicted by H2.

The stiffness results, all contrary to H3, most likely arose from the fact that a higher proportional gain in the controller causes the robot to more closely follow its programmed trajectory. Baxter’s movement at low stiffness was more likely to vary and, accordingly, to give the appearance of dominant volition and reduced safety. Similarly, mechanical errors arising from wrist roll joint stiffness were less likely to be forgiven than other more human-like errors that occurred in the experiment. People seem to prefer a robot that moves predictably.

Tempo results all appear logical: a faster robot is literally more energetic and harder to stop. As similarly seen in other emergent HRI work [35], the timing of Baxter’s motion can strongly influence how energetic and dominant it seems. The contact acceleration of handclaps also increases significantly as tempo increases. Although trials at different tempos were originally intended to be repetitions of conditions, some of the strongest user rating and robot recording differences came from tempo differences.

Other informal observations throughout the experiment indicated positive aspects of the designed robotic system and experimental interaction. For example, all enrolled participants were comfortable enough with Baxter’s appearance to consent to the experiment. The experimental activity was also sufficiently captivating so that all users completed the full experiment, even in this extremely repetitive use case. Although the robot made some mistakes, safety ratings of the robot were still generally high. When asked if the robot made any type of mistake throughout the entire experiment, eight of the 20 participants believed the robot’s behavior to be perfect. Additionally, users enjoyed Baxter’s facial expressions and often attributed inventive, non-existent sensing intelligences to the robot.

In contrast, the study design had some shortcomings. The user population in this study was fairly small and was composed of mostly young technical students. The experiment also took place in a controlled lab setting. To ensure that the results apply to a broader population in more general settings, we would need to run a study with more balanced representation from additional groups in uncontrolled everyday environments. Additionally, the single-handed nature of the experiment interactions is somewhat limiting. To gain more understanding of practical human-robot interactions for everyday use, we will need to expand to more diverse and relevant two-handed interactions. Furthermore, we relied on self-reported measures for our affect analysis, rather than assessing engagement-related metrics through more objective data-driven means such as facial expression analysis. Finally, although the experimenter did their best to convey equipoise throughout the study, the within-subjects design may have exaggerated differences between conditions due to demand characteristics.

Our next research steps will address the encountered issues for improved future experimental design and robot motion. Most user confusion and some of the reactivity mode issues could be addressed by giving the user more instruction and practice trials at the beginning of the study. Other improvement steps include defining leadership roles more consistently throughout each trial and modifying robot reaction behavior to allow for constantly-adapting retreat positions. We can rectify the “jazz hands” oscillation issue by redesigning the end-effector to make it more difficult for users to exert axial torque and also re-tuning the controller with increased W2 joint damping. In future experiments, we plan to recruit a more diverse pool of participants and explore additional interaction engagement metrics in both within-subjects and between-subjects paradigms.

6 Conclusion

Overall, we are energized by the positive reactions to this study and are eager to create an improved version of this system. The affective results give us a guide to manipulating the emotional experience of robot users by adjusting known parameters. The disparate effects of facial animation and arm stiffness hold particular potential for changing specific aspects of user experience; experimenters can adjust the pleasantness and energeticness of a robot by altering only the robot’s facial reactivity while separately manipulating the robot’s safety appearance and dominance by changing the stiffness of passive arm joints. Robot timing had a strong effect on user perceptions of the robot as well. Particularly, a faster interaction tempo can make the robot seem more energetic and more dominant as needed. Now that we know how it feels to clap hands with a robot, this initial human-robot investigation will inform future work on our hand-clapping robotic system in various playful HRI settings. Our findings, especially those elucidating how to tune different interaction parameters to shape an affective HRI experience, may help other researchers in their own spHRI work.