Building our research on and relating it to existing knowledge is the building block of all academic research activities, regardless of discipline [227], p. 333.

1 Introduction

Due to demographic change and the related scarcity of skilled workers [25], as well as the increasing technological penetration of our private and working lives [232], social robots and the success factors of human–robot interaction (HRI) are becoming increasingly important. HRI and research on it represent a multidisciplinary field that involves “the study of the humans, robots, and the ways they influence each other” ([84], p. 257). HRI research brings together various disciplines, such as robotics, engineering, computer science, human–computer interaction, cognitive science, and psychology [13]. Across these domains, a growing body of research focuses on human interactions with social robots. These robots “exist primarily to interact with people” ([142], p. 322) or evoke social responses from them [158]. Social robots appear in numerous roles, such as museum guides [226], receptionists [235, 236, 239], educational tutors [114, 127], household supporters [219], and caretakers [56, 114, 148, 252].

Due to their automated social presence, such robots make humans “feel that they are in the company of another social entity” ([277], p. 909). Social presence is often associated with a robot’s ability to express artificial emotions and facilitate social relationships [126]. Emotional signals have been shown to be important factors in human–human relationships [237], and emotions expressed by robots increase humans’ perceptions of the transparency of the HRI. Furthermore, these signals allow humans to interpret robotic behaviors using well-known social cues, which they have learned from prior human–human interactions [75]. As social robots gain the capacity to approximate humans more closely, their emotional expressions increasingly facilitate social HRI [37, 107, 204]. Research on these expressions thereby contributes to robotic psychology, an interdisciplinary field that examines emotional, cognitive, social, and physical human responses to human–robot interactions while also considering physical and social environments. This overview seeks to synthesize research knowledge pertaining to emotions during HRI with social robots.

As the popularity of social robots continues to rise [124, 215], research into their displayed emotions has also accelerated; between 2000 and 2020, more than 1600 publications appeared in this area. The publication rate indicates a continuous increase across the two decades, as Fig. 1 shows.

Fig. 1 Publication rates of studies that examine emotions during HRI (source: dimensions.ai)

Empirical studies of emotions during HRI mainly revolve around three topics: (1) emotion expression by robots, (2) human recognition of robotic emotions, and (3) human responses to robotic artificial emotions. Some conceptual/overview articles, studies on emotion recognition during HRI, and publications in related fields also can inform the current literature synthesis. The complexity and fragmentation of this research domain make it challenging for researchers to keep up with state-of-the-art findings. Furthermore, it is difficult to grasp the totality of evidence available in a particular research area. Therefore, a literature review is both timely and necessary. This comprehensive review aims to identify publications dealing with emotions during HRI with social robots (for studies on manufacturing robots, see [182, 186, 209]), following the process detailed in Fig. 2.

As a starting point, this review relies on an electronic search of digital libraries (Google Scholar, ScienceDirect, and Dimensions) using keywords such as “human–robot interaction / HRI AND emotion”, “robots AND emotion”, and “social robots AND emotions”. Next, the author conducted a manual search of key journals and conference proceedings (for a complete list of reviewed conferences and journals, see Web-Appendix 1 tiny.cc/IJSR20WebApp1). This initial scanning process revealed more than 1600 publications; further screening identified many of these publications as patent reports, short reports, or book chapters. After excluding them from further consideration, 473 articles remained for the review (Fig. 2).

Fig. 2 Flowchart of the literature screening process

As Fig. 2 indicates, several other exclusion criteria apply to the review process. First, the focus of this review is on emotions during HRI with social robots.

Second, in a broader sense, HRI involves a wide spectrum of research topics, such as industrial robots, telepresence, virtual reality, and wearables [9, 36]. This review takes a narrower perspective by focusing on embodied HRI [56]. Accordingly, 24 studies on emotions exhibited during interactions with non-humanoid or non-embodied agents, such as virtual avatars and wearables, are excluded from further consideration.

Third, to maintain focus on dyadic interactions between typically developed humans and social robots, this review excludes several studies. Specifically, studies in particular settings, such as human–robot teams that depend on additional, dynamic team effects (for an overview, see [287]), are excluded. Furthermore, HRIs that are specific to humans experiencing health issues, in which medical diagnostics are pivotal to defining the HRI [269, 270], are excluded. Web-Appendix 2 (tiny.cc/IJSR20WebApp2) outlines the excluded studies; these references may serve as further readings for researchers with special interests in these excluded areas.

Fourth, this review requires original empirical publications that underwent a double-blind peer-review process. Conceptual contributions and dissertations thus are excluded. Because of their predominant emphasis on conceptual approaches, studies of the ethical implications of emotions during HRI are also excluded. After applying these criteria, a total of 175 papers remain for the survey.

This review also may facilitate theory development. In particular, the insights from extant research indicate several areas that demand more research, as manifested in calls for conceptual and empirical models of emotions during HRI with social robots. Accordingly, this review seeks to address five main research questions:

1. What methods have been applied for robotic emotional expression generation?

2. How well can robotic artificial emotions be recognized by humans?

3. How do humans respond to artificial robotic emotions?

4. How do contingencies affect the relationship between robotic emotions and human responses during HRI?

5. What remains to be learned regarding emotions during HRI?

In accordance with the guidelines for systematic literature reviews [15, 261], the review effort involves explanations of underlying theoretical perspectives, empirical design issues, and major findings to establish a foundation of existing research that can advance knowledge. By integrating both theoretical perspectives and empirical findings, it identifies areas in which research findings are disparate or reflect interdisciplinary views.

To answer the research questions, Sect. 2 starts with a description of the conceptual framework as an organizing structure for this review. Section 3 presents the state of the art, organized by application domains. Finally, Sect. 4 outlines research directions for the field. The paper finishes with a conclusion in Sect. 5.

2 Framework of the Overview

A detailed review of published articles reveals four main streams of research on (1) emotional expressions by robots, (2) the human recognition of artificial robotic emotions, (3) human responses to robotic emotions, and (4) contingency factors. These research streams provide the basis for the review framework in Fig. 3.

To present extant research, this article organizes the discussions along a causal chain, which parallels the notions of the stimulus–organism–response (S–O–R) paradigm [178]. According to this framework, certain features of the environment (stimulus) affect people’s cognitive and emotional states (organism), which in turn drive behavioral responses (response) [66]. Applied to the current analysis, the stimulus (the HRI) serves as the independent variable, and the organism (human participants, their cognitive or affective states, and their prior experiences) serves as the mediator. The response (behaviors during the HRI) is the dependent variable for this review (see also [267, 289]).

This relationship is not automatic but rather tends to be shaped by the context and people’s own experience. The summary in Fig. 3 displays the focal topic of emotions during HRI with social robots.

Fig. 3 Conceptual framework of this review

3 State of the Art

3.1 Research Stream 1: Robotic Expressions of Artificial Emotions

The first research stream includes 71 reviewed studies and relates to the stimulus, depicted in Fig. 3. Reviewing this research attempts to answer the first research question: What methods have been applied for robotic emotional expression generation?

In a classical S–O–R paradigm, a stimulus can include expressions of another person’s internal states [77]. For the current framework, it entails expressions of artificial emotions by a robot [72]. Emotions are perceived as strong feelings by observers [80], so they offer important stimuli during both human–human interactions [259] and HRI [39, 144].

Disciplines. For their foundation, these studies rely on contributions from robotics (e.g., [1, 11, 156]) and HRI [121, 122, 169, 225]. Further studies are rooted in human–computer interaction (e.g., [3, 4, 21, 99, 140, 173]), engineering [171], and philosophy [101].

Theoretical approaches. Some researchers (e.g., [167]) rely on a multilevel process theory of emotion [160,161,162]. It holds that people perceive emotions within a three-level hierarchical system, including a sensory level that generates primary emotions, a schematic level that integrates perceptions and responses in a memory system, and a conceptual level that integrates prior experiences with predictions about the future. This model provides valuable insights about the cognitive processes during HRI.

Other researchers base their proposed classification standards for robotic emotion expressions on conceptual considerations [51, 87], the facial action coding system (FACS) emotions [300], or a circumplex model of emotions [31, 125]. The FACS includes six basic emotions—happiness, surprise, anger, sadness, fear, and disgust—that arguably can be experienced by both humans and non-humans [74]. In contrast, secondary emotions such as interest and curiosity are particular to humans [59]. The circumplex model, in turn, suggests characterizing robotic artificial emotional expressions according to valence and arousal dimensions (for reviews, see [152, 216]). Its basic premise is that a person’s affective states appear along the circumference of a circle, on which the various dimensions can be classified by their degrees of positive or negative affect. Still other classifications rely on user responses [139], emotion simulations [180, 181], or emotion animations [278].
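To make the circumplex logic concrete, the following minimal Python sketch places a few basic emotions on an assumed valence–arousal plane and derives their angular position on the circle. The coordinate values and the placement of each emotion are illustrative assumptions for demonstration, not values taken from the reviewed studies.

```python
import math

# Illustrative valence/arousal coordinates in [-1, 1]; these placements are
# assumptions for demonstration, not empirical values from the reviewed studies.
EMOTIONS = {
    "happiness": (0.8, 0.5),
    "anger":     (-0.7, 0.7),
    "sadness":   (-0.6, -0.5),
    "surprise":  (0.2, 0.9),
}

def circumplex_angle(valence: float, arousal: float) -> float:
    """Return the emotion's position on the circumplex circle in degrees,
    measured counterclockwise from the positive-valence axis."""
    return math.degrees(math.atan2(arousal, valence)) % 360

for name, (v, a) in EMOTIONS.items():
    angle = circumplex_angle(v, a)
    print(f"{name:>9}: valence={v:+.1f}, arousal={a:+.1f}, angle={angle:6.1f} deg")
```

In this reading, emotions with similar angles lie close together on the circle, while opposite angles correspond to affectively opposite states.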

Methods for generating artificial emotions. The applied methods can be classified into two major categories: (1) static and (2) dynamic.

Static approaches to robotic emotion generation create manually coded, pose-to-pose robotic animations based on stable system architectures and hand-crafted categorizations of robotic emotions. Authors have proposed robotic emotion architectures based on predefined scripts [2, 64], predefined emotional spaces [42, 143], movements of pneumatic actuators [106], or a fuzzy emotion system [271, 293].

Dynamic approaches can be either proactive or reactive. Proactive emotion generation may be inspired by graphic animation design, such as Disney’s 12 basic principles of animation [90], which can also be applied to generate lifelike robotic emotions. The underlying notion of these principles is to use “pose-to-pose animation, in which several keyframes are defined and interbetweened to get the intermediate postures” ([175], p. 546). This creative, design-oriented approach generates high-quality robotic animations because it is adapted to the morphology of the robot. Furthermore, emotions might stem from a combination of techniques, including hand-crafting, a creative design approach, and direct imitation of the human body [175].
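As a rough illustration of the pose-to-pose idea, the sketch below linearly interpolates (“in-betweens”) two hand-crafted keyframe postures of a hypothetical three-joint robot. The joint names, angle values, and the linear interpolation scheme are assumptions made for demonstration rather than a method reported in the reviewed studies.

```python
from typing import Dict, List

# Hypothetical keyframe postures (joint name -> angle in degrees); the joints
# and values are illustrative assumptions, not taken from a specific robot.
KEY_NEUTRAL    = {"head_pitch": 0.0,  "left_arm": 10.0, "right_arm": 10.0}
KEY_HAPPY_PEAK = {"head_pitch": -15.0, "left_arm": 80.0, "right_arm": 80.0}

def inbetween(start: Dict[str, float],
              end: Dict[str, float],
              steps: int) -> List[Dict[str, float]]:
    """Generate intermediate postures between two keyframes by linear interpolation."""
    frames = []
    for i in range(steps + 1):
        t = i / steps
        frames.append({joint: (1 - t) * start[joint] + t * end[joint] for joint in start})
    return frames

# Five in-between frames from the neutral pose to the 'happy' peak pose.
for frame in inbetween(KEY_NEUTRAL, KEY_HAPPY_PEAK, steps=5):
    print(frame)
```

A real animation pipeline would additionally apply easing curves and respect the robot’s velocity and joint limits; the linear version only conveys the keyframe-and-in-between structure.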

Reactive emotion generation instead relies on data generated through the recognition of human emotions in general, as might be gleaned from humans’ faces, head movements, body motions/gestures, speech, touch, or brain feedback (for overviews, see [208, 211]). Table 1 provides an overview of the recognition areas.

Table 1 Studies on robotic emotion generation based on human emotions

Studies in this tradition mostly attempt a direct imitation by tracking human emotional expressions, such as with computer vision techniques, or special markers and sensors. The key positions then can be mapped to the robot’s movement space either with data-driven processing [290] or by defining some suitable transfer functions for the robot morphology [176].
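The following sketch illustrates one simple form such a transfer function could take: a tracked human joint angle is rescaled and clamped into a robot joint’s narrower range. The joint ranges and the linear mapping are hypothetical; the systems in the reviewed studies use robot-specific mappings or data-driven processing.

```python
def transfer(human_angle: float,
             human_range: tuple = (-90.0, 90.0),
             robot_range: tuple = (-30.0, 30.0)) -> float:
    """Map a tracked human joint angle into the robot's movement space by
    linear rescaling, then clamp to the robot's mechanical limits."""
    h_lo, h_hi = human_range
    r_lo, r_hi = robot_range
    t = (human_angle - h_lo) / (h_hi - h_lo)   # normalize to [0, 1]
    mapped = r_lo + t * (r_hi - r_lo)          # rescale to the robot's range
    return max(r_lo, min(r_hi, mapped))        # respect joint limits

print(transfer(45.0))  # a raised human arm mapped onto the robot's narrower range
```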

Summary of findings. Hand-coded robotic animations can offer high quality [175]. However, these static approaches are limited because “robot performance based on a static emotional state cannot vividly display dynamic and complex emotional transference” ([282], p. 160). Furthermore, the limited set of emotions increases the likelihood of repetitive behavior, which may appear inappropriate in HRI. Therefore, research suggests that robotic emotion generation should be based on dynamic algorithms that can recognize human emotion. Yet despite its strengths, this approach is challenging due to the differences in the movement possibilities between humans and robots [175].

Table 2 Average human recognition rates of robotic emotional expressions and Mann–Whitney U test results

3.2 Research Stream 2: Human Recognition of Artificial Robotic Emotions

The second research stream includes 43 reviewed studies and relates to the organism depicted in Fig. 3. Reviewing this research attempts to answer the second research question: How well can robotic artificial emotions be recognized by humans?

In the S–O–R paradigm, an organism refers to any “internal processes and structures intervening between stimuli external to the person and the final actions, reactions, or responses emitted” ([12], p. 46). For this survey, the organism is represented by humans’ recognition of robotic artificial emotions, expressed via face, body, or both. Details about the reviewed studies in this stream can be found in Web-Appendix 3 (tiny.cc/IJSR20WebApp3).

Geographical origins. Most of the studies that fall into this stream focus on a single country (for details, see Web-Appendix 3) in Europe (e.g., Austria [265], France [40, 68, 190], Germany [75, 95, 105, 110, 111, 170, 236], the Netherlands [98], Portugal [224], Switzerland [76], and the United Kingdom [16, 46, 174]), Asia (e.g., China [141, 223], India [215, 241], Japan [123, 124, 131, 229, 251, 254, 295], Korea [187], and Taiwan [102]), or the Americas (e.g., the United States [22, 29, 115, 117, 183, 256] and Mexico [214]). Only four studies investigate humans’ recognition of robotic emotions across multiple countries [19, 55, 82, 215]. For example, [19] find cultural differences in emotion recognition among participants from the United States, Asia, and Europe, as do [82] for Denmark, the United Kingdom, and Germany. Other studies capture data from Germany, Hungary, India, the Netherlands, Poland, Portugal, and the United Kingdom [215] or Germany, Slovakia, and Spain [55], without explicitly examining cultural differences.

Disciplines. For their foundation, these studies rely on contributions from robotics [20, 40, 55, 82, 187, 214, 223, 224, 242], human–computer interaction [253, 254, 256, 295], HRI [29, 75, 105, 110, 123, 170, 183, 215, 229], and social robotics [16, 76, 179, 265]. One study is rooted in neuroscience [68].

Theoretical approaches. From a theoretical perspective, most of the studies [17, 22, 29, 40, 68, 70, 86, 95, 174, 183, 214, 215, 224, 236] rely on the FACS model [72]. The circumplex model of emotions (see Sect. 3.1) also has been applied in several studies [18, 29, 55, 111, 117, 229, 251, 265]. A closely related approach is Plutchik’s wheel of emotions [205], which has been applied in two studies [253, 254].

Examined emotions and modes of expression. Table 2 summarizes the rates (in percentages) at which humans recognize the six basic emotions of the FACS model [73, 74]. The determination of the average percentage values is based on the detailed list of reported recognition rates across the reviewed studies in Web-Appendix 4 (tiny.cc/IJSR20WebApp4). Recognized emotions provide a basis “for evaluating and judging events to assess their overall value with respect to the creature (e.g., positive or negative, desirable or undesirable, etc.)” ([31], p. 273).

Fig. 4 Sampling of robotic avatars

A closer look at the studies (see Table 2 and, in detail, Web-Appendix 4) reveals little consistency across the reviewed studies. Rather, they are heterogeneous in several important respects:

  • Manipulated emotions: Most existing studies select manipulated emotions based on the FACS approach [71] or the circumplex model [216]. Therefore, a large proportion of these studies focuses on the emotions of happiness, surprise, anger, fear, sadness, and disgust.

  • Robotic agents: Robotic agents can be distinguished as anthropomorphic (category a and b) or zoomorphic (category c) robots (see Fig. 4). Due to their different degrees of freedom in their bodies and/or faces, they exhibit different abilities to express emotions.

  • Body parts for emotion expression: Consistent with notions from social psychology [63], existing publications focus on facial or bodily emotion expressions, or both, as exhibited by robots during HRI. This heterogeneity may also result from the availability of different robots (see Fig. 4).

  • Context of HRI: Most of the studies have been conducted in a laboratory setting or online (see Web-Appendix 4); only two studies feature real-life settings, i.e., home settings [98] or clinical settings [183].

  • Scenario for the HRI: With regard to the type of HRI, the studies differ in whether the interaction is face-to-face, video-based, or based on images (see Web-Appendix 4).

An interesting question is whether differences in humans’ ability to recognize robotic emotions occur when emotions are expressed by the robot’s face or body. Because most of the studies reported only average recognition rates as percentages (see Table 2 and, in detail, Web-Appendix 4), the requirements for a t-test for independent samples are not met. The test for potential differences for this review therefore relies on a Mann–Whitney U test [189, 217, 303]. This test indicates whether the central tendencies of two independent samples (e.g., studies on human recognition of facial robotic emotion expressions and studies on human recognition of bodily robotic expressions) differ. Mathematically, the Mann–Whitney U statistic can be defined as follows [189]:

$$\begin{aligned} U_x = n_x n_y + \frac{n_x(n_x + 1)}{2} - R_x \end{aligned}$$
(1)
$$\begin{aligned} U_y = n_x n_y + \frac{n_y(n_y + 1)}{2} - R_y \end{aligned}$$
(2)

where \(n_x\) is the number of observations in the first group of studies (e.g., studies of facial expression recognition); \(n_y\) is the number of observations in the second group of studies (e.g., studies on bodily expression recognition).

\(R_x\) is the sum of the ranks assigned to the first group, and \(R_y\) is the sum of the ranks assigned to the second group. That is, the Mann–Whitney U test is based on the idea of ranking the data; the measured values themselves are not used in the calculations but are replaced by ranks, which inform the actual test. Thus, calculations are based solely on the order of the data (i.e., greater than, less than); absolute distances between the values are not considered. Intuitively, each U value can be understood as the number of times an observation from one group precedes an observation from the other group when all scores are placed in ascending order.
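As a minimal illustration of Eqs. (1)–(2), the sketch below computes \(U_x\) and \(U_y\) from jointly ranked recognition rates, checks the identity \(U_x + U_y = n_x n_y\), and obtains the p-value with SciPy. The recognition-rate values are hypothetical placeholders, not the averages reported in Web-Appendix 4.

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

# Hypothetical average recognition rates (%) for two groups of studies;
# placeholders for illustration, not the values reported in Web-Appendix 4.
facial = np.array([62.0, 55.5, 70.1, 48.3, 58.0])
bodily = np.array([57.2, 60.4, 45.9, 66.0, 52.5])

ranks = rankdata(np.concatenate([facial, bodily]))  # rank all rates jointly
n_x, n_y = len(facial), len(bodily)
R_x, R_y = ranks[:n_x].sum(), ranks[n_x:].sum()

U_x = n_x * n_y + n_x * (n_x + 1) / 2 - R_x         # Eq. (1)
U_y = n_x * n_y + n_y * (n_y + 1) / 2 - R_y         # Eq. (2)
assert U_x + U_y == n_x * n_y                        # sanity check on the ranks

# p-value for the difference in central tendencies (SciPy may report the
# complementary U statistic, but the p-value tests the same hypothesis).
result = mannwhitneyu(facial, bodily, alternative="two-sided")
print(U_x, U_y, result.pvalue)
```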

Researchers suggest greater importance of facial relative to bodily expressions during HRI [295]. Of the reviewed studies in this stream, a large portion examines face-to-face human–robot interaction [16, 40, 75, 82, 97, 105, 117, 123, 131, 170, 187, 214, 223, 224, 229, 242, 251, 254, 256, 265], whereas other studies rely on images [20, 29, 55, 61, 68, 76, 82, 110, 117, 183, 214, 215, 295] to express robotic emotions. With their findings, they detail efficient methods to program robots’ artificial expressions, using both facial expressions and bodily features. However, in the current review, no significant differences arise in participants’ recognition rates for emotions expressed by a robot via either facial or bodily expressions (see Table 2). The Mann–Whitney U test further shows that the recognition rates during HRI with a physically embodied robot are not higher than those in HRI with an image- or video-based robot, which is consistent with some extant findings [117].

Robot type. Robotic avatars can be categorized into three groups (see Fig. 4): robotic faces, fully embodied robots, and zoomorphic robots. Robotic faces (e.g., Flobi robot, Melvin robot, EMYS robot, Golem-X robot, ERWIN robot, android PKD, KISMET robot) have been used commonly in studies of facial emotional expressions. Fully embodied robots appear in studies of bodily emotional expressions (e.g., KOBIAN, WE-4RII, Nao, Pepper, Elenoide). Among zoomorphic robots, the Keepon, KAROTY, and Pepe robots have been investigated.

Research setting. Most of the reviewed studies feature laboratory settings [19, 29, 40, 68, 75, 76, 82, 86, 103, 105, 117, 123, 131, 170, 183, 187, 190, 214, 223, 224, 229, 242, 251, 253, 256, 265, 294, 295]. Another set of studies is designed as online experiments [19, 55, 82, 110, 117, 214, 215]. A study by De Graaf, Ben Allouch, and van Dijk [97] involves qualitative interviews. No studies use field settings.

In terms of participants, the majority of studies rely on adult student samples [16, 76, 82, 105, 117, 123, 170, 214, 223, 224, 229, 242, 251, 254, 256, 265]; three feature adult non-student respondents, specified as household members [97], non-clinicians/clinicians [183], or frontline employees [236]. Others include mixed samples of adult participants, obtained through online channels [19, 55, 82, 110, 117, 214]. As notable exceptions, some studies gathered data from children as participants [17, 29]. Canamero and Fredslund [40] compare the recognition capabilities of adults and children and find that children recognize robotic emotions better than adults. Several studies do not identify their participants clearly [22, 53, 68, 70, 75, 95, 131, 174, 188, 215, 242, 295].

Summary of findings. These studies affirm that robots can be programmed to express emotions, despite not actually having them. The average values of the recognition rates for different emotions serve as indicators, though various robotic agents require adequate programming and testing to establish expressions of robotic emotions.

Despite the differences across robots, the revealed recognition rates also offer some guidance regarding which robotic emotional expressions are associated with which emotions. In Table 2, the average recognition rates of most studies are well above the threshold value of 15% recommended in early HRI studies [29], for both facial expressions (58.76%) and bodily expressions (57.87%) (see Web-Appendix 4 for details). Thus, future HRI studies should strive for an emotion recognition rate of at least 50% for both facial and bodily expressions. Furthermore, these examined emotions provide an initial basis for creating standardized, posed emotional expressions that accurately and reliably convey information. The validated expressions in robotic research also are less likely to suffer the problems that have been associated with emotional expression stimuli developed without any standardized system. Yet few studies provide data about any mean differences in detection rates; instead, they report percentages. This limits the capacity for tests of significant differences across groups. Although the results provide initial indications, an empirically validated “gold standard” for expressing robotic emotions is not yet available.

3.3 Research Stream 3: Human Responses to Artificial Robotic Emotions

The third research stream includes 61 reviewed studies and relates to the response depicted in Fig. 3. Reviewing this research attempts to answer the third research question: How do humans respond to artificial robotic emotions? In the S–O–R paradigm, the response is a person’s reaction to a stimulus. Accordingly, research stream 3 includes studies of human reactions to robotic emotions (see for details Web-Appendix 5, tiny.cc/IJSR20WebApp5).

Geographical origins. The reviewed studies in this research stream mostly take place in single countries, which span most of the world: in Asia, Korea [136, 138, 202], Japan [128,129,130, 147, 188, 192, 193, 196, 253, 286], India [255], and China [223, 243, 301]; in Europe, France [5, 6, 43], the Netherlands [91, 92, 230, 258, 283, 284], Finland [102], Germany [195, 233, 264], Italy [17], Spain [89], Sweden [7], and the United Kingdom [28, 38, 146, 157, 213, 279]; as well as the United States [27, 49, 50, 93, 115, 119, 142, 145, 153, 159, 165, 218, 221], Canada [222], Israel [113, 114], Australia [231], New Zealand [33], and Brazil [262].

Only three studies include multiple countries [82, 104, 179]. These studies reveal insights on contingency factors that may affect the strength of human responses during HRI. One study includes both the United States and Japan [179]: robotic joy prompts similar ratings from humans in both countries, but a robot that appears to represent another culture is perceived as part of the outgroup. Another study includes Australia and Japan [104]; its results show that Australian participants perceived an android robot more positively than Japanese participants did. A comparative study of native language speakers from Denmark, the UK, and Germany found that different communities hold different expectations regarding robotic emotional expressions [82].

Disciplines: Most of the studies have their origins in the field of robotics [5, 6, 43, 115, 138, 145, 179, 196, 202, 223, 283, 284, 301], human–computer interaction [38, 93, 111, 128, 129, 230], or HRI [27, 49, 50, 91, 92, 113, 114, 136, 146, 157, 159, 165, 188, 195, 213, 218, 221, 243, 255, 258, 279]. Emotional reactions to HRI also have attracted considerations of social interactions, as detailed in research into behavioral robotics [28], cognitive science [102], ergonomics [119], social robotics [142, 147, 192], and psychology [253].

Fig. 5 Input-process-output model to summarize human responses to emotions during HRI

Theoretical approaches. Social identity theory [179], as first introduced by Tajfel and colleagues [246, 247, 249, 250], and the similarity attraction paradigm [5, 6] provide frameworks for examining whether humans perceive robots as part of their social ingroup or social outgroup. The unified theory of acceptance and use of technology and the technology acceptance model (TAM) [28, 222, 231], both rooted in information systems research on technology acceptance by humans [57, 58], also have been applied to HRI. In the TAM, perceived usefulness and ease of use determine behavioral intentions to use a system, which in turn predict actual use [57].

In addition, cognitive appraisal theory [83] and the hierarchical model of cognitive appraisal [198] provide a framework for developing artificial agents that are capable of exhibiting emotional responses [150]. Finally, the uncanny valley paradigm [146, 234, 264], first introduced by Mori [184, 185, 274], helps predict humans’ emotional responses to robots, according to their human-likeness (i.e., the extent to which they resemble humans [172]).

Examined relationships regarding emotions. Different variables represent robotic emotional actions (emotion-related input) and human reactions to robotic emotions during HRI. Several variables also have been studied both as outcomes of robotic emotions and as antecedents of human reactions to robotic emotions; these are referred to as emotion-related mediators. The investigated variables can be organized into an input-process-output model, depicted in Fig. 5.

With regard to emotion-related input, studies show that robots’ characteristics, such as indications of their personality [5, 6, 202], empathy [43], or human-likeness [33, 234, 279], affect emotions during HRI. For example, a robot’s perceived similarity to the human and its human-likeness affect acceptance among humans.

Robotic emotion displays [7, 28, 89, 93, 115, 146, 159, 179, 221, 253, 263, 264] and emotional capabilities [286], such as using non-verbal cues [102, 147] or referring to humans by name [128, 218], also increase robot acceptance. Human characteristics, such as their emotional intelligence [50, 153] and experience with robots [119], similarly can affect emotions during HRI. Finally, social cues, including the length [301] or mode of emotional expression [27, 38, 49, 91, 92, 136, 213, 255], help determine emotions during HRI.

The emotion-related mediators help explain how an emotion-related input relates to the human response to an HRI [133]; see Fig. 5. Only one study examines indirect effects pertaining to emotions during HRI [230]. It shows that a robot’s emotional valence indirectly affects user perceptions through their emotional appraisals of the HRI.

Although not explicitly identified as investigations along these lines, several studies shed relevant light on potential emotion-related variables that likely mediate the input–human reaction relationship. In examining both antecedents and human responses to a set of constructs, they identify what is referred to as emotion-related mediators in Fig. 5. These potential mediators include a robot’s perceived social nature [27, 301], emotional responsiveness [27, 49, 113,114,115, 165, 221, 243], and pleasantness [38, 115, 138, 223, 230, 283, 284].

Finally, among human responses to HRI, the studies examine affective, cognitive, and behavioral reactions [191]. Affective responses include affect [91, 111, 145, 192, 196, 213], likability [102, 243], empathy [136], interest in the robot [129], uncanniness [146], emotional adaptation to robotic emotions [157, 195, 283, 284], and trust in the robot [43, 91].

Cognitive responses include the attention to the robot [115], social agency judgements [92], overall perception of the robot [255], perceived human-likeness [146], emotional interpretation [159], emotional valence [230], and perceived ingroup connections [179]. Some researchers [109, 238] leverage the TAM. From the TAM, only perceived usefulness has been included thus far as a dependent variable to examine emotions during HRI [223, 243]. The behavioral responses include variables such as intensity of the interaction [5, 6, 93, 114, 202], positive reactions [258], avoidance [142], altruistic behavior toward the robot [253], and human performance [145, 165, 188, 221].

Empirical design/sample. Most studies use laboratory experiments [5, 6, 14, 27, 28, 38, 44, 49, 91,92,93, 102, 104, 111, 113, 114, 129, 130, 135, 137, 142, 145,146,147, 157, 159, 165, 188, 190, 192, 195, 202, 213, 221, 223, 243, 253, 255, 258, 279, 283, 284, 299, 301] with students [5, 27, 28, 32, 44, 49, 91, 92, 104, 111, 113, 114, 157, 188, 193, 221, 223, 253, 255, 283, 284, 301]. The participants are usually adults, without further specification [14, 102, 129, 130, 137, 142, 146, 165, 192, 196, 218, 230, 301], or else are children [38, 128, 135, 145, 147, 159, 165, 190, 213, 258, 279]. Web-Appendix 5 (tiny.cc/IJSR20WebApp5) provides further details. The controlled simulation of HRI in laboratories may reflect the continued legitimacy of a positivist paradigm in mainstream robotics research, according to which claims to knowledge should be limited to the interpretation of “positive”, i.e., actual, sensory, perceivable, and verifiable findings. Yet this research tends to be limited in its generalizability.

Few studies include online experiments [50, 119, 179, 230] or data from online participants who represent various backgrounds. Three experimental studies were conducted in a real-life setting, gathering data from visitors [129, 142] or clients in elder care [130]. Most studies involved small samples of fewer than 50 respondents; only about 20% feature 50 respondents or more.

All reviewed studies rely on participants’ self-ratings, which are useful to assess their characteristics. Gauging emotions or behaviors with self-ratings may create a threat of common method variance [207], “attributable to the measurement method rather than to the constructs the measures represent” ([206], p. 879). It creates a false internal consistency, suggesting an apparent correlation among variables that actually is generated by their common source.

In most cases, the studies focus on a single interaction with a robot, reflecting an implicit assumption that humans’ emotional reactions remain identical and do not change over time or through additional interactions with a robot [276]. In a few longitudinal studies, the same user interacts with a robot several times. For example, a six-month field experiment [93] shows that HRI lasts longer with emotional robots (which express happiness or sadness) than with neutral robots. A field study in a shopping mall over a period of 26 days [129] reveals that participants who evaluate the robot positively also express more interest in the interaction. A nine-week study determines that the degree of empathy humans offer in response to emotional expressions does not differ from their degree of empathy after verbal expressions [142].

Such longitudinal studies are more laborious and time-consuming [88]. Furthermore, only recently has the technology been robust enough to allow some degree of autonomy when users interact with robots for extended periods. However, “longitudinal studies are extremely useful to investigate changes in user behaviour and experiences over time” ([158], p. 291).

Summary of findings. Emotions are particularly important during HRI with social robots. During HRI, humans express emotional, cognitive, and behavioral responses. In particular, robotic emotion-related characteristics, emotional capabilities, and displays of emotions matter for HRI. A robot that expresses positive emotions is more accepted as technology than one that does not.

3.4 Research Stream 4: Contingency Factors Affecting Emotions During HRI

The fourth research stream includes 14 studies that relate to the contingencies in which the interaction takes place (see Fig. 3). These studies form a subset of the reviewed studies in research stream 3 (see Sect. 3.3). Accordingly, details about these studies can also be found in Web-Appendix 5 (tiny.cc/IJSR20WebApp5).

Reviewing this research attempts to answer the fourth research question: How do contingencies affect the relationship between robotic emotions and human responses during HRI? A contingency or moderator variable either strengthens or weakens the relationship between two or more variables [10, 65]. By considering contingency factors, this research stream goes beyond the classical S–O–R logic by recognizing that the basic effects may not be equally strong in every situation; rather, the presence and strength of the basic effects may depend systematically on contingency factors.

Characteristics of the interacting parties. Five studies test whether the human’s gender affects emotions during HRI [7, 38, 50, 136, 188]. Several studies find gender differences (e.g., [136]); for example, men prefer to interact with a pleasant (vs. neutral) robot, whereas women indicate no such preference [38]. Robot characteristics, including emotional intelligence [50] and human-likeness [188], also have been examined as moderators.

Characteristics of the interaction. Gaze cues during a game with a robot increase participants’ perceptions of the social desirability of a geminoid robot, which is designed to look like a specific person, but not those of a less human-like robot, such as Robovie [188]. Control over the robot during the HRI also increases participants’ affect, expressed in response to an android’s facial expression [192].

Duration of the interaction. Six studies note long-term effects of emotions during HRI with a social robot [93, 128, 129, 142, 147, 218]. Relying on longitudinal data from adult participants, these studies consistently reveal that social cues and emotions expressed by a social robot sustain HRI over time [93, 128, 129, 142, 218]. However, one study indicates that children between 3 and 4 years of age lose interest in the robot over time [147].

Environmental factors. External factors, such as cultural differences [104, 179] or task characteristics [234], have been examined too (see Web-Appendix 5). The existing studies clearly reveal cultural differences in human responses to emotions during HRI. Furthermore, task complexity has been shown to matter for trust in robots during HRI [234].

Summary of findings. Environmental characteristics (e.g., culture of human participants) and characteristics of the involved parties (humans or robots) matter. Furthermore, control over the robot increases robot acceptance during HRI. The studies further indicate that the duration of the interaction matters for humans’ emotional responses to HRI. As most extant research is cross-sectional in nature, its conclusions should be treated with caution.

4 Discussion

4.1 What Do We Know?

By answering the first four research questions of this survey, the preceding sections sought to provide insights into current empirical knowledge on emotions during HRI (see Sect. 1). This review reveals that the field is well researched, with many methodologically sound empirical studies. The domain integrates findings pertaining to robotics, HRI, and psychology, though the different disciplines reveal some variations in their research focus. For example, robotics research mainly seeks technical specifications to improve HRI, whereas researchers from the HRI, social robotics, or psychology traditions are primarily interested in human responses to interactions.

In terms of theoretical backgrounds, research in the latter domains appears more strongly theory driven, whereas robotics research is more technology focused. Accordingly, the specific theories used in prior research can be assigned to three categories: (1) Classical concepts of human emotions, such as FACS, the circumplex model of emotions, and Plutchik’s wheel of emotions, (2) approaches to social interaction, such as social identity theory, the similarity attraction paradigm, emotional contagion, or a social agency perspective, and (3) concepts related specifically to HRI, such as the uncanny valley paradigm.

Classical human emotion concepts specifically address expressions of basic human emotions, which can be transferred to robotic emotional expressions. The theories in the other two categories are broader, in the sense that they explain the underlying mechanisms that lead humans to respond in a particular manner during human–human interactions. Previous research suggests a fairly consistent pattern of human responses. Robot acceptance depends strongly on the robot’s exhibited characteristics (e.g., empathy, personality), emotional displays, and emotional capabilities (e.g., competence), as well as the human’s prior experience with robots.

Studies typically analyze the direct links of these emotion-related input variables with robot acceptance, without considering possible indirect effects (e.g., through mediators). This gap is surprising, because several process variables, such as a robot’s perceived naturalness, emotional responsiveness, and pleasantness, have been studied as antecedents or outcomes in extant research.

Furthermore, the review provides clear evidence that moderator variables are relevant for studying emotions and robot acceptance. In other words, the strength of the links between emotional input variables and robot acceptance is systematically influenced by other variables. However, research related to such moderating effects is rather fragmented and more research is needed.

4.2 What Remains to be Learned?

Despite considerable progress achieved by empirical research on emotions during HRI, this review also reveals several limitations of previous empirical research. This section therefore relates to the fifth research question of this survey (see Sect. 1), which asks what remains to be learned regarding emotions during HRI, and outlines seven suggestions for continued empirical research on HRI and robotic psychology.

Suggestion 1: Gain a better understanding of the underlying mechanisms for human responses to robotic emotions.

Clarifying underlying mechanisms that drive human responses to robotic emotions would provide an answer to an important “Why” question: Why do humans respond in the way they do to artificial emotions? Is it because they compare their expectations toward robots with the perceived robotic emotions, experienced in the HRI, as suggested in the expectation-disconfirmation paradigm [34, 197, 272]? Is it because humans compare their emotions with those expressed by robots [81]? Do they assign robots to their own or another social group as indicated by social identity theory [248]? Or do humans become infected with robotic emotions, similar to the emotional contagion that takes place during human–human interactions [108, 112]?

Previous research offers a rich range of possible theoretical approaches to make predictions about suitable robotic emotions (e.g., FACS model [71], circumplex model of emotions [216]) or human responses to robotic emotions (e.g., emotional contagion theory [108], social identity theory [248], TAM [57]). However, many studies still fail to draw explicitly on theoretical approaches to establish or justify their hypotheses. Hypothesis development should be based firmly in theories that have been well established with respect to human–human interaction (e.g., [69, 168]), or new theories on HRI should be developed. Table 3 provides an overview of potentially fruitful psychological theories that could be applied (and extended), as well as some sample research questions, to gain a deeper understanding of the theoretical mechanisms at play during HRI (for an overview on robotics psychology, see [234]).

Table 3 Selected theoretical contributions related to the mechanisms for how humans respond to emotions during HRI
Fig. 6 Empirical design characteristics of extant research on HRI

Suggestion 2: Investigate contingency effects to a greater extent.

The logic for examining contingency factors proposes that there is not one best HRI design [260]. Rather, the human-related outcomes of HRI depend on the culture (for an overview see [79]), the setting (for an overview see [177]), the scenario (for an overview see [285]), and the human participants. Some empirical studies that focused on emotions during HRI mention moderator variables (see Sect. 3.4), but research in this area is still scarce. Conceptual articles distinguish several categories of potentially relevant contingency factors [234], such as the interaction setting and its duration, but no integrative, empirical analysis of situational variables has been published. Researchers should pursue such a contribution.

Suggestion 3: Define uniform standards for the experimental investigation of emotions in HRI experiments.

Most of the studies in this review are based on experimental investigations. They are relatively heterogeneous in their experimental design, as is particularly evident in the repetition frequencies, study period, sample, and form of interaction (e.g., direct face-to-face versus indirect online or via virtual reality), as Web-Appendices 3 and 5 reveal. This heterogeneity is challenging in two respects. First, it makes it difficult to compare findings across studies. Second, the quality of the findings is difficult to assess, particularly due to the lack of design science research dedicated to investigating human reactions to HRI. The few available contributions [116, 285] deserve more attention; more work also is needed in this field.

Suggestion 4: Compare different HRI scenarios with regard to their effectiveness.

In terms of possible scenarios, the studies can be differentiated according to whether HRI takes place directly or indirectly (see Fig. 6):

  • Media-supported HRI is mostly used in online studies or face-to-face experiments in which images or videos of robots are used.

  • Direct HRI is mostly used in face-to-face experiments where human participants interact with real robots or parts of robots (e.g., head, upper body).

Fig. 7 Cybernetic framework of emotions during HRI

Although no differences in emotion recognition rates across different HRI scenarios could be found in this survey, the varying degrees of immersion likely cause humans to react differently to images or videos of robots than to a face-to-face HRI in a real-world situation. The lack of differences between the scenarios in the Mann–Whitney test also should be evaluated cautiously, due to the strong heterogeneity across the experimental studies considered. Despite a few studies of these questions [285], no clear findings are available.

Suggestion 5: Use real-world environments to test the effects of emotions during HRI.

Some recent studies of HRI take place in real-world settings, such as homes [85, 244], workplaces [188], elderly care facilities [218], schools [128, 147], shopping malls [129, 130, 191], or a university campus [142]. But most studies continue to rely on laboratory settings (see Fig. 6). This setting has the advantage of limiting extraneous error, due to the controlled nature of the experiment. However, the external validity of the results is limited, and they are difficult to generalize to real-world settings. That is, the results may be valid in an experimental setting but not in realistic settings [100]. Levitt and List [163, 164] explicitly note concerns about extrapolating interpretations of data from lab experiments to the world beyond. The lack of studies that move beyond the laboratory also is surprising, because a real-life, face-to-face HRI scenario is the most informative [285]. As robots take on more roles in society and business, continued research should examine emotions during HRI in real-world private environments and business settings, including both customer–robot [236, 238] and employee–robot interactions [234].

Suggestion 6: Analyze longitudinal effects of emotions during HRI.

Most studies rely on cross-sectional data, so their findings stem from a single interaction, which could reflect humans’ sense of surprise when they meet a robot for the first time. The few existing longitudinal studies clearly indicate that the duration and repetition of HRI matter for human emotional responses. Additional research should examine longitudinal effects of emotions, accounting for not only first-impression effects but also the effects of emotions and potential changes of HRI over time [158]. Understanding these long-term effects of emotions during HRI with social robots is important, because most real-world applications aim for long-term uses of robots. Researchers thus might investigate whether and how humans’ communication with the robot or other humans changes over time.

Another interesting question relates to potential responsibility shifts over time, as famously exemplified by the increased automation bias resulting from the use of navigation systems in cars [94]. Extant research also indicates that an automation bias can arise during HRI [257]. Continued research could examine whether a similar responsibility shift occurs during long-term HRI and how this affects human emotions.

Suggestion 7: Examine feedback loops during HRI to a greater extent.

This review indicates that extant studies tend to analyze relationships between emotion-related input variables and robot acceptance by humans according to simplistic, "one-stage" models (see Fig. 5). Analyzing such simplistic models provides only limited understanding of the driving forces of HRI, because they cannot distinguish direct versus indirect effects on robot acceptance. This limitation is critical, because some categories of success factors (e.g., robotic social cues) likely affect robot acceptance only indirectly, rather than directly. A systematic analysis of such structures is possible only if researchers use complex integrative models that support the simultaneous analysis of both direct and indirect effects in a single model. Such integrative studies also would be consistent with the logic of the S–O–R model [267, 289].

Furthermore, the logic of the S–O–R model should be extended with potential feedback loops to consider the dynamic robotic expression of emotion. For example, PSI theory [67] addresses the interplay among motivational stimuli, cognitive processes, and outcomes. An interactive feedback model also would account for the robot’s sensitivity to what the human is doing (such that robots need a sophisticated system to recognize human emotions). A cybernetic framework that can account for the dynamic and adaptive nature of emotions during HRI appears necessary (see Fig. 7, inspired by [291]).
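To illustrate what such a feedback loop might look like at the simplest level, the toy sketch below lets a hypothetical robot adjust the valence of its next emotional expression toward the valence it recognizes in the human’s response. The recognizer, the adaptation rule, and the gain parameter are all assumptions used only to make the cybernetic idea concrete, not a model proposed in the reviewed studies.

```python
import random

def recognize_human_valence(robot_valence: float) -> float:
    """Stand-in for an emotion-recognition module: here the human's valence
    noisily mirrors the robot's expression (a deliberate simplification)."""
    return max(-1.0, min(1.0, 0.6 * robot_valence + random.uniform(-0.3, 0.3)))

def run_interaction(steps: int = 10, gain: float = 0.5) -> None:
    robot_valence = 0.0  # start with a neutral expression
    for step in range(steps):
        human_valence = recognize_human_valence(robot_valence)  # sense (feedback)
        # Adapt: move the next expression toward the sensed human state.
        robot_valence += gain * (human_valence - robot_valence)
        print(f"step {step}: human={human_valence:+.2f}, robot={robot_valence:+.2f}")

run_interaction()
```

Even this toy loop makes the core requirement of the cybernetic framework visible: the robot’s expressive output depends on a continuously updated reading of the human’s state rather than on a fixed script.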

5 Conclusion

Social robots are an increasingly pervasive reality in daily life, and they have prompted more than 1,600 studies in the past two decades. However, the interdisciplinary, fragmented state of research on emotions during HRI with social robots makes it difficult for researchers to develop new insights and ideas from extant studies. This review systematically condenses extant knowledge. In terms of human recognition of robotic emotions, studies that examine the basic emotions suggested in the FACS model are identified. Although such studies include different robots and emotional expression modes (facial, bodily, both), humans can recognize on average about 50% of a robot’s emotions correctly; for some high-arousal emotions, such as happiness and anger, the average recognition rates are even higher. In terms of human responses to robotic emotions during HRI, extant research has made considerable progress. Emotions inform the interaction intensity and positive human responses to a robot. The findings from this review yield conceptual and methodological suggestions for further research. In turn, they hold the promise of generating meaningful impact and encouraging further empirical research in this field.