A three-component framework for empathic technologies to augment human interaction
- Cite this article as: Janssen, J.H. J Multimodal User Interfaces (2012) 6: 143. doi:10.1007/s12193-012-0097-5
Empathy can be considered one of our most important social processes. In that light, empathic technologies are the class of technologies that can augment empathy between two or more individuals. To provide a basis for such technologies, a three-component framework based on psychology and neuroscience is presented, consisting of cognitive empathy, emotional convergence, and empathic responding. These three components can be situated in affective computing and social signal processing and offer different opportunities for empathic technologies. To leverage these opportunities, automated measurement possibilities for each component are identified using (combinations of) facial expressions, speech, and physiological signals. Thereafter, methodological challenges are discussed, including ground truth measurements and empathy induction. Finally, a research agenda is presented for social signal processing. This framework can help to further research on empathic technologies and ultimately bring them to fruition in meaningful innovations. In turn, this could enhance empathic behavior, thereby increasing altruism, trust, cooperation, and bonding.
Keywords: Empathy, Affective computing, Social signal processing, Emotion, Human interaction
Imagining a world without empathy paints a grim picture for humanity. Empathy is at the basis of cooperation, bonding, altruism, morality, and trust [13, 88, 120]. Moreover, human evolution has depended strongly on empathy as a process that facilitates social support and enhances chances of survival. Although humans are biologically wired to be empathic, several scholars have argued that we need more empathy. According to Rifkin, the rise of Homo Empathicus is the main force against the degeneration of our planet. Furthermore, de Waal has argued that placing more emphasis on empathy, rather than on competition and individualism, can help to prevent future social and economic crises.
Empathy is a communicative process of understanding and responding to the (inferred) feelings and emotions of others [45, 57]. The word empathy can take on different meanings, depending on the scholar or field in which it is used. Nonetheless, there is now considerable agreement among psychologists and neuroscientists that empathy consists of three aspects, which are elaborated below: recognizing someone else’s emotional state (i.e., cognitive empathy), the convergence of feelings between people (i.e., emotional convergence), and responding to another person’s (inferred) feelings or to the emotional convergence those feelings initiate (i.e., empathic responding). Empathy should be distinguished from sympathy, with which it is sometimes conflated. Sympathy is a specific type of empathic response that signals care and provides social support. In contrast, empathic responses may also include responses to others’ positive emotions or may be more self-focused. In particular, individuals who have difficulty regulating their own emotions and who are low in self-awareness may confuse their own emotions with those of the individuals they are interacting with. In those cases, empathic responses often consist of personal distress instead of sympathy [13, 54]. Distinguishing between the self and others is therefore important for empathic responses.
Some authors have taken a somewhat broader definition of empathy that focuses not only on feelings and emotions but also on the cognitions, intentions, and beliefs others may have or experience (e.g., ). Although such definitions are not necessarily wrong, for the purposes of this paper I adopt a notion of empathy that includes only emotions and feelings. This narrower definition provides focus, is an accepted position among psychologists and neuroscientists [14, 45], and helps to better define the possibilities for affective computing and social signal processing to incorporate empathy as a research topic (which is the goal of this paper). Furthermore, empathy is sometimes taken to be a person’s disposition instead of a communicative process. Throughout the rest of the paper, I will refer to an individual’s empathic disposition as empathic ability or empathic skill.
One way (among others) to improve empathy in human interaction could be through technological innovations that can measure and take empathy into account [109, 113]. Such technologies might make empathy a more salient and influential construct in human interaction. Furthermore, they could help to signal empathic deficits and train people in empathic responding. In turn, this might have a beneficial influence on our societies and revive empathic values [44, 144]. Therefore, the goal of this paper is to survey the different aspects of empathy and show how they might be incorporated into affective computing and social signal processing research. Such research can prove to be a fruitful basis for future applications and technologies that could measure and augment empathy (i.e., empathic technologies). To provide a theoretical basis for incorporating empathy as a topic in affective computing and social signal processing, I present a three-component framework of empathy that describes the different processes involved in empathic interaction.
Combining the empathy framework with advances in affective computing and social signal processing provides a strong basis for empathic technologies [133, 174]. Affective computing is the field that tries to develop machines that can recognize emotions and respond appropriately to them [132, 133, 152]. Over the last decade, research on many different aspects of affective computing has increased rapidly. Computers are now able to automatically detect and recognize different emotional states from facial expressions, speech, and physiological signals [35, 189]. Moreover, other studies have focused on synthesizing emotions in (conversational) agents to improve social interactions with artificial entities [78, 129, 147, 169]. This provides a useful knowledge base for further inquiries into technologies focused on empathic interactions between humans.
Social signal processing investigates machines that automatically identify and track the social signals that humans display during their interactions. This research is mainly targeted at detecting the nonverbal behavior that permeates our interactions. Examples of such nonverbal signals are posture, facial expression, gestures, vocal characteristics, gaze direction, silence, and interpersonal distance. Note that these signals need not relate to emotions. However, as I will show, many of these social signals are involved in empathic interactions.
A number of studies have tried to measure or augment empathic aspects of human-human interaction. For instance, the work of Pentland [130, 131] shows how speech parameters can be used to extract interaction parameters such as mimicry and intensity. As shown below, these parameters can be related to empathy. Furthermore, some studies have tried to measure empathy through synchronization in physiological parameters (i.e., similarity in physiological changes between two or more individuals). For instance, Marci and Orr have linked therapist empathy to physiological synchronization between therapist and client. Additionally, Gottman and Levenson have related physiological synchronization between spouses to marital experiences. Other research groups focus on specific groups of people, for instance people with autism spectrum disorder, who have great difficulty engaging in affective interactions. Further examples of empathic technologies include the work of Sundstrom and colleagues, who developed eMoto, a closed-loop emotion system for mobile messaging based on gestures. Janssen and colleagues [98, 158] worked on using physiological signals as intimate cues. They showed that communicating a heartbeat signal can transform our experience of a social situation and the behavior we display towards the person we are interacting with. Another example comes from Balaam and colleagues, who showed that subtle feedback about interaction behavior can enhance interactional synchrony and rapport. Finally, mediated empathic touch has been investigated by Bailenson and colleagues, who showed how mediated handshakes can be communicated and transformed to signal different emotions.
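The phrase "similarity in physiological changes" can be operationalized in several ways. As one minimal sketch, assuming two equally sampled signals (e.g., skin conductance) recorded from the two interactants, the hypothetical function `slope_concordance` below counts how often both signals move in the same direction. It only illustrates the idea and is not the measure used in the studies cited above.

```python
from typing import Sequence


def slope_concordance(x: Sequence[float], y: Sequence[float]) -> float:
    """Share of time steps in which two signals change in the same direction.

    Hypothetical synchrony index: 1.0 means every rise (fall) in x coincides
    with a rise (fall) in y; 0.0 means the signals always move oppositely.
    Steps where either signal stays flat are skipped; NaN is returned if no
    step remains to compare.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    def direction(delta: float) -> int:
        return (delta > 0) - (delta < 0)  # +1 rise, -1 fall, 0 flat

    same = compared = 0
    for i in range(1, len(x)):
        dx = direction(x[i] - x[i - 1])
        dy = direction(y[i] - y[i - 1])
        if dx == 0 or dy == 0:
            continue  # no direction of change to compare
        compared += 1
        same += dx == dy
    return same / compared if compared else float("nan")


# Hypothetical skin conductance samples (in microsiemens) from two interactants.
therapist = [2.0, 2.1, 2.3, 2.2, 2.4, 2.5]
client = [5.0, 5.2, 5.5, 5.4, 5.3, 5.6]
print(slope_concordance(therapist, client))  # 0.8: 4 of 5 changes agree
```

A practical analysis would likely also smooth the signals, compute the index in sliding windows, and compare it against a chance baseline; those choices are left open here.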
In this paper, the focus is specifically on human-human interaction as opposed to human-machine interaction. Although some empathic processes are likely to be similar in human-human and human-machine interaction, others might work very differently. As I will describe, empathy is an inherently interpersonal communicative process, and many of the methods and techniques presented in this paper require two or more humans to be interacting. For one thing, the communication of empathic responses is necessary for empathy to occur. Therefore, empathy is treated as a property of an interaction and not as a property of an individual. Furthermore, the psychological framework around empathy is based on research on human-human interaction, and it is currently unclear how the processes in the framework generalize to human-machine interaction. Nonetheless, individuals can differ in empathic ability or empathic skill, and such abilities or skills might be trained or taught.
2 Three levels of empathy
- Cognitive empathy (i.e., the cognitive ability to infer what another is feeling).
- Emotional convergence (i.e., the ability to experience another person’s emotions).
- Empathic responding (i.e., a response to another person’s distress consisting of sympathy or personal distress).
2.1 Cognitive empathy
Cognitive empathy is the process of inferring or reasoning about others’ internal states [64, 184]. In other words, cognitive empathy relates to detecting how someone is feeling. For instance, a successful cognitive empathic inference entails an observer recognizing a person’s feeling as sad when that person is in fact sad. This is in line with the definition of Decety and Jackson, who describe this cognitive process as an important part of empathy. Cognitive empathy has also been called internal state empathy [138, 182], mentalizing, and theory of mind. Empathic accuracy is the accuracy of cognitive empathic inferences and is therefore strongly related to cognitive empathy: it indicates how accurate our inferences about others’ feelings are.
The evidence indicates that cognitive empathy consists mainly of higher-order cognitive processes. This is supported by neuroscience studies that have identified different regions of the neocortex involved in cognitive empathy. For instance, studies have found that processes involved in cognitive empathy activate regions including the medial prefrontal cortex, the superior temporal sulcus, and the temporal poles [71, 149, 154]. These studies have included both healthy individuals and individuals with lesions. In contrast, affective empathic processes (see next section) relate to structures typically involved in emotional processing, such as the amygdala and the insula [96, 156].
Cognitive empathy is related to a few noteworthy cognitive and social findings. Research shows that cognitive empathy improves when people are more familiar with each other. This familiarity can increase rapidly with high amounts of self-disclosure (i.e., the sharing of personally relevant information). As sharing feelings is a form of self-disclosure, regularly sharing feelings with a certain individual will improve that individual’s chances of correctly recognizing those feelings (i.e., cognitive empathy). Furthermore, verbal information is the most important information channel for cognitive empathy in humans, as Gesn and Ickes and Hall and Schmid Mast showed in two studies that investigated the effect of different verbal and nonverbal information channels on empathic accuracy. Facial expressions were found to be the least important information channel for making empathically accurate judgments. More recently, Zaki and colleagues confirmed this result by comparing verbal and facial signals with continuous ratings. Verbal information makes it comparatively easy to infer how someone is feeling. Hence, for the cognitive empathy component, it is a very important source of information. As will be shown later, this might be different for the other two components of the empathy framework.
Cognitive empathy, or a deficit thereof, has strong effects on different aspects of our social interactions. For instance, maritally abusive men score lower on empathic accuracy than non-abusive men. This suggests that enhancing empathic accuracy could potentially reduce marital abuse. Furthermore, and perhaps unsurprisingly, a strong link has been found between autism and a deficit of cognitive empathy [8, 9]. Hence, people with autism spectrum disorder are likely to benefit from empathy-enhancing technologies. Additionally, Crosby suggests that mothers who are more accurate in inferring their children’s feelings have children with the most positive self-concepts. This is likely to be related to attachment theory, as empathic accuracy can help mothers to create more secure attachment. Finally, adolescents with lower empathic accuracy are more likely to be the target of bullying and are more likely to be depressed. Although most of these studies are correlational, taken together they suggest a clear potential benefit of technology that can improve empathic accuracy.
2.2 Emotional convergence
Emotional convergence is the second component of empathy. It is the process by which the emotions of two (or more) interacting individuals become more similar, because the emotions of one or both individuals adjust to the other’s state. This process is often thought to arise from implicit emotional contagion processes. Emotional contagion is defined as “the tendency to automatically mimic and synchronize facial expressions, vocalizations, postures, and movements with those of another person and, consequently, to converge emotionally” (p. 5, ). In other words, emotional contagion is a low-level automatic process constituted by mimicry and feedback. Concepts strongly related to emotional contagion are motor mimicry [50, 51, 88], facial empathy, imitation [119, 165], motor empathy [23, 24], and emotion catching.
The first step in the emotional contagion process is the automatic mimicry of facial, vocal, and/or postural information. Researchers have long shown that people automatically mimic the facial [87, 91, 159], vocal, and postural [18, 19] expressions of those around them. Such automatic imitation behavior can already be observed in preverbal children. Neuroscience has suggested that mirror neurons could provide the common ground through which mimicry might work [156, 181]. Mirror neurons become active both when a certain action is performed and when that same action is perceived [72, 145]. Hence, when a certain gesture or facial expression is perceived, mirror neurons fire that innervate motor neurons related to the same gesture or facial expression. In this way, perceived gestures or facial expressions are also triggered in the observer, supporting imitation and mimicry. These neurons might therefore provide a mechanism for establishing a common ground between someone’s actions and perceptions.
In the second step of emotional contagion, the bodily changes induced through mimicry provide feedback to the central nervous system and influence emotional experiences. Again, research has shown that facial [63, 106], vocal [53, 185], and postural [1, 82] feedback all influence emotional experience. This is related to the James-Lange view of emotions, which suggests that emotions are perceptions of one’s own bodily states [97, 139]. Hence, these bodily expressions influence our emotional states. Taken together, through processes of mimicry and bodily feedback, the emotions of two or more interactants can automatically converge.
Although bottom-up emotional contagion processes are a possible mechanism for generating emotional convergence, other processes are likely to be involved as well. This idea stems from the fact that the evidence for the second part of the emotional convergence process, namely the feedback from the body to the central nervous system, has sometimes been considered unconvincing. Effect sizes from research on facial feedback are small at best. Hence, it is unlikely that emotional convergence proceeds completely unconsciously and automatically. Instead, emotional convergence could be understood as an interplay between bottom-up and top-down processes. In a recent review, Singer and Lamm show that automatic bottom-up and cognitive top-down influences can be differentiated by their temporal characteristics: there is evidence for an early automatic response and a later cognitive response. An example of such a cognitive influence is whether or not we attend to others’ emotions, as attending to the other improves emotional convergence. Because emotional convergence is also influenced by cognitive factors, it can be seen as (partly) building on the cognitive empathy component.
Emotional convergence is likely to be moderated by environmental and social factors. First, environmental factors can influence emotional convergence, as convergence can emerge simply because two persons are in the same emotion-eliciting context. For example, watching a scary movie together will, to a certain extent, elicit similar (and thus converged) emotions in its viewers, as long as the movie triggers the same emotions in each of them. Second, recent work showed that emotional convergence can be stronger in persons who are more familiar with each other. For instance, Cwir and colleagues showed that self-reported feelings and physiological arousal converged more when social connectedness was induced among strangers. In sum, there are significant social and environmental influences on emotional convergence as well.
2.3 Empathic responding
The third component of empathy consists of someone’s response to another person’s distress. This response can consist of sympathy, focused on alleviating the other’s distress. Sympathy consists of feelings of sorrow or concern for someone else. However, the response can also be one of personal distress. Personal distress is an aversive reaction to another’s distress, focused on alleviating one’s own distress. As such, personal distress is focused on the self. The empathic responding component is similar to the third part of the empathy definition of Decety and Jackson, which describes a response of sympathy or distress to another’s distress. Hence, it requires a differentiation between self and other, which makes it different from emotional convergence. It has been argued that there are other possible empathic responses, but sympathy and personal distress are the two that have received the most attention and are widely accepted in both psychology and neuroscience. Therefore, I focus specifically on these two responses.
Whether someone’s empathic response consists of sympathy or personal distress is mostly related to one’s ability to self-regulate emotions [58, 59]. On the one hand, individuals with low self-regulation capabilities are likely to become overaroused when viewing another’s negative emotional state. In turn, this overarousal leads to a self-focused response of personal distress, with the goal of alleviating some of this negative emotional arousal. On the other hand, individuals who can self-regulate increases in emotional arousal are more likely to respond with sympathy focused on reducing some of the other’s distress. Finally, a certain minimal amount of arousal is also thought to be necessary for any empathic response at all. This idea stems from studies showing that a lack of arousal is related to difficulties in sympathizing and can result in increased psychopathic tendencies. Neuroscientific evidence for the importance of self-regulation comes from Spinella, who showed that prefrontal dysfunction (which is related to self-regulation) was positively related to expressed personal distress and negatively related to expressed sympathy.
Differences between sympathy and personal distress have also been related to other social and developmental phenomena. Sympathy is generally positively related to prosocial behavior, whereas personal distress is negatively related to prosocial behavior. For instance, altruistic behavior can be induced by sympathy. Furthermore, abusive parents often report personal distress reactions to distress in infants. In turn, this might negatively influence children’s sympathetic abilities, as these are related to their parents’ sympathetic abilities. In line with these findings, supportive parenting has been related to higher levels of self-regulation in children. This suggests that helping parents to regulate their emotions and react more sympathetically could have strong beneficial effects for their children. In adolescents, sympathy has been associated with self-efficacy and with managing negative emotions [6, 47]. Hence, individuals who are confident of their own capabilities (i.e., high in self-efficacy) are likely to be better at self-regulating emotions. Nonetheless, it should be noted that most of this research uses solely correlational methods, making the causality of the effects difficult to judge at this time. Finally, low self-awareness and self-regulation of emotions can also put a strain on professionals who deal with distressed individuals in their work (e.g., therapists). Such professionals can have problems delivering help and support and have an increased risk of burn-out.
The precise interactions between empathic responding and the other two components of empathy are not entirely clear. According to Eisenberger and Fabes, sympathy and personal distress may arise from both emotional convergence and cognitive empathy. It is unclear whether empathic responding can in turn influence cognitive empathy or emotional convergence. Hence, more experimental research is needed to shed light on the exact interactions between these components.
3 Empathy in affective computing and social signal processing
Having established a conceptual framework around different components of empathy, the next step is to show how these components relate to current and possible future practices in affective computing and social signal processing. In particular, I will first argue that affective computing research has so far focused mainly on the cognitive empathy component. From there, the next step would be to start taking the other two components of empathy into account, which could be of great value for affective computing and social signal processing applications.
A considerable part of affective computing research has focused on predicting mental states, especially emotional states, from different modalities [38, 127, 135]. This research shows that computers can learn to recognize emotional states reasonably accurately, often with recognition accuracies of 80 % or higher [35, 172, 189]. Recognizing emotional states is exactly what the cognitive empathy component focuses on. Ideas for applications of cognitive empathy systems are manifold: for instance, affective music players [99, 104], emotionally intelligent cars, or emotionally adaptive games. Affective computing focused on cognitive empathy enables such applications.
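To make the recognition setting concrete, the sketch below shows the basic shape of such a system: learn a mapping from per-person feature vectors to emotion labels, then report recognition accuracy on held-out samples. It assumes a nearest-centroid classifier and invented two-dimensional features; the cited studies use richer features and other classifiers, so this is only an illustration of the pipeline, not of their methods or results.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A labeled sample: (feature vector, emotion label). Features are hypothetical.
Sample = Tuple[Sequence[float], str]


def fit_centroids(train: List[Sample]) -> Dict[str, List[float]]:
    """Nearest-centroid model: the mean feature vector of each emotion label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def accuracy(centroids: Dict[str, List[float]], test: List[Sample]) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, features) == label for features, label in test)
    return correct / len(test)


# Hypothetical, pre-normalized features: [smile intensity, vocal pitch variability].
train = [([0.9, 0.8], "happy"), ([0.8, 0.9], "happy"),
         ([0.1, 0.2], "sad"), ([0.2, 0.1], "sad")]
test = [([0.7, 0.9], "happy"), ([0.3, 0.2], "sad"), ([0.6, 0.6], "sad")]
model = fit_centroids(train)
# The third test sample lies closer to the "happy" centroid, so 2 of 3 are correct.
print(predict(model, [0.6, 0.6]), accuracy(model, test))
```

Note that everything here concerns a single individual: no information about an interaction partner is needed, which is the point made in the next paragraph.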
One reason why cognitive empathy has received so much attention from the affective computing research community to date might be that this particular form of empathy does not rely heavily on analyzing interactions. Instead, cognitive empathy can be artificially created by taking into account only the individual whose emotional states are to be recognized. Therefore, cognitive empathy is easier to incorporate in affective computing and social signal processing than the other empathy components, which do require dyadic processes. Moreover, creating cognitive empathy links naturally to popular machine learning techniques aimed at recognizing patterns in all kinds of signals.
In contrast to cognitive empathy, the emotional convergence component of empathy can, by definition, only be considered within social interaction. Understanding emotional convergence requires integrating measurements of at least two interacting agents. Because of the role of mimicry, emotional convergence is directly related to postural, vocal, facial, or physiological changes [82, 83]. This is a great advantage for social signal processing, as it provides a relatively accessible starting point from a measurement perspective. By extracting features from these modalities, such as facial expressions, gestures, or movement patterns, and examining how they converge between people, an index of emotional convergence can potentially be computed. Hence, recent advances in social signal processing can be very beneficial to the detection of emotional convergence. Moreover, many studies have already focused on the automated extraction of facial, vocal, or physiological features of individuals [7, 125, 172]. This could be the basis of a method that measures the influence of others’ emotional states on one’s own emotional state. Note also that such an approach is typically multimodal, as integrating different modalities (face, movement, gestures, speech) often leads to better performance.
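One simple way to turn "examining how features converge between people" into a number is a lagged correlation between the same feature stream measured on both interactants. The minimal sketch below assumes equally sampled, already extracted feature streams (here an invented smile-intensity signal) and uses Pearson correlation over a small range of lags; the function names and the lag convention are illustrative assumptions, not an established index.

```python
import math
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lagged_convergence(a: Sequence[float], b: Sequence[float],
                       max_lag: int) -> Tuple[float, int]:
    """Strongest correlation between two feature streams over lags -max_lag..max_lag.

    A positive lag means b follows a (b[t + lag] is paired with a[t]);
    a negative lag means b leads a. Returns (correlation, lag).
    """
    if len(a) != len(b) or max_lag < 0 or len(a) - max_lag < 3:
        raise ValueError("need equal-length streams and at least 3 overlapping samples")
    best_r, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[: len(a) - lag], b[lag:]
        else:
            x, y = a[-lag:], b[: len(b) + lag]
        r = pearson(x, y)
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag


# Hypothetical smile-intensity streams: B reproduces A's expressions two frames later.
smile_a = [0, 1, 3, 2, 0, 1, 4, 2, 0, 1]
smile_b = [0, 0, 0, 1, 3, 2, 0, 1, 4, 2]
r, lag = lagged_convergence(smile_a, smile_b, max_lag=3)
print(round(r, 3), lag)  # 1.0 2
```

A high correlation alone does not reveal why the streams converge: it may reflect mimicry, a shared emotion-eliciting context (as in the movie example discussed earlier), or chance. A multimodal variant would compute such an index per feature and combine the results.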
The third component, empathic responding, can build on emotional convergence and cognitive empathy. To understand a response, it is important to know to what or whom the response is being made. In particular, it is important to know whether the other is in distress (cognitive empathy) and whether those feelings of distress have also been transferred to the sender of the response (emotional convergence). This is necessary because cognitive empathy and emotional convergence provide the basis for empathic responding; in other words, they provide the context awareness needed to detect empathic responses. From there, being able to track people’s empathic responses could help to train individuals in their responses. Research on empathic responding could also inform the design of artificial agents that need to respond empathically to a user. Finally, empathic response measurements could be used as input for machines that need to detect empathic responses to their own behavior. As such, insights from psychology on empathic responding can be very valuable for social signal processing.
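The ordering described in this paragraph (first establish whether the other is in distress, then whether that distress has converged, and only then interpret the response) can be sketched as a simple decision cascade. In the sketch below, the `Moment` fields, all thresholds, and the use of the responder's arousal to lean towards sympathy or personal distress are hypothetical assumptions loosely based on the self-regulation account described earlier; it is not a validated model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Moment:
    """Hypothetical per-moment estimates for one responder in a dyad."""
    partner_distress: float   # cognitive-empathy estimate of the other's distress, 0..1
    convergence: float        # emotional-convergence index between the two, -1..1
    responder_arousal: float  # responder's own physiological arousal, 0..1


def interpret_response(m: Moment,
                       distress_min: float = 0.5,
                       convergence_min: float = 0.3,
                       arousal_min: float = 0.2,
                       overarousal: float = 0.8) -> Optional[str]:
    """Label the context in which a response occurs (all thresholds are assumptions)."""
    if m.partner_distress < distress_min:
        return None  # no distress detected: not an empathic-responding context
    if m.convergence < convergence_min:
        return "no_convergence"  # distress was not transferred to the responder
    if m.responder_arousal < arousal_min:
        return "low_arousal"  # too little arousal for any empathic response
    if m.responder_arousal >= overarousal:
        return "personal_distress_likely"  # overarousal tends toward a self-focused response
    return "sympathy_likely"


print(interpret_response(Moment(0.9, 0.6, 0.5)))  # sympathy_likely
print(interpret_response(Moment(0.9, 0.6, 0.9)))  # personal_distress_likely
```

The cascade only supplies context; an actual system would still need to measure the response behavior itself before classifying it.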
Because there has already been a lot of research on cognitive empathy in affective computing, the rest of this paper focuses specifically on issues surrounding emotional convergence and empathic responding. First, I will present different ideas for applications around those components of empathy that can motivate future research. Second, I will go into different possibilities for automated measurement of emotional convergence and empathic responding, which is necessary for many of the presented applications. Finally, I will discuss some methodological issues that surround research on empathy in social interactions.
4 Applications of empathic technologies
Splitting the construct of empathy into three different components can help in thinking about different applications of empathic technologies. As discussed, applications of the cognitive empathy component have already been identified (e.g., ). Therefore, the focus in this paper will be on applications of emotional convergence and empathic responding. The applications are split into those that passively quantify human-human interaction and those that actively support it, both with the goal of augmenting the interaction process. The goal of this paper is not to describe possible applications exhaustively. Instead, the application ideas below are suggestions that can motivate further research on two of the components of empathy described above.
4.1 Quantifying human-human interaction
In the following paragraphs, I present some applications for which empathic quantification is an enabler. Note that these examples are targeted solely at measuring interaction and not necessarily at influencing or supporting it (which will be discussed in the next section). Hence, this section describes applications that are based on the measurement of empathy, not on the effects these measurements may have on a specific interaction.
First of all, empathy measurement can be used as a tool for personal performance management. In many professions, relating empathically to others is an important aspect of successful performance. Physicians have to relate empathically to patients [108, 177], teachers have to relate to students [117, 122], and salespeople have to tune in to buyers [116, 166]. In all these examples, more empathy will likely improve the professional’s success in reaching his or her goal. Hence, it is important that the feelings of the professional can quickly converge to the feelings of their interaction partner. In addition, it is perhaps even more important that they respond to these interaction partners with sympathy, so that the emotional convergence does not lead to personal distress (i.e., the empathic responding component). Therefore, evaluating the empathic performance of such professionals would benefit from an analysis of their empathic abilities. Nonetheless, empathic performance is still difficult to capture, and questionnaires are often used as proxies for actual behavior. In those cases, empathy is often considered a trait, whereas it can differ greatly between situations and interaction partners. With automated empathy quantification, it will become possible to evaluate the empathy skills of professionals during their actual work, possibly giving a more precise and continuous indication of their empathic performance.
Second, empathy can be used for interpersonal performance measurement. In such applications, the relationship between two or more persons can, to a certain extent, be quantified by measuring empathy, which is one possible predictor (from a set of predictors) of how well two or more persons relate to each other. In a professional setting, this information can be used to optimize team performance. Henning and colleagues have shown that emotional convergence is a significant predictor of group performance. Hence, based on emotional convergence measurements, groups could be rearranged to achieve an ideal composition. In a more private setting, emotional convergence and empathic responding indices can also be used to predict the success of romantic relationships. Gottman and Levenson have shown that indices of personal distress and sympathy were able to predict with 80 % accuracy whether the partners would still be together 15 years later. Hence, empathic measurements of emotional convergence and empathic responses could be used for both private and professional interpersonal performance management.
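The idea of composing groups from emotional convergence measurements can be illustrated with a small exhaustive search. The sketch below assumes that a pairwise convergence index has already been measured for every pair of candidates, and it treats the mean pairwise convergence of a team as the only criterion; both the scoring rule and the data are hypothetical, and in practice convergence would be one predictor among several.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Tuple


def best_team(members: Sequence[str],
              convergence: Dict[FrozenSet[str], float],
              size: int) -> Tuple[Tuple[str, ...], float]:
    """Exhaustively pick the team of `size` with the highest mean pairwise convergence."""
    if size < 2 or size > len(members):
        raise ValueError("size must be between 2 and the number of members")
    best_members: Tuple[str, ...] = ()
    best_score = float("-inf")
    for team in combinations(members, size):
        pairs = list(combinations(team, 2))
        score = sum(convergence[frozenset(pair)] for pair in pairs) / len(pairs)
        if score > best_score:
            best_members, best_score = team, score
    return best_members, best_score


# Hypothetical pairwise convergence indices measured in earlier meetings.
scores = {frozenset(p): v for p, v in [
    (("A", "B"), 0.6), (("A", "C"), 0.2), (("A", "D"), 0.5),
    (("B", "C"), 0.3), (("B", "D"), 0.7), (("C", "D"), 0.1)]}
team, score = best_team(["A", "B", "C", "D"], scores, size=3)
print(team, round(score, 2))  # ('A', 'B', 'D') 0.6
```

Because the number of candidate teams grows combinatorially, exhaustive search is only feasible for small groups; larger settings would need a heuristic search.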
Third, empathy measurement can be used as an evaluative tool for new technologies and products. Many of our current interactions with social partners are mediated by some kind of technology, be it social media, a telephone, a videophone, immersive virtual reality [26, 68], or complete telepresence installations. Many of these tools aim to provide the social power of face-to-face interaction through shared virtual environments when people are not co-located. To evaluate the success of these and future communication tools, assessing the level of empathy they support is important, as different communication channels are known to support different levels of empathy. It is especially important to see how a communication medium supports emotional convergence, as this is the basis for empathic interaction and depends on automatic processes that use low-level information that is often absent in mediated communication. In sum, automated empathy measurement can help to test different communication tools in order to optimize mediated communication.
Finally, automated empathy measurement could become an important scientific tool. Many social science experiments are conducted in heavily controlled lab situations. Such laboratory approaches are sometimes said to have low ecological validity, may miss processes that occur in real-world interactions, and have trouble comparing the effect sizes of different processes in the real world [124, 180]. Recent advances in unobtrusive sensing platforms have focused primarily on individuals’ emotions , rather than on interindividual empathy processes like emotional convergence and empathic responding. Hence, it is currently not possible to continuously and unobtrusively measure empathic processes in field studies. This is why most scientists use lab studies in which the constructs and processes that cannot be measured are controlled for. Reliable automated empathy measurement could significantly enhance scientific inquiries into social processes by enabling more sophisticated field studies.
Taken together, the above categories describe some powerful applications of empathic technologies based on automated measurement of emotional convergence and empathic responding. These general categories are unlikely to be complete, and the applications described in each category are merely a few examples from a rich set of challenges and opportunities. This shows the wealth of possibilities for applying empathy measurement in practical applications.
4.2 Supporting human-human interaction
Research on empathic accuracy has shown that empathy can be trained in humans [10, 112]. Such empathy training requires feedback on the level of empathy during or after certain interactions. An empathy training system could provide such feedback based on automated measurements of empathy. Users could then try different strategies to improve empathy during the interaction and get (immediate) feedback on these strategies. This process can be used to improve empathy in the short term, within one interaction, as well as to improve the long-term empathic abilities of a user (i.e., across different interactions). Empathy training systems are likely to support both, although strategies that improve empathy within one particular interaction might not generalize to all interactions. In sum, empathy measurement could enable training and improvement of empathy-related responses through feedback mechanisms.
The applications for such a mechanism are manifold. In the professional domain, one could think of salesmen, teachers, or therapists who have to tune in to their interaction partners to make their interactions more successful. Especially in the medical domain, there has been a lot of research on the effects of empathy on the healing process of patients [108, 177]. Moreover, all these professions often already have some form of training for their specific interactions, into which an empathy training system could easily be integrated. Another professional application of empathy recognition could support call center agents in interactions with callers. Emotions often play an important role in call center conversations, and it might be helpful for human call center agents to get more in tune with the emotions of the customer, so that the customer might better understand the call center agent’s position and vice versa. These are typical applications in which people are trained to empathize with many different people. There, the feedback could be useful within each interaction, as each interaction is performed with a different person.
In the personal domain, empathic feedback mechanisms could potentially be used to enhance the interaction of close friends, family, or romantic partners. As discussed in the previous section, a lack of empathy can have serious consequences for marital interaction, where increases in personal distress and decreases in sympathy are related to shorter relationships. Hence, in these situations, empathy feedback systems could help people to better tune in to a specific individual to whom they are close. This could significantly improve the quality and duration of the relationship. This is especially important nowadays, as more and more people report that they have no one to share important matters with and feel lonely . In turn, this is likely to have severe consequences, as social connectedness is often said to be the single most important factor for our health and well-being .
In sum, several examples above suggest that empathic measurements can also be applied to support and improve social interactions by creating feedback loops. These loops can help users to train their empathy skills, either towards a specific user or to people in general. This can be applied in professional as well as personal domains.
5 Automated empathy measurement
This section describes approaches to detecting different levels of empathy between humans. As in the previous section, I will focus here on emotional convergence and empathic responding, rather than cognitive empathy that has largely been covered by affective computing research (see  and  for reviews). Automated empathy measurement is discussed based on three different modalities that have so far received the most attention in affective computing and social signal processing: facial expressions, speech, and physiological signals. Note that measurement of emotional convergence and empathic responding has not received much attention, so most of the discussion is based on a generalization of the definitions of these constructs to these modalities.
5.1 Emotional convergence
It is widely acknowledged that emotions are closely related to facial expressions, speech, and physiology [7, 62, 63]. Hence, similarity of emotions should also lead to similarity in emotional expression. Therefore, emotional convergence can potentially be assessed by analyzing the similarity between the facial expressions, physiology, and speech parameters of two or more interacting individuals. From this perspective, measuring emotional convergence might seem simple. Nonetheless, a number of challenges still need to be resolved before emotional convergence can be measured automatically. In the following paragraphs, I present a step-by-step approach to measuring emotional convergence.
The first step to measuring emotional convergence is to track facial, speech, and/or physiological signals from two or more users. With the increasing availability of wireless, unobtrusive measurement platforms, this has become relatively simple. For instance, the sociometer badge from Pentland and colleagues  can unobtrusively track speech. Physiological signals can be measured through wireless unobtrusive wearable sensors . Depending on the application, developers can choose the most useful modalities. When users are mobile, tracking facial expressions might be difficult, so speech and physiological signals could be more appropriate. In video conferencing applications, on the other hand, facial expressions can easily be tracked by the cameras used for the video recordings.
Second, the raw signals have to be preprocessed and normalized. Third, different low-level features can be extracted that capture relevant properties of the modalities. For facial expressions, features are often derived from different points on the face, for instance, coded with the facial action coding system . These can also be measured dynamically . Speech features might be intensity or pitch, which are often employed in affective computing research [65, 103, 150, 151]. Physiological features that are likely to be coupled to emotional convergence are skin conductance level and heart rate variability, as these are strongly coupled to emotions [20, 27, 31].
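To make the preprocessing and feature extraction steps more concrete, the sketch below z-scores each person's signal (one possible normalization, assumed here) and summarizes it as one feature value per fixed-length window, for instance a skin conductance level per second. The function names `zscore` and `windowed_mean` are hypothetical illustrations, and the window length is an assumption that would have to be tuned per modality.

```python
import numpy as np


def zscore(signal):
    """Normalize a signal to zero mean and unit variance.

    Per-person normalization removes baseline differences (e.g., one person's
    generally higher skin conductance) before signals are compared.
    """
    signal = np.asarray(signal, dtype=float)
    std = signal.std()
    if std == 0:  # flat signal: only remove the mean
        return signal - signal.mean()
    return (signal - signal.mean()) / std


def windowed_mean(signal, fs, window_s):
    """Average a signal over consecutive, non-overlapping windows.

    fs is the sampling rate in Hz and window_s the window length in seconds;
    samples that do not fill a complete final window are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n = int(round(fs * window_s))
    n_windows = len(signal) // n
    return signal[: n_windows * n].reshape(n_windows, n).mean(axis=1)


# Example: a (hypothetical) skin conductance trace sampled at 2 Hz,
# summarized as one normalized level per 1-second window.
scl = windowed_mean(zscore([2.0, 2.2, 2.1, 2.5, 2.9, 3.0, 3.1]), fs=2, window_s=1)
```

The same two functions would be applied separately to each interacting person before their feature series are aligned and compared.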
Fourth, the extracted features from the two individuals should be synchronized. Considering the temporal aspect of the signals is important, as similarity in the expressions entails not only similarity at one point in time, but also similarity in how the signals change. Therefore, it is necessary to take into account signals over time. Moreover, there might be a time lag between the sender of an expression and a receiver responding to the expression . Time lags can be tested by comparing the signals at different lags (for instance, in a range of −5 to +5 seconds) and checking whether similarity increases or decreases . When typical time lags are known, they can be applied by shifting the signal in time. Hence, synchronization (at a certain lag) of the signals is an important aspect of emotional convergence measurements. This might be easy to do in laboratory situations, but can be difficult in practical real-world applications, as synchronization requires timestamped signals from all users and a method to synchronize them (e.g., a handshake mechanism). Moreover, if different users are using different systems, the systems should use the same handshaking method.
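The lag test described above can be sketched as follows, assuming both feature series have the same length and are already on a common time base with sampling rate `fs`. Pearson correlation is used here as the similarity measure, and the ±5 s search range follows the example in the text; `lagged_correlation` and `best_lag` are hypothetical helper names.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x and y with y shifted by `lag` samples.

    A positive lag pairs x[t] with y[t + lag], i.e., y (the receiver) follows
    x (the sender); a negative lag means x follows y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]


def best_lag(x, y, fs, max_lag_s=5.0):
    """Try every lag in [-max_lag_s, +max_lag_s] seconds.

    Returns the lag (in seconds) with the highest correlation and that value.
    """
    max_lag = int(round(max_lag_s * fs))
    scores = {lag: lagged_correlation(x, y, lag)
              for lag in range(-max_lag, max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag / fs, scores[lag]


# Example with synthetic data: the receiver mirrors the sender 3 samples later.
rng = np.random.default_rng(0)
sender = rng.standard_normal(200)
receiver = np.roll(sender, 3)
lag_s, r = best_lag(sender, receiver, fs=1.0)  # lag_s == 3.0, r close to 1
```

Once a typical lag is known, it can be applied once by shifting one series (as in `lagged_correlation`) before any of the similarity measures is computed.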
Different algorithms that can be used to calculate similarity or dissimilarity between two temporal signals:
Correlation: Time domain similarity measure giving a value in [−1, 1]. For continuous signals a Pearson correlation can be used, whereas Kendall and Spearman indices measure correlations between ranked or ordinal data.
Coherence: Frequency domain similarity measure giving a value in [0, 1]. Sometimes, weighted coherence is used by correcting for the total power within the spectrum.
ARMA: Model of individual time series using autoregressive and moving average components. Predictions can be made by regressing different people’s ARMA models onto each other.
Divergence: Class of stochastic dissimilarity measures, including for instance the Kullback-Leibler and Cauchy-Schwarz divergences.
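A minimal sketch of three of these measures, assuming two equally long, equally sampled feature series and the NumPy and SciPy libraries. Averaging the coherence over all frequencies and computing the divergence between amplitude histograms are assumptions made for illustration, as are the function names; ARMA cross-prediction is not included.

```python
import numpy as np
from scipy.signal import coherence
from scipy.stats import entropy, kendalltau, pearsonr, spearmanr


def correlations(x, y):
    """Time domain similarity: all three coefficients lie in [-1, 1]."""
    r, _ = pearsonr(x, y)      # continuous signals
    rho, _ = spearmanr(x, y)   # ranked / ordinal data
    tau, _ = kendalltau(x, y)  # ranked / ordinal data
    return {"pearson": r, "spearman": rho, "kendall": tau}


def mean_coherence(x, y, fs, nperseg=64):
    """Frequency domain similarity: magnitude-squared coherence in [0, 1]
    (Welch estimate), averaged over all frequencies."""
    _, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    return float(np.mean(cxy))


def kl_divergence(x, y, bins=20):
    """Dissimilarity: Kullback-Leibler divergence between the amplitude
    histograms of x and y (0 for identical histograms; not symmetric)."""
    lo = min(np.min(x), np.min(y))
    hi = max(np.max(x), np.max(y))
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    eps = 1e-9  # avoid division by zero in empty bins
    # With two arguments, entropy() normalizes both and returns KL(p || q).
    return float(entropy(p + eps, q + eps))
```

Note that the correlation and coherence values grow with similarity, whereas the divergence grows with dissimilarity, so scores from different measures are not directly interchangeable.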
Correlation is used, for instance, in the synchrony detection algorithms of Nijholt and colleagues , Watanabe and colleagues , and Ramseyer and Tschacher . Coherence has been used by Henning and colleagues . In these cases, it is important to correct appropriately for autocorrelations within a signal [29, 37]. A simple way to do this is to use first-order differences of the signals, x′(t) = x(t) − x(t − 1) (Eq. 1). A more sophisticated way is to construct autoregressive moving average (ARMA) models that explicitly model the autocorrelations . Subsequently, it can be tested how well the ARMA models of different individuals predict each other. This is the approach taken by, for instance, Levenson and Ruef . A third way of correcting for autocorrelations was proposed by Ramseyer and Tschacher , who shuffle the signal from one individual and check whether it still correlates with the other individual’s signal. If the correlations obtained with the shuffled data are similar to those obtained with the unshuffled data, the original correlations cannot be attributed to synchronization. Finally, divergence measures can be used to calculate (dis)similarity. To my knowledge, these have not yet been applied in an empathy-related context. Examples include the Kullback-Leibler and Cauchy-Schwarz divergences, among others (see  for a review).
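Two of the autocorrelation corrections described above, first-order differencing and shuffling, can be sketched with NumPy. Shuffling whole segments of one person's signal (rather than single samples) is an assumed variant, chosen so that short-term structure within each segment is preserved; the segment length, the number of surrogates, and the function names are illustrative, not taken from the cited studies.

```python
import numpy as np


def first_difference(signal):
    """First-order difference x'(t) = x(t) - x(t - 1)."""
    return np.diff(np.asarray(signal, dtype=float))


def segment_shuffle_test(x, y, segment_len=10, n_surrogates=500, seed=0):
    """Compare the observed correlation of x and y with surrogate correlations
    in which segments of y are shuffled. Shuffling keeps y's values but
    destroys its temporal pairing with x.

    Returns (observed r, one-sided p-value).
    """
    rng = np.random.default_rng(seed)
    n_seg = min(len(x), len(y)) // segment_len
    n = n_seg * segment_len  # drop samples that do not fill a segment
    x = np.asarray(x, dtype=float)[:n]
    y = np.asarray(y, dtype=float)[:n]
    observed = np.corrcoef(x, y)[0, 1]
    segments = y.reshape(n_seg, segment_len)
    surrogate = np.empty(n_surrogates)
    for i in range(n_surrogates):
        shuffled = segments[rng.permutation(n_seg)].ravel()
        surrogate[i] = np.corrcoef(x, shuffled)[0, 1]
    # Fraction of surrogates at least as large as the observed correlation;
    # a large value means the correlation is not due to synchronization.
    p_value = (np.sum(surrogate >= observed) + 1) / (n_surrogates + 1)
    return observed, p_value
```

In practice the two corrections can be combined by differencing both signals first and then running the shuffle test on the differenced series.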
Using multiple modalities can significantly increase the performance of similarity measurement, because many modalities respond not only to affective changes, but also to cognitive and physical changes . For instance, it is well known that cognitive workload and physical exercise influence heart rate and skin conductance. Similarly, tracking facial expressions can be problematic while a person is eating, because eating activates the facial muscles as well. Therefore, combining measurements from multiple modalities and checking whether they match up can give much more precise indications of synchronization. Furthermore, physiological measures and speech parameters tend to tap into the arousal components of emotions, whereas facial expressions mostly relate to valence . Hence, different modalities carry different information, so combining them can also give a more complete picture of emotional convergence.
In sum, I have presented a pipeline for measuring emotional convergence based on facial, speech, and physiological signals. First, signals have to be recorded, preprocessed, and normalized. Subsequently, relevant features have to be extracted and coupled in time (with a possible lag). Once these features are extracted and aligned, their similarity has to be established by a similarity algorithm.
5.2 Empathic responding
For the third component of empathy, empathic responding, it is most important to measure whether a response is mostly related to sympathy or mostly related to personal distress. Unfortunately, there has not been a lot of research that has explicitly examined the differences between such responses, so there is a clear need to identify specific behavioral and physiological responses accompanying either sympathy or personal distress. Nonetheless, three different strategies can potentially be used to track whether empathic responses are mainly based on sympathy or on personal distress.
The first strategy is to track specific nonverbal behavior that is related to sympathy or personal distress. Zhou and colleagues  present a review of facial and vocal indices related to empathic responding based on studies of human-coded behavioral responses to empathy-evoking stimuli (e.g., videotapes of others in need or distress). They suggest that specific sympathy-related behaviors are found in signals of concerned attention. Typical examples of such behaviors are eyebrows pulled down and inward over the nose, forward head leans, a reassuring tone of voice, and sad looks. A study by Smith-Hanen  reported that an arms-crossed position was related to low sympathy. Behaviors related to personal distress are fearful or anxious expressions. Typical examples of such expressions are lip-biting , negative facial expressions, sobs, and cries. This is a very limited set of behaviors related to empathic responding, and I therefore agree with Zhou and colleagues , who state that “more information on empathy-related reactions in every-day life is needed” (p. 279).
Another way of approaching the measurement of empathic responding is to examine to what extent the individuals share the same emotional state. With personal distress, the similarity in emotional state is likely to increase (as both interactants are truly distressed), whereas sympathy is likely to lead to less distress in the responder. This difference may be captured by different levels of emotional convergence: with high emotional convergence, personal distress is more likely, whereas low emotional convergence is more related to sympathy. Hence, for automated measurements it may be sufficient to threshold emotional convergence in order to determine whether a response reflects sympathy or personal distress. However, not responding at all would also lead to low emotional convergence, even though there is no sympathy either. Hence, this strategy cannot be used on its own, but might have value as an additional measurement of empathic responding.
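The thresholding idea, including its main caveat, can be written as a simple decision rule. The sketch below is hypothetical: both cut-off values are placeholders rather than validated thresholds, and `listener_activity` stands for any measure of how strongly the responder reacted at all, which is needed to avoid labeling a non-response as sympathy.

```python
def classify_response(convergence, listener_activity,
                      threshold=0.5, min_activity=0.1):
    """Hypothetical rule for labeling an empathic response.

    convergence: emotional convergence score between the two people
        (e.g., a correlation between their signals).
    listener_activity: size of the responder's own expressive or
        physiological change; a low value means hardly any response.
    threshold, min_activity: illustrative placeholders, not validated values.
    """
    if listener_activity < min_activity:
        # Low convergence here reflects the absence of a response, not sympathy.
        return "no response"
    if convergence >= threshold:
        return "personal distress"
    return "sympathy"
```

As the text notes, such a rule would at best complement the other strategies rather than replace them.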
The third strategy to measuring empathic responses is related to the notion that effortful control is involved in regulating emotional convergence. On the one hand, when high levels of effortful control are applied, reactions are sympathetic. On the other hand, when effortful control is lacking, emotional convergence processes lead to personal distress. Hence, tracking regulation processes could give an indication of empathic responding. A wide variety of studies has shown that respiratory sinus arrhythmia (RSA; sometimes referred to as heart rate variability) is an indicator of emotion regulation [48, 49, 136], especially during social interaction . RSA is an index of periodic changes in heart rate related to breathing and provides an index of parasympathetic activity of the autonomic nervous system [20, 79]. Between-person differences in RSA have been related to individual differences in emotional flexibility (i.e., the ease with which one’s emotions can change; ). Within-person changes in RSA have been related to the activation of emotion regulation processes [69, 148]. Hence, RSA could also be a useful index for tracking empathic responses.
RSA can be measured by transforming the interbeat intervals of an ECG signal to the frequency domain. Subsequently, the power in the high frequency range (0.15 Hz–0.40 Hz, ) can be calculated as an index of RSA. Because this power can also be influenced by respiration rate and volume, it is often corrected for respiration parameters as well .
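A minimal NumPy sketch of this computation, assuming the interbeat intervals have already been extracted from the ECG. The 4 Hz resampling rate, the linear interpolation, the plain periodogram, and the function name are assumptions (common but not the only choices), and the respiration correction mentioned above is deliberately omitted.

```python
import numpy as np


def rsa_hf_power(ibi_s, fs_resample=4.0, band=(0.15, 0.40)):
    """Estimate RSA as the power of the interbeat-interval (IBI) series in
    the high-frequency band (0.15-0.40 Hz by default).

    ibi_s: successive interbeat intervals in seconds (e.g., from ECG R-peaks).
    Returns power in s^2, without any correction for respiration.
    """
    ibi_s = np.asarray(ibi_s, dtype=float)
    beat_times = np.cumsum(ibi_s)  # time (s) at which each interval ends
    # IBIs are unevenly spaced in time, so resample them onto a regular grid.
    t = np.arange(beat_times[0], beat_times[-1], 1.0 / fs_resample)
    tachogram = np.interp(t, beat_times, ibi_s)
    tachogram -= tachogram.mean()  # remove the mean heart period
    n = len(tachogram)
    # One-sided periodogram; the factor 2 is only wrong at 0 Hz and at the
    # Nyquist frequency, both of which lie outside the HF band.
    psd = 2.0 * np.abs(np.fft.rfft(tachogram)) ** 2 / (fs_resample * n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_resample)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.sum(psd[in_band]) * (freqs[1] - freqs[0]))
```

Within-person changes in this value across segments of an interaction would then be the quantity of interest, rather than its absolute level.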
In sum, there can be different approaches to automated measurement of empathic responses. It needs to be stressed that there has been (almost) no research on using these approaches and their feasibility and performance are to be determined in future studies. Finally, the three strategies are not mutually exclusive and combining them would likely provide the best solution for automated measurement of sympathy and personal distress.
6 Methodological issues
6.1 Empathy questionnaires
Questionnaires are often used in psychological studies to capture empathy. For social signal processing and affective computing, they can be useful as a ground truth measure against which automated techniques can be validated.
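Table 2 Empathy questionnaires that can be used to measure self-reported dispositional and situational empathy, subdivided by the type of empathy that they measure
Dispositional questionnaires
Hogan’s empathy scale: cognitive empathy
Mehrabian and Epstein’s measure: affective empathy
Davis’ interpersonal reactivity index: affective and cognitive empathy
Situational questionnaires
Batson’s empathy measurement: empathic responding (sympathy and personal distress)
Barrett–Lennard Relationship Inventory: empathic understanding (Empathic Understanding Sub-scale)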
There are different dispositional measures of empathy available that tap into one or more of the different components of empathy described above (see Table 2). Hogan’s scale is focused completely on cognitive empathy, whereas Mehrabian and Epstein’s scale captures solely the affective components of empathy. Davis’ scale has different subscales that capture both affective and cognitive phenomena associated with empathy. Often these scales are completed by the individual under investigation, but sometimes (especially with children) observers fill out the questionnaire. A combination of responses by both observers and individuals being tested might give more reliable scores of empathy.
Batson’s empathy measurement [15, 16] is a situational empathy questionnaire that taps into empathic responding by measuring both sympathy and personal distress. Responses are taken on a 7-point Likert scale regarding the degree to which people experienced eight adjectives associated with sympathy (i.e., Sympathetic, Moved, Kind, Compassionate, Softhearted, Tender, Empathic, Warm) and twelve adjectives associated with personal distress (i.e., Worried, Upset, Grieved, Distressed, Uneasy, Concerned, Touched, Anxious, Alarmed, Bothered, Troubled, Disturbed). Responses range from 1 (not at all) to 7 (extremely). As with the dispositional scales, this scale can be completed either by the individuals being tested themselves or by observers.
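Scoring the scale can be sketched as follows. Averaging the ratings within each subscale is an assumption made for illustration, as is the function name `score_batson`; the adjective lists are the ones given above.

```python
from statistics import mean

SYMPATHY_ITEMS = ("sympathetic", "moved", "kind", "compassionate",
                  "softhearted", "tender", "empathic", "warm")
DISTRESS_ITEMS = ("worried", "upset", "grieved", "distressed", "uneasy",
                  "concerned", "touched", "anxious", "alarmed", "bothered",
                  "troubled", "disturbed")


def score_batson(responses):
    """Return (sympathy, personal distress) subscale scores.

    responses maps each adjective (lowercase) to a rating from
    1 (not at all) to 7 (extremely). Averaging the items of each subscale
    is an assumed scoring rule.
    """
    for item in SYMPATHY_ITEMS + DISTRESS_ITEMS:
        rating = responses[item]  # raises KeyError if an adjective is missing
        if not 1 <= rating <= 7:
            raise ValueError(f"{item}: rating {rating} outside 1-7")
    sympathy = mean(responses[i] for i in SYMPATHY_ITEMS)
    distress = mean(responses[i] for i in DISTRESS_ITEMS)
    return sympathy, distress
```

The same function could be applied to self-ratings and to observer ratings, so that both perspectives are scored identically.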
Another situational empathy questionnaire is the Barrett-Lennard Relationship Inventory, which contains an Empathic Understanding Sub-scale (EUS). The EUS has been validated in clinical settings and contains 16 items to assess a patient’s perception of a clinician’s empathy during therapy sessions [11, 12]. A sample question from the modified EUS is, “My therapist was interested in knowing what my experiences meant to me”. Each question uses a scale ranging from +3 (strongly agree) to −3 (strongly disagree). The questionnaire can easily be modified for use in other contexts (as done by Marci and colleagues ), but its validity in those other contexts has not been tested.
It is important to note that there are limitations and downsides to the use of any self-report measure, and these limitations are also relevant for empathy-related questionnaires. First of all, self-reports are subject to self-presentational biases. Furthermore, in experiments where the manipulation is clear to the participants, a confirmation bias might play a role. In those situations, participants might be biased to (unconsciously) answer in the direction that researchers are hoping for. Besides these biases, it is also likely that self-reported measurements tap into other aspects of empathy than behavioral and physiological measures do. Many empathy-related processes are automatic low-level processes (e.g., emotional convergence) that people might not even be aware of. Hence, those processes are unlikely to be reflected in self-report questionnaires. Therefore, questionnaires are probably most relevant as measurements of empathic responding. For empathic responding, it can also be important to ask not only the individual who generates the response, but also the person to whom the response is directed, to rate the level of empathic responding.
6.2 Experimental setups and procedures
As with all social signal processing and affective computing studies, there is a need for large amounts of varied and ecologically valid data on which to train and test different systems . The following paragraphs describe a few methodological lessons relevant to obtaining empathic data and testing empathy-related systems.
On the one hand, it is important to take into account the fact that emotional convergence is a partly automatic low-level process that is difficult to manipulate. Moreover, although data from acted behaviors is a very popular source for social signal processing, it is unlikely to capture all the automatic processes involved in emotional convergence between two individuals. On the other hand, getting some control over interactions between two individuals is also difficult. As a compromise, movies of people expressing certain strong emotions are often used as stimuli in psychological studies . These movies are prerecorded, and therefore all participants can get the same stimulus. This allows some control over what is perceived by the participants. A downside of this approach is that there is no mutual influence, as the stimulus can only influence the viewer. In contrast, in natural interaction the viewer also influences the person in the stimulus (or, in other words, interacting people are both perceiving and sending out empathic information to each other). Nonetheless, videotaping is an often used and widely accepted method for inducing empathy .
Different levels of empathy are sometimes induced by giving participants different instructions (also referred to as perspective taking instructions; ). Such instructions tell participants to pay close attention to either the feelings of the other person or to the information that is disclosed by the other. Empathy is generally found to be higher in situations in which participants are instructed to pay close attention to feelings of the other than when they are instructed to focus on the information disclosed by the other . This is mostly a manipulation of cognitive empathy and it is unclear how this influences the other two components of empathy.
Emotional convergence might be influenced by selectively leaving out communication channels that normally trigger emotional convergence . For instance, masking the facial expressions in video stimuli should reduce emotional convergence. A disadvantage of such an approach is that it is a rather crude manipulation that might also influence other processes besides emotional convergence.
A final note on methods for social signal processing is that the different measurement techniques should be evaluated in actual applications to provide some indication of their performance . For many of the empathy measurements, especially for emotional convergence, it will be difficult to judge their validity, as obtaining ground truth information and triangulating measurements is even more difficult than in individual emotion research . From that perspective, it is essential to test the techniques in practice and see how well they work for specific applications. For many systems, the measurements need not be flawless, as long as the users receive some benefit from the system. Moreover, iteratively evaluating empathic technologies in practical settings is also likely to improve the measurement and recognition process by yielding new insights. In sum, there are still many open questions that need to be addressed by further research. These are discussed in the next section.
7 A research agenda for social signal processing
The framework and review presented above provide a starting point for further research into empathic technologies. As has become clear, such systems have so far mainly been approached from a cognitive empathy point of view. Nonetheless, there are many opportunities to integrate the emotional convergence and empathic responding components in artificial systems to augment human interaction. In the following paragraphs, I describe some directions for future research that have not explicitly been addressed yet.
Research on detecting emotional convergence could aim to identify the facial, speech, and physiological parameters that are most helpful in detecting emotional convergence. Moreover, different similarity algorithms can be compared to investigate which algorithm is most successful in quantifying emotional convergence. Several emotion recognition studies have shown that a multimodal approach can give better results than unimodal approaches [4, 101, 128]. As different modalities have their own advantages and disadvantages, combining them can lead to more reliable measurements. To investigate multimodal approaches, different signal fusion techniques should be compared. Possibly, the similarity algorithms can be extended to take into account multiple signals. Alternatively, the individual similarity ratings for each signal can be combined afterwards (for instance, using a weighted average). Such investigations could lead to more reliable emotional convergence sensing systems that are better able to handle environmental noise.
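The weighted-average option can be sketched as a late-fusion step over per-modality similarity scores. The modality names and weights are hypothetical; renormalizing the weights when a modality is missing (e.g., a face that is temporarily not visible) is one assumed way to cope with the noise problems mentioned above.

```python
def fuse_similarities(scores, weights):
    """Late fusion of per-modality similarity scores by weighted averaging.

    scores:  e.g. {"face": 0.8, "speech": 0.4}; modalities that could not be
             measured (e.g., face not visible) are left out or set to None.
    weights: relative trust in each modality (hypothetical values); the
             weights of the available modalities are renormalized to sum to 1.
    """
    available = [m for m, w in weights.items()
                 if w > 0 and scores.get(m) is not None]
    if not available:
        raise ValueError("no weighted modality has a score")
    total = sum(weights[m] for m in available)
    return sum(weights[m] * scores[m] for m in available) / total


# Physiology is missing here, so face and speech share the full weight.
fused = fuse_similarities({"face": 0.8, "speech": 0.4},
                          {"face": 0.5, "speech": 0.5, "physiology": 1.0})
```

Fusing at the level of similarity scores keeps each modality's similarity algorithm independent, at the cost of ignoring interactions between modalities that an early-fusion approach could capture.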
Further research on empathic responding could be approached from three sides. First of all, research could focus on identifying specific nonverbal behavior associated with either personal distress or sympathy. Algorithms that detect such behaviors could then serve as a proxy for sympathy and personal distress. Second, systems could use emotional convergence scores to check whether reactions are based on sympathy or personal distress. As explained before, with sympathy there is likely to be some emotional convergence, but there is much more emotional convergence during personal distress. Hence, simply thresholding emotional convergence might be sufficient for distinguishing sympathy and personal distress. Finally, it should be investigated to what extent physiological signals, and especially RSA, differ between sympathy and personal distress reactions. A combination of the three proposed approaches might even lead to better system performance, but these issues first need to be addressed individually.
Besides empathy measurement, it is also important to investigate feedback mechanisms to support empathy. Supporting empathy by giving users feedback on their empathic abilities raises interesting questions. Emotional convergence and empathic responding are partly subconscious processes . Therefore, providing explicit feedback about empathy might actually backfire, because consciously trying to improve empathy might interfere with the automatic processes. Hence, feedback mechanisms that work subconsciously, and preferably in the background, might be better suited for empathy enhancement. An example of this is the peripheral ambient display used by Balaam and colleagues , which reinforced synchronization between interacting individuals by displaying water ripples whenever there was behavioral synchrony between participants. More such mechanisms should be investigated to see what feedback modalities and temporal characteristics are most effective.
Evaluation of the mechanisms described above should mainly be done by testing their performance in practice [134, 180]. Validation could also be done with questionnaires, but these might not be able to tap into the exact processes involved in empathy (especially in emotional convergence). Moreover, it is unclear what the performance requirements are for empathic technologies in practice. For these reasons, it is important to move out of the lab and into the real world to test practical applications of empathic technologies. It might well be that easily implementable systems are already sufficient for many applications and that very sophisticated recognition algorithms are not needed. Finally, real-world testing is likely to lead to many new insights that can further improve the systems. In sum, only by actually implementing applications as described before can we investigate how well they work in practice.
An important issue when evaluating automated empathy measurement is the separation of empathy measurement from other constructs . This has not received a lot of attention in the literature. Hence, it is unknown to what extent the methods presented above are solely triggered by empathy, or are also responsive to other constructs. From a theoretical perspective, empathy, and especially emotional convergence, is often considered a low level process that works automatically and is not influenced by many other factors. Nonetheless, for instance, physiological signals respond to other factors as well, like cognitive effort or physical exercise . In that light, it is important to create ecologically valid tests in which other responses might also occur, to be able to test if empathy recognition would also work in practice.
In this paper, I treated empathy as a temporary, situated process. Nonetheless, many psychological studies have also identified stable trait-level differences in empathy between people. One example is the common finding that women tend to behave more empathically than men . Such individual differences might not be directly relevant for applications of empathic technologies. However, they might be useful for improving the recognition accuracy of the different empathy-related systems . Future research could therefore focus on models that take into account some of the well-known individual differences.
The focus of this review has been on improving human-human interaction. Nonetheless, the same principles might apply to human-machine interaction. As Reeves and Nass  have shown, humans tend to treat computers much as they treat other humans. In that light, empathy might be just as important in human-machine interaction as in human-human interaction. Nonetheless, some of the empathy processes probably work differently in these two contexts. For one thing, emotional convergence relies on stimuli that are largely absent in interactions with computers (e.g., facial expressions). One exception is interaction with embodied agents. In those cases, emotional convergence could be tracked in the same way as in human-human interaction. Furthermore, research on empathic responses could also inform the design of artificial agents’ behavior to make them more empathic. Hence, human-machine interaction could greatly benefit from specific empathy research as well.
Empathy is an essential process in our social interactions. To make the construct of empathy more useful, this paper has presented a three-component framework of the different processes of empathy. The framework has been linked to current and possible future practices in affective computing and social signal processing, and defines an upcoming area of research and applications around empathy. Possible applications for empathic technologies have been identified and structured. Furthermore, as these applications depend on measurement of empathy, measurement of empathy has been discussed for each component in the framework. Specific gaps and a concrete research approach on how to close these gaps have been identified.
Although there are many challenges ahead, the opportunities for and promises of incorporating empathy into affective computing and social signal processing are manifold. When such research comes to fruition, it can enhance empathy, thereby boosting altruism, trust, and cooperation. Ultimately, this could improve our health and well-being and greatly improve our future societies .
I would like to thank Egon van den Broek, Maurits Kaptein, Petr Slovak, Gert-Jan de Vries, Joyce Westerink, and Marjolein van der Zwaag and three anonymous reviewers for their useful comments and suggestions on earlier drafts of this manuscript.
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.