1 Introduction

Empathy is an integral aspect of human existence. Without it, our life would be rather bleak. Barely livable, in fact. Without at least a basic ability to access others’ affective life, communication and interaction, caring, meaningful social relationships, and social or cultural norms – anything that requires understanding others – would be well-nigh impossible. But empathy is not uniquely human. The capacity to assess what others are feeling evolved early in mammalian evolution: Non-human primates are capable of at least primordial empathy, as are elephants, dolphins, dogs, and rodents (Preston and de Waal 2002). This suggests that empathy is an adaptation, i.e., a trait present in us because it was conducive to our ancestors’ fitness. And indeed: Being able to understand others’ emotional state allows us to provide the kind of comfort, advice, relief, assistance etc. that facilitates coping with environmental contingencies (Schulz 2017).

However, psychological mechanisms that used to be adaptive may cease to be so if the environment changes so rapidly that evolution cannot adapt them swiftly enough. The momentous cultural and technological developments throughout the past two millennia have changed our environment so radically that what evolutionary psychologists call our ‘stone age minds’ often cannot keep up. More and more studies claim to provide evidence for a noticeable decline in the capacity for empathy in an increasing number of people—quite generally (e.g., Konrath et al. 2011; Zarins and Konrath 2017), but specifically in frequent internet and social media users (e.g., Konrath 2013; Martingano et al. 2022; Small and Vorgan 2008; Twenge 2013; Turkle 2017), leading to what has been called the ‘media-empathy paradox’ (Guan et al. 2019): Precisely the technologies that have been expressly designed to foster social connection seem to lead to a deterioration in people’s interpersonal capacities. Apparently, the means humans have, throughout the ages, acquired to access others’ emotional life no longer seem to function well in what has become our everyday business – technologically mediated social interactions in online spaces. With apparently tremendous consequences (e.g., Dregdge and Schreur 2020): The resulting decrease in empathy appears, among other things, to negatively impact the health (e.g., Abi-Jaoude et al. 2020) and life satisfaction (e.g., Usán Supervía et al. 2023) of frequent internet users, and to promote the spread of fake news (e.g., Vafeiadisa and Xiao 2021), online radicalization (e.g., Feddes et al. 2015) and hate speech (e.g., Hangartner et al. 2021).

If (and this is a large if; see Sect. 6) these diagnoses are by and large correct – i.e., if there is indeed a decrease in empathy in frequent online users and if that decrease is indeed something we should find worrisome – there are two philosophically and societally important questions: (1) What makes empathy for frequent online users so difficult? and (2) What can be done to alleviate the negative consequences, both for them and others? Taking the apparent empirical evidence for a worrisome decline in digital empathy at face value (but see Sect. 6), the aim of this paper is therefore twofold. Firstly, in order to address (1), we identify structural differences between offline and technologically mediated interactions that can contribute to an explanation of why digital empathy is harder to achieve. Secondly, in order to address (2), we draw on ideas from ‘situated affectivity’ research (e.g., Stephan et al. 2014; Walter and Stephan 2023) and consider the idea of modifying online spaces in ways specifically designed to foster empathy where and when our evolved mechanisms fail.

Section 2 argues that empathy is, at its core, a matter of interpreting the behavior of embodied subjects. Section 3 identifies three factors that are crucial for this interpretative endeavor: the empathizer’s affective repertoire, their perceptual input, and their background knowledge. Section 4 argues that technologically mediated and face-to-face interactions differ with regard to these factors in ways which often render our evolved empathy mechanisms less effective in the digital world. This answers question (1). Section 5 explores the idea that in such cases situational factors can serve as ‘empathic scaffolds,’ i.e., as ‘tools’ that can ‘shape’ people’s empathic reactions. This offers a tentative answer to question (2). Section 6 wraps things up, points out limitations, responds to objections and invites further scholarship.

2 Empathy: Interpreting Embodied Subjects

Empathy is usually described as a ‘multidimensional’ construct comprising various affective, cognitive, and conative facets (e.g., Lietz et al. 2011). Conceptually, it can be distinguished from several closely related phenomena such as mimicry, sympathy, perspective taking, or compassion (e.g., Maibom 2020, ch. 1).[Footnote 1] Interesting and important as such nuanced differentiations might be, however, at a first approximation empathy might simply be regarded as the skill to affectively understand and adequately respond to others’ feelings. This is certainly too broad to be called an ‘account’ of empathy. But we do not want to arbitrate between the plethora of competing theories. Rather, we aim to offer an analysis of what can make digital empathy more difficult that can be valuable for as many accounts of empathy as possible. Therefore, distilling a fairly general ‘common core’ of extant accounts is precisely what is needed. This ‘common core,’ we venture, is that empathic observers have to (1) recognize the target as an embodied subject and (2) interpret the available behavioral and/or contextual cues.

First, if empathy is a matter of understanding (and eventually responding to) how the target is feeling, the claim that there is a kind of genuine empathy with fictional or otherwise inanimate objects is at least misleading: For if observers are aware of the inanimate nature of an object, they know that there is no emotional state they can ‘feel into.’ At best, they can project their own emotional state ‘onto’ the object. Such cases deserve consideration (e.g., Fuchs 2014; Safdari Sharabiani 2021), but they are a far cry from the kind of understanding we generally (try to) attain in social interactions: In order for us to even wonder what others’ affective perspective on the world is, we have to recognize (at least implicitly) that they have such a perspective, i.e., that they are not a mere ‘it,’ but a ‘you,’ not an object, but a subject. In particular, our recognition of others as subjects will typically include the (tacit) presupposition that the way they affectively interact with their environment is, among other things, a matter of their (lived) body. It is this assumption that the target is not only a subject, but an embodied subject (e.g., Gallagher 2008; Osler 2021; Osler and Zahavi 2022; Svenaeus 2021), that provides us with reasons for thinking that, given the similarities in bodily constitution and behavior, they experience and express emotions in similar ways, so that we can (ideally; see Sect. 3) resort to our own experiences to understand how they are feeling.

Second, while the recognition that others are embodied subjects can trigger a basic empathic understanding that they are experiencing some affective state or other – that in contrast to mere objects there really are emotional shoes one might try to step into – that alone arguably does not reveal what affective state others are experiencing. In order to figure that out, empathic observers have to interpret their affective life. This claim requires some elaboration and, for reasons that will emerge soon, argument.

We use the term ‘interpretation’ in a very liberal, undemanding sense. As we understand it, interpretation need not be conceptual, conscious, or cognitively onerous. We explicitly want to sidestep the question whether it proceeds along the lines of simulation-theory (e.g., Goldman 2006), theory-theory (e.g., Carruthers and Smith 1996) or (most likely, we think) any other account (e.g., Fuchs and De Jaegher 2009; Gallagher 2012). The following sections only assume that somehow or other, observers have to ‘work out’ the target’s affective life from whatever cues are available in the given context. That many, especially traditional, accounts of empathy require some such interpretation is arguably uncontroversial. With regard to so-called ‘direct social perception accounts’ (e.g., Gallagher 2008; Krueger 2018; Osler 2021; Zahavi 2011), however, this claim will strike many as controversial.

According to direct social perception accounts, observers need not infer happiness or sadness from overt expressive cues, but can directly perceive them. Krueger (2009), for instance, has argued that given that an agent’s observable behavior is constitutive of their affective states, we can “very literally see, in a direct and noninferential way, various emotions and moods as they ripple and flow across the terrain of the body’s movement and gesture” (p. 683). But if I ‘very literally’ “see my partner’s sadness in her slumped shoulders, furrowed brow, and quiet speech” (Krueger 2018, p. 301), then, it might be said, no interpretation is required – and that might be taken to suggest that such accounts are immune to, or unlikely to benefit from, the considerations to come. The remainder of this section sets out our reasons for thinking that this concern is unwarranted.[Footnote 2]

First, if push came to shove, we would not be convinced that direct social perception accounts are viable. The argument sketched above rests on the assumption that if x is constitutive of y, then by seeing x one (‘very literally’) sees y. And this is, at the very least, controversial (Walter 2018): A jigsaw tile is constitutive of the whole jigsaw, but by seeing the tile, one does not see the jigsaw; the CPU is constitutive of a computer, but by seeing the CPU, one does not see the computer.[Footnote 3]

More importantly, though, even if the part-whole inference of direct social perception accounts is viable, they at least seem to fail to paint the full picture. Even if by seeing the target’s smile the observer would literally see their happiness, empathic understanding would still require some kind of interpretation on top of mere perception. For just as Ralph can see Hubert without knowing that it is Hubert he is seeing, even if the observer would, by seeing the target’s smile, see the target’s happiness, they would not, thereby, eo ipso know that it is happiness they are seeing.[Footnote 4] But this knowledge is necessary for truly understanding the target’s affective state – after all, if the target were a super-actor, the observer would see the same smile, but no happiness. Advocates of direct social perception have objected that perception is so ‘smart’ that it can do “most of the work without the need of extra cognitive […] processes” (Gallagher 2008, p. 538). As they see it, “in the broad range of normal circumstances there is already so much available in the person’s movements, gestures, facial expressions, and so on, as well as in the pragmatic or social context, that I can grasp everything I need for understanding” (ibid., p. 540; emphasis added), and they insist that super-actor and other cases where we are “misled by what we perceive” and have to actively decipher the target’s affective life are “relatively rare” (ibid.). However, even if we typically perceive others’ emotions without cognitive effort later in life, it takes infants days or weeks to make the link between smiling and happiness and much longer to associate less basic emotions such as shame, pride, envy, or jealousy with characteristic bodily, behavioral and contextual patterns. And for people with autism spectrum disorder, for instance, such links may remain forever imperceptible (e.g., Coelho et al. 2023). But their difficulties when it comes to dealing with others’ emotions are usually not merely attributed to deficiencies in perception. Whatever they lack (or better: in whichever respect they differ from neurotypical people), i.e., whatever has to happen for perception ‘proper’ to be enriched into ‘smart’ social perception, is what we would subsume under the term ‘interpretation.’ Fortunately, it is fine to leave this conceptual dispute unresolved here, for there is another reason why the considerations to come are relevant for direct social perception accounts, even if they are viable and talk about ‘interpretation’ is eschewed tout court.

As seen above, direct social perception accounts admit that they only cover “the broad range of normal circumstances” (Gallagher 2008, p. 538), suggesting that ‘non-normal circumstances’ are sufficiently rare to be ignored. As we will see, however, there are reasons for thinking that digital encounters are precisely not comparable to “normal everyday encounters” (ibid., p. 540). As a consequence, the more effortful cases that require active deciphering are becoming ever more pervasive. And advocates of direct social perception accounts are aware of this. Osler (2021), for instance, argues that embodied subjects communicate their affective states through their “expressive, lived body” (p. 3), which can, in contrast to their physical body, extend into interpersonal spaces, including digital ones, where these states can, in principle, be directly perceived. But she agrees that the necessary perceptual process faces more problems in digital settings and therefore offers an analysis of these problems that overlaps with and is complemented by our own (see also Osler and Zahavi 2022). Given this, it seems that even ardent advocates of a direct social perception account can benefit from a better understanding of the structural differences between technologically mediated and face-to-face encounters, even if they remain uneasy with talk of ‘interpretation.’ We therefore take it that the following is indeed relevant for virtually all accounts of empathy on the table.

Section 3 takes a closer look at what exactly enables us to decipher others’ affective life in everyday encounters. Section 4 then identifies structural differences between offline and online interactions.

3 The Foundation of Empathy: Affective Repertoires, Perceptual Input, and Background Knowledge

As indicated in Sect. 2, we will not attempt to develop a full-fledged account of empathy. The goal of this section is to identify three factors that (so far; see Sect. 6) make empathic understanding possible in the first place or at least further its accuracy. While there is ample discussion about the decline of empathy in frequent internet and in particular social media users, the question of what exactly it is that makes digital empathy more difficult is still a matter of debate (e.g., Bortolan 2022; Konrath 2013; Osler 2021; Osler and Zahavi 2022; Svenaeus 2021). This section paves the way for a discussion of this question in Sect. 4.

In order for the interpretive feats that have to be performed in empathic interactions to be accomplished, the empathizer, the target and the context of their encounter ideally have to fulfill three conditions.

First, observers must have an appropriate affective repertoire. In the course of our ‘affective biographies’ (von Maur 2021) we encounter a multitude of emotional situations. Our respective experiences and reactions are sedimented into our lived body in the form of both implicit and explicit affective memories.[Footnote 5] This affective repertoire determines, together with concrete situational factors, what we are able to understand (ibid.). When we see that others react to similar situations similarly, our affective repertoire is thus an invaluable source of information. At the same time, though, it constrains us: If we’ve never even been close to something like the other’s shoes, we cannot step into them just so. We may be able to gain some understanding through the target’s explanations (and possibly other resources; see below), but barring any comparable experiential memory, we cannot truly resonate with them. Consider, for instance, restless legs syndrome, a neurosensory disorder causing decidedly unpleasant bodily sensations that patients are typically unable to convey. Abstract knowledge can enable others to somehow grasp that the patients’ experiences are unpleasant and most likely also the rough extent or severity of these experiences, but accurately resonating with them proves to be an exceptionally arduous endeavor: Without an appropriate affective repertoire, observers might feel sympathy or compassion (see Sect. 2), but cannot relate the expressive cues to isomorphic experiences. In such cases, they at best have what Hume (1748, Sect. 2) called a “faint and dull” idea, a mere “copy” of the “more lively” impressions of the target.

Second, some perceptual input – from straightforward verbal reports to more opaque cues such as facial expressions, gestures, body posture, inflexion, diction – must be, first, provided, and, second, perceived, i.e., accessed and processed. The observer’s interpretation can vary in arduousness and accuracy, depending on the richness, intensity, level of detail, accessibility, and processability of the perceptual input, all of which can vary for different reasons. On the one hand, accurate empathy might be hard to achieve because the target provides no or only misleading input, say when they attempt to display a ‘false’ emotional state in order to provoke undeserved advice, relief, consolation etc. On the other hand, even if the target is honest and forthcoming and eloquently reports their feelings, accompanied by strong emotional expressions, unfavorable background conditions such as noise, bad lighting or an unusual context can affect the accessibility or processability of the input, again resulting in less accurate empathy. In addition, as Osler (2021) has pointed out, the intensity with which the target provides the input may itself hamper its accessibility or processability – shouting, for instance, despite being a perceptually rich expression, can effectively reduce the amount of useful expressive cues conveyed.

Third, background knowledge – about the target, their affective state, or the context – can provide (additional) information that facilitates an interpretation. The more observers know about the target’s character, history, situation, hopes, commitments, expectations etc., the less ambiguous the input. Importantly, background knowledge can not only supplement, but also to some extent be a surrogate for an insufficient affective repertoire or perceptual input.[Footnote 6] For instance, if the observer knows that the target has experienced a significant bereavement, they can empathize with them even in the absence of any further perceptual input. Similarly, if there is no suitable experiential memory in the observer’s affective repertoire, as in the case of restless legs syndrome, background knowledge – about, say, which familiar feeling the target’s feeling resembles – can allow for at least a basic understanding.

Given what has been said in this section, the increasing prevalence of digitally mediated interactions raises the question of whether, and if so, how, digital and face-to-face encounters differ (see also Osler and Zahavi 2022; Svenaeus 2021), in particular with regard to the affective repertoire of the observers, their perceptual input, and their background knowledge.

4 Empathy in Technologically Mediated Interactions: Perils and Pitfalls

Regardless of whether observers encounter the target face-to-face or through video-calls, Snapchat videos, personal text messages or impersonal status updates, we have argued, empathy requires that they interpret the target’s behavior, and this interpretation is based upon their affective repertoire, the extent to which perceptual input is provided, accessed, and processed and the background knowledge they can draw on. As we will see, all these factors are typically more error-prone in technologically mediated interactions, which contributes to the ongoing attempts (e.g., Osler and Zahavi 2022; Svenaeus 2021) to understand how non-mediated and mediated forms of sociality differ in ways which make digital empathy often (see Sect. 6) harder to attain.[Footnote 7]

4.1 Affective Repertoire

Frequent internet users often spend hours each day on social media. In particular, digital natives of Gen Z and Gen α are increasingly growing up with such ‘screen time,’ not just passively consuming content but actively creating their own online social world, in which many interactions that still used to be face-to-face for the members of Gen Y at a comparable age are shifted to digital spaces.[Footnote 8] This influences their affective repertoire (e.g., Wood et al. 2016). The sparser the affective input during one’s development, the sparser one’s affective repertoire: For instance, a child raised in a Western climate of individualism and not accustomed to the affective intensity that accompanies interdependent affiliations in Japanese culture will not be able to accurately empathize with the emotion of amae (roughly, a culturally ingrained indulgent dependency on others) that pervades the experience of Japanese children of the same age. And digital spaces have (as of yet) neither the same depth nor the same diversity as the non-digital world – a raised index finger emoji does not have the same affective vividness as one’s mother speaking to one’s conscience with a worried look and a caring voice, and the number of likes per post on Facebook does not distinguish between sincere maternal pride and shallow juvenile admiration. Compared to childhoods spent in nature, with peers, creative pretend play, and self-made non-technological toys, digital encounters are rather bleak and repetitive in their experiential diversity (Svenaeus 2021), not to mention the physical and mental health benefits, especially for fundamental executive functions, associated with ‘green time’ (Yogman et al. 2018). As adolescents have progressively fewer offline experiences, there is thus a greater likelihood that the experiences they can draw on have less variety and depth (McNamee et al. 2021; Sriwilai and Charoensukmongkol 2016), with the consequence that they simply cannot understand or, for that matter, feel what the target is feeling (Uhls et al. 2014).

Moreover, being capable of experiencing the target’s emotion is to no avail if the experience, although generally familiar, cannot be called up in the given context. Another potential problem is therefore that the observer’s affective biography might be so different that it is impossible for them to feel what the target is feeling in a given context, even if the experience is generally familiar. Someone who hates nothing in the world more than the taste of Brussels sprouts will not be able to truly resonate with someone whose mouth waters at the mere sight of them; imagining their own favorite mouth-watering food might tell them ‘what it is like’ for the other to indulge in Brussels sprouts, but that alone does not make them understand the other’s perspective, precisely because they cannot imagine how someone can have this experience when eating Brussels sprouts.[Footnote 9] Social media usage can exacerbate such fundamental interpersonal differences or make them more likely by decreasing people’s self-esteem, with the result that social comparison prevails over emotional contagion: With intact emotional contagion mechanisms, witnessing positive moments of others triggers a positive affective response, but when social comparison dominates, positive experiences reported in posts or personal messages negatively affect the viewer (Gomez et al. 2022; de Vries et al. 2018). Yet, if the observer is unable to feel happy precisely because the target is happy, empathy is impossible.

Still another problem is that digital spaces can attenuate or suppress the activation of affective states. First, online interactions hold the danger of ‘emotional exhaustion.’ Social media users are confronted with emotional news of the same kind – for instance the death of someone’s parent, partner, or pet – much more often than people who (used to) lead only ‘offline lives.’ Such recurring emotional input, partly or mostly from strangers and distant acquaintances, which usually does not trigger any emotional involvement beyond a few emoticons, desensitizes, lessening the likelihood that a truly affective state is activated – the more who die, the less we care. Second, apart from an overload of empathy-demanding input, digital spaces also hold the danger of a general ‘emotional numbness,’ given that the emotions communicated there are often ‘disembodied’ (Fuchs 2014), i.e., expressed through written words and/or emoticons only, without facial expressions, bodily postures etc. To the extent that bodily expressions are an integral component of emotions, emotions are no longer the same if their expression is significantly different.

The difficulties discussed so far arise primarily in interactions between people with different online biographies, such as, typically, the Gen Zers and their digital immigrant parents or grandparents,[Footnote 10] but they probably do not so much affect interactions between those accustomed to similar digital realities. Even such interactions are prone to a paucity in empathy, though.

4.2 Perceptual Input

As others have also pointed out (e.g., Aagaard 2022; Osler 2021; Svenaeus 2021), digital and ‘real life’ interactions differ with respect to the amount and the quality of the perceptual input the target provides and the observer can access and process.

First, digital communication channels affect which input the target provides to begin with. Which personal thoughts and attitudes are shared and the manner in which they are expressed depend upon whether the interaction takes place online or offline. It is easier to feign misleading emotions online and correspondingly more difficult to decipher the target’s true affective state (see Sect. 4.3). Moreover, digital spaces negatively affect people’s feelings of restraint, leading to an ‘online disinhibition’ (Suler 2004) that makes them say things they would not dare say face-to-face. As a result, online communications can be more toxic, resulting again in a desensitization or even “dehumanization” (Harel et al. 2020) that makes empathy more difficult (but see Sect. 6).

Second, online spaces offer only a limited number of modalities in which perceptual input can be provided. In face-to-face encounters, one not only hears and sees the other, but can, for instance, also smell and touch them, which is (so far) impossible during online encounters. And yet, both smell and social touch demonstrably increase the quality of empathy.

Third, the communication channels offered by digital spaces are quantitatively less rich because observers can access only a limited part of the target’s expressive field. In simple text messages or posts, they see only the other’s written words or emoticons; in voice messages, they hear their spoken words, but still don’t see them; in Instagram posts, they see them, but only in still images; in Snapchat videos, they see and hear them in dynamic action, but cannot interact with them etc. Even in live video-calls only a small fraction of the information that would be available offline is readily accessible (see Aagaard 2022). And yet, again, all these factors – seeing the target’s face, hearing their voice, perceiving their gestures and postures, in particular dynamically and not just statically, or the context, and conversationally interacting with them – demonstrably increase the quality of empathy.

Fourth, input provided online is prone to be qualitatively less rich. Virtually everyone has experienced the hassles of bad audio quality, feedback loops, frozen cameras and connection time-outs. Nothing along these lines happens regularly face-to-face and it renders digital empathy more arduous and more likely inaccurate.

Fifth, another substantial advantage of face-to-face interactions is the possibility of immediate feedback. Such interactions enable observers to constantly monitor and adapt their empathy, for instance by asking clarifying questions or disclosing doubts, making it more likely that inaccuracies can be detected and corrected early on. In contrast, many technologically mediated forms of communication are not guaranteed to provide such immediate feedback, given that the other can respond with a delay of hours, days, or even more. Even the closest digital relative to face-to-face interactions, video-calls, can suffer from smaller time delays due to, say, transmission lags; and even barring any technological breakdowns, spontaneous and smooth turn-taking is made more difficult online (Aagaard 2022). But again, receiving immediate rather than delayed feedback demonstrably increases the quality of empathy.

4.3 Knowledge

Lastly, the anonymity of digital spaces also affects the observers’ background knowledge, in particular about their conversational partners.

First, social media increase the number of contacts with people about whom one has little personal and contextual information. This makes it harder, for example, to judge whether one’s words may be hurtful or misunderstood, or what emotional value ‘really’ lies behind the target’s stories, thereby affecting the accuracy of empathy. The more background knowledge is available, the easier it is to feel empathy (e.g., Behbahani and El-Nasr 2011). This is corroborated by the fact that perceived similarity and perspective taking are positively correlated, even cross-culturally (e.g., Heinkes and Louis 2009). To the extent that anonymity impedes the possibility of discerning commonalities such as cultural and political stances, physical appearance, attitudes, desires etc. that make the observer perceive the target as a similar other, authentically stepping into the target’s shoes becomes, once more, exceedingly difficult.

Second, besides the problems caused by the sheer number of social contacts, the information divulged on the internet is often filtered, embellished and concentrated on only a few (positive) aspects of life, as in photoshopped Instagram stories. This makes it more difficult to obtain, literally, a true ‘picture’ of the target. Online platforms also make it easier to actively deceive others, so that observers not only possess less knowledge about the target, but can also be less sure about its veracity, which can further decrease the accuracy of empathy.

Importantly, since the function of background knowledge is precisely to compensate for ‘gaps’ in the available perceptual input and/or the affective repertoire (see Sect. 3), the lack of knowledge aggravates the problems identified before.

To sum up, the considerations above illustrate how widely online spaces differ from non-digital ones and contribute further to an explanation for the apparent decrease in digital empathy: Through their effects on the observer’s affective repertoire, the perceptual input and the background knowledge, technologically mediated interactions in digital spaces can make it (1) harder to experience empathy at all, (2) more likely that empathy is inaccurate, and (3) more difficult to assess how accurate whatever empathic understanding is eventually achieved actually is. This answers the first question raised in the introduction. Which brings us to the second: What can be done about the decrease in digital empathy that apparently results from the problems just described?

5 Changing Social Media Minds: Nudging Empathy with Empathic Scaffolds?

Even if the way online interactions currently work makes empathy often harder to achieve, digital empathy might still be fostered in some way or other. After all, wide – albeit not unanimous (e.g., Preston and de Waal 2002) – consensus has it that empathy is not a ‘fixed’ trait, but a skill. If so, then empathy, like any other skill, can be improved. In this section, we consider three techniques for potentially increasing empathy in online interactions: behavioral, reflective, and automatic interventions. We argue that the latter two are preferable to the first, and we tentatively suggest that among these two ‘scaffolding’ approaches, automatic ones might be preferable to reflective ones – although, admittedly, most of the pertinent research is still (not even) in its infancy.

5.1 Behavioral Interventions

Behavioral interventions make people engage in some sort of behavior that focusses their attention on the need for or motivation to seek empathic understanding and engage in empathic behavior (e.g., Weisz and Cikara 2021; Weisz et al. 2021). For instance, instructing subjects to engage in peer role plays, re-enact others’ experiences, meditate or practice perspective-taking, say, by writing essays about the target or by means of immersive virtual environments, can positively affect their empathy. Unfortunately, behavioral interventions arguably won’t be able to combat the apparent decrease in empathy among social media minds.

First, being tailor-made to specific contexts and observer-target groups, behavioral interventions do not generalize to novel situations or encounters outside of the original setting. While there is nothing wrong with such limitations per se, the concerns raised in Sect. 4 are so multifarious that more versatile techniques seem to be more promising.

Second, even if extant behavioral interventions could be amended so as to be applicable to a broader range of contexts or people, they typically require extensive preparation or training, oftentimes spanning weeks or months. Given their incessant interest in instant gratification (Wilmer and Chein 2016) and their reduced impulse control (Reed 2023), such long-term projects are unlikely to be pursued on their own precisely by those who seem to be most vulnerable, viz., the ‘online community builders’ of Gen Z and Gen α.Footnote 11 Moreover, there is no guarantee that behavioral interventions will be as effective for these cohorts, given the apparently pronounced cognitive and emotional changes they have undergone and still undergo (Uncapher and Wagner 2018). Techniques that are not totally alien to the daily routines of those who most likely need them therefore seem to be more promising.

Third, while extant research suggests that behavioral interventions can induce temporary changes in empathy, there is virtually no evidence that they can also prompt long-term changes (Behler and Berry 2022). In the absence of long-term effects, one strategy is to repeat interventions. However, in light of the enormous number of online encounters, it is hardly practical to play role-playing games, write essays, meditate, or immerse in virtual environments whenever one might face social interactions. Techniques that either have long-term effects or can effortlessly be coupled with online interactions therefore seem to be preferable.

The foregoing suggests that especially digital natives, but also frequent internet users in general, might benefit from techniques that can be seamlessly embedded into their daily online routines rather than sticking out as disruptions. Incorporating digital technologies into existing communication channels therefore appears to be an obvious option.

Let us illustrate the basic idea. The software Project Us (Rojas et al. 2022) is designed to mitigate the negative consequences of sparse or indistinct perceptual input (see Sect. 4.2) in online communications and to facilitate the interlocutors’ empathy by relieving them of the task of having to actively decipher the other’s affective state. Project Us can be coupled with any common video conferencing system and uses machine learning algorithms to analyze the interlocutors’ tone of voice and facial expressions and shares real-time feedback in the form of a simple binary color transition when it detects a negative emotional valence in one of the conversation partners. Users report that this ‘affective traffic light’ increases their awareness of others’ emotions, as well as their own expressions and how they may be perceived, and makes them adapt their behavior accordingly. Similarly, StoryChat (Yen et al. 2023) is a tool for live streaming chatrooms that uses comic-strip-like visualizations to picture how a protagonist might experience the atmosphere of the chat, again relieving users of the task of having to detect emotionally relevant content in the flood of information they are exposed to. Users reported that “they felt the same way [as] the main character” (ibid., p. 10), with the result that their empathy and their level of irritation towards negative comments increased (ibid., p. 11).

Project Us and StoryChat are what one might call ‘empathic scaffolds.’ As situated approaches to cognition have argued, we can save precious cognitive resources by actively structuring our environment so as to solve routine problems or open up entirely new domains of competence with least internal effort (e.g., Clark 2008). Proponents of situated affectivity have argued that this also holds for our affective life (e.g., Stephan et al. 2014; Walter and Stephan 2023): We can modulate, sustain, enrich, expand or even make possible specific experiences by drawing on natural, social, or technological resources. This ‘piggybacking’ on external structures is known as ‘scaffolding’: We use the environment as a ‘scaffold’ when we actively structure it in such a way that we deliberately alter the challenges we face, our ability to cope with them, or the way we cope with them to our advantage. This is what Project Us and StoryChat do: The cognitive and affective work required to decipher emotions and detect and decode emotionally relevant comments is offloaded onto the software, with the goal of making errors less likely and freeing resources. Just as an abacus serves as a cognitive scaffold, and music, clothes, or therapists can serve as affective scaffolds (e.g., Coninx and Stephan 2021), Project Us and StoryChat serve as ‘empathic scaffolds.’ In the remainder of the paper, we consider two potentially different kinds of such empathic scaffolds.

5.2 Reflective Interventions

According to so-called ‘dual process theories,’ the human mind has two modes of decision making: ‘System 1’ is fast, automatic and unconscious; ‘System 2’ is slow, reflective and conscious (Kahneman 2011). So-called ‘nudges’ (Thaler and Sunstein 2008) target either System 1 or System 2.

A nudge is anything that influences the likelihood that people choose a particular option without forcing, incentivizing or rationally convincing them to do so, simply by altering the choice environment. One important distinction that has emerged in the burgeoning nudging literature is that between reflective and automatic nudges (Hansen and Jespersen 2013).

Reflective nudges aim to pull people out of the well-trodden mindless paths of System 1 in the hope of making them reach better decisions with System 2. For instance, asking people to consciously reconsider what they have written when an AI detects offensive or otherwise unwanted content can be strikingly effective: Twitter users reformulated or even abandoned offensive tweets after having been asked to review them before submitting and were subsequently less likely to draft offensive tweets (Katsaros et al. 2022). Along similar lines, Wang et al. (2014) designed a Facebook-plugin that prompts a reflective decision by displaying the profile pictures of five random contacts from the user’s friend list as they are typing a post, telling the user that ‘These people and [X; depending on the user’s number of friends] more can see this’ and giving them 20 seconds to edit the post afterwards. AI can also be used to suggest ways of rephrasing (or even automatically rephrase) offensive posts while preserving as much of their (non-offensive) meaning as possible (e.g., Tran et al. 2020).

Although reflective interventions are implemented by third parties and not actively recruited by the users themselves (see Sect. 5.3), they also do provide ‘scaffolds’ in the sense that they actively structure the environment with the goal of changing the way in which people cope with empathic challenges to their advantage. And they have some potential to foster digital empathy or empathic behavior: Unlike behavioral interventions, they are not limited to specific audiences and contexts, but available on any platform that offers some form of chat, comment or post function, and even if they do not have lasting effects, they can effortlessly be used time and again. However, they have problems of their own.

Reflective interventions work by activating the cognitive skills of System 2 required for self-monitoring, self-control, attention etc. But this is precisely what frequent internet users seem to be increasingly bad at (see Sect. 5.1). Interventions that pose fewer cognitive and motivational demands therefore seem to be preferable. Moreover, by disrupting System 1, reflective interventions are precisely that: disruptions. They will therefore arguably be experienced as intrusive, annoying, or patronizing, with the result that they fail to work or even ‘backfire’ (Hummel and Maedche 2019). Less intrusive interventions therefore seem to be preferable. Automatic interventions fulfill both requirements.

5.3 Automatic Interventions

As efficient cognitive systems try to reduce their cognitive load, System 2 kicks in only when System 1 cannot handle a situation. And System 1 can handle a lot: Our social life is to a striking extent a matter of processes that are non-consciously triggered and guide action to completion without our deliberate intervention. Much of the time, we operate on automatic pilot, relying on ‘fast and frugal’ heuristics that make our decisions fast and easy. These heuristics, however, also make us susceptible to cognitive biases. According to the availability heuristic, for instance, something is more important or better the more present it is in our mind. This allows us to quickly and effortlessly decide, say, which of two comparable products to buy. Yet, it also leads to an availability bias: We overestimate the importance or quality of something simply because it is more recent, vivid, or emotional. Automatic interventions target such biases. Unlike reflective interventions, they do not disrupt System 1, but tweak the circumstances in such a way that the unconscious biases of System 1 make a certain behavior more likely. The result is what Walter and Stephan (2023) have called ‘mind shaping’: Scaffolds are not only resources for affecting one’s own mind, but can also serve as ‘tools’ for shaping others’ minds by varying inconspicuous situational factors that influence their behavior beyond their conscious control.

Mind shaping of this kind has the potential to enhance digital empathy (but see Sect. 6). Something so seemingly marginal as the smell of chocolate chip cookies, freshly roasted coffee, or all-purpose cleaner or the hair-color or body weight of other people appears to have effects on pro-social behavior (Walter and Stephan 2023). In fact, such environmental and situational cues may even be better predictors of empathic understanding and empathic behavior than the observer’s stable character traits (Darley and Batson 1973). This opens up the possibility of exploiting automatic interventions as ‘empathic scaffolds’ in order to unconsciously shape social media minds. Here are some examples from current research to illustrate the basic idea.

The status-quo bias makes us go along with the path of least resistance. As a consequence, changing default settings can affect our behavior. For instance, when the online multi-player game League of Legends changed its settings in such a way that players had to explicitly activate the chat function, rather than the all-chat being the default, toxic conversations decreased and positive exchanges increased (Kiritchenko et al. 2021). The same bias might also be exploited to combat the anonymity characteristic of online platforms that hampers empathy by masking that one is interacting with ‘real’ people. Making the use of real photos or realistic avatars (Ekdahl and Osler 2023) in social media profiles the default option, so that one has to explicitly opt-out if one wants to remain hidden behind the digital veil of anonymous speech, might already make a difference. Not only could it prevent people from attacking others if their behavior could be traced to them, it would also give both offenders and victims literally a ‘face,’ making similarities more visible, thereby furthering empathy (see Sect. 4.3). It need not even be faces: All that matters is that a victim becomes the victim (Small and Loewenstein 2003), i.e., that the other is torn out of anonymity and made an identifiable subject (see Sect. 2). For instance, changing a typical social media interface in such a way that the comment box below a post does not say ‘Write a comment’ but ‘Write to …’ followed by the user’s first name, has the potential to increase social transparency and accountability and prompt a greater internal motivation to intervene when the user is bullied (Taylor et al. 2019).

The confirmation bias refers to our tendency to focus on information that matches our beliefs. Together with the availability bias, the clearly visible number of ‘Likes’ on many social media platforms reinforces users in their views, including bullying and hate speech, by drawing attention effortlessly and exclusively to ‘friendly’ voices, while those who disagree have no equally straightforward means of expressing their dissent. Adding a ‘Dislike-button’ lessens such reinforcement by making critical alternative views more readily available, thereby decreasing the one-sidedness of the perceptual input provided online (Lee et al. 2022).

The reflective interventions discussed in Sect. 5.2 that ask users to reconfirm their tweets, messages etc. are also tapping into the availability bias by bringing the inappropriateness of their behavior to the foreground of their minds. Automatic nudges might accomplish the same while minimizing intrusiveness (see Sect. 5.2). For instance, Agapie et al. (2013) used a colored aureole around the query text box to make users formulate more elaborate queries. The aureole is red when the query box is empty, but as information is added, the aureole starts to fade, becoming blue when the input is perceived as enough to retrieve reliable search results. Using an AI to indicate potentially offensive language in a similar way might yield better results than the direct confrontation with verbal feedback, yet again reducing the cognitive load required to decipher the target’s affective state that may be the consequence of one’s own (intended) expressive behavior. This can be especially valuable during online interactions that lack the immediate feedback characteristic of face-to-face interactions (see Sect. 4.2).

6 Conclusion: Objections and Further Issues

We have argued that our traditional empathy mechanisms rely on some sort of interpretation or other (Sect. 2). The accuracy of such interpretation is a matter of the affective repertoire, the perceptual input and the background knowledge (Sect. 3). The fact that technologically mediated interactions are more problematic with regard to these factors (Sect. 4) can help explain the apparent decrease in empathy in frequent internet users. Supposing that such decrease is indeed problematic (see below), the question is what can be done about this. We have suggested that one option might be to develop ‘empathic scaffolds,’ in particular technological ‘nudges’ that actively shape social media minds (Sect. 5). This obviously raises many important questions that are beyond the scope of this paper.Footnote 12 We briefly touch upon some of them in our concluding remarks.

One important concern is the so far exclusively negative portrayal of the internet’s role with regard to empathy.Footnote 13 The observations above were made, as announced in Sect. 1, against the background of the fact that more and more studies seem to demonstrate a lamentable decline in empathy in frequent internet and in particular social media users. As a result of this, it might have sounded as if the internet were per se ‘bad,’ producing only ‘narcissists,’ emotionally blind and cognitively crippled ‘social phobics.’ This is expressly neither our view nor true. First, while some studies are indeed pessimistic, others have actually documented an increase in empathy through social media use (e.g., Vossen and Valkenburg 2016), and even those who have been highly skeptical have recently come to reckon with the possibility that “the downward trend witnessed before 2009 might not have continued” (Konrath et al. 2023, p. 2). As so often, many empirical findings fall somewhere in between the extremes (e.g., James et al. 2017), calling for more research that distinguishes more carefully, for instance, between passive vs. active social media use (e.g., Verduyn et al. 2017). Nevertheless, as long as it is still a live option that digital communication channels do have detrimental effects on empathic abilities, specifically of those who frequently use them from an early age on, the question what could be done to counter them remains important – and this is the question we have been concerned with above.
Second, there can be hardly any doubt that digital encounters are – at least sometimes for at least some people – superior to face-to-face encounters: The internet can provide a safe space in which individuals who, for various reasons, experience rejection and discrimination by, or do not feel like they fit into, society, can cultivate relatively rich forms of empathy, find support, connection, mutual understanding, and a sense of empowerment within various online communities. Nevertheless, as long as there are also those who do not benefit from the structure of technologically mediated interactions, the question what could be done to counter them remains, again, important.

As a matter of fact, the two sides of the coin just mentioned might be related. The analysis offered in Sect. 4 of why online interactions are more problematic for some (neurotypical) individuals may also shed light on why they can be so valuable for others. If empathy is affected by, among other things, suitable background knowledge, a perceived similarity, the quality and quantity of the input, the immediacy of the feedback etc. (see Sect. 3), this not only explains why digital empathy might be harder to achieve for many, but also why others might benefit precisely from the special character of online spaces: The members of a closed LGBTQ+ forum might share more common background knowledge than they share with offline friends and family, fostering a perceived similarity that simplifies truly ‘feeling into’ the others; addicts might benefit from the ‘online disinhibition effect’ because the anonymity of the online self-help community allows them to speak candidly – to say what they would not dare say in ‘real’ life – enabling them to provide the input others need to empathize with them; socially anxious or autistic people might use the extra time offered by the possibility of a delayed response to process information in the way they need to make an empathic response; introverted individuals or individuals with hearing impairments or speech disorders who might feel overwhelmed by the rapid pace and unfiltered massive input of in-person conversations might be better able to concentrate on the important things in digital spaces, again facilitating their understanding of others. Viewed from such a perspective, the analysis offered above might prove fruitful even beyond the (limited) use to which it was put in this paper.

Another obvious concern is that the scaffolds discussed in Sect. 5 do not, in fact, foster empathy at all, but instead replace it with a shallow surrogate, with a sort of ‘as-if-empathy’ that merely makes non-empathic people behave in superficially more pro-social ways. This concern seems to be corroborated by the fact that in Sect. 3 we identified factors that are necessary for empathy and at least some of the technologies discussed in Sect. 5 do nothing to change the interlocutors’ affective repertoire, the perceptual input or the background knowledge, instead seeking new ways of fostering empathy that can dispense with what was necessary for face-to-face mechanisms of empathy to work. There are two ways to deal with this concern.

On the one hand, if one thinks that the observable behavior and internal reactions triggered by the sort of ‘empathic scaffolds’ discussed above have so little in common with ‘genuine’ empathy that the term ‘empathy’ is unwarranted, one might have to concede that such technologies might not stop the apparent empathy decay. At least three thoughts are still worth considering in this case, though. First, there might be a glimmer of hope, for getting used to behaving as if one were empathic might be an important first step towards actually being empathic. Second, if the empathy deficit is lamentable enough, a bird in the hand might be worth two in the bush: Barring any other remedies, fostering as-if-empathy might be better than nothing. Third, if there is a decrease in ‘genuine’ empathy, the important question is: Is that indeed a problem, all things considered? Can we hope to cope with the challenges of our radically altered environment if we continue to chase the sort of empathy that results from mechanisms that have evolved under completely different circumstances? Is greater ‘genuine’ empathy really what we need (Mezzenzana and Peluso 2023)?

On the other hand, however, the objection under consideration rests on an intuition that will hardly resonate with those who accept the paradigm of situated cognition and affectivity. To claim that someone who adds two numbers by using pen and paper is not ‘genuinely’ doing math or that someone who keeps track of their appointments by using a notebook is not ‘genuinely’ remembering manifests precisely the sort of “biochauvinistic prejudice” (Clark 2008, p. 77) advocates of situated approaches have been opposing all along. In their eyes, the mechanisms underlying our cognitive and affective capacities can be implemented in various ways, some purely internal, but some also, and often more efficiently, by processes spanning brain, body, and environment, including technological devices. And just as doing sums with one’s fingers is not mere as-if-calculation but another way of calculating, the sort of empathy fostered by the technologies discussed in Sect. 5 might be ‘genuine,’ although different from the sort of empathy that results from our traditional mechanisms. This is admittedly at odds with the widespread tendency to think of empathy, just as of emotions such as love or shame, as something like a ‘natural kind’ that can be given a (real or nominal) definition in terms of individually necessary and jointly sufficient conditions. But just as the view that emotions have boundaries carved in nature has come under pressure (Barrett 2006), it is far from clear that empathy is indeed a homogeneous kind (Smith 2017) and not rather a multidimensional construct the manifestation of which is malleable and dependent, among other things, on cultural factors such as prevailing norms, group standards etc. (Jami et al. 2023).Footnote 14

While much of the foregoing is sketchy, and many questions need to be addressed, we hope that our thoughts resonate at least with those who have cast their eyes upon the issue of digital empathy and have walked away not entirely convinced.