In the past 16 years, neuroscience has provided a massive amount of data suggesting that mirror neurons, shared representations or in general resonance mechanisms underlie our ability to read basic intentions and emotions into the behaviour of other people. According to what is arguably the most influential interpretation, these data support a version of the simulation theory of social cognition (cf., e.g. Gallese and Goldman 1998; Goldman and Sripada 2005; Goldman 2006; Gallese 2001, 2003, 2005, 2007; Jeannerod and Pacherie 2004). The evidence on neural resonance is taken, by the many scientists and philosophers who accept this interpretation, to show that low-level social cognition is subserved by implicit simulation of gestures, actions and facial expressions.

The implicit simulation interpretation of neural resonance has been attacked by those inclined towards more cognitivist approaches to mindreading (Jacob 2002, 2008; Jacob and Jeannerod 2005; Saxe 2005, 2009; Csibra 2005). Recently, however, it has been attacked from a camp that theorists of neural resonance so far considered to be an ally: phenomenology (Gallagher and Zahavi 2008a, b; Gallagher 2007a, b, 2008a, b; Zahavi 2005, 2008). Rather than simulation, phenomenologists suggest that resonance phenomena should be interpreted as contributing to enactive social perception.

According to Gallagher and Zahavi, to interpret, e.g. the activity of mirror neurons as contributing to a simulation process for the purpose of ‘mind-reading’, we need to conceive of such a process as a step-wise procedure: First, our mirror neuron activity represents the motor intention of the observed other; secondly, we project the represented intention onto the other. The problem with this procedure is that on the one hand, it does not adequately capture our phenomenology. We see intentions in the gestures of others and we directly perceive emotions in their facial expressions. We do not experience ourselves going through step-wise procedures on such occasions. On the other hand, according to Gallagher and Zahavi, at the neural level, the term ‘simulation’ does not really apply to the activity of mirror neurons.

I believe that this line of argumentation is strictly speaking correct. I also believe, however, that there is a threat of throwing away the baby with the bathwater. There is something unexpected and exciting about the notion of shared representations: the overlap, at the neural level and arguably at the phenomenological level, between intending and observing an intentional action or between experiencing an emotion and recognizing a similar emotion in another person’s comportment. This overlap threatens to be ignored when neural resonance is merely said to contribute to social perception. It is acknowledged by simulationists, however mistaken they may be in other respects. The question I shall try to answer in this paper is how the baby, i.e. this overlap, can be preserved when we get rid of the bathwater, i.e. a conception of social perception in terms of a step-wise procedure. If we cannot account for this overlap in terms of the identity between certain stages of different processes, e.g. experiencing one’s own emotions and observing the emotions of others, as I take Gallagher and Zahavi to show convincingly, then how else can we account for its role in social cognition?

My aim is to answer this question, in the third section of this paper, in terms of a holistic model of neural resonance-based low-level social cognition. This model is intended to be compatible with Gallagher and Zahavi’s position. But it is also intended as an answer to a question that I will claim not to be answered satisfactorily by the social perception interpretation of shared representations, in the second section. Some elements in this model are not straightforwardly congenial to the tenets of Gallagher and Zahavi’s position. I will agree that the label ‘simulation’ is wrong and misleading, but since I will claim that there is a crucial feature of neural resonance that simulationists do and phenomenologists do not capture well—the overlap mentioned above—I also think that ‘social perception’ is not entirely correct. Maybe we should stop forcing personal-level terminology on sub-personal processes altogether. These terms do not illuminate the data. In fact, we might even, as I will suggest at the end of this paper, go one step further and acknowledge that insight into the sub-personal processes that underlie social cognition illuminates our phenomenology in this area: It is neither simulation nor perception, but something in between.

But let me start by outlining the idea of neural resonance, the simulationist interpretation thereof, Gallagher and Zahavi’s criticism of this interpretation and what I claim this criticism overlooks.

Implicit simulation versus enactive social perception

Neural resonance as implicit simulation

The principal target of the phenomenologist’s criticism is the simulationist interpretation of mirror neurons. Mirror neurons fire both during the execution of specific goal-directed actions and during the observation of the same specific actions performed by others. Not long after they appeared on the scientific stage (di Pellegrino et al. 1992; Rizzolatti et al. 1996), it was proposed, in a seminal paper by Gallese and Goldman (1998), that mirror mechanisms are either a precursor or a primitive version of simulation routines for mindreading.

Such routines were introduced in the late 1980s in the theories of mind debate as alternatives to the so-called theory–theory approach to social cognition (Gordon 1986; Heal 1986; Goldman 1989). Up until 1986, it was uncontroversial that understanding others, interpreting their behaviour in terms of intentions and emotions, involves the tacit use of theoretical knowledge. But according to the simulation theory, we use our own minds as models for understanding the minds of others—we understand others by imaginatively putting ourselves in their shoes.

Up until the 1998 paper by Gallese and Goldman, the simulation theory concerned so-called high-level mind reading (Goldman’s 2006 terminology) only, i.e. the ascription of propositional attitudes. The simulationist reading of mirror neurons, by contrast, introduced a low-level version of the simulation theory. That is, the understanding of primitive motor intentions or goal directedness in the behaviour of others was now also approached via simulation. Low-level simulation routines by mirror mechanisms are entirely exogenously produced. That is, unlike much of our high-level mindreading, the trigger for a mirror neuron-based simulation routine is the observed behaviour of the other, not an internal decision of the simulator/mind reader. This is why mirroring is also referred to as neural resonance. Gallagher and Zahavi speak of ‘implicit simulation’.

In the past 10 years, the basic idea of Gallese and Goldman has been expanded and, at least in the case of Gallese, partly altered. Four changes need to be mentioned:

  1.

    In Gallese and Goldman (1998), simulation by mirror neurons was considered to serve the purpose of retrodictive ascription of low-level mental states. The intention preceding an observed action, for instance, was thought to be grasped by neurally mimicking that action. Nowadays, mirror neurons are primarily considered to underlie predictive understanding of actions of others. That is, they do not just pick up the motor intentions behind bodily movements (pace Jacob and Jeannerod 2005), but rather mimic individual movements as belonging to ‘logically’ related sequences carrying out an intention that is pitched higher than mere motor intentions (Iacoboni et al. 2005; Fogassi et al. 2005). [Footnote 1] For instance, grasping a cup can be mirrored through the influence of context cues as ‘grasping a cup in order to take a sip of tea’, thus predicting further specific behaviour (bringing the cup to mouth, sipping).

  2.

    Whereas the original connection between mirror mechanisms and simulation concerned the pickup of intentions only—be they simple motor intentions or more complex ones—mirroring mechanisms are now thought to underlie the pickup of other low-level mental states too. Goldman now includes emotions in low-level simulation (Goldman and Sripada 2005; Goldman 2006), and Gallese (2001) proposes that mirroring underlies a general empathic understanding of others that encompasses, e.g. sensations, pains and emotions alongside basic intentions.

  3.

    Originally, the mirror neuron version of simulationism was an entirely sub-personal affair. Gallese (2001), however, is explicit that his expanded version of mirror mechanism-based simulationism, the empathic understanding of others based on the fact that observation and execution of all sorts of meaningful action is mapped onto the same neural substrate, also allows for a phenomenologically accessible sharing of, e.g. emotions with others. We can think not only of simple emotional contagion (Hatfield et al. 1994) here but also of more complex forms of empathy. This expansion is important for what is to follow.

  4.

    The last change relative to the 1998 proposal I will mention, which is also important for what is to follow, is the fact that many theorists now hold that since the same neural substrate underlies the execution and observation of intentional actions, activation of that substrate in itself is neither a first-personal nor a third-personal intention representation. It is a neutral intention representation (de Vignemont 2004; Hurley 2005; Gallese 2005) or a representation of a ‘naked intention’ (Jeannerod and Pacherie 2004).

Against simulation, for enactive perception

In order to see what is objectionable in the simulationist reading of mirror neuron activity, we should look at the role that is assigned to mirror neurons by this interpretation in the overall process of grasping the meaning of, e.g. another person’s gestures or emotional expression. Here is the standard view to which Gallagher and Zahavi object:

[W]hen mental matching occurs by means of a regular causal pathway, I will consider it an instance of mirroring—and an instance of mental simulation. But it isn’t yet (…) an instance of simulation-based mindreading.

What more is required for there to be mindreading? (…) [M]indreading involves mental attribution to a target. (Goldman 2006, p. 133)

Mirroring or neural resonance taken as simulation, however implicit, is not considered to be sufficient for social cognition. On the standard simulationist interpretation, it needs to be supplemented by an act of attribution of the mirrored state to the observed person. Goldman conceives of this in terms of ‘projection’, by which he means ‘the act of assigning a state of one’s own to someone else’ (Goldman 2006, p. 40). Thus, when I see happiness in your face, what happens in fact, according to Goldman, is the following: I experience a (possibly weaker) copy of your happiness due to resonance or mirroring. I do not, however, ascribe this happiness to myself but proceed to project it onto you.

This is an objectionably Cartesian (Carruthers 1996) picture of social cognition. As Zahavi puts it, ‘the simulation-plus-projection procedure imprisons me within my own mind (…) and prevents me from ever achieving a true understanding of others.’ (Zahavi 2008, p. 519). This is not how we experience things. We see happiness in a smiling face; we do not infer it from the happiness we feel ourselves. We do not look inside ourselves in order to discern emotions and intentions of others. We look at them and see their emotions and intentions. Zahavi: ‘When I experience the facial expressions or meaningful actions of an other, I am experiencing foreign subjectivity, and not merely imagining it, simulating it or theorizing about it.’ (Zahavi 2008, p. 520). In Gallagher’s words, we experience intentions and emotions ‘fully clothed in agent specification’ (Gallagher 2007a, p. 358). Phenomenologically speaking, there is no step-wise mirroring-followed-by-projection procedure.

Goldman’s response to this charge is that mirror neurons only contribute to a sub-personal, implicit form of simulation. If the step-wise procedure mirror neurons are involved in does not trickle through to the experiential level, this is at least no objection to implicit, sub-personal simulationism. The question is, of course, how my mirroring brain knows that the ‘happiness’ I register as a result of looking at your smile is in fact not my own happiness but yours, or how the ‘intention’ present in my automatically mimicking the premotor activity in your brain is in fact yours and not mine. But this is where the naked intentions mentioned in the previous sub-section are invoked.

According to many simulationist interpreters of mirror neurons (Jeannerod and Pacherie 2004; de Vignemont 2004; Hurley 2005; Gallese 2005), the premotor activity of the observed agent that is replicated in the observer represents a ‘naked intention’, an intention that is not represented as belonging to either the observer or the observed agent. The brain is then thought to employ what Georgieff and Jeannerod (1998) call a ‘who system’, driven by the fact that the naked intention representation either does or does not overlap with other representations such as that of one’s own acting body, to ascribe the intention to the observer or to the agent.

But here, Gallagher and Zahavi finish the job by arguing that the notion of ‘simulation’ fails to make sense at the sub-personal level. ‘Simulation’, in the present context, can mean one of two things:

  1.

    The pretence definition: Simulation is an imitation, in the sense of something not real—counterfeit; to simulate means to feign, to pretend

  2.

    The instrumental definition: Simulation in the sense of a simulator: a model (a thing) that we can use or do things with so we can understand the real thing (Gallagher and Zahavi 2008a, p. 179)

The first definition cannot be applicable to neural resonance, since the notion of ‘pretence’ fails to make sense in that context:

[I]n subpersonal processes there is no pretence, and this is the case whether we consider neuronal processes vehicles (mechanisms) or in terms of the content that they might represent. (…) [W]hat these neurons represent or register cannot be pretence in the way required for ST. They do not fire ‘as if’ I were you. As we saw, proponents of implicit ST claim that the mirror system is neutral with respect to the agent; there is no first- or third-person specification involved. In that case it is not possible for them to register my intentions as pretending to be your intentions. (Gallagher and Zahavi 2008a, p. 180; cf. Gallagher 2007a, pp. 360–361)

The main problem with the second definition is that the exogenously produced, passive character of neural resonance does not match the active, endogenously produced character of simulation routines:

If simulation is characterized as a process that I (or my brain) instrumentally use(s) or control(s), if this is what simulation is, then it seems clear that what is happening in the implicit process of motor resonance is not simulation. (…) [It doesn’t] make sense to say that at the subpersonal level the brain itself is using a model or methodology, or comparing one experience with another, or creating pretend states, or that one set of neurons makes use of another set of neurons as a model. (…) [T]hese neuronal systems do not take the initiative; they do not activate themselves, but are activated by the other person’s action. (…) The other person has an effect on us. The other elicits this activation. This is not a simulation, but a perceptual elicitation. (Gallagher and Zahavi 2008a, p. 180; cf. Gallagher 2007a, pp. 360–361)

This last sentence is important. By rejecting a simulationist interpretation of neural resonance, Gallagher and Zahavi do not claim that neural resonance plays no role in social cognition. On the contrary, their claim is that neural resonance should be characterised as enactive social perception rather than as simulation.

By calling neural resonance enactive social perception, Gallagher and Zahavi emphasize that this form of perception is sensorimotor and not merely sensory reception. They also stress that this form of perception is not instantaneous but a temporal phenomenon. This does not imply, however, that the neural process that underlies enactive social perception can be subdivided into well-demarcated temporal stages. The time-lag between visual representation of an action and mirror neuron activation, for instance, is too brief to warrant representing the process as a sequence of separate stages (30–100 ms; Gallagher and Zahavi 2008a, p. 179). Thus, mirror neuron activity can be regarded as an integral part of an overall sensorimotor perceptual process allowing us to acquire knowledge of the basic intentions and emotions of others rather than as a separate simulation stage that follows visual perception (i.e. mere sensory reception) of the other person’s actions and expressions.

A problem: what role does neural resonance play in enactive social perception?

I think Gallagher and Zahavi’s criticism of the simulation reading of neural resonance is convincing (I believe there is no point in arguing that one of the definitions of simulation does apply at the low level (Herschbach 2008), nor do I believe the phenomenology of social cognition may be characterised by a step-wise procedure). I am also sympathetic to the enactivist social perception reading of mirror neurons. Notwithstanding these sympathies, however, there is, I believe, a gap in the overall position with regard to neural resonance and social cognition as presented by Gallagher and Zahavi. One way of bringing this out is to pretend that we are back in the early 1990s. The idea of enactivism is launched (Varela et al. 1991) but mirror neurons have not yet been discovered. Suppose someone is to propose an enactivist theory of low-level social perception. According to enactivism, an organism’s knowledge of its environment is contained in the way sensory information is linked to motor output—the so-called sensorimotor contingencies. That is, the restructuring of these links in recursive interaction of an organism with its environment in which the organism adapts to it implies or specifies knowledge of the world. The question is what the chances are that the hypothetical early enactivist theory of social perception would predict neural resonance to be a part of the sensorimotor contingencies underlying social perception. What are the chances that the motor side of these contingencies would have been predicted to consist partly in the neural mimicking of the observed action?

The chances would be very slim, I think, and that may be an understatement. The point of enactivism is ‘to determine the common principles or lawful linkages between sensory and motor systems that explain how action can be perceptually guided in a perceiver-dependent world.’ (Varela et al. 1991, p. 173). Without prior knowledge of mirror neurons, one would expect the motor components of the sensorimotor contingencies to be related to the huge variety of behaviour that is relevant and appropriate in response to observing meaningful actions of others. It is obvious (I take it) that only a tiny fragment of this variety of behavioural responses consists in imitation of the observed behaviour of the other. Hence, the systematic occurrence of a very specific type of premotor activity that is only indirectly connected to the motor side of the sensorimotor processes that constitute social cognition would be unexpected. On top of that, the fact that this activity is congruent with premotor activity in actual imitation of the observed behaviour would be even more surprising. There is indeed something very remarkable and exciting about the so-called execution–observation overlap observed in mirror neurons, even for those who would have favoured an enactive theory of social perception before their discovery. [Footnote 2]

It is easy to understand how this surprising find initially gave rise to a simulation reading. If this premotor activity is to contribute to acquiring a sense of the other person’s intention or emotion, an interpretation of it in terms of a form of off-line re-enactment along simulationist lines, as in the original Gallese and Goldman (1998) paper, is not far-fetched, even though on closer inspection it turns out to be a misleading interpretation. Now in view of the fact that mirror neurons and neural resonance are not an empirical discovery that would have been predicted per se from the viewpoint of an enactive theory of social perception, the question becomes: how should we interpret this discovery if not in terms of simulation? Asserting that mirror neurons and neural resonance contribute to an overall sensorimotor process that subserves social perception is not enough (if it were enough, mirror neurons would be predictable from an enactive theory of social perception). The question is why and how social perception is subserved by this specific mirroring off-line premotor activity.

Here, Gallagher and Zahavi retreat into their roles as phenomenologists. As such, it is not their job to explain how sub-personal processes underlie what we at the personal level experience as social perception, and maybe we should be careful here and resist the temptation to speak of off-line premotor activity in the first place. At any rate, Gallagher and Zahavi’s focus is on defying the suggestive power of the so-called execution–observation overlap, rather than on giving it a real explanatory function, by warning against unwarranted extrapolation from the neural to the phenomenological level:

There is a crucial difference between claiming that my recognition of a certain emotion in you requires me to experience the very same kind of emotion immediately prior to ascribing it to you and claiming that the same neural substrate subserves both the experience of an emotion and the recognition of the same kind of emotion in others. The latter claim is considerably weaker. (Zahavi 2008, p. 519)

In a similar vein, Gallagher warns against what he calls the fallacy of supposed isomorphism between the neural, functional and phenomenological levels (see also Gallagher 1997):

(…) there is no necessary isomorphism between the phenomenological, the functional and the neuronal levels. So, if the neuronal process can be defined as involving a stepwise process, this does not mean that a step-wise process necessarily shows up at the level of experience, and vice versa. (…) This is tied into the concept of multiple realizability. (Gallagher 2007a, p. 358)

The claim in passages such as these is that we need not draw conclusions about the experiential level from suggestive data on the neural level. The execution–observation overlap at the sub-personal level need not have a counterpart at the experiential level in the form of an overlap between, e.g. the experience of an emotion and the recognition of the same emotion in someone’s facial expression. In fact, according to Gallagher and Zahavi, there is no such experiential counterpart. To assume there is, is to follow the implicit simulationist’s scheme according to which the initial stages of the processes of experiencing and recognizing an emotion consist of the same neural representation and experience of a ‘naked’ emotion, and this simply misrepresents what goes on at the experiential level.

I agree that the simulationist version of conceiving of an experiential counterpart of the sub-personal execution–observation overlap is wrong-headed. But the position that Gallagher and Zahavi seem to imply, which should replace or avoid such a counterpart view, is not unproblematic either. According to this position, mirror neuron activity may contribute to a larger neural process that subserves the experience of an emotion while it may also contribute to another larger neural process that subserves the recognition of a similar emotion in someone else’s comportment. Given that what goes on at the experiential level is somehow determined by the neural level, the question is what kind of relation between the neural and the experiential level is assumed here.

Gallagher appeals, in passing, to the widely used concept of multiple realisation in order to claim plausibility for his position on the relation between the sub-personal (neural and functional) and the personal (phenomenological) level. But in my view, the opposite effect is attained. Multiple realisation is a concept that was essentially used in the context of a functionalist theory of the mind–brain relation (Putnam 1967; Fodor 1974) in order to argue for the possibility of similar mental/functional states being subserved by different material/neural states. The case here is different in two respects: First of all, in the mirror neuron case, different higher-level states are supposed to be realised by neural states with a significant overlap. Secondly and more importantly, the relation of multiple realisability is not between the neural and the functional level, but between the neural/functional and the phenomenological level, and this causes serious problems.

Gallagher and Zahavi’s position requires a theory of multiple realisation between the neural/functional and the experiential levels. No such theory has been proposed, as far as I know. At any rate, the required autonomy of the experiential level relative to the neural/functional levels cannot be explained parallel to the way functionalists explain the relative autonomy of functions relative to brain states. Functions are explained to be relatively autonomous from neural/physical events (in the sense that the same function does not imply the same neural/physical mechanism) in terms of a distinction between roles (functions) and realisers (the neural/physical mechanisms performing the function; Lewis 1972). But the experiential level that Gallagher and Zahavi require to be autonomous resists redefinition in terms of causal roles. Moreover, the functional level must serve here as the realiser level. It is not clear what that would mean. [Footnote 3]

Gallagher and Zahavi’s route to avoiding having to answer the question why and how mirroring subserves enactive social perception, then, burdens them with an as yet unsolved problem about the relation between the experiential and the neural/functional level. This is no knockdown argument against their position. Other metaphysical mind–body problems remain unsolved as well, and it is not the phenomenologist’s job to solve them. Still, it is worth striving for a phenomenological view on social cognition with the lowest metaphysical mortgage possible. The threat of a very high metaphysical mortgage on Gallagher and Zahavi’s position (i.e., the threat that their position requires a theory about the relation between the neural/functional and the experiential level that we do not know how to begin to imagine) is mainly due to their insistence that, e.g. experiencing an emotion and recognizing it in others are qualitatively different states. This insistence (which is an integral part of their anti-simulationist rhetoric) creates a tension with the fact that part of the neural substrates of these states is nevertheless similar or even identical. A theory of multiple realisation is meant to alleviate this tension. But if we cannot expect such a theory to be forthcoming under the assumption that experience and recognition are entirely different states at the experiential level, it may be good to look for another way out. Assuming that the execution–observation overlap at the neural/functional level accounts for some overlap between experiencing an emotion and recognizing that emotion in someone else would take the metaphysical pressure off. It would give the unexpected discovery of mirror neuron activity a real place in our explanation of social cognition.

The challenge is to do this without falling into the simulationist’s trap of imposing a step-wise view of social cognition on the experiential level. The overlap between experiencing and recognizing must not consist in the similarity between the initial stages of these processes, i.e. in the experience of a ‘naked intention’ or ‘naked emotion’. In the next section, I shall argue that this challenge can be met. I will try to show that it is indeed possible to give the activity of mirror neurons a function in the overall processes that underlie both experience and recognition such that the idea of an experiential overlap between these states becomes phenomenologically plausible.

I would like to end this section with a remark on the relation between phenomenology and neuroscientific data. I take it as obvious that we should reject views that use neural data to force views of our experience on us that flagrantly contradict our phenomenology. But if we are open to neuroscientific confirmation of phenomenological findings or the possibility of informing the setup of neuroscientific experiments via neurophenomenology or what Gallagher and Zahavi call front-loading phenomenology (2008a, pp. 33–40), we may also be open to the possibility that neuroscience may help us in achieving a finer-grained picture of our phenomenology (which need not imply committing the fallacy of supposed isomorphism). Rather than as a one-way street, we may view the relation between phenomenology and the cognitive sciences as a dialectical reflective equilibrium, so that it is maximally fruitful. Hence, we should try to use whatever elbow room there is in our descriptions of the experiential and the neural levels, within the limits of phenomenological plausibility and within the limits set by the scientific data, to make these descriptions mutually illuminating.

A holistic model of neural resonance-based social cognition

Gallagher and Zahavi reject treating recognition and experience of basic intentions and emotions as step-wise procedures. They stress that in our phenomenology, recognition and experience do not have overlapping initial stages. I believe they are right. But that does not mean that experiences and recognitions of emotions and intentions are unstructured states, impenetrable by further analysis. My proposal is that there is an in-between position that does explain the role of mirroring in social cognition by assigning it a proper role in accounting for what goes on at the experiential level, but without falling into the simulationist trap. Rather than as step-wise procedures, we should view experiences and recognitions of intentions and emotions as holistically structured states. As such, even though experiencing and recognizing are entirely different as overall states, they nevertheless may share similar or even identical constituents. That, at least, is what I will try to explain in this section. This overlap may be accounted for at the neural level by mirror neuron activity.

Experiencing intentions and emotions as one’s own

Let me start with the idea that experiencing emotions and intentions as one’s own can be analysed in terms of a holistic structure in our streams of consciousness. The background for this view is a notion that I will label ‘embodied autobiography’, for lack of a better term. The interlocking of perceptual, proprioceptive and interoceptive information is fundamental to this notion.

Perceptual, proprioceptive and interoceptive information is provided by the body and its sense organs and processed by the brain to produce the ‘backbone’ of our stream of consciousness. This information interlocks in such a way that the experiences it yields, if conceptualized, would be intelligible as a coherent primitive narrative. To quote P.F. Strawson:

[A] temporally extended series of [perceptual] experiences should have a certain character of connectedness and unity, secured to it by concepts of the objective (…) as a fundamental condition for the possibility of self-consciousness. That experience should be experience of a unified objective world at least makes room for the idea of one subjective or experiential route through the world, traced out by one series of experiences which together yield one unified experience of the world—a potential autobiography. (Strawson 1966, p. 163)

Thus, consecutive visual perceptions issued by one body display an order revealing the double-sided ‘story’ of, on the one hand, a physical body tracing a spatiotemporal path through the world and, on the other hand, the visual features of the various consecutive locations along that path to which the eyes are turned. It is important not to conceive of such experiences as a mere series of ‘pictures’ in our minds, but rather as a series of lived experiences of a situated body. Such a sequence is integrated with other sense perceptions in ways that allow us to make sense of the world and of one’s body in it. For instance, when approaching an object (the object becomes bigger in one’s visual field in a way that ‘matches’ the proprioception of one’s walking body etc.), while hearing a sound that becomes increasingly louder and is loudest when one is closest to the object, we will experience the object as the source of the sound (a loudspeaker, say). There is an order in one’s sense perceptions and proprioceptions that is determined by the features and regularities of the physical world on the one hand and by one’s embodiment and the workings of our senses on the other. Such an order can be violated by odd experiences (such as, e.g. looking upwards, feeling one’s head being tilted backwards and seeing one’s own feet) and restored when these experiences can be explained as illusions.

This basic synchronic and diachronic coherence in our experiences, provided by the body as an ‘objective continuant’ (McDowell 1998), is the backbone of our stream of consciousness in that it forms the background against which other experiences take place and against which they acquire coherence. Let me give an example and discuss the extent to which we can extrapolate from it below: Suppose I walk in a dark alley and see a figure approaching me. The figure frightens me. This immediately makes me ‘decide’ or intend to turn around and walk away. After having acted on that intention, reaching a crowded street, I slowly feel my anxiety diminishing. In this scene, my emotions and intentions interlock with my sense perceptions, proprioceptions and interoceptions in such a way that the various ‘elements’ and stages of my entire stream of consciousness cohere and make each other’s appearance intelligible. Conceptualising (and hence distorting through abstraction from) what happened, we might say that my emotion (which I take, following James, to involve the interoception of visceral processes) was caused by perceptions, my intention to turn around was caused by my fright and my consecutive walking away, accompanied by specific proprioceptions and perceptions, was caused by my intention etc. The emotion and intention ‘blend in’ with what I shall label (after Strawson) my ‘embodied autobiography’: my stream of consciousness as determined by my situated body.

In other words, the idea is that the emotion and intention do not just occur contingently against the background of embodied autobiography; rather, their occurrence becomes intelligible against it. Even stronger: The full ‘content’Footnote 4 of the emotion and intention is co-determined by their context of perceptions, proprioceptions and interoceptions. The particular quality of my fear, for instance, is inseparably connected with my perception of the scary figure. I do not experience the fear and the perception of the figure as contingently connected components (regardless of the fact that the visual cortex (responsible for the perception of the figure) and the amygdala (responsible for my experience of fear) are distinct parts of the brain). It is not as if a different visual perception may have been accompanied by the exact same fear. My fear is of that figure, there and then. The same goes for the quality of resoluteness of my intention to walk away, which is strongly coloured by my fear, and for many other ‘connections’ between the experiential ‘constituents’ of my stream of consciousness. Even though these constituents may be processed in different parts of the brain, the overall experience they yield is, in general, a holistic structure, unified to such a degree that we can only speak of experiential constituents by abstracting from experience.

The claim I wish to make in this section is that what it means to experience an emotion or intention as one’s own (rather than as someone else’s; see “Recognizing emotions and intentions in others: neural resonance without simulation” section) is that such an intention or emotion is embedded in one’s overall embodied autobiography in the same way as the emotion and intention in the above example.Footnote 5 This claim rests on the idea that the embodied self to whom emotions and intentions are ‘attributed’ (i.e. who is implied in experience as the subject of these emotions and intentions) is represented in our streams of consciousness by means of our embodied autobiography. This is basically a neo-Kantian view in which the body as objective continuant and precondition for coherent synchronic and diachronic experience plays the role of the transcendental ego. But in order to apply it to the current issue, I will briefly compare it to two notions of the self, taken from the wide variety of notions available in the literature, to which I think the idea of an embodied self implied by (rather than represented in) our streams of consciousness is congenial: the ‘ecological self’ (Neisser 1988) and the ‘minimal self’ (Gallagher 2000).

Ulric Neisser describes the ecological self as ‘the self as perceived with respect to the physical environment: I am the person here in this place, engaged in this particular activity.’ (Neisser 1988, p. 36; see Barker 1968 for inclusion of aspects of the social environment, such as rituals, as items relative to which the ecological self is perceived). This, I believe, is the self whose continuity connects the various stages of my embodied autobiography. Unlike Neisser, however, I wish to stress the way in which the ecological self is presented ‘from within’ and how it is present in the way it shapes our streams of consciousness. This apparently makes the self I am after overlap with what Neisser calls the private self. Here, the status of the body is important. Body schemata as ‘silent organizations’ (Scheerer 1954) are crucial in transforming bodily information into a perspectival representation of one’s surroundings. This is the sense in which the self I am after is embodied.

Even though I would like it to be as minimal as possible, this embodied self is slightly richer than what Gallagher (2000) calls the minimal self: the basic, immediate, primitive subject that is not extended in time but confined to the ‘specious’ present. Still, it is important not to exaggerate the differences. The self implied by our embodied autobiographies does extend over time, but not so as to constitute personal identity over time.Footnote 6 Provided that the ‘specious’ present is defined broadly enough to encompass short scenes such as the above (I am aware that this is a big proviso), there need not be a major difference with the minimal self in this respect. The claim that an intention or emotion must fit one’s embodied autobiography in order to be experienced as one’s own is not meant to imply that emotions and intentions require coherence with a real narrative autobiographical self in order to be considered one’s own. The embodied autobiography in which emotions and intentions are experienced as ‘had by’ what Neisser calls an ecological self, which is principally embodied in the sense indicated above, can be as short as the attention span required for a specific action sequence in response to occurrences in the environment, as in the dark alley example. The main apparent difference between the embodied self I am after and the minimal self is in the way in which the self can be said to be the subject of emotions and intentions, the way in which it is implied when we experience simple emotions and intentions as our own.

Self-ascription of an intention, on Gallagher’s view (discussed mainly in the context of possible mis-identification of a motor intention as one’s own), is explained in terms of the match between the intention and an efference copy of the ensuing motor command (or even the intended state of affairs in the world and the sensory re-afferent feedback about the result of the action caused by the motor command; Gallagher 2000, pp. 16–17). My claim, by contrast, is that an intention or emotion is experienced as one’s own if it fits in with one’s embodied autobiography. A basic motor intention that is unrelated to the perceived affordances of the environment, that lacks every ‘motivation’, e.g. in the form of preceding emotions calling for a bodily response (in the above example: fear) and that does not have a follow-up in the form of perceived and proprioceived bodily movements, for instance, cannot be an intention one experiences as one’s own in the sense relevant for our discussion.

These italics are important. But before explaining them, let me stress that the difference between Gallagher’s view and mine does not imply mutual exclusion. Gallagher’s account is mainly sub-personal, while I am interested in analysing the experience of ownership, and to the extent that Gallagher’s account does filter through to the experiential realm, the match between an intention and an efference copy of the motor command is precisely one way of fitting into an embodied autobiography (of which the efference copy is a part). I think that it is entirely plausible to assume that this match is an essential requirement for self-attribution, a necessary condition. The difference between Gallagher’s minimal self and the embodied self I would like to employ is that I wish to claim this necessary condition is not sufficient. There is more to experiencing an intention as one’s own than the match of an intention with the efference copy of the following motor command. A hypothetical state in which there is a match between intention and efference copy, but in which the intention has no further connections with our embodied autobiography, is not a state in which the intention is experienced as one’s own in a sense that is relevant to our discussion.

Now to the italics: The point is that the kind of self-ascription we are interested in in the context of social cognition is the kind that is relevant to explaining how and why we experience intentions and emotions ‘fully clothed in agent specification’ (Gallagher 2007a, p. 358). I do not see how agent specification can be anything other than identifying a body as the locus of an intention or emotion. So the question is whether the account of the experience of first-person ownership in terms of fitting one’s embodied autobiography is correct in general. I do not know how to argue for this briefly, other than by (a) noting that the idea is, as I believe, intuitively and phenomenologically plausible and (b) using the remainder of this paper to show how it can help to solve the puzzle outlined in the previous section.

Recognizing emotions and intentions in others: neural resonance without simulation

The solution to this puzzle I wish to propose hinges on the idea that both the experience of an emotion/intention as my own and the recognition of the same type of emotion/intention as someone else’s can be analysed in terms of the holistic structure of this experience and recognition. In the previous section, I outlined this idea with respect to experiencing intentions and emotions as one’s own. Before discussing recognizing emotions and intentions as someone else’s, it may be helpful to say a bit more about the holistic structuring of experiences/recognitions.

Holistically structured states, in my use of this term, have constituents (‘parts’) that may have discretely identifiable neural correlates but that ‘blend’ into each other at the experiential level. My fear of the figure in the dark alley is subserved by neural activity in the visual cortex and in the amygdala. Even so, my experience of ‘a scary figure’ does not, at the experiential level, consist in a contingent conjunction of two discrete components, a conscious ‘image’ and the experience of fear. Rather, I have one experience: my fear of a certain figure. This is parallel to the way in which we experience a moving coloured object as one phenomenon, even though its movement, colour and shape are processed in separate parts of the visual cortex.Footnote 7 What I mean to imply by saying that a state is holistically structured is this: Although at the experiential level we can only speak of constituents or parts by abstracting from what is a stage of one stream of consciousness, there is a structure in the neural substrate of this stage that may to some degree parallel the structure of the experiential abstractions (depending, of course, on how these abstractions carve up our experience). This structure, in all likelihood, is a causal structure (in the scary figure example: The activity in the amygdala is causally preceded by the activity in the visual cortex).Footnote 8

The constituents of stages of our stream of consciousness are not, I believe, experiences in their own right. I can abstract from a certain experience of mine a feeling of fear or a visual perception of a figure in a dark alleyway. But to think of these components in abstraction from their contexts would be to abstract away from experience as such. Consider, as a parallel, experiencing the shape of an object in abstraction from its colour (it is not even black, white or gray). That is not an experience, but an abstraction. Nevertheless, the part of the visual cortex that is responsible for processing the shape of a perceived object does contribute to an overall experience of a coloured object.

The same is true of the neural activity of so-called shared representations such as mirror neurons. It may contribute to, e.g. an overall state of me experiencing an intention or emotion as my own. Even so, it is wrong to say that this specific activity on its own underlies the experience of an intention or emotion. This is why Zahavi is right when he writes that ‘[t]here is a crucial difference between claiming that my recognition of a certain emotion in you requires me to experience the very same kind of emotion immediately prior to ascribing it to you and claiming that the same neural substrate subserves both the experience of an emotion and the recognition of the same kind of emotion in others.’ (Zahavi 2008, p. 519) The latter is true but the former is false. The assumption that the former is implied by the latter is motivated by the idea that this same neural substrate by itself constitutes or realises the emotion, both in the case of my experiencing it and in the case of recognizing it in others. The account of my experience of my own emotion as a holistically structured state explains why this idea is wrong and why Zahavi is right: The shared substrate is only one constituent among many of my experience, while these constituents together realise one experience.

This brings us to the question of what happens when such a shared neural substrate, the activity of a shared neural ‘representation’, is induced by observing the facial expression or bodily comportment of another person. The first thing to note is that it is likely that this representation will not fit holistically into the embodied autobiography of the observer. Chances are that, for instance, the visceral interoceptions accompanying the observed person’s emotional state and the motor intentions in response to the observed person’s emotions are completely lacking in the embodied autobiography of the observer, and this does not even mention the cause of the emotion (that which the fear is fear of, the grief is grief about etc.), which may well be lacking for the observer too. On the account I am advocating, this implies that the observer will not, not even initially, experience the emotion as her own as, e.g. Goldman’s simulationism seems to supposeFootnote 9 (if by accident the emotion does fit into the observer’s embodied autobiography, I take it that the observer will ‘mistakenly’ attribute the emotion to herself).Footnote 10

On this account, the failure of the representation to fit in with the observer’s embodied autobiography rules out that the observer experiences the emotion as her own. This may seem to leave open the possibility that she experiences it as a ‘naked emotion’, an emotion that is not yet attributed. But that option is ruled out by the idea that ‘naked emotions’ and ‘naked intentions’ are not experiences in their own right. They are constituents of overall states when they holistically fit in with other constituents. In the case of self-attributed emotions or intentions, this fitting in consists in contributing to the experiencer’s embodied autobiography. The question, then, is what other constituents the shared representation fits in with in the case of recognizing someone else’s emotion. If it does not contribute to an experience of an emotion or intention as one’s own, what does it contribute to?

Well, to the recognition of an emotion or intention in someone else’s facial expression, gesture or bodily comportment. At the neural/functional level, there is a causal connection between the representation of the expression or gesture in the visual cortex and the activity of mirror neurons or other ‘representations’ shared with the substrates of my experiencing an emotion or intention myself. At the experiential level, this results in a connection much like the connection between the fear and the figure in the alley example of “Experiencing intentions and emotions as one’s own” section. Just as the fear is fear of the figure, the resonated-with intention or emotion is experienced in the expression or gesture that evoked the resonance.

A parallel might help to make the point: Suppose you are outside and there are large clouds in the sky with patches of blue sky in between. Every now and then the sun appears from behind the clouds only to be blocked by yet other clouds a minute or so later. When the sun appears, you feel warmth on your skin and you see light. The light and the warmth are not separate experiences that are at some moment understood as being connected through induction. Rather, they are experienced as aspects of one phenomenon: the sun appearing. In a similar fashion, the visual representation of the gesture or expression and the activation of neural structures that are also involved in one’s own experience of a specific intention or emotion yield a perception of one phenomenon: the other person experiencing that emotion or intention. We perceive happiness in the smile. We perceive the intention in the grasping movement. The central idea here is again that of mental holism, the idea that experiences, perceptions and sensations shape and colour each other and determine each other’s meaning or sense.

The position that results from this holistic model of neural resonance-based social cognition differs from Goldman-style simulationism in that neural resonance is not thought to result in a ‘naked’ experience or registering of an emotion that is subsequently attributed to someone else. Neural resonance is not a first stage in a process but rather one part of a larger neural substrate underlying the state of recognizing an emotion in someone else. Following Gallese (2001), however, the model does allow for an experiential overlap between recognizing an emotion in someone else and experiencing that emotion oneself. The fact that part of the neural activity underlying a person’s fear, for instance, also underlies that person’s state of recognizing fear in someone else’s facial expression may well explain an experiential overlap between fearing and recognizing fear. I believe this overlap exists. To a certain extent, seeing fear in someone else’s face is a bit like being afraid oneself. But the difference between the holistic model and stage-wise simulationism explains why this overlap allows us to agree with Zahavi that ‘[w]hen I experience the facial expressions or meaningful actions of an other, I am experiencing foreign subjectivity, and not merely imagining it, simulating it or theorizing about it.’ (Zahavi 2008, p. 520). For the overlap is only partial; the total states of experience and recognition do differ considerably in both neural and experiential respects.

I doubt whether Zahavi would consider the ‘otherness’ of an emotion recognized in someone’s facial expression foreign enough on the model proposed. There is a remnant of simulationism in it after all: recognizing fear in someone’s face is not, on the proposed model, depicted as a state that is completely different from experiencing fear oneself. Nevertheless, the difference between (1) a feeling being a part of one’s own embodied autobiography, naturally and ‘logically’ fitting in with one’s proprioceptions, interoceptions and perceptions on the one hand and (2) a feeling not fitting in thus but occurring simultaneously with a perception of a facial expression or gesture of another person on the other does constitute a principled difference between self-ascription of, e.g. an emotion and other-ascription. Moreover, on this way of distinguishing one’s own subjectivity from foreign subjectivity, perceiving the latter is not portrayed as ‘merely imagining it, simulating it or theorizing about it.’ Thus, the holistic model of neural resonance-based social cognition occupies a middle ground in between theories of social cognition in terms of simulation and theories in terms of social perception.Footnote 11

What then, on this model, is the difference between seeing, e.g. happiness in someone else’s face on the one hand and emotional contagion (or simply starting to feel happy oneself as a result of seeing someone else smile) on the other? The difference, I speculate, is that in cases of, e.g. emotional contagion, the neural activity that results from resonance (e.g. mirror neuron activity) is not taken off-line (to use this simulationist terminology). Seeing a smile may result, through resonance, in the activation of one’s own facial muscles. The resonator’s proprioception of the contractions of her own facial muscles and whatever other bodily expressions the contagion results in will be processed as part of the resonator’s own embodied autobiography. Thus, the resonance activity starts to contribute, according to the proposed model, to an overall state that will be experienced, if at all, as the resonator’s own. Insofar as this means that the experiential focus is no longer on the other person’s emotional state, the holistic connection with the visual perception of the other’s behaviour is likely to fade. Thus, though neural resonance may contribute to emotional contagion and to social cognition, the holistic model can keep the two apart.Footnote 12

A solution and an empty question

Thus, there is an overlap on the experiential level between experiencing an emotion or intention and recognizing these in the comportment of others. Both kinds of state share an experiential constituent accounted for by a shared part of their overall neural substrates. Seeing fear is a bit like being afraid. Seeing happiness is a bit like feeling happy, but only a bit. Social cognition is not exactly emotional contagion. The idea of a partial experiential overlap can be illustrated by a parallel: The perception of a red square and the perception of a green square have an experiential overlap accounted for by similar activity in the part of the visual cortex that processes shape. Or compare tasting an Italian pasta sauce and a Thai green curry: Although the overall flavours are hugely different, I may well be able to note that both contain a fair amount of basil. If the holistic model of social cognition is correct, something similar goes for experiencing fear, happiness etc. and recognizing these emotions in others.

This model, I contend, takes the metaphysical pressure off of the view of Gallagher and Zahavi: Recognition and experience are, as whole states of consciousness, truly distinct, qualitatively different states—it is true that I do not look into myself in order to ascribe an emotion to you and when I see your emotion, I see foreign subjectivity. But experience and recognition do share a component, at the experiential level, which is accounted for by an overlap in their neural substrates. Thus, there is no need for a problematic theory of multiple realization between the neural/functional and the experiential level (see the section entitled “A problem: what role does neural resonance play in enactive social perception?”).

Put in this way, the view of holistically structured states would be a slight modification of Gallagher and Zahavi’s position intended to solve a problem in it. The problem that is addressed by this amendment arose from the fact that Gallagher and Zahavi described experiences and recognition as qualitatively different without describing them as structured in a way that allows for experiential overlap. But that may well be due to their anti-simulationist rhetoric, which counters the simulationist tendency to ignore the qualitative difference between first- and third-person experiences and attributions. If that is the case and if there is allowance for an analysis of the proposed experiential structure of experiences and recognitions of emotion, then it may very well be that it is precisely something along the lines of the view proposed that Gallagher had in mind when he invoked the notion of multiple realisation.

Especially in view of this latter possibility, it is fair to ask whether the position I end up with is in fact Gallagher and Zahavi’s position. Though the positions are, as far as I can see, certainly compatible, I also think there may be some difference, at least in emphasis. The notion of holistically structured states and the concurrent idea that there is indeed experiential overlap between experiencing and recognizing emotions and intentions is not part of Gallagher and Zahavi’s theory, and there is, indeed, a simulationist ‘flavour’ to claiming that recognizing an emotion or intention in the facial expression or gesture of another person does, partly, feel like experiencing that emotion or intention oneself. This fact is ignored or downplayed by the term ‘perception’.

So, is neural resonance-based social cognition perception or simulation? If the account of neural resonance-based social cognition sketched in this section is roughly correct, I think it is best to adopt a strategy borrowed from Parfit (1984) with respect to this question. We should ask whether there is something we do not know if we do not know whether neural resonance-based social cognition is simulation or perception. The answer, I believe, is no. If the picture sketched in this section is more or less accurate, the simulation-or-perception question becomes an empty question. The personal-level terminology does not illuminate the sub-personal neural/functional data. Conversely, the discovery of mirror neurons has focussed our attention on an overlap at the experiential level between experiencing and recognizing intentions and emotions, and new terms such as ‘shared manifold’ (Gallese 2001), pitched at the neural level, express this experiential fact more accurately than traditional personal-level terminology. Rather than debating which personal-level terminology we should apply to describe the sub-personal data, in this case, it may be better to let the sub-personal data illuminate our personal-level terminology.