Phenomenology and the Cognitive Sciences, Volume 17, Issue 2, pp 245–266

Individualism versus interactionism about social understanding

  • Judith Martens
  • Tobias Schlicht


In the debate about the nature of social cognition we see a shift towards theories that explain social understanding through interaction. This paper discusses autopoietic enactivism and the we-mode approach in the light of such developments. We argue that a problem seems to arise for these theories: an interactionist account of social cognition makes the capacity for shared intentionality a presupposition of social understanding, while the capacity to engage in scenes of shared intentionality in turn presupposes exactly the kind of social understanding that it is intended to explain. The social capacity in question that is presupposed by these accounts is then analyzed in the second section via a discussion and further development of Searle’s ‘sense of us’ and ‘sense of the other’ as a precondition for social cognition and joint action. After a critical discussion of Schmid’s recent proposal to analyze this in terms of plural pre-reflective self-awareness, we develop an alternative account. Starting from the idea that infants distinguish in perception between physical objects and other agents, we distinguish between affordances and social affordances and cash out the notion of a social affordance in terms of “interaction-oriented representations”, parallel to the analysis of object affordances in terms of “action-oriented representations”. By characterizing their respective features we demonstrate how this approach can solve the problem formulated in the first part.


Social cognition, Enactivism, Interaction, Joint action, Affordances, Sense of us, We-mode, Sense of the other

1 Introduction

Philosophers and psychologists disagree about the nature of social cognition, i.e. whether it is achieved by ascribing mental states after a process of cognitive reasoning based on observational data (Gopnik and Wellman 1992), or after a simulation routine (Gordon 1986), or whether it is simply perceptual in nature (Gallagher 2005). In the last decade, the theoretical landscape has been both supplemented and differentiated by hybrid approaches (Goldman 2006), two-systems theories (Apperly and Butterfill 2009; Butterfill and Apperly 2013), and a renaissance of phenomenological approaches in the context of an embodied enactive cognitive science (e.g. Gallagher 2005; Zahavi 2014).

In particular, the distinction between the mere passive observation of social interaction and an active engagement in social interaction (Schilbach et al. 2013; Butterfill 2013) motivated the development of enactive accounts according to which interaction based on embodied sensorimotor skills is – systematically as well as ontogenetically – prior to reflection or simulation procedures based on disengaged observation of behavioral patterns (Hutto 2004; Reddy 2008; de Jaegher et al. 2010).

A significant feature of such accounts is the rejection of the individualism that is presupposed in the more traditional approaches. According to some of the new interactionist theories, social cognition is achieved not by a single individual who is understanding someone else’s mental states, but by the joint (or collaborative) effort of two or more agents. De Jaegher & Di Paolo (2007), for example, call this process “participatory sense-making” since understanding is seen as the emergent result of the autonomous interaction process between two or more agents. This account of social cognition is intended to replace the more traditional theory- and simulation accounts. The problem which we intend to outline and address in this paper is the following: accounts aiming to explain social cognition in terms of joint action are ultimately circular since joint action of the relevant kind presupposes social cognition of a basic kind. That is, in order to arrive at the kind of social interaction needed to constitute participatory sense-making, the participating agents must already be in the possession of some basic social understanding of each other. Consequently, social cognition cannot be explained in terms of joint action.

In the first section, this problem is illustrated in more detail by a brief discussion of two recent accounts of social understanding, namely, the autopoietic enactivism put forward by de Jaegher & Di Paolo (2007) and de Jaegher, Di Paolo, & Gallagher (2010) and the we-mode account, put forward by Gallotti and Frith (2013) as an explicit alternative to the former position. The kind of social understanding that is (at least) presupposed by accounts of this kind is then analyzed in some detail in the second section which starts with a brief discussion of Searle's (2002) analysis of joint action. He argues that a general sense of the other as a possible candidate for collaboration is a precondition of any truly joint action. Since we think that Searle fails to provide a satisfying explanation of this important condition, we investigate Schmid's (2014) recent attempt which introduces the notion of a “plural pre-reflective self-awareness”. After highlighting some problems of this proposal, we finally suggest an alternative solution that differs importantly from both Searle’s and Schmid’s proposals. The basic idea of this alternative is to start with the notion of affordances, prominent in enactive embodied approaches like de Jaegher’s, but we argue for a representationalist analysis of affordances, especially social affordances which emerge from the coupling of two interacting agents. We propose to cash out the notion of a social affordance in terms of “interaction-oriented representations”, based on Millikan’s (2004) notion of a pushmi-pullyu representation that she introduced as a representational characterization of Gibson’s (1979) notion of an (object) affordance. Against more radical enactive accounts, we try to make a case for an account that is based on such mental representations.

2 Social cognition and interaction

In the recent debate on social cognition, various philosophers and psychologists have stressed the need for a distinction between interaction and observation as constituting two different types of situation which determine in different ways whether and how we can understand another person’s mental states (e.g. Gallagher 2001; Hutto 2004; Schilbach et al. 2013; Butterfill 2013). For example, by interacting directly with someone, their nonverbal ways of communication and the direct way of addressing the other (as a “You”, as Reddy 2008 calls it) may lead to a communication loop and in turn make it not only much easier to understand what the other is up to; the interaction dynamics among two (or more) individuals may even open up completely different avenues to understanding some other person’s mind (Butterfill 2013). In contrast to third-person ascriptions of mental states on the basis of inferences following observation, as suggested by the theory-theory (Gopnik and Wellman 1992), and in contrast to the projection of one’s own mental states – known from the first-person perspective – into the other person, as suggested by simulation-theory (Goldman 2006), this family of proposals introduces the importance of a second-person stance which is constituted by active participation in a communication loop (see Eilan 2014). Overgaard and Michael (2013) have argued that weaker formulations of these latter approaches do not provide an alternative to theory- or simulation accounts, while stronger formulations turn out to be implausible. We intend to address these theories’ strong emphasis on the role of interaction, which implies the explicit rejection of the individualism entailed by the traditional approaches. By discussing one representative enactive approach, we point to a problem in the attempt to explain social cognition in terms of joint activities or in terms of the interaction dynamics between agents. 
In the following section, we discuss another popular approach, namely, the so-called we-mode theory (Gallotti & Frith 2013) that has been put forward as an alternative to the second-person approaches. Again, we show that this approach is equally problematic because it clearly presupposes an important aspect of social cognition.

2.1 Autopoietic social enactivism

Some of the proponents of a second-person account have suggested that the interaction process between agents may itself at some point emerge as a semi-autonomous process on a higher level, which can therefore be partly credited with understanding. Instead of looking for the cognitive achievement of social understanding within the participating individuals (or their brain mechanisms for that matter), these authors suggest that the communication loop between the individuals literally (partly) constitutes understanding. This is called “participatory sense-making” since the agents jointly bring about the meaning of their mental acts, via interaction (De Jaegher & Di Paolo 2007; De Jaegher, Di Paolo & Gallagher 2010).

This thesis is put forward in the context of an embodied and enactive account of social cognition, translating central claims from an embodied enactive cognitive science to the social domain. According to enactive approaches, cognition is (quite generally) a bodily activity and must be explained in terms of the complex dynamics between brain/mind, body, and environment. In the course of the complex back and forth between action and perception, cognitive processes transform the neutral environment into a meaningful “Umwelt” in relation to the specific needs and purposes of the cognitive agent (Clark 1997; Thompson 2007). In this context, the organism’s range of sensorimotor skills and the layout of the environment determine the complex set of affordances that emerges specifically for this agent in this situation. To a first approximation, affordances are possibilities for action and perception emerging from situating the agent in an environmental context (Gibson 1979; Noë 2004; Chemero 2009). An important aspect here is the notion of the right kind of coupling between agent and world which forces us to treat them as one complex cognitive system where a relevant portion of cognitive processing is outsourced by the agent onto the world. (We will return to the notion of affordances in more detail towards the end of the paper.)

According to one particular brand of enactivism, namely “autopoietic enactivism”, cognition emerges from the self-organizing and self-producing activities of living organisms. Cognitive structures display the same organizational features as living structures such that mind is life-like and life is mind-like (Thompson 2007, 128ff). De Jaegher and Di Paolo (2007) have translated the central tenets of autopoietic enactivism into an enactive account of social cognition, i.e. a theory “concerned with defining the social in terms (a) of the embodiment of interaction, (b) of shifting and emerging levels of autonomous identity, and (c) of joint sense-making and its experience” (de Jaegher & Di Paolo 2007, 489). On this view, social understanding by way of direct interaction with others – based on the right kind of coupling between agents, not between agent and (physical) world – is importantly different from an understanding of other minds based on mere observation, and it constitutes our primary way of understanding what others feel, intend or desire. De Jaegher and Di Paolo define social interaction as “the regulated coupling between at least two autonomous agents, where the regulation is aimed at aspects of the coupling itself so that it constitutes an emergent autonomous organization in the domain of relational dynamics, without destroying in the process the autonomy of the agents involved” (2007, 493).

In the context of interaction, embodied practices like gestures and facial expressions modulate one’s social understanding. One’s own active efforts to understand someone else affect the other’s verbal and nonverbal behavior, which in turn affects one’s understanding, and so on. The reciprocity that is so characteristic of social interaction is already an inbuilt feature of enactive approaches. But the coupling is not something that accidentally happens; co-presence is insufficient, mutual awareness is necessary. It is important that the coupling is “coordinated” (de Jaegher and Di Paolo 2007, 490), i.e. consciously achieved and maintained. Only when the agents are mutually aware of their coupling relation can the interaction itself take on an autonomous dynamics that can then prompt us to consider it as a constituent of social understanding itself. As the quote above (and other passages) makes clear, de Jaegher and Di Paolo emphasize the active and joint character of social understanding. They hold that social understanding based on such coupling is achieved jointly such that social cognition is explained in terms of joint action, namely, the joint action of actively bringing about the meaning of the agents’ mental states by way of coordinated interaction. This is expressed even more clearly in a recent publication where they argue that “in such situations of interactive engagement, it is not individual cognizing and behavior that sufficiently determines the relevant phenomena: both social acts and meanings are constituted socially and during the interactive encounter […] The interactive constitution of social acts and meanings is a joint cognitive process that necessitates, but is underdetermined by, individual cognition.” (de Jaegher, Di Paolo, Adolphs, 2016, ms. p. 8).1

Note that de Jaegher et al. (2010) explicitly emphasize that the interaction process plays a constitutive role for understanding; it is not intended as merely a contextual factor or enabling condition. Interaction is supposed to be part of what produces social understanding such that understanding cannot be identified by investigating the cognitive processes of one individual, since it happens “between individuals” as a result of participation (de Jaegher et al. 2010, 446). Paradigm examples of such regulated coupling are conversations, but also other cases of collaboration in solving a given task. Reciprocal cognitive processes lead to understanding, conceived of as a process of meaning-production or “participatory sense-making”. Within the interaction process, the agents jointly bring about social understanding.2

Even if this claim – that we have to look for the social understanding within the process of interaction (rather than in the agents’ heads) – is comprehensible, it is questionable whether it succeeds as an explanation of social understanding. The point we wish to make is that the account ends up being circular. While it intends to explain social cognition in the sense of how meaning is generated, it presupposes meaningful social cognition at the same time. In their explanation of the relation between the individual and the social level, de Jaegher and Di Paolo (2007) rely heavily on the concept of a coordination of the interaction as an ongoing process between two (or more) agents. If we focus not so much on the ongoing interaction itself but on its causal antecedents (how the interaction got started in the first place), then it is clear that at least one of the agents must have intentionally and consciously brought about the coupling by way of some communicative gesture or even an act of mindreading. Although this gesture may be minimal – it certainly need not be vocal but may be any activity – its performance clearly presupposes that the person using the gesture thinks that performing whatever communicative act makes sense with respect to a certain communicative goal. Thus, this act of intentional communication presupposes at least the social capacity to consider the other as a possible candidate for social interaction.

Obviously, the crucial difference between (non-social) cognition and social cognition is that both elements of the coupling relation are agents. For example, when I am investigating a tree in the woods the coupling between the tree and me is quite different from the coupling between my brother and me when I am trying to understand him while we are interacting. A prerequisite of establishing an interaction with my brother is that at least one of us considers the other as a possible candidate for such a special kind of coupling. The point is that this consideration is already a basic meaningful act of social cognition. Although it is (at this stage) not concerned with specific thoughts and feelings of the other person, it entertains the possibility of reciprocal interaction, of exchanging (i.e. communicating) beliefs, emotions and intentions and of sharing intentions or goals or emotional states with this other agent. One might consider this as a basic capacity of human beings since young children are already highly sensitive to feelings, interests, needs and other attitudes of others, especially their caretakers. But this does not obviate the need for “the first step”, i.e. the need to address someone else as a “You” (Reddy 2008) in order to bring about the coupling in question. We do not question the claim that after this initial step the interaction process may bring about more complex meanings through participatory sense-making.

But since this instance of social cognition is presupposed by social interaction and joint action, the latter cannot explain or provide any general grounding for the former. At the point where interaction can play the autonomous role envisaged by de Jaegher, Di Paolo, and Gallagher, the coupled agents must be engaged in a complex joint activity of exchanging information on various levels. And such a joint activity can only get off the ground via an initial process of social cognition – if not full-blown mindreading – simply understood as entertaining the idea that the other can reciprocally relate to myself; that the other is a cognitive agent like myself (e.g., along the lines of Meltzoff 2007). We submit that therefore, autopoietic enactivism cannot, as it stands, explain social cognition since it presupposes it.

Therefore, it seems reasonable to conceptualize the role of interaction for social cognition differently. Other thinkers have acknowledged the importance of social interaction yet without giving up cognitive individualism regarding the processes underlying social understanding. For example, Butterfill (2013) emphasizes the difference between interactive mindreaders and merely observing mindreaders with regard to the “evidential basis of mindreading”, not with respect to the nature of the mechanisms involved in mindreading (2013, 842). His thesis is that interactive mindreaders “could exploit routes to knowledge which would be unavailable if they were entirely passive observers” (ibid.) such that interaction can narrow down the range of possible interpretations of behavior, and thus reduce uncertainty. Butterfill does not intend to replace mindreading with interaction. Unlike de Jaegher and Di Paolo, Butterfill does not propose that interaction can explain or constitute social understanding or that interactive processes replace individual ones. Indeed, Butterfill is open to various strategies of social cognition without being committed to only one route to understanding others, let alone an identification of social cognition with mindreading. Therefore, his much weaker claim about the role of interaction is not prone to the objection formulated above.

2.2 The We-mode

Some critics of second-person-accounts who do not wish to consider the interaction process as a constituent of the cognitive process of social understanding may want to remain focused on the individual agent and their cognitive processes of understanding, while considering interaction only as a causal contributor. One alternative proposal in this regard is based on what Gallotti and Frith (2013) call a “we-mode” as a way of sharing mental states once we engage in interaction. They emphasize the first-person plural perspective as essential for a conceptualization of the social interaction, conceived of as a way of expanding “each individual’s potential for social understanding and action” (2013, 160). This claim is directed at proponents of the view that interaction provides a richer “data base” for a mindreader than observation. Thus, it is critically directed not only at de Jaegher and Di Paolo’s account, but even at Butterfill’s more moderate account of the role of interaction.

In Gallotti and Frith’s proposal, the basis for my broader understanding of my partner’s state of mind is that I represent “aspects of the interactive scene in the we-mode” (2013, 161). The authors introduce a we-mode in order to bypass the problems associated with an interactionist account as developed by de Jaegher and colleagues, while still capturing the intuition that the representational capacities necessary for true joint action outstrip the resources that are available to an individual who represents states of affairs and other people’s mental states from their own individual perspective only. Two agents acting together represent their contributions as leading towards a goal that they are pursuing together. So, they share the focus on the jointness of understanding with the account by de Jaegher and colleagues. Instead of simply intending or desiring or believing, Gallotti and Frith rely on attitudes like intending-together, desiring-together and believing-together (2013, 163). Despite the attitude being construed in the first-person plural, it is still supposed to be an individual representation in the head(s) of the agent(s).

Gallotti and Frith suggest that the function that interaction plays in enactive accounts should be realized by representations of actions in the we-mode. That is, on their view, interacting agents simply know more about the other person’s mental states than mere observers because “co-representing the others’ viewpoint on the action scene as a condition for acting jointly modulates the space of mental activity and, therefore, behavior, by providing each agent with access to a set of descriptions and concepts that would be unavailable from the observational, first-person singular or third-person, perspective” (2013, 164). It is only when people act jointly or enter interactive contexts that these “latent” cognitive resources become salient. An individual agent has I-mode- as well as we-mode-propositional attitudes. The suggestion seems to be that we can switch – at will, so to speak – from I-mode into we-mode and vice versa, once a situation is deemed appropriately social or affords interaction.

Yet, on this account it seems somewhat mysterious how people achieve this. Gallotti and Frith introduce the we-mode as a solution to experimental data suggesting that the individual performance in a joint task varies depending on whether the participant represents her contribution as belonging to a joint goal of the team or not. But saying that sometimes we represent our intentions individually while at other times we represent them in the first-person plural is not a solution to the puzzle; it is merely one way to formulate it. It remains mysterious how people achieve representing their actual or potential partner’s actions and goals. We will introduce our own solution to this puzzle in the last part of this paper. On the we-mode account though, co-representing the other agent’s intentions and beliefs about the situation is required, but such joint action clearly presupposes sophisticated mindreading. The action is carried out jointly only if agent A represents more than his/her own intention and subplan for achieving the joint goal: A must also represent the specific representational states of B (and possibly other members of the team). Another presupposition is the possibility for agent A (and presumably B as well) to recognize a situation as being social or in any way suited for collaboration. How this is achieved remains unclear.

Therefore, it seems that interaction can provide an answer to the question how the we-mode can emerge. As Timmermans et al. (2013) point out, Gallotti and Frith seem to create an implausible dichotomy between social and non-social interaction: When I take my dog for a walk, am I doing this in we-mode or I-mode? And if babies, dogs, people that might enter the room, and random people in the street all trigger the we-mode, when would we ever be in I-mode? The account presupposes the capacity to distinguish between situations where a we-mode is suitable and situations where it is not, so that it presupposes (at least) what Searle has called “the sense of the other as a candidate for cooperative agency” (Searle 2002, 104). Since it is not at all clear that the world is neatly carved up that way, Gallotti and Frith should provide an explanation of how we can distinguish individual from interactive situations.

Their proposal is circular if it is intended as an account of social cognition (Timmermans et al. 2013). Towards the end of their paper (2013, 164), Gallotti and Frith present the we-mode-account as an alternative to the social enactivism defended by de Jaegher and Di Paolo (2007), suggesting it is meant as an account of social cognition. But throughout the paper, it is elaborated as an account of joint action. As such, it is incomplete because no explanation is given how and when to switch modes. While Gallotti and Frith’s approach is supposed to bypass the difficulties besetting de Jaegher’s interactionist approach, it runs into basically the same problem by making joint action the precondition for social understanding while the account of joint action presented clearly presupposes social understanding on various levels of complexity.

There is a loophole through which Gallotti and Frith may be capable of escaping the circularity. They suggest that it is a basic feature of human cognition that it contains these latent resources for collaboration, even up to a point where “the mind is not just a product of the social: it is social all the way through.” (2013, 164) Even when we attempt to realize projects on our own, latent resources for joint action make any mental activity inherently social, even if we do not act jointly. One suggestion in this context could be to hold that sensibility to ostensive signals that are communicative in nature, e.g. direct gaze, are realized by an innate mechanism (Csibra and Gergely 2009). In that way, the explanatory circularity could possibly be avoided by postulating an innate module that realizes the relevant function. But Gallotti and Frith do not elaborate whether they intend their proposal of an irreducible we-mode to be innate in this sense. We will formulate our own proposal in the last part of this paper.

2.3 Summary

The attempts to account for social understanding in terms of joint action that we have discussed here – de Jaegher and Di Paolo’s autopoietic enactivism and the we-mode account proposed by Gallotti and Frith – face the general problem that joint action in turn presupposes social understanding in a basic sense. Obviously, this can give rise to a vicious circularity, more damaging for some such accounts than for others. For example, it seems to create a significant problem for de Jaegher et al.’s attempt to show that the interaction process constitutes social understanding (de Jaegher et al. 2010). It creates no problem, for example, for Butterfill’s (2013) emphasis on the advantage that interaction provides to a person that is addressed by an interacting mindreader. Gallotti and Frith suggest that intending in the we-mode should play the role that is assigned to interaction in the enactive account, but they lack an explanation of why and when one switches from I-mode into we-mode and vice versa. A presupposition that is shared by these accounts is that at least some agent must be capable of considering the other agent(s) as potential partner(s) to interact with socially. Otherwise the relevant coupling relations or interaction processes or activities experienced as done in the we-mode would not get off the ground. If engagement in social interaction presupposes the capacity to distinguish between social and non-social situations, then engaging in social interaction presupposes a basic understanding that is already importantly social in the sense that the interacting agents have to understand what they are doing with the other person and why.

This whole predicament raises the following questions: If interaction and reciprocity make a crucial difference for our social understanding (in contrast to mere observation), are they really in a position to explain social understanding in the way that is suggested by proponents of second-person- or first-person-plural accounts or do they rather presuppose it? Can we say more about the very basic social capacity at play here? What is responsible for my considering the other person as a possible candidate for joint action or social interaction? What is this sense of the other that motivates us to address each other as a You? In the next section, we discuss several ways of cashing this out.

3 The sense of the other and the sense of Us

In his classic paper on Collective intentions and actions, John Searle (2002, 103) writes:

“In addition to the biological capacity to recognize other people as importantly like us, in a way that waterfalls, trees, and stones are not like us, it seems to me that the capacity to engage in collective behavior requires something like a pre-intentional sense of ‘the other’ as an actual or potential agent like oneself in cooperative activities.”

The two accounts discussed above, by de Jaegher and Di Paolo on the one hand and by Gallotti and Frith on the other, attempt to explain social cognition in terms of joint activities. Searle highlights in this passage that truly joint action has its own cognitive presuppositions, namely to recognize other people as being like oneself in an important respect and a sense of the other as a potential candidate for engaging in joint activities in the first place. For example, while you may intend to cook a meal with (the help of) your kitchen aid, you cannot intend to cook a meal together with your kitchen aid, conceived of as a joint action. The kitchen aid, or any tool for that matter, is simply the wrong kind of thing for being a partner in a true joint action; it is not an agent like you, let alone a conscious agent with a sense of agency. Searle argues that the mutual awareness of the other(s) as agent(s) must “coalesce into a sense of us as possible or actual collective agents” (2002, 104). The presupposition is that each team member conceives of the others as candidates for being a team member in the first place, being able to contribute to some joint action in some way. Only then can a true sense of “we” get off the ground. Searle also uses the notions “communal awareness” and “sense of community” to describe this condition for joint action (ibid.).

We think that Searle makes an important point here that is also central for the present discussion, in particular for our objection formulated in part 1. Being able to recognize someone as being able to collaborate or interact in some way is an act of social understanding. In this sense, the accounts outlined in the first section are circular. But Searle relegates these important conditions to what he calls the “Background” of any representational state and does not elaborate on them any further. According to Searle (1983), Background assumptions and capacities play a foundational role for all representational states and help us to bypass this circularity. Searle rejects the view that a sense of us could be conceived as the result of joint activities. Collective behaviors and communication are not basic enough in order to explain society, since they are intentional and as such they presuppose this pre-existing sense of us in order to function in the right kind of way. It is this point that we also find in Schmid’s analysis. Schmid (2014) is not satisfied with Searle’s account of the conditions for joint action and wants to analyze them further. He picks up Searle’s claim that we need a “sense of us” and ultimately characterizes it as “plural pre-reflective self-awareness”. The next section introduces and examines this account.

3.1 Schmid’s analysis of the sense of us: plural pre-reflective self-awareness

By calling the sense of us a “communal awareness”, Searle indicates that it is supposedly a conscious background condition of collective intentionality. Schmid (2014) does not distinguish between the sense of the other as a candidate for joint action and a sense of us, but concentrates on analyzing the latter. He starts by delineating the realm of systematically possible ways of cashing out the phrase “sense of us”, where the “of” is interpreted in the context of the structure of intentionality (subject, mode, object). Some technical complications arise here since Schmid seems to have changed his mind. While Schmid (2014) rejected the subject- and object-interpretations of “of” in favor of a mode-account, he reconsiders the subject account in a forthcoming paper, arguing that the plural pre-reflective self-awareness is a feature of the group. We only point out these differences but think that they do not matter for our discussion below since we will restrict our comments to conceptual issues pertaining to the notion of plural pre-reflective self-awareness.

The central claim in Schmid’s version of the mode account is that we should acknowledge a basic form of ‘plural pre-reflective self-awareness’, analogous to its individual cousin (see e.g. Sartre 1956; Shoemaker 1968). In the individual case, pre-reflective self-awareness is supposed to be an awareness of a subject as subject, thereby grounding any sense in which a subject can reflect on itself as object. Schmid argues “that plural self-awareness plays the same role between minds as singular self-awareness plays within individual minds. Selfhood does not only come in the singular, but also in the plural” (Schmid 2014, 15). Just as we consider pre-reflective self-awareness to be constitutive of being an individual self and of having a first-person perspective, Schmid argues that we should consider the possibility that such a pre-reflective self-awareness can constitute a collective self and a we-perspective.

The main features of what has been called pre-reflective self-awareness are the following: (a) the immediacy with which one’s conscious experiences present themselves as one’s own experiences is the reason why there is no question whose experience it is that one is undergoing – where this self-awareness does not itself involve any act of identification but in turn grounds any higher-order (reflective) act of self-identification (identifying one’s own hair color, say); (b) this ontological subjectivity of experience constitutes one’s own first-person perspective on the world which precedes any intentional act in which one can reflect on oneself as object; (c) being aware of oneself in this way is the source of commitment to strive for consistency among one’s intentional attitudes and the source of motivation to act in accordance with such commitments. In sum, “self-awareness is being aware of one’s attitudes as one’s own, as attitudes that are one’s own perspective on something, and as one’s own commitments” (Schmid 2014, 17).

Schmid argues that these features translate to an analogous kind of pre-reflective self-awareness for collective attitudes. Regarding the function of ownership of attitudes, when, say, not only you and I, but we believe that going for a walk together is a good idea, plural pre-reflective self-awareness “is the basic way in which those collective intentions and beliefs are transparent to ourselves as ours” (Schmid 2014, 17). This is also supposed to involve the idea of a shared perspective in the sense that we look at things in this way. In virtue of two or more agents having a sense of sharing certain attitudes, they constitute a plural self(-perspective), or so Schmid argues. Such a shared perspective can also explain the normative pressure to agree with each other, as observed in many social interactions. Thus, Schmid concludes, “plural self-awareness is pre-reflective, non-thematic awareness of our attitudes as ours, collectively, in a way that makes the social sharedness of those attitudes phenomenally transparent to us, constitutes a shared perspective, and normatively drives us towards consistency of our attitudes” (Schmid 2014, 18).

3.2 Problems with Schmid’s proposal

Schmid’s analysis faces problems, however. Our central objection to his account is that the notion of plural pre-reflective self-awareness is very problematic. More specifically, a tension arises when we clarify the various elements that feed into it and the roles it is supposed to play. In order to explain this further we will have to point to the important disanalogies between individual and plural pre-reflective self-awareness: As far as common ownership and a shared perspective are concerned, there is an important difference between the individual and the plural case in terms of possible misidentification. While I cannot be mistaken whether it is my feeling of hunger rather than someone else’s, I can very well be mistaken about how the collective is constituted in case I have a plural self-awareness (Evans’ (1983) immunity to error through misidentification). Schmid (2014, 23) agrees that there are such important epistemic differences between the ‘we’ and the ‘I’. A spokesperson of a group may be mistaken about the specific constitution of the group and about their propositional attitudes in a way he cannot be mistaken in the first-person singular case. But then, when Schmid claims that intentions can be transparent to “ourselves as ours” (Schmid 2014, 17) the referent of this phrase is not at all clear.

Schmid’s arguments for the strong thesis that collective intentions can be transparent as ours in the same way that individual intentions can are insufficient. If it is to be an awareness, it will have to be one agent’s awareness since (a) if it is supposed to be shared by two or more agents, then Schmid’s account will ultimately lead into a regress, because it has to explain how this awareness comes to be shared and we are back where we started. Alternatively, (b) if the awareness is supposed to be had by a common subject then Schmid should inform us how we should make sense of the idea that conscious awareness can be a feature of a composite entity and how individuals relate to this group mind and the sense of us pertaining to this group mind. A composite subject simply seems to be the wrong kind of entity when it comes to being the subject of an experience or awareness. Arguably, a sensation like a toothache had by a whole team is difficult to conceive of. Similarly, only an individual can be aware of being a member of a team.

But the awareness in question cannot at the same time be pre-reflective. If it is one agent who has the experience ‘of us’ then such plural self-awareness rests upon a foregoing act of identification in order to determine the group or team in question. But this act of identification of the team members makes plural self-awareness reflective. And if that is the case, the self-awareness in question does not have the same normative force as the individual case either. A member of a congregation having a belief that p not only as her private attitude but having it in a way that “involves her self-understanding as a member of the congregation” (Schmid 2014, 17) clearly seems to involve an act of reflection on her role in the given situation. This becomes clear once we consider the possibility of a conflict between her private attitude towards p and the congregation’s official attitude towards p, which she is supposed to support or represent. Believing that p as a member of a group presupposes an act of identification as a member of that group in contrast to being a member of another, possibly competing group. If there can be plural self-awareness then this cannot be at the same time pre-reflective. It simply does not carry the same authority, normativity and commitment as the first-person singular case.

This raises the important question whether the notion of a plural self-awareness at issue here is consistent if it cannot be pre-reflective. One of the central features of individual pre-reflective self-awareness is its lack of an act of identification with the person in question. Yet, when a group is concerned, identification seems to be necessary such that the resulting self-awareness can no longer be pre-reflective. Schmid holds that this awareness is already constitutive of a plural subject just like the individual pre-reflective self-awareness is constitutive of a single subject. What he has in mind is that the notions of (individual) subject and plural subject do not refer to anything over and above these respective kinds of pre-reflective self-awareness. He does not intend to endorse a more robust or substantial notion of subject. Yet, it seems that simply one agent being pre-reflectively aware that intending to go for a walk is not only their private I-intention but a we-intention is insufficient for the constitution of a plural self or team intending to go for a walk, especially since it is underdetermined to which team the agent in question may belong. In order to be clear about the latter fact, an act of reflection is needed. But such a reflective self-awareness of being a member of a group cannot play the foundational role that Schmid intends plural self-awareness to play. By calling it pre-reflective, Schmid argues, his interpretation of the sense of us is supposed to be closer to Searle’s conception of the sense of us as a non-representational Background condition. This seems right, since Schmid holds that this kind of awareness is not yet intentional because it does not involve a subject-object-structure. But given the objections raised above, Schmid must show how the notion of plural pre-reflective self-awareness is more than an ad hoc construction modeled on the first-person singular case. 

In his development of Searle’s claim that joint actions and intentions presuppose a sense of us, Schmid ignores the difference between the sense of us and the sense of the other as a candidate for cooperation. The latter, also highlighted by Searle, is the systematically prior notion, since arguably, a sense of “us” can only arise on the basis of the recognition that others can or do cooperate or at least interact. As we argued in the first part, it is this latter act of social understanding that is presupposed by the enactive account and the we-mode account. The following section attempts to develop an account of this condition since neither Searle’s nor Schmid’s theories are satisfactory.

4 Social affordances as interaction-oriented representations

The second part of the paper was devoted to demonstrating that certain accounts of social cognition that either rely on interaction or assign a special role to joint action remain problematic since interaction and joint action both presuppose social cognition in a basic sense. The discussion of the third part was concerned with an attempt at providing a foundational explanation of this enabling condition. Yet, not only does Schmid fail in his attempt to analyze a ‘sense of us’ in terms of the notion of plural pre-reflective self-awareness; he also ignores the notion of a sense of the other as a candidate for cooperation and interaction. This is the notion that we address now.

Schmid and Searle intended a non-representational account of the sense of us. An alternative is to stick to the representational theory of mind and aim for a more unified account that does not relegate important mental capacities to an unanalyzed non-intentional Background. We intend to show how we can capture what Searle calls the social presupposition of joint activities without postulating a problematic foundational non-intentional we-consciousness as Schmid suggested. The aim is to develop a representational theory of social cognition and joint action that does not lead to the circularity sketched in the first part of this paper. To this effect we rely on important and familiar recent developments in embodied cognitive science and developmental psychology.

Returning to the influential enactive approaches to cognition, one can see a strong emphasis on the notion of affordances, borrowed from Gibson’s (1979) ecological psychology, in an attempt to provide an anti-representationalist theory of basic forms of cognition (Gallagher 2005). The problem faced by such radical accounts is that they have to bridge a gap between non-representationalist forms of cognition (perception and action presumably) and higher forms of cognition, which are clearly representational, even by their own lights (imagination, memory, linguistic forms of cognition etc.) (Hutto and Myin 2013). Arguably, the notion of affordances does not scale up to these higher forms of cognition. A more unified approach that characterizes cognitive capacities in terms of mental representations is therefore preferable, and we intend to develop such a theoretical alternative to the enactivist analyses of affordances. Yet, as Ramsey (2007) has shown, representationalist approaches face what he calls the job description challenge, i.e. the task of clarifying what function a given representational state exercises in order to justify the introduction of such representational states in a theory of cognition.

4.1 Social perception, affordances, and social affordances

What we need is a characterization of a basic kind of perception that captures the difference between the social and the non-social. A good starting point is the notion of affordances as developed by Gibson (1979). On his view, affordances are relational properties, understood as possibilities for action and perception, arising from the relation between a specific agent (or organism) and the immediate environment. For example, a surface may be horizontal and rigid such as to allow me to walk on it. That makes it ‘walk-on-able’; it may also be ‘sit-on-able’ and ‘stand-on-able’ etc. Another environment may be such that certain creatures can climb up on it, while others cannot. Affordances are thus not fixed properties of objects like size, but emerge in relation to a specific organism and depend on the organism’s sensorimotor repertoire and bodily constitution. The agent’s bodily and sensorimotor capacities together with the properties of the environment determine the affordances offered by the environment to any one particular organism.3

In an important paper, Costall (1995) emphasizes that Gibson’s notion should be socialized. Part of such a socialization of affordances should be the recognition that our environment is often already shaped by cultural practices and other human interventions, often in order to serve specific needs, i.e. in order to afford certain actions. An infant’s primary encounter with the objects in her immediate surroundings always already “involves careful structuring by the parent, through the removal of distractions, presentation of the utensil in the right orientation, and so on. Thus our activity is further canalized not only by the form of the object but also by its socially structured setting” (Costall 1995, 472). Chemero (2003) makes the related point that the possibilities for acting are determined by several environmental factors, including social and cultural features (beyond the obvious purely physical ones), as well as features that are determined by the ‘form of life’ or culture an organism is immersed in.4

But Costall’s arguments seem to be focusing on the social and cultural dimension of the affordances provided by the physical environment, which is itself shaped by human intervention. He did not so much focus on the difference between the physical and social environment itself. Animals, humans in particular, provide a set of affordances to other fellow animals quite distinct from the actions afforded by the physical environment, even though this may be socially modulated.

Social affordances then form an important subset of affordances. During their first months of life, infants interact either with objects or with other people, but only at around the age of nine to twelve months do they engage in activities that involve both an object and another human being. Such activities of joint attention and engagement involve pointing to an object in order to show it to the caregiver, for example, or social referencing, and playing together (Tomasello 1999). When it comes to the interaction with other humans, appropriate actions and reactions involve an infant’s sensitivity to the aspects that distinguish people from objects and to the social meaning of a bodily expression. In this context, the faces, eyes and gaze of others are especially important. Infants have a natural (possibly innate) propensity to distinguish between social and non-social affordances. Newborns show a preference for face-related stimuli (Johnson et al. 1991; Johnson 2005). For example, another’s smile affords sympathetic responses on your part (a smile, say, or later in life a handshake), while aggressive behavior may afford defensive reactions. Social affordances can prompt appropriate actions and reactions in a conversational context, culminating in the maintenance and extension of reciprocal relations. But due to the general flexibility and unpredictability of others in social interaction, cultural differences, and a higher degree of uncertainty, social affordances are richer and more complex than affordances provided by physical objects.

Enactive approaches rely heavily on the notion of affordances in their theories of cognitive phenomena but do not provide much analysis of how these cognitive acts proceed since they do not have at their disposal the notion of a computation or transformation of a mental representation. As a consequence, such approaches often even sound behavioristic when they claim that “experiencing organisms are set up to be set off by certain worldly offerings – that they respond to such offerings in distinctive sensorimotor ways that exhibit a certain minimal kind of directedness and phenomenality” (Hutto and Myin 2013, 19). The resulting picture is that cognition arises from the (unanalyzed notion of) coupling of organism and environment by way of responses to stimuli. A representationalist approach, by contrast, has the advantage of being able to allude to the computation of (different kinds of) mental representations that stand in for aspects of the environment. In the following, we would like to demonstrate how the gist of the notion of an affordance can be retained in a representational approach to (social) cognition.

4.2 Core cognition, social perception, and interaction-oriented representations

In using the notion of a mental representation, we follow Carey (2009, 5) in simply referring to putative “states of the nervous system that have content, that refer to concrete or abstract entities (or even fictional entities), properties, and events”. At this stage, we do not want to commit ourselves to one specific theory of mental representations but simply contend that this notion plays an important and useful explanatory role in a theory of cognitive capacities. We are open to the possibility that mental representations can have different formats, e.g. that they can be conceptual or non-conceptual (Gunther 2003) or even embodied in the sense of representing specific aspects of the body (Goldman and de Vignemont 2009). Mental representations can serve different functions in the cognitive system. Invoking them in a theory of cognition must be justified by providing such a function or functions. Below we say something more about what the relevant subset of representations that we introduce is supposed to do “for” the cognitive system. In this way, we will address Ramsey’s job description challenge in order to justify the representationalist approach against a purely enactive approach. But first, the move from a purely affordance-based approach to a representationalist approach must be motivated and clarified.

Various philosophers have argued for an analysis of affordances in terms of certain kinds of representations. Siegel (2014), for example, has investigated the question whether affordances are represented in perceptual experience. Her focus is on certain experiences “of the environment as compelling you to act in a way that is solicited or afforded by the environment”. In a way, “from your point of view, the environment pulls actions out of you directly…” (Siegel 2014, 40). Contrary to defenders of ecological or enactive approaches, she argues that such experiences represent the relevant affordances even though it seems that the subject feels immediately propelled to act in the given environment. The crucial step in her argument (which cannot be reconsidered here at length) is the claim that such experiences have “accuracy conditions”, i.e. that they can be right or wrong and can only be accurate if these conditions are satisfied. Since phenomenologically, any object X always looks as having property F, she argues, the relevant visual experience of X’s having F is accurate only if X has F (Siegel 2010, 27–76). Applied to experiences involving affordances, part of Siegel’s account is the claim that a soliciting aspect of such experiences could be represented with the content “X is to be phi’d”, as in, say, the stairs are to be climbed, the door is to be opened etc. Important for the present discussion is that Siegel provides good arguments for the claim that even affordances do not provide an exception to the general (traditional) assumption that perceptual experiences represent because they can be accurate or inaccurate. An ecological approach has difficulty capturing this intuition.

Gibson was hostile to the notion of representation, so he did not conceive of affordances in representational terms. Clark (1997) and Millikan (1995, 2004), by contrast, have suggested that affordances can (and should) be characterized in terms of the most basic kind of representations, namely “pushmi-pullyu-representations”, as Millikan calls them. Such representations are special in having not one but two directions of fit (Searle 1983). While beliefs have to match the world in order to be accurate (mind-to-world direction of fit) and intentions must be matched by the (state of the) world in order to be fulfilled (world-to-mind direction of fit), pushmi-pullyu-representations are more basic because they are declarative and directive at the same time: They can be accurate or inaccurate in representing the environment as being such-and-such (like beliefs) while at the same time being fulfilled or unfulfilled depending on whether the action they prescribe is carried out adequately (as is the case with intentions). Millikan (2004, 159) argues that “Gibsonians have generally assumed that if there were such things as inner representations they would have to be things calculated over, vehicles of inference, and hence, that the perception of affordances does not involve inner representations. But inner processes mediating the perception of and responses to Gibsonian affordances would certainly involve pushmi-pullyu representations, these being far more primitive than the representations Gibsonians reject.” If Millikan’s account is right, then pushmi-pullyu representations are widespread in nature. She usually illustrates them with the example of bee dances or other natural signs that are performed purposefully by animals. Because of their lack of cognitive sophistication, they are the right candidates for a representational analysis of affordances.
Clark (1997) calls them “action-oriented representations” and we will follow him in using this terminology: “In representing (…) the environment as such a complex of possibilities, we create inner states that simultaneously describe partial aspects of the world and prescribe possible actions and interventions. (…) they say how the world is and they prescribe a space of adaptive responses.” (Clark 1997, 50) 5

Turning to an analysis of action-oriented representations, these representations have peculiar features that distinguish them from more sophisticated conceptual or propositional representations. First, given the characterization of affordances, they are action-specific, i.e. designed to represent the world in terms of possible actions. Second, they are egocentric, i.e. geared towards possible actions the agent herself can perform right now. Finally, they are highly context-dependent and may lead to relevant kinds of agent-object-coupling (Wheeler 2005, 199). In this sense, they are good candidates for providing an analysis of affordances.

So far, this recapitulation of representationalist analyses of affordances should sound familiar. Now, drawing on Clark, Millikan, and Wheeler’s frameworks, we propose a new move, namely, to translate this interpretation of affordances to an analogous interpretation of social affordances in terms of “interaction-oriented representations”. Herein lies the major difference to enactive accounts of cognition and social cognition (e.g. Gallagher 2005), which follow Gibson by taking the notion of affordance to be non-representational. Based on the close link between action and perception, this proposal introduces a basic distinction between representations that constitute a coupling between an agent and an object on the one hand, and representations that constitute a coupling between an agent and another agent on the other hand. With respect to the problem addressed in this paper, our proposal is that we encounter others (unlike trees etc.) by perceiving (representing) them in terms of social affordances, as affording interaction and collaboration. On a very basic level of action-perception-cycles, we distinguish between objects and agents. More specifically, interaction-oriented representations have the function to signal the presence of a special kind of object (or entity), namely, an object that is at the same time a subject, i.e. an agent being able to reciprocate.

Interaction-oriented representations can be characterized analogously to action-oriented representations, with the important difference that they are not about objects or the physical environment but about other persons or the social environment. First, they are interaction-specific in prescribing (or offering) certain kinds of interaction with the other agent who is perceived as another agent. Second, they are egocentric because they are concerned with possibilities for interaction and joint action provided for the perceiving agent. Finally, they are highly context-dependent since they are concerned with possible interactions between myself and the other agent (or agents) in this given situation. I represent the other agent as affording interaction and joint action such that my representation of the other (a) says something about the world (this is an agent like me), and (b) says something about what I should/could do with them, namely I should/could approach them in order to collaborate or negotiate our roles in a joint activity because they are potential candidates for joint activities.

In order to address what Ramsey (2007) calls the “job description challenge”, we have to say not only something about what these representations do, but also something about their mechanisms. Evolutionarily speaking, being able to draw this distinction between something that can reciprocate and something that cannot is very useful. Our suggestion in this paper is that interaction-oriented representations are basic representations that have the function to inform an organism about the presence of another agent (or agents) with the capacity to reciprocate. This “proper function” of such inner representations can be applied to the mechanisms whose function it is to produce and to use inner representations: Quite generally, “when functioning properly, inner-representation-producing mechanisms produce representations in response to, and appropriate to, situations in which the individual organism finds itself.” (Millikan 1995, 188) In order to reduce uncertainty about the specific features of the immediate environment, it is evolutionarily useful for an organism to be able to draw this distinction on a very basic perceptual level. That is, it should possess innate mechanisms that are sensitive to the basic distinction of being able or not being able to reciprocate. Of course, like all mechanisms, such mechanisms are likely to break down or malfunction under certain circumstances. For example, the famous studies by Heider and Simmel (1944) suggest that people often tacitly attribute agency to too many objects. But in the wild, false positives are much less harmful than false negatives.

If data from developmental psychology are correct, then human children really do possess an innate preference and capacity to process social affordances. This could provide them with the relevant skills to just see when (and that) someone else is a possible candidate for social interaction. In this sense, the problem addressed in this paper can be solved by way of introducing specialized mental representations underlying social perception. Given that we – especially young infants – almost always find ourselves in a social context, this social perception is fostered and enhanced throughout childhood. Social perception enabled by interaction-oriented representations should be seen as a basic capacity that could possibly be enabled by modules in the brain specifically devoted to the task of signaling the presence of possible candidates for social activities.

Developmental psychologists have provided empirical evidence for such special-purpose mechanisms. Here, we will only mention some of the relevant data. As already mentioned above, young infants first perceive their environment in terms of both affordances (physical environment) and social affordances (social environment). In dyadic social relations, young infants demonstrate a high sensitivity for the reciprocity of interactions with the caregiver, and if this reciprocity breaks down, as is the case in the still-face procedure (e.g. Tronick et al. 1978), they use all their bodily abilities to resume the reciprocal interaction. As soon as infants are four months old, proto-conversations with the caregivers tend to become more and more stable and frequent (Rochat et al. 1999). Between nine and twelve months of age, a “revolution” motivates them to engage in much more complex triadic (joint) intentional relations involving objects and fellow humans (Barresi and Moore 1996).6 In this regard, human beings are often taken to be unique in the animal kingdom (Tomasello 1999). Up to this point, they have learned to exploit affordances and social affordances appropriately, but only in isolation. Now, they are in a position to act both in accordance with a physical object affordance and a social affordance at the same time. But these data suggest that from the very start, infants are capable of distinguishing social from physical stimuli. It is thus a very basic cognitive capacity.

A framework from developmental psychology that may not only help us understand how to think about such basic representations but also provides evidence for our claim is based on the central notion of “core cognition” (Spelke 2000; Kinzler and Spelke 2007; Carey 2009). Spelke distinguishes various specialized systems that form part of the foundations of knowledge. Among them there are systems for representing objects (based on principles like cohesion and continuity), numbers (based on principles like imprecision, applicability to diverse entities etc.), and agents (based on principles like self-propelled motion, goal-directedness, and reciprocal interaction). While the number of such systems is as yet unclear, there is evidence that a few such systems form the foundation of knowledge, whereas new skills and concepts emerge from the combination of such systems. Carey distinguishes between several types of core cognition, of which object cognition and agency cognition are the relevant ones for this paper. Children distinguish agents from objects from very early on. They do so based on several factors, such as the (type of) movement and the looks of the entity (see Carey 2009, pp. 163–173 for discussion).

Regarding agent representations, Carey argues that “[…] it would not be surprising that evolution bequeathed us humans with core cognition of agents, agents’ interactions with each other, and agents’ interactions with the physical world, articulated in terms of representations of goals, information, and attentional states” (Carey 2009, p. 157). Agents behave importantly differently from objects and are thus perceived differently, e.g. as being capable “of self-generated motion and of resisting forces acting upon them” (Carey 2009, p. 158). Such self-generated motion can be directed towards another agent and is thus perceived as the possibility of reciprocity.

Kinzler and Spelke (2007, 260) speculate about the possible existence of a system for “identifying and reasoning about potential social partners and social group members” because quite generally, human beings seem to be “predisposed to form and attend to coalitions … whose members show cooperation, reciprocity, and group cohesion”. Whereas this already points in the direction of classifications into “us vs. them”, an important part of such a system would be the recognition of agents as being able to reciprocate, which is what we are looking for.

The upshot of these data and hypotheses is that although the content and format of core representations are debatable, core cognitive mechanisms are inherently representational. Thus, there is a good case to be made for a representationalist theory of a social perception mechanism whose function is to compute information specifically about agents being able to reciprocate. With respect to the problem we addressed in this paper, the circularity besetting enactive approaches and we-mode approaches can be circumvented by invoking specialized perceptual mechanisms informing an agent about the presence of a possible candidate for (social) interaction and joint action. Since this function served by social perception belongs to social understanding, it follows that such approaches cannot fully explain social cognition in terms of joint action or interaction. Joint action and interaction presuppose this capacity to recognize someone as a candidate for such activities.

5 Conclusion

We want to conclude by relating the results of the third and fourth part to the discussion in the second part. The coupling that is presupposed by de Jaegher et al. in their attempt to ground social cognition in interaction can be explained by our approach: On the basis of cognitive processing of social affordances, an agent may actively establish the relevant coupling that can then lead to a reciprocal communication loop (Frith 2007, 175). The coupling can be established simply by exploiting the social affordances – by way of the computation of interaction-oriented representations – emerging from the perceptual encounter with the other. Thus, this way of looking at de Jaegher’s account does provide an explanation (albeit in representational terms) of how the coupling and reciprocity in social interaction can get off the ground.

Gallotti and Frith suggest filling the role that interaction plays in de Jaegher et al.’s account with what they call a we-mode. As we have seen, they seem to suggest that an agent can simply switch from I-mode to we-mode given the situation. But they do not say on what basis an agent may be able to decide in which mode they should approach the environment. The present account decides this question on the level of perceptual experience, which proceeds in terms of either action-oriented or interaction-oriented representations (or both at the same time). These can form the basis for the development of propositional attitudes in the I-mode and we-mode respectively. As we have outlined, the representations underlying the perception of affordances and social affordances are more basic than beliefs and intentions. They have a twofold direction of fit: they describe a state of affairs (the presence of an object or agent, say) and prescribe actions to be performed given that state of affairs (actions directed towards the object or the agent, say). Upon an ensuing interaction, joint action plans may then be formed, culminating either in collaborative or competitive projects. Relevant factors of the situation will partly determine whether the agents remain in I-mode or switch into we-mode. But the present proposal can in this way provide a cognitive foundation for the we-mode, which has not been provided by Gallotti and Frith.

The present account also bypasses the problems that beset Schmid’s proposal of postulating a primitive pre-reflective plural self-awareness, because as we saw, this notion is in danger of being inconsistent. The perceptual basis for joint action and interaction in terms of a perceptual sensitivity to social affordances, analyzed as interaction-oriented representations, does not yet amount to an awareness of plural self-consciousness. Since such awareness must arguably be reflective, it depends on the perceptual base developed here. Another agent is perceived as someone with the capacity to reciprocate. That (among other features) distinguishes agents importantly from mere physical objects. Even newborns seem to be able to draw an immediate distinction between physical and social objects (i.e. agents). Understood in this way, social perception even provides the basis for the plural awareness that Schmid is aiming at, since it does not yet amount to a we-perspective itself but only enables its emergence. This kind of social perception does not presuppose yet another act of social cognition, for, arguably, it constitutes the most basic kind of social cognition. Nor does it presuppose any joint activity (like the constitution of a team or joint deliberation to act jointly). In this way, it is a more promising route to solving the problem formulated in the second part of the paper. Such a new account should be developed in more detail and in relation to other minimal approaches and aspects of social cognition, such as Butterfill and Apperly’s (2013) ideas on minimal mindreading, and Tollefsen and Dale’s (2011) ideas on entrainment and alignment as a basic element in joint action.7


  1.

    An anonymous reviewer doubted that our interpretation of de Jaegher et al.’s interactionism in terms of joint action is adequate. The main reason for this doubt seems to be the differences between social interaction and participatory sense-making on the one hand and paradigm cases of collective intentionality on the other (as analyzed by Searle, Bratman, and others). Yet, it is far from clear that these cognitively sophisticated examples rule out more primitive forms of joint cognitive processes (Tollefsen & Dale 2011) and that cognitive processes, in order to count as joint, must amount to collective intentionality in the full-fledged sense. Moreover, it is not clear what else this strong anti-individualism, which is important for the purposes of this paper, should amount to. De Jaegher et al. always put this forward in conjunction with an emphasis on the joint and active character of social understanding. In another passage, de Jaegher et al. (2016) write, with a focus on social neuroscience: “The hypothesis we discuss here concerns an occurrent instance of social cognition. We claim that a normal adult human brain in isolation is insufficient for a typical instance of social cognition.”

  2.

    De Jaegher and Di Paolo do not make clear in what sense their social enactivism is autopoietic. The feature of autopoiesis seems to be difficult to translate into the social domain since it refers to a process of self-production, but neither can the agents be said to produce themselves, nor can we consider the interaction itself to produce itself. While it is clear what the autopoietic elements in biological organisms are, this is not so clear in the case of social phenomena. As far as we know, the only account attempting to translate the notion of autopoiesis literally to social phenomena is Niklas Luhmann’s social autopoiesis (Luhmann 2008). But on this view, a) systems are importantly isolated, and b) Luhmann considers the elements of social systems not to be agents but communications, independent from agents. Therefore, this view cannot be of any use for de Jaegher and Di Paolo’s enactivism.

  3.

    It should be noted that the increased interest in the notion of affordances has produced a number of different interpretations of the notion (e.g. Turvey 1992; Reed 1996; Chemero 2009; Rietveld and Kiverstein 2014). Yet, over and above the minimal definition given in the text, we do not intend to commit ourselves to any particular account of affordances. Where our interpretation differs from the mainstream usage in ecological psychology is that we aim at providing a representationalist account of affordances, and of social affordances in particular.

  4.

    A focus on the social dimension of affordances is not new. Recently, Abramova and Slors (2015) presented their idea on direct social perception in terms of affordances. Their view, however, differs in an important way from ours. They argue that coordination happens as a result when the actions of and/or affordances for one agent shape the field of affordances for another agent. But this is only the case given a shared intention. This means that for Abramova and Slors affordances play a role once two agents are involved in a (social) interaction. The affordance is about seeing the other in their world-context. Although this is obviously important, and might be compatible with our ideas on social affordances, Abramova and Slors are not looking for an understanding of the ‘sense of us’ in any way.

  5.

    Jacob and Jeannerod (2003, 180ff) provide a further, different representational analysis of affordances. They point to the fact that Gibson did not consider the bifurcation of functions for vision: in addition to serving visual perception, vision can be for action. They argue that the latter should be associated with affordances while the former should not.

  6.

    Raczaszek-Leonardi, Nomikou, and Rohlfing (2013) propose that early forms of intentionality arise from “initial, perhaps automatic, ‘moving with others’, which is sculpted in multiple social episodes to become ‘acting with others’” (Raczaszek-Leonardi et al. 2013, 211). That is, the structure of activities emerges (is made evident) through repetitive behaviors by the caretakers. From repetition expectations may emerge enabling conventionalized behaviors. The main mechanism for this shaping is the education of action-perception-cycles to enable the child to pick up and create interactive affordances.

  7.

    The authors' project on Situated Cognition is generously funded by the Volkswagen Foundation.


  1. Apperly, I., & Butterfill, S. A. (2009). Do humans have two systems to track beliefs and belief-like states? Psychological Review, 116(4), 953.
  2. Barresi, J., & Moore, C. (1996). Intentional relations and social understanding. Behavioral and Brain Sciences, 19, 107–122.
  3. Butterfill, S. A. (2013). Interacting mindreaders. Philosophical Studies, 165(3), 841–863.
  4. Butterfill, S. A., & Apperly, I. A. (2013). How to construct a minimal theory of mind. Mind and Language, 28(5), 606–637.
  5. Carey, S. (2009). The origin of concepts. Oxford: Oxford University Press.
  6. Chemero, A. (2003). An outline of a theory of affordances. Ecological Psychology, 15(2), 181–195.
  7. Chemero, A. (2009). Radical embodied cognitive science. Cambridge, Mass.: MIT Press.
  8. Clark, A. (1997). Being there: Putting brain, body, and world together again. Cambridge, Mass.: MIT Press.
  9. Costall, A. (1995). Socializing affordances. Theory & Psychology, 5(4), 467–481.
  10. Csibra, G., & Gergely, G. (2009). Natural pedagogy. Trends in Cognitive Sciences, 13(4), 148–153.
  11. de Jaegher, H., & Di Paolo, E. (2007). Participatory sense-making. Phenomenology and the Cognitive Sciences, 6(4), 485–507.
  12. de Jaegher, H., Di Paolo, E., & Gallagher, S. (2010). Can social interaction constitute social cognition? Trends in Cognitive Sciences, 14(10), 441–447.
  13. Eilan, N. (ed.) (2014). The Second Person. Special Issue of Philosophical Explorations 17(3).
  14. Frith, C. D. (2007). Making up the mind: How the brain creates our mental world. Oxford: Blackwell.
  15. Frith, C. D., & Frith, U. (2012). Mechanisms of social cognition. Annual Review of Psychology, 287–313.
  16. Gallagher, S. (2001). The practice of mind: Theory, simulation or primary interaction? Journal of Consciousness Studies, 8(5–7), 83–108.
  17. Gallagher, S. (2005). How the body shapes the mind. Oxford: Oxford University Press.
  18. Gallotti, M., & Frith, C. (2013). Social cognition in the we-mode. Trends in Cognitive Sciences, 17(4), 160–165.
  19. Gibson, J. J. (1979). The ecological approach to visual perception. Houghton Mifflin.
  20. Goldman, A. (2006). Simulating minds: The philosophy, psychology, and neuroscience of mindreading. Oxford: Oxford University Press.
  21. Goldman, A., & de Vignemont, F. (2009). Is social cognition embodied? Trends in Cognitive Sciences, 15(10), 154–159.
  22. Gopnik, A., & Wellman, H. M. (1992). Why the child’s theory of mind really is a theory. Mind and Language, 7(1–2), 145–71.
  23. Gordon, R. M. (1986). Folk psychology as simulation. Mind and Language, 1(2), 158–71.
  24. Gunther, Y. (ed.) (2003). Essays on nonconceptual content. Cambridge: MIT Press.
  25. Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. The American Journal of Psychology, 57(2), 243–259.
  26. Hutto, D. D. (2004). The limits of spectatorial folk psychology. Mind and Language, 19(5), 548–73.
  27. Hutto, D. D., & Myin, E. (2013). Radicalizing enactivism. Cambridge, Mass.: MIT Press.
  28. Jaegher, H. de, Di Paolo, E., & Adolphs, R. (2016). What does the interactive brain hypothesis mean for social neuroscience. A dialogue. Phil. Trans. Royal Society B 371, 20150379 (manuscript).
  29. Johnson, M. H. (2005). Subcortical face processing. Nature Reviews Neuroscience, 6(10), 766–774.
  30. Johnson, M. H., Dziurawiec, S., Ellis, H., & Morton, J. (1991). Newborns’ preferential tracking of face-like stimuli and its subsequent decline. Cognition, 40(1–2), 1–19.
  31. Kinzler, K., & Spelke, E. (2007). Core systems in human cognition. Progress in Brain Research, 164, 257–264.
  32. Luhmann, N. (2008). The autopoiesis of social systems. Journal of Sociocybernetics, 6(2), 84–95.
  33. Meltzoff, A. N. (2007). “Like me”: A foundation for social cognition. Developmental Science, 10(1), 126–134.
  34. Millikan, R. G. (1995). Pushmi-pullyu representations. In: Philosophical Perspectives 9: AI, connectionism, and philosophical psychology, 185–200.
  35. Millikan, R. G. (2004). Varieties of meaning. Cambridge, Mass.: MIT Press.
  36. Noë, A. (2004). Action in perception. Cambridge, Mass.: MIT Press.
  37. Overgaard, S., & Michael, J. (2013). The interactive turn in social cognition research: A critique. Philosophical Psychology. doi: 10.1080/09515089.2013.827109.
  38. Raczaszek-Leonardi, J., Nomikou, I., & Rohlfing, K. J. (2013). Young children's dialogical actions: The beginnings of purposeful intersubjectivity. IEEE Transactions on Autonomous Mental Development, 5(3), 210–221.
  39. Ramsey, W. M. (2007). Representation reconsidered. Cambridge: Cambridge University Press.
  40. Reddy, V. (2008). How infants know minds. Cambridge, Mass.: Harvard University Press.
  41. Reed, E. S. (1996). Encountering the world. New York: Oxford University Press.
  42. Rietveld, E., & Kiverstein, J. (2014). A rich landscape of affordances. Ecological Psychology, 26(4), 325–352.
  43. Rochat, P., Querido, J. G., & Striano, T. (1999). Emerging sensitivity to the timing and structure of protoconversation in early infancy. Developmental Psychology, 35(4), 950–957.
  44. Sartre, J.-P. (1956). Being and nothingness. London: Routledge.
  45. Schilbach, L., Timmermans, B., Reddy, V., Costall, A., Bente, G., Schlicht, T., & Vogeley, K. (2013). Toward a second-person neuroscience. Behavioral and Brain Sciences, 36(4), 393–414.
  46. Schmid, H. B. (2014). Plural self-awareness. Phenomenology and the Cognitive Sciences, 13(1), 7–24.
  47. Searle, J. R. (1983). Intentionality: An essay in the philosophy of mind. Cambridge, Mass.: MIT Press.
  48. Searle, J. R. (2002). Collective intentions and actions. In Consciousness and language (pp. 90–105). Cambridge: Cambridge University Press.
  49. Shoemaker, S. (1968). Self-reference and self-awareness. Journal of Philosophy, 65, 555–567.
  50. Siegel, S. (2010). The contents of visual experience. Oxford: Oxford University Press.
  51. Siegel, S. (2014). Affordances and the content of perception. In B. Brogaard (Ed.), Does perception have content? Oxford: Oxford University Press.
  52. Spelke, E. (2000). Core knowledge. American Psychologist, 55, 1233–1243.
  53. Thompson, E. (2007). Mind in life: Biology, phenomenology, and the sciences of mind. Harvard: Harvard University Press.
  54. Timmermans, B., Schlicht, T., & Schilbach, L. (2013). Social interaction builds the we-mode. Comment on: Gallotti & Frith, Social cognition in the we-mode. Trends in Cognitive Sciences, 17(4), 160–165.
  55. Tollefsen, D., & Dale, R. (2011). Naturalizing joint action: A process-based approach. Philosophical Psychology, 25(3), 385–407.
  56. Tomasello, M. (1999). The cultural origins of human cognition. Harvard: Harvard University Press.
  57. Tronick, E., Als, H., Adamson, S., & Brazelton, B. (1978). The infant’s response to entrapment between contradictory messages in face-to-face interaction. Journal of the American Academy of Child Psychiatry, 17, 1–13.
  58. Turvey, M. T. (1992). Affordances and prospective control: An outline of the ontology. Ecological Psychology, 4(3), 173–187.
  59. Wheeler, M. (2005). Reconstructing the cognitive world: The next step. Cambridge, Mass.: MIT Press.
  60. Zahavi, D. (2014). Self and other: Exploring subjectivity, empathy, and shame. Oxford: Oxford University Press.

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Institute for Philosophy II, Ruhr-Universität Bochum, Bochum, Germany
