The overall topic of this article is the question of how we should understand everyday collective attitudes, for instance, the intention of two persons to go for a walk together or the intention to collectively paint the walls of a flat. Discussions concerning collective attitudes have so far for the most part concentrated on the structure of full-fledged or explicit collective attitudes and collective intentional states as typically expressed in sentences like “We will go for a walk later on”. Accordingly, authors have presented analyses of the content (e.g., Bratman 1999), or the mode (e.g., Tuomela 2007, 2013), or the subject (e.g., Gilbert 2006, 2013) of such collective attitudes. What is still neglected is an exploration of the underpinnings of everyday collective attitudes, especially an investigation of their experiential and social-cognitive underpinnings.Footnote 1 I take such experiential features to be one of the different mental features necessary for collective attitudes, such as (mutual) beliefs and goals, desires, commitment and imagination (of who takes part in the attitude in question). The thesis I will argue for is that explicit collective attitudes are grounded in a certain experiential state, which I will call a “sense of us”—a term that was first brought up by John Searle (1990). More specifically, I will show in what way such an experiential state pre-structures explicit collective attitudes such that it is necessary for those collective attitudes to obtain. In support of my thesis I will explore some relevant experiential phenomena such as social micro-interaction and forms of nonverbal behavior. 
My thoughts will be based on the general claim that for many types of mental phenomena it is theoretically valuable to assume different degrees: in terms of the way they are present to a subject, whether they are implicit or explicit, and in terms of their structure, whether they are pre-reflective or reflective, pre-conceptual or conceptual.Footnote 2

My argument is structured as follows: (1) After some preliminary clarifications, (2) I will motivate my thoughts by giving an argument for the theoretical relevance of taking experiential processes into account when exploring collective attitudes. (3) I will then outline what I call a “minimal sense of us” by identifying core prerequisites supported by some empirical findings from psychology and cognitive neuroscience. (4) In the last part I will turn to some potential disturbances of the sense of us due to implicit stereotypes. The discussion of the effects implicit stereotypes have on emergent collective attitudes is intended to indirectly demonstrate the grounding role of the sense of us for explicit collective attitudes.

1 Preliminary clarifications

The target phenomena of the following analysis are experiential underpinnings of everyday collective attitudes, which may be expressed in sentences such as: “We intend to go to the cinema”, “We will paint the house” and “We agree upon p”. One important constraint to the target phenomenon is that I shall only refer to small and informal groups and their collective attitudes. I thereby would like to exclude organized groups such as companies or political parties, which, due to their own complex structure, raise further questions which I will not address here. A further constraint on the target phenomenon is that the persons in question, the potential subjects of a collective attitude, have to be present at the same time t and at the same location l.Footnote 3

Entertaining a collective attitude implies a collective perspective, which I will term “we-perspective”. The term brings to the forefront several important properties. Literally speaking, having an (individual) perspective or a point of view means having access to an object from a certain location. However, in everyday use, “perspective” more often refers to a relation a person bears towards an object or a state of affairs regardless of her location (Campos and Gutiérrez 2015).Footnote 4 The term is thus taken in the sense of a relation of intentionality or, more narrowly, of a relation to a proposition. This description is still inaccurate, however, since “perspective”, unlike “relation of intentionality” or “propositional attitude”, puts emphasis on the particular relation; we normally do not use the term “perspective” to point to a propositional attitude such as a belief. Rather, it serves to explicitly refer to the relation a person or oneself bears to an object or state of affairs. Now a “we-perspective” is relational in various respects: by adopting a we-perspective two or more persons bear a relation to an object or a state of affairs as well as to each other. The latter is not necessarily captured by the term “collective perspective”, for that term may be correctly applied to situations in which, for instance, the attention of several persons walking down the street is suddenly drawn to an accident just happening. The persons are collectively observing what is taking place without necessarily being aware of each other doing the same thing. Having a we-perspective, on the other hand, means being specifically related to the same object as well as to (at least) one other person. This description echoes what is standardly understood as “joint attention”.
Two or more persons are said to be engaged in joint attention if they are mutually aware of the other’s attention to the same outside object (e.g., Tomasello 1995; Eilan 2005).Footnote 5 This triadic attentional engagement with a clear focus is a nice example of how a we-perspective is instantiated. The agents’ perspective involves a two-fold relation: to an object and to each other. Note, however, that the present notion of a we-perspective is intended to be broader for it is supposed to be applied to various types of collective intentional states such as to collective intentions, desires, thoughts and beliefs.Footnote 6 And, as mentioned above, I will focus on the experiential underpinnings of such a we-perspective.

How should one understand “experiential underpinnings” of a we-perspective, and what is their theoretical status? By “underpinnings” I mean “precursors” of a collective attitude instantiating a we-perspective. This term is yet highly ambiguous: it may, for instance, be understood in a phylogenetic way and would then refer to evolutionary developmental stages of collective behavior (e.g., Tomasello 2014). Moreover, “precursor” may as well be understood in an ontogenetic way thereby taking individual developmental stages into account (e.g., Brinck et al. 2017; Michael and Székely 2018). Furthermore, the term might also be taken in a genealogical way such that a collective intentional state results (or may result) from an experiential state in a given situation. This is one aspect of the meaning I will make use of in the following. Importantly, however, I will understand this relation also to mean that the experiential state has a (pre-)structure that is conceptually necessary for the intentional state. Due to this structure the experiential state, then, is a mental feature in virtue of which a collective attitude obtains. The sense of “precursor” I will make use of in the following comprises these two aspects (genealogical and conceptual).

Note that the overall aim of this paper is to show in what sense experiential states can be “underpinnings” or “precursors” of collective intentional states. It is therefore necessary to establish a hypothesis about this very relation, which will structure my argument: “precursor of” is a two-place relation expressing an asymmetric dependence between X and Y (where X is the precursor of Y). Broadly construed, it is a grounding relation such that X grounds Y when Y obtains in virtue of X (e.g., Trogdon 2013). Accordingly, X and Y are, by hypothesis, different mental states or different structures of a mental state: Y is to be understood as a thought; it is thus an intentional state directed at an object and having a semantically evaluable content, and it is typically conscious. X, by contrast, is an experiential state with a phenomenal content and with what I will call a “pre-intentional” structure; it is often not in the focus of attention (it may, however, be peripherally conscious and eventually become more salient), and it is often automatically activated. It is important to note that by “pre-intentional” I mean that the state in question, whose content is phenomenal, also exhibits a certain structure. It is a structure that I would like to describe, in a preliminary sense, as weakly or tacitly intentional, in contrast to the structure of a full-fledged intentional state. As such it is not “non-intentional”.Footnote 7 What precisely this means should become clearer in Sect. 3. At this point, however, it is worth emphasizing that this description allows us to conceive of phenomenal, experiential states as representational states.Footnote 8

Why should such a grounding relation hold between some experiential state having phenomenal content and full-fledged intentional states having semantically evaluable content? My claim is further motivated by the general view according to which intentional states are, at least in part, constituted by phenomenal states (e.g., Searle 1991; Woodward 2016; Kriegel 2013). For instance, according to this view, a perceptual intentional state of green leaves is constituted by a phenomenal experience of green color and shapes of leaf. The perceptual intentional state has its content in virtue of a phenomenal state, that is, according to the above considerations, in virtue of its phenomenal pre-intentional content. The perceptual intentional state thus results from and is constituted by the experiential state. Approaches to phenomenal intentionality are typically distinguished according to strong, moderate and weak claims about phenomenal intentionality (Bourget and Mendelovici 2019). In the following, I will make use of the moderate thesis, according to which intentional states are at least partly grounded in phenomenal states (proponents of this view include, e.g., Bourget 2010; Kriegel 2013). My aim is to investigate whether this idea can rightly be applied to the case of some certain experiential state, a “sense of us” (as I will call it), and explicit collective intentional states. This would then mean that the experiential state in question has to be taken into account for an adequate understanding of collective intentionality.

In the next section, I will further motivate my thoughts by pointing to theoretical problems that may occur when experiential underpinnings or precursors of explicit intentional states are not considered.

2 Theoretical relevance of experiential precursors of collective intentional states

Proponents of the approaches to collective intentionality that predominate in the current debate might at this point already ask why one should bother to look at experiential underpinnings of collective intentionality in the first place. As they are interested in the question of what features are responsible for the ‘collectivity’ of intentional states, a detailed analysis of those states is all that is needed, or so they could argue. I will show that this is wrong, using Margaret Gilbert’s account of collective intentionality as an example. My argument may be transferred to other accounts which, like Gilbert’s, also focus on an analysis of full-fledged collective intentional attitudes typically expressed in sentences such as “We’ll meet in the café later on”.

Gilbert (2006, 2013) famously argues that attitudes are collective in virtue of a plural subject by which those attitudes are held. Plural subjects consist of individual subjects being specifically related to each other. This specific relation between subjects has to be created or “brought about” by what Gilbert terms “joint commitment” to φ. According to Gilbert, “[t]he parties jointly commit to φ as a body” (2006, p. 137), by which she means that individual subjects thereby express their willingness to jointly φ, to jointly attain a certain goal. Plural subjects only exist in virtue of conscious acts of the individuals who are part of the group as plural subject. And this act is typically brought about in the form of verbal or nonverbal communication. This description, however, seems problematic in various respects. The first problem comes into view when we look at Gilbert’s approach from a conceptual perspective.Footnote 9 According to Gilbert, plural subjecthood is constitutive of collective intentionality, and joint commitment is a conceptually necessary condition for plural subjecthood. The concept of joint commitment, however, seems to already presuppose collective intentionality: expressing the willingness to jointly commit to φ already requires some kind of collective attitude and joint commitment on the part of the persons who perform this very speech act. More specifically, from a pragmatist perspective jointly shared meaning entails shared communicative aims and interrelated intentions that interlocutors are jointly committed to. This suggests that Gilbert’s account is circular (see Tollefsen 2002 for a related critique).Footnote 10 In order to avoid circularity, a notion of collectivity is needed which can be shown to be constitutive of collective intentionality and which does not itself presuppose collective intentional attitudes.
I will argue that the minimal sense of us (to be further specified) qualifies as such a notion so that a non-circular account of collective intentionality is possible.

A further and related problem in Gilbert’s account has been raised by Schmid (2014). The problem particularly concerns the process of forming a plural subject by joint commitment. An act of joint commitment paradigmatically consists in an exchange of words or meaningful glances, by which the communicating partners commit themselves to, say, a common goal; they establish a “we”. However, it seems that a “we” needs to be already in place in this very act of communication, that is, in the act of joint commitment. I would like to specify the problem as follows: if the function of joint commitment is to fix the referent of “we” (for instance, in the expression “We will go for a walk later on”), the act of joint commitment (“Let us go for a walk”, “Okay, let us do that”) already requires determining who is the referent of “us”. In order to achieve this, another act of joint commitment would have to be carried out—and so forth. So if this is correct then the approach also faces the danger of an infinite regress. Is there a way of stopping this? I will argue that this problem can be solved if one can bring into view a form of experiential collectivity, which is (in the genealogical sense of “precursor”) appropriately related to collective attitudes. It should be noted that the analysis I will suggest can—at least in this respect—be seen as a coherent extension of Gilbert’s approach. Gilbert at some point writes that “Readiness for a given joint commitment may gradually emerge over time” (Gilbert 2006, p. 140). Unfortunately, she doesn’t say anything further about this. Yet it seems that a detailed account of what this actually means may be able to solve the problem of infinite regress. Importantly, however, things are a bit different concerning the circularity charge I raised above. My argument challenges the concept of plural subjecthood in Gilbert’s approach. 
It conceptually requires a notion of collectivity, which does not presuppose full-fledged collective intentionality and plural subjecthood. And the notion of what I call a “minimal sense of us” may provide the necessary equipment (in the conceptual sense of “precursor”).

3 Minimal sense of us

I will argue that a relevant precursor, which is constitutive of collective attitudes and explicit joint commitment, is a “minimal sense of us”, as I will call it. Few existing approaches have addressed related phenomena. John Searle introduced the term “sense of us”, by which he means “a sense of ‘the other’ as an actual or potential agent like oneself in cooperative activities” (Searle 1990). Unfortunately, Searle leaves it at this rather vague description. Interestingly, however, he considers the sense of us to be part of what he calls a “Background”, in which he includes kinds of non-intentional phenomena such as certain abilities and dispositions. According to Searle, these are necessary for intentional states to function: intentional states only determine conditions of satisfaction relative to non-intentional phenomena. While I agree with Searle that the sense of us (to be further specified) is a necessary condition for collective intentional states, I disagree with him that the sense of us is void of any intentional structure, even a weak oneFootnote 11; the following analysis will make this clear.

Hans Bernhard Schmid has brought the notion of “plural self-awareness” into the discussion (Schmid 2014). By this notion Schmid means a primitive plural mental state held by two or more persons. The approach is construed in close analogy to an account of individual pre-reflective self-consciousness. Schmid’s approach has been criticized on various grounds and has been shown to be problematic (see, e.g., Crone 2018; Martens 2018).

Schmitz (2018) introduces an account of collective intentionality in terms of representing others as co-subjects. According to Schmitz, this awareness of co-subjectivity—considered as the mode of collective intentionality—constitutes collectives. Schmitz identifies a nonconceptual form of collective self-awareness in joint attention and action which he distinguishes from a conceptual form of collective self-representation effective in collective propositional attitudes. While I agree with Schmitz’s analysis of nonconceptual collective self-awareness in some parts, my argument goes a different way: first, I believe his qualification of collective self-awareness as “nonconceptual” to be too strong as this seems to exclude the possibility that collective self-awareness bears a minimal conceptual structure. A given mental state bears such a minimal conceptual structure if it presents the world (including ourselves) a certain way and is sensitive to concept application. In my view, it seems adequate to assume such a minimal conceptual structure in collective self-awareness. Moreover, unlike Schmitz, I will try to show that an experiential form of collective self-awareness is constitutive of collective intentionality in a broader sense, that is, also of collective propositional attitudes. My objective in the following is to unearth conceptual relations between different layers or levels of collective intentionality: between the level of an experiential phenomenon and the level of full-fledged intentionality.Footnote 12

Other approaches focus in particular on the specific phenomenology of joint action. Pacherie (2013), for instance, explores the difference between the agency one experiences in individual actions and the sense of agency in joint actions. Based on different types of action she identifies mechanisms of action specification and control as being responsible for the sense of agency. Pacherie is thus concerned with a specific experiential property accompanying action, which in most cases already presupposes a collective intention and a shared goal on the part of the agents. The same applies to Butterfill’s (2018) approach, in which he identifies mechanisms underlying coordination in joint action. My argument, by contrast, concerns the question of how a specific collective experiential state must be structured in order to be (amongst other conditions) constitutive of collective attitudes including joint intentions.Footnote 13 The approach I will argue for is thus also different from the account of what John Michael, Natalie Sebanz and Günther Knoblich call a “sense of commitment” in recurring instances of joint action. They argue that in such cases agents are collectively committed to φ without requiring a renewed explicit agreement (Michael et al. 2016). The present approach instead attempts to capture experiential grounds of a we-perspective, which is required for a range of collective attitudes including, e.g., holding and expressing collective beliefs or agreements on, say, a certain proposition as well as collective intentions to φ.

In view of this, an analysis of the nature and structure of what I call a “minimal sense of us” and its relation to collective attitudes is needed. In order to do so, I will proceed as follows: I will first give a conceptual description by outlining adequacy requirements a sense of us must meet for being constitutive of a we-perspective—a perspective instantiated in full-fledged collective intentional states expressed in sentences such as “We will go to the cinema tonight”. In a second step, I will further analyze the sense of us with the help of empirical data from social and cognitive psychology and social neuroscience in order to demonstrate its structural relevance for collective attitudes.

Broadly speaking, by a minimal sense of us I mean a certain mental state or structure of a mental state held by individuals who are specifically related to each other. It means to experience and thereby minimally represent another person (or more than one person) as a co-subject of a collective attitude and a (potential or actual) partner for cooperation. The function of the sense of us is, by hypothesis, to pre-structure collective intentional relations to others and to the world, which is why I call it “pre-intentional” (and not non-intentional; see Sect. 1).Footnote 14 To say that the sense of us “pre-structures” collective intentional states means that it puts an agent in a position to utter “we” (followed by, e.g., “intend to φ”; “agree upon p” etc.) as it fixes who is included in the “we”. An adequate description of the sense of us, then, must meet the following requirements: in terms of structure, it is an experiential state having a phenomenal content in virtue of which collective intentional states obtain. This means, first, that it feels a certain way to be specifically related to others when a collective attitude occurs. Moreover, in order for the sense of us to be adequately related to collective intentional attitudes it must represent an intersubjective relation, a form of coordination, and it must comprise a motivational component. Furthermore, the sense of us is typically not in the focus of attention. It may nevertheless become salient depending on, for instance, situational factors.

I take these adequacy requirements to be intuitively plausible in order to make sense of the idea that there is a mental state or a structure of a mental state in which a collective attitude is grounded. The sense of us must itself exhibit a certain minimal structure in order to be shown to be constitutive of a full-fledged structured collective intentional state. More precisely, this means that full-fledged collective intentional states are (at least partly) determined by the sense of us. Plausibly, the sense of us, then, features in the content of a full-fledged collective attitude such that implicitly collective self-related information is part of the content (for instance, tacit information about who takes part in the “we” and about each one’s spatial position fixing a shared point of view).Footnote 15 Depending on the context, an agent may make this part of the content explicit, for instance, by uttering “we”.

Building on the requirements previously outlined I will now investigate the sense of us in more detail. The analysis will draw on findings from psychology and cognitive neuroscience. This will be necessary in order to demonstrate the specific features of the sense of us required for its role as a precursor that is constitutive of collective attitudes. I will argue that for a sense of us at least three conditions must be in place, each of which will be further specified below: (1) basic intersubjectivity, (2) micro-interaction, and (3) a feeling of binding. Beginning with the broadest, the conditions become increasingly restrictive.

3.1 Basic intersubjectivity

Establishing a we-perspective depends, very generally, on an awareness of others as sentient beings. This sort of awareness of others doesn’t require any conscious thought or higher cognitive functions such as inferential reasoning. In line with our everyday experience we are very often only peripherally aware of other beings around us. For instance, walking down the street one is surrounded by many different types of things, sentient beings like persons and dogs and other things such as trees and cars. Usually, one doesn’t particularly pay attention to other pedestrians nearby. And yet one is still aware of their presence.Footnote 16

In order to get a better understanding of this first condition it is important to note that the awareness of others depends on a tacit distinction between oneself and other sentient beings, and more generally, on a distinction between sentient beings and other things. According to findings from developmental psychology, very young infants are already able to differentiate between self and other. Various studies provide evidence that newborns are capable of rudimentary differentiation between self and non-self (both inanimate and living objects) (e.g., Baldwin and Baird 2001). Infants are, of course, cognitively not yet able to understand or conceptualize this difference. The findings, however, suggest that they are at least tacitly aware of the difference between self-related and other-related information. A case in point is a task called “invisible imitation” (Gallagher and Meltzoff 1996). Infants are demonstrably able to imitate body movements and gestures of others while their own corresponding bodily parts are invisible to them (for instance, parts of their face when imitating a smile or grimace). During imitation infants clearly relate visible gestures of others to their own body, and this reveals that infants possess a body schema (a system of motor functions that operates below the level of self-referential intentionality)—along with some perceptual elements of a body image to which they can relate their own movements. The fact that young infants are already able to differentiate between oneself and others seems to explain why this distinction is so natural to (adult) persons and why this normally does not require any cognitive effort.Footnote 17

Furthermore, that this distinction is usually in place is revealed by the fact that, under normal conditions, the behavior of sentient beings is—compared to that of inanimate things—less predictable to agents. Their behavior appears to be more flexible and less foreseeable (e.g., Vogeley 2017). Moreover, to be peripherally aware of other sentient beings around oneself is disclosed in the readiness to interact with them—as opposed to, say, inanimate objects that may be present, too. We wouldn’t be surprised if a pedestrian we didn’t consciously take notice of approached us and started speaking to us. Apparently, we are prepared to interact with others.

However, this basic intersubjectivity, this tacit awareness of others, captures only a very broad condition of the sense of us outlined above. Clearly, we are around other sentient beings most of the time without being specifically related to them. Basic intersubjectivity is too weak to give us a we-perspective with those of whose presence we are just tacitly aware.Footnote 18 It is nevertheless a necessary condition for different sorts of social interaction, including adopting a collective attitude to φ. What is needed, then, are further conditions which themselves imply basic intersubjectivity.

3.2 Micro-interaction

Having a sense of us—being aware of others as potential cooperative partners—further depends on a more or less successful interaction with others which gives a mutual awareness of each other. This requires an exchange of appropriate information based on a direct (or adequately mediated) encounter. Relevant to the present analysis are forms of nonverbal interaction based on an exchange of nonverbal social cues: for instance, via eye contact, facial expressions, gestures, bodily posture and voice.Footnote 19 Importantly, an exchange of social cues underlies and guides different types of social interaction such as conversations and joint actions. Such para-linguistic signs contain evaluative information about the other person, and this has an impact on the quality of the respective contact. For example, a facial expression may reflect a subtle emotional state a person is in thereby signaling an evaluation of the particular situation. Empirical findings from social neuroscience moreover show that nonverbal social cues are often produced and decoded in an automatic way. This is due to fast information processing, which agents are usually not aware of (e.g., Schilbach et al. 2013; Vogeley 2017). According to these findings, social interaction based on an exchange of nonverbal cues makes it easier for interacting partners to become aware of each other’s mental states and to interpret and predict each other’s behavior. This type of nonverbal interaction has therefore a guiding function for social cognition. And it seems highly plausible that this, in turn, makes it easier to adopt a we-perspective, for instance, to agree upon a proposition or a common goal to collectively achieve.

Furthermore, evidence suggests that in nonverbal interaction, for example during eye contact, the reward system becomes active (e.g., Pfeiffer et al. 2014). This is interesting because it shows that interaction, and likely the prospect of cooperating (in a broad sense), is evaluated as positive: nonverbal interaction seems to come along with an implicit motivation to cooperate with others.

However, micro-interaction as just outlined is non-committal and loose and thus does not yet capture what is needed for a sense of us. Notice that micro-interaction very likely occurs quite frequently in everyday life: entering a café and sitting down or getting on a bus full of people probably cannot happen without catching the gaze of strangers and mutually becoming aware of each other’s gestures and bodily movements. Yet it would be inadequate to say that this is sufficient for a sense of us. Therefore, a further condition is needed, which is clearly required for a we-perspective.

3.3 Feeling of binding

This third condition is supposed to spell out a stronger relatedness between agents as a structural feature of the sense of us necessary for a we-perspective. I call it the “feeling of binding” by which I mean an experience of attunement between agents in a given situation. Being more closely related in such a way should clearly make a difference to forms of ‘loose’ micro-interaction described in the previous section. Note that, besides the two conditions just outlined, an appropriate setting needs to be in place for a feeling of binding to occur. Various findings from social psychology help to get a better grip on this feature. The first thing to mention is what is often referred to as “non-intentional behavioral mimicry”. Evidence suggests that persons have the tendency to unconsciously and non-intentionally mimic the behavior of others they interact with: they mimic their facial expression, their bodily posture and bodily movements such as foot shaking (e.g., Lakin and Chartrand 2003).Footnote 20 This demonstrably induces the feeling of rapport and liking between the individuals in question. Unconscious coordination and bodily synchronization apparently enhance bonding between persons, and it thus very likely supports the formation of cooperative goals and collective intentions.

A stronger case of nonverbal coordination from which a feeling of binding evidently arises is gaze leading. This happens, for example, when a person intentionally draws the attention of another person to an object. For one thing, both the gaze leading person and the gaze following person have to be aware of each other (this is in line with conditions (1) and (2)) as well as of the object. What’s more, experiments reveal that gaze leading generates a sense of social agency on the part of the gaze leader (Stephenson et al. 2018), an experience of coordination with the other person. Interestingly, this sort of nonverbal coordination is accompanied by a higher activation of the reward system on the neural level. This, again, suggests that the interacting individuals are even more motivated to cooperate (ibid.).

It could be objected, however, that the example of gaze leading describes a special case of interaction, which requires more than what is needed for the feeling of binding. This can hardly be contested. For example, gaze leading presupposes an intention on the part of the gaze-leading person: the intention to make another person look in a certain direction. Yet it would be wrong to generalize from this that an individual intention is required for the feeling of binding to occur. Even though gaze leading is indeed a more demanding form of interaction than what is at issue here, we can still identify components which are of interest. As mentioned above, according to experiments a certain experiential state arises from gaze leading, namely a sense of social agency. What is relevant for the present discussion is most notably the social component of this experiential state and not the agency component. Plausibly, this social component can be described as a subjectively felt relatedness to another person with a binding character.

Overall, the examples speak in favor of essential properties of the sense of us conforming to an adequate description as set out at the beginning. They support the idea of a stronger nonverbal intersubjective binding between individuals—a feature that was identified as a requirement for a sense of us and which is rooted in conditions (1) and (2). The structure thus analyzed provides a necessary requirement for a we-perspective instantiated in collective attitudes. It determines who takes part in the “we” and it features the prerequisite of a collective commitment.

In order to make this account more plausible, I will now address the question of possible disturbances of the sense of us. An obvious place to look is the discussion of so-called implicit biases. Of particular interest for the present investigation are findings about automatically activated stereotypes, which often lead to unintentional discrimination. The guiding hypothesis is that unintentional discrimination against others will impact the sense of us, which in turn determines how and whether a we-perspective occurs. These considerations thus especially refer to the genealogical sense of “precursor” outlined in Sect. 1.

4 Disturbances of the sense of us

Before addressing this question, some general remarks about how we perceive and categorize other persons are necessary. Most humans automatically track the group membership of others on the basis of minimal cues (for instance, the shape of the body, skin color, or the tone of the voice). Moreover, we tend to automatically categorize people as part of our own social group (in-group) or of a different social group (out-group) with respect to gender, race or level of education, to name just a few (e.g., Spaulding 2018). This is not a bad thing in itself; quite the contrary. Automatic social categorization is useful as it helps us to navigate smoothly through everyday life and to cope quickly with unforeseen situations and encounters (Fiske and Taylor 2013). The other side of the coin, however, is that those shortcuts may in certain situations also lead to exclusionary behavior. There is a thin line between ‘mere’ social categorization without any harmful consequences and stereotyping with discriminatory behavior as an outcome. Stereotypes are relatively stable (over-)generalized beliefs about a particular social group, that is, about sets of properties typically instantiated by its members (Spaulding 2018). They can be negative, positive or neutral.

In philosophy and social psychology, unintentional discrimination towards members of certain groups is often explained in terms of implicit attitudes, which are informed by stereotypes a person harbors. The existence of such attitudes is typically revealed by indirect measures such as the Implicit Association Test (IAT) and variations thereof (Greenwald et al. 1998, 2009). Current empirical approaches focus strongly on the quality and validity of the IAT (for an overview, see Madva and Brownstein 2018; Brownstein et al. 2019). Furthermore, many philosophical discussions are concerned with the question of the nature and structure of implicit attitudes, for instance, whether these are belief-like states or dispositions (Mandelbaum 2015), in-between beliefs (Schwitzgebel 2010) or so-called aliefs (Gendler 2008). It is beyond the scope of this paper to discuss the nature and structure of implicit attitudes in detail. However, as should become clear, the considerations to follow speak in favor of a dispositional account.

Clearly, discriminatory behavior, whether unintentional or intentional, affects the prospect of cooperation: one possible expression of hostile behavior towards a person belonging to a different social group is the avoidance of cooperation. My interest here is the question of what kind of influence implicit attitudes may already have on nonverbal interaction and whether it is plausible to assume that this in turn affects the prospect of cooperation. First, it seems very likely that the activation of implicit attitudes informed by stereotypes has an impact on nonverbal interaction between persons in a given situation.

Some experiments indeed provide evidence that this happens. For instance, one experiment tested the nonverbal behavior of persons who explicitly claimed not to be prejudiced towards obese people (Bessenoff and Sherman 2000). However, when confronted with obese persons, they smiled less frequently at the target person and placed their chair at a greater distance from her than in a control setting. Although the test persons claimed not to be prejudiced, they nevertheless clearly showed avoidance behavior in nonverbal interaction. Such behavior is best explained by an implicit prejudice against obese people harbored by the test persons. The results suggest, then, that implicit attitudes impact nonverbal interaction in a way that very likely hinders a we-perspective from arising. This interpretation is confirmed and further explained by other studies in which the impact of implicit stereotypes on the duration of eye contact was tested. In one early study the experimenters compared the duration of eye contact of white persons being interviewed by a white person with the duration of eye contact when the interviewer was a person of color (Dovidio et al. 1997). In the first case (white interviewer) the test persons indeed showed a longer duration of eye contact, while in the second case (person of color interviewer) they showed higher rates of blinking. Importantly, these higher rates of blinking are related to negative arousal, whereas longer durations of eye contact reflect greater attraction to the other person, intimacy and respect (ibid.). These differences in eye contact are explained by implicit biases on the side of the test persons towards persons of color. The first case, on the other hand, indicates a preferential attention towards the white interviewer. This signals what in the above analysis of the sense of us has been identified as an attunement towards another agent, a feature of condition (3), the feeling of binding, which is missing in the second case.
In a further, more recent study, preferential attention to the eyes of white in-group members was related to the willingness to interact with them (compared to black out-group members) (Kawakami et al. 2014). The experiments demonstrate that preferential attention indeed predicts the willingness to approach a white in-group member and to choose her as a partner rather than a black out-group member.Footnote 21 We can conclude from these findings that implicit attitudes informed by stereotypes impact mechanisms underlying the sense of us such that they affect the process of establishing a we-perspective. These results thus provide empirical support for the hypothesis of my argument that collective attitudes presuppose low-level interaction, which is an essential feature of the sense of us.Footnote 22

The upshot of these studies is that implicit attitudes noticeably impact nonverbal interaction. It seems reasonable to say that, on a very low level, implicit attitudes may block the sense of us and thereby prevent a we-perspective from emerging. The evidence thus supports the claim that the possibility of establishing a we-perspective does not arise when the sense of us is disturbed. In the cases discussed, implicit stereotypes impact the target phenomenon in a negative way.

On this basis, two claims about the target phenomenon can be made. First, the discussion supports the thesis of the present paper according to which the sense of us is indeed a relevant precursor of a we-perspective. Undisturbed, the sense of us, due to its inner structure as outlined in the previous section, gives rise to a we-perspective in which, for instance, collective goals are fixed. Disturbances thus indirectly reveal an otherwise positive function of the sense of us with regard to collective attitudes and its grounding role for a we-perspective. Second, the possibility of disturbance shows that the sense of us requires a perspective which is not biased by implicit stereotypes. Yet this unbiased perspective is only one of numerous features which may play an enabling role for the sense of us. Prominently, the specific context in which interaction takes place may play such a role, too. It may well be, for instance, that people who are in a particular place keep to themselves because of social rules. Imagine a very small café with people sitting at tables very close to each other without interacting in any relevant way. In such a situation it is very unlikely that a sense of us arises between the visitors unless the situation changes and, say, a person enters the café who is visibly in need of help. In such a case it becomes very likely that those present start to interact more closely and develop shared goals of providing help.

Furthermore, specific personality traits of individuals presumably also make a difference to whether or not a sense of us emerges. Socially anxious people may lack a basic openness to other people. Such openness facilitates the interaction with others that is necessary for a sense of us. Outgoing people, on the other hand, may be more prone to the sense of us in certain situations. These few remarks should suffice to show that the existence and the quality of the sense of us depend on various influencing factors which cannot be dealt with here in detail. The aim of the above considerations was to provide an empirically informed core concept of the sense of us in which suitable circumstances were presumed as given.

5 Conclusion

In this paper, I argued that an adequate description and understanding of collective attitudes requires taking the experiential underpinnings and precursors of such attitudes into account. The reason for this is that theories focusing on an analysis of full-fledged explicit collective attitudes run the risk of circularity or infinite regress. The leading hypothesis of the paper was that the experience of persons being specifically related to each other, called the sense of us, constitutes a core feature of a we-perspective, which is instantiated in collective attitudes. In order to show that this hypothesis indeed holds, I analyzed the structure of the sense of us by drawing on empirical findings from psychology and cognitive neuroscience. These findings support the suggested view according to which basic intersubjectivity, forms of nonverbal interaction and a feeling of binding are conditions of the sense of us. Furthermore, the analysis revealed important functions of the sense of us: for one thing, it facilitates perceiving the mental states of potential cooperative partners. This, in turn, makes it easier to agree upon common goals and to adopt a collective attitude and express it accordingly. Moreover, it was shown that the sense of us comprises a motivational component, a mechanism which explains why agents have a tendency to cooperate. Finally, some possible negative effects of implicit stereotypes on central features of the sense of us were outlined. They especially impact nonverbal interaction and block the feeling of binding, which suggests that the persons involved will be prevented from adopting a we-perspective. The findings indirectly support the thesis of the paper according to which a we-perspective is at least partly grounded in the sense of us.