Our status as rational animals is marked by several distinctive abilities. Chief among these are the ability to interpret one another’s behaviour in terms of beliefs and desires (mind-reading), and the ability to reflect on and evaluate the reasons we have for holding our beliefs (reflective reasoning). It has often been argued that such capacities might originate in socio-linguistic interaction (Davidson 1982; Pettit 1993; Brandom 1994; Tomasello 1999; Heyes and Frith 2014), and a growing body of empirical evidence now supports this view. How these abilities might originate in such interaction, however, remains something of a mystery.

Here we draw attention to a rarely explored socio-linguistic phenomenon that we believe plays a central role in this development. We call it ‘joint attention to mental content’ (JAM). Joint attention is generally understood as the ability to focus with another on an external object or event, like an apple or a game. However, with linguistic skills of sufficient complexity, we can jointly attend not just to external objects, but to the contents of our mental states—beliefs, reasons, plans and the like. This capacity is routinely on display in ordinary conversation. If I say to you ‘I think we should go for a picnic’, and you reply ‘that’s a terrible idea, it’s raining!’, the focus of this conversational exchange is not an external object like an apple, or an orange. Rather, what we are jointly focused on and have exchanged attitudes to is a mental content: my plan for a picnic.

We argue that the ability to engage in joint attention to mental contents is instrumental to the development of our rational capacities. Our argument is simple. A key element of mature mind-reading and reasoning is thinking about the contents of our beliefs and reasons under ‘multiple attitudes’—understanding that what I believe, you might disbelieve; or that what I now endorse as a good reason to believe something, I might later reject as a bad reason. In JAM, where children begin to exchange attitudes to mental contents with others for the first time, they are provided with the means to discover that mental contents can be evaluated under these various attitudes. The result is that JAM provides a natural platform for the development of children’s mind-reading and reasoning abilities.

We begin by exploring how joint attention to external objects and events begins at around 9 months of age, and how a new kind of joint attention—joint attention to mental content—emerges around 4–5 years (Sect. 2); we then explore how the acquisition of the ability to engage in JAM affects children’s mind-reading (Sect. 3) and reasoning abilities (Sect. 4).

Joint attention at two levels

Joint attention is generally understood as a capacity to focus together with another on an external entity—like an apple, or a football game (cf. Eilan 2005). To see the importance of joint attention to mental content, we need to begin by exploring this simpler ability.

Joint attention to external entities

In joint attention to an external object or event, two agents attend to the same entity, while mutually recognizing that they are both attending to that entity. If we are facing one another with a lighted candle between us, we can each see not only the candle, but also that we are both attending to the candle, and that we can both see that we are both attending to the candle. In this scenario, we are ‘jointly attending’ to the candle (Lewis 1969; Schiffer 1972).

The ability emerges pre-linguistically at around 9 months (Tomasello 1999). At this age children begin to recognize when their attention is aligned with their caregiver on an object, and begin pointing things out to bring their attention into alignment (Carpenter et al. 1998). Joint attention transforms children’s interactions with others. It has been shown to support word-learning (Bruner 1983), to support our ability to cooperate with others (Warneken 2006), and to play so prominent a role in social interaction that a poor ability to engage in joint attention predicts pervasive social difficulties (Mundy and Newell 2007).

There are two essential features of joint attention. First, joint attention involves the coordination of perspectives on an object. If an infant and parent are jointly attending to an apple held between them, it is transparently clear to each what the other is attending to. This is why joint attention is so helpful for word-learning: when it is transparently clear to a child which object a caregiver is attending to, it is clear to the child which object is being referred to with a novel word (Tomasello 1988, 2003).

Second, and due to this coordination of perspectives, joint attention allows us to recognize differences between our perspectives on or attitudes to the objects we are attending to—what has often been called ‘perspective-taking’. If we are jointly attending to an apple that looks good to me, and I see you grimace, it is immediately clear to me that you have an attitude to the apple that conflicts with my own—perhaps you hate apples even though I like them. Because I know which object you are attending to, it is obvious to me that a smile or grimace you produce in this context is a response to that object. As a result, even if it had never occurred to me that anyone could dislike apples, an attitude that conflicts with my own can become obvious in a context of joint attention. Moll and Meltzoff (2011) argue, indeed, that perspective-taking develops in infancy on the basis of joint attention. In support of this we find that by around 14 months, shortly after the emergence of joint attention, children begin to keep track of what others have or have not seen, recognizing differences in visual perspective (Moll et al. 2006; Moll and Tomasello 2007); and around the same time they begin to understand that others may have different attitudes to objects of joint attention, such as recognizing that an adult might like broccoli even though this is something the child dislikes (Repacholi and Gopnik 1997).

In these ways, joint attention to external entities plays a foundational role in children’s ability to interact with others around external objects, and to learn about the differences in attitude we might have to those objects. However, external entities are not the only kinds of things to which we can engage in joint attention. Later on in childhood, we begin to engage in joint attention to mental contents.

Joint attention to mental contents

By ‘mental content’, we mean the content of any thought, belief, desire, hope, plan etc. [Footnote 1] If I believe that it is raining, the content of my belief is the proposition or claim or hypothesis ‘it is raining’. If I doubt that it is raining, my mental state has the same content—but now I doubt what I believed before. I may also hope that it will rain, expect that it will rain, etc.

It is clear that an important feature of our mental lives is our ability to attend to the contents of our own mental states in introspection—for example when we reflect on the content of our beliefs and wonder whether they are true or not, or when we think about our plans for the future and consider whether they are coherent (Dunlosky and Metcalfe 2009; Smithies and Stoljar 2012). But can we jointly attend to mental contents, as we might jointly attend to an apple? The idea of joint attention to mental content might sound odd prima facie. Joint attention is generally thought of as a perceptual phenomenon, and yet mental contents are not perceptible. However, the expressions of a shared language are perceptible, and we shall argue that by attending to these expressions, we can engage in joint attention to mental contents. Let us first consider some examples.

[Figure a: the ‘Flat Earther’ example exchange]

Here, the first speaker reports a dramatic claim he has heard at the conference. The second speaker responds that the claim is not true—even crazy. Notice the discourse demonstrative ‘that’ in the second speaker’s utterance. To what does this demonstrative refer? Clearly, to the content of the first speaker’s assertion: the proposition that the earth is flat. This is the object of the conversation, the veracity of which is under discussion; in general, indeed, discourse demonstratives are understood to refer to propositions (Fillmore 1997, pp. 103–106). A standard requirement on the fluent use of a demonstrative, however, is that the referent of the demonstrative is available for joint attention (Diessel 2006). The ease with which we can use discourse demonstratives therefore suggests that in ordinary conversation, a proposition—which is a prototypical mental content—can be made available for joint attention. In Flat Earther, we are jointly focused on the first speaker’s claim ‘the earth is not round at all, it’s flat’, and as a result we can easily pick that claim out using demonstratives as the conversation progresses. Now consider a second example.

[Figure b: the ‘Lunch Rule’ example exchange]

Here we have another mundane exchange. Notice again the use of the demonstratives ‘this’ and ‘it’. These refer clearly to the rule described in the note on the wall. But a rule is not an external concrete entity—it cannot be seen or heard; rather, a rule is something we entertain in thought or commit to—it is a mental content. This example also brings out the second feature of joint attention considered above—that joint attention allows us to exchange conflicting perspectives on or attitudes to the objects of joint attention. In this case, the first speaker says ‘this is a terrible idea’, but the other disagrees ‘no, I think it’s great’—expressing a conflicting attitude to the same rule. Since ordinary conversation exhibits our ability to fluently exchange attitudes to mental contents, we have another reason to think that in such conversations, mental contents are the focus of joint attention.

Just as ordinary joint attention allows us to coordinate attention on, and exchange attitudes to external objects or events, it would appear that we can also coordinate attention on and exchange attitudes to mental contents, in JAM.

How does JAM come about?

As we remarked above, joint attention is generally conceived of as a perceptual phenomenon: each person perceives the other perceiving a common object. And yet, mental contents are not externally perceptible objects that we can interact around as Schiffer (1972) describes us interacting around a candle. As a result, JAM might seem at first glance like a very strange phenomenon—appearing to involve us engaging in joint attention to something imperceptible. However, the expressions of a shared language are indeed perceptible, and by monitoring one another’s attention to, and reactions to, these expressions, we can jointly attend to mental contents (cf. O’Madagain 2016). [Footnote 2]

This is easiest to see in the context of written expressions, as in the Lunch Rule example, where we can see each other paying attention to the note on the cork-board. In such a case, it is straightforward to see how we are engaged in joint attention to a mental content: we can each see the other attending to the note, and we know very well that each of us is focused on the meaning or content of the note, as opposed to the colour of the ink. The result is that we jointly attend to the content of the note—each of us aware that the other is thinking about the same message. This process is clearly also at work in verbal conversation. In the course of a conversation, each of us carefully monitors that the other is listening to what we say, and to the other’s reactions to what we say. This is what allows us to keep track of a topic of conversation, such as a plan for a picnic, a rule for work or a wild claim about the earth being flat, and to discover one another’s attitudes to that plan or claim over the course of the conversation.

Simply sharing a language is not sufficient for JAM, however. By about 2.5 years, children are able to engage in simple linguistic conversations—coherently exchanging comments, in turn, about a single entity. But at this age the objects of such exchanges are external entities—an apple, a toy, etc. It is not until much later, between 4 and 5 years, that children acquire the ability to have a conversation about a mental content, like a plan or a belief. The most explicit form of this kind of discourse is found in ‘sentential complement clause’ constructions (De Villiers and De Villiers 2000), which are mastered around 4 years of age. These are constructions such as ‘Sally thinks the ball is in the box’. In such constructions, a proposition is introduced (‘the ball is in the box’), and the attitude of the speaker to the sentence is expressed (‘Sally thinks’). In discourse with such constructions, which we will call ‘SCC discourse’, conversational participants can refer back to a proposition that has been introduced and they can fluently exchange attitudes to it (A: ‘Sally thinks the ball is in the box!’, B: ‘I don’t think that’s true...’). Although SCC discourse is the most explicit form of JAM, it can also occur less explicitly, if one speaker simply asserts a claim, and the other denies it—as we see in the Flat Earther exchange. Here the speaker who asserts the claim implies she believes it, and the one who denies it implies that she disbelieves it. Discourse focused on a proposition, allowing the use of discourse demonstratives and an exchange of attitudes to that proposition, will therefore be sufficient for JAM to take place.

Let us suppose, then, that this kind of discourse makes JAM possible. Might this ability make any difference to our cognitive development? An impressive range of evidence indicates that our ability to think about our own and others’ minds changes in two important ways in early childhood, as a result of socio-linguistic interaction: false-belief understanding, which is a major component of mind-reading, and reasoning. In the next two sections we explore the role that JAM might play in each.

JAM and mind-reading

One of the most useful skills we have as social agents is the ability to think about others’ mental states such as beliefs, and to predict their behaviour on that basis. To engage in such ‘mind-reading’, we surely need to grasp the concept of belief. And grasping the concept of belief, as Dennett (1978) argued, seems to require that we understand that beliefs can be false. This insight led to a series of experiments exploring children’s mind-reading abilities, which seem to have shown that children’s understanding of beliefs is transformed around 4 years of age. We now argue that JAM plays a key role in this transformation.

The development of false-belief understanding

The first of the ‘false-belief’ tasks developed have come to be called ‘explicit’ tasks, because they involve asking children direct questions that reveal their understanding of beliefs. In a ‘change of location’ task, children are told a story in which a character ‘Sally’ leaves her toy in a basket, and then leaves the room. While she is gone, the toy is hidden in a box. Sally comes back to get her toy, and children are asked where she will look. Remarkably, up to around 4 years children say she will search for the toy in the box, where it was hidden, rather than where she left it (Wimmer and Perner 1983). This suggests that up to around 4 years, children cannot understand that others might have false beliefs. Similarly, if children are presented with a container that appears to contain one type of object but is revealed to contain another (e.g. a box of smarties that is revealed to contain pencils), and are then asked what they originally believed was in the container, up to around 4 years they answer with the real contents (saying ‘pencils!’ rather than ‘smarties’) (Gopnik and Astington 1988). This indicates that up to the same age, children do not understand that they themselves may have had false beliefs in the past. Finally, in an ‘appearance/reality’ task, children are presented with an eraser that looks like a bar of chocolate. They are first asked what the object looks like, and reply that it looks like a bar of chocolate. They are now shown that it is really an eraser, and asked again ‘but what does it look like?’ Up to around 4 years, children reply that it now looks like an eraser (Flavell et al. 1983), apparently failing to distinguish between appearance and reality. Since distinguishing appearance and reality requires understanding that appearances can be deceptive, or lead to false beliefs, this task too is taken to illustrate a difficulty with false-belief understanding. Taken together, these results indicate that children do not understand that beliefs can be false up to around 4 years.

However, the story is not so straightforward. So-called ‘implicit’ false belief tasks, which measure where participants look rather than how they reply to questions, are passed by much younger infants, and indeed apes. In a version of the change of location task, 1-year-olds and chimpanzees are ‘surprised’ (look longer) when an agent reaches for a target object where it really is rather than where she last saw it (Onishi and Baillargeon 2005; Surian et al. 2007; cf. Southgate et al. 2007; Krupenye et al. 2016; Buttelmann et al. 2017); and in a ‘false-contents’ task, infants are surprised when an agent searches for a toy in its actual location, rather than in a misleading container that appears to contain the toy (Song and Baillargeon 2008). Many interpret these results to show that infants can, after all, represent others’ false beliefs.

How do we make sense of these apparently contradictory results? On a ‘nativist’ view, infants can indeed represent others’ false beliefs, but fail the ‘explicit’ tasks up to 4 years because they don’t understand pragmatic aspects of the tasks—such as not understanding the relevance of an agent’s beliefs to the experimenter’s questions (Leslie 1994; Helming et al. 2016; Westra and Carruthers 2017). Overall, however, we don’t think this provides a very good explanation for the explicit tasks taken together—because the tasks are pragmatically very different. The appearance-reality task does not require children to talk or think about beliefs, but instead to talk about what something ‘looks like’. Failure here cannot be explained by a poor understanding of the relevance of beliefs to an adult’s question—since beliefs are not directly relevant here. As a result, the nativist needs to appeal to multiple different pragmatic developments in order to account for all of the data. It is more convincing, we think, to suppose that what explains the difference in performance between younger and older children is a difference in what the tests were designed to explore—false-belief understanding.

Fig. 1

False-belief attribution as the attribution of conflicting representations of an object. The character on the left represents it as an apple, but recognizes that the character on the right represents it as a banana (cf. Wellman 2014)

On an opposing ‘constructivist’ view, something substantially changes in children’s understanding of beliefs. One proposal is that infants cannot yet represent others’ beliefs, and solve the early tasks simply by tracking what others have seen (Butterfill and Apperly 2013). This proposal, however, is undermined by infants’ success in the implicit ‘false contents’ task, where keeping track of where the protagonist looks cannot be used to predict the outcome (Carruthers 2013). Nevertheless we think the ‘constructivist’ approach is broadly correct—something has changed in children’s understanding of belief. We think, however, that what has changed is not that older children have newly acquired the ability to represent beliefs, but instead that they have newly acquired the ability to evaluate mental contents under multiple attitudes. As we shall argue, they acquire this ability through JAM. But first let us more carefully explore what false-belief understanding involves.

‘Mature’ false-belief understanding

False-belief understanding is sometimes characterized as the recognition that another might represent the world in a way that is incompatible with my own view. For example, Fig. 1 is a diagram that purports to describe an understanding that another has a false belief about the contents of a box (Fig. 1, adapted from Wellman 2014, p. 35). Here the character on the left attributes a false belief about what’s in the box to the character on the right, by regarding the other as representing that the object is a banana, while the first character represents it as an apple. On this model, false-belief understanding amounts to recognizing that another might have a belief about the world that is incompatible with my own. Such an account of false-belief understanding is not wrong, but we think it is problematically under-described.

When I attribute to someone a false belief, I don’t just recognize that we represent the world in incompatible ways—that you believe it’s raining while I believe it’s sunny, for example. Instead, if I attribute to you a false belief that it is raining out, I generally think to myself, ‘she thinks it’s raining, but that’s not true!’ That is, I represent that you take an attitude of belief to the hypothesis ‘it’s raining’, even as I have the contrary attitude of disbelief to that very same hypothesis. Rather than representing us as having incompatible views of the external world, I represent us as having incompatible attitudes to a given hypothesis. This is what philosophers have called ‘propositional attitude taking’, where we consider multiple distinct attitudes to the same proposition or hypothesis, that it may be believed or disbelieved, taken to be true or false, by different parties or at different times (Brentano 1874; Davidson 1982; Engel 2012; Perner and Roessler 2012).

To adapt Wellman’s diagram to illustrate this, we need to allow that the character on the left considers the content of the other’s belief, and thinks ‘that’s false...but she believes it!’ (Fig. 2). Here each party thinks about the same mental content (the hypothesis that the box contains a banana, as depicted in the bubble) but adopts a distinct attitude toward that content—one believes it, or takes it to be true, the other disbelieves it, or takes it to be false. Rather than illustrating minds with different mental contents, as we saw in Fig. 1, in Fig. 2 we aim to illustrate two people taking different attitudes to the same mental content.

Fig. 2

False-belief attribution as the representation of conflicting attitudes to the same mental content. Both characters consider the same hypothesis—that the object in the box is a banana. The character on the left disbelieves the hypothesis but recognizes that the character on the right believes it

We suggest that it is the ability to consider mental contents under various attitudes that children have acquired competence with at around 4–5 years. [Footnote 3] And we think that this explains the onset of their competence with the explicit false belief tasks. To see how the ability to represent conflicting attitudes to a common mental content might help to resolve the explicit tasks, just consider how you would reason through such cases as an adult, for example the Sally–Anne task. The first thing you are asked in this task is ‘Where will Sally look for her toy?’ You have just seen the toy moved into the box, so the first thing that occurs to you is the actual location of the toy. ‘Well, the toy’s in the box’, you might think, initially focusing on where you believe the toy is. However, since you can represent conflicts between your own attitude to this hypothesis and someone else’s attitude, you can now go on—‘but of course Sally doesn’t believe that!’, invoking Sally’s opposing attitude to the very thing you think is true (the hypothesis that the toy is in the box). And now you can move on to consider Sally’s belief—‘Sally thinks, rather, that the toy’s in the basket’. By recognizing that Sally has an opposing attitude to the very thing you believe, you have a way to ‘corral’ your own belief in the context of thinking about Sally. Even though your own representation about the location of the toy is salient and distracting, once you understand that Sally might have an opposing attitude to that representation of the world, you have a way to deal with this representation in the context of evaluating Sally’s beliefs.

In fact, all of the explicit tasks require us to think through conflicting attitudes in this way. In the ‘change of location’ task, the subject has to recognize that the very thing she believes (that the toy is in the box), is disbelieved by the protagonist—that they have different attitudes to the same proposition. In the false contents task, to give the right answer, a subject has to recognize that what she now believes is false, she earlier believed to be true. That is, the subject herself had two distinct attitudes to the same mental content at different times—‘before I believed there were smarties in the box, but now I don’t believe there were smarties in the box’. This is not simply a matter of having two different beliefs at different times, then, but taking two different attitudes, at different times, to the same mental content. In the tasks involving the appearance-reality distinction, the conflict arises too. Understanding that there is a difference between the way the world appears and the way that it really is requires understanding that a representation or mental content one currently believes on the basis of appearance (that the object before you is an apple), one might come to disbelieve upon closer inspection. Again, one and the same mental content affords distinct attitudes. [Footnote 4]

Crucially, the ‘implicit’ tasks that apes and younger children pass do not require this. If I am to predict where someone will search for a banana based on their belief about the location of the banana, I do not need to represent the conflict between our beliefs—all I need to do is represent what they believe, paying no attention to my own contrary belief. Supposing that it is children’s ability to think through conflicts in beliefs that is transformed around the 4-year mark therefore makes sense of infants’ good performance in the implicit tasks. More recent discoveries support this interpretation. Children as young as 2.5 years have been found to pass the ‘explicit’ false belief task as long as they do not currently have a belief that conflicts with that of the protagonist in the story: if the target object is removed from the scene, so that the subject no longer has a particular belief about where it is located, younger subjects can correctly predict that the protagonist will search in the false location (Scott and Baillargeon 2009; see Rubio-Fernández and Geurts 2016 for a similar result). Some have argued that it is the ability to suppress our own beliefs when considering someone else’s that develops at the later age (Scott and Baillargeon 2009, p. 1176), explaining the difference in performance between age groups. But notice that, as adults, we do not need to suppress our own beliefs in order to represent disagreement with another. In mature mind-reading, we don’t forget about or suppress what we believe ourselves—rather, we are fully aware of what we believe ourselves, while understanding that someone else might not agree with us. It seems more convincing to us, then, to suppose that what develops at the later age is not the ability to suppress our beliefs, but rather the ability to think through conflicts in belief. As we have proposed, this is made easier when we understand that different people can take different attitudes to one and the same claim—that the very thing I think is true, Sally might think is false, and vice versa.

Overall, then, we can make good sense of the apparently contradictory data from the implicit and explicit tasks by supposing that what children acquire at around 4 years is the ability to represent conflicting attitudes to a single hypothesis—or, to engage in propositional-attitude taking. The next question to ask is how they acquire this ability.

JAM and mature false belief understanding

We have argued that what children come to understand at around 4–5 years is that mental contents can be evaluated under multiple attitudes. This is what allows them to reason through cases where their own belief conflicts with that of the protagonist, without getting ‘stuck’ on their own belief. But what is it that causes children to come to acquire this understanding? We think JAM is the answer.

There already exists substantial evidence that passing the ‘explicit’ tasks is contingent upon some very specific types of socio-linguistic interaction. Children’s ability to engage in discourse with sentential complement constructions (see Sect. 2.3) is strongly correlated with their ability to pass the explicit tasks (see Milligan et al. 2007, for a review). Children growing up deaf and with limited exposure to sign language are significantly delayed—in some cases well into adolescence—in their competence with these tasks (Peterson and Siegal 1995; Woolfe et al. 2002). Deaf children who grow up without a conventional sign language fail these tasks even as adults (Pyers and Senghas 2009). And training with SCC discourse has been shown to significantly increase children’s performance in these tasks (Lohmann and Tomasello 2003; Hale and Tager-Flusberg 2003; Wellman and Peterson 2013). This has led many to argue that grasping the concept of belief depends on language—specifically discourse with SCC constructions.

Why does SCC discourse have this effect? There are various suggestions, but the best known is De Villiers and De Villiers’ claim that by acquiring linguistic constructions that have the syntactic structure of others’ beliefs (e.g. ‘S thinks that p’) it becomes possible to have thoughts about those beliefs. On this view, reminiscent of the approach of Whorf (1964), new linguistic constructions literally act as the representational vehicles of thoughts (see also Olson 1988; Bartsch and Wellman 1995; Astington and Baird 2005, for similar approaches). However, evidence from training studies that we discuss below (Lohmann and Tomasello 2003; Hale and Tager-Flusberg 2003) indicates that it is not merely the acquisition of the syntax that is augmenting children’s abilities, but rather its use in specific kinds of social discourse, which can in fact be carried out without this specific syntax. [Footnote 5]

We propose something different. The reason SCC discourse plays a role in the development of false-belief understanding, we suggest, is that SCC discourse involves jointly attending to mental content. And JAM allows us to discover that others have different attitudes from our own to the contents of our thoughts.

Recall how joint attention to external objects makes it clear that those objects afford ‘multiple attitudes’. First, joint attention allows us to see that our perspectives are coordinated on the same object. Second, and because of this coordination of perspectives, joint attention makes the differences between our perspectives on or attitudes to an object salient. If I see you scowl at an apple that looks tasty to me, it is made particularly clear that our attitudes to the apple diverge. As we might expect, children’s ‘perspective taking’ abilities—their understanding that we can have divergent attitudes to external objects—develop on the basis of joint attention to those objects (Moll and Meltzoff 2011).

We suggest that JAM has exactly the same effect, but at the level of mental contents rather than external objects. By engaging in joint attention to mental contents, we are placed in a position to exchange attitudes to those contents. The result is that as children begin to engage with others in JAM, they come to think of mental contents in terms of these multiple and potentially conflicting attitudes: they acquire the ability to engage in perspective taking on mental contents. Let us return to the opening examples to illustrate. In Flat Earther, one speaker reports on the claim he has heard: ‘the earth isn’t round at all, it’s flat!’ Once he has introduced the proposal, his friend can express her attitude to it, which in this case is one of disbelief: ‘that’s not true, that’s crazy!’ In Lunch Rule we find the same thing, but now the conflicting attitudes are made more explicit. One speaker expresses that she thinks the new rule is a great idea, and the other expresses the opposite (‘I think it’s a terrible idea!’).

Now consider again the challenge that children are faced with in passing the ‘explicit’ false belief tasks. The challenge is not simply to represent another’s belief, but to represent different attitudes to the same mental content at once: that Anne thinks it’s true that the ball is in the basket even though I think that’s false, or that I believed the box contained Smarties before, but now I disbelieve that very thing. A natural explanation now presents itself for why SCC discourse helps children to pass these tasks. By allowing children to engage in joint attention to mental contents with others, SCC discourse familiarizes them with the exchange of attitudes to those contents, and brings them to think about mental contents as the kinds of things that afford multiple attitudes. Children come to think of the claims they believe as claims that others might disbelieve, or the plans they think are great as plans others might think are terrible, and so on. With enough exposure to these kinds of exchanges, children come to reason fluently through scenarios that require them to think about conflicts in beliefs between speakers, in ways that gave them trouble before. The explanation for the effect of SCC discourse on children’s false-belief understanding is, we therefore submit, that SCC discourse involves JAM, and it is JAM that is bringing about this transformation in children’s understanding.

The evidence that comes closest to supporting our hypothesis is given in the training study of Lohmann and Tomasello (2003). Here children are given ‘training sessions’ with different kinds of discourse before undertaking classic false belief tasks non-verbally [Footnote 6]. There are four conditions. In a ‘no-language’ condition, children are given training interacting with an experimenter and a puppet around deceptive objects, such as a candle that looks like an apple. Together they ‘discover’ the deceptive aspects of the objects, but in this condition they do this using only expressions like ‘ooh!’, or ‘look!’. In a second, ‘discourse-only’ condition, sentences are used, and here the experimenter and the puppet always express conflicting attitudes to one another’s claims about the objects. If the experimenter says ‘it’s a candle!’, the puppet replies ‘a candle? Never! This is an apple, surely!’ This is a simple version of JAM, in which conflicting attitudes to a proposition are expressed, but it is ‘implicit’ in the sense that SCC constructions are not used, and there are no mental state terms. In a third, ‘SCC’ condition, children are given practice with SCC discourse, but no disagreement is expressed. If the child says ‘I think it’s an apple!’, the experimenter will agree ‘I also think it’s an apple!’. This is also an impoverished version of JAM: expressions of attitudes to mental contents are produced, but no conflicting attitudes are ever expressed. If the ‘syntactic’ view of De Villiers and De Villiers (2000) were correct, where it is simply the syntax of these constructions that allows children to represent others’ false beliefs, then this condition should show maximal improvement in false-belief understanding. Finally, in a ‘full’ condition, children receive SCC discourse plus disagreement, the most explicit form of JAM. Here are some examples of the exchange in that condition:

[Figure c: example exchanges from the ‘full’ training condition (SCC discourse plus disagreement)]

The results of this experiment support the JAM account. The condition in which children are given training with JAM in its most explicit form, involving both the explicit expression of attitudes using mental state terms (SCC discourse) and the expression of conflicting attitudes, showed double the improvement on subsequent false-belief tasks that the other language conditions showed. The SCC condition and the discourse-only condition, in both of which JAM is present in a minimal form, showed moderate improvement in false-belief understanding compared to the no-language condition, but significantly less than the full condition, in which the two are combined. If the ‘syntax’ account of De Villiers and De Villiers (2000) were correct, the SCC condition should outperform the discourse-only condition, but it did not: the two conditions improved performance to the same degree. And so the evidence established in Lohmann and Tomasello (2003) can best be understood as evidence for a role for JAM in the acquisition of false-belief understanding (see also Hale and Tager-Flusberg 2003; Wellman and Peterson 2013 for similar outcomes). JAM is the key to the development of children’s understanding of belief.

The JAM view departs considerably from a ‘syntactic’ view like that of De Villiers and De Villiers (2000), since the syntax of SCC constructions is not necessary to engage in JAM. Although SCC discourse provides the most explicit framework for the exchange of attitudes to propositions, this is often done implicitly. As we discussed in the opening, if one speaker states ‘The earth is flat!’, and her friend replies ‘that’s not true!’, the pair have now exchanged attitudes to a proposition, but SCC discourse has not been involved. Neither should the JAM account be thought of as a pragmatic account, on which the role of socio-linguistic interaction is to familiarize children with the kind of answers that are expected of them in false belief tasks (Westra and Carruthers 2017). In JAM children make a genuine discovery: they discover that mental contents afford multiple and sometimes conflicting attitudes, that they can be believed or disbelieved, certain or dubious, true or false. And it is due to making this discovery that they come to evaluate mental contents (their own and others’) under these and other attitudes [Footnote 7].

Let us suppose, then, that JAM plays this role in the acquisition of false-belief understanding. But now consider that false-belief understanding can be seen as a necessary ‘first step’ in a much more general cognitive ability: reflective reasoning. In our final section we explore how JAM figures in the development of reflective reasoning more broadly.

JAM and reflective reasoning

Reflective reasoning is the ability to evaluate the quality of the reasons we have for holding our beliefs (Mercier and Sperber 2011). Such an ability could not be held by a creature that does not understand that her beliefs may be false, since a good reason for belief simply is a reason to think the belief is not false. As a result, our account already shows how reflective reasoning may have its roots in socio-linguistic interaction: if false belief understanding begins in JAM, and false belief understanding is required for reflective reasoning, then the latter is also ultimately rooted in JAM. However, if the account provided so far is correct, then it is natural to think that JAM plays an even more thorough-going role in the acquisition of reflective reasoning. If JAM is responsible for bringing children to evaluate propositions under multiple attitudes, after all, then it likely introduces them to the evaluation of reasons in the same way.

Just as we have evidence that false-belief understanding develops on the basis of socio-linguistic interaction, we also have a good deal of evidence that reasoning more generally depends on interaction with others. Some studies appear to show reasoning in very young children. For example, 2-year-olds have been shown to prefer to believe a speaker who provides multiple premises supporting a claim over a speaker who simply repeats themselves (Castelain et al. 2018). However, as Castelain et al. note, their results could be explained by children preferring to believe speakers who provide more information rather than less. By 3–4 years, children are more convinced by assertions based on evidence (such as “I looked and I saw an apple in the bag”) than by wishful thinking (“I want there to be an apple in the bag”) (Koenig 2012), which shows some sensitivity to reasons. But when Köymen et al. primed children with evidence that would refute a partner’s argument, they found that only by 5 years would children reliably produce that evidence as an objection (Köymen et al. submitted), showing clear evidence of the evaluation of reasons (although younger children could produce the objections with training, discussed below). By 5 years children can also recognize their own deductive inferences as more reliable than speculation (Pillow 2002), and by adolescence, children can recognize the difference between validity and soundness. These abilities continue to develop in sophistication into adulthood (see Moshman 2009 for review).

A good deal of evidence indicates that this development depends on social interaction. It has been found that children produce fewer arguments at 5 years if their caregivers engage them mostly through direct orders than if they use a more discursive style (Kuczynski and Kochanska 1990; see also Ensor and Hughes 2008); that children’s problem-solving abilities are improved after they engage with a peer in a collaborative reasoning task (Doise and Mugny 1984; Perret-Clermont 2004); that school-age children learn more efficiently when engaged in collaborative tasks than in solitary tasks (Johnson and Johnson 2007); that school-age children are better able to detect mistakes in reasoning when they have been taught about these kinds of mistakes by others (Weinstock et al. 2004); and finally, that a range of rational fallacies are mitigated in discursive or argumentative contexts, suggesting that reasoning has an origin in such contexts (Moshman and Geil 1998; Mercier and Sperber 2011). We have solid grounds, then, for thinking that reflective reasoning develops on the back of discourse with others.

We think that the way in which discourse supports this development is best understood in terms of JAM. We have already argued that the kind of false-belief understanding that JAM supports is required for reflective reasoning: understanding that mental contents can be evaluated under multiple attitudes, as believable or doubtful, certain or unlikely. But of course reflective reasoning will, similarly, require the evaluation of reasons under multiple attitudes. The goodness of a reason for believing something generally amounts to the extent to which that reason makes the belief likely to be true, either by its cogency (as defended by ‘internalists’ about justification, e.g. Chisholm 1977) or its reliability (as defended by ‘externalists’ about justification, e.g. Dretske 1981; for a review of the issues at stake here, see Kornblith 2001). If I am to set about examining my reasons for a particular belief for goodness, I must surely understand that there are at least two conflicting attitudes that I might take to that reason: that it is a good one, or a bad one. Whatever the cognitive mechanism that delivers for us ‘intuitions about arguments’ (Mercier and Sperber 2011, p. 58), then, that cognitive mechanism must allow us to recognize that our reasons or arguments afford multiple attitudes: that they are good or bad, strong or weak.

If we accept that discourse about beliefs provides a natural platform for discovering that beliefs can be evaluated under various attitudes, we can see that a similar if slightly more complex form of linguistic discourse will allow us to discover that reasons, too, can be evaluated under multiple attitudes. Consider another example:

[Figure d: the Yellow Moon exchange between Sally and her mother]

In Yellow Moon, not just a belief, but a reason for holding a belief is the object of joint attention. Sally is considering what reasons might support the claim that the moon is made of cheese, and proposes as a good reason ‘it’s yellow, and cheese is yellow’. Having announced this candidate reason, her mother can comment on the reason (picking it out with the discourse demonstrative ‘that’), and remark upon the fact that it doesn’t strongly support the conclusion Sally is considering. In an exchange like this, it becomes clear that the two have conflicting attitudes to the reason Sally has introduced to justify the claim: Sally thinks it’s a good reason, but her mum thinks it’s a bad reason and explains why. Through such exchanges, it becomes clear that reasons are the kinds of things that afford multiple attitudes: as good, bad, valid, invalid, convincing, ridiculous, etc. Just as children learn through discourse that the contents of beliefs afford multiple attitudes, they can learn that reasons can be evaluated under various attitudes in the same way.

The hypothesis that it is through JAM that children begin to engage in reflective reasoning yields straightforward predictions. First, just as ‘training’ with discourse where we exchange attitudes to beliefs augments children’s ability to pass false belief tasks, as reviewed above, training with discourse in which children exchange attitudes to reasons should augment their ability to solve reasoning tasks. The studies described in Perret-Clermont (2004) provide tentative evidence for this: children who engaged in a collaborative problem-solving task, and critiqued one another’s arguments, were much more likely to succeed in subsequent problem-solving tasks. However, it is not clear that the problems to be solved in those studies [e.g. solving physics puzzles in Howe et al. (1990)] fall under the category of reflective reasoning (as opposed to ‘intuitive inference’, cf. Mercier and Sperber 2011). A stronger case would be provided by tasking children with recognizing the strength of arguments before and after training with such discourse. If our hypothesis is correct, this should reveal that discourse in which we exchange attitudes to reasons directly augments our ability to evaluate each other’s arguments, recognize errors in reasoning, and prefer strong to weak arguments. The study by Köymen and colleagues mentioned above gives preliminary evidence for just such an effect. The experimenters gave one group of children training with discourse involving the evaluation of reasons before tasking them with identifying counter-evidence to a partner’s arguments, while another group did not receive this training. Sure enough, the group who received the training performed significantly better than the group with no training. Talking about reasons, it would appear, improves our thinking about reasons.

Generally the evaluation of reasons in discourse will be implicit, as in the kind of exchange discussed above. But it can also be explicit. Obviously we are taught explicitly about different patterns of reasoning if we take, for example, a logic class, and such explicit instruction has indeed been found to improve children’s reasoning abilities (Weinstock et al. 2004). And when a carer tells a child ‘don’t believe everything you hear’, they are also offering a fairly explicit warning against a particular kind of reason one might hold for believing something. The fact that inferences and arguments are made available for joint attention in linguistic discourse means that from very early on, humans can begin to teach their offspring about good reasoning. JAM provides us as a species, therefore, with a very rare skill indeed: the ability to teach our offspring how to think. As insights accumulate over time, and new generations inherit a growing body of knowledge of what makes for good reasoning, and how to guard against mistakes (eventually taking classes in logic and statistics), significant aspects of our competence in reasoning become a part of our cultural inheritance (cf. O’Madagain 2019). All of this begins, however, with JAM.


It is widely recognized that our ability to put our heads together to learn from and collaborate with one another is one of the most distinctive human abilities. Engaging in joint attention with others is understood as a cornerstone of the acquisition of these skills. Through the simplest kind of joint attention humans come to recognize that others have attitudes to and perspectives on external entities that are distinct from their own, and this has a dramatic impact on our ability to learn from and cooperate with others. What we have drawn attention to here is that such processes also operate on a second level. Humans can jointly attend not just to external entities and situations, but also, through linguistic exchanges, to the content of our beliefs, reasons and plans.

JAM, we have argued, lets children discover that mental contents are the kinds of things that afford multiple attitudes. This transforms the simpler ‘mind-reading’ that infants and apes are capable of into the mature mind-reading that adults engage in. Children now think not only about how others represent their environment, but about the mental contents they and others entertain in terms of an array of attitudes like belief, disbelief, doubt, certainty, etc. Once children begin to clash with others in discourse over which beliefs are true or false, they begin to attempt to convince their partners by expressing reasons for why their belief is true and their partner’s is false. This brings not only beliefs but also reasons into the scope of joint attention. Now children can learn through discourse with others that reasons too afford multiple attitudes, that some inferences are acceptable and others less so, and they begin to acquire rational norms from social interaction.

A resounding question for cognitive scientists and philosophers is what difference, if any, socio-linguistic interaction might make to cognition. What we hope to have done here is to have identified an ability that linguistic discourse provides us with, that plays a central role in the development of rationality. We have focused on joint attention to the content of beliefs and reasons, but JAM surely also allows us to jointly attend to intentions, plans, memories, and many other mental contents. In this way JAM likely sets human cognition apart from that of other animals along multiple dimensions, letting us inform and learn from all aspects of one another’s thinking, and ultimately endowing us with skills of reasoning built up over generations, and quite unavailable to other species.