In the last 30 years of Theory of Mind research, two types of tasks have been standardly used to investigate the development of false-belief (FB) reasoning in childhood: change-of-location tasks, such as the Sally-Anne task, and unexpected-contents tasks, such as the Smarties task. In the Sally-Anne task, Sally puts a marble in a box before going out to play. During her absence, Anne moves the marble to a basket, setting the scene for the FB question: ‘When Sally comes back, where will she look for her marble?’ (Baron-Cohen et al., 1985). Hundreds of developmental studies have shown that children under 4 err by predicting that Sally will look for her marble in the basket, rather than in the box where she left it. In the Smarties task, children are shown a tube of Smarties chocolates that is filled with something unconventional (e.g. pencils) and they have to predict what somebody else will say is inside the container. As in standard change-of-location tasks, children under 4 fail unexpected-contents tasks, responding according to their own knowledge of the contents of the tube (Gopnik & Astington, 1988; Perner et al., 1989).

Young children’s difficulties with FB tasks have been interpreted as either an inability to represent other people’s false beliefs due to their immature Theory of Mind (e.g. Baron-Cohen et al., 1985; Perner & Roessler, 2012; for a meta-analysis of FB tasks, see Wellman et al., 2001) or a failure to inhibit the incorrect, true-belief (TB) response due to their immature Executive Function (EF; e.g. Carlson & Moses, 2001; Baillargeon et al., 2010; for a recent meta-analysis of FB and EF correlational studies, see Devine & Hughes, 2014). The first group of accounts of Theory of Mind development, so-called ‘competence accounts’, defend the view that a conceptual change takes place in children’s understanding of mental states at around age 4, before which children do not yet have the necessary understanding of belief to pass FB tasks. In contrast, ‘performance accounts’ defend the view that children under age 4 already possess the basic Theory of Mind abilities necessary to pass FB tasks, and explain their poor performance as a result of their inability to meet the EF demands of these tasks.

Executive functions are a set of cognitive processes (e.g. inhibitory control, working memory, cognitive flexibility and attentional control) that are involved in the cognitive control of behaviour. A well-accepted model of EF is the tripartite model proposed by Miyake et al. (2000), which includes inhibition of prepotent responses, mental set shifting, and updating and monitoring of information. According to performance accounts of Theory of Mind development, the key component of EF that is involved in FB tasks is response inhibition (but for studies on the role of working memory, see Gordon & Olson, 1998; Mutter et al., 2006). For example, Leslie and colleagues argue that children and even adults attribute their own knowledge to others by default, and that this ‘true belief default’ (also known as an ‘egocentric bias’) needs to be actively inhibited in FB tasks (Leslie et al., 2005). According to other researchers (e.g. Carlson & Moses, 2001; Baillargeon et al., 2010), passing a FB task requires actively inhibiting the TB response because the actual location of the object is the dominant response to the test question. All these accounts of Theory of Mind development interpret young children’s failure in FB tasks as a result of their limited inhibitory control. The aim of this review is to re-examine the role of inhibitory control in passing standard FB tasks in view of the better performance that has been observed with bilingual children and adults relative to their monolingual peers.

Bilinguals outperform monolinguals in FB tasks

Theory of Mind studies have revealed that 3-year-old bilinguals outperform monolingual children of the same age in standard FB tasks (Goetz, 2003; Berguno & Bowler, 2004; Kovács, 2009, 2012; Nguyen & Astington, 2014; Gordon, 2016; see also Bialystok & Senman, 2004; Greenberg et al., 2013), and parallel results have been reported with bilingual adults (Rubio-Fernández & Glucksberg, 2012). These studies discuss a number of experiential factors that could boost bilinguals’ understanding of false belief, such as having greater metalinguistic awareness (e.g. understanding at an earlier age that a concept can have different labels) or an enhanced sensitivity to differences in perspective (e.g. appreciating that not all speakers share the same language). If these aspects of the bilingual experience have an effect on Theory of Mind development, bilinguals’ early success in FB tasks could be taken to support the competence view of FB reasoning. That is, the bilingual experience would allow young bilingual children to have an earlier understanding of mental states than monolingual children of the same age, hence boosting their performance in FB tasks.

However, since bilingual children have shown greater inhibitory control than monolinguals in EF tasks (Bialystok, 1999, 2001), their better performance in the above FB studies has also been interpreted as a possible effect of their better inhibitory control. This latter interpretation is in line with performance accounts of Theory of Mind development, according to which monolingual 3 year olds fail standard FB tasks because of their immature inhibitory control. While both hypotheses about bilingual children’s advantage in FB tasks have been discussed in the literature (i.e. that they may have an earlier understanding of belief vs. a better EF), the dominant view is that their enhanced inhibitory control allows them to pass these tasks earlier. The inhibitory control view has probably become the dominant view because it has received support from a number of experimental studies. However, it must be noted that the experimental record has not disconfirmed the alternative hypothesis that bilingual children may have a better understanding of other people’s perspectives, independently from their enhanced EF.

Kovács (2009) tested the competence versus performance accounts of Theory of Mind development in a FB study with monolingual and bilingual children. For this purpose, she used a standard change-of-location task and a modified FB task that was set up in a language-switch scenario. Children in the latter task had to predict where a child protagonist would go to get ice-cream (i.e. to an ice-cream vendor or to a sandwich vendor) after she heard the ice-cream vendor say that he had run out of ice-cream but that the sandwich vendor still had some. Crucially, the protagonist in the story was a monolingual speaker of Romanian who did not understand the ice-cream vendor because he spoke Hungarian. The protagonist could therefore not use the ice-cream vendor’s message to correct her false belief that she could get ice-cream from the ice-cream vendor.

According to performance accounts of Theory of Mind development, bilingual children’s enhanced inhibitory control should allow them to outperform monolingual children in both the standard and the modified FB tasks (since both require inhibiting the TB response). In contrast, according to competence accounts, the bilingual experience should give bilingual children an advantage in the modified FB task (which was set up in the kind of language-switch scenario that bilingual children regularly experience) but not in the standard FB task. As predicted by the performance accounts, the 3-year-old bilinguals in Kovács (2009) outperformed their monolingual peers in both types of task.

It must be noted, however, that the modified FB task used by Kovács (2009) may have actually been easier for the monolinguals than for the bilinguals. Since the monolingual children in the study, like the protagonist, spoke only Romanian, they required a translation of what the ice-cream vendor said, which may have helped them take the protagonist’s perspective. In contrast, the bilingual children spoke both Romanian and Hungarian and therefore had to imagine (rather than experience) what it would be like not to understand what the ice-cream vendor said. A more balanced test would have involved a third language that neither of the two groups spoke, and such a modified FB task might have revealed an even greater bilingual advantage, which could be taken to support the competence accounts of Theory of Mind development.

In this review, I will challenge the view that bilinguals’ better performance in FB tasks is due to their better inhibitory control, and instead propose that it results from their more effective attention management. In defending this view, I will move away from Miyake et al.’s (2000) model and focus instead on a different aspect of EF. It must be noted, however, that, even if attention management is not a ‘component’ in Miyake et al.’s tripartite model, it is generally considered part of EF and as such has been investigated in connection with bilinguals’ enhanced EF (e.g. Bialystok & Martin, 2004; Carlson & Meltzoff, 2008; Colzato et al., 2008; Grundy & Bialystok, 2015).

Regarding the debate between competence and performance accounts of Theory of Mind development, this review will be mainly concerned with performance accounts, since the argument will be that bilingual children’s enhanced attention management helps them pass standard FB tasks at an earlier age. However, as I have pointed out before, the current experimental record does not rule out the alternative hypothesis that bilingual children may have a perspective-taking advantage, and possibly an earlier understanding of mental states, which could be independent from their enhanced EF. This alternative hypothesis will be explored in the last section of the paper as a new avenue for experimental research.

My challenge to the view that the bilingual FB advantage is due to their enhanced inhibitory control ties in with two independent lines of research. First, it offers a re-interpretation of bilinguals’ better FB performance in view of current studies on the role of attentional processes in FB tasks (Rubio-Fernández, 2013, 2015a, b; Rubio-Fernández & Geurts, 2013, 2015). Second, it puts bilinguals’ better FB performance in line with recent findings on their better performance in EF tasks (Costa et al., 2008; Martin-Rhee & Bialystok, 2008; Bialystok, 2010, 2015; Grundy & Bialystok, 2015). I will start by examining children’s focus of attention in standard FB tasks, and then challenge the general assumption that passing this type of task requires inhibiting a prepotent response.

Children’s focus of attention in standard FB tasks

Three-year-old children normally perform below chance level in standard FB tasks (Wellman et al., 2001), showing a reliable preference for the TB response. However, Rubio-Fernández and Geurts (2013, 2015) have recently shown that 3 year olds are able to pass a standard FB task named ‘the Duplo task’, with a success rate of 80%. The protocol for the Duplo task was a variation on the Sally-Anne task described in the Introduction. The experimenter used a set of Duplo toys (i.e. large Lego toys for small children) that she had on a table: a girl figure called Lola, a bunch of bananas, and two little cupboards. As in the standard task, Lola puts her bananas in one of the containers and leaves the scene. In the remainder of the task, only two sets of variations were introduced to the original paradigm, both intended to help the child stay focused on Lola’s perspective.

The first set of variations was introduced in the displacement phase of the task. First, it was ensured that the child could see Lola throughout the session. Rather than making the figure disappear, as is standardly done in change-of-location tasks, the experimenter made Lola walk in the direction of the child and turn her back on the scene. Also, rather than introducing a second character in the story, which might have resulted in the child losing track of the protagonist’s perspective, it was the experimenter herself who moved the bananas from one cupboard to the other. Finally, before and after the experimenter moved the bananas, she checked with the child whether Lola could see the experimenter from where she was: ‘Can Lola see me from over there?’ / ‘Lola hasn’t seen what I did, has she?’ The aim of these prompts was to keep the child’s attention focused on Lola during the displacement.

The second set of task variations was introduced in the test phase. When the experimenter returned Lola to the centre of the scene, rather than asking the child the standard FB question, she placed Lola in front of the two cupboards and asked the child whether he would like to play with Lola and continue the story. The experimenter then encouraged the child to take the lead by saying: ‘What happens next? What is Lola going to do now?’

The results of Rubio-Fernández and Geurts (2013, 2015) showed that both manipulations are critical for young children’s success in a standard FB task. Thus, making Lola disappear from the scene while the bananas are transferred to the other container (a normal manipulation in change-of-location tasks) has a negative effect on 3 year olds’ performance relative to making Lola walk away and turn her back on the scene. Likewise, mentioning the bananas in the test question (e.g. ‘Where will Lola look for her bananas?’ vs. ‘Where will Lola go now?’) draws children’s attention to the wrong response, with negative results. Rubio-Fernández and Geurts (2013, 2015) interpret these findings as evidence that perspective tracking is a continuous process that requires focusing on an agent throughout a series of events and therefore depends on attentional resources. This dependence on attentional resources makes perspective tracking susceptible to disruption by task manipulations, especially in young children.

It is worth noting that Rubio-Fernández and Geurts (2013, 2015) investigated features of task design that may hinder young children’s performance in FB tasks, but did not assume that 3 year olds pass the Duplo task by mentally representing the protagonist’s false belief. Alternative explanations based on low-level associative processes can also account for the results of these studies (for discussion, see Rubio-Fernández & Geurts, 2015). Therefore, the results of Rubio-Fernández and Geurts (2013, 2015) may in principle be taken to support performance accounts of Theory of Mind development, but the empirical evidence is not conclusive.

The relative salience of the two responses to a FB question

The results of Rubio-Fernández and Geurts (2013, 2015) suggest that mentioning the target object in the test phase increases the salience of the wrong response. For example, in one version of the Duplo task, the experimenter asked the child ‘What happens next? What is Lola going to do now?’, and the majority of 3 year olds continued the story by taking Lola to the empty container. In contrast, in the TB control, all children took Lola to the container with the bananas, suggesting that in both conditions Lola’s goal was to fetch the bananas. However, in another version of the Duplo task, the experimenter mentioned that Lola was hungry and wanted a banana, just before asking the child the same open questions. In this FB condition, the majority of children took Lola to the actual location of the bananas, thus failing the task.

The importance of the salience of the wrong response in FB tasks has also been observed with unexpected-contents tasks, in which only the wrong response is physically present in the setting. For example, in the Smarties task, there are no Smarties chocolates in the scene, only pencils. Early studies have shown that physically representing the two possible responses to an unexpected-contents task improves performance in younger groups, probably because it reduces the salience of the wrong response (Mitchell & Lacohée, 1991; Freeman & Lacohée, 1995).

The salience of the wrong response in standard FB tasks speaks to three sets of findings in the Theory of Mind literature. First, children under 4 perform at chance (rather than below chance) in unknown-location FB tasks in which the object is removed from the scene (Wimmer & Perner, 1983; Bartsch, 1996). Devine and Hughes (2014) have recently challenged a performance account in this connection: if Baillargeon et al. (2010) are right and young children fail standard FB tasks because the object’s actual location is a prepotent response to the test question, why is it that young children do not perform above chance level in unknown-location tasks? It is unclear what the prepotent response would be in those FB tasks since the child does not know where the object is. Moreover, without a prepotent response, it is unclear why passing unknown-location tasks would require response inhibition.

Rubio-Fernández and Geurts (2015) suggested an alternative explanation for this puzzle in view of their results with the Duplo task: in an unknown-location task, mentioning the target object in the FB question draws children’s attention towards the missing object and away from the protagonist, hence disrupting the process of perspective tracking. When this happens, the original location of the object (corresponding to the protagonist’s perspective) stops being in the child’s focus of attention. Thus, by making children focus on an object that has been removed from the scene, the standard FB question leaves 3 year olds to choose randomly between the two containers because the object is in neither.
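The distinction between performing below, at and above chance that runs through this discussion can be made concrete with an exact binomial test against a guessing rate of 0.5 in a two-choice task. The following is a minimal sketch, not an analysis from any of the studies cited: the function names, the 0.05 threshold and the counts are hypothetical and chosen purely for illustration.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k correct responses out of n when guessing with rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count under pure guessing."""
    observed = binom_pmf(k, n, p)
    tail = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, tail)


def classify(k, n, alpha=0.05):
    """Label a group's performance relative to chance in a two-choice FB task."""
    if two_sided_p(k, n) >= alpha:
        return "at chance"
    return "above chance" if k / n > 0.5 else "below chance"


# Hypothetical counts of correct responses out of 20 children (illustration only).
for label, correct in [("standard task", 4), ("unknown-location task", 10), ("modified task", 16)]:
    print(label, correct, "/ 20:", classify(correct, 20))
```

On this reading, a reliable preference for the TB response shows up as below-chance performance, whereas random choice between the two containers is statistically indistinguishable from guessing.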

The relative salience of the two possible responses to a FB question is also relevant to the interpretation of those Theory of Mind studies that have found that 3-year-old children are able to pass a change-of-location FB task provided they are asked where the protagonist will look first for the target object (Siegal & Beattie, 1991; Yazdi et al., 2006). According to this performance account, the reason young children fail standard FB tasks is that they do not understand the point of the test question (e.g. ‘Where will Sally look for her marble?’), and interpret it as a question about what the protagonist should do in the situation.

Like Siegal and Beattie (1991) and Yazdi et al. (2006), other researchers have pointed out that the standard FB question is pragmatically infelicitous in the context of the task (e.g. Hansen, 2010; Rubio-Fernández and Geurts, 2013; Helming et al., 2014). However, Siegal and Beattie’s assumption that 3 year olds pass the ‘look first’ version of the FB task because they derive a pragmatically rich interpretation of the test question is not unproblematic. For an adult participant, the question ‘Where will Sally look first for her marble?’ presupposes that (1) Sally will have to look for the object a second time because (2) she will not be able to find it the first time around. Whether 3-year-old children are able to derive such pragmatic inferences from the experimenter’s use of the adverb ‘first’ is far from obvious.

Even if 3-year-old children only had a shallow understanding of what ‘look first’ implies, they may nonetheless be able to perform better with this test question. Thus, the mention of the target object would draw children towards the wrong response, but the use of ‘first’ may allow them to consider the alternative response. In order to control for such ‘false positives’, Siegal & Beattie (1991) and Yazdi et al. (2006) used a TB condition on the assumption that, if children simply selected the alternative response to the object’s current location, then they should fail the ‘look first’ question in the TB condition. However, the 3 year olds in Siegal and Beattie (1991) and Yazdi et al. (2006) were able to pass this control task.

While the rationale for the use of a TB control is sound, the FB and TB conditions in Siegal & Beattie (1991) and Yazdi et al. (2006) were not comparable in all crucial respects. In the FB condition, the following story was acted out for the children using girl and kitten figures in a wooden house: ‘Jane wants to find her kitten. Jane thinks her kitten is in the kitchen. Jane’s kitten is really in the bathroom. Where will Jane look (first) for her kitten?’ The TB narrative was as follows: ‘Jane wants to find her kitten. The kitten lives in two rooms: the garage and the lounge. Jane thinks her kitten is in the garage and now it really is in the garage. Where will Jane look (first) for her kitten?’

Since all versions of the test question mentioned the kitten, the children’s attention would have been drawn to the kitten’s current location in all conditions (Rubio-Fernández & Geurts, 2015; Rubio-Fernández, 2015b). However, prior to the test question, the FB narrative made a clear contrast between the two possible responses to the question, potentially allowing the 3 year olds to select the alternative response in the ‘look first’ condition, even without a pragmatic enrichment of the test question. Unlike in the FB condition, the TB narrative did not highlight the contrast between the two possible responses prior to the test question, and merely highlighted the kitten’s current location, potentially priming this response in both the ‘look first’ and the standard FB question.

In a recent study using continuous eye-tracking during the processing of an indirect FB question (‘Let’s see where Martin comes out for his ball’), Rubio-Fernández (2015b) observed that 3-year-old children were able to correctly anticipate the protagonist’s return (see also Clements & Perner, 1994; Ruffman et al., 2001). However, the mention of the target object in the question had a disrupting effect on the 3 year olds, though not on the 5-year-old children in the control group. Future eye-tracking studies should monitor children’s eye movements during the processing of standard and ‘look first’ questions in FB and TB narratives that control for the relative salience of the two responses, as this will give us a better understanding of the effect of the ‘look first’ question on 3 year olds’ FB reasoning.

Thirdly and finally, the relative salience of the wrong response in standard FB tasks also speaks to the view that inhibitory control is required to pass these tasks. Helming et al. (2014) have recently challenged Baillargeon et al. (2010) in this regard: if these authors are correct and the actual location of the object is a prepotent response in FB tasks, there is no principled reason why 3-year-old children should be able to pass the Duplo task (Rubio-Fernández and Geurts, 2013, 2015), since children in this FB task know where the object was hidden.

Supporting the view that passing standard FB tasks requires response inhibition, a large number of studies have found a correlation between children’s performance in FB and inhibitory control tasks (e.g. Carlson & Moses, 2001; Carlson et al., 2002; Benson et al., 2013). The results of these studies are taken to support the performance accounts of Theory of Mind development. However, because most of these studies used standard FB tasks in which the incorrect response is normally more salient than the correct response, the need for inhibitory control in passing FB tasks is likely to have been artificially increased by features of task design. It is therefore an open empirical question whether the correlation between FB reasoning and inhibitory control would hold if modified FB tasks were used that controlled for the relative salience of the erroneous response in the test phase (e.g. Rubio-Fernández and Geurts, 2013, 2015; Rhodes & Brandone, 2014).

This is an important challenge on both methodological and theoretical grounds, since the correlation between FB and EF tasks has been taken beyond the specific paradigms used in the studies, and as general evidence that EF development sustains Theory of Mind development (for discussion, see Perner & Lang, 1999). If that is indeed the case, then the correlation between the two types of task ought to hold regardless of features of task design such as the relative salience of the erroneous response.

Why do bilingual children perform better in FB tasks?

Since most of the Theory of Mind studies that compared bilingual and monolingual children used standard FB tasks (Goetz, 2003; Berguno & Bowler, 2004; Kovács, 2009; Nguyen & Astington, 2014; Gordon, 2016; but cf. Bialystok & Senman, 2004; Kovács, 2012), the bilingual children in these studies may not have necessarily shown better inhibition of the dominant TB response, as is generally assumed. Because standard FB tasks do not ensure that the protagonist’s perspective is salient throughout the narrative or that the two responses to the test question are equally salient, what bilingual children may have shown is a better ability to stay tuned to the protagonist’s perspective when task manipulations extraneous to FB reasoning (e.g. making the protagonist disappear from the scene) disrupt the process of perspective tracking in monolingual children.

In line with this hypothesis, Nguyen and Astington (2014) have recently reported that bilingual children’s FB performance is not predicted by their performance in a conflict-inhibition task. Instead, FB performance correlated with children’s working-memory capacity, as measured by the backward word-span task (which requires both holding in mind and manipulating information). These results are compatible with those found by Carlson and Meltzoff (2008), who reported that bilingual children did not differ from monolingual children in their response inhibition (i.e. in tasks requiring control over competing responses), but did show an advantage in tasks requiring working memory and interference suppression (i.e. complex tasks that require control over attention to competing cues). Likewise, Bialystok and Martin (2004) reported that bilingual children performed better than monolingual children in a dimensional change card-sort task that required inhibition of attention to an obsolete representation, but were not generally better at inhibiting a prepotent response.

Recent studies on bilingualism and EF have shown that bilingual adults do not differ from their monolingual peers in terms of active inhibition but have a better ability to maintain action goals and select goal-relevant information from competing, goal-irrelevant information (Colzato et al., 2008). More specifically, recent studies have shown that bilingual adults perform better on EF tasks that require disengaging one’s attention from distractor cues, as measured by ‘sequential congruency effects’ or the magnitude of conflict generated by a previous trial (Grundy & Bialystok, 2015; see also Mishra et al., 2012). Assuming that bilingual children also make more effective use of their attentional resources (Bialystok & Martin, 2004; Carlson & Meltzoff, 2008), they should be less affected by the mention of the target object in the FB question, for example, thus being better able to revert to the protagonist’s perspective after a momentary distraction.
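As an illustration of how a sequential congruency effect can be computed, the sketch below assumes the common operationalisation in which the congruency effect after incongruent trials is subtracted from the congruency effect after congruent trials. Whether this matches the exact measure used by Grundy and Bialystok (2015) is an assumption; the function name and the reaction times are hypothetical and are not data from any study.

```python
from statistics import mean


def sequential_congruency_effect(trials):
    """Compute (cI - cC) - (iI - iC) from an ordered list of trials.

    trials: list of (is_congruent, rt_ms) tuples in presentation order.
    The lower-case letter is the previous trial's congruency and the
    upper-case letter is the current trial's congruency. All four
    transition types must occur at least once, otherwise mean() raises.
    """
    cells = {"cC": [], "cI": [], "iC": [], "iI": []}
    # Pair each trial with its predecessor; the first trial has none and is skipped.
    for (prev_congruent, _), (curr_congruent, rt) in zip(trials, trials[1:]):
        key = ("c" if prev_congruent else "i") + ("C" if curr_congruent else "I")
        cells[key].append(rt)
    m = {key: mean(rts) for key, rts in cells.items()}
    return (m["cI"] - m["cC"]) - (m["iI"] - m["iC"])


# Hypothetical trial sequence: (congruent?, reaction time in ms).
trials = [
    (True, 500.0),
    (False, 620.0),
    (False, 580.0),
    (True, 530.0),
    (True, 500.0),
    (False, 620.0),
    (True, 530.0),
]
print(sequential_congruency_effect(trials))
```

Under this definition, a smaller value indicates that the current trial is less affected by the congruency of the previous one, which is how more efficient disengagement of attention from distractor cues would appear in the data.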

Kovács (2012) discusses a study that supports this hypothesis: bilingual children outperformed monolingual children in a standard FB task but not in a modified task in which the target object had been removed from the scene. These results suggest that young bilingual children may be less distracted by the mention of the target object in the test question of standard FB tasks.

Bilingual adults also perform better in a standard FB task

Because the Sally-Anne task was originally designed for preschool children, neurotypical adults perform at ceiling in this task. However, when Rubio-Fernández and Glucksberg (2012) combined the Sally-Anne task with an eye-tracking technique that allowed them to measure first-fixation accuracy, delay of first fixation on target and response times, they found that adults suffered from an early TB interference when processing the FB question. Specifically, adults showed a general tendency to first look at the container that hid the object before switching their attention to the empty container and giving the correct response. Bilingual adults suffered less interference than did monolinguals, with rates of accurate first-fixation of 0.57 and 0.26, respectively.

Ryskin et al. (2014) have argued that the results of Rubio-Fernández and Glucksberg (2012) are difficult to interpret because of known delays in bilingual linguistic processing: “At the time when monolinguals were interpreting the critical test question that queried their understanding of false-belief, bilinguals may have been processing an earlier part of the sentence that mentioned the target object and this may have guided their eye fixations, rather than better understanding of false belief” (pp.47-48, authors’ emphasis; original source: Ryskin, 2012:5). Since Rubio-Fernández and Glucksberg (2012) did not specify in their paper what their participants had heard prior to the test question, this criticism is entirely speculative. As it turns out, the prior mention of the target object was 6 s before the critical test question and, as Ryskin and colleagues admit at the end of their paper, the bilingual processing delays that have been reported in the literature are ‘subtle’ (2014:66). Since bilingual speakers should not need an extra 6 s to process the sentence preceding the test question, Ryskin et al.’s (2014) criticism of Rubio-Fernández and Glucksberg (2012) is unfounded.

Rubio-Fernández (2013) used an extended version of the paradigm devised by Rubio-Fernández and Glucksberg (2012) to investigate the demands of direct and indirect FB tasks on monolingual adults (i.e. whether having to answer a test question, as opposed to simply listening to a FB narrative, has an effect on performance). As in the original bilingual study, Rubio-Fernández (2013) used a ‘visually disrupted narrative’ (see Table 1) in which the containers momentarily disappeared from the scene prior to the protagonist’s return. This manipulation was designed for a more accurate measure of first-fixation direction (i.e. to prevent participants from fixating on one of the containers prior to the test question). However, the results of Rubio-Fernández (2013) showed that the visual disruption of the scene had a negative effect on adults’ performance, both in the direct and indirect versions of the Sally-Anne task. In fact, when the containers were left on the scene throughout the test phase, monolingual adults did not suffer from a TB interference, contrary to what Rubio-Fernández and Glucksberg (2012) had observed.

Table 1 Cartoon slides corresponding with the test phase of the Sally-Anne task used by Rubio-Fernández and Glucksberg (2012) with monolingual and bilingual adults (Visually Disrupted – Direct Test), by Rubio-Fernández (2013) with monolingual adults (Visually Disrupted/Continuous × Direct/Indirect Test) and by Rubio-Fernández (2015a) with monolingual adults (Visually Disrupted/Continuous – Direct Test)

These new findings were interpreted as evidence that the two containers in a change-of-location FB task represent two different perspectives on the location of the object: the outdated (corresponding to the protagonist’s) and the updated (corresponding to the participant’s). Since both representations of the object compete for attention during language processing (Altmann & Kamide, 2009), adults momentarily fell back on their own perspective when their focus on the protagonist’s was disrupted by the sudden disappearance of the containers. Rubio-Fernández (2013) concluded that perspective tracking is dependent on attentional resources and can therefore be disrupted by subtle task manipulations, even in adults.

This conclusion was further supported by a recent study that replicated the original findings with monolingual adults (Rubio-Fernández, 2015a). Crucially, when participants were first habituated to the momentary disappearance of the containers in a TB trial, they did not show a disruption of their perspective tracking when processing the FB question in the second trial. These results confirm that the ‘egocentric bias’ observed with this experimental paradigm results from a disruption of the participants’ focus of attention on the protagonist.

Bringing together bilinguals’ better performance in FB and EF tasks

Given these new findings, what should we conclude about the source of bilinguals’ better FB performance in Rubio-Fernández and Glucksberg (2012)? The original results were interpreted as evidence for an egocentric bias in adult social cognition (see, e.g., Keysar et al., 2000, 2003). This interpretation was in line with performance accounts of Theory of Mind development, which posit that adults suffer from a true-belief default in FB reasoning (e.g. Leslie et al., 2005), or that the TB response in FB tasks is a dominant response that needs to be actively inhibited (e.g. Carlson & Moses, 2001; Baillargeon et al., 2010). In a recent investigation of these performance accounts, Rubio-Fernández (2015a) used a lingering-inhibition measure in a standard FB task with monolingual adults and showed that these participants passed the task without inhibiting the TB response. In contrast, response inhibition was observed in the control condition, which used a negated question (‘Where isn’t Sally’s marble?’) and did reveal inhibition of the positive response.

What the results of Rubio-Fernández (2013, 2015a) suggest is that the bilingual participants in Rubio-Fernández and Glucksberg (2012) were not better at inhibiting the TB response, but were better at staying focused on the protagonist’s perspective when the containers disappeared from the scene. This interpretation is still in line with performance accounts of FB reasoning, but points at a different aspect of EF as responsible for bilinguals’ better performance in FB tasks. The re-interpretation of bilinguals’ FB advantage as a result of their enhanced attention management parallels the re-interpretation of bilinguals’ better performance in EF tasks that has been proposed by Ellen Bialystok, among others, in view of recent studies on bilingual cognition.

It had originally been hypothesised that speaking more than one language required suppressing all but the currently selected language (Green, 1998; Bialystok, 2001). More recent studies, however, have revealed that bilinguals activate information about both languages when using one language alone (for a review, see Kroll & Bialystok, 2013). Since bilingual language production requires constant monitoring of the target language in order to minimise interference from the competing language, bilinguals’ EF is strengthened over time (Bialystok, 2010; Bialystok & Craik, 2010). Even so, studies of EF have not revealed a bilingual advantage that is specific to inhibitory control, or any other single component of EF (for reviews, see Hilchey & Klein, 2011; Bialystok, 2015).

According to Bialystok, “the bilingual advantage is not in inhibition; rather it is the failure of bilinguals to inhibit attention to the non-target language that leads to the involvement of executive function and the eventual consequences for its development and function” (2015:4; original emphasis). This view explains why bilinguals and monolinguals often perform comparably in simple tasks tapping a single component of EF, whereas bilinguals tend to outperform monolinguals in complex tasks tapping broader reasoning abilities and often including conflicting information (Bialystok, 2015). In line with this interpretation of how the bilingual experience affects EF, recent studies have shown that bilinguals outperform monolinguals in tasks that require interference suppression (i.e. managing conflicting attentional demands) but not in tasks requiring response inhibition (i.e. control over conflicting responses; see, e.g., Colzato et al., 2008; Costa et al., 2008; Martin-Rhee & Bialystok, 2008; Bialystok, 2010; Luk et al., 2010). Moreover, a recent study has shown that the advantage observed in these tasks extends to sentence processing, as shown by bilinguals’ better syntactic ambiguity resolution (Teubner-Rhodes et al., 2016).

The role of attentional processes in bilinguals’ enhanced performance in EF tasks is highlighted by studies with bilingual-to-be infants, who show a greater ability to switch responses in visual-orientation tasks than infants exposed to a single language (Weikum et al., 2007; Kovács & Mehler, 2009a, 2009b; Sebastián-Gallés et al., 2012). Infants as young as 4 months are able to discriminate between two similar languages (Bosch & Sebastián-Gallés, 1997, 2001), and Kovács (2012) argues that these early discriminatory abilities suggest that bilingual-to-be infants start switching attention between the languages they are exposed to well before they acquire these languages and must start code switching in their language production. Along similar lines, Bialystok (2015) argues that the bilingual experience may change the way attentional resources are deployed from a very early age, with bilingual-to-be infants attending more carefully to subtle differences in their environment (e.g. differences between the phonology, prosody, vocabulary and syntax that characterise the different linguistic systems to which they are exposed).

In summary, Bialystok (2015) (see also Martin-Rhee & Bialystok, 2008; Bialystok, 2010; Luk et al., 2010) has recently re-interpreted bilinguals’ better performance in complex EF tasks as evidence for their more effective attention management, rather than an advantage in inhibitory control. This interpretation of the EF data supports my re-interpretation of bilinguals’ better FB performance as a result of bilinguals’ greater ability to resist distraction in those tasks.

Could bilingualism help Theory of Mind development?

The main debate in the last 30 years of Theory of Mind research has been whether children under 4 fail standard FB tasks because they do not yet have a concept of false belief, or because they lack the necessary EF to inhibit the TB response. In this review, I have challenged the view that passing FB tasks requires inhibiting the TB response and argued instead that bilinguals’ better FB performance results from a better management of their attentional resources. In this view, bilinguals’ enhanced performance in FB tasks is simply a by-product of their enhanced EF. However, this view is based on a detailed analysis of the task designs that have been used to investigate FB reasoning (Rubio-Fernández & Geurts, 2013, 2015), and is not incompatible with the possibility that other aspects of the bilingual experience may yield further gains in social cognition.

Bilinguals’ better FB performance could in principle be related to both aspects of FB reasoning since the bilingual experience is likely to result in a complex Theory of Mind advantage (for discussion, see Goetz, 2003; Berguno & Bowler, 2004; Kovács, 2009, 2012; Rubio-Fernández & Glucksberg, 2012; Fan et al., 2015; Gordon, 2016). For example, bilinguals’ early sociolinguistic awareness of their interlocutor’s language background (Genesee et al., 1995; Petitto et al., 2001; Comeau et al., 2007) may allow young children to appreciate that other people’s perspectives can be different from their own at an earlier age than monolingual children. Therefore, while the argument that bilinguals benefit from more effective attention management in FB tasks could be taken to support performance accounts of Theory of Mind development, compatible evidence that other aspects of the bilingual experience boost bilinguals’ perspective-taking abilities would support competence accounts of Theory of Mind development.

Even though it has often been discussed in the literature that the bilingual experience may facilitate Theory of Mind development by presenting the child with more opportunities for perspective taking, this hypothesis has not yet been tested independently from bilinguals’ enhanced EF. This is unfortunate for two reasons, the first being that competence accounts of Theory of Mind development deserve a fair test in bilingual cognition research, or as fair a test as that of performance accounts. The second reason is that, by investigating other aspects of bilingual cognition that may not be so directly dependent on their enhanced EF (e.g. their pragmatic ability or their perspective-taking skills in communication), our understanding of the bilingual experience and its cognitive effects would also be broadened. As a first step in addressing these issues, in the remainder of the paper I will discuss some possible avenues for future research on bilinguals’ Theory of Mind development.

In order to investigate whether (or to what extent) bilingualism may help Theory of Mind development independently from EF, future studies comparing bilingual and monolingual children should use FB tasks that rely less heavily on EF. The kind of non-verbal FB tasks that have been used with infants (e.g. Southgate et al., 2007; Kovács et al., 2010; Senju et al., 2011) and the implicit Theory of Mind tasks that have been used with children (i.e. eye-tracking FB tasks without a test question; e.g. Clements & Perner, 1994; Ruffman et al., 2001; Rubio-Fernández, 2015b) would in principle be a good test case for the hypothesis that bilingualism may help Theory of Mind development.

According to Apperly and Butterfill (2009), infants as young as 7 months of age are able to pass non-verbal FB tasks by relying on a cognitively efficient but inflexible capacity for tracking belief-like states. Because this early Theory of Mind system is not supposed to be dependent on domain-general executive processing capacities, young bilingual children may reveal an advantage in implicit FB tasks that is due to their advanced perspective-taking abilities (rather than to their enhanced EF). It must be noted, however, that Schneider et al. (2012) have recently found that adults’ performance on an implicit FB task was affected by cognitive load, contrary to what Apperly and Butterfill (2009) would predict. More research is therefore needed to elucidate the extent to which implicit FB reasoning relies on EF.

Having said that, it seems safe to assume that implicit FB tasks are less dependent on EF than standard FB tasks because early eye-tracking studies have shown that 3-year-olds are able to correctly anticipate the behaviour of a mistaken character in a FB narrative, while giving the incorrect response to the test question (Clements & Perner, 1994; Ruffman et al., 2001; cf. Rubio-Fernández, 2015b). Future studies should therefore compare the performance of bilingual and monolingual children on implicit FB tasks, as a bilingual advantage in this type of task could reveal a more specific effect of bilingualism on Theory of Mind development.

Enhanced perspective-taking abilities may also lead to a pragmatic advantage in bilingual speakers. For example, Siegal et al. (2009, 2010) have observed that bilingual children between the ages of 3 and 6 have a better conversational understanding than their monolingual peers, as indicated by their greater sensitivity to violations of conversational maxims (i.e. speakers must be informative while avoiding redundancy, and they must speak the truth and be relevant and polite; Grice, 1975). Such a pragmatic advantage is unlikely to be related to bilingual children’s enhanced EF, and is more likely to result instead from their more sophisticated perspective-taking abilities.

A well-known perspective-taking task is the Director task, in which a participant follows the instructions of a confederate to move various objects in a vertical grid of squares. The confederate sits on the other side of the grid and cannot see all the objects because some of the cells are occluded on her side. Crucially, the confederate is supposed to be ignorant about the contents of those cells, and when she asks the participant to ‘move the small candle’, for example, the smallest of three candles is only visible to the participant. Over a long series of studies, participants have shown a tendency to consider the smallest candle before reaching for the medium-sized one, sometimes even reaching for the smallest candle in their privileged view (e.g. Keysar et al., 2000, 2003; Barr, 2008; but cf. Hanna et al., 2003; Heller et al., 2008).

Wu and Keysar (2007) have reported that participants from collectivistic cultures perform better in the Director task than participants from individualistic cultures, as they suffer less interference from their own perspective when processing the Director’s instructions. There is, however, a potential confound in the study by Wu and Keysar: their participants with a collectivistic background were bilingual Chinese students from the University of Chicago, thus potentially performing better than the American students because of their enhanced EF. It has indeed been shown that performance in the Director task is dependent on EF as participants have to selectively focus their attention on the objects that the Director can see on the grid (Brown-Schmidt, 2009; Lin et al., 2010; Symeonidou et al., 2016; Rubio-Fernández, 2016).

Wu et al. (2013) re-analysed the eye-tracking data from Wu and Keysar (2007) and found that the bilingual Chinese students suffered a similar interference from their own perspective as the American students. However, the bilingual students were better at correcting this interference later in their language processing. This re-analysis supports the view that the results of Wu and Keysar (2007) reveal an EF advantage of bilingual participants in their sample, and not necessarily a difference in perspective between East Asian and Western cultures, as claimed by Wu and Keysar (2007) and Wu et al. (2013).

Supporting this re-interpretation of the results of Wu and Keysar (2007), Fan et al. (2015) found that both bilingual children and children exposed to a multilingual environment who were not bilingual themselves performed better than monolingual children in the Director task. The better performance of children exposed to a multilingual environment relative to monolingual children suggests that learners of a second language may also reveal a similar advantage. Sullivan et al. (2014) have recently observed that even early-stage second-language learning improves EF in university students, and Bialystok and Barac (2012) report that the time spent in a language immersion program predicts children’s performance in non-verbal EF tasks. Future studies should investigate what degree of exposure to a second language is necessary in order to show a performance advantage in the Director task.

However, as in the case of standard FB tasks, bilinguals’ pragmatic abilities should also be investigated in perspective-taking tasks that do not rely so heavily on EF. One such task was recently proposed by Rubio-Fernández (2016), who designed a new version of the Director task that measures whether participants are able to update their representation of the speaker’s perspective during the task (rather than whether they can inhibit their own perspective to avoid interference). This type of task could in principle reveal a bilingual advantage in perspective taking during referential communication, which would be independent from their EF.


Addressing now the question in the title of the paper, I have argued that bilinguals outperform monolinguals in FB tasks because they are more efficient at managing their attentional resources. In this sense, both bilingual children and adults appear to be less susceptible to distraction in FB tasks and therefore more able to keep track of the protagonist’s perspective during a FB narrative. While bilinguals’ enhanced attention management helps them succeed in FB tasks, the possibility that the bilingual experience may also lead to other gains in social cognition development has not yet been explored experimentally. Future studies will hopefully investigate these two hypotheses and give us a better understanding of the effects of bilingualism on Executive Function, Theory of Mind and the relationship between the two.