Introduction

Research investigating the possibility that bilingualism modifies cognitive function and brain structure has increased rapidly in recent years. A slight increase in citations to work on “bilingualism” between 2000 and 2010 turned into a steep incline that continues to the present. Part of the reason for the increased interest in this research is the controversy that has arisen around some of its central claims, creating a lively debate in the literature. The debate centers on whether the ongoing experience of managing two languages is associated with improved performance on a set of nonverbal cognitive tasks that are typically described as involving executive functions (EFs). There are many reviews of this research (Antoniou, 2019; Baum & Titone, 2014; Bialystok, 2017), and meta-analyses that endorse both positive (Adesope et al., 2010; Grundy, 2020; Grundy & Timmer, 2017; van den Noort et al., 2019) and null (Donnelly et al., 2019; Lehtonen et al., 2018) results. In all these meta-analyses, there is a small but significant effect size, usually around 0.20, that researchers are rightly cautious about accepting as evidence for better bilingual performance. However, this effect size is similar to that found for the effect of physical exercise on cognitive outcomes, typically between 0.10 and 0.25 (Chang et al., 2012; Etnier et al., 1997), an effect that is not considered to be controversial. Others have argued that the controversy can be decided by large data sets, but here, too, the evidence falls on both sides. Nichols and colleagues (Nichols et al., 2020) reported that in a sample of over 11,000 adults performing a variety of online cognitive tasks, there was no difference between those who claimed to speak more than one language and those who reported speaking only one language; Dick et al. (2019) reported a similar outcome in an analysis of a database of around 4,500 children.
In contrast, in a study of over 18,000 children performing executive tasks, those from bilingual homes outperformed those from monolingual homes (Hartanto et al., 2019). Despite several commentaries that have attempted to reconcile the conflicting outcomes (Bak, 2016; Bialystok, 2016; Valian, 2015), what the debate makes clear is that the issue is complex and multifaceted, and a simple binary conclusion is unwarranted. Given the rich body of evidence that reveals positive effects of bilingualism, there must be some relation between bilingualism and these behavioral outcomes, but the evidence documenting equivalent performance between language groups in many studies suggests that other factors modulate these effects.

The debate is currently deadlocked over the question of which results are more valid – those that show positive effects of bilingualism or those that show no difference between groups – with new evidence for each side being added regularly. If there were truly no relation between bilingualism and cognitive performance and the positive findings were Type I errors, as has been suggested (Paap & Greenberg, 2013), then there should be a similar number of false positives in which monolinguals outperformed bilinguals on nonverbal tasks. To our knowledge such cases are extremely rare, suggesting strongly that there is a positive effect of bilingualism that needs to be understood. Others have argued that null results have been underestimated because of publication bias that favors positive results (de Bruin et al., 2015), but that argument speaks to the ratio of positive and null results, not to the validity of the positive ones. Even if positive results are over-represented in the empirical record, they require an explanation. Furthermore, a large portion of these studies use conflict tasks such as Stroop, flanker, and Simon in which the dependent variable is the reaction time difference between congruent and incongruent trials. However, as Draheim et al. (2019) point out, such scores are not appropriate for correlational studies in which participants are not randomly assigned to groups because the correlation between the two scores decreases the reliability of the difference between them. The absence of group differences in these scores may therefore be attributed in part to the low reliability of the measure.
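
The reliability concern raised by Draheim et al. (2019) can be illustrated with the classical test theory expression for the reliability of a difference score. The following is a minimal sketch, assuming equal variances for the two component scores (for example, mean RT on congruent and incongruent trials); the function name and all numerical values are hypothetical and are not taken from any of the studies cited here.

```python
def difference_score_reliability(rel_x: float, rel_y: float, corr_xy: float) -> float:
    """Reliability of the difference D = X - Y under classical test theory.

    Assumes X and Y have equal variances (an illustrative simplification).
    rel_x, rel_y: reliabilities of the two component scores.
    corr_xy: correlation between the two component scores.
    """
    if not -1.0 < corr_xy < 1.0:
        raise ValueError("corr_xy must lie strictly between -1 and 1")
    return ((rel_x + rel_y) / 2.0 - corr_xy) / (1.0 - corr_xy)


# Hypothetical values: both component scores are highly reliable (0.90).
# Only the correlation between congruent and incongruent RTs changes.
weakly_correlated = difference_score_reliability(0.90, 0.90, 0.20)
strongly_correlated = difference_score_reliability(0.90, 0.90, 0.80)

print(f"r_xy = 0.20 -> difference reliability = {weakly_correlated:.3f}")
print(f"r_xy = 0.80 -> difference reliability = {strongly_correlated:.3f}")
```

Under these assumed values, raising the correlation between the two trial types from 0.20 to 0.80 lowers the reliability of the difference score from about 0.88 to about 0.50, even though each component score remains equally reliable. Because congruent and incongruent RTs are usually strongly correlated within individuals, this is the situation that conflict-effect scores are likely to face.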

One well-established difference between bilingual and monolingual groups is that monolinguals have superior verbal knowledge, at least as measured in one of the bilinguals’ languages, and so typically outperform bilinguals on verbal tasks. This situation may result in a trade-off between verbal knowledge and attentional control such that monolinguals show superior performance on a verbal processing task, but bilingual participants are superior on a similar task constructed with nonverbal materials (Luo et al., 2013). There is little controversy about these findings.

Resolution of the contradictory findings for nonverbal cognitive tasks is important, however, because the implications of positive effects are of great consequence. Evidence for better performance on attention tasks by infants raised in bilingual homes in the first year of life (Kovacs & Mehler, 2009) sets the stage for differences in subsequent developmental trajectories; precocious performance on EF tasks by school-aged bilingual children (Barac et al., 2014) may have longer-term implications given that EF is related to academic success and lifelong well-being. There is also evidence that older bilingual adults maintain cognitive levels better than monolinguals and show symptoms of dementia several years later than monolinguals (Alladi et al., 2013; Bialystok et al., 2007; Woumans et al., 2015), a delay that creates more time for independent living and reduces health-care costs. Because of these far-reaching implications, it is important to clarify the nature of the effect of bilingualism on cognitive performance and to detail the preconditions and limitations for those effects.

Why should bilinguals show a cognitive benefit?

Why should bilingualism be associated with enhanced cognitive control? A large body of psycholinguistic research has shown that both languages are always active in the bilingual brain, despite the absence of any conscious awareness of the non-used language (Costa et al., 1999; Francis, 1999; Kroll et al., 2014; Marian & Spivey, 2003; Wu & Thierry, 2010). Because bilinguals rarely commit intrusion errors from the unwanted language, inhibitory control seemed to be an obvious mechanism for excluding the non-target language from ongoing processing (Liu et al., 2016; Martin-Rhee & Bialystok, 2008; Misra et al., 2012; Philipp & Koch, 2009). Evidence from brain imaging demonstrated that overlapping networks were used for language selection and nonverbal selection (review in Wong et al., in press). On the assumption that lifelong bilinguals have had many years of flexibly deploying inhibitory control in language processing, and that these processes are at least partly shared with nonverbal cognitive networks, the interpretation has been that inhibitory processes are strengthened in such individuals. The further step in the argument was that this mode of control then generalizes in bilinguals to apply to any situation in which it is beneficial to select one source of information while suppressing attention to competing sources that would disrupt performance. In this account bilingual speakers enjoy a sort of spillover from their language experience that acts to enhance general processes of cognitive control.

This assumption that inhibition is the key element of bilingual experience was the basis for several influential models of bilingual functioning. A detailed account of how inhibition could be the bridge between bilingual language use and cognitive outcomes was provided by Green (1998). In his Inhibitory Control model, conceptual ideas and task goals are mediated by the supervisory attentional system to activate language task schemas at a lower level. In turn, the activated schemas coordinate into “functional circuits” that exert control by activating and inhibiting relevant lexical-semantic representations for appropriate verbal outputs. In a further development, called the Adaptive Control Model, Green and Abutalebi (2013) pointed out that the need for inhibition differs depending on the linguistic environment shared by speakers and listeners. The authors identify three interactional contexts for bilingual language use and consider the implications of each for cognitive (and brain) outcomes; the contexts differ in how the two languages are used and the demands each places on mechanisms of cognitive control. Bilinguals typically find themselves primarily in one of these contexts, so their long-term experience will impact the underlying processes specific to that context, leading to different outcomes for bilinguals whose interactional experiences differ. The three contexts are: single language, in which each language is used in a unique context, such as one language at home and another at work; dual language, in which both languages are used in the same context but with different speakers who may speak only one of the bilingual’s languages; and dense code-switching, in which the languages are mixed across other bilingual speakers in the same context.
The authors’ claim is that eight control processes adapt differentially to these contexts in monolingual and bilingual speakers; the processes are goal maintenance, conflict monitoring, interference suppression, salient cue detection, selective response inhibition, task disengagement, task engagement, and opportunistic planning. This scheme, reproduced in Table 1, shows the differential demands on each process as a function of the interactional context. The + signs in the table indicate that a specific context increases the demand on a specific control process more for bilingual than for monolingual speakers; the = signs indicate that a specific interactional context has an equivalent effect on a specific control process for the two classes of speaker.

Table 1 Demands on language control processes in bilingual speakers as a function of the interactional context relative to demands on the processes in monolingual speakers in a monolingual context. From Green and Abutalebi (2013)

An important implication of the Green and Abutalebi model is that not all bilingual speakers are expected to show enhanced control abilities, and this point may go some way to understanding the failures to find bilingual control benefits in some studies. Instead, potential modifications to cognitive systems depend on the type of bilingual experience in which individuals have been engaged. Although the model was developed as a theoretical exercise, emerging evidence supports its predictions (Beatty-Martinez et al., 2019; Hartanto & Yang, 2016; Ooi et al., 2018). Similarly, Gullifer and colleagues have proposed “language entropy” as a measure of the complexity of the social contexts in which each language is used (Gullifer & Titone, 2020), again tying the cognitive outcomes of bilingualism to the way the two languages are used. These studies have demonstrated that greater entropy is associated with better outcomes in EF tasks (Gullifer et al., 2018; Gullifer & Titone, 2021).
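
To make the notion of language entropy concrete, the following is a minimal sketch that assumes the measure is computed as Shannon entropy over the proportions with which each language is used in a given social context. The function name and the example proportions are hypothetical illustrations, not values or code from Gullifer and Titone.

```python
import math
from typing import Sequence


def language_entropy(usage: Sequence[float]) -> float:
    """Shannon entropy (in bits) of language use within one context.

    usage: non-negative amounts or proportions of use, one per language.
    Values are normalized to sum to 1 before entropy is computed.
    """
    if any(u < 0 for u in usage):
        raise ValueError("usage values must be non-negative")
    total = sum(usage)
    if total <= 0:
        raise ValueError("at least one language must have non-zero use")
    entropy = 0.0
    for u in usage:
        p = u / total
        if p > 0:  # a language that is never used contributes nothing
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical contexts for a speaker of two languages.
compartmentalized = language_entropy([1.0, 0.0])  # only one language used
balanced = language_entropy([0.5, 0.5])           # both used equally
skewed = language_entropy([0.9, 0.1])             # mostly one language

print(compartmentalized, balanced, round(skewed, 3))
```

On this assumed definition, entropy is zero when a single language is used exclusively in a context and reaches its maximum (1 bit for two languages) when both languages are used equally, so the measure treats bilingual experience as graded rather than dichotomous.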

Other factors have also been shown to modulate the relation between bilingual language use and cognitive outcomes. These include early versus late bilingualism (Pelham & Abrams, 2014; Vega-Mendoza et al., 2015), child versus adult bilinguals (Bialystok et al., 2005; Dash et al., 2019), and language switchers versus non-switchers (Festman et al., 2010; Prior & Gollan, 2011, 2013; Verreyt et al., 2016). The nature and degree of bilingual experience is also a factor: Recent studies have examined bilingualism as a continuum along monolingual to bilingual experience rather than as a dichotomy and shown a significant positive relation between the conditions or extent of language experience and cognitive and brain outcomes (Calabria et al., 2020; DeLuca et al., 2020; Hervais-Adelman et al., 2018; Novitskiy et al., 2019; Pot et al., 2018; Sulpizio et al., 2020). Extending this idea, simultaneous interpreters can be considered “super bilinguals” in that they continually manage two languages in online processing. Several studies have shown greater cognitive and brain outcomes in this group than in comparable multilinguals (Hervais-Adelman et al., 2015; Yudes et al., 2011). The detailed relations uncovered in these studies undermine any conclusions from binary procedures to classify participants in terms of their response to a simple question about how many languages they speak (e.g., Dick et al., 2019; Nichols et al., 2020). The new approaches have refined our understanding of the relation between bilingualism and cognition by identifying relevant moderating factors.

Factors potentially confounded with bilingualism

In addition to factors that may mediate effects of bilingualism described above, some researchers have suggested that factors associated with bilingualism may in fact be responsible for the reported outcomes. One such factor that has been suggested is socioeconomic status. For example, Morton and Harper (2007) argued that bilingual advantages in cognitive control may actually reflect superior SES backgrounds in bilingual children and adults. It is a compelling argument because it is well established that high SES is associated with better EF outcomes (Farah et al., 2006). However, studies with children (Engel de Abreu et al., 2012; Grote et al., 2021) and adults (Nair et al., in press) have carefully controlled for SES differences and still found superior performance in bilingual groups. Studies that have manipulated both bilingualism and SES have reported effects for both factors with no evidence of confound (Bialystok & Shorbagi, 2021; Calvo & Bialystok, 2014; Krizman et al., 2016).

Other studies have suggested that bilingual benefits may reflect the larger number of immigrants in the bilingual groups, based on claims that immigrants have superior cognitive abilities (Fuller-Thomson & Kuh, 2014). However, better performance by bilinguals has been reported in studies where participants in both language groups were citizens of one country and none were immigrants (Alladi et al., 2013; Costa et al., 2008). In other studies, immigrant and non-immigrant bilinguals were compared and there were no differences between these subgroups, both showing the same effect of bilingualism over monolinguals (Bialystok et al., 2007; Schweizer et al., 2013).

Finally, differences in cultural background have been suggested as another confounding factor (Hilchey & Klein, 2011; Oh & Lewis, 2008). However, although children from East Asian countries often perform better on tests of attentional control than do children from Western countries (Tran et al., 2015; Yang & Yang, 2016), the same studies found performance advantages associated with bilingualism over and above these cultural differences.

It seems likely that these and other factors can affect performance on tests of cognitive control. However, there is no convincing evidence that the results of studies reporting bilingual effects on cognition should instead be attributed to one of these other factors.

Possible mechanisms

The possible mechanisms underlying the reported consequences of bilingualism have received surprisingly little attention. Yet, without a concrete proposal for such mechanisms, the discussion cannot move beyond competing arguments and countervailing data sets; no critical test of opposing positions is possible. For example, in an influential body of work that consistently shows no differences between monolinguals and bilinguals in cognitive outcomes, Paap and colleagues (Paap et al., 2014, 2015; Paap & Greenberg, 2013) reject the notion that generalized effects of executive functioning exist but offer no explanation for the many studies that produce them or speculation about why their results differ from those that do report significant effects.

As a different approach to understanding these complex and often contradictory effects, we propose a framework for the observed cases of bilingual benefits on nonverbal cognitive tasks that involve EF, one that differs from the general view based on inhibitory control. We emphasize that the purpose of the present article is not to review the evidence for and against the validity of bilingual benefits in cognitive processing; these arguments have been made elsewhere (Bialystok, 2017). Our point is that many positive cases from many different labs have now been reported and that these results require an explanatory account; current attempts to explain these effects have failed to provide a coherent description. The framework should also shed light on the conditions in which such benefits do and do not appear and suggest further work to clarify these conditions.

Executive functions in bilingualism research

Until the 1960s, research comparing intelligence scores of monolingual and bilingual children concluded that cognitive confusion would inevitably befall bilingual children (e.g., Saer, 1923; review in Hakuta, 1986). The first reliable evidence for positive effects of bilingualism in children was reported by Peal and Lambert (1962). Like the previous research, their study was based on intelligence tests, but unlike those earlier studies, they reported better performance by bilingual children on both verbal and nonverbal assessments. Their interpretation referred to the enhanced “mental flexibility” of bilingual children, a phrase that seemed vague at the time but turned out to be prescient. However, the nature of the effect became clearer in subsequent research that turned away from intelligence tests in favor of cognitive tasks. These studies showed that bilingual children outperformed monolinguals not on general intelligence measures but rather on tasks in which responses required attending to a target in the context of conflicting information (Bialystok & Majumder, 1998). This insight focused the argument on executive function as the crucial domain in which the effects of bilingualism were manifest. The problem, however, was that apart from descriptions of controlled processing occurring in frontal brain regions (Fuster, 2000; Norman & Shallice, 1986; Stuss & Benson, 1986), it was not clear how to connect the emerging work on executive functions to the possible effects of bilingualism.

The EF tasks on which bilinguals outperform monolinguals (e.g., Stroop, flanker, Simon) all involve the inhibition of misleading features; it therefore seemed logical to suggest that bilinguals enjoy a general advantage in inhibitory control. Inhibitory control, as described in that line of reasoning, is a central aspect of cognitive control in many accounts, and so potentially provides the bridge between bilingual language processing and nonverbal executive functioning. Diamond (2013) proposed that cognitive control is achieved by means of three major EF processes – Inhibitory Control (including interference control, selective attention and cognitive inhibition), Working Memory, and Cognitive Flexibility (including set shifting). She endorses the suggestion of Engle and Kane (2004) that working memory and inhibition both depend on some limited-capacity attentional system, and describes evidence suggesting that EF processes benefit from practice and training regimes, a point that is clearly relevant to the case of bilingualism.

A conceptualization of EF that has had much influence on characterizing the observed changes in bilingual control processes is the Unity and Diversity model proposed by Miyake and colleagues (Miyake et al., 2000; Miyake & Friedman, 2012). In their original model (Miyake et al., 2000; Fig. 1a), nine EF tasks shown at the lower level were combined through confirmatory factor analysis to give rise to the upper level of three latent variables – Updating, Shifting, and Inhibition. Thus, inhibition is again regarded as a major component of executive functioning. In a later revision, Miyake and Friedman (2012; Fig. 1b) modified their scheme to propose that inhibition is represented as a broad latent variable – Common Executive Functions – that correlates with all tasks; Updating and Shifting were retained to capture the specific variance associated with a restricted set of tasks.

Fig. 1 Componential model of executive functioning for (a) original model and (b) revised model. Reprinted from Miyake and Friedman (2012) with permission from Sage Publishing

The models associated with Diamond, Miyake and colleagues, and Green (1998) were developed for different purposes – both Diamond and Miyake sought to analyze and clarify the structure of EF, and Green’s goal was to explain language control in bilinguals – but they converged on a common conclusion. Inhibitory control was a central process in all three models: Inhibition is one of the three components of both the Diamond and the Miyake models, and inhibitory control is the supervisory process for suppressing interference and inhibiting unwanted responses in the Green model. If bilinguals routinely inhibit the non-target language to avoid intrusions, it is possible that domain-general inhibition is strengthened, leading to overall improvement in EF. However, inhibition is defined differently in the models. In Diamond’s scheme, inhibitory control is a descriptive component of EF comprising interference control and response inhibition, enabling the individual to stay focused and resist impulsive tendencies. For Miyake, inhibition is a latent variable describing the commonality involved in control across specific tasks. For Green, inhibition is a hierarchical construct in which a higher level (inhibitory control) monitors and adjusts performance on lower-level processing operations. His use of the term is based on the Supervisory Attention System proposed by Norman and Shallice (1986), a system that Miyake places at a higher level of functioning than his executive processes. In Green’s conception, inhibitory control includes both selection and inhibition.

Because of the common use of the term, inhibition became the predominant explanation guiding research into the relation between bilingualism and EF, although the research was based primarily on the Miyake model. From the perspective of all three models, however, it is reasonable to expect that any task involving inhibition should be performed better by bilinguals than by monolinguals, yet, as described in the following section, many such predictions have not been supported, leading some researchers to conclude that there is no effect of bilingualism on cognition (Paap & Sawi, 2014).

Problems with inhibition as an explanation of bilingual performance

Research across the lifespan in which bilinguals outperform monolinguals on various cognitive tasks has continued to accumulate despite not conforming to predictions generated from the inhibition view. This inconsistency points to two problems in the way inhibition was conceptualized in the bilingualism and EF studies: (1) false equivalence between tasks and processes, and (2) classification errors in the construct.

False equivalence is the tendency to label tasks with descriptions of the processes used to perform them, making tasks proxies for processes: Stroop assesses inhibition, n-back assesses working memory, and task switching assesses shifting (discussions in Diamond, 2013, and Kroll & Bialystok, 2013). The interpretation that follows is typically “Participants had better inhibition” rather than “Participants performed better on the Stroop task.” Although these tasks include the processes indicated in their descriptions, they are not simply manifestations of that process. The consequences of this reductionism are clear in the case of bilingualism. The primary tasks used in this research are Stroop, flanker, and Simon, all of which have been described as involving inhibition, although the nature and locus of the inhibition are different for each. More problematic, however, is the finding that bilinguals typically outperform monolinguals on both incongruent trials, which arguably do rely on inhibitory processes, and congruent trials, for which no inhibition is required (Hilchey & Klein, 2011). The equivalent benefit for both types of trials has been reported across the lifespan in research with children (Martin-Rhee & Bialystok, 2008; Yang et al., 2011), adolescents (Chung-Fat-Yim et al., 2018), young adults (Costa et al., 2009; Emmorey et al., 2008), and older adults (Bialystok et al., 2004). This highly replicable finding challenges the conclusion that the superior performance of bilinguals on these tasks reflects enhanced inhibitory control.
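
The consequence of an equivalent benefit on both trial types can be made explicit with a small worked example. The sketch below uses invented reaction times (in ms) for two hypothetical groups; the numbers, group labels, and function name are illustrative assumptions only and do not come from any cited study.

```python
from statistics import mean


def conflict_effect(congruent_rts, incongruent_rts):
    """Difference score: mean incongruent RT minus mean congruent RT."""
    return mean(incongruent_rts) - mean(congruent_rts)


# Invented RTs (ms): the hypothetical bilingual group is 30 ms faster
# on every trial, congruent and incongruent alike.
monolingual = {"congruent": [500, 520, 540], "incongruent": [580, 600, 620]}
bilingual = {"congruent": [470, 490, 510], "incongruent": [550, 570, 590]}

mono_effect = conflict_effect(monolingual["congruent"], monolingual["incongruent"])
bili_effect = conflict_effect(bilingual["congruent"], bilingual["incongruent"])

mono_overall = mean(monolingual["congruent"] + monolingual["incongruent"])
bili_overall = mean(bilingual["congruent"] + bilingual["incongruent"])

print("Conflict effect (mono, bili):", mono_effect, bili_effect)
print("Overall mean RT (mono, bili):", mono_overall, bili_overall)
```

In this toy case the two groups show an identical conflict effect, yet the hypothetical bilingual group is faster overall. An analysis restricted to the difference score would therefore report no group difference, whereas an analysis of each trial type would reveal a benefit that cannot be attributed to inhibition alone.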

The second problem with the conceptualization of inhibition is errors in categorization. The centrality of inhibition to EF was substantially boosted by its identification as one of the three components in the Unity and Diversity model (Miyake et al., 2000). In the original study from which that model emerged, a confirmatory factor analysis clustered performance from Stroop, anti-saccade, and stop-signal tasks into a latent variable labeled inhibition. However, research on bilingualism has confirmed earlier findings by Bunge and colleagues (Bunge et al., 2002) that such tasks reflect two distinct processes, namely, interference suppression and response inhibition, each with different neural underpinnings and different developmental trajectories. Interference suppression is the ability to ignore the effects of misleading information (Stroop, Simon, and flanker tasks), whereas response inhibition is the ability to inhibit an inappropriate response (go/no-go and stop signal tasks).

In bilingualism research, tasks based on interference suppression are typically performed better by bilingual than monolingual participants (except in some behavioral studies with young adults), but tasks based on response inhibition are typically performed equivalently by monolingual and bilingual groups. In studies with children, conditions of a Simon task (Martin-Rhee & Bialystok, 2008) or Stroop task (Esposito et al., 2013; Nayak et al., 2020) that required resolving conflict led to better performance by bilingual than monolingual children, but in comparable conditions in the same studies requiring children to inhibit a response, all children performed similarly. For young adults, monolingual and bilingual participants performed a flanker task in a scanner while fMRI data were recorded (Luk et al., 2010) using stimuli adapted from the study by Bunge and colleagues (Bunge et al., 2002), creating conditions for interference suppression and response inhibition. In this case there were no behavioral differences; however, the two language groups recruited different networks for the interference suppression condition but similar networks for the response inhibition condition. The same pattern of results was reported for young adults performing a flanker task (interference suppression) and a go/no-go task (response inhibition), showing better performance by bilinguals only on interference suppression, particularly when working memory demands were high, making the task more difficult (Jiao et al., 2019). The consistency of the interaction between the two types of inhibition and language group across the lifespan indicates that inhibition is not a unitary process; instead, it appears that interference suppression is a factor in bilingual performance benefits whereas response inhibition is not.

Part of the rationale for positing enhanced inhibition in bilinguals followed from the notion that bilinguals inhibited the non-target language to avoid intrusions, but arguments against that idea came from research on lexical retrieval. Although the non-target language continued to influence bilingual performance, even in strongly monolingual contexts (Marian & Spivey, 2003; Wu & Thierry, 2010), research by Costa and colleagues demonstrated the complexity of the language selection processes by uncovering the role of factors such as relative proficiency between the two languages (Costa, 2005; Costa et al., 2000; Costa et al., 2006; Duyck et al., 2007). Costa and colleagues concluded that inhibition alone could not explain language selection, and therefore proposed a hybrid account that included selection and inhibition, acknowledging a role for both but insisting that the influence of the non-target language is never absent (Costa et al., 2006). Given that both languages are always active during discourse, inhibitory processes do not block the neural activation of non-target language representations but may prevent the emergence of such representations into consciousness, a point that would be consistent with Green’s (1998) model.

Bilinguals outperform monolinguals on some kinds of tasks and under some conditions but, as detailed below, inhibition provides an incomplete account of the data and on its own does not offer a mechanism for the cognitive differences between monolinguals and bilinguals. Hilchey and Klein (2011) presented a detailed case against a simple inhibitory view yet argued in favor of a bilingual processing advantage that reflects “a general executive system that improves in efficiency owing to the need to monitor linguistic representations competing for selection” (Hilchey & Klein, 2011, p. 655). According to these authors, the advantage takes the form of a general increase in processing speed (see also Diamond, 2013), but bilingual benefits extend beyond a simple increase in speed of processing. First, several studies have shown better accuracy by bilinguals on n-back working memory tasks, especially when task difficulty was increased through greater working memory demands (Barker & Bialystok, 2019; Comishen & Bialystok, 2021; Janus & Bialystok, 2018; Teubner-Rhodes et al., 2016). Second, bilingual benefits are typically not found in conditions of Simon tasks (Bialystok et al., 2004; Linck et al., 2008; Morales, Calvo, & Bialystok, 2013a) and flanker tasks (Costa et al., 2008) involving minimal conflict and EF demands. Third, some studies have shown that the processing speed advantage in bilinguals is attributable to fewer atypically long RTs, indicating fewer lapses in attentional control rather than an overall improvement in processing speed. If inhibition is not the key factor, what might be responsible for better performance by bilinguals on some nonverbal cognitive tasks? In the Miyake model, the unit of analysis was task performance and results were extrapolated to three higher-order constructs. However, other hierarchical descriptions are also possible.

The case for attentional control

In a review of the literature across the lifespan, Bialystok (2017) documented areas in which bilingual participants showed better performance than comparable monolinguals on a variety of nonverbal cognitive tasks. These included enhanced flexibility, switching, and monitoring of attention in infants and children, and better performance in adults on tasks involving perceptual and response conflict (e.g., Stroop, flanker, and Simon tasks) and on tasks involving switching, monitoring, inhibition, selection, and resource allocation. To explain the pattern, Bialystok concluded that inhibitory control was an insufficient mechanism to account for these varied findings, and proposed instead that “lifelong bilingualism impacts a set of processes subsumed under the category of executive attention” (Bialystok, 2017, p. 250). The suggestion is that the bilingual environment leads to adaptation of the attention system to cope with its specialized demands, that this adaptation confers a domain-general benefit to attentional control, and that the resulting benefit enhances aspects of cognitive performance across the lifespan. As described later, these adaptations of attention enhance processes of both facilitation and inhibition, as well as processes underlying cognitive flexibility and resource allocation. In this section we characterize our interpretation of the term attentional control and its various manifestations. In subsequent sections we describe how the construct can act to integrate the relevant empirical findings under a common rubric and provide a framework for the generation of new studies of bilingualism and its consequences. The framework is constructed by considering cases in which bilingual benefits have and have not been reported, as a means of explaining those results. It may also clarify the conditions under which such benefits do and do not occur and thereby help resolve the current controversy over the contradictory findings.

Although the construct of attention has been central to models of cognitive processing since the time of William James (1890), it has been remarkably difficult to characterize scientifically. One complicating factor is that attention has been used to describe both a processing resource – the energy necessary to perform any effortful cognitive or motor task – and the control processes necessary to guide and manage such activities. In our usage attentional control serves to maintain current goals in an active state, to facilitate cognitive operations that accomplish these goals, to suppress interference, and to switch processing resources to a different set of operations when it is cognitively beneficial to do so (see also Eysenck et al., 2007; Ong et al., 2017; Zhou & Krott, 2018, for a similar use of the term). The construct is thus similar to the notion of executive attention as used by McCabe et al. (2010), and by Engle and Kane (2004). The constructs of resource and control are related, and both are relevant for understanding potential differences between monolinguals and bilinguals performing EF tasks. For example, it is possible that the control aspects of attention function to allocate processing resources to specific representations and processes lower in the chain of command. This form of top-down control may be exerted by the person’s current goals and task sets maintained actively in working memory. This is the concept of goal maintenance proposed by Braver, Barch, and colleagues (Braver et al., 2001; Braver & Barch, 2002). As detailed later, the basic idea is that the current goal serves to generate an activation signal that biases the allocation of processing resources to relevant procedures of perception and action (Braver & West, 2008).

In our proposed scheme, attentional control is a broad descriptive term composed of specific functional procedures. Tasks draw on these procedures in various combinations and to various degrees depending on performance needs. The term “attentional control” thus describes a repertoire of processing operations that specific tasks and higher-level cognitive functions can utilize to fulfill their various goals. As shown in Fig. 2, the procedures may be broken down broadly into those that facilitate mental operations, for example selection, goal maintenance, temporary holding, coordination, engagement, and disengagement, and those that inhibit mental operations, for example interference suppression and response inhibition (Bunge et al., 2002). Figure 2 also lists a set of cognitive abilities that we view as descriptive terms for the outcomes of processing operations involving attentional control. In this category we include coordination, flexibility, planning, monitoring, problem-solving, decision-making, and conflict resolution. These abilities also draw on the procedures listed above them in the figure for their effective performance.

Fig. 2 Relations between attentional control, cognitive abilities, and tasks in the proposed framework

The scheme is broadly hierarchical in that attentional control is the mechanism that selects operations servicing current needs and goals and reallocates resources to these operations. There is general agreement that such high-level control is mediated by networks originating in the frontal lobes (Fuster, 2000; Smith & Jonides, 1999; Stuss & Alexander, 2000). Thus, the scheme is similar to that proposed by Green and Abutalebi (2013), although our components differ somewhat from theirs (compare Table 1 and Fig. 2). Importantly, the abilities and tasks shown in the lower boxes in the figure are not strictly nested under the set of procedures but rather draw on the procedures in various combinations as needed to fulfill their goals – there is no one-to-one mapping of abilities and tasks. Therefore, our proposal is a framework that specifies relevant elements but is not a formal model of their interactions.

There is clearly substantial overlap between this formulation and other proposed models of cognitive control that are hierarchical in nature, some of which we discussed earlier, such as Green’s (1998) Inhibitory Control model, but there are other models with more direct links to notions of attention. Chun and colleagues (Chun et al., 2011) constructed a hierarchical model of attention at the process level based on four core properties: limited capacity, selection, modulation, and vigilance. Limited capacity reflects the primary purpose of attention, which is to focus on relevant information to the exclusion of less relevant information. Selection serves to bias attention towards one of the competing available candidates. Modulation is the extent to which attended items are processed, thereby affecting their likelihood of being remembered. Vigilance is the extent to which modulation in terms of degree of processing can be sustained over time. Thus, the flanker task requires a high degree of selection without much demand on the other three components, but n-back tasks typically reflect the involvement of all four components – vigilance, selection, capacity, and modulation.

The concepts of attention and EF are central to current models of working memory (WM). In the WM model of Baddeley and Hitch (1974), the flow of information among peripheral systems is managed by the central executive, a form of attentional control assumed to be located in the frontal lobes. Both Cowan (1999, 2016) and Oberauer (2002, 2009) have proposed that items held in conscious awareness are essentially maintained in WM by the processes of focal attention. A more radical view, now dominant in the field of cognitive neuroscience, suggests that WM does not involve storage buffers but rather reflects the allocation of attention to sensory, motoric, and internal representations (D'Esposito & Postle, 2015). Engle, Kane, and their collaborators (Conway et al., 2003; Engle & Kane, 2004) proposed the notion of working memory capacity (WMC) as responsible for controlling higher cognitive functions, and defined it in terms of the ability to attend to relevant representations under distracting conditions. Researchers have distinguished storage and processing functions of WM, both involving attention, but in relatively passive and active forms, respectively. This distinction between the passive and active deployment of attention in WM is important in the present context, as it is probably only the active engagement of attention that elicits EF – or rather, the active engagement of attention is equivalent to EF – “executive attention” in the words of McCabe and colleagues (McCabe et al., 2010). It seems probable, therefore, that bilingual benefits will be seen in active but not passive situations involving WM.

As mentioned earlier, Braver, Barch, and colleagues developed the goal maintenance theory of prefrontal control function (Braver & Barch, 2002; Braver et al., 2001). According to this theory, control of cognitive operations is achieved by holding the desired outcomes for perception and action in a highly accessible form (by sustained neuronal activity patterns in WM) and producing activation signals that bias the flow of ongoing processing in regions relevant to current goals. This last point builds on the ideas of Desimone and Duncan (1995), who proposed that there is constant local competition in the brain for representation at all levels from sensation to action in the form of mutually inhibitory interactions. Top-down excitatory signals from the prefrontal cortex can then bias the outcome of such competitions in favor of goal-relevant percepts and actions (Braver & West, 2008).

Attentional control and bilingualism

Our suggestion is that the concept of attentional control, supervising both goal maintenance and conflict resolution, provides a congenial framework for understanding the findings regarding bilingual benefits. The argument is that immersion in an environment involving competing mappings between concepts and symbols modifies controlled attention in bilingual individuals, making the processes of attentional control more powerful and more flexible. It is unlikely that bilingual experience results in an increase in attentional resources; rather, the continuing need to manage two languages leads to greater efficiency in utilizing those resources. Previous studies using fMRI with monolingual and bilingual younger adults performing a flanker task (Abutalebi et al., 2012), older adults performing a Simon task (Berroir et al., 2017), and both younger and older adults performing a task-switching paradigm (Gold et al., 2013) have demonstrated less brain activation by bilinguals than monolinguals to achieve similar or better levels of performance, a difference interpreted as better efficiency in bilinguals.

Another consideration in understanding the findings linking bilingualism to attentional control is that relatively easy tasks will be performed successfully by both monolinguals and bilinguals with the result that no group differences will be observed. The implication is that bilinguals will perform better than their monolingual counterparts to the extent that the attentional control demands of a specific task exceed the control abilities of monolingual but not bilingual individuals. Therefore, no language group differences should be expected on tasks that can be performed in an automated manner or on tasks for which attentional control demands are easily within the range of the population, such as young adults performing simple EF tasks. This notion is analogous to a situation in which the objective to compare fitness levels across groups is examined by asking participants to walk, jog, or run for 15 min: Group differences would only be expected to emerge as the aerobic demands increased and eventually exceeded the resources of each group; the absence of group differences in the walking condition is non-diagnostic. Lifespan changes in control have also been well researched, with studies showing that the efficiency of attentional control processes increases from birth to adulthood (V. Anderson et al., 2010; Diamond, 2002), peaks in young adulthood, and declines in the course of aging (Braver & Barch, 2002; McDaniel et al., 2008). Therefore, with less effective control functions in general, children and older adults need to devote more attention to a task, so the relatively stronger control processes available to bilinguals better equip them to perform these tasks, revealing the greater likelihood of a positive effect of bilingualism in these populations.

The componential model of EF proposed by Miyake and others and the attentional control approach described here assume different mechanisms for the observed effects. The primary mechanism for a componential view is transfer: a skill learned and practiced in one context, such as inhibition of a non-target language, is transferred to a new context, such as inhibition of misleading perceptual features. Better performance in the first context predicts better performance in the second. This relationship is shown in Fig. 3a. Transfer is an appealingly simple mechanism, but one for which the evidence is limited to specific instances: apparent cases of transfer typically turn out to involve two processes or skills that share common features; evidence for far transfer between abilities with less in common is notably weak (Redick et al., 2013; Shipstead et al., 2012; Simons et al., 2016).

Fig. 3 Difference between processes of (a) transfer as used in componential models and (b) adaptation as proposed in the current framework

In contrast, the primary mechanism underlying the attention model is adaptation. In this case, an operation or set of operations, along with their underlying neural networks, is modified through experience so that all domains in which they are involved are impacted. Thus, through enhancement of the control procedures, the task processes run more efficiently. This relationship is shown in Fig. 3b. Given that attentional control connotes the effective deployment of processing resources, such training of attention (see also Diamond, 2013; Tang & Posner, 2014) may result in the more efficient allocation of resources. Thus, there is no specific relation between tasks as there is for transfer but rather a diffuse set of outcomes on tasks for which attention is recruited; the impact need not be equivalent for all outcomes. In this way, a given task, such as flanker or Simon, can sometimes lead to group differences and sometimes not depending on the specific attention demands of the task or condition and on the control capacities of the participants. Thus, predicting when group differences are expected requires more multidimensional analysis than is the case for an interpretation of simple transfer. The approach also rules out more discrete descriptions in which specific component processes such as inhibition are followed from a source domain (language control) to a target domain (nonverbal EF).

Control or inhibition – what’s the difference?

The terms inhibition and attentional control are both quite general and clearly overlap. For example, as depicted in Fig. 2, control does have an inhibitory component, although in bilinguals it is manifest as interference suppression more than as response inhibition. However, our main point is that attentional control provides a good description of many findings that are clearly not inhibitory in nature.

Table 2 presents studies that have shown better performance by bilinguals. The table is organized around the central finding or process involved in the study for which we believe inhibition does not provide an adequate account. To evaluate the possibility that the unifying construct for understanding cognition in monolingual and bilingual participants lies in the notion of attentional control rather than the components of executive function as laid out by Miyake et al. (2000, 2012), we review the major empirical evidence for processing differences between monolinguals and bilinguals and consider their compatibility with our proposed framework. Support for the componential view of executive function that has been the basis for much research investigating the effect of bilingualism on these processes would be obtained by evidence that language group differences were consistently found for a particular component, such as Inhibition, or a particular task, such as Flanker. As several meta-analyses have shown, this level of consistency has not been achieved. Alternatively, the predictions from an adaptation of attention view are that tasks or conditions for which effortful attention is required are likely to produce differences in performance between monolingual and bilingual participants, regardless of the classification of those tasks in the componential structure.

Table 2 Types of studies for which better bilingual performance is more plausibly attributed to attentional control than to inhibition

The first point in Table 2 is that, as described above, bilinguals respond more rapidly than monolinguals on congruent trials in conflict tasks such as flanker and Simon, as well as on the incongruent trials where such differences might be expected (Bialystok et al., 2004; Costa et al., 2009). Similar results were found for event-related potential (ERP) responses to a Simon task with younger (Kousaie & Phillips, 2012) and older (Kousaie & Phillips, 2017) adults. There is nothing to inhibit in the congruent trials, but the finding can be attributed to better attentional control (see also Hilchey & Klein, 2011).

The second point comes from the finding that bilinguals show greater facilitation than monolinguals on tasks that include facilitation trials. In the Stroop task, these are trials in which the color to be named is presented with its own name (e.g., the word RED printed in red); the facilitation refers to the faster responses to such trials than to control trials (colored Xs) or the color word printed in black (Bialystok et al., 2008). Similarly, in the Proactive Interference task where an item must be recognized as having appeared in the previous display, a facilitation effect occurs when that item appeared in both the previous display and the one before it (Bialystok et al., 2014). Inhibition cannot explain these effects.

A similar effect may be responsible for performance in working memory n-back tasks: an item needs to be identified as having been seen at a specified prior interval. The task becomes more difficult as the interval increases because of proactive interference from the familiar stimuli, but there is no actual inhibition involved. Nonetheless, bilinguals generally outperform monolinguals on difficult conditions of this task in both children (Janus & Bialystok, 2018) and adults (Teubner-Rhodes et al., 2016). Studies including electrophysiology indicate more efficient performance by bilinguals (Morrison et al., 2018).

The fourth point is statistical: Studies that use alternative analytic approaches often report significant benefits of bilingualism when standard approaches fail to detect significance. For example, bilinguals produce fewer long response times (RTs) in their responses to conflict tasks, indicating better maintenance of attentional control (Calabria et al., 2011; Zhou & Krott, 2018). An approach that directly investigates this possibility involves ex-Gaussian analyses. Instead of comparing the likelihood that two mean scores came from the same population as is the case with analysis of variance, ex-Gaussian analyses use the entire distribution of scores and extract separate measures associated with the mean tendency (μ) and exponential (τ) components of the overall RT. There is general agreement that the μ component signals relatively automatic aspects of processing, whereas τ (the positively skewed tail of the RT distribution) reflects monitoring and attentional control (Calabria et al., 2011). Studies using this approach with young adults have shown that the mean RT did not differ between groups, that is, there was no group difference in μ, but significantly smaller values in τ for bilinguals, signaling fewer lapses of attention (Abutalebi et al., 2015; Calabria et al., 2011; Tse & Altarriba, 2014; Zhou & Krott, 2018). These results are consistent with better goal maintenance and attentional control in the bilinguals despite comparable mean RTs.
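To make the decomposition concrete, the following is a minimal sketch of how ex-Gaussian parameters can be recovered from a set of RTs. It assumes the method of moments (mean = μ + τ, variance = σ² + τ², third central moment = 2τ³) instead of the maximum-likelihood fitting that published studies typically use. The function names and the simulated parameter values are hypothetical and are not taken from any of the cited studies.

```python
import random
import statistics


def exgauss_moments(rts):
    """Estimate ex-Gaussian parameters (mu, sigma, tau) by the method of moments.

    Uses the identities mean = mu + tau, variance = sigma^2 + tau^2,
    and third central moment = 2 * tau^3.
    """
    n = len(rts)
    mean = statistics.fmean(rts)
    var = statistics.pvariance(rts, mu=mean)
    m3 = sum((x - mean) ** 3 for x in rts) / n
    # A non-positive third moment means no measurable right tail.
    tau = (m3 / 2.0) ** (1.0 / 3.0) if m3 > 0 else 0.0
    mu = mean - tau
    sigma = max(var - tau ** 2, 0.0) ** 0.5
    return mu, sigma, tau


def simulate_rts(mu, sigma, tau, n, rng):
    """Draw n synthetic RTs (ms): Gaussian component plus exponential tail."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


rng = random.Random(1)
# Hypothetical parameters chosen only for illustration, not taken from any study.
group_a = simulate_rts(mu=400, sigma=40, tau=150, n=20000, rng=rng)
group_b = simulate_rts(mu=400, sigma=40, tau=100, n=20000, rng=rng)

mu_a, sigma_a, tau_a = exgauss_moments(group_a)
mu_b, sigma_b, tau_b = exgauss_moments(group_b)
print(f"group A: mu={mu_a:.0f} sigma={sigma_a:.0f} tau={tau_a:.0f}")
print(f"group B: mu={mu_b:.0f} sigma={sigma_b:.0f} tau={tau_b:.0f}")
```

In this simulation the two groups share the same Gaussian component and differ only in the exponential tail, so the estimated μ values should be close while the estimated τ values separate the groups. Moment estimates of τ are noisy in small samples, which is one reason to prefer likelihood-based fitting for real data.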

In another demonstration of this point, Zhou and Krott (2016) reviewed a large number of studies comparing monolinguals and bilinguals performing EF tasks in terms of whether the data analyses used data-trimming procedures to exclude extreme RTs. They found that studies that trimmed data to cluster around the overall mean generally found no RT difference between language groups, whereas those that included the entire range of values were more likely to report faster performance by bilinguals. That is, the slower overall responses of monolinguals were attributable to occasional lengthy RTs associated with reduced attentional control.
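The effect of trimming can be illustrated with simulated data. The sketch below assumes that both groups share the same core RT distribution and differ only in how often an occasional long "lapse" trial occurs; the lapse rates, the lapse cost, and the 2.5 SD trimming rule are illustrative assumptions and are not values reported by Zhou and Krott (2016).

```python
import random
import statistics


def simulate_group(n, lapse_rate, rng):
    """Synthetic RTs (ms): a Gaussian core plus occasional long attentional lapses."""
    rts = []
    for _ in range(n):
        rt = rng.gauss(450, 50)
        if rng.random() < lapse_rate:
            rt += rng.uniform(500, 1500)  # hypothetical lapse cost
        rts.append(rt)
    return rts


def trim_sd(rts, k=2.5):
    """Drop RTs more than k standard deviations from the group mean."""
    mean = statistics.fmean(rts)
    sd = statistics.pstdev(rts)
    return [x for x in rts if abs(x - mean) <= k * sd]


rng = random.Random(7)
# Lapse rates are illustrative assumptions, not estimates from the literature.
group_more_lapses = simulate_group(10000, lapse_rate=0.08, rng=rng)
group_fewer_lapses = simulate_group(10000, lapse_rate=0.03, rng=rng)

untrimmed_diff = statistics.fmean(group_more_lapses) - statistics.fmean(group_fewer_lapses)
trimmed_diff = (statistics.fmean(trim_sd(group_more_lapses))
                - statistics.fmean(trim_sd(group_fewer_lapses)))
print(f"untrimmed difference: {untrimmed_diff:.1f} ms")
print(f"trimmed difference:   {trimmed_diff:.1f} ms")
```

Because the group difference in this toy example lives entirely in the tail, removing extreme RTs shrinks the difference in means, which is the pattern the review describes.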

Alternative analyses have also been applied to studies of older adults. One such approach uses diffusion models of choice RT (Ratcliff & McKoon, 2008). In these models, decision-making is viewed as a dynamic process of evidence accumulation; in two-choice RT paradigms such as flanker and Simon, evidence favoring one alternative is accumulated until some pre-set threshold is reached. Parameters of the model can specify components of the overall process. This analytic technique was used on flanker data generated by older adults with the finding that bilingual participants showed reduced time costs for focusing on the target during incongruent trials (Ong et al., 2017). The authors acknowledge that their study is preliminary, yet the method has potential for illuminating the cognitive processes underlying performance on EF tasks.
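The evidence-accumulation idea can be sketched as a simple random walk between two boundaries. This is a generic two-boundary diffusion simulation under assumed parameter values (drift rate, boundary separation, non-decision time); it is not the specific model or fitting procedure used by Ong et al. (2017), and all names and numbers are hypothetical.

```python
import random
import statistics


def simulate_trial(drift, boundary, rng, dt=0.002, noise=1.0, non_decision=0.3):
    """Simulate one two-choice trial; return (rt_seconds, correct).

    Evidence starts midway between the boundaries at 0 and `boundary`.
    Reaching the upper boundary counts as the correct response.
    """
    evidence = boundary / 2.0
    elapsed = 0.0
    step_sd = noise * dt ** 0.5
    while 0.0 < evidence < boundary:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
    return elapsed + non_decision, evidence >= boundary


def simulate_condition(drift, boundary, n_trials, rng):
    """Return (mean RT in seconds, proportion correct) for one condition."""
    trials = [simulate_trial(drift, boundary, rng) for _ in range(n_trials)]
    mean_rt = statistics.fmean([rt for rt, _ in trials])
    accuracy = sum(correct for _, correct in trials) / n_trials
    return mean_rt, accuracy


rng = random.Random(3)
# Drift rates are hypothetical: lower drift stands in for a harder (e.g., incongruent) condition.
easy_rt, easy_acc = simulate_condition(drift=3.0, boundary=1.0, n_trials=500, rng=rng)
hard_rt, hard_acc = simulate_condition(drift=1.0, boundary=1.0, n_trials=500, rng=rng)
print(f"easy: mean RT={easy_rt:.3f}s accuracy={easy_acc:.2f}")
print(f"hard: mean RT={hard_rt:.3f}s accuracy={hard_acc:.2f}")
```

In applied work the direction is reversed: the parameters are estimated from observed RT and accuracy data, so that a group difference can be attributed to a specific component such as drift rate, boundary separation, or non-decision time.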

The fifth point comes from results from task-switching paradigms that have revealed performance differences between language groups, at least under some conditions (Prior & MacWhinney, 2010; Stasenko et al., 2017). Task switching requires goal maintenance and top-down control of attention, and although inhibition may be a component of both, a general inhibitory process cannot explain task performance.

Lopez Zunini et al. (2019) used a task-switching paradigm and reported better performance by both younger and older bilingual adults, but more importantly found larger N2 amplitude for bilinguals in both young and older age groups and smaller P3 amplitude for bilinguals in the older adult group. They interpreted this pattern as indicating superior sustained attention by bilinguals. Similar results were reported by Gold et al. (2013). Younger and older bilinguals outperformed their monolingual counterparts on a task-switching paradigm, and functional MRI indicated decreased activation by bilinguals in the cingulate cortex, an effect the authors attributed to neural efficiency, similar to an effect reported by Abutalebi et al. (2012) using a flanker task with young adults.

A task based on the notion of task-switching, the Dimensional Change Card Sort Task, was developed for children by Zelazo et al. (1996). Children are asked to sort a set of items by matching to a target on one feature and then re-sort the items by matching to a different feature. The task recruits various components of EF, including inhibition, working memory, and shifting, but overriding that is the requirement for children to direct their attention to relevant features of the display in the presence of misleading distractions. In several studies, bilingual children outperformed monolingual children on this task (Bialystok, 1999; Bialystok & Martin, 2004; Carlson & Meltzoff, 2008; Kalashnikova & Mattock, 2014; Okanda et al., 2010).

The sixth point, disengagement of attention, also includes elements of inhibition but goes beyond a simple definition of inhibition. The idea is that the current focus of attention can be efficiently suspended and refocused in order to meet task demands. This situation occurs when a habitual response or information source is no longer relevant, and the individual must switch to a new source or response. This ability to disengage and update a response has been shown in infants in the first year of life. In these studies, infants raised in bilingual households could disengage from a rewarding visual source and switch to a different source when the source of rewards switches, whereas monolingual infants persisted in attending to the original source (Comishen et al., 2019; D'Souza et al., 2020; Kovacs & Mehler, 2009). In tasks for children and adults, disengagement is demonstrated by the ability to move to the next trial in a series without the continuing influence of the previous trial. In standard EF tasks, this is demonstrated by the finding that congruent trials preceded by congruent trials and incongruent trials preceded by incongruent trials are faster than those trials preceded by the opposite type. The idea is that the congruency of one trial continues to influence the judgment of congruency of the next trial, an effect known as the sequential congruency effect. However, the carryover effect is smaller for bilingual children (Grundy & Keyvani Chahi, 2017) and adults (Grundy, Chung-Fat-Yim, et al., 2017b) than for their monolingual counterparts.
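The sequential congruency effect described above can be computed directly from an ordered trial list. The sketch below is one common way of expressing it, as the congruency effect following congruent trials minus the congruency effect following incongruent trials. The function name and the tiny data set are hypothetical, and steps that a real analysis would include (excluding error trials and trials following errors) are omitted.

```python
from collections import defaultdict
from statistics import fmean


def sequential_congruency_effect(trials):
    """Compute the sequential congruency effect (SCE) from an ordered trial list.

    `trials` is a list of (congruency, rt) pairs, with congruency "C" or "I".
    Returns (cell_means, sce), where cell keys are previous+current type,
    e.g. "CI" is an incongruent trial preceded by a congruent one.
    """
    cells = defaultdict(list)
    for (prev_type, _), (curr_type, curr_rt) in zip(trials, trials[1:]):
        cells[prev_type + curr_type].append(curr_rt)
    means = {key: fmean(values) for key, values in cells.items()}
    # Congruency effect after congruent trials minus the effect after incongruent trials.
    sce = (means["CI"] - means["CC"]) - (means["II"] - means["IC"])
    return means, sce


# Tiny hand-made sequence (RTs in ms), for illustration only.
trials = [("C", 500), ("C", 480), ("I", 620), ("I", 570),
          ("C", 530), ("I", 610), ("C", 520), ("C", 490)]
means, sce = sequential_congruency_effect(trials)
print(means, sce)
```

For this sequence the cell means are CC = 485, CI = 615, II = 570, and IC = 525, giving an SCE of 85 ms. A smaller value indicates less carryover from the previous trial, which is the pattern reported for bilinguals.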

The Wisconsin Card Sorting Task (WCST) is a classic test of the ability to disengage from one strategy when it ceases to provide successful results and to engage a different strategy. In a study by Xie and Dong (2017), young Chinese adults who were fluent in English to varying degrees obtained higher scores and made fewer perseverative errors on the WCST than a matched monolingual group. The more fluent bilinguals also outperformed the less fluent bilinguals on this task. Similarly, Yudes et al. (2011) showed that monolinguals and bilinguals did not differ on WCST performance, but a further group of highly proficient bilingual speakers (professional simultaneous interpreters) achieved higher performance than either group. Finally, Festman and Munte (2012) compared bilinguals who switched frequently between languages and had difficulty remaining in the target language, called switchers, and bilinguals who rarely switched languages, called non-switchers. The non-switchers scored higher than the switchers on four tests of cognitive control, performed the WCST more rapidly, and made fewer perseverative errors.

Disengagement is also involved in the ability to see the alternate image in a reversible figure, such as the famous “duck-rabbit” image. Children were shown a series of such ambiguous figures and given progressive cues until they could identify the other image. Having decided that an image is a “rabbit,” children need to disengage from the previous meaning and reinterpret the lines to see that the same figure can also be a “duck.” Bilingual children were more successful than monolingual children and could detect the new image using significantly fewer cues (Bialystok & Shapero, 2005; Wimmer & Marx, 2014).

The seventh type of finding that is not well explained by inhibition comes from studies of monitoring and goal maintenance. For example, Costa et al. (2009) found a bilingual advantage in a version of the flanker task that required substantial monitoring but not in the same task with lower monitoring demands; Hernandez et al. (2012) showed that bilinguals were less affected by an invalid cue in a visual search task and attributed the benefit to better top-down control of attention, which they also described in terms of monitoring.

A task developed by Braver et al. (2001), the AX-continuous performance task (AX-CPT), provides a measure of controlled monitoring. Participants watch a continuous stream of letters, and are instructed to press a response key each time an X appears that was preceded by an A. The stream also contains the sequences BX and AY, where B refers to any non-A letter and Y refers to any non-X letter. In order to avoid false alarms to BX, participants must exert reactive control when the X occurs, whereas in order to avoid false alarms to AY, proactive control must be deployed on seeing A. The paradigm thus involves working memory and goal maintenance as well as two types of control. In several studies, bilingual participants outperformed monolinguals on this task (Beatty-Martinez et al., 2019; Gullifer et al., 2018; Morales, Gómez-Ariza, & Bajo, 2013b).
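The trial structure of the AX-CPT lends itself to a simple scoring routine. The following sketch classifies each cue-probe pair into one of the four trial types and computes an error rate per type. The data format, function names, and example responses are assumptions made for illustration, not the scoring procedure of any cited study.

```python
from collections import Counter


def trial_type(cue, probe):
    """Classify a cue-probe pair as AX, AY, BX, or BY."""
    return ("A" if cue == "A" else "B") + ("X" if probe == "X" else "Y")


def score_ax_cpt(trials):
    """Return the proportion of errors for each trial type.

    `trials` is a list of (cue, probe, responded_target) tuples.
    A target response is correct only on AX trials.
    """
    totals, errors = Counter(), Counter()
    for cue, probe, responded_target in trials:
        kind = trial_type(cue, probe)
        totals[kind] += 1
        if responded_target != (kind == "AX"):
            errors[kind] += 1
    return {kind: errors[kind] / totals[kind] for kind in totals}


# Hand-made responses for illustration only.
trials = [
    ("A", "X", True), ("A", "X", True), ("A", "X", False),   # one miss
    ("A", "K", True), ("A", "M", False),                     # one AY false alarm
    ("G", "X", False), ("R", "X", True),                     # one BX false alarm
    ("G", "M", False),
]
error_rates = score_ax_cpt(trials)
print(error_rates)
```

The BX and AY error rates returned here are the two quantities that the paradigm uses to separate the two types of control, with BY trials serving as a baseline.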

A recent series of studies demonstrated effects of monitoring and goal maintenance in an auditory discrimination task (Olguin et al., 2018; Olguin et al., 2019). Using a dichotic listening paradigm while EEG was recorded, participants attended to a narrative in their native language in one ear while ignoring an interfering auditory stream presented to the other ear. The non-target stream consisted of another story in the native language, speech in an unknown language, or non-speech. There was also a control condition in which no competing signal was presented. Comprehension of the attended narrative was equivalent for all participants, but EEG results distinguished between the language groups. For monolinguals, the processed signal for the target stream increased in strength to maintain comprehension as the non-target stream became increasingly interfering, but for bilinguals, attention to the target stream remained constant across the conditions. All participants made an early distinction between speech and non-speech, but for the bilinguals the distracting speech signal did not interfere with their attention to the target stream. The authors concluded that the experience of using multiple languages modulated the neural mechanisms of selective attention.

Finally, another category of task that involves EF but is not easily explained in terms of inhibition or other components is false belief, or, more generally, theory-of-mind tasks. In developmental research, the ability to perform these tasks is an essential developmental milestone (Wellman et al., 2001). In a typical task, two characters, Sally and Anne, interact and then Sally hides a toy in one of two locations and then leaves. While she is gone, Anne moves the toy to the other location. Sally returns and the child has to decide where she will look for the toy – in the location in which she hid it or the location where Anne moved it. The child knows where the toy is hidden but Sally does not, so the problem requires answering from the perspective of the knowledge that Sally has. Children generally learn to solve this problem at around 4 years of age, but in several studies, bilingual children were more advanced than their monolingual peers (Bialystok & Senman, 2004; Goetz, 2003; Kovacs, 2009; Nguyen & Astington, 2014). In an interesting extension of this research, Rubio-Fernandez and Glucksberg (2012) administered the task to adults using eye-tracking as the response. Although all participants could provide the correct answer, the monolinguals looked first at the incorrect location before responding. Rubio-Fernandez (2017) commented that the results had typically been ascribed to enhanced inhibitory control but argued instead that “bilinguals’ better false-belief performance results from more effective attention management” (Rubio-Fernandez, 2017, p. 987).

To summarize the empirical results described in this section, bilingual benefits have been found across the lifespan in a variety of cognitive tasks: better detection of language switches, better deferred imitation performance, and faster disengagement from no-longer-relevant information sources in infants; superior performance on such classical EF tasks as flanker, Simon, and Stroop in children and adults; better performance on false belief, working memory, ambiguous figures, and the Dimensional Change Card Sort task in children; and better n-back performance and fewer lengthy RTs (from ex-Gaussian analyses) in young adults. Whereas explanatory accounts in terms of enhanced inhibition, monitoring, and response speed certainly apply to some of these bilingual effects, they cannot account for them all. Moreover, the componential approach is based on the need to identify discrete processes in the source experience that modify an outcome through the process of transfer, assumptions that have little evidence to support them. In contrast, we suggest that the broader construct of enhanced attentional control provides a satisfactory interpretive framework to account for bilingual benefits in this very diverse set of tasks and accomplishments.

Bilingual effects across the lifespan

The findings summarized in Table 2 are organized around the nature of the effect with the idea that these effects are not well explained by a model in which bilinguals have better inhibitory control than monolinguals. Instead, the argument is that adaptations in attention systems as a consequence of bilingual experience modify a range of tasks and processes based on attention. However, the manifestation of these effects is somewhat different across the lifespan.

Possibly the most surprising group for which bilingual experience has been shown to modify cognitive performance is infants. Because the assumption had been that any putative effects of bilingualism would be traced to language use, there was no reason to believe that simple exposure to a bilingual environment by preverbal infants would reveal an impact of that experience. Since infants do not speak, explanations based on transfer of experience from language use to nonverbal contexts are unlikely to account for these results. Nonetheless, infants in bilingual environments are processing the languages around them, and their experience in listening, comprehending, and processing multiple languages may be sufficient to reshape attentional control.

Infants attend to the world and classify what is similar, and in so doing create the conceptual categories that will be the basis for future learning, including those for language. Newborn infants can distinguish between the language or languages they heard in utero and novel languages, providing a basis for language categorization (Byers-Heinlein et al., 2010). More dramatically, infants in bilingual environments watching a silent video of a talking face can determine when there is a language switch, whether they have heard both languages in their environment (Weikum et al., 2007) or not (Sebastian-Galles et al., 2012). All infants could do this at 4 and 6 months of age, but only infants in bilingual environments could still detect this change at 8 months old. Relatedly, over the course of the first year, infants looking at faces tend to shift their primary attention from the eyes to the mouth as language learning becomes a more consuming part of their lives (Tenenbaum et al., 2013); bilingually raised infants switch to focusing on the mouth at a significantly earlier age (Ayneto & Sebastian-Galles, 2017; Pons et al., 2015). Therefore, from the beginning, infants raised with two environmental languages use different attention strategies to extract information from talking faces and establish distinct representations for the languages in their environment. This has also been shown in simple memory tasks: bilingual babies at 6 months (Brito & Barr, 2014) and 18 months old (Brito & Barr, 2012) outperformed their monolingual counterparts on a deferred imitation task in which the infants were shown an action involving one puppet, and successfully repeated the action with a different puppet at a later time. The authors attribute this superior memory generalization to enhanced selective attention to salient perceptual cues. 
To summarize, in the first 2 years of life, preverbal infants in bilingual environments showed better attentional control to both verbal and nonverbal aspects of the environment, possibly setting the stage for further development of attention with higher cognitive functions.

Research with children was the first area to report better performance by bilingual participants than their monolingual peers. In an early study using a variety of tasks, Bialystok and Majumder (1998) noted that bilingual children outperformed monolinguals on tasks that relied on conflict resolution, and proposed that inhibitory control could be the explanatory mechanism. Consistent with this idea, subsequent research reported that bilingual children outperformed monolinguals on flanker (Yang et al., 2011; Yoshida et al., 2011), Simon (K. Antoniou et al., 2016; Martin-Rhee & Bialystok, 2008; Morales, Calvo, & Bialystok, 2013a; Poarch & Van Hell, 2012; Tse & Altarriba, 2014), and Stroop tasks (Esposito et al., 2013; Nayak et al., 2020; Poulin-Dubois et al., 2011), all of which include a role for inhibition, but other studies using similar tasks failed to find these language group differences (Anton et al., 2014; Dunabeitia et al., 2014; Gathercole et al., 2014; Goriot et al., 2018; Morton & Harper, 2007). Subsequent research with children expanded the range of tasks used, such as including working memory tasks, although here too there were studies that reported better performance by bilinguals (Blom et al., 2014; Morales, Calvo, & Bialystok, 2013a; Soliman, 2014) and others that did not (Engel de Abreu, 2011). The framework that focused largely on inhibition proved to be an unreliable predictor of these results.

The types of tasks for which bilingual children demonstrated advanced performance are similar to tasks that children with attention disorders, such as ADHD, find difficult. To test possible parallels and potential interaction effects, Sorge et al. (2017) tested 280 typically developing children, 8–11 years old. Children were assigned a continuous score for bilingual experience that varied from “monolingual” to “highly bilingual” (J. A. E. Anderson et al., 2018) and a continuous score for attentional capacity from the Strengths and Weaknesses of Attention-Deficit/Hyperactivity Disorder Symptoms and Normal Behavior Scale (SWAN). This instrument is generally used to identify cases of clinical impairment in attention by focusing on children whose scores fall below a pre-determined cutoff. None of the children in this study were clinically impaired, but their scores nonetheless fell on a normal distribution. Children completed three EF tasks – a flanker task, a working memory task, and a stop-signal task. Regression analyses showed independent contributions of degree of bilingualism and of attention score to outcome measures in each task, with no interaction effect; children who were more bilingual and children who had a higher attention score achieved better performance. Thus, the effect of bilingualism was parallel to the effect of attention in boosting performance on these EF tasks. It is not surprising that an assessment of attention is related to performance on what are essentially attention tasks; it is more surprising that an assessment of bilingualism has a similar relation.
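
The logic of "independent contributions with no interaction" can be illustrated with a minimal sketch. It assumes a linear model of the form outcome = b0 + b1·bilingualism + b2·attention + b3·(bilingualism × attention); the function names, the predictor scales, and the noise-free synthetic scores are all hypothetical and are not taken from Sorge et al. (2017). When the two predictors contribute additively, the fitted interaction coefficient b3 comes out at approximately zero.

```python
def design_row(bilingualism: float, attention: float) -> list[float]:
    """Intercept, two main effects, and their product (the interaction term)."""
    return [1.0, bilingualism, attention, bilingualism * attention]


def fit_ols(rows: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic, illustrative scores only: both predictors help, with no interaction.
bilingual_scores = [0.0, 0.25, 0.5, 0.75, 1.0]   # hypothetical 0-1 scale
attention_scores = [-1.0, 0.0, 1.0, 2.0]         # hypothetical standardized scale
rows, outcome = [], []
for b in bilingual_scores:
    for a in attention_scores:
        rows.append(design_row(b, a))
        outcome.append(1.0 + 2.0 * b + 3.0 * a)  # additive by construction

coefficients = fit_ols(rows, outcome)
print(coefficients)  # intercept, bilingualism, attention, interaction
```

With real data the coefficients would carry sampling error, so the interaction term would be evaluated with a significance test rather than read off directly.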

The majority of research on the effects of bilingualism on cognition, however, has been conducted with young adults, and it is in that group that the results are most contentious. Some resolution can be found in neuroimaging evidence. In a review of structural and functional brain differences between monolinguals and bilinguals, Grundy, Anderson, and Bialystok (2017a) reported that across studies, event-related potential waveforms associated with attentional resources, namely, N2 and P3, have a larger amplitude and earlier onset in bilinguals than in monolinguals, suggesting that bilinguals devote more resources to the control of attention earlier than monolinguals, who devote more resources later in processing. This pattern is consistent with more habitual and efficient control of attention for bilinguals than monolinguals. In this way, when tasks become more demanding, bilinguals can maintain better attentional control than monolinguals.

Studies with older adults have revealed many of the same effects found for younger adults that are described in Table 2 (Baum & Titone, 2014), but the most dramatic effect of bilingualism in older age is the accumulation of cognitive reserve that manifests as a delay in symptoms of dementia (Bialystok, 2021). The central notion for cognitive reserve is that there is a dissociation between cognitive level and underlying neural structures such that individuals with cognitive reserve outperform levels predicted by their neural resources (Stern, 2002). A large number of studies have shown that bilingual patients are diagnosed with dementia at a significantly older age than monolingual patients after matching for cognitive level and a variety of demographic variables (meta-analyses in Anderson et al., 2020; Paulavicius et al., 2020). Moreover, these effects have been observed in the context of more hippocampal atrophy (Schweizer et al., 2012) and poorer glucose metabolization (Kowoll et al., 2016; Perani et al., 2017) in the bilingual sample, both indications of greater disease pathology. That is, the bilingual patients performed at the same cognitive level as monolingual patients despite greater levels of structural and functional impairment. In a recent study, Costumero and colleagues (Costumero et al., 2020) examined 99 monolingual and bilingual patients with Mild Cognitive Impairment matched on cognitive and demographic variables, and reported significantly reduced cerebral volume in bilinguals compared to monolinguals. In a follow-up battery of tests with a subset of the sample approximately 7 months later, monolinguals demonstrated both greater brain decline and greater cognitive decline than did bilinguals.

What is the basis of cognitive reserve in older bilinguals? Because the primary evidence for cognitive reserve in older bilinguals does not come from standard EF tasks, the explanations that have evolved from those tasks, such as the componential model of EF, cannot apply. So how do bilinguals maintain cognitive levels into older age even in the presence of brain decline? Our suggested answer is that lifelong bilingualism has conferred enhanced levels of attentional control to speakers of two or more languages, providing a robust basis for a range of cognitive tasks, including those dependent on executive functions.

Moving forward: Empirical approaches to uncovering the mechanism

We began by arguing that the componential approach to describing the structure of EF contained inconsistencies that were revealed by research with bilinguals. Specifically, the components lacked the coherence and the integrity that were required to predict or explain performance in monolingual and bilingual samples. Instead, studies that used the standard EF tasks and followed the implications of componential models produced contradictory results. In contrast, research with infants, children, and older adults that was not based on the tasks or assumptions of individual components of EF more reliably produced performance differences between language groups, although these differences could not be described in terms of those components. Bilinguals typically outperform monolinguals on tasks generally involving conflict, as shown in Table 2, but show no difference from monolinguals on easy tasks, verbal tasks, or task conditions that do not include conflict. We described those effects as reflecting differences in attentional control: bilingual environments shape the development of attention in infants, efficient attentional control guides children’s performance on complex attention tasks, and attention networks support cognition in older age by providing cognitive reserve. Thus, the cognitive benefits of bilingualism are found across the lifespan, but they are manifest in different ways and under different circumstances. They are also, it seems, underestimated by performance on standard EF tasks, especially for young adults who can perform such tasks with relatively few demands on cognitive control. In terms of the analogy offered earlier, most laboratory EF tasks only involve walking. Our conclusion is that these results are more compatible with an explanation based on attentional control than one based on a narrow view of executive functioning or task differences.

One way to investigate these claims is to hold the task constant and modify the attention demands within a task. This approach provides a direct comparison of componential models that are based on task differences and an attention model that is based on variation of demands within a task. Although the model endorsed by Diamond (2013) does include variation of task demands within components, the model proposed by Miyake et al. (2000, 2012) does not incorporate fluctuations within a task. Some evidence supporting the attention interpretation can be found in previous studies. For example, both a flanker task (Costa et al., 2009) and a Simon task (Bialystok, 2006) were performed better by bilingual than monolingual young adults when they were presented in a context requiring many inter-trial switches (increasing attention demands), but not when the switching, and hence attention demands, was reduced. A similar effect was found by manipulating the working memory demands of a flanker task, showing that bilinguals outperformed monolinguals only in the condition with high demands (Jiao et al., 2019). Similarly, Diamond (2013) offers examples from research with children in which modifications of the inhibitory demands within a single task (Dots task or Spatial Stroop) impacted children’s performance.

The n-back task, in which participants must decide if a stimulus matches one seen on a previous trial specified by a gap of n trials, lends itself well to this type of manipulation. The processes involved in deciding if the current stimulus is the same as one seen one or two trials back are essentially the same, but the 2-back condition is more challenging. Our claim is that what makes 2-back more difficult than 1-back is that it requires greater attentional control to compare the current stimulus over a longer stretch. Two previous studies have reported that bilinguals outperformed monolinguals in the 2-back condition but not in the simpler 1-back condition (Barker & Bialystok, 2019; Janus & Bialystok, 2018). However, a more systematic exploration of this pattern was reported by Comishen and Bialystok (2021). Young adults who were classified as monolingual or bilingual completed four conditions of an n-back task – 0-back, 1-back, 2-back, and 3-back – while EEG was recorded. The 0-back is a control condition in which participants simply identify a current item as being a target or not, so it cannot be compared to the other n-back conditions. For the other three conditions, RT in both groups was significantly slower for each successively more difficult condition but with no speed difference between language groups. Similarly, as difficulty increased across the three conditions, accuracy declined, but the performance gap between monolingual and bilingual participants widened, resulting in a greater bilingual benefit at longer lags. However, the analyses of the event-related potentials for P2 and P3 waveforms indicated that less effortful processing was recruited by the bilinguals to achieve these outcomes at all three lags. Because the task is the same across all the conditions, an explanation in terms of components of EF appears not to apply.
These studies support the interpretation that conditions within a task that impose more effortful demands are more likely to reveal differences between language groups than a simpler version of the same tasks. The demands for attentional control provide a better fit to evidence for language group differences than do differences between tasks or EF components.
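
To make concrete the point that the same decision rule applies at every lag, the following minimal sketch scores an n-back sequence. The function names, the letter stimuli, and the choice to leave the first n trials unscored are illustrative assumptions, not the procedure of any study cited here. Only the parameter n changes between conditions; the comparison itself is identical.

```python
from typing import Sequence


def nback_targets(stimuli: Sequence[str], n: int) -> list[bool]:
    """Mark each trial True when its stimulus equals the one n trials earlier."""
    if n < 1:
        # 0-back is a target-detection control, not a lagged comparison.
        raise ValueError("n must be at least 1")
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def nback_accuracy(stimuli: Sequence[str], responses: Sequence[bool], n: int) -> float:
    """Proportion of scorable trials with a correct match/non-match response.

    Assumption: the first n trials have no comparison item and are not scored.
    """
    if len(stimuli) != len(responses):
        raise ValueError("one response is required per stimulus")
    targets = nback_targets(stimuli, n)
    scored = list(zip(targets, responses))[n:]
    if not scored:
        raise ValueError("sequence is too short for this lag")
    return sum(target == response for target, response in scored) / len(scored)


sequence = list("ABAACAC")                 # hypothetical stimulus stream
one_back = nback_targets(sequence, 1)      # same rule, lag of one trial
two_back = nback_targets(sequence, 2)      # same rule, lag of two trials
never_respond = [False] * len(sequence)    # a participant who never reports a match
print(one_back, two_back, nback_accuracy(sequence, never_respond, 2))
```

Because difficulty is controlled by a single parameter, group differences that emerge only at larger n can be attributed to the increased demand rather than to a change of task.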

Future research needs to apply this “graded-difficulty” approach to other task paradigms to establish the consistency of the effect of demands for attentional control across different tasks. The results would be particularly convincing as a critical test of the continuous attention model versus the EF component model if parallel results could be found for tasks typically associated with different components. Since n-back is considered a test of working memory, the extension of this approach to tasks such as a flanker task as an index of “inhibition” and a task-switching task as an index of “shifting” would provide converging support. Previous studies that have manipulated condition difficulty within a task, such as a flanker task (Costa et al., 2009; Jiao et al., 2019), have compared just two conditions, labeled “easy” and “hard.” However, a test of the attentional control view requires a larger range of variation in difficulty that can be observed and calibrated to bilingual experience. For example, a flanker task that includes four types of flanking stimuli that differ in salience, creating increasing distraction from the target stimulus, would be expected to show the smooth transition from equivalent group performance in the simplest condition, to diverging performance in moderate conditions, and finally to significantly better performance by bilinguals in the most difficult condition, as reported by Comishen and Bialystok (2021) for the n-back task. Accompanying these studies with measures of EEG to index effortfulness while performing the task would provide more complete data.

Two important factors when developing these future studies are that the tasks should be nonverbal and require consciously controlled as opposed to automatic procedures. This combination makes Jacoby’s (1991) process dissociation procedure a possible candidate for investigating these issues. In memory experiments using this procedure, items to be remembered are typically presented in one of two modalities or contexts. In a later recognition test, participants are asked to recognize items from either group in one test condition (“inclusion”), but only items from one specified group in a second condition (“exclusion”). The difference between performance levels in the two conditions yields an estimate of consciously controlled recollection; an estimate of automatic processing can then be calculated using equations proposed by Jacoby (1991). The prediction is that a bilingual advantage would be found in the consciously controlled measure but not in the automatic measure, and indeed preliminary evidence in favor of this prediction was reported by Wodniecka et al. (2010).
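
The estimates described above follow from two equations in Jacoby (1991): the probability of accepting an item under inclusion instructions is I = R + A(1 − R), and the probability of erroneously accepting a to-be-excluded item under exclusion instructions is E = A(1 − R), where R is consciously controlled recollection and A is the automatic influence. Solving gives R = I − E and A = E / (1 − R). The sketch below applies these equations; the function name and the example proportions are hypothetical, and the estimates rest on the procedure's assumption that the two influences operate independently.

```python
def process_dissociation(p_inclusion: float, p_exclusion: float) -> tuple[float, float]:
    """Return (recollection, automatic) estimates from Jacoby's (1991) equations.

    p_inclusion: proportion of critical items accepted under inclusion instructions.
    p_exclusion: proportion of to-be-excluded items accepted under exclusion instructions.
    """
    for name, value in (("p_inclusion", p_inclusion), ("p_exclusion", p_exclusion)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")
    recollection = p_inclusion - p_exclusion      # R = I - E
    if recollection >= 1.0:
        # A = E / (1 - R) is undefined when recollection is perfect.
        raise ValueError("automatic estimate is undefined when R equals 1")
    automatic = p_exclusion / (1.0 - recollection)  # A = E / (1 - R)
    return recollection, automatic


# Hypothetical proportions for one participant.
recollection, automatic = process_dissociation(p_inclusion=0.8, p_exclusion=0.3)
print(round(recollection, 3), round(automatic, 3))
```

With these illustrative values, R is about 0.5 and A is about 0.6. On the prediction stated above, language groups would be expected to differ in R but not in A.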

A further method that directly assesses attentional resources and their management is the dual-task paradigm in which two tasks are performed simultaneously. One measure of attentional control can be computed from a visual tracking task, performed both alone and in conjunction with a second task. Spatial deviations from the target under dual-task conditions provide a measure of attention paid to the second task (Naveh-Benjamin et al., 2005). For studies assessing attentional control in bilingual participants, the nonverbal secondary task could be a continuous auditory RT task in which three tones are each associated with a different response key. Better attentional control would be indicated by smaller deviations in the tracking task when combined with the auditory RT task. Performance on the continuous RT task would assess language group differences in response speed, and how speed is affected by the addition of a second task.
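
One way to quantify the tracking measure is sketched below. It assumes that cursor and target positions are sampled at matching time points, that deviation is summarized as the root-mean-square distance between them, and that the dual-task cost is expressed as the proportional increase over single-task deviation. These are illustrative choices, not necessarily the scoring used by Naveh-Benjamin et al. (2005), and the function names and coordinates are hypothetical.

```python
import math

Point = tuple[float, float]


def rms_deviation(target_path: list[Point], cursor_path: list[Point]) -> float:
    """Root-mean-square distance between target and cursor across samples."""
    if not target_path or len(target_path) != len(cursor_path):
        raise ValueError("paths must be non-empty and of equal length")
    squared = [
        math.hypot(cx - tx, cy - ty) ** 2
        for (tx, ty), (cx, cy) in zip(target_path, cursor_path)
    ]
    return math.sqrt(sum(squared) / len(squared))


def dual_task_cost(single_rms: float, dual_rms: float) -> float:
    """Proportional increase in tracking deviation when a second task is added."""
    if single_rms <= 0:
        raise ValueError("single-task deviation must be positive")
    return (dual_rms - single_rms) / single_rms


# Hypothetical samples: the target stays at the origin in both conditions.
target = [(0.0, 0.0), (0.0, 0.0)]
single = rms_deviation(target, [(3.0, 4.0), (3.0, 4.0)])   # tracking alone
dual = rms_deviation(target, [(6.0, 0.0), (0.0, 6.0)])     # tracking plus tone task
print(single, dual, dual_task_cost(single, dual))
```

A smaller cost would indicate that tracking was better protected when the auditory RT task was added, which is the pattern expected with better attentional control.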

One major conclusion of the present account is that the superior performance of bilingual over monolingual individuals on certain tasks is not a main effect but an interaction. By this we mean that we should not expect to find better bilingual performance on all EF or attentional control tasks; rather, the finding of better bilingual than monolingual performance will depend on an interaction between the control demands of a particular task and the control abilities of the person performing the task. In turn, the control demands of a task will depend on such things as task complexity and the presence of prepotent response tendencies. It is also the case that task demands are not absolute but depend on the person performing the task, principally on how practiced the person is on that particular task. A complex task that requires substantial resources and control abilities in a beginner may be performed with little need for executive control processes in an expert. Individual differences in control abilities also depend on a variety of factors including age of the person and impairment of frontal lobe functions. This interpretation based on gradations of task demands and gradations of ability within varying contexts is more nuanced than the categorical schemes that have framed most of the research investigating cognitive effects of bilingualism. Hence binary answers determining that effects do or do not exist are ruled out. Instead, the effect of bilingualism is to modify the equation in which task demands and individual resources determine the point at which more controlled attention is required to maintain task performance. At some point, only those with adequate or possibly “reserve” capacity can continue to perform.

We have proposed that bilingualism is one factor that enhances cognitive control abilities although other factors undoubtedly play a role; speculatively, these may include the complexity of a person’s work environment or leisure activities. Overriding all this is the need for a detailed description of “bilingualism” in each study. As recent research shows, differences in bilingual experience have significant consequences for the impact of that experience on cognitive outcomes (DeLuca et al., 2019; DeLuca et al., 2020; Gullifer et al., 2018; Gullifer & Titone, 2020). These considerations mean that the research moving forward will inevitably be more complex than previous studies that have relied largely on categorically different groups of individuals performing simple tasks. Future research will need to attend to details of task demands and bilingual experience but also consider the bilingual context in a more meaningful way than has typically been done. This research began on the assumption that there was a simple question that could be easily addressed – do groups with different language experiences develop different levels of cognitive or EF ability? – but we now see that the question is not at all simple. We suggest that an approach focusing on the interaction between the control requirements of tasks and the attentional control abilities of individuals may provide the way forward. For the reasons explained above, answering the question is extremely important in that the answers impact child development, cognitive function, and cognitive decline in older age.