The currently emerging consensus regarding conflict monitoring and control vis-à-vis Stroop processes is in need of a fresh evaluation. In particular, we argue that control is a redundant concept vis-à-vis Stroop performance and that the meaning of conflict is distorted in conflict-monitoring theory. To quote the title of the influential paper by Julie Bugg (2014), “Conflict-Triggered Top-Down Control: Default Mode, Last Resort, or no Such Thing,” we argue that the third option is the case. James Schmidt presented evidence-based arguments against conflict monitoring and control, repeatedly calling it “an illusion” (Schmidt, 2019; Schmidt et al., 2015). Much earlier, Theeuwes (1994, 2010; Theeuwes et al., 1998; Theeuwes et al., 2000) showed in a series of tightly controlled experiments that early visual attention is “completely stimulus driven” (Theeuwes, 2010, p. 77). Low-level associative mechanisms have been implicated in more recent work, too (e.g., Abrahamse et al., 2016; Braem & Egner, 2018; see also, De Neys, 2021), but these authors still seek to engage those bottom-up processes with “central control,” an unwarranted linkage and assumption (see the theoretical discussion below). Recently, two of the present authors further challenged the role of conflict monitoring and control in the Stroop domain (Algom & Chajut, 2019). Unlike the work by Schmidt and Theeuwes, the present statement is not a review of studies in the literature, not even a minireview (Algom & Chajut, 2019). It is a rigorous evaluation of the theoretical arguments. The present statement entails several of the points made by Algom and Chajut (2019), but here we crystallize them, while adding new points, insights, and perspectives. We hope that stating the arguments crisply and scrutinizing them logically will make them readily accessible to a large audience interested in Stroop and cognitive processes.

At this point we should state our exclusion criteria. A principal one refers to the underlying brain loci and processes. We do not discuss them for two reasons. First, our analysis is logical, so that if the premises are true (as we believe them to be), the resulting conclusion must be true. Second, while not experts in recording and imaging, we are still apprehensive about the exclusive identification of a given psychological process (say conflict) with a definite brain activity (enhanced energy/reaction in the anterior cingulate cortex). Brain loci and the attendant activations do not come with a label identifying them as “conflict,” “detection,” or “decision”; hence, any (causal) link must be supported by double dissociation. This has been done in the control literature, but not nearly to a sufficient extent. For example, Grinband et al. (2011a) note that conflict-monitoring theory has not been tested against the natural null hypothesis that enhanced anterior cingulate activity is associated with generic task processes rather than with conflict (see also Levin & Tzelgov, 2016). When the hypothesis was tested (Grinband et al., 2011b; see also Weissman & Carp, 2013), conflict-monitoring theory failed the test. In this respect, establishing a more basic condition, selective influence (Dzhafarov, 1999, 2001; Townsend, 1984; Van Zandt, 2002), has not been attempted in the conflict literature. This is a notoriously difficult condition to satisfy in empirical research (Algom et al., 2015; Algom et al., 2017; Fitousi & Algom, 2018), but developments within mathematical psychology resulted in several tests by which processing times can be isolated.

For a second stipulation, we restrict our analysis of control and conflict monitoring to Stroop processes. We mean Stroop processes in a broad comprehensive sense to include the large class of Stroop and Stroop-like dimensions (e.g., picture-word, spoken word, visual word) as well as conceptually similar tasks (e.g., flanker and Simon tasks). We use the original Stroop dimensions of ink color and color word for our definitions and illustrations and as our running example throughout the text. However, our arguments apply to the gamut of Stroop effects from number and size (the size congruity effect; e.g., Algom et al., 1996; Fitousi & Algom, 2006, 2018, 2020; Fitousi et al., 2009; Ganor-Stern et al., 2007; Henik & Tzelgov, 1982; Pansky & Algom, 1999, 2002) to directional word and position (left–right or spatial Stroop; e.g., Baldo et al., 1998; Shor, 1970, 1971) to picture-word (Arieh & Algom, 2002) to word-word (Dishon-Berkovits & Algom, 2000) to picture-picture (Shaki & Algom, 2002) or to cross-modal auditory-visual Stroop (e.g., Melara & Marks, 1990). The points we make also generally apply to the allied tasks of flanker (Miller, 1991) and Simon (Ansorge & Wühr, 2004; Fitousi, 2016) or to the “Navon task,” wherein a global letter is composed of small letters (same or different from the global letter) and the task is to identify either (Mevorach et al., 2010; Navon, 1977; Pomerantz, 1983). For a rigorous definition of Stroop effects as opposed to non-Stroop effects we refer the reader to Algom et al. (2004, pp. 324–325). All of the Stroop and Stroop-like effects mentioned, as well as flanker and Simon tasks, inhere well within the Stroop side of the demarcation line.

A variety of studies have established control as a wide-ranging cognitive concept. Its generality granted, it is still the case that a disproportionate amount of the relevant research has been directed at one particular phenomenon – the Stroop task and effect. The latter serve as a testing ground for evaluating the validity of suggested control mechanisms. Our critique thus focuses on the concept of control as an explanation of the Stroop effect. The basic control account is as follows. The near perfect, errorless performance with the nonhabitual response of color (even when it conflicts with the habitual word) seems to indicate the operation of an efficient top-down control mechanism (Botvinick et al., 2001; Botvinick et al., 2004; see also more recently Musslick et al., 2015; Shenhav et al., 2013). Control enables refraining from impulsive, automatic, or habitual responses, which, in turn, leads to the impeccable performance observed. We question the validity of this story. We show instead that simple perceptual, input-driven stimulus factors account for all features of Stroop performance. Of course, it would be absurd to deny brain control of whatever we do, but it is almost equally implausible to assume a dedicated central command system that adjusts behavior on a trial-to-trial basis in any particular task. Because there is a more complete and parsimonious explanation of the Stroop task and performance, the Stroop effect is inherently unsuitable to serve as the gold standard for capturing control and conflict monitoring.

It is worth pausing briefly on the notion of control. Obviously, as we just mentioned, there is brain control over all human action. People willfully apply control in everyday life over whatever they do – they come to meetings as planned, or, in the Stroop task, follow the instructions to name the colors. Beyond this trivial sense, however, virtually all applications of control in psychological science refer to specific mechanisms. Michael Posner, who pioneered the study of control in cognitive science, has been acutely aware of its hypothetical status, stressing repeatedly that “much needs to be learned about the mechanisms” of hypothetical control systems (e.g., Posner & Raichle, 1995, p. 171). It is here, in the scientific research domain, that we issue an important caveat. Control is a theory, not a fact. All too often, popular conjectures in psychology are taken for facts rather than for the theoretical notions they are. Two ready examples are the “mental number line” in numerical cognition (Dehaene, 1997; but see Bar et al., 2019) and the “attentional spotlight” (Posner et al., 1980; but see Shalev & Algom, 2000). It is important to keep in mind this caveat because, for many a student, the fact of control seems to be an article of faith.

The structure of the article is as follows. We begin with a rigorous definition of the Stroop effect and its components. Discussion of seven issues follows, each posing a challenge to conflict monitoring and control as an account of Stroop processes and effects. Some, though not all, of the basic findings discussed could possibly be accommodated by a control account; however, assuming the attendant monitoring processes is gratuitous because stimulus-driven attention and perception completely and more parsimoniously explain the Stroop results. Other Stroop issues are ignored in accounts of conflict monitoring and control. Next, in the section on theory, we discuss how the notion of conflict has been corrupted in conflict-monitoring theory. We conclude that the Stroop effect is a misguided choice for testing conflict monitoring and control because the latter are fundamentally ill-suited to account for the effect and its associated properties.

The Stroop effect assesses the selectivity of attention: Basic definitions

Proper functioning in any task depends on facility at attending to the relevant feature for responding, while ignoring task-irrelevant distractions. This critical ability is assessed by the Stroop effect (Stroop, 1935), psychology’s oldest and still most popular standard for specifying people’s prowess at attending selectively (MacLeod, 1991; Melara & Algom, 2003). The fundamental idea is creating agreement (via congruent stimuli [C]) or conflict (via incongruent stimuli [IC]) between values of the target feature (e.g., color) and the distractor feature (color-word) when responding to the target feature. An influence of the distractor is detected when responses to the target are more sluggish and error prone to incongruent stimuli, thereby compromising exclusive focus on the target. Conversely, if there is no difference between the two types of stimuli, selectivity to the target color is perfect. Formally, the Stroop effect is defined as:

$$\mathrm{Stroop}\ \mathrm{Effect}=\mathrm{MRT}\ \left(\mathrm{IC}\right)-\mathrm{MRT}\ \left(\mathrm{C}\right), \tag{1}$$

where MRT is mean response time to color. An analogous formula exists for error rate. The larger the Stroop effect, the more pronounced is the failure of full selectivity to color; a zero Stroop effect, by contrast, attests to perfect selectivity of attention to the target color (i.e., the task-irrelevant feature of word does not make a difference). The Stroop effect can be partitioned by presenting a third type of stimulus, control or neutral stimuli (N). Considering the ink color (as the target attribute) and the color word (as the distractor attribute), the word RED in red ink color forms a congruent stimulus, RED in green forms an incongruent stimulus, but TABLE in red is a neutral stimulus. Neutral stimuli do not create agreement or conflict between their constituent components. When neutral stimuli are also presented, Stroop interference is defined by the difference between incongruent and neutral stimuli:

$$\mathrm{Stroop}\ \mathrm{Interference}=\mathrm{MRT}\ \left(\mathrm{IC}\right)-\mathrm{MRT}\ \left(\mathrm{N}\right), \tag{2}$$

and Stroop facilitation is defined by the difference between neutral and congruent stimuli (people still respond “red” faster to RED in red than to TABLE in red),

$$\mathrm{Stroop}\ \mathrm{Facilitation}=\mathrm{MRT}\ \left(\mathrm{N}\right)-\mathrm{MRT}\ \left(\mathrm{C}\right). \tag{3}$$

Note that the Stroop effect is the algebraic sum of interference and facilitation:

$$\mathrm{Stroop}\;\mathrm{Effect}\;=\;\mathrm{Stroop}\;\mathrm{Interference}\;+\;\mathrm{Stroop}\;\mathrm{Facilitation}. \tag{4}$$
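As a worked illustration of Equations 1–4, the following minimal Python sketch computes the three quantities from condition means. The function name and the response times are hypothetical, chosen only for illustration; they are not data from any study.

```python
def stroop_components(mrt_ic, mrt_n, mrt_c):
    """Return (effect, interference, facilitation) in the units of the inputs."""
    effect = mrt_ic - mrt_c          # Equation 1
    interference = mrt_ic - mrt_n    # Equation 2
    facilitation = mrt_n - mrt_c     # Equation 3
    return effect, interference, facilitation


# Hypothetical mean response times in milliseconds (illustrative only).
effect, interference, facilitation = stroop_components(
    mrt_ic=720, mrt_n=660, mrt_c=640
)
print(effect, interference, facilitation)  # prints: 80 60 20

# Equation 4: the effect is the sum of its two components.
assert effect == interference + facilitation
```

Without the neutral mean, only the first quantity can be computed, so a design that omits neutral stimuli cannot say how much of the Stroop effect is interference and how much is facilitation.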

 

The reason we reviewed these fundamentals is to issue several notes of caution vis-à-vis control and conflict monitoring explanations of the Stroop effect.

Seven problems

Missing distinction between Stroop and non-Stroop stimuli

The control-conflict approach lacks a clear demarcation of Stroop stimuli, the putative target for its theorizing and explanation. What is a Stroop stimulus after all? It is a unique configuration of attributes that are conjoined together by meaning. The Stroop stimulus is distinguished from all other (multidimensional) stimuli by being simultaneously a physical-perceptual construct and a logical-semantic construct (this intersection of domains likely explains its unabated popularity). For perception, the color and the word are physical stimuli impinging on the sensory surface like all other stimuli; for meaning and unlike other stimuli, the color and the word are yoked together by the logical-semantic relation of compatibility or incompatibility. Therefore, the distinctive feature of all Stroop stimuli is the presence of a logical relation between their constituent components: Each and every Stroop stimulus is either congruent or incongruent. Consider the original dimensions of color word and ink color: All possible combinations of a color word and an ink color must result in either a congruent stimulus (when the word is the name of the ink color) or an incongruent stimulus (where word and color mismatch). Precluded is a third possibility. As a result, also, responding to an attribute of a Stroop stimulus always mandates semantic analysis.

Notably, not all multidimensional stimuli possess the quality of congruity. Consider a red triangle with the task of naming the color (or the shape). A red triangle is not a Stroop stimulus. This stimulus is a combination of a shape and a color, yet it lacks the quality of congruity. A red triangle is neither more nor less (in)congruent than, say, a blue circle. As a result, Equation 1 (the formula for computing the Stroop effect) cannot apply to a stimulus like a red triangle. The upshot is: The Stroop effect is not defined and cannot be calculated for non-Stroop stimuli (see again Algom et al., 2004, for a rigorous definition of the Stroop stimulus, and for a clear line of demarcation between Stroop and non-Stroop stimuli).

Now, control and conflict monitoring theory ignores the difference between Stroop and non-Stroop stimuli. It treats non-Stroop stimuli on an equal footing with Stroop stimuli. The wealth of “conflict” stimuli and tasks portrayed in the work of Botvinick et al. (2001) comprises an amalgamation of Stroop and non-Stroop stimuli. Despite the irrelevance of Equation 1 (hence the absence of a Stroop effect), the theory imputes “conflict” to non-Stroop stimuli where none exists. Consider the (possible) responses, “red” and “triangle,” to the non-Stroop stimulus, red triangle. These responses are not in any semantic or logical conflict in the way that the responses “red” and “green” are for the Stroop stimulus, RED (word) in green (ink color). For the Stroop stimulus, the responses are in genuine conflict to the extent that they exclude each other, whereas for the non-Stroop stimulus, the responses do not exclude each other, hence do not conflict logically or semantically. This lack of demarcation alone challenges the monitoring account as a bona fide Stroop theory.

Ignoring the unique cognitive processing of Stroop stimuli

Pursuant to the previous point, the difference between Stroop and non-Stroop stimuli is not merely logical; it reaches deeply into underlying cognitive processing. Responses to Stroop stimuli entail semantic analysis, whereas responses to non-Stroop stimuli (usually) do not. This difference in processing is ignored in conflict-control theory. For the Stroop logical relation to exist, one of the dimensions (at the least) must be semantic (i.e., must possess meaning through associations with referent stimuli). Concerning the original Stroop dimensions, the color-word is semantic, although print color is not. In the popular picture-word species of the Stroop task, both dimensions are semantic. By contrast, processing of many non-Stroop stimuli is perceptual: A triangle in red entails two physical components (a shape and a color), and purely physical stimuli do not mandate logical or semantic analysis.

Nevertheless, the question of selective attention exists with equal force with respect to non-Stroop stimuli, too. Can people attend selectively to color and ignore shape (D. J. Cohen, 1997)? When preparing for landing, can a pilot focus on azimuth and momentarily ignore height (Algom et al., 2004)? Because these dimensions are not Stroop dimensions, the Stroop effect cannot be calculated. Other measures, notably Garner interference (Garner, 1974; see Algom & Fitousi, 2016, for review), serve then to assess selective attention.

The main point to note is that qualitatively different cognitive processes underlie selective attention with Stroop and non-Stroop stimuli. Again, the former requires semantic analysis, the latter does not. For Stroop, to notice that the momentary value of the task-irrelevant dimension is congruent or incongruent with that of the target mandates semantic analysis. This stipulation applies with full force to the allied conflict tasks of flanker and Simon. The stimuli in the latter tasks, too, divide into congruent and incongruent cases, a semantic demarcation. As a result, processing in the flanker or the Simon tasks, too, mandates semantic analysis: deciding whether the momentary value of the irrelevant dimension is congruent or incongruent with that of the target. Sans this analysis and the attendant advantage of congruent combinations, one would not observe flanker or Simon effects. Indeed, according to the first sentence of Miller’s (1991) in-depth analysis of the flanker task, the very fact that the “flankers produce a response compatibility effect indicates that they are processed semantically, at least to some extent” (p. 270).

Many flanker tasks use arrows for stimuli (e.g., Fan et al., 2005; Fan et al., 2002; Q. Li et al., 2014). It is important to note that arrows are as fully semantic stimuli as are letters of the alphabet. As Posner has repeatedly shown since presenting his original attention paradigm (Posner, 1980; Posner & Raichle, 1995), an arrow can and often does function as a symbol (for direction); that is, it has a referent (think of endogenous attention in the Posner orientation of attention paradigm). In Posner’s influential Attention Network Test (ANT) paradigm, Simon, Stroop, and flanker are interchangeable candidates for executive attention, attesting to their fundamental affinity in terms of (semantic) processing. The flanker task is often chosen on purely pragmatic grounds (Fan et al., 2005; Fan et al., 2002; see also Chajut et al., 2009).

Concerning the Simon effect in particular, quite apart from the essential compatibility relation enabled by (a modicum of) semantic analysis, Ansorge and Wühr (2004; see also, Wühr & Ansorge, 2007) argue for deep cognitive processing involving short-term memory. In his recent review, Fitousi (2016) concluded that the Ansorge and Wühr theory “entails that the Simon effect is a semantic phenomenon” (p. 2451). Other influential accounts (Tagliabue et al., 2000; Zorzi & Umiltà, 1995; see Wühr & Heuer, 2018) assume two parallel routes by which the dimensional values activate the response. This most popular dual-route account of the Simon effect (Hommel, 1993b, 2011) is a close relative of versions of the relative speed of processing account of the Stroop effect (e.g., J. D. Cohen et al., 1990; Melara & Algom, 2003) and is subject to the influence of the same variables (Hommel, 1993a).

For non-Stroop stimuli, by contrast, no semantic processing is involved (Algom et al., 2004; Fitousi, 2016). The Garner measure (for example) specifies the toll exacted on target performance by the mere presence of irrelevant variation.

To recap, a major problem with control and monitoring theory is that it ignores the unique operation-characteristics of Stroop stimuli, and fails to recognize their distinctive cognitive features. A handy example comes already from the pioneering study of Botvinick et al. (2001). The conflict-control model was applied to color-word stimuli—truly conflict and Stroop stimuli—and to line stimuli, distinctively nonconflict and non-Stroop stimuli. Success at modeling completely different tasks, for example, Stroop versus discrimination of line stimuli (while also observing errors), cannot replace genuine cognitive theory.

The anomaly of Stroop facilitation

The phenomenon of Stroop facilitation is anathema to the conflict-control approach as this approach is predicated on interference. One notices that in virtually all control studies of Stroop, “Stroop effect” and “Stroop interference” are used interchangeably. However, this is a false identity, and it only serves to undermine research and theory alike. As a glimpse at Equation 4 shows, the Stroop effect comprises two components, interference and facilitation. The typical Stroop study records both interference and facilitation, but it is eminently possible that nearly all of the Stroop effect is interference or that the chief part is facilitation. Interference is usually larger than facilitation, but facilitation is often sizeable (e.g., Tzelgov et al., 1992) and can be as large as (e.g., Brown, 2011; Sabri et al., 2001) or larger than interference (e.g., Carter et al., 1992; Melara & Algom, 2003; see also, Hatukai & Algom, 2017). Notably, the entire Stroop effect can be facilitation (Eidels, 2012; Eidels et al., 2010; Schmidt et al., 2013; see also Schmidt, 2021). In particular, Schmidt et al. (2013) have shown that semantic processing, the genesis of the Stroop effect, is facilitative and that with (very) large stimulus sets incongruent stimuli might actually facilitate performance. Interpreting such results in terms of conflict monitoring is awkward if not downright impossible.

Clearly, the Stroop effect is not synonymous with interference. This absolutely vital distinction is ignored in large portions of control studies of Stroop. As we just recounted, the Stroop effect can be produced fully or in part by facilitation. Now, for the conflict monitoring account, a facilitation-produced Stroop effect is an anomaly. Stroop stimuli are assumed to generate conflict and interference. However, if presentation of the same stimuli generates facilitation rather than interference, then the conflict-generated-control approach is called into question. The situation is actually worse with respect to control. Most control studies of Stroop did not use neutral stimuli, so that the Stroop effect could not be partitioned into interference and facilitation. In the absence of partitioning, the effects attributed to interference might well be those of facilitation.

Several caveats are in order. First, we do not argue that in all or even in the majority of studies the Stroop effect is reducible to facilitation. It is still the case that, in most empirical research, interference is larger than facilitation. However, it is theoretically possible (and sometimes materializes) that the major component or even the entire Stroop effect is facilitation. Second, including a neutral condition is common in Stroop studies (though not in those associated with control), but is less common in the allied tasks of Simon and flanker. Nevertheless, when a neutral condition was included, it led to important theoretical developments. For example, Aisenberg and Henik (2012) introduced two types of neutral stimuli into the Simon task and recorded a significant facilitation effect (with one). This means that the Simon effect, too, is composed, in part, of facilitation. In a further study from the Henik lab using tactile responses (Salzer et al., 2014), a “neutral condition revealed both facilitation and interference in the . . . Simon effect” (p. 177). This, in turn, led the authors to propose that two separate cognitive mechanisms underlie the Simon effect. For the flanker task, Lamers and Roelofs (2011) included a neutral condition, documenting facilitation along with interference. The results led these authors to challenge conflict-monitoring theory. Including a neutral condition is a bit more common in flanker tasks associated with load theory (Lavie, 2005). In a recent flanker study with a neutral condition (Z. Li & Lou, 2019), a reverse facilitation effect was found by which responses in the neutral condition were faster even than those in the congruent condition. The same reverse facilitation effect is sometimes documented in color-word Stroop, too (Entel et al., 2015; Kalanthroff et al., 2018). Z. Li and Lou (2019) concluded that their results pose a challenge to load theory.

The upshot is, facilitation is present and contributes to the Simon and the flanker tasks, too. It is difficult to assess its size given the infrequent use (as yet) of neutral stimuli in that research. Regardless, the main point is that the notion of conflict conducive to facilitation is problematic for conflict-monitoring theory.

Manipulating percent congruity (PC) creates correlation, not conflict

In conflict-monitoring theory, incongruent stimuli are said to generate conflict (the adverse consequences of which are then attenuated by the summoned control). Consequently, control studies of Stroop manipulate the number of incongruent or congruent stimuli in the stimulus ensemble as a means of creating or reducing conflict and control. What is not realized though is that the same manipulation automatically creates color-word correlation and word-response correlation in the same stimulus ensemble (see Algom & Chajut, 2019; Hasshim & Parris, 2021; Spinelli & Lupker, 2020, on the difference between color-word correlation and word-response contingency). These stimulus factors have nothing to do with conflict or top-down control, yet they generate all of the effects attributed to conflict and control. To understand the impact of correlation, observe that any deviation from random assignment of the ink colors to the words creates a word-color correlation and a word-response correlation over the experimental trials. Such correlations are unavoidable in imbalanced presentations. In the lopsided designs used in control studies—say 80% (in)congruent stimuli—the correlations are sizeable. Given a correlation, the nominally irrelevant word becomes predictive of the to-be-reported color. Inevitably, exclusive attention to the color is compromised, and a large Stroop effect ensues. Correlation is fatal for selective attention, the very task of the Stroop assay.

How does color-word correlation explain the typical PC result? Consider the commonly employed 2 (word) × 2 (color) design with 80% (in)congruent stimuli. In the mostly congruent condition, the large color-word correlation created diverts attention to the task-irrelevant word due to its predictive power. Because the Stroop effect gauges the failure of full selectivity to color (Equation 1), a large Stroop effect ensues. In the mostly incongruent condition, the possible gain reaped by the correlation is offset by the semantic clash between color and word, resulting in a smaller Stroop effect.
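The size of the built-in correlation in this 2 × 2 design can be made concrete with a short calculation. The sketch below is only an illustration under a stated assumption (both words and both colors are presented equally often); the function name `phi_2x2` is ours and does not come from the cited studies. A positive value means the word predicts its own color, a negative value means it predicts the other color.

```python
import math


def phi_2x2(pc):
    """Phi coefficient between word and color in a 2 x 2 design.

    Assumes both words and both colors are equally frequent, so each of
    the two congruent cells has probability pc / 2 and each of the two
    incongruent cells has probability (1 - pc) / 2.
    """
    congruent = pc / 2
    incongruent = (1 - pc) / 2
    # Every marginal probability equals 0.5 under the assumption above.
    numerator = congruent * congruent - incongruent * incongruent
    denominator = math.sqrt(0.5 * 0.5 * 0.5 * 0.5)
    return numerator / denominator


for pc in (0.8, 0.5, 0.2):
    print(f"PC = {pc:.0%}: phi = {phi_2x2(pc):+.2f}")
```

Under this assumption the mostly congruent (80%) and mostly incongruent (20% congruent) conditions carry correlations of equal magnitude (0.60) and opposite sign, and only the 50% condition of the 2 × 2 design is uncorrelated.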

The second correlation produced by manipulating PC is that between the word and the response. Consider the mostly congruent condition. For the frequent congruent trials, the nominally irrelevant word strongly predicts the response, resulting in very short RTs. For the rare incongruent trials in this condition, the word mispredicts the response, resulting in a cost. The net result is a large Stroop effect. In the mostly incongruent condition, a strong predictive relationship again develops between the word and the opposing response (e.g., when the word is RED, respond “green”). The responses are fast on the congruent trials, but these trials are rare (and the predictive role of the irrelevant word changes), so that the net result is a small Stroop effect. The main point is that correlation accounts of the PC effect are “unrelated to conflict, control,” so that, “stimulus–response correspondences is all that matters” (Schmidt, 2016, p. 1). Applying Occam’s razor, stimulus-bound accounts that merely invoke perception of correlation are preferred to accounts that summon top-down control on a trial-to-trial basis.

The Stroop task is further jeopardized in control studies by the experimental instructions. Often, participants are told in advance to attend to the task-irrelevant word, thereby contravening explicitly the essence of the Stroop task (as a measure of selective attention to color). Consider the title of the study by Bugg and Smallwood (2016), “The next trial will be conflicting!” These instructions in effect invite directing attention to the nominally irrelevant word, thereby rendering the Stroop task and effect meaningless. Less extreme encroachments entail advance information on the composition of stimuli in the set—effectively telling people to attend to the nominally irrelevant word (e.g., the majority, 80% say, of the next series of trials will be congruent).

A further misconception in Stroop studies of control refers to 50% PC. This condition is assumed (often implicitly) to be the unbiased standard from which conflict and control (80% incongruent) or release of conflict and control (20% incongruent) are generated. What is not realized is that in the common 4 (color) × 4 (word) design, the 50%–50% congruent–incongruent composition already entails a sizeable color-word correlation. This condition is not neutral by any means (see Fig. 1).

Fig. 1

Color-word correlation in the Stroop experiment: The typical balanced 4 (color) × 4 (word) Stroop design with 36 congruent (on the negative diagonal) and 36 incongruent (off diagonal) stimuli. There are 16 combinations of word and color in the basic factorial design, of which four are congruent (diagonal) and 12 are incongruent (off diagonal). The only way to equate the frequency of congruent and incongruent stimuli is to present each congruent stimulus more often (in this case, three times as often) than each incongruent stimulus. As a result, a correlation is created between the task-irrelevant word and the target color: The nominally irrelevant word predicts the to-be-reported color better than chance.
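The arithmetic behind the caption can be checked with a minimal Python sketch. The cell counts of 9 and 3 are derived from the caption (36 congruent trials over 4 cells, 36 incongruent trials over 12 cells). The use of Cramér's V as a summary index is our illustrative choice, not a measure reported in the source.

```python
import math

K = 4  # four colors and four words, as in the design of Fig. 1

# Presentation counts: each congruent cell 9 times (4 x 9 = 36),
# each incongruent cell 3 times (12 x 3 = 36).
counts = [[9 if word == color else 3 for color in range(K)] for word in range(K)]
total = sum(map(sum, counts))

# Probability that the ink color matches the word, given the word.
p_match_given_word = counts[0][0] / sum(counts[0])
chance = 1 / K

# Cramer's V as a summary of the word-color association (illustrative choice).
row_totals = [sum(row) for row in counts]
col_totals = [sum(counts[w][c] for w in range(K)) for c in range(K)]
chi_square = 0.0
for w in range(K):
    for c in range(K):
        expected = row_totals[w] * col_totals[c] / total
        chi_square += (counts[w][c] - expected) ** 2 / expected
cramers_v = math.sqrt(chi_square / (total * (K - 1)))

print(f"P(color matches | word) = {p_match_given_word:.2f} (chance = {chance:.2f})")
print(f"Cramer's V = {cramers_v:.2f}")
```

In this nominally balanced design the word predicts the matching color on half of the trials, twice the chance rate of one in four, and the association index is about 0.33 rather than zero.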

Correlation is a major determinant of the magnitude of the Stroop effect across the vast Stroop literature. Melara and Algom (2003) plotted the magnitude of the Stroop effect against the built-in correlation in the design of a sample of 35 studies culled from the literature. They found a staggering correlation of .69, meaning that nearly 50% (r² ≈ .48) of the variability in the magnitude of the Stroop effect in the literature is accounted for by the color-word correlation built into the design of the experiment.

The goal of the Stroop task is to gauge selectivity, not to serve as a means for generating conflict

The Stroop task is a toolkit for testing selective attention (Equations 1–4). Many (perhaps the majority of) control studies of Stroop (aiming at all types of Stroop and Stroop-like dimensions as well as the flanker and the Simon tasks) drift away from the goal of generating the effect and calculating its magnitude. They rather use the attendant procedures merely as a means of generating conflict, so that the Stroop effect and task lose their raison d’être. The manipulations in the control literature are still called “Stroop,” but in truth they have nothing to do with the Stroop effect. A sure sign of the misnomer is that the Stroop effect is not even calculated or reported in many “Stroop studies” of control (e.g., Kleiman et al., 2016). When the Stroop effect is reported, this result is typically relegated to the margins (and easily lost on the reader). It is moot whether such an approach can serve as a candidate theory for the Stroop effect.

We should add that the confusion reflects the empirical situation, namely the general lack of interest of control research in the Stroop effect per se. The problem granted, it does not itself mean that the Stroop effect cannot in principle be used as a testing ground for conflict monitoring, if used appropriately. Other problems with conflict monitoring challenge even the theoretical possibility.

The Stroop effect is determined by salience of the colors and the words, not by conflict and control

The preoccupation of conflict monitoring with central top-down regulation has likely led the control literature to overlook the simple physical makeup of the stimuli: the hue, value, and saturation of the colors, and the font, size, location, or visual angle of the words. However, a major determinant of the Stroop effect is precisely the physical makeup of the stimuli. The physical properties of the presented colors determine the ease (or difficulty) of distinguishing one color from another. Similarly, the physical properties of the presented words determine how discriminable each word is from the others. The Stroop effect is determined by the relative discriminability of the colors and the words: The more discriminable dimension (typically word) interferes with performance on the less discriminable dimension (color) (e.g., Algom et al., 1996; Algom & Fitousi, 2016; Fitousi, 2016; Fitousi & Algom, 2006; Garner, 1974; Garner & Felfoldy, 1970; Kahneman & Chajczyk, 1983; Melara & Algom, 2003; Melara & Mounts, 1993; Mevorach et al., 2006; Mevorach et al., 2010; Pansky & Algom, 1999, 2002; Pomerantz, 1983; Pomerantz & Garner, 1973). Mismatched discriminability favoring the word characterizes control studies of Stroop. It is this mismatched discriminability, not conflict and control, that determines the Stroop effect. The highly salient words intrude on color performance (thereby producing the Stroop effect) because they differ perceptually from one another more than the less salient colors do, not because word reading is the habitual response or because the response generates conflict.

The critical role of (mis)matched discriminability is revealed when advance care is taken to match salience across the words and the colors (without affecting legibility or color identification). Then, the Stroop effect simply evaporates, as does the typical word-color asymmetry (by which words intrude on color classification, but colors do not intrude on word reading). Notably, when the ink colors are made purposely more salient than the words, a reverse Stroop effect is found by which the colors disrupt word reading, but the words do not affect color classification (it is of interest to note that the first researcher to report a reverse Stroop effect was J. R. Stroop himself, in the little-read Experiment 3 of his classic study; Stroop, 1935). The collective results converge on a simple rule: The more discriminable dimension intrudes on the less discriminable dimension more than vice versa (see Fig. 2). Because the context of relative dimensional discriminability has been ignored in the control literature, one cannot rule out the possibility that a large portion of reported effects are not due to control but rather due to relative stimulus salience.

Fig. 2

Predominance of stimulus salience: The same number of conflict stimuli produces the Stroop effect when word is more salient than color (the default preparation; left panel), zero Stroop effect when word and color are matched in salience (middle panel), and a reverse Stroop effect by which the more salient colors intrude on word reading (right panel)

As we mentioned at the outset, discussion of the original color-word version (see also Fig. 2) serves merely as our running example. Notably in this respect, salience has been shown to impact the outcome in exactly the same way in the numerical Stroop task (e.g., Algom et al., 1996; Pansky & Algom, 1999, 2002), in the Navon hierarchical global/local letter task (Pomerantz, 1983; see also Mevorach et al., 2010), in visual-auditory cross-modal Stroop tasks (Melara & O’Brien, 1987), and in the spatially separated version of the Stroop task (wherein color and word are presented in different locations; Chajut et al., 2009; Fitousi & Algom, 2006). Kahneman and Chajczyk (1983) used different interstimulus distances as their manipulation of the spatially separated version. Recently, Fitousi (2016, Experiments 2–3) manipulated salience in the Simon task in a fashion analogous to the color-word version, and Melara et al. (2018) tested the role of perceptual discriminability in the flanker task, with both yielding significant results.

It is important to realize that relative dimensional discriminability accounts for a huge swath of the results across the vast Stroop literature. Melara and Algom (2003) found a correlation of .78 between word-color difference in discriminability and the size of the Stroop effect in a sample of 35 studies drawn from the literature. This difference governs the very appearance, magnitude, and direction (standard, reverse) of the Stroop effect. Notably, the rule has nothing to do with the notions of conflict and control, and, in fact, it poses a powerful challenge to explanations of conflict monitoring and control. By the latter, the Stroop effect is said to depend on the number of conflict stimuli presented. However, the same number of conflict stimuli can result in a large Stroop effect (when word is more salient than color), in a zero Stroop effect (when color and word are matched), and in a reverse Stroop effect (when color is made more salient than word). The stimulus factor of relative salience is at once a simpler and a better account of the Stroop effect than the heavy machinery of conflict and central control.

Although the critical role of relative dimensional discriminability is completely ignored in control research, the problem is less severe in select fortuitous cases. When the same words and colors are presented for tracking performance on a trial-by-trial basis (e.g., in the Gratton effect; see below), then, obviously, discriminability is held constant (if unknown). However, even in such cases mismatched discriminability might well have predetermined the outcome. In research with separate blocks of mostly congruent and mostly incongruent trials, or with a color-word matrix embedded in another matrix, different words and colors are typically used, with (unknown) modulation of discriminability (Footnote 3).

Conflict monitoring and control cannot (completely) explain the Gratton effect

The sequential phenomenon known as the Gratton effect (Gratton et al., 1992) is the observation that the RT to the second incongruent stimulus is faster in an incongruent–incongruent sequence than in a congruent–incongruent sequence. Perhaps more commonly, the effect is gauged by the congruity interaction between trial n − 1 (lowercase letter) and trial n (capital letter): (cI − cC) − (iI − iC), with c/C standing for congruent stimuli and i/I for incongruent stimuli. The Gratton effect is cited, along with the PC effect, as the strongest piece of evidence supporting conflict monitoring and control. In contrast, we show that conflict monitoring and control is inconsistent with major empirical features of the Gratton effect. Its other designation, the congruity sequence effect (Schmidt, 2013, 2019), likely reflects the true source of the Gratton effect: a local, input-driven, bottom-up phenomenon, one among the many sequential effects documented in the Stroop literature. Several of the wealth of sequential effects in the Stroop (and flanker and Simon) tasks (MacLeod, 1991) had even been suspected before the advent of the electronic computer that allows for trial-to-trial analysis (e.g., Dalrymple-Alford & Budayr, 1966; Smith & Klein, 1953; Smith & Nyman, 1962; see also Jensen & Rohwer, 1966).

Concerning the Gratton effect, Schmidt (2019) has reviewed a large number of studies investigating faster responses on Stroop trials following an incongruent stimulus (see also Algom & Chajut, 2019, for a selective mini-review). Schmidt found systematic biases plaguing those control studies, compromising their conclusion on conflict monitoring and central control. Schmidt concluded that stimulus factors (of learning various contingencies) provide a more parsimonious account (see also Schmidt, 2021). Here, we eschew another review and limit the discussion to three major problems that pose a challenge to conflict monitoring and control accounts.

Stimulus dependence

A central, though rarely articulated, assumption of conflict monitoring is that adaptation and control are stimulus independent: It is the presence of conflict that is critical, not the particular components or means generating it. In sharp contrast, existing research reveals that adaptation and control are profoundly stimulus dependent. Consider a simple demonstration that includes two pairs of incongruent–incongruent sequences. The first sequence entails complete repetition: RED in green followed by RED in green. The second sequence entails complete change: BLUE in yellow followed by PINK in brown. Given the conflict experienced with the first stimulus in each pair, the control-produced adjustment in the second stimulus should be comparable across the two sequences. This basic prediction is not borne out by empirical research. Instead, cumulative research in the past 20 years shows that the effect is much stronger in the first than in the second sequence, and that it is moot whether the effect is present at all in the latter (Mayr et al., 2003). A simple summary of research is this. In all Stroop studies using a 2 (word) × 2 (color) design (and in similar flanker and Simon designs), the Gratton effect cannot be attributed to conflict because of unavoidable stimulus repetitions and correlations lurking in the presentation (but see Kim & Cho, 2014). Larger designs of 4 (word) × 4 (color) and beyond (e.g., Mordkoff, 2012) are even more vulnerable as they are not free of all types of first-, second-, and…n-order contingencies (e.g., Dishon-Berkovits & Algom, 2000; Kim & Cho, 2014; Mayr & Awh, 2009; Mordkoff, 2012; Schmidt & Besner, 2008). The theory developed by Hommel et al. (2004; see also Verguts & Notebaert, 2009) of binding processes operating across stimulus–stimulus and stimulus–response “files” also replaces central control as the source of the Gratton and associated effects. Consequently, stimulus factors still provide the preferred explanation over conflict monitoring and control.
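The point about unavoidable repetitions in the 2 (word) × 2 (color) design can be made concrete by enumerating every possible pair of consecutive trials. The sketch below is illustrative only: the function name and labels are hypothetical, and it counts only within-dimension repetitions (word to word, color to color), ignoring cross-dimension repetitions such as the word on trial n − 1 naming the color on trial n.

```python
from collections import defaultdict
from itertools import product

LABELS = {0: "complete change", 1: "partial repetition", 2: "complete repetition"}


def repetition_types(n):
    """Map each congruency sequence to the repetition types it can contain.

    Stimuli are (word, color) pairs from an n x n design; a stimulus is
    congruent when word == color. Sequences are labeled as in the text:
    lowercase for trial n - 1, capital for trial n.
    """
    stimuli = list(product(range(n), repeat=2))
    possible = defaultdict(set)
    for (w1, c1), (w2, c2) in product(stimuli, repeat=2):
        sequence = ("c" if w1 == c1 else "i") + ("C" if w2 == c2 else "I")
        repeated_features = int(w1 == w2) + int(c1 == c2)
        possible[sequence].add(LABELS[repeated_features])
    return dict(possible)


two_by_two = repetition_types(2)
for sequence in ("cC", "cI", "iC", "iI"):
    print(sequence, sorted(two_by_two[sequence]))
```

In the 2 × 2 case, sequences that repeat congruency (cC, iI) can only be complete repetitions or complete changes, whereas sequences that alternate congruency (cI, iC) are always partial repetitions. Sequence type is therefore fully confounded with repetition type, whatever the role of conflict.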

Since Mayr et al.’s (2003) formative study (that also took care of response repetitions), strenuous efforts have been made to expunge all types of stimulus and response repetitions from the experiment (e.g., Egner, 2014; Weissman et al., 2014). However, the effort at eliminating stimulus confounds often came at the cost of deforming the flanker task (i.e., altering its nature as a conflict task). A common tactic used was presenting a cue in advance of the flanker trial. The results then often depended on the perceived validity of the cue instead of the flanker, while inviting into the experiment processes such as costs and benefits of switching. The results obtained in such prime-probe or temporal flanker experiments were mixed. Thus, Weissman et al. (2014) did not find a correlation between the Gratton and the flanker effects, and sometimes recorded a negative Gratton effect, anathema to any control account. Because it is virtually impossible to remove all stimulus effects from the Stroop task (in whatever design), all attendant accounts in terms of conflict monitoring and control are suspect. In this respect, too, stimulus-bound processes of binding and unbinding (Hommel et al., 2004) account well for the Gratton effect. It is still moot whether a genuine conflict is what produces the Gratton effect.

Task dependence

Another unarticulated assumption of conflict monitoring and control is task independence. For example, facilitation should be observed on a trial in the Simon task if the previous trial was a conflict trial in the flanker task. Results violate this prediction (e.g., Akçay & Hazeltine, 2011; see also Egner, 2008). In many studies, a Gratton effect is observed in the Stroop task, but not in concurrently applied Simon or flanker tasks, and vice versa. Of more concern, when cross-task Gratton effects are observed, they are explained by shared rules, mechanisms, or stimulus features (e.g., Feldman et al., 2015; Freitas & Clark, 2015). The domain specificity (e.g., Akçay & Hazeltine, 2011), context specificity (e.g., Funes et al., 2010), even response specificity (Kim & Cho, 2014) of adaptation are discordant with the assumption of central general regulation.

Congruent–incongruent sequence symmetry

The Gratton effect concerns incongruent–incongruent sequences with facilitation observed with the second stimulus. Less attention was given to congruent–congruent sequences. Notably, these sequences produce parallel results (e.g., Aczel et al., 2021; see also Braem et al., 2019). The RT to a Stroop-congruent stimulus is usually faster after experiencing another congruent stimulus. This result challenges conflict monitoring because congruent–congruent sequences do not entail (high) conflict. The symmetry of the sequences reinforces the case for stimulus specific factors, unrelated to conflict and control.

The notion of conflict: Stroop versus conflict monitoring theory

The principal point that emerges from our survey of problems with conflict monitoring vis-à-vis the Stroop effect pertains to difficulties with the fundamental concept of conflict. The cornerstone of the Stroop effect is conflict: It is the presence of conflict (in incongruent stimuli) and the absence of conflict (in congruent stimuli) that defines the Stroop effect (Equations 1–4). We cannot overemphasize the logical foundation of conflict in the Stroop domain: Conflict is used in the strictest logical sense of the term. The responses “red” and “green” to the Stroop-incongruent stimulus, RED in green, can thus be seen as an instantiation in psychology of the law of noncontradiction in logic. By the law, “A is B” and “A is not B” cannot both be true at the same time and in the same sense because they are mutually exclusive (Copi, 2015). Commensurately in the Stroop domain, the responses need to be mutually exclusive logically and opposing semantically in order to be conflicting (as “red” and “green” are). The mere availability of multiple alternative responses (present with virtually all stimuli) does not ipso facto render them “conflicting.” Common sense follows logic and mainstream psychology in this case. In the American Heritage Dictionary, conflict is defined as the “simultaneous functioning of mutually exclusive tendencies.” In Merriam-Webster, the verb form is defined as “to show opposition or irreconcilability” (The American Heritage Dictionary, 2nd College Ed., 1985, p. 309; Merriam-Webster's Dictionary and Thesaurus, 2007, p. 161). Therefore, it is not surprising that the responses to a Stroop-incongruent stimulus comprise a full-fledged logical paradox (see Fig. 3).

Fig. 3

The Stroop effect as a logical paradox. Note the paradox applies only to genuine Stroop stimuli, not to any multidimensional stimulus with alternative responses

Conflict monitoring theory ignores the logical (indeed psychological) foundations of conflict. On that view, where there are multiple responses (there virtually always are), there is “conflict,” overlooking the logical truism that not all alternative responses are also conflicting. Nevertheless, it is this distortion of the psycho-logical notion of conflict that is modeled in conflict monitoring accounts. In the language of Stroop, Botvinick et al. (2001) did model something (likely valuable), but what they modeled was not conflict in any ordinary or Stroop sense of the term. In that early rendition, as well as in subsequent developments (Botvinick et al., 2004; Musslick et al., 2015; Shenhav et al., 2013; Yeung et al., 2004; Yeung et al., 2011; Yeung & Nieuwenhuis, 2009), “conflict” is stretched beyond recognition (in psychology, that is) to include not only multiple responses but everything difficult at the input (e.g., noise).

The foreignness of conflict monitoring to the Stroop realm is perhaps most readily apparent in the notion of congruent stimuli. In that model, Stroop-congruent stimuli also produce conflict! This feature is conveniently unrecorded in published reports, but it is part and parcel of the computational model. To be precise, congruent stimuli entail less conflict than do incongruent stimuli, but they, too, are conflict stimuli. Notice the absurdity from the Stroop perspective: Both components of the congruent stimulus (e.g., RED in red) agree and support the same single response, and yet they are “conflicting” in conflict-monitoring theory. Further in this respect, in many control studies Stroop-incongruent stimuli are called “conflict stimuli,” but this designation is not strictly consistent with the monitoring model.

In the face of the problems noted, one cannot deny the success of conflict monitoring theory in modeling various sets of Stroop data. Nevertheless, the fad of portraying this effort as unmitigated success must be resisted. First, much of the data modeled rest on nonstandard designs (presenting congruent and incongruent stimuli in separate blocks; e.g., Pardo et al., 1990); very long individual trials (e.g., Carter et al., 1995); tasks of divided rather than of strictly selective attention (e.g., Hutchison et al., 2016); embedded sets of stimuli within global sets (e.g., Bugg, 2014); extra-Stroop manipulations (e.g., Hutchison et al., 2013); or ignoring the Stroop effect altogether (e.g., Kleiman et al., 2016). Therefore, it is moot whether or to what extent this modeling informs bona fide Stroop phenomena and processes. Second, of the wealth of Stroop phenomena, only two are really modeled: PC and Gratton effects. Third, pursuant to the previous point, we do not see a clear path of conflict-based modeling to a variety of other Stroop results. For example, slight modifications of stimulus salience eliminate or reverse the Stroop effect, and these radical changes come in the face of an invariant number of conflict stimuli. Fourth and most important, all of the Stroop results modeled by conflict monitoring are accounted for more fully and more parsimoniously by input-driven attentional and perceptual processes. The latter explain many more phenomena that are ignored by models of conflict monitoring.

To sum up, virtually all facets of Stroop processes are explained by stimulus-driven bottom-up processes. Top-down influence, as captured by conflict monitoring and control, can arguably accommodate several of the basic Stroop findings mentioned, but assuming such an influence is redundant and unparsimonious. It is also plagued with problems. Whatever its other virtues, conflict monitoring and control is simply the wrong theory for the Stroop effect.

This concludes our critique of conflict monitoring as a suitable candidate theory for the Stroop effect. We end the current analysis with several general observations on problems with conflict monitoring and control. Readers interested in the Stroop effect per se can conveniently skip this section.

Epilogue: Corrupting the notion of conflict in conflict monitoring theory

As we recounted, the fundamental concept of conflict-monitoring theory, conflict, is in great measure divorced from its meaning in psychology and in logic.

The tenuous quality of “conflict” is evident already in the pioneering study by Botvinick et al. (2001). Apart from a wealth of examples (in lieu of formal delineation), the closest the authors come by way of a theoretical definition is this: “conflict may be operationally defined as the simultaneous activation of incompatible representations . . . e.g., representations of alternative responses” (Botvinick et al., 2001, p. 630; emphases added). Notice the identity: “alternative” = “incompatible,” ignoring the psychological truism that not all alternative responses are also incompatible. The responses “red” and “triangle” to the stimulus of a red triangle are not conflicting in any common, psychological, or logical sense. The tenet of conflict monitoring theory that all multidimensional stimuli are conflict stimuli by virtue of their makeup is not really tenable.

The divorce of “conflict” in conflict monitoring from its meaning in psychology is even more complete in subsequent extensions that also include errors (e.g., Yeung et al., 2004; Yeung et al., 2011; Yeung & Nieuwenhuis, 2009). In the more recent version (Yeung et al., 2011), “conflict” is any sensorimotor or cognitive activity affecting RT; hence, “conflict” can comprise noise, fluctuations of concentration, response bias, and, absurdly, even one-dimensional variable signals with a single response option. As Grinband et al. (2011b) note, such a “diffuse definition” (p. 322) “trivializes the idea of conflict. Conflict is no longer defined as competition between response options, but rather arises from a less-well specified set of processes” (p. 321).

The gratuitousness of conflict monitoring and control is apparent in control studies themselves. Thus, Bugg et al. (2015) distinguish between experience-based and expectation-based explanations of Stroop performance. The first class refers to stimulus factors that are reviewed more fully here; the second class is readily explained by perception of contingency and correlation. The authors’ appeal to central control is simply unwarranted. In his influential dual mechanisms of control model, Braver distinguishes between “proactive control” and “reactive control” (Braver, 2012). The former acts strategically in a sustained manner through top-down regulation in order to maintain goal-relevant information and to bias attention for obtaining optimal performance. This species of control is activated by the likes of task instructions and PC. For the latter, Braver states that it is strongly “stimulus driven and transient . . . stimulus dependent . . . reliant on strong bottom-up . . . cues” (Braver, 2012, p. 108). If so, whence top-down control? And, what Braver calls proactive control is more parsimoniously explained by perception of correlation and contingency.

The last point is notable. The problem with the view of Braver and of further advocates of control is that they do not envisage any process exempt from control. A useful scientific construct should also delineate what is not included in its purview. The needed demarcation is missing from accounts of control as they do not state what is excluded. If everything is control, then nothing is (i.e., the concept becomes an empty and useless one).