Defining features versus incidental correlates of Type 1 and Type 2 processing
Cite this article as: Stanovich, K.E. & Toplak, M.E. Mind Soc (2012) 11: 3. doi:10.1007/s11299-011-0093-6
Many critics of dual-process models have mistaken long lists of descriptive terms in the literature for a full-blown theory of necessarily co-occurring properties. These critiques have distracted attention from the cumulative progress being made in identifying the much smaller set of properties that truly do define Type 1 and Type 2 processing. Our view of the literature is that autonomous processing is the defining feature of Type 1 processing. Even more convincing is the converging evidence that the key feature of Type 2 processing is the ability to sustain the decoupling of secondary representations. The latter is a foundational cognitive requirement for hypothetical thinking.
Keywords: Dual process theory · Cognitive decoupling · Autonomy
Dual-process theories of cognition have received a large share of attention in the last decade (Evans 2008, 2010; Evans and Frankish 2009; Kahneman and Frederick 2002; Lieberman 2003, 2007; Stanovich 2004, 2011). Our purpose here is not to attempt an overall assessment of the state of play in this literature. Instead, our aim is much more modest. In this brief note, we simply wish to clear up one confusion concerning dual-process models that continues to plague the literature and retard theoretical progress.
Table 1  Commonly listed properties of Type 1 and Type 2 processing

Type 1 processes                                             Type 2 processes
Relatively undemanding of cognitive capacity
Acquisition by biology, exposure, and personal experience    Acquisition by culture and formal tuition
Often unconscious or preconscious
Lower correlations with intelligence                         Higher correlations with intelligence
Short-leashed genetic goals                                  Long-leashed goals that tend toward personal utility maximization
Before moving on, a brief note about terminology is in order. So as not to show a preference for one particular theory, Stanovich (1999) used the generic terms System 1 and System 2 to label the two sets of properties. Although these terms have become popular, the System 1/System 2 terminology is problematic. It seems to connote that the two processes in dual-process theory map onto two distinct brain systems, a stronger assumption than most theorists wish to make. Additionally, both Evans (2008, 2009) and Stanovich (2004, 2011) have discussed how terms such as System 1 or heuristic system are really misnomers, because they imply that what is being referred to is a single system when in fact the referent is a set of systems in the brain. Therefore, in Table 1 the Type 1/Type 2 terminology of Evans (2008, 2009; see also Samuels 2009) is employed. This terminology captures, better than its predecessors, the fact that a dual-process theory is not necessarily a two-system theory (see Evans 2008, 2009, for an extensive discussion).
The purpose of this original property table in Stanovich (1999) was simply to bring together the many properties that had been assigned to the two processes in the proliferation of dual-process theories of the 1990s. The list was not intended as a strict theoretical statement of necessary and defining features. As Stanovich (1999) notes in his discussion, he was searching for “family resemblances” among the various theories, perhaps similar to other theorists who published analogous lists. The list was descriptive of terms in the literature, not a full-blown theory of necessarily co-occurring properties. No one at the time could have produced a list of necessarily co-occurring properties, because the unsystematic and non-cross-referenced work of the 1990s provided no basis for knowing such a thing. In the past decade, however, several investigators have attempted to zero in on the crucial defining features of the two types of processing—and, by inference, to make a statement about which properties are incidental correlates. Our own theoretical attempt will be sketched below, but first we will indicate how Table 1 has been misused in the literature to discredit dual-process theory.
The main misuse of such tables is to treat them as strong statements about necessarily co-occurring features, in short, to aid in the creation of a straw man. The longer the list of properties in any one table, the easier it is to construct the straw-man claim that if all of these features do not always co-occur, then the dual-process view is incorrect. Kruglanski and Gigerenzer (2011) most recently created such a straw man with their claim that dual-process views fail because “these dimensions are unaligned rather than aligned” (p. 98). They explicitly construct it by taking six dichotomies and assuming that all are defining and thus that all must co-occur: “assuming six dichotomies, one would end up with a 2^6 = 64 cell matrix of which only two cells (those representing the conjunction of all six dichotomies) had entries. Again, such logical implication of the alignment assumption has never been considered seriously or tested empirically” (p. 98).
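The combinatorics behind this criticism are easy to make explicit. As a minimal sketch (the six dichotomies here are anonymous placeholders, not the specific properties of any particular dual-process theory), enumerating every assignment of six binary properties yields the 64-cell matrix, of which the caricatured alignment assumption would fill only the two fully aligned cells:

```python
from itertools import product

# Six hypothetical binary dichotomies. These are placeholders for
# illustration only, not the actual properties of any specific theory.
N_DICHOTOMIES = 6

# Every possible combination of poles: the full 2**6 = 64-cell matrix.
cells = list(product((1, 2), repeat=N_DICHOTOMIES))

# The strict "alignment assumption" caricatured by the critics would
# permit only the two cells in which every dichotomy takes the same pole.
aligned = [c for c in cells if len(set(c)) == 1]

print(len(cells))    # 64
print(len(aligned))  # 2
```

With eight dichotomies the same enumeration gives the 256-cell version of the argument mentioned later in the text; the point of the passage is that no dual-process theorist ever asserted that only the two aligned cells are occupied.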
But the so-called “alignment assumption” here is not attributed to a specific dual-process theorist in their article. This is not surprising, because dual-process theory does not stand or fall on the full set of properties in a table like Table 1 necessarily co-occurring. Tables of properties such as appeared in publications a decade ago (see Kahneman and Frederick 2002; Stanovich 1999) were meant to organize a nascent theoretical literature, not to lay out an absurdly specific prediction about the co-occurrence of features generated from over two dozen different dual-process conceptions. We doubt that the “only two cells out of 64” prediction is fulfilled by any theory in psychological science, no matter how rigorous.
It has long been recognized that Type 1 processing might involve subprocess properties that are, empirically, somewhat separable. A bit of history will show this. Our research group first employed dual-process theories not to understand reasoning and decision making, but to conceptualize individual differences in word recognition skill. In 1978, we first used the Posner and Snyder (1975) dual-process model to understand developmental trends in children’s reading performance (West and Stanovich 1978). The concept of automaticity (a Type 1 processing term) was quite popular in reading theory at the time. As the years went by, it became clear that the many properties ascribed to automatic processes (modularity, speed, autonomy, resource-free processing, nonconscious processing, etc.) might not all co-occur and that reading theory needed to focus on the one property with a unique causal role in reading skill development (many of the other correlated properties being merely incidental correlates). By 1990, one of us was writing that “LaBerge and Samuels had implicitly equated the obligatory nature of an automatic process…with capacity-free processing. In addition, the use of processing resources was conflated with the idea of conscious attention, and conversely, lack of conscious attention was viewed as synonymous with resource-free processing. Only later was the necessity of theoretically separating the issues of obligatory execution, resource use, and conscious attention fully recognized….The tendency to intertwine resource use with conscious attention in reading theory was reinforced by the popularity of Posner and Snyder’s (1975) two-process model of cognitive expectancies” (Stanovich 1990, pp. 74–75).
In the same paper, Stanovich (1990) stressed Zbrodoff and Logan’s (1986) point that “there are no strong theoretical reasons to believe in the unity of automaticity. The idea that the various properties should co-occur has not been deduced from established theoretical principles” (p. 118). Stanovich’s (1990) review of the evidence indicated that “developmental work has confirmed the finding that speed, obligatory processing, and capacity usage are at least partially dissociable. For example, it is clear that children’s word recognition speed continues to decrease even after Stroop indices of obligatory processing are at asymptote” (p. 80). In short, over two decades ago, one of us argued that the concept of automaticity (the term for Type 1 processing in reading theory) did not entail all of the correlated lists of properties that had appeared in the literature. The essential feature of automaticity in the sense of LaBerge and Samuels’s (1974) classic paper was the ability to process information while attention was directed elsewhere. The other properties (consciousness, resource use, etc.) listed by a host of theorists were incidental correlates. This nuance in dual-process theory has been recognized since the late 1980s.
Nonetheless, the tendency to criticize dual-process theories because of the less-than-perfect co-occurrence of the many properties thrown into the theoretical stew by over two dozen theorists prior to 2000 persists. For example, Osman (2004) argues that the constructs of implicit processing, automaticity, and consciousness do not cohere in the manner that she infers they should from tables such as that in Stanovich (1999; see Table 1). Keren and Schul (2009) cite Newstead (2000) as questioning the “tacit assumption made by two-system researchers, that the similarities in the distinctions made by different models are so striking and transparent that they need no further evidence [italics added]” (p. 535). As a straw-man statement, that one is hard to beat! Keren and Schul (2009) continue their critique in a similar vein, saying that they “wonder whether the dichotomous characteristics used to define the two-system models are uniquely and perfectly correlated” (p. 537). And they proceed to ride the old “2 cells out of 64” warhorse again.¹ For example, we are reminded in great detail that “it is necessary to demonstrate not only that the two feature sets exist (i.e., that a1, b1, and c1 tend to appear together and so do a2, b2, and c2), but also to establish that all hybrid combinations (e.g., a1, b1, and c2) do not! With three dichotomous features, five (out of six) combinations must be ruled out. As the number of features increases (most systems under consideration here are characterized by six or more dichotomies), so does the number of comparisons” (p. 539).
All of this is to establish their point² that “the use of dichotomies to characterize the systems seems an important feature of the models, as it allows the researchers to propose that the systems are qualitatively different” (p. 538). But all of these dichotomies were never necessary to establish the two types of processing (which itself suggests that this was not the purpose of such lists); the only thing needed is one fairly dichotomous property that is necessary and sufficient. As argued previously, the whole pedantic “2 out of 64” exercise collapses if the dichotomous characteristics (see Evans 2008, for a particularly complete list) were, each of them, never viewed as essential characteristics in the first place—that is, if it was never assumed that each of the properties listed was necessary in order to define qualitatively different types of processing.
2 Our view of the defining features of Type 1 and Type 2 processing
For years, many investigators have been working—both theoretically and empirically—on the issue of which of the many properties associated with Type 1 versus Type 2 processing is really the essential feature. Evans’s (2007, 2008, 2009, 2010) attempts at more specificity on this issue are notable. We will close our comment by sketching our own view.
In our model (Stanovich 2004, 2009, 2011), the defining feature of Type 1 processing is its autonomy—the execution of Type 1 processes is mandatory when their triggering stimuli are encountered, and they are not dependent on input from high-level control systems. Autonomous processes have other correlated features—their execution tends to be rapid, they do not put a heavy load on central processing capacity, they tend to be associative—but these other correlated features are not defining. Into the category of autonomous processes would go some processes of emotional regulation; the encapsulated modules for solving specific adaptive problems that have been posited by evolutionary psychologists; processes of implicit learning; and the automatic firing of overlearned associations (see Barrett and Kurzban 2006; Carruthers 2006; Evans 2008, 2009; Samuels 2005, 2009; Shiffrin and Schneider 1977; Sperber 1994).
These disparate categories make clear that Type 1 processing is a grab-bag, encompassing both innately specified processing modules/procedures and experiential associations that have been learned to automaticity. Their only uniform commonality is their autonomy. The point that Type 1 processing does not arise from a singular system is stressed by both Evans (2008, 2009) and Stanovich (2004, 2011). The many kinds of Type 1 processing have in common the property of autonomy, but otherwise their neurophysiology and etiology might be considerably different. For example, Type 1 processing is not limited to modular subprocesses that meet all of Fodor’s (1983) classic criteria, or the criteria for a Darwinian module (Cosmides 1989; Sperber 1994). Type 1 processing encompasses processes of unconscious implicit learning and conditioning. Also, many rules, stimulus discriminations, and decision-making principles that have been practiced to automaticity (e.g., Kahneman and Klein 2009; Shiffrin and Schneider 1977) are processed in a Type 1 manner.
Unlike Type 1 processing, Type 2 processing is nonautonomous and, as will be discussed below, it has a defining feature: cognitive decoupling. Type 2 processing is relatively slow and computationally expensive, but these are not defining features; they are correlates. Many Type 1 processes can operate at once in parallel, whereas Type 2 processing is largely serial, but this distinction, too, is not the defining feature of Type 2 processing.
All of the different kinds of Type 1 processing (processes of emotional regulation, Darwinian modules, associative and implicit learning processes) can produce responses that are nonoptimal in a particular context if not overridden. For example, often humans act as cognitive misers (an old theme in cognitive/social psychology, see Dawes 1976; Taylor 1981; Tversky and Kahneman 1974) by engaging in attribute substitution—the substitution of an easy-to-evaluate characteristic for a harder one, even if the easier one is less accurate (Kahneman and Frederick 2002). For example, the cognitive miser will substitute the less effortful attributes of vividness or affect for the more effortful retrieval of relevant facts (Kahneman 2003; Slovic and Peters 2006; Wang 2009). But when we are evaluating important risks—such as the risk of certain activities and environments for our children—we do not want to substitute vividness for careful thought about the situation. In such situations, we want to employ Type 2 override processing to block the attribute substitution of the cognitive miser.
In order to override Type 1 processing, Type 2 processing must display at least two related capabilities. One is the capability of interrupting Type 1 processing and suppressing its response tendencies. Type 2 processing thus involves inhibitory mechanisms of the type that have been the focus of work on executive functioning (Aron 2008; Best et al. 2009; Hasher et al. 2007; Zelazo 2004). But the ability to suppress Type 1 processing gets the job only half done. Suppressing one response is not helpful unless there is a better response available to substitute for it. Where do these better responses come from? One answer is that they come from processes of hypothetical reasoning and cognitive simulation (Evans 2007, 2010; Evans et al. 2003). When we reason hypothetically, we create temporary models of the world and test out actions (or alternative causes) in that simulated world. In order to reason hypothetically we must, however, have one critical cognitive capability—we must be able to prevent our representations of the real world from becoming confused with representations of imaginary situations. The so-called cognitive decoupling operations are the central feature of Type 2 processing that make this possible (Stanovich 2004, 2009, 2011).
The important issue for our purposes is that decoupling secondary representations from the world and then maintaining the decoupling while simulation is carried out is the defining feature of Type 2 processing. For Leslie (1987), the decoupled secondary representation is necessary in order to avoid representational abuse—the possibility of confusing our simulations with our primary representations of the world as it actually is. To engage in exercises of hypotheticality and high-level cognitive control, one has to explicitly represent a psychological attitude toward the state of affairs as well as the state of affairs itself (Dienes and Perner 1999; Evans and Over 1999). Thus, decoupled representations of actions about to be taken become representations of potential actions, but the latter must not infect the former while the mental simulation is being carried out. Nonetheless, dealing with secondary representations—keeping them decoupled—is costly in terms of cognitive capacity. Evolution has guaranteed the high cost of decoupling for a very good reason. As we were becoming the first creatures to rely strongly on cognitive simulation, it was especially important that we not become “unhooked” from the world too much of the time. Thus, dealing with primary representations of the world always has a special salience that may feel aversive to overcome.
Nevertheless, decoupling operations must be continually in force during any ongoing simulations, and Stanovich (2004, 2009, 2011) has conjectured that the raw ability to sustain such mental simulations while keeping the relevant representations decoupled is likely the key aspect of the brain’s computational power that is being assessed by measures of fluid intelligence (on fluid intelligence, see Horn and Cattell 1967; Kane and Engle 2002). Decoupling—outside of certain domains such as behavioral prediction (so-called “theory of mind”)—is a cognitively demanding operation. Language appears to be one mental tool that can aid this computationally expensive process. For example, hypothetical thought involves representing assumptions, and linguistic forms such as conditionals provide a medium for such representations (Carruthers 2006; Evans 2007; Evans and Over 2004).
Decoupling skills vary in their recursiveness and complexity. The skills discussed thus far are those that are necessary for creating what Perner (1991) calls secondary representations—the decoupled representations that are the multiple models of the world that enable hypothetical thought. At a certain level of development, decoupling becomes used for so-called meta-representation—thinking about thinking itself (see Dennett 1984; Nichols and Stich 2003; Sperber 2000; Sterelny 2003). Decoupling processes enable one to distance oneself from representations of the world so that they can be reflected upon and potentially improved.
Increasingly it is becoming apparent that one of the critical mental operations tapped by measures of fluid intelligence is the cognitive decoupling operation discussed here. For example, it has been proposed that the reaction time tasks that have been found to correlate with fluid intelligence (see Deary 2000) do so not because they tap some factor of “neural speed” but because they make attentional demands similar to those of some working memory tasks (see Conway et al. 2002). Such an interpretation is suggested by the evidence in favor of the so-called worst performance rule—that the slowest trials on a reaction time task correlate more strongly with intelligence than does the mean reaction time or the fastest reaction time trials (Coyle 2003). In our view, the tail of the distribution is disproportionately made up of trials on which goal representations have momentarily deactivated because of partially failed decoupling, and such failures are more likely in those of lower fluid intelligence.
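The worst performance rule names an analysis as much as a finding, and that analysis can be sketched in a few lines. The simulation below is purely illustrative: all of the data are synthetic, and the link between ability and slow-tail lapses is built in by construction, solely to show how the band-wise correlations are computed (rank each subject’s reaction times, then correlate the mean of the slowest band and of the fastest band with the ability score):

```python
import math
import random
import statistics

random.seed(1)  # reproducible synthetic data

def pearson(xs, ys):
    """Plain Pearson correlation, with no external dependencies."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

N_SUBJECTS, N_TRIALS = 200, 100
ability, fastest_band, slowest_band = [], [], []

for _ in range(N_SUBJECTS):
    g = random.gauss(0.0, 1.0)  # synthetic ability score
    rts = []
    for _ in range(N_TRIALS):
        rt = random.gauss(500.0, 30.0)  # baseline trial, in ms
        if random.random() < 0.10:      # occasional lapse trial:
            # lapses are longer, and (by construction in this toy
            # model) longer still for lower-ability subjects
            rt += random.expovariate(1.0 / (80.0 * math.exp(-0.5 * g)))
        rts.append(rt)
    rts.sort()
    ability.append(g)
    fastest_band.append(statistics.mean(rts[:10]))   # fastest decile
    slowest_band.append(statistics.mean(rts[-10:]))  # slowest decile

# Worst-performance-rule pattern: the slowest band tracks ability
# far more strongly than the fastest band does.
r_fast = pearson(fastest_band, ability)
r_slow = pearson(slowest_band, ability)
print(round(r_fast, 2), round(r_slow, 2))
```

Because only the lapse trials carry the ability signal here, the fastest-decile correlation hovers near zero while the slowest-decile correlation is strongly negative, which is the qualitative pattern the worst performance rule describes.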
Lepine et al. (2005) report an experiment showing that working memory tasks with simple processing components are actually better predictors of high-level cognitive performance than are working memory tasks with complex processing requirements—as long as the former are rapidly paced to lock up attention. Likewise, Salthouse and Pink (2008) found that the very simplest versions of working memory tasks did not show attenuated correlations with fluid intelligence and that the correlation with fluid intelligence did not increase over trials. They argued that their results suggest that “the relation between working memory and fluid intelligence is not dependent on the amount of information that must be maintained, or on processes that occur over the course of performing the tasks” (p. 364).
Research outcomes such as these are consistent with Engle’s (2002) review of evidence indicating that working memory tasks really tap the preservation of internal representations in the presence of distraction or, as we have termed it, the ability to decouple a secondary representation (or metarepresentation) from a primary representation and manipulate the former. For example, he describes an experiment using the so-called antisaccade task. Subjects must look at the middle of a computer screen and respond to a target stimulus that will appear on the left or right of the screen. Before the target appears, a cue is flashed on the opposite side of the screen. Subjects must resist the attention-capturing cue and respond to the target on the opposite side when it appears. Subjects scoring low on working memory tasks were more likely to make an eye movement (saccade) in the direction of the distracting cue than were subjects who scored high on working memory tasks.
That the antisaccade task has very little to do with memory is an indication of why investigators have reconceptualized the individual difference variables that working memory tasks are tapping (Jaeggi et al. 2008; Marcovitch et al. 2010). Individual differences on such tasks are now described with a variety of different terms (attentional control, resistance to distraction, executive control), but the critical operation needed to succeed in them—and the reason they are the prime indicator of fluid intelligence—is that they reflect the ability to sustain decoupled representations. Hasher et al. (2007) summarize this view with their conclusion “our evidence raises the possibility that what most working memory span tasks measure is inhibitory control, not something like the size of operating capacity” (p. 231).
In summary, our view is that the defining feature of Type 1 processing and of Type 2 processing is not the conjunction of eight different binary properties (“2 cells out of 256!”). Research has advanced since the “suggestive list of characteristics” phase of over a decade ago. Our view of the literature is that autonomous processing is the defining feature of Type 1 processing. Even more convincing is the converging evidence that the key feature of Type 2 processing is the ability to sustain the decoupling of secondary representations. The latter is a foundational cognitive requirement for hypothetical thinking.
Keren and Schul (2009) use the fractionation of the automaticity concept that we discussed previously as a negative example: “Bargh (1994) noted, almost 15 years ago, that automaticity may not be a single concept in the sense that manifestations of automaticity (such as non-awareness, non-intentionality, efficiency, and non-controllability) are not aligned” (p. 539). But as we noted above, years before Bargh (1994), Stanovich (1990) argued the same thing. Contrary to Keren and Schul’s implication, however, the fractionation of the concept into its essential features versus its incidental correlates helped rather than hurt the reading field, and that field still profits from the distinction between Type 1 and Type 2 processing.
Keren and Schul (2009) have fun with other caricatures: “We are [concerned]…with the very strong claim that a limited set of binary attributes can be combined in only one, single, unique way, permitting the existence of exactly two systems (and no more!)” (p. 538). The “no more” is funny, but nonetheless a straw man since that has never been an assumption of dual-process theory, as Gilbert (1999) noted over a decade ago: “Few of the psychologists whose chapters appear in this volume would claim that the dual processes in their models necessarily correspond to the activity of two distinct brain structures…. Psychologists who champion dual-process models are not usually stuck on two. Few would come undone if their models were recast in terms of three processes, or four, or even five. Indeed, the only number they would not happily accept is one, because claims about dual processes in psychology are not so much claims about how many processes there are, but claims about how many processes there aren’t. And the claim is this: There aren’t one” (pp. 3–4).