The Dunning–Kruger effect refers to the observation that the incompetent are often ill-suited to recognize their incompetence. Here we investigated potential Dunning–Kruger effects in high-level reasoning and, in particular, focused on the relative effectiveness of metacognitive monitoring among particularly biased reasoners. Participants who made the greatest numbers of errors on the cognitive reflection test (CRT) overestimated their performance on this test by a factor of more than 3. Overestimation decreased as CRT performance increased, and those who scored particularly high underestimated their performance. Evidence for this type of systematic miscalibration was also found on a self-report measure of analytic-thinking disposition. Namely, genuinely nonanalytic participants (on the basis of CRT performance) overreported their “need for cognition” (NC), indicating that they were dispositionally analytic when their objective performance indicated otherwise. Furthermore, estimated CRT performance was just as strong a predictor of NC as was actual CRT performance. Our results provide evidence for Dunning–Kruger effects both in estimated performance on the CRT and in self-reported analytic-thinking disposition. These findings indicate that part of the reason why people are biased is that they are either unaware of or indifferent to their own bias.
If competency is required to recognize incompetence, truly incompetent people will be both incompetent and unaware of their incompetence (Dunning, Johnson, Ehrlinger, & Kruger, 2003; Kruger & Dunning, 1999). The failure to recognize incompetence among the incompetent—often referred to as the Dunning–Kruger effect—has far-reaching implications because, presumably, one of the prerequisites of voluntary self-improvement is actually recognizing the need for improvement. Indeed, those who would benefit the most from understanding the limits of their reasoning are the ones who are least likely to do so. From a theoretical standpoint, estimating the degree of miscalibration among the incompetent is also paramount for understanding the causes and consequences of biases in judgment and decision making writ large. In the present work, we examine incompetence in the realm of biased or intuitive responding in reasoning tasks and provide evidence that Dunning–Kruger effects extend to one’s self-reported analytic-thinking disposition.
Dual-process theory and analytic-thinking disposition
The distinction between autonomous/intuitive (Type 1) processes and deliberative/analytic (Type 2) processes, as formalized in dual-process theories (De Neys, 2012; Evans & Stanovich, 2013; Handley & Trippas, 2015; Kahneman, 2011; Pennycook, Fugelsang, & Koehler, 2015b; Sloman, 2014; Thompson, Prowse Turner, & Pennycook, 2011), is now exceedingly popular in many areas of psychology. One of the key implications of this distinction is that analytic thinking proceeds via at least some form of volitional control, which in turn indicates that people differ in terms of their dispositions toward analytic thinking. Put simply, some people appear to be more willing than others to engage in deliberative thought (Stanovich, 2012; Stanovich & West, 1998, 2000). Recent research has shown that individual differences in analytic-thinking disposition (also referred to as “analytic cognitive style”) are consequential for a wide range of psychological domains over and above individual differences in cognitive ability or intelligence (Pennycook, Fugelsang, & Koehler, 2015a).
Analytic-thinking disposition is also assessed using self-report scales, such as the Need for Cognition (NC) scale (Cacioppo & Petty, 1982; Cacioppo, Petty, Feinstein, & Jarvis, 1996; Epstein, Pacini, Denes-Raj, & Heier, 1996; Fleischhauer et al., 2010). According to Petty, Briñol, Loersch, and McCaslin (2009), need for cognition “refers to the tendency for people to vary in the extent to which they engage in and enjoy effortful cognitive activities” (p. 318). The scale includes items such as “I am not very good at solving problems that require careful logical analysis” and “I enjoy solving problems that require hard thinking.” Thus, “effortful cognitive activities” in Petty et al.’s definition refers specifically to thinking and problem solving.
Pacini and Epstein (1999) further specified the NC scale by creating subscales that distinguish between “ability” and “engagement.” This distinction can be seen in the two examples offered above: The NC scale includes items that index both the self-reported ability to engage in effortful thought and the enjoyment derived therefrom. Contrast, for example, the item “I am much better at figuring out things logically than other people” (Ability subscale) with the item “I try to avoid situations that require thinking in depth about something” (Engagement subscale). Thus, Pacini and Epstein’s NC scale distinguishes two important components of general analytic-thinking dispositions in a way that parallels performance-based measures.
NC scores have been used frequently in studies on information evaluation and recall, attitude formation, and judgment and decision making (see Petty et al., 2009, for a review), along with a variety of other domains not traditionally associated with information processing. For example, high NC is associated with decreased religious and paranormal belief (Pennycook, Cheyne, Barr, Koehler, & Fugelsang, 2014; Svedholm & Lindeman, 2013), increased life satisfaction (Gauthier, Christopher, Walter, Mourad, & Marek, 2006), decreased support for punitive responses to crime (Sargent, 2004), and utilitarian moral judgment (Conway & Gawronski, 2013). Indeed, Petty et al. (2009) noted that over 1,000 publications have cited either the article that introduced the NC scale (Cacioppo & Petty, 1982) or the article that introduced a shortened version of the scale (Cacioppo, Petty, & Kao, 1984).
Methodological implications of the Dunning–Kruger effect
Although the NC scale has been used in hundreds of studies (Cacioppo et al., 1996; Petty et al., 2009), research on self-report thinking disposition faces a dilemma: People who are genuinely unwilling to engage in analytic thinking may not be well suited to estimate their degree of NC. Indeed, there is evidence that high NC is associated with increased levels of thinking about one’s own thinking (Petty et al., 2009). Moreover, Kruger and Dunning (1999) found that people who were in the bottom quartile in terms of logical reasoning estimated that they were just above average in terms of accuracy whereas, if anything, those in the top quartile somewhat underestimated their performance. Mata, Ferreira, and Sherman (2013) demonstrated that relatively analytic people have a metacognitive advantage over those who rely primarily on their intuition because they are aware of both the intuitive answer and the deliberative (typically correct) response. Similar to Kruger and Dunning, Mata, Ferreira, and Sherman found that intuitive reasoners strongly overestimate their performance relative to deliberative reasoners. Given that people’s “chronic self-views” (i.e., opinion about one’s abilities independent of actual performance) have been shown to influence their performance (Atir, Rosenzweig, & Dunning, 2015; Critcher & Dunning, 2009; Ehrlinger & Dunning, 2003), previous research has suggested that people overestimate their performance on high-level reasoning tasks precisely because they view themselves as more reasonable than is justified by their objective performance.
On the basis of this research, we hypothesize that self-report thinking disposition scales are miscalibrated in a systematic way, such that those who are genuinely not analytic should overstate their relative analyticity, whereas people who are genuinely analytic should fairly accurately report their NC or perhaps even underreport it relative to others in the sample. In other words, the association between NC and objective performance should be similar to the association between estimated and objective performance: Overestimation should be largest among the most biased and smallest (or reversed) among the least biased.
Theoretical implications of the Dunning–Kruger effect
The Dunning–Kruger effect has important—although heretofore unspecified—theoretical implications for recent work in the field of heuristics and biases in reasoning (but see Mata, Ferreira, & Sherman, 2013, for related empirical work). Namely, there is growing evidence that people can recognize (if implicitly) the conflict inherent in many reasoning problems (see De Neys, 2012, 2014, for reviews). For example, consider the following item from the cognitive reflection test (CRT; Frederick, 2005):
A bat and ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
Typically, around 65% of participants respond “10 cents” to this problem (e.g., Pennycook, Cheyne, Koehler, & Fugelsang, 2016), even though that is incorrect (if the ball cost 10 cents, the bat would have to cost $1.10, and together they would cost $1.20). This is thought to occur because people are cognitive misers (i.e., they conserve mental resources when possible; see Toplak, West, & Stanovich, 2011) and unduly rely on the first thing that comes to mind.
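The arithmetic behind the item can be checked directly. The following is a minimal sketch, not part of the original materials; it works in whole cents to avoid floating-point rounding, and the function name `ball_price_cents` is purely illustrative.

```python
def ball_price_cents(total_cents: int, difference_cents: int) -> int:
    """Solve ball + (ball + difference) = total for the ball's price."""
    # Rearranged: 2 * ball = total - difference
    doubled = total_cents - difference_cents
    if doubled % 2 != 0:
        raise ValueError("no whole-cent solution")
    return doubled // 2


ball = ball_price_cents(110, 100)
bat = ball + 100
print(ball, bat, ball + bat)  # 5 105 110

# The intuitive answer fails the same check:
intuitive_ball = 10
intuitive_total = intuitive_ball + (intuitive_ball + 100)
print(intuitive_total)  # 120
```

The correct response is therefore 5 cents; the intuitive response of 10 cents overshoots the stated total by 10 cents.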
If the Dunning–Kruger effect applies to this type of problem, those who give incorrect intuitive responses would be unlikely to recognize their bias (Mata, Ferreira, & Sherman, 2013). Surprisingly, however, there is some evidence that even people who get this problem incorrect have some sense that there is something “off” about the problem. Namely, participants who give the incorrect intuitive response to conflict problems are less confident than when they answer a nonconflict version of the task (i.e., one that does not cue an incorrect intuitive response; De Neys, Rossi, & Houdé, 2013)—a pattern of results that remains even after participants are put under cognitive load (Johnson, Tubau, & De Neys, 2016). Such conflict detection has been evidenced using a variety of different measures (e.g., response time: De Neys & Glumicic, 2008; memory recall: Franssens & De Neys, 2009; skin conductance response: De Neys, Moyens, & Vansteenwegen, 2010; fMRI: De Neys, Vartanian, & Goel, 2008; and ERPs: Banks & Hope, 2014) and a wide range of tasks (e.g., syllogisms: De Neys & Franssens, 2009; the conjunction fallacy: De Neys, Cromheeke, & Osman, 2011; and the ratio bias: Bonner & Newell, 2010).
Conflict detection during reasoning has been referred to as “omnipresent, regardless of whether participants answer problems correctly or incorrectly” (De Neys et al., 2008, p. 488). Indeed, there is evidence for conflict detection even among particularly biased participants, using subtle low-level measures such as skin conductance (De Neys et al., 2010). Interestingly, however, this line of research does not appear consistent with the strong evidence for the Dunning–Kruger effect: If people are good at detecting conflict during reasoning, why are the incompetent so unaware of their incompetence?
One possibility is that the low-level detection of conflicting outputs, however efficient, may not translate into changes in behavior and, ultimately, reductions in biased responding. Indeed, recent research has indicated that failures of conflict detection (which may be due to either a lack of conflict detection signal or a lack of responsiveness to a present conflict signal) are more common than had previously been thought (Pennycook, Fugelsang, & Koehler, 2012, 2015b). Moreover, there is evidence that less-analytic individuals are less likely to respond to conflict during reasoning (Mevel et al., 2015; Pennycook et al., 2014; Pennycook et al., 2015b; Thompson & Johnson, 2014). Thus, metacognitive monitoring may be more effective among genuinely analytic individuals, regardless of the effectiveness of conflict detection per se. This suggests that nonanalytic participants should overestimate their accuracy on problems that include an intuitive yet incorrect lure, like the bat-and-ball example above (Mata, Ferreira, & Sherman, 2013). More generally, the foregoing discussion indicates that analytic individuals may be better-suited to assess their relative degree of analyticity.
The present work
We report two studies in which participants completed a popular performance-based measure of analytic-thinking disposition, the CRT (Frederick, 2005), and were subsequently asked to estimate how many of the items they had gotten correct (Mata, Ferreira, & Sherman, 2013; Noori, 2016). Following Kruger and Dunning (1999), we hypothesized that participants who performed poorly on the CRT would overestimate their performance to a greater extent than would those who performed well (i.e., less-analytic people should be more poorly calibrated). In addition, participants were also asked to self-report their need or desire to think analytically using the NC scale. We predicted a Dunning–Kruger effect, such that participants who performed particularly poorly on the CRT (indicating an intuitive or non-analytic-thinking disposition) would overreport the degree to which they were disposed to analytic thinking. In Study 2, we used an independent assessment of analytic thinking—the heuristics-and-biases inventory (Toplak, West, & Stanovich, 2011, 2014)—to assess whether nonanalytic individuals are genuinely worse at recognizing their bias. Put differently, participants who were decidedly nonanalytic based on the performance measure should be less-suited to assess their degree of analyticity on a self-report measure, leading to poor calibration in terms of both estimated CRT accuracy and self-reported NC.
Kruger and Dunning’s (1999) primary finding was a larger difference between actual performance and estimated performance (i.e., miscalibration in the form of overestimation) for the incompetent relative to the competent. Thus, participants who perform relatively poorly on the CRT should overestimate their performance and those who do relatively well should be better calibrated (Mata, Ferreira, & Sherman, 2013), which suggests that those who are genuinely nonanalytic may not be aware of their lack of analyticity.
Assuming there was good evidence for overconfidence on the estimated CRT task, we could then assess the association between this miscalibration and self-reported analytic-thinking style (via the NC scale). If the tendency to overestimate CRT performance translates into a tendency to rate oneself relatively high in NC, estimated CRT performance should predict NC over and above actual CRT performance. Similarly, the well-established association between CRT and NC (e.g., Pennycook et al., 2016) should not be evident once CRT scores had been calibrated by subtracting estimated CRT scores. Furthermore, these findings should hold for both the Ability and Engagement subscales of the NC scale.
In all studies, we report how we determined our sample size, all data exclusions, and all measures. The data are available online via the Open Science Framework: https://osf.io/3kndg/.
We recruited our participants through Amazon’s Mechanical Turk and chose a target N of 200 in order to have sufficient power (90%) to detect a moderate effect size (r = .20). We removed 14 participants because they responded affirmatively when asked whether they had responded randomly at any point during the survey. We also removed a participant who had missing values for the CRT questions. Two additional participants were removed because they gave numerical answers to a CRT question that required a nonnumerical response. The resulting sample (N = 183; mean age = 33.7, SD = 9.5) consisted of 99 males and 84 females.
Materials and procedure
Cognitive reflection test
Frederick’s (2005) original CRT consisted of three math problems that reliably cue intuitive but incorrect answers. The test is considered an index of analytic-thinking disposition because the items require the participant to question a compelling intuitive response—a process that requires a willingness to think analytically (Pennycook & Ross, 2016; Travers, Rolison, & Feeney, 2016). Since the original CRT has been used extensively on Mechanical Turk, we used two recently developed tests that were designed to measure the same underlying construct instead of the original measure. Specifically, we used Toplak, West, and Stanovich’s (2014) four-item CRT and Thomson and Oppenheimer’s (2016) four-item CRT. The numbers of correct responses for the two CRTs were significantly correlated, r(183) = .336, p < .001, and combined to make a scale with acceptable reliability, α = .70. The eight CRT items were presented one at a time and in a random order for each participant. After they had completed all CRT items, we asked participants to estimate the number of CRT questions they had answered correctly.
Need for cognition
Participants then completed Pacini and Epstein’s (1999) 20-item NC scale. Pacini and Epstein’s NC scale consists of Ability and Engagement subscales, which were treated as separate scales for the present purposes. Items were presented in a randomized order, and participants responded on a scale from 1 (Definitely not true of myself) to 5 (Definitely true of myself). Both the Ability (NC-A) and Engagement (NC-E) subscales had excellent reliability, αs = .90 and .95 (for NC-A and NC-E, respectively). The full NC scale was also reliable, α = .95.
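For readers unfamiliar with the reliability coefficient reported here, Cronbach's α can be computed from item-level responses as sketched below. This is an illustration under stated assumptions, not the authors' analysis script: the toy response matrices are invented, and the function assumes that any reverse-keyed items have already been reverse-scored.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha; item_scores[p][i] is participant p's score on item i."""
    k = len(item_scores[0])
    # Sample variance of each item across participants
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the participants' total scores
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four participants, three items (not from the study).
consistent = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [4.0, 4.0, 4.0]]
noisy = [[1.0, 2.0, 1.0], [2.0, 1.0, 3.0], [3.0, 4.0, 2.0], [4.0, 3.0, 4.0]]

print(round(cronbach_alpha(consistent), 3))  # 1.0
print(cronbach_alpha(noisy))  # lower than 1 because the items disagree
```

When every item orders participants identically, α reaches 1; disagreement among items lowers it.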
Results and discussion
Dunning–Kruger effects in estimated CRT performance
As predicted, participants overestimated their total accuracy on the eight CRT items. On average, participants estimated that they had correctly solved 5.59 CRT problems (SD = 1.52), but the mean performance was only 3.88 (SD = 2.11), t(182) = 11.14, SE = 0.15, p < .001, d = 0.82. The correlation between estimated and actual CRT performance was modest, r(183) = .379, p < .001, such that actual CRT performance only explained 14.4% of the variance in estimated CRT performance. Following Kruger and Dunning (1999), we split the sample into groups, based on accuracy, to test whether those who did poorly on the CRT were more strongly miscalibrated than those who did better. Although there are only nine possible scores on the eight-item CRT, and in theory we could create nine different groups, only a small number of participants scored either 0 (N = 4, 2.2% of the sample) or 8 (N = 8, 4.4% of the sample). Thus, we increased the N in each CRT group by combining across accuracy scores and creating four groups (0–2, 3–4, 5–6, and 7–8).
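Two of the summary values in this paragraph can be recovered from the reported statistics. The sketch below assumes that the effect size was computed as d = t / √N for the paired comparison; the text does not state the formula, but this convention reproduces the reported value. The function names are illustrative.

```python
import math


def cohens_d_from_t(t: float, n: int) -> float:
    """Effect size for a paired (one-sample) t test, assuming d = t / sqrt(n)."""
    return t / math.sqrt(n)


def variance_explained(r: float) -> float:
    """Proportion of variance shared by two variables (r squared)."""
    return r ** 2


d = cohens_d_from_t(11.14, 183)
r2 = variance_explained(0.379)
print(round(d, 2))         # 0.82
print(round(r2 * 100, 1))  # 14.4
```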
Using a mixed-design analysis of variance (ANOVA), we found an interaction between CRT group and the difference between the actual CRT score and the estimated CRT score (see Fig. 1), F(3, 179) = 56.24, MSE = 1.13, p < .001, η² = .49. As is evident from Fig. 1, overestimation decreased (i.e., calibration increased) systematically as accuracy increased. To investigate the association between calibration and analytic thinking, we computed a difference score between the estimated and actual CRT scores and compared it across the levels of CRT performance (based on our four groups). A post-hoc Tukey honest significant difference (HSD) test comparing the differences between estimated and actual CRT performance (i.e., calibration) indicated that each of the four groups emerged as a separate, homogeneous subset (p < .05). Notably, those who scored very low on the CRT (0–2; M = 1.42, 17.8% accuracy) estimated that they had answered 4.78 questions correctly (59.8% accuracy)—that is, they overestimated by a factor of 3.4. Those who correctly answered three or four of the CRT problems overestimated by a factor of 1.7, whereas those who answered five or six problems correctly only overestimated by a factor of 1.1 (see Fig. 1).
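The overestimation factor is simply the ratio of the mean estimated score to the mean actual score, and the effect size can be recovered from the F ratio and its degrees of freedom. The sketch below treats the reported effect size as partial eta squared, which is an assumption (the text does not specify the variant), although it does reproduce the reported value.

```python
def overestimation_factor(estimated_mean: float, actual_mean: float) -> float:
    """Ratio of mean estimated score to mean actual score."""
    return estimated_mean / actual_mean


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Lowest-scoring CRT group in Study 1: estimated 4.78 vs. actual 1.42
print(round(overestimation_factor(4.78, 1.42), 1))   # 3.4
# Group x (actual vs. estimated) interaction, F(3, 179) = 56.24
print(round(partial_eta_squared(56.24, 3, 179), 2))  # 0.49
```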
In contrast, and similar to Kruger and Dunning’s (1999) results, we found that those who scored 7 or 8 underestimated their performance by a factor of 1.1, t(29) = 3.21, p = .003, d = 0.59. This was not driven by the few participants (N = 8) who scored a perfect 8 out of 8 (and for whom it would be impossible to overestimate performance)—those who scored 7 out of 8 (N = 22) estimated that they had got, on average, 6.36 (SD = 1) out of 8 correct, which was significantly lower than their true score of 7, t(21) = 2.98, p = .007, d = 0.64. Importantly, those who scored 5 or 6 out of 8 significantly overestimated their performance, t(37) = 4.35, p < .001, d = 0.71, indicating that the monotonic improvement in calibration as CRT performance increased cannot be attributable to those who scored too high for overestimation to be possible. Moreover, it should be noted that these results cannot be attributed to mere regression to the mean. Particularly intuitive individuals (those who scored 0–2 out of 8) overestimated by a factor of 3.4, whereas particularly analytic individuals (those who scored 7 or 8 out of 8) underestimated by a factor of 1.1. That is, the reported miscalibration is asymmetric in a way that is predicted by previous research.
Association between overconfidence and need for cognition
If the NC scale is a good measure of actual analytic-thinking disposition, it should correlate more strongly with actual CRT performance than with estimated performance. However, if anything, estimated CRT performance was more strongly correlated with NC, r(183) = .307, p < .001, than was actual CRT performance, r(183) = .246, p = .001. This was more evident for the Ability subscale (NC-A), which correlated with estimated CRT performance, r(183) = .352, p < .001, nearly as strongly as actual CRT performance did, but was more modestly correlated with actual CRT performance, r(183) = .242, p = .001. However, a Williams test indicated no significant difference between the magnitudes of these two correlations, t(180) = 1.42, p = .159. The correlation between the Engagement subscale (NC-E) and estimated CRT performance, r(183) = .229, p = .002, was more similar to the subscale’s correlation with actual CRT performance, r(183) = .216, p = .003. Thus, if anything, miscalibration was strongest when participants self-reported their ability (as opposed to willingness) to engage in analytic thinking.
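The Williams test compares two dependent correlations that share one variable (here, NC-A correlated with estimated and with actual CRT performance). The sketch below uses the standard formula for Williams's t with n − 3 degrees of freedom; it is an illustration rather than the authors' script, and it takes .379 as the correlation between estimated and actual CRT performance.

```python
import math


def williams_t(r12: float, r13: float, r23: float, n: int) -> tuple[float, int]:
    """Williams's t for H0: rho12 == rho13, where variable 1 is shared.

    r23 is the correlation between the two non-shared variables.
    Returns the t statistic and its degrees of freedom (n - 3).
    """
    # Determinant of the 3 x 3 correlation matrix
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    numerator = (n - 1) * (1 + r23)
    denominator = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt(numerator / denominator)
    return t, n - 3


# NC-A with estimated CRT (.352), NC-A with actual CRT (.242),
# estimated with actual CRT (.379), N = 183
t, df = williams_t(0.352, 0.242, 0.379, 183)
print(round(t, 2), df)  # 1.42 180
```

With these inputs the sketch reproduces the reported t(180) = 1.42.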
If the tendency to overestimate CRT performance is linked with the tendency to rate oneself relatively high on the NC scale, the estimated CRT score should predict NC over and above actual CRT performance. We therefore entered both accuracy and estimated accuracy (along with their interaction) as predictors in two regression analyses, with NC-A and NC-E as the dependent variables (see Table 1). Not only did estimated CRT accuracy significantly predict self-reported NC-A and NC-E once actual CRT scores were taken into account, but actual CRT accuracy was not a robust predictor once estimated accuracy was taken into account. We also found an interaction between estimated accuracy and actual accuracy for each of the NC subscales. This interaction emerged because estimated CRT accuracy was more strongly positively correlated with NC for those who did well on the CRT (see Fig. S1 in the supplementary materials). These results indicate a link between miscalibration in analytic thinking and self-reported need for cognition. As a follow-up analysis, we created a calibration score by taking the difference between estimated and actual CRT. This calibration score was not significantly correlated with either NC-A, r(183) = −.012, p = .871, or NC-E, r(183) = .051, p = .491. This, again, indicates that self-report NC is as much a measure of estimated CRT performance as it is a measure of actual CRT performance.
In Study 1, those who were prone to errors on the CRT were more likely to overestimate their performance—a Dunning–Kruger effect. Moreover, consistent with previous research in other domains (Atir et al., 2015; Critcher & Dunning, 2009; Ehrlinger & Dunning, 2003), the estimates of CRT accuracy were as predictive of self-reported analytic-thinking disposition as was actual CRT accuracy. These findings suggest that self-report measures of thinking disposition may lack precision. Overconfidence among those genuinely low in analytic thinking may translate into overestimates of self-report analytic thinking, whereas proper calibration (or, if anything, underconfidence) among those genuinely high in analytic thinking may translate into relative underestimates of self-report analytic thinking. In other words, it may be the case that those who rely on their intuition are not analytic enough to know that they are not analytic, whereas analytic people are analytic enough to know the limits of their analyticity.
Although the results of Study 1 were consistent with the proposed Dunning–Kruger effect in self-reported thinking disposition, the evidence was indirect. Specifically, the same performance-based analytic-thinking disposition measure, the CRT, was used to make inferences about miscalibration for both estimated CRT scores and self-reported NC. To overcome this limitation, we included a separate measure of analytic thinking: the heuristics-and-biases inventory (H&B; Toplak, West, & Stanovich, 2011). Having a second performance-based (“objective”) measure of analytic thinking would allow us to more directly test the hypothesis that NC scores are systematically miscalibrated. That is, the difference between actual and estimated CRT performance (calibration) should be associated with objective H&B performance, but not with self-reported NC. This would illustrate a correspondence between objective measures of analytic thinking even after miscalibration has been taken into account. There should be no similar correspondence with self-report NC because, presumably, this miscalibration also affects people’s perception of their analytic-thinking disposition.
Including a second objective measure of analytic thinking would also allow us to compare relative performance on our two analytic-thinking benchmarks (CRT and H&B) with relative self-reported NC by creating a number of additional calibration scores. Individuals who are not particularly analytic (based on their relative CRT and H&B scores) should nonetheless rate themselves as relatively analytic (based on self-reported NC). We predicted that the difference between relative CRT performance and self-reported NC would correlate positively with H&B performance (and, likewise, that the difference between H&B and NC would correlate with CRT scores). In other words, there should be miscalibration between NC and our objective measures of analytic thinking in the same way that there is miscalibration between estimated and actual CRT scores. This would be akin to a Dunning–Kruger effect in self-reported analytic-thinking disposition. As in Study 1, this should be evident for both the Ability and Engagement subscales of the NC scale.
The data are available online via the Open Science Framework: https://osf.io/zrjje/.
In total, 400 participants were recruited using Mechanical Turk (see Footnote 1). We removed 56 participants because they responded affirmatively when asked whether they had responded randomly at any point during the survey, and we also removed three participants who had missing data for the CRT. The resulting sample (N = 341; mean age = 33.9, SD = 10.5) consisted of 200 males and 139 females (two participants did not indicate their gender).
Materials and procedure
Cognitive reflection test
The CRT was administered as in Study 1. The two CRT scales were significantly correlated, r(341) = .362, p < .001, but the scale had weaker reliability than in Study 1, α = .65. One participant estimated that they had solved nine out of eight CRT problems correctly, so we changed the value to 8.
Heuristics-and-biases battery
As an independent measure of analytic thinking, we used Toplak et al.’s (2011) heuristics-and-biases battery. The scale consists of 16 problems that were derived from Kahneman and Tversky’s heuristics-and-biases research program (see Kahneman, 2011, for a review). Items include biases such as the gambler’s fallacy and the conjunction fallacy (see the Appendix in Toplak et al., 2011). The scale had weak reliability, α = .68.
Need for cognition
The NC scale was administered as in Study 1, except in this case it followed the heuristics-and-biases battery and was presented before a paranormal belief scale (which will not be considered further; see the supplementary materials). One participant had an outlying NC score (three SDs below the mean)—subsequent analyses are reported with the outlier removed, although the pattern of results was identical when this participant was included. The subscales were reliable, α = .89 and .94 for NC-A and NC-E, respectively.
Results and discussion
Dunning–Kruger effects in estimated CRT performance
Estimated CRT accuracy (M = 5.62, SD = 1.59) was greater than actual accuracy (M = 3.85, SD = 1.98), t(340) = 16.42, SE = 0.11, p < .001, d = 0.89. As in Study 1, we split the sample into four groups based on accuracy of CRT scores, to examine the relationship between actual and estimated CRT scores. Using a mixed-design ANOVA, we observed an interaction between CRT group and the difference between the actual CRT score and the estimated CRT score (see supplementary Fig. S2), F(3, 337) = 92.64, MSE = 1.09, p < .001, η² = .45. A post-hoc Tukey HSD test comparing the differences between estimated and actual CRT performance (i.e., calibration) indicated that all four groups emerged as separate, homogeneous subsets (p < .05). This indicated a decrease in overestimation at each increasing level of CRT performance. As in Study 1, those who scored lowest on the CRT (0–2 correct) overestimated their performance by a factor of 3.4. Even those who scored 5 or 6 out of 8 significantly overestimated their performance, t(97) = 6.63, p < .001, d = 0.67. In contrast, those who scored highest (7 or 8 correct) significantly underestimated their performance, t(33) = 3.86, p = .001, d = 0.66.
Association between overconfidence and need for cognition
Here we replicated the associations between estimated and actual CRT performance and NC. Correlations among the primary variables can be found in Table 2. As in Study 1, the correlations between NC and actual CRT performance were not significantly different from the correlations between NC and estimated CRT performance, t < 1 (see Footnote 2). To estimate the extents to which actual CRT accuracy and estimated CRT accuracy predicted NC, we entered both variables (along with their interaction) as predictors in two regression analyses, with NC-A and NC-E as the dependent variables (see Table 3). Unlike in Study 1, both actual and estimated CRT accuracy (along with their interaction; see Fig. S3 in the supplementary materials) independently significantly predicted NC-A, and only actual CRT significantly predicted NC-E. These independent associations between NC-A and CRT (actual and estimated) were relatively modest, however; β ranged from .120 to .181.
As in Study 1, neither NC-A nor NC-E was associated with the difference between actual and estimated CRT performance, rs < .085, ps > .130 (Calibration 1; see Table 2). In contrast, H&B performance did positively associate with Calibration 1, r(341) = .148, p = .006. Thus, we found a correspondence between objective measures of analytic thinking (CRT and H&B), even after miscalibration (CRT estimates) had been taken into account. This pattern of results was not evident for self-reported NC because, presumably, the miscalibration that is reflected in estimated CRT performance also affects people’s perceptions of their analytic-thinking disposition.
Dunning–Kruger effects in self-reported need for cognition
To further test the hypothesis that participants who are genuinely low in analytic thinking overreport their NC, we created three additional calibration scores. We did this by converting CRT, H&B, and NC raw scores into z scores and computing their difference scores (Table 2). The conversion to z scores allowed us to compare participants’ relative positions on the three ostensibly linked measures. The goal was to treat self-reported NC in the same way as estimated CRT scores. For this analysis, we decreased the number of comparisons by focusing on the Ability subscale (NC-A), which more directly relates to people’s assessments of their analytic-thinking disposition (as opposed to mere enjoyment of analytic thinking; see Footnote 3).
If NC-A is miscalibrated in the same way as estimated CRT scores, as we have predicted, there should be a correspondence between the parallel estimate-based and NC-based calibration scores. Indeed, the difference between the z scores for CRT accuracy and NC-A did correlate positively with H&B performance (“Calibration 2”; Table 2), r(341) = .172, p = .001. Correspondingly, the difference between the z scores for H&B accuracy and NC-A correlated positively with CRT performance (“Calibration 3”; Table 2), r(341) = .165, p = .002. Participants who were less analytic on the basis of H&B performance were more likely to rate themselves as relatively higher in NC than would be warranted, given their CRT performance (and vice versa). There was no corresponding correlation between NC-A and the difference between the z scores for CRT and H&B, r(341) = −.007, p = .894, which indicates a dissociation between self-reported and performance-based measures of analytic thinking.
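The z-score calibration scores described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data are invented toy values, the variable names are hypothetical, and the direction of each difference (performance minus self-report) is an assumption consistent with the interpretation given in the text, not a detail confirmed by Table 2.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize raw scores to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy data for eight hypothetical participants (not the study data).
crt = [0, 1, 2, 3, 5, 6, 7, 8]                   # CRT items correct
hb = [2, 1, 4, 3, 6, 5, 8, 7]                    # H&B items correct
nc_a = [3.5, 3.0, 3.8, 3.2, 3.6, 3.4, 3.9, 3.7]  # self-reported NC-A

z_crt, z_hb, z_nc = z_scores(crt), z_scores(hb), z_scores(nc_a)

# Assumed direction: objective performance minus self-report, so negative
# values mean NC-A is higher than relative performance would warrant.
calibration_2 = [c - n for c, n in zip(z_crt, z_nc)]  # CRT minus NC-A
calibration_3 = [h - n for h, n in zip(z_hb, z_nc)]   # H&B minus NC-A

# Each calibration score is correlated with the *other* performance measure.
r_cal2_hb = pearson_r(calibration_2, hb)
r_cal3_crt = pearson_r(calibration_3, crt)
print(round(r_cal2_hb, 3), round(r_cal3_crt, 3))
```

Correlating each calibration score with the performance measure that did not enter its computation keeps the predictor out of the difference score itself.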
This analysis illustrates that self-reported NC (Ability subscale) is more similar (in terms of its correlates) to estimated CRT performance than to either actual CRT or actual heuristics-and-biases performance. To summarize, self-reported NC does not predict the difference between actual and estimated CRT performance, but heuristics-and-biases performance does predict this difference. Moreover, when the difference between relative (actual) CRT and NC scores is used as a calibration score, heuristics-and-biases performance predicts that, as well. Similarly, when the difference between heuristics-and-biases performance and NC score is used as a calibration score, CRT performance predicts that. Finally, to complete the full set of possible analyses, the difference between CRT and heuristics-and-biases performance is not predicted by NC score, since it is not a meaningful difference (unlike the difference between either CRT or H&B performance and NC scores, as hypothesized).
To illustrate the source of the Dunning–Kruger effect on self-reported NC, we will focus on the parallel between two independent patterns of results: (1) the difference between actual and estimated CRT scores as a function of CRT group (Fig. 2) [interaction: F(3, 337) = 92.64, MSE = 1.09, p < .001, η² = .45], and (2) the difference between H&B performance and self-reported NC as a function of CRT group (Fig. 3) [interaction: NC (full scale), F(3, 337) = 2.92, MSE = 0.77, p = .034, η² = .03; NC-A, F(3, 337) = 2.79, MSE = 0.79, p = .040, η² = .02; NC-E, F(3, 337) = 2.87, MSE = 0.78, p = .012, η² = .03]. In both cases, we observed an interaction such that estimated accuracy/self-report NC was higher than objective performance for those who scored relatively low on the CRT, but this difference decreased and eventually reversed as CRT performance increased. Indeed, the comparison between H&B performance and NC indicates that, if anything, particularly analytic individuals underreported their NC relative to the remainder of the sample. It should be noted, however, that the confidence intervals overlapped substantially at every level in this analysis, despite the overall evidence for an interaction (see Fig. 3).
Our results provide empirical support for Dunning–Kruger effects in both estimates of reasoning performance and self-reported thinking disposition. Particularly intuitive individuals greatly overestimated their performance on the CRT, a tendency that diminished and eventually reversed among increasingly analytic individuals. Moreover, self-reported analytic-thinking disposition, as measured by the Ability and Engagement subscales of the NC scale, was at least as strongly correlated with estimated CRT performance as with actual CRT performance. In addition, an analysis using an additional performance-based measure of analytic thinking (the heuristics-and-biases battery) revealed a systematic miscalibration of self-reported NC, wherein relatively intuitive individuals report that they are more analytic than is justified by their objective performance. Together, these findings indicate that participants who are low in analytic thinking (so-called “intuitive thinkers”) are at least somewhat unaware of (or unresponsive to) their propensity to rely on intuition in lieu of analytic thought during decision making. This conclusion is consistent with previous research that has suggested that the propensity to think analytically facilitates metacognitive monitoring during reasoning (Pennycook et al., 2015b; Thompson & Johnson, 2014). Those who are genuinely analytic are aware of the strengths and weaknesses of their reasoning, whereas those who are genuinely nonanalytic are perhaps best described as “happy fools” (De Neys et al., 2013).
This research has both methodological and theoretical implications. With respect to methodological implications, self-report measures of thinking disposition such as the NC scale have been used in hundreds of studies (Cacioppo et al., 1996; Petty et al., 2009). Nonetheless, correlations with NC are often modest. In the present work, for example, correlations (Pearson’s r) between performance-based measures of analytic thinking and the Ability and Engagement NC subscales ranged from .190 to .242. Given the evidence for systematic miscalibration among genuinely nonanalytic individuals, we suggest that the predictive power of analytic-thinking dispositions may have been underestimated in research that has focused on self-report measures. Future research should continue to explore whether performance-based measures of analytic thinking yield stronger associations than do self-report measures.
In terms of theoretical implications, there has been a recent surge in metacognitive perspectives in the realm of dual-process theory. For example, Thompson and colleagues (2011; Thompson et al., 2013) have provided evidence that “feelings of rightness” are predictive of the extent and quality of analytic thinking. Mata, Fiedler, Ferreira, and Almeida (2013) found a metacognitive advantage for deliberative reasoners because they are aware of both the intuitive and reflective responses to reasoning problems. In contrast, much research has indicated that people are capable of detecting conflict between competing reasoning outputs (see De Neys, 2012, 2014, for reviews). Indeed, as mentioned earlier, conflict detection has been referred to as “omnipresent” (De Neys et al., 2008, p. 488), and even particularly biased participants have been shown to have an increased skin conductance response when faced with a conflict-inducing reasoning problem (De Neys et al., 2010). This indicates that even the most biased of reasoners can be sensitive to stimuli that cue conflicting responses.
The contrast between these lines of research is particularly stark when it comes to the CRT. Whereas De Neys et al. (2013) found that even people who gave the intuitive response to the bat-and-ball problem (as highlighted in the introduction) were only around 82% confident of their response (as compared to 97% confidence on a control version of the problem), the present results indicate that those who perform poorly on the CRT massively overestimate their accuracy (i.e., by a factor >3). This contraposition indicates that the neurological (e.g., De Neys et al., 2008) and physiological (De Neys et al., 2010) conflict detection signals may be relatively effective, but the response to this signal may actually be rather ineffective. De Neys et al. (2013) found a large (15%) decrease in confidence for the bat-and-ball problem relative to a control, but 82% confidence is still quite high.
There is also evidence from verbal protocols that conflict detection is often implicit (De Neys & Glumicic, 2008), and it seems likely that it may only become explicit upon further reflection via analytic processing. Thus, given that analytic thinking relies (to some extent) on volitional control, the presence of a signal to think analytically does not guarantee that the individual will engage in more than cursory levels of analytic thought. This line of reasoning is supported by both the present findings and previous work showing that the propensity to think analytically correlates with increases in response time for biased responses to incongruent (conflict) base-rate problems (Pennycook et al., 2014; Pennycook et al., 2015b). That is, more-analytic individuals engaged in more substantive analytic thinking, even in cases in which they ultimately rationalized their initial (“biased”) response. Collectively, these findings reveal pervasive differences between intuitive and analytic individuals at various levels of cognitive functioning.
The present results are also unique in the sense that they illustrate the everyday consequences of metacognitive differences as a function of analytic thinking. Namely, less-analytic people are not only less effective at metacognitive monitoring when given a reasoning task, but they may also be less accurate at self-reporting their relative level of analytic thinking. This metacognitive advantage for analytic individuals may be part of the reason why analytic thinking is associated with a wide range of important psychological factors, such as morality, religiosity, and creativity (see Pennycook et al., 2015a). Thinking about how one thinks may influence how and what people think about. Indeed, our results suggest that part of the reason why debates in areas such as politics are often futile is that those for whom overconfidence is the most consequential (i.e., those who need the most correcting, due to their low level of analyticity) are the least likely to recognize their overconfidence. Those most likely to be biased are also the least likely to recognize their bias.
Footnote 1. We originally collected data for 200 participants using the same stopping rule as in Study 1. However, upon completing data collection for the first 200 participants in this study, an error in the CRT estimate question was discovered (which had also applied to Study 1). Specifically, participants were told to input a number between 1 and 8 instead of a number between 0 and 8. We therefore reran Study 2 with the corrected instructions. Fortunately, only one participant guessed that they had got 0 out of 8 correct when given the corrected instructions (no participants had indicated 0 in Study 1 or in the first run of Study 2, although the input field would accept this value). Estimated CRT performance was identical regardless of whether participants were told to enter a number from 1 to 8 or 0 to 8, t(339) = 0.04, p = .971. Thus, we report the data for the full set of 400 participants below.
Footnote 2. Supernatural (religious and paranormal) belief was more strongly correlated with actual CRT accuracy, r(341) = −.278, p < .001, than with estimated CRT accuracy, r(341) = −.093, p = .086; t(338) = 3.23, p = .001. This provides an existence proof for the idea that actual CRT can be more predictive than estimated CRT accuracy.
Footnote 3. The results were nonetheless essentially identical when NC-E was used in place of NC-A. Specifically, the difference between the z scores for CRT accuracy and NC-E correlated positively with H&B performance (akin to Calibration 2 in Table 2), r(341) = .176, p = .001. The difference between the z scores for H&B accuracy and NC-E correlated positively with CRT performance (akin to Calibration 3 in Table 2), r(341) = .163, p = .002.
Atir, S., Rosenzweig, E., & Dunning, D. (2015). When knowledge knows no bounds: Self-perceived expertise predicts claims of impossible knowledge. Psychological Science, 26, 1295–1303. doi:10.1177/0956797615588195
Banks, A. P., & Hope, C. (2014). Heuristic and analytic processes in reasoning: An event-related potential study of belief bias. Psychophysiology, 51, 290–297. doi:10.1111/psyp.12169
Bonner, C., & Newell, B. R. (2010). In conflict with ourselves? An investigation of heuristic and analytic processes in decision making. Memory & Cognition, 38, 186–196. doi:10.3758/MC.38.2.186
Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42, 116–131. doi:10.1037/0022-3514.42.1.116
Cacioppo, J. T., Petty, R. E., Feinstein, J. A., & Jarvis, W. B. G. (1996). Dispositional differences in cognitive motivation: The life and times of individuals varying in need for cognition. Psychological Bulletin, 119, 197–253. doi:10.1037/0033-2909.119.2.197
Cacioppo, J. T., Petty, R. E., & Kao, C. F. (1984). The efficient assessment of need for cognition. Journal of Personality Assessment, 48, 306–307. doi:10.1207/s15327752jpa4803_13
Conway, P., & Gawronski, B. (2013). Deontological and utilitarian inclinations in moral decision making: A process dissociation approach. Journal of Personality and Social Psychology, 104, 216–235. doi:10.1037/a0031021
Critcher, C. R., & Dunning, D. (2009). How chronic self-views influence (and mislead) self-assessments of task performance: Self-views shape bottom-up experiences with the task. Journal of Personality and Social Psychology, 97, 931–945. doi:10.1037/a0017452
De Neys, W. (2012). Bias and conflict: A case for logical intuitions. Perspectives on Psychological Science, 7, 28–38. doi:10.1177/1745691611429354
De Neys, W. (2014). Conflict detection, dual processes, and logical intuitions: Some clarifications. Thinking & Reasoning, 20, 169–187. doi:10.1080/13546783.2013.854725
De Neys, W., Cromheeke, S., & Osman, M. (2011). Biased but in doubt: Conflict and decision confidence. PLoS ONE, 6, e15954. doi:10.1371/journal.pone.0015954
De Neys, W., & Franssens, S. (2009). Belief inhibition during thinking: Not always winning but at least taking part. Cognition, 113, 45–61. doi:10.1016/j.cognition.2009.07.009
De Neys, W., & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106, 1248–1299. doi:10.1016/j.cognition.2007.06.002
De Neys, W., Moyens, E., & Vansteenwegen, D. (2010). Feeling we’re biased: Autonomic arousal and reasoning conflict. Cognitive, Affective, & Behavioral Neuroscience, 10, 208–216. doi:10.3758/CABN.10.2.208
De Neys, W., Rossi, S., & Houdé, O. (2013). Bats, balls, and substitution sensitivity: Cognitive misers are no happy fools. Psychonomic Bulletin & Review, 20, 269–273. doi:10.3758/s13423-013-0384-5
De Neys, W., Vartanian, O., & Goel, V. (2008). Smarter than we think: When our brains detect that we are biased. Psychological Science, 19, 483–489. doi:10.1111/j.1467-9280.2008.02113.x
Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12, 83–87. doi:10.1111/1467-8721.01235
Ehrlinger, J., & Dunning, D. (2003). How chronic self-views influence (and potentially mislead) estimates of performance. Journal of Personality and Social Psychology, 84, 5–17. doi:10.1037/0022-3514.84.1.5
Epstein, S., Pacini, R., Denes-Raj, V., & Heier, H. (1996). Individual differences in intuitive–experiential and analytical–rational thinking styles. Journal of Personality and Social Psychology, 71, 390–405. doi:10.1037/0022-3514.71.2.390
Evans, J., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8, 223–241. doi:10.1177/1745691612460685
Fleischhauer, M., Enge, S., Brocke, B., Ullrich, J., Strobel, A., & Strobel, A. (2010). Same or different? Clarifying the relationship of need for cognition to personality and intelligence. Personality and Social Psychology Bulletin, 36, 82–96. doi:10.1177/0146167209351886
Franssens, S., & De Neys, W. (2009). The effortless nature of conflict detection during thinking. Thinking & Reasoning, 15, 105–128. doi:10.1080/13546780802711185
Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, 19, 25–42. doi:10.1257/089533005775196732
Gauthier, K. J., Christopher, A. N., Walter, M. I., Mourad, R., & Marek, P. (2006). Religiosity, religious doubt, and the need for cognition: Their interactive relationship with life satisfaction. Journal of Happiness Studies, 7, 139–154. doi:10.1007/s10902-005-1916-0
Handley, S. J., & Trippas, D. (2015). Dual processes and the interplay between knowledge and structure: A new parallel processing model. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 62, pp. 33–58). San Diego: Elsevier Academic Press. doi:10.1016/bs.plm.2014.09.002
Johnson, E. D., Tubau, E., & De Neys, W. (2016). The doubting system 1: Evidence for automatic substitution sensitivity. Acta Psychologica, 164, 56–64. doi:10.1016/j.actpsy.2015.12.008
Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1121–1134. Retrieved from http://psycnet.apa.org/journals/psp/77/6/1121
Mata, A., Ferreira, M. B., & Sherman, S. J. (2013). The metacognitive advantage of deliberative thinkers: A dual-process perspective on overconfidence. Journal of Personality and Social Psychology, 105, 353–373. doi:10.1037/a0033640
Mata, A., Fiedler, K., Ferreira, M. B., & Almeida, T. (2013). Reasoning about others’ reasoning. Journal of Experimental Social Psychology, 49, 486–491. doi:10.1016/j.jesp.2013.01.010
Mevel, K., Poirel, N., Rossi, S., Cassotti, M., Simon, G., Houdé, O., & De Neys, W. (2015). Bias detection: Response confidence evidence for conflict sensitivity in the ratio bias task. Journal of Cognitive Psychology, 27, 227–237.
Noori, M. (2016). Cognitive reflection as a predictor of susceptibility to behavioral anomalies. Judgment and Decision Making, 11, 114–120.
Pacini, R., & Epstein, S. (1999). The relation of rational and experiential information processing styles to personality, basic beliefs, and the ratio-bias phenomenon. Journal of Personality and Social Psychology, 76, 972–987. doi:10.1037/0022-3514.76.6.972
Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2014). Cognitive style and religiosity: The role of conflict detection. Memory & Cognition, 42, 1–10. doi:10.3758/s13421-013-0340-7
Pennycook, G., Cheyne, J. A., Koehler, D. J., & Fugelsang, J. A. (2016). Is the cognitive reflection test a measure of both reflection and intuition? Behavior Research Methods, 48, 341–348. doi:10.3758/s13428-015-0576-1
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2012). Are we good at detecting conflict during reasoning? Cognition, 124, 101–106. doi:10.1016/j.cognition.2012.04.004
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015a). Everyday consequences of analytic thinking. Current Directions in Psychological Science, 24, 425–432. doi:10.1177/0963721415604610
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015b). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72. doi:10.1016/j.cogpsych.2015.05.001
Pennycook, G., & Ross, R. M. (2016). Commentary on: Cognitive reflection vs. calculation in decision making. Frontiers in Psychology, 7, 9. doi:10.3389/fpsyg.2015.00532
Petty, R. E., Brinol, P., Loersch, C., & McCaslin, M. J. (2009). The need for cognition. In M. R. Leary & R. H. Hoyle (Eds.), Handbook of individual differences in social behavior (pp. 318–329). New York: Guilford.
Sargent, M. J. (2004). Less thought, more punishment: Need for cognition predicts support for punitive responses to crime. Personality and Social Psychology Bulletin, 30, 1485–1493. doi:10.1177/0146167204264481
Sloman, S. (2014). Two systems of reasoning: An update. In J. W. Sherman, B. Gawronski, & Y. Trope (Eds.), Dual-process theories of the social mind (pp. 69–79). New York: Guilford Press.
Stanovich, K. E. (2012). On the distinction between rationality and intelligence: Implications for understanding individual differences in reasoning. In The Oxford handbook of thinking and reasoning (pp. 433–455). Oxford: Oxford University Press.
Stanovich, K. E., & West, R. F. (1998). Individual differences in rational thought. Journal of Experimental Psychology: General, 127, 161–188. doi:10.1037/0096-3445.127.2.161
Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23, 645–665 (discussion 665–726). doi:10.1017/S0140525X00003435
Svedholm, A. M., & Lindeman, M. (2013). The separate roles of the reflective mind and involuntary inhibitory control in gatekeeping paranormal beliefs and the underlying intuitive confusions. British Journal of Psychology, 104, 303–319. doi:10.1111/j.2044-8295.2012.02118.x
Thompson, V. A., & Johnson, S. C. (2014). Conflict, metacognition, and analytic thinking. Thinking & Reasoning, 20, 215–244. doi:10.1080/13546783.2013.869763
Thompson, V. A., Prowse Turner, J. A., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63, 107–140. doi:10.1016/j.cogpsych.2011.06.001
Thompson, V. A., Turner, J. A. P., Pennycook, G., Ball, L. J., Brack, H., Ophir, Y., & Ackerman, R. (2013). The role of answer fluency and perceptual fluency as metacognitive cues for initiating analytic thinking. Cognition, 128, 237–251. doi:10.1016/j.cognition.2012.09.012
Thomson, K. S., & Oppenheimer, D. M. (2016). Investigating an alternate form of the cognitive reflection test. Judgment and Decision Making, 11, 99–113.
Toplak, M., West, R., & Stanovich, K. (2011). The cognitive reflection test as a predictor of performance on heuristics-and-biases tasks. Memory & Cognition, 39, 1275–1289. doi:10.3758/s13421-011-0104-1
Toplak, M. E., West, R. F., & Stanovich, K. E. (2014). Assessing miserly information processing: An expansion of the cognitive reflection test. Thinking & Reasoning, 20, 147–168. doi:10.1080/13546783.2013.844729
Travers, E., Rolison, J. J., & Feeney, A. (2016). The time course of conflict on the cognitive reflection test. Cognition, 150, 109–118. doi:10.1016/j.cognition.2016.01.015
This research was funded by the Natural Sciences and Engineering Research Council (NSERC), in the form of Discovery Grants (to J.F. and D.K.) and an Alexander Graham Bell Canada Graduate Scholarship (to G.P.).
Pennycook, G., Ross, R.M., Koehler, D.J. et al. Dunning–Kruger effects in reasoning: Theoretical implications of the failure to recognize incompetence. Psychon Bull Rev 24, 1774–1784 (2017). https://doi.org/10.3758/s13423-017-1242-7
- Decision making
- High-order cognition