Human reasoners have been characterized as cognitive misers who show a strong tendency to rely on fast, intuitive processing rather than on more demanding, deliberate thinking (Evans, 2008; Kahneman, 2011). Although the fast and effortless nature of intuitive processing can sometimes be useful, it can also bias our reasoning. It has been argued that the key to this bias is a process of so-called attribute substitution: When people are confronted with a difficult question, they often intuitively answer an easier one instead (e.g., Kahneman, 2011; Kahneman & Frederick, 2002). Consider the following example:

A bat and a ball together cost \$1.10. The bat costs \$1 more than the ball. How much does the ball cost?

When you try to answer this problem, the intuitive answer that immediately springs to mind is “10 cents.” Indeed, about 80 % of university students who are asked to solve the bat-and-ball problem give the “10 cents” answer (e.g., Bourgeois-Gironde & Vanderhenst, 2009). But it is wrong. If the ball were to cost 10 cents, the bat would cost \$1.10 (i.e., \$1 more), and the total cost would then be \$1.20 rather than the required \$1.10. The correct response is “5 cents,” of course (i.e., the bat costs \$1.05). The explanation for the widespread “10 cents” bias in terms of attribute substitution is that people replace the critical relational “more than” statement with a simpler absolute statement. That is, “the bat costs \$1 more than the ball” is read as “the bat costs \$1.” Hence, rather than working out the sum, people naturally parse \$1.10 into \$1 and 10 cents, which is easier to do. In other words, because of the substitution, people give the correct answer to the wrong question.
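
The arithmetic behind the two answers can be spelled out in a short derivation. Writing $b$ for the price of the ball in dollars (a symbol introduced here purely for illustration), the two statements of the problem give:

$$
\begin{aligned}
b + (b + 1.00) &= 1.10 \\
2b &= 0.10 \\
b &= 0.05
\end{aligned}
$$

Under the substituted reading, in which the bat simply costs one dollar, the equation becomes $b + 1.00 = 1.10$, which yields the intuitive but incorrect $b = 0.10$.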

The bat-and-ball problem is considered a paradigmatic example of people’s cognitive miserliness (e.g., Bourgeois-Gironde & Vanderhenst, 2009; Kahneman, 2011; Kahneman & Frederick, 2002; Toplak, West, & Stanovich, 2011). After all, the problem is really not that hard. If people reflected upon it for even a moment, they would surely realize their error and notice that a 10-cent ball and a bat that costs a dollar more cannot add up to \$1.10. Hence, the problem with attribute substitution seems to be that people typically do not notice that they are substituting and do not realize their error (Kahneman & Frederick, 2005; Thompson, 2009; Toplak et al., 2011). This paints a somewhat bleak picture of human rationality: Not only do we often fail to reason correctly, but, much like happy fools, we do not even seem to realize that we are making a mistake.

However, the fact that decision-makers do not deliberately reflect upon their response does not necessarily imply that they are not detecting the substitution process. That is, although people might not engage in deliberate processing and might not know what the correct answer is, it is still possible that they have some minimal substitution sensitivity and at least notice that their substituted “10 cents” response is not completely warranted. To test this hypothesis we designed a control version of the bat-and-ball problem that does not give rise to attribute substitution. Consider the following example:

A magazine and a banana together cost \$2.90. The magazine costs \$2. How much does the banana cost?

People will tend to parse the \$2.90 into \$2 and 90 cents just as naturally as they parse \$1.10 in the standard version. However, the control version no longer contains the relative statement (“\$2 more than the banana”) that triggers the substitution. That is, in the control version, we explicitly present the easier statement that participants are supposed to be unconsciously substituting. After solving each version, participants are asked to indicate their response confidence. If participants are completely unaware that they are substituting when solving the standard version, the standard and control versions should be isomorphic, and response confidence should not differ. However, if we are right that people might not be completely oblivious to the substitution and have some minimal awareness of the questionable nature of their answer, response confidence should be lower after solving the standard version.

## Method

### Participants

A total of 248 University of Caen undergraduates who took an introductory psychology course participated voluntarily.

### Materials and procedure

Participants were presented with a standard and a control version of the bat-and-ball problem. The problems were translated into French and adjusted to the European test context (see Supplementary Material). To minimize surface similarity, we also modified the superficial item content of the two problems (i.e., one problem stated that a pencil and an eraser together cost \$1.10, the other that a magazine and a banana together cost \$2.90). The two problems were printed on separate pages of a booklet. To make sure that the differential item content did not affect the findings, the item content and control status of the problem were completely crossed. For half of the sample, we used the pencil/eraser/\$1.10 content in the standard version and the magazine/banana/\$2.90 content in the control version. For the other half of the sample, the contents of the two presented problems were switched. Presentation order of the control and standard versions was also counterbalanced: Approximately half of the participants solved the control version first, whereas the other half started with the standard version. An overview of the material is presented in the Supplementary Material section.
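
The crossing of item content, problem version, and presentation order can be sketched in a few lines of Python. This is only an illustration of the design logic, not the procedure actually used to build the booklets: the function names, the content labels, and the round-robin assignment are assumptions, as is the idea that content and order were fully crossed with each other (the text reports only that each factor was balanced across roughly half of the sample).

```python
from itertools import product

# Item contents named in the text; the label strings are illustrative.
CONTENTS = ("pencil/eraser/$1.10", "magazine/banana/$2.90")
VERSIONS = ("standard", "control")


def booklet_types():
    """Enumerate booklets by which content carries the standard version
    and which version is printed first."""
    booklets = []
    for standard_content, first_version in product(CONTENTS, VERSIONS):
        # The control version always gets the other item content.
        control_content = CONTENTS[1] if standard_content == CONTENTS[0] else CONTENTS[0]
        pages = {"standard": standard_content, "control": control_content}
        second_version = "control" if first_version == "standard" else "standard"
        booklets.append([(v, pages[v]) for v in (first_version, second_version)])
    return booklets


def assign(n_participants):
    """Hypothetical round-robin assignment of participants to booklet types."""
    types = booklet_types()
    return [types[i % len(types)] for i in range(n_participants)]


if __name__ == "__main__":
    for booklet in booklet_types():
        print(booklet)
```

With this scheme every booklet contains one standard and one control problem with different contents, so any effect of item content or order is balanced across the two versions.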

Immediately after participants wrote down their answer, they were asked to indicate how confident they were that their response was correct by writing down a number between 0 % (totally not sure) and 100 % (totally sure). Note that we intend to use this measure only to contrast people’s relative confidence difference in the standard and control versions. Obviously, the confidence ratings will be but a proxy of people’s phenomenal confidence state. The response scale is not immune to measurement biases such as end preferences or social desirability effects (e.g., Berk, 2006). For example, since it might be hard to openly admit that one has given a response that one is not confident about, mere social desirability can drive people’s estimates upward. This implies that one needs to be cautious when interpreting absolute confidence levels. However, such interpretative complications can be sidestepped when contrasting the relative rating difference in two conditions. Any general response scale bias should affect the ratings in both conditions. Consequently, our analyses focus on the relative confidence contrast, and we refrain from making claims on the basis of the absolute confidence levels.

## Results

### Accuracy

In line with previous studies, only 21 % (SE = 2.3 %) of participants managed to solve the standard bat-and-ball problem correctly. Incorrect responses were almost exclusively (i.e., 194 out of 195 responses) of the “10 cents” type, suggesting that biased participants were not simply making a random guess but were, indeed, engaged in the postulated substitution process. As was expected, the control version that did not give rise to substitution was solved correctly by 98 % (SE = 1 %) of the participants, F(1, 247) = 714.94, p < .0001, $\eta_p^2$ = .74 (see Footnote 1).

### Confidence ratings

As Fig. 1 shows, the response confidence of participants who engaged in a substitution process and gave the erroneous “10 cents” response on the standard version was significantly lower than their confidence in their control version answer that did not give rise to the substitution, F(1, 194) = 58.54, p < .0001, $\eta_p^2$ = .23. Consistent with our hypothesis, this establishes that biased reasoners are sensitive to the substitution and are not completely oblivious to the erroneous nature of their substituted judgment. Figure 1 also shows that this confidence decrease was far less clear for reasoners who solved the standard problem correctly, F(1, 47) = 4.89, p < .05, $\eta_p^2$ = .09. Indeed, it makes sense that the few nonmisers who reflected upon their judgment and resisted the substitution also knew that their response was likely to be correct.

A critic might note that by presenting both the control and standard versions to the same participants, we might have artificially directed their attention to the substitution. We already tried to limit this problem by restricting the surface resemblance of the two problems, but to eliminate this possible confound completely, we also ran a control analysis that was restricted to the first problem that participants solved (i.e., half of the participants solved the control version first, whereas the other half started with the standard version). Results of this between-subjects analysis confirmed our findings. Biased reasoners who failed to solve the standard version were less confident in their answer (M = 85 %, SE = 1.8 %) than reasoners who solved the control version (M = 98 %, SE = 1.6 %), F(1, 222) = 26.89, p < .0001, $\eta_p^2$ = .11.
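
The reported effect sizes can be cross-checked against the F statistics with the standard identity for partial eta squared, namely F times the effect degrees of freedom, divided by that product plus the error degrees of freedom. The sketch below is a reader-side consistency check, not part of the original analysis; the function name and the labels are illustrative, while the F values, degrees of freedom, and reported effect sizes are taken from the Results text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, effect size as reported in the text)
REPORTED = [
    ("accuracy, standard vs. control", 714.94, 1, 247, 0.74),
    ("confidence, biased reasoners", 58.54, 1, 194, 0.23),
    ("confidence, correct reasoners", 4.89, 1, 47, 0.09),
    ("confidence, first problem only", 26.89, 1, 222, 0.11),
]

if __name__ == "__main__":
    for label, f_value, df1, df2, reported in REPORTED:
        recovered = partial_eta_squared(f_value, df1, df2)
        print(f"{label}: recovered {recovered:.2f}, reported {reported:.2f}")
```

Each recovered value rounds to the effect size given in the text, so the F statistics and the effect sizes are mutually consistent.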

## Discussion

The present data establish that reasoners are not completely oblivious to their substitution bias. When people substitute a harder question for an easier one, their response confidence indicates that they show some minimal awareness of the questionable nature of their answer. Hence, although reasoners might typically fail to reflect on the problem and might not know the correct answer (Frederick, 2005), they at least seem to sense that their substituted response is not fully warranted. Bluntly put, what these data suggest is that although we might be cognitive misers, we are not happy fools who blindly answer erroneous questions without realizing it.

To be clear, our findings do not argue against the popular characterization of the human decision-maker as a cognitive miser per se. Note that we replicated the massive preference for the substituted “10 cents” answer, for example. In line with most authors (e.g., Evans, 2010; Frederick, 2005; Kahneman, 2011; Stanovich, 2010), we also believe that the key reason for the substitution bias is that reasoners tend to minimize cognitive effort and stick to mere intuitive processing. However, the point we want to stress is that despite this lack of deliberate reflection, reasoners are not at the blind mercy of a substitution process. More generally, our findings suggest that cognitive misers might have more accurate intuitions about the substitution process than hitherto believed (see Footnote 2). Although people experience a strong intuitive pull to engage in substitution and fail to deliberately reflect on their answer, our data suggest that, at the same time, they also sense that the substituted response is questionable.

At a more general level, a number of authors have recently suggested that such intuitive conflict sensations might act as a cue that allows our reasoning engine to determine whether it needs to engage in deliberate thinking (e.g., Alter, Oppenheimer, Epley, & Eyre, 2007; De Neys, 2012; Thompson & Morsanyi, 2012; Thompson, Turner, & Pennycook, 2011). Thompson and colleagues (Thompson & Morsanyi, 2012; Thompson et al., 2011; see also Oppenheimer, 2008) have linked this process to the metacognitive memory literature (e.g., Koriat, 1993) and labeled it the “feeling of rightness.” In terms of this model, one could argue that the present data indicate that people’s “feeling of rightness” is lowered when they substitute. Bluntly put, people feel that their biased response is not “right.” In line with our claims, this suggests that the problem with substitution and judgment bias in general is not that people do not realize that they need to think harder, but rather that this deliberate processing is not successfully engaged.

It is important to clarify some potential misconceptions and critiques about our work. For example, some critics might spontaneously argue that since our control bat-and-ball version is easier than the standard version, our findings are trivial, since they simply show that people are more confident when answering an easy question than when answering a hard question. It is important to stress that this critique is begging the question. The crucial question is, of course, whether or not people realize that the classic version is hard. That is, the control version presents the easier statement that participants are supposed to be unconsciously substituting. What we want to know is whether or not people note this substitution. If people do not notice it, the two problems should be isomorphic, and they should be considered equally hard. In other words, arguing that people notice that the classic problem is harder than the control problem underscores the point that they are not oblivious to the substitution.

A related spontaneous critique is that our confidence findings might result from mere guessing, rather than from substitution sensitivity. In general, if people do not know an answer to a problem and guess, they presumably realize this and will also give a low confidence rating. Hence, a critic might argue that the lower confidence does not necessarily point to substitution sensitivity but merely to a rather trivial “guessing awareness.” However, this critique is readily discarded. In the present study, more than 99 % of the erroneous bat-and-ball responses were of the “10 cents” type. This is the response that people should pick if they engage in the postulated substitution process. Clearly, if people were biased and less confident because they were merely guessing, we should have observed many more random erroneous answers.

In the present study, we focused on the bat-and-ball problem because it is one of the most vetted and paradigmatic examples of people’s substitution bias (e.g., Bourgeois-Gironde & Vanderhenst, 2009; Kahneman, 2011; Kahneman & Frederick, 2002; Toplak et al., 2011). However, attribute substitution has also been proposed as an explanation for people’s judgment errors in other classic reasoning tasks, such as the base-rate neglect or conjunction fallacy task (Kahneman & Frederick, 2002). Although it has been argued that these tasks might be less suited for testing substitution claims (e.g., Bourgeois-Gironde & Vanderhenst, 2009), one might nevertheless wonder whether the present findings can be generalized across these tasks. Some emerging evidence suggests that they might. For example, a recent study showed that when reasoners give a biased response to standard conjunction or base-rate neglect problems, they also indicate being less confident about their response, as compared with control problems (e.g., De Neys, Cromheeke, & Osman, 2011; see also De Neys & Feremans, 2012). This gives us some initial indication of the generality of the present findings.

With the present article, we hope to have presented a critical building block to stimulate further research on people’s substitution sensitivity. Obviously, we acknowledge that our findings will need to be extended, but we nevertheless feel that it is important to bring them to the attention of the wide range of researchers interested in human thinking. At the very least, the present data should alert scholars that the popular idea that substitution typically goes unnoticed is disputable and needs closer empirical testing.