Introduction

Problems of reasoning, judgment and decision-making can be tricky in that they create a conflict between a compelling response that immediately comes to mind, and a more thought-demanding response that is advised by careful thinking. A number of dual-process theories (e.g., Kahneman, 2011; Stanovich, 1999; see Evans, 2003, for a review) interpret the success or failure to solve such conflict problems as evidence that either deliberative or intuitive reasoning presided over the problem solving. Depending on cognitive ability, executive functioning, and thinking dispositions (e.g., Toplak, West, & Stanovich, 2011), metacognitive feelings of rightness (Thompson, Prowse Turner, & Pennycook, 2011), or how much motivation (e.g., Mata, Ferreira, & Sherman, 2013a) and time (e.g., Schroyens, Schaeken, & Handley, 2003) there is to think carefully about the problem, people would either reason intuitively or deliberatively. However, we will argue that, even before intuition or deliberation can operate, attentional processes can determine the course of reasoning.

We consider a two-stage account of reasoning (Evans, 1984, 1989, 1996; Mata, Schubert, & Ferreira, 2014; see also Margolis, 1987) whereby, in a first stage, a problem is interpreted and information is selected. Often, only part of the presented information is represented. In a second stage, deliberative or intuitive thought processes operate on this information to produce a response. Incorrect responses can result from errors at either stage. Whereas most dual-process approaches have traditionally emphasized Stage-2 explanations (but see Evans, 2006, for a more comprehensive framework), our research explores both stages, and is particularly well-suited to investigate Stage-1 processes. To illustrate this, consider the popular bat-and-ball problem (Frederick, 2005):

A bat and a ball together cost 110 cents.

The bat costs 100 cents more than the ball.

How much does the ball cost?

A Stage-2 explanation assumes that both correct and incorrect responders represent the premises accurately and differ only in their ability to engage in more deliberative reasoning to overcome the intuitive response (10 cents) and come up with the correct response (5 cents). However, a Stage-1 explanation holds that incorrect responders might interpret the problem incorrectly by misrepresenting the critical premise where the intuitive-vs.-deliberative conflict lies: “The bat costs 100 cents more than the ball”. If reasoners fail to pay close attention to this premise, even sound deliberative reasoning will produce an incorrect response.
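The arithmetic behind the two responses can be written out explicitly. In the short derivation below, the symbols b (price of the ball) and t (price of the bat), both in cents, are introduced here purely for illustration and are not part of the original problem.

```latex
\begin{align*}
t + b &= 110, \qquad t = b + 100 \\
(b + 100) + b &= 110 \\
2b &= 10 \\
b &= 5, \qquad t = 105
\end{align*}
```

The intuitive response b = 10 satisfies the first premise only if t = 100, in which case the bat costs 90 cents more than the ball rather than 100, so the second premise is violated.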

Mata et al. (2014) explored this Stage-1 account by using a change detection paradigm. Participants solved reasoning problems, and they were then shown either exactly the same problems again or slightly different versions of those problems, and they had to indicate whether they noticed any change. In some trials, the premises underwent small but crucial changes that turned a conflict problem into a no-conflict problem. Consistent with a Stage-1 explanation, incorrect responders were worse than correct responders at detecting those changes.

Whereas previous research (Evans, 1996; Mata et al., 2014) tested the attentional account only in indirect ways, the present research offers a more direct test: eye-tracking. Analyzing how on-line attention correlates with reasoning performance offers a direct test of the attentional account, as the location and duration of fixations are precise indicators of what information participants focus on and how much effort they put into processing it (Rayner, 1998).

We devised conflict vs. no-conflict versions of the same problems while holding the critical premise constant, thereby making them directly comparable, and we assessed how much attention reasoners pay to that premise. Here is the conflict problem used in Study 1:

A bat and a ball are on sale.

Together they cost 110 euros.

The bat costs 100 euros more than the ball.

How much does the ball cost?

And here is the no-conflict version:

A bat and a ball are on sale.

The ball costs 10 euros.

The bat costs 100 euros more than the ball.

How much does the bat cost?

The critical third premise is identical across versions. We assessed how long reasoners fixated that premise and how many times they went back to it; frequent returns would suggest that they realized its critical nature.

An attentional account predicts that correct responders are more attentive and sensitive to conflict, which yields two hypotheses: (1) correct responders should pay more attention to conflict problems than incorrect responders; and (2) correct responders should pay more attention to conflict problems than to no-conflict problems, whereas incorrect responders should discriminate less between the two. More specifically, we do not expect that correct responders are simply more attentive overall. Rather, they should devote particular attention to the critical premise, but not the other conflict-irrelevant premises.

Study 1

Method

Participants

Fifty-two participants were recruited at the University of Heidelberg and received course credit.

Materials/apparatus

Eye movements were recorded using a table-mounted iView X Hi-Speed eye-tracker (SensoMotoric Instruments, Teltow, Germany) with a temporal resolution of 1250 Hz. Participants placed their head on a chin rest, 60 cm away from the screen. The experiment was programmed with ExperimentCenter 3.5 (SensoMotoric Instruments), and presented on a 23.6” LCD monitor with a resolution of 1920 × 1080 pixels.

Procedure

There were four trials. At the beginning of each trial, a 13-point calibration procedure was conducted. After the calibration, a trigger sentence (“Please look here”) appeared on the screen, positioned one line above the first sentence of the problem that would later be presented. When participants focused on the trigger sentence for at least 1000 ms, it disappeared and the problem appeared below.

The first two trials consisted of practice problems (e.g., “There are several countries in each continent. Some continents have more countries than others. What is the continent with more countries?”). The third and the fourth trials were the conflict and no-conflict versions (counterbalanced across participants) of the bat-and-ball problem presented above.

In all trials, each sentence of a problem (including the question) was presented in a separate line in Arial font, size 24, with double spacing between the lines. Participants pressed the space bar to start responding. The problem was then removed except for the last sentence (i.e., the question) and a blank window on the screen where participants filled in their response.

Fixations and saccades were detected using BeGaze 3.5 (SensoMotoric Instruments). The minimum fixation duration was 50 ms. The first fixation of each trial was excluded. Non-overlapping rectangular areas of interest (AOIs) were defined for each sentence of the critical problems. For some participants, the position of these AOIs was adjusted to compensate for a uniform drift of the gaze above or below the sentence. The size of the AOIs was the same for all participants. Drift correction was not possible for six participants because drift was skewed and irregular. The data of these participants were excluded, as drift correction would have caused the AOIs to overlap, or rectangles of the same size would not have captured the scan path on each of the four sentences. Due to software problems, the data of two participants were not recorded. Thus, the data of 44 participants were included in the analyses.
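The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the BeGaze procedure itself: the names Fixation, AOI, label_fixations, and drift_y are hypothetical, and the sketch assumes that fixation detection has already produced a chronological list of fixations with pixel coordinates and durations.

```python
from dataclasses import dataclass

MIN_FIXATION_MS = 50  # minimum fixation duration stated in the text


@dataclass
class Fixation:
    x: float            # horizontal gaze position in pixels
    y: float            # vertical gaze position in pixels
    duration_ms: float


@dataclass
class AOI:
    sentence: int       # 1-4, one rectangular AOI per sentence
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def label_fixations(fixations, aois, drift_y=0.0):
    """Return a (sentence or None, duration_ms) pair for each retained fixation.

    drift_y is a hypothetical per-participant vertical offset in pixels:
    the AOIs are shifted by this amount while their size stays the same.
    """
    # Keep only fixations that meet the minimum duration.
    valid = [f for f in fixations if f.duration_ms >= MIN_FIXATION_MS]
    labelled = []
    # The first fixation of the trial is excluded.
    for fix in valid[1:]:
        hit = None
        for aoi in aois:
            # Shifting the AOI down by drift_y equals shifting the gaze up.
            if aoi.contains(fix.x, fix.y - drift_y):
                hit = aoi.sentence
                break
        labelled.append((hit, fix.duration_ms))
    return labelled
```

Because the AOIs are assumed not to overlap, the first matching rectangle is the only possible match. A fixation outside every AOI is labelled None rather than dropped, so later steps can decide how to treat it.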

Results

For the conflict problem, 63.5% of participants gave the intuitive but incorrect response, and 32.7% responded correctly. For the no-conflict problem, 92.3% responded correctly.

Two attentional measures were calculated: total fixation time and number of times a sentence was revisited after a first fixation (henceforth, revisits).
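As a rough sketch of how these two measures could be derived from AOI-labelled fixations, consider the Python function below. The function name and input format are hypothetical. The sketch also makes two assumptions that the text does not spell out: consecutive fixations on the same sentence are treated as one visit, and fixations outside all AOIs are ignored, so that they do not split a visit.

```python
from itertools import groupby


def attention_measures(labelled_fixations, n_sentences=4):
    """Compute total fixation time and number of revisits per sentence.

    labelled_fixations: chronological list of (sentence, duration_ms) pairs,
    where sentence is 1..n_sentences, or None for gaze outside every AOI.
    """
    total_time = {s: 0.0 for s in range(1, n_sentences + 1)}
    visits = {s: 0 for s in range(1, n_sentences + 1)}

    # Assumption: gaze outside all AOIs is ignored and does not end a visit.
    on_text = [f for f in labelled_fixations if f[0] is not None]

    # Consecutive fixations on the same sentence form a single visit.
    for sentence, run in groupby(on_text, key=lambda f: f[0]):
        total_time[sentence] += sum(duration for _, duration in run)
        visits[sentence] += 1

    # Every visit after the first one counts as a revisit.
    revisits = {s: max(v - 1, 0) for s, v in visits.items()}
    return total_time, revisits
```

Under these assumptions, a reader who fixates sentence 3, moves to sentence 2, and returns to sentence 3 accumulates one revisit for sentence 3, regardless of how many individual fixations each visit contains.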

For each of these measures, we tested attention as a function of sentence (1–4), type of problem (conflict or no-conflict), performance (correct or incorrect response to the conflict problem), and sequence (conflict problem first or no-conflict problem first). Across both measures, results (see Table 1) show the expected pattern of a performance × conflict × sentence interaction, Fs ≥ 2.26, Ps ≤ .085, ηp² ≥ .05, such that incorrect responders did not differentiate between the conflict and no-conflict versions for any of the sentences, whereas correct responders paid more attention to conflict versus no-conflict, particularly for the sentence conveying the critical premise. None of these 3-way interactions was qualified by a 4-way interaction with sequence, Fs < 1, so that this pattern holds across sequence conditions. The difference in number of revisits for conflict vs. no-conflict is not significant for any of the sentences, but there is a trend such that correct responders again paid more attention to the critical sentence in conflict vs. no-conflict, whereas the opposite holds for incorrect responders.

Table 1 Mean (SD) attention scores by sentence, type of problem, and performance (Study 1)

Comparing correct and incorrect responders’ attention to the critical premise revealed significant differences in both measures, but only when that sentence was part of the conflict problem (see Table 1). Notice, for instance, that the number of revisits for the critical premise is by far the highest across sentences and conditions, and that correct responders made more than twice as many revisits to that premise as incorrect responders did (see Fig. 1).

Fig. 1 Number of revisits by sentence, type of problem, and performance (Study 1)

It was not the case that correct responders paid more attention overall than incorrect responders. The only sentence for which that was consistently the case was the critical premise.

Study 2

Study 2 sought to replicate Study 1 with a larger number of problems: Now, three conflict problems and three no-conflict problems were presented to every participant. Moreover, this second study measured attention in a continuous fashion, from the time when participants started to analyze the problem to the time when they were done responding to it. Thus, unlike Study 1, attention was also measured while participants were responding, capturing any reviewing of the premises that participants might have done at all stages of the reasoning process.

Method

Participants

Seventy-five participants were recruited at the University of Heidelberg and received either partial course credit or 6 Euros. Seventeen participants had to be excluded because of problems that precluded eye tracking (e.g., small glasses, hard contact lenses, problems tracking the pupil because of mascara). Data from 58 participants were analyzed.

Materials/apparatus

Eye movements were recorded using an iView RED250 mobile eye-tracker (SensoMotoric Instruments) with a temporal resolution of 250 Hz. This eye-tracker is fixed below the monitor of a laptop computer and allows relatively free movement of the head. The experiment was programmed in C, and presented on a laptop computer with a screen resolution of 1920 × 1080 pixels.

Procedure

The experiment started with a 13-point calibration. At the beginning of each of 13 trials, participants had to look at a fixation cross that was presented in the center of the screen for a maximum of 5 s. The trial started immediately when the gaze position was registered at fixation position for at least 500 ms; if this did not occur within 5 s, the camera was recalibrated. In each trial, three or four sentences (each in one line) were presented in black 24 pt Arial font on a light grey background with lines separated by 100 pixels.
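The gaze-contingent trial start can be illustrated with a short Python sketch. It is not the original C implementation: the function name, the sample format (time in ms, x, y), and in particular the tolerance radius_px around the cross are assumptions made for illustration, since the text does not state how close to the cross the gaze had to be.

```python
import math


def trial_start_time(samples, center, radius_px=50.0,
                     dwell_ms=500.0, timeout_ms=5000.0):
    """Return the time (ms) at which the trial would start, or None.

    samples: chronological (t_ms, x, y) gaze samples, with t_ms measured
    from the onset of the fixation cross.
    center: (x, y) position of the fixation cross in pixels.
    radius_px: hypothetical tolerance around the cross.
    """
    dwell_start = None
    for t_ms, x, y in samples:
        if t_ms > timeout_ms:
            break  # no valid dwell within the time limit
        on_cross = math.hypot(x - center[0], y - center[1]) <= radius_px
        if not on_cross:
            dwell_start = None  # gaze left the cross, so restart the dwell
            continue
        if dwell_start is None:
            dwell_start = t_ms
        if t_ms - dwell_start >= dwell_ms:
            return t_ms
    return None  # the caller would trigger recalibration
```

At 250 Hz, samples arrive every 4 ms, so a 500 ms dwell corresponds to roughly 125 consecutive samples on the cross. A return value of None corresponds to the case in which the camera was recalibrated.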

The first two trials consisted of practice problems, as in Study 1. The following 11 trials comprised 6 experimental trials (3 bat-and-ball-like problems, in a conflict and no-conflict version each), separated by 5 filler trials (problems that required easy calculations) that were not analyzed. Here is an example of a conflict problem:

A TV and a DVD are on sale.

Together they cost 110 euros.

The TV costs 100 euros more than the DVD.

How much does the DVD cost?

The no-conflict version:

A TV and a DVD are on sale.

The DVD costs 10 euros.

The TV costs 100 euros more than the DVD.

How much does the TV cost?

Experimental trials were presented in random order; however, experimental trials and filler trials always alternated, so that experimental trials never appeared in succession.

Participants had to press the space bar when they were ready to enter their response to each problem. Then, a response field appeared below the problem, which, in contrast to Study 1, remained on screen, so that re-inspection of the problem was possible at all times. Gaze data were analyzed as in Study 1, with the exception that gaze behavior was also analyzed while participants responded. Therefore, attention was measured continuously from the time participants started to read the problem until they were done responding and moved on to the next trial.

Results

For conflict problems, 62.1%–65.5% (range across problems) of participants gave the intuitive but incorrect response, and 34.5%–37.9% responded correctly. For no-conflict problems, 89.9%–94.8% responded correctly.

As in Study 1, for each sentence we analyzed the average fixation time, and the average number of revisits across problems. For each of these measures, we ran an ANOVA analyzing attention as a function of sentence (1–4), type of problem (conflict or no-conflict), and performance (number of correct responses to the conflict problems). For both measures, the performance × conflict × sentence interaction emerged, Fs ≥ 2.68, Ps ≤ .006, ηp² ≥ .13. While incorrect (below median; Ncorrect = 0) responders did not distinguish between the conflict and no-conflict versions of any of the sentences, correct (above median) responders allocated greater attention to conflict versus no-conflict problems, particularly to the critical premise of those problems (see Table 2).

Table 2 Mean (SD) attention scores by sentence, type of problem, and performance (Study 2)

Performance correlated positively with attention, but only to the critical premise, and only for conflict problems: for fixation time, r = .29, P = .026; for revisits, r = .42, P < .001.

And again, it was not the case that correct responders paid more attention than incorrect responders overall. For the conflict-irrelevant sentence 1, performance even correlated negatively with attention: for fixation time, rs ≤ –.27, Ps ≤ .043, for both conflict and no-conflict versions; for revisits, r = –.29, P = .028, in the no-conflict version. For all other correlations, Ps > .05.

General discussion

The present studies used eye-tracking to analyze the attention that reasoners pay to the premises of problems. Two critical findings emerged: first, correct responders pay more attention than incorrect responders to the critical premise in conflict problems (i.e., problems that trigger an intuitive response that is incorrect). Second, they pay more attention to that premise when it is part of a conflict problem than when it belongs to a no-conflict problem. Incorrect responders are less able to discriminate between conflict and no-conflict problems. It is not the case that correct responders pay more attention overall than incorrect responders. Rather, they are particularly attentive to the critical premise that poses the intuitive-vs.-deliberative conflict.

These results are consistent with a two-stage account of reasoning (Evans, 1984, 1989; Mata et al., 2014), whereby even before people can reason intuitively or deliberatively about a problem, they need to pay close attention to its premises and represent them accurately. To be sure, accurate premise representation is a necessary, but not sufficient, condition for sound problem-solving. That is, people who do not pay attention cannot solve the problem correctly, but paying attention does not guarantee successful problem-solving. For instance, in Study 1, none of the participants who did not revisit the critical premise solved the conflict problem, but even among those who did revisit it, only 38.6% solved the problem. This two-stage framework offers a more fine-grained account of reasoning errors: there are incorrect responses that can be explained by faulty reasoning, even when the reasoner is operating with the correct premises. However, there are also incorrect responses that arise from reasoning about a problem whose premises were misrepresented, even if one’s reasoning is sound.

This attentional account suggests that incorrect responses sometimes emerge at a very early processing stage. However, De Neys and colleagues (see De Neys, 2012; De Neys & Bonnefon, 2013) suggest that the onset of incorrect responses occurs later, and that incorrect responders are able to detect conflict; they are simply not able to solve it. In the study that is most comparable to ours, De Neys and Glumicic (2008, Study 2) found that, for base-rate problems, incorrect responders took longer to solve conflict vs. no-conflict problems, and they reviewed the conflict-relevant information in the premises more for conflict vs. no-conflict problems, suggesting some degree of conflict sensitivity (though correct responders were overall more conflict-sensitive). However, in both the present studies and the studies by Mata et al. (2014), incorrect responders in general did not discriminate between conflict and no-conflict problems (see also Ferreira, Mata, Donkin, Sherman, & Ihmels, 2016; Mata & Almeida, 2014; Mata, Ferreira, & Sherman, 2013b, Study 3; Pennycook, Fugelsang, & Koehler, 2015). How can these findings be reconciled?

One possible explanation pertains to differences in methods: The “moving window” technique used by De Neys and Glumicic (2008) is a more obtrusive, less natural, and less sensitive way of assessing attention than eye-tracking, which is able to monitor spontaneous attentional processing, and pinpoint specific attention foci that were of interest to our hypothesis. Moreover, there might be individual differences, such that not all incorrect responders are alike, and that some of them, though not all, are able to detect conflict (Mevel et al., 2015; Pennycook et al., 2015; Pennycook, Cheyne, Barr, Koehler, & Fugelsang, 2014). To identify these responders, we calculated a subtraction score comparing the attention that responders paid to the critical premise in the conflict versus no-conflict versions of the problem (a positive score suggests sensitivity to conflict). Results in Study 1 indicate that there are indeed incorrect responders who show conflict sensitivity, but they are a minority: 33.3%–44.4% (depending on the measure) vs. 58.8%–82.4% of correct responders. In Study 2, 51.9%–55.6% of incorrect (below median) responders score positive vs. 90.3% of correct (above median) responders. One final possibility is that the studies simply did not have sufficient power to detect small effects.
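The subtraction score described above can be expressed as a small Python sketch. The function names are hypothetical, and the example values in the code are made up for illustration; they are not data from either study.

```python
def conflict_sensitivity(conflict, no_conflict):
    """Subtraction score: attention to the critical premise in the conflict
    version minus attention to it in the no-conflict version."""
    return conflict - no_conflict


def share_sensitive(scores):
    """Percentage of responders whose subtraction score is positive."""
    if not scores:
        raise ValueError("scores must not be empty")
    positive = sum(1 for s in scores if s > 0)
    return 100.0 * positive / len(scores)


# Made-up revisit counts (conflict, no-conflict) for four responders.
example_pairs = [(3, 1), (2, 2), (1, 2), (4, 1)]
example_scores = [conflict_sensitivity(c, n) for c, n in example_pairs]
example_share = share_sensitive(example_scores)
```

A score of zero is not counted as conflict-sensitive in this sketch, in line with the statement that a positive score suggests sensitivity. The same computation applies to either attentional measure, which is why the reported percentages vary depending on the measure.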

De Neys and Bonnefon (2013) discuss, as possible reasons for responding incorrectly, an inhibition failure (i.e., responders are able to detect the conflict, but they fail to override the intuitive response), as opposed to a monitoring failure (i.e., responders do not detect the conflict), or a storage failure (i.e., they do not even possess the correct knowledge of how to think properly about the problem). Our findings suggest a fourth possibility, which is aligned with Evans’ original heuristic-analytic theory (Evans, 1984, 1989): a representation or comprehension failure. It is not necessarily the case that participants do not have the correct logical reasoning principles stored or fail to inhibit the intuitive answer; rather, they seem to fail to represent or understand the problem accurately. If this is the case, even if participants possess adequate knowledge, and good monitoring and inhibition skills, the response is bound to be incorrect. Indeed, (1) even if one possesses the correct logical reasoning principles, one will fail to give the correct response if one is using them to reason about the wrong premises; (2) even if one has the ability to monitor conflict, no conflict will be detected if the conflict-relevant part of the premises is misrepresented or neglected; and (3) in that case, there is no need to inhibit competing biased responses. In fact, the responses that come to mind will not be considered biased in light of the mental representation of the problem.

In sum, the results presented here and those reported by Mata et al. (2014) suggest not a failure to store logical reasoning principles, but rather a failure to store the information in the premises. To be sure, we are not arguing against the possibility of inhibitory failure; indeed, we found some indication for it in the individual-differences analysis. What we are arguing for is a different kind of failure, from which reasoning errors might emerge: lack of attention or miscomprehension of the problem. According to the revised and extended heuristic-analytic theory of reasoning (Evans, 2006), analytic processes operate not only on the Type-1 representations of the problems, but they also include the ability to reset default representations of the problem. However, they function according to a satisficing principle, which more often than not leads to the acceptance of merely good enough representations. This seems to be the case for the majority of the incorrect problem-solvers in our studies.

In conclusion, research on judgment and thinking-and-reasoning has focused mainly on the quality of the reasoning processes. Our research suggests that it is just as important to consider the quality of the representations on which those processes are based. Accurate comprehension of a problem is the foundation on which sound reasoning is built.