Introduction

The ability to detect causal relationships in our environment has been critical in the evolution of our cognitive system, as it allows us to adjust our behavior to the causal structure of the environment. However, our tendency to detect causal patterns may sometimes make us perceive a causal link where there are only spurious coincidences. This is called the causality bias, also known as the causal illusion or the illusion of causality (Matute et al., 2011). Thus, causal illusions arise when people erroneously perceive a cause-effect relationship between events that are independent of each other. For example, a patient may erroneously interpret a spurious coincidence between following a bogus treatment and recovering from a disease as favorable evidence of a cause-effect relationship, even when those events are not causally related. Indeed, causal illusions are at the basis of the use of alternative medicine (defined as that which is not supported by evidence), as users develop the illusion that there is a causal relationship between the alternative treatment and their recovery (Blanco et al., 2014; Matute et al., 2011). The use of alternative medicine is a serious health problem today (e.g., Freckelton, 2012; Lim et al., 2011) and is probably one of the fields in which causal illusions can be most dangerous. In addition to alternative medicine, we can find examples of causal illusions in many important areas of everyday life, such as politics, education, societal issues, and personal beliefs. Indeed, the causality bias is associated with many problems, including ideological extremism (Lilienfeld et al., 2012), educational practices with little or no empirical support (Double et al., 2020), stereotypes and prejudices (Blanco et al., 2018; Hamilton & Gifford, 1976), superstitious and paranormal beliefs (Blanco et al., 2015; Griffiths et al., 2019), and pseudoscientific beliefs (Lindeman, 1998; Matute et al., 2019; Matute et al., 2011; Schmaltz & Lilienfeld, 2014; Torres et al., 2020).

Illusions of causality may arise, in real life and in the laboratory, either through personal experience or vicariously (i.e., by directly experiencing coincidences or by listening to or observing other people or commercial campaigns about co-occurring events, such as, for instance, observing people take pill X and feel better; see Matute et al., 2011). At the experimental level, causal illusions are typically studied using a standard contingency judgment task (Allan et al., 2005; Perales et al., 2005; Wasserman et al., 1996) in which a cover story is used to mimic the conditions under which such beliefs develop in real life. In this task, two events are presented over several trials, and participants are then asked to assess a possible causal link between them. For example, a fictitious new drug might be presented as the potential cause, and the remission of a fictitious disease in a series of fictitious patients might be the potential outcome (Matute et al., 2015). In each trial, the medical record of a fictitious patient is shown, in which the potential cause is present or absent (e.g., the patient takes or does not take the drug), and then the outcome either follows or does not (e.g., the patient recovers from the disease or not). In order to test for illusory causality, a null contingency scenario is typically used. That is, the contingency between the potential cause and the outcome is set to be null (e.g., the probability of healing is the same regardless of whether patients take the drug). Although participants should therefore conclude that there is no causal link, many participants in null contingency settings incorrectly conclude that the potential cause is producing the outcome (e.g., that the drug is producing the healing), which is interpreted as an illusion of causality.
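
For concreteness, the degree of contingency in these tasks is usually quantified by the ΔP index: the probability of the outcome given the cause minus the probability of the outcome given its absence. The following minimal Python sketch (ours, with illustrative counts not taken from any particular experiment) shows how ΔP is computed from the four cells of a 2 × 2 contingency table:

```python
def delta_p(a: int, b: int, c: int, d: int) -> float:
    """Delta-P contingency index from a 2x2 table.

    a: cause present, outcome present    b: cause present, outcome absent
    c: cause absent,  outcome present    d: cause absent,  outcome absent
    """
    return a / (a + b) - c / (c + d)

# Null contingency: the outcome is just as likely with the cause as without it,
# so Delta-P is zero even though cause-outcome coincidences (cell a) are frequent.
print(delta_p(a=15, b=5, c=15, d=5))  # 0.75 - 0.75 = 0.0
```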

As noted by Lilienfeld et al. (2009), due to the significant risks that cognitive biases (such as the causality bias) can pose to humanity, developing strategies to reduce them can be one of the most important achievements of modern psychological science. The purpose of the present research is to test one such educational strategy designed to debias the illusion of causality.

Factors that can prevent causal illusions

The available laboratory evidence suggests that many different strategies could be used to reduce the illusion of causality (see Matute et al., 2015, 2019, for reviews). In laboratory settings, the causality bias has been shown to increase when the probability of the potential cause, P(C), is high (e.g., when many of the fictitious patients take the drug; Allan & Jenkins, 1983; Hannah & Beneteau, 2009; Matute et al., 2011), and when the probability of the outcome, P(O), is high (e.g., when many of the fictitious patients heal; Allan & Jenkins, 1983; Allan et al., 2005; Buehner et al., 2003), and it is particularly intense when both P(C) and P(O) are high (Blanco et al., 2013). Moreover, when the experimental task is active and it is the participant who introduces the potential cause in each trial, it is the participant's probability of responding, P(R), that determines P(C). Thus, when participants' P(R) is high, they observe many patients exposed to the drug and few non-exposed patients, so the information they encounter is biased toward a high P(C). Importantly, in these active experiments, a P(R) close to 0.50 would allow participants to learn, in a balanced way, both what happens when the cause is present and what happens when it is absent. However, it has been shown that participants often tend to introduce the potential cause in more than 50% of the trials (e.g., they administer the drug to the majority of patients; Barberia et al., 2013; Blanco et al., 2011; Matute et al., 2019). If the (non-contingent) outcome is also programmed to occur frequently, thus mirroring real-life situations of spontaneous remission, this biased strategy increases the percentage of trials in which the cause and the outcome accidentally coincide, which in turn increases the causal illusion (Matute et al., 2015).
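
To illustrate how a biased P(R) distorts the evidence that participants encounter, consider the following hypothetical simulation (ours, not part of any of the studies reviewed here), which counts cause-outcome coincidences when the response and the outcome are programmed to be statistically independent:

```python
import random

random.seed(1)

def coincidence_rate(p_r: float, p_o: float, n_trials: int = 10_000) -> float:
    """Proportion of trials on which the response and the (independent) outcome co-occur."""
    hits = sum(
        random.random() < p_r and random.random() < p_o
        for _ in range(n_trials)
    )
    return hits / n_trials

# With balanced responding, about 37% of trials are cause-outcome coincidences;
# with the biased responding typically observed, about 67% are -- even though
# the true contingency is zero in both cases.
print(coincidence_rate(p_r=0.50, p_o=0.75))  # ~0.375
print(coincidence_rate(p_r=0.90, p_o=0.75))  # ~0.675
```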

Thus, in active settings in which the P(C) depends on the participants’ P(R), a very useful strategy to reduce the illusion would be to teach participants that, in order to assess causal relations, they need to test both what happens when they respond and when they do not respond. Indeed, it has been shown that providing explicit instructions to participants on how to respond efficiently reduces the causal illusion (e.g., Hannah & Beneteau, 2009; Matute, 1996).

Another procedure that has been shown to reduce the illusion of causality consists of providing information about potential alternative causes, so that participants are aware that other causes might be responsible for the observed effects (Vadillo et al., 2013). Presenting the information about the potential cause and effect in a foreign language or in a hard-to-read font (e.g., Costa et al., 2014; Díaz-Lago & Matute, 2019a, b) also reduces the causal illusion, probably because it makes participants think more slowly. One more possible way to prevent the causal illusion could be to provide information about a medicine's side effects. In a previous study, Blanco et al. (2014) showed that when people were told about the potential side effects of a fictitious drug, they used it less often and therefore developed a weaker illusion than those who were not informed about the side effects.

Debiasing interventions that reduce causal illusions

The strategies described above are all based on laboratory research testing which variables can reduce the strength of causal illusions. They provide important information about the types of variables that should be targeted when developing educational debiasing strategies to prevent causal illusions in our society. To date, however, research applying this knowledge to develop and test educational debiasing strategies designed to reduce the illusion of causality is still scarce. To our knowledge, only two studies report developing and testing an educational intervention specifically designed to reduce the illusion of causality (Barberia et al., 2013, 2018).

The study reported by Barberia et al. (2013) was conducted with adolescents. The experimental group received the intervention before assessment, whereas the control group did not. The intervention included two phases: a cognitive bias induction phase (hereafter, induction phase) and a training phase. These two phases were followed by the assessment of the causal illusion. In the bias induction phase, the researchers presented a small piece of ferrite as a "miraculous product" that supposedly improved cognitive and physical skills. Participants were asked to experience the benefits of the product while performing a series of cognitive and physical tasks. Once they were convinced that the miraculous product had improved their physical and cognitive skills, the training phase started. At this point, the researchers explained that the alleged effects of the product were fake, and trained the participants on the need to ask questions and to conduct experiments and tests with adequate control conditions before concluding that a product or technique is causally effective. The intervention was thus designed to help participants understand the need for critical and scientific thinking and for the experimental control of variables.

The efficacy of this debiasing procedure was then tested during the subsequent assessment phase using a standard contingency learning task. The control group received only the assessment phase. This phase included both a null contingency scenario, where taking a drug did not produce more healings than not taking it, and a positive contingency scenario, where taking a drug did produce more healings than not taking it. As expected, the control group showed an illusion of causality in the null contingency scenario, and this illusion was reduced in the group that had received the two phases of the intervention. Also, as expected, the positive contingency task generated accurate causal judgments in both groups.

More recently, Barberia et al. (2018; see also Rodríguez-Ferreiro et al., 2021) showed the effectiveness of another, similar intervention in reducing causal biases in undergraduate students. As in Barberia et al. (2013), the intervention included two phases, an induction phase and a training phase, which were followed by an assessment phase. The bias induction phase was different from the one in the 2013 study, but in both cases the aim was for students to become aware of their own biases before they received the training phase. In Barberia et al.'s (2018) study, participants experienced two well-known cognitive biases during the bias induction phase: the Forer (1949) effect and the confirmation bias (Wason, 1960). The Forer effect consists of accepting a vague and generalized personality report as if it were an accurate description of one's personality. Indeed, these personality reports are so general that they could apply to anyone, but participants tend to believe that they describe their own personality (Forer, 1949; Snyder et al., 1977). The confirmation bias refers to the tendency to search for, recall, and interpret information in a way that confirms a belief, an expectation, or a hypothesis, while ignoring alternative information that might lead to its rejection (Nickerson, 1998). In the subsequent training phase, students received training about both biases and the original studies were discussed, following a training-in-bias methodology (Larrick, 2004). In addition, the training phase was complemented with the training-in-rules methodology (Larrick, 2004), highlighting the considering-the-opposite strategy to reduce the confirmation bias. Presumably, this intervention should make students aware of, and more vigilant about, their own biases. This effect should therefore also be evident in a subsequent reduction of the illusion of causality. As expected, the control group showed an illusion of causality in a null contingency scenario, and this illusion was reduced in the group that received the intervention. In that research, the assessment phase included only the null contingency scenario (i.e., a positive contingency scenario was not included).

Importantly, the similar (but different) interventions used by Barberia et al. (2013) and Barberia et al. (2018) have both proven effective in reducing the causal illusion. One may ask why an intervention built around cognitive biases other than the causal illusion (i.e., the Forer effect and the confirmation bias), as in Barberia et al. (2018), reduced the illusion of causality. We believe that these phenomena are influenced by factors similar to those facilitating the development of illusions of causality. First, the Forer effect might reflect, at least in part, the tendency to overweight information consistent with prior beliefs. This predisposition might also be at the basis of illusions of causality, where a tendency to overweight coincidences between the candidate cause and the expected outcome (i.e., confirmatory information) might lead to stronger illusions (Griffiths et al., 2019). Second, the confirmation bias as experienced when solving the Wason task (see the Method section) might be analogous to the information search strategy that is typically observed in causal illusion tasks, in which people tend to look for causal information by frequently introducing the candidate cause, again promoting the development of the illusion (Barberia et al., 2013).

Another question concerns the necessity of the induction phase. Given that mere knowledge of cognitive failures does not seem to be sufficient to eradicate them (Larrick, 2004), it may be necessary to complement this knowledge with a prior induction phase. Indeed, debiasing interventions often challenge intuitions and require that people recognize their own biases (Lilienfeld et al., 2014). This is particularly difficult because most people are able to recognize biases in others, but not in themselves, a phenomenon known as the bias blind spot (Pronin et al., 2002, 2004). The induction phase might thus be appropriate to overcome the bias blind spot, because it exposes participants to situations in which their cognitive biases will arise, thereby increasing awareness of their own biases (e.g., Barberia et al., 2013, 2018). Awareness of one's own biases is important also because a lack of personal involvement can reduce the effectiveness of debiasing strategies (Arkes, 1991; Harkness et al., 1985). Last but not least, one more problem often associated with debiasing strategies is that some interventions can generate a backfire effect and, ironically, reinforce preexisting biases (Lewandowsky et al., 2012). The induction phase could help prevent the backfire effect because it forces participants to confront their own biases, which should increase their receptivity to the evidence provided during the subsequent training phase (Lewandowsky et al., 2012).

In sum, a possible reason for the effectiveness of the two interventions just described (Barberia et al., 2013, 2018) could lie in their use of an induction phase before the training phase. To the best of our knowledge, however, this issue has not yet been studied. Investigating the role of the induction phase in these debiasing strategies is the purpose of the present study.

The present study

The present experiment aims to examine the role of the induction phase in a debiasing intervention designed to reduce the causality bias. To this end, we used the intervention of Barberia et al. (2018) because it has already been shown to be effective with undergraduate students, the target population of the present study. This intervention is also more general than the one presented by Barberia et al. (2013), as it uses different biases during the induction phase and during assessment. Thus, if our intervention proves effective, it may have broader applicability for reducing a larger number of biases in future studies.

In addition to the two groups used by Barberia et al. (2018), we included a group that did not receive the induction phase. We expected the causal bias to develop in the control group and to be reduced at least in the group replicating the intervention of Barberia et al. (2018). The group lacking the induction phase allowed us to test whether this phase is needed.

We also added a positive contingency problem, which had not been included in Barberia et al. (2018). We expected all groups to be accurate in solving this positive problem (Barberia et al., 2013), so if the intervention reduced causal judgments in the positive contingency problem, such an effect would be best explained as a general increase in skepticism.

Method

Ethics statement

The ethics committee of the University of Deusto approved the procedure of the present study.

Participants

A total of 234 Teacher Education undergraduate students took part in this experiment (62% women, 38% men; ages 18–31, M = 20.23, SD = 1.78). Participants were in their first to fourth year of college, and no significant differences in causal judgment by year were found, F(2,113) = 0.18, p = 0.831. Participants were randomly assigned to one of three groups: induction + training (IT, n = 79), training (T, n = 80), and control (C, n = 74). There were no significant age differences between the groups, F(2,230) = 0.09, p = 0.914. Only one participant, in the training group, failed to provide their age.

The study was conducted during regular class time, within the framework of an academic program designed to increase scientific thinking among students. Because the experiment was conducted in the context of an academic activity, all students who attended the class could participate in the study if they wished to. However, the data were only collected from those students who, in addition, gave their informed consent by clicking a button to submit their responses anonymously at the end of the study.

Design

Table 1 shows the design of the experiment. The three groups were named induction + training (IT), training (T), and control (C), as a function of the phases that they received before the assessment phase, which was the critical phase in which we expected to observe differences between groups. Group IT received the complete intervention, including both induction and training, before assessment, as in Barberia et al. (2018). Group T allowed us to test the role of the training phase alone; thus, only the training phase was provided in this group before assessment. Finally, in Group C, assessment took place in the absence of any intervention. Nonetheless, for ethical reasons, the two groups that had not received all phases of the intervention received the missing phases after the experiment ended, that is, once the assessment phase had been completed.

Table 1 Design summary of the experiment

Procedure

The study took place in three identical sessions of about 90 min each, in three computer classrooms, using one desktop computer per participant. Participants were seated about one meter apart and were encouraged to work individually on the experiment. In order to control for potential instructor effects, the same researcher conducted the intervention phases (i.e., induction and training) whenever they occurred before assessment, and an additional researcher was present to help with questions and technical issues. The intervention was a replication of Barberia et al. (2018).

Bias induction phase

The induction phase started with a staged presentation of a fake psychological theory that we called "modes of thought". According to this theory, a personality description can be obtained by analyzing responses in tasks that involve basic cognitive processes such as attention, perception, and learning. After briefly introducing the fake theory, we asked participants to perform two computerized tasks ostensibly related to it.

The first task was presented as a personality test, but it was actually fake and designed to induce the Forer (1949) effect. Participants were asked to complete a test inspired by the online brain quiz of Sommer + Sommer (https://braintest.sommer-sommer.com). The first part consisted of a point-and-click version of the Stroop test. The second part was a pattern selection test, in which participants were asked to choose the colored geometric figures most similar to a given target. The computer then supposedly analyzed the data generated from these tests and presented a fake personality report to each student. This report used the original vague phrases from Forer's (1949) study and was identical for all participants (Spanish translation from https://es.wikipedia.org/wiki/Efecto_Forer). In order to increase the perceived accuracy of the report, the descriptions were adapted to the participant's gender. The order in which the different phrases of the report were presented was randomized for each participant, to prevent participants from noticing that the report their peers were reading was identical to theirs. After reading their supposedly personalized reports, participants were asked to rate how accurate their personality description was, on a scale from 0 to 100.

The second task in the induction phase was presented as a reasoning test and consisted of a computerized version of Wason's (1960) 2-4-6 task, adapted from Grobman (2003). In this task, participants have to discover a rule that determines the relationship between three numbers. First, a sequence of three numbers that complies with the target rule is presented, for instance, 2-4-6. Next, participants are asked to propose another sequence of three numbers in order to test whether it fits the rule, and to describe the rule that they believe underlies the given sequence. The computer then provides feedback indicating whether or not the new sequence complies with the rule. Participants are allowed to continue testing sequences of three numbers and describing the rule they think the computer is using until they are sure they have identified the correct rule (up to a maximum of 20 trials). The typical strategy involves testing sequences that fit (i.e., confirm) the hypothesized rule. At the end of this task, participants were asked to what degree (on a scale from 0 to 100) they were confident that they had discovered the rule. We registered (a) whether the rule they stated was correct, (b) the confidence level of their response (0–100), regardless of whether the response was correct, and (c) the number of sequences they tested (from 0 to 20) before giving their final answer (also regardless of correctness). Participants were not informed about whether the rule they had guessed was correct until the subsequent training phase.

In this task, participants generally follow a positive testing strategy: they first form a hypothesis (e.g., "numbers increasing by two") and then generate sequences that fit that hypothetical rule. However, this positive testing strategy can involve a confirmation bias (see Klayman & Ha, 1987; Nickerson, 1998) and is not effective in this task, in which the rule is very general (i.e., "increasing numbers"). Alternatively, a strategy that does lead to the discovery of the correct rule is to "consider the opposite", that is, to test sequences of numbers that do not satisfy the hypothesized rule (e.g., 3-6-9), so that the hypothesis is falsified and a new, broader hypothesis has to be developed.
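
The logic of the task can be captured in a few lines of code. The sketch below is ours; the hidden rule ("any increasing sequence") follows Wason's original design, while all interface details are invented for illustration:

```python
def fits_rule(a: int, b: int, c: int) -> bool:
    """The experimenter's hidden rule: any strictly increasing sequence."""
    return a < b < c

# A confirmatory tester who hypothesizes "numbers increasing by two" proposes
# only sequences that fit that hypothesis -- and receives "yes" every time,
# so the hypothesis is never challenged.
for triple in [(2, 4, 6), (10, 12, 14), (1, 3, 5)]:
    print(triple, fits_rule(*triple))  # all True

# Considering the opposite: a sequence that should fail under the hypothesis
# also gets "yes", falsifying "increasing by two" and pointing to a broader rule.
print((3, 6, 9), fits_rule(3, 6, 9))  # True
```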

Training phase

The training phase consisted of explaining the two cognitive biases that had been induced during the induction phase through the Forer (1949) and Wason (1960) tasks. This was complemented with the considering-the-opposite strategy, following a training-in-rules methodology (Larrick, 2004). For this purpose, Forer's (1949) original personality report was first presented. It was then explained that the personality test the participants had completed was intentionally fake and that the personality report they received was identical for all of them. This was complemented with a discussion of the personal validation fallacy (i.e., the Forer effect). Next, Wason's (1960) 2-4-6 task was described, illustrating the common confirmatory testing strategy described above. This was complemented with a discussion of the confirmation bias, focused on how to reduce it through training-in-rules (Larrick, 2004) and the considering-the-opposite strategy. Finally, we provided several examples of everyday situations that might involve the confirmation bias (e.g., horoscope reading, personality assessment through graphology, the effect of the full moon, or the questionable relationship between joint pain and relative humidity; Lilienfeld et al., 2011), and discussed how the considering-the-opposite strategy could help prevent it.

Assessment phase

Assessment of the causal illusion was conducted using a standard contingency judgment task (e.g., Matute et al., 2015). All participants were presented with a null contingency and a positive contingency problem. In both cases, they were asked to imagine being a medical doctor whose task was to determine whether a fictitious drug was effective in providing relief to a series of fictitious patients suffering from a fictitious disease. The order of presentation of the null and the positive contingency problems was randomly determined for each participant. In the first problem presented, the fictitious drug was called Batatrim and the fictitious disease was called Lindsay Syndrome. In the second problem presented, the fictitious drug was called Dugetil and the fictitious disease was called Hamkaoman Syndrome.

The procedure was exactly the same in both problems. The records of 40 fictitious patients suffering from the disease were presented sequentially, one per trial. In each trial, participants decided whether or not they wanted to administer the drug to the patient. This information was collected in order to calculate P(R), that is, the number of trials in which the participant administered the drug to the fictitious patient divided by the total number of trials. Participants then observed the outcome, O, for that patient, that is, whether the patient was relieved or not. At the end of each problem, that is, after all 40 trials had been completed, participants were asked to evaluate the effectiveness of the drug on a scale ranging from 0 (definitely not effective) to 100 (definitely effective). Causal judgments on this scale, along with the P(R) for each problem, were our main dependent variables.
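
The structure of one 40-trial problem and the computation of P(R) can be sketched schematically as follows (our illustration, not the study's software; the 80% response rate simulates the biased responder typically observed):

```python
import random

random.seed(7)

def draw_outcome(gave_drug: bool, p_o_drug: float, p_o_no_drug: float) -> bool:
    """Whether the fictitious patient recovers, per the programmed probabilities."""
    return random.random() < (p_o_drug if gave_drug else p_o_no_drug)

n_trials = 40
# Simulated participant who administers the drug on about 80% of trials.
responses = [random.random() < 0.80 for _ in range(n_trials)]
# Null contingency parameters: P(O) = 0.75 with and without the drug.
outcomes = [draw_outcome(r, p_o_drug=0.75, p_o_no_drug=0.75) for r in responses]

p_r = sum(responses) / n_trials  # trials with the drug / total trials
print(f"P(R) = {p_r:.2f}")
```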

In the null contingency condition, the probability that patients recovered from the disease, P(O), was programmed to be high (0.75) regardless of whether or not the drug was administered. Therefore, the drug was ineffective (i.e., the contingency was 0) because it did not increase the probability of healing. Specifically, both among the patients who received the drug and among those who did not, 6 out of every 8 healed. This high rate of recovery was included because it induces the development of causal illusions (e.g., Allan et al., 2005; Matute et al., 2019). Because the programmed contingency between the drug and the healings is zero in this case, any judgments significantly higher than zero are interpreted as an illusion of causality. Following Barberia et al. (2018), we expected this bias to develop in the control group, and we expected our intervention to reduce it at least in group IT, with group T testing whether or not the induction phase is necessary for debiasing.

In the positive contingency condition, the probability that patients recovered from the disease was programmed to be equally high (0.75) when the drug was administered, but low (0.125) when it was not. Therefore, in this case the drug was effective (i.e., the contingency was 0.625) because it increased the probability of healing. Specifically, 6 out of every 8 fictitious patients receiving the drug healed, whereas only 1 out of every 8 healed without it. Causal judgments close to 62.5 are interpreted in this case as accurate judgments of the relationship between the drug and the healings. We expected all three groups to be accurate on this problem, as is usually the case in positive contingency tasks.
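
In terms of the programmed frequencies, the two conditions work out as follows (a worked check using the ΔP index described in the introduction):

```python
# Null condition: 6 of every 8 patients heal, with or without the drug.
null_contingency = 6 / 8 - 6 / 8          # 0.75 - 0.75 = 0.0

# Positive condition: 6 of 8 heal with the drug, 1 of 8 without it.
positive_contingency = 6 / 8 - 1 / 8      # 0.75 - 0.125 = 0.625

print(null_contingency, positive_contingency)  # 0.0 0.625
```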

Results

Null contingency task

The critical results of this research are those of the null contingency (i.e., causal illusion) learning task. These are shown in Fig. 1. Panel A shows the mean causal judgments and Panel B shows the probability of responding, P(R), in the null contingency task for all three groups. Recall that because the programmed contingency was zero in this task, there was no causal relationship between the drug and the healings. Thus, causal judgments above zero in this condition are indicative of a causality bias.

Fig. 1 Mean causal judgments and mean P(R) in the null contingency task. Note. Panel A shows mean causal judgments and Panel B shows mean P(R) in the null contingency (causal illusion) task across groups. Error bars represent the standard error of the mean.

As can be observed in Fig. 1A, group C developed a relatively strong causality bias, thereby replicating previous studies in the literature. However, this illusion was reduced in the two groups that had received the intervention, groups T and IT. The figure also suggests that the induction phase was not critical in reducing the illusion, as groups IT and T gave similar causal judgments. These impressions were confirmed by a one-way ANOVA on the causal judgments, which showed a significant main effect of Group, F(2,128) = 5.76, p = 0.004, ηp2 = 0.008. Tukey post hoc comparisons revealed that participants in groups IT and T developed a weaker causal illusion than participants in group C, t(86) = -3.14, p = 0.006, d = -0.68, and t(83) = -2.72, p = 0.020, d = -0.66, respectively. No significant differences were observed between groups IT and T, t(87) = -0.37, p = 0.926, d = -0.07.
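
For readers wishing to reproduce this kind of analysis, a one-way ANOVA followed by Tukey's HSD can be run in a few lines. The sketch below uses SciPy with fabricated placeholder judgments, not the study's data, and is only one of several possible toolchains:

```python
from scipy import stats

# Hypothetical causal judgments (0-100) for three groups; not the real data.
judgments_IT = [20, 35, 10, 45, 30]
judgments_T = [25, 40, 15, 30, 35]
judgments_C = [60, 70, 55, 80, 65]

f, p = stats.f_oneway(judgments_IT, judgments_T, judgments_C)
print(f"F = {f:.2f}, p = {p:.4f}")

# Tukey's HSD for all pairwise comparisons (requires SciPy >= 1.8).
print(stats.tukey_hsd(judgments_IT, judgments_T, judgments_C))
```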

The other critical variable in this task is depicted in Fig. 1B, which shows the probability of responding, P(R), that is, the number of trials in which the participants chose to administer the drug divided by the total number of trials. Note that the probability of responding also reflects the probability that the cause is present, so it indicates how often participants exposed themselves to the potential cause. A P(R) of 0.50 would therefore be the ideal condition, as it would allow participants to be equally exposed to what happens when the cause is present and when it is not. As previously noted, however, participants usually tend to respond with a higher P(R), a condition that provides them with biased data and increases their illusion.

As can be observed in Fig. 1B, group C administered the drug in more trials than the other two groups. The probability of drug administration was reduced in the two groups that received the intervention, groups T and IT. These impressions were confirmed by a one-way ANOVA on P(R), which showed a significant main effect of Group, F(2,128) = 4.70, p = 0.011, ηp2 = 0.068. Tukey post hoc comparisons revealed that the average P(R) in group C was significantly higher than in group IT, t(86) = 2.88, p = 0.013, d = 0.57, and in group T, t(83) = 2.38, p = 0.049, d = 0.57, but the difference between groups IT and T was not significant, t(87) = -0.47, p = 0.888, d = -0.98. Thus, in line with the judgment analyses described above, the P(R) analyses also suggest that the intervention was successful and that the induction phase was not the critical aspect of the intervention, as both groups, IT and T, showed a similar reduction of P(R) in the null condition.

Positive contingency task

Figure 2A shows the mean causal judgments and Fig. 2B shows the mean P(R) in the positive contingency task. Recall that in this task the contingency was positive, so there was a causal relationship between the drug and the healings. Thus, all groups should be similarly accurate in their judgments. Figure 2A suggests that groups T and C detected the positive contingency accurately, whereas group IT possibly suffered from generalized skepticism. These impressions were confirmed by a one-way ANOVA on causal judgments, which showed a significant main effect of Group, F(2,128) = 6.84, p = 0.002, ηp2 = 0.0120. Tukey post hoc comparisons revealed that participants in group IT gave lower causal judgments than participants in group T, t(87) = -2.94, p = 0.011, d = -0.68, and group C, t(86) = -3.43, p = 0.002, d = -0.79. No significant differences were observed between groups T and C, t(83) = -0.63, p = 0.804, d = -0.17.

Fig. 2 Mean causal judgments and mean P(R) in the positive contingency task. Note. Panel A shows mean causal judgments and Panel B shows mean P(R) in the positive contingency task across groups. Error bars represent the standard error of the mean.

As can be observed in Fig. 2B, the P(R) was similar in all three groups. Although it appeared somewhat higher in group C than in group T, and in turn higher in group T than in group IT, a one-way ANOVA on P(R) showed a non-significant effect of Group, F(2,128) = 1.04, p = 0.558, ηp2 = 0.020. Thus, as we expected, the intervention did not affect the P(R) during the positive contingency task.

Forer and Wason's tests

During the induction phase of our intervention, participants completed a Forer and a Wason test. Although these were not our critical variables, we present a summary of these results here in order to verify that the training phase worked as expected. It is important to recall that participants in group T received training about the Forer and Wason effects before they completed these tests in the post-experimental session. Therefore, the results of this group in these tasks are not directly comparable to those of the other groups, who completed the tests before receiving any training. This being said, we present these comparisons because they might throw some light on future educational research examining how a training phase similar to ours may improve people's performance on the Wason and Forer tests. The degree to which participants in groups IT and C believed that their personality report (i.e., the Forer effect) was accurate was high and similar in both groups (M = 80.64, SD = 18.14, in group IT; M = 76.86, SD = 21.18, in group C). As we expected, however, it was reduced in group T (M = 46.03, SD = 31.70). This was confirmed by a one-way ANOVA, which showed a significant main effect of Group, F(2,219) = 45.38, p < 0.001, ηp2 = 0.293. Tukey post hoc comparisons revealed that the mean score in group T was significantly lower than the mean score in group IT, t(150) = -8.72, p < 0.001, d = -1.34, and in group C, t(144) = -7.61, p < 0.001, d = -1.13. There was no statistically significant difference between groups IT and C, t(144) = 0.93, p = 0.619, d = 0.19. These results suggest that providing the training on the Forer effect prior to its induction was effective in reducing this effect.

Table 2 shows the main results observed in the Wason task. The proportion of participants who correctly identified the rule was low and similar for groups IT and C, but significantly higher in group T, for which the test took place after the training had been provided.

Table 2 Means (and SDs) for Wason test across groups

Pairwise chi-square tests revealed a significant association between group (IT vs. T) and whether or not participants discovered the rule in the Wason task, χ2(1) = 37.1, p < 0.001, and between groups T and C, χ2(1) = 29.5, p < 0.001, but not between groups IT and C, χ2(1) = 0.43, p = 0.513. No significant differences were found in the confidence scores between groups, F(2,217) = 2.13, p = 0.121, ηp2 = 0.019, but we found a significant main effect of Group on the number of trials used before stating a final rule, F(2,217) = 5.46, p = 0.005, ηp2 = 0.048. Tukey post hoc comparisons revealed that the mean number of trials in group IT was significantly lower than in group T, t(150) = -3.16, p = 0.005, d = -0.53, and in group C, t(144) = -2.39, p = 0.046, d = -0.43. There was no statistically significant difference between groups T and C, t(144) = 0.70, p = 0.761, d = 0.11. Although groups IT and C both had a very low percentage of correct responses, they showed a high level of confidence and, in the case of group IT, they also used few trials to state a rule, quickly coming to a wrong conclusion. Together with the Forer results, this suggests that explaining the Forer and Wason effects before administering the tasks was effective in reducing their strength.
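
Analyses of this kind, a chi-square test of independence on a 2 × 2 table of group by rule discovery, can be sketched as follows, again with invented counts rather than the real frequencies:

```python
from scipy.stats import chi2_contingency

# Rows: groups IT and T; columns: rule discovered (yes, no). Counts are invented.
table = [[8, 71],
         [42, 38]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.1f}, p = {p:.4f}")
```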

Discussion

Before discussing the specific contributions of this study, we would like to highlight several theoretical and practical contributions of the present research. On a theoretical level, this work contributes to the scientific literature by providing evidence of the efficacy of a debiasing intervention (extending a previous result by Barberia et al., 2018), which, according to some, should be one of the most relevant goals of modern psychology (Lilienfeld et al., 2009). Research on how to debias against a variety of cognitive biases is relatively sparse (see Arkes, 1991; Larrick, 2004; Lilienfeld et al., 2009, for reviews). In particular, we do not know of any other educational intervention specifically designed to debias people against causal illusions other than the ones discussed herein (i.e., Barberia et al., 2013, 2018). Importantly, this intervention has been shown to be effective in adults without producing common undesirable consequences such as the backfire effect (Lewandowsky et al., 2012). In particular, this paper provides evidence for the efficacy of training-in-rules for the reduction of causal illusions. Other debiasing approaches, such as statistical reasoning (e.g., Milkman et al., 2009), training in formal logic (Lehman & Nisbett, 1990), or training in critical thinking (Lilienfeld et al., 2009), have yielded mixed or weak evidence of efficacy.

In addition to replicating and extending previous debiasing results, the specific goal of this study was to examine the effect of the bias induction phase in the abovementioned debiasing intervention against the causality bias. Thus, in the present study we aimed to test whether the induction phase used by Barberia et al. (2018) was a critical component of the intervention. We randomly assigned participants to one of three groups as a function of the phases they were exposed to before the assessment of the illusion of causality: induction + training (IT), training (T), and control (C). This experimental design allowed us to examine the effect of the induction phase in reducing the causal illusion.

We observed that the intervention in groups IT and T decreased the illusion of causality in the assessment phase, as evidenced by the lower causal judgments provided by these two groups in the null contingency learning task compared to the control group. That is, the intervention was effective in reducing the causality bias regardless of whether the induction phase was present (group IT) or not (group T). Importantly, groups IT and T also showed a lower P(R) than the control group during assessment. As noted in the introduction, a P(R) close to 0.50 should be preferred, as it allows participants to learn what happens both when the cause is present and when it is not. However, most participants tend toward a higher P(R), which is also what we observed in the control group in the present research. Thus, the reduction of this variable in groups IT and T is an additional indication that the intervention was successful.

Moreover, the induction phase also generated lower causal judgments in the positive contingency task: group IT gave lower judgments than the control group, whereas group T did not. Because the actual causal relationship was positive in this condition, and because the judgments of the control group in this task were highly accurate, there is no reason why group IT should show a reduced judgment. The lower causal judgments observed in group IT in this task can probably be interpreted as a general increase in skepticism. The over-skeptical judgments of the participants in group IT in the positive contingency task were not reflected, however, in their P(R), which was similar to that of the other two groups in this scenario.

In sum, in the null contingency scenario the intervention was successful in reducing the causality bias as well as the P(R) of the participants, regardless of whether the induction phase was present or not. At the same time, however, the positive contingency scenario showed that the intervention promoted a more generalized skepticism when the induction phase was included.

Given the current replication crisis in psychology (Camerer et al., 2018; Ioannidis, 2005; Open Science Collaboration, 2015), it is important to replicate and extend those debiasing strategies that appear to work. Overall, the results of the present study not only replicated but also extended those of Barberia et al. (2018), whose intervention group showed a weaker causal illusion than the control group in the null contingency task. However, Barberia et al. (2018) only investigated the effect of the full intervention, including the induction phase, and only in a null contingency scenario. Thus, it was not possible to conclude whether the results they observed could occur in the absence of the induction phase, or whether they could be partly due to a generalized increase in skepticism, as observed in the present experiment. In the present study, the induction phase appeared to result in excessive skepticism or caution. In contrast, the intervention without induction seems to have encouraged a healthier skepticism (Lewandowsky et al., 2012), protecting participants from the causal illusion while ensuring that they were able to accurately recognize a positive causal relationship when it was present.

Interestingly, the work of Barberia et al. (2013) showed that the intervention (which included induction) reduced the causal illusion in the null contingency task while preserving accurate causal judgments in the positive contingency task. That is, unlike the present results, they did not observe a generalized promotion of skepticism. Several procedural differences could be responsible for the observation of generalized skepticism in the present research but not in the study by Barberia et al. (2013). One notable difference between the two studies lies in the sample: adolescents in the 2013 study, undergraduate students in the present research as well as in Barberia et al. (2018). In addition, the training phase of the 2013 study focused on improving scientific thinking through a tutorial on scientific methods and experimental design, including reasoning about cause-outcome relationships. By contrast, in the 2018 study and in the one presented herein, this phase was more general and focused on reducing other cognitive biases, such as the Forer and Wason effects, through training-in-rules (Larrick, 2004) and the considering-the-opposite strategy. Thus, the present research indicates that an induction phase focused on several biases and followed by a training-in-rules phase seems to increase general skepticism, whereas a bias induction phase more focused on the illusion of causality, followed by training on the scientific method and the experimental control of variables, seems to promote a healthier skepticism, according to the results of Barberia et al. (2013).

On a practical level, and in light of the present results, if generalized skepticism is not desired, an intervention without the induction phase should in principle be preferred, at least when working with adults. However, an induction phase focused on the causality bias may be advisable to promote healthy skepticism if complemented by training based on the scientific method and the experimental control of variables (Barberia et al., 2013).

We would also like to highlight several contributions of this research to the science and practice of education. To the best of our knowledge, our study is the first to show the effectiveness of a debiasing intervention in future teachers. Thus, evidence-based interventions, such as the one presented herein, are an effective approach to reduce causal biases and increase skepticism among future teachers. This is important given that pseudoscience is particularly problematic in education, as it affects the quality of teaching and, therefore, the academic performance of students (CERI, 2007). Despite current scientific advances, pseudoscience is increasingly present in schools. For example, different studies indicate that teachers believe in a substantial number of myths about the brain related to how people learn (Dekker et al., 2012; see Ferrero et al., 2016, for a meta-analysis). There are also several popular educational programs that include myths and misconceptions and are not evidence-based (Busso & Pollack, 2015; Sylvan & Christodoulou, 2010). Belief in pseudoscientific educational practices can result in the avoidance of valid conventional practices, similar to what occurs with pseudoscience in the health domain (Freckelton, 2012). Depriving children of evidence-based practices can have particularly negative consequences for children with special needs, learning disabilities, or disadvantaged backgrounds, which threatens equity (OECD, 2012). In addition, the impact that pseudoscientific practices have on education is often underestimated, and many educational professionals, although with the best of intentions, believe that there is little or no harm in trying alternative practices with their students (Smith, 2015). According to the results of laboratory experiments on the causal illusion (Blanco et al., 2014), it could be argued that the belief that pseudoscience in education is harmless can increase its use in the classroom. Therefore, in everyday life, providing information about the side effects of pseudoscience in education could be a valuable step toward reducing the frequency of its use, and thus the coincidences between using pseudoscientific methodologies and learning. In addition, children and adolescents are especially vulnerable to pseudoscientific beliefs, as they are at a crucial stage in the development of their reasoning and critical thinking skills (Gopnik & Graf, 1988; Piaget & Cook, 1952). Children are also potential victims of adults' decisions and educational policies derived from misbeliefs. For example, the misbelief that vaccines cause autism (ASD) has prompted some parents to refuse to vaccinate their children (McDonald et al., 2012), with the consequent impact on the evolution of measles in children from high-income countries (Trentini et al., 2019).

For all these reasons, reducing the illusion of causality among future teachers is of great importance, and the current research is a step in that direction. In the future, it would be interesting to include a follow-up evaluation through a longitudinal study. This may be important given that data from undergraduate students suggest that misinformation may persist unless it is addressed repeatedly and explicitly (Ecker et al., 2017; Winer et al., 2002). In addition, it would be of great interest to include additional generalization measures to assess whether the effect of the intervention transfers to different problems and contexts (Morewedge et al., 2015).