Introduction

How do people change their minds about moral matters? People generally tend to be conservative about changing their moral decisions (e.g., Haidt, 2001; Stanley et al., 2018). In five experiments, we examined how quickly people change their minds about a situation they initially judged as morally unacceptable, when they were prompted to think about alternative possibilities. Suppose a man in an airplane about to take his seat in front of you calls the stewardess and says he does not want to sit next to a Muslim passenger seated in the row, and he tells the stewardess the passenger must be moved to another seat. To what extent is his behavior morally acceptable, in your opinion? Suppose you decide it is not at all morally acceptable, but you try to imagine very quickly whether there could be some circumstances in which his behavior would be morally acceptable. You might wonder whether the Muslim passenger had been rude to him so the reason he wanted him to be moved wasn’t to do with the man’s religion. Given this imagined alternative circumstance, to what extent do you now think the man’s behavior is morally acceptable?

Our first question is whether immoral actions become less morally unacceptable when people imagine how they would have been moral, even in the absence of further facts. We aim to examine whether alternative possibilities in which an action could be considered morally justified rather than immoral, are rapidly accessible or whether the moral imagination requires effortful reflection. Intuition and deliberation may interact in various ways (e.g., Bialek & De Neys, 2017; Evans & Stanovich, 2013; Greene et al., 2001; Gürcay & Baron, 2017; Kahneman, 2011), and intriguingly, people seem to assume that immoral events are impossible at first sight (e.g., Phillips et al., 2015; Phillips et al., 2019; Phillips & Cushman, 2017). The more possible an event is, the more moral it is judged to be (e.g., Shtulman & Tong, 2013). Hence, we test the hypothesis that people can rapidly access counterfactual alternatives that go against their initial judgment that an action is immoral.

Our second question is about unreasonable actions. Suppose the man had instead told the stewardess he did not want to sit next to any passenger and told her to move everyone in the row to somewhere else. To what extent is this behavior reasonable? Can you think of alternative circumstances in which it would have been rational? Our question is, granted that immoral actions can come to be considered less morally unacceptable, can unreasonable actions seem less irrational? Given that both immoral and irrational behaviors violate prescriptive norms, and irrational actions are also sometimes considered impossible at first sight (e.g., Phillips & Cushman, 2017), we test the hypothesis that people can also rapidly access counterfactual alternatives that go against their initial judgment that an action is irrational.

Of course, people can update their moral judgments as they learn new information, since the context of the action can change the valence of their evaluation (e.g., Andrejević et al., 2020; Simpson et al., 2016; Tepe & Aydinli-Karakulak, 2019). Our interest is whether they can do so even in the absence of further information, merely as a result of imagining alternatives. The current theories that guide our hypothesis are drawn from three separate lines of inquiry: moral judgment updating, everyday non-monotonic reasoning, and counterfactual imagination, which we consider in turn.

A key issue for theories of moral judgment has been to explain how people can tolerate exceptions to their belief in a moral principle, such as doing no harm to others, when a dilemmic conflict exists, such as needing to protect others (e.g., Goodwin, 2017; Gray & Keeney, 2015; Haidt, 2012; Mikhail, 2013; Schein & Gray, 2018). People’s moral judgments are affected by beliefs about human agency (e.g., Alicke, 2000), intentionality (e.g., Cushman, 2008; Malle & Holbrook, 2012), outcome severity (e.g., Mazzocco et al., 2004), social relationships (e.g., Tepe & Aydinli-Karakulak, 2019), and by emotions (e.g., Greene et al., 2009; Greene & Haidt, 2002) and justifications (e.g., Haidt, 2001; Malle et al., 2014). Such studies clarify many of the effects of contextual information on moral judgments, but leave unanswered the question of whether people have rapid access to alternative possibilities that go against their initial moral judgments. In many circumstances, reasoning may justify, rather than revise, immediate moral judgments that have arisen from effortless, affective processes (e.g., Haidt, 2001, 2007; Luo et al., 2006; Royzman et al., 2015; Sanfey et al., 2003; Tepe et al., 2016; Ugazio et al., 2012; Valdesolo & DeSteno, 2006; Wheatley & Haidt, 2005). In some circumstances, reasoning may itself contribute to immediate moral judgments (e.g., Bloom, 2010; Haidt, 2012; Horne et al., 2015; Maki & Raimi, 2017; Paxton et al., 2012; Paxton & Greene, 2010; Wiech et al., 2013). Nonetheless, people tend not to change their initial moral judgments when presented with opposing reasons (e.g., Stanley et al., 2018), and they tend to resist negotiation on moral issues (e.g., Skitka, 2010; Skitka et al., 2005; Turiel, 2002). Updating initial moral judgments in the absence of sufficient reasons seems counterintuitive and may require some effort (e.g., Haidt, 2001, 2012). 
Hence, our question is, granted that immoral actions can become less morally unacceptable when people imagine how they would have been moral, are such counterfactual alternatives to initial moral judgments immediately accessible?

The first tenet of our theory draws on the idea that inferences are non-monotonic (i.e., defeasible), in that conclusions can be withdrawn on receipt of further information, through cognitive processes that combine premise information with background knowledge (e.g., Cariani & Rips, 2017; Espino & Byrne, 2020; Oaksford & Chater, 2018; Stenning & Van Lambalgen, 2012), so that belief revision maintains epistemically entrenched beliefs (Elio & Pelletier, 1997; Gärdenfors, 1988; Pollock, 1987). We suggest that people interpret exceptions to moral principles as counterexamples, incorporated into a cohesive model by constructing arguments to reconcile the premise that justified a conclusion (the passenger’s behavior towards the Muslim man was discriminatory, therefore morally unacceptable), with additional knowledge that refines assumptions (the passenger may have been reacting to the man’s rudeness), to ensure the conclusion is no longer warranted (the behavior was not discriminatory, so it is less morally unacceptable). The revision of belief in the original conclusion maintains the entrenched moral principle, by identifying a different set of facts within which to interpret the behavior, or a competing moral principle that trumps it (given that people may be resistant to imagining changes to the norm itself, e.g., Gendler, 2000).

The second tenet is that the counterfactual imagination provides one of the missing mechanisms contributing to the non-monotonicity of moral inferences. People often think about how things could have been different, especially after bad outcomes and unexpected events (e.g., Byrne, 2016; Kahneman & Tversky, 1982; Markman et al., 1993; Roese, 1997). They create models to mentally simulate actions and their outcomes (e.g., Byrne & Johnson-Laird, 2020; Byrne & Timmons, 2018; Cushman, 2013; Kahneman & Tversky, 1982; Markman et al., 2008). What they select to change in their models of reality to create a counterfactual alternative depends on the availability of alternatives, guided by norms about what is usual – physically, socially, morally, and intra-personally – including descriptive norms based on statistical averages and prescriptive ones based on moral ideals (e.g., Bear & Knobe, 2017; Halpern & Hitchcock, 2015; Henne et al., 2019; McCloy & Byrne, 2000; Phillips et al., 2015; Roese, 1997). Abnormal events recruit their normal counterparts from memory and the retrieved default possibilities may be sampled for those that are morally good (e.g., Kahneman & Miller, 1986; Khemlani et al., 2018; Phillips et al., 2019; Phillips & Cushman, 2017). Although it is established that counterfactuals can amplify moral judgments so that a morally bad event is considered to be even worse (e.g., Alicke, Buckingham, Zell, & Davis, 2008; Lench et al., 2014; Malle et al., 2014; Migliore et al., 2014; Parkinson & Byrne, 2017, 2018; Timmons & Byrne, 2018), a gap in current theories is whether counterfactuals can reverse a moral judgment, so that a bad event is judged to be less bad.
People can update moral judgments when they are explicitly provided with additional information, for example, about known reasons for an action, but whether they can do so on the basis of imagined counterfactual circumstances is untested (e.g., Cone & Ferguson, 2015; Mann & Ferguson, 2015; Monroe & Malle, 2019; Sabo & Giner-Sorolla, 2017; Stanley et al., 2018). People can imagine mitigating circumstances but their tendency to do so is affected by the emotions elicited by a moral transgression (e.g., Piazza et al., 2013). Yet people tend to imagine how things could be better rather than worse (e.g., De Brigard et al., 2013a, b; Rim & Summerville, 2014), and so it is plausible that they can replace something morally bad in the actual world with something morally good in an imagined alternative. Of course, it is possible to imagine an alternative scenario that is better in some way to what actually happened, but which does not necessarily alter the moral appropriateness of the event. Nonetheless, since the default representation of possibilities can be framed by morality, the rapid imagination of the possibilities for a moral behavior may be feasible (e.g., Phillips & Cushman, 2017). We suggest that immediate counterfactuals deliver the moral counterpart of an immoral action (such as that the man was not acting in a discriminatory manner), enabling rapid access to reasons (he was reacting to rudeness instead).

We consider that the process by which people construct counterfactual alternatives may not require effortful reflection. Counterfactual explanations may be constructed by processes comprised of immediate access to default possibilities, or reflective construction of considered arguments. A gap in current theories is how intuitive counterfactuals relate to deliberative ones (e.g., Goldinger et al., 2003; Roese et al., 2005). In contrast, extensive evidence has been gathered about whether moral judgments are guided by intuition or reason (e.g., Greene et al., 2004; Greene et al., 2008; Haidt, 2012; Luo et al., 2006; Moore et al., 2008; Sanfey et al., 2003; Suter & Hertwig, 2011), and on the role of emotions in moral judgment (e.g., Haidt et al., 1993; Russell & Giner-Sorolla, 2011a, 2011b; Ugazio et al., 2012; Wheatley & Haidt, 2005). No agreement currently exists on how dual processes of intuition and reason may interact. For example, they could occur sequentially, with poorer quality, fast intuitions overridden by better quality slower reflections (e.g., Evans & Stanovich, 2013). Alternatively, they could occur independently, even in parallel, with quick responses not always wrong, and slow ones not always right, and conflict detection occurring regardless of response choice (e.g., Bialek & De Neys, 2017; Bucciarelli et al., 2008; Gubbins & Byrne, 2014; Gürcay & Baron, 2017; Shtulman & Tong, 2013; Stupple & Ball, 2008; Trippas et al., 2017). Immediate counterfactuals could deliver the moral counterpart of an immoral action (the man was not acting in a discriminatory manner), enabling rapid access to reasons (he was reacting to rudeness instead). Subsequent reflection, rather than overriding an immediate thought with a competing one, can develop it into an elaborate counterargument, that is, the dual processes could operate in sequential co-operation rather than only in sequential competition.

In five experiments we tested these proposals by asking people to judge the moral acceptability of a set of immoral actions, to try to imagine alternative ways in which each one would be moral, and to provide their judgments of them again. If moral judgments are rigidly anchored in values driven by automatic processes, people’s judgments will remain immovable before deliberative reflection; if judgments are open to moderation by justification, then an imaginative shift will occur effortlessly, with the action being considered less morally unacceptable. Our theory predicts the latter.

To address the question of whether counterfactual explanations are immediately available or require reflection, we asked people to try to imagine and describe in writing alternative ways in which the behavior would be moral, either very quickly or taking their time to reflect carefully (see Fig. 1). To address the question of whether counterfactuals affect immoral actions differently from irrational ones, we asked people to judge the moral acceptability of immoral actions and the rationality of unreasonable ones. If the representation of possibilities tends toward moral and rational actions, the same pattern should be found for both.

Fig. 1

Schematic representation of the experimental trials. (a) Example of a baseline moral judgment. In each experiment participants completed judgments in the baseline phase first. (b) Example of the immediate counterfactual task: Participants imagined some different circumstances and completed a counterfactual sentence stem task for each action in 20 s, and then made their judgment of it again on the next screen. (c) Example of a reflective counterfactual task: Participants completed the counterfactual task for each action taking as much time as they required and then made their judgment of it again. (d) Example of a factual task: Participants wrote a short title for each action and then made their judgment of it again. (e) Example of alternative instructions designed to re-focus on the described behavior rather than the counterfactual circumstances, illustrated for the immediate counterfactual task

Experiment 1

The aim of the experiment was to examine whether a person’s judgment of the morality of an immoral action changes after they have imagined how the action would have been moral. We compared an “immediate” condition, in which participants imagined alternatives under a time constraint of 20 s that included the time needed to read about the behavior and write down their alternative, to a “reflective” condition, in which they imagined alternatives under no time constraints. We included a third, control condition in which participants did not imagine alternatives but instead carried out the task of providing a title that describes the action (see Fig. 2). The factual task was intended to engage participants in comparable elaborative processing by requiring them to consider a description that succinctly summarized the behavior (e.g., Bransford & Johnson, 1972). It controls for a potential demand characteristic: when individuals are asked to make a judgment a second time, they may believe they are expected to change their initial judgment. The experiment also examined whether a person’s judgment of the rationality of an unreasonable action changes after they have imagined how the action would have been rational.

Fig. 2

Schematic representation of the experimental designs, illustrating the sequence of events in the experiments. In all experiments, participants judged the moral acceptability of a set of immoral actions, or the rationality of a set of unreasonable actions. In the first phase, they made their baseline judgments; in the subsequent phases, they carried out a counterfactual task for each action and provided their judgment of it again. The counterfactual task required participants to complete a sentence stem ‘it would have been morally acceptable if…’ for the immoral actions (or ‘it would have been rational if…’ for the irrational actions). In the immediate counterfactual condition, they were required to do so in 20 s; in the reflective counterfactual condition, they did so with no time constraints. In Experiment 1, participants completed two phases only, and the second phase was either immediate or reflective, or a factual control task, in a between-participants design. In Experiments 2a, b, and 4, participants completed three phases: the second phase was immediate and the third phase was reflective, in a within-participants design. In Experiment 3, participants completed three phases, which corresponded to either the immediate-first sequence of the previous experiments or to a reflective-first, immediate-both, or reflective-both sequence

Method

Participants

The participants were 186 US and UK volunteers recruited through the online platform Prolific, who received £0.85 sterling for participation. There were 131 women, 54 men, and one non-binary individual, with a mean age of 34 years and an age range of 18–73 years. They were assigned at random to six groups (for distribution across groups see Table S1 in the Online Supplementary Materials (OSM)). The planned sample sizes were motivated by power analyses conducted with G*Power (Faul et al., 2009). A sample size of 162 participants is required to provide at least 80% power to detect a medium-sized effect at p < .05, to test a main effect of immediate versus reflective counterfactuals on individuals’ judgments in an analysis of variance (ANOVA) with the design of 2 (judgment phase: first vs. second judgments) × 3 (counterfactual task: immediate, reflective, factual) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) with repeated measures on the first factor. We restricted access to Prolific participants who were native English speakers, above 18 years of age, and who answered correctly a “robot-detection” picture-matching question (one participant was eliminated because of their incorrect answer). Prior to any data analysis we eliminated participants from the recruited 209 participants who failed to complete all the tasks (12 participants), failed the attention check question (five participants), or who had carried out a similar study (five participants). The experiments received prior approval from the School of Psychology Ethics Committees of Trinity College Dublin and Istanbul University. For all the studies, the participants gave their informed consent, and we report all of our manipulations and measures. After the experiments participants completed several demographic and personality measures (see OSM).
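As a quick consistency check on the counts reported above, the following minimal sketch tallies the exclusions against the recruited total. It assumes that the participant removed by the robot-detection question is counted among the 209 recruited; the text does not say so explicitly, and the variable names are illustrative only.

```python
# Counts as reported in the Participants paragraph.
recruited = 209

# Assumption: the robot-detection exclusion is part of the 209 recruited.
exclusions = {
    "failed robot-detection question": 1,
    "did not complete all tasks": 12,
    "failed attention check": 5,
    "had carried out a similar study": 5,
}

final_sample = recruited - sum(exclusions.values())

# Reported gender breakdown of the final sample.
gender_counts = {"women": 131, "men": 54, "non-binary": 1}

print("final sample:", final_sample)
print("gender total:", sum(gender_counts.values()))
```

Under that assumption, both totals agree with the reported sample of 186.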

Materials and design

A set of immoral and irrational scenarios was used, each consisting of a single sentence that contained a scene-setting clause and an action (see OSM). The materials were three immoral actions and three matched unreasonable actions, adapted from the previous literature (Phillips & Cushman, 2017; Tepe & Aydinli-Karakulak, 2019). The materials were presented to the participants in their native language of English.

The key measures were answers to two questions: “To what extent do you think this behavior is morally acceptable?” and “To what extent do you think this behavior is rational?” Participants provided their judgments on a 0–100 slider scale with 0 labelled “not at all” and 100 labelled “definitely.” As a control to ensure that all participants were exposed to the same sorts of judgments for every action, they judged not only the morality of immoral actions but also their rationality, and not only the rationality of irrational actions but also their morality (and their judgments to these additional measures, which were highly correlated, are provided in Table S1 in the OSM).

The experiment included judgment content as a between-participants factor: participants either received the set of immoral actions, or the set of irrational actions. The type of task participants carried out – immediate-counterfactual, reflective-counterfactual, or factual title – was the second between-participants factor. Every participant first provided their judgments of the extent to which the actions were moral or rational for the set of actions, each presented on a separate screen, in the “baseline” phase (see Fig. 1a). To examine the effects of counterfactual thoughts they then received instructions on a separate screen, for either the immediate-counterfactual condition (see Fig. 1b), the reflective-counterfactual condition (see Fig. 1c), or the factual control condition (see Fig. 1d). In the “immediate” condition, a 20-s counter counted down on screen and the program moved on to the next screen automatically after 20 s. The timer started as soon as the participant moved to the screen and so the 20 s allowed includes the time taken to read the instructions and the scenario (see Fig. 1b). Twenty seconds is thus a very short time indeed to try to imagine alternatives and to jot them down, given that even to read the instructions and the scenario takes that long for the average reader (see Footnote 1).

In the “reflective” condition, there was no time restriction. In the control “factual” condition, participants wrote a short title for each action and no time restriction was applied. Every participant produced judgments first in phase 1 (the baseline judgments) and second in phase 2 (after their task – immediate, reflective, or factual), and thus judgment phase was a within-participants variable. Hence the design included the between-participants factors of judgment content (immoral, irrational) and type of task (immediate-counterfactual, reflective-counterfactual, or factual task), and the within-participants factor of judgment phase (first vs. second judgments).

Participants completed two judgments (moral acceptability and rationality) for three actions (either immoral or irrational) in the baseline phase and the same again in the second phase, i.e., 12 judgments in total. The order of judgments was randomized in all experiments. To control for order effects, the materials were presented in two different randomized orders in each experiment, and no order effects were found in any experiment (see the section on the full statistical tests in the OSM).
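The design described above can be laid out as a short sketch: two between-participants factors define the six groups, and the number of judgments per participant is the product of judgment types, actions, and phases. The labels are taken from the text; the variable names are illustrative.

```python
from itertools import product

# Between-participants factors in Experiment 1.
judgment_contents = ["immoral", "irrational"]
tasks = ["immediate-counterfactual", "reflective-counterfactual", "factual"]

# Each combination of content and task is one group.
groups = list(product(judgment_contents, tasks))

# Within each participant: judgment types x actions x phases.
judgment_types = ["moral acceptability", "rationality"]
n_actions = 3
phases = ["baseline", "second"]
judgments_per_participant = len(judgment_types) * n_actions * len(phases)

for content, task in groups:
    print(content, "/", task)
print("groups:", len(groups))
print("judgments per participant:", judgments_per_participant)
```

This reproduces the six groups mentioned in the Participants section and the 12 judgments per participant stated above.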

Procedure

The materials were presented online using Qualtrics in each experiment. In each study, participants were told that it aimed to examine how people think about various events and that it was a study of everyday judgment, whose aim was to examine the sorts of answers that most people provide. They were asked to take part only if they were willing to consider the tasks seriously, and instructed that they should do the study in a quiet place where they would be uninterrupted for its duration. Each task was presented on a separate screen and participants could not return to an earlier screen once they had provided their judgment.

Results and discussion

The datasets for this experiment and the subsequent experiments are available via the Open Science Framework at: https://osf.io/mw94z/.

We compared the judgments of the morality of immoral actions and the rationality of irrational actions in an ANOVA with the design of 2 (judgment phase: first vs. second judgments) × 3 (counterfactual task: factual, immediate, reflective) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) with repeated measures on the first factor. In this experiment and subsequent ones, when assumptions of homogeneity of variance were violated, we corrected degrees of freedom using the Greenhouse-Geisser and Welch-Satterthwaite corrections as appropriate. The results showed that immoral actions were judged to be less unacceptable when people imagined how they could have been moral. Participants’ judgments increased in the second phase compared to the first, as shown by a main effect of judgment phase, F (1, 180) = 259.59, p < .001, ηp2 = .591, 90% CI (0.516, 0.647), and they increased when they created an immediate or reflective counterfactual compared to a factual title, as shown by a main effect of task, F (2, 180) = 57.404, p < .001, ηp2 = .389, 90% CI (0.300, 0.463). There was also a main effect of judgment content, F (1, 180) = 51.652, p < .001, ηp2 = .223, 90% CI (0.139, 0.305), as judgments of the morality of immoral actions were lower than judgments of the rationality of irrational actions. Nonetheless the same pattern was observed for both sorts of content, which did not interact with phase, F (1, 180) = 1.416, p = .236, ηp2 = .008, 90% CI (0.000, 0.042) or task, F (2, 180) = 1.569, p = .211, ηp2 = .017, 90% CI (0.000, 0.053), and there was no interaction of all three variables, F (2, 180) = 0.452, p = .637, ηp2 = .005, 90% CI (0.000, 0.026), see Fig. 3a.
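The partial eta squared values reported above can be recovered from each F ratio and its degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the reported statistics; it is a consistency check under that identity, not a reanalysis of the data, and the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (label, F, df_effect, df_error, reported partial eta squared)
reported_effects = [
    ("judgment phase", 259.59, 1, 180, 0.591),
    ("task", 57.404, 2, 180, 0.389),
    ("judgment content", 51.652, 1, 180, 0.223),
    ("content x phase", 1.416, 1, 180, 0.008),
    ("content x task", 1.569, 2, 180, 0.017),
    ("three-way interaction", 0.452, 2, 180, 0.005),
]

for label, f_value, df_effect, df_error, reported in reported_effects:
    computed = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: computed {computed:.3f}, reported {reported:.3f}")
```

Each computed value agrees with the reported one to within rounding.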

Fig. 3

In Experiment 1 participants created either immediate or reflective counterfactuals, or else constructed a factual title for the action. In (A) their mean judgments for the first and second phase are presented for the moral acceptability of immoral actions and the rationality of irrational actions. In (B) the difference scores for the judgment change from the first phase to the second are presented. Plots of data in Experiment 1 are based on 186 UK and US participants. Error bars are standard error of the mean

Judgment phase and counterfactual task interacted, F (2, 180) = 48.973, p < .001, ηp2 = .352, 90% CI (0.258, 0.428), as Fig. 3a shows. We decomposed the interaction with a Bonferroni-corrected alpha of p < .0056 for nine comparisons. The comparisons showed that the increase in participants’ judgments in the second phase compared to the first occurred only when participants created immediate counterfactuals, t (58) = 12.957, p < .001, d = 1.687, 95% CI (1.285, 2.082), and reflective ones, t (63) = 10.424, p < .001, d = 1.303, 95% CI (0.966, 1.634), but not when they thought of a title, t (62) = 2.305, p = .025, d = 0.29, 95% CI (0.037, 0.541), on the corrected alpha of p < .0056. This result shows that the increase in judgments cannot be attributed to extraneous factors such as repetition, practice, or task demands.
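The corrected alpha and the paired effect sizes above can be illustrated with a short sketch. Two things are assumptions rather than statements from the text: that the family-wise alpha is .05 (only the corrected value is reported), and that d was computed as d_z = t / √n with n = df + 1, a formula that happens to reproduce the reported values.

```python
import math

# Bonferroni correction: family-wise alpha divided by the number of comparisons.
family_alpha = 0.05  # assumed; the text reports only the corrected threshold
n_comparisons = 9
corrected_alpha = family_alpha / n_comparisons

def cohens_dz(t_value, df):
    """Paired-samples effect size d_z = t / sqrt(n), with n = df + 1 (assumed formula)."""
    return t_value / math.sqrt(df + 1)

# (t, df, p) as reported; "p < .001" is entered as the upper bound .001.
paired_tests = {
    "immediate": (12.957, 58, 0.001),
    "reflective": (10.424, 63, 0.001),
    "factual title": (2.305, 62, 0.025),
}

print(f"corrected alpha: {corrected_alpha:.4f}")
for task, (t_value, df, p_value) in paired_tests.items():
    d = cohens_dz(t_value, df)
    below = p_value < corrected_alpha
    print(f"{task}: d_z = {d:.3f}, p below corrected alpha: {below}")
```

The title condition has p = .025, which is below the conventional .05 but above the corrected threshold, which is why it is not treated as a reliable change.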

The comparisons to decompose the interaction of phase and task also showed that participants’ judgments increased in the second phase when they created an immediate counterfactual compared to a title, t (99.818) = 10.117, p < .001, d = 1.833; 95% CI (1.393, 2.266), and when they created a reflective counterfactual compared to a title, t (114.163) = 10.501, p < .001, d = 1.864, 95% CI (1.437, 2.284), but there was no difference between creating an immediate or reflective counterfactual, t (121) = 0.369, p = .713, d = 0.067, 95% CI (−0.288, 0.420), see Fig. 3a. Jotting down a few words quickly to convey the first thought that comes to mind was as effective as taking time to reflect carefully. Finally, as an important baseline, we confirmed that there were no differences in first-phase judgments in the three conditions: factual versus immediate, t (120) = 1.636, p = .105, d = 0.29, 95% CI (−0.060, 0.639); factual versus reflective, t (103.557) = 1.513, p = .133, d = 0.269, 95% CI (−0.082, 0.618); and immediate versus reflective, t (121) = 0.198, p = .843, d = 0.036, 95% CI (−0.318, 0.389). (Welch-Satterthwaite df corrections were applied for judgments in the baseline and reflective phases that violated Levene’s test for equality of variance, p < .001 in both cases.)
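The fractional degrees of freedom above (e.g., 99.818 and 114.163) come from the Welch-Satterthwaite correction for unequal variances. The sketch below shows the standard formula. The variances are hypothetical, since the group variances are not reported in this section, and the group sizes of 63 and 64 are inferred from the paired-test degrees of freedom, t (62) and t (63), rather than stated directly.

```python
def welch_satterthwaite_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for a two-sample t test with unequal variances."""
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Hypothetical variances for illustration only; group sizes are inferred, not reported.
n_factual, n_reflective = 63, 64
var_factual, var_reflective = 400.0, 900.0

df = welch_satterthwaite_df(var_factual, n_factual, var_reflective, n_reflective)
pooled_df = n_factual + n_reflective - 2

print(f"Welch-Satterthwaite df: {df:.3f} (pooled df would be {pooled_df})")
```

The corrected df always lies between the smaller group's n − 1 and the pooled df, and it moves further below the pooled value as the group variances diverge.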

Judgment change scores

To probe further, we constructed judgment-change scores (difference scores based on subtracting the mean judgment scores in the first phase from the second phase) for the three groups (see the OSM). We carried out a 3 (judgment change: baseline-to-factual, baseline-to-immediate, baseline-to-reflective) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) ANOVA with repeated measures on the first factor, on the judgment change difference scores. Participants’ judgments changed from the first phase to the second when they constructed an immediate counterfactual, or a reflective one, more than when they thought of a title, as shown by a main effect of judgment change, F (2, 180) = 48.971, p < .001, ηp2 = .352, 90% CI (0.258, 0.428), see Fig. 3b. Immoral actions became less morally unacceptable when participants imagined how they could have been moral, and the same was so for irrational actions. There was no main effect of judgment content, F (1, 180) = 1.416, p = .236, ηp2 = .008, 90% CI (0.000, 0.042), and no interaction, F (2, 180) = 0.452, p = .637, ηp2 = .005, 90% CI (0.000, 0.026).
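The judgment-change score described above is the mean second-phase judgment minus the mean first-phase judgment. A minimal sketch follows, using made-up ratings on the 0–100 scale; the participant records and field names are hypothetical and are not taken from the dataset.

```python
from statistics import mean

def change_score(first_phase, second_phase):
    """Mean second-phase judgment minus mean first-phase judgment."""
    return mean(second_phase) - mean(first_phase)

# Hypothetical participants: three ratings per phase on the 0-100 slider scale.
participants = [
    {"task": "immediate", "phase1": [10, 5, 20], "phase2": [40, 35, 30]},
    {"task": "reflective", "phase1": [0, 15, 10], "phase2": [30, 45, 20]},
    {"task": "factual", "phase1": [10, 10, 10], "phase2": [12, 10, 11]},
]

for person in participants:
    score = change_score(person["phase1"], person["phase2"])
    print(f"{person['task']}: change = {score:.2f}")
```

A positive score means the action was judged more acceptable (or more rational) in the second phase than at baseline.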

Counterfactuals

We categorized the counterfactuals participants created (see Table 1). Participants tended to create facts-based counterfactuals, for example, “the action would have been morally acceptable if the Muslim passenger had been rude,” that indicate the action is not immoral because other facts explain it, or they created dilemma-based counterfactuals, for example, “the action would have been morally acceptable if the Muslim passenger had been acting threateningly,” that indicate that the action is immoral, but it is in response to a dilemmic conflict with another moral action that justifies it, for example, to protect others. For immoral actions, participants produced more facts-based counterfactuals than dilemma-based ones, whereas for irrational actions they did the opposite. Counterfactual analyses are presented in the OSM.

Table 1 Examples of different ways participants imagined an immoral action would have been moral, or an irrational action would have been rational, illustrated for one of the actions

Overall, the results show that people’s judgment of the morality of an immoral action changes after they have imagined how the action would have been moral. An immoral action was considered less immoral when participants imagined alternatives for only 20 s, or when they deliberated in their imagination of alternatives with no time constraints. This judgment shift did not occur when participants did not imagine alternatives but instead described the action. The same imaginative shift occurred for judgments of the rationality of an unreasonable action. The next experiments were designed to find out whether an additional moral shift occurs when participants reflect carefully, after the first thought that comes to mind.

Experiments 2a and 2b

The aim of the experiments was to examine whether there is an additional imaginative shift in judgments of the morality of an immoral action when participants first imagine alternatives for only 20 s, and then subsequently imagine alternatives under no time constraints (see Fig. 2). We also extend our material set to a larger set of immoral and irrational actions in Experiment 2a.

We extend the materials even further in Experiment 2b to examine not only possible actions but also impossible ones, for example, “A passenger in an airplane does not want to sit next to a Muslim passenger and so he tells the stewardess the passenger must be moved to the moon.” In the previous experiment, we tested counterfactual possibilities about immoral or irrational behaviors that are physically possible, that is, they can happen in real life, and the results showed that people can rapidly imagine counterfactual possibilities that turn an immoral event into a less immoral one. Are counterfactual possibilities accessible even for situations that are physically impossible, that is, they cannot happen in real life? In both experiments, we again examine judgments of the rationality of unreasonable actions as well as judgments of the morality of immoral actions.

Method

Participants

The participants in Experiment 2a were 164 students from the University of Istanbul who volunteered in return for course credits. There were 135 women and 29 men, with a mean age of 22 years and an age range of 18–59 years. They were assigned at random to two groups (see Table S3, OSM). We tested as many students as volunteered from the undergraduate module who were invited to participate. A sample size of 105 participants is required to provide at least 80% power to detect a medium-sized effect at p < .05 for the main effect of immediate versus reflective counterfactuals in the 3 (judgment phase: baseline, immediate-counterfactual, reflective-counterfactual) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) design with repeated measures on the first factor. In Experiment 2b, 79 UK participants recruited through Prolific received £0.85 for participation, and there were 59 women and 20 men, with a mean age of 18 years and an age range of 18–33 years. They were assigned at random to four groups (see Table S5, OSM). In Experiment 2b a post hoc power test indicated that the sample size provides 80% power to detect a large effect at p < .05 for a main effect of possible versus impossible actions in the 3 (judgment phase: baseline, immediate-counterfactual, reflective-counterfactual) × 2 (possibility: possible actions vs. impossible actions) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) design with repeated measures on the first factor (but only approximately 60% power to detect a medium-sized effect), and so we consider this experiment an exploratory test and interpret its results with caution. The materials were presented to the participants in their native language of Turkish (Experiment 2a) or English (Experiment 2b).

None of the Turkish students had taken part in a similar study previously, and Prolific participants were excluded if they reported having done so (ten participants were removed from Experiment 2b). We restricted access to Prolific participants who were native English speakers, above 18 years of age, and who answered correctly a “robot-detection” picture-matching question (one participant was eliminated because they did not do so correctly). Also prior to any data analysis we eliminated participants who failed to complete all the tasks (one participant in Experiment 2a and seven in 2b), and those who failed the attention check question (five participants in 2a and none in 2b), resulting in 164 participants in Experiment 2a, and 79 participants in Experiment 2b.

Materials, design, and procedure

The materials were similar to the previous experiment. Experiment 2a used a larger set of eight scenarios, which varied the content for immoral and unreasonable actions, to test further whether counterfactuals affect immoral actions differently from irrational ones. Experiment 2b used the same materials as Experiment 1, but we examined not only actions that are possible, but also matched ones that are impossible, at least in an everyday situation (see the OSM for details). The materials were chosen from a larger set tested in a pilot study (see Table S10, OSM).

The measures were also similar to the previous experiment except that in Experiment 2a participants were also asked: “To what extent is it possible to think of this behavior as morally acceptable/rational?” Their judgments on these additional measures are provided in Table S3 in the OSM, since whether the questions were phrased with certainty or in terms of possibility had no effect. Accordingly, in Experiment 2a participants completed four judgments (moral acceptability, rationality, moral possibility, rational possibility) for four actions (either immoral or irrational) in the three phases of baseline, immediate, and reflective, that is, 48 judgments in total. In Experiment 2b participants completed two judgments (moral acceptability and rationality) for three actions (either immoral or irrational) in the three phases of baseline, immediate, and reflective, that is, 18 judgments in total.

The design of each experiment was similar to Experiment 1, with one main exception: Participants made judgments in a baseline phase, then they thought about some alternative circumstances for just 20 s and completed the judgments a second time; then they thought about alternative circumstances with no time constraints and completed the judgments a third time (see Fig. 2). In Experiment 2a the between-participant factor judgment content again had two levels (immoral, irrational) and the within-participant factor of judgment phase had three levels (baseline, immediate-counterfactual, reflective counterfactual), as participants carried out both the immediate task, and then the reflective task (see Fig. 2). In Experiment 2b there was an additional between-participants factor of possibility, to compare possible actions to impossible actions. The procedure in each experiment was the same as in Experiment 1.

Results and discussion

In Experiment 2a the ANOVA was a 3 (judgment phase: baseline, immediate counterfactual, reflective counterfactual) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) design with repeated measures on the first factor. In Experiment 2b the ANOVA was a 3 (judgment phase: baseline, immediate-counterfactual, reflective-counterfactual) × 2 (possibility: possible actions vs. impossible actions) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) design with repeated measures on the first factor.

Once again immoral actions were judged less unacceptable when people imagined how they could have been moral. Participants’ judgments shifted as they progressed through the three phases, as shown by main effects of judgment phase, in Experiment 2a, F (1.606, 260.10) =194.526, p < .001, ηp2 = .546, 90% CI (0.480, 0.597), see Fig. 4a; and Experiment 2b, F (1.609, 120.69) =230.132, p < .001, ηp2 = .75, 90% CI (0.690, 0.794), see Fig. 4b. Judgment content also showed a main effect, in Experiment 2a, F (1, 162) =91.472, p < .001, ηp2 = .361, 90% CI (0.265, 0.443), and Experiment 2b, F (1, 75) =34.246, p < .001, ηp2 = .313, 90% CI (0.173, 0.432), as participants’ judgments of the morality of immoral actions were lower than their judgments of the rationality of irrational actions. There was no main effect of the possibility or impossibility of the actions in Experiment 2b, F (1, 75) =0.770, p = .383, ηp2 = .010, 90% CI (0.000, 0.076), see OSM, Tables S3 and S5.
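The partial eta squared values reported above can be cross-checked from each F ratio and its degrees of freedom. The following is a minimal sketch, assuming the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The function name is hypothetical, and the sketch is not the analysis code used for the experiments.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    ss_ratio = f_value * df_effect  # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# (label, F, df_effect, df_error, reported eta_p^2), copied from the text above
reported_effects = [
    ("Exp 2a, judgment phase", 194.526, 1.606, 260.10, 0.546),
    ("Exp 2b, judgment phase", 230.132, 1.609, 120.69, 0.75),
    ("Exp 2a, judgment content", 91.472, 1, 162, 0.361),
    ("Exp 2b, judgment content", 34.246, 1, 75, 0.313),
    ("Exp 2b, possibility", 0.770, 1, 75, 0.010),
]

for label, f_value, df_effect, df_error, reported in reported_effects:
    recovered = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: recovered {recovered:.3f}, reported {reported}")
```

Each recovered value agrees with the reported one to within rounding, which suggests the F ratios, degrees of freedom, and effect sizes in this paragraph are mutually consistent.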

Fig. 4

In Experiments 2a and b participants constructed immediate counterfactuals and then reflective ones. Their mean judgments for the first, second, and third phase are presented for the moral acceptability of immoral actions and the rationality of irrational actions in (A) for Experiment 2a, and in (B) for Experiment 2b. The difference scores for the judgment change from one phase to another are presented in (C) for Experiment 2a, and in (D) for Experiment 2b. Plots of data in Experiment 2a are based on 164 students from the University of Istanbul, Turkey, and in Experiment 2b on 79 UK participants. Error bars are standard error of the mean

There was an interaction of judgment content with judgment phase, in Experiment 2a, F (1.606, 260.10) = 3.217, p = .052, ηp2 = .019, 90% CI (0.000, 0.053) (Greenhouse-Geisser corrections were applied to the degrees of freedom because of the violation of sphericity), and in Experiment 2b, F (1.609, 120.69) = 3.630, p = .039, ηp2 = .046, 90% CI (0.001, 0.114), see Fig. 4a and b. None of the other interactions in Experiment 2b were significant (see OSM).
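The fractional degrees of freedom follow from the Greenhouse-Geisser correction, which multiplies both the effect and error degrees of freedom of the within-participant factor by an estimate epsilon. The sketch below illustrates the arithmetic for a mixed design. Epsilon is not reported in the text, so it is back-calculated here from the reported effect degrees of freedom; that step and the function name are assumptions made for illustration only.

```python
def gg_corrected_df(epsilon: float, n_levels: int, n_participants: int, n_groups: int):
    """Greenhouse-Geisser corrected df for a within-participant factor in a mixed design.

    Uncorrected df are (k - 1) for the effect and (k - 1) * (N - g) for the error,
    where k = levels of the repeated factor, N = participants, g = between groups.
    Both are multiplied by epsilon.
    """
    df_effect = epsilon * (n_levels - 1)
    df_error = df_effect * (n_participants - n_groups)
    return df_effect, df_error


# ASSUMPTION: epsilon is inferred from the reported effect df (df_effect / 2),
# because the text does not report it directly.
epsilon_2a = 1.606 / 2
epsilon_2b = 1.609 / 2

# Experiment 2a: 3 phases, 164 participants, 2 between-participant groups
print(gg_corrected_df(epsilon_2a, n_levels=3, n_participants=164, n_groups=2))
# Experiment 2b: 3 phases, 79 participants, 4 between-participant groups
print(gg_corrected_df(epsilon_2b, n_levels=3, n_participants=79, n_groups=4))
```

The recovered error degrees of freedom (about 260.2 and 120.7) match the reported 260.10 and 120.69 up to the rounding of epsilon.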

We decomposed the interactions of judgment content with judgment phase using Bonferroni-corrected alphas of p < .0056 for nine comparisons in each experiment. Participants’ judgments continued to shift when they created reflective counterfactuals in the third phase compared to immediate ones in the second phase, in Experiment 2a, for immoral actions, t (82) = 5.978, p < .001, d = 0.66, 95% CI (0.417, 0.892), and irrational actions, t (80) = 6.137, p < .001, d = 0.682, 95% CI (0.438, 0.922), and in Experiment 2b, for immoral actions, t (37) = 5.538, p < .001, d = 0.873, 95% CI (0.444, 1.243), although for irrational actions the difference was not significant on the corrected alpha of p < .0056, t (40) = 2.447, p = .019, d = 0.382, 95% CI (0.063, 0.697).
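The corrected threshold can be reproduced by dividing the family-wise alpha by the number of comparisons. The sketch below assumes a family-wise alpha of .05 and enters "p < .001" as its upper bound of .001; the function name is hypothetical.

```python
def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison alpha under a Bonferroni correction."""
    return family_alpha / n_comparisons


# ASSUMPTION: family-wise alpha of .05, with nine comparisons per experiment
alpha = bonferroni_alpha(0.05, 9)
print(f"corrected alpha = {alpha:.4f}")

# Reflective versus immediate comparisons reported above;
# "p < .001" is represented by its upper bound, 0.001
reported_p = {
    "Exp 2a, immoral actions": 0.001,
    "Exp 2a, irrational actions": 0.001,
    "Exp 2b, immoral actions": 0.001,
    "Exp 2b, irrational actions": 0.019,
}

for label, p_value in reported_p.items():
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"{label}: p = {p_value} -> {verdict} at the corrected alpha")
```

Only the Experiment 2b comparison for irrational actions (p = .019) falls above the corrected threshold, in line with the description above.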

They also judged the immoral actions to be more immoral in the baseline condition compared to the immediate one, Experiment 2a: t (82) = 8.54, p < .001, d = 0.937, 95% CI (0.677, 1.194), Experiment 2b, t (37) = 8.72, p < .001, d = 1.415, 95% CI (0.958, 1.862), and compared to the reflective one, Experiment 2a t (82) = 11.341, p < .001, d = 1.245, 95% CI (0.956, 1.53), Experiment 2b, t (37) = 12.111, p < .001, d = 1.965, 95% CI (1.412, 2.507). Likewise, they judged the irrational actions to be more irrational in the baseline condition compared to the immediate one, Experiment 2a, t (80) = 9.431, p < .001, d = 1.048, 95% CI (0.774, 1.317), Experiment 2b, t (40) = 12.299, p < .001, d = 1.921, 95% CI (1.397, 2.436), and compared to the reflective one, Experiment 2a, t (80) = 11.784, p < .001, d = 1.309, 95% CI (1.010, 1.605), Experiment 2b, t (40) = 13.364, p < .001, d = 2.087, 95% CI (1.044, 3.131), see OSM for further comparisons.
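The effect sizes for these within-participant comparisons can be approximately recovered from the t statistics. The sketch below assumes that d was computed as t divided by the square root of the number of participants (n = df + 1). The text does not state which formula was used, so this is an assumption, although it reproduces the values reported in this paragraph to within rounding.

```python
import math


def cohens_d_from_paired_t(t_value: float, df: int) -> float:
    """Cohen's d for a paired comparison, assuming d = t / sqrt(n) with n = df + 1."""
    n_participants = df + 1
    return t_value / math.sqrt(n_participants)


# (label, t, df, reported d), copied from the paragraph above
paired_comparisons = [
    ("Exp 2a immoral, baseline vs immediate", 8.54, 82, 0.937),
    ("Exp 2b immoral, baseline vs immediate", 8.72, 37, 1.415),
    ("Exp 2a immoral, baseline vs reflective", 11.341, 82, 1.245),
    ("Exp 2b immoral, baseline vs reflective", 12.111, 37, 1.965),
    ("Exp 2a irrational, baseline vs immediate", 9.431, 80, 1.048),
    ("Exp 2b irrational, baseline vs immediate", 12.299, 40, 1.921),
    ("Exp 2a irrational, baseline vs reflective", 11.784, 80, 1.309),
    ("Exp 2b irrational, baseline vs reflective", 13.364, 40, 2.087),
]

for label, t_value, df, reported_d in paired_comparisons:
    recovered_d = cohens_d_from_paired_t(t_value, df)
    print(f"{label}: recovered d = {recovered_d:.3f}, reported d = {reported_d}")
```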

Judgment change difference scores

Judgment change difference scores from the immediate to the reflective phase were less than from the baseline to immediate phase, or from the baseline to the reflective phase, as shown by main effects of judgment change, in Experiment 2a, F (1.408, 228.14) = 75.330, p < .001, ηp2 = .317, 90% CI (0.236, 0.388), see Fig. 4c; and Experiment 2b, F (1.299, 97.445) = 112.013, p < .001, ηp2 = .599, 90% CI (0.493, 0.669), see Fig. 4d; and its interaction with content, in Experiment 2a, F (1.408, 228.14) = 5.135, p = .014, ηp2 = .031, 90% CI (0.003, 0.074), and Experiment 2b, F (1.299, 97.445) = 6.984, p = .005, ηp2 = .085, 90% CI (0.015, 0.178). For full details see the OSM.

Counterfactuals

Participants created reflective counterfactuals in which the action was not immoral because of other facts, more than ones in which it was immoral but in a dilemma, both for immoral and irrational actions in Experiment 2a; in Experiment 2b, they created as many facts-based as dilemma-based counterfactuals both for immoral and irrational actions. We also compared the counterfactuals participants created in the reflective phase to those they created in the immediate phase. Participants tended to focus on the same alternative circumstance in the reflective counterfactual as they had in the immediate counterfactual, rather than switch to a different alternative circumstance, in Experiment 2a and Experiment 2b, see the OSM for further details, including Tables S4 and S6.

The results show that there is an additional imaginative shift in judgments of the morality of an immoral action when participants first imagine alternatives for only 20 s, and then subsequently imagine alternatives under no time constraints. The results replicate the first experiment in showing a large shift in judgments following even just 20 s to imagine alternatives; nonetheless the results also show that there is an additional shift when participants subsequently deliberate with no time limits. The effect occurs for immoral actions and irrational ones, and not only for possible actions but also for impossible ones. Hence people tend to think about moral possibilities effortlessly, even when the moral possibilities go against their initial judgment, and even when the moral possibilities are physically impossible (see also Phillips & Cushman, 2017). Importantly, participants tended to elaborate further upon the same counterfactuals that they had created in the immediate condition, when they had no time limits in the reflective condition, rather than consider a different alternative.

Both experiments show that an additional imaginative shift occurs after reflecting carefully on alternatives. However, a potential concern is that the additional shift could arise given the opportunity to create any sort of second counterfactual, immediate or reflective. The next experiment addresses this issue.

Experiment 3

The aim of the experiment was to examine whether an additional imaginative shift in judgments of the morality of an immoral action occurs only when participants first imagine alternatives for 20 s, and then subsequently imagine alternatives under no time constraints, as in the previous experiments, or whether it also occurs when they first reflect with no time constraints, and then subsequently create counterfactuals in 20 s. Accordingly, we compared judgments made by participants who created immediate counterfactuals followed by reflective ones to those made by participants who created reflective counterfactuals followed by immediate ones. We also included two controls, one in which participants created immediate counterfactuals followed by immediate ones, and one in which they created reflective counterfactuals followed by reflective ones.

Method

Participants

The participants were 355 students from Bahçeşehir University, Turkey, who volunteered in return for course credits. The participants were 264 women, 87 men, one gender-neutral individual, and three who did not provide information, with a mean age of 21 years and an age range of 17–51 years. They were randomly assigned to one of eight groups (see OSM, Table S7). The sample size had 98% power to detect a medium sized effect at p < .05, and we doubled the sample size for which we obtained an effect in Experiment 2a to enable us to test the predicted interaction (see Giner-Sorolla, 2018). None of the Turkish students had taken part in a similar study previously. Prior to any data analysis we eliminated participants who failed to complete all the tasks (41 participants) and those who failed the attention check question (13 participants), resulting in 355 participants.

Materials, design, and procedure

We compared judgments made by participants who created immediate counterfactuals in a second phase and reflective ones in a third phase (as in Experiments 2a and 2b), to those made by participants who created counterfactuals in the opposite order, i.e., reflective counterfactuals in the second phase and immediate ones in the third phase. We included two controls, a sequence in which participants created immediate counterfactuals in both phases, and one in which they created reflective counterfactuals in both phases (see Fig. 2). The design contained the factors of judgment content and judgment phase, and in addition examined four types of counterfactual sequence: immediate-first, reflective-first, immediate-both, and reflective-both. The materials were the same as those in Experiment 1 and the procedure was also the same.

Results and discussion

The ANOVA was a 3 (judgment phase: first, second, and third judgments) × 4 (counterfactual sequence: immediate-first, reflective-first, immediate-both, reflective-both) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) design with repeated measures on the first factor.

Immoral actions were more acceptable when people imagined how they could have been moral. Participants’ judgments shifted as they progressed through the three judgment phases, as a main effect of phase showed, F (1.767,613.04) = 624.325, p < .001, ηp2 = .643, 90% CI (0.608, 0.671), see Fig. 5a. However, there was also a main effect of sequence, F (3,347) = 2.784, p = .041, ηp2 = .024, 90% CI (0.001, 0.049), as judgments were highest when participants constructed two reflective counterfactuals and lowest when they constructed two immediate counterfactuals; and judgment phase interacted with sequence, F (5.3, 613.04) = 4.949, p < .001, ηp2 = .041, 90% CI (0.013, 0.062).

Fig. 5

In Experiment 3, participants provided judgments in a baseline phase, second phase, and third phase in one of four different sequences of immediate or reflective counterfactuals. In (a) their mean judgments for the moral acceptability of immoral actions and the rationality of irrational actions are presented. In (b) the difference scores for the judgment change from one phase to another are presented. Plots of data for Experiment 3 are based on 355 students from Bahçeşehir University, Turkey. Error bars are standard error of the mean

We decomposed the interaction between judgment phase and counterfactual sequence with a Bonferroni-corrected alpha of p < .0036 for 14 key comparisons. Consistent with the previous experiments, for the immediate-first sequence, judgments in the third phase increased compared to the second, t (85) = 4.579, p < .001, d = 0.494, 95% CI (0.269, 0.716). For the two controls, there were no differences between the second and third phases, immediate-both, t (92) =1.99, p = .050, d = 0.206, 95% CI (0.000, 0.411), which is not significant on the corrected alpha; and reflective-both, t (83) =0.243, p = .808, d = 0.027, 95% CI (−0.188, 0.240). The result indicates it is not simply the opportunity to create a second counterfactual that leads to an increase in the third phase for the immediate-first sequence. The difference between the second and third phase was the opposite for the reflective-first sequence: judgments in the third phase decreased compared to the second, t (91) = 3.018, p = .003, d = 0.315, 95% CI (0.104, 0.523), see OSM, Table S7. The result indicates that subsequent reflective elaboration on an immediate counterfactual is required to shift moral acceptability further.

Judgments in the second phase increased more for sequences with a reflective second phase than for those with an immediate one: reflective-first versus immediate-first, t (176) = 2.982, p = .003, d = 0.447, 95% CI (0.149, 0.744), and versus immediate-both, t (183) = 2.96, p = .003, d = 0.435, 95% CI (0.143, 0.726); reflective-both versus immediate-first, t (168) = 3.210, p = .002, d = 0.492, 95% CI (0.187, 0.797), and versus immediate-both, t (175) = 3.143, p = .002, d = 0.473, 95% CI (0.173, 0.772). There was no such difference when the second phase involved the same sort of counterfactual: immediate-first versus immediate-both, t (177) = 0.064, p = .949; and reflective-first versus reflective-both, t (174) = 0.075, p = .941. The full set of comparisons is in the OSM.
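The effect sizes for these between-sequence comparisons can be approximately recovered from the t statistics and the group sizes. The sketch below assumes d = t × √(1/n1 + 1/n2) and infers the group sizes from the degrees of freedom of the within-sequence tests reported earlier (n = df + 1). Both the formula and the inferred group sizes are assumptions, since the text does not state them directly.

```python
import math

# ASSUMPTION: group sizes inferred from the paired-test df reported above (n = df + 1)
group_n = {
    "immediate-first": 86,   # t (85)
    "reflective-first": 92,  # t (91)
    "immediate-both": 93,    # t (92)
    "reflective-both": 84,   # t (83)
}


def cohens_d_from_independent_t(t_value: float, n1: int, n2: int) -> float:
    """Cohen's d for two independent groups, assuming d = t * sqrt(1/n1 + 1/n2)."""
    return t_value * math.sqrt(1 / n1 + 1 / n2)


# (group A, group B, t, reported df, reported d), copied from the paragraph above
between_comparisons = [
    ("reflective-first", "immediate-first", 2.982, 176, 0.447),
    ("reflective-first", "immediate-both", 2.96, 183, 0.435),
    ("reflective-both", "immediate-first", 3.210, 168, 0.492),
    ("reflective-both", "immediate-both", 3.143, 175, 0.473),
]

for group_a, group_b, t_value, reported_df, reported_d in between_comparisons:
    n1, n2 = group_n[group_a], group_n[group_b]
    recovered_d = cohens_d_from_independent_t(t_value, n1, n2)
    print(
        f"{group_a} vs {group_b}: df = {n1 + n2 - 2} (reported {reported_df}), "
        f"recovered d = {recovered_d:.3f} (reported {reported_d})"
    )
```

The inferred group sizes sum to 355, the reported sample size, and the implied degrees of freedom (n1 + n2 − 2) match those reported for each comparison.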

The ANOVA also showed a main effect of judgment content, F (1,347) = 39.952, p < .001, ηp2 = .103, 90% CI (0.056, 0.153), as judgments of the morality of immoral actions were lower than judgments of the rationality of irrational actions; and content interacted with judgment phase, F (1.767, 613.04) = 8.288, p < .001, ηp2 = .023, 90% CI (0.007, 0.045), see Fig. 5a.

The differences occurred at each phase, baseline, t (307.566) = 5.038, p < .001, d = 0.535, 95% CI (0.322, 0.747), second phase: t (353) = 5.286, p < .001, d = 0.561, 95% CI (0.349, 0.773), and third phase: t (353) = 5.069, p < .001, d = 0.538, 95% CI (0.326, 0.750), as shown in the decomposition of the interaction of content with phase, see OSM.

Judgment change scores

The judgment change difference scores are consistent with these results. The scores from the second to third phase were less than those from the baseline to second, or the baseline to third phase, as the main effect of judgment change shows, F (1.36, 472.858) = 432.296, p < .001, ηp2 = .555, 90% CI (0.508, 0.594), see Fig. 5b. The change from baseline to second was less than from baseline to third for the immediate-first sequence, t (85) = 4.579, p < .001, d = 0.494, 95% CI (0.267, 0.716), and more than from baseline to third for the reflective-first sequence, t (91) = 3.018, p = .003, d = 0.315, 95% CI (0.104, 0.523); but the two control sequences showed no differences, immediate-both, t (92) = 1.99, p = .050, d = 0.206, 95% CI (0.000, 0.411), which was not significant on the corrected alpha, and reflective-both, t (83) = 0.243, p = .808, d = 0.027, 95% CI (−0.188, 0.240), with a corrected alpha of p < .004 on the decomposition of the interaction of judgment change with sequence, F (4.088, 472.858) = 10.099, p < .001, ηp2 = .080, 90% CI (0.040, 0.115). Judgment change difference scores were less for immoral actions than irrational ones, as the main effect of content showed, F (1,347) = 10.162, p = .002, ηp2 = .028, 90% CI (0.007, 0.063), and the difference occurred from the baseline to second phase, t (353) = 3.335, p = .001, d = 0.354, 95% CI (0.144, 0.564), and baseline to third phase, t (353) = 3.174, p = .002, d = 0.337, 95% CI (0.127, 0.546); there was no difference for the second to third phase, t (353) = 0.153, p = .879, d = 0.016, 95% CI (−0.192, 0.224). For details see the OSM.

Counterfactuals

Participants created counterfactuals in which the action was not immoral because of other facts, more than those in which the action was immoral but in a dilemma, both for immoral and irrational actions, in each of the counterfactual sequences. Participants created counterfactuals that focused on the same alternative circumstance in the second and third counterfactuals rather than different ones for immoral actions, whereas for irrational actions there was no difference. They created counterfactuals that focused more on the same alternative than a different one in the immediate-first sequence and reflective-first one; there was no difference for the immediate-both sequence, and the pattern was the opposite for the reflective-both one, see the OSM, including Table S8, for further details.

The results show that an additional imaginative shift in judgments of the morality of an immoral action occurs only when participants first imagine alternatives for 20 s, and then subsequently imagine alternatives under no time constraints. The opposite occurs when they first reflect with no time constraints, and then subsequently create counterfactuals in 20 s: their judgments move back toward their original baseline.

A potential concern is that the instructions throughout have required participants to provide their subsequent judgments of the action “given the circumstances you have just written,” which encourages participants to focus on their re-interpreted version of the action in their second or third judgments of its morality, rather than on the original action itself. Arguably, the instruction may have introduced a demand characteristic in which participants believed they were expected by the experimenter to alter their judgment in the light of new circumstances they had written about. The final experiment attempts to probe further how participants view the original immoral action itself after they have considered ways in which it could have been moral.

Experiment 4

Our interest has been in how participants judge the morality of an immoral action after they have considered ways in which it could have been moral, and hence we instructed them to provide their second and third judgments of the action “given the circumstances you have just written.” To guard against the possibility that this instruction introduces an implicit task demand for participants to change their judgments, in the final experiment we changed the instructions to explicitly return participants’ focus to the original action. We asked them to provide their subsequent judgments by requesting them to “now provide your judgment about the behavior again” and, moreover, we repeated the description of the original behavior to bring it back to the forefront of their attention (see Fig. 1e). According to our theory, participants’ imagination of alternative circumstances in which the action is moral should lead them to update their judgments of its morality, even given instructions that orient them back to focus on the original action.

Method

Participants

The participants were 120 students from Bahçeşehir University who volunteered in return for course credits. There were 102 women, 17 men, and one person who did not record their gender, and they had a mean age of 21 years with an age range from 19 to 26 years. They were assigned at random to two groups (see Table S9, OSM). We tested as many students as volunteered from the undergraduate module who were invited to participate. A sample size of 116 participants is required to provide at least 90% power to detect a medium-sized effect at p < .05 for the main effect of immediate versus reflective counterfactuals in the 3 (judgment phase: baseline, immediate counterfactual, reflective counterfactual) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) design with repeated measures on the first factor. Prior to any data analysis we eliminated participants who failed to complete all the tasks (11 participants), or who failed to answer correctly a “robot-detection” picture-matching question (one participant), or who failed the attention check question (three participants), resulting in 120 participants.

Materials, design, and procedure

The design of the experiment was similar to Experiment 2a, with one exception: after participants made judgments in a baseline phase, and thought about some alternative circumstances for 20 s, their instructions to complete the judgments a second time were as follows: “Please now provide your judgment about the behavior again: A passenger in an airplane does not want to sit next to a Muslim passenger and so he tells the stewardess the passenger must be moved to another seat.” Participants then thought about some alternative circumstances with no time limit, and their instructions to complete the judgments a third time were again the new instructions: “Please now provide your judgment about the behavior again,” with the description of the behavior included again.

Participants were assigned to two groups (judgment content, irrational or immoral), and they completed two judgments (moral acceptability, rationality) for three actions in the three judgment phases of baseline, immediate, and reflective, i.e., 18 judgments in total. Hence, the between-participant factor judgment content again had two levels (immoral, irrational) and the within-participant factor of judgment phase had three levels (baseline, immediate-counterfactual, reflective-counterfactual), see Fig. 2. The materials and measures were based on those in Experiment 3.

Results and discussion

We carried out a 3 (judgment phase: baseline, immediate-counterfactual, reflective-counterfactual) × 2 (judgment content: judgments of morality for immoral actions vs. judgments of rationality for irrational actions) ANOVA with repeated measures on the first factor on participants’ judgments. Immoral actions were more acceptable when people imagined how they could have been moral, even with the new instructions. Participants’ judgments shifted as they progressed through the three phases, replicating the previous experiments, as shown by a main effect of judgment phase, F (1.328, 156.72) = 60.913, p < .001, ηp2 = .34, 90% CI (0.242, 0.424), see Fig. 6a. As in the previous experiments, they judged the immoral actions to be more immoral in the baseline condition compared to the immediate condition, and compared to the reflective condition, and their judgments continued to shift when they created reflective counterfactuals in the third phase compared to immediate ones in the second phase. There was a main effect of judgment content, F (1, 118) = 3.984, p = .048, ηp2 = .03, 90% CI (0.000, 0.099), as participants’ judgments of the morality of immoral actions were lower than their judgments of the rationality of irrational actions. There was no interaction between the two variables, F (1.328, 156.72) = 1.284, p = .270, ηp2 = .011, 90% CI (0.000, 0.049), see OSM, Table S9.

Fig. 6

In Experiment 4 participants constructed immediate counterfactuals and then reflective ones. Their mean judgments for the first, second and third phases are presented for the moral acceptability of immoral actions and the rationality of irrational actions in (A); the difference scores for the judgment change from one phase to another are presented in (B). Plots of data are based on 120 students from Bahçeşehir University. Error bars are standard error of the mean

Judgment change difference scores

Judgment change difference scores from the immediate to reflective phase were less than from the baseline to the immediate phase, and change scores from the baseline to immediate phase were less than from the baseline to the reflective phase, as shown by a main effect of judgment change, F (1.732, 204.43) = 29.104, p < .001, ηp2 = .198, 90% CI (0.119, 0.272), see Fig. 6b. There was no main effect of judgment content, F (1, 118) = 1.513, p = .221, ηp2 = .013, 90% CI (0.000, 0.065), and no interaction of the two variables, F (1.732, 204.429) =0.613, p = .520, ηp2 = .005, 90% CI (0.000, 0.028).

The results show that participants update their moral judgments when they consider some alternative ways in which an immoral action could have been moral, even when the instructions are careful to remove any implicit task demand to do so. The experiment replicates the findings of the previous experiments, with quite different instructions designed to re-focus participants on the original action. We can conclude that even though participants judge an action to be morally unacceptable initially, once they imagine alternative circumstances in which the action could have been morally acceptable, they revise their initial judgment and consider the immoral action to be less immoral. Once again, they can do so even after they consider alternative circumstances for just a very short time.

General discussion

How does an immoral act come to be considered less morally unacceptable? An important mechanism is that people can imagine ways in which it would have been moral, even in the absence of any further facts about the matter. People judged immoral actions to be not at all moral, but when they imagined alternative circumstances in which they would be moral, a striking shift in their judgments about the actions’ moral unacceptability was observed in all five experiments. Arguably, after they have imagined alternative circumstances, people do not consider the situation to be the same. The finding implies that people possess the moral flexibility to allow circumstance to moderate their assessment of others’ moral behavior, rather than being tied to their initial interpretations, and even when the circumstance is entirely imagined rather than based on further information (e.g., Cone & Ferguson, 2015; Graham & Haidt, 2012; Harman, 1975; Mann & Ferguson, 2015; Mikhail, 2013; Monroe & Malle, 2019; Piazza et al., 2013; Sabo & Giner-Sorolla, 2017; Sinnott-Armstrong, 2002; Stanley et al., 2018). Remarkably, they do so even though they receive no external confirmation that their imagined circumstances apply validly to the situation. They deploy a repertoire of counterfactual argumentation strategies to do so, including counterfactuals that deny the action was immoral by introducing additional facts to modify the interpretation of the situation, or ones that accept the immorality of the action but introduce a dilemma with a competing moral action to justify the violation. They rarely exhibited resistance to the idea that the immoral action could be considered moral, but equally rarely engaged in any attempt to modify the norm upon which it was based (e.g., Gendler, 2000; Haidt, 2012). The same pattern was observed for thoughts about immoral actions, and those about unreasonable ones.

People can readily imagine, in a matter of seconds, alternative circumstances in which an immoral action would have been moral. Quickly jotting down a few words in just 20 s to convey the first thought that comes to mind had a significant impact on subsequent moral judgments. The short time frame of 20 s for reading the instruction, the scenario, and typing an answer leaves very little time indeed to spend on imagining an alternative. Alternative moral possibilities appear to be immediately accessible (e.g., Phillips & Cushman, 2017). Moreover, there is an added effect of reflecting carefully on alternative circumstances without any time constraints, which shifts moral judgments further in the same direction of increased moral acceptability. Participants constructed counterfactuals in the reflective phase that elaborated on the same idea as the one they first thought of in the immediate phase. The counterfactual possibility they generated briefly in the immediate phase may have appeared to them to warrant further elaboration. However, when a reflective counterfactual phase was followed by an immediate one, a reversal in judgments of moral acceptability was observed. Participants tended to think of the same idea again, but the 20-s limit on thinking about it reduced their judgment of the action’s moral acceptability from the level attained by reflection. This unexpected result may indicate that revisiting the original counterfactual possibility so briefly somehow undermined its effectiveness; the finding merits further investigation.

Unreasonable actions can also come to be considered less irrational when people imagine how they would have been rational, just like immoral actions. Although the immoral actions were considered more morally unacceptable than the irrational actions were considered unreasonable, nonetheless the same pattern was observed for both. People may expect others to behave in ways that are moral and reasonable, and so these default possibilities may be readily available (e.g., Cushman, 2020; Phillips et al., 2015, 2019; Phillips & Cushman, 2017). The finding is consistent with the idea that moral cognition relies on the same sorts of cognitive processes that underpin reasoning about non-moral matters (Bucciarelli et al., 2008; Cushman & Young, 2011; Knobe, 2018; Rai & Holyoak, 2010; Uttich & Lombrozo, 2010; see also Haidt, 2012; Young & Saxe, 2011). The potential sorts of cognitive processes that are implicated by these discoveries are sketched in Table 2.

Table 2 An illustration of the cognitive processes in counterfactual imaginative moral shifts

The experiments included participants from different cultures. Participants from the USA and UK (in Experiments 1 and 2b) and those from Turkey (in Experiments 2a, 3, and 4) judged the immoral actions to be similarly morally unacceptable at the baseline phase (generally about 5–10 on the 0–100 scale). However, the US and UK participants exhibited a greater moral shift than the Turkish participants, at the second phase (generally to about 40 on the scale vs. to about 30, respectively) and at the third phase (generally about 60 vs. about 40, respectively). In contrast, both populations judged the irrational actions to be similarly irrational at the baseline phase, and exhibited a similar shift in the second and third phases. Cultural and content effects on imaginative moral shifts are worth examining further, given that counterfactuals are pervasive, occurring regardless of linguistic or cultural convention and throughout the lifespan (e.g., Au, 1983; Beck et al., 2006; Byrne, 2005; Dudman, 1988; Harris, 2000; Walsh & Byrne, 2001).

The role of the counterfactual imagination in moral mitigation has implications for its preparatory function of supporting intentions to change (e.g., De Brigard, Addis, et al., 2013a; Rim & Summerville, 2014; Roese & Epstude, 2017; Smallman & McCulloch, 2012; Timmons et al., 2021; Van Hoeck et al., 2013) and its emotional amplification of feelings of regret or relief (Kahneman & Tversky, 1982; Sweeny & Vohs, 2012). In our experiments, participants were directed to think about whether there were alternative circumstances in which the actions would be moral. Further study is needed of the extent to which participants engage spontaneously in such moderation, and of the effects of being provided with arguments based on such alternatives. Moreover, participants were asked to imagine ways in which the morally unacceptable behavior could be acceptable. Of course, they could have imagined a world worse than the actual one, in which all such morally unacceptable actions are acceptable (e.g., a world in which everyone believes racism is acceptable). Instead, most participants created a better-world or upward counterfactual, in which the specific morally unacceptable action was acceptable, either because of a change to the facts (e.g., the action was not in fact an instance of racism) or because of a dilemma (e.g., the action was racist but carried out in service of another moral principle, such as protecting others). Since they created upward counterfactuals about how the action could be interpreted as a better one, they also updated their moral judgments in an upward direction, to be more favorable towards the action. It may be fruitful in future research to examine the effects of directing participants to imagine a downward, worse-world counterfactual, that is, how a behavior could be even less morally acceptable, to examine whether they also update their moral judgments downwards to be even harsher, again based solely on imagination. A demonstration that the moral imaginative shift occurs in either direction, upwards or downwards, would provide further support for our argument that imagination alone can alter moral judgments even in the absence of further facts.

It is notable that judgments about unreasonable actions shifted from being considered not at all rational to being considered rational, whereas judgments about immoral actions shifted from being considered not at all moral towards the mid-point of the 0–100 scale, but rarely beyond it. Of course, the baseline judgments for unreasonable actions were higher than the baseline judgments for immoral actions. Nonetheless, the action’s immorality was neutralized rather than transformed to be moral, and so whether mechanisms other than the counterfactual imagination can bring about greater transformations remains an open question.