The present paper is concerned with defeasible reasoning, which has a long tradition at the interface between philosophy, logic, and Artificial Intelligence (e.g., Brewka, 1991; Delgrande, 1987; McCarthy & Hayes, 1969; Pelletier & Elio, 1997), but has also gained attention in psychology (e.g., Oaksford & Chater, 1991, 1995, 2013; Pfeifer & Douven, 2014; Stenning & van Lambalgen, 2005). A typical example of defeasible reasoning is the Tweety problem: people say that a bird named Tweety can fly but change their mind when hearing that Tweety is a penguin.

Here, we aimed to investigate defeasible reasoning in the legal domain. For instance, imagine you are a judge confronted with the following problem:

If somebody kills another human, then this person should be punished for manslaughter.

Bert killed another human.

Should Bert be punished for manslaughter?

How would you decide? According to classical logic the correct answer is “yes.” When people have to reason about a conditional rule, and are told that the “if”-part of the rule (i.e., the antecedent p) is true, then they should conclude that the “then”-part of the rule (i.e., the consequent q) follows. This inference pattern is called Modus Ponens (MP).

Yet, an affirmative answer is valid not only according to classical logic, but also according to most people’s sense of justice. People often react with intuitive feelings of moral outrage and a desire to punish when faced with offences (Darley, 2009; Darley & Pittman, 2003; Carlsmith & Darley, 2008; Tetlock, Kristel, Elson, & Lerner, 2000; see also Fehr & Gächter, 2002). The higher the moral outrage people feel, the more they want the offender to be punished (Darley, 2009; Darley, Carlsmith, & Robinson, 2000). In fact, moral outrage predicts the perceived severity and desired sentence of an offence (Alter, Kernochan, & Darley, 2007; Carlsmith, Darley, & Robinson, 2002; Darley, 2009; Darley et al., 2000; Gromet & Darley, 2009; see also Buckholtz et al., 2008; Young, Cushman, Hauser, & Saxe, 2007), and it explains why the sentencing philosophy most laypeople subscribe to is best described as retributive justice, following a “just deserts” principle (Carlsmith & Darley, 2008; Carlsmith et al., 2002; Darley et al., 2000; Keller, Oswald, Stucki, & Gollwitzer, 2010).

However, which conclusion do people choose if they get the additional information that there are circumstances that could exonerate an offender? For example:

If somebody kills another human, then this person should be punished for manslaughter.

Bert killed another human.

Because of a psychological disorder, Bert was not able to control his actions.

Should Bert be punished for manslaughter?

Note that one new sentence has been added to the problem. Yet, according to classical logic the correct answer is still “yes” because of the property of monotonicity: a logically valid conclusion – the affirmative answer in our example – cannot be altered by additional information. However, classical logic is not the only system that can be used to explain human reasoning. In contrast to classical logic, defeasible reasoning is non-monotonic: new information (here: Bert having a psychological disorder) can defeat previously drawn conclusions (Byrne, Espino, & Santamaria, 1999; Da Silva Neves, Bonnefon, & Raufaste, 2002; Evans, 2002; Oaksford, 1993; Oaksford & Chater, 1995, 2013; Politzer, 2007).
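The contrast between monotonic and non-monotonic inference can be illustrated with a minimal Python sketch. The function names, the premise labels, and the set of defeaters are illustrative assumptions only; they do not reproduce any formal system from the literature cited above.

```python
# Illustrative sketch only: premise labels and the defeater set are assumptions.

def classical_mp(premises: set[str]) -> bool:
    """Monotonic modus ponens: q follows whenever p is among the premises."""
    return "killed_human" in premises


# Hypothetical defeaters, loosely modelled on exculpatory circumstances.
DEFEATERS = {"no_criminal_responsibility", "self_defense", "necessity"}


def defeasible_mp(premises: set[str]) -> bool:
    """Non-monotonic variant: q follows from p unless a defeater is present."""
    return "killed_human" in premises and not (premises & DEFEATERS)


base = {"killed_human"}
extended = base | {"no_criminal_responsibility"}

print(classical_mp(base), classical_mp(extended))    # True True
print(defeasible_mp(base), defeasible_mp(extended))  # True False
```

Adding a premise never changes the classical result, whereas the defeasible variant withdraws the conclusion once an exculpatory premise appears.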

Several studies show that everyday reasoning is defeasible. When people are asked to reason about conditionals, they do not merely follow the formal structure of the rule, but also consider its content. If they can imagine situations in which the antecedent p is fulfilled but the consequent q is not, they view those as counterexamples and thus suppress the conclusion of a logically valid inference (Byrne, 1989; Cummins, Lubart, Alksnis, & Rist, 1991; De Neys, Schaeken, & d’Ydewalle, 2003a, 2003b; Dieussaert, De Neys, & Schaeken, 2005; Johnson-Laird & Byrne, 2002; Markovits & Potvin, 2001; see also Oaksford & Chater, 2001, 2003; Oaksford, Chater, & Larkin, 2000). Counterexamples therefore weaken the association between p and q, by lowering the sufficiency of p for q (e.g., Thompson, 1994, 1995).

Legal rules are also defeasible (Bäcker, 2010; Prakken, 1997; Prakken & Sartor, 2004; Sartor, 2009). In the example above, the penal code prescribes that the offender should not be punished, because the absence of criminal responsibility is an exculpatory circumstance according to law and a counterexample in psychological terms. In law, exculpatory circumstances do not just lessen a sentence, as is the case for mitigating circumstances (e.g., 5 instead of 8 years of imprisonment), but void the sentence completely (e.g., no punishment). In the German Penal Code, exculpatory circumstances are written in the General Part (i.e., Allgemeiner Teil) and thus apply to (almost) all specific offences (such as manslaughter, bodily injury, theft etc.), which are written in the Special Part (i.e., Besonderer Teil). This characteristic of the Penal Code – having separate rules for specific offences and for exculpatory circumstances – together with the fact that it is impossible to enumerate all possible exculpatory circumstances beforehand as part of the antecedent (cf. Wang, 2004) makes legal reasoning defeasible. For instance, the application of the rule against manslaughter (§212 StGB: “Wer einen Menschen tötet, ohne Mörder zu sein, wird als Totschläger mit Freiheitsstrafe nicht unter fünf Jahren bestraft [Whoever kills a human being, without being a murderer, is punished for manslaughter with imprisonment of not less than five years]”) can be defeated towards no punishment when paired with rules describing exculpatory circumstances, such as lack of criminal liability because of psychological disorders, self-defense, or necessity.

Lawyers, who know that legal rules are defeasible, will therefore conclude in our example above that the offender should not be punished. However, the main interest of the present study is in how laypeople decide about exculpatory circumstances. Unlike lawyers, laypeople do not have elaborated knowledge structures about exculpatory circumstances in law. The presentation of a legally exculpatory circumstance does not guarantee that laypeople actually accept it as exculpatory. That is, they might still conclude that an offender should be punished despite a legally exculpatory circumstance. We therefore hypothesize that while lawyers evaluate exculpatory circumstance information according to the rules of the penal code, laypeople use a different knowledge base to do so. We assume that laypeople will use their own sense of justice, based on feelings of moral outrage, to decide whether offenders should be punished or not. The more morally outrageous the initial offence is, the harder it is for laypeople to accept circumstance information as exculpatory and the more they will deny that this circumstance is sufficiently strong to refrain from punishing the offender. This greater difficulty should be reflected in fewer no-punishment conclusions in legal reasoning tasks and in longer decision times whenever they make decisions contrary to their feelings of moral outrage.

Note that we are not just interested in showing that lawyers differ from laypeople (which is highly unsurprising). Instead, we aim to explain how laypeople deal with their lack of knowledge in a defeasible reasoning paradigm. Laypeople could simply ignore all possible exculpatory circumstances, or could accept all of them. However, we suppose that laypeople use their own sense of justice, guided by feelings of moral outrage, to decide about exculpatory circumstances. This implies that the emotional attachment to the legal conditional rule influences its perception of defeasibility. Such a finding would be an important addition to recent studies in the psychology of reasoning dealing with the influence of emotions on reasoning (see e.g., Blanchette, 2006; Blanchette & Leese, 2010; Jung, Wranke, Hamburger, & Knauff, 2014; Oaksford, Morris, Grainger, & Williams, 1996; Perham & Oaksford, 2005).

Our assumptions are supported by the studies on moral outrage cited above, and by studies indicating that people experience emotional reactions towards offences even when instructed to take a purely legal point of view (Schleim, Spranger, Erk, & Walter, 2011; see also Buckholtz et al., 2008). In fact, many psychologists argue that emotional responses towards offences are the driving force of many moral judgments, whereas deliberative decision-making processes serve only as post-hoc justifications (Haidt, 2001, 2007). For instance, Manktelow, Fairley, Kilpatrick, and Over (2000) showed that in the case of road traffic violations participants were much more influenced by mitigating circumstances in cases of speeding than in cases of drunk driving. However, the authors did not explain these differences in much detail, even though, from our point of view, driving drunk might be more morally outrageous than speeding. In addition, Bonnefon, Haigh, and Steward (2013) also found evidence pointing in this direction. Although they did not work specifically with legal conditionals, they showed that when the antecedent of a conditional describes somebody doing something bad (e.g., insulting or hurting someone) people expect that the consequent will describe something negative happening to this person (e.g., “If Brian insults Mandy, then he will get told off”). However, these studies were not designed to measure how emotional constraints can affect the consideration of counterexamples, nor did they vary the emotional attachment to the conditional rules.

We report three experiments on defeasible reasoning with legal conditionals. In Experiment 1 we developed our experimental paradigm and tested whether it is appropriate for measuring (a) defeasible reasoning, and (b) differences in legal reasoning between people with legal education and laypeople. In Experiment 2 we approached our hypotheses more systematically. We varied the level of moral outrage evoked by the offence, but kept the quality of exculpatory information constant, and tested whether moral outrage inhibits the consideration of exculpatory circumstances. In both experiments the participant’s task was to decide – either as a judge following the rules of the penal code (Experiments 1 and 2) or as a judge following their own sense of justice (Experiment 2) – whether an offender should be punished for the specific offense described in the legal conditional or not (e.g., manslaughter, bodily injury, theft etc.). In Experiment 3 we tested whether the consideration of exculpatory circumstances has a correspondent in memory by using an alternative experimental paradigm in which participants had to generate exculpatory circumstances. In all experiments we tested people with legal education and laypeople. The group of laypeople consisted of students with no specific law knowledge, i.e., students from other disciplines. The group of people with legal education consisted of graduate lawyers (who completed at least the first German state examination) and advanced law students. In the latter group, we ensured that all law students were already familiar with the legal rules of penal code used in our experiments; this is usually the case after the first three to four semesters of law studies in Germany. Strictly speaking these law students are not yet fully-trained lawyers; nonetheless they are already familiar with the relevant legal rules and certainly have more legal experience than laypeople. 
Hence, for simplicity, we refer to the group of people with legal education as ‘lawyers.’ Each experiment used a different sample of participants.

Experiment 1

In Experiment 1 we developed our experimental paradigm. We created legal conditionals by selecting legal rules from the German penal code and by putting those into a conditional form. In fact, in legal theory many legal rules ought to be understood as conditionals (Koch & Rüßmann, 1982). For instance, the rule for manslaughter was rephrased as: If a person kills another human, then the person should be punished for manslaughter (cf. Bäcker, 2009, 2010). In Experiment 1 we selected only severe offences. We combined legal conditionals with circumstance information that could be exculpatory, neutral, or aggravating. We tested whether (a) people defeat logically valid conclusions in light of such circumstances, and (b) this task is appropriate to measure differences between lawyers and laypeople. We expected that, when faced with exculpatory circumstances, both groups of participants would suppress the logically valid conclusion to punish the offender. However, since we selected only severe offences – and severe offences are related to high moral outrage (Darley et al., 2000) – we expected this effect to be less pronounced for laypeople. We added aggravating circumstances to check if, besides defeating conclusions, circumstance information also enhances punishment conclusions by strengthening the association between the antecedent (the offence) and the consequent (the punishment) of the legal conditional (cf. Manktelow & Fairley, 2000; Stevenson & Over, 1995).

Method

Participants

Participants were 22 lawyers (16 female) and 26 laypeople (14 female). The mean age of lawyers was 26.5 years (SD = 6.7); the mean age of laypeople was 23.3 years (SD = 2.3). Within the lawyers’ group, eight had already graduated from law school; the rest were still at university but already had knowledge about the offences presented in the experiment. Law students had studied for 4.6 semesters on average.

Materials and design

Our problems followed the structure of a defeasible MP inference, but were adapted to the legal context. They consisted of (1) a legal conditional “if p then q,” where p refers to an offence (manslaughter, arson, bodily injury, or theft) and q to a punishment, (2) the fact p stating that someone committed the offence, (3) additional information about circumstances, and (4) the conclusion phrased as a question about q, that is, whether the offender should be punished for the offence or not (yes vs. no). For each problem, participants were also asked to indicate on a 7-point Likert scale (5) how certain they were about their conclusion (1 = very unsure, 7 = very sure), and (6) how severe they perceived the offender’s action to be (1 = not severe at all, 7 = very severe). The latter question was meant to show whether participants incorporated the information about circumstances into their mental representation of the offence. An example problem is as follows:

Legal conditional rule: If a person kills another human, then the person should be punished for manslaughter.

Fact: Bob killed another human.

Circumstances: Bob is schizophrenic and had a delusion of an attack against him.

Conclusion: Should Bob be punished for manslaughter?

Certainty: How certain are you?

Severity: How severe do you perceive Bob’s action to be?

We constructed 48 conditional reasoning problems. Our experimental manipulation was that each problem was presented with circumstances that were potentially exculpatory, aggravating, or neutral (i.e., crime-irrelevant) for the given legal rule. Among the exculpatory circumstances, half were legally relevant (i.e., potentially exculpatory for the offence according to the penal code, or at least permissible as such at a judge’s discretion), and half were legally irrelevant (i.e., probably exculpatory, or at least mitigating, according to some personal standards, but not according to law). The same distinction between legally relevant and legally irrelevant information was made for the aggravating circumstances. Neutral circumstances were, by definition, always legally and morally irrelevant. Thus, the problems with neutral circumstances represented the base acceptance rate of the conditional legal rule. Examples of the circumstances are presented in Table 1 (all problems are presented in the Appendix in Table 7). All exculpatory, aggravating, and neutral circumstances were selected from a larger pool of problems (N = 192) that were tested in a pilot study. In this pilot study, participants (n = 16 for theft and manslaughter; n = 17 for bodily injury and arson) rated how mitigating or aggravating they perceived a particular circumstance to be for a given offence. Besides exculpatory circumstances, mitigating circumstances were also used in this pilot study. For the main experiment, we selected those combinations of offences and circumstances that received the highest “aggravating” and “mitigating” ratings in the pilot study. As “neutral” we used the circumstances that obtained the mean value on the scale. All circumstance descriptions were of similar length (61 ± 2 characters including spaces).
Moreover, we varied the name and gender of the offender between subjects to avoid possible effects of attitudes or preferences (Sporer & Goodman-Delahunty, 2009), but not as another independent variable. Overall, the experiment followed a 2 (group: laypeople vs. lawyers) × 2 (relevance: legally relevant vs. legally irrelevant) × 3 (circumstances: exculpatory vs. aggravating vs. neutral) mixed design. The factor group was a between-subjects factor; all other factors were within-subjects factors. We did not differentiate between the different offences in the 48 problems (manslaughter, arson, bodily injury, or theft), so they were not treated as an additional factor.

Table 1 Examples for legally relevant and legally irrelevant exculpatory, neutral, and aggravating circumstances used in Experiment 1 (original material was in German)

Procedure

The experiment was programmed in Cedrus SuperLab© 4.X and took place on a computer. Participants were tested individually. The experiment was introduced as an experiment about reasoning in law. Participants were told that they would be confronted with legal cases in which a person committed an offence, and that their task was to decide as a judge whether the person should be punished for the offence. The legal conditional was introduced as a general legal rule. Problem components (i.e., rule, fact, circumstance, and conclusion) were each presented on separate screens. Participants could switch from one screen to the next by pressing the space bar and gave their conclusions by pressing a “y” (yes) or an “n” (no) key. The number pad was used to provide ratings for the last two questions on certainty and severity. All statements were presented in black font except the conclusion question, which was presented in red. Problems were presented in German. Participants had the opportunity to take a break between problems. At the beginning of the experiment, participants completed six practice problems. For both practice and experimental trials, the order of problems was randomized. The experiment took about 45 minutes. All participants received monetary compensation for their participation.

Results

General note

Data in all three experiments were analyzed with analyses of variance (ANOVAs), which will be described in more detail in the corresponding sections. In cases where Mauchly’s sphericity test was significant, we used Greenhouse-Geisser corrected values. Significant effects in the ANOVAs were scrutinized with t-tests or follow-up ANOVAs where appropriate. P-values in these follow-up analyses were tested against Bonferroni adjusted alpha levels. For decision times we always computed the time between presentation of the circumstances and participants’ punishment conclusions and excluded times resulting from mistyped/invalid answers (occurring only M = 1.1 times per participant in Experiment 1, and M = 0.55 times per participant in Experiment 2).
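The Bonferroni adjusted alpha levels reported in the following sections can be reproduced with a short sketch. The family-wise alpha of .05 is an assumption: it is not stated explicitly here, but it is consistent with the reported adjusted values of 0.025 (two comparisons) and 0.0167 (three comparisons). The function name is hypothetical.

```python
def bonferroni_alpha(n_comparisons: int, family_alpha: float = 0.05) -> float:
    """Per-comparison alpha under a Bonferroni correction.

    family_alpha = 0.05 is an assumed family-wise level, not a value
    taken from the text.
    """
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return family_alpha / n_comparisons


print(round(bonferroni_alpha(2), 4))  # 0.025
print(round(bonferroni_alpha(3), 4))  # 0.0167
```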

Perceived severity ratings (Manipulation check)

We analyzed severity ratings with a 2 (group: laypeople vs. lawyers) × 2 (relevance: legally relevant vs. legally irrelevant) × 3 (circumstances: exculpatory vs. aggravating vs. neutral) mixed ANOVA. Descriptive data can be found in Table 2. We found a main effect of circumstances, F(1.36, 62.77) = 152.85, p < .001, ηp² = .769, a main effect of relevance, F(1, 46) = 49.77, p < .001, ηp² = .520, an interaction between circumstances and relevance, F(2, 92) = 118.55, p < .001, ηp² = .720, and a three-way interaction between all factors, F(2, 92) = 4.43, p = .015, ηp² = .088. While the effect of circumstances did not differ between lawyers and laypeople for problems with legally relevant circumstances, F(1.41, 64.79) = 0.71, p = .447, ηp² = 0.015, it did for problems with legally irrelevant circumstances, F(1.47, 67.43) = 4.82, p = .011, ηp² = .108. Laypeople were descriptively more influenced by irrelevant exculpatory and irrelevant aggravating circumstances than lawyers, although pairwise t-tests did not reach significance (ts < 1.52, ps > .137; Bonferroni adjusted alpha: 0.0167). Overall, problems with exculpatory circumstances were perceived as less severe than problems with neutral circumstances, t(47) = 9.81, p < .001, d = 1.49, and those as less severe than problems with aggravating circumstances, t(47) = 10.16, p < .001, d = 0.75 (Bonferroni adjusted alpha: 0.025). All other effects were not significant (Fs < 2.48, ps > .110).
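As a reading aid, partial eta squared (ηp²) can be recovered from an F statistic and its degrees of freedom with the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies it to three of the uncorrected tests reported above. The function name is hypothetical, and the Greenhouse-Geisser corrected tests may deviate in the last digit because their degrees of freedom are reported rounded.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for the severity ratings.
print(round(partial_eta_squared(49.77, 1, 46), 3))   # 0.52  (reported: .520)
print(round(partial_eta_squared(118.55, 2, 92), 3))  # 0.72  (reported: .720)
print(round(partial_eta_squared(4.43, 2, 92), 3))    # 0.088 (reported: .088)
```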

Table 2 Mean severity ratings (and standard deviations) for problems with legally relevant and legally irrelevant circumstances in Experiment 1

Defeated conclusions (no-punishment)

We calculated the percentage of no-punishment conclusions for each of the problem categories (see Table 3). As no-punishment decisions were scarce for neutral and aggravating circumstances, we only conducted a 2 (group: laypeople vs. lawyers) × 2 (relevance: legally relevant vs. legally irrelevant) ANOVA on no-punishment conclusions for exculpatory circumstances. We found a main effect of relevance, F(1, 46) = 224.05, p < .001, ηp² = .83, and an interaction between group and relevance, F(1, 46) = 11.59, p = .001, ηp² = .201. In cases of legally irrelevant exculpatory circumstances, lawyers and laypeople showed no difference in percentage of no-punishment conclusions, t(32.10) = 0.47, p = .644, d = 0.13, with participants in both groups almost never considering legally irrelevant exculpatory circumstances as valid counterexamples. In cases of legally relevant circumstances, however, lawyers and laypeople differed: lawyers accepted legally relevant exculpatory circumstances as valid counterexamples much more often than laypeople, t(46) = 2.39, p = .021, d = 0.69 (Bonferroni adjusted alpha: 0.025). No main effect of group was found, F(1, 46) = 2.14, p = .151, ηp² = 0.044.
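The dependent measure used here, the percentage of no-punishment conclusions per problem category, could be computed along the following lines. This is a minimal sketch: the data layout, the function name, and the example responses are invented for illustration, and only the “y”/“n” response coding is taken from the procedure described above.

```python
from collections import defaultdict


def no_punishment_rates(trials):
    """Percentage of "n" (no-punishment) conclusions per relevance category.

    trials: iterable of (relevance, answer) pairs, with answer "y" or "n".
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [n_no, n_total]
    for relevance, answer in trials:
        counts[relevance][0] += answer == "n"
        counts[relevance][1] += 1
    return {cat: 100.0 * n_no / n_total for cat, (n_no, n_total) in counts.items()}


# Made-up responses of one hypothetical participant to exculpatory problems.
example = [
    ("relevant", "n"), ("relevant", "y"), ("relevant", "n"), ("relevant", "n"),
    ("irrelevant", "y"), ("irrelevant", "y"), ("irrelevant", "y"), ("irrelevant", "n"),
]
print(no_punishment_rates(example))  # {'relevant': 75.0, 'irrelevant': 25.0}
```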

Table 3 Percentages (and standard deviations) for the no-punishment conclusions in Experiment 1

Decision times and certainty ratings

We analyzed decision times and certainty ratings separately for punishment and no-punishment conclusions (Fig. 1). As the majority of no-punishment conclusions were made in light of legally relevant exculpatory circumstances, only these problems were analyzed. For both analyses, we conducted a 2 (decision: punishment vs. no-punishment) × 2 (group: laypeople vs. lawyers) mixed ANOVA.

Fig. 1 Decision times and certainty ratings for punishment and no-punishment conclusions in Experiment 1. Error bars show standard errors

For the analyses of decision times we found a significant interaction between group and punishment decision, F(1, 42) = 4.36, p = .043, ηp² = .094. Whereas lawyers showed no differences in their decision times for punishment and no-punishment conclusions, t(21) = 0.94, p = .358, d = 0.21, laypeople required significantly more time to select no-punishment than to select punishment conclusions, t(21) = 2.80, p = .011, d = 0.62 (Bonferroni adjusted alpha: 0.025). However, a main effect of group also indicated that, in general, lawyers had longer decision times than laypeople, F(1, 42) = 5.82, p = .02, ηp² = .122. No main effect of decision was found, F(1, 42) = 0.16, p = .689, ηp² = .004.

The analyses of certainty ratings showed a similar pattern. Although the interaction between group and punishment decision failed to reach significance, F(1, 41) = 2.70, p = .108, ηp² = .062, descriptively only laypeople were less certain about no-punishment than about punishment decisions. Additional main effects revealed that lawyers were more certain than laypeople, F(1, 41) = 5.67, p = .022, ηp² = .122, and all participants were more certain about punishment than no-punishment conclusions, F(1, 41) = 6.29, p = .016, ηp² = .133.

Discussion

Our data show that people indeed use additional information about the circumstances of an offence when reasoning with legal conditionals. However, in accordance with our hypotheses, lawyers and laypeople differed in their no-punishment decisions. Laypeople decided less often than lawyers not to punish offenders, probably because they have no elaborate knowledge of the penal code and instead use their own sense of justice, guided by feelings of moral outrage, to decide about the offences and exculpatory circumstances. This idea is supported by the decision times and the certainty ratings. Laypeople took longer and were descriptively more uncertain when they decided, contrary to moral outrage, not to punish the offender. In contrast to laypeople, lawyers’ decision times and certainty ratings did not differ between punishment and no-punishment decisions. Lawyers were always more certain than laypeople and also needed more time than laypeople to make a decision. We think these longer decision times indicate that the underlying cognitive processes of lawyers were more deliberate (cf. Evans, 2008). However, lawyers did not consider all the legally relevant exculpatory circumstances that we presented, probably because for some of the legally exculpatory circumstances presented in this experiment it lies within the judges’ discretion whether they refrain from punishing or whether they only consider them as mitigating circumstances.

In addition to exculpatory circumstances we also added aggravating circumstances to our experimental paradigm to test whether such information enhances the logically valid answer of punishing the offender (cf. Manktelow & Fairley, 2000; Stevenson & Over, 1995). However, this was not the case, probably because of a floor effect. Laypeople’s severity perception of offences with neutral circumstances was already quite high (M = 5.51; SD = 0.86), so that their rate of no-punishment conclusions for offences with neutral circumstances was already low. Thus it is possible that further aggravating information did not have an additional effect on participants’ preference for punishment conclusions.

Experiment 2

In Experiment 1, we showed that lawyers and laypeople differ in their acceptance of exculpatory circumstances as counterexamples to conditional legal rules. However, we did not test whether this difference depends on moral outrage. It could be that lawyers and laypeople differed only because laypeople always reject violations of norms per se, irrespective of how morally outrageous an offence is. To test our hypothesis that laypeople’s punishment decisions depend on moral outrage, it is necessary to pair offences of differing degrees of moral outrage with the same kinds of circumstance information. If laypeople’s consideration of potential counterexamples depends on how morally outrageous the offence is, then the difference between lawyers and laypeople in no-punishment decisions found in Experiment 1 should diminish for low moral outrage offences, but persist for high moral outrage offences. Lower moral outrage towards an offence should make laypeople more willing to accept exculpatory counterexamples, as they do not feel the strong desire to punish the offender. Overall, lawyers’ decisions should not vary with the degree of moral outrage an offence might evoke, but only with what is prescribed by the penal code.

We also varied the participants’ perspective by phrasing two different instructions: one condition asked them to act according to their own sense of justice, while the other asked them to act as they think a real judge would. If decisions about exculpatory circumstances are influenced by moral outrage evoked by the offences, the effect of moral outrage should be higher for participants in the former group. Also, laypeople should be more certain of their decisions when instructed to make a decision based on their own sense of justice.

Method

Participants

Participants were 24 lawyers (15 female) and 40 laypeople (20 female). Three participants in the lawyers’ group were excluded from analysis because they failed to fulfill the inclusion criteria of having studied law for at least four semesters or having passed their intermediate law examination. Thus, the final sample of lawyers consisted of 21 participants (12 female). The mean age of the lawyers was 26.48 years (SD = 4.06); the mean age of the laypeople was 24.15 years (SD = 5.31). Six participants from the lawyers’ group had already graduated, the rest were still at university, having studied for 9.6 semesters on average.

Materials and design

We created 36 conditional problems that followed the same structure as those in Experiment 1, but used legal conditionals that differed in the level of moral outrage evoked by the offences. Maltreatment of wards and child sexual abuse were considered high moral outrage offences, handling stolen goods and breach of domestic peace were considered medium moral outrage offences, and illegal gambling and obtaining benefits by devious means were considered low moral outrage offences. These different offences were selected from a large and representative (N = 448; 315 female) preliminary study in which participants rated on a scale from 1 to 7 the level of moral outrage felt in response to N = 36 offences of the German penal code. High moral outrage offences received a mean rating of M = 6.83 (SD = 0.48), medium moral outrage offences a mean rating of M = 3.91 (SD = 1.42), and low moral outrage offences a mean rating of M = 2.34 (SD = 1.24).

As additional information we used relevant exculpatory and irrelevant control circumstances. Relevant exculpatory circumstances were taken from the General Part of the German penal code and described scenarios of (1) absence of criminal responsibility due to psychological disorders, (2) mistakes of law, or (3) necessity brought about by coercion. Irrelevant control circumstances also pertained to psychological disorders, mistakes of law, and situations of coercion, but were completely irrelevant to the offence (e.g., psychological disorders with no legal connection to the crime, like having crime-irrelevant problems with memory in a case of maltreatment of a ward). These control circumstances were selected from a larger pool from three online studies (N = 21, N = 20, and N = 27) and were used to ensure that participants attended to the task and read all of the circumstance information to make a decision. The crucial manipulation of Experiment 2 was that we paired each circumstance with each legal conditional. This allowed us to assess whether the same circumstances were weighted differently depending on the degree of moral outrage of the offence with which they were presented. Offenders described in the problems were always male. Examples of the circumstances can be found in Table 4, and the whole list of problems in the Appendix (Table 8).
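The full pairing of circumstances with legal conditionals can be sketched as a Cartesian product. The sketch assumes exactly one scenario per circumstance type and kind, which is an inference from the reported total of 36 problems (6 offences × 3 scenario types × 2 kinds) rather than something stated explicitly; the variable names are hypothetical.

```python
from itertools import product

# Offences named in the text, grouped by the moral outrage level assigned to them.
offences = [
    "maltreatment of wards", "child sexual abuse",              # high
    "handling stolen goods", "breach of domestic peace",        # medium
    "illegal gambling", "obtaining benefits by devious means",  # low
]
scenario_types = ["psychological disorder", "mistake of law", "coercion"]
circumstance_kinds = ["exculpatory", "control"]

# Assumption: one scenario per type and kind, paired with every offence.
problems = list(product(offences, scenario_types, circumstance_kinds))
print(len(problems))  # 36
```

Because every circumstance appears with every offence, any difference in how a circumstance is weighted can be attributed to the offence rather than to the circumstance itself.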

Table 4 Example of problems used in Experiment 2 translated into English. This problem describes relevant exculpatory vs. irrelevant control circumstances in a case concerning maltreatment of wards (“If a person maltreats a minor in their charge, then the person should be punished for maltreatment of wards”)

We gave two different sets of instructions specifying the perspective that participants should take during evaluation of the conclusion. All lawyers and half of the laypeople were instructed to imagine that they were a judge who always relies on prescriptions of the legal system (“legal system” instruction). The other half of the laypeople were instructed to imagine that they were a judge who makes decisions based on his or her own sense of justice irrespective of regulations of the legal system (“own sense of justice” instruction). Our experiment used a 3 (moral outrage: high vs. medium vs. low) × 3 (group: laypeople – own sense of justice vs. laypeople – legal system vs. lawyers – legal system) × 2 (circumstances: exculpatory vs. control) mixed design. The subject group was a between-subjects factor, and the other factors were within-subject factors.

Procedure

The procedure was the same as in Experiment 1, except that the instructions stated more explicitly that the participants’ task was to decide whether a legal conditional rule should be followed or not, and that the application of a rule would lead to punishment of the offender. The perspective to be taken by participants was given during the instructions and was highlighted in blue. After participants had read the instructions, we made sure that they understood the perspective to be taken by asking them to rephrase the instructions. The experiment took about 30 min.

Results

Perceived severity ratings (Manipulation check)

Perceived severity ratings (upper part of Table 5) were analyzed using a 3 (moral outrage: high vs. medium vs. low) × 3 (group: laypeople – own sense of justice vs. laypeople – legal system vs. lawyers – legal system) × 2 (circumstances: exculpatory vs. control) mixed ANOVA. We found a main effect of moral outrage, F(1.37, 79.57) = 323.64, p < .001, ηp² = .848. High moral outrage offences were perceived as more severe than medium moral outrage offences (t(60) = 16.36, p < .001, d = 2.37), and those as more severe than low moral outrage offences (t(60) = 8.39, p < .001, d = 0.57; Bonferroni adjusted alpha: 0.025). We also found an interaction between moral outrage and group, F(2.74, 79.57) = 5.34, p = .003, ηp² = .156; however, pairwise t-tests did not reach the Bonferroni adjusted alpha of 0.0167. Additionally, we found a main effect of circumstances, F(1, 58) = 72.30, p < .001, ηp² = .555, and an interaction between circumstances and moral outrage, F(2, 116) = 3.99, p = .021, ηp² = .064. Problems with exculpatory circumstances were perceived as less severe than problems with control circumstances, and this was especially the case for medium moral outrage offences, t(60) = 7.82, p < .001, d = 0.72 (for high moral outrage: t(60) = 6.82, p < .001, d = 0.70; for low moral outrage: t(60) = 5.77, p < .001, d = 0.51; Bonferroni adjusted alpha: 0.0167). All other effects were not significant (Fs < 1.66, ps > .200).
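As a reading aid, the reported effect sizes and adjusted alpha levels can be recomputed from the test statistics given above. The following is a minimal sketch, not the analysis script used for the experiment: the function names are hypothetical, the conversion from F to partial eta squared is the standard identity, and the family-wise alpha of .05 is an assumption inferred from the reported adjusted values (0.0167 for three comparisons, 0.025 for two).

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def bonferroni_alpha(n_comparisons, family_alpha=0.05):
    """Per-comparison alpha; the family-wise alpha of .05 is an assumption."""
    return family_alpha / n_comparisons


# Main effect of circumstances, F(1, 58) = 72.30
print(round(partial_eta_squared(72.30, 1, 58), 3))         # 0.555
# Circumstances x moral outrage interaction, F(2, 116) = 3.99
print(round(partial_eta_squared(3.99, 2, 116), 3))         # 0.064
# Main effect of moral outrage with corrected df, F(1.37, 79.57) = 323.64
print(round(partial_eta_squared(323.64, 1.37, 79.57), 3))  # 0.848
# Adjusted alpha levels for three and for two pairwise comparisons
print(round(bonferroni_alpha(3), 4), bonferroni_alpha(2))  # 0.0167 0.025
```

The identity also holds for sphericity-corrected degrees of freedom, because the correction scales both degrees of freedom by the same factor.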

Table 5 Mean severity ratings and certainty ratings (and standard deviations) for problems with irrelevant control and relevant exculpatory circumstances in Experiment 2

Certainty ratings

Certainty ratings (lower part of Table 5) were analyzed using a 3 (moral outrage: high vs. medium vs. low) × 3 (group: laypeople – own sense of justice vs. laypeople – legal system vs. lawyers – legal system) × 2 (circumstances: exculpatory vs. control) mixed ANOVA. We found main effects of group, F(2, 58) = 25.08, p < .001, ηp² = .464, of circumstances, F(1, 58) = 69.80, p < .001, ηp² = .546, and of moral outrage, F(2, 116) = 8.36, p < .001, ηp² = .126, and interactions between circumstances and moral outrage, F(1.84, 106.41) = 16.30, p < .001, ηp² = .219, moral outrage and group, F(4, 116) = 2.94, p = .023, ηp² = .092, and group and circumstances, F(2, 58) = 4.99, p = .010, ηp² = .147. Participants were more certain about their decisions in cases of irrelevant than in cases of relevant exculpatory circumstances, primarily in cases of high moral outrage, t(60) = 10.60, p < .001, d = 1.53, followed by medium, t(60) = 4.64, p < .001, d = 0.67, and low moral outrage, t(60) = 3.53, p = .001, d = .51 (Bonferroni adjusted alpha: 0.0167). In cases of high moral outrage, certainty ratings did not differ between laypeople and lawyers, F(2, 58) = 1.78, p = .117, ηp² = 0.058. However, they did in cases of medium, F(2, 58) = 8.24, p = .001, ηp² = 0.221, and low moral outrage, F(2, 58) = 11.596, p < .001, ηp² = .286. Laypeople in the own sense of justice and in the legal system group were less certain than lawyers in cases of medium (t(39) = 2.47, p = .018, d = 0.77; t(37.99) = 4.03, p < .001, d = 1.25; respectively, Bonferroni adjusted alpha: 0.025), and in cases of low moral outrage (t(29.04) = 4.20, p < .001, d = 1.29; t(39) = 3.75, p = .001, d = 1.17, respectively, Bonferroni adjusted alpha: 0.025).
In cases of relevant exculpatory circumstances, laypeople in the own sense of justice group were more certain about their decisions than laypeople in the legal system group, t(34.94) = 2.95, p = .006, d = 0.93 (for irrelevant circumstances p > .370; Bonferroni adjusted alpha: 0.025). The three-way interaction was not significant, F(3.670, 106.41) = 0.61, p = .643.

Defeated conclusions (no-punishment)

Percentages of no-punishment conclusions are shown in Table 6. As participants hardly considered irrelevant control circumstances, we only conducted a 3 (moral outrage: high vs. medium vs. low) × 3 (group: laypeople – own sense of justice vs. laypeople – legal system vs. lawyers – legal system) mixed ANOVA for problems with relevant exculpatory circumstances. We found main effects of group, F(2, 58) = 12.70, p < .001, ηp² = .305, of moral outrage, F(1.80, 104.31) = 45.11, p < .001, ηp² = .437, and an interaction between both factors, F(3.60, 104.31) = 3.75, p = .009, ηp² = .114. For low moral outrage offences there were no differences between lawyers and both groups of laypeople, F(2, 58) = 2.63, p = .081, ηp² = .083. However, there were differences in cases of medium, F(2, 58) = 6.84, p = .002, ηp² = .191, and especially in cases of high, F(2, 58) = 18.91, p < .001, ηp² = .395, moral outrage. Laypeople in the own sense of justice group and in the legal system group made fewer no-punishment conclusions than lawyers in cases of medium moral outrage (t(34.31) = 2.83, p = .008, d = .89; and t(39) = 3.77, p = .001, d = 1.18; respectively) and even fewer no-punishment conclusions in cases of high moral outrage (t(39) = 5.55, p < .001, d = 1.73; and t(39) = 4.92, p < .001, d = 1.54; respectively). The two groups of laypeople did not differ from each other either in cases of high or in cases of medium moral outrage (ts < 0.595, ps > .555; Bonferroni adjusted alpha: 0.0167). Note that, in accordance with our last hypothesis, lawyers’ no-punishment decisions for high and low moral outrage offences did not differ significantly, t(20) = 1.60, p = .126, d = 0.47.

Table 6 Percentages (and standard deviations) for no-punishment conclusions in Experiment 2

Decision times

Decision times for punishment and no-punishment conclusions for problems with relevant exculpatory circumstances were analyzed in two separate 2 (conclusion: punishment vs. no-punishment) × 3 (moral outrage: high vs. medium vs. low) within-subjects ANOVAs – one for laypeople and one for lawyers (Fig. 2). Laypeople were analyzed as a single group because the two sets of instructions (own sense of justice and legal system) did not affect their punishment conclusions. Only participants from whom we had punishment and no-punishment conclusions in each moral outrage condition were considered in the analysis (25 laypeople and 12 lawyers). This was necessary to be able to make reliable within-subject comparisons. Due to technical problems, decision times of one participant were not included in the analysis. To control for different sentence lengths, we adjusted decision times by computing the latency per character for each sentence and multiplying it by the mean sentence length.
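The length adjustment described above can be illustrated with a short sketch. It is not the original analysis code: the function name is hypothetical, sentence length is assumed to be counted in characters including spaces, and the mean length is assumed to be taken over all sentences entering the analysis.

```python
def adjust_decision_times(raw_times_ms, sentences):
    """Scale each decision time to a sentence of average length.

    Latency per character is computed for every sentence and then
    multiplied by the mean sentence length (in characters).
    """
    lengths = [len(sentence) for sentence in sentences]
    mean_length = sum(lengths) / len(lengths)
    return [time / length * mean_length
            for time, length in zip(raw_times_ms, lengths)]


# Hypothetical data: the second sentence is twice as long and took twice
# as long to judge, so both adjusted times are identical.
times = adjust_decision_times([400, 800], ["aaaa", "aaaaaaaa"])
print(times)  # [600.0, 600.0]
```

After the adjustment, differences between conditions can no longer be attributed to some conclusions simply being longer to read than others.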

Fig. 2 Decision times for punishment and no-punishment conclusions for laypeople and lawyers in Experiment 2, separated by the moral outrage (MO) evoked by the conditionals. Error bars represent standard errors

For laypeople, we found no main effects (Fs < 2.38, ps > .136), but a significant interaction between conclusion and moral outrage, F(1.63, 39.13) = 5.39, p = .013, ηp² = .183. As shown in Fig. 2, whereas laypeople’s decision times for punishment conclusions did not differ according to moral outrage, F(1.35, 32.44) = 0.96, p = .392, ηp² = .038, the decision times for their no-punishment conclusions did, F(1.21, 28.93) = 5.81, p = .018, ηp² = .195. Descriptively, in cases of no-punishment, decision times were longer for high than for medium moral outrage, t(24) = 2.22, p = .036, d = .26, and decision times for medium moral outrage were longer than those for low moral outrage, t(24) = 2.08, p = .048, d = .43. Even though these comparisons did not reach the Bonferroni adjusted alpha level of 0.025, the linear trend analysis was significant, F(1, 24) = 6.58, p = .017, ηp² = .215. This interaction between conclusion and moral outrage was not replicated for lawyers, F(2, 22) = 0.53, p = .596, ηp² = .046. Rather, there was only a main effect of decision, F(1, 11) = 5.23, p = .043, ηp² = .322, with lawyers taking overall more time to decide on punishment than on no-punishment. The main effect of moral outrage was also not significant, F(1.35, 14.80) = 0.32, p = .647, ηp² = .028.

Discussion

The results of Experiment 2 show that laypeople’s decisions about exculpatory circumstances depend on how morally outrageous the offence in the legal conditional is. When the offence was of high moral outrage, laypeople seldom decided not to punish. Yet, when the offence was of low moral outrage, laypeople decided in the majority of the cases not to punish the offender. Consequently, laypeople’s punishment decisions did not differ from lawyers in cases of low moral outrage, but did in cases of medium and especially in cases of high moral outrage. This suggests that laypeople do not reject exculpatory circumstances because of the violation of a norm per se, but because the moral outrage evoked by the violation affects the way they deal with exculpatory circumstances. Likewise, since the different offences were paired with the very same exculpatory circumstances (absence of criminal responsibility due to psychological disorders, mistakes of law, and situations of necessity brought about by coercion), the different punishment decisions laypeople made cannot be attributed to not recognizing these circumstances as exculpatory. All in all, the fact that laypeople sometimes decided to punish in light of a given exculpatory circumstance and sometimes not shows that their consideration of exculpatory circumstances as counterexamples depended on moral outrage.

Our hypotheses are also supported by the decision times: the higher the moral outrage, the longer laypeople took to reach a no-punishment decision; this reflects the difficulty in deciding against moral outrage. Also, as shown descriptively in Fig. 2, when the offence was only of low moral outrage, decision times for not punishing were faster than for punishing. Cases of illegal gambling or obtaining benefits by devious means are not offences with a high moral necessity of punishment, so deciding in favor of punishment is almost counterintuitive and may consequently take longer. Accordingly, as can be seen in Table 6, laypeople also often chose to not punish the offender for low moral outrage offences with irrelevant circumstances. However, also in cases of medium moral outrage no-punishment decisions were somewhat faster than punishment decisions. Though they were chosen to evoke some amount of moral outrage, the severity ratings showed that these offences were not considered very severe (around 3.5 on a 7-point scale). Therefore these offences, too, were likely judged as not deserving strict punishment.

Lawyers decided about exculpatory circumstances as prescribed by the penal code. They were somewhat stricter in cases of high moral outrage, but this was probably due to the legal principle of proportionality rather than primarily to moral outrage. This interpretation is supported by the decision times, where no significant differences depending on moral outrage were found. In fact, lawyers were always faster in selecting a no-punishment conclusion, indicating that most of the exculpatory circumstances were recognized quickly and without bias. The long decision times for punishment conclusions indicate that when lawyers decided to incorrectly reject an exculpatory circumstance, this was a hard decision for them. However, because of the small sample size of people selecting punishment as well as no-punishment conclusions for all conditions, these results should be interpreted with caution.

The two different instructions (own sense of justice vs. legal system) we gave to laypeople did not affect their punishment conclusions. We expected a higher moral outrage effect for laypeople in the own sense of justice group than for laypeople in the legal system condition. Yet, the effect of moral outrage was found in both conditions. One possible explanation is that participants did not follow the instructions to decide on the basis of the regulations of the legal system. However, we do not think this was the case: laypeople given legal system instructions seemed to understand the perspective they were to take. On the one hand, they were less certain than laypeople in the own sense of justice group in deciding about problems with relevant exculpatory circumstances. On the other hand, laypeople assigned to the legal system group reported in an open-ended questionnaire at the end of the experiment that they followed the instructions and tried to reason like a real judge. Nevertheless, 65 % of them also said that this was a difficult task due to conflicts with their own sense of justice or that they were aware that their opinions and sense of morality still influenced their decisions. This indicates how deeply our morality and sense of justice are ingrained in our beliefs about rules and how this affects people’s willingness to defeat the conclusion from a legal conditional rule.

Experiment 3

In the previous two experiments we always presented the potential counterexample to the rule (i.e., exculpatory circumstances) together with the conditional and the categorical statement. Thus, the participants were not instructed to think of counterexamples to a rule themselves. But how well can people themselves retrieve counterexamples to a legal rule from memory? And is the availability of exculpatory circumstances affected by the level of moral outrage evoked by an offence? Some researchers have already highlighted the importance of memory in accepting conditional rules (Chan & Chua, 1994; De Neys et al., 2003a; Markovits & Barrouillet, 2002; see also Markovits & Quinn, 2002), arguing that when people make a conditional inference, they search their memory for domain-relevant information, e.g., counterexamples to the rule. The discovery of counterexamples in memory increases the probability of not accepting the conditional rule and triggers the denial of MP inferences (De Neys et al., 2003a). Along these lines, if the search for counterexamples in memory is essential to the application of conditional rules, then our previous experiments might indicate that the ability to recall valid counterexamples for legal rules varies between lawyers and laypeople. To test this, we changed our paradigm and asked participants to generate exculpatory circumstances in a paper-and-pencil task. Our hypotheses were that (1) lawyers know exculpatory circumstances from their law studies and should therefore be able to recall them independently of moral outrage, whereas (2) laypeople’s capacity to retrieve exculpatory circumstances depends on the moral outrage evoked by the offence: the higher the feelings of moral outrage, the more difficult it should be to retrieve an exculpatory circumstance.
As the number of exculpatory circumstances in memory may be confounded with the familiarity of the domain, we also asked participants to generate aggravating circumstances and compared those with the number of exculpatory circumstances. We predicted that it should be more difficult for laypeople to think of examples of exculpatory circumstances than examples of aggravating circumstances, and this difficulty should vary with the moral outrage evoked by the offence. In contrast, lawyers’ amount of retrieved exculpatory circumstances should not depend on moral outrage.

Method

Participants

Participants were 20 lawyers (nine female) and 20 laypeople (13 female). One additional layperson also participated but was unable to complete the experiment and was removed from the data file. The mean age of lawyers was 25.4 years (SD = 1.96); the mean age of laypeople was 23 years (SD = 1.41; 5 missing values). Two participants from the lawyers’ group had finished their law studies. The rest were still at university and had studied for 9.2 semesters on average.

Material and design

We selected six offences from the German penal code: theft, coercion, bodily injury, abortion, manslaughter, and incest. These offences differ in their penalty range and were selected on the basis of the number of exculpatory and aggravating circumstances in the German penal code. We conducted an online study (N = 312; 224 female) to measure levels of moral outrage evoked by these offences. Participants rated their level of moral outrage on a 7-point Likert scale (1 = no moral outrage, 7 = great moral outrage). This online study showed that the offences evoke different levels of moral outrage: manslaughter (M = 6.54; SD = 0.83), bodily injury (M = 5.71; SD = 1.12), coercion (M = 5.15; SD = 1.24), theft (M = 4.33; SD = 1.30), incest (M = 4.31; SD = 1.89), and abortion (M = 2.55; SD = 1.72).

In the main study, offences were presented in a paper booklet consisting of two parts. One part asked for exculpatory and mitigating circumstances, and the other part asked for aggravating circumstances. The order of these parts was counterbalanced across participants. On each page there were two offences. The sequence of pairs of offences over all problems was randomized. We also asked for mitigating circumstances to guarantee that exculpatory circumstances were actually considered exculpatory and not just mitigating. All material was presented in German. We utilized a 3 (category: exculpatory vs. mitigating vs. aggravating) × 2 (group: laypeople vs. lawyers) design. However, as mitigating circumstances were only used to clarify the distinction between exculpatory and aggravating circumstances, these were not included in the analysis.

Procedure

The experiment was a paper and pencil experiment and participants were tested either in groups or individually. In the instructions, we explained the meaning of exculpatory, mitigating, and aggravating circumstances. Participants were instructed to write down all possible situations they would consider exculpatory, mitigating, or aggravating circumstances for a given offence. Exculpatory circumstances were described as circumstances which prevent punishment entirely, mitigating circumstances as those that lower a sentence, and aggravating circumstances as those that elevate a sentence. Participants were told that it was irrelevant whether the situations were regulated in the penal code. One sample problem was given to illustrate the tasks. There were no time restrictions. The experiment took about 45 minutes. All participants received monetary compensation for their participation.

Results

Two raters independently counted the number of situations generated for the different offences (Kendall’s tau = .967 for exculpatory circumstances; Kendall’s tau = .949 for aggravating circumstances). The mean number of these situations (i.e., counterexamples) was analyzed using a 2 (circumstances: exculpatory vs. aggravating) × 2 (group: laypeople vs. lawyers) mixed ANOVA. We found main effects of group, F(1, 38) = 14.03, p = .001, ηp² = .270, of circumstances, F(1, 38) = 6.43, p = .015, ηp² = .145, and an interaction between group and circumstances, F(1, 38) = 9.28, p = .004, ηp² = .196. Laypeople (M = 3.56; SD = 1.42) and lawyers (M = 4.38; SD = 1.56) did not differ in the number of aggravating circumstances generated, t(38) = 1.73, p = .092, d = 0.55, but laypeople generated significantly fewer exculpatory circumstances (M = 1.60; SD = 0.75) than lawyers (M = 4.56; SD = 3.17), t(21.133) = 4.06, p = .001, d = 1.28. Moreover, lawyers did not generate different numbers of exculpatory and aggravating circumstances, t(19) = 0.29, p = .778, d = 0.06, whereas laypeople listed twice as many aggravating as exculpatory circumstances, t(19) = 6.15, p < .001, d = 1.67 (Bonferroni adjusted alpha: 0.013).
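The fractional degrees of freedom in the comparison of exculpatory circumstances are what one would expect from a Welch correction for unequal variances (lawyers’ SD was more than four times that of laypeople). The sketch below recomputes the test from the reported means and standard deviations with 20 participants per group. It assumes that a Welch t-test was used and that Cohen’s d was based on the pooled standard deviation; both are assumptions, and the function names are hypothetical.

```python
import math


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d with a pooled SD for equal group sizes (assumed formula)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean2 - mean1) / pooled_sd


# Exculpatory circumstances: laypeople (M = 1.60, SD = 0.75)
# versus lawyers (M = 4.56, SD = 3.17), n = 20 per group
t, df = welch_t_test(1.60, 0.75, 20, 4.56, 3.17, 20)
d = cohens_d(1.60, 0.75, 4.56, 3.17)
print(f"t({df:.2f}) = {t:.2f}, d = {d:.2f}")
```

The recomputed values agree with the reported t(21.133) = 4.06 and d = 1.28 up to small deviations that arise because the published means and standard deviations are rounded.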

To test whether the difference in number of exculpatory and aggravating circumstances was related to moral outrage, we computed the difference between the amount of aggravating and amount of exculpatory circumstances for each offence. As expected, we found such an effect: the higher the moral outrage evoked by an offence, the fewer exculpatory (compared to aggravating) circumstances laypeople generated (Fig. 3), with the following trend: manslaughter > bodily injury > coercion > theft > incest > abortion. This rank order was corroborated by Page’s trend test, Page’s L = 1628, p < .01, and resembles the moral outrage ratings from the online study for the different offences. Lawyers did not show this trend (although Page’s trend was still significant, Page’s L = 1545.5, p < .05, but as can be seen in Fig. 3, the pattern among offences was not clear for this group and did not resemble that of laypeople at all).
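To give the reported L values a scale, the following sketch computes Page’s L for a matrix with one row per participant and one column per offence, with columns ordered from the lowest to the highest predicted difference score. The data rows are hypothetical and only illustrate the computation; the function names are likewise hypothetical. The bounds use the standard expressions for the expected value of L under the null hypothesis and for its maximum.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def page_l(data):
    """Page's L: sum over columns of (predicted position x column rank sum)."""
    k = len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return sum((col + 1) * rank_sums[col] for col in range(k))


def page_l_bounds(n, k):
    """Expected L under the null hypothesis and maximum possible L."""
    expected = n * k * (k + 1) ** 2 / 4
    maximum = n * k * (k + 1) * (2 * k + 1) / 6
    return expected, maximum


# Hypothetical difference scores for three participants and six offences
toy = [[0, 1, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6], [0, 0, 2, 1, 3, 5]]
print(page_l(toy))
print(page_l_bounds(20, 6))  # (1470.0, 1820.0)
```

With 20 participants and six offences, L can range up to 1820 and has an expected value of 1470 when there is no trend, so the reported values of 1628 (laypeople) and 1545.5 (lawyers) both lie above chance, with the laypeople’s value closer to a perfectly ordered pattern.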

Fig. 3 Mean differences between the amount of aggravating and exculpatory circumstances per offence in Experiment 3. Error bars represent standard errors

Discussion

The results of this experiment suggest that lawyers and laypeople have different mental representations of exculpatory circumstances. Whereas lawyers easily generated exculpatory and aggravating circumstances, laypeople had difficulties in thinking of exculpatory circumstances, especially for offences of high moral outrage. This shows that the effect of moral outrage is not limited to our defeasible reasoning problems; it also affects retrieval of exculpatory circumstances from memory. However, one can still argue that the difficulty in retrieving exculpatory circumstances for specific offences does not indicate that they are not stored in memory. Exculpatory circumstances may be stored in memory, but not used or retrieved because they are not in accordance with the person’s moral values. This explanation is plausible and might also apply for other experiments where participants are asked to produce counterexamples (e.g., Cummins, 1995; Cummins et al., 1991; De Neys et al., 2003b). However, when investigating how people reason with legal conditionals, only the counterexamples which are retrieved as such are important for the inference process. The results of Experiment 2 show that, when instructed to act like a real judge, laypeople still decide according to moral outrage, which in turn indicates that even if there are some counterexamples stored somewhere in memory, they are rarely considered and therefore have no observable effect on reasoning.

Despite these correspondences between the amount of exculpatory circumstances retrieved from memory and the moral outrage of an offence, the relationship is only correlational. We cannot know whether the difficulty in generating exculpatory circumstances is caused by the moral outrage evoked by the offence or whether the moral outrage evoked by an offence is caused by a small number of exculpatory circumstances stored in memory. Although we are not yet able to clarify this here, we believe that the influence is bidirectional: if not finding many exculpatory circumstances leads participants to classify an offence as highly morally outrageous, then this assessment will in turn hinder them when searching for other possible exculpatory circumstances.

General discussion

In this article we showed that lawyers and laypeople defeat conclusions from legal conditionals in light of exculpatory circumstances, but to a different degree. Lawyers seem to weigh circumstance information according to what is prescribed by the penal code, but laypeople seem to base their decisions on their own sense of justice, guided by feelings of moral outrage. Because of that, laypeople had difficulties in accepting exculpatory circumstances when the offence was of high moral relevance, and therefore adhered more strongly to an initial conditional rule than lawyers. Consequently, compared to lawyers, laypeople had difficulties in withdrawing a logically (and perhaps morally) valid conclusion, even when instructed to decide like an actual judge. In Experiment 3 we found evidence that this difficulty seems to arise from an inability to retrieve exculpatory circumstances for morally outrageous offences from memory.

Our results have several theoretical as well as practical implications. First, our studies show that accepting a given fact as a counterexample to a rule is not a trivial task. The acceptance of counterexamples depends on a person’s domain knowledge and on how emotionally attached a person is to the initial conditional rule. Several studies have already acknowledged the importance of domain knowledge in reasoning by showing that people with and without domain knowledge draw different conclusions (Chan & Chua, 1994; Cummins, 1995; Markovits, 1986). However, little is said about what people without elaborated domain knowledge actually do during reasoning. While Chan and Chua (1994) proposed that people without domain knowledge only have “simple and ill-defined” schemas (p. 234) about the domain in question, we show that in legal reasoning these ill-defined schemas depend on one’s own sense of justice and are thus influenced by the emotional adherence to the conditional rule. In this way, our results suggest that moral outrage affects the perceived necessity of the antecedent (i.e., the offence) for its consequent (i.e., the punishment). Our results can thus be integrated in the broader framework of research on the role of emotions in judgements (e.g., Bechara, Damasio, Tranel, & Damasio, 1997; Schwarz & Clore, 1983, 2013).

However, although the results of our experiments show that laypeople’s decisions depend on how morally outrageous the initial offence is, further studies are necessary to show whether moral outrage affected all decisions in the same way. We started this paper assuming that laypeople do not have elaborated knowledge about the penal code. This is probably true; however, this does not mean that they do not have any knowledge at all. It might be that the effect of moral outrage is stronger the more uncertain participants are. It might even be possible that for some problems participants think they know for sure what the “correct” answer is, leaving unclear whether in those cases moral outrage affects conclusions at all. Although the effect of moral outrage might indeed depend on the level of uncertainty, we do not think that there are instances for laypeople where it does not play a role at all. Even if cases of complete certainty did exist, we think moral outrage would still influence conclusions, e.g., by making those decisions more difficult or easier. Indeed, the participants in the legal system condition in Experiment 2 stated that it was difficult to decide contrary to their own sense of justice, suggesting that there are cases where they know that the law and their own moral preferences are in conflict. In fact, Experiment 2 shows that laypeople actually knew that the circumstances we presented are exculpatory, deciding not to punish in cases of low and medium moral outrage. However, when the offence was of high moral outrage they did not accept them anymore, probably because of conflicts with moral outrage.

One might also argue that it was not moral outrage but severity that affected laypeople’s punishment decisions, especially because the moral outrage values we used were based on the results of a large online study rather than measured directly on the participants who completed the experiments in the laboratory (which we did to reduce demand characteristics). In theory, severity and moral outrage are not the same, because there are actions that can be categorized as not severe because they are not harmful (such as eating one’s own dead pet dog) although they evoke negative emotions (see Haidt, 2001). However, experiments have shown that severity perception primarily depends on the moral wrongfulness of an act: the more severe people perceive an offence to be, the more morally wrong the offence is, and the higher the moral outrage participants feel (Alter et al., 2007; Darley, 2009; Gromet & Darley, 2009; see also Young et al., 2007). Aspects such as harmfulness (as in the pet dog example) are actually not that relevant for severity perception (e.g., Alter et al., 2007; Carlsmith et al., 2002; see also Keller et al., 2010). For instance, in one study Alter et al. (2007) created offences where moral wrongfulness and harmfulness were disentangled and showed that it was moral wrongfulness that predicted desired sentence severity. Similarly, Carlsmith et al. (2002) and also Darley et al. (2000) showed in path models that the effect of seriousness of offences on punishment is mediated by moral outrage. Yet, some variance was still explained directly by severity. We can therefore conclude that although the majority of the literature supports the interpretation that moral outrage is responsible for our results, there might also be an effect of severity or harm which needs further investigation. Further studies measuring physiological markers might be useful.

Second, our results support the argument that the division between reasoning and decision making is artificial (Evans, 2012). The effect of moral outrage on laypeople’s acceptance of exculpatory circumstances (and corresponding retraction of the MP conclusion) can be interpreted as a utility-maximizing strategy. Bonnefon (Bonnefon, 2009, 2012; Bonnefon, Girotto, & Legrenzi, 2012; Bonnefon & Hilton, 2004; Bonnefon & Sloman, 2013) already showed that people consider utilities when predicting whether an action described in the antecedent of a conditional will be performed (e.g., “If you turn the radio on one more time, then I will hit you”). In our data, laypeople’s reluctance to accept exculpatory circumstances suggests that for laypeople – probably because of moral outrage – the utility of punishing somebody is greater than the utility of acquitting somebody of an offence. This utility-based explanation would account for why laypeople decide to rely on the legal conditional rule (Experiments 1 and 2) even when they can actually think of at least one exculpatory circumstance (Experiment 3). However, utilities cannot account for the conclusions made by lawyers, who actually know which information invalidates which conclusion without the aid of computing utilities. Lawyers – contrary to laypeople – probably decide about exculpatory circumstances in a more or less deductive, emotion-free way, thus allowing an interpretation of our results in light of dual process theories, which have been proposed as accounts of reasoning in general (Evans, 2003; Kahneman, 2011; Verschueren, Schaeken, & d’Ydewalle, 2005) and also, more specifically, as accounts of moral judgments (Greene, Sommerville, Nystrom, Darley, & Cohen, 2001; Haidt, 2001).

Interpreting our results in light of dual process theories in moral judgment is tempting, yet further studies are necessary. On the one hand, it is still necessary to test whether laypeople’s and lawyers’ decisions are the results of different cognitive processes. Our results show that laypeople’s decisions may depend on emotions, whereas lawyers’ decisions do not. However, this does not guarantee that the two groups really reason in different ways. It is still possible that both reason similarly and that laypeople are only biased by their feelings of moral outrage. On the other hand, further studies are necessary to test the role of emotions for lawyers. Lawyers’ decisions presumably did not depend on emotions; however, it is not clear whether lawyers do not react emotionally to offences at all, or whether they are simply able to control their emotions and inhibit their effect. Studies including physiological data may be necessary to understand the process of what is often referred to as “thinking like lawyers” (Goodenough, 2001, p. 41).

In this respect, one could criticize our studies because our lawyer group consisted mainly of advanced law students or recently graduated lawyers. Although this is true, we do not think that it has important consequences for our interpretations. Our problems were quite simple from a legal point of view, and the only challenge was to recognize a certain exculpatory circumstance correctly. Law students in Germany learn to do this in the middle of their studies, so the advanced law students we tested should be familiar with the relevant details of the penal code. In fact, more seasoned lawyers normally specialize in a particular legal domain (e.g., civil law) and may not remember all the details of the penal code. One exception would be criminal court judges, yet convincing them to participate in our study would have been too difficult. Furthermore, an initial comparison of our advanced law students and the already graduated lawyers did not show any relevant significant differences. However, what we could do in further studies is to test law students during their first semesters of studies. We can imagine that the effect of moral outrage on punishment decisions decreases especially in the first years of studies; testing this could refine our interpretations.

Finally, our results also have implications for society. The reported experiments show that investigating reasoning with legal conditionals is of interest beyond the rather abstract investigation of defeasible reasoning. Our results help to explain why people are often annoyed when they hear about offenders released on parole or when they hear that offenders “only” get a hospital treatment order: when offences evoke high moral outrage, laypeople have difficulty accepting exculpatory circumstances. This difficulty does not necessarily mean that laypeople do not know that a specific circumstance may be exculpatory. Rather, our results suggest that people reject exculpatory circumstances when accepting them would mean acting against their own sense of justice.

The implications of laypeople rejecting situations that the penal code labels as exculpatory are problematic. Although our problems were fictitious and had limited external validity, they show that laypeople perceive offences differently than the legal system does. Darley (2001, 2009) previously discussed the negative consequences such discrepancies can have for society. We agree with Darley that people might be willing to follow and respect a legal rule only if they perceive it as just and right. Therefore, if the discrepancies observed in the present study are also found in practice, it may be interesting to interpret our findings in light of the philosophical debate about the importance of moral correctness in law (see Alexy, 2008).