Introduction

When a litigant needs to prove that a certain result was caused in a specific way, what could make a stronger case than presenting the infinitesimal probability of that result emanating from an alternative cause? When Sally Clark was put on trial in England for the murder of her two babies, the prosecution called upon Professor Sir Roy Meadow, a leading British paediatrician of the time, who testified that the probability of two cases of Sudden Infant Death Syndrome (SIDS, or ‘cot death’) occurring in a family similar to Clark’s was just 1 in 73 million.Footnote 1 The prosecution attempted to use this low probability to support the claim that the cause of death in both cases was their mother’s conduct. The use of evidence of this sort is not unique to the English legal system. When John Veysey was tried in the United States for insurance fraud after he claimed, for the fourth time, that his house had burnt down, an actuary testified that the probability that four houses belonging to the same person would accidentally be burnt to the ground in succession was 1 in 1.773 trillion.Footnote 2 Statistical evidence is used not only in Common Law but also in Continental jurisdictions.Footnote 3 In the Netherlands, Lucia De Berk was accused of murdering seven babies and attempting to murder three more in the hospital department where she worked as a nurse. An expert witness for the prosecution testified that the probability of so many natural deaths coinciding with a particular nurse’s shift pattern was 1 in 342 million.Footnote 4 It seems that adducing statistical evidence that establishes the infinitesimal probability of alternative explanations is an effective way for the prosecution to convince the fact-finder that the results in question (here, the infants’ deaths or the house fires) were caused by the accused.

The use of such evidence has attracted criticism both in the academic literature and in public opinion, but these objections have mainly focused on the technical mistakes made when using this type of evidence, as well as the cognitive biases it often induces. For example, before Clark’s conviction was quashed, the statistical calculation was described by Professor Philip Dawid, forensic statistician and expert witness on behalf of the defence, as ‘extremely dubious’.Footnote 5 Following Clark’s release, the wrongful conviction was predominantly attributed to this calculation,Footnote 6 both by expert statisticiansFootnote 7 and the media.Footnote 8 More generally, it has been argued that the use of statistical evidence by courts carries the risk of misusing statistics,Footnote 9 misrepresenting the information the evidence conveys,Footnote 10 and creating a false impression of accuracy.Footnote 11

However, the problem cannot be rooted in the statistical nature of the evidence, because there are cases in which some statistical evidence seems unobjectionable (such as DNA evidence or calculating compensation for loss of income using statistics on average life expectancy).Footnote 12 While the difficulties of using statistics in court are genuine, they are technical and may be addressed through better education of the legal profession and/or reliance on adequately trained expert statisticians. Even when statistical evidence is gathered, analysed, and presented in a professional and reliable manner, the question remains: should a litigant be allowed to support their contention that the result was caused in a specific manner with evidence that the probability of an alternative cause is minute? This question is equally applicable to both juries and professional judges.

In this paper, I first argue that the contention that a result was due to a certain cause (let us call it C) should remain unaffected by statistical evidence of the extremely low probability of an alternative cause (C*) alone. The paper explains why inferring that a given outcome was the result of a certain cause (C) from the frequency of C’s occurrence requires this frequency to be contrasted, at least implicitly, with that of the result emanating from a different potential cause, C*. In the second half of the paper, I contend that performing this calculation is problematic as a matter of principle, irrespective of its practical difficulties. While I share the view that the various accounts that seek to justify why statistical evidence on the probability of criminal activity should not be admissible in court have been unsuccessful,Footnote 13 I do not defend this view here.Footnote 14 Instead, I seek to propose an alternative account based on a general theory that I have developed elsewhere,Footnote 15 according to which the use of generalisations in legal fact-finding is connected to the issue of free will. When both parts of the argument are combined, the following conclusion emerges: if one piece of statistical evidence (on the minute frequency of an alternative cause) is not probative unless contrasted with another piece of statistical evidence (on the frequency of the alleged criminal conduct), and if the latter is inadmissible (because its use is objectionable as a matter of principle), then both pieces of statistical evidence should be inadmissible. It follows that using statistical evidence on the low probability of an alternative cause is objectionable, regardless of how reliable the statistical analysis is.

Contrastive Causation

This section utilises Jonathan Schaffer’s theory of contrastive causation from the philosophy of science,Footnote 16 because this theory makes the contrastive aspect of causal claims explicit and integral to the understanding of causation.Footnote 17 Schaffer claims that causation is not a connection between two relata, cause and effect, as is customarily thought. Rather, causation effectively ties four relata: actual cause, contrastive cause, actual effect, and contrastive effect. According to this theory, instead of formulating the explanation as ‘cause C caused effect E’, the full formulation would be ‘cause C and not cause C* caused effect E and not effect E*’. C* (and E*) could also consist of a disjunction of several causes (and effects), C*1, C*2, C*3, and so on.Footnote 18

The Israeli Russo-Lupo case illustrates how the theory of contrastive causation can elucidate causal disputes in court.Footnote 19 Dr Russo-Lupo was convicted of manslaughter and sentenced to eight years in prison. While working as an anaesthetist during eye surgery on a three-and-a-half-year-old girl, the doctor lowered the volume of the heart monitor below routine levels, accidentally administered an excessively high dose of anaesthetic for a prolonged period, and then took a nap nearby.Footnote 20 About 40 minutes after the anaesthesia was administered, the girl’s heart stopped; but, with the volume turned down, the monitor’s alarm could not be heard. Sometime later, the surgical nurse in the operating room noticed the visual prompt on the monitor and resuscitation efforts began. Unfortunately, the girl died three days later.

Dr Russo-Lupo argued that there was no factual causal connection between her lowering the monitor’s volume and the girl’s death, claiming that, even if she had left the volume at the normal level, it would not have helped save the girl’s life. The Israeli Supreme Court upheld her conviction and rejected her claim. Formulating Dr Russo-Lupo’s argument and the court’s response using the theory of contrastive causation highlights that the dispute centres on the relevant contrast between the actual cause and the contrastive cause. Specifically, is the relevant contrast between ‘lowering the volume’ and ‘not lowering the volume’ (as the defence contended), or between ‘going to sleep’ and ‘not going to sleep’ (as the court determined)? The court accepted Russo-Lupo’s claim that the prosecution failed to prove that ‘quietening’ as opposed to ‘not quietening the monitor’ caused ‘the girl’s death’ and not ‘saving the girl’s life’,Footnote 21 because, even if the doctor had left the monitor’s sound level untouched, it would not have helped her to notice the girl’s distress (because she went to sleep). However, the court determined that the fact that Dr Russo-Lupo went to sleep, as opposed to staying awake, caused the girl’s death and not saving her life. The court explained that, if Dr Russo-Lupo had been awake, she could have saved the girl’s life in one of the following two ways: either by noticing the slowing pulse based on the beeping of the monitor (despite the lowered volume) or through its visual display. Either could have prompted her to prevent the cardiac arrest and save the girl’s life.

The contrastive cause chosen by the court is the one relevant to the legal context of attributing criminal responsibility. Even if some of Dr Russo-Lupo’s actions did not cause the victim’s death, it suffices that one of her actions did. This example illustrates that when an accused’s conduct comprises more than one aspect that violates the criminal prohibition, the prosecution may choose on which aspect to focus. In other words, it is the prosecution rather than the accused that chooses the contrastive cause in the contrastive causal claim that ought to be proven.

Proving Causation

Now that the structure of causal claims has been outlined, this Section proceeds to discuss how a causal claim may be proven or disproven and shows the implications of the theory of contrastive causation for this issue. While the academic literature has discussed the legal implications of the theory of contrastive causation, that discussion has focused on substantive law (particularly Tort Law).Footnote 22 This Section extends this discussion to procedural law, by exploring how causal claims can be substantiated or disproved using statistical evidence.

How Frequencies Support a Causal Claim

Consider this simple example: an engine of some sort located in a factory has stopped working and the factory engineer has been called in to identify the cause of the malfunction. The engineer examines the engine and concludes that the cause is a worn-out strap. How did the engineer reach this conclusion? First, they directly observed the worn strap, which gave rise to the conjecture that it was the cause of malfunction. In addition, they assessed this conjecture using the available ‘background information’, namely their experience and professional expertise, and reached the conclusion that this information supported their conjecture. Notably, the inference made from background information to draw a conclusion about the cause of the malfunction in a specific case always requires a causal generalisation. In general terms, a causal generalisation is a proposition that applies to a certain set of cases, with a combination of common attributes or factors that are either present or missing in all of the cases in the set, which identifies a cause within the group that leads to a certain effect. This generalisation can be either deterministic (applying to every one of the cases in the set) or probabilistic (applying only to a certain percentage of the cases in the set).

According to the theory of contrastive causation, the conclusion reached by the engineer is contrastive in nature. The claim that the worn strap caused the engine’s malfunction includes an implicit contrast of some sort (for example, that it was a worn strap rather than a control card problem that caused the engine to stop working). In some cases, the engineer’s decision to change the worn strap is not based on concrete physical findings that exclude alternative causes, but rather on heuristics or rules of thumb which enable them to home in on a certain cause, even if they lack concrete findings that exclude alternative causes.Footnote 23 Assume that the engineer’s examination did not present any findings that could rule out a control card problem as the cause of the malfunction.Footnote 24 However, based on their longstanding experience and professional expertise, they know that a substantial proportion of engine malfunctions are due to a worn strap and that replacing the strap would usually fix the problem. In such a case, the engineer may depend on this rule of thumb and decide to replace the strap.

The use of a rule of thumb is based on contrastive causal assumptions. The rule of thumb is based on professional knowledge and prior experience that reflect a set of previous similar cases. In every previous case in which replacing the strap fixed the problem, the following causal assumptions turned out to be true: it was the wear-and-tear of the strap, and not a control card problem, a cooling system problem, etc., that caused the engine to malfunction. The rule of thumb effectively summarises and reflects these causal assumptions. It is important to note that by replacing the strap, the engineer was assuming (at least at that time) that the worn strap was the sole cause of the malfunction. By using the rule of thumb, the engineer assumes a set of contrastive causal claims in accordance with the possible causes that turned out to be contrastive, rather than actual, in previous cases in which replacing the worn strap solved the problem. Although the rule of thumb points to a specific cause without explicitly stating the contrastive causes that were absent in previous cases that are effectively ruled out, the very use of a rule of thumb implicitly rules out these contrastive causes.

The high frequency of cases in which a worn strap caused the engine to stop working supports the conclusion that it is the strap rather than the control card that has to be replaced only because the frequency of cases in which the control card was the actual cause is lower. As discussed above, inferring from previous cases relies on a causal generalisation according to which, when the strap is worn, it is, in fact, the worn strap and not a control card problem that causes the machine to stop. The previous cases substantiate this causal claim because, in a significant proportion of previous cases, replacing the strap caused the engine to immediately start working again, without having to change the control card. The point to be gleaned from this example is that the previous cases serve as adequate evidence to establish the actual cause – not because of the high frequency of cases in which the worn strap caused the machine to stop working, but rather because this frequency is high in comparison to the frequency of cases in which a faulty control card caused the engine to stop working.

Notably, the exact same evidence might also support the opposite causal claim, according to which the cause of malfunction was not the worn strap.Footnote 25 This might happen when the (implicit) contrast includes several alternative causes, each having a lower frequency than the worn strap, but the frequency of these alternative causes combined is higher than that of the worn strap (for example, if a worn strap causes the malfunction in 40% of the cases, but a control card problem and a cooling system problem cause it in 30% of the cases each). This qualification highlights, once again, the importance of identifying the exact contrastive causal claim before evaluating whether the evidence supports it.
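The qualification above can be checked with a few lines of arithmetic. The following is only a minimal sketch, using the illustrative 40/30/30 split from the example rather than any empirical data; the function name `supports` and the variable names are hypothetical and introduced purely for illustration.

```python
from fractions import Fraction

# Illustrative shares from the engine example in the text (not empirical data).
cause_shares = {
    "worn strap": Fraction(40, 100),
    "control card": Fraction(30, 100),
    "cooling system": Fraction(30, 100),
}


def supports(claimed, contrast, shares):
    """Return True if the claimed cause is more frequent than the
    combined frequency of every cause in the contrast set."""
    combined_contrast = sum(shares[cause] for cause in contrast)
    return shares[claimed] > combined_contrast


# Contrast 1: worn strap rather than a control card problem.
strap_vs_card = supports("worn strap", ["control card"], cause_shares)

# Contrast 2: worn strap rather than any of the other listed causes.
strap_vs_all = supports(
    "worn strap", ["control card", "cooling system"], cause_shares
)
```

On these assumed figures, the same 40% frequency supports the first contrastive claim but not the second, which is why the contrast has to be fixed before the evidence is evaluated.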

In the same way that the high frequency of a certain cause can only help identify the cause in the case at hand by contrasting it with the lower frequency of an alternative cause, so, too, the low frequency of a contrastive cause can only be used to rule it out as an actual cause in the case at hand by comparing it to alternative possibilities with a higher frequency. The evidential role of the low frequency of a contrastive cause depends on the contrastive causal claim that the low frequency is meant to substantiate or rule out. For example, the low frequency of a control card problem can help the engineer narrow their search to the strap, because the latter is a more frequent cause for the machine to stop than the control card. That being said, the low frequency of a control card problem would not necessarily lead to the conclusion that the cooling system should be examined. If the cooling system is more reliable than the control card, and the frequency of cooling system malfunctions is lower than that of control card problems, the low frequency of control card problems (compared with worn straps) would not lead to the conclusion that the cooling system should be examined. Quite the contrary: since the frequency of a control card problem, low as it may be, is still higher than that of a cooling system malfunction, it stands as evidence for the conclusion that the engineer should examine the control card rather than the cooling system.

The main contention of the discussion thus far is that the frequency of a possible cause, be it high or low, is meaningless, in and of itself. It is insufficient to use the frequency of a potential cause to substantiate or rule out the causal claim that, in this specific case, it was this potential cause that actually caused the effect – it is necessary to compare the frequency of the potential cause with the frequency of the contrastive cause. When attempting to establish an actual cause based on the high frequency of this cause in cases similar to the case at hand, one needs to compare this frequency to the (lower) frequency of a contrastive cause within the same set of cases. Similarly, when attempting to rule out a cause, one needs to show its low frequency in comparison to the (higher) frequency of the contrastive cause within the same set of cases. Such a comparison is often implicit and unspecific, making it difficult to identify the contrastive cause and the probability assigned to it. For example, stating the high frequency of cases in which the malfunction was caused by a worn strap (90%, for instance) implicitly contrasts this frequency with the low frequency of cases in which the malfunction was a result of (an)other known or unknown potential cause(s) (10%, for instance).Footnote 26 To assess the support that the evidence of high or low frequency provides to the affirmation or negation of a causal contention, it is important to identify the contrastive causal claim the frequency is meant to substantiate, as well as the ratio between the frequency of the contended cause and the contrastive cause(s). Without such a comparison, presenting the frequency (which is meaningless in and of itself) might mislead the fact-finder into a conclusion opposite to the one that a careful examination of the previous cases would have supported.

Use of Frequencies as Evidence by the Prosecution

To secure a conviction in murder cases, the prosecution has to prove beyond a reasonable doubt that the victim’s death resulted from the accused’s conduct. According to the theory of contrastive causation, proving this requires specific contrasting causal claims to be substantiated – that is, it was the accused’s actions and not the actions of another person (or the victim’s illness, for instance) that caused the death (assuming, for the sake of simplicity, that only one of them could have caused it).Footnote 27 One might think that, to show that the actual cause of the victim’s death was the accused’s conduct, the prosecution would have to prove all the possible contrastive statements. However, in practice, this sort of proof is unnecessary: many contrastive statements are ruled out by virtue of the accused’s not disputing certain facts and assumptions. For example, in Clark, both babies died while only Sally Clark was in the room with them, hence the need to prove that the death of the babies was not caused by another person did not arise. The prosecution’s challenge was to prove that the death of the babies was not caused by some natural cause. Since the autopsy of both bodies did not indicate any specific natural cause, the only remaining option was natural death of an unknown cause, which is the exact definition of SIDS: a broad category that covers all cases in which an autopsy does not reveal a concrete cause of death (natural or otherwise).Footnote 28

To prove the causal statement by which the babies’ death was caused by their mother’s conduct rather than by a natural cause, the prosecution sought to show how low the probability of both babies dying of unknown natural causes actually was. To do so, the prosecution tried to emphasise the minute frequency of the contrastive cause (SIDS) occurring among families similar to that of the Clarks. One of the key expert witnesses for the prosecution, the aforementioned Professor Meadow, relied on an epidemiological study of the time, which examined the frequency of SIDS in families like that of the Clarks (‘a family in which the parents do not smoke, in which at least one has a waged income and in which the mother is over the age of 26 years’)Footnote 29 and argued that the probability of SIDS in such families was 1 in 8,543.Footnote 30 Based on this probability, he inferred that the probability of two cases of SIDS in the same family would be 1 in 73 million (8,543 multiplied by 8,543). As described earlier, this calculation was widely criticised, and quite rightly, because Meadow assumed that the two deaths were independent of each other, despite good reasons to believe that a family that experienced one such death would be at a heightened risk of another such death, compared to other families.Footnote 31
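The arithmetic behind the 1 in 73 million figure, and its sensitivity to the independence assumption, can be sketched as follows. This is an illustration only: the 1 in 8,543 figure is the one cited in the text, whereas the tenfold multiplier for a second death is a purely hypothetical value chosen to show the direction of the effect, not an estimate drawn from any study.

```python
from fractions import Fraction

# Probability of a single SIDS death in a family like the Clarks',
# as cited in the text.
p_single = Fraction(1, 8543)

# Under the independence assumption, the probability of two deaths
# is simply the single-death probability squared.
p_double_independent = p_single * p_single  # 1 in 72,982,849


def p_double(p_first, p_second_given_first):
    """Probability of two deaths when the second may depend on the first."""
    return p_first * p_second_given_first


# HYPOTHETICAL assumption for illustration only: a second death is ten
# times likelier in a family that has already experienced one.
hypothetical_multiplier = 10
p_double_dependent = p_double(p_single, hypothetical_multiplier * p_single)
```

The squared figure rounds to the 1 in 73 million presented at trial. Any positive dependence between the two deaths raises the joint probability by the same factor as the assumed multiplier, which is the substance of the statisticians' objection.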

However, criticism of this sort, important as it may be, misses the main issue. Even if an accurate calculation had yielded a different probability, the problem raised here is that the figure was useless, in and of itself, as was the low frequency of a control card problem in the previous example. The point is not that the fact-finder should refrain from assessing the plausibility of the alternative cause offered by (or attributed to) the defence. Rather, the point is that, on its own, this impressively-low frequency does not help the fact-finder do their job. Regardless of whether the probability is 1 in 73 million, 1 in a million, or 1 in 73,000, the low frequency of a specific cause within a set of similar cases is irrelevant, taken in isolation, to the contrastive causal claim by which a different cause led to the actual effect. This is because, if the frequency of the other cause among the same similar cases were lower, the logical conclusion would be the exact opposite.

In a contrastive causal claim, the evidential role of a low frequency of a contrastive cause among similar cases depends on the frequency of the actual cause in the same similar cases. In Clark, the significance of the low frequency of two SIDS cases in similar families depends on the frequency of two murders of babies carried out by their mother. Here, too, this frequency is very hard to calculate. The expert witness for the defence, Professor Dawid, proposed that an initial calculation would be 1 in 2 billion.Footnote 32 However, he did stress that the calculation was for illustration purposes only and that it was subject to the same objections that apply to Meadow’s calculation.Footnote 33 The important point is that, if the frequency of double murder is lower than the frequency of two occurrences of SIDS in the same family, the statistics by which the probability of natural deaths is 1 in 73 million would serve as evidence against the prosecution’s claim that the cause of deaths was Clark’s conduct rather than SIDS.

More importantly, even if the frequency of murder were higher than that of two SIDS cases, one should not get caught up on the absolute figure, but rather assess the ratio between the frequency of two SIDS cases and the frequency of double murder of two babies by their mother.Footnote 34 Assume, for example, that the frequency of a double murder is only 1 in 15 million. Supposedly, this figure supports the prosecution’s claim of a higher probability of murder as the cause of death, because the frequency of this kind of murder is significantly higher than that of two successive natural deaths. However, the important figure in assessing this contrastive causal statement is the ratio between the two frequencies, which is approximately 1 in 5. This figure is much smaller than the figure presented by Meadow and is significantly less impressive. Focusing on the frequency of 1 in 73 million can thus be misleading, because it might give the mistaken impression that the probability that the death resulted from an unknown natural cause rather than from Sally Clark’s action is extremely low. This contrastive causal claim is not supported by the low frequency of SIDS in families similar to Clark’s. The widespread attention given to the calculation of the 1 in 73 million figure is thus distracting, because it assumes that the figure is important on its own, despite the fact that only the ratio between the frequencies is relevant.
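The ratio referred to above can be worked out directly. The sketch below uses the 1 in 73 million figure presented at trial and the 1 in 15 million figure that the text assumes for the sake of argument; neither is offered as a reliable estimate. The second quantity, `share_sids`, rests on the further simplifying assumption that double SIDS and double murder are the only two candidate causes.

```python
from fractions import Fraction

# Frequency of two SIDS deaths, as presented at trial.
freq_double_sids = Fraction(1, 73_000_000)

# Frequency of a double murder: the figure ASSUMED in the text
# for the sake of argument, not an empirical estimate.
freq_double_murder = Fraction(1, 15_000_000)

# Ratio between the two frequencies: 15/73, roughly 1 in 5.
ratio_sids_to_murder = freq_double_sids / freq_double_murder

# Share of cases attributable to SIDS if these were the only two
# candidate causes (a simplifying assumption): 15/88, roughly 17%.
share_sids = freq_double_sids / (freq_double_sids + freq_double_murder)
```

On these assumed figures, the contrast yields a number on the order of one fifth rather than one in 73 million, which illustrates how far the isolated frequency can diverge from the figure that actually bears on the contrastive claim.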

A similar argument would apply to the other cases discussed in the Introduction. In the case of the nurse, Lucia De Berk, if the frequency of seven natural deaths during the same nurse’s shifts were higher than the frequency of nurses who murder seven babies under their care, the conclusion would be that the frequency presented in her trial actually supports her exoneration, even if this frequency is extremely low (1 in 342 million). Insofar as frequencies can substantiate contrastive causal claims, the lower frequency of murder, as opposed to natural death, should support De Berk’s claim of innocence. The same applies to John Veysey’s case: Judge Posner identified mistakes in the actuary’s calculations similar to those seen in Clark and De Berk.Footnote 35 However, the problem that was not discussed by Posner or Veysey’s defence team is that the figure presented by the actuary of 1 in 1.773 trillion is meaningless, in and of itself, because it must be contrasted with the frequency of four successive arson cases. Again, whatever the exact frequency of the contrastive cause is, the figure that matters is the ratio between the frequencies of the actual and the contrastive cause.

The same point equally applies to statistical evidence adduced by the defence. If Veysey’s defence were to adduce evidence showing that committing four successive arsons is very rare (say 1 in 1 million), such evidence would not support on its own the exculpatory cause of accidental fires (or any other exculpatory cause). Once the defence’s evidence is contrasted with that of the prosecution, it immediately backfires because it is much likelier (in fact, 1.773 million times likelier) that the fires were caused by the accused rather than in accidental circumstances. Considering the defence’s evidence on its own might mislead the fact-finder into thinking that the low probability of an inculpatory cause (1 in 1 million) works in favour of the defence – whereas, in fact, when properly contrasted, it is strong evidence for the prosecution.

Consequently, using statistical evidence receives no support from the claim that it is better to provide the fact-finder with hard evidence than to leave them to speculate on what this frequency is. As noted earlier, knowing the frequency of the alternative cause, on its own, does not help the fact-finder at all in assessing the defence’s claims – irrespective of how the fact-finder arrives at this frequency (using statistical evidence or wild speculation). If it could be shown that fact-finders, be they judges or jurors, venture into useless speculations about an irrelevant frequency, then this problem should be addressed by endeavouring to prevent such inferences rather than by luring fact-finders into this fallacy by providing them with apparently ‘probative’ evidence. Therefore, even in the absence of statistical evidence on the alternative cause, it is unhelpful and potentially misleading to consider the frequency of that cause on its own.

One might accept that the frequency of the alternative cause is meaningless on its own, but object that it is not necessary to contrast it to evidence of the same form (the frequency of the alleged criminal conduct in the same population). Instead, one might argue that the fact-finder should first assess the non-statistical evidence (assuming such evidence is available), establish the probability that the cause was the accused’s conduct, and then compare this probability to the frequency of the alternative cause.

However, this method is flawed because it compares probabilities that refer to different groups (such groups are also known as ‘reference classes’).Footnote 36 If the probability that is calculated based on the non-statistical evidence were to be translated into a frequency, it would refer to the frequency of two murders within the group of families whose two babies are known to have suffered from all the symptoms established by the non-statistical evidence (bleeding in the lungs, a torn frenulum and bruises to the body). By contrast, the SIDS frequency refers to a group of families that are similar to the Clarks in some respects (professional, non-smoking mothers over the age of 26), but not in one important respect in particular: in these families, it is unknown whether any of their babies will exhibit such symptoms. Yet, what the fact-finder needs to know is how cases in which two babies have exhibited such symptoms are divided between SIDS and murder (assuming that other potential causes of death were ruled out). To calculate this ratio, it is necessary to compare the frequency of two SIDS cases to that of two murders within the same group. Consequently, the frequency of the alternative cause, which refers to one group, cannot be contrasted with the probability of guilt estimated on the basis of the non-statistical evidence, which refers to a different group.

Freedom, Causation, and Crime Incidence

Apparently, the conclusion to be drawn from the previous Section is that there is no principled objection to using the minute frequency of a contrastive cause (SIDS), as long as this is compared with the frequency of the actual cause alleged by the prosecution (murder). However, in this Section, I argue that frequencies should not be used in court – not (only) because of technical or calculation considerations, but (also) because there is a principled problem with using statistical evidence regarding the frequency of the alleged criminal conduct among people similar to the accused. In previous work,Footnote 37 I have suggested the culpability account, according to which predictive evidence supports the prosecution’s claim that the accused committed the alleged crime only if the accused’s conduct was determined by a certain causal factor that rendered their conduct unfree. Yet, in the context of attributing culpability, it is necessary to presuppose the exact opposite: that the accused was free to determine their own conduct. Using some types of generalisation to determine culpability is objectionable, because it involves contradicting presuppositions about the individual’s conduct.

The culpability account first argues that inferences about human conduct, drawn for either prediction or conviction purposes, require reliance on causal generalisations—that is, generalisations that reflect a causal connection between the type of fact from which the inference begins and the type of fact the inference seeks to establish. If an inference is based on a non-causal generalisation, a mere correlation, it is unlicensed and thus invalid.Footnote 38 The causal relation can operate either directly or through a common cause. Inferring that a smoker is likelier to contract cancer than a non-smoker is based on a causal generalisation that smoking is a cause of (lung) cancer. By contrast, inferring that a Coca-Cola drinker is likelier to contract cancer than a non-drinker involves a causal generalisation that reflects a common cause. It is living in a hot country that is the common cause of both Coca-Cola drinking and (skin) cancer. The culpability account does not require us to specify the (direct or indirect) causal generalisation; it only requires that the existence of such a causal generalisation be presupposed.Footnote 39

But even if inferences about human conduct require reliance on causal generalisations, why cannot free actions be proven with such generalisations? Starting with a simple example, assume that Richard is exposed to radiation of a particular kind, which affects his nervous system, resulting in blotches all over his skin and an irresistible urge to attack everyone around him. Assume further that every person exposed to this radiation develops these symptoms. When Richard is admitted to the hospital, it seems unproblematic to infer from the blotches that, given the opportunity, he will go berserk and should therefore be restrained. However, inferring from these marks that a violent action that had taken place before Richard arrived at the hospital was committed by him (rather than by someone else), for the purpose of convicting him of a violent offence, seems intuitively problematic.

According to the culpability account, this inference should not be used for the purpose of determining culpability, because it leads to a contradiction. To infer from Richard’s skin marks that he had acted violently, it is necessary to presuppose a causal generalisation: either one caused the other or they both have a common cause. In this example, the radiation caused both Richard’s blotches and his violent conduct. However, Richard’s acting violently may be culpable only if he acted freely. Establishing Richard’s guilt by inferring from his skin marks that it was he who acted violently is, therefore, contradictory: Richard’s conduct is treated as free and unfree at the same time. This example also explains why the very same inference seems unproblematic when restraining him in the hospital because, in the medical context, it is not necessary to presuppose that Richard’s violent conduct will be free and culpable.

Moving to probabilistic generalisations, consider the following variation on the previous example. Assume that Stephen is exposed to another type of radiation, which affects the nervous system and always causes certain skin blotches but causes an irresistible urge to attack others, when the opportunity arises, in only 80 per cent of cases. According to the subjective interpretation of probability,Footnote 40 which is commonly considered the most suitable for legal purposes,Footnote 41 probabilistic generalisations reflect the limited state of our knowledge rather than the true nature of the world. While the generalisation about the radiation is probabilistic, it imperfectly reflects a reality that may be deterministic. If the world is indeed deterministic, Stephen belongs to one of two possible sub-groups. One possibility is that he belongs to the sub-group of people who possess an extra unknown variable, which, together with the radiation, determines that he will go berserk. The other possibility is that he belongs to the sub-group of people who do not possess the extra variable, in which case the exposure to the radiation will not cause him to go berserk.

Supporting Stephen’s conviction by inferring from the blotches on his skin that he was (80 per cent) likely to have acted violently is problematic. If Stephen does indeed possess the extra variable, then—similarly to deterministic generalisations—this inference leads to a contradiction: his conduct is taken to be both free (in order to be culpable) and unfree (as, together with another unknown variable, his violent actions were determined by the radiation). If Stephen does not possess the extra variable, then inferring from his skin marks that he acted violently is mistaken and misleading because, if he belongs to the sub-group of people who were not caused to act violently by the radiation, then the probability that he acted violently is not affected by the exposure to the radiation. In sum, this inference is either contradictory, because it requires inconsistent presuppositions, or misleading, because it is mistaken and yet is presented as informative.
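The structure of this dilemma can be illustrated with a small sketch. The Python fragment below is only a toy model under stated assumptions: the class `Person`, the field `extra_variable`, and the 100-person population are hypothetical constructs introduced for illustration, and the model simplifies by assuming a deterministic world in which the radiation produces violence only in conjunction with the extra variable. It suggests how an 80 per cent group frequency is compatible with every individual outcome being fully settled one way or the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    exposed: bool         # exposed to the radiation (and so shows the blotches)
    extra_variable: bool  # hypothetical hidden factor, unknown to the fact-finder


def acts_violently(person: Person) -> bool:
    # Assumption: in a deterministic world, the radiation leads to violence
    # only when the extra variable is also present.
    return person.exposed and person.extra_variable


def group_frequency(people: list[Person]) -> float:
    # Frequency of violence among exposed people: the figure the
    # probabilistic generalisation reports.
    exposed = [p for p in people if p.exposed]
    return sum(acts_violently(p) for p in exposed) / len(exposed)


# Hypothetical population: 100 exposed people, 80 of whom carry the extra variable.
population = [Person(exposed=True, extra_variable=(i < 80)) for i in range(100)]

frequency = group_frequency(population)
individual_outcomes = {acts_violently(p) for p in population}

print(frequency)            # the group-level figure
print(individual_outcomes)  # the only outcomes any individual can have
```

On these assumptions, the group-level figure describes the fact-finder's ignorance about which sub-group Stephen belongs to; it does not describe a partial causal influence operating on Stephen himself.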

The rationale for contending that the frequency of the alleged criminal conduct among a group of people similar to the accused should not be used in court is as follows. As causal claims are contrastive in nature (Section I), using statistical evidence regarding the minute frequency of an alternative cause is meaningless without contrasting it to the frequency of the alleged criminal conduct among a group similar to the accused (Section II). In this Section, I have argued that using the frequency of the alleged criminal conduct is problematic as a matter of principle. Whatever the attributes of the group may be, drawing an inference from this frequency to the specific case requires presupposing a causal connection between these attributes and the criminal conduct. Inferring from the frequency of babies being murdered by mothers similar to Sally Clark to the specific case of Clark requires a causal connection (direct or indirect) to be presupposed between the common attributes of these mothers and the murders some of them committed. Without presupposing a causal connection, the inference from the generalisation to the specific case is invalid. As a result, for the inference to be valid, one has to assume that the generalisation on which it is based is causal. This causal generalisation is probabilistic, because not all the mothers in the group murdered their babies. The set of cases on which the causal generalisation is based comprises two sub-groups: mothers who have the common attributes of the group in addition to another variable (or set of variables) that caused them to murder their babies; and mothers whose common attributes did not lead them to murder their babies. If Sally Clark belongs to the first sub-group, the statistical evidence that substantiates this causal generalisation is relevant and probative to her case; however, she could not be convicted of murdering her babies because she did not act freely and is therefore not culpable for her actions. If she belongs to the second sub-group, the use of the causal generalisation to conclude anything regarding her case is erroneous and misleading.Footnote 42

Although this paper deals with proving causation in criminal proceedings, many of the points made here may also be relevant to proving causation in civil proceedings. In tort cases, for instance, proving causation using statistical evidence is a central and disputed topic.Footnote 43 The elements of the paper that discuss contrastive causation and how it is generally proven are also pertinent in the civil context. For example, in the Israeli case of Dirawi,Footnote 44 a man who underwent laser eye surgery claimed compensation for the retinal bleeding and blindness he suffered following the procedure. The court relied on statistical evidence according to which the frequency of a retinal bleed among short-sighted people is only 7.5 per cent, and concluded that it was the surgery (rather than the short-sightedness) that caused the bleeding and blindness suffered by the claimant. According to the argument of this paper, the critical factor for assessing this contrastive causal claim should be the ratio between the frequencies of retinal bleeding caused by short-sightedness and by surgery, respectively. If the frequency of bleeding caused by surgery is higher than the frequency of bleeding caused by short-sightedness, the court’s conclusion would be supported by the evidence. However, if the frequency of bleeding caused by surgery is lower than that of bleeding caused by short-sightedness, the court’s conclusion would be mistaken. Either way, the court erred in its judgement, because a frequency of one potential cause is meaningless if it is not contrasted with the frequency of the defendant’s contrastive cause.
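The comparison described in this paragraph can be sketched as a simple ratio. The Python fragment below is a minimal illustration rather than a method used by the court: the function name `contrastive_ratio` is introduced here for convenience, and the two surgery frequencies are purely hypothetical placeholders, since the judgment as described reports only the 7.5 per cent figure for short-sightedness.

```python
def contrastive_ratio(freq_alleged_cause: float, freq_alternative_cause: float) -> float:
    """Ratio of the frequency of the alleged cause to that of the alternative cause.

    A value above 1 favours the alleged cause; a value below 1 favours the
    alternative. A single frequency on its own yields no ratio at all.
    """
    if freq_alternative_cause <= 0:
        raise ValueError("the alternative cause must have a positive frequency")
    return freq_alleged_cause / freq_alternative_cause


# Figure reported in the text: retinal bleeding among short-sighted people.
FREQ_SHORT_SIGHTEDNESS = 0.075

# Hypothetical surgery frequencies (not taken from the case), one on each side.
for hypothetical_surgery_freq in (0.15, 0.01):
    ratio = contrastive_ratio(hypothetical_surgery_freq, FREQ_SHORT_SIGHTEDNESS)
    favoured = "surgery" if ratio > 1 else "short-sightedness"
    print(f"surgery frequency {hypothetical_surgery_freq}: evidence favours {favoured}")
```

The same 7.5 per cent figure supports opposite conclusions depending on the second frequency, which is why the figure alone cannot carry the court's inference.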

The application of the argument of this paper regarding culpability and free will to Tort Law depends on the role, if there is one, of the tortfeasor’s culpability (which might differ from the term ‘fault’ often used in tort cases). The objection made in this paper to the use of a minute frequency of one cause to prove causation by another cause in Criminal Law is based on the contention that this use contradicts a presupposition (that of free will) required for the attribution of culpability. Because culpability is central to this paper’s argument, its applicability to Tort Law hinges on whether attributing tort liability requires presupposing that the tortfeasor is culpable for the claimant’s damages. In most tort cases, the question of whether the imposition of tort liability presupposes culpability is complex. References to blame are not unheard-of in negligence cases,Footnote 45 nor are considerations of retributive justice (even across various jurisdictions).Footnote 46 In medical negligence cases, for example, both claimants and defendants view liability as reflecting culpability (or closely-related concepts such as accountability and responsibility).Footnote 47 By applying the argument of this paper to such cases, it is possible to explain why it seems intuitively objectionable to support a claim against a doctor with statistical evidence on the average rate of negligent treatments among similar doctors. On the other hand, one might highlight that, whatever role culpability is intuitively given, the goal of Tort Law is not to punish the tortfeasor,Footnote 48 but rather to correct the victim’s harm, promote optimal levels of care, or enhance distributive justice.Footnote 49 After all, once a clinician is found liable, the compensation is often paid by someone else (the hospital or insurance company, for instance). This phenomenon would be hard to accept if compensation were a matter of culpability.Footnote 50

Insofar as a court holds a defendant liable in tort, based either explicitly or implicitly on their being culpable, this paper shows that this liability cannot be established using objectionable statistical evidence to prove that the claimant’s harm was caused by the defendant’s conduct. More generally, this point exemplifies the deep connection between controversies concerning substantive law (specifically, whether tort liability presupposes culpability) and controversies concerning procedural issues (whether statistical evidence may be used to prove causation).

Objections

Reductio ad Absurdum

Returning to the argument of this paper, one could object to it by stating that it rejects the use of all evidence. For example, showing the accused’s clothes that are stained with the victim’s blood relies on a generalisation by which individuals with the victim’s blood on their clothes tend to be more likely to be killers than individuals whose clothes are not stained with the victim’s blood. If the use of such a common generalisation contradicts the assumption that the accused was acting freely, then the argument in this paper would deem all fact-finding impossible. If this is so, there must be something wrong with the argument and it must be rejected.

However, the argument of this paper objects to evidence only if the inference it makes is based on a causal generalisation that runs from an attribute that the accused shares with a group of people to which the generalisation applies, to the action allegedly committed by him or her. Only in these cases could the accused’s conduct be caused by an antecedent factor and considered unfree. Using bloodstains as evidence is not problematic because the direction of the causal generalisation used in the inference does not imply that the accused’s conduct (in this case murder) was caused by the bloodstains. Whatever causes could have influenced the action, bloodstains on the accused’s clothes are not one of them. Using the bloodstains as evidence relies on an inference from a generalisation that states that ‘individuals with blood-stained clothes are more likely to be involved in murders than people without blood-stained clothes’. Unlike evidence regarding the frequency of murder among mothers like Clark, one can make this generalisation without assuming a common antecedent factor among people with blood-stained clothes that causes them to commit a murder. Rather, it is their own free action that causes their commonality – the murderous act that caused their clothes to be stained with blood. For this reason, there is no contradiction between this generalisation and the freedom of the accused’s actions, specifically because it does not interact with the requirement that the accused’s actions be free.Footnote 51

For similar reasons, the argument of this paper does not object to many types of evidence routinely admitted to court. For example, if an eyewitness describes the person who committed the crime, and the description matches the accused, there would be no problem with admitting this evidence. As with the bloodstains, the inference does not assume that the physical attributes described by the witness caused the accused’s actions. If a witness declares seeing a tall man, one need not assume that the tallness caused the accused’s actions. Rather, the accused’s actions are what caused him to be present at the scene of the crime, leading the witness to see a tall man in the area. Similarly, the argument does not object to the use of DNA evidence found at the scene of the crime. Instead of assuming that the accused’s actions were caused by his genetic makeup, the inference used in DNA evidence states that the accused’s actions are what caused his DNA to be present at the scene.Footnote 52

Previous Actions

One might insist that we can use facts about the agent herself together with statistical evidence to support the conclusion that it was she who freely committed the crime. For example, if the suspect told close friends that she planned to kill the victim, that she purchased a gun, and that she had strong grievances against the victim, it seems that we can readily rely on the fact that people with these sorts of properties very often carry out their threats (freely).

This is a serious objection that involves two complicating factors: the relation between previous and current actions (can a past free action causally affect another action without rendering it unfree?); and the relation between character traits/propensities and free actions (can an action be free even if it was causally affected by the agent’s propensity, particularly if acquiring this propensity was outside the agent’s control, for instance, if they were born like that?).

Before responding to this objection, it is important to emphasise that, even if the objection holds, none of the cases discussed so far (Clark, Veysey, or De Berk) included statistical evidence that reflects a causal connection involving the agent’s character traits or previous actions. Consequently, the argument of this paper would still apply to any case in which the statistical evidence reflects a causal connection that does not involve the agent’s character or previous actions.

More importantly, the argument of this paper is unlikely to be limited to non-previous and non-character-based causes. As for character-based causes, consider the example of previous convictions for child molestation, which are currently admissible in both the United Kingdom and the United States.Footnote 53 While the admission of such previous convictions has been criticised on various grounds, such as being unconstitutional,Footnote 54 unfair,Footnote 55 and even truth-suppressing,Footnote 56 the connection to the issue of free will seems to have gone unnoticed. My argument would substantiate a different line of criticism, by drawing attention to the importance of identifying the exact generalisation involved and considering whether using it for conviction conflicts with other presuppositions made in criminal proceedings. Like any inference about human conduct, inferring from the accused’s previous convictions that they are likelier to have committed the alleged similar offence(s) relies on a causal generalisation. These previous convictions may be probative because they indicate that the accused suffers from a condition, such as perversion, illness, or addiction, that caused them to reoffend. Consequently, if these previous convictions are indeed probative, it might only be at the price of exposing that the accused’s conduct is unfree and thus nonculpable. To generalise from this example, the argument of this paper could apply to character-based causes because drawing inferences from a certain property of the agent to the likelihood of their carrying out a free action might still involve a cause that undermines that agent’s freedom.

Moving to previous free actions (such as threatening and planning), the argument of this paper need not object to statistical evidence adduced to establish that such previous actions led to the later actions (of committing the crime). This is because it might be unnecessary to presuppose that the later actions are free to begin with. While this move might seem counterintuitive at first, it is probably because it seems to suggest that the later actions are not culpable (because they are unfree). However, if the agent’s later actions were indeed caused by their own previous free actions,Footnote 57 the agent’s culpability for the later actions is likely to be derived from their culpability for the previous ones. When, how, and why culpability for one action is derived from another are complicated issues to address, and it is particularly questionable whether the agent’s culpability goes beyond their culpability for the first action. Be that as it may, previous actions do not serve as a counterexample to the argument of this paper, because this argument does not have to object to statistical evidence adduced to establish that previous free actions led to later culpable actions.

Sentence Mitigation

One might retort that this analysis stands in contrast to a common intuitive view of criminal responsibility. While the analysis implies that the agent’s conduct is either fully determined or entirely unaffected, the practices of assigning criminal responsibility often seem to assume that an agent can be partially causally influenced. The agent is treated as causally influenced by some factor, but only to some degree, leaving them with a less-than-maximum extent of freedom. For example, a paedophile’s sentence might be mitigated by the fact that he was a victim of molestation in his childhood. According to this view, the mitigation acknowledges that his childhood experience causally influenced the way he currently acts yet left him sufficiently free and, hence, responsible for molesting other children.

The difficulty with this view of criminal responsibility is that it fails to account for the conviction stage of the trial, which seeks a binary outcome: the accused is either guilty of the alleged crime or not. Finding him guilty requires that he is culpable of committing the crime,Footnote 58 which, in turn, requires that he acted freely.Footnote 59 Free action is thus a precondition of criminal responsibility, and when undermined by a defence such as insanity or duress, the accused is found not guilty rather than less guilty.

One means of explaining away the intuitive force of this view of criminal responsibility is to note that, while the question of guilt is binary, the consequences of conviction are typically scalar. The punishment could include a longer or shorter period of imprisonment or a heftier or lighter fine. It is at the sentencing stage that the paedophile’s childhood experience is taken into consideration. However, there could be various explanations for why this experience serves to mitigate the appropriate punishment that do not refer to a partial causal influence. To mention just a few alternatives, there would be the increased effect that punishment would have on him as a result of his experience, his vulnerability to becoming a victim again during imprisonment, or maybe even the attempt to compensate him for his bad fortune in childhood.

Whatever the justification may be, it need not rely on a causal generalisation, according to which his childhood experience causally influenced him to commit the alleged offence. If such a causal generalisation is used at the sentencing stage, it becomes difficult to explain why the prosecution should not be allowed to admit the very same evidence at the conviction stage to support its allegation that the accused has committed the offence. The challenge here is not only to identify a solid objection to the use of such evidence in criminal trials (which is more difficult than it might seem),Footnote 60 but also to explain why the same objection is not equally applicable at the sentencing stage. After all, if the accused’s personal background causally influenced him, it means that he is likelier to have committed the alleged crime, rendering his background probative evidence, which should not be ignored at the conviction stage. While exploring the justification for such mitigation lies outside the scope of this paper, it suffices to note that taking into account the paedophile’s childhood background at the sentencing stage need not be based on his being less free when molesting the children he did. This point is in line with the fact that freedom is taken by almost every theorist of free will as a binary concept.Footnote 61

Therefore, my analysis of probabilistic causal generalisations does not stand in contrast to our sentencing practices. On the contrary, the view of partial causal influences stands in contrast to our binary practices of conviction. Proponents of such a view would thus need to explain how freedom and criminal responsibility work, in their understanding.

Compatibilist Approaches

Unlike libertarians, compatibilists hold that conduct can be free even if it was determined by causal factors outside the agent’s control. Instead of treating such determination as a threat to freedom in itself, they offer alternative criteria to distinguish free from unfree conduct. Whatever the compatibilists’ criteria are, they regard conduct as unfree once certain causal factors are present. For example, a person whose hand was coercively pressed against a button clearly did not press the button freely because their conduct was caused by physical coercion.Footnote 62 What these causal factors are, and how they explain why the compatibilist criteria for freedom are unsatisfied, vary according to the specific compatibilist theory. The key point is that the argument of this paper is still applicable in principle under a compatibilist approach: a generalisation is objectionable if its use presupposes a causal factor that is one of the causal factors rendering conduct unfree, according to the respective compatibilist criteria.

Consider again the deterministic generalisation according to which every person exposed to a particular type of radiation develops certain skin blotches and an irresistible urge to physically attack everyone around them. Is it objectionable to convict Richard of violent offences by inferring from his skin marks that he acted violently? According to compatibilists, the answer depends on whether any of the facts presupposed by the inference (that Richard was exposed to this radiation, for instance) belongs to the group of causal factors that render his conduct unfree. For example, classical compatibilists might argue that the radiation rendered Richard unfree if its effect was such that he would not have avoided acting violently even had he chosen to. Other compatibilists might adhere to Frankfurt’s theory, according to which a person acts freely only if they had a (second-order) desire to have the (first-order) desire to act as they did.Footnote 63 If the radiation caused Richard to have a desire to act violently even though he did not have a second-order desire to have the desire to act violently, then being exposed to this radiation belongs to the causal factors that render Richard’s conduct unfree. A similar point would apply to semi-compatibilist theories, such as Fischer and Ravizza’s influential theory that distinguishes between ‘regulative control’, which is incompatible with determinism but is not required for culpability, and ‘guidance control’, which suffices for culpability and is based on the agent’s responsiveness to reasons.Footnote 64 If the radiation caused Richard not only to have a desire to act violently but also to become unresponsive to reasons, then the exposure to this radiation rendered Richard’s conduct unfree. Compatibilists (and semi-compatibilists) would hence examine the question of whether a generalisation is objectionable through the lens of their specific criteria for freedom.

The application of the argument of this paper under the compatibilist approach is thus significantly more limited than under the libertarian approach. While, under the latter approach, no causal connection in which the causal antecedent is outside the agent’s control can be used to prove their culpability,Footnote 65 under compatibilist approaches it is possible that causal generalisations in which the causal antecedent is outside the agent’s control could be used as evidence of culpability (if the antecedent is one of the causes that influence the agent’s conduct without rendering it unfree).Footnote 66

It could be objected that most plausible compatibilist theories are unlikely to reach the conclusion that the causal connection in real cases undermines the agent’s freedom. In Clark, for example, the frequency of infant murders committed by mothers similar to Clark could be used, providing the causal antecedents do not negate the mothers’ freedom (assuming that the statistical evidence is gathered, analysed, and presented properly). Arguably, even if there is a direct or indirect causal connection between being a professional, a non-smoker, and over the age of 26 and infanticide, the mother could still have avoided killing her babies had she chosen to. Similarly, it is unlikely that the mother’s first-order desires were affected while her second-order ones remained intact. It is also unlikely that she became unresponsive to reasons. In general, compatibilists usually insist that genetic or environmental causes do not pose a threat to the agent’s freedom, even if they determine the agent’s conduct, unless these causal factors make a severe impact on the agent’s physical or mental capacities (as in cases of coercion or manipulation). Consequently, one might contend that repeating the argument of this paper under compatibilist approaches is unlikely to succeed, making this argument dependent on the strongly-contested claim that Criminal Law is committed to libertarianism.Footnote 67

The issue at stake here is wider than the use of statistical evidence to prove causation. If statistical evidence may be used to prove a contrastive causal claim, the prosecution could use the same evidence to prove additional material facts. In Clark, to substantiate the contrastive causal claim by which it was Clark’s conduct rather than SIDS that brought about the babies’ deaths, the prosecution would be obligated to provide the frequency of SIDS as well as the frequency of infanticide among mothers similar to Clark. Now assume that the ratio between the frequencies favours the prosecution’s position, among other reasons because the frequency of infanticide among mothers similar to Clark is significantly higher than the frequency in the general population. According to compatibilist theories, the prosecution could use this statistical evidence and contrast it with the frequencies of SIDS. Assume, further, that in addition to this factual dispute regarding the cause of death being murder by Clark or SIDS, another question arises, namely whether the babies were murdered by Clark or by someone else. If this were the case, why not allow the prosecution to submit the very same statistical evidence regarding the frequency of infants murdered by mothers similar to Clark, contrast it with the frequency of infanticide in the general population, and argue that this evidence supports the claim that it was Sally Clark, as opposed to someone else, who murdered her babies? More generally, why not allow the prosecution to submit statistical evidence regarding the frequency of the alleged criminal conduct whenever the accused belongs to a group in which the crime-rate is higher than among the general population? For example, why should the prosecution not be allowed to use the high rate of crimes involving illegal firearms in a certain neighbourhood to support the conviction of an individual resident in a crime involving an illegal firearm?Footnote 68

Compatibilist approaches could agree that statistical evidence such as crime-rates should not be used in criminal proceedings, while insisting that the justification for excluding it has nothing to do with free will. Establishing a principled objection to the use of statistical evidence in criminal trials is more difficult than it might seem,Footnote 69 but if compatibilists are able to do so, the question of whether they would object to statistical evidence on the minute frequency of the alternative cause depends on the exact objection they adopt. Alternatively, compatibilists could accept that statistical evidence such as crime-rates should be admitted in criminal proceedings, which is in stark contrast to our current practices and common intuitions.Footnote 70 Either way, from a compatibilist perspective, this paper may only illustrate the futility of taking the free will route to justify the intuitive objection to statistical evidence, because it exposes the strong (libertarian) assumptions that are required for such an objection. Nevertheless, compatibilists should be interested in the use of statistical evidence in criminal proceedings because, according to the argument of this paper, their libertarian adversaries have a principled objection to the use of intuitively-objectionable statistical evidence such as crime-rates, and that objection is not (fully) available to compatibilists.

Conclusion

This paper has shown that statistical evidence regarding the minute probability of a natural cause cannot be used to substantiate the claim that it was the accused’s conduct that caused the result of the alleged crime. Section I showed that causal claims are contrastive in nature. Section II showed that the use of statistical evidence to prove the minute probability of a natural cause is meaningless, in and of itself, without contrasting it with the frequency of the alleged cause, namely the frequency of the committing of the alleged crime among a group of people similar to the accused. Section III presented a principled objection to the use of crime frequencies among a group similar to the accused. If such a causal generalisation is relevant to the specific case at hand, it is only at the price of undermining the basis for culpability; and if it is irrelevant to the case at hand, it should be deemed erroneous and therefore inadmissible. The conclusion of the paper is therefore that using statistical evidence regarding the infinitesimal frequency of a natural cause is objectionable, regardless of how reliable the statistical analysis is.