Unacceptable Generalizations in Arguments on Legal Evidence

Arguments on legal evidence rely on generalizations that link a certain circumstance to a certain hypothesis and warrant the claim that the circumstance makes the hypothesis more probable. Some generalizations are acceptable and others are unacceptable. A generalization can be unacceptable on at least four different grounds. A false generalization is unacceptable because membership in the reference class does not increase the probability of the hypothesis. A non-robust generalization is unacceptable because it uses a reference class that is too heterogeneous. A bias-triggering generalization is unacceptable because decision makers are inclined to overestimate the evidentiary value of membership in the reference class. A discriminating generalization is unacceptable because it puts members of the reference class at an unfair disadvantage. Research funded by the Swedish Research Council (Vetenskapsrådet).


Christian Dahlman (christian.dahlman@jur.lu.se)

Introduction
The issues that are addressed in a court of law are traditionally divided into ''issues of law'' and ''issues of fact''. Legal argumentation can therefore be divided into arguments on issues of law and arguments on issues of fact. Arguments of the first kind are concerned with legal interpretation. A legal argument on an issue of law provides a reason for a certain interpretation of the law. Arguments of the second kind are concerned with the assessment of legal evidence. A legal argument on an issue of fact provides a reason for a certain assessment of evidence. Most studies on legal argumentation are concerned with arguments of the first kind. This article is concerned with arguments of the second kind. It investigates arguments on legal evidence in criminal trials. An argument on legal evidence points to a certain piece of evidence and claims that it increases the probability of a certain hypothesis. It should be noted that there are many different kinds of hypotheses that can figure in a criminal trial. The hypothesis could, for example, be that the defendant was at the crime scene around a certain time, that the defendant had knowledge about some crucial circumstance, or that the defendant had a certain motive. The hypothesis could also regard some person other than the defendant. The hypothesis could, for example, be that someone who has testified as a witness is unreliable.
Anderson et al. (2005: 60-63) have demonstrated that arguments on legal evidence rely on generalizations. Every argument that points to a certain piece of evidence, and claims that it increases the probability of a certain hypothesis, relies on some kind of generalization that links the evidence to the hypothesis, and justifies the claim that the evidence makes the hypothesis more probable. The generalization is a warrant that justifies the conclusion about the hypothesis.
In some arguments, the generalization is stated explicitly as a premise in the argument. In other cases, the generalization is a tacit premise that is logically necessary for the argument to be valid. As an example of a tacit premise, the defense attorney in a murder trial could direct the attention of the judge/jury to the fact that the crime scene was very dark, and claim that this increases the probability that the observation of a certain eyewitness was mistaken. This argument relies on a tacit premise: the generalization that observations in the dark are more likely to be mistaken.
As we shall see, a generalization connects two classes to each other. I will refer to these classes as the ''reference class'' and the ''target class''. When an argument on legal evidence points to a certain piece of evidence, it classifies the case at hand as belonging to the reference class of cases where this kind of evidence is present, and when the argument claims that the evidence increases the probability of a certain hypothesis, it claims that membership in the reference class increases the probability that the case belongs to the target class of cases where the hypothesis is true. The generalization that observations in the dark are more likely to be mistaken links the reference class ''observations in the dark'' to the target class ''incorrect observations'', and claims that membership in the former increases the probability of membership in the latter.
The use of generalizations has been studied in argumentation theory. Prakken et al. (2003: 39) have identified and modeled different ways in which an interlocutor in a debate can attack a generalization (Bex et al. 2003: 141-142). One way to attack a generalization is to attack its source. An argument claiming that it is ''general knowledge'' that observations made in the dark are more likely to be mistaken can, accordingly, be attacked by questioning general knowledge as a reliable source of information. A different way to attack a generalization is to attack the generalization itself. An attack of this kind could, for example, dispute the generalization that observations made in the dark are more likely to be mistaken, by claiming that this is empirically false. A third way to attack a generalization is to say that the generalization is correct as a generalization, but leads to a false conclusion when applied to the specific case at hand, due to special circumstances in this case. An attack of this kind could, for example, admit that observations in the dark are generally more likely to be mistaken, but claim that the particular observation made by the eyewitness is not likely to be mistaken, since the witness was using night goggles. These distinctions map out different strategies that a trial lawyer could use in argumentation in front of the judge/jury to attack arguments from the opposing side.
In this article, I will investigate attacks of the second kind. I will investigate attacks on the generalization itself, as a generalization. As I intend to demonstrate, such attacks can attack the generalization on different grounds that should be distinguished from each other. In the example above, the interlocutor launches an attack on the generalization itself, by claiming that it is a false generalization. This is one ground for attack. As I intend to show, there are at least four different grounds on which a generalization can be attacked, as a generalization, that should be distinguished from each other.
Argumentation can be analyzed from different perspectives. In this article, I will investigate arguments that rely on generalizations from the perspective of a decision maker who is presented with arguments, and has to assess to what extent they are sound. This is the situation that faces a judge or jury with regard to legal evidence. The prosecution and the defense make arguments on the interpretation and evaluation of the evidence, and the judge/jury has to assess to what extent the arguments that are advanced in favor of a certain decision actually provide justification for that decision. The judge/jury has to scrutinize whether the arguments are logically valid and rely on premises that are acceptable. A premise can be unacceptable in different ways. A descriptive premise is unacceptable if it is epistemically incorrect. A normative premise is unacceptable to the decision maker if he or she finds it morally incorrect. With regard to generalizations, this means that the judge/jury has to assess whether the generalizations that are used in the arguments that are presented to them are epistemically and morally acceptable.
Twining says that generalizations are ''necessary but dangerous'' (Twining 1999: 357). In this article, I will show that generalizations can be unacceptable on different grounds that should be distinguished from each other. I will show that there are at least four distinctly different grounds for judging a generalization unacceptable. I will distinguish between false generalizations (Sect. 5), non-robust generalizations (Sect. 6), bias-triggering generalizations (Sect. 7) and discriminating generalizations (Sect. 8). The first three are unacceptable on epistemic or cognitive grounds, while the fourth is unacceptable on moral grounds. A false generalization is unacceptable because membership in the reference class does not increase the probability of membership in the target class, a non-robust generalization is unacceptable because it uses a reference class that is too heterogeneous, a bias-triggering generalization is unacceptable because decision makers are inclined to overestimate the probability of membership in the target class, and a discriminating generalization is unacceptable because it puts members in the reference class at a morally unacceptable disadvantage. In this article I will investigate each of them in turn, and analyze the grounds for judging them unacceptable.
The purpose of this analysis is to facilitate the assessment of generalizations in arguments about legal evidence by providing some theoretical distinctions. I hope that the distinctions that I propose can help judges and juries think in a clear and structured way about generalizations that they find problematic. At the end of my investigation, I will offer a checklist of critical questions that can be used by legal decision makers when they assess arguments about legal evidence.
It is not the aim of this study to describe how legal decision makers actually assess arguments on legal evidence. I will not investigate which generalizations are accepted by judges and juries, and which are not. Neither is the purpose of this study to argue which generalizations ought to be accepted, and which ought not to be accepted, in my view. The purpose is merely to provide some theoretical distinctions that I hope will be helpful for a legal decision maker. The distinctions that I make in this investigation provide a vocabulary for identifying and separating different grounds for classifying a generalization as acceptable or unacceptable in arguments on legal evidence. This is important for the assessment of legal evidence by legal decision makers. I hope that it will enhance the clarity of such assessments and make them more reasoned.

Acceptable and Unacceptable Generalizations
As we have seen, all arguments on legal evidence rely on generalizations. Some generalizations are so trivial and uncontroversial that judges and jurors do not even think about them as premises in the argument. Other generalizations are problematic, and there are some arguments that trade on generalizations that are unacceptable. Generalizations where membership in a certain social group is connected with a certain feature and generalizations where a claim about a person's character is based on past behavior are examples of generalizations that can be problematic and judged unacceptable. These generalizations are used in ad hominem arguments, for example in arguments that attack the credibility of a witness (Macagno and Walton 2012: 20).
The following list provides five examples of arguments on legal evidence. It starts with an argument that relies on the familiar generalization that observations in the dark are more likely to be mistaken, and proceeds with arguments that are more problematic.
(A1) WZ testifies as a witness for the prosecution in a burglary case, and says that he drove past the crime scene and saw a man loading some boxes into a van. WZ says it was too dark to see what the man looked like, but the van appeared to be blue. The defense attorney comments on WZ's testimony in his closing statement, and makes the following argument: ''It is common knowledge that colors are harder to distinguish in the dark. In places with low illumination, blue can be mistaken for green and vice versa. WZ testified that the car was blue, but this observation could be mistaken. The crime scene was very dark, and this circumstance increases the probability that his observation was incorrect.''

(A2) YP testifies as a witness for the defense in a burglary trial. YP is the defendant's mother, and provides the defendant with an alibi. According to YP, the defendant was watching TV at her house when the burglary took place. The prosecutor questions the credibility of YP's alibi in his cross examination and closing argument. According to the prosecutor: ''We must consider the possibility that YP was lying when she gave her testimony. It is common knowledge that a mother would do anything to protect her child. The fact that YP is the defendant's mother therefore increases the probability that she was lying when she gave him an alibi for the evening of the burglary.''

(A3) FD is standing trial for murder. According to the prosecution, FD killed his neighbor MM with a shotgun. FD's wife ND testifies for the prosecution and says that FD shot MM. FD claims that he is innocent and that it was ND who killed MM. The forensic investigation found FD's fingerprints as well as ND's fingerprints on the shotgun. The prosecutor uses crime statistics as an argument against FD: ''Only 8 % of homicide offenders are women. It is therefore highly probable that it was FD rather than ND who shot MM.''

(A4) HK is standing trial for shoplifting. HK was born in Somalia, and the prosecutor submits crime statistics as evidence to show that people of Somali origin are overrepresented by a factor of seven among convicted shoplifters. According to the prosecutor, these statistics strengthen the case against HK: ''The fact that shoplifting is more common among people of Somali origin does not necessarily mean that HK committed this particular offense, but it does increase the probability.''

(A5) TL is standing trial for drug dealing. TL has three previous convictions for the same offense, and the prosecutor argues that this increases the probability that TL is guilty in the present case: ''Previous convictions for the same offense show that TL is disposed to commit this kind of crime. They make it substantially more probable that he is guilty.''

Some of these arguments are more problematic than others. In my experience most judges find (A1) and (A2) acceptable, and assess (A3) and (A4) as unacceptable. (A5) seems to be the most controversial argument on the list. In some legal systems prior conviction is admissible as evidence for guilt, in others it is inadmissible. The Swedish legal system is an example of the former. In a survey that I conducted on 261 Swedish judges, 61 % accepted the generalization that prior conviction for the same offense increases the probability that the defendant is guilty, and 39 % found this generalization unacceptable (Dahlman 2015: 11).
The notion that some generalizations are acceptable while others are unacceptable raises several questions. Fundamentally, it raises the question, on what ground a generalization is judged as unacceptable. What, exactly, is wrong with the generalizations that are unacceptable that is not the case with acceptable generalizations? In the following, I will discuss different grounds for judging a generalization unacceptable.

Unacceptable Categorically and Non-categorically
When we talk about generalizations that are unacceptable, an important distinction should be made between claims of two different kinds: the claim that it is always unacceptable to use a certain circumstance as evidence for a certain hypothesis, and the claim that a certain argument that uses the circumstance as evidence is unacceptable. In the former case, the generalization is categorically unacceptable. In the latter case, it is unacceptable but the unacceptability is not categorical. The generalization is non-categorically unacceptable. The claim that a certain generalization is categorically unacceptable says that all arguments that use the circumstance as evidence for the hypothesis are unacceptable. The non-categorical claim only says that a certain argument uses the circumstance in an unacceptable way, and does not rule out that there could be other arguments that use the circumstance as evidence in an acceptable way. This distinction is related to the distinction introduced by Prakken, Bex, Reed and Walton (see section one above) between ''attacks on the generalization itself'' and ''attacks on a specific application of a generalization''. It is also similar to Terence Anderson's distinction between ''synthetic-intuitive generalizations'' and ''context-specific'' generalizations (Anderson 1999: 459-460).
The difference between the judgment that a generalization is categorically unacceptable and the judgment that it is non-categorically unacceptable can be illustrated with argument (A5). According to the prosecutor, the defendant's prior convictions for the same offense make it substantially more probable that he is guilty. A legal decision maker could find this argument unacceptable in two different ways.
1. The decision maker says that the use of prior convictions as evidence for guilt is categorically unacceptable. Prior conviction should be inadmissible as evidence for guilt.
2. The decision maker says that arguments that use prior conviction as evidence for guilt are not necessarily unacceptable, but claims that this particular argument is unacceptable as it exaggerates the evidentiary value of the prior conviction. It might be true that prior conviction for the same offense makes it slightly more probable that the defendant is guilty, but it does not make it ''substantially more probable''.
As we shall see, some grounds for classifying a generalization as unacceptable render the generalization unacceptable categorically and others non-categorically. The primary focus of this investigation is to identify different grounds for saying that a generalization is categorically unacceptable.

True Generalizations
A generalization points to a certain piece of evidence and classifies the case at hand as belonging to the reference class (E) of cases where this kind of evidence is present, and claims that this is evidence for a certain hypothesis, in the sense that membership in the reference class increases the probability that the case belongs to the target class (H) of cases where the hypothesis is true. The generalization says that knowledge that a certain observation belongs to the reference class makes it more probable ceteris paribus that it belongs to the target class. The probability that a case belongs to the target class (H), given that the case belongs to the reference class (E), is higher than the probability that the case belongs to the target class (H), when it is not given whether the case belongs to the reference class or not, P(H|E) > P(H). As an example, the generalization in (A1) says that the probability that an observation is mistaken, given the information that it was made in the dark, is higher than the probability that it is mistaken, given that we are ignorant about the light conditions when the observation was made. For this to be true membership in the target class must be more likely in the reference class than in cases in general. Incorrect observations must be more common among observations in the dark than among observations in general, P(H&E)/P(E) > P(H).
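The condition that membership in the reference class raises the probability of the hypothesis can also be restated in terms of how common the evidence is when the hypothesis is true and when it is false. The following derivation is not spelled out in the text; it is a sketch of a standard consequence of Bayes' theorem, under the assumptions that 0 < P(H) < 1, P(E) > 0 and P(E|¬H) > 0, where ¬H (written -H elsewhere in this article) denotes that the hypothesis is false.

```latex
\begin{align*}
P(H \mid E) > P(H)
  &\iff \frac{P(E \mid H)\,P(H)}{P(E)} > P(H) \\
  &\iff P(E \mid H) > P(E) \\
  &\iff P(E \mid H) > P(E \mid H)\,P(H) + P(E \mid \neg H)\,\bigl(1 - P(H)\bigr) \\
  &\iff P(E \mid H)\,\bigl(1 - P(H)\bigr) > P(E \mid \neg H)\,\bigl(1 - P(H)\bigr) \\
  &\iff \frac{P(E \mid H)}{P(E \mid \neg H)} > 1
\end{align*}
```

On this reading, a generalization is true exactly when the evidence is more frequent among cases where the hypothesis holds than among cases where it does not.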
Let us assume that there are 100 cases, and in each case an eyewitness testifies about the color of a van. In 20 cases it was dark when the witness observed the van, and in 80 cases the witness observed the van in good light. Among the 20 cases where the witness observed the van in the dark, the observation was correct (the van actually had the color that the witness named) in 16 cases and incorrect in 4 cases. Among the 80 cases where the witness observed the van in good light, the observation was correct in 79 cases and incorrect in 1 case. See Fig. 1.
If we pick a case at random among these 100 cases, the probability that the observation is incorrect, if we do not know whether the witness observed the van in the dark or in good light, P(H), is 5/100 = 0.05. The probability that the observation is incorrect, given that the observation was made in the dark, P(H&E)/P(E), is (4/100)/(20/100) = 0.20. This means that the circumstance that the observation was made in the dark increases the probability of the hypothesis that the observation is incorrect. If we learn that the observation was made in the dark, the probability that the testimony is incorrect increases from 5 to 20 %. This assumes, of course, that there is no other evidence. If there is another circumstance that increases the probability that the observation is incorrect, independently of the evidence that the observation was made in the dark, e.g. that the witness has poor eyesight, the combined probability that the observation is incorrect will, of course, be higher than 20 %, and, if there is another circumstance that decreases the probability that the observation is incorrect, the combined probability will be lower than 20 %. In any case, the circumstance that it was dark when the witness observed the van makes it more probable that the observation is incorrect, as incorrect observations are more frequent among cases where the observation was made in the dark than among cases in general. The effect that E has on the probability of H depends on the value of other evidence. Let us, for example, assume that, due to other circumstances, the probability of the testimony being incorrect is 60 % before we receive the information that the observation was made in the dark. In this situation the probability that the observation is incorrect increases from 60 to 88 % when we take into account that the observation was made in the dark. 1
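The arithmetic of the van example can be checked with a few lines of code. The following is a minimal sketch, not part of the original analysis: the counts are the ones given for the 100 cases above, while the variable names and the helper function `update` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction

# Counts from the 100-case van example (Fig. 1).
dark_incorrect, dark_correct = 4, 16
light_incorrect, light_correct = 1, 79
total = dark_incorrect + dark_correct + light_incorrect + light_correct

# P(H): probability that an observation is incorrect, light conditions unknown.
p_h = Fraction(dark_incorrect + light_incorrect, total)

# P(H|E) = P(H&E)/P(E): probability of an incorrect observation given darkness.
p_h_given_e = Fraction(dark_incorrect, dark_incorrect + dark_correct)

# Likelihood ratio P(E|H)/P(E|-H): how much more frequent darkness is among
# incorrect observations than among correct ones.
p_e_given_h = Fraction(dark_incorrect, dark_incorrect + light_incorrect)
p_e_given_not_h = Fraction(dark_correct, dark_correct + light_correct)
likelihood_ratio = p_e_given_h / p_e_given_not_h


def update(prior, lr):
    """Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio."""
    posterior_odds = prior / (1 - prior) * lr
    return posterior_odds / (1 + posterior_odds)


# Prior of 60 % due to other circumstances, then the darkness evidence is added.
posterior = update(Fraction(60, 100), likelihood_ratio)

print(float(p_h), float(p_h_given_e), float(likelihood_ratio), round(float(posterior), 2))
# prints: 0.05 0.2 4.75 0.88
```

Starting from the base rate of 5 %, the same `update` call returns 20 %, so the frequency calculation and the odds form of Bayes' theorem agree on this example.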

False Generalizations
The circumstance that a case belongs to a certain reference class makes it more probable that the case belongs to a certain target class, if and only if membership in the target class is more common among cases in the reference class than among cases in general. Notice that it is not sufficient that membership in the target class is common in the reference class. 2 It needs to be more common among cases in the reference class than among cases in general. This principle can be illustrated with argument (A5) as an example. (A5) is based on the generalization that prior conviction for the same offense makes it more probable that the defendant is guilty. Is this generalization true or false? This depends on whether guilty defendants are more common among defendants with a prior conviction for the same offense than among defendants in general. Let us assume that there are 100 cases, and in 60 of these cases the defendant has been previously convicted for the same offense. Out of the 100 cases, there are 70 cases where the defendant is guilty and 30 cases where the defendant is actually innocent. Among the 70 defendants that are guilty most defendants have a prior conviction for the same offense. 40 of the guilty defendants have been previously convicted for the same offense and 30 guilty defendants have not. If we look at the 30 defendants who are innocent it is also the case that most defendants have a prior conviction for the same offense, since the police have a selection bias towards people with a prior conviction for the same offense when they pick suspects, and this increases the risk for convicted felons to be wrongfully prosecuted for crimes they did not commit. 20 of the innocent defendants have been previously convicted for the same offense, and 10 innocent defendants have not. See Fig. 2.

1 The probability can be calculated with Bayes' theorem: P(H|E)/P(-H|E) = P(H)/P(-H) × P(E|H)/P(E|-H). In the given example, there is a 60 % probability that the observation is incorrect before we take account of the evidence that it was dark when the witness observed the van, i.e. P(H) = 0.60. The probability that the observation is correct, P(-H), is 1 - P(H), and this means that P(-H) = 1 - 0.60 = 0.40, and that P(H)/P(-H) = 0.60/0.40 = 1.5. P(E|H)/P(E|-H) is known as the likelihood ratio, and can be understood as the evidentiary force of the evidence vis-a-vis the hypothesis. Since P(E|H) = P(E&H)/P(H) and P(E|-H) = P(E&-H)/P(-H), the likelihood ratio can be calculated as [(4/100)/(5/100)]/[(16/100)/(95/100)] = 4.75. If we put these numbers into Bayes' theorem we get P(H|E)/P(-H|E) = 1.5 × 4.75 = 7.125. Since P(-H|E) = 1 - P(H|E), we can find P(H|E) by solving the equation P(H|E)/(1 - P(H|E)) = 7.125. Thus, P(H|E) = 7.125/8.125 ≈ 0.88.

2 There is an article by David Wasserman where he makes this error (Wasserman 1991: 944).
This means that membership in the target class is common in the reference class. Being guilty is common among defendants with a prior conviction for the same offense. As a matter of fact 67 % of the defendants with a prior conviction for the same offense are guilty, P(H&E)/P(E) = (40/100)/(60/100) ≈ 0.67, but this does not mean that membership in the reference class (prior conviction for the same offense) makes it more probable that a person is a member of the target class (guilty), since membership in the target class is not more common in the reference class than among the general population. On the contrary, the probability that a randomly picked defendant is guilty is 70 %, P(H) = 0.70. This means that the probability that the defendant is guilty decreases from 70 to 67 %, when we are informed that the defendant has been previously convicted for the same offense. Given the numbers in Fig. 2, the generalization that ''prior conviction for the same offense makes it more probable that the defendant is guilty'' is a false generalization.
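The test for a true generalization, P(H&E)/P(E) > P(H), can be sketched as a small function over the four cells of a two-by-two table. This is an illustration under stated assumptions, not the author's method: the function name `assess_generalization` and its parameter names are hypothetical, and the counts are the assumed numbers of the two 100-case examples (Fig. 1 and Fig. 2).

```python
from fractions import Fraction


def assess_generalization(h_and_e, e_not_h, h_not_e, neither):
    """Return (P(H), P(H|E), whether membership in E raises the probability of H).

    h_and_e: cases in both the reference class (E) and the target class (H)
    e_not_h: cases in the reference class only
    h_not_e: cases in the target class only
    neither: cases in neither class
    """
    total = h_and_e + e_not_h + h_not_e + neither
    p_h = Fraction(h_and_e + h_not_e, total)
    p_h_given_e = Fraction(h_and_e, h_and_e + e_not_h)
    return p_h, p_h_given_e, p_h_given_e > p_h


# Fig. 2: E = prior conviction for the same offense, H = guilty.
prior_conviction = assess_generalization(40, 20, 30, 10)

# Fig. 1, for comparison: E = observation in the dark, H = incorrect observation.
darkness = assess_generalization(4, 16, 1, 79)

for label, (p_h, p_h_e, is_true) in [("prior conviction", prior_conviction),
                                     ("darkness", darkness)]:
    print(f"{label}: P(H) = {float(p_h):.2f}, P(H|E) = {float(p_h_e):.2f}, true: {is_true}")
# prints:
# prior conviction: P(H) = 0.70, P(H|E) = 0.67, true: False
# darkness: P(H) = 0.05, P(H|E) = 0.20, true: True
```

The comparison shows why a high value of P(H|E) is not enough: 67 % is a large share, but it is below the base rate of 70 %, whereas 20 % is a small share that is well above the base rate of 5 %.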
It is important to distinguish between true generalizations and false generalizations. In a true generalization the claim that membership in the reference class makes it more probable that the case belongs to the target class is empirically correct. In a false generalization, this claim is empirically incorrect. The distinction has been stressed by Frederick Schauer as the distinction between ''spurious generalizations'' and ''non-spurious generalizations'' (Schauer 2003: 7).
Let us examine arguments (A1), (A2), (A3), (A4) and (A5) to see if they are based on true generalizations or false generalizations. As we have seen, it must be the case that membership in the target class is more common among cases in the reference class than among cases in general. If this is not the case, the generalization is false.
(A1) depends on the empirical correctness of the following claim: Observations where a green car is mistakenly perceived as blue are more common among observations that are made in the dark than among observations in general. This is true. Blue and green are more difficult to distinguish from each other in the dark. In low illumination the sensitivity of the human eye shifts towards the blue end of the color spectrum, and this can make green or black objects appear blue.
(A2) depends on the empirical correctness of the following claim: Testimony that provides a false alibi is more common among testimony given by the defendant's mother than among testimony in general. This is probably true. Not all mothers are prepared to lie under oath to protect their children, but it seems reasonable to assume that false alibis are more common among mothers than among witnesses in general.
(A3) depends on the empirical correctness of the following claim: In murder cases, guilty defendants are more common among male defendants than among defendants in general.
This seems to be true. There is wide agreement among criminologists that more than 90 % of all murders are committed by men. We can therefore assume that roughly the same proportion of guilty defendants are men. The proportion of men among innocent defendants, on the other hand, ought to be lower, since men only make up 50 % of the total population of innocents. This means that the generalization is correct. Guilty defendants are more common among male defendants than among defendants in general.
(A4) depends on the empirical correctness of the following claim: In shoplifting cases, guilty defendants are more common among defendants of Somali origin than among defendants in general.
It is uncertain if this is true or false. It is supported by recent Danish statistics, showing that convictions for shoplifting are 7.3 times more frequent among people of Somali origin living in Denmark than among the Danish population in general. 3 It should be pointed out that this statistic does not necessarily mean that people of Somali origin are overrepresented among guilty defendants. That they are overrepresented with regard to conviction could be caused by bias towards people from Somalia in the Danish legal system.
(A5) depends on the empirical correctness of the following claim: Guilty defendants are more common among defendants who have been previously convicted for the same offense than among defendants in general.
This is probably incorrect for reasons that I have presented in a previous study (Dahlman 2015). The police have a strong selection bias towards people with a prior conviction for the same offense, when they pick possible suspects, and this leads to a situation where ex-convicts are more likely to be prosecuted for a crime they did not commit than people in general. Research shows that the number one cause of wrongful prosecution is mistaken photo identification, where an eyewitness is presented with pictures of people with a prior conviction for the same offense, and picks an innocent person that resembles the real perpetrator (McConville et al. 1991: 23-24; Martin 2002: 856; Huff 2003: 16; Fitzgerald 2009: 5). This suggests that innocent defendants are more common among defendants who have been previously convicted for the same offense than among defendants in general, and that means that (A5) relies on a false generalization. Prior conviction for the same offense does not make it more probable that the defendant is guilty. On the contrary, it makes it less probable that the defendant is guilty.

Non-robust Generalizations
Argument (A3) relies on the generalization that it is more probable that the defendant is guilty if he is a man. As we have seen, this is empirically correct, but there is something deeply problematic with this generalization even if it is true. A generalization makes a claim about a class of cases but an argument on legal evidence makes a claim about a specific case. Men as a group commit murder more often than women as a group, but the defendant is a specific man, and it could be the case that he is a peaceful man who would never hurt anyone. Is it really acceptable to judge him on the actions of other men? The philosophical position known as particularism responds to this problem by saying that a case shall be judged on its particular circumstances, not on generalizations (Schauer 2003: 19-20). A person shall be judged on his or her individual merits and flaws, not on the characteristics of some group that he or she happens to belong to (Lippert-Rasmussen 2011: 48). As David Wasserman puts it, inferences about the guilt of a defendant that are based on group generalizations ''are inconsistent with the law's commitment to treat the defendant as an autonomous individual'' (Wasserman 1991: 943).
Particularism may have some intuitive appeal, but it is actually an impossible idea, as all evidence relies on generalizations in one way or another. If we dismiss every piece of evidence that relies on a generalization, we will have no evidence left to judge the case. The impossibility of particularism has been demonstrated by Schauer (2003: 75), Tillers (2005: 44) and Stein (2005: 65). Schauer shows that every attempt to move beyond a certain generalization will only replace that generalization with another generalization (Schauer 2003: 67). This can be illustrated with argument (A3). According to the prosecutor, the fact that FD is a man makes it highly probable that he, rather than his wife ND, fired the shotgun that killed the neighbor MM. Let us assume that FD's defense attorney objects to this line of reasoning, and argues that it is unacceptable that FD shall be judged on the behavior of men in general. FD's defense attorney claims that FD is a peaceful and law abiding man, and submits evidence on FD's past behavior. According to the defense attorney, FD should be judged on the basis of this character evidence. The defense attorney may very well be right, but the approach that he suggests does not mean that the case is assessed on particular circumstances instead of generalizations. It only means that one generalization is substituted for another generalization. Instead of judging FD on a generalization about men in general, FD will be judged on a generalization about men with a track record of good behavior.
The right approach to the problematic nature of generalizations is not to reject all generalizations, but to recognize that some generalizations are more problematic than others. It is, for example, more problematic to judge FD on the behavior of all men than to judge him on the behavior of men with a track record of good behavior. This is due to the fact that ''male'' is a more heterogeneous reference class than ''male with a track record of good behavior'' (Colyvan et al. 2001: 172). The probability that a man is different from other men is higher than the probability that a man with a track record of good behavior is different from other men with a track record of good behavior. This can be described in terms of robustness. A judgment that is based on a less heterogeneous reference class is more robust than a judgment based on a more heterogeneous reference class (Dahlman et al. 2015: 17-20).
Robustness measures sensitivity to additional information. That a judgment is more robust means that it is less likely that it will be changed by additional information.
A generalization can be transformed into a more robust generalization by making the reference class more specific. This transforms the reference class into a less heterogeneous reference class. The reference class of cases where circumstance A is present can be transformed into the more specific reference class of cases where ''A and B'' are present, or the even more specific reference class ''A and B and C''. The reference class ''male'' can, for example, be transformed into the more specific reference class ''male with a track record of good behavior'' or the even more specific ''male over 65 with a track record of good behavior''.
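The effect of narrowing a reference class can be illustrated with a minimal sketch. The records below are invented purely for illustration and are not taken from any study; the attribute names `a`, `b`, `h` and the helper `conditional_probability` are hypothetical. The sketch estimates the relative frequency of H in the reference class ''A'' and in the more specific reference class ''A and B''.

```python
def conditional_probability(cases, hypothesis, **conditions):
    """Relative frequency of `hypothesis` among the cases that match
    every condition, i.e. among the members of the reference class."""
    reference_class = [
        case for case in cases
        if all(case[key] == value for key, value in conditions.items())
    ]
    if not reference_class:
        raise ValueError("empty reference class")
    hits = sum(1 for case in reference_class if case[hypothesis])
    return hits / len(reference_class)


# Invented toy records (an assumption for illustration, not real data).
cases = (
    [{"a": True, "b": True, "h": True}] * 1
    + [{"a": True, "b": True, "h": False}] * 9
    + [{"a": True, "b": False, "h": True}] * 5
    + [{"a": True, "b": False, "h": False}] * 5
)

# Broad reference class "A": 6 of 20 cases have H.
p_a = conditional_probability(cases, "h", a=True)

# More specific reference class "A and B": 1 of 10 cases has H.
p_ab = conditional_probability(cases, "h", a=True, b=True)

print(p_a, p_ab)
```

In this toy data the estimate drops from 0.3 to 0.1 once circumstance B is taken into account, which is the kind of sensitivity to additional information that robustness is meant to capture.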
The prosecutor's argument in (A3) can, therefore, be criticized with regard to robustness. The objection against the prosecutor's argument would go as follows (Colyvan et al. 2001: 173): ''It is true that FD is a man, and it is true that this circumstance increases the probability that he is guilty, but I am not prepared to settle with this. I want to place FD in a more specific reference class. I want to know more about FD, to see if this changes the probability that he is guilty.'' A problem with this kind of objection is that it can be raised against every argument that relies on a generalization. It is always possible that a more specific reference class would change the probability of H. This dilemma is known in probability theory as the reference class problem (Reichenbach 1949: 374).
With regard to arguments on legal evidence the reference class problem can be resolved by the principle that a generalization should not be accepted if the reference class can be specified in a way that typically changes the probability of H. If we know, for example, that considering track record typically changes the probability of H, we have reason to classify a generalization that does not consider track record as unacceptably non-robust. If an argument that relies on such a generalization is presented, the lack of robustness is a ground for the judge/jury to disregard it. According to this line of reasoning, generalizations that rely on oversimplified statistics are unacceptable in argumentation on legal evidence (Stein 2005: 70). (A3) as well as (A4) ''Somali origin'', can be judged as unacceptable on this ground.
It should be noticed, however, that the lack of robustness in (A3) and (A4) only renders these generalizations unacceptable in the non-categorical sense. It does not make them categorically unacceptable. That (A3) is unacceptable because ''male'' is too heterogeneous a reference class does not mean that ''male'' as a circumstance should never be used as evidence for guilt. It does not rule out that a more specific reference class that uses ''male'' as a circumstance in conjunction with other circumstances, e.g. ''male under twenty with a history of violent behavior'', could be sufficiently robust to be acceptable. And the same goes for (A4). That ''Somali origin'' is insufficiently robust as a reference class does not mean that ''Somali origin'' is unacceptable as one of the circumstances in a reference class. The view that ''Somali origin'' is categorically unacceptable as evidence for guilt needs to be justified by something more than lack of robustness.

Bias-Triggering Generalizations
There are situations where a decision maker believes that there is some truth to a certain generalization, but is hesitant to accept the generalization, as it may trigger bias. The decision maker fears that the generalization, if accepted, will be overestimated and overused. Argument (A4) can serve as an example. Let us assume that the generalization used in the argument is true. Somali origin increases the probability that the defendant is guilty, P(H|E) > P(H). It only makes it slightly more probable that the defendant is guilty, but it does increase the probability. A decision maker may still be hesitant towards the acceptance of (A4), fearing that such acceptance would lead to an exaggerated bias against people of Somali origin. This could be a ground for a judge to decide that (A4) is unacceptable in arguments about legal evidence.
The suspicion that the generalization will be overestimated and overused can be related to a number of different agents. First of all, the judge may fear that the acceptance of (A4) in a court of law may legitimize racism among the general population (Schauer 2003: 35). Secondly, the judge may fear that it will encourage bias among other judges. And, thirdly, the judge may doubt his own ability to handle the generalization correctly, preferring to refrain from using it, to avoid the risk of overestimating its evidentiary force. In the last case, the judge is tying himself to the mast, like Ulysses, to avoid irrational judgment.
In legal systems where the evidence is assessed by a jury, the judge may fear that the jurors will overestimate a certain generalization, and the judge will sometimes prevent this from happening by declaring a certain piece of evidence inadmissible, or instructing the jury to disregard it. According to Rule 403 of the Federal Rules of Evidence, a judge can exclude relevant evidence if the judge finds that the probative value is substantially outweighed by the jury's prejudice about the evidence (Allen et al. 2011: 140-142). The situation is illustrated in Fig. 3. P(H) is the probability that the defendant is guilty when the jury does not take into account that he is of Somali origin. P(H|E) is the probability that the defendant is guilty, given that he is of Somali origin, according to a non-biased juror who makes a correct assessment. P_bias(H|E) is the probability that the defendant is guilty, given that he is of Somali origin, according to the biased juror who overestimates the evidentiary value of Somali origin. If the judge finds that the jurors are biased, it becomes problematic to accept that the jury uses this generalization.
[Fig. 3 Overestimation of evidence: P(H), P(H|E) and P_bias(H|E) marked in that order on a probability axis from 0 to 1]
It should be noticed that in such situations the jury will never get the probability right. If the jury takes account of E the probability of H will be overestimated. If the jury does not take account of E the probability of H will be underestimated. A solution to this dilemma is to minimize the error. This means that the judge should look at the difference between the correct probability and the assessed probability when the jury does not take the evidence into account, P(H|E) - P(H), in comparison to the difference between the correct probability and the assessed probability when the jury takes the evidence into account, P_bias(H|E) - P(H|E). If the latter exceeds the former, P_bias(H|E) - P(H|E) > P(H|E) - P(H), the error is minimized if the judge instructs the jury that it is unacceptable to use Somali origin as evidence of guilt. This is the solution provided in Rule 403 of the Federal Rules of Evidence.
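The comparison of the two errors can be written out as a short sketch. The function name `exclusion_minimizes_error` and the probability values are hypothetical and chosen only to illustrate the rule; absolute differences are used so that the comparison does not depend on the sign convention.

```python
def exclusion_minimizes_error(p_prior, p_correct, p_biased):
    """Return True if disregarding the evidence gives the smaller error.

    p_prior   -- P(H), the assessment when E is not taken into account
    p_correct -- P(H|E), the correct assessment given E
    p_biased  -- P_bias(H|E), the biased assessment given E
    """
    error_if_excluded = abs(p_correct - p_prior)
    error_if_admitted = abs(p_biased - p_correct)
    return error_if_admitted > error_if_excluded


# Hypothetical values: the evidence is weak, but the bias is strong.
strong_bias = exclusion_minimizes_error(p_prior=0.50, p_correct=0.55, p_biased=0.80)

# Hypothetical values: same evidence, but only a slight overestimation.
slight_bias = exclusion_minimizes_error(p_prior=0.50, p_correct=0.55, p_biased=0.58)

print(strong_bias, slight_bias)
```

With the first set of values the overestimation (0.25) exceeds the underestimation (0.05), so the error is minimized by excluding the evidence. With the second set the overestimation (0.03) is the smaller error, so the error is minimized by admitting it.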
The idea that dilemmas of this nature shall be resolved by minimizing the error is not without objection. The solution rests on the assumption that all errors are equally undesirable, but this is not the case. Some errors are more undesirable than others. The conviction of an innocent defendant is, for example, more undesirable than setting a guilty defendant free. A judge should take this into account, when he or she decides whether a certain circumstance should be admitted as evidence. In our example above it lends further support to the conclusion that Somali origin should not be accepted as evidence for guilt, but there are other situations where the effect could go in the opposite direction, e.g. when we are dealing with character evidence in favor of the defendant. Jeremy Bentham proposed that the dilemma should be settled on the basis of a utilitarian calculus. Evidence should be dismissed from consideration if the harm of this exclusion is smaller than the harm that would ensue if the evidence were considered (Bentham 1962: 88). This means that it makes a difference if we are dealing with an argument advanced by the prosecution, where the generalization hurts the defendant, or an argument advanced by the defense, where the generalization favors the defendant. Since the harm of a wrongful conviction is greater than the harm of a wrongful acquittal, it takes less of a bias for a generalization that hurts the defendant to be unacceptable.
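The asymmetry between the two kinds of error can be sketched by weighting them differently. This is only one possible way to model the point, not Bentham's calculus or a formula from the literature: the functions `weighted_cost` and `should_exclude`, the 2:1 weights and all probability values are assumptions made for illustration. Overestimating the probability of guilt is treated as more costly than underestimating it.

```python
def weighted_cost(estimate, correct, cost_over, cost_under):
    """Cost of an assessed probability of guilt relative to the correct one.
    Overestimation and underestimation carry separate (assumed) weights."""
    difference = estimate - correct
    if difference > 0:
        return cost_over * difference
    return cost_under * (-difference)


def should_exclude(p_prior, p_correct, p_biased, cost_over=2.0, cost_under=1.0):
    """Return True if admitting the evidence has the higher weighted cost.
    The default 2:1 weighting is an illustrative assumption."""
    cost_if_excluded = weighted_cost(p_prior, p_correct, cost_over, cost_under)
    cost_if_admitted = weighted_cost(p_biased, p_correct, cost_over, cost_under)
    return cost_if_admitted > cost_if_excluded


# Hypothetical prosecution evidence: correct effect +0.10, bias adds +0.08.
unweighted = should_exclude(0.50, 0.60, 0.68, cost_over=1.0, cost_under=1.0)
prosecution = should_exclude(0.50, 0.60, 0.68)

# Hypothetical defense evidence, mirrored: correct effect -0.10, bias adds -0.08.
defense = should_exclude(0.50, 0.40, 0.32)

print(unweighted, prosecution, defense)
```

Under equal weights the prosecution evidence would be admitted, since the bias (0.08) is smaller than the correct effect (0.10). Under the assumed 2:1 weighting the same bias is enough to exclude it, while the mirrored defense evidence is still admitted. This matches the observation that it takes less of a bias for a generalization that hurts the defendant to be unacceptable.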

Discriminating Generalizations
So far we have identified three different grounds for saying that a generalization is unacceptable in arguments on legal evidence: false generalizations, non-robust generalizations and bias-triggering generalizations. The first two are epistemic grounds, and the third is cognitive. I will now investigate a fourth ground that is moral in nature: the notion that a generalization can be unacceptable categorically because it discriminates against people that belong to the reference class in an unfair way. This would justify why the generalization in argument (A4) ''Somali origin as evidence for guilt'' should be classified as unacceptable. It is important to notice that we are now talking about acceptability in the categorical sense. We have seen above that (A4) can be classified as unacceptable non-categorically due to lack of robustness. We can now move one step further and classify (A4) as categorically unacceptable on the grounds of discrimination.
The idea that (A4) is unacceptable because it discriminates against people from Somalia in an unfair way is appealing, but it needs to explain why ''Somali origin'' is unacceptable, when other circumstances that also discriminate are acceptable. Why is argument (A4) unacceptable, but argument (A2) acceptable? Why is it acceptable to discriminate against a mother who is giving her son an alibi? Does not fairness require that she has the same possibility as other people to give the defendant an alibi? At the end of the day, could we not say that every generalization that makes an inference from a social group to an individual is discriminatory and unfair? As you can see, this argument leads to particularism: every person has the right to be judged on individual circumstances only, everything else is unfair discrimination. To avoid this pitfall into particularism we need to distinguish between acceptable discrimination and unacceptable discrimination, and we need a moral ground for the distinction.
An important difference between the generalization that false alibis are especially common among testimony given by the defendant's mother and the generalization that stealing is especially common among people of Somali origin is that the negative impact for people of Somali origin from the latter generalization is much greater than the negative impact for mothers from the former generalization (Hellman 2008: 23). Consider the situation where the generalization that stealing is especially common among people of Somali origin is generally used against people from Somalia by legal decision makers. The cumulative effect of such a practice puts people of Somali origin at a systematic disadvantage. A similar effect does not ensue from the general use of the generalization that mothers will lie to protect their children. This generalization does not make mothers systematically disadvantaged in an unacceptable way. An assessment where (A2) is found to be acceptable while (A4) is classified as categorically unacceptable can be justified on this ground.
That some generalizations have a greater cumulative effect than others can be explained by several factors. First of all, some generalizations are applicable to more situations than others. They can be used by decision makers in many different contexts. Furthermore, some generalizations have a greater cumulative effect because they are more available than others, in the sense that they require less effort on the decision maker's part. Generalizations that require little effort will be used more often, and will, therefore, have greater cumulative impact. Research in cognitive psychology has demonstrated that some generalizations are more available to decision makers, as they come to mind more easily (Tversky and Kahneman 1973: 207). Racial generalizations that play a considerable role in the society where the decision maker is situated are more available to the decision maker. It should also be remembered that an argument is more available when it is effortless for the decision maker to determine that the case belongs to the reference class (Segall 2012: 96). This is, for example, the case with reference classes that relate to physical appearance, such as skin color.

Check List with Critical Questions
As we have seen, the judgment that a certain generalization is unacceptable in arguments on legal evidence can be made on at least four different grounds that should be separated from each other. I have distinguished between false generalizations, non-robust generalizations, bias-triggering generalizations and discriminating generalizations.
Judges and juries are presented with evidence and listen to arguments about the evidence. It is important that they assess these arguments critically. A legal decision maker must always question if the generalization that an argument relies on is problematic, and, if it is problematic, specify on what ground, exactly. Argumentation theory can help a decision maker in this task, by setting up a check list of critical questions that reminds the decision maker of important issues and separates the issues from each other. This methodology has been used successfully by Doug Walton, and others (e.g. Walton 1997: 199-229). The following check list of critical questions sums up the main results of my analysis in this article.
• Is the generalization empirically true or false, as a generalization?
  Is membership in the target class more common in the reference class than among cases in general?
• Is the generalization sufficiently robust?
  Is the reference class homogenous or heterogeneous?
• Does the generalization trigger bias?
  Is there a risk that the generalization, if accepted, will be overused or overestimated?
• Is the generalization discriminating?
  Does the generalization put people in the reference class at an unfair disadvantage?