Poisonous Datasets, Poisonous Trees

Machine learning gives rise to concerns about “algorithmic bias” stemming from bias in the training dataset. A dataset necessarily reflects a past state of affairs, but we may anticipate that the future will be different, or desire it to be so. Law has struggled with an analogous problem: how to deal with “bad evidence” and prejudice. Holmes’s broad view of experience suggests that once undesirable data has been revealed, its potential for mischief is there, and it is futile to pretend it does not exist. Instead, law has developed several strategies to deal with these negative influences. Under the doctrine of the “fruit of the poisonous tree” in American jurisprudence, evidence that stems from improper actions by public authorities is inadmissible in court. Or, a judge can give instructions to the jury to guard against improper inferences. Or, a jury verdict may be struck down on appeal. Strategies analogous to these might help to make machine learning a tool for better outcomes, rather than a trap that entrenches past mistakes and prejudice.

A second strategy is to restrain the inferences that the decision-maker draws from certain evidence that might otherwise have undesirable effects on decision-making. This strategy entails an adjustment to the inner workings of the decision process itself.
Finally, restraint may be imposed at a later stage. For example, courts review outputs (verdicts and judgments) and, if they are not in accord with certain rules, strike them down, which, in turn, means that the instruments of public power will not act on them. In that strategy, a decision-making mechanism (for example, a jury, and as much might be said of a machine learning system) was not under inferential restraint, or it was but it ignored the restraint; the output the mechanism gives is, on review, unacceptable in some way; and, so, the output is not used. The restraint did not operate within the mental or computational machinery that generated an output but, instead, upon those persons or instrumentalities who otherwise would have applied the output in the world at large. Having discerned a defect in the output, they do not apply it.
We turn now to consider more closely the problem of bad evidence; the limits of evidentiary exclusion as a strategy for dealing with bad evidence in machine learning; and the possibility that restraining the inferences drawn from data and restraining how we use the outputs that a machine reaches from data-strategies of restraint that have antecedents in jurisprudence-might be more promising approaches to the problem of bias in the machine learning age.

8.1
The Problem of Bad Evidence

As an Associate Justice of the Supreme Court, Holmes had occasion in Silverthorne Lumber Co. v. United States 4 to consider a case of bad evidence. Law enforcement officers had raided a lumber company's premises "without a shadow of authority" to do so. 5 It was uncontested that, in carrying out the raid and taking books, papers, and documents from the premises, they had breached the Fourth Amendment, the provision of the United States Constitution that protects against unreasonable searches and seizures. The Government then sought a subpoena which would authorize its officers to seize the documents that they had earlier seized illegally. Holmes, writing for the Supreme Court, said that "the knowledge gained by the Government's own wrong cannot be used by it in the way proposed." 6 As a result, the Government would not be allowed to use the documents; 7 it would not be allowed to "avail itself of the knowledge obtained by that means." 8 Obviously, no judge could efface the knowledge actually gained and thus lodged in the minds of the government officers concerned. The solution was instead to place a limit on what those officers were permitted to do with the knowledge: they were forbidden from using it to evade the original exclusion.
The "fruit of the poisonous tree," as the principle of evidence applied in Silverthorne Lumber Co. came to be known, is invoked in connection with a range of evidentiary problems. Its distinctiveness is in its application to "secondary" or "derivative" evidence 9, i.e., evidence such as that obtained by the Government in Silverthorne Lumber on the basis of evidence that had earlier been excluded. Silverthorne and, later, Nardone v. United States, where Holmes's friend Felix Frankfurter gave the principle its well-known name, concerned a difficult question of causation. This is the question, a recurring one in criminal law, of whether a concededly illegal search and seizure was really the basis of the knowledge that led to the acquisition of new evidence that the defendant now seeks to exclude. In the second Nardone case, Justice Frankfurter, writing for the Court, reasoned that the connection between the earlier illegal act and the new evidence "may have become so attenuated as to dissipate the taint." 10 But if the connection is close enough, if "a substantial portion of the case against him was a fruit of the poisonous tree," 11 then the defendant, as of right, is not to be made to answer in court for that evidence. 12 That evidence, if linked closely enough to the original bad evidence, is bad itself.
We have noted three strategies for dealing with bad evidence: one of these is to cut out the bad evidence and so prevent it from entering the decision process in the first place. This strategy, which we will call data pruning, 13 takes the form, in a judicial setting, of ruling certain evidence inadmissible. It is a complete answer, when you have an illegal search and seizure, to the question of what to do with the evidence the police gained from that search. You don't let it in. A different strategy is called for, however, if bad evidence already has entered some phase of a decision process. Judges are usually concerned here with the jury's process of factfinding. On close reading, one sees that Holmes in Silverthorne Lumber was concerned with the law enforcement officers' process of investigation. In regard to either process, and various others, a strategy is called for that restrains the inferences one draws from bad evidence. We will call that strategy inferential restraint. Finally, and further down the chain, where a decision or other output might be turned into practical action in the world at large, a further sort of restraint comes into play: restraint upon action. We will call this variant of restraint executional restraint. Data pruning and the two variants of restraint, all familiar since Holmes's day in American court rooms, have surfaced as possible strategies to address the problems that arise with data in machine learning. We will suggest, given the way machine learning works, that data pruning and strategies of restraint are not equally suited to address those problems.
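The three strategies just named can be sketched, very loosely, as intervention points in a simple decision pipeline. This is only an illustration: every function name and data shape below is invented for the purpose, and none corresponds to an actual machine learning API.

```python
# A loose sketch of the three strategies as intervention points.
# All names here are illustrative, not drawn from any real library.

def prune(dataset, is_tainted):
    """Data pruning: tainted records never enter the decision process."""
    return [record for record in dataset if not is_tainted(record)]

def constrain(inference, allowed):
    """Inferential restraint: the decision process may only emit
    conclusions from a permitted set; anything else is suppressed."""
    return inference if inference in allowed else None

def review(output, is_acceptable):
    """Executional restraint: the output exists, but an unacceptable
    one is struck down before anyone acts on it."""
    return output if is_acceptable(output) else None
```

In the courtroom analogy, `prune` corresponds to the evidentiary ruling, `constrain` to the curative instruction, and `review` to reversal on appeal.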

8.2
Data Pruning
Excluding bad evidence from a decision process has at least two aims. For one, it has the aim of deterring impermissible practices by those who gather evidence, in particular officials with police powers. Courts exclude evidence "to compel respect for the constitutional guaranty [i.e., against warrantless search and seizure] in the only effectively available way-by removing the incentive to disregard it." 14 For another, it has the aim of preventing evidence from influencing a decision, if the evidence tends to produce unfair prejudice against the party subject to the decision. In machine learning, the first of these aims, deterring impermissible data-gathering practices, is not absent. It is present in regulations on data protection. 15 Our main focus here is on the second aim: preventing certain data from influencing the decision. 16 Data pruning is the main approach to achieving that aim in judicial settings. Data pruning avoids thorny questions of logic, in particular the problem of attenuated causation. Just what inferences did the jury draw from the improper statements or evidence? Just what inferences did the police draw from the evidence gained from the unlawful search? And how did any such inferences affect future conduct (meaning future decision)? It is better not to have to ask those questions. This is a salient advantage of data pruning. It obviates asking, as Justice Frankfurter had to, whether the link between the bad evidence and the challenged evidence has "become so attenuated as to dissipate the taint." 17 Data pruning has the related advantage that, if the bad data is cut away before the decision-maker learns of it, the decision-maker does not have to try not thinking about something that she already knows. Data pruning avoids the problem that knowledge gained cannot be unlearnt. As courts have observed, one "cannot unring a bell." 18 The cognitive problem involved here is also sometimes signaled with the command, "Try not to think of an elephant." By deftly handling evidentiary motions, or where needed by disciplining trial counsel, 19 the judge cuts out the elephant before anybody has a chance to ask the jury not to think about it.
Machine learning has a fundamental difficulty with data pruning. To make a meaningful difference to the learnt parameters, and thus to the eventual outputs when it comes time to execute, you need to strike out huge amounts of data. And, if you do that, you no longer have what you need to train the machine. Machines are bad at learning from small amounts of data; nobody has figured out how to get a machine to learn, as a human infant can, from a single experience. Nor has anybody, at least yet, found a way to take a scalpel to datasets; there is no way, in the state of the art, to excise "bad" data reliably for purposes of training a machine. 20 Accordingly, data pruning is anathema to computer scientists. 21 As for legal proceedings, data pruning is, as we said, a complete answer to the problem it addresses, in situations in which the data was pruned before a decision-maker sees it. As we noted, however, not all improper evidence stays out of the court room. Nor does all knowledge gained from improper evidence, fruit of poisonous trees, stay out. Once it enters, which is to say once a decision-maker, such as a juror, has learned it, its potential for mischief is there. You cannot undo facts. They exist. Experience is a fact. Things that have been experienced, knowledge that has been gained, do not disappear by fiat.
A formalist would posit that the only facts that affect the trial process are those that the filters of evidentiary exclusion are designed to let in. As we have discussed, however, Holmes understood the law, including the results of trials, to derive from considerably more diverse material. Juries, lawyers, and judges all come with their experiences and their prejudices. To Holmes, these were a given, which is why he thought trying to compel decision-makers "to testify to the operations of their minds in doing the work entrusted to them" was an "anomalous course" and fruitless. 22 You cannot simply excise the unwanted experience from someone's mind, any more than present-day computer scientists have succeeded in cutting the "bad" data from the training dataset.

8.3
Inferential Restraint

What you can do, however imperfect a strategy it may be, is place limits on what you allow yourself, the jury, the machine, or the judge to infer from the data or the experience. Inferential restraint is familiar in both law and machine learning. In efforts to address the problem of bad evidence (bad data) in machine learning, most of the energy indeed has been directed toward this approach: instead of pruning the data, computer scientists are developing methods to restrict the type of inferential outputs that the machine is able to generate. 23 In the legal setting, placing restrictions upon inferences has long been an important strategy. Judges' instructions to juries serve that purpose; appeals courts recognize that judges' instructions, properly given, have curative effect. 24 Judges, in giving curative instructions, understand that, even when bad evidence of the kind addressed in Silverthorne and Nardone (evidence seized in violation of a constitutional right) has been stopped before it reaches the jury, there still might be knowledge in the jurors' minds that could exert impermissible effects on their decision. The jurors might have gained such knowledge from a flip word in a lawyer's closing argument. 25 They might have brought it in with them from the street in the form of their life experiences; Holmes understood juries to have a predilection for doing just that. 26 Knowledge exists which is to be kept from affecting verdicts, if those verdicts are to be accepted as sound. But some knowledge comes to light too late to prune. There, instead, a cure is to be applied. In the courtroom, the cure takes the form of an instruction from the judge. The instruction tells the jurors to restrain the inferences they draw from certain evidence they have heard. The restraint is intended to operate in the mental machinery of each juror.
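One very simple way a learned model can be placed under inferential restraint, without re-training on a pruned dataset, is to intervene at inference time so that a given feature cannot influence the output. The linear scorer below is a hedged sketch of that idea; the feature names, weights, and the notion that "zip_code" stands proxy for a protected characteristic are all invented for illustration.

```python
# Sketch of inferential restraint at inference time: the model keeps
# its learnt weights, but the weight attached to a protected feature
# is zeroed before any score is computed. Names and numbers invented.

def score(features, weights):
    """A plain linear scorer over named features."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

def restrained_score(features, weights, protected):
    """Same scorer, but features in `protected` cannot affect the output."""
    restrained = {name: (0.0 if name in protected else w)
                  for name, w in weights.items()}
    return score(features, restrained)
```

Nothing is unlearnt here; the weight on the protected feature still exists, just as the juror's knowledge still exists. The restraint operates on the inference drawn from it.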
A further situation that calls for inferential restraint is that in which some piece of evidence has probative value and may be used for a permissible purpose, but a risk exists that a decision-maker might use the evidence for an impermissible purpose. Pruning the evidence would have a cost: it would entail losing the probative value. Thus, as judges tell jurors to ignore certain experiences that they bring to the court room and certain bad evidence or statements that, despite best efforts, have entered the court room, so do judges guide jurors in the use of knowledge that the court deliberately keeps. 27 Here, too, analogous approaches are being explored in machine learning. 28

8.4
Executional Restraint
From Holmes's judgment in Silverthorne, one discerns that a strategy of restraint operates not just on the mental processes of the people involved at a given time but also on their future conduct and decisions. Silverthorne was a statement to the Government about how it was to use knowledge. True, the immediate concern was to cut out the bad evidence root and branch, to keep it from undermining judicial procedure and breaching a party's constitutional rights. Data pruning is what generations of readers of Silverthorne understand it to have done; the principle of the fruit of the poisonous tree more widely indeed is read as a call for getting rid of problematic inputs. 29 There is more to the principle of the fruit of the poisonous tree, however, than data pruning. Consider closely what Holmes said in Silverthorne: "the knowledge gained by the Government's own wrong cannot be used by it in the way proposed" (emphasis added). So the "Government's own wrong" already had led it to gain certain knowledge. Holmes was not proposing the impossible operation of cutting that knowledge from the Government's mind. The time for pruning had come and gone. Holmes was proposing, instead, to restrain the Government from executing future actions that, on the basis of that knowledge, it might otherwise have executed: knowledge gained by the Government's wrong was not to be "used by it." The fruit of the poisonous tree (to use Frankfurter's expression) addresses a state of the world after the bad evidence has already generated knowledge. The effect of that knowledge on future conduct is what is to be limited. That is to say, executional restraint, the strategy of restricting what action it is permissible to execute, inheres in the principle. 30

8.5
Poisonous Pasts and Future Growth
Seen in this, its full sense, the principle of the fruit of the poisonous tree has high salience for machine learning, in particular as people seek to use machine learning to achieve outcomes society desires. A training dataset necessarily reflects a past state of affairs. 31 The future will be different. Indeed, in many ways, we desire the future to be different, and we work toward making it so in particular, desirable ways. But change, as such, doesn't require our intervention. Even if we separate ourselves from our desires for the future, from values that we wish to see reflected in the society of tomorrow, it is a matter of empirical observation, a fact, that the future will be different. Thus, either way, whether or not our values enter into it, we err if we rely blindly on a mechanism whose outputs are a faithful reflection of the inputs from the past that shaped it. We must therefore restrain the conclusions that we draw from those outputs, and the actions we take, or else we will get the future wrong. In machine learning, there is widespread concern about undesirable correlations. An example could be supplied by a machine that hands out prison sentences. The machine is based on data. The data is a given. Americans of African ancestry have received a disproportionate number of prison sentences. Trained on that data, a machine will give reliable results: it will give results that reliably project the past state of affairs into its future outputs. African-Americans will keep receiving a disproportionate number of prison sentences. Reliability here has no moral valence in itself; it connotes no right or wrong. It is simply a property of the machine. The reason society objects to reliability of this kind, when considering an example as obvious as the prison sentencing machine, is that this reliability owes to data collected under conditions that society hopes will not pertain in the future. We want to live under new conditions.
We do not want a machine that perpetuates the correlations found in that data and thus perpetuates (if we obey the machine) the old conditions. Some computer scientists think there may be ways to address this concern about undesirable correlations by pruning the training dataset. 32 We mentioned the technical challenges this presents for machine learning. We speculate that the other strategies will be as important in machine learning as they have been in law: restrain the inferences and actions that derogate from the values we wish to protect. That is how we increase the chances that we will get the future right.
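The point about "reliability" can be made with a toy calculation. The numbers below are invented: a model that does nothing more than learn base rates from skewed historical data will faithfully project the skew into every future output.

```python
# Toy illustration with invented numbers: a "reliable" model that
# simply learns base rates from a skewed history reproduces the skew.

from collections import Counter

# Hypothetical historical record: group "A" received 70 of the last
# 100 sentences, group "B" received 30.
history = ["A"] * 70 + ["B"] * 30

def learned_rate(group, data):
    """The model's prediction is just the base rate it was trained on."""
    counts = Counter(data)
    return counts[group] / len(data)

# Left to itself, the model projects the old disparity forward
# indefinitely. Reliability, here, is exactly the problem.
```

The machine is not malfunctioning; it is doing precisely what it was trained to do. That is why the remedy must lie in restraint upon inference and execution, not in hoping for a different output.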

Notes
tainted. (ii) The other type is bad, irrespective of how the evidence collector behaved. It is bad because it poses the risk of an invidious influence on the decision process itself.

17. Doctrinal writers on evidence have struggled to articulate how to determine whether the link between bad evidence and challenged evidence is attenuated enough to "dissipate the taint." Clear enough is the existence of an exception to the fruit of the poisonous tree. Unclear is when the exception applies. Here the main treatise on American rules of evidence has a go at an answer: "This exception… does not rest on the lack of an actual causal link between the original illegality and the obtaining of the challenged evidence. Rather, the exception is triggered by a demonstration that the nature of that causal link is such that the impact of the original illegality upon the obtaining of the evidence is sufficiently minimal that exclusion is not required despite the causal link."

field. Restraint has both aspects as well where one is concerned, instead of with preventing people from using knowledge to generate more knowledge, with preventing a machine learning system from using an input to generate more outputs. The overlap between the two variants of restraint arises in machine learning because machine learning systems (at least in the current state of the art) do not carry on computing with new inputs unless some action is taken to get them to execute. The executional restraint would be to refrain from switching on the machine (or, if its default position is "on," to switch the machine off). The overlap is also significant where human institutions function under procedures that control who gets what information and for what purposes. Let us assume that there is an institution that generates decisions with a corporate identity, i.e., decisions that are attributable to the institution, rather than to any one human being belonging to it. Corporations and governments are like that. Let us also assume that, in order to generate a decision that bears the corporate identity, two or more human beings must handle certain information; and one of them, or some third person, has the power to withhold that information. The person having the withholding power may place a restraint upon the institution: she may withhold the information and, thus, the institution cannot carry out the decision process. The restraint in this setting has overlapping characteristics. It is inferential, in that it restrains the decision process; it is executional, in that it restrains the actions of the individual constituents of the institution.

31. See Chapter 3, p. 37.

32. Chouldechova & Roth, op. cit., Section 3.4, p. 7. Cf. Paul Teich, Artificial Intelligence Can Reinforce Bias, Forbes (Sept. 24, 2018) (referring to experts who "say AI fairness is a dataset issue").
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/ by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.