In what follows, I will take us a step further than the often-invoked bias of risk assessment tools against minorities. Again, algorithmic biases are a fundamental, yet not surprising, issue given that these algorithms are trained on datasets that are the products of a society suffering from structural racism. The point I would like to make is not that algorithms like COMPAS could or should be improved or optimized; improvement procedures presuppose the reduction of decision to computation, an assumption challenged in my third argument. Rather, their use at sentencing should be abolished.
Rewriting the Temporality of an Individual’s Life
It is considered self-evident in the US criminal justice system that individuals are not judged solely on the offense that brought them before the judge, but that their history of past offenses is taken into account at sentencing (Tonry, 2014, p. 172). One example of this is the “three-strikes law” applied in 30 out of 50 US states. Regardless of whether the offender was already punished with prison time for the first two offenses, upon the third strike, if the offense is violent or serious, the offender faces a life sentence. This, however, should be viewed as far from evident, considering that the offender has already been punished for their past crimes. Criminologist Michael Tonry notes in a 2014 paper that the Scandinavian countries have a very different conception of punishment, as they deem that past offenses for which the offender has already been punished should not be taken into account at sentencing (Tonry, 2014, p. 172).
A new factor comes into play with predictive tools, however: the risk of future recidivism. This risk is evaluated based on the defendant’s criminal history but also on their overall recidivism risk in conjunction with the “class” of criminals to which the individual is assigned by the risk assessment tool. The data used to produce this evaluation do not solely belong to the individual. The predictors mentioned in “Sect. 2” are predictive only when the individual’s data are compared with the data of the norm group. As a result, the individual is judged relative to the category of criminal to which they are expected to belong. Generalization through datafication is problematic in regard to the conception of justice I mentioned earlier, which has to address the singularity of each case. If it does not, justice becomes nothing more than the automated application of general rules, no matter how different singular cases are from each other.
However, the question of the generalization of sentencing through datafication is not the one I would like to ask here. In fact, both individualization and generalization happen in the assessment produced by predictive algorithms. Indeed, the algorithm treats the offender in a highly individualized way when it comes to the amount and specificity of data gathered regarding that individual’s pastFootnote 8; on the other hand, it generalizes the data by simplification when it pertains to their future. As a result, the individual is judged relative to the category of criminal to which they are expected to belong—“expected” in the sense that predictions are probabilistic and do not amount to determinism. Probabilities only establish the frequency of an event occurring when another event takes place. For this reason, predictions do not establish a causal, necessary relation between both events. Producing a deterministic evaluation of the future of an individual would require—as in Minority Report—deity-like prescience, or a system that has at its disposal knowledge of the entirety of the factors and causal intricacies at a given stage of an individual’s life, down to the quantum level.Footnote 9 The algorithm necessary to compute the history of this individual would have to be exactly as long and complex as this history itself. It would thus be useless. Following the concept coined by the mathematician Gregory Chaitin, the individual’s history is incompressible: “… if the experimental data cannot be compressed, if the smallest program for calculating it is just as large as it is …, then the data is lawless, unstructured, patternless …. In a word, random, irreducible!” (Chaitin, 2005, p. 64). To predict the future of an individual, one must discover a pattern, and in order to do so, one must compare that individual’s data to a dataset and thus give up their singularity. One has to trade the certainty of an impossible determinism for the uncertainty of predictions. Patterning always already implies a simplification through generalization and, thus, a loss of certainty.
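Two of these claims admit a compact formal gloss. What follows is a minimal sketch in notation of my own choosing, not taken from Chaitin or Northpointe: the first line states that a recidivism probability is a conditional frequency rather than a causal necessity; the second states incompressibility, namely that the shortest program capable of reproducing a history x is at least as long as x itself.

```latex
% Hedged formal gloss; the notation (P, K, x) is illustrative, not from the cited sources.
% A recidivism score records a conditional frequency, never a necessity:
P(\text{reoffense} \mid \text{profile})
  = \frac{P(\text{reoffense} \cap \text{profile})}{P(\text{profile})},
  \qquad 0 < P(\text{reoffense} \mid \text{profile}) < 1
% Incompressibility in Chaitin's sense: no program shorter than the history x generates x.
K(x) \;\geq\; |x|
```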
The question arising from this assessment is what allows a judge to act as if the future of an individual had already been lived. Indeed, no matter how many factors are included and processed by the predictive algorithm, the sentencing based on its result consists in judging a future that cannot be the future of the judged individual, as this singular future has yet to be lived. This future is open, undetermined. It has not happened yet. Probabilities of recidivism can be high, but they are just that: probabilities. To base a decision on a probable outcome which is the outcome of a class of individuals means to deny an individual the openness of their future, in other words, to deny the multiplicity of possible outcomes, while implicitly placing on the individual the whole responsibility for the social conditions in which they grew up, conditions that are then used against them when gathered for prediction purposes. Incidentally, judging an individual based on a future that cannot be theirs can only be justified if justice is implicitly understood as the management of risks, while the present offense and the judged individual are secondary matters.
The denial of the indeterminacy of the individual’s future causes the temporality of the individual’s life to flatten into a present inescapably doomed by its past: the individual’s past is used to predict the future as if this future had already been lived, and this “as if” serves in turn to performatively determine their present through judicial decision-making. The point here is that the issue lies not so much in the existence of statistics and predictions; rather, it consists in the practice of basing decisions on them for purposes of justice. The actual outcome—that the individual reoffends or does not reoffend—might match the predicted outcome, as in the case of an offender who was granted parole based on a low-risk score of recidivism and who does not reoffend. But in the case of a prison sentence based on predictions of recidivism, there is no way to know what the actual outcome for this individual would have been. With such decisions, the future of this individual has been performatively determined as lived—the “he will reoffend” of Northpointe’s Practitioner’s Guide—before it could actually be lived. For this reason, it is highly problematic to ground decisions about parole, probation, and time spent in a prison cell not on the present state and needs of an individual but on predictions about a future that cannot belong to the judged individual.
However, from the perspective of risk management, one could argue that society has both the right and good reason to protect itself from offenders by using all the knowledge and data at its disposal, and that it is safer to wrongly impose a longer sentence on someone based on their risk of recidivism than to mistakenly free an individual who then reoffends. This argument does not hold, as it has been shown that imprisonment does not make society safer: American prisons are criminogenic, and imprisonment is highly detrimental to communities as it damages their social fabric (Clear, 2008).Footnote 10 What Angela Y. Davis calls the “prison-industrial-complex” (Davis, 2005, p. 35) incarcerates Black people at a much higher rate than their white counterparts for similar offenses, thus systematically overexposing this population to the risks tied to prison.
Probabilities vs. Decisions
In this section, I will discuss how the use of algorithms transforms a probabilistic vision of the future into a deterministic one thanks to the decisional and thus performative character of justice. Because decisions are performative, probabilities become deterministic: they produce the world they predict. To show this, I would like to turn to a discussion of the specific performativity of decision-making based on predictions and analyze what changes occur in reality beyond those affecting the judged individual.
By taking the recidivism risk of an individual into account for sentencing, we are determining the future of this individual based, as we have seen in the previous section, on predictions tied to data that are not exclusively theirs. Because decisions are performative, by choosing between one or another outcome (prison or probation), we take probabilities as if they were deterministic: a given individual is predicted to recidivate and will thus be sentenced to prison as if they actually had reoffended. This decision supposes the existence of something, the future reoffense, that is not and cannot be, as the individual is now in prison.
Earlier, I called attention to the striking formulation by Northpointe that I repeat here: “The purpose of the risk scales is prediction—the ability to discriminate between offenders who will and will not recidivate (emphasis mine)” (Northpointe Inc., 2015, p. 7). Northpointe’s use of the indicative “will” rather than the conditional “would” confirms the imperceptible shift from probability to determinism previously discussed. This shift is enabled by overlooking the performative power of decisions. However, the connection established between prediction and determinism by the use of the future tense becomes more than an imprecise use of language when COMPAS gets used for the purpose of sentencing. By deciding, and thus performatively determining the present based on past data, one confirms the past state as the norm in light of which the future is preemptively understood as having taken place. Therefore, the past is that which will repeat itself simply because it once was the case.
In addition, the connection between a possible and an inescapable future is materially realized when, by modifying the future of an individual based on predictions, one creates new data that will eventually be added to the dataset serving to train the algorithm. As I am about to show, the more we judge on the basis of predictions, the more we produce self-confirming data, and the more reality will fit the data.
Let us draw on the case of the false positives: an individual is predicted to reoffend within 2 years but will not reoffend. This is nothing unexpected, as individuals do defy the outcome predicted by the risk assessment tool in a high percentage of cases. In the case of COMPAS and of non-expert human beings’ predictions, false positives for Black individuals amount to 40.4% as opposed to 25.4% for white individuals (Dressel & Farid, 2018, p. 2). I have already pointed out how social injustice and racism are reflected in these numbers. Let us now proceed with a thought experiment: the case of a male individual called Y. Y was excluded from probation measures because of his high risk of reoffending based on his history and on his COMPAS scores.Footnote 11 Instead, Y is sentenced to prison. What kind of data does this case produce? The data confirm the connection between the prediction of a high risk of recidivism and sentencing to prison time. But Y could have belonged to the class of false positives. The issue is that we will never know if this is the case, as Y was not granted probation and was sent to prison. There is a good chance that Y is among the 40.4% of “outliers” who defy the predictions. However, the possibility of being an outlier to the prediction has been materially excluded by his prison sentence. It is now impossible to find out whether Y would have reoffended within 2 years, and thus impossible to rectify future predictions. Y’s case cannot belong to the false positives anymore. As a result, decisions based on predictions systematically eliminate the false positive outliers. The predictions leading to a false negative are the only ones whose accuracy can be checked in real life. Let us pursue the thought experiment: male individual Z was predicted not to reoffend within 2 years and put on probation but ends up reoffending. Consequently, the result of the predictive algorithm was inaccurate and needs to be corrected, which in cybernetic terms is called “negative feedback.”
While the data produced in the case of Y solely confirm the correlation between the prediction of recidivism and the sentence to prison, the data produced in the case of Z will lead the algorithm to correct future predictions in order to avoid false negatives, leading to harsher predictions. This tendency takes hold as soon as data resulting from algorithmic predictions are themselves fed back into the risk assessment tool. Following the mechanism of feedback loops, false positives are progressively eliminated, while false negatives lead to the correction of the algorithm, which results in more prison sentences rather than releases on probation.
With the generalization of the use of predictive algorithms, no data will be produced that are not themselves the result of predictive mechanisms. The more data produced through predictive algorithms are fed back into the norm dataset, the more new predictions will reflect the absence of false positives and the necessity to avoid false negatives. As a result, fewer and fewer individuals should be predicted as non-reoffenders.
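The ratchet described in the last three paragraphs can be made concrete with a small simulation. The sketch below is purely illustrative and hypothetical: the scores, base rate, threshold, and update rule are my own assumptions and are not drawn from COMPAS or from the cited studies. It only displays the structural asymmetry at issue: incarceration censors the outcome, so false positives never generate corrective feedback, while every observed false negative pushes the decision threshold in a harsher direction.

```python
# Purely illustrative sketch of the one-sided feedback loop described above.
# All numbers and the update rule are hypothetical assumptions.
import random

random.seed(0)

threshold = 0.7      # risk score above which the simulated court incarcerates
base_rate = 0.3      # hypothetical "true" propensity to reoffend
population = 1000

for round_nr in range(10):
    released, incarcerated, observed_false_negatives = 0, 0, 0
    for _ in range(population):
        risk_score = random.random()                 # score from a hypothetical predictor
        would_reoffend = random.random() < base_rate # unknowable to the court
        if risk_score >= threshold:
            # Incarceration: the outcome is never observed, so a false
            # positive can never be detected, let alone corrected.
            incarcerated += 1
        else:
            released += 1
            if would_reoffend:
                # Only this error (a false negative) ever becomes visible,
                # and it is the only signal fed back into the system.
                observed_false_negatives += 1
    # One-sided correction: visible false negatives push the threshold down
    # (harsher future decisions); invisible false positives never push it back up.
    threshold -= 0.05 * observed_false_negatives / max(released, 1)
    print(f"round {round_nr}: threshold={threshold:.3f}, "
          f"incarcerated={incarcerated}, observed false negatives={observed_false_negatives}")
```

Run as written, the threshold only ever moves downward: the system "learns" exclusively from the errors it is allowed to see.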
Justice as Risk Management
By change what the policy deputies mean is contingency, risk, flexibility, and adaptability to the groundless ground of the hollow capitalist subject, in the realm of automatic subjection that is capital. […] This economy is powered by constant and automatic insistence upon the externalization of risk, the placement at an externally imposed risk of all life, so that work against risk can be harvested without end.
Stefano Harney & Fred Moten, The Undercommons: Fugitive Planning & Black Study, pp. 76–77.
In my third and final argument, I aim to show that decision-making premised on predictive algorithms performs a certain understanding of the function of justice. In this understanding, justice is not about fairness; neither is it about retribution or rehabilitation. Rather, it functions as an apparatus for the biopolitical regulation of risks. The judged individual in their irreplaceable singularity is secondary to this purpose. The connection I would like to establish here is the one between biopolitics—understood as the management by the state of the life and death of a population—and a preemptive form of cybernetics. Traditionally, cybernetic systems are characterized by their self-regulation in order to maintain the stability of their organization against the tendency toward energetic dispersion or chaos called entropy (Wiener, 1989). The specificity of preemptive systems is that their regulation consists in avoiding, in advance, something that has not yet happened. In order to understand this mechanism, it is necessary to clarify what making a decision consists of, and what characterizes a decision based on a prediction.
The definition of decision I offer here is loosely based on Jacques Derrida’s Force of Law (Derrida, 1990, p. 961f.) and Walter Benjamin’s Critique of Violence (Benjamin, 1996). A decision etymologically consists in performing a cut (Latin, de-cidere) within a complex reality with the help of a calculation following a set of rules in order to determine what will or will not be. At the same time, a decision involves the interpretation of these rules and of the results of the calculation based on them. Indeed, if a decision were solely the result of a calculation, it would not be a decision. For instance, that 4 is the result of 2 + 2 is not a decision, only the result of a computation following a rule. To be a decision, a speech act must be more than computation.Footnote 12
Let us unpack this attempt at a definition. While predictive algorithms like COMPAS compute the risk of an individual’s recidivism, they contribute to but do not perform the decision strictly speaking, as they provide a calculation without its interpretation. Predictions are expressed in the form of probabilities stretching from 0 (= will not happen) to 1 (= will happen). However, as demonstrated in the previous arguments, there cannot be a probability of 0 or 1 of reoffending, as there is no way to gain absolute certainty regarding the future of a given individual. Because there can be no absolute certainty regarding the risk of an individual’s recidivism, there can be no calculation of where to “make the cut.” In contrast to predictions, the decision is binary. Deciding consists in turning the x% chance that an individual will reoffend into either a “will” (= 1) or a “will not” (= 0) reoffend, and thus into the individual being or not being sent to prison. The judge makes the cut by interpreting and evaluating the output of the algorithm.
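A minimal sketch may help fix the point (the function name, scores, and thresholds below are hypothetical, not drawn from COMPAS): comparing a score with a threshold is pure computation, but nothing in the score itself determines where the threshold, that is, the cut, should lie.

```python
# Hypothetical illustration: the same score yields opposite "decisions"
# depending on where the cut is made, and the cut is not computed by the score.
def decide(risk_score: float, threshold: float) -> str:
    # The comparison is computation; choosing `threshold` is interpretation.
    return "will reoffend" if risk_score >= threshold else "will not reoffend"

print(decide(0.62, threshold=0.7))  # -> will not reoffend
print(decide(0.62, threshold=0.5))  # -> will reoffend
```

The same score of 0.62 becomes a "will" or a "will not" depending solely on a threshold that the computation does not and cannot supply.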
Since there can be no certainty regarding the recidivism of an individual—no calculation of where to “make the cut”—the decision regarding the individual can never be entirely justified by computation and is, in that sense, ungroundable. Again, if a decision were to be fully grounded in computation, it would not be a decision (as in the case of 2 + 2 = 4). Therefore, each actual decision entails an interpretation with its measure of arbitrariness. It bears the risk of being wrong and comes with the responsibility associated with this risk. Because it can be wrong, a decision marks the limits of computation.
While predictive algorithms may mitigate the risk of making a wrong decision, they can never eliminate this risk because they cannot substitute for the measure of arbitrariness in every interpretation, which accounts for the never entirely groundable character of a decision. The use of predictive algorithms conceals that an interpretation happens each and every time a decision is made. This concealment contributes to the idea that justice could be reduced to the automatic application of rules.
We have previously established that a decision is performative. Let us now specify what is performed when a judicial decision is rendered. As we have seen, judicial decisions are performative in the sense that they reshape the life of the judged individual and the community that surrounds them. At the same time, in order to produce such an effect, the decision reaffirms the legal order and the context granting its performative power. A decision referring to an existing rule reinstates the legitimacy of the rule that serves to justify the decision—implying a circularity that underlines the necessary violence of the law (Derrida, 1990, p. 987; Benjamin, 1996). In consequence, deciding does not only entail determining what will be; it also implies performatively reaffirming the normative context in which the decision takes place and makes sense, a context without which the decision would have no legitimacy.
The normative context that is performatively reaffirmed by a decision based on predictive algorithms is risk management. By entrusting predictive algorithms to help make decisions in the judicial context, one displaces the idea of justice as that which is tied to an always-singular situation necessitating a specific interpretation of the law (Derrida, 1990, p. 948) toward, instead, an automatized mechanism of regulation and modulation of risks—be it the risks that criminality is considered to represent for society, or the risks tied to the consequences of a wrong or unfair decision. Ezekiel Dixon-Román et al. describe this management of risks in terms of cost minimization: “In other words, incorrectly identifying an individual as high-risk, and making decisions regarding the nature of that individual’s probation and parole accordingly, is considered less costly than failing to identify someone who goes on to commit a ‘serious offense’ as defined above” (Dixon-Román et al., 2019, p. 31). How can we explain this prioritization of risk management in judicial procedures over and above any regard for the judged individual? Antonia Majaca and Luciana Parisi conceive of the form of governmentality that makes use of predictive algorithms as “paranoid,” tying it to the “white male subject of humanism” (Majaca & Parisi, 2016, p. 4). This kind of governmentality emerged from a sense of permanent threat tied to the 9/11 attacks and is marked by the desire to act based on what is not known (Amoore, 2013, p. 55 f.). Lorraine Daston traces the generalization of predictive algorithms further back, to the context of “Cold War rationality,” where the risk of a nuclear catastrophe was mitigated by universal algorithmic procedures and the idea that everyone played the same game by the same rules. Daston describes “Cold War rationality” as a rationality relying on a set of rules that can be applied mechanically without interpretation, judgment, or deliberation.Footnote 13
While this sense of paranoia and general suspicion inherited from the Cold War and 9/11 can partly explain the generalization of the use of predictive algorithms, I would argue that these algorithms are part of a biopolitical mechanism set in motion during the nineteenth century. Risk management—a logic nowadays shared by financial institutions, insurance companies, and the criminal justice system—functions as a “productive” tool (in the Foucauldian sense) for population management at the service of biopolitical governance.
The notion of risk management is connected to a cybernetic conception of society. Modulating risks is part and parcel of a society that, since the nineteenth century, functions biopolitically. Here, we might remember that the aim of biopower is not to discipline bodies on the individual level; its goal is to establish regulating mechanisms from within the population in order to attain an equilibrium, “something like a homeostasis,” writes Foucault, using cybernetic terminology (Foucault, 1997, p. 249).
From a cybernetic standpoint, living and mechanical processes obey the same logic: both are systems that regulate their relation to their environment through feedback mechanisms that enable them to maintain their internal organization against the system’s tendency toward energy dispersion or chaos (Wiener, 1989). Placed in this cybernetic context, predictive algorithms function as a naturalized means to maintain social homeostasis. The difference between traditional cybernetic systems and preemptive systems, however, is that in traditional cybernetic systems, the system regulates itself in light of events that have already happened and whose results are fed back into the system in order for it to adapt to a changing situation. By anticipating risks, preemptive systems regulate themselves relative to that which has not happened yet. They exclude in advance any event that could imperil an already given equilibrium, or more precisely, the norm that is at work in this equilibrium. And in order to protect themselves from hypothetical future risks, preemptive systems agree to expose the disenfranchised to actual risks in the present—be they the risks tied to predictive policing (Harcourt, 2007), to a life in prison, to unpayable health insurance, or to homelessness and poverty.
As cited in the epigraph of this section, Stefano Harney and Fred Moten emphasize in The Undercommons that neoliberal capitalism is a mode of governance which submits disenfranchised, precarious Black and Brown lives to increasingly higher levels of contingency and flexibility—putting these lives at risk and making any kind of autonomous organization and planning increasingly difficult.Footnote 14 Similarly, in Society Must Be Defended (Foucault, 2003), Foucault describes racism as the way for biopower to let a part of the unwanted population die by exposing it to multiple risks of death or to political death by exclusion. Through the sustained exposure of disenfranchised populations to risk by means of risk management tools, the government exerts its biopolitical prerogative to let die in order to maintain its perverse homeostasis.