Introduction

The current debate on the possibilities and limitations of the use of artificial intelligence is of vital importance for the criminal justice system (Castro-Toledo 2022; Greenstein 2021), and the use of actuarial algorithmic tools (NACDL Task Force on Risk Assessment Tools 2020) is a particularly important issue in this debate. Examples of technologies that have been deployed so far include VioGen, used in Spain to guide police protective action with regard to potential victims of gender-based violence (López-Ossorio et al. 2019); RisCanvi, which informs penitentiary measures in Catalonia (Andrés-Pueyo et al. 2018); the controversial COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), used by criminal justice agents in the USA to aid decision-making based on the risk of recidivism (Fass et al. 2008; Dieterich et al. 2016); and the worldwide implementation of HCR-20 (Historical Clinical Risk Management-20), used to predict the risk of violence in forensic practice (Scurich 2018; Silva 2020). In criminal law, the need to critically review the use of both existing and future technologies (Greenstein 2021) has largely been connected to discussions on the implications of the adoption of statistical risk assessment tools for the paradigm of culpability (Allen 2001; Greenstein 2021). Other authors have raised the issue of the scope and value that the predictions generated by such algorithms should receive in the police or judicial sphere (Hannah-Moffat 2009; Parmar and Freeman 2016; Ratcliffe 2019; Meijer and Wessels 2019; Garrett and Monahan 2020; Alikhademi et al. 2021; Ugwudike 2021).

But these legal reflections and warnings often overlook an important ongoing debate that has been taking place in the philosophy of science since the beginning of this century. At the core of this debate lies the epistemological value of statistical tools, which is also prompting reevaluations of the procedures (and value) of the computational and statistical sciences. The purpose of this paper is to describe both debates, at least in outline, in order to understand the epistemological value that can be attributed to correlations and predictions in general. Thereafter, we will turn to the normative arguments that are implied by the epistemological ones. While knowledge of mathematics or philosophy is not strictly necessary to defer to normative arguments (e.g., the principle of culpability), a deeper understanding of the subject matter does appear to be warranted to assess whether and how the use of a given algorithm infringes such normative principles. In other words, if legal operators are to make decisions based on risk and probability, and if criminologists are to inform and critically assess how such decisions should be made, it is important for them to understand what “risk” and “probability” mean in the fields these terms originate from, and how the corresponding values are generated in decision support systems. We regard this as the best way to sustain both normative and functional arguments, which corresponds to the character of our discipline and to our societal function as a community of experts that provides proper information for decision-making.

Theory-driven vs. data-driven scientific approaches

Big Data has fundamentally transformed many aspects of modern society, and science is no exception. Accordingly, data science has emerged as a new scientific domain and seems to have consolidated in an intermediate space between statistics and computer science. But the influence of data science is not limited to the creation of a discipline or domain of its own. Its tools and logic deeply affect various scientific fields, calling the uniqueness of the traditional scientific method into question or, at the very least, offering a parallel path for the generation of knowledge.

Traditionally, science has followed a theory-driven approach aiming at the identification of abstractable, hence generalizable, elements and stable relationships between them (Maass et al. 2018). In essence, the scientific approach starts from a theory to identify research questions and generate hypotheses. Studies are then designed and carried out to generate data that verify (or refute) the hypotheses, from which researchers infer knowledge that, in turn, feeds theory (Andersen and Hepburn 2016). Bachelard’s formula (1971) that “scientific fact is conquered, constructed and verified” illustrates this procedure and provides an account of a scientific epistemology. Theory enables us to break with preconceptions and apriorisms (especially in the social sciences) by constructing hypotheses that guide scientific research towards the generation of knowledge that will ultimately lead to the correction or readjustment of the initial theory.

In recent decades, however, we have witnessed the emergence of a data-driven approach that, taking advantage of both the increasing generation of data as raw material and cheaper and faster computing power, explores and analyses Big Data to extract “scientific” insight (Kitchin 2014). Here, the starting point is the processing and exploration of data to generate analysis and modelling strategies that discover and contextualize patterns and correlations (Jagadish 2015). The theory of the specific domain becomes secondary to data analysis techniques and, by extension, to the use of algorithms to extract answers and solutions. The formulation of the questions and problems to be answered and solved, moreover, is not mainly informed by theory, as hypotheses are also generated based on the observed data.
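To illustrate the data-driven starting point described above, the following minimal sketch shows an algorithm “discovering” structure in data without any prior hypothesis; the synthetic data and the choice of k-means clustering are our own illustrative assumptions, not features of any tool discussed in this paper.

```python
# Illustrative sketch only: a toy "data-driven" exploration in which patterns
# are extracted from data without any prior hypothesis. All data are synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic observations drawn from two latent groups the analyst knows nothing about.
data = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(200, 2)),
    rng.normal(loc=4.0, scale=1.0, size=(200, 2)),
])

# The algorithm "discovers" structure (clusters) directly from the data;
# interpreting what the clusters mean is left to the analyst (and, eventually, to theory).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.cluster_centers_)
```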

In recent years, data analytics has developed rapidly as a result of the enormous advances in the field of machine learning and the underlying improvements in inferential statistics and computational power (Brownlee 2011). It is undeniable that these techniques have opened up and revolutionized a wide array of fields, from computer vision to genetics. The main contribution of these approaches, perhaps, is their capacity to generate answers to a variety of practical problems, which can be sorted into two categories (Shobha and Rangaswamy 2018): prediction (e.g., an algorithm that recommends new content based on our viewing history on a streaming platform) and categorization (e.g., a filter that automatically sends certain incoming emails to our spam folder). The epistemological status of the information produced by these two types of algorithms is largely the same. Data processing and modelling help identify structures within the data to solve practical problems but tend to leave aside the inference of deep knowledge, that is, information embedded within a theory of a certain level of generality. This does not imply, however, that data-driven approaches do not generate generalizable knowledge. But the “generalizability” of exclusively data-based knowledge refers to the application of the resulting tool (i.e., the model) to new data, not to the integration of conceptual knowledge together with other elements into a theory. This kind of generalizability is achieved by statistical techniques (e.g., bootstrapping, cross-validation) that ensure that the model is trained and validated on different datasets, thus avoiding overfitting and increasing the likelihood that the model will work on new datasets (Dietterich 1995).
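As a concrete illustration of the generalizability safeguard just mentioned, the following sketch cross-validates a simple classifier on synthetic data; the dataset, model choice and number of folds are illustrative assumptions only.

```python
# Minimal sketch of the "generalizability" safeguard described above:
# a predictive model is validated on data it was not trained on.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: the model is repeatedly trained on 4/5 of the data
# and scored on the held-out 1/5, which guards against overfitting.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```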

The difference between data- and theory-driven knowledge in terms of generalizability also implies other differences between the two approaches. Not only does the role of theory (and, by extension, of “expert knowledge”) vary in guiding the research at the beginning and in integrating observations into the theory at the end, but the different orientation of the approaches permeates a multitude of decisions and steps during the research process, from the aforementioned generalizability to how variables are recorded. For instance, researchers might choose to base variable and label selection on previous research rather than on algorithmically identified data clusters.

The death of theory?

Under the striking title The End of Theory, former Wired magazine editor-in-chief Chris Anderson warned in 2008 that the deluge of data could render the traditional scientific method (i.e., hypothesis testing) obsolete, reducing its usage and epistemic status. According to this argument, numbers would be able to speak for themselves if enough data is available (Pigliucci 2009), and causal explanations could become irrelevant. While this argument will be assessed further below, it suffices at this point to state that the collection and processing of large amounts of data have already transformed traditional scientific practice and have given a fundamental role to data-driven science. On the one hand, this has led to the emergence of new areas of research such as bioinformatics or systems biology. On the other hand, data-driven approaches have also informed more established fields such as epidemiology and ecology. Most recently, the trend of evidence-based science is also impacting the social sciences, where the digital humanities are transitioning from being the exception to being the norm. According to Anderson’s argument, this would lead to the displacement, to a secondary role, of explanatory theoretical frameworks, hypotheses, and discussions on whether experimental results refute or support the original hypotheses. Algorithmically discovered correlations seem to replace traditional causal relationships, and the attribution of theoretical significance to empirical phenomena appears to become less relevant since the regularities found in the data are deemed sufficient (Mayer-Schönberger and Cukier 2013). From this perspective, the increasing sophistication of algorithms and statistical tools for processing massive amounts of data is the locus of innovation, since it is assumed that correctly processed data can be converted into information and, therefore, into “knowledge” (Mazzocchi 2015).

As Wikström and Kroneberg (2022) have recently pointed out, the criminological discipline has generated vast amounts of information about crime and its different dimensions (victims, offenders, correlates, etc.), and has produced a multitude of theories and hypotheses. However, attempts to generate theoretical frameworks capable of integrating and interpreting this plethora of ideas and information have not been successful. It is undeniable that the body of knowledge generated is of great value and has provided answers to different problems, but the lack of connection between theory and empirical observation has so far hindered the establishment of the causes of crime (Wikström and Sampson 2016). Multiple correlates of crime have been found within different theoretical frameworks, such as the identification of the predictive capacity of risk factors (e.g., Mohler et al. 2011; Brantingham et al. 2018; Caplan et al. 2011; Caplan and Kennedy 2016). However, discriminating causal factors from incidental correlates (the cum hoc ergo propter hoc fallacy) remains one of the discipline's greatest challenges (Farrington 2000).

The relationship between statistical techniques and causality in 20th-century science is not straightforward. The dominant statistical paradigm has fully turned its back on answering causal questions through observational data (Pearl 2010; Pearl and Mackenzie 2018). Accordingly, resorting to theory to interpret empirical findings and assign their causal meaning has been the dominant way to answer causal questions in certain disciplines. The displacement of theory as a means to establish cause-effect relationships can be considered a major paradigm shift in the scientific landscape of recent decades. This is what the expressions “the end of science” and “the end of theory” mean at their core, namely the idea that theoretical models with explanatory roots could be relegated to irrelevance, with correlations and predictions being revealed as the only necessary truths. While cum hoc ergo propter hoc reasoning has long been considered fallacious, and while scientists have long defended that “correlation does not imply causation”, Big Data and machine learning algorithms, taken as theoretical models, break with these principles: correlation could end up being considered sufficient to justify rational decision-making in the face of the stubbornness of data.
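The following toy simulation makes the cum hoc ergo propter hoc problem tangible: two variables correlate strongly although neither causes the other, because both depend on a common, unobserved cause. All values are synthetic and chosen purely for illustration.

```python
# Toy simulation of "cum hoc ergo propter hoc": X and Y are strongly correlated
# although neither causes the other; both are driven by a common cause Z.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
z = rng.normal(size=n)            # unobserved common cause (confounder)
x = 2.0 * z + rng.normal(size=n)  # X depends on Z only
y = 3.0 * z + rng.normal(size=n)  # Y depends on Z only

print(np.corrcoef(x, y)[0, 1])    # high correlation (~0.85) without any causal link
```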

Causality beyond practice: exploring epistemological challenges

Legal and scientific interest in the configuration of causal relationships in the context of crime is as old as the systematic construction of criminal law. While systematically directed at one specific element of criminal phenomena (i.e., causation), this interest is connected to a myriad of structural elements necessary to determine the reasons for “crime”; and it does not seem exaggerated to venture that it will never cease to be topical (Zinger 2004; Garrett and Monahan 2020). Causality is essential to address further questions in criminology, such as the definition of “danger” or the attribution of probabilities in the prediction of human action. This practical interest in causality notwithstanding, it is perhaps more fundamental to note that behind the continuous legal construction of causes for particular (criminal) actions lies the philosophical debate on what “causing an effect” means in principle. This translates into the scientific preoccupation with the question of when an effect can be attributed to an external cause. The debate has intensified especially since the end of the 20th century, in parallel with the growing practice of data science (Mayer-Schönberger and Cukier 2013) and the new technical possibilities offered by Big Data and predictive algorithms for processing huge amounts of data. As mentioned above, some authors point to a real paradigm shift with regard to explanations or predictions of phenomena, which may well lead to the end of traditional scientific research (Anderson 2008). But in order to truly understand this debate and its implications for criminal law, it is necessary to uncover the epistemological implications behind the methodological issues considered thus far.

Although the question of causality as a central element of scientific explanation has long been a subject of traditional philosophy, it was with the profuse developments of 20th-century philosophy of science, as well as of statistics and probability theory (i.e., Bayesian approaches), that the discussion reached its highest degree of sophistication. As recently recounted by Woodward and Ross (2021), the modern discussion begins with the development of the deductive-nomological model, a direct legacy of the positivist proposals of the Vienna Circle with a broad presence in the scientific theory and practice of the 20th century up to the present day (Popper 1959; Braithwaite 1953; Gardiner 1959; Nagel 1961). Taking up Hume's critical inductivist tradition (1748), Hempel (1942, 1965) proposed a model that, in general terms, is based on the identification of elements with causal relevance or explanatory capacity (explanans) for the emergence of other phenomena under given conditions (explanandum). As de Miguel Beriain and Diéguez Lucena (2021) have recently and rightly explained, causal models construe scientific explanation and prediction as two sides of the same coin in that they share the same logical structure. That is, explaining a phenomenon will subsequently allow us to predict its occurrence (i.e., the symmetry thesis). Hence the interest in analyzing both in tandem.

Despite the enormous success of this deductive-nomological model over several decades, its high dependence on deterministic laws and its epistemic inability to deal adequately with natural data (Gabbay et al. 2011; Lindley 2000) led the subsequent philosophy of science literature to address these limitations (Woodward and Ross 2021). The causality debate thus shifted toward the capability of integrating the power of prediction into causal explanations: the so-called theories of probabilistic causation. The assumption behind this model is that statistically relevant correlations and properties (or information about statistically relevant relationships) are explanatory, while statistically irrelevant properties are not (Woodward and Ross 2021). This implies, at least in everyday language usage, that those antecedent phenomena that by their statistical relevance increase the probability of the occurrence of a specific phenomenon are also the cause of this phenomenon. Over the last few decades, a large body of scientific literature has sought to reformulate this conception of causality in probabilistic terms, with special consideration given to approaches developed in the Bayesian framework (Bolstad and Curran 2016; Gelman and Shalizi 2013). The latter is characterized by the introduction of a formal apparatus for inductive logic and tests of epistemic rationality as a way of extending the justification of the laws of deductive logic into a justification of the laws of inductive logic. What was unsatisfactory about these theories of probabilistic causation, however, was the great diversity of possible candidates for causation and their potential inability to solve practical problems. The same was the case in the field of criminal causation with the early formulations of the theory of equivalence of conditions (Talbott 2008).
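For readers unfamiliar with this literature, the probability-raising idea behind probabilistic causation, and the Bayesian updating rule on which such frameworks build, can be stated compactly as follows (the notation is ours and purely illustrative):

```latex
% Probability-raising criterion: C is a (prima facie) cause of E when
\[
  P(E \mid C) \;>\; P(E \mid \neg C)
\]
% Bayesian updating of a hypothesis H in light of data D:
\[
  P(H \mid D) \;=\; \frac{P(D \mid H)\,P(H)}{P(D)}
\]
```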

The assumptions of probabilistic causality are implicitly present in the Big Data paradigm, albeit with subtle differences. Traditional models are still embedded in a deductive understanding of scientific explanation and hypothesis testing, with hypotheses derived from specific theoretical frameworks that render test results meaningful (Rosenberg and McIntyre 2019). Data-driven research, on the other hand, and especially some machine learning and deep learning algorithms, are presented as (apparently) neutral inductive procedures that are disconnected from any theoretical framework that would establish an a priori interpretative horizon for the meaning of the identified correlations. This circumstance has led many authors to deny the epistemological value of these procedures. Relying on Popper's traditional critique of inductive procedures, they highlight the risk that such procedures end up conferring epistemic status on spurious correlations, and they deny epistemic value to so-called “black boxes” (Bibal et al. 2021) in which no causal explanation is possible behind the correlation (Pearl 2018).

New horizons for causal explanations

As pointed out above, there is no agreement on these criticisms either. Most authors do recognize that the success of predictive algorithms cannot be achieved through purely inductivist logic or the destruction of theoretical models and causal explanations, and that they will have to rely on the incorporation of at least approximate causal relationships (Pietsch 2021). Nevertheless, it is also argued that the “end of science” thesis is a necessary starting point for a far-reaching epistemological discussion on the prediction of phenomena in the era of Big Data (Pietsch 2021), and others warn against evaluating the new epistemic tools with excessively rigid models, with the subsequent risk of falling into fallacious fatalistic arguments against current knowledge (Vallverdú 2020). Faced with the limitations of deductive logic for quantitative research, the lack of traditional sociological variables and the abundance of unknown variables and data formats, Kitchin et al. (2014) propose an abductive logic of scientific inference for quantitative research to mitigate the shortcomings of inductive logic without theoretical guidance. Abductive reasoning goes beyond the deductive logic of causal explanation while avoiding the problems of inductive reasoning: the novel hypothesis generated by abduction would not arise, as C.S. Peirce conceived it (Fann 2012), from instinct, but from data and correlations that lead to the adoption of a hypothesis and its subsequent testing by means of the traditional scientific method. There is no leap over what is known, but rather the creation of a suggestive explanatory hypothesis from existing data, which can then be falsified via counterfactuals or other deductive methods (Nagin and Sampson 2019).

Influential authors such as Judea Pearl (2009, 2010; Pearl and Verma 1995; Pearl, Glymour and Jewell 2016), Donald Rubin (2001, 2005) and James Robins (Robins et al. 2000) have driven recent advances in causal analysis, emphasizing some of the paradigmatic shifts that the move from traditional statistical analysis to causal inference entails. Pearl and colleagues in particular have made notable advances in the field by formulating a structural theory of causality (Pearl 2009; Pearl 2010) which brings together and structures the advances brought about by graphical models, the potential outcomes framework, and structural equations. This theoretical framework provides a systematization of causal reasoning that departs from classical statistical reasoning based on correlation and offers its own procedures and nomenclature for posing and resolving causal questions. It amounts to an algorithmization of causal reasoning that allows us, through a series of axioms and mathematical theorems, not only to estimate the causal effect of an event or variable on a population in probabilistic terms, but also to apply these estimates at the level of the specific case and to resolve a multitude of questions hitherto alien to the scientific field. These are synthesized in the formulation: “What would have happened if?” The logic of counterfactuals permits the formulation and solution of this kind of question (Pearl et al. 2016), enabling researchers to estimate the value that a variable would take under hypothetical conditions that have not been observed, but that can be simulated in the form of a counterfactual and resolved mathematically. In a criminological context, the risk of recidivism of an individual under the conditions of a given intervention is an apt example.
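To make the “what would have happened if” logic concrete, the following sketch walks through Pearl's abduction-action-prediction steps on a toy linear structural causal model; the equations, coefficients and variable names are invented for illustration and do not correspond to any real risk assessment tool.

```python
# Toy structural causal model (SCM), linear for simplicity:
#   T = u_t             (observed treatment, e.g. a hypothetical intervention flag)
#   Y = 2*T + u_y       (observed outcome)
# Counterfactual query: given that we observed (T=0, Y=1),
# what would Y have been had T been 1?

def counterfactual_y(t_obs: float, y_obs: float, t_new: float) -> float:
    # 1) Abduction: recover the exogenous noise consistent with the observation.
    u_y = y_obs - 2 * t_obs
    # 2) Action: intervene, setting T to the counterfactual value (do(T = t_new)).
    t = t_new
    # 3) Prediction: propagate the structural equations with the recovered noise.
    return 2 * t + u_y

print(counterfactual_y(t_obs=0.0, y_obs=1.0, t_new=1.0))  # -> 3.0
```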

Practical implications for criminal justice

It might be argued that few of the arguments characterized in this article are of immediate interest to criminal law and criminal justice, or that they are so only in very general terms. In this sense, some of the above arguments may only prove to be of interest once the use of opaque neural networks and deep learning algorithms for criminal decision-making becomes standard procedure. This situation might still seem far away, especially given that current predictive algorithms are based on rather classical statistical models and have other kinds of (methodological) problems (Chander 2016; Ferguson 2017; Meijer and Wessels 2019; Miron et al. 2021). At the same time, it could be argued that, upon the arrival of more sophisticated opaque algorithms, their epistemological implications will be unimportant, given that their usage can easily be limited on the basis of normative considerations related to the right of defense (Raymond and Shackelford 2013), the principle of guilt (Netter 2006; Slobogin 2006), or others. While we agree that a normative perspective beyond the empirical validity of statistical tools is warranted in a criminal context, we posit that it should be complemented by a philosophical-scientific (i.e., epistemological) perspective for two reasons:

Firstly, by ignoring the epistemological perspective on opaque AI tools, one risks a disconnect between the epistemic value of certain new forms of knowledge on the one hand, and the recognition of their value for our field of application on the other. Resistance against the use of certain tools, although argued on the grounds of normative guarantees or principles, may be based on an erroneous exaltation of the epistemic value of traditional research methods together with the denial of the value of others, even when the latter prove to exhibit greater rigour and epistemic capacity. Secondly, waiting for the widespread use of algorithmic tools before evaluating them hinders the incorporation of ethical-normative standards into their design at the moment of their conception (Miró-Llinares 2019). Thus, a better understanding of the epistemic value of algorithmic tools is desirable from a normative standpoint, in order to increase the quality (both ethical and epistemic) of the tools that are going to be used. In addition, starting the debate on the epistemological level before adopting normative judgements on the use of algorithmic tools to support criminal decision-making provides an opportunity to reflect on how prognostic judgements have hitherto been made in criminal law.

The concern about prognoses and how they are made is not restricted to the use of modern risk assessment tools in criminal justice; understanding the epistemology of statistical probability is useful beyond this purpose. Prognostic arguments have been incorporated into criminal trials in more than the obvious ways. Judges in particular are regularly forced to make prognostic judgements, and it is no stretch to demand that such inferential judgements be based on a scientifically established and normatively agreed-upon procedure grounded in true cause-effect relationships, within the confines of the legal and scientific possibilities. Conversely, it is unreasonable to deny legal operators access to support tools on the basis of their epistemological validity, as traditional methods of prognostic judgement do not appear to be subject to the same scrutiny.

It is striking that discussions on the epistemological value of algorithmic tools in criminal justice begin at a time when traditional prognostic tools, whose effectiveness is largely unsupported by empirical evidence, are beginning to be forced out by structured risk assessment tools. Although the rise of the criminal law of dangerousness is a recent phenomenon, related to the controversial incorporation of sanctioning measures such as probation, judges have long had to make decisions based on prognoses of risk or dangerousness. Until recently, clinical judgement was the most traditional prognostic method: judges would decide on probation based on expert information from psychologists or psychiatrists. From a historical perspective, dangerousness, as opposed to the risk of violence, has sought to be the predictor of violence par excellence (Netter 2006; Loinaz and de Sousa 2019). However, its parsimony is its main limitation: it lacks specificity and has received little subsequent scientific problematization. In this sense, the error and success rates in the prediction of violence based on specialists' diagnoses of dangerousness depend almost exclusively on their previous experience, on the scarcity or unavailability of “dangerousness identification tools”, or on the prevalence of the predicted phenomenon (for a general discussion of the practical risks of the concept of dangerousness in the criminal justice system, see Brown and Pratt 2000). While many of the variables that were classically taken into account have not changed in comparison to modern prognostic tools (e.g., sex, age, background, among others), the prognostic model of clinical judgement follows an opaque logic that lacks the available scientific guarantees. This does not mean, of course, that it should be disregarded; but it is certainly unreasonable to treat it as the most desirable scenario in judicial decision-making when other options are available, especially given the lack of academic and scientific problematization it has received in comparison to algorithmic risk assessment tools.

The prognosis of dangerousness began to change in the 1980s when, inspired by the political-criminal assumptions of resocialization, actuarial tools for risk assessment gradually began to replace clinical judgement in key decision-making contexts in the penal system (Ferguson 2017; Garrett and Monahan 2020; Meijer and Wessels 2019; Ratcliffe 2019). The rigorous identification of risk factors associated with a higher or lower probability of recidivism (Andrés-Pueyo, Arbach and Redondo 2018) or violence (Douglas 2013; Forth et al. 2003; Boer et al. 1997; Kropp and Hart 2000), their mathematical modelling for the prediction of risk, and the constant review of their performance upon application to new samples of subjects (Loinaz et al. 2017) represent methodological progress in risk estimation based on transparent, reviewable, and reproducible standards. The same is true for police forecasting tools, particularly those based on location (Caplan et al. 2011; Brantingham et al. 2018), which, by analyzing patterns of crime occurrence, serve to better estimate where crime will occur in order to allocate adequate resources for prevention.
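A deliberately simplified sketch of the transparent logic that structured actuarial instruments follow may be helpful here: each risk item carries an explicit, documented weight, and the final score is an auditable sum mapped to a risk band. The items, weights and cut-offs below are hypothetical and do not reproduce any real instrument such as RisCanvi or VioGen.

```python
# Hypothetical, simplified actuarial-style risk score: every item and weight is
# explicit and reviewable. Items, weights and cut-offs are illustrative only.
ITEM_WEIGHTS = {
    "prior_violent_offences": 3,
    "age_under_25": 2,
    "substance_abuse": 2,
    "employment_instability": 1,
}

RISK_BANDS = [(0, "low"), (4, "medium"), (7, "high")]  # lower bound of each band


def risk_score(case: dict) -> tuple[int, str]:
    """Sum the weights of the items present in the case and map the total to a band."""
    score = sum(weight for item, weight in ITEM_WEIGHTS.items() if case.get(item))
    band = "low"
    for lower_bound, label in RISK_BANDS:
        if score >= lower_bound:
            band = label
    return score, band


print(risk_score({"prior_violent_offences": True, "age_under_25": True}))  # (5, 'medium')
```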

In Spain, two examples of risk prediction are pertinent. RisCanvi is used to assess the risk of recidivism of those convicted in the Catalan prison system (Andrés-Pueyo et al. 2018), while VioGen assesses the risk of gender violence and informs the adoption of protection measures for victims by law enforcement agencies throughout the country (González-Álvarez et al. 2018; López-Ossorio et al. 2019). Both tools seek to improve the validity and reliability of traditional clinical judgements. The epistemological structure of these algorithmic assessment tools appears to be curiously similar to that of the clinical judgement methods described above, in that both operate on the plane of probabilistic estimation of future violence. There is, however, a subtle but important difference. The advantage of risk assessment tools is that they establish the main predictive factors more transparently and stratify different types of violence. This means that these tools abandon a homogeneous conception of “dangerousness” and take the heterogeneous nature of “violence” into account, thus increasing their predictive capacity (Weisburd and Eck 2004). This does not mean that these tools are unproblematic. Certainly, the available scientific knowledge is still insufficient, so further investigation into explanatory and predictive risk factors of preventive interest under the traditional scientific paradigm remains necessary. In this way, both the validity of future algorithmic tools and the legitimacy of judicial decisions taken on their basis can be maximized, thereby reducing the uncertainty of legal operators. It is also important to underline that risk assessment tools have shown important limitations in identifying very specific risks. Further limitations are related to problems with their application by users (Stevenson and Doleac 2019; Monahan et al. 2018; Perez Trujillo and Ross 2008).

Future efforts to overcome these limitations will be interesting from an epistemological standpoint. The emergence of Big Data, the supposed revolution of the scientific method, and the “datafication of the world” result in a scenario where essential predictive factors (such as age, gender, criminal record, and others), which were generated within theoretical explanatory frameworks (both inferential and causal), are linked to all kinds of other data that are not theoretically justified. These combined datasets can be used to extract correlations via machine learning or deep learning, without a motivated causal explanation. Such data could furthermore be enriched with real-time data from surveillance cameras, Internet usage, and other techniques such as facial or movement recognition. Such scenarios may not lie too far in the future but cannot yet be taken into consideration for our argument, as hardly any tools of this type are used in the judicial field. One exception is geo-located police forecasting tools such as Geolítica or PredPol (Brantingham et al. 2018). Scientific studies seem to indicate that predictions may improve with machine learning and other AI techniques in the future (Berk 2019; Ghasemi et al. 2021; Campedelli 2022), at least in the sense that newer tools’ predictive capacity will increase even when some variables are eliminated. This will challenge the prevalence and validity of predictions made by theoretically motivated risk assessment tools from psychology and other social sciences. In this regard, Kelly Hannah-Moffat (2019) has rightly characterized the evolution of actuarial tools in the following way: while actuarial tools are characterised by 1) having explanatory and predictive capabilities, 2) analysing aggregate populations and limited population sizes, 3) collecting and analysing data with attention to a particular research methodology, 4) using methods based on social science disciplines (often psychology), and 5) being designed for a particular purpose, new risk algorithms based on big data would 1) be able to predict, 2) analyse a huge and infinitely scalable volume of data, 3) gather data from a multiplicity of sources with considerable methodological limitations, 4) have greater variability of purpose, and 5) be suspected of generating a black box. In other words, and focusing on this last element, correlations are improved, but the capacity to explain the phenomena is lost.
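The contrast Hannah-Moffat draws can be illustrated with a minimal sketch: a flexible “black box” learner extracts predictive signal from data, and post-hoc tools such as permutation importance report which variables drive its predictions, yet this importance is purely correlational and offers no causal explanation. The data are synthetic and the model choice is an illustrative assumption.

```python
# Synthetic example: a black-box learner plus a post-hoc importance measure.
# The importance ranking describes predictive association, not causation.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance: how much does shuffling each feature degrade accuracy?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)  # predictive relevance, silent on causal mechanisms
```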

Conclusions

In this study, we set out to explore to what extent computing strategies based on machine learning are good candidates for decision support in criminal law and criminal justice. Special attention was given to the quality of the knowledge they can generate from an epistemological point of view. While a full answer to this question exceeds the scope of a single article, we were able to point out the numerous challenges that have to be addressed in the current state of affairs. The question of the traceability of automated algorithmic solutions evokes both the criticisms levelled at traditional clinical judgements and the demand for the greatest possible transparency from a normative standpoint. An overlooked but, in our estimation, more critical line of argumentation concerns the epistemic value of algorithmic techniques and their apparent inability to inform effective causal mechanisms. While causation is a critical explanatory structure, it is also true that the latest generation of automated algorithmic tools can be used to identify weaknesses of traditional risk assessment tools and to explore alternative explanations through abductive procedures. Their potential, therefore, lies in questioning discretionary judgements (e.g., of dangerousness) in favour of more nuanced variables (e.g., the risk of recidivism).

We have shown the importance of the current debate on the scope and use of risk assessment tools, whether or not they make use of automated machine learning or similar techniques. In the criminal justice system, the normative debate on these tools cannot be separated from their epistemic and scientific description without reducing their evaluation ad absurdum. Optimism about the capabilities of algorithmic justice solutions has been criticized, sometimes rightly, sometimes furtively and unjustifiably (Eaglin 2019). Such criticisms point to the overestimation of the empirical testability of risk and to the subsequent false beliefs and malpractice among criminal justice system operators, most notably when results are misinterpreted owing to a lack of understanding of the epistemic limits of statistical results, especially when the apparently different nature of human and algorithmic decisions is not considered.

On the other hand, brute techno-pessimism does not seem adequate either, particularly in the form of a “conformism” that demands more scrutiny of risk assessment tools than of traditional decision-making models (e.g., clinical judgement); such demands amount to ignoring the possibilities of new forms of knowledge acquisition, even if these are embraced only critically and in order to mitigate the harm that might result from their application. In this sense, the first thing to recognize in the face of the rejection of these new possibilities for the criminal justice system is the empirical verifiability of risk. Claiming that risk is, in principle, not empirically verifiable would amount to denying the epistemic possibilities of criminology and other social sciences that operate under the “evidence-based” decision-making model (Ferguson 2017; Ratcliffe 2019; Meijer and Wessels 2019). Furthermore, the identification of the factors associated with crime has, in practice, led to important societal and legal developments that cannot be discussed away.

The matter is different when a tool’s explanatory and predictive power is criticized on the basis of lacking empirical evidence or suboptimal algorithmic architecture. On this, the epistemic debate is clear: the enormous divergences between the different methodological approaches available in the social sciences (Bhattacherjee 2012; Hagan 2000; Krathwohl 1993, among others) raise the question of whether all research designs are equally valid or whether some are more desirable than others in their given contexts. The fact that standards for assessing the quality of empirical evidence have existed for some decades now (Weisburd et al. 2001) seems to indicate that not all designs are equally valid. In this sense, the predictive and explanatory capacity of different tools can be evaluated based on scientific criteria. Just as rankings of the epistemic quality of empirical evidence have been proposed (Farrington et al. 2003), algorithms are currently evaluated based on predictive power, precision, reliability, security, robustness, traceability or explanatory power, resulting in the establishment of quality standards for trustworthy algorithms (e.g., the final Assessment List for Trustworthy Artificial Intelligence presented by the High-Level Expert Group on Artificial Intelligence (AI HLEG) on 17 July 2020, or EUROPOL’s AP4AI Framework Blueprint).
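By way of illustration, the quality criteria just listed translate into standard, computable metrics; in the following sketch the observed outcomes and predicted risk scores are invented solely for demonstration.

```python
# Illustrative-only evaluation of a hypothetical risk classifier against
# standard metrics; y_true and y_prob are made-up values.
from sklearn.metrics import brier_score_loss, precision_score, recall_score, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]                       # observed outcomes
y_prob = [0.1, 0.3, 0.7, 0.8, 0.6, 0.4, 0.2, 0.9, 0.5, 0.3]   # predicted risk scores
y_pred = [int(p >= 0.5) for p in y_prob]                      # decision at a 0.5 threshold

print("AUC:      ", roc_auc_score(y_true, y_prob))     # discrimination / predictive power
print("Precision:", precision_score(y_true, y_pred))   # share of flagged cases that reoffend
print("Recall:   ", recall_score(y_true, y_pred))      # share of reoffenders that are flagged
print("Brier:    ", brier_score_loss(y_true, y_prob))  # calibration of the risk estimates
```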

We want to highlight the importance of explanatory power for our field because the criminal justice system is based on argumentation and justification, with no room for decisions made based on a cum hoc ergo propter hoc fallacy. It is therefore essential that all algorithms that provide predictive information for decision-making come closer to explanatory and argumentative models, especially when they touch upon the rights of affected individuals.