Introduction

Over the last several decades, much has been written about the impact of criminal justice innovation and reform. One of the most recent contributions, Megan Stevenson’s (2023) “Cause, Effect, and the Structure of the Social World,” revisits well-documented issues, including the limited impact of many of these efforts, and the challenges of replicating the results of randomized controlled trials (RCTs). She asserts that her article is “built around a central empirical claim: most reforms and interventions in the criminal legal space are shown to have little lasting impact when evaluated with gold standard methods of causal inference” (p. 2023).Footnote 1 She then goes on to say this reveals the structure of our social world, a structure which is resistant to the type of interventions amenable to evaluation by RCTs. Believing otherwise, she says, amounts to embracing a myth.

Most new claims about the limited impact of RCTs do not generate further discussion among researchers, many of whom are engaged in efforts to address this issue. However, Stevenson’s article generated interest across a wide audience, presumably because she used that claim to assert that the experimental work of her colleagues is predicated on an unacknowledged myth, that their most cherished methods inherently limit the prospects of meaningful social change, and that their careers in experimental research are destined to accomplish little of lasting significance. Some readers may also have taken at face value Stevenson’s claim that these concerns have gone unacknowledged, giving the impression her work breaks new ground. Since publication, her article has generated hundreds of retweets on X (Stevenson, 2024, January 2), hundreds of thousands of views, a series of discussions in scholarly venues, and proclamations that it should serve as a cornerstone of syllabi about criminal justice reform. Motivated by Stevenson’s article, Vital City, a journal of data-driven urban policy, commissioned a series of essays that weighed in on the role and limits of empirical evidence in social change (Glazer et al., 2024).

Here, we examine the methods Stevenson used to derive her central claims. We will argue that, because her article appeared in a law journal, her methods were not subject to the methodological scrutiny normally accorded to analyses that pool empirical evidence to make a global claim about something as profound as the structure of our social world. Under more thorough examination, we believe her methods violate prohibitions on pooling heterogeneous studies for the purposes of meta-analysis, an error that precludes her from drawing global conclusions about criminal justice RCTs. Since Stevenson derives her conclusion about the structure of the social world from that global claim, the method’s failure precludes this second claim as well. As a result, Stevenson’s conclusions speak beyond the data.

We endeavor to do more than challenge the claims of one paper. Stevenson has done the service of revealing gaps in people’s knowledge about both ongoing efforts to ensure experimental research better scales up and translates to other settings, and the real hazards of ceding time and resources to less rigorous ways of pursuing innovation and reform. Implementation science, a comparatively nascent field designed to assist in such scaling and successful translation, is thriving in healthcare and medicine and making inroads into criminal justice, a field that would be well-served by speeding up its use (del Pozo et al., 2024). At the same time, we will argue that sweeping alternatives to experimentally-driven change rely on fast jumps toward idealistic endstates, are often pursued as articles of faith, and are impossible to reliably achieve unless we carefully unpack their constituent steps. This brings us back to the need for rigorous research. In this way, Stevenson lays the groundwork for aligning criminal justice research methods with the progress we are witnessing in public health, medicine, and other fields that share the goal of creating healthy, resilient, and just communities through rigorous science.

RCTs, their limits, and the structure of our world

In “Cause, Effect, and the Structure of the Social World,” Stevenson (2023) begins with what she calls “big claims” (p. 2043) about RCTs in the broadly construed field of criminal justice. She suggests that the interventions amenable to rigorous evaluation by RCTs are by their nature too limited to generate high-impact change, and that ultimately, such changes cannot be “engineered,” since the world is too complex and yet too stable to be susceptible to such engineering. She argues that reforms and interventions in the field of criminal justice demonstrate little progress, and that this lack of meaningful impact reveals the structure of our social world. She then concludes that the belief that social change can arise from the types of interventions testable by RCTs is contingent upon believing a myth about the world we live in (pp. 2038, 2040, et alibi).

The implication of this argument is that we should reject or seriously curtail the criminal justice RCT as a “gold standard” of research. This is a bold claim, given its implications for the focus and legitimacy of the National Institute of Justice and private research philanthropies such as Arnold Ventures, which fund RCT research with the stated goals of “delivering precise, reliable processes capable of generating consistent, repeatable outcomes,” (NIJ) and “correcting system failures through evidence-based solutions… to create change that outlasts our finding” (Arnold) (p. 2038).Footnote 2 The alternatives Stevenson points toward include the more sweeping systemic changes of Marxism, embracing the indeterminism of Hayek, or accepting other, unspecified approaches (p. 2044). Stevenson claims her conclusion is not simply a matter of informed opinion, but meets an evidentiary standard. She states, “this is an evidence-based Article, in that I build my entire argument around evidence derived from RCTs” (p. 2041, emphasis added), adding “my claim is that these studies teach us something broader about the structure of the social world. This is an inductive argument.”Footnote 3 Figure 1 provides its formal expression.

Fig. 1

The formal argument of "Cause, Effect, and the Structure of the Social World" (Stevenson, 2023)

In this argument, the successful induction of the second assertion depends entirely on a finding that criminal justice RCTs do not work in a global sense; the rest of the conclusions then follow. Because Stevenson describes her claim as empirical, the strength of the argument rests on the analytical power of the first conclusion: it must be strong enough for a person to globally assert that (1) RCTs in criminal justice do not contribute to substantial social change in some broad, aggregate way, and (2) this reveals something conclusive about the structure of the world. It is also critical to note the explicit directionality of this reasoning: the structure of the world is revealed by empirical findings about RCTs, and not the other way around. In other words, if an analysis of RCTs cannot yield Stevenson’s conclusions about their ineffectiveness, then her thesis about our social world remains unproven and cannot be used to explain, in the other direction, why RCTs do not work.Footnote 4

Evaluating Stevenson’s claims

We can first observe that Stevenson presumes RCTs are failures when they demonstrate negative results. Many would contest this presumption. The lessons McCord (2003) and others drew from the famous null findings of the Cambridge-Somerville Youth Study greatly improved the design of the Montreal longitudinal-experimental study (Tremblay et al., 1992) and made the case for ceasing certain deleterious and wasteful approaches to delinquency prevention. If they prevent the misallocation of resources, prevent communities from pinning their hopes on something that will not deliver, identify the unforeseen harms of an intervention, or determine how different interventions with shared goals compare, then RCTs with null results should be deemed successful.

There are other reasons to question the conclusions Stevenson reaches about RCTs, such as her use of a highly incomplete convenience sample, a focus on straw-man arguments about “engineers” and people's belief in the immense cascading effects of their favored interventions, and a categorical disregard of the emerging field of implementation science. Some of these issues are considered below. First, however, we will examine whether Stevenson’s methods have the scientific rigor necessary to make her key inductions. We will argue they violate the methodological requirements of meta-analyses by pooling highly heterogeneous studies for the purpose of generalizing about them as a group, thereby precluding validity. This is the critical point upon which the empirical argument of our paper rests.

Narrative reviews and meta-analyses: a methodological primer

First, we need to explain why we are treating Stevenson’s argument as a meta-analysis. One of the challenges of assessing her paper is that she makes a piercing empirical claim about the structure of our world without clearly laying out an approach that can power it. Regarding methods, she says:

My strategy here is threefold. First, I take a wide lens, and discuss findings from a broad survey study of RCTs in a variety of criminal justice topics. Second, I zoom in on several of the most prominent and influential studies of the last few decades, studies in which the effects were so promising that multiple replication studies were attempted. Third, I move through a variety of popular, highly-studied interventions in criminal justice and discuss the evidence associated with each... This three-part strategy certainly falls short of definitive proof; those who arrive skeptical of my claim may not walk away fully convinced. Nonetheless, I hope it is eye opening. (p. 2020)

This renders her approach a type of narrative review.Footnote 5 However, her strategy as written cannot plausibly reveal the structure of our social world or render a given conception of it a myth. Canvassing a selection of individual RCTs piecemeal cannot reveal how the world must be structured to have prevented each from having a meaningful effect. The only induction this method supports is that future RCTs will probably also fail, for any of many possible reasons, and Stevenson’s analytic strategy cannot tell us which reasons hold. As an example: observing that a person never responds to you despite hundreds of calls, emails, and texts allows you to confidently presume they will not respond the next time you try, but it does not reveal why they are not responding. Likewise, an analysis suggesting that individual RCTs will keep failing because they very often have failed does not provide evidence that one possible reason (e.g., a particular structure of our social world) holds decisive explanatory power while other plausible reasons do not. This problem alone would suggest Stevenson’s conclusions about the world are unsupported by her methods.

Our response intends, however, to construe her strategy as one that could plausibly work. It could, in our view, if Stevenson is offering a de facto heuristic meta-analysis conveyed in narrative form,Footnote 6 allowing her to “systematically pool together all relevant research in order to clarify findings and form conclusions based on all currently available information” (Rosenthal & Schisterman, 2010). By this construal, Stevenson asserts that the pooled, or global, treatment effect of the criminal justice RCTs she selected for her study is not significantly different from zero,Footnote 7 so the thing we call a “criminal justice RCT” is ineffective at producing change. This type of global conclusion across a large and powerful body of pooled RCTs would allow her to opine about the causal structure of the world with much greater confidence. In other words, as a plausible method, we argue Stevenson uses a narrative heuristic to do the work of a formal meta-analysis. We will examine it accordingly.

In doing so, we argue that if, for reasons of analytic integrity, we could not use a formal meta-analysis to conclude that criminal justice RCTs do not have a global treatment effect meaningfully distinguishable from the null, then we cannot use a heuristic meta-analysis to draw this conclusion either. If we cannot draw this heuristic conclusion, there is insufficient reason to reject the hypothesis that criminal justice RCTs “work,” and we cannot draw conclusions about the structure of the social world. Again, keep in mind the directionality of Stevenson’s inductive claim: it requires a certain meta-analytic conclusion about criminal justice RCTs to say anything about the social world. It cannot, in reverse, proceed from an assumption about the social world to explain the history of RCT performance. The corresponding formal argument is laid out in Fig. 2.

Fig. 2

The formal argument presented here

As a technique, meta-analysis “combines or integrates the results of several independent clinical trials considered by the analyst to be ‘combinable’” (Huque, 1988, p. 29). The standards for plausible combination have been extensively studied; they require a defensible congruence of targets, methods, and outcome metrics to permit generalizable conclusions about cause and effect. As Thompson (1994) observes, “although meta-analysis is now well established as a method of reviewing evidence, an uncritical use of the technique can be very misleading” (p. 1351), and the risk is presumably more acute for a heuristic review. Our premise is that, as a matter of analytic rigor, we cannot plausibly pool the RCTs presented in Stevenson’s article for any type of meta-analysis, be it formal or heuristic. In other words, while meta-analyses can be conducted under certain narrow conditions of heterogeneity (Higgins & Thompson, 2002; Petitti, 2001), one cannot pool: (1) different types of RCTs to accumulate the power to draw a conclusion about all RCTs, (2) different interventions to draw results about interventions in general, or (3) different outcomes measured using different metrics to accumulate the power to draw a global conclusion about a class of intervention, and then use it to conclude something about the structure of the social world. The public health example below illustrates this concept.
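To make the pooling prohibition concrete, formal meta-analysis quantifies heterogeneity before pooling is allowed to proceed. The sketch below is illustrative only: the effect sizes and variances are hypothetical, not drawn from any study discussed here. It computes a fixed-effect (inverse-variance) pooled estimate, Cochran's Q, and the I² statistic of Higgins and Thompson (2002); a high I² signals that the studies are estimating different underlying effects, so a single pooled estimate is not interpretable as a common effect.

```python
# Illustrative fixed-effect meta-analysis with a heterogeneity check.
# All effect sizes and variances are hypothetical, for demonstration only.

def pooled_effect_and_i2(effects, variances):
    """Return the inverse-variance pooled estimate, Cochran's Q, and I^2 (%)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Three hypothetical trials with conflicting standardized effects:
effects = [0.10, -0.05, 0.40]
variances = [0.01, 0.02, 0.01]
pooled, q, i2 = pooled_effect_and_i2(effects, variances)
print(f"pooled = {pooled:.2f}, Q = {q:.2f}, I^2 = {i2:.1f}%")
# By a conventional reading, I^2 in the range above ~75% indicates
# heterogeneity so severe that the pooled estimate should not be
# interpreted as a single common effect.
```

Note that this safeguard only addresses statistical heterogeneity among already-combinable studies; the qualitative heterogeneity at issue in Stevenson's pooling (different designs, interventions, and outcome metrics) rules out combination before any such statistic is even computed.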

RCTs in public health as an example of heterogeneous pooling

Suppose there are three random neighborhood-level assignment trials in which municipal departments of health failed to reduce asthma in Black communities, three individual-level random assignment trials in which they failed at intervening against obesity, and three randomized stepped wedge cluster trials in which they failed at intervening against HIV. Barring trials of truly extraordinary size and scope, when considered as individual categories of intervention, these three sub-groups of results do not individually empower us to say that departments of health are inherently incapable of intervening on asthma, obesity, and HIV, nor do they permit us to say anything about the structure of the social world of Black communities. They have neither the power nor the theoretical basis to do so.

Critical for our analyses here, we also cannot combine these studies to say that the nine studies, in aggregate, show departments of health are inherently incapable of intervening on the Black community’s public health problems, or that this empirical track record allows us to draw conclusions about the world. We cannot pool heterogeneous and independently underpowered trial design sub-types to draw this type of global conclusion.

Finally, the results of heterogeneous interventions on heterogeneous targets with heterogeneous outcome metrics should not be pooled to opine on the global effects of the overarching normative project (i.e., in this case, protecting the health of Black communities). In the above example, we would be pooling the subgroups of obesity, asthma, and HIV outcomes as intervention targets (Table 1). Doing so would imply that pooled infection, weight, and lung-function metrics somehow aggregate in a way that strengthens our ability to generalize about our capacity to exert an effect on any one of them, as well as on all other health problems that are part of the project. That is epistemologically impossible, hence the prohibition on doing so in meta-analyses. Having illustrated Stevenson’s problem, we next provide a criminal justice example.

A criminal justice example of invalid heterogeneous pooling

Table 1 An example of invalid heterogeneous pooling of public health studies for meta-analysis

Assume we have a set of the following randomized trials: neighborhood-level trials of the effect of upgraded street lighting on street crime; stepped wedge cluster trials, randomized by work shift, of training to reduce police use of force as measured by suspect injuries; and individual-level randomizations of cognitive behavioral therapy to prevent probation violations among people with substance use disorders. Now suppose we would like to study the effects of linking people with substance use disorders to buprenorphine treatment rather than charging them with nonviolent misdemeanors, and we have proposed a precinct-level randomized trial to test the effect on subsequent overdose.

Under the methodology used by Stevenson, null results about street lighting, de-escalation training, and psychotherapy (measuring street crime, suspect injuries, and probation violations via neighborhood, stepped wedge, and individual randomizations, respectively) could be aggregated to prospectively conclude that a precinct-level intervention linking people to addiction medications will not reduce overdose, because of what the null results of prior trials reveal about our social world (Table 2). Again, such pooling is invalid whether done formally or heuristically, and it does not permit prospective meta-analytic predictions about the study design or effects of the proposed buprenorphine study.

The irrelevance of time frames in meta-analyses

Table 2 An example of invalid heterogeneous pooling of criminal justice studies for meta-analysis

As a final note on methods, Stevenson frames her analysis as “50-Plus Years of RCT Evidence.” The subtitle is misleading. Timespans have no bearing on the results of meta-analyses, formal or heuristic, but the framing may incorrectly imply that enough time has passed for there to have been enough studies to draw a meaningful meta-analytic conclusion about them. In the public health example above, it makes no difference whether the nine studies were conducted all at once across the nation or over the course of decades. A paucity of related studies over 100 years, which could be presented as “A Century of RCT Evidence,” may provide the same level of analytic insight as the same number of studies conducted over the course of a decade, and less insight than a high volume of studies conducted over the course of two years. Such a title may therefore bias a reader unacquainted with the relevant methods. Moreover, despite this 50-year span, Stevenson overlooked several trials that found replicable, significant results, a concern that will not be fully addressed here.Footnote 8

The vast heterogeneity of criminal justice research

As argued above, the studies Stevenson uses to generate her conclusion cannot be pooled with the meta-analytic rigor reflected in widely accepted protocols such as those of the Campbell Collaboration (Schuerman et al., 2002). The principal source of the RCTs in Stevenson’s study, a review by Farrington and Welsh (2006), takes up 122 RCTs over 50 years in five broad criminal justice fields.Footnote 9 She then supplements these studies with a selection of others. While each category of study is too small to support confident global conclusions, the body of studies is far too heterogeneous across all dimensions to permit pooling them to power generalizable results (Table 3). Even a more modest pooling of heterogeneous results, in a recent meta-analysis evaluating one narrowly defined type of intervention, resulted in the retraction of published findings (May et al., 2018). As a result, there is insufficient evidence to support the conclusions about the structure of the social world drawn in Stevenson’s article.

Table 3 Heterogeneity of studies used for heuristic meta-analysis in Stevenson (2023), as compiled from Farrington and Welsh (2006)

Although the temporal span of a set of studies has no bearing on a meta-analysis of their overall effects, it does allow us to comment on the pace and scale of the research projects they constitute. Farrington and Welsh (2006) cataloged 122 RCTs over 50 years in five broad criminal justice fields; if each trial lasted about two years, this amounts to an average of about one active RCT per field, per year, in the midst of a gargantuan and sprawling criminal justice system. It suggests that RCTs in criminal justice settings have been considerably underutilized, perhaps due to the limited funding opportunities for RCT research at the federal level (National Research Council, 2010).

Discussion

When a heuristic is used to make an empirical claim, the safeguards that promote and protect the validity of meta-analytic methods are evaded. These safeguards concern the inclusion criteria for identifying the universe of criminal justice RCTs, the criteria for excluding irrelevant or inappropriate ones, the steps that will be taken to guard against selection and analytical biases, and the justification for any exception that would permit the pooling of highly heterogeneous studies and results. Review of a protocol that pools every included RCT together to make a claim about the power of RCTs, regardless of their methods or outcome metrics, would have revealed fundamental flaws before the analysis was performed. However, protocol reviews are not typically undertaken by the type of legal journal in which Stevenson’s article was published.

It may be the case that RCTs and other methods of strong causal inference have proven insufficient for improving our social world, especially in terms of criminal justice. That does not mean they are not necessary, or at least helpful (Sampson, 2010). When there is enough clinical equipoise (Freedman, 2017) to wonder whether one intervention is more or less effective than another, there are few better ways to answer that question than an RCT.Footnote 10 To assume that we already know what the comparative effects of different criminal justice interventions will be is, at our stage of human understanding, a view motivated by hubris or ideology, especially given that criminal justice research consists of disciplines widely regarded as immature sciences (Hibbert, 2016). With this in mind, we spend the remainder of our discussion surveying ways we might improve the quality of criminal justice research. We begin by demonstrating that, contrary to Stevenson’s assertions, the challenges of replicability and external validity facing scientists have been widely acknowledged for some time and are hardly limited to criminal justice research.

The challenges of replication are widely known and acknowledged

Stevenson asks, “why aren’t my empirical claims more broadly known?” (p. 2046), and “shouldn’t the academics and policymakers working in this space know better?” Publishing in a law journal, Stevenson takes the time to walk nonspecialists through hypothesis testing and causal inference. More relevant scientific journals, however, assume this knowledge among their readership, and focus on the observation that there is a replication crisis in several fields of science, including many that conduct research in criminal justice settings. These include psychology, information science, business, finance, epidemiology, medicine, and public health (Anvari & Lakens, 2018; Coiera et al., 2018; Dreber & Johannesson, 2019; Hicks, 2023; Jensen et al., 2023; Lash et al., 2018; Oberauer & Lewandowsky, 2019; Pagell, 2021; Ryan & Tipu, 2022; Tackett et al., 2019). The same fields have also conceded that the quest for statistical significance born of publication biases pressures researchers to p-hack (Wooditch et al., 2020), or to HARK, i.e., “hypothesize after the results are known.” The latter term, coined by Kerr (1998), has since been cited in research over 2,200 times, 1,180 of them since 2020. This is why protocols for systematic reviews and meta-analyses are required to explain how they will account for such biases, and why clinical trials must pre-register their protocols; both requirements are examples of how science has responded to these problems.

It is also well known that RCTs often fail to produce significant results, that most produce small effects, that their external validity is limited, and that they have a track record of failing under replication, perhaps an artifact of the very nature of null hypothesis significance testing (Lash, 2017).Footnote 11 Table 4 presents the prevalence of articles containing relevant search terms as indexed by PubMed, the National Library of Medicine’s repository of published health and medical research. The repository indexes all papers published in 5,600 medical and life science journals, as well as every peer-reviewed paper by researchers whose efforts were funded at least in part by the National Institutes of Health. The second part of the table expands the search to Google Scholar, which is more inclusive. The scientific community is fully conversant with the framing theses underlying Stevenson’s argument.

The instructive relevance of medicine and public health

Table 4 Prevalence of publications referencing concerns about research quality and replicability

Readers may be tempted to continue implementing RCTs in criminal justice, relieved that Stevenson’s threat to their work seems to have been neutralized. We believe that would be a mistake. Even if there is no defensible empirical claim to support Stevenson’s conclusions, there is a broadly philosophical one that concerns norms, utility, and the distribution of scarce resources. It is reminiscent of an observation made by Kirsch (2016):

Philosophers are people who, for some reason—Plato called it the sense of wonder—feel compelled to make the obvious strange. When they try to communicate that basic, pervasive strangeness or wonder to other people, they usually find that the other people don’t like it. Sometimes, as with Socrates, they like it so little that they put the philosopher to death. More often, however, they just ignore him.

Readers should not ignore Stevenson’s article because they do not like it and have the privilege of not needing to pay attention. That a scholar has garnered considerable attention for the opinion that RCTs cannot work in criminal justice settings should be a call to consider why it was even facially plausible for Stevenson to make that claim. The answer may be because criminal justice research is still emerging as a true science.

The fact is, RCTs that intervene on human behavior have demonstrated effectiveness. We need only look to medicine and health to see their power, two fields Stevenson brackets off as possibly not susceptible to the same critiques she makes of criminal justice. “I want to reiterate that this is a claim about the nature of the social world and does not extend to physics or biology,” Stevenson says. “Medical research, for instance, has clearly shown that limited scope interventions (e.g., drugs or vaccines) can have large and widely replicable effects. Fields such as public health, which straddle medical and social sciences, may be exempt for similar reasons” (p. 2033). In other words, because these fields test the effects of drugs, surgical or other clinical interventions, or the preventive power of vaccines, they are deemed irrelevant to the conversation: they concern petri dishes or mechanistic effects rather than our complex, messy, and resilient social world.Footnote 12

It is true, medicine has what we call basic science at its core: the study of how bodies biologically respond to vaccines, medications, and other interventions and treatments. As the Association of American Medical Colleges (n.d.) states, basic science

…focuses on determining the causal mechanisms behind the functioning of the human body in health and illness, and utilizes hypothesis-driven experimental designs that can be specifically tested and revised. More recently, “systems biology” has focused on understanding how complex systems arise from elemental processes. Once these fundamental principles of the biologic processes are understood, these discoveries can be applied or translated into direct application to patient care.

But this is only a portion of the research that takes place in medicine.Footnote 13 Note the reference to “application to patient care.” Before the first pills are ingested or the first scalpel pierces skin, a trove of medical research has evaluated the systems that brought the patient to that point and led to the decision to prescribe or operate. These decisions were not made merely by consulting a decision tree or a diagnostic manual; they were the outcomes of sprawling systems of public and private administration that operate on the basis of malleable laws, policies, and decisions. The widespread criticisms of our healthcare system, from its obscene costs and overwhelming billing procedures to its inequitable access to care, are signs that there is more to research in medicine than basic science.Footnote 14

This is even more the case for public health, defined as the pursuit of reductions in morbidity and mortality at the population level, or “at scale,” that is, the level of analysis Stevenson holds up as the ultimate target. Public health’s notable successes are not owed solely to the basic science that underlies many of its strategies and initiatives. The National Institutes of Health’s RePORTER website (www.reporter.nih.gov) catalogs the constituent institutes’ extramural research spending in its entirety, by individual study. Browsing the entries, each of which includes a “Public Health Relevance Statement,” would quickly dispel the misconception that public health is not operating waist- or chest-deep in our social world.Footnote 15

In public health, RCTs have been used to test and then scale behavioral interventions that are inherently messy and human, such as ones to help people stop smoking, disclose stigmatizing conditions such as HIV or mental illness without upending their lives, choose and adhere to pre-exposure drugs that prevent contracting HIV, grow old in healthier ways, and practice safer sex. If we want to understand how and why police officers use their discretionary powers to arrest and charge people when they could divert them to more effective alternatives, or why defendants drop out of diversion programs, then RCTs have a place in seeking answers for the same reasons they have a place in understanding why people do not use condoms or drop out of drug treatment programs.Footnote 16 Since its inception in 1998, the evidence-based policing movement has been a direct intellectual descendant of its counterpart in medicine, which likewise initially proved resistant to evidence-based change (Sherman, 1998), and remains resistant still (Greenhalgh et al., 2014). The Campbell Collaboration standards absent from Stevenson’s methods, which would have foreclosed its heterogeneous pooling, were derived from the highly successful Cochrane Collaboration used in health research (Davies & Boruch, 2001). Research has found that physicians often resist the effective implementation of evidence-based practices as a feature of their professional culture (Pope, 2003), and the time frame for implementing evidence-based medical practices is thought to average 17 years (Morris et al., 2011). Lessening this resistance and shortening that timeframe spawned an entire field concerned solely with the effective and sustainable translation of research into practice, i.e., implementation science. Nothing like it yet exists in criminal justice research. We take that up below.
Rather than a mythical engineer’s view of the world, the idea that there is something about medicine and public health that makes their more established research traditions incommensurateFootnote 17 with criminal justice research may be the central myth Stevenson unwittingly embraces in her search for a path forward.

Implementation science, external validity, and outcome metrics

Stevenson deals almost entirely with the distinction between internal and external validity but leaves unexamined the reasons why an internally valid practice may lose its causal power when transported to other settings with different demographics, systems of governance, finances, and institutional and community cultures, as was the case with perhaps the most famous trial of police responses to domestic violence (Sherman & Berk, 1984; Sherman et al., 1992). Such threats to external validity are hardly limited to criminal justice but abound in medicine and public health as well. In response, these fields have developed implementation science, which concerns “the scientific study of methods to promote the systematic uptake of research findings and other evidence-based practices into routine practice” (Eccles & Mittman, 2006). Although Stevenson makes no mention of it, how to effectively translate evidence-based findings across settings has been the subject of over 9,100 articles in the health literature since the inception of implementation science in about 2006.Footnote 18 Perhaps this omission reflects the fact that, with the exception of a group of cooperative agreements funded by the National Institute on Drug Abuse (Belenko et al., 2022; Knight et al., 2016, 2022), there has been little incorporation of implementation science into the criminal justice field (del Pozo et al., 2024). That seems to be changing, however. With an inaugural 2024 solicitation to fund implementation science demonstration projects and a multiyear Evidence to Action initiative, the present director of the National Institute of Justice has focused on partnerships between researchers and practitioners that emphasize implementation research as a critical component of effective science.

Stevenson’s analysis also does not distinguish between the efficacy and effectiveness of an intervention, a distinction that marks two different veins of intervention research: one intended to establish internal validity, and another that explores how to successfully generalize a valid model across settings with different contexts (Fagan et al., 2019). Implementation science offers a systematic approach to promoting external validity through hybrid trials (Landes et al., 2019). Such trials not only measure the effectiveness of an intervention but also test different approaches to implementation to determine which ones enhance effectiveness by ensuring fidelity or, just as importantly, indicate where adaptations are necessary due to myriad differences between settings.

Finally, there is no mention of what the distal endpoints of justice research ought to be. For example, if they are predominantly health outcomes (that is, we measure how well criminal justice systems decrease morbidity and mortality, protect health, and foster resiliency at the community level), we may see the field migrate toward health research, an adjacent field that has in some ways been more successful (del Pozo et al., 2021; Goulka et al., 2021). A more trenchant observation may therefore be that by adhering to a paradigm that identifies crime rates and recidivism as the principal outcomes, criminal justice sets up its research puzzles in ways that have historically disincentivized interdisciplinary work. As Kuhn observed over 50 years ago, such an approach can

…insulate the [research] community from those socially important problems that are not reducible to puzzle form, because they cannot be stated in terms of the conceptual and instrumental tools the paradigm supplies. Such problems can be a distraction, a lesson brilliantly illustrated by… some of the contemporary social sciences (p. 37).

The reason these omissions matter is not that they provide a means to dismiss Stevenson’s argument, but that they provide a clearer understanding of the problems that concern it, and of feasible ways to make the progress we all presumably desire. What is missing from her analysis is a thorough grasp of the philosophy of science it alludes to at the end (Kuhn, 1970), which has wrestled with all of this before and provided a language for it. Contemporary researchers are tangling with these problems as we speak. There are several ways to describe what has happened in criminal justice without rejecting the possibility that the field can make substantial progress.

As evidence of this, the idea of the meta-analytic narrative review arose from a discussion of Kuhn’s philosophy of science in a paper cited nearly 8,700 times: “Diffusion of innovations in service organizations: systematic review and recommendations” (Greenhalgh et al., 2004). It takes as its premise that even when there is an evidence base for a practice, diffusing it in ways that maintain its external validity is a challenge, and we must be deliberate and rigorous or run the risk of forgoing the effects demonstrated in prior trials. Criminal justice institutions, in their immense scope and fragmentation, are precisely the types of organizations susceptible to these concerns.

The mechanistic fallacy of effective systemic change

Critics of an overly evidence-based approach to criminal justice should also acknowledge the significant obstacles posed by alternatives. One may be an overemphasis on mechanistic end states that portray a utopia. Stevenson clears the way for this point in observing that “…RCT’s tend to focus on questions that aren’t a priori obvious… One does not need an RCT to evaluate whether providing food to the hungry fills bellies… Outcomes that are the direct, mechanical effect of a reform or intervention are generally too obvious to fall within the scope of my claim” (p. 2035). This is akin to reminders that we do not need an RCT to know we benefit from using dental floss (Holmes, 2015), or from using a parachute when we jump out of airplanes (Smith & Pell, 2003). The problem with this type of statement is that the more visionary and sweeping the proposed systemic reform is (i.e., as outlined on p. 2043), the more likely it is to devolve into a call for simple mechanical effects. The preconditions of police and prison abolition provide a suitable example. Andrea Ritchie and Mariame Kaba visualize abolition with this invitation: “Can we just imagine a world and build toward it where everyone has everything for everyone without any kind of policing, surveillance, or punishment?” (McMenamin, 2023). If we take this seriously, it is about bellies full of food, bodies with adequate health care and shelter, and people with enough money to live stable, secure lives. If a society can deliver these mechanistic end states, the thinking goes, people would not have a reason to engage in criminal behavior, and those who do would be suffering from medical and psychological conditions that a properly compassionate and supportive society can ameliorate without policing. There are few visions of reform more sweeping, and we do not need an RCT to understand how these things would benefit people.

The problem is that food only fills bellies after it has entered a person’s mouth, shelter only warms and protects a body after the person has taken refuge in it, and money only provides stability and security after it is in a bank account, wallet, or purse. Getting food into mouths, people into shelter, and money into bank accounts in widespread, consistent, and resilient ways requires a type of social organization that is bedeviled with complexity. Scientific inquiry is an important means by which we learn how to overcome the resulting challenges. Eschewing it is like saying that the reason traveling by commercial airplane is by far the safest way to travel is that we have fully mastered using air to create lift under a wing. If airline safety is the result of exquisite “engineering,” then it includes engineering (universal, federalized, tightly managed and overseen) systems that guide and direct the multitude of complex human behaviors that keep planes in the air (Stolzer et al., 2023), including pilot training, adequate numbers of air traffic controllers, the reduction of human error in airplane manufacturing and maintenance, and airplane crew satisfaction.

Calls for systemic reform unmoored from rigorous research bring to mind the cartoon by Sidney Harris, famed illustrator of science, shown in Fig. 3. Too often, sweeping social reorganization absent rigorous research and planning is just that: the presumption that a miracle will occur somewhere in the middle, be it the miracle of spontaneous social organization, Marxism, abolition, unregulated capitalism, or an ideological belief in any other type of revolutionary change that has yet to demonstrate success.

Fig. 3 “Then a miracle occurs.”

Is criminal justice research a pseudoscience?

In 1783, the philosopher Immanuel Kant wrote Prolegomena to Any Future Metaphysics That Will Be Able to Present Itself as a Science. It remains one of the most important works of Western analytic philosophy. A polemic, it was written in response to David Hume’s incisive critique of the very idea of causality, a critique that had a profound effect on Kant’s scholarship. In a statement revered among analytic philosophers, he said “David Hume… first interrupted my dogmatic slumber and gave my investigations in the field of speculative philosophy a completely different direction.” Over 240 years later, the epistemological bases and procedures of scientific inquiry have yet to be settled, even if the debates have become more arcane.

In tying together its big claims, Stevenson’s final footnote references The Structure of Scientific Revolutions by Thomas Kuhn (1970), the magisterial account of scientific knowledge that brought us the idea of the “paradigm shift.” It is meant to bolster the claim that the engineer’s view of the world is what constitutes the present research paradigm, and we should reject it: “researchers see their project as one of mapping the causal structure of the social world in order to help improve it. In other words, the engineer’s view persists because the engineer’s view forms the basis of the research paradigm” (p. 2047).

This is not what scientists mean when they speak of Kuhnian paradigms. Paradigms are the actual causal maps that describe the world, not the research methods used to derive them per se. They can run into limits when extreme cases (such as speeds approaching that of light) produce anomalous results that do not reconcile with the accepted mapmaking rules, and a new scientific theory emerges to accommodate them (such as relativity theory). That new set of concepts and its corresponding language, which subsumes but is largely incommensurable with the old one, is the paradigm shift. This distinction between uses of the term “paradigm” is important to note because Kuhn’s only applies to mature scientific fields that falter when they take on the final, stubborn cases in their “mopping up operations” (Kuhn, 1970). Behavioral science has yet to coalesce to this level of maturity. To speak in terms of its paradigm is, ironically, premature. In this way, Stevenson’s note that Kuhn shows us that “progress within science can be limited by scope of current scientific paradigm” is technically correct, but it refers to well-developed explanatory paradigms that dominate and guide their fields of inquiry. There is no such thing in behavioral science at present.

Second, Stevenson’s characterization of “paradigmatic” scientific inquiry is generic to a degree that seems to scuttle science itself. What field of earthly science is not endeavoring to understand the relationships that describe the world, and to what extent they are generalizable, with the goal of putting that knowledge to use? In laying out this commitment, Kuhn (1970) observed that in addition to certain paradigmatic beliefs about the building blocks of their field,

…there is another set of commitments without which no [person] is a scientist. The scientist must, for example, be concerned to understand the world and to extend the precision and scope with which it has been ordered. The commitment must, in turn, lead him to scrutinize, either for himself or through colleagues, some aspect of nature in great empirical detail. And, if that scrutiny displays pockets of apparent disorder, then these must challenge him to a new refinement of his observational techniques or to a further articulation of his theories (p. 42).

To reject the engineer is to reject that project. Once you reject it, you may be an activist, reformer, revolutionary, or a daring innovator, but not a scientist. At the core of every scholarly endeavor is explaining how concepts can be used to express relationships between things. There may be another implication, however. Saying we have little reliable knowledge about the causal structure of the social world in justice settings seems to imply that criminal justice is a pseudoscience: what philosophers of science call an endeavor that persists in its ways despite providing little to no generalizable knowledge and yielding no accurate predictions about its subjects (Hasson, 2021). If you follow the logic that criminal justice intervention research has produced little generalizable knowledge, and you believe it simply cannot, then pseudoscience is the label for it. Criminal justice shares many traits with other social sciences, which raises the question of whether Stevenson may be implying that sociology, psychology, public health, political science, and economics are pseudosciences as well. In this vein, it would be interesting to explore whether her call for sweeping reforms without adherence to rigorous research methods tracks theories of epistemological anarchism (Feyerabend, 2010).

Conclusion

If a formal method lacks the power and validity to demonstrate something, then there is no reason to believe an informal heuristic of that method can do so. The heuristic simply becomes an informed opinion or an intuition. Opinions and intuitions are of some importance in social inquiry; they point the way toward further study, and toward the acute need for improvements in the scope, metrics, methods, and goals of criminal justice research, but they cannot reliably reveal the structure of our social world. Other fields offer ample evidence of research programs that include RCTs within a constellation of methods and that have succeeded in modifying the same types of complex human behaviors we seek to modify in criminal justice settings. We should approach those fields with curiosity and humility to see how we can improve rigorous causal research in criminal justice. Perhaps a more pertinent (and tenable) conclusion from Stevenson’s article is that it highlights how few RCTs have been conducted for a field that puts so much at stake for people and is so important for maintaining public safety, societal order, and public health.