Introduction

The reduction of administrative burden through technological innovation has become a critical focus across both public and private sectors in recent years (Mergel et al., 2019). As organizations grapple with increasing demands and resource constraints, the pursuit of efficiency has led to the widespread adoption of various technological solutions. Artificial intelligence (AI) has emerged as a particularly promising tool in this endeavor, offering the potential to automate complex tasks, enhance decision-making processes, and significantly improve productivity (Brynjolfsson & McAfee, 2014). One of the most publicly available demonstrations of AI capabilities has been the rapid adoption of large language models (LLMs) that are able to produce human-like written products (e.g., ChatGPT) (Radford et al., 2019).

In the private sector, AI has been deployed in diverse applications, from streamlining customer service operations to optimizing supply chain management (Davenport & Ronanki, 2018). Companies like Amazon have leveraged AI to enhance warehouse efficiency, while financial institutions use AI-powered algorithms for fraud detection and risk assessment (Agrawal et al., 2017). Similarly, in healthcare, AI assists in diagnostic processes and administrative tasks, potentially reducing physician burnout and improving patient care (Alowais et al., 2023).

The public sector, while often slower to adopt new technologies, also recognizes the potential of AI to address long-standing efficiency challenges (Desouza, 2018; Schiff et al., 2023). Government agencies are exploring AI applications in areas such as tax processing, benefits administration, and public transportation optimization (Mehr, 2017). These efforts aim to reduce paperwork, speed up service delivery, and ultimately improve citizen satisfaction with government services (Sun & Medaglia, 2019).

Just as private and public sector organizations are rapidly experimenting with AI and LLM tools to drive efficiency and output (Senadheera et al., 2024), the policing profession is historically adept at technology adoption to address its own concerns. In the realm of law enforcement, the adoption of AI presents both significant opportunities and unique challenges (Ferguson, 2017). Police staffing has become a critical issue in the USA, with agencies facing significant challenges in recruitment, retention, and retirements (Adams et al., 2023; Mourtgos et al., 2022).

While some of this staffing distress is theorized to be related to acute reactions to significant social disruption beginning in 2020, there is good reason to believe that it is also related to persistent macro trends that are unlikely to markedly improve in the near future (Wilson & Heinonen, 2012). At the same time, call volumes reflect steady and growing demands for police services across the nation, and the primary predictor of police call response times is, unsurprisingly, the police labor available to meet that demand (Mourtgos et al., 2024). This situation underscores the need for law enforcement agencies to enhance their operational efficiency to keep pace with the increasing demand they encounter year after year (Wilson & Weiss, 2014).

Historically, technology has been the cornerstone of improving police efficiency (Stroshine, 2015). Major technological advancements have transformed policing practices and enhanced the capacity of law enforcement to manage their duties effectively. For instance, the introduction of motor vehicles revolutionized police transportation, enabling officers to respond to incidents more rapidly and cover larger areas (Popkess, 1933). Advancements in communication technology, such as the development of two-way radios, significantly improved coordination and response times, allowing for more efficient dispatching and information sharing (Leonard, 1938). The advent of computers and digital databases further streamlined administrative tasks and facilitated better data management, making information retrieval faster and more accurate. In recent years, analytically driven operational measures like hot-spot policing have leveraged data to identify and focus on high-crime areas, leading to more effective deployment of police resources and proactive crime prevention (Braga & Weisburd, 2022).

Building on the legacy of technological innovation, AI-assisted narrative generation represents the latest advancement with the potential to improve police report writing (Adams, 2024; Dement & Inglis, 2024; Ferguson, 2024). This technology is proposed to bring several benefits, including enhanced report quality, consistency, completeness, and efficiency in terms of reducing the time required for report writing (Axon, 2024). Given the increasing administrative burden on officers, reducing the time spent on paperwork could enable reallocation of resources to more critical fieldwork. However, despite commercial claims that this technology will dramatically decrease the time officers spend manually writing initial reports (Keough, 2024), no experimental test of those claims has been reported to date. As is often the case, police technology is adopted rapidly, in advance of the empirical record on the tool’s ability to achieve its aims and avoid unintended consequences (Adams & Mastracci, 2019; Lum et al., 2017).

In this pre-registered randomized controlled trial, we focus specifically on the efficiency aspect of AI-assisted report writing, utilizing a commercial product from Axon called “Draft One.” Our primary objective is to experimentally assess whether the use of AI tools can significantly reduce the time officers spend writing initial reports compared to traditional methods. By addressing this efficiency question, we aim to provide empirical evidence on the potential of AI technology to alleviate some of the operational pressures faced by modern police forces, particularly in an era of constrained staffing resources and increasing demands for accountability and transparency.

In both our pre-registered analysis and several alternative specifications, including a difference-in-differences analysis conducted over a full year, our findings consistently indicate that AI assistance did not significantly improve the speed of officers’ report writing. While AI tools like “Draft One” may offer other benefits—such as improved consistency, accuracy, and report quality—the initial promises of this technology do not translate into the time savings that were anticipated.

The promise of AI-assisted police reports

AI-assisted narrative generation represents a key advancement with the potential to improve police report writing (Adams, 2024). This technology may bring several benefits, including enhanced report quality, consistency, completeness, and efficiency in terms of reducing the time required for report writing (Lavezzorio, 2024; Ropek, 2024). Given the increasing imbalance between police staffing and public demand, reducing the time spent on paperwork is crucial to reallocating resources to more critical fieldwork. In other words, by reducing time spent on paperwork, police departments may be able to reduce police staffing woes as officers’ time is freed up to answer more calls for service and spend more time in the community.

Previous scholarly work on AI-assisted police report writing is meager. However, Ferguson (2024) engages in legal analysis about potential risks of the technology, including what he deems “generative suspicion.” Ferguson’s core critique is that traditional police report writing is such a critical part of the criminal justice system that before we allow algorithms to affect the reports, we must better understand the first principles of police report writing. Failing to do so, and rushing into adoption, risks a future where we “fundamentally reshape policing” with potentially negative consequences across the criminal justice system (Ferguson, 2024, p. 4).

In the absence of peer-reviewed findings necessary to establish competent priors regarding the purported effects, we rely on information provided by the manufacturers of these commercial products. For instance, Axon, the world’s largest producer of body-worn cameras and conducted energy devices, recently introduced their “Draft One” product, which offers AI assistance for report writing. Axon’s press releases regarding Draft One quote an officer whose agency was testing the product as reporting that officers using the product spent 82% less time on report writing and that the quality and completeness of their reports improved alongside the efficiency gains (Keough, 2024). These commercial claims are falsifiable, and in this study, we focus specifically on the efficiency aspect of AI-assisted report writing. That is—does the use of AI tools significantly reduce the time officers spend on writing initial reports compared to traditional methods? In more formal testing terms, we test the hypothesis:

  • H1: Officers who use AI-assisted report-writing tools will spend significantly less time writing initial reports compared to officers who use traditional report-writing methods.

Method

Axon’s Draft One is an AI-assisted report-writing tool marketed as a solution to streamline the process of creating police incident reports (Keough, 2024). The system integrates with Axon body cameras, employing audio-to-text conversion technology to transcribe officer interactions. After an incident, officers can access the Draft One system, where they input basic incident details such as the type of crime, its severity, and arrest status. The system then generates an initial narrative draft based on the audio transcript and the officer-provided parameters.

The resulting narrative draft follows a standardized structure, typically including date, time, and officer identification, followed by sections detailing the incident’s background, officer actions, suspect reactions, and the basis for suspicion or probable cause. Axon states that several features are incorporated to promote officer engagement and accuracy, including required information inserts, intentionally included errors for correction, and customizable thresholds for officer-generated content. The system reportedly requires officers to review, edit, and approve the final report, acknowledging its AI-generated origin and confirming its accuracy under oath.

At the core of Draft One’s technology is ChatGPT 4, a large language model (LLM) developed by OpenAI (OpenAI et al., 2024). LLMs are advanced artificial intelligence systems trained on vast amounts of text data, enabling them to understand and generate human-like text based on given prompts or instructions. In the context of Draft One, Axon creates transcripts from body-worn camera footage and then uses custom instructions to interact with the LLM API, requesting the generation of a police report based on the transcript and other provided parameters. Axon claims this technology allows for the rapid creation of structured, contextualized report narratives, and in turn, saves time for the officer creating the report.

Agency context

The study takes place within a medium-sized police department that agreed to participate in a pre-registered randomized controlled trial. The Manchester Police Department (MPD) engaged with the research team for an experimental trial to assess potential efficiency gains before full implementation. Manchester, New Hampshire, is a small city with an estimated population of 115,000, about 50 min north of Boston, in the New England region of the USA. The community is urban in nature and experiences crime and public safety issues consistent with other urban spaces in the country.

According to agency reporting, Manchester experienced a violent crime rate of 384 per 100,000 and a property crime rate of 1960 per 100,000 in the calendar year 2023 (Aldenberg, 2023). MPD has primary law enforcement jurisdiction of the city, with an authorized strength of 271 full-time police officers and 67 non-sworn personnel. Due to recruitment and retention challenges consistent with other large agencies (Adams et al., 2023), MPD’s actual staffing consisted of 249 full-time officers and 54 non-sworn staff. The department is divided into six divisions, the largest being the patrol division, with a total staffing of 124 sworn officers, 106 of whom are patrol officers (who primarily respond to calls for service), and is overseen by a captain (division commander), three lieutenants (shift commanders), and 14 sergeants (front-line supervisors).

During the study period, there were several noteworthy occurrences. First, as the study began, supervision within the patrol division changed. Each shift was assigned a new lieutenant (shift commander). These changes can disrupt the status quo in each shift. Additionally, late in the study, several school resource officers (SROs) were added to the patrol division due to the end of the school year. These SROs were not included in the study. Lastly, an officer-involved shooting occurred in the last week of the study, which was a significant event for the department. The event drew significant resources and was labor-intensive for all involved.

Training and implementation

Prior to participants using the Draft One tool, a structured training program was designed to familiarize officers with the new technology and study protocols. Initial communication was disseminated via email, providing participants with an overview of the technology and study objectives. Subsequently, in-person training sessions were conducted during patrol division roll calls from May 5 to May 12, 2024. The training curriculum was developed at the agency, using Axon-generated materials, to ensure that the technology and its implementation reflected current agency practices and report norms.

The training sessions were integrated into the existing organizational structure of daily roll calls, which typically serves as a platform for disseminating assignments and updates. This integration allowed for minimal disruption to normal operations while ensuring comprehensive coverage of the study population. Patrol supervisors were provided with a training roster to track participation and ensure all selected officers received the necessary instruction. The core of the training program consisted of a 17-min instructional video, which participants viewed following their regular roll call duties. The video content was strategically designed to cover several key areas, including technology overview and functionality, departmental due diligence processes, operational integration with the agency’s Records Management System (RMS), legal and procedural considerations, and best practices for optimal utilization of the Draft One tool.

The training curriculum emphasized three critical aspects of implementation. First, officers were instructed to initiate the incident report in the RMS prior to generating the narrative with Draft One, a crucial step for accurate timestamp tracking in data collection. Second, the importance of thorough review and verification of the AI-generated narratives was repeatedly stressed to ensure accuracy, completeness, and the removal of any erroneous or non-factual elements. Third, officers were trained in strategies to enhance the accuracy and detail of AI-generated reports, including techniques for clear verbalization of actions and observations during incidents and providing comprehensive verbal summaries on body-worn camera (BWC) recordings. Previous scholarship has documented officers using more verbalization when wearing a BWC (Owens & Finn, 2017). Given that AI-assisted report writing requires a transcript in order to generate the narrative, the agency believed that training officers to verbalize more would lead to more complete reports.

The training content was delivered through a webinar format, incorporating narrated screen recordings to provide visual guidance on the web interface usage. This multimedia approach was designed to accommodate various learning styles and enhance retention of the operational procedures. Pre-study testing informed the training design, particularly the emphasis on verbalization techniques, which had been empirically shown to improve the accuracy and detail of generated reports. This evidence-based approach to training development underscores the iterative nature of the implementation process and the integration of preliminary findings into the study protocol.

Sample characteristics and randomization procedure

The study sample comprised 85 police officers from the partner agency, representing a subset of the total patrol complement. This sample size is smaller than the full patrol division and reflects various exclusions, including officers assigned to extended training programs, those on extended sick leave, military deployments, or administrative leave. Officers who opted out of the study were also excluded. Furthermore, newly hired officers still in the police academy or undergoing field training were not included in the sample.

Participants were randomly selected from the pool of willing officers and subsequently randomly assigned to either the control group (n = 43) or the experimental group (n = 42). The control group maintained their usual report writing procedures, while the experimental group received training on and utilized the AI-assisted narrative generation tool.
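The assignment step can be illustrated with a minimal sketch of complete random assignment, in which a fixed number of officers is placed in each arm. The study reports using the `randomizr` package in R; the Python analogue below is illustrative only, and the officer identifiers, function name, and seed are hypothetical.

```python
import random


def assign_groups(officer_ids, n_control, seed=None):
    """Complete random assignment: exactly n_control officers go to control."""
    rng = random.Random(seed)
    shuffled = list(officer_ids)
    rng.shuffle(shuffled)
    control = set(shuffled[:n_control])
    return {oid: ("control" if oid in control else "ai") for oid in officer_ids}


# Hypothetical identifiers; the group sizes (43 control, 42 AI) come from the study.
officers = [f"officer_{k:02d}" for k in range(1, 86)]
assignment = assign_groups(officers, n_control=43, seed=2024)
counts = {g: sum(1 for v in assignment.values() if v == g) for g in ("control", "ai")}
print(counts)
```

Fixing the group sizes in advance, rather than flipping a coin for each officer, guarantees the near-even split reported for the two arms.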

Table 1 presents the balance of key demographic and professional characteristics across the control and experimental groups. Randomization was done using the `randomizr` package in R (Coppock, 2023). Sample demographics largely align with national law enforcement workforce trends (Gardner & Scott, 2022), and the randomization process achieved a successful balance across treatment groups.

Table 1 Sample statistics balance table

The median age of participants was 31 years (IQR: 29.0, 34.0), with the control group slightly younger (median 30.0 years; IQR: 27.5, 33.0) than the AI group (median 33.0 years; IQR: 30.0, 36.8), though this difference was not statistically significant (p = 0.5). The median tenure was 3.50 years (IQR: 2.50, 6.20), with the AI group showing a marginally higher median tenure (4.20 years; IQR: 2.50, 7.48) compared to the control group (3.50 years; IQR: 2.25, 5.40), but again, this difference was not statistically significant (p = 0.12).

The sample was predominantly male (82%) and white (82%), reflecting broader trends in law enforcement demographics. The gender distribution was nearly identical across groups, with 81% male officers in the control group and 83% in the AI group (p > 0.9). Similarly, the racial composition was balanced, with 79% white officers in the control group and 86% in the AI group (p = 0.6). Shift assignments were also relatively balanced (p = 0.3), with the largest proportion of officers working swing shifts (41%), followed by day shifts (33%) and midnight shifts (26%). The control group had a slightly higher proportion of officers on swing shifts (47% vs. 36%), while the AI group had more officers on day shifts (40% vs. 26%).

The balanced distribution across all measured variables, as evidenced by the non-significant p-values (all p > 0.05), indicates that the randomization process was successful in creating comparable treatment and control groups. This balance strengthens the internal validity of the study, allowing for more robust causal inferences about the effect of the AI-assisted narrative generation tool on report writing outcomes.

Sample size and statistical power

We conducted a power analysis using the `pwr` package in R (Champely, 2020) to determine the required sample size for detecting a statistically significant difference in report writing time between the control and experimental groups. We used a two-sample t-test assuming equal variances, with a significance level (alpha) of 0.05 and a power of 80%. Based on historical data provided by the partner agency (mean report writing time = 54.63 min, standard deviation = 47.18 min), we estimated that 351 observations per group would be needed to detect a relatively conservative effect size of a 10 min reduction in report writing time for the experimental group. In the end, the study period included 755 observations (reports), exceeding the 702 total observations (351 per group) implied by the power analysis, and therefore, the study is well-powered at the given metrics.
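The sample size figure can be checked approximately with a short calculation. The sketch below is not the authors' `pwr` call: it uses the standard normal approximation for a two-sided, two-sample comparison of means, which is assumed here as a stand-in for the t-based calculation and typically returns a value slightly smaller than the t-based answer. The function name is hypothetical; the inputs (10 min difference, SD of 47.18 min, alpha of 0.05, power of 0.80) are taken from the text.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d = delta / sd is the standardized effect size.
    """
    d = delta / sd
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 / d ** 2


n_approx = n_per_group(delta=10.0, sd=47.18)
print(ceil(n_approx))
```

The approximation should land within a few observations of the 351 per group reported above, which is the expected gap between a normal-based and a t-based calculation at this sample size.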

Data and measures

Our sole outcome is police report writing duration, observing the reports submitted by officers during the trial period (n = 755). Our study drew upon the Manchester Police Department’s (MPD) Records Management System (RMS). MPD utilizes Central Square’s Enterprise RMS version 22.2.6. We used an audit report from this system to create data on the time taken to complete incident reports and workstation usage for report completion. We extracted timestamps for report creation (when an officer opens a new template) and report submission (when an officer sends the report for review), along with unique workstation identifiers. This information allowed us to calculate the total (whole) minutes taken to complete each report.
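The outcome construction described above can be sketched as a simple timestamp difference. The timestamp format and function name below are hypothetical, since the layout of the RMS audit export is not described, and truncating to whole minutes (rather than rounding) is an assumption.

```python
from datetime import datetime

# Hypothetical format; the actual audit-report layout is not given in the text.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def duration_minutes(created, submitted, fmt=TIMESTAMP_FORMAT):
    """Whole minutes between report creation and report submission."""
    start = datetime.strptime(created, fmt)
    end = datetime.strptime(submitted, fmt)
    if end < start:
        raise ValueError("submission timestamp precedes creation timestamp")
    # Assumption: partial minutes are truncated, not rounded.
    return int((end - start).total_seconds() // 60)


print(duration_minutes("2024-05-20 14:05:30", "2024-05-20 14:52:10"))
```

Because the measure runs from opening the template to submitting for review, it captures the whole report (data entry included), not only the narrative portion.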

Analysis

Data analysis for this study follows the pre-registered experimental protocol. Pre-registration is a preferred approach for experiments such as the one presented here: the hypotheses and the methods used to test them are stated before data are collected, which limits the opportunity for p-hacking, selective reporting, and data-driven changes to hypotheses after results are known. Studies have demonstrated that pre-registered experiments are less likely to report inflated effect sizes and more likely to produce replicable results, providing a stronger foundation for empirical evidence in fields such as criminology and psychology (Chin et al., 2023; Nosek et al., 2018, 2022). Consequently, pre-registration improves the overall rigor and trustworthiness of experimental research.

Given the nature of the data, where officers completed multiple reports over the study period, our pre-registration specifies a mixed-effects model to accommodate the repeated measures inherent in the data structure. This approach is suited to the hierarchical organization of the dataset—specifically, multiple reports nested within each officer and across various days. The mixed-effects model enabled us to control for individual variability between officers and consider the correlations between reports composed by the same officer.

The primary fixed effect in our model was the treatment variable, distinguishing between control and experimental groups. This distinction enabled us to estimate the average difference in report writing time attributable to the use of the AI tool. We incorporated a random intercept for each officer to recognize and model the natural variation in writing speeds—some officers are inherently faster or slower than others.

The general form of the mixed-effects model we use is

$$\begin{aligned}{duration}_{i}&\sim N\left({\alpha }_{j\left[i\right]},{\sigma }^{2}\right)\\ {\alpha }_{j}&\sim N\left({\gamma }_{0}^{\alpha }+{\gamma }_{1}^{\alpha }\left(\text{treatment}\right),{\sigma }_{{\alpha }_{j}}^{2}\right),\quad \text{for officer } j=1,\dots ,J\end{aligned}\tag{1}$$

where

  • \({duration}_{i}\) is the time it takes officer j to complete report i, measured in minutes.

  • \({\alpha }_{j}\) is the average report writing time for officer j. This allows each officer to have their own baseline writing speed, recognizing natural variations in individual efficiency.

  • \({\gamma }_{0}^{\alpha }\) is the average report writing time for the control group. This represents the baseline writing speed without the AI tool.

  • \({\gamma }_{1}^{\alpha }\) is the effect of the treatment (using the AI tool) on report writing time. This coefficient will reveal whether the AI tool leads to a statistically significant difference in writing speed.

  • treatment is a binary variable indicating whether the officer is in the control group (0) or the experimental group (1).

  • \({\sigma }^{2}\) is the variance of the report writing times within officers. This accounts for the natural variation in writing speed for a single officer across different reports.

  • \({\sigma }_{{\alpha }_{j}}^{2}\) is the variance of the average report writing times between officers. This acknowledges that some officers may be naturally faster or slower writers than others.

  • j is the index for officers, ranging from 1 to J (total number of officers).

  • i is the index for reports written by a specific officer.
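To make the two-level structure concrete, the sketch below simulates reports from the model in Eq. 1 and recovers the treatment effect with a simple two-stage comparison: average each officer's reports, then compare the group means of those averages. This is not the mixed-effects estimator used in the study; it is a simplified stand-in that respects the nesting of reports within officers. All parameter values and function names are hypothetical, and because the model is normal, simulated durations can be negative.

```python
import random
from statistics import mean


def simulate_reports(n_officers, reports_per_officer, gamma0, gamma1,
                     sigma_alpha, sigma, seed=None):
    """Draw (officer, treatment, duration) rows from the two-level model."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_officers):
        treatment = j % 2  # alternating arms for simplicity, not random assignment
        # Officer-level intercept: alpha_j ~ N(gamma0 + gamma1 * treatment, sigma_alpha^2)
        alpha_j = rng.gauss(gamma0 + gamma1 * treatment, sigma_alpha)
        for _ in range(reports_per_officer):
            # Report-level outcome: duration_i ~ N(alpha_j, sigma^2)
            rows.append((j, treatment, rng.gauss(alpha_j, sigma)))
    return rows


def two_stage_effect(rows):
    """Average within officer, then take the difference in group means."""
    by_officer = {}
    for officer, treatment, duration in rows:
        by_officer.setdefault((officer, treatment), []).append(duration)
    officer_means = {0: [], 1: []}
    for (_, treatment), durations in by_officer.items():
        officer_means[treatment].append(mean(durations))
    return mean(officer_means[1]) - mean(officer_means[0])


# Hypothetical values: 55 min baseline, 10 min reduction, large report-level noise.
rows = simulate_reports(84, 9, gamma0=55.0, gamma1=-10.0,
                        sigma_alpha=15.0, sigma=40.0, seed=1)
print(round(two_stage_effect(rows), 2))
```

Averaging within officer first keeps prolific report writers from dominating the comparison, which is the same concern the random intercept addresses in the full model.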

Results

We proceed with the pre-registered analysis using the mixed-effects regression approach discussed above. The principal finding is that AI assistance did not significantly affect report writing duration. Results are reported in Table 2. Following the main results, to check the robustness of the finding, we provide four supplemental non-registered analyses, all of which support the main findings.

Table 2 Regression results—impact of AI assistance on report writing duration

In the pre-registered protocol main model, treatment was associated with a non-significant reduction of report completion time, with wide confidence intervals (b = -29.66, SE = 39.62). Given the observed skewness in the outcome, the same model with a logged duration outcome measure was analyzed, confirming the non-significant effect of AI assistance on report writing duration. Similarly, we evaluated a model that dropped the 5% longest reports, one that filtered to only reports less than 4 h, and a final model with only reports less than 1 h in duration. Across all specifications, treatment remained non-significant, demonstrating that AI assistance did not meaningfully impact report completion times regardless of the model used.
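The four alternative specifications differ only in how the outcome is transformed or which reports are retained. A minimal sketch of that data preparation follows, assuming durations are held as a plain list of minutes. The function name is hypothetical, and two details are assumptions: zero-minute reports are skipped for the log transform (the source does not say how they were handled), and the 5% trim is rounded up to a whole number of reports.

```python
import math


def robustness_samples(durations):
    """Build the outcome vectors for the four alternative specifications."""
    ordered = sorted(durations)
    n_drop = (5 * len(ordered) + 99) // 100  # ceil(5% of n), integer arithmetic
    keep = len(ordered) - n_drop
    return {
        # Assumption: zero-minute reports are excluded before taking logs.
        "log_duration": [math.log(d) for d in durations if d > 0],
        "drop_top_5pct": ordered[:keep],
        "under_4_hours": [d for d in durations if d < 240],
        "under_1_hour": [d for d in durations if d < 60],
    }


example = robustness_samples(list(range(1, 101)))
print({name: len(values) for name, values in example.items()})
```

Each variant addresses the right skew in report durations from a different direction, so agreement across them suggests the null result is not driven by a handful of very long reports.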

Our pre-registration also specified a supplemental test using a difference-in-differences model with fixed effects held by officer id, observing both control and treated officers’ reports in the pre- and post-intervention period. Results for that specification, using 1 year of data on report duration (n = 6084), were similarly statistically non-significant, and those results are reported in Appendix Table 3. In addition, while not pre-registered and not fully reported here, an event study specification was tested for the possibility that effects were driven by a familiarization phase. Results of the event study were, like all other robustness checks, null.
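The logic of the difference-in-differences comparison can be shown with the simplest two-group, two-period version: the change in mean duration for treated officers minus the change for control officers. The study's specification is a regression with officer fixed effects, so the sketch below is a simplification. The function name and the example numbers are hypothetical.

```python
from statistics import mean


def did_estimate(rows):
    """2x2 difference-in-differences from (treated, post, duration) rows."""
    cells = {(t, p): [] for t in (0, 1) for p in (0, 1)}
    for treated, post, duration in rows:
        cells[(treated, post)].append(duration)
    m = {cell: mean(values) for cell, values in cells.items()}
    treated_change = m[(1, 1)] - m[(1, 0)]
    control_change = m[(0, 1)] - m[(0, 0)]
    return treated_change - control_change


# Hypothetical durations in minutes: (treated, post, duration).
toy_rows = [
    (0, 0, 50), (0, 0, 60), (0, 1, 52), (0, 1, 62),
    (1, 0, 55), (1, 0, 65), (1, 1, 50), (1, 1, 60),
]
print(did_estimate(toy_rows))
```

Subtracting the control group's change removes shifts that affected all officers during the study window, such as the supervisory changes noted in the agency context.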

Discussion

In a pre-registered protocol, we have provided the first experimental evidence on the impact of AI-assisted report-writing technology on police officers’ report-writing efficiency. While there is widespread hope that efficiency gains could improve the ongoing staffing challenges faced by many agencies (Adams et al., 2023; Mourtgos et al., 2022), our results suggest caution.

The null effects observed in our study conflict with the broader literature on technological advancements improving efficiency in various sectors (Brynjolfsson & McAfee, 2014; Czarnitzki et al., 2023). The context of policing, however, is known to be unique, and researchers are warned to be context-sensitive when considering the potential effects of technology in the policing workplace (Koper et al., 2014).

Several potential policing realities may explain the null effects. Many officers, and indeed even many agencies, already utilize templates or other boilerplate prose for writing reports on common calls and offenses (Dement & Inglis, 2024; Miller & Whitehead, 2014). To the extent that Draft One requires officers to fill in or confirm the details of an incident, the process may not be substantially different from the template approach already commonly utilized (Adams, 2024). Another limit may be that a technology that assists in creating a report narrative does not substantively affect the overall report duration. This is due to the realities of police report writing, in which the entire report engages the officer in more than just a narrative. For example, officers writing reports are typically required to complete a great deal of data entry, such as individual entries for every person they spoke to (complainants, victims, witnesses, and suspects), as well as any and all evidence or other property that came into the officer’s possession during their response to the case (recovered property, drugs, weapons, etc.).

Therefore, even if AI technology like Draft One can streamline the narrative-writing process, it may not significantly reduce the total time required to complete a report. The bulk of police report writing involves meticulous data entry and documentation of various aspects of an incident that AI may not yet be equipped to manage efficiently. Furthermore, the rigid structures already in place, such as templates and standardized data fields, may limit the potential time savings from narrative assistance. These factors suggest that while technological advancements hold promise, their application in policing may face unique constraints that dampen their expected efficiency gains. Like other industries, artificial intelligence technology’s impact on police productivity is context-dependent (Czarnitzki et al., 2023), and the complexities of law enforcement reporting present a distinct challenge that requires more tailored innovations to see substantial improvements in efficiency (Koper et al., 2014; Lum et al., 2017; Mastrobuoni, 2020).

Pushing forward the AI-report writing research agenda

Our results should not be interpreted as a dismissal of all potential effects of AI-assisted report writing. Broadly, these effects can be categorized into efficiency, quality, consistency, and consumption (Adams, 2024; see also Dement & Inglis, 2024). While our study found no significant time savings—contrary to the marketing claims surrounding AI—efficiency should not be the sole focus. Report quality remains a persistent concern in policing, with long-standing issues related to poor spelling, grammar, voice, and tone. AI assistance has the potential to address these issues, and Axon’s internal study suggests that their Draft One system produces reports with improved terminology and coherence while maintaining similar levels of completeness, neutrality, and objectivity (Axon, 2024). However, these findings require independent verification—as noted previously, Axon also claimed an 82% reduction in report-writing time. Future research should develop comprehensive metrics to evaluate the quality of AI-assisted reports, considering factors such as accuracy, completeness, and evidentiary value.

The consistency of report writing is another area where AI could play a beneficial role, potentially reducing variability between officers. However, this consistency might come at the cost of individuality and context-specific nuance, which are often crucial in police reports. Standardization could inadvertently lead to reports that are less reflective of unique incidents, potentially overlooking critical details that are important for legal proceedings or community relations.

Moreover, the downstream consumption of these reports—by courts, lawyers, community oversight bodies, media, and even academics—might also be impacted by the introduction of AI. AI-generated reports may be perceived as more uniform or polished, which could influence how they are interpreted or valued by different stakeholders. This could lead to positive outcomes, such as increased credibility and readability, but also negative consequences, such as a reduced sense of transparency or authenticity. There may be other “downstream” effects on the court system that emerge from AI-assisted report writing, such as better evidence recording and quality report writing for prosecutors to charge and convict suspects (Boivin & Gendron, 2022). Consistently higher-quality AI-assisted reports might raise evidentiary standards in the criminal justice system, presenting challenges for cases based on traditionally written reports. More detailed reports could also require additional time for legal review, potentially creating new bottlenecks in the court system.

At the same time, the downstream effects of AI-assisted report writing on the criminal justice system may not be positive. Ferguson (2024) presents compelling concerns about AI-assisted police reports reshaping the criminal justice system. He argues that these reports could profoundly impact every stage of the process, from charging to sentencing. Prosecutors and judges may rely on AI-generated content for critical decisions without fully grasping its limitations or biases. Ferguson highlights potential discovery issues, questioning whether audit logs, prompts, and training data should be disclosed alongside the final report. At trial, he notes the challenges in cross-examining opaque AI-generated content. In plea bargaining and sentencing, especially for misdemeanors, these reports might disproportionately influence outcomes. Ultimately, Ferguson cautions that generative suspicion could erode human judgment and accountability in the justice system.

On the other hand, we should also consider the potential for agency-level efficiencies that may arise even when the initial report-writing duration does not change, as observed in our experiment. If AI assistance improves the consistency and quality of reports, it is plausible that sergeants or supervisors responsible for reviewing and approving these reports may find fewer reasons to reject them or require revisions. This could streamline the reporting process, reducing the time spent on back-and-forth edits and approvals. If this potential effect is realized, officers would spend less time revisiting and revising reports, increasing the operational time available for patrol or other critical duties (Chartrand & Verret, 2023). Thus, while our findings suggest that AI assistance does not significantly reduce the time taken to write reports initially, its impact on the broader workflow and administrative processes within a police department could still offer valuable gains in efficiency.

Limitations

As with all experimental settings, our design emphasizes internal validity while acknowledging that external validity remains the burden of ongoing and future research. Specifically, the primary limitation of our effort is its focus on a single agency. Replication studies across a variety of contexts are necessary and should include smaller and larger agencies, rural and metropolitan settings, and international contexts to validate and extend our findings.

A second limitation relates to our intent-to-treat design, which closely mimics real-world field conditions in which officers are given access to the AI-assisted report-writing tool but do not use it for every report. For example, some interactions are simply not captured by BWC, and therefore no BWC transcript is available. In other situations, officers may choose not to use the AI tool for a variety of reasons. Future work could consider a treatment-on-treated design under carefully controlled laboratory conditions. While this would limit the generalizability of any findings, if carefully done, it would also allow researchers to establish a ceiling for the technology’s potential effect.

Conclusion

We have provided the first experimental evidence on AI-assisted report writing in law enforcement, showing that despite vendor claims of an 82% reduction in report-writing time (Keough, 2024), real-world testing resulted in no significant time savings. Because we are at just the beginning of the adoption curve, results should be interpreted cautiously. As seen in previous body-worn camera research, initial findings may not be consistently replicated across varied settings (Lum et al., 2019). Further research is needed to validate these results across diverse agencies and to assess long-term impacts on report quality, accuracy, and downstream criminal justice outcomes. Future studies should pay additional attention to potential unintended consequences and ethical considerations, particularly the effects of police technology on vulnerable populations and on core constitutional concerns (Adams & Mastracci, 2017; Ferguson, 2024).

The marketing narrative surrounding AI-assisted technologies has heavily emphasized time savings (Keough, 2024), but our experimental findings provide a strong challenge to this claim. As the inevitable tide of AI-assisted technologies comes to policing’s shores, it is essential to approach their widespread adoption with a critical eye. As seen here, the promised efficiencies may not materialize as expected. Instead of assuming success, scholars and practitioners should be more open to the possibility that these tools might not deliver on all fronts and adjust their expectations accordingly.