Randomized controlled trial procedures
The MPD was tasked with deploying all its BWCs by the end of 2016. The department had nearly 2300 full-time sworn officers, which included approximately 1100 patrol officers who were prioritized to receive BWCs. This mandatory deployment took place over four phases, providing an opportunity for the evaluation team to conduct an RCT during this rollout. The first deployment of cameras occurred in October 2015 with 182 officers as part of a pilot program to assess best training practices for larger deployments. Phase 2 occurred in March 2016, during which we worked with the MPD to initiate an RCT where 252 officers were randomly assigned a BWC (the “treatment group”) and 252 officers continued their work without BWCs (the “control group”). An additional 16 officers who were not part of the RCT were also given BWCs during the second deployment. The third phase occurred in June 2016 when 238 officers who were not part of the RCT were equipped with BWCs. The final deployment occurred across November and December 2016, when BWCs were distributed to the 252 control group officers and an additional 171 officers not involved in the RCT. Thus, the treatment group officers received their cameras around March 21, 2016, and the control group officers received their cameras beginning November 22, 2016, allowing for a 247-day intervention period (approximately 8 months).
As other studies on BWCs have noted, having officers volunteer for a research study can threaten external validity in that any findings may generalize only to officers who volunteer (Ready and Young 2015; Young and Ready 2018). In this study, officers were randomly selected to receive a BWC, and wearing the assigned camera was mandatory. This methodology has proved successful in other large-scale BWC evaluations (Braga et al. 2017; Owens et al. 2014; White et al. 2017).
To facilitate the randomization, the MPD provided us with the department roster of all sworn personnel prioritized for BWCs a month prior to the second phase of BWC deployments. Officers who had received a BWC as part of the department’s pilot program, who were not patrol officers, or who were on any form of limited duty were considered ineligible for the RCT and removed from the roster. Officers from District 5 and the Neighborhood Task Force, who operate outside of district boundaries, received BWCs during the pilot deployment, so the majority of officers from those assignments already had BWCs. As such, any remaining officers from these assignments were considered ineligible for the RCT since their working conditions were considerably different from the other districts where BWCs were not yet deployed. A total of 666 patrol officers and sergeants, from Districts 1, 2, 3, 4, 6, and 7, remained after filtering the roster on these criteria.
To randomly select the 504 officers who would be included in the study, we worked closely with the MPD to conduct a stratified random sampling procedure, where the strata included the officer’s race (non-Hispanic white or nonwhite) and shift (power, late, day, or early). The sample of officers from each district was proportional to the size of that district relative to the overall department. For example, officers in District 1 represented 12% of the overall department; thus, officers from District 1 made up 12% of the RCT sample. Once we knew the total sample per district based on these proportions, we were able to randomly select the appropriate number of officers from each district, and then randomly assign them by strata into the treatment and control groupings. The randomization was completed in SPSS.
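The within-stratum assignment step described above can be illustrated with a minimal Python sketch. This is not the SPSS procedure used in the study; the function name, the dictionary keys (`id`, `race`, `shift`), and the rule for handling strata with an odd number of officers are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def assign_within_strata(officers, seed=0):
    """Randomly split officers into treatment and control within each stratum.

    `officers` is a list of dicts with hypothetical keys 'id', 'race'
    ('white' or 'nonwhite') and 'shift' ('power', 'late', 'day', 'early').
    Returns a dict mapping officer id to 'treatment' or 'control'.
    """
    rng = random.Random(seed)

    # Group officer ids by the race-by-shift stratum.
    strata = defaultdict(list)
    for officer in officers:
        strata[(officer["race"], officer["shift"])].append(officer["id"])

    assignment = {}
    extra_to_treatment = True  # assumed rule: alternate who gets an odd officer
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        half = len(ids) // 2
        if len(ids) % 2 == 1:
            if extra_to_treatment:
                half += 1
            extra_to_treatment = not extra_to_treatment
        for position, officer_id in enumerate(ids):
            assignment[officer_id] = "treatment" if position < half else "control"
    return assignment
```

Splitting inside each stratum, rather than across the whole sample, is what keeps the two groups similar on race and shift by construction.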
Group balance
To determine the success of the randomization procedure, balance between the experimental groups on key officer characteristics was assessed with the effect-size statistics presented in Table 1. Imbalance would be exhibited by Cohen’s d in excess of 0.20 and a t in excess of 1.96. The characteristics examined included the proportion of sergeants, females, race groups (white, black, Hispanic, and Asian), and the average tenure in the job. Furthermore, we examined the degree to which the experimental groups differed on proactivity during a pre-intervention period that matched the number of days of the study period (247 days, July 18, 2015 to March 20, 2016). These measures included the total number of proactive activities, business checks, traffic stops, subject stops, and park and walks. None of the tests found significant differences between the treatment and control groups across the officer characteristics or their pre-intervention proactive activities. As a result of these findings, we were confident that the random assignments of officers into the two experimental groupings were balanced in their composition.
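As an illustration of the balance criteria above, the following Python sketch computes Cohen's d and a two-sample t-statistic for one characteristic. It assumes the pooled-standard-deviation form of d and the equal-variance t-test; the text does not state which variants were used, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def balance_stats(treatment, control):
    """Return (Cohen's d, t) for one characteristic measured in both groups.

    Assumes a pooled standard deviation and equal group variances, and that
    the characteristic is not constant across all officers.
    """
    n1, n2 = len(treatment), len(control)
    diff = mean(treatment) - mean(control)
    pooled_var = ((n1 - 1) * variance(treatment)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    d = diff / pooled_sd
    t = diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return d, t


def is_imbalanced(d, t, d_cut=0.20, t_cut=1.96):
    """Apply the stated rule: |d| above 0.20 and |t| above 1.96."""
    return abs(d) > d_cut and abs(t) > t_cut
```

Pairing an effect size with a test statistic guards against flagging trivial differences that reach significance only because the groups are large.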
Table 1 Group balance diagnostics
Data and analytic strategy
The data for the current study come from administrative records and the MPD’s proactivity database. The MPD requires officers to record the start and end times of all proactive activities using the computer-aided dispatch system on their in-car computers. There are many types of activities recorded in these data, but traffic stops (43.7%), business checks (25.2%), subject stops (11.0%), and park and walks (4.2%) were the top four proactive activities and made up 84.1% of all such activities during the study period. The proactivity data were at the officer/event unit of analysis. That is to say, if three officers were involved in a single activity, then three cases were present in the data for that event, one for each officer. This data structure allowed us to calculate the total count of proactive activities for each officer and identify which officers were present at the event.
The BWC intervention study period occurred across 247 days (approximately 8 months) from March 21, 2016, to November 22, 2016. We used an equal period of time for the pre-intervention period, which lasted from July 18, 2015, to March 20, 2016. We used panel data to better measure change across time, where the data were aggregated into sixteen 31-day periods, creating eight pre- and post-intervention periods across the two experimental groups. The first and last panels included 30 days.
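The panel layout described above can be sketched in Python as follows. The sketch assumes the sixteen panels run consecutively from July 18, 2015, with 30-day first and last panels and 31-day panels in between; under that assumption the eighth panel ends on March 20, 2016, so the pre- and post-intervention periods each contain eight panels. The names `PANEL_LENGTHS`, `panel_bounds`, and `panel_for` are hypothetical.

```python
from datetime import date, timedelta

STUDY_START = date(2015, 7, 18)
# First and last panels have 30 days; the fourteen in between have 31.
PANEL_LENGTHS = [30] + [31] * 14 + [30]


def panel_bounds():
    """Return a list of (first_day, last_day) tuples, one per panel."""
    bounds = []
    start = STUDY_START
    for length in PANEL_LENGTHS:
        end = start + timedelta(days=length - 1)
        bounds.append((start, end))
        start = end + timedelta(days=1)
    return bounds


def panel_for(day):
    """Return the 1-based panel number containing `day`."""
    for index, (start, end) in enumerate(panel_bounds(), start=1):
        if start <= day <= end:
            return index
    raise ValueError(f"{day} is outside the study window")


def is_intervention_panel(panel):
    """Panels 9 to 16 fall in the intervention period under this layout."""
    return panel >= 9
```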
It is worth noting that the majority of patrol officers’ time in each shift is spent responding to calls for service from community members and helping to investigate crimes. For that reason, officers typically have a limited amount of time in their workday to engage in proactive activities; however, analysis of the officers’ activities shows that a large proportion of their recorded events are proactive activities. In the pre-intervention time period, proactive activities made up a group mean of 38.0% (SD = 19.1) of the officers’ on-call events. However, the share of time officers spent at these events was much lower, corresponding to 20.6% (SD = 14.3) of their time on all activities. Thus, these officers conducted many proactive activities during their workday, but spent less time on these activities than on calls for service.
Table 2 displays the descriptive statistics of the outcomes across the 31-day panels for the full study period, both the pre- and post-intervention periods, and the paired-sample t-statistics to assess the change of the outcomes for the RCT officers. Results indicate that the average number of both subject stops and park and walks significantly declined during this time, but the other activities did not change significantly.
Table 2 Descriptive statistics of proactive activities panels
Because all of the outcomes are count measures with evidence of skewness and overdispersion, we used random-effects negative binomial panel regression models to prevent biased estimates that could result from ordinary least squares regressions (Hilbe 2011; Long and Freese 2006; MacDonald and Lattimore 2010). We used the NBVARGR command and post-estimation statistics in Stata 15.1 to assess whether the Poisson or negative binomial distributions were more appropriate for these data. Negative binomial regression models, as compared to Poisson models, take into account unobserved heterogeneity among observations and do not have downward-biased standard errors. The negative binomial models were superior for all outcomes.
We used difference-in-differences (DiD) to estimate the difference between the treatment officers’ pre-intervention and post-intervention outcomes, relative to the same difference for the control officers in the experiment. To estimate the DiD, we included the following independent variables in our models: group, a dummy variable identifying whether an individual officer was in the treatment group (1) or control group (0); period, an indicator of whether each panel was during the intervention (1) or pre-intervention period (0); and the product of the group dummy with the period dummy, which provided the DiD estimate of the effect of BWCs on the outcome. We also controlled for a number of covariates in the model, including whether the officer was a sergeant (yes = 1, no = 0), their sex (female = 1), dummy coded race variables (black, Hispanic, or other race) with white as the reference category, and the officer’s tenure as an MPD officer in years. These characteristics were measured on the date the RCT began.
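One way to write the conditional mean implied by this specification is shown below. The notation is ours rather than the study's: it assumes the log link of a negative binomial count model, writes officer $i$ in panel $t$, collects the covariates in $\mathbf{x}_i$, and omits the officer-level random component.

```latex
\log \mathrm{E}\left[ Y_{it} \right]
  = \beta_0
  + \beta_1 \,\mathrm{group}_i
  + \beta_2 \,\mathrm{period}_t
  + \beta_3 \,(\mathrm{group}_i \times \mathrm{period}_t)
  + \mathbf{x}_i^{\top} \boldsymbol{\gamma}
```

Under this form, $\beta_3$ is the DiD coefficient, and $\exp(\beta_3)$ is its incidence rate ratio.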
The examined outcomes include the total count of all proactive activities, as well as the individual counts of traffic stops, business checks, subject stops, and park and walks. To better account for the rate of these events relative to officers’ time in the field, we included the total number of proactive activities as an exposure variable in the models examining the changes in traffic stops, business checks, subject stops, and park and walks. A critical assumption of DiD estimation is that the two experimental groups have similar pre-intervention trends in the selected outcome variables (i.e., the parallel trends assumption; see Bertrand et al. 2004). We examined trend lines for the eight pre-intervention periods for each outcome to assess this assumption and found that trend lines for the treatment group were clearly equal to the trend lines for the control group (i.e., the lines were overlapping entirely during the pre-intervention periods).
We used the XTNBREG command to provide incidence rate ratios of the DiD estimator described above and the covariates used in the models. By using the incidence rate ratio, we can more intuitively determine the percentage change of the outcome between the pre- and post-intervention periods on average (Piza 2012). For example, an incidence rate ratio for the DiD estimator of 1.04 would indicate that the treatment group’s count of the outcome increased by 4% on average during the post-intervention period compared to the control group, while a ratio of 0.87 would indicate that the treatment group’s count of the outcome decreased by 13% on average compared to the control group. For continuous covariates, the incidence rate ratio is interpreted as the average percentage change in the outcome associated with each one-unit increase in the covariate. For example, a 1.12 incidence rate ratio for the covariate measuring tenure (measured in years) would suggest that the outcome increased by approximately 12% on average with every additional year of tenure.
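The arithmetic behind these interpretations is simple enough to show in a short Python sketch. The function names are hypothetical; the only facts relied on are that an incidence rate ratio is the exponentiated regression coefficient and that the percentage change is the ratio minus one, times 100.

```python
import math


def coefficient_to_irr(beta):
    """Convert a log-scale regression coefficient to an incidence rate ratio."""
    return math.exp(beta)


def irr_to_percent_change(irr):
    """Express an incidence rate ratio as a percentage change in the outcome."""
    return (irr - 1.0) * 100.0


# The worked examples from the text, rounded to avoid floating-point noise.
increase = round(irr_to_percent_change(1.04), 6)   # 4% increase
decrease = round(irr_to_percent_change(0.87), 6)   # 13% decrease
```

Because the ratios act multiplicatively, a per-unit ratio compounds across units rather than adding: a ratio of 1.12 per year corresponds to 1.12 squared, not 1.24, over two years.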
We tested three models for each outcome. The first, model A, was an unconditional regression that only included the variables for the DiD. This model allowed us to assess the change in the outcomes from the RCT with no other controls. Model B built on model A by adding BWC-contamination levels, which are described in the next section. Finally, model C included all other covariates described above. For each model, we report the incidence rate ratios and 95% confidence intervals for each independent variable, the model’s Wald Chi-square estimate, and the Akaike information criterion and Bayesian information criterion to assess and compare model fit.
Contamination
Since the adoption of BWCs into the policing field, there have been many scholarly discussions about the problem of contamination in experimental evaluations. In particular, contamination threatens internal validity when officers who are and are not equipped with a BWC are involved in the same policing event. The randomization occurs at the individual unit of analysis, but policing activities occur at the event unit of analysis, oftentimes with multiple officers, potentially leading to high contamination when analyzing outcomes at the event level. As Braga et al. (2017) have advised, contaminated control conditions can undermine the counterfactual contrast between officers that receive the treatment and officers that do not. Ariel et al. (2018) examine this issue in depth and go as far as to say that BWC studies that are randomized by individual render experimental results misleading.
One approach in prior research to respond to this issue has been to randomize officers by shift, where all officers on a randomly chosen shift are equipped with a BWC during that shift while officers do not wear cameras during control shifts (see Ariel et al. 2015, 2016a, b; Farrar and Ariel 2013). However, this approach still yields intra-officer contamination in that the same officers would wear BWCs on one shift and leave them off on other shifts. This design also produces additional threats to internal validity as the study participants become aware of the intervention. For example, what is known as the compensatory rivalry social threat to validity could occur when officers increase their efforts to perform well in comparison to the treatment group (Horner et al. 2006). In this context, officers could purposely alter their behaviors when they do not have a BWC to match their behaviors when they do. Another approach that reduces the compensatory rivalry social threat to validity is to randomize entire districts, where all officers within a randomly selected district are permanently assigned BWCs (Hedberg et al. 2016; Katz et al. 2014, 2015; Morrow et al. 2016).
The concerns raised about contamination for RCTs that randomize officers, as opposed to shifts or districts, are valid, but little research has estimated the extent to which contamination actually affects measured outcomes. The current study seeks to address this research gap by including a measure of contamination in our models. Using the officer-level identifiers in MPD’s proactive activity data, we were able to identify the proportion of officers with and without BWCs at each event. We also recognize that there were officers outside of the two experimental groups both with and without BWCs (e.g., officers who received cameras in the phase 1 and 3 deployments and officers not selected to be in the RCT sample during the phase 2 deployment). As such, we define contamination as present in an event when (1) a control group officer was at the scene with any BWC-wearing officer or (2) a treatment group officer was at the scene with any officer not wearing a BWC.
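The two-part definition above can be expressed as a small event-level check. The Python sketch below is only an illustration: the record layout (a `group` field taking 'treatment', 'control', or 'other', and a boolean `has_bwc` field) and the function name are assumptions, not the structure of the MPD data.

```python
def is_contaminated(officers_at_event):
    """Flag an event as contaminated under the two stated conditions.

    `officers_at_event` is a list of dicts with hypothetical keys
    'group' ('treatment', 'control', or 'other') and 'has_bwc' (bool).
    """
    for index, officer in enumerate(officers_at_event):
        # Everyone at the scene other than the officer being checked.
        others = officers_at_event[:index] + officers_at_event[index + 1:]
        # Condition 1: a control officer alongside any BWC-wearing officer.
        if officer["group"] == "control" and any(o["has_bwc"] for o in others):
            return True
        # Condition 2: a treatment officer alongside any officer without a BWC.
        if officer["group"] == "treatment" and any(not o["has_bwc"] for o in others):
            return True
    return False
```

Checking each RCT officer against the other officers at the scene means a treatment officer working alone is never counted as contaminated, and non-RCT officers matter only through their camera status.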
Figure 1 details the trend lines of contamination levels during proactive activities for officers involved in the RCT. The events were separated into three mutually exclusive categories: events where control group officers were involved, events where treatment group officers were involved, and events where both control and treatment group officers were involved. Prior to the RCT, contamination levels for these officers (which include events where the non-BWC-equipped RCT officers interacted with one or more BWC-wearing officers from the phase 1 deployment) were relatively low, averaging 5.15% from October 2015 to February 2016. Contamination increased greatly during the RCT study period. Overall, contamination levels for these officers averaged 34.4% from March to November 2016, but levels varied across the different types of events. As expected, contamination levels were at or near 100% for events where both the treatment and control group officers were involved. Treatment group officers had higher contamination levels than control group officers prior to the phase 3 BWC deployment that occurred in June 2016. This is because only 40.4% of the department had BWCs before that deployment, resulting in treatment group officers interacting with more officers that did not have BWCs in their daily activities. After the phase 3 deployment, 61.9% of MPD patrol officers were equipped with BWCs, reversing this pattern: contamination levels decreased for the treatment group officers (as more MPD officers had BWCs) and increased for control group officers. After the fourth deployment in November and December 2016, nearly all officers had BWCs and contamination levels dropped significantly. In January 2017 (a month after the end of the RCT study period), the MPD removed BWCs from all sergeants, resulting in stable, low levels of contamination from January 2017 onward.
We examined contamination levels across proactive activities further by separating out the five distinct activities for all officers involved in the RCT, displayed in Fig. 2. Peak levels of contamination for the RCT officers occurred in July 2016, the month after the third phase of BWC deployments. During this month, 45.5% of all events were contaminated. Traffic stops accounted for the greatest share of this contamination, ranging from 29.7 to 60.1% of the overall contamination during the RCT study period. This is perhaps not surprising considering that officers often request backup during these types of stops. On the other hand, subject stops only accounted for 7.0 to 13.6% of the total contamination during the RCT. This might be a result of officers not requesting backup during relatively brief encounters with individuals in the street. Park and walks consistently had the lowest levels of contamination, accounting for just 2.9 to 4.8% of the overall contamination during the RCT. This is likely because officers typically perform this activity by themselves and do not require backup.
As our analyses relied on panel regression models, we calculated the officer-level share of contaminated events for each outcome in each period. For example, in the business checks analyses, we calculated the percent contamination for each officer as the number of business checks with contamination divided by the total number of business checks within the specific panel period. This provided a more accurate measure specific to the examined outcome as opposed to calculating a general contamination level for each officer. It is worth emphasizing that this study does not examine how a mixture of officers with or without a BWC during a proactive activity affected the outcome of that encounter, but instead examines whether the degree of monthly events with contamination influenced the number of proactive activities individual officers conducted.
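The officer-by-panel measure described above amounts to a grouped proportion. A minimal Python sketch follows; it assumes each record in the officer/event data has already been flagged as contaminated or not, and the tuple layout and function name are hypothetical.

```python
from collections import defaultdict


def percent_contaminated(records):
    """Compute percent contamination per officer, panel, and activity type.

    `records` is an iterable of (officer_id, panel, activity, contaminated)
    tuples, one per officer per event, where `contaminated` is a bool.
    Returns a dict keyed by (officer_id, panel, activity).
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for officer_id, panel, activity, contaminated in records:
        key = (officer_id, panel, activity)
        totals[key] += 1
        if contaminated:
            flagged[key] += 1
    # Officer-panels with no events of a given type have no entry at all.
    return {key: 100.0 * flagged[key] / totals[key] for key in totals}
```

For example, an officer with four business checks in a panel, one of them contaminated, would receive a value of 25.0 for that panel's business checks model.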