Introduction

On May 25, 2020, Mr. George Floyd’s life was taken by then Minneapolis police officer Derek Chauvin. Three other police officers, two of whom were rookies, looked on as Officer Chauvin knelt on Mr. Floyd’s neck for a reported 9 min and 29 s. By the time Officer Chauvin removed his knee, Mr. Floyd had already taken his last breath.

The New York Times reported that Chauvin, now convicted of Mr. Floyd’s murder, had 22 official citizen complaints filed against him over the course of a 19-year career, many due to overly aggressive behavior (Barker and Kovaleski 2020). However, even though he had been reprimanded by the Minneapolis Police Department (MPD) for his misconduct, Chauvin had still been assigned duties as a training officer for new recruits, affording him the opportunity to shape the behavior of his colleagues.

Chauvin’s continued position of influence within the MPD despite his history of misconduct was not an isolated occurrence. In 2017, The Washington Post reported that nearly 1900 officers had been terminated from police departments in major metropolitan areas of the United States in the preceding decade because of improper behavior, with hundreds of these officers being reinstated after their removal was fought by police unions (Kelly et al. 2017). As the Post notes, “in many cases, the underlying misconduct was undisputed” and thus officers with a documented history of deviant behavior continued their employment—presumably working in close interaction with their colleagues. And, even amongst those officers who were not reinstated, some may have gone on to work for a different police department, where, for example, Grunwald and Rappaport (2020) find that 3% of police in Florida are so-called “wandering officers” who secure a job at another agency after being fired elsewhere.

The sustained presence of deviant officers in the halls of police departments stands to be hugely consequential to the integrity of law enforcement and, by extension, prospects for meaningful police reform and the development of trust between police and the communities they serve. This is because the social ties between members of a police force are important bases for occupational socialization, where sustained workplace relationships (e.g., friendship, line management, workgroup membership, instruction from a field training officer) and more ephemeral workplace interactions (e.g., assistance at the scene of an incident, advice sharing) present officers with numerous opportunities to learn what is and is not acceptable behavior (Chappell and Piquero 2004; Conti and Doreian 2010; Doreian and Conti 2017; Getty et al. 2016; Ingram et al. 2013, 2018; Lee et al. 2013; McNulty 1994; Ouellet et al. 2019; Paoline 2003; Quispe-Torreblanca and Stewart 2019; Skolnick and Fyfe 1993; Van Maanen 1974; Wood et al. 2019). Indeed, criminological scholarship on the adverse effects of police sub-culture—namely its ability to normalize malfeasance (Barker 1977; Chappell and Piquero 2004; Lee et al. 2013; Punch 2000, 2003, 2010)—suggests that the continued employment of officers with a history of deviant behavior may result in a scenario wherein the colleagues of these individuals (e.g., their partners, subordinates, and ad hoc collaborators) are themselves led astray over the course of regular interaction for the purposes of training, case work, and patrol. Consequently, here we investigate whether routine on-the-job interaction between members of a police force induces an interdependence of misbehavior, asking: if an officer is directly tied to others who step out of line, what impact, if any, might it have on her own propensity to misbehave? Put simply, might police misconduct be “contagious?”

To answer this question, we explicitly adopt a network perspective to examine the association between webs of collaborative workplace interactions and sanctioned misbehavior amongst uniformed officers of the Dallas Police Department (DPD) throughout 2013 and 2014. To dynamically map workplace collaboration networks, we rely on an accessible source of data that, to our knowledge, has never been used to formally measure police interaction—i.e., daily records of 911 calls for service. Like many police departments in the U.S., the DPD dispatches multiple patrol officers in separate vehicles to respond to 911 calls, particularly those deemed high priority. Such joint response to an incident creates a collaborative link between the officers involved as they work together to remedy the situation, where the scene of the incident itself presents an opportunity for discussion about acceptable behavior, the modelling of acceptable behavior, and general knowledge exchange (see McNulty 1994). Of particular interest here, however, is the agglomeration of joint responses to 911 calls, which, in aggregate, constitute a dynamic, weighted (i.e., non-binary; valued), department-spanning social network of front-line policing wherein the association between any two officers is the number of times they collaborate over a given day.

Although we see considerable value in investigations of the role of organizational hierarchy (Quispe-Torreblanca and Stewart 2019; Ingram et al. 2013, 2018) and co-offending (Ouellet et al. 2019; Wood et al. 2019; Zhao and Papachristos 2020) in facilitating police misconduct, here we aim to contribute to the nascent body of research at the intersection of police deviance and social network analysis by using joint response to 911 calls to explore the behavioral implications of a broad set of direct, routine workplace interactions. Of course, whether police misconduct is found to be “contagious” likely depends on the nature of the intra-force social relationship that a researcher chooses to measure. Accordingly, some readers may balk at our decision to focus on daily, ad hoc workplace collaboration under the assumption that friends (i.e., “strong” social ties) and other informal acquaintances (e.g., confidantes, co-conspirators, and sources of advice) are more relevant conduits for social influence compared to the colleagues one is required to engage with at the scene of a 911 incident. Nevertheless, we maintain that the dense informal social networks that police officers build with one another from the point of recruitment (Conti and Doreian 2010; Doreian and Conti 2017) should be reinforced through formal on-the-job interaction (e.g., team membership and project collaboration), as in other organizations (e.g., see Ellwardt et al. 2012a; b; Lazega and Pattison 1999; Potter et al. 2015; Siciliano 2015).

Social Networks and Police Deviance

Law enforcement officers stand to be powerfully drawn into errant behavior by their co-workers (Barker 1977; Punch 2000, 2003, 2010). This is especially so when considering the general segregation of officers from the public in addition to their need to repeatedly interact in close proximity and win the trust of their colleagues by conforming to occupational norms from the beginning of their careers (Merrington 2017; Savitz 1970; Van Maanen 1974; Waddington 1999). Indeed, police academies in the U.S. have been compared to medical schools as they function as “hot houses” that facilitate the formation of atypically dense social networks that can serve as vehicles for the transmission of occupational culture (Conti and Doreian 2010; Doreian and Conti 2017). In particular, academies enable recruits to gain a sense of what it means to be a “true” police officer with respect to: (1) formal policies and procedures; and (2) the informal, “common-sense” knowledge employed when dealing with the ambiguity inherent to law enforcement—where this occupational learning continues outside of classrooms, both “on the street” and off-duty (Conti and Doreian 2010; Doreian and Conti 2017; McNulty 1994; Moskos 2008; Paoline 2003; Van Maanen 1974).

Of course, acceptable conduct, as much as misbehavior, should be subject to peer effects (Chappell and Piquero 2004; Paoline 2003; Sutherland 1947). Acknowledging potential socialization into both positive and negative police behavior is essential due to the former being more common than the latter. Specifically, while there is considerable and persuasive quantitative evidence of racially-biased policing (see Knox et al. 2020; Ross et al. 2018), in general, serious forms of police deviance are expected to be rare relative to the sheer volume of police activity and police-public interaction typical of U.S. police departments (Worden 1996). Indeed, data presented in prior research, in addition to the data we analyze here, suggest that when bad behavior occurs it is likely to take the form of comparatively benign police misconduct such as administrative infractions, accepting free food, speeding unnecessarily, or sleeping on duty (see Chappell and Piquero 2004; Donner et al. 2016a; Huff et al. 2018; Kane and White 2009; Lersch and Mieczkowski 2000; Rozema and Schanzenbach 2019; Son and Rome 2004; Terrill and Ingram 2016; Quispe-Torreblanca and Stewart 2019). To be sure, grave on-the-job malpractice in the form of police corruption (e.g., modification of normal police behavior for reward; collaboration with criminals) and police crime (e.g., excessive and unjustified use of force; sexual assault) occurs—with dangerous and fatal consequences—but it is expected to be relatively unusual given the number of police and the volume of police activity.Footnote 1 Consequently, officers should not be routinely exposed to colleagues engaged in serious forms of deviance (Rozema and Schanzenbach 2019; Terrill and Ingram 2016).Footnote 2 And, by extension, officers should encounter colleagues with varying histories of and attitudes about bad behavior in line with the patterning of social relationships throughout a police force.

This network perspective on officers’ exposure to deviance via their co-workers (Ouellet et al. 2019; 2020; Quispe-Torreblanca and Stewart 2019; Wood et al. 2019; Zhao and Papachristos 2020) occupies a middle ground between the extremes of under- and over-socialized accounts of why police misbehave—i.e., “bad apples” versus “rotten barrels/orchards.” That is, a network perspective eschews a view of officers as “lone wolves” in order to focus on the behavior of police “in social relations” (Abbott 1997, p. 1152 and pp. 1165–1166; see also Brass et al. 1998). Simultaneously, it dispenses with the idea that officers are uniformly vulnerable to the influence of their aberrant colleagues by acknowledging that the social relationships between police, the behavioral implications of these relationships, and the behavior of police themselves are all far from monolithic. Consequently, a network perspective on police deviance has the great virtue of foregrounding intra-force heterogeneity such that it is consistent with conceptualizations of police sub-culture that emphasize how different officers will have distinct experiences with their colleagues, take diverging approaches to policing, and adhere to dissimilar, perhaps conflicting, behavioral logics around, for example, safety, competence, and machismo (see Campeau 2015; Herbert 1996; Herbert 1998; Ingram et al. 2018; Muir 1977; Paoline 2003; Paoline and Gau 2018; Son and Rome 2004).

Amongst scholarship focused on policing in the U.S., relational studies of deviance have generally been animated by social learning theory and, to a lesser degree, social control theory. The central premise of the former is that behavior is acquired through interaction with colleagues (e.g., field training officers and line managers) who model and normatively define action for some focal officer in a fashion that results in favorable or unfavorable views of deviance (Akers et al. 1979; Akers and Jennings 2015). On the other hand, social control theory posits that delinquent behavior is curtailed through officers’ positive socialization via strong ties to institutions and wider society (Wiatrowski et al. 1981).

In the various empirical applications of these two theories to the study of police, there is clearly the flavor of a network perspective on how officers encounter definitions of deviance (i.e., attitudes, values, and beliefs about inappropriate conduct) and, ultimately, come to misbehave (see, e.g., Chappell and Piquero 2004; Donner et al. 2016b; Getty et al. 2016; Lee et al. 2013; Wolfe and Piquero 2011). Indeed, some of this research comprises a promising, nascent empirical literature on the social networks of police (e.g., Ouellet et al. 2019; Roithmayr 2016; Wood et al. 2019). However, studies wherein criminologists actually analyze deviance alongside a social network spanning all, or some meaningful proportion of, a police department’s officers are rare. Moreover, in the few studies that explicitly relate the social networks of police to their deviant behavior, scholars have focused on less-traditional social ties by investigating either: (1) networks of indirect associations (cf. friendship) whereby officers are distally connected through the sharing of line managers (Quispe-Torreblanca and Stewart 2019); or (2) networks composed solely of deviant links between a subset of officers in a department who co-offend, thus excluding police with an untarnished disciplinary history (Ouellet et al. 2019; Wood et al. 2019; Zhao and Papachristos 2020). Because of these study designs, past relational research on police deviance within the traditions of social learning theory and social control theory cannot tell us whether an intra-force network of direct, non-deviant relationships might be associated with an officer’s propensity to engage in misconduct. This represents a notable gap in criminological understanding of police behavior vis-à-vis police sociality as direct, non-deviant relationships (e.g., friendship, advice provision, gossip, and collaborative exchange) are likely fundamental ties between a police department’s officers—where connections of this kind have previously been linked to deviance in other domains (e.g., see Gallupe et al. 2019; Paluck et al. 2016; Ragan et al. 2014).

Accordingly, we set out to quantitatively gauge the extent of the evidence in support of the idea that police misconduct is “contagious.” Specifically, we assess whether a propensity to engage in police misconduct is positively associated with patterns of direct and routine interaction (i.e., collaboration during the same calls for service) with other officers who have themselves engaged in misconduct in the past. In line with social learning theory, we expect that the risk of an officer engaging in police misconduct will increase when her direct exposure to deviant colleagues grows (Hypothesis 1). However, we acknowledge the possibility of positive (i.e., non-deviant) socialization as there is no theoretical basis for the presumption that only forms of deviant behavior are learned. Thus, we also expect that the risk of an officer engaging in police misconduct will decrease as her direct exposure to colleagues with untarnished disciplinary records grows (Hypothesis 2).Footnote 3

Methods

The primary data used for our analysis consist of: (1) the complete population of incidents generating 911 calls for service to the Dallas Police Department (DPD) that DPD officers responded to during 2013 and 2014 (i.e., 1,165,136 call-generating incidents, where 3475 officers responded to at least one incident); (2) official records of all formally alleged misconduct that led to disciplinary action against DPD officers between 2010 and 2014; and (3) demographic data on the employees of the City of Dallas (e.g., age, ethnicity, hiring date) between 2012 and 2017. Data on call response and disciplinary action were obtained through open records requests made directly to the DPD by the second author (Request Reference Numbers: 2015-06773 and 2016-04342). These requests specifically asked for information about all police officers responding to each respective 911 incident as opposed to just information about the first officer on the scene or the officer filing the subsequent incident report. Data on the employees of the City of Dallas were obtained from the City of Dallas’ online search tool for previously fulfilled open records requests (Request Reference Number: C000213-010818; https://dallascityhall.com/). See “Appendix 1” for details on how we link the three sets of data using officers’ badge numbers, names, and hiring dates as well as “Appendix 2” for a review of limitations of using procedurally generated data.

For our study, we focused on incidents deemed “high priority”—i.e., incidents classed by the DPD as an “emergency” (Priority 1) or simply as “urgent” (Priority 2)—and incidents deemed “low priority”—i.e., incidents classed by the DPD as “general service” (Priority 3) or “non-critical” (Priority 4). Approximately 52% of the incidents (603,544) are high priority.

All 3475 responding officers for 2013 and 2014 constitute the sample for our analysis. However, in constructing the daily collaboration networks spanning the DPD, we only draw a collaborative tie between these officers when they jointly respond to an incident that is: (1) small in size (i.e., five officers or less); and (2) circumscribed in duration (i.e., all officers assigned on the same day; short). These two restrictions are imposed to help bolster the integrity of our assumption that police at the scene of an incident directly interact (see “Appendix 3” for further explanation). Furthermore, these restrictions help protect our analysis from the potential impact of undercounting officers at the scene of large and protracted incidents. Specifically, official records of which officers are dispatched to which calls may not include ancillary officers who arrive on the scenes of incidents without informing the dispatcher. This is especially so for “hot” calls, such as those for officers in distress, during which there may be exceptional motivation for multiple officers to respond outside of the normal dispatch process in order to ensure officer safety—thus inflating the actual (undocumented) number of police associated with an incident and, possibly, its length.Footnote 4 In total, 1,127,840 of the 1,165,136 incidents have five responding officers or less who all receive their assignment on the same day. These 1,127,840 incidents are used to construct the daily collaboration networks.Footnote 5

Dependent Variable

The outcome of interest for our study is a binary variable indicating, for each of the 730 days of 2013 and 2014, whether a given DPD officer in our sample engaged in any police misconduct that was reported and ultimately met with disciplinary action. To be clear, our dataset only includes information on alleged misconduct that was investigated and formally sanctioned by the DPD. The DPD did not provide us with information on any disciplinary cases wherein misconduct was alleged and did not result in disciplinary action of some form. Consequently, our dependent variable does not reflect unreported misbehavior or unsanctioned misconduct, instead only encapsulating the information we have on misconduct that resulted in some official form of disciplinary action, as formally documented by the DPD. Although our approach perhaps improves upon analyses of officers’ reports of their own misconduct (e.g., see Donner 2018, p. 6; Son and Rome 2004, p. 184), analyses of all instances of alleged misconduct regardless of outcome are the “gold standard” in police studies. Thus, a key limitation of our approach is that the DPD records we analyze may under-represent the frequency of deviance.Footnote 6

Disciplinary action, in Dallas and elsewhere, can stem from a wide array of behaviors. Although infractions such as excessive use of force and the abuse of an individual in police custody are canonical examples of police deviance, acts such as the acceptance of free lunches and other small gifts have also been classed as errant by criminologists (Chappell and Piquero 2004; Punch 2000). Here we draw on work by Thomas Barker and David Carter, as cited in Donner et al. (2016a, p. 744), to view police deviance as activities that are inconsistent with officers’ legal and organizational authority and/or their standards of ethical behavior. This definition is broad enough to accommodate the myriad forms of deviance discussed in the criminological literature, namely: (1) job-specific malpractice that is nevertheless legal (Kane 2002); (2) the moderation of normal police behavior for some reward, and/or formal partnership with organized crime (Lauchs et al. 2011); and (3) the violation of legislatively-enacted laws (Donner et al. 2016a). We respectively class these forms of police deviance as police misconduct, police corruption, and police crime.

Returning to our data, we have information on the date that each disciplinary incident occurred, the date that an allegation of inappropriate behavior was received by the DPD, as well as the date that disciplinary action was taken by the DPD against the misbehaving officer. In total, there were 2651 disciplinary incidents involving one or more of the 3475 DPD officers in our sample between 2010 and 2014, inclusive. From these disciplinary incidents, we remove 15 wherein allegations of misbehavior were ultimately rescinded, or officers were ultimately exonerated, by department leadership. We also remove 131 “complex” disciplinary incidents wherein more than one of the 3475 DPD officers that responded to 911 calls in 2013 and 2014 were alleged to have been involved, analysis of which we eschew in order to ensure that we model the behavior of the individuals in our sample.Footnote 7 This left us with 2506 disciplinary incidents involving 2703 instances of police deviance across the 3475 officers.

Most of the instances of deviance are forms of police misconduct and are relatively benign.Footnote 8 Specifically, just 51 of the 2703 infractions are best classified as police crime (e.g., assault, unnecessary use of force, fraud, and sexual misconduct) whereas the remaining 2652 infractions are of an administrative nature and are best classified as police misconduct (see Supplementary Information [SI] Table 1 for our classification of the 134 unique infractions recorded by the DPD). Here we limit our attention to police misconduct as the small number of cases of police crime preclude large-scale quantitative assessment. Additionally, we restrict our analysis to the level of the day as this is the temporal granularity at which the disciplinary action records were constructed by the DPD. Consequently, we arrive at our dependent variable which, again, is a binary indicator for which of the 730 days of 2013 and 2014 that each DPD officer is recorded as having engaged in any police misconduct that led to disciplinary action (hereafter, “sanctioned misconduct”). We also stratify our regressions (discussed below) using the number of days that an officer is recorded as having engaged in any sanctioned misconduct prior to day \(t\), counting from the beginning of our disciplinary action data in 2010.

Across the 3475 officers in our sample, 1082 of the 2,536,750 officer-day observations (3475 officers \(\times\) 730 days) for 2013 and 2014 see an officer engage in one or more forms of sanctioned misconduct, where the number of previous “days of misconduct,” counting from 2010 until time \(t\), ranges from 0 to 10 across the officer-day observations (see SI Table 2). For our analysis, we focus on the time until sanctioned misconduct (i.e., “day of misconduct” = 1 on some given day \(t\))—a repeatable event—and whether this time is associated with an officer’s direct, on-the-job exposure to colleagues who themselves have engaged in sanctioned misconduct in the past and, conversely, colleagues who have not engaged in sanctioned misconduct in the past.Footnote 9

Independent Variables

Following Kane’s (2002) assessment of the spatial dependence of misconduct rates across police precincts, as well as prior studies of the network dependence of misconduct across police (Ouellet et al. 2019; Quispe-Torreblanca and Stewart 2019), our main correlates of interest are two “spatial lags.” In their most basic form, these spatial lags capture: (1) the weighted sum of a focal officer’s number of colleagues who engage in sanctioned misconduct on day \(t\) (hereafter labelled “calls with deviant colleagues”); and (2) the weighted sum of a focal officer’s number of colleagues who do not engage in sanctioned misconduct on day \(t\) (hereafter “calls with non-deviant colleagues”).

Formally, officer response to calls for service naturally constitutes two-mode (i.e., actor-by-event) networks which may be represented by a matrix with dimensions \(N \times Z\) where, in the present case, \(Z\) is the number of unique small, short call-generating incidents (whether high-priority or low-priority) occurring across a given day \(t\) and \(N\) is the number of DPD officers in our sample. As we are interested in any evidence suggestive of spillover in officers’ propensities to misbehave, we transform or “project” this two-mode structure to create a symmetric network represented by an \(N \times N\) matrix \(W\) that encodes the number of times any two DPD officers respond to the same 911 incident across a given day \(t\). The connectivity matrix \(W\) is then used to weight the behavior of those to whom an officer is directly connected, where the spatial lag for exposure to deviance is simply the sum of these weights. Specifically, this spatial lag is given by \(\sum\nolimits_{j,i \ne j}^{N} {w_{ij}^{t} } y_{jt}\), where \(N =\) 3475 officers, \(w_{ij}\) is the cell in \(W\) encoding the number of unique small, short incidents that officers \(i\) and \(j\) jointly respond to on day \(t\), and \(y_{t}\) is a binary vector indicating whether or not each of the \(N\) officers engaged in any misconduct that was ultimately sanctioned on day \(t\). To construct the spatial lag for non-deviant conduct, \(y_{t}\) is simply inverted to \(y_{t}^{\prime }\).

Valid estimates for the association between a spatial lag and some outcome of interest requires careful specification of \(W\) as results stand to dramatically differ based on how this matrix is transformed. Importantly, \(W\) should not be specified based on convention alone. Here we specify \(W\) in a straightforward manner based on the directives of Neumayer and Plümper (2016). Specifically, we leave \(W\) “as is”—i.e., a connectivity matrix of counts as described in the above paragraph. This results in the following assumptions. First, officers are differentially exposed to their peers (i.e., heterogeneous total exposure or, rather, all row-vectors in \(W\) sum to different strengths/values). Second, the relative importance of the behavior of a colleague \(j\) for some focal officer \(i\) is fully determined by the number of collaborative events between \(i\) and \(j\) (i.e., tie strength). And third, those officers who share no collaborative events with \(i\) on a given day \(t\) are irrelevant to \(i\)’s behavior.

Although transformation of \(W\) via row-standardization is typically done following influential work on “network autocorrelation” (e.g., Leenders 2002) and “spatial dependence” (e.g., Lacombe and LeSage 2018), Neumayer and Plümper (2016) forcefully argue against this practice, maintaining that row-standardization should be avoided without strong theoretical justification as it imposes an assumption of homogenous total exposure—i.e., all row-vectors in \(W\) sum to unity—which erases between-actor (i.e., between-row) heterogeneity. While Neumayer and Plümper (2016) argue against row-normalization broadly, we avoid it specifically within the context of police studies as homogenous total exposure contravenes the aforementioned theoretical work on police sub-culture which, again, underscores the diversity of officers and their experiences (e.g., see Campeau 2015; Herbert 1996; Ingram et al. 2018; Paoline 2003; Paoline and Gau 2018).

Moreover, spatial lags of the form we adopt here clearly impose a strict assumption on how social influence is presumed to unfold—i.e., an officer’s behavior on a single day is only impacted by direct interaction with deviant/non-deviant colleagues across a single day. We can, of course, define much longer “exposure windows” of, for example, a few weeks or a few months. Yet, the existing criminological literature provides no theoretical justification for these choices and long exposure windows strike us as implausible due to the sheer number of incidents that typical patrol officers are generally required to respond to (Moskos 2007; Jaramillo 2019), such that cooperative experiences beyond the recent past may be quickly forgotten. On the other hand, a one-day exposure window is also arbitrary and, perhaps, overly restrictive.

Accordingly, we opted for a middle-ground approach by using the cumulative sum of calls with deviant/non-deviant colleagues, where this cumulative sum then “decays” each day to incorporate an element of “forgetting.” To clarify, consider, for example, a string of seven days starting on the first day of our observation period (i.e., January 1, 2013) for which a hypothetical officer has two “calls with deviant colleagues” on day one, one deviant call on day two, four deviant calls on day six, one deviant call on day seven, and no deviant calls on the remaining days—i.e., \(\left\{ {2_{{t_{1} }} ,1_{{t_{2} }} ,0_{{t_{3} }} ,0_{{t_{4} }} ,0_{{t_{5} }} ,4_{{t_{6} }} ,1_{{t_{7} }} } \right\}\). Using a multiplicative factor whereby this officer’s cumulative sum for “calls with deviant colleagues” decays by 50% each day (i.e., 0.5), the set of values used to fit our models would be \(\left\{ {2_{{t_{1} }} ,2_{{t_{2} }} ,1_{{t_{3} }} ,0.5_{{t_{4} }} ,0.25_{{t_{5} }} ,4.125_{{t_{6} }} ,3.0625_{{t_{7} }} } \right\}\). To calculate the first three numbers of the new set of values representing “calls with deviant colleagues” with a 50% decay: our hypothetical officer’s original value for “calls with deviant colleagues” at \(t_{1}\) (i.e., 2) is taken as given; his original value for “calls with deviant colleagues” at \(t_{2}\) (i.e., 1) is added to his original value at \(t_{1}\) multiplied by the decay factor of 0.5 (i.e., \(2 \times 0.5 + 1\)) to get a new \(t_{2}\) decayed value of 2; and his original value for “calls with deviant colleagues” at \(t_{3}\) (i.e. 0) is added to his decayed value at \(t_{2}\) (i.e., 2) multiplied by the decay factor of 0.5 (i.e., \(2 \times 0.5 + 0\)) to get a new \(t_{3}\) decayed value of 1. The remaining values of “calls with deviant colleagues” with a 50% decay are then produced using the same arithmetic behind the decayed value at \(t_{3}\) such that our hypothetical officer’s decayed value at \(t_{4}\) is the result of adding his original value for “calls with deviant colleagues” at \(t_{4}\) (i.e., 0) to his decayed value at \(t_{3}\) (i.e., 1) multiplied by the decay factor of 0.5 (i.e., \(1 \times 0.5 + 0\)) to get 0.5, so on and so forth until the end of our observation period (i.e., December 31, 2014) and mutatis mutandis for “calls with non-deviant colleagues.”

Crucially, our approach balances: (1) concern that an officer is unable to remember every cooperative experience with their colleagues; with (2) an awareness that the behavioral consequences of an officer’s exposure to their colleagues on any given day may manifest over longer time scales. Furthermore, our approach sidesteps the “sharp transition” that is inherent to using the daily spatial lags by themselves (i.e., 100% decay) or taking the sum of calls with deviant/non-deviant colleagues using, for example, a rolling sum of three days, as these exposure-window-based approaches would see the value of the spatial lag drop instantly to zero at the end of each one-day/three-day period when an officer subsequently responds to no additional calls with deviant/non-deviant colleagues within the one-day/three-day window. In contrast, use of the daily cumulative sum with a daily decay allows the value of the spatial lag at time \(t\) to continue to degrade (i.e., approach zero) each day until the end of our observation period or until an officer has more calls with deviant/non-deviant colleagues.

Of course, there is still an element of arbitrariness to our approach as the existing criminological literature does not indicate what decay factor one ought to use to best capture an officer’s “forgetting.” Thus, we opt for a small rate of decay—i.e., 5% or a decay factor of 0.95—under the assumption that an officer’s socialization will be both cumulative and salient. However, to judge the robustness of our findings, we also fit models using 10%, 20%, 30%, 40%, and 50% decay factors.

Last, note that the decayed cumulative sum of the daily spatial lags is itself temporally lagged by one day when fitting all of our models. That is, we use decayed cumulative calls with deviant/non-deviant colleagues on \(t_{ - 1}\) to model sanctioned misconduct at \(t_{1}\). We fit models in this manner to reflect our assumption that any network effect is unlikely to be instantaneous or, rather, that officers take time to react when exposed to the behavior of their ad hoc collaborators. Temporally lagging by one day also assuages concerns around reverse causality. This is because poor behavior may impact officers’ job assignments and thus their availability for call response and, by extension, front-line exposure to peers.

Other Variables

We adjusted our models for the confounding influences of an officer’s gender, age, ethnicity, and department tenure. Furthermore, we adjusted for whether an officer was involved with patrol work on a given day \(t\) (versus non-patrol tasks or not being at work) using a binary variable (Non-Patrol Day) that equals one for the days on which an officer responds to zero 911 calls for service of any severity, size, and length.

With census tract data from the 2012–2016 American Community Survey and geographic information for the analyzed calls for service, we also adjusted our models for the social context of policing. This is vital as prior research convincingly demonstrates that the nature of police work is contingent upon setting and situational characteristics, such that police are more likely to use force and to engage in misconduct in disadvantaged areas (Ba et al. 2021; Kane 2002; Kirk 2008; Parker et al. 2005; Reiss 1968; Sherman 1980; Smith 1986; Terrill and Reisig 2016). And our ability to account for key features of the social context of policing in a granular manner distinguishes our research from the few other networked-based studies of police misconduct and excessive use of force (Ouellet et al. 2019; Quispe-Torreblanca and Stewart 2019).

Specifically, for each of the 1,165,136 calls for service in our data, we matched its street address to the corresponding census tract to account for variation in the neighborhood conditions within which officers were tasked with responding to calls for service. This was done to create three binary indicators for whether, on a given day \(t\), an officer responds to at least one 911 incident (of any severity, size, and length) in an area where: (1) at least 75% of the residents are Black (“predominantly-Black”); (2) at least 75% of the residents are Hispanic (any race) (“predominantly-Hispanic”); and (3) at least 40% of families are below the poverty line (“concentrated poverty”). For example, if an officer were to respond to four calls for service on day \(t\), the binary indicator for exposure to areas of concentrated poverty would be equal to one if any of her four calls were in a census tract where the percentage of families below the poverty line is greater than or equal to 40% (i.e., a flag for whether a given officer spent some portion of her day responding to calls in an impoverished neighborhood). Accordingly, on those days wherein this officer responds to zero calls for service—or on those days wherein the calls she responds to have data for the poverty rate that are completely missing—her exposure to poverty for the purposes of responding to emergencies is assumed to be zero, mutatis mutandis for calls in predominantly-Black areas and calls in predominantly-Hispanic areas.Footnote 10 Before model fitting, the binary indicators for predominantly-Black, predominantly-Hispanic, and concentrated poverty exposure are all temporally lagged by one day as, similarly to the spatial lags, poor behavior may impact officers’ job assignments and thus their availability for call response and, by extension, front-line exposure to deprivation and majority-minority areas.Footnote 11

Descriptive statistics for all covariates appear in Table 1.

Table 1 Global descriptive statisticsa

Modeling Strategy

To relate the spatial lags to sanctioned misconduct, we used a style of Cox regression for repeated events with complex dependencies and time-varying covariates that is broadly in line with the directives of Box-Steffensmeier and colleagues (Box-Steffensmeier and De Boef 2006; Box-Steffensmeier et al. 2007, 2014; Box-Steffensmeier and Jones 2009). Specifically, we made use of event-specific baseline hazards by stratifying our models by event number—here, the number of days that an officer has engaged in any sanctioned misconduct prior to day \(t\), counting from the beginning of our disciplinary action data in 2010.Footnote 12 Theoretically speaking, stratification reflects our assumption that an officer’s “days of misconduct” are dependent/conditional upon one another such that past misbehavior may shape how one acts in the future (Donner 2018). Practically speaking, stratification restricts the risk set (i.e., the set of officers at risk of engaging in misconduct on a given day \(t\)) such that the risk set for the \(k\)th “day of misconduct” is only comprised of the risk intervals (i.e., officer-day observations) for officers who have experienced \(k - 1\) “days of misconduct”—where there are 2,536,750 possible risk intervals, or one risk interval for each of the 3475 officers for each day of 2013 and 2014 (i.e., \(3475 \times 730\)). Furthermore, our models include varying effects or “frailties” for each DPD officer in order to adjust for unknown, unmeasured, or unmeasurable factors that make some officers intrinsically more or less prone to engaging in misbehavior (i.e., actor-specific excess risk possibly attributable to factors such as a lack of self-control and/or lower inhibitions around rule-breaking that we cannot explicitly account for). As a result, the models used here are akin to multilevel models with random intercepts (Austin 2017).

Formally, and using the notation of Balan and colleagues (Balan 2018; Balan and Putter 2019, 2020), the model we estimated is as follows. Let \(N_{i}\) represent an increasing, right-continuous counting process beginning on January 1, 2013 that is reflective of the event history of officer \(i\), where \(N_{i} \left( t \right)\) is the number of events (i.e., the number of “days of misconduct”) experienced by officer \(i\) up to day \(t\). Moreover, let \(Y_{i} \left( t \right)\) represent an indicator function that equals one if officer \(i\) is at risk of engaging in misconduct on day \(t\) and zero otherwise. Here \(N_{i}\) is modelled as a Poisson process and thus we are concerned with its intensity—i.e., the instantaneous probability of sanctioned misconduct on day \(t\) given the entirety of officer \(i\)’s event history, or, rather, the force of transition from a “day of non-deviant behavior” to a “day of misconduct” given \(N_{i} \left( t \right)\) (Andersen and Gill 1982; Balan 2018; Balan and Putter 2019, 2020). This may be contrasted with the familiar hazard for a single event which is the instantaneous probability of the event at time \(t\) given that it has not yet occurred (e.g., death). Although “hazard” and “intensity” are sometimes used interchangeably (Prentice et al. 1981), here we use the latter terminology to highlight the counting process formulation of our models and to ensure consistency with Balan and Putter (Balan 2018; Balan and Putter 2019, 2020).

The intensity of the counting process \(N_{i}\) for the \(k\)th strata (i.e., the \(k\)th “day of misconduct”) at time \(t\), or \(\lambda_{ik} \left( {t|Z_{i} } \right), \) is given as:

$$ \lambda_{ik} \left( {t|Z_{i} } \right) = Y_{i} \left( t \right)Z_{i} \exp \left( {\beta^{{\top }} \varvec{x}_{i} \left( t \right)} \right)\lambda_{0k} \left( t \right) $$
(1)

where \(Z_{i}\) is the unobserved frailty (i.e., the random/varying effect) shared across all risk intervals for officer \(i\) and \(\lambda_{0k}\) is the unspecified, nonnegative baseline intensity for the \(k\)th strata/“day of misconduct” at time \(t\) when all covariates are equal to zero. Furthermore, \({\varvec{x}}_{i} \left( t \right)\) is a \(p \times 1\) vector of \(p\) time varying and/or time invariant covariates for actor \(i\) at time \(t\), whereas \(\beta\) is the corresponding \(p \times 1\) vector of unknown parameters (i.e., the regression coefficients). Note that these parameters are conditional log intensity ratios which summarize the positive or negative association between some covariate of interest \(x_{ip}\) and the intensity of the \(k\)th event for a one-unit increase in \(x_{ip}\), where this association is multiplicative. It is assumed that event times are independent conditional on \(Z_{i}\) and that the frailties themselves are independent and identically distributed in line with a distribution \(Z\).

There is an active debate across the social and biomedical sciences around what distribution \(Z\) should be assumed to govern frailties (Austin 2017; Balan 2018; Balan and Putter 2019, 2020; Box-Steffensmeier and Jones 2009; Hougaard 2000; Therneau et al. 2003). Here we fit our models using the gamma distribution for \(Z\) in light of the simulation-based findings of Box-Steffensmeier et al. (2007, 2014) which indicate superior performance of stratified, repeated-events Cox models with gamma-distributed random effects in diverse scenarios typically of interest to social scientists—although the authors use gap (i.e., inter-event) time as opposed to the elapsed (i.e., calendar) time we employ here. In so doing, we rely on the “emfrail()” routine in Balan and Putter’s (2019) R package “frailtyEM” which: (1) adjusts standard errors for the parameter estimates \(\hat{\beta }\) based on uncertainty stemming from the estimation of the gamma distribution’s scale parameter \(\theta\); (2) provides a standard error and confidence interval for the estimated variance of the frailty terms; and (3) allows one to assess whether the proportional hazards/intensity assumption is met (i.e., the core assumption of Cox-style regression models) using the popular “cox.zph()” routine in Therneau’s (2018) R package “survival.” Assessment of whether a model specification inclusive of frailties is an improvement over that same model specification without frailties was done using a modified likelihood ratio test (see Balan 2018, pp. 37–40 and Balan and Putter 2019), which is the preferred formal assessment of \(H_{Null} {:} \theta = 0\) (Box-Steffensmeier et al. 2007; Therneau et al. 2003).

Balan and Putter (Balan 2018; Balan and Putter 2019, 2020) provide additional formalism and a discussion of their expectation–maximization estimation procedure in relation to other implementations of frailty models popular in the social and biomedical sciences. Additionally, our appendices provide extended details on key decisions we took in relation to our modelling strategy. Therein we specifically discuss: (1) how we handle “time” with respect to the construction of the risk intervals and time-varying covariates (“Appendix 4”); (2) the exclusions we made to the set of 2,536,750 possible risk intervals in order to construct the final analytic risk set reflective of officers’ missing data and employment dates (“Appendix 5”); (3) how we go about stratifying our models (“Appendix 5”); (4) our rationale for the use of spatial lags and survival analysis over spatial regression (“Appendix 6”); and (5) our ability to identify “contagion” with observational data (“Appendix 7”).

Results

Parameter estimates \(\hat{\beta }\) (associational; non-causal) and their confidence intervals from our main model using a 5% decay factor are depicted in Fig. 1 in descending order by magnitude. With respect to interpretation, recall that \(\hat{\beta }\) are estimated log intensity ratios (i.e., the bullet points in the figure). Accordingly, their exponentiation yields intensity ratios that summarize multiplicative shifts in the instantaneous probability of an event at time \(t\)—i.e., a transition from a day of non-deviant behavior to a day that includes one or more instances of sanctioned misconduct—between two officers with the same frailty and with covariate vectors that are identical except for a difference of one unit in the covariate of interest. Covariates with log intensity ratios greater than zero equate to additional “days of misconduct,” where the converse is true for log intensity ratios less than zero. With that in mind, estimates in Fig. 1 conflict with both of our hypotheses.

Fig. 1
figure 1

Covariate-specific risk of police misconduct (5% decay factor) for the Dallas Police Department (2013–2014). Parameter estimates \(\hat{\beta }\) (log intensity ratios; bullets) in descending order and 95% confidence intervals from a repeated-events survival model of days until a Dallas police officer engages in sanctioned police misconduct alongside the estimated variance of the gamma-distributed frailty parameters capturing officer-specific excess risk (inset, above). Note, when reading p values, \(e\) symbolizes base-10 scientific notation such that \(p = \) 8.97e−05 \( =\) 8.97 \(\times 10^{ - 5} =\) 0.0000897

Regarding our first hypothesis, jointly responding to an additional call for service with a deviant colleague who has engaged in misconduct in the past is negatively associated with the intensity of a transition to a “day of misconduct.” Specifically, a one-unit increase in Decayed (5%) Cumulative Calls with Deviant Colleagues is associated with a reduction in the intensity of a transition to a “day of misconduct” by a factor of \(\exp \left( { - 0.209} \right) = 0.811\) or 19%, holding the other covariates constant. One possible interpretation of this negative coefficient is that the sanctioned misconduct of a fellow officer deters other officers from engaging in similar behavior. However, the estimate is noisy and evidence against the null hypothesis of no effect is far from compelling (p value = 0.197).

In opposition to our second hypothesis, we find evidence to suggest that jointly responding to calls for service with colleagues who exhibited acceptable behavior in the past (Decayed (5%) Cumulative Calls with Non-Deviant Colleagues) is positively, rather than negatively, associated with the intensity of a transition to a “day of misconduct” (\(\hat{\beta } = 0.003\); p value = 0.008), although the effect is miniscule. More specifically, an additional 911 call for service with a colleague who has not engaged in sanctioned misconduct is associated with an increase in the intensity of a transition to a “day of misconduct” by a factor of \(\exp \left( {0.003} \right) = 1.003\), or 0.3%.

Turning to the sensitivity of our results, estimates \(\hat{\beta }\) for Decayed Cumulative Calls with Deviant Colleagues and estimates \(\hat{\beta }\) for Decayed Cumulative Calls with Non-Deviant Colleagues meaningfully vary when using different plausible decay factors. This is plainly demonstrated in Fig. 2 which plots \(\hat{\beta }\) and their 95% Confidence Intervals for Decayed Cumulative Calls with Deviant Colleagues (top) and Decayed Cumulative Calls with Non-Deviant Colleagues (bottom) using the different decay factors while adjusting for the same variables seen in Fig. 1 and using the same 2,232,677 officer-day observations/risk intervals. In both instances, as the decay factor increases, the estimates become more negative and more uncertain; the latter of which is perhaps unsurprising given fewer officer-day observations with non-zero values under larger decay factors. Practically speaking, comparing the AIC and the BIC across the six models (Fig. 2; \(x\)-axis) indicates that the specification depicted in Fig. 1 using the 5% decay factor is (marginally) preferred. Nevertheless, it is clear that the value of the estimated coefficients associated with our two variables of interest are not robust to different plausible decay factors. Accordingly, the most conservative conclusion is that there is no association between Decayed Cumulative Calls with Deviant Colleagues and the intensity of misconduct nor between Decayed Calls with Non-Deviant Colleagues and the intensity of misconduct for our sample of DPD officers in 2013 and 2014. Thus, we fail to find support for either of our hypotheses.

Fig. 2
figure 2

Risk of police misconduct for the Dallas Police Department (2013–2014) associated with Calls with Deviant/Non-Deviant Colleagues using decay factors from 5 to 50%, identical control variables, and identical risk intervals (see Fig. 1)

Maintaining focus on the AIC-/BIC-favored model in Fig. 1 and turning to the other covariates, results echo prior research on the neighborhood context of policing (e.g., see Kane 2002) in that they suggest that the intensity of misconduct varies with the characteristics of the census tracts that officers patrol, specifically tract deprivation (\(\hat{\beta } = 0.231\) for Any Calls in Areas of Concentrated Poverty; p value \(= 0.018\)). And, even when accounting for neighborhood context and involvement with patrol work (\(\hat{\beta } = - 1.012\) for Non-Patrol Day; p value \(< 0.005\)),Footnote 13 we also find compelling evidence that shifts in the intensity of misconduct are associated with the traits of individual officers. Specifically, and holding the other covariates constant, compared to White officers, the instantaneous probability of transitioning to a “day of misconduct” for Black officers and Hispanic/Latino/Spanish officers is estimated to be higher by a factor of \(\exp \left( {0.303} \right) = 1.354\), or 35% (p value \(< 0.005\)), and \(\exp \left( {0.269} \right) = 1.308\), or 31% (\(p\) value \(< 0.005\)), respectively.

These findings are consistent with some prior research related to racial/ethnic differences in the likelihood of police misconduct as well as police violence (Kane and White 2009; Lersch and Mieczkowski 2000; Ridgeway 2016, 2020; White and Kane 2013). And because we adjust for the characteristics of the neighborhoods within which DPD officers respond to calls for service, we have attempted to estimate racial/ethnic differences in the intensity of misconduct net of any tendency for the DPD to systematically task its White, Black, Hispanic/Latino/Spanish, and Asian officers with call response in different areas of Dallas because of their race/ethnicity (Brown and Frank 2007). Nevertheless, our findings diverge from recent research that exploits fine-grained data on differences in officer duty assignments by race (Ba et al. 2021; Hoekstra and Sloan 2020), perhaps suggesting that the association between race/ethnicity that we observe would disappear with more comprehensive data. Moreover, because we are unable to adjust our models in a fashion that accounts for differential treatment of racial/ethnic groups by the DPD with respect to its disciplinary process, the coefficients could reflect a greater tendency for the misconduct of Black and Hispanic/Latino/Spanish officers to be recorded and sanctioned by the DPD compared to misconduct on the part of their White colleagues. Indeed, new research by Ralph (2020) documents at length the institutional racism and marginalization of non-White officers inside big city police departments. Hence, a “code of silence” may more commonly shield White police officers from being disciplined for their misconduct compared to Black or Hispanic/Latino/Spanish officers. We return to the race/ethnicity coefficients below vis-à-vis internal (i.e., department) and external (i.e., citizen) complaints.

As for the DPD officers’ other traits, department tenure is associated with a decrease in the intensity of misconduct by a factor of \(\exp \left( { - 0.014} \right) = 0.986\) or 1% for each additional year on the job (\(p\) value \(< 0.005\)). Substantively speaking, the intensity of misconduct would weaken, holding the other covariates constant, by a factor of \(\exp \left( { - 0.014 \times 20} \right) = 0.756\), or 24%, if one were to move from being a relative newcomer with five years of experience to a veteran with twenty-five. Note that the parameter estimate for tenure maintains its negative expression when adjusting for officer age such that our findings likely reflect differences in conduct due to officer experience. However, age was dropped from the model specification in Fig. 1 due to its violation of the assumption that the intensities of officers being compared are constant through time (i.e., the proportional hazards assumption core to Cox-style regression).

Although the estimates summarizing covariate-specific shifts in the intensity of misconduct are interesting, a major strength of frailty models is the ability to go beyond observed heterogeneity in order to explore unobserved heterogeneity in the form of actor-specific excess risk that is unaccounted for by the set of covariates. In the present scenario, our model suggests that this risk merits discussion—although we stress that frailties could reflect unmeasured officer characteristics (e.g., personality) and/or unmeasured, misconduct-inducing situational factors that are simply correlated with unmeasured officer characteristics, especially given the recognized importance of situation to officer behavior (Ridgeway 2020).

Specifically, for the model depicted in Fig. 1, the p value for the likelihood ratio test comparing the fit of the model with frailties to the fit of the same model without frailties is 0.003, indicating the presence of excess risk; where the estimated variance of the gamma-distributed frailties for the DPD officers is approximately 0.187 (i.e., \(1{/}\theta\); where \(\hat{\theta } = 5.355\)). To make sense of this heterogeneity, consider Fig. 3, which depicts the estimated frailties \(\hat{Z}_{i}\) (i.e., empirical Bayes estimates; hollow bullets) from the model in Fig. 1 for the 3278 officers with valid risk intervals (see Table 1 and “Appendix 5”) in relation to these officers’ total number of “days of misconduct” experienced during the observation period. With respect to interpretation, a frailty indicates that were two officers to have identical observed covariate vectors and be observed over the same time period, the officer with the larger frailty would have a higher intensity of misconduct and thus more “days of misconduct,” where the frailty multiplicatively impacts the baseline intensity as in Eq. (1). For this particular sample of police officers, this impact is estimated to range, holding the covariates constant, from a roughly 20% reduction in the intensity of misconduct (i.e., zero total “days of misconduct” in Fig. 3) to a roughly 75% increase in the intensity of misconduct for those officers who repeatedly engaged in sanctioned misbehavior across the observation period (i.e., six total “days of misconduct” in Fig. 3). The positive relationship between the number of days that an officer engaged in misconduct that was ultimately sanctioned and their excess risk of future misconduct appears to be stark for our sample—a finding that is consistent with evidence recently presented by Donner (2018) suggesting a “state dependent” component of police behavior whereby prior misconduct shapes an officer’s propensity to misbehave in the future. Nevertheless, a strong association between the number of events and the frailties is to be expected (see Hougaard 2000, pp. 316–319) and the empirical Bayes estimates are all rather noisy, perhaps due to the few events per officer across the observation period (Mean = 0.323)—where the 0.025 and 0.975 quantiles of the Posterior Gamma Distribution of all estimated frailties \(\hat{Z}_{i}\) cross 1.00.

Fig. 3
figure 3

Officer-specific excess risk of police misconduct for the Dallas Police Department (2013–2014) based on the repeated-events survival model depicted in Fig. 1

Last, we assessed the sensitivity of our results to decisions around specification by estimating a series of additional models using permutations of the set of correlates in Fig. 1. Given limits on journal space, we present the figures depicting these ancillary models in the online-only Supplementary Information for our paper which is hosted alongside our data and code on the Open Science Framework: https://osf.io/9ypx6/. SI Fig. 1 and SI Figs. 2 through 6 respectively depict estimates from the aforementioned age-adjusted model and the aforementioned models using Decayed Cumulative Calls with Deviant/Non-Deviant Colleagues constructed with decay factors ranging from 10% to 50%. SI Fig. 7 depicts estimates from a model fit using Decayed (5%) Cumulative Calls with Deviant/Non-Deviant Colleagues and that adjusts for no social ecology indicators, which could moderate the impact of race/ethnicity on the intensity of misconduct as non-White officers may be disproportionately assigned to communities more conducive to misbehavior—namely those that are disadvantaged and that feature more crime (White and Kane 2013). SI Fig. 8 depicts estimates from a model fit using Decayed (5%) Cumulative Calls with Deviant/Non-Deviant Colleagues, the controls in Fig. 1, and only those risk intervals for the days on which officers respond to one or more 911 calls for service of any severity, size, and length (i.e., only those risk intervals where Non-Patrol Day = 0). SI Figs. 9 and 10 depict estimates from models that take an exposure-window-based approach whereby Decayed (5%) Cumulative Calls with Deviant/Non-Deviant Colleagues is replaced with the temporally lagged rolling sum of calls with deviant/non-deviant colleagues over the past three days (i.e., \(t_{ - 1}\) to \(t_{ - 3}\)) and over the past seven days (i.e., \(t_{ - 1}\) to \(t_{ - 7}\)). Finally, SI Figs. 11 and 12 depict results from models wherein we disaggregate our main outcome variable into binary indicators for external (i.e., civilian-facing) and internal (i.e., department-facing) misconduct.

Results using the various alternative model specifications are generally similar to those seen in Fig. 1, where none provide evidence in support of our two hypotheses. Note that in the models for which we disaggregate our dependent variable to distinguish between external and internal misconduct (SI Figs. 11 and 12), the association between race/ethnicity and the intensity of misconduct only persists for disciplined misconduct based on internal allegations (i.e., \(\hat{\beta }_{{{\text{Black}}\;({\text{Internal}} \;{\text{Misconduct}})}} = 0.349\), \(p\) \(< 0.005\); \(\hat{\beta }_{{{\text{Black}}\;({\text{External}}\;{\text{Misconduct}})}} = - 0.084\), \(p\) \(= 0.715\); \(\hat{\beta }_{{{\text{Hispanic/Latino/Spanish}}\;({\text{Internal}}\;{\text{Misconduct}})}} = 0.320\), \(p\) \(< 0.005\); \(\hat{\beta }_{{{\text{Hispanic/Latino/Spanish}}\;({\text{External}}\;{\text{Misconduct}})}} = - 0.149\), \(p\) \(= 0.574\)). That is, it is only for internal forms of misconduct—the most common of which include behaviors such as violation of sick leave policy, violation of off-duty employment policy, and absence without official leave—and not external misconduct (e.g., threatening statements and illegal searches) that we observe an association with race/ethnicity. As stated above, findings may result from Black and Hispanic/Latino/Spanish officers being: (1) more prone to misconduct; (2) assigned to areas more conducive to misconduct; (3) less likely to have their minor forms of misconduct hidden by a “code of silence;” and/or (4) held to a higher standard of conduct compared to their White counterparts. Regardless, our ancillary models suggest that the race/ethnicity of the officers in our sample is not associated with misconduct stemming from civilian-facing allegations—although note that the number of officer-day observations that see external misconduct is tiny at 115 compared to the number of officer-day observations involving internal misconduct (i.e., 964).

Finally, regarding model diagnostics, the models presented in Fig. 1 and in SI Figs. 1–12 all have p values for the global test of the proportional intensity assumption (i.e., Schoenfeld residuals compared against the Kaplan–Meier transformation of time) that are greater than 0.05. Save Age in the model using all of our covariates (SI Fig. 1), Non-Patrol Day in the external-misconduct model (SI Fig. 11), and Ethnicity: Asian in the internal-misconduct model (SI Fig. 12), covariates in our models all have p values for the effect-specific tests of the proportional intensity assumption that are all greater than 0.05. Note that results and p values for the tests of the proportional intensity assumption are not reproduced in Fig. 1 and SI Figs. 1–12 and may instead be accessed via our entire “R” workspace uploaded to the Open Science Framework: https://osf.io/g93m7/.

Discussion

From a policy standpoint, understanding if and how misconduct spreads between members of a police organization is vital for determining how to curtail police deviance and thus the most efficacious path to tangible police reform. Accordingly, here we have probed whether police misconduct might be “contagious,” such that a propensity to engage in deviant behavior increases through direct interaction with officers who have themselves engaged in misconduct in the past. In doing so, we have tested for the operation of a micro-level relational mechanism reflecting our supposition that police deviance is the result of differential exposure to errant colleagues in line with the dynamic structure of the department-spanning social networks within which members of a police force are routinely embedded. Critically, this mechanism is in agreement with cultural explanations of police behavior and the thrust of prior applications of social learning theory and social control theory to the study of police deviance—both of which underscore the diversity of officers’ dispositions, experiences, social ties, and social interactions, and thus their behavioral outcomes (e.g., see Campeau 2015; Herbert 1996; Ingram et al. 2018; Ouellet et al. 2019; Paoline 2003; Paoline and Gau 2018; Quispe-Torreblanca and Stewart 2019; Wood et al. 2019). Nevertheless, empirical support for this relational mechanism was found to be lacking. Specifically, results from our case study of sanctioned misbehavior and ad hoc workplace collaboration amongst 3475 uniformed members of the Dallas Police Department provided no evidence to compellingly suggest the “contagion” of police misconduct. Rather, the observed and unobserved traits of individual officers—i.e., their tenure, disciplinary history, individual proneness, and, in some cases, their race—appear to have the clearest association with whom ultimately steps out of line.

Do our results, then, provide support for “bad apples” theories of police behavior which emphasize correlates of misconduct such as specific personality traits (e.g., authoritarianism) and a lack of self-control (see, e.g., Donner 2018; Donner et al. 2016a; Worden 1996)? Again, our findings do point to several individual-level correlates of misconduct as well as unobserved officer-specific heterogeneity in misconduct, even after adjusting our models for factors related to the locations wherein officers respond to 911 calls. Nevertheless, the individual-level associations we observe do not necessarily indicate “bad apples” in the sense that our findings are wholly driven by officers who are psychologically or temperamentally predisposed to engaging in misconduct. This is because some portion of the observed associations between officers’ traits and misconduct may still be explained by unmeasured or unmeasurable factors (e.g., omitted variables related to situation or social context).

For instance, regarding racial/ethnic differences in officer behavior, Ba and colleagues (2021, p. 696) recently stressed that “rigorous evaluation of the effects of police diversity has been stymied by a lack of sufficiently fine-grained data on officer deployment and behavior that makes it difficult or impossible to ensure that officers being compared are facing common circumstances while on duty.” While we adjusted our models for the characteristics of the neighborhoods wherein officers responded to calls for service, our data are limited as we lack the kind of granular information on daily assignments and officer deployment that Ba et al. use in their study of the Chicago Police Department. Indeed, Ba and colleagues (2021, p. 699) were able to compare the behavior of Black versus White officers “given the same patrol assignment, in the same month, on the same day of the week, and at the same shift time.” Thus, in addition to the differential standards of conduct mentioned in the Results section, the associations between race/ethnicity and misconduct that we observe may in part stem from Black and Hispanic/Latino/Spanish officers being formally assigned to duties that tend to place them in situations that make misconduct more likely (e.g., patrol duty, in contrast to community policing or traffic duty) or assigned to certain shifts of work (e.g., night) or to days of work (e.g., weekends) that produce more situations conducive to misconduct.

Similarly, research suggests that less-experienced officers tend to be more active, patrol more aggressively, and initiate more citizen contacts than more experienced officers (see Worden 1996 for a discussion). Accordingly, that less-experienced officers in our data are associated with a greater intensity of misconduct may be a byproduct of these officers being in more situations conducive to misconduct rather than these officers being “bad apples” in the sense that they are inclined to disregard departmental policies. Moreover, as one of the lessons that officers learn in training and from their peers is how to “lay low” in order to avoid undue attention from superiors (see Paoline 2003), more experience yields a greater ability to evade situations conducive to misconduct as well as a greater understanding of how to conceal one’s misbehavior if desired.

This latter issue about concealing behavior underscores the limitations of relying exclusively on procedurally generated administrative data to study the behavior of police. Although our approach has allowed us to analyze official records of conduct for a substantial number of officers, we have zero control over data quality such that we may be missing unrecorded incidents of police misconduct—especially as our analyzed data only concern officially reported and formally sanctioned deviance. Of course, this lack of control is inherent to any study relying on documents and datasets created by third parties such as government agencies and private firms. Nevertheless, use of department records is particularly problematic to the extent that police sub-culture is characterized by codes of silence whereby deviance is under-reported (Cancino and Enriquez 2004; Ivković 2003). Furthermore, our use of department records is extra troublesome to the extent that misconduct is systematically overlooked for the purposes of compiling official records—particularly if, hypothetically speaking, the DPD had a culture of “noble” deviance (Punch 2000; Wolfe and Piquero 2011) during the study period whereby rules were routinely “bent” or widely flouted in the interest of, for example, “better” policing outcomes (e.g., an arrest) or bolstering officer safety (e.g., use of emergency lights, speeding, and brandishing weapons in association with putatively mundane incidents; see Sierra-Arévalo 2021).

Ultimately, what we can say given our analysis is that for the particular type of social tie examined in this study—i.e., collaborative interactions formed through joint responses to 911 calls—we do not find evidence suggestive of the “contagion” of police misconduct. In practical terms, what solutions should then be pursued to address police deviance? Keeping top of mind that our study is observational and based only on two years of data from just one large U.S. police department, results suggest that interventions focused on individual officers, including the termination of repeat offenders, may be fruitful for curtailing police malfeasance irrespective of the social connections that deviant officers have to other members of a police force. Indeed, recent work by Rozema and Schanzenbach (2019) focused on the behavior of thousands of police officers, detectives, and sergeants employed by the Chicago Police Department (CPD) indicates that the vast majority of these individuals were not problematic. That is, Rozema and Schanzenbach (2019) find that just 1% of the CPD employees (120 in total) received the bulk of citizen complaints and generated the costliest damage pay-outs in litigation—circa $6 million between 2009 and 2014 (not including legal fees)—leading the authors to also conclude that interventions should be concentrated on the very worst offenders. And, as our results suggest that a history of sanctioned misconduct fuels future misbehavior (see also Donner 2018), early interventions focused on new offenders may be key to avoiding the escalation of deviance (Ridgeway 2018) and the formation of the egregious offenders observed by Rozema and Schanzenbach (2019)—especially if the relatively minor forms of police deviance examined here are precursors to more dangerous behavior such as officer-involved shootings (Ridgeway 2016; see also Punch 2000, pp. 315–317). That said, the efficacy of removing problem officers is not without debate, as evidenced by a recent lively exchange between Chalfin and Kaplan (2021) and Sierra-Arévalo and Papachristos (2021). Specifically, whereas evidence originally presented by Chalfin and Kaplan (2021) suggests that incapacitating “bad apples” within a police department will have modest effects on deviance, Sierra‐Arévalo and Papachristos (2021) revisit Chalfin and Kaplan’s (2021) findings to instead demonstrate notable reductions in misconduct when removing problem officers should network spillover in police behavior occur.

To conclude, we return to the idea that whether police misconduct is found to be “contagious” may depend on the nature of the measured relationship between the officers of interest. Thus, we end by raising the possibility that “contagion” may indeed play out across the DPD, but that the formal on-the-job interactions we have studied here are simply not conduits for social influence despite our opening assertion that they should reinforce and encourage the informal intra-force social ties which, based on prior research (e.g., see Gallupe et al. 2019; Paluck et al. 2016; Ragan et al. 2014), are likely implicated in workplace social learning (e.g., friendship, advice giving). Two considerations lead us to consider such a conclusion.

First, recall that collaborative partners for the purposes of responding to 911 calls for service are, in many respects, forced upon each DPD officer such that these individuals are likely to have little to no control over who they get to work with. As discussed in detail in “Appendix 7,” homophily (i.e., choice of social contacts based on similarity stemming from factors such as race, taste, and personality) seriously confounds studies of peer influence (Shalizi and Thomas 2011). Yet, the workplace social contacts that officers actively choose may be far more relevant to their behavior. And, as Doreian and Conti’s (Doreian and Conti 2017; Conti and Doreian 2010) ethnographic research on the formation of a friendship-like network inside a U.S. police academy demonstrates, members of law enforcement may clearly prefer social relationships with some colleagues over others, despite the overall number of connections between members of a force being large (i.e., high network density).

Second, both Ouellet et al. (2019) and Quispe-Torreblanca and Stewart (2019) find that officers exposed to colleagues who have engaged in malfeasance are indeed more likely to engage in improper behavior. Recall that the manner in which social ties are drawn between police in both of these studies differs starkly from our own. Again, we have given primacy to direct, ad hoc collaborative interactions for the purposes of 911 call response which make up an important component of routine law enforcement. On the other hand, Ouellet et al. (2019; see also Wood et al. 2019; Zhao and Papachristos 2020) analyze co-offending relationships (i.e., whether Chicago police officers are co-named in a formal complaint), while Quispe-Torreblanca and Stewart (2019) focus on connections reflective of formal organizational hierarchy (i.e., whether London police officers are in the same “peer group” simply defined as being “assigned to the same line manager”). Although both pioneering studies are clever in their designs, they respectively rely on: (1) deviant relations amongst a subset of deviant officers who are perhaps predisposed to stepping out of line, excluding relations these individuals have to non-deviant peers; and (2) institutional connections that may or may not reflect meaningful direct interaction. Indeed, in designing our analysis, we have heeded the recommendations of the authors of these pioneering studies to examine the broad social structure of police departments by analyzing officers’ behavior in relation to ties to both deviant and non-deviant colleagues (Ouellet et al. 2019, p. 18; Wood et al. 2019, p. 14).Footnote 14

Still, the differences in our approach and that of Ouellet et al. (2019) and Quispe-Torreblanca and Stewart (2019) raise the possibility that our conflicting findings are attributable to differences in how officers’ social relationships are measured as well as our analysis of deviant officers alongside non-deviant officers. And our research is not meant to be an exact or conceptual replication of the studies of Quispe-Torreblanca and Stewart (2019) and Ouellet et al. (2019) due to the substantial differences between the three studies with respect to relational data, dependent variable, length of time, setting, and methodology—where future research should aim to replicate and build upon all three studies.

All in all, there are multiple ways to define and operationalize the social relationships between members of a police force that are plausibly germane to: (1) building a greater understanding of how officers might influence one another; and, more broadly, (2) clarifying the behavioral implications of police sub-culture. However, future network research on police deviance ought to make a special effort to assess whether the theoretically and empirically consistent micro-level relational mechanism we have advanced here sees support when emulating Doreian and Conti (Doreian and Conti 2017; Conti and Doreian 2010) and Ouellet et al. (2020) by analyzing data on informal, voluntary intra-force social ties such as “friendship,” who officers prefer to “hang out with” off-duty, and who officers explicitly “look to for advice and guidance” (McNulty 1994; Paluck et al. 2016). Of course, the administrative data used by Ouellet et al. (2019), Quispe-Torreblanca and Stewart (2019), and ourselves represent easy ways to measure social networks that span an entire police force and thus model the behavior of officers at scale. Yet, a shift in focus to informal face-to-face social relationships measured in the most rigorous manner (see Lee and Butts 2018) using standardized survey prompts will enable future studies on the interplay between intra-force networks and police deviance to be more readily compared and synthesized in meta-analyses (e.g., see Gallupe et al. 2019 on adolescent friendship and deviance) in order to move towards definitive and actionable conclusions about police malfeasance.Footnote 15 Furthermore, a focus on intra-force networks composed of officers’ informal, voluntary relationships will allow researchers to capitalize on models developed by sociologists for the coevolution of networks and behavior (see Greenan 2015; Gallupe et al. 2019; Snijders 2017)—arguably the “gold standard” for observational studies of peer influence via face-to-face networks—and, when appropriate, employ techniques to assess causality (see Aral and Nicolaides 2017; Eckles and Bakshy 2021; Shalizi and McFowland III 2018; and “Appendix 7”).