A simple checklist, that is all it takes: a cluster randomized controlled field trial on improving the treatment of suspected terrorists by the police

When it comes to interviewing suspected terrorists, global evidence points to harsh interrogation procedures, despite the likelihood of false positives. How can the state maintain an effective counterterrorism policy while simultaneously protecting civil rights? Until now, the shroud of secrecy of “national security” practices has thwarted attempts by researchers to test apparatuses that engender fair interrogation procedures. The present study aims to test one approach: the use of a “procedural justice checklist” (PJ Checklist) in interviews of suspected terrorists by counterterrorism police officers in port settings. Using a clustered randomized controlled field test in a European democracy, we measure the effect of implementing Procedural Justice (PJ) Checklists in counterterrorism police settings. With 65 teams of officers randomly assigned into treatment and control conditions, we compare post-interrogation surveys of suspects (n = 1418) on perceptions of legitimacy; obligations to obey the law; willingness to cooperate with the police; effectiveness of counterterrorism measures; distributive justice; feelings of social resistance to the state; and PJ. A series of multi-level linear, logistic, and ordered logit regression models are used to estimate the treatment effect, with Hedges’ g and odds ratios used for effect sizes. When compared with control conditions, implementing a PJ Checklist policy causes statistically significant and large improvements in all measured dimensions, including the willingness of suspects to obey the law (g = 1.022 [0.905, 1.138]), to cooperate with the police (g = 1.118 [0.999, 1.238]), distributive justice (g = 0.993 [0.880, 1.106]), effectiveness (g = 1.077 [0.959, 1.195]), procedural justice (g = 1.044 [0.930, 1.158]), and feelings of resistance towards the state (g = −0.370 [−0.259, −0.482]). PJ checklists offer a simple, scalable means of improving how state agents interact with terrorism suspects.
The police can use what is evidently a cost-effective tool to enhance legitimacy and cooperation with the police, even in a counterterrorism environment.

In this study, we present the results of the first national test of one type of intervention: a prospective experiment in which counterterrorism officers were tasked with applying a procedural justice checklist during their engagement with suspected terrorists. This checklist is similar to those that have been applied by pilots, surgeons, and NASA astronauts (see Gawande 2009). This reminder to comply with a rigid "code of practice" meant providing suspects with information about their arrest, allowing them to voice their concerns, and treating them impartially. The checklist was hypothesized to alter how suspects perceive interactions with counterterrorism officers across multiple dimensions, as well as their overall approval of policing.
What do we know about the treatment of suspected terrorists?
Research evidence shows that most people experience "morality silence" when asked how to respond to individuals suspected of terrorism (Kugler and Cooper 2010), i.e., a tendency to look the other way when the procedural rights of "enemies" are denied. These rights, ranging from freedom from torture, to reasonable access to legal aid during pretrial arrest or preventative detention, to respectful treatment by law enforcement officers, are commonly subverted worldwide (Kreps 2014). This moral and procedural ambivalence is most stark when it comes to the treatment of suspected and convicted terrorists (Gronke et al. 2010; Hewitt 1990). The greater people's sense of insecurity, the lower their support for civil liberties (Davis and Silver 2004; Viscusi and Zeckhauser 2003). Similarly, the indiscriminate violent acts perpetrated by terrorists severely provoke us, often leading to greater support for a "tougher hand" (Randahl 2018).
Given the above, it is unsurprising that police and security services use the full extent of legal measures to identify terrorists or gather intelligence about future terrorist attacks (Bamford 2005;Darmer 2002;Godsey 2005;Gross 2001). Further, in the context of an imminent attack, officials sometimes consider it justifiable to use more extreme approaches (Brecher and Brecher 2007;Dershowitz 2002). However, it becomes challenging to justify unusual techniques for the purpose of identifying a suspected terrorist. There are three objections commonly made to doing so, which we set out below.
First, applying tough or extreme measures is difficult to justify when the subject is merely a suspect or (ultimately) innocent (Curzer 2006). Law enforcement officers, like members of any profession, are fallible, and both clinical and actuarial assessments are prone to false positives and false negatives (Jonas and Harper 2006; Spinney 2010; Wainer and Savage 2008). Poor intelligence can lead to inaccurate identification of alleged terrorists, sometimes in large numbers (Chang 2011; Gross 2004), and can produce vast pools of "persons of interest" that present significant challenges for both police and security services. Law enforcement agencies should therefore apply clear codes of practice with regard to the fair treatment of suspects (Dotan 2004; Perito and Parvez 2014). A strong supporting argument for this practice is utilitarian: the more often extreme measures are applied to false positives (i.e., arresting many uninvolved members of the community), the less likely that community will be to support or assist with counterterrorism efforts.
Second, obtaining information under duress or via manipulation or illegitimate means often leads to false information (Malloy et al. 2014;Sangero 2016). Following up on such leads means that actual terrorists remain at large and can continue carrying out attacks (Ginbar 2008). It is entirely conceivable that suspects will provide false information about themselves, or implicate the wrong persons, just "to give up some names", only as a way to stop the aggressive interrogation from continuing. Such false confessions and erroneous intelligence can hinder the investigation.
Third, there is a fundamental issue here: suspects should be treated professionally, respectfully, and without prejudice or bias (Grant 2007;Hirose 2014;Tankebe 2013;Watson 2019). Even if we assume that punishment for terrorism should be severe, the central rule in most democracy-based legal systems is that suspects should be treated fairly, no matter how heinous the crime of which they are suspected (McCrudden 2008). Until proven otherwise, the underlying assumption of law enforcement agencies must be that the suspect is innocent (Dotan 1999). We expect police and security services to handle such persons with respect and to accord them fundamental human rights (Rothe and Muzzatti 2004;Tyler and Wakslak 2004).
However, despite agreement about the need to treat suspects and the accused with respect (Obama 2010; Office of the United Nations High Commissioner for Human Rights 2008; Tyler 2012), the practice can fall far short of this aim. Evidence collected around the globe suggests that suspected terrorists are often treated disrespectfully or worse (Bonner 2013;Amnesty International 2017;Leslie 2017). Different ethnic, religious, or racial groups report in surveys that they experience discrimination, bias, and unfair treatment and are viewed as a security threat (Perry and Jonathan-Zamir 2014). Interestingly, the same groups of respondents often express strong and positive attitudes towards cooperating with the police and report terrorist threats to authorities (Hasisi and Weisburd 2011). What can be done to minimize the risk of improper conduct during initial interviews?
Scholars interested in changing these practices highlight the need for training (Williamson 2013), better recruitment processes (Perliger et al. 2009), increased legal oversight (Guiora 2005), videotaping interrogations (Drizin and Colgan 2000;Hyatt et al. 2017), and conducting public inquiries (Rehman 2007). However, there is a limited number of rigorous evaluations of counterterrorism strategies, and virtually none concerning matters that take place during interviews. A systematic review of counterterrorism interventions detected only seven studies that used at least moderately rigorous scientific methods (Lum et al. 2006). The shroud of secrecy that characterizes counterterrorism strategies limits access to field settings and data, let alone carefully controlled tests of tactics (Langley 2014). We know of no randomized controlled trials that have attempted to modify the behavior of interviewers and detectives as they interact with suspected terrorists.
One recent approach that has been suggested as a tool for changing how policing practice is conducted is "procedural justice" (Tyler 2006). This term refers to perceptions of the fairness of processes that police officers (or, more broadly, legal authorities) use when they interact with citizens (Jonathan-Zamir and Harpaz 2018). Surveys in various countries have shown that, in forming their views about the legitimacy of police authority, citizens sometimes focus more heavily on the quality of their interactions with the police and much less on the outcomes received from the police (Sunshine and Tyler 2003; Tyler 2006; Tyler et al. 2010). If citizens feel as though they were treated with dignity and respect and if they perceive the officer's decision as fair, they will trust the officer's decision (Tyler and Bies 1990) and feel a greater obligation to obey the law (Tyler and Wakslak 2004). Furthermore, research on police-citizen contacts involving the use of procedural justice shows a link to enhanced quality of police-citizen interactions, with citizens reporting greater satisfaction with the interaction (Mastrofski et al. 1996; McCluskey 2003; Reiss 1971; Reiss and Farrington 1991; Tyler and Fagan 2008), willingness to comply with police directives (Tyler and Bies 1990), and willingness to cooperate with the police (Murphy et al. 2008; Sunshine and Tyler 2003; Tyler and Wakslak 2004; Tankebe 2013).
Procedural justice can be contextualized more broadly within legitimacy theory. Legitimacy in law enforcement refers to the moral right of the state and its actors to govern. As such, it is multidimensional (Bottoms and Tankebe 2012). In the context of counterterrorism interviews, this intricacy is even more pronounced. If the police investigation should be viewed as "legitimate," for whichever purpose, then we would require the police officer to behave lawfully, to apply fair procedures when dealing with people, and to produce outcomes that are equally distributed between individuals of different backgrounds, but also to be effective in its mission (see review in Ariel et al. 2020).
A crucial gap in the literature is methodological: The majority of evidence on procedural justice and legitimacy is observational in design (Murphy et al. 2008;Tyler et al. 2014). A small number of studies applied randomized controlled trial methodologies to assess the effects of police behavior on legitimacy outcomes. These include the Queensland Community Engagement Trial in Australia (Mazerolle et al. 2013a) and the Scotland Community Engagement Trial (MacQueen and Bradford 2015), which involved use of "procedural justice scripts" by officers conducting breath-testing at roadblocks. The Australian experiment indicated that the procedurally just traffic stops improved the citizens' perceptions of both the individual officer and the entire police department (Mazerolle et al., 2013). Notably, however, in the Scotland study, citizens' perceptions of police legitimacy diminished with procedurally just police contact (MacQueen and Bradford 2015; but cf. MacQueen and Bradford 2017).
Similarly, Ariel et al. (2020) and Demir et al. (2018) used body-worn cameras to increase the application of procedural justice by traffic officers in Uruguay and Turkey, respectively. In the Uruguay study (Ariel et al. 2020; see also Mitchell et al. 2018), drivers who interacted with officers using body-worn cameras were more likely to report that they were listened to before the officers made decisions about their case, were treated with respect, and felt that officers expressed sincere concern for their wellbeing. They were also more likely to say that the traffic officers demonstrated neutrality in handling their cases. Collectively, the studies indicated significant improvements across various dimensions of legitimacy when officers using body-worn cameras issued tickets to traffic violators; importantly, these enhancements were linked to increases in perceived procedural justice, compared to control conditions (see Ariel et al. 2020).
How can procedural justice be increased during police interviews?
Making abstract concepts like "fairness," "legitimacy," "respect," and "fair treatment" actionable and measurable has been challenging. We know that samples of Muslims in New York City (Tyler et al. 2010) and London (Huq et al. 2011), Arab Israelis in Israel (Hasisi and Weisburd 2011; Perry and Hasisi 2018), and passengers in U.S. airports (Lum et al. 2015; Sindhav et al. 2006) are concerned about procedural justice when it comes to counterterrorism. However, how to operationalize these concepts is unclear. The change is not necessarily legal, because in most democracies the lawfulness of counterterrorism is not as much a concern as its practice; that is, black letter laws often prohibit disrespectful and unfair treatment of citizens (Clayton et al. 2000). In fact, a review of standard operating guidelines in law enforcement agencies worldwide shows that the manuals are, prima facie, within the rule of law (i.e., fair and unbiased), and the issue has always been in the application of these concepts (Schmid 2013).
Furthermore, the evidence regarding what works to increase the use of procedural justice in policing is limited and often mixed. Research tends to show an initial effect of police training, but this effect decays over time (Antrobus et al. 2018;Murphy et al. 2014). Enhanced oversight to increase the use of procedural justice is promising (Hyatt et al. 2017), though not without a potentially negative effect on the motivation and productivity of police officers (Wallace et al. 2018;Ariel et al. 2020).
Borrowing from the well-known work of Gawande (2009), we propose a different approach: using a checklist to remind officers of their code of practice during interviews with the intention of improving procedural compliance. Trials in various professions that implemented a checklist policy have shown an increase in compliance with rules and procedures on critical issues (Hales and Pronovost 2006;Weiser et al. 2010). Pilots (Degani and Wiener 1993), surgeons (Treadwell et al. 2014), astronauts (Marshburn et al. 2003), and physicians (Huis et al. 2012) were asked to use checklists to increase cooperation with specialized yet critical guidelines, and the results were similar: enhanced services and professional delivery of outcomes.
If checklists are useful in promoting health and safety, could they also be used to remind officers to be courteous, fair, and unbiased towards citizens in high-stakes interviews? Some evidence suggests that different forms of reminders are effective compared with no-reminder conditions (Langley 2014;Mazerolle et al. 2013a;Mitchell et al. 2018;Sahin et al. 2017). However, whether a checklist will successfully increase the fair and just treatment of suspected terrorists is presently unknown.

Methods
Field experiments in policing are still relatively rare, and controlled tests on counterterrorism are even rarer (Lum et al. 2006). We found no published trials applying more just or fair counterterrorism or counter-extremism processes. We were particularly interested in studies of counterterrorism powers that allow the police to operate with broad and intrusive powers to stop, search, and hold individuals suspected of being terrorists, as these powers are at the heart of potential human rights violations (Heath-Kelly 2012; Moran 2005). These laws provide security services with the discretion to stop, interview, or investigate individuals suspected of involvement in terrorism without the need for the same level of grounds required for "ordinary crimes" (see more broadly Ajzenstadt and Ariel 2008). However, we found no record of field evaluation with randomized controlled tests of these policies and practices.
Despite the lack of rigorous evaluations, in many countries, the use of such special policing powers is high at points of entry into a country, whether by sea, air, or land (hereinafter, "ports"). Counterterrorism officers are given the authority to detain and question any person, search them and their belongings, and take biometric information from them, usually without having to ask for permission. Further, it is an offense to remain silent in these interviews, and those being detained at the port have no right to a lawyer. These special powers apply when counterterrorism officers search for specific individuals based on certain intelligence upon the request of the security services (local or abroad). For example, the intelligence community might be aware of a suspected terrorist traveling between countries and alert the counterterrorism officers that a subject of interest is arriving. This has been the case with ISIS fighters who spent time in Syria and Iraq and are now returning to their home countries (Byman 2015). More broadly, the fact that some terrorists travel intercontinentally suggests that spatial clustering is modus operandi-specific (see Hasisi et al. 2019), indicating a wider range of terrorism threats and responses.
These powers to detain, intercept, or arrest can also be applied based on actuarial predictions and behavioral detection tactics, although these are likely to produce false positives (Mueller and Stewart 2016; Stewart and Mueller 2013). This is because identifying a person as a suspect at a port based on body language or overall demeanor and appearance is very difficult (Rae 2012). The likelihood of falsely identifying an innocent person as a suspected terrorist is thus very high in such settings. Take the United Kingdom as an example: The special powers to detain and question passengers in ports are found in Schedule 7 of the England and Wales Terrorism Act 2000 ('Schedule 7'), which is a national security port and border power that enables examining officers to stop, search, question and detain any individual traveling through a port. According to published information by the Home Office, 69,109 people were stopped and examined in 2011 and 2012 under Schedule 7 powers, of whom 681 were then detained. If one takes detention as a mark of success, the true positive rate is approximately 0.99%, i.e., a proportion of 0.0099 (Office for Security and Counter-Terrorism 2012). Given the seemingly small chance of apprehending a suspected terrorist without prior intelligence, and the likely high false positive rate, it seems that U.K. counterterrorism officers ought to treat people with whom they interact courteously, fairly, and respectfully, as they are likely to be innocent passengers rather than terrorists.
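As a quick check on the Schedule 7 figures quoted above, the share of stopped passengers who were subsequently detained can be worked out directly. The sketch below is illustrative only: the function name `detention_rate` is our own, and treating detention as a "true positive" follows the simplifying assumption made in the text rather than any official definition.

```python
# Figures quoted in the text (Home Office data for 2011 and 2012, Schedule 7).
STOPPED = 69_109
DETAINED = 681


def detention_rate(detained: int, stopped: int) -> float:
    """Return the proportion of stopped passengers who were detained."""
    if stopped <= 0:
        raise ValueError("stopped must be a positive count")
    if not 0 <= detained <= stopped:
        raise ValueError("detained must lie between 0 and stopped")
    return detained / stopped


rate = detention_rate(DETAINED, STOPPED)
print(f"proportion detained: {rate:.4f}")
print(f"percentage detained: {rate * 100:.2f}%")
```

Expressed as a percentage, the result is just under 1%, which means that roughly 99 in every 100 people examined were not detained.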

Study design
In the areas of counterterrorism, anti-extremism, counterinsurgency, and civil unrest, there are no rigorous evaluations using random allocation of treatments; see the debate over the use of these designs in counterterrorism in Laycock (2012) and Lum and Kennedy (2012), as well as Lum et al. (2006, pp. 510-512). Other designs in this area are asymptotically able to infer causality like randomized controlled trials (see, e.g., agent-based modeling; Andrighetto et al. 2019). However, the present study conditions lent themselves to a controlled test of the effect of procedural justice on various outcomes associated with legitimacy. As detailed herein, the random allocation of units was both feasible and ethical, and the assumptions of randomized experiments were met: for example, having a sufficiently large sample size, controlling for self-selection biases, maintaining treatment integrity, avoiding crossover contamination, and the availability of valid and reliable measurable outcomes (see review in West 2009).
This study was designed as a cluster-randomized controlled field trial to estimate the effect of a procedural justice checklist (PJ Checklist) on multiple outcomes compared with controlled conditions. We collaborated with 21 ports in this European democracy in which 65 teams were the units of randomization (hence the cluster component). We applied PJ Checklists in about half of the teams, while control teams proceeded with "business as usual." There was no spillover of cases, officers, or spatiotemporal interactions between the treatment and control teams. All officers in these clusters took part in the experiment (either as control or treatment), with no exclusions.

Settings
The experiment was initiated by the research team, who contacted the country's relevant counterterrorism authorities with a request to conduct the study. The 21 ports in this European democracy participated in a 12-month experiment beginning in March 2016, following a request sent to all sea, air, or land ports across the nation (over 100 in total). The approach to the ports with an invitation to participate was made by the country's national counterterrorism headquarters. Table 1 shows the pre-intervention comparability of participating teams and ports. None of the pre-intervention comparisons suggests a difference between clusters, meaning that randomization appears to have been successful in creating comparable ("balanced") treatment and control groups.
The study was conducted in sea, train, and international ports that handled approximately 1.5 billion passengers during the year prior to the experiment. The 65 teams comprised 451 counterterrorism officers who conducted thousands of interviews per year, based on intelligence from counterterrorism agencies and a range of counterterrorism investigation tactics, including behavioral profiling (Dowden et al. 2007; Fox and Farrington 2018). Counterterrorism officers monitor passenger movements and operate searches in cordons in embarkation and disembarkation areas. These teams, along with security services, are principally responsible for the exercise of counterterrorism powers. To emphasize, these police officers have no contact with citizens other than dealing directly with suspected terrorist and extremist threats.

Population
Our population consisted of clusters of counterterrorism officers (i.e., teams) who stop and search people suspected of involvement in the commission, preparation, or instigation of acts of terrorism. During the experimental period, 1418 subjects of interest underwent counterterrorism procedures at the participating ports and were included in the study (23.6 subjects per treatment cluster and 19.8 subjects per control cluster; χ² = .365, p > .10). Key demographic indicators of the suspects are listed in Table 2.
None of the baseline comparisons yielded any clinically significant difference (see Moher et al. 2012; Senn 1995).

Random allocation
Randomization was carried out by the research team, not the treatment delivery team. We used simple random assignment using the Cambridge Randomizer. Teams within ports were the unit of allocation, with outcomes collected at the level of suspect interviews, clustered within teams. When randomizing teams within ports, one concern was violation of the stable unit treatment value assumption (SUTVA): "non-interference" between units, otherwise described as the outcome of one unit not depending on the treatment assigned to any other unit (see Shadish et al. 2002). When there is "interference," treatment effects may be attenuated, meaning that the true impact of the independent variable on the dependent variable is masked to some degree. All previous experiments on procedural justice, for example, suffered from this problem, which complicates causal inference (Nagin and Sampson 2018). To counter this risk, we randomly assigned entire officer teams in a cluster randomized controlled trial design (Raudenbush 1997). Treatment teams were separated from the control counterterrorism officer teams. Moreover, teams did not engage with one another, did not work on the same cases, had their own assigned line managers, and did not work at the same location as other teams. As such, the likelihood of contamination is reduced.
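The logic of allocating whole teams, with each suspect interview inheriting its team's condition, can be sketched as follows. This is a minimal illustration under stated assumptions and not the Cambridge Randomizer's procedure: the team identifiers, the seed, and the helper names `assign_teams` and `arm_for_interview` are all hypothetical, and "simple random assignment" is read here as an independent fair coin flip per team.

```python
import random

ARMS = ("treatment", "control")


def assign_teams(team_ids, seed=None):
    """Simple random assignment: each team (cluster) is allocated
    independently, with equal probability, to one of the two arms."""
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    return {team: rng.choice(ARMS) for team in team_ids}


def arm_for_interview(team_id, allocation):
    """An interview is never randomized itself; it inherits the arm of
    the team that conducts it (the cluster component)."""
    return allocation[team_id]


# Hypothetical identifiers for the 65 participating teams.
teams = [f"team_{i:02d}" for i in range(1, 66)]
allocation = assign_teams(teams, seed=2016)

n_treatment = sum(arm == "treatment" for arm in allocation.values())
print(f"{n_treatment} treatment teams, {len(teams) - n_treatment} control teams")
```

Because every team is assigned independently, the two arms need not be exactly equal in size, which is consistent with the checklist being applied in "about half" of the teams.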

Instrument and the experimental procedure
As reviewed earlier, the literature identifies four dimensions that express "procedural justice" in any interaction between police and the public (Gau 2014;Jonathan-Zamir et al. 2015;Tyler and Wakslak 2004): 1. a citizen's participation in the proceedings prior to an authority reaching a decision ("voice"); 2. neutrality of the authority in its decision; 3. whether the authority showed dignity and respect throughout the interaction; and 4. trustworthy motives.
We used one instrument, the PJ Checklist, to operationalize these four dimensions by converting the items into a checklist format (see Supplementary Materials A). Each item corresponded to one of these four elements of procedural justice, and we used multiple items for each dimension in order to improve the likelihood that the construct would in fact be implemented.
The same procedure was mandated in all ports. Before the experiment, the research team communicated with each cluster (i.e., team) and explained to all supervisors the purpose of the experiment and the procedures that were agreed upon with the counterterrorism headquarters (we did not experience any resistance from the local managers). We then met with all the treatment cluster officers, explained the guidance provided on the PJ Checklist and the rationale behind this project, and consulted with them on the feasibility of the experimental procedures. Modifications were therefore incorporated into the process prior to random assignment. The focus of these meetings was to obtain "buy-in" from the field counterterrorism officers, particularly about the potential benefits of using procedural justice in terms of increasing perceived legitimacy among the passengers who undergo these interviews.
When counterterrorism officers profiled passengers as potential suspects (based on intelligence briefs or other investigative means), they would ask the suspect to accompany them to an interview room, where the suspect would be questioned with the goal of eliciting information from them. The counterterrorism guidelines in this country dictate that suspects are to be told explicitly that they are being screened for security reasons (as opposed to the possibility of being in possession of contraband, for example). As the process commenced, officers notified their duty supervisor and operations coordinator; the operations coordinators then registered the case on the data portal. The research team audited all case reference numbers against duty logs and found that all cases were delivered as assigned and that no registered case was excluded after treatment delivery.
We distributed paper copies of the PJ Checklists to the counterterrorism officers to use during their interactions with suspects. The implementation of the apparatus was mandated and managed by a local supervising officer in every treatment team; however, supervisors' presence in interviews varied across teams, ranging between 9% and 75% of interviews, with a 31% average presence across all teams. Every interaction was recorded, and a supervisor then signed off on the PJ Checklist completed by the officers. Subsequently, a PJ Checklist policy was applied to all of the treatment subjects, and no PJ Checklists were applied in control conditions (as these teams did not take part in the experimental procedure). We are unable to verify the implementation of the PJ Checklists beyond these self-reported cards and the validations of the line managers. We were not granted access to the interview room and/or any recordings made by the counterterrorism officers; the PJ Checklist formed part of the interview dossier, and we also therefore did not gain access to each PJ Checklist. We discuss this limitation below.
A senior national project lead from the country's national counterterrorism headquarters was appointed to serve as the organizational interface with the research team and provide organizational support for the project. In addition, given the distance between the different ports nationwide, we employed local project managers who were vetted to scrutinize and collect data. Their task included creating workable databases that contemporaneously recorded each case that went through the experimental pipeline.
At the end of each interview (both under treatment and control conditions), subjects were handed an envelope by the leading counterterrorism officer, in which was included an invitation to complete a research survey (which had been translated into 11 languages through a professional translation company, including backtranslation). As shown in Supplementary Materials B, the survey did not form part of the police interview, and the police officer was not asked to stay with the subjects while they completed the survey. The invitation was always made after the interview was concluded, and per the University of Cambridge's ethical guidelines, consent was requested from all participants prior to the participation in the study; the completion of the survey was voluntary. Once the participant filled out the survey, they were asked to place the completed survey in a sealed envelope. The duty supervisor collected the sealed envelopes and sent all the surveys to the research team every 2 weeks. The research team then coded the data into Excel for analysis purposes. Over 12 months, 10,225 passengers were suspected of involvement in terrorism and interacted with the counterterrorism officers in the participating ports; of those, 1418 eligible subjects consented to complete the survey-a common level of response rate to surveys in the field of procedural justice research. For example, for mail surveys, Mazerolle et al. (2013a) and Sunshine and Tyler (2003) reported response rates of 15% and 22%, respectively; for telephone surveys, Skogan (2006), Hasisi and Weisburd (2014), Jonathan-Zamir et al. (2016), and Tyler (2006) reported response rates of 35%, 40%, 58%, and 63%, respectively. 
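For comparison with the response rates cited above, the rate implied by the two counts reported for this study can be computed directly. The snippet is a simple illustration; the variable and function names are our own.

```python
# Counts reported in the text for the 12-month experimental period.
INTERVIEWED = 10_225   # passengers who interacted with counterterrorism officers
RESPONDED = 1_418      # eligible subjects who consented to complete the survey


def response_rate(responded: int, approached: int) -> float:
    """Return the survey response rate as a proportion."""
    if approached <= 0:
        raise ValueError("approached must be a positive count")
    return responded / approached


survey_rate = response_rate(RESPONDED, INTERVIEWED)
print(f"response rate: {survey_rate * 100:.1f}%")
```

The resulting figure of roughly 14% sits close to the lower of the two mail-survey rates cited above.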
We found no clinically significant differences between responders and non-responders: the percentages of non-citizens of this European democracy in the respondent and non-respondent groups were 41% and 45%, respectively; White suspects made up 19% versus 18%, respectively; and the modal year of birth was 1982 (SD = 13.1 years) and 1979 (SD = 10.6 years), respectively.

Treatment conditions
During every interrogation, the counterterrorism officer was required to use the PJ Checklist and tick the boxes as they went through the encounter. At the end of the interview, regardless of outcome (detention, arrest, or release), the officer gave the completed PJ Checklist to the duty supervisor, who would then deliver it to the research team.

Control conditions
Control teams applied a "business as usual" approach. This meant that interviews were conducted as they normally would be, without using the PJ Checklist. However, every counterterrorism encounter was recorded by the operations coordinator, and data about the encounter were recorded on the case portal.

Measures
We collected data for several dimensions. The measures were perceptions of distributive justice, effectiveness, obligation to obey the law, willingness to co-operate, and social resistance (for the latter, see Factor et al. 2011, 2013). We collected and report these separately because there is no agreement in the police legitimacy literature on the status of these variables. For example, some scholars operationalize legitimacy as the feeling of obligation to obey the law (Tyler and Fagan 2008), while others view legitimacy as a multidimensional concept, comprising procedural justice, distributive justice, effectiveness and lawfulness (Bottoms and Tankebe 2012; Tankebe 2013). The present experiment does not attempt to "settle the score." Instead, we provide treatment versus control comparisons across multiple measures, as all outcomes are potentially of interest to scholars in this field. Because the relevant theories all hypothesize outcomes in the same direction (improved perceptions of both the specific experience with the counterterrorism officers and global perceptions about counterterrorism and the law more broadly), the fundamental question then becomes one of magnitude, which we address in the analytical plan below. Overall, six outcome measures were used (but not combined, as they are likely to represent discrete theoretical dimensions).
Distributive justice This variable refers to the fairness of the outcome a person receives in comparison with what others receive (Bottoms and Tankebe 2017). We used four Likert-type items to address this dimension-"The police treated me the same way any other person would have been treated" (Sahin et al. 2017)-on a scale of 1 ("strongly disagree") to 5 ("strongly agree"). The higher the score, the greater the perception of distributive justice. The responses for the items were internally consistent (α = .91).
Effectiveness This dimension captures participants' feelings of effectiveness of counterterrorism, measured using multiple items with a Likert-type scale of 1 ("strongly disagree") to 5 ("strongly agree"): "The police are too soft on terrorism" (flipped); "The police are doing a good job in fighting terrorism." The higher the score, the more favorable the judgment of effectiveness. The collapsed dimension was sufficiently reliable (α = .71).
Obligation to obey the law This measured participants' sense of duty to obey the law. Three items were used, with responses ranging from 1 ("strongly disagree") to 5 ("strongly agree"). These were combined to create an "obligation-to-obey-law" scale, with a higher score reflecting a greater feeling of obligation (α = .96).
Willingness to co-operate with the police This measured participants' intentions to support counterterrorism efforts through the supply of information to the police. Ten questions were used. The response categories were on a scale of 1 to 4. The responses were combined to create a willingness-to-cooperate scale (α = .98), with higher scores reflecting a greater intention to cooperate with counterterrorism efforts.
Social resistance This framework offers an addition to the procedural justice model: the notions of alienation and active social resistance that are central to the social resistance framework may play a role in the link between perceptions of procedural justice and non-compliance with the law or engagement in criminal behaviors (Factor et al. 2013). Because of discrimination, members of non-dominant minority groups may feel a lack of attachment to the country and alienation from the larger society. In response, members of these groups may actively engage in terrorism, extremism, or other deviant behaviors. By engaging in such behaviors, members of such groups express their willingness and ability to defy the country and the dominant group. To operationalize this framework, we used several items scored on a 1-5 Likert scale: (a) "I often find myself objecting to the symbols of my country (e.g. the flag, the national anthem)"; (b) "I disagree with the values that my country represents"; (c) "sometimes I am opposed to what my country represents"; (d) "it is okay for people who are in a difficult situation to occasionally disobey the law"; (e) "sometimes I get so frustrated I feel like damaging public property"; (f) "sometimes my economic and social status makes me want to show others that I am angry"; and (g) "sometimes I get so frustrated I feel like protesting to express my economic and social status." The items were internally consistent (Cronbach's α = .87).
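The internal-consistency values reported for each scale above are Cronbach's α coefficients. As a minimal sketch of how such a coefficient can be obtained, including the reverse-coding of a "flipped" item, the snippet below applies the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function names and the six respondents' scores are hypothetical and purely illustrative; they are not the study's data or code.

```python
from statistics import variance


def reverse_code(score, low=1, high=5):
    """Flip a Likert response so that a high score is always favorable."""
    return low + high - score


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of scores per item)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one score per respondent")
    # Total scale score for each respondent.
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from six respondents (NOT study data).
too_soft = [4, 2, 1, 5, 2, 3]  # negatively worded item, needs flipping
good_job = [2, 4, 5, 1, 3, 3]  # positively worded item

flipped = [reverse_code(s) for s in too_soft]
alpha = cronbach_alpha([flipped, good_job])
print(f"alpha = {alpha:.2f}")
```

Reverse-coding before computing α matters: leaving a negatively worded item unflipped would make it correlate negatively with the other items and deflate the coefficient.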

Implementation check
Although we cannot ascertain the extent to which each item in the PJ Checklist was completed, we can conduct an implementation check by measuring how participants experienced the independent variable, procedural justice, using several questions on a Likert-type scale (see Supplementary Materials B). Each multi-item PJ dimension was internally consistent based on Cronbach's alpha: voice (α = .98; Mazerolle et al. 2012), respect (α = .97; Sahin et al. 2017), and trustworthiness (α = .98). Impartiality was measured with a single "flipped" item, which is consistent with other research in this area (Tankebe 2013). For each of the voice, respect, and trustworthiness scales, higher scores indicate a more favorable perception of the interaction. We used these scales to compare the treatment and control conditions on different dimensions of procedural justice.
In addition, we incorporated specific questions about procedural effectiveness, i.e., the extent to which participants recalled the efficiency of the encounter. Specifically, we asked (a) whether the counterterrorism officer introduced themselves, (b) whether the counterterrorism officer explained what legislation was being used, and (c) the participant's perception of how long it took for the interaction to be completed.

Statistical procedures
Baseline balance First, baseline characteristics, collected at the time of commencing the trial, were cross-tabulated according to the randomized cluster to assess balance and to provide an overview of the study population, both at the cluster and suspect levels (Tables 1 and 2).

Main outcomes
We ran a series of linear, logistic, and ordered logit multilevel regression models in Stata version 15.1 to estimate the treatment effect on the dependent variables (Hox et al. 2017).¹ The distribution of the outcome variables dictated the type of model we applied (linear, logistic, or ordered logit). Table 3 reports the raw means and standard deviations by treatment group for each outcome. Because all analyses were multilevel models, they account for the clustering of suspects by location or team, but we also estimated models with robust standard errors. We report exact p values or "< 0.001" where p values were given as "0.000" in raw output. Where multiple outcome testing was undertaken (for example, with procedural justice and subdomains), we report the p values and include a calculation for the adjusted threshold for statistical significance using the conservative Bonferroni correction (see Abdi 2007; Sedgwick 2012).

¹ As Wears (2002) commented, "any time there are multiple analytic options, there will be differences of opinion among statisticians about which choice is best, although most will admit that previous familiarity with a method plays a large role in their opinion. This uncertainty is compounded when there is not a 'natural' choice for the analytic unit" (p. 337). We ruled out analyzing the data with the individual participants as the unit of analysis, as many experimenters agree that the allocation unit should be the unit of analysis (Murray 1998).
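The Bonferroni adjustment mentioned in the statistical procedures can be illustrated with a short sketch: the nominal significance level is divided by the number of tests in the family, and each p value is compared against that stricter threshold. The function names, the family of five tests, and the p values below are all hypothetical and chosen only for illustration; they are not results from this study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Adjusted per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def flag_significant(p_values, alpha=0.05):
    """Map each test name to True if its p value beats the adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p values for a family of five related tests (NOT study results).
example_p = {
    "procedural justice": 0.0004,
    "voice": 0.003,
    "respect": 0.012,
    "trustworthiness": 0.0009,
    "impartiality": 0.03,
}

print(bonferroni_threshold(0.05, len(example_p)))
print(flag_significant(example_p))
```

The correction is conservative by design: a test that would pass at the nominal 0.05 level (such as the hypothetical "respect" entry) can fail once the threshold is divided across the family.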
To measure the magnitude of the differences between treatment and control conditions, we used two statistics. First, to estimate standardized mean differences and 95% confidence intervals, we used Hedges' g, given the unequal cluster sample sizes (Borenstein et al. 2011; Hedges 2007; Nakagawa and Cuthill 2007). The interpretation of Hedges' g (Hedges 2007) is the same as Cohen's d, meaning these results can be compared with previous police legitimacy experiments (Mazerolle et al. 2013b). We applied this statistic for all outcomes measured in a scale format. Second, we used odds ratios for measures that observed binary outcomes, specifically for the statements "[the] officer introduced himself" and "[the] officer explained what legislation is used."

Table 3 lists the mean participant responses post-intervention across multiple outcomes. There are noticeable differences between treatment and control clusters, all in the hypothesized direction: the application of a PJ Checklist led to higher scores on the scales for procedural justice, distributive justice, effectiveness, feelings of obligation to obey the law, and willingness to cooperate with the police, and to lower scores for social resistance. The same pattern emerges for specific items (procedural effectiveness and the various components of procedural justice) when measured independently: the treatment group expressed more favorable views than the control group. For example, while over 90% of the treatment respondents reported that the counterterrorism officer introduced themselves and explained what legislation was used to take the respondent through a counterterrorism interrogation procedure, less than half of control participants reported the same.
Of further interest is the perception of time: there were no differences between the treatment and control groups in the perceived length of the interaction with the police officers, which suggests that applying a PJ Checklist does not result in the perception of a longer encounter. The between-group differences on the other outcomes are supported by the statistical results shown in Table 4. Across the board, the independent samples t tests are all highly significant (p ≤ .0001), and the coefficients are statistically different between groups, with relatively small standard errors. Table 4 further details the magnitude of these differences.
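As a rough illustration of how a standardized mean difference of this kind is computed, the sketch below implements the textbook small-sample-corrected Hedges' g with an approximate 95% confidence interval from summary statistics. It is a simplification under stated assumptions: it treats respondents as independent and ignores the clustering adjustment that a cluster-randomized design calls for, so it would not necessarily reproduce the intervals reported in this paper. The function name and the input values are hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c, z=1.96):
    """Return (g, ci_low, ci_high) from two groups' summary statistics.

    Assumes independent observations; no adjustment for clustering.
    """
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd           # Cohen's d
    j = 1 - 3 / (4 * df - 1)                    # small-sample correction factor
    g = j * d
    # Large-sample approximation to the variance of d.
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    se_g = j * math.sqrt(var_d)
    return g, g - z * se_g, g + z * se_g


# Hypothetical summary statistics on a 1-5 scale (NOT values from Table 3).
g, low, high = hedges_g(mean_t=4.0, sd_t=1.0, n_t=100,
                        mean_c=3.0, sd_c=1.0, n_c=100)
print(f"g = {g:.3f} [{low:.3f}, {high:.3f}]")
```

Because g is Cohen's d multiplied by a correction factor close to 1 in large samples, the two statistics are interpreted on the same scale, which is what allows comparison with earlier experiments that report d.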

Main outcomes
Distributive justice Treatment participants perceived the outcomes they received in comparison with what others received as fairer than control participants (b = 1.072 [95% CI 0.818, 1.326]). The effect size was g = 0.993 (95% CI 0.880, 1.106).
Effectiveness This dimension captures participants' feelings about the effectiveness of counterterrorism policy. The findings suggest that treatment respondents did not perceive the officers and police as less efficient (b = 1.046 [95% CI 0.821, 1.271]). A similar magnitude of difference was detected for this outcome as for the other outcomes (g = 1.077 [95% CI 0.959, 1.195]).
In terms of respondents' obligation to obey the law and their willingness to co-operate with the counterterrorism effort, large differences were reported between groups (g = 1.022 [95% CI 0.905, 1.138] and g = 1.118 [95% CI 0.999, 1.238], respectively). The group-level mean differences (Table 3) tell the same story: the mean scores for the control group ranged from 56 to 58, whereas the treatment group's mean scores ranged from 81 to 87 (on a scale of 1-100), more than one standard deviation apart.
Social resistance PJ Checklists reduced the expressed social resistance and willingness of treatment participants to engage in criminal or deviant behavior compared with control conditions (b = −0.308 [95% CI −0.519, −0.096]), but had the weakest effect size (g = −0.370 [95% CI −0.482, −0.259]). This result is probably due to the fact that respondents were less likely to express perceptions such as "I often find myself objecting to the symbols of my country" or to disagree with the core values that their home countries represent (see Table 3). Still, the prevalence of these resistance perceptions was not negligible, with more than a third of respondents expressing these views.
Perceptions of procedural justice PJ Checklists led to improved perceptions of procedural justice (b = 1.272 [95% CI 1.019, 1.525]), with a large effect size (g = 1.044 [95% CI 0.930, 1.158]). The same findings and effect sizes were found for each of the four dimensions that comprise this scale-voice, trustworthy motives, respect, and impartiality-as well as for the procedural effectiveness variables. When using PJ Checklists, officers are significantly more likely to introduce themselves to suspects and to inform suspects of their legal status.
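For the binary procedural effectiveness items, the effect size is an odds ratio. A minimal sketch of an unadjusted odds ratio with a Woolf-type 95% confidence interval from a 2 × 2 table follows. The counts are hypothetical, chosen only to echo the rough pattern of "over 90%" versus "less than half" described earlier, and the calculation ignores clustering, unlike the multilevel logistic models used in the study.

```python
import math


def odds_ratio(yes_t, no_t, yes_c, no_c, z=1.96):
    """Return (odds_ratio, ci_low, ci_high) for a 2 x 2 table of counts.

    Unadjusted estimate; assumes independent observations and non-zero cells.
    """
    if min(yes_t, no_t, yes_c, no_c) <= 0:
        raise ValueError("all four cells must be positive")
    ratio = (yes_t * no_c) / (no_t * yes_c)
    # Standard error of the log odds ratio (Woolf's method).
    se_log = math.sqrt(1 / yes_t + 1 / no_t + 1 / yes_c + 1 / no_c)
    log_ratio = math.log(ratio)
    return (ratio,
            math.exp(log_ratio - z * se_log),
            math.exp(log_ratio + z * se_log))


# Hypothetical counts (NOT study data): "officer introduced himself" yes/no.
ratio, low, high = odds_ratio(yes_t=90, no_t=10, yes_c=45, no_c=55)
print(f"OR = {ratio:.1f} [{low:.2f}, {high:.2f}]")
```

An interval that excludes 1 indicates that the odds of a "yes" response differ between groups at the chosen confidence level.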

Discussion
The secrecy that characterizes counterterrorism limits the scope of experimental research in this area of law enforcement (Lum et al. 2006). As noted previously, public safety concerns in which valid estimates of intervention effects could prove beneficial are precisely the situations in which evidence is scarce; this evidence paradox obstructs any attempt to robustly test the effectiveness of these policies. However, this study was able to produce some of the first rigorous evidence regarding whether it is possible to change the behavior of officers conducting counterterrorism interviews. This is a crucial turning point in the development of evidence-led policymaking in counterterrorism (Sanderson 2002; Sherman 2013). Our experiment illustrates that rigorous impact evaluations in this extremely sensitive area of criminal justice policy are possible, and we hope that replications and follow-up research will be conducted in the future to further assess the results presented here.
Procedural justice is a key component of police legitimacy (Bottoms and Tankebe 2012, 2017). We have shown here that PJ checklists offer a simple, scalable means of improving how state agents interact with suspected terrorists. The police can enhance people's obligation to obey counterterrorism laws and improve the flow of information, even in a counterterrorism environment, with what is seemingly a cost-effective tool: a checklist that takes very little time to complete. A straightforward reminder to treat suspects in accordance with the tenets of fairness, dignity, equality, and respect has resulted in large effects, in the magnitude of more than an entire standard deviation from the control group. We found improvements in every dimension measured.
Specifically, terrorism suspects were given more information about their interrogation, were allowed to voice their concerns, were treated equally in comparison to other suspects, and reported higher overall legitimacy scores compared with control conditions. The checklist altered how suspects perceive both the "local" interaction with specific counterterrorism officers and "global" dimensions of counterterrorism policies.
Previous studies that sought to increase police legitimacy resulted in relatively modest effect sizes for procedural justice (Mazerolle et al. 2013b). The effect sizes reported in the present paper are substantially larger. This difference can be interpreted in two ways: methodological and theoretical. First, to our knowledge, this is the first cluster randomized trial in this area of research. Using a cluster-randomized design overcomes many of the threats to internal validity that occur in other randomized controlled trial designs. For example, minimizing contamination through design means that we have avoided problems that have affected other PJ experiments, e.g., the same officers being in both treatment and control conditions (Langley 2014; Mazerolle, Antrobus, et al. 2013). Through minimizing this particular threat to validity, a cleaner treatment/control comparison is possible.
Second, and more conceptually (but with a clear implication for public policy), procedural justice is important in every type of law-enforcement procedure (Nagin and Telep 2017a; Tyler et al. 2014), especially when dealing with individuals suspected of terrorism. Participants in this study experienced what must have been an emotionally and psychologically intense event. In the control group, the overall scores were far from "positive perceptions." For example, as shown in Table 3, the non-PJ Checklist group expressed overall negative views; without the checklist, most suspects did not perceive the interrogation as fair or legitimate. We can only speculate as to how such negative experiences with the police are reported in the interviewee's community, thereby perpetuating a vicious cycle of perceived illegitimacy (Skogan 2006). However, once the officers were "nudged" to be more procedurally just, suspects subsequently reported these interactions as more fair, just, and effective.
Democracies have the inalienable right to defend themselves to the fullest extent of the law. At the same time, however, even in the context of counterterrorism and extremism, societies should not look the other way when fundamental human and legal rights risk being sidetracked in the name of safety and security (Zedner 2003). As the police often rely on the public to report threats or cooperate with counterterrorism efforts, the police need to be viewed as wielding their state-bestowed powers legitimately (Hough et al. 2010). As such, it seems sensible to conduct interviews with terrorism suspects in a way that engenders the highest possible levels of perceived fairness and procedurally just conduct. Failing to do so means that the rights of innocent citizens may be routinely violated (Luban 2002), which may in turn affect wider community perceptions of police and policing legitimacy (Tyler 2012). While we have no evidence that a PJ Checklist policy leads to more general legitimacy, manifested as more cooperation or less extremism (see Nagin and Telep 2017b), this does not mean that the goal of changing perception is unimportant.² Legitimacy is one of the "ultimate values" by which policing is judged, and it creates a potential "cushion of support" for police in times of difficulty, such as terrorist attacks (Rasinski et al. 1985). If treated unfairly, why would communities cooperate? If counterterrorism interrogations apply extreme measures and violate human rights, who would dare report a suspect? These very tactics can drive some elements into extremism, in what can only be viewed as a backfire of a counterterrorism policy (Chalk 2017). Our study shows that a simple checklist may make a disproportionately large contribution to enhancing a range of perceptions associated with legitimacy in one of the earliest phases of the counterterrorism interview process.
Given the evidence presented here, a practical policy implication can be identified: incorporating a PJ Checklist should be part and parcel of counterterrorism policing's standard operating procedure. The approach is inexpensive, and the PJ Checklist can be adapted for and tested in other jurisdictions, practices, or needs. We would also argue that if checklists are useful for counterterrorism, they are likely to be valuable in other contentious law-enforcement encounters: stop-and-accounts, searches, and criminal interviews more broadly. However, replications, preferably in diverse samples, are still required to confirm this.
At the same time, if officers will not adhere to the code of practice which serves as the foundation for the PJ Checklist, and if line managers will not interject when the checklists do not serve their purpose, the PJ Checklist policy will be "toothless" (Ariel 2012). An organizationally-led implementation policy is necessary for the rules to be followed systematically and continuously, with clear guidelines for taking steps to deal with non-compliance (Fixsen et al. 2009).
A PJ Checklist is not a salve for egregious violations of suspects' human rights. In places where outright torture is practiced, a policy of ticking boxes will not matter much. However, our study takes place in a European democracy that, broadly speaking, has a demonstrated record of securing human rights, like most developed Organization for Economic Co-operation and Development (OECD) countries. It remains true that, in similar countries, a constant clash exists between security needs and securing human rights (Hasisi and Weisburd 2011; Merin et al. 2015; Newheiser and DeMarco 2018). The counterterrorism code of practice expresses this dilemma, but it still requires that suspects be treated fairly. It turns out that officers need to be reminded to follow these rules, just like pilots, surgeons, and engineers. Therefore, for security agencies that express a genuine concern for due process, a little checklist can go a long way.
Notwithstanding these findings, we are cognizant that one of the most important pieces of the puzzle, if not the most important, is missing: we did not have the opportunity to test actual cooperation with the police in their counterterrorism mission (see Nagin and Telep 2017a, as well as The National Academies of Sciences, Engineering, and Medicine 2018, on proactive policing for a broader critique). We have at least been able to show improvements in the willingness of those who interact with counterterrorism officers to act upon the tested reflections. Differences from control participants were not negligible, and treatment participants expressed a greater willingness to report various terrorism-related activities to the police, including different threats to national security. As important as these perceptions are in democratic societies (Tyler 2017), and notwithstanding the view that procedural justice ought to be practiced irrespective of the question of tangible benefits (Nagin and Telep 2017b), there is no evidence linking the PJ Checklist to behavioral manifestations. We did not gain access to operational intelligence data that link those who underwent a PJ Checklist treatment (or control conditions) to future reporting of any crime to the police or the counterterrorism information hotline. Future controlled tests in police legitimacy research ought to delve into these potential benefits in greater depth.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.