Introduction

Police forces in the UK, and in particular the Metropolitan Police Service (MPS), are facing an acute crisis related to the dissipation of public trust and confidence. More generally, a catalogue of actions by police within the UK and across the world has created a challenging environment for police officers to operate within. The death of George Floyd in May 2020 in the USA and the murder of Sarah Everard in March 2021 in the UK are two such watershed moments that have hurt public confidence in law enforcement. In particular, Sarah Everard’s death sparked significant public discussions regarding both violence against women and girls and the actions of law enforcement personnel in reducing these cases. Moreover, the culture of the MPS was called into question in 2022 by Operation Hotton, the Independent Office for Police Conduct investigation (Independent Office for Police Conduct (IOPC), 2022) of officers from the MPS Charing Cross Police Station, which uncovered evidence of discrimination, misogyny, harassment, and bullying.

More recently, the MPS has been subject to a review into the standards of behaviour and internal culture, led by Baroness Casey. The contents of the report found the MPS to be institutionally racist, homophobic, and sexist (Casey, 2023). Importantly, the report makes specific mention of the MPS’ policing of London’s black community, stating “the black community are more likely to be stopped and searched, handcuffed, batoned and tasered, leading to generational mistrust within this particular community.” More generally, the report highlights the connection between stop and search and legitimacy in policing.

Similarly, a report by the Children’s Commissioner Dame Rachel de Souza (BBC News, 2023) raises concerns in relation to the strip searching of children, with 2847 children strip searched in England and Wales between 2018 and mid-2022. This report found that black children were six times more likely to be searched when compared with the overall child population.

The use of stop and search remains an enduring issue in the UK. This is particularly due to ongoing concerns about racial profiling and discrimination. Importantly, stop and search remains a primary tactic in the Metropolitan Police Service’s plan to tackle violent crime in London. Within the South Basic Command Unit (SN BCU), the power to stop and search was used on 22,505 occasions in the calendar year of 2021. This creates a significant resourcing challenge due to the volume of policing interactions, which not only consume a considerable amount of operational time but may also impact local trust and confidence in policing if not undertaken both lawfully and proportionately.

This study examines the decision-making of police supervisors in relation to stop and search to identify the presence or absence of discriminatory behaviour among these key operational officers. More specifically, we utilize a vignette survey design, presenting 15 real-world stop and search examples from within the South Basic Command Unit policing area to 118 frontline uniformed supervisors (Sergeants and Inspectors) in the MPS. These examples were obtained from the MPS’ criminal intelligence database (CRIMINT). Importantly, we introduce a randomised characteristic assignment in relation to the ethnicity of the subject featured in the vignette to test the impact on officer decision-making when the suspect is black or white. Given the collection of both Likert scale and free text responses, a combination of descriptive statistics, inferential methods, and text mining is utilized. More specifically, we employ summary statistics, t-tests, descriptive text analysis, and sentiment analysis.

Research Question

This research seeks to identify variability in officer decision-making and looks to identify key characteristics that may have an impact on that decision-making. Moreover, this research will help to shape and develop the following key areas that are critical to policing: support the development of bespoke training in relation to stop and search, promote organisational inclusivity, build a more representative police force, and increase trust and confidence in policing. More generally, the results of this study will help to inform MPS policy on the use and capacity of stop and search as a crime reduction and intelligence gathering tool. Indeed, if the results suggest that officers are relying on stereotypes and categories when making decisions about stop and search, then additional bespoke training may be necessary to help officers recognise and overcome these conscious and unconscious biases. Additionally, policy changes could be implemented to ensure that stop and search procedures are conducted fairly and without discrimination.

Prior Research

While many police officers consider stop and search to be a necessary tactic to keep communities safe, others believe that it undermines the legitimacy of the police. As a result, this remains subject to intense scrutiny from the media, politicians, and the wider communities. Furthermore, while this paper will not comment on the effectiveness of stop and search, it is important to note that prior research (Miller et al., 2000) indicated that this tactic does not have a significant impact on overall crime rates. Nevertheless, stop and search does contribute to arrests for crimes that are detectable by searches and serves as a key intelligence gathering tool. There is also a notable difference in how police organizations employ the tactic, with some forces relying more heavily than others on stop and searches to make arrests.

Legitimacy in Policing

Police legitimacy is a concept that is well established in the UK and fundamental to “policing by consent”, which is crucial in a democratic society (Her Majesty's Inspectorate of Constabulary and Fire and Rescue Services (HMICFRS), 2017). Research to identify the drivers of police legitimacy has identified four key building blocks: procedural justice, distributive justice, effectiveness, and lawfulness (Bottoms & Tankebe, 2017). The use of stop and search touches on all four of these aspects. Procedural justice, defined as “the fairness of the process employed to reach specific outcomes or decisions”, includes the quality of both decision-making and the treatment of citizens (Bottoms & Tankebe, 2017). To achieve procedural justice, police organizations need to ensure that they recognize the rights of the individual and acknowledge their humanity, so that the individual feels valued. The second building block, distributive justice, emphasizes the importance of how individuals are treated by the police. This includes the over and/or under-policing of specific communities or groups of people. The third and fourth building blocks, effectiveness and lawfulness, relate to the public’s expectation of efficacious policing and the requirement of officers to act within legislative boundaries, respectively.

In a study examining the role of procedural justice and legitimacy in shaping public support for policing, Sunshine and Tyler (2003) utilised two surveys of New Yorkers to gain an understanding of how they judged the legitimacy of policing. The findings suggest that the fairness of the practices and tactics used by the police is the key driver of legitimacy. Interestingly, both white and non-white respondents indicated that public perceptions of fair police practices affected overall views of police legitimacy. A recent study (Murray et al., 2021) conducted in several cities across England and Wales found that stop and search not only eroded trust among adolescents but also resulted in increased criminal activity in specific cases.

Effects of Stop and Search

Extensive research has been conducted on the effects and disparities of stop and search for both individuals and communities. Research utilising data from ‘Stop Watch’ and qualitative interviews with children searched demonstrated that the use of this power on children can result in severe emotional distress for the individuals involved and contribute to their engagement in criminal activities later on in life (Flacks, 2018). Another study (Hargreaves, 2018) examining the use of stop and search on the British Muslim community found statistical data to support claims that this practice was being applied in a disproportionate manner, particularly when compared to white Christian groups.

Legal Framework

In conducting research on stop and search, it is crucial to consider the factors that influence the decision-making of officers as well as the cultural and environmental context in which they work. The legal framework for stop and search requires officers to have a reasonable suspicion to utilise their legislative powers. However, “suspicion” is subjective and is affected by officers’ own judgement and operational context. While officers are trained in the legislation, many were unable to describe what reasonable suspicion meant in practice, with clear differences between officers on what would amount to reasonable suspicion (Quinton, 2011). The same research found that when a Section 60 authority (Criminal Justice and Public Order Act 1994) was implemented, which did not require reasonable suspicion (because a senior officer believed serious violence may take place in this location), an officer’s motivation to search became less focused on finding a weapon and more “reactive to the individuals’ behaviour towards officers”. This included threatening or abusive behaviour towards the officers or challenges to the officer’s authority. When considering operational settings, there were clear differences in police practice based upon the locality the officer operated in. For example, some police areas in London had a distinct performance culture while others were more focused on community policing. Quinton (2011) concluded that decisions to initiate encounters were based on broad generalisations and stereotypes based on memberships of defined social categories.

Cultural Dimensions

Police culture is comprised of two distinct working environments. There is an organisational environment which consists of interactions with supervisors and senior officers, and an operational environment that includes interactions with the public and is rife with the constant risk of physical danger. These two working environments are said to create considerable stress and anxiety for officers leading to a somewhat unique police culture (Terrill et al., 2003). Cockcroft (2007) highlights three potential descriptions of police culture:

  1. A layer of informal occupational norms and values operating under the apparently rigid hierarchical structure of police organisations.

  2. Accepted practices, rules, and principles of conduct that are situationally applied, and generalized rationales and beliefs.

  3. A patterned set of understandings which help to cope with and adjust to the pressures and tensions which confront the police.

The extent to which officers adopt this culture will be evaluated through a vignette-based study that will measure their willingness to tolerate varying levels of suspicious circumstances. Using these vignettes, we seek to gain an understanding of the police culture within an MPS BCU, which will support the organisation with the development of tailored and specific awareness training.

Despite a significant body of research on the effects of stop and search on local communities, there has been limited critical analysis of the actual decision-making process of officers in relation to specific community groups. This research will contribute to the existing body of knowledge on stop and search and provide an evidence base for consideration by senior police leaders within the MPS. The research provides insights into the decision-making processes of police supervisors when conducting stop and search on members of the black community and recommends ways to improve the use of this policing tactic.

Data

The location of this study is the MPS South Basic Command Unit (BCU), which consists of the London Boroughs of Sutton, Croydon, and Bromley. This is the fourth largest BCU in the MPS by officer count, with a population of just under one million, 52% of whom are members of minority communities (Office of National Statistics, 2021). In 2022, the South BCU undertook 16,950 stop and searches, which represented 9% of all stop and searches in the MPS. Importantly, drugs, weapons, and stolen property constituted 60.3%, 17.3%, and 13.5% of reasons for stop and searches, respectively. Furthermore, 72.3% of all stop and searches resulted in no further action being taken, with only 13.7% resulting in an arrest.

Study Design

This study utilised an online survey tool to provide vignette stories to test the variability in decision-making by supervisory sergeants and inspectors operating in the South BCU. The vignette approach was chosen as it offers a standardised approach to survey questions ensuring all participants are given the same situation and information thereby reducing variation in responses. These surveys involve presenting participants with a hypothetical scenario (vignette) which describes a situation or event, and asks the participant to provide their opinions, attitudes, or behaviours based on that set of circumstances. In short, vignettes also provide a degree of operational realism, making it easier for the participants to relate to the situation presented and make decisions in an environment that they are comfortable operating within. This serves to improve the overall quality of the data obtained. The use of vignettes also allows researchers a greater control of the variables being studied. In effect, this approach allows for the controlled isolation of specific factors and their corresponding impact on participant responses. In this case, we used black or white suspects in the same vignette description.

Focus on Supervisors

The data were generated from frontline supervisory ranks of sergeants and inspectors. Given the low average age and length of service of police constables in this police force (estimated to be under 2 or 3 years), we decided to focus on supervisors. The officers sought for the surveys had encountered many more situations that could justify stop and search than probationary and recent recruits, and would have made more decisions as to whether a stop was appropriate. Moreover, their responses indicate the content of what they might train younger colleagues to look for in deciding whether a stop meets the required legal thresholds.

Vignettes

To create an operational reality in the vignettes, the MPS Stop and Search database was searched to identify suitable real incidents to test the supervisors with. All incidents consisted of searches that had resulted in a negative outcome, that lacked strong recorded grounds, and had been supervised by a sergeant or inspector. These incidents were redacted to ensure no personal data remained and then incorporated into the online survey tool. It should be noted that the vignettes are relatively simplistic in design. However, that reflects the quality of the information provided by operational officers when completing a record of the stop and search. It was also felt that the vignettes needed to be simple to ensure that the survey could be completed within a reasonable timeframe without survey fatigue affecting the outcome of the respondents’ answers.

By ensuring respondent anonymity through an online survey tool, we believe the study encouraged responses that were open and honest reflections of the officers’ thought processes. To this extent, the use of real-world stop and search scenarios ensured that the vignette was recognised by respondents as a real situation as opposed to a contrived one. The only factual difference in the individual vignettes was the random assignment of either a black or white suspect within the vignette itself, ensuring everyone was assessed against the same circumstances.

Alongside questions on each officer’s rank, years of service, sex, and ethnicity, the survey examined officer decision-making by having respondents explain their utilisation (or lack thereof) of stop and search powers based on the information available within 15 vignettes (see Appendix 1 for vignette list). Their decision-making was scored through a scale of −5 to +5, with −5 indicating an unwillingness to search and +5 indicating a strong willingness to search. The survey also asked the participants to provide written feedback articulating their thought processes in free text boxes. Crucially, the online survey tool comprised fifteen selected vignettes and was programmed to randomly change the ethnicity of the suspect (i.e. black or white). Officers would not receive the same vignette twice with different ethnicities.
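The randomised assignment described above can be sketched in a few lines of Python. This is a minimal illustration only: the survey tool's actual mechanism is not documented here, so the independent draw per vignette, the function names `assign_conditions` and `valid_score`, and the optional seed are all assumptions.

```python
import random

ETHNICITIES = ("black", "white")  # the two conditions described in the study


def assign_conditions(n_vignettes=15, seed=None):
    """Return one randomly chosen ethnicity per vignette for a single respondent.

    Each vignette appears exactly once, so a respondent never sees the same
    vignette under both conditions. (Assumption: an independent draw per
    vignette; the real tool may have balanced the conditions differently.)
    """
    rng = random.Random(seed)
    return {v: rng.choice(ETHNICITIES) for v in range(1, n_vignettes + 1)}


def valid_score(score):
    """Check that a response lies on the -5 (unwilling) to +5 (willing) scale."""
    return isinstance(score, int) and -5 <= score <= 5


if __name__ == "__main__":
    plan = assign_conditions(seed=42)
    for vignette, ethnicity in plan.items():
        print(vignette, ethnicity)
```

Because the condition is drawn per vignette rather than per officer, each respondent is likely to see a mix of black and white suspects across the fifteen scenarios.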

Participant Selection

Given this study’s explicit focus on testing the decision-making of police supervisors who oversee officers undertaking stop and searches, lists of every supervisory officer with the rank of sergeant and inspector within the South BCU were generated with the aid of the MPS human resources department. Of the 154 eligible officers, 118 participated in this study. The work emails for each officer were then entered on the online survey tool, which provided a randomised candidate number for every officer. The survey tool then tracked the officers’ returns and allowed for follow-up emails to be sent to those that failed to respond. At no point in this study did the survey tool allow a direct link to be drawn between an individual officer and their generated candidate number, ensuring absolute anonymity. Importantly, all officers were assured of their anonymity and told that completion of the survey was optional.

Methods

We leverage a combination of descriptive statistics, inferential methods, and text mining. We use both Microsoft Excel and R to conduct these analyses. For the descriptive statistics, we employ simple summary statistics to determine the demographic characteristics of the officers involved in the study. We utilize t-tests to compare the Likert scale responses (−5 to +5) for each vignette when the suspect was black against the responses when the suspect was white.
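As an illustration of the per-vignette comparison, the sketch below computes a two-sample t statistic in Python using only the standard library. The text does not say which t-test variant was run in R, so the use of Welch's unequal-variance form is an assumption, as are the function name `welch_t` and the example scores, which are invented purely for demonstration.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two independent samples.

    Each sample needs at least two values, and at least one sample must vary.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    # Invented scores on the -5 to +5 scale, NOT data from the study.
    black_condition = [-2, 0, 1, -1, 3, 2]
    white_condition = [0, 1, 2, -1, 4, 3]
    t, df = welch_t(black_condition, white_condition)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

A p value would then be read from the t distribution with `df` degrees of freedom, which a statistics package normally handles.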

Text Mining

Text mining is a relatively new technological development and is defined as the act of “processing a collection of documents, or corpus, in which documents are converted into structured data, such that each document is described using a set of features called concepts to provide a holistic perspective of textual and non-textual information” (Mikroyannidis & Theodoulidis, 2006, 45). More generally, text mining allows for the automatic analysis of large amounts of qualitative data. Given that this study collected free text responses for each vignette, traditional qualitative research approaches such as content analysis, while appropriate, were deemed inadequate. Text mining represented the most viable methodological option as we sought to measure the frequency of word use and the sentiment of officer responses against the ethnicity of the suspect in the vignette.

Text mining necessitates a series of pre-processing procedures. For this study, data pre-processing consisted of tokenization, filtering, and stemming. More specifically, the textual data were cleaned by removing punctuation, special characters, digits, and uniform resource locator (URL) links. Tokenization, the process of breaking text into pieces of information called tokens, was conducted in order to identify meaningful keywords. Next, all stop-words were removed from the corpus. These are words such as ‘and’ or ‘the’ that do not carry information. Word stemming, a process of transforming words into their roots, was then conducted. All these data were then converted to a corpus, which is a large, structured set of texts. From here, these data were converted into a structured format from which analyses can be conducted. Finally, a vector space model was used to capture the relevant features for each document within the data.
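The pre-processing steps above can be sketched as a small Python pipeline. This is an illustrative approximation rather than the R workflow used in the study: the stop-word list is deliberately tiny, the `stem` function is a crude suffix stripper standing in for a proper stemmer, and all function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "for", "on", "i"}


def clean(text):
    """Lower-case the text and strip URLs, digits, punctuation and special characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[^a-z\s]", " ", text)  # digits, punctuation, symbols
    return text


def stem(token):
    """Crude suffix stripping (an assumption, not a real stemming algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, tokenize, drop stop-words, and stem one free-text response."""
    tokens = clean(text).split()
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def term_vectors(documents):
    """Build a shared vocabulary and one term-frequency vector per document."""
    processed = [Counter(preprocess(doc)) for doc in documents]
    vocabulary = sorted(set().union(*processed))
    return vocabulary, [[counts[term] for term in vocabulary] for counts in processed]


if __name__ == "__main__":
    vocab, vectors = term_vectors(["Search the bag", "Bag, bag: grounds?"])
    print(vocab)
    print(vectors)
```

Each row of `vectors` is one response expressed as counts over the shared vocabulary, which is the simplest form of the vector space representation mentioned above.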

Text Analysis

We leverage descriptive text analysis to identify the frequency and distribution of words used by the respondents. We also calculate the power law distribution of word use using a cumulative distribution function. It should be noted that there are limitations to this analysis. Indeed, descriptive text analysis may not necessarily consider the context in which specific words are used. This may lead to the misinterpretation or oversimplification of the significance of specific words.
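A word-frequency table with a cumulative share can be sketched as follows. The code assumes the responses have already been tokenized, and the helper names `cumulative_share` and `share_of_top_fraction` are hypothetical; it illustrates the idea of an empirical cumulative distribution over ranked words rather than reproducing the analysis run in R.

```python
from collections import Counter


def cumulative_share(tokens):
    """Return (word, count, cumulative share of all tokens) ordered by descending count."""
    counts = Counter(tokens).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for word, count in counts:
        running += count
        rows.append((word, count, running / total))
    return rows


def share_of_top_fraction(tokens, fraction):
    """Share of all tokens accounted for by the most frequent `fraction` of unique words."""
    rows = cumulative_share(tokens)
    if not rows:
        return 0.0
    top_n = max(1, round(len(rows) * fraction))
    return rows[top_n - 1][2]


if __name__ == "__main__":
    # Invented tokens for demonstration only.
    tokens = ["search"] * 6 + ["grounds"] * 3 + ["male"]
    for row in cumulative_share(tokens):
        print(row)
```

With a real corpus, a call such as `share_of_top_fraction(tokens, 0.01)` would give the proportion of all word use attributable to the most frequent 1% of unique words.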

Sentiment Analysis

We utilize sentiment analysis to gauge the tone of officers’ responses when the suspect in the vignette is black or white. Sentiment analysis is the practice of applying natural language processing (NLP) and text analysis to identify subjective information within written text (Hussein, 2018). The main goal of sentiment analysis is to automatically classify the sentiment of a piece of text as positive, negative, or neutral. To achieve this, NLP-based algorithms use various techniques such as lexicon-based analysis. Lexicon-based analysis involves assigning scores to words based on their positive or negative connotations. We use the “qdap” package in R to provide polarity scores ranging from −1, indicating negative sentiment, to +1, indicating positive sentiment. Importantly, sentiment analysis possesses two broad weaknesses. First, it may fail to recognize the ambiguity of language, as some words can have different meanings and connotations depending on the tone, cultural background, and context in which they are used. Second, it may fail to account for sarcasm and irony, which can produce incorrect scores.
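To make the idea of lexicon-based scoring concrete, the sketch below assigns +1 or −1 to words found in a small dictionary and normalises by the number of words, so the result falls between −1 and +1. It is a simplified stand-in and not the algorithm implemented in the qdap package: the word lists, the single-word negation rule, and the function name `polarity` are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon for illustration only; not the dictionary used by qdap.
POSITIVE = {"lawful", "reasonable", "justified", "clear", "safe"}
NEGATIVE = {"insufficient", "weak", "unlawful", "vague", "suspicious"}
NEGATORS = {"not", "no", "never"}


def polarity(text):
    """Score text in [-1, 1]: (positive hits - negative hits) / word count.

    A negator immediately before a lexicon word flips that word's sign.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    score = 0
    for i, word in enumerate(words):
        value = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        if value and i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score / len(words)


if __name__ == "__main__":
    print(polarity("The search was lawful"))   # 0.25
    print(polarity("Grounds are weak"))        # one negative word out of three
```

The sketch also shows the weaknesses noted above: a word such as “suspicious” scores as negative even when an officer is using it as a neutral, professional description, and sarcasm is scored at face value.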

Findings

This study has a response rate of 77%, with 118 of the 154 supervisory officers completing the survey. Table 1 presents a breakdown of key characteristics for the respondents. As is apparent, 67.8% (80) of officers were male, with 81.4% of all respondents identifying as British (white). Furthermore, 76.3% (90) were police sergeants, with 40.7% between the ages of 40 and 49.

Table 1 Characteristics of respondents

Scaled Response Results

Table 2 presents the t-test results for each vignette, comparing officers’ responses when suspects were black or white. It is important to note that a small p value does not necessarily imply that the difference in officers’ decisions to stop and search black or white suspects is large or meaningful, or that the alternative hypothesis is true. It simply means that the data offered strong evidence against the null hypothesis. Additionally, the interpretation of p values should always be made in the context of the study design, the sample size, and the potential sources of bias. As noted above, the mean score averages individual responses ranging from −5 (very unlikely to justify a stop and search) to +5 (highly likely to justify a search).

Table 2 T test results of all vignettes

Based on the results of the t-tests, it appears that officers are, in general, neither more nor less likely to stop and search black or white suspects. Indeed, the race of the suspect is seemingly inconsequential in an officer’s decision to use their power to stop and search. This can be gleaned from the mean scores for black and white suspects across all vignettes. Nevertheless, there are three vignettes where there exists a statistically significant difference in officer responses between black and white suspects. These are vignettes 9, 11, and 12, where black suspects were less likely than white suspects to be stopped and searched by officers. Moreover, the aggregated averages for black and white suspects indicate that white suspects are, overall, more likely than black suspects to be stopped and searched by officers evaluating the vignettes.

Descriptive Text Analysis

Analysis of the free text responses of participants demonstrated a power law distribution, with approximately 1% of words being used 26% of the time (see Fig. 1). Put differently, 20% of all words were used 80% of the time, abiding by the popular 80/20 principle. For context, there were 2344 unique words and 19,749 total words used by the 118 survey respondents.

Fig. 1 Word occurrence power curve

Word frequency refers to how often a particular word is used in a language or text. Figures 2a–c present the top 20 words used by respondents for all vignettes, vignettes where the suspect is white, and vignettes where the suspect is black, respectively. Upon examination, the words used across all three groups, particularly when the suspect is black or white, are the same or similar. Indeed, there is seemingly no substantive difference in the free text responses of officers when the suspect is black or white. Moreover, it is clear from these frequently used words that the respondents were searching for more information. This is evidenced by the use of the words “search”, “grounds”, “male”, “drug”, and “information”, which are held in common across all three groups. In effect, the respondents were investigating the circumstances within the vignette, wanting to gather more information in order to justify the use of stop and search powers. Furthermore, it is also clear that these words are non-malicious in nature and there is little difference between the black and white suspect groups. These frequently used words are indicative of professional conduct by the respondents to the survey.

Fig. 2 a Word frequencies (all). b Word frequencies (White). c Word frequencies (Black)

Sentiment Analysis

Table 3 presents the distribution of polarity scores for officer responses for black suspects, white suspects, and all suspects. Based on these results, there is no substantive difference between the average polarity scores of officer responses when the suspect is black or white. This is further substantiated by the closeness of the standard deviations of both groups, as the majority of sentiment scores fall close to the mean (see Fig. 3a–c). Importantly, the scores of −0.1 for white suspects and −0.11 for black suspects, while negative, are relatively close to 0, indicating value neutrality where the verbiage of officers was neither positive nor negative. Overall, the sentiment analysis corroborates the previous finding that the words used by officers convey a professional and sober search for evidence to legally justify the decision to stop and search.

Table 3 Distribution of polarity scores

Fig. 3 a Sentiment polarity (all). b Sentiment polarity (white). c Sentiment polarity (black)

Discussion and Conclusion

This study has demonstrated that, given a consistent set of circumstances in relation to stop and search scenarios, operational supervisors may apply different interpretations and weighting to the information provided. This, in turn, leads to variability in both the decision to search and their strength of conviction in relation to the use of that power. Nevertheless, when examined across the t-tests, descriptive text analysis, and sentiment analysis, this research paints a clear picture regarding the use of stop and search as it relates to the race of the suspect. It finds there is no substantive difference in the justification of stop and search powers between white suspects and black suspects.

In general, the study has identified a lack of disproportionality when comparing decisions made by officers in relation to black and white suspects. While this is a significant outcome which seemingly goes against the popular narrative regarding law enforcement’s treatment of minority communities, it may be tempered by a social desirability effect (McCambridge et al., 2014) causing supervisors to select what they thought was the “right” answer. We must, therefore, consider these issues and their potential to influence officer decision-making when interpreting these data. Indeed, as the survey was designed by a senior MPS officer, the respondents, all MPS officers, would have known that their responses, while still anonymous, would be closely examined. We must also acknowledge that these officers operate against the backdrop of significant organisational criticism of the MPS for its treatment of individuals from ethnic minority backgrounds, which has resulted in the implementation of policies and processes that hold officers to account when evidence of discriminatory behaviour is presented. This, coupled with a negative police culture of mistrust, may have influenced the responses given by officers. However, every effort was made to ensure the survey responses remained anonymous and officers were personally reassured of this throughout the survey period.

It is important to note that sample size is a critical aspect of any survey as it determines the reliability and validity of the results. This survey has focused on the South BCU and, as such, its findings should not be used to inform judgements in relation to the entire MPS. As of February 2023, the MPS has grown to 34,350 officers with 810 inspectors and 3,235 sergeants. The 154 supervisors eligible for the survey equate to 3.8% of these ranks across the MPS. Moreover, the 118 officers who responded equate to 2.9% of all first and second-line supervisors across the entire MPS. Therefore, future research should be run with a much larger sample size, perhaps focused on both Front-Line Policing (FLP) units and a number of Pan-London operational commands. This approach would include multiple BCUs that police very different areas and communities in London along with the Territorial Support Group (TSG) and Violent Crime Task Force (VCTF), which operate across all 32 London Boroughs.
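The percentages quoted above follow directly from the headcounts given in the text, as the short Python check below shows. The variable names and the `pct` helper are illustrative; the figures themselves are those reported in the paper.

```python
inspectors = 810
sergeants = 3_235
eligible = 154      # supervisors eligible for the survey in the South BCU
respondents = 118   # supervisors who completed the survey

supervisors_mps = inspectors + sergeants


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(supervisors_mps)                    # 4045
print(pct(eligible, supervisors_mps))     # 3.8
print(pct(respondents, supervisors_mps))  # 2.9
print(pct(respondents, eligible))         # 76.6 (reported as 77% in the Findings)
```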

To further enhance this research, there is an option to utilise virtual reality (VR) technology as this would provide a unique, immersive, and realistic experience for the respondents. The benefit of a VR approach is the creation of real-world scenarios, providing a vivid representation of the vignette that simulates the real-world environment, pressures, and behaviours of the suspect. This approach would allow the respondent to interact with measurable elements in a more immersive way. The impact of experiencing the vignette in a real-world environment versus reading a story is significant and should not be underestimated. There are also some considerations to this approach that need to be balanced against the benefits. First, the use of VR requires specialist and costly equipment along with additional production costs in relation to the material used. Second, with any hi-tech equipment there is always the risk of equipment failure leading to respondent frustration and skewed data. Third, the use of VR may create bias if it is not representative of the respondent’s physical environment, particularly in terms of the locations used and the suspect’s characteristics. Overall, the use of VR in surveys is an exciting development of this key research type, but it should be used judiciously with due consideration to the limitations.

The decision-making process of police officers is a critical aspect of law enforcement. It is imperative that officers make sound, effective, and lawful decisions to ensure public safety and maintain order. Poor decision-making can lead to erosion of trust and confidence in policing. However, there is often a significant variability in decision-making among police supervisors who, when presented with similar situations, can make very different decisions which can lead to inconsistencies in the application of laws and regulations. While this research demonstrates direct evidence of this variability in decision making it also, and perhaps more importantly in relation to trust and confidence, shows there is no evidence of a negative bias towards members of the black community when compared to the white community.

While the lack of disproportionality is a worthwhile finding, it must be considered against the backdrop of the survey size, the simplistic nature of the vignettes, and the Hawthorne effect. However, the survey was randomised and anonymised, with officers being personally assured of confidentiality and, as such, it is hoped that these findings are representative of their thought processes and do provide some reassurance to the South BCU’s local communities. To further test these findings and to provide an even greater understanding, additional research should consider utilising a randomised controlled trial (RCT) approach that not only considers officers’ responses to vignettes in a survey but also cross-compares actual stop and search data along with reviews of stop and search interactions on body worn video. While it is possible that these findings may be at odds with the lived experience of many in specific communities, this research should, nevertheless, encourage other researchers to design future studies that enhance our understanding of police decision-making and stop and search.