Introduction

Police chiefs are often criticized for their apparent failure to dismiss officers who are guilty of gross misconduct. While there may well be patterns of such failures in the USA or other parts of the world, the question of who decides to dismiss police officers (or not) is complicated by the legal regulation of dismissal decisions. The processes and powers of police chiefs to dismiss officers vary widely, not just across countries (or US cities and states) but also within jurisdictions over time.

A prime case of changes within jurisdictions is England and Wales, where statutory regulations governing police misconduct hearings leading to dismissals have been changed at least five times since 2007 (in 2008, 2012, 2015, 2018, and 2020). From the perspective of many chief officers bound by these regulations, the most profound change occurred in the 2015 amendment of the 2012 regulations, by which senior officers (ranks above Chief Superintendent) were no longer allowed to chair ordinary misconduct hearings. While senior officers could continue to take a select minority of cases to what is now called an “accelerated” hearing if the evidence against the officer was incontrovertible, the majority of misconduct hearings from 2016 onwards have been required to be chaired by a “Legally Qualified Chair” (LQC), generally a Barrister.

These ordinary misconduct hearings have long been required to have three members, including at least one “independent member” who is not a police officer. Before 2016, the other two members were generally police leaders, one of whom was a Chief Officer required to serve as Chair. These panels took into account oral testimony and other evidence presented at the hearing in reaching a series of findings, beginning with whether there was a case to answer. From 2016 onward, the same procedures were followed, but with an LQC in the chair, working with an independent member and a police officer.

In repeated controversies both before and after the new rules took effect in 2016, Chief Officers have been blamed for not dismissing officers against whom there was substantial evidence of serious misconduct. Yet some have recently challenged that criticism on the grounds that, in the majority of cases against police officers, dismissal is not within the power of any Chief Constable or Commissioner of the 43 territorial police forces in England and Wales. This public debate has heightened concern about whether using LQCs to chair misconduct hearings has reduced the rate of dismissal of officers whose conduct merits dismissal.

The change to LQCs chairing misconduct hearings has not been widely understood, let alone studied carefully. The purpose of this report is to examine the relevant evidence from the UK’s largest police force, the Metropolitan Police Service (MPS). That force has observed that dismissal rates have declined since the 2015 Amendment. LQCs and others have objected that other changes in the process could have caused any apparent reduction in dismissal rates, so that LQCs were not the cause. Working with the hearing records of the MPS, we examine the differences in outcomes of cases between those conducted under old and new hearing regulations, as well as possible causes of any changes in dismissal rates for those hearings.

Two Analytic Frameworks

This report also introduces a hearing-based analytic framework for examining police misconduct, as distinct from other ways of comparing misconduct investigations. That framework is appropriate for a policy analysis of issues connected to the background and viewpoint of the panel chair. Recent research on a related issue in the MPS over the same period analyzed racial disparity in bringing “referrals” to hearings, as reported in an independent review by an experienced public servant, commissioned by the MPS, with preliminary findings published in October 2022 (Casey, 2022). That review found clear evidence of racial disparity in finding a “case to answer” for each allegation made against each officer (a process that precedes any hearing on the allegations).

The initial Casey report did not address how often misconduct hearings, in which one or more allegations against an officer are presented, resulted in dismissal. That question concerns both the overall dismissal rate of officers facing a preliminary case to answer and the extent of any racial disparity in dismissal rates. The present report’s hearing-based method of analysis therefore provides a like-for-like comparison that focuses on the dismissal of individual officers, independent of the number of allegations or charges considered in each hearing.

Two Pathways to Dismissal

Part of the confusion over the powers of British police chiefs to dismiss officers may be the fact that (in England and Wales) two pathways are provided for misconduct hearings. One pathway has been officially called “special” or “accelerated” hearings and colloquially known as “fast track.” These are the minority of cases, in which the evidence is so strong (such as a criminal conviction) that a Chief Officer can hold hearings on written evidence without calling live witnesses, in order to decide whether to dismiss an officer. The majority (66%) of all misconduct hearings in our MPS data set, however, followed the “standard” track, in which a panel of three decision-makers meets under the leadership of a (voting) chair.

Because the 2015 change in legislation did not alter chief officers’ power to conduct special (or “accelerated”) hearings, this category of cases is not relevant to the impact of placing LQCs in charge of hearings and has therefore been excluded from the analysis. It is worth noting, however, that throughout the entire study period, before and after LQCs were introduced to chair standard hearings, the dismissal rate for fast track hearings remained at 92%.

Two Categories of Standard Track Chairs

It is the standard track cases that changed substantially in the period 2014 through 2018. In 2015, a new law received Royal Assent reducing the powers of chief officers in decisions to dismiss police officers by replacing them, as chairs of standard hearings, with a “Legally Qualified Chair” (LQC), who is generally a Barrister. LQCs are appointed by a police oversight authority, in this case the London Mayor’s Office of Policing and Crime (MOPAC).

Although the new provisions took effect for standard track cases from 1 January 2016, no hearings were actually chaired by LQCs until 30 March 2016. These hearings were limited to officers whose investigations were opened under the 2012 amendments to the relevant statutes; the requirement did not apply to investigations opened before 2012. As a result, even after LQC chairs were required for the more recent cases, Chief Officers continued to chair a total of 20 standard hearings for these older cases through the end of our study period on 31 March 2018, and beyond.

Prior to the change effected in 2016, the standard misconduct hearing panel was chaired by a senior police leader (defined as any officer above the rank of chief superintendent).

In the MPS, this role was generally taken by a Commander, an MPS rank equivalent to assistant chief constable in 41 other Home Office territorial forces. The rest of the panel consisted of one independent member and a less senior police officer, such as a superintendent.

Standard of Proof?

Under the change in legal framework, the only difference between standard track cases before and after late March 2016 was the qualification of the chair; there was no formal change in the standard of proof. Some observers have hypothesized, however, that in practice LQCs tended to impose a higher standard of proof than chief officers had used. In this view, LQCs tended to apply a standard closer to “beyond a reasonable doubt” than to the “balance of probabilities,” the standard that chief officers were said to have used before LQCs were required to chair standard hearings on allegations of gross police misconduct.

Some observers have also said that a trend in the legal representation of accused officers may have brought in more arguments presuming a standard of “reasonable doubt.” More experienced counsel attacking evidence from that perspective may have been received differently by LQCs than by Chief Officers. While we have no data on this point, it may be relevant to understanding what else was changing at the same time as the qualifications of the chair.

Relevant to this debate is the following statement in the Home Office guidance that has legal standing:

2.265. The more serious the allegation of misconduct that is made or the more serious the consequences for the individual which flow from a finding against him or her, the more persuasive (cogent) the evidence will need to be in order to meet that standard.

Whether cogency can be equated to probabilities in this context is not clear, and it would certainly be difficult to measure. What can be measured, however, is whether an officer was dismissed in a hearing. Every such hearing rests on the panel first confirming the basis for referring the case to a hearing: that the evidence is sufficient to create a case to answer. Whether the officer is then dismissed allows this report to focus on two questions of fact.

Two Questions of Fact

The scientific focus of this report is on two questions related to dismissal of police officers. The research does not presume that every officer subjected to a standard misconduct hearing should be dismissed. It simply uses the facts of dismissal to compare its probability across cases under a variety of different conditions.

  1. Did the requirement for LQCs to chair standard hearings reduce the probability of dismissing accused officers? If so:

  2. Was there any other explanation for a reduction in the probability of officer dismissal at the same point in time that dismissals declined (in the immediate aftermath of introducing LQCs)?

Correlation vs. Causation

To answer the first question, we can readily determine whether, and by how much, the probability of officers being dismissed for gross misconduct declined after the advent of the LQC system. What we cannot readily determine are the causes of any change in hearing outcomes. No matter how large a change occurred immediately after the LQC system was adopted, inferring the causal impact of a single factor remains a problem. The ancient Romans expressed this problem as post hoc ergo propter hoc (after this, therefore because of this). The ancients understood what we often forget in modern life: that correlation alone does not prove causation.

The simple reason correlation is not sufficient to prove causality is that many other things may happen at the same time as the two correlated factors. These alternative, rival explanations may be equally well correlated with any change to be explained, such as a change in dismissal rates. The problem is that these alternatives are generally unobserved, unless someone else presents them. It is only by examining them that we can know whether or not they really do match the correlations that we do observe.

That is why there is a well-developed science of causation that eliminates rival hypotheses. The randomized controlled trial is the core of evidence-based medicine (Millenson, 1997). It offers a method for comparing two different ways of treating similar kinds of patients while ruling out other systematic differences between the two treatment groups. By assigning each case in a large population to one treatment group or the other with equal probability, the method generally ensures similar proportions in both groups by gender, age group, body-mass index, education, diet, income, and so on. Because the two groups are highly similar at the outset of an experiment, any difference in medical outcomes beyond what chance would produce can be attributed to the medical treatment. It cannot be attributed to pre-existing differences between the groups, which random assignment has eliminated.
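A minimal simulation can illustrate why random assignment tends to balance pre-existing characteristics. The sketch below uses an invented population of 10,000 hypothetical patients with two made-up attributes (`older`, `female`); the helper names `randomize` and `share` are illustrative and not drawn from any cited study. Each patient is assigned to treatment or control with probability 0.5, and the share of each attribute is then compared across the two groups.

```python
import random

def randomize(units, seed=0):
    """Simple randomization: each unit goes to 'treatment' or 'control' with probability 0.5."""
    rng = random.Random(seed)
    groups = {"treatment": [], "control": []}
    for unit in units:
        groups["treatment" if rng.random() < 0.5 else "control"].append(unit)
    return groups

def share(group, attribute):
    """Proportion of units in a group for which a True/False attribute is True."""
    return sum(1 for unit in group if unit[attribute]) / len(group)

# Hypothetical population: 10,000 simulated patients with two invented attributes
population_rng = random.Random(42)
population = [
    {"older": population_rng.random() < 0.3, "female": population_rng.random() < 0.5}
    for _ in range(10_000)
]

groups = randomize(population, seed=1)
for attribute in ("older", "female"):
    t = share(groups["treatment"], attribute)
    c = share(groups["control"], attribute)
    print(f"{attribute}: treatment {t:.3f}, control {c:.3f}, difference {t - c:+.3f}")
```

With groups of roughly 5,000 each, the proportions typically differ by only about a percentage point; with much smaller groups, chance imbalances grow, which is one reason trials need adequate numbers of cases.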

The present analysis is not based on a randomized trial. It is simply a descriptive tracking study, one that can only observe correlations. It is not one that can provide strong proof of causation. Yet, that was also the case with cowpox victims never catching smallpox—a correlation that led Jenner to develop the smallpox vaccine. Strong correlations are always worthy of attention, if only to investigate them further.

In the long run, all scientific conclusions are provisional, as are policy decisions based on science. When the facts change (as J.M. Keynes observed), we may need to change our conclusions, policies, or practices. Most decisions are based on incomplete information, if only because urgent problems require immediate action. In the short run, observing a strong correlation is better than having no evidence at all.

Discovering strong correlations also places a logical burden of proof on those who dispute that the correlation reflects cause and effect. To claim that the correlation is what statisticians call “spurious,” they would need to marshal evidence that the initial finding can be explained by a third factor. To the extent that the initial discovery can be tested in this way, a preliminary search for obvious alternative explanations can strengthen the claim that X caused Y: in this case, that LQCs caused lower dismissal rates in standard police misconduct hearings.

Resignations Before Hearings

In the present analysis, we have tested the LQC hypothesis against a leading alternative explanation. That alternative hypothesis is that dismissals declined sharply because of the further change in law allowing police officers to resign while a gross misconduct charge is pending against them. The facts required to support this hypothesis would be as follows:

  • Officers who would have been dismissed could resign before dismissal.

  • These officers can be tracked as “would have been dismissed” cases.

  • The sum of officers actually dismissed and officers who “would have been dismissed” would show that the total separations of officers accused of misconduct has not changed from before to after the change to LQCs.

In light of that alternative hypothesis, we approached the available data with a key question: what was the effective date of the legal change that allowed officers to resign without the approval of their chief officers? We asked that question to discover whether there was a substantial period after the change to LQCs and before officers were allowed to resign without permission. The answer was that 22 months elapsed between LQCs first chairing standard hearings on 30 March 2016 and the start of officers’ right to resign without permission.

The 22 months in question therefore provide a “purer” measure of the correlation between the type of chair (LQC vs. Chief Officer) and dismissal rates for gross misconduct. During this period, dismissals could not have been substantially reduced by resignations before hearings. With no option to resign without permission before a misconduct hearing, the number of dismissals is all there is to count; there are no officers who resigned but would have been dismissed.
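The accounting behind the resignation hypothesis can be sketched in a few lines. The function `separation_rate` below is a hypothetical helper: it adds actual dismissals to resignations recorded as “would have been dismissed” and divides by the number of officer-hearings. For illustration it uses the standard-track counts reported in the Findings section (67 of 142 under Chief Officers, 31 of 92 under LQCs) and sets the “would have been dismissed” term to zero, on the assumption stated in the text that no resignation without permission was possible inside the 22-month windows.

```python
def separation_rate(dismissed, would_have_been_dismissed, officer_hearings):
    """Share of accused officers who left the force: actual dismissals plus
    resignations recorded as 'would have been dismissed'."""
    return (dismissed + would_have_been_dismissed) / officer_hearings

# Counts from the Findings section; the resignation term is zero in both
# 22-month windows because officers could not yet resign without permission.
before_lqc = separation_rate(dismissed=67, would_have_been_dismissed=0, officer_hearings=142)
after_lqc = separation_rate(dismissed=31, would_have_been_dismissed=0, officer_hearings=92)

# The resignation hypothesis predicts roughly equal separation rates once
# resignations are added back; with none to add, each rate is just the dismissal rate.
print(f"before LQCs: {before_lqc:.1%}, after LQCs: {after_lqc:.1%}")
```

This prints 47.2% before and 33.7% after. Because the resignation term is zero within these windows, it cannot close the gap between the two rates.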

Having limited the comparison of dismissals to a period uncomplicated by resignations, we then needed to balance the numbers of cases under each type of chair for the sake of statistical power. To match the number of cases in the “after-LQC” period with the “before-LQC” period, we restricted the analysis to 22 months of dismissals in each period. In that way, we also eliminated any claim that the effect of LQCs was mixed in with the effects of a right to resign in advance of dismissal.

Data

The data for this analysis were provided by the MPS Directorate of Professional Standards (DPS), drawn from the national database of records (CENTURION) covering 100% of MPS misconduct hearings on record from 2009, when digital records began, to the present, with further information from the digital TRIBUNE files on the same cases. Data collection was hampered by missing data in some of the digital records, so we could not analyze the proportion of cases found to have been gross vs. standard misconduct. We were, however, able to identify in each case whether the hearing panel determined that the case was proven.

Units of Analysis

The records reflect three units of analysis. The most numerous unit is each specific allegation of misconduct against each officer, with multiple allegations possible in a single case. The next is each officer against whom allegations were made, with multiple officers possibly accused in a single case. The least numerous unit is the case hearing, in which a decision is made whether to dismiss one (or more) officers in relation to one (or more) allegations. To link outcomes to the officers at risk, we use a combination of officers and hearings, described below as officer-hearings, so that each officer may have multiple hearings.

The initial database we examined has 2861 person-allegation observations between 2009 and July 2022. It contains individual characteristics of the accused officer, information on the type of misconduct (or “breach”), and the outcome of the misconduct hearing, including whether it resulted in one or more dismissals. There are 1160 unique cases in the database and 1209 unique officers accused of misconduct. Again, a case can involve more than one individual, and each individual can face more than one allegation (one for each breach). The same panel assesses each case, but outcomes may differ for each individual accused and for each allegation.

To assess the impact of changes in the composition of hearing panels, we created a population (100% of records) of officer-hearings. This population consists of 1307 unique individual-case combinations or, equivalently, unique individual-date combinations. In this report, a “hearing” refers to the assessment of misconduct for a unique individual on a given date. If the same hearing involved two officers, we count those as two separate observations, because each officer may have a very different outcome even within the same hearing. They would also have different case reference numbers, as listed in the Appendix.
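The conversion from person-allegation records to officer-hearings can be sketched as a simple group-by on officer and hearing date. The rows and field names below (`officer_id`, `hearing_date`, `breach`, `dismissed`) are hypothetical and do not reflect the actual CENTURION or TRIBUNE schema; the sketch also assumes that an officer-hearing counts as a dismissal if any allegation considered at that hearing led to dismissal.

```python
from collections import defaultdict

# Hypothetical person-allegation rows; field names are illustrative only.
records = [
    {"officer_id": "A", "hearing_date": "2016-05-10", "breach": "honesty", "dismissed": True},
    {"officer_id": "A", "hearing_date": "2016-05-10", "breach": "authority", "dismissed": False},
    {"officer_id": "B", "hearing_date": "2016-05-10", "breach": "honesty", "dismissed": False},
    {"officer_id": "A", "hearing_date": "2017-02-01", "breach": "conduct", "dismissed": False},
]

def to_officer_hearings(rows):
    """Collapse allegation rows into one entry per (officer, hearing date).
    Assumption: an officer-hearing is a dismissal if any allegation led to dismissal."""
    hearings = defaultdict(lambda: {"allegations": 0, "dismissed": False})
    for row in rows:
        entry = hearings[(row["officer_id"], row["hearing_date"])]
        entry["allegations"] += 1
        entry["dismissed"] = entry["dismissed"] or row["dismissed"]
    return dict(hearings)

officer_hearings = to_officer_hearings(records)
rate = sum(h["dismissed"] for h in officer_hearings.values()) / len(officer_hearings)
print(f"{len(records)} allegations -> {len(officer_hearings)} officer-hearings; dismissal rate {rate:.2f}")
```

Here four allegation rows become three officer-hearings with a dismissal rate of 0.33. Counting at the allegation level would instead give one dismissal-linked row out of four, so the hearing-based unit keeps each officer’s outcome from being weighted by the number of charges.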

Methods

We restricted the sample to the period from June 2014 to February 2018 so as to have 22 months of data before the introduction of LQCs and 22 months after. We stopped the sample in February 2018 because, before that date, officers accused of misconduct could resign rather than be assessed in a hearing only by requesting permission to do so. From February 2018 onwards, the hearings data also include officers who would have been dismissed but who resigned before the hearing date. We therefore stopped including cases in our analysis as of the last date before the first “would have been dismissed” entry.
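The 22-month windows can be derived with simple calendar-month arithmetic. The sketch below is one reading consistent with the text, assuming that March 2016 (when LQC hearings began, on 30 March) is counted as the last pre-LQC month; `shift_month`, `comparison_windows`, and `months_inclusive` are hypothetical helpers, and the report’s exact boundary handling is not stated. Under that assumption the windows run from June 2014 to March 2016 and from April 2016 to January 2018, so the sample ends before February 2018.

```python
def shift_month(year, month, delta):
    """Return the (year, month) that lies `delta` calendar months from (year, month)."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1

def comparison_windows(cutover, months=22):
    """Two equal windows of whole calendar months around a cutover month.
    Assumption: the cutover month itself belongs to the 'before' window."""
    before = (shift_month(*cutover, -(months - 1)), cutover)
    after = (shift_month(*cutover, 1), shift_month(*cutover, months))
    return before, after

def months_inclusive(start, end):
    """Number of calendar months from start to end, counting both ends."""
    return (end[0] * 12 + end[1]) - (start[0] * 12 + start[1]) + 1

before, after = comparison_windows((2016, 3))
print("before LQCs:", before, months_inclusive(*before), "months")
print("after LQCs: ", after, months_inclusive(*after), "months")
```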

Findings

The key findings of this analysis are presented in Figs. 1, 2, 3, and 4.

Fig. 1. Comparing dismissals by chair qualification

Fig. 2. Comparing dismissals of white officers by chair qualification

Fig. 3. Comparing ethnic disparity ratios in dismissals by chair qualification

Fig. 4. Comparing dismissals of BAMEH officers by chair qualification

The probability of officer dismissal in standard misconduct hearings chaired by Chief Officers was 47% (67 out of 142); in such hearings chaired by LQCs it was 34% (31 out of 92). The probability of dismissal was therefore 29% lower in LQC-chaired hearings than in Chief Officer-chaired hearings. Conversely, hearings chaired by Chief Officers were 38% more likely to decide to dismiss an officer.
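The relative differences can be checked directly from the counts. The short sketch below (plain Python; variable names are illustrative) computes the two dismissal rates, the relative reduction under LQCs, the relative increase under Chief Officers, and the absolute difference per 100 hearings. With unrounded counts the Chief Officer rate comes out about 40% higher; the 38% figure in the text corresponds to the rounded percentages (47/34 ≈ 1.38).

```python
chief_rate = 67 / 142  # standard hearings chaired by Chief Officers
lqc_rate = 31 / 92     # standard hearings chaired by LQCs

relative_reduction = 1 - lqc_rate / chief_rate  # how much lower the LQC rate is
relative_increase = chief_rate / lqc_rate - 1   # how much higher the Chief Officer rate is
extra_per_100 = 100 * (chief_rate - lqc_rate)   # absolute difference per 100 hearings

print(f"Chief Officers {chief_rate:.1%}, LQCs {lqc_rate:.1%}")
print(f"LQC rate {relative_reduction:.1%} lower; Chief Officer rate {relative_increase:.1%} higher")
print(f"about {extra_per_100:.1f} more dismissals per 100 hearings under Chief Officers")
```

The last line makes explicit that the absolute gap is roughly 13 dismissals per 100 hearings, which is a different quantity from the relative (percentage) difference.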

The difference between the two categories of hearings was even greater for white officers, who were dismissed in LQC hearings at little more than half the rate (27%) of hearings chaired by Chief Officers (46%).

This pattern yielded a large difference in the ethnic disparity of dismissals. BAMEH officers were 115% more likely than white officers to be dismissed in LQC hearings, but only 13% more likely than white officers to be dismissed in Chief Officer hearings. The disparity ratio in dismissal outcomes for the 19 BAMEH officers in LQC hearings was eleven times higher than for the 31 BAMEH officers in Chief Officer hearings.

This higher ethnic disparity in LQC decisions compared to Chief Officer decisions is due primarily to lower LQC dismissal rates for white officers and not to substantially higher dismissal rates of BAMEH officers by LQCs than by Chief Officers.

Figure 5 offers some facts relevant to the standard of proof issue. The Tribune data show that in the LQC cases, only 70% of the panels ruled that the case was proven. In the Chief Officer cases, 87% of the panels ruled that the case was proven. This means that the Chief Officer panels were 24% more likely to rule a case proven than the LQC panels, or conversely that the LQCs had a 20% lower rate of proven findings.

Fig. 5. Proportion of cases proven by chair qualification

“Noise” and Statistical Significance

The findings above are reported without tests of statistical “significance.” Such tests are designed to estimate whether a result based on a sample from a larger population is likely to be consistent with results from repeated, independently selected samples of the same population. Their use is therefore inappropriate for a population study, in which 100% of a population is included in the analysis. The logic of selecting different samples from the same population disappears when a study already includes the whole population and would draw identical cases with each new sample.

The fact that significance tests are statistically inappropriate, however, does not rule out the possibility of “noise” (Kahneman et al., 2022). Noise arises from multiple sources that can distort perceptions of data and judgments about what data mean. In this case, the question is not whether repeated samples of the current population would reach the same result. The question is whether these results would also be found in future cases that have not yet happened, including larger populations that are less prone to chance fluctuations in small numbers of cases over time.

One way to address this problem is to use significance tests only as a heuristic, or guidance tool, for imagining whether the future will look any different from the past. Used this way, a test indicates how likely a result is to be a product of chance “noise” rather than of an underlying pattern.

We therefore subjected the findings above to a variety of tests. Most importantly, when the LQC dismissal rates were compared to the Chief Officer rates as an odds ratio, the result appeared unlikely to be explained by noise alone. For the comparison in Fig. 1, the odds ratio (OR) = 0.57 (or 1/OR = 1.75), with a 95% confidence interval of 0.33–0.98. By this metric, the odds of dismissal in LQC hearings were 43% lower than in Chief Officer hearings; equivalently, the odds of dismissal in Chief Officer hearings were 1.75 times those in LQC hearings. Because the confidence interval, or range of error around the result, does not contain 1.00, the odds ratio of dismissal is statistically significant at p < 0.05.
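The reported odds ratio and interval can be reproduced from the Fig. 1 counts, assuming a standard Wald interval on the log odds ratio; the report does not state which interval method was used, but this one yields 0.33 to 0.98. The function name `odds_ratio_wald_ci` is illustrative. Note that 1/OR computed from unrounded counts is about 1.76; the 1.75 in the text is 1/0.57.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b) / (c/d) with a Wald confidence interval on the log scale.
    a, b: dismissed / not dismissed in group 1 (LQC-chaired hearings)
    c, d: dismissed / not dismissed in group 2 (Chief Officer-chaired hearings)"""
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# 31 of 92 dismissed under LQCs; 67 of 142 dismissed under Chief Officers
or_, lo, hi = odds_ratio_wald_ci(31, 92 - 31, 67, 142 - 67)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}, 1/OR = {1 / or_:.2f}")
```

An interval whose upper bound stays below 1.00 is what the text relies on when it describes the odds ratio as significant at p < 0.05.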

The same is found for LQC dismissals of white officers compared to Chief Officer dismissals. When white officers’ cases are examined in isolation, the OR for LQCs vs. Chief Officers is significant: its 95% confidence interval runs from 0.24 to 0.84, and because the interval lies entirely below 1.00, the OR is significantly lower than 1.00. This suggests that LQCs are better described as being more protective of white officers than as being more punitive towards BAMEH officers.

The other figures rely on smaller subsets of the population, so fewer cases mean a greater risk of noise and less statistical power to rule out the difference as noise. Yet the key findings concern the comparison between LQC and Chief Officer hearings for all officers, and the drop in dismissals of white officers under LQCs compared to Chief Officers.

Alternative Explanations

Because this analysis is not based on a randomized controlled trial, the difference in dismissal rates could, in theory, have been caused by many factors other than the change from Chief Officers to LQCs. Yet for any alternative explanation to be plausible, it must be articulated and tested. In the spirit of testing the findings, we examined some of these possibilities but found little evidence of changes over time that correspond directly with the change from Chief Officers to LQCs.

One possibility we examined in relation to the nature of the cases was the proportion of cases referred to accelerated or special hearings. While this proportion did change over time, it does not provide a compelling alternative explanation for the results of our analysis of standard cases. The proportion of all cases that were “fast-tracked” rather than standard rose from about 20% in 2014–2015 to some 40% in 2016–2017. This rise could, however, have reflected an increase in the number of cases qualifying for special status, simply because more officers pleaded guilty to criminal charges or because of other evidentiary factors that we were not able to measure. A full analysis of evidentiary strength in all standard and special cases over that period would have been very time-consuming and would have delayed the production of this report.

A final issue concerns cases that were found proven, but not at the level of gross misconduct. Neither the Tribune nor the Centurion data recorded this point consistently, so we are not able to include it in the analysis. Yet even if the lower dismissal rates in LQC hearings were driven by a drop in the percentage of cases found to be gross misconduct, we have no means of separating that decision from the decision to dismiss itself. The two decisions are likely to be highly correlated, since in every case both are made by the same three decision-makers rather than by two independent panels.

Conclusions

As noted above, this analysis is not a causal test of the effects of changing from Chief Officers to LQCs as chairs of standard hearings. It is only a descriptive analysis of the differences in dismissal rates that emerged soon after the change. To put it another way, for every 100 cases heard by Chief Officers rather than LQCs, we would expect roughly 13 more dismissals (about 47 rather than 34), or some 38% more. That difference could create a major reduction in the level of actual gross misconduct by MPS officers. We cannot guarantee that outcome, but we can say it is certainly possible.

While we cannot rule out the possibility that returning to Chief Officers chairing standard misconduct hearings would make no difference to dismissal rates, we can say that it could indeed increase officer dismissal rates in gross misconduct cases. The differences in these findings are substantial.

There is no evidence that the LQC system has produced any improvement in the standard of discipline at the MPS. There is, in fact, every possibility that the LQC system has made the MPS less likely to dismiss officers whose continued service poses a risk to the public. That is the range of possibilities that can be inferred from the evidence in this analysis.