Introduction: Evidence on Policing Domestic Abuse

Since the Minneapolis Domestic Violence Experiment (MDVE) was launched in 1981, US police have completed at least 14 randomized controlled trials of policing domestic violence. Repeated proposals to replicate any of those US tests in the UK, however, have been rejected by a succession of governments, both national and local. The need for more experiments in more countries on policing domestic abuse grows greater as the evidence from the US trials gets more worrisome. While the initial US results were encouraging (Sherman and Berk 1984), or at least equivocal (Maxwell et al. 2001), longer-term evidence now suggests that for at least one population (urban African-American women victims), the policy of mandatory arrest may cause a 100% increase in premature death from all causes among victims whose abusers are arrested (Sherman and Harris 2015).

Despite resistance to rigorous testing of police responses to domestic violence, the UK has been fertile ground for the first of evidence-based policing’s three “T”s (targeting, testing, and tracking). A series of articles in Volume 1 of the Cambridge Journal of Evidence-Based Policing develops major insights about targeting the distribution of harm from domestic abuse. These studies were stimulated in part by Chief Constable Sara Thornton’s (2011) Cambridge Master’s thesis showing that a prior history of suicide attempts was the most powerful predictor of an offender going on to commit, or attempt, domestic homicide. Since most completed or attempted domestic homicides occur in dyads (unique victim-offender relationships) with no prior police contact concerning domestic abuse, it is of great practical value to be able to identify a potential marker from other kinds of police records (including prior arrests of offenders unrelated to domestic violence) as well as records of other agencies.

Other Cambridge “pracademic” (practitioner-academic) studies of targeting differential harm from domestic abuse have replicated or supported Thornton’s evidence of the predictive value of an offender’s prior suicide attempts, or suicide ideation or self-harm. Button (2016), Chalkley (2015), and Bridger (2015) all provide new UK evidence on forecasting homicide with prior suicide attempts, rather than using a responding constable’s subjective responses to an untested checklist of questions about Domestic Abuse, Stalking and Harassment (DASH).

Further empirical evidence on segmenting case risk for targeting police resources was produced in response to the Master’s thesis of Matthew Bland, Chief Analyst of Norfolk and Suffolk Police. Bland and Ariel (2015) analysed 36,000 police records over 6 years in Suffolk to examine long-term distributions of prevalence, frequency, severity, and escalation of domestic abuse. Their analysis compiled the cases across every unique dyad meeting the broad definition of “domestic” used in UK police practice (including parent-child, wife-husband, brother-sister, boyfriend-girlfriend, and same-sex couples). In a stunning falsification of the “escalation” hypothesis, Bland’s evidence showed that 76% of dyads ever reported to the police had no repeat contact of any kind in police records. Among the remaining dyads, those with one or more repeat contacts, there was some evidence of increasing frequency. But there was no statistically significant pattern of escalation in seriousness of subsequent incidents, using the Cambridge Crime Harm Index (Sherman et al. 2016) as the metric of seriousness. Bland also showed that the dyads with no repeat contact yielded only half of the total crime harm, but less than 4% of all cases yielded 80% of the total Crime Harm Index (CHI) tally.

Bland’s UK analysis was followed by Deputy Commissioner Jeanette Kerr’s analysis of 61,796 cases of intimate partner violence (IPV) in Northern Territory, Australia, which used a standard 4-year follow-up of all cases (Kerr et al. CJEBP 2017). While she did not find any evidence of escalation in either frequency or seriousness in the white population, she found clear evidence of escalation in one subset of the Aboriginal population.

What none of these targeting studies have shown is any support for the DASH risk assessment tool used widely by police across England and Wales. In the Thames Valley analysis, for example (Thornton 2011), none of the offenders charged with domestic homicide or attempted homicide who had been previously assessed with DASH had been identified as “high-risk”. Nonetheless, the DASH assessment tool has maintained a powerful hold on police policy and practice in the UK, especially in the face of proposed innovations that might deviate from a maximum sanctioning strategy.

The importance of targeting analyses comes from the major resource questions surrounding domestic abuse, which often receives more police responses than any other category of emergency calls to police. The challenge is compounded by the vast “haystack” of low-harm, no repeat domestic callouts relative to the very few “needles” of injury or death. Quite apart from the many other demands for police resources for threats besides domestic abuse, there is an important question of how best to protect victims of domestic abuse by reducing the overall harm level from that crime category. Should police invest equal resources in every domestic call they receive? Or should they somehow concentrate on the cases that pose the greatest risk of serious harm? If a police agency decides to concentrate on those with the greatest risk of greatest harm, what policy or practice can be used to reduce the total harm from domestic abuse?

The most common answer is a policy of mandatory arrest whenever police have forensic evidence sufficient to justify an arrest. This policy may provide either general deterrence of domestic abuse (for which there is no evidence) or an escalation in homicide against abuse victims (for which evidence has been identified by Iyengar 2010). Evidence on the effects of arrests on those arrested is more certain: in the US experiments, individual effects of arrest are very heterogeneous, with unemployed offenders increasing their domestic violence after arrests in three US experiments, while employed offenders were deterred by arrest (Sherman 1992).

Yet, even with a policy of mandatory arrest, most arrestees in the Southampton area of Hampshire were released shortly after arrest for lack of “prosecutability” to a likely conviction. The decision to take no further action (“NFA”) was reported in 55% of 2244 domestic abuse arrests in Southampton in 2012–2013, with only 33% charged and 22% convicted (Rowland 2013). Few might be happy with mandatory arrest as the primary response to domestic abuse knowing that most offenders are released without even an admonition within a few hours. Given the huge investment in mandatory arrest for these cases, the most incremental change may be to offer a low-cost follow-up to arrest that could help reduce total harm from IPV. We frame the question in terms of IPV only because the nature of abuse in a sexual relationship may have fundamentally different dynamics from those in other kinds of domestic relationships.

In the spirit of trying to get better results from the investment in domestic abuse arrests, Hampshire police began a journey 8 years ago with no roadmap or knowable timetable. Where they arrived was the first completed randomized controlled trial of a domestic violence policing strategy in the history of the UK (Neyroud 2017; see also Matheson et al. 2015).

Targeting and Designing IPV Harm Reduction

The origin of this report on the Cautioning and Relationship Abuse (CARA) experiment is in 2010, when then-Chief Constable Alex Marshall of Hampshire Constabulary approved Superintendent Robin Jarman’s acceptance of a Sir Anthony Bottoms Bursary to study in the Cambridge Police Executive Programme. Jarman then proposed a thesis that would design a domestic abuse experiment and asked Alex Marshall to seek approval for the design. Marshall wrote to the Cabinet Minister leading the UK’s Home Office, the national ministry responsible for local policing (Theresa May, later Prime Minister), asking permission to conduct a test of policing domestic abuse in Hampshire. The original proposal was to experiment with the half of all domestic abuse arrests that resulted in “simple cautions” (offenders being taken to police stations under arrest, only to be released with a formal caution, equal to a criminal conviction, if they signed a statement admitting guilt). The Home Office response was to refer the request to the national Director of Public Prosecutions (DPP), who responded in turn by saying that simple cautions should not be used for domestic abuse. Further, the DPP response said all cautions for domestic abuse should carry conditions. If offenders failed to meet those conditions, they should be “breached” and prosecuted for that breach as well as for the initial offences.

The golden opportunity in the DPP response was to provide support for experimenting, by random assignment, with different kinds of conditions for the cautions. That set Superintendent Jarman off on a long process of consultations. His own Master’s thesis (Jarman 2011) designed the protocol for that RCT, which was taken forward by subsequent cohorts of Hampshire police leaders who joined the Cambridge Police Executive Programme. These mid-career postgraduate students, including Scott Chilton, Joanna Rowland, and Nicole Cornelius, engaged in extensive consultations with local groups. The baton Jarman handed off to this relay team was to find a plausible means of reducing repeat offending with low-risk offenders (according to DASH assessments), who constituted the majority of those arrested at that time. As commander of the designated area for the trial in Hampshire Police and a Cambridge MSt student, Scott Chilton (2012) led the design and implementation of the CARA experiment to great success.

In retrospect, the process they completed neatly fits the recently articulated Behavioural Insights Team TEST model of innovations (BIT 2017): Target, Explore, Solution, Trial. The target was the low-risk cases receiving simple cautions. The exploration looked for locally available resources for supporting men to desist from domestic abuse, especially for intimate partners. The solution chosen was the services of the Hampton Trust, a local charity in Southampton with many years of experience in working with domestic abusers to help them reflect on, and change, their behaviour. The trial became the basis of this article.

Research Question: The CARA Experiment

The research question for this study was simply whether a certain programme would reduce the future harm that a certain group of offenders would cause by intimate partner violence compared to nearly identical offenders who were not placed in the programme. The question was not theoretically driven or “deductive,” but rather an “inductive,” trial-and-error approach to trying out a tool that was readily at hand.

The scope for the exploration of possible options for a conditional caution of domestic abusers in Southampton was not great in 2010. The cost of issuing a simple caution had been virtually zero, until the DPP banned them. The cost of providing a task by which offenders could accept a caution that required meeting a condition was potentially infinite, but resources were not. What the exploration produced was a plausible condition, provided by a locally respected charity that was within discretionary spending limits of a police force purchasing services rather than police salaries.

The context for selecting this solution for a controlled trial was the legal framework of “conditional cautions,” introduced by the Criminal Justice Act for England and Wales of 2003. Under the new statute, an offender receiving a conditional caution could avoid prosecution in court, which meant they could eliminate the potential for a prison sentence, many hours of community service, or a large fine. The price of this escape, however, was that they had to (1) sign a statement making full admissions of guilt for the offence; (2) sign an agreement to complete the condition imposed, whatever it may be; and (3) not be arrested for any new crimes or breach of the conditions within a period of time after receiving the caution, usually 4 months. Ideally, police wished to find a service provider to whom they could hand over an offender who had just signed these agreements. Then, they could wait until the provider reported back on whether the offender had completed the conditions (which closed the case) or not (which meant the offender would be charged in court for the original offence, to which they had signed a confession).

The Hampton Trust was the charity selected, offering Hampshire Police a wealth of experience in dealing with domestic abuse offenders. The Trust had been involved over two decades in the provision of services for domestic abuse victims and offenders. Police asked Hampton Trust to develop a bespoke 2-day offender workshop, in which they focus on raising awareness of abusive behaviour, especially around the safety of partners and children. Their stated objective was to move offenders from denial and minimisation towards acceptance of responsibility for harm, as well as to provide strategies for conflict resolution within the relationship. Central to the theory of the intervention was a demonstration of respect towards the offenders, while condemning their offences—not unlike the criminological theory of reintegrative shaming (Braithwaite 1989).

The exact nature of the Trust’s methods was not known to the planning team at the time of the decision to create a partnership, nor had there been any direct observations of any workshops or meetings with offenders. Both of those issues later became the subject of yet another master’s thesis, this one by Superintendent Tony Rowlinson of Hampshire Police (Rowlinson 2016). At the time of the programme launch, the Hampshire police were satisfied that Hampton Trust had the right kind of track record and reputation for meeting the difficult challenge posed by the goal of reducing future harm. They also had the modest funding that was needed to purchase the services agreed to, which the police planned to impose as the test condition for a caution.

For these practical reasons, the final trial protocol was to consist of the following elements:

  • Screening all domestic abuse arrests for eligibility

  • Confirming victim consent to the RCT (usually by telephone)

  • Confirming Crown Prosecution Service approval of a conditional caution

  • Using a computerized random assignment algorithm in the police station to randomly assign eligible cases to one treatment group or the other

  • Releasing the arrestees from police custody after all the paperwork was signed
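The random assignment step in the protocol above can be illustrated with a minimal sketch. It is not the software used in the trial: the function name, the case identifiers, and the log format are hypothetical, and simple (unblocked) randomization with equal probability for each arm is an assumption.

```python
import secrets
from datetime import datetime, timezone

ARMS = ("workshop", "control")

def assign_case(case_id: str, log: list) -> str:
    """Randomly assign one eligible case and record the allocation.

    Hypothetical helper: refuses to assign the same case twice so that an
    allocation cannot be re-drawn after the result is known.
    """
    if any(entry["case_id"] == case_id for entry in log):
        raise ValueError(f"case {case_id} has already been assigned")
    arm = secrets.choice(ARMS)  # equal probability for each arm (assumption)
    log.append({
        "case_id": case_id,
        "arm": arm,
        "assigned_at": datetime.now(timezone.utc).isoformat(),
    })
    return arm

allocation_log = []
for cid in ("HC-0001", "HC-0002", "HC-0003"):  # made-up identifiers
    print(cid, assign_case(cid, allocation_log))
```

Recording each allocation at the moment it is drawn is what allows a later audit of whether cases were treated as assigned. Simple randomization of this kind can also produce arms of unequal size.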

The control group condition was almost identical to a simple caution, of the kind that had been banned. Instead of being the final disposition, the simple conditional caution required that the offender not have any repeat offence in the next 4 months. If they failed to satisfy that condition, the offenders were told they would be prosecuted in court for both the current offence and a new offence. The control group had no other conditions to meet.

The workshop treatment group had to sign the same confessions and agreements, but also had to agree to attend, with four to seven other cautioned offenders, a 5-hour “workshop” on two separate Saturdays, 4 weeks apart, held in an upscale (but not lavish) local hotel. The workshop began in the morning with coffee, took a long lunch break, and reconvened in the afternoon, spreading (or exceeding) the 5 hours over the majority of the day. Rowlinson (2016) summarized the approach taken by the two Hampton Trust workshop facilitators as “motivational interviewing: a collaborative conversational style for strengthening a person’s own motivation and commitment to change…in which [offenders] were more likely to be persuaded by what they hear themselves say.” Some offenders attended only one session, some attended none, but most who were assigned to the workshop group attended both sessions as scheduled (see Fig. 1).

Fig. 1 CONSORT diagram of the CARA experiment

Eligibility Criteria

Both the control and workshop treatment groups were screened in on the basis of the following 12 eligibility criteria:

  1. The DASH risk assessment did not classify the offender as high risk.

  2. The CPS had been consulted and authorized a conditional caution.

  3. The offender was male and the victim female.

  4. The offender was over 18.

  5. The offender had no previous convictions or cautions for violence in the preceding 2 years.

  6. The offender had admitted the offence, or there was overwhelming evidence of the offender’s guilt.

  7. The offender was not currently subject to a community-based court order.

  8. The offender was not on court or police bail for other offences.

  9. The offence was classified as one of the following: Common Assault/Battery, Criminal Damage, Harassment, Threatening Behaviour, and Domestic Theft.

  10. The offence involved abuse against a past or present (intimate) partner or spouse.

  11. The victim had not indicated that the conditional caution would place them at significant risk.

  12. The offender had a sufficient level of English comprehension to take part in the workshops if assigned to do so.
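The twelve criteria above amount to a conjunction of checks, which can be sketched as a screening function. This is an illustration only: the record structure and every field name are hypothetical, and the trial's actual screening was carried out by custody staff rather than by code.

```python
from dataclasses import dataclass

# Offence classes named in criterion 9.
ELIGIBLE_OFFENCES = {
    "common assault/battery",
    "criminal damage",
    "harassment",
    "threatening behaviour",
    "domestic theft",
}

@dataclass
class ArrestCase:
    """Hypothetical record of one domestic abuse arrest at screening."""
    dash_high_risk: bool
    cps_authorized_caution: bool
    offender_male_victim_female: bool
    offender_age: int
    violence_record_past_2_years: bool
    admitted_or_overwhelming_evidence: bool
    on_community_court_order: bool
    on_bail_for_other_offences: bool
    offence_type: str
    intimate_partner_or_spouse: bool
    victim_reports_significant_risk: bool
    sufficient_english: bool

def failed_criteria(case: ArrestCase) -> list:
    """Return the numbers (1 to 12) of the criteria that the case fails."""
    checks = [
        not case.dash_high_risk,                          # 1
        case.cps_authorized_caution,                      # 2
        case.offender_male_victim_female,                 # 3
        case.offender_age > 18,                           # 4 ("over 18", read literally)
        not case.violence_record_past_2_years,            # 5
        case.admitted_or_overwhelming_evidence,           # 6
        not case.on_community_court_order,                # 7
        not case.on_bail_for_other_offences,              # 8
        case.offence_type.lower() in ELIGIBLE_OFFENCES,   # 9
        case.intimate_partner_or_spouse,                  # 10
        not case.victim_reports_significant_risk,         # 11
        case.sufficient_english,                          # 12
    ]
    return [number for number, passed in enumerate(checks, start=1) if not passed]

example = ArrestCase(
    dash_high_risk=False, cps_authorized_caution=True,
    offender_male_victim_female=True, offender_age=34,
    violence_record_past_2_years=False, admitted_or_overwhelming_evidence=True,
    on_community_court_order=False, on_bail_for_other_offences=False,
    offence_type="Criminal Damage", intimate_partner_or_spouse=True,
    victim_reports_significant_risk=False, sufficient_english=True,
)
print(failed_criteria(example))  # an empty list means the case is eligible
```

Returning the list of failed criteria, rather than a single yes/no, would make it possible to record why a case was screened out.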

The CARA experiment commenced in September 2012. The last case was randomized in November 2015. During that 39-month period, there were 4768 domestic abuse arrests in the CARA catchment area of Hampshire (Western Hampshire, including Southampton), of which 293 were enrolled in the experiment.

Data and Methods

All of the 4768 cases were required to have been screened for eligibility, although we were not able to fully audit the process. We did have a local data manager who tracked all of these arrests on a daily basis, producing reports not just on the experiment but on the dispositions of all arrest cases in that time period.

The arrestees were brought into a single, brand-new police station with a large “custody suite” in Southampton immediately upon their arrival for processing the arrests. The arresting officers brought the arrestees to a counter where a custody officer would perform the first stages of the eligibility checks and refer the potentially eligible cases to the Custody Investigative Team in a nearby office to contact the victim and Crown Prosecution Service for approvals, as well as the arrestee.

Once the case was deemed fully eligible, the offender’s identification details were to be entered into a computer program originally called the “Cambridge Randomizer” (Ariel et al. 2012) and subsequently named the “Cambridge Gateway.” This software created a record of each case and its treatment as randomly assigned. It was then immediately applied by the Custody Officer who processed the conditional caution on the spot. Both workshop and control group offenders were released in approximately the same number of hours after they were locked in the cell for the arrest processing period, approximately 4 to 8 h after arrival.

We know that of the 4768 cases, 1630 were assessed as “high risk” on the DASH (683 of which were disposed of by No Further Action, or NFA) and were therefore not eligible. A further 356 cases assessed as “low/medium” risk were disposed of by NFA and could not therefore be included in CARA.

After removing the high risk and the NFA cases, of the 4768 total cases, 1469 cases remained potentially eligible for inclusion in CARA. Of the 1469 cases, ultimately 293 cases were randomly assigned in the experiment to either the workshop (154) or control (139) condition. We do not know the reasons why 1076 cases were not considered for CARA: either they did not meet one or more of the eligibility criteria listed above, or the victim or offender declined to participate, or the arresting officer did not seek to establish eligibility.

We do, however, have some data on reasons for drop-out from CARA post-random assignment:

  • In seven cases (three workshops and four controls), the Crown Prosecution Service overrode the police decision to put the case in the experiment, in some cases because it was too serious and in others because it was not serious enough to dispose of by conditional caution.

  • In 20 cases (10 workshops and 10 controls), the case was accepted in error by the police, i.e. the case did not meet one or more of the eligibility criteria listed above but was randomly assigned anyway.

Subsequent to the initial disposition, which is our test for concluding that 91% of cases were initially treated as randomly assigned, there was further variability introduced by the arrestees. That variability, or differences between treatment groups in its statistical profile, is properly seen as an outcome of the treatment pathway, rather than as an issue of delivering the treatment pathway.
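The 91% figure can be checked with simple arithmetic, assuming it is obtained by removing the two kinds of post-randomization drop-out listed above from the 293 randomized cases. The text does not spell out the calculation, so this reading is an assumption.

```python
randomized = 293
cps_overrides = 7        # three workshop cases, four control cases
accepted_in_error = 20   # ten workshop cases, ten control cases

treated_as_assigned = randomized - cps_overrides - accepted_in_error
rate = treated_as_assigned / randomized

print(f"{treated_as_assigned} of {randomized} = {rate:.1%}")  # 266 of 293 = 90.8%
```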

Offenders Who Breached

Most of the variant cases were offenders’ acts that breached their conditions of the conditional caution. There were more ways to do this in the workshop group than in the control condition. Recall that control group offenders were only required not to be re-arrested in the 4 months following the disposal of the case. The workshop cases, however, were not only required not to be re-arrested in the 4 months, but also had to attend the two workshop sessions that constituted the treatment. With these additional grounds for being breached, offenders in the workshop condition had over twice as many breaches (22) as those in the control condition (9). But six of the workshop group who were breached did in fact attend two workshops and thus received their allocated intervention.

In the workshop group, the reasons for breach were as follows:

  • No workshops attended (but no re-offending)—5

  • No workshops attended and one DA arrest within 4 months—2

  • 1 workshop attended (but no re-offending)—4

  • 1 workshop attended but arrested for DA offence within 4 months—4

  • 1 workshop attended but charged with a DA offence within 4 months—1

  • 2 workshops attended but arrested for non-DA offence within 4 months—1

  • 2 workshops attended but charged with a DA offence within 4 months—4

  • 2 workshops attended but charged with non-DA offence within 4 months—1

In the control group, the reasons for the nine breaches were:

  • Arrested for DA within 4 months—1

  • Charged with a DA offence within 4 months—7

  • Charged with a non-DA offence within 4 months—1
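The breach counts listed above can be tallied to confirm the totals reported in the text. The data layout below is only an illustrative encoding of the two lists.

```python
# (workshops attended, outcome within 4 months, number of offenders)
workshop_breaches = [
    (0, "no re-offending", 5),
    (0, "DA arrest", 2),
    (1, "no re-offending", 4),
    (1, "DA arrest", 4),
    (1, "DA charge", 1),
    (2, "non-DA arrest", 1),
    (2, "DA charge", 4),
    (2, "non-DA charge", 1),
]
# (outcome within 4 months, number of offenders)
control_breaches = [("DA arrest", 1), ("DA charge", 7), ("non-DA charge", 1)]

workshop_total = sum(n for _, _, n in workshop_breaches)
attended_both = sum(n for attended, _, n in workshop_breaches if attended == 2)
control_total = sum(n for _, n in control_breaches)

print(workshop_total, attended_both, control_total)  # 22 6 9
```

The tallies agree with the text: 22 workshop breaches, six of them by offenders who attended both sessions, and nine control breaches.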

Despite these predictably varying developments in each offender sanctioning pathway, the overwhelming majority of the offenders in both groups did (and did not do) exactly what was randomly assigned. This claim is best supported by the recommended approach for presenting a description of what really happened in an experiment, as distinct from what the plan was. This description is called a CONSORT diagram, a format developed by medical researchers to make their research reporting more consistent and therefore more transparent. The Consolidated Standards of Reporting Trials (CONSORT) diagram is presented in Fig. 1 to summarize the distribution of all of the cases across the intake and treatment of eligible cases.

Crime Harm Index

The follow-up data collection was done by a single team member (Braddock) over the course of the experiment. His daily review of new cases was supplemented by a review of all of the previously enrolled offenders for new arrests or convictions. He compiled all of these tracking data into a spreadsheet with specific offence types listed. Once these data were assembled for at least 365 days for all of the 293 enrolled offenders, the specific offence types were coded by another member of the team (Weinborn), who assigned to each offence the number of days in prison recommended by the Sentencing Council of England and Wales as the “starting point” for sentencing (Sherman et al. 2016). The starting point is the sentence the crime itself deserves, without any consideration of the prior offending history (or lack of it) on the part of the offender, or the aggravating or mitigating circumstances of the offence. The complete list of the starting point sentences by offence types is posted on the University of Cambridge website here: https://docs.google.com/spreadsheets/d/12uwGjz2HhoqKXsHB-F-J2vvWgHvLJ7ukrDIVP8YjlpQ/edit#gid=0. Once Weinborn classified all the offence codes, he summed all of the days of recommended imprisonment for the crimes the CARA sample had been arrested for in the first 365 days after random assignment. These sums were then the basis for the outcome analysis produced by another team member (Ariel).
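The coding and summing procedure described above can be sketched as follows. The starting-point values in the lookup table are placeholders invented for illustration and are not the Sentencing Council figures; the function names and the record layout are likewise hypothetical.

```python
from collections import defaultdict

# Placeholder starting points in days of imprisonment. These numbers are
# invented for illustration and are NOT the Sentencing Council values.
STARTING_POINT_DAYS = {
    "criminal damage": 2,
    "common assault": 5,
    "harassment": 10,
}

def chi_by_offender(arrests, window_days=365):
    """Sum starting-point days per offender for arrests inside the window.

    `arrests` is an iterable of
    (offender_id, offence_type, days_since_random_assignment).
    """
    totals = defaultdict(int)
    for offender_id, offence_type, day in arrests:
        if 0 <= day <= window_days:
            totals[offender_id] += STARTING_POINT_DAYS[offence_type]
    return dict(totals)

def mean_chi(arrests, n_offenders, window_days=365):
    """Mean CHI per enrolled offender; offenders with no arrests count as zero."""
    return sum(chi_by_offender(arrests, window_days).values()) / n_offenders

sample_arrests = [
    ("A", "common assault", 40),
    ("A", "harassment", 200),
    ("B", "criminal damage", 364),
    ("B", "common assault", 400),  # outside the 365-day window, so ignored
]
print(chi_by_offender(sample_arrests))           # {'A': 15, 'B': 2}
print(mean_chi(sample_arrests, n_offenders=4))   # 4.25
```

Dividing by the number of enrolled offenders, rather than by the number re-arrested, keeps offenders with no new arrests in the denominator, which is what a per-offender mean across a whole treatment group requires.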

Findings

The primary outcome measure was the Cambridge Crime Harm Index values; secondary outcomes were the prevalence and frequency of repeat offending counts. All measures were examined both for domestic abuse only, and for all offence types, with the main effects for the entire study period. In addition, a sensitivity analysis shows the difference in outcomes between two periods of the enrolment of cases, in the first and second halves of the experimental time period.

Main Effects

Figure 2 shows that over the 365 days after random assignment, the workshop treatment group members were arrested for new domestic abuse crimes with 27% less CHI severity than the control group. The mean CHI severity was 8.44 days of recommended imprisonment per offender in the workshop group compared to the mean of 11.63 days per offender in the control group. This was based on a total of 1299 days of recommended imprisonment across all 154 workshop group offenders, compared to 1616 days across all 139 control group offenders. The results for all crimes were almost identical, with 1341 days for the workshop and 1645 days for the controls; most of the crime harm for these low-risk offenders was in domestic abuse.
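The means and the relative reduction reported above follow directly from the group totals, as this short check shows. The helper name is hypothetical.

```python
def summarize(workshop_days, control_days, workshop_n=154, control_n=139):
    """Return per-offender means and the relative reduction in mean CHI."""
    mean_workshop = workshop_days / workshop_n
    mean_control = control_days / control_n
    reduction = 1 - mean_workshop / mean_control
    return mean_workshop, mean_control, reduction

# Domestic abuse offences only.
da_workshop, da_control, da_reduction = summarize(1299, 1616)
print(f"{da_workshop:.2f} vs {da_control:.2f} days, {da_reduction:.0%} lower")

# All offence types.
all_workshop, all_control, all_reduction = summarize(1341, 1645)
print(f"{all_workshop:.2f} vs {all_control:.2f} days, {all_reduction:.0%} lower")
```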

Fig. 2 Domestic abuse outcomes in Cambridge crime harm index

Table 1 shows that for the traditional recidivism measures of prevalence and frequency, the results are consistent with those of the CHI measures. Note that the measures of raw counts of crime treat all crimes as if they were of equal seriousness, from criminal damage to a serious assault. It is for that reason that we treat the CHI values as the primary outcome.

Table 1 Crime count analysis, all crimes weighted equally

In more rigorous analysis of the count data, we examined the standardized mean difference between the offenders in the two treatment groups, finding small to medium statistically significant effects. For recidivism by all crime types, d = −.286 (95% CI −.517, −.056; P < .02). For domestic abuse recidivism only, d = −.299 (95% CI −.530, −.068; P < .01).
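The reported confidence intervals can be approximately reproduced from the effect sizes and the two group sizes, using a common large-sample standard error for a standardized mean difference. Whether the analysis used exactly this formula is an assumption, and the function name is hypothetical.

```python
import math

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for a standardized mean difference d.

    Uses the large-sample standard error
    sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))).
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Group sizes from the experiment: 154 workshop, 139 control.
all_low, all_high = d_confidence_interval(-0.286, 154, 139)
da_low, da_high = d_confidence_interval(-0.299, 154, 139)

print(f"all crimes:     ({all_low:.3f}, {all_high:.3f})")
print(f"domestic abuse: ({da_low:.3f}, {da_high:.3f})")
```

Small discrepancies in the third decimal place are to be expected, because the published effect sizes are themselves rounded.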

Differences Between Two Periods

Because the weekly rate of cases enrolled in the experiment dropped substantially during the second half of the project period, we conducted a sensitivity analysis of the effects by time period. In the first five quarters, CARA enrolled an average of 27.4 cases per quarter. In the remaining eight quarters, CARA enrolled an average of 20 cases per quarter. The second period had several differences from the first, including fewer domestic abuse arrests (dropping from about 200 per month to 160 per month), a higher percentage of DASH assessments excluding cases from CARA as high-risk, and longer waiting periods between arrest and workshops being scheduled. The number of participants in the workshops also declined by 24%. In the first 146 cases, with 11 workshops, the mean was 5.5 offenders in each workshop. In the remaining 147 cases, with 18 workshops, the mean was 4.2 offenders per workshop.

It is hard to interpret these differences theoretically, but we can at least report that the effect size of CARA’s benefits was much higher in the first half of the experiment than in the second half. Figure 3 shows the differences between the first and second period. While the relative difference between the workshop and the control group declines in the second period compared to the first, the absolute levels of harm increase substantially. That is, the eligible sample seems to have much higher CHI levels of recidivism, as indicated by the control group. At a mean of 18 CHI days in period 2, the control group recidivism is over three times as harmful in period 2 as in period 1. That difference is not in any way indicative of treatment effects. It simply describes a substantive change in the criminogenic factors associated with the cases meeting the same eligibility criteria in period 2 as in period 1.

Fig. 3 Effects on domestic abuse recidivism in first and second half of experimental period

Conclusions

Regardless of the differences between the first and second periods of the experiment, the overall effects of CARA are very encouraging. This is, to our knowledge, the only randomized trial in policing domestic violence to report a substantial reduction in repeat offending since 1984 (Sherman and Berk 1984). It is also the only randomized experiment in policing domestic abuse ever to calculate the effect of the programme on the basis of a crime harm index (Sherman 2007, 2011, 2013; Sherman et al. 2016). Whether the new approach of using CHI will continue to show similar results to prevalence and frequency will be an interesting question for this CHI measure.

The more important question for domestic violence victims is whether the CARA approach can help more victims if it is used more widely. The evidence from this experiment is certainly strong enough to justify further randomized trials. The evidence is far stronger than it is for many other expensive, and potentially harmful, methods of policing this crime. It is also, at a cost of roughly £100 per offender per case, a policy that is likely to be highly cost-effective. If the ideal size of a discussion group is about two or three people more than in the 29 workshops in the CARA experiment, then the cost-effectiveness would be even greater.

What seems most important to question, based on these results, is whether there is any reason to exclude cases that are assessed as “high risk” on DASH when that instrument itself has not been validated. While Thornton (2011) found that none of the domestic homicides or attempted homicides she studied in Thames Valley had been assessed at high risk (if there had been previous police contact), many other cases were assessed as high risk—but had no subsequent serious harm. If British policing were to support the more educational approach of CARA with 100,000 or more first-offence IPV cases per year, it is not implausible that it could foster a changing culture of male views on low level IPV. That would not be the only goal of policing domestic abuse, but it would be a goal well worth achieving. Unless CARA is given a green light for wider use and larger-scale testing, we will never know whether it can help to change any culture of domestic abuse.