Police officers often have to make far-reaching decisions and perform actions in threatening, ambiguous, and rapidly changing situations. For instance, when a suspect suddenly pulls a knife, a police officer must decide ad hoc how to react and execute his or her reaction correctly. In such situations, stress – triggered by anxiety-producing threats – is likely to disrupt cognitive and motor processes (e.g., Beilock and Carr 2001; Eysenck et al. 2007; Masters 1992; Nieuwenhuys and Oudejans 2017). Indeed, stress and its associated responses have been shown to impair officers’ shooting performance (Giessing et al. 2019; Nieuwenhuys et al. 2009; Nieuwenhuys and Oudejans 2010) as well as their arrest and defense skills (Renden et al. 2014, 2017). Given the importance of making good decisions and acting appropriately in policing, the current study aimed to test the extent to which occupational performance under stress can be improved by representative training designs.

One particular type of training intervention that has started to gain attention in various high-stress domains is pressure training (PT; Kegelaers and Oudejans 2022). PT involves the physical practice of domain-specific skills (e.g., shooting or self-defense skills in policing) while trainees are deliberately exposed to (simulated) stressors within the practice environment in order to elicit a psychophysiological state of stress, including the experience of anxiety and the release of biological stress responses. Within the PT literature, pressure is typically defined as “any factor or combination of factors that increases the importance of performing well on a particular occasion” (Baumeister 1984, p. 610) and can be manipulated by increasing either the task demands or the consequences of performance on a given task. Several systematic reviews and meta-analyses have already demonstrated the effectiveness of PT for improving performance under stress in sports, firefighting, medicine, and policing (Gröpel and Mesagno 2019; Kent et al. 2018; Low et al. 2021). Although little is known about the mechanisms through which PT improves performance outcomes (for an overview of potential functions see Kegelaers and Oudejans 2022), processing efficiency theories (e.g., Eysenck et al. 2007; Nieuwenhuys and Oudejans 2017) have repeatedly been proposed to explain how PT can have positive performance effects. These theories suggest that individuals can compensate for the negative impact of stress on cognitive and motor processes by exerting increased mental effort to inhibit stimulus-driven processes and enforce goal-directed processes. Indeed, several studies showed that participants in the PT group still reported increased stress and mental effort in the posttest, although they improved their performance (Alder et al. 2016; Oudejans and Pijpers 2009, 2010; Nieuwenhuys and Oudejans 2011).

Building on these theoretical considerations, one might hope that PT trains a general ability to perform under stress, so that PT of one specific skill still benefits performance when task constraints or stressors differ slightly. However, most existing studies have measured the effectiveness of PT by testing one specific skill in the same task and under the same performance stressors that were practiced during PT (e.g., Nieuwenhuys and Oudejans 2011; Oudejans 2008; Oudejans and Pijpers 2009; for notable exceptions see Alder et al. 2016; Liu et al. 2018; Nieuwenhuys et al. 2015). Thus, it is still unknown to what extent the effects are specific to the training conditions or transfer to other conditions (e.g., different task constraints or performance stressors) that performers might face in real life. Critically, we believe that goal-directed performance requires both cognitive and motor online adaptations, requiring performers to choose “what” to do and “how” to do it (Raab 2017; Voigt et al. 2023). However, most existing studies have implemented tasks that focused either on the motor component of “how” to do it (e.g., Nieuwenhuys and Oudejans 2011; Oudejans 2008) or on the decision-making component of “what” to do (e.g., Nieuwenhuys et al. 2015).

In the present study, we aimed to address the full complexity of goal-directed performance by training a skill that requires both cognitive and motor adaptations to a changing environment. To this end, police officers practiced knife-defense skills under either high-stress (experimental group) or low-stress conditions (control group). Knife-defense skills include the decision-making skill of “what” to do (e.g., attack or retreat, use of the handgun) and the motor skill of “how” to do it (e.g., techniques to control the attacker’s blade arm). In line with processing efficiency theories, we expected the experimental group to improve their performance after the intervention more than the control group, even if they experienced comparable stress and mental effort before and after the intervention.

Method

Participants

Participants were recruited from three classes of third-semester police recruits at a German police academy and from a sample of active police officers on duty at a German police headquarters. A total of 84 officers (18 women) volunteered to participate. Active police officers were randomly assigned to the experimental and control groups. Police recruits were assigned to the groups based on their class (i.e., two experimental classes and one control class). The experimental group consisted of 51 officers and the control group of 33 officers. Participants were on average 27.8 years old (SD = 8.5) and had 5.6 years of work experience (SD = 7.8). The control group had significantly more work experience than the experimental group (t(78) = -2.19, p = .031), but the groups did not significantly differ in age (p = .168). All participants provided written informed consent.

The study was approved by the ethical review board of the German Sport University Cologne and performed in accordance with the declaration of Helsinki. Given the involvement of firearms and other weapons, it was executed under the responsibility of certified police firearms instructors, following their standard safety protocol.

Design

The experiment consisted of two test sessions (pretest and posttest) and four 4-h training sessions over four weeks (one per week). Test and training sessions took place at the training facilities of the police academy.

Training Intervention

Participants in both groups received four training sessions of four hours each. Officers were trained in groups of 12 to 20 and supervised by two experienced police instructors. During these training sessions, participants received instruction on knife-defense techniques and performed several exercises to practice these techniques under various conditions. The training exercises were the same for both groups, with the sole difference that the experimental group practiced under additional stressors, i.e., anticipation of aversive stimuli, social evaluation, uncertainty, and time pressure (for the manipulation of these stressors see Table 1). The control group practiced without additional stressors.

Table 1 Overview of stressors and their manipulations in the training intervention in the experimental group

Testing Sessions

Officers’ occupational performance was measured in reality-based police scenarios designed by two experienced police instructors. In developing the scenarios, we aimed to create a representative training and research design (Pinder et al. 2011) by artificially constructing realistic environments (i.e., a station forecourt and a living room) and by increasing the level of threat to such a degree that participants had to perform while experiencing stress. Officers did not receive specific instructions about what to do. They received a short briefing about the incident, similar to a radio message, and were instructed to act as they would on duty. They were dressed as usual and were equipped with a pepper spray, a padded baton, and a handgun identical to their duty weapon (Heckler & Koch P30) but adapted to fire colored soap cartridges (Simunition ®, FX Marking Ammunition). Participants entered the scenario with a colleague to simulate the usual police patrol of two officers. However, the colleague – a confederate police trainer – acted as if occupied with a distracting stimulus (i.e., a second, loudly agitated person), creating a one-on-one situation between the participant and the actual perpetrator. All involved persons (i.e., the colleague, the distracting person, and the actual perpetrator) were role-played by experienced police instructors. All scenarios were supervised by a police trainer and recorded on video by a stationary GoPro camera and a mobile camera operated by a police trainer for later performance analyses (see Measures).

Test Scenarios

In the pretest and posttest, participants underwent a knife-attack scenario. Although the background set-up of the scenarios varied between test sessions (pretest: identity check of two suspects at a station forecourt; posttest: domestic dispute), the scenarios were designed to create a comparable knife attack in both. To mirror the spatial narrowness of an apartment in the domestic-dispute scenario, parked cars were placed on the station forecourt in the pretest scenario. In both scenarios, the participant and all acting police trainers were located at a predefined starting position, so that the distance between the participant and the knife-attacking suspect was about 2.5 m. On the starting signal, both suspects became loudly agitated. The confederate colleague spatially separated one of the suspects to create the one-on-one situation between the participant and the actual perpetrator. During the one-on-one situation, the perpetrator pulled a knife that was positioned in the waistband behind his back. To standardize the knife attack, the perpetrator briefly held the knife next to his hip before stabbing dynamically at stomach level, aiming below the protective vest. The supervising police trainer ended the scenario when the knife-attacking suspect had been rendered incapable of further attacks, an arrest had been made, or the participant had given up prematurely.

To reduce expectancy effects regarding the knife attack, we employed an additional scenario of passive resistance in a domestic dispute in the posttest. No knife attack occurred during this scenario. The order of the knife-attack and passive-resistance scenario at posttest was counterbalanced.

Measures

Evaluation of Training Intervention

To assess subjective satisfaction with the taught skills and the intervention at posttest, we adopted the six items of the arrest and self-defense skills (ASDS) preparation scale used by Renden et al. (2015). To compare the use of knife-defense skills and perceived performance effectiveness between pretest and posttest, we adopted the items of the ASDS use scale (5 items) and the performance effectiveness scale (4 items) used by Renden et al. (2015).

Stress Responses

To assess whether the test scenarios and training exercises elicited stress responses, participants’ perceived stress and mental effort in each test scenario and training exercise were assessed using the anxiety thermometer (Houtman and Bakker 1989), ranging from 0 (not at all stressed) to 100 (extremely stressed), and the Rating Scale Mental Effort (RSME; Zijlstra 1993), ranging from 0 (not effortful) to 150 (extremely effortful).

Performance

We used five-point Likert scales to assess a number of measures of knife-defense performance in each scenario (Nieuwenhuys et al. 2009; Renden et al. 2017). Following the procedure in Nieuwenhuys et al. (2009), four experienced police instructors who conducted the training interventions in the present study agreed on the relevant criteria and developed descriptors for the extremes of each scale: overall performance, distance to suspect, physical defense, situational control, and use of applied force. Higher scores on these scales represent better performance. Three other police instructors, uninvolved in the development and delivery of the training interventions, used these scales to rate participants’ performance from video recordings of the scenarios. Additionally, they rated the survival rate on a visual analogue scale ranging from 0 to 100%. To ensure that the scenarios were clearly visible from different angles, a stationary GoPro camera and a mobile digital camera operated by an experimenter were used. The raters could view the footage from both cameras as often as they wanted until they were satisfied with their score. The videos were masked and presented in random order. Intraclass correlation coefficients showed satisfactory inter-rater reliability, ranging from .70 to .90 (Hallgren 2012).
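The section does not state which form of the intraclass correlation coefficient was computed. As a hedged illustration only, the Python sketch below computes one common form, the two-way random-effects, absolute-agreement, single-rater coefficient (often written ICC(2,1)), from a participants × raters matrix of scores. The function name `icc_2_1` is hypothetical, and the ratings in the usage lines are invented, not study data.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    ratings: list of rows, one row per participant (video), each row
    holding the k raters' scores for that participant.
    Assumes at least 2 participants, 2 raters, and some variance.
    """
    n = len(ratings)          # participants
    k = len(ratings[0])       # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares (two-way ANOVA without replication)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Rater (column) variance counts against agreement in this form
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical 1-5 ratings: five videos scored by three raters
example = [[3, 3, 4], [2, 2, 2], [4, 5, 4], [1, 2, 1], [3, 4, 3]]
print(round(icc_2_1(example), 2))
```

Other ICC forms (e.g., consistency rather than absolute agreement, or the mean of k raters) use different denominators, so a value from this sketch is only comparable to the reported range if the same form was used.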

Statistical Analyses

As a manipulation check of the training intervention, differences between groups in perceived stress and mental effort in the training exercises (averaged over the different sessions) were analyzed using independent samples t-tests. To compare how participants evaluated the two training interventions, we performed independent samples t-tests with subjective satisfaction as dependent variable as well as 2 (Group: experimental, control) × 2 (Test: pretest, posttest) mixed design ANOVAs on ASDS use and performance effectiveness.

To assess the effects of the training interventions on stress, mental effort, and the performance parameters, we performed 2 (Group: experimental, control) × 2 (Test: pretest, posttest) mixed design ANOVAs. The alpha level for significance was set at .05. Effect sizes were calculated using Cohen’s d for t-tests and partial eta squared (ηp²) for ANOVAs. Significant effects were followed up by Bonferroni-corrected post-hoc tests. All analyses were performed using JASP (version 0.16.4).
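In a 2 (Group) × 2 (Test) mixed design, the Group × Test interaction asks whether pre-to-post change differs between the groups. Its F statistic equals the squared pooled-variance independent-samples t statistic computed on each participant’s change score (posttest minus pretest), with df = (1, n₁ + n₂ − 2). The Python sketch below illustrates this equivalence; the function name and all scores are hypothetical and are not the study’s data or JASP’s implementation.

```python
from statistics import mean


def interaction_t_from_change_scores(pre_a, post_a, pre_b, post_b):
    """Pooled-variance t on pre-to-post change scores for two groups.

    For a 2 (group) x 2 (test) mixed design, t ** 2 equals the
    Group x Test interaction F with df = (1, n_a + n_b - 2).
    """
    diff_a = [post - pre for pre, post in zip(pre_a, post_a)]
    diff_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(diff_a), len(diff_b)
    m_a, m_b = mean(diff_a), mean(diff_b)

    # Pool the within-group variability of the change scores
    ss_a = sum((d - m_a) ** 2 for d in diff_a)
    ss_b = sum((d - m_b) ** 2 for d in diff_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    se = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (m_a - m_b) / se, df


# Hypothetical 1-5 performance ratings for two small groups
t, df = interaction_t_from_change_scores(
    pre_a=[2, 3, 2, 3], post_a=[4, 4, 3, 5],
    pre_b=[2, 2, 3, 3], post_b=[3, 3, 3, 4],
)
print(f"t({df}) = {t:.2f}, interaction F(1, {df}) = {t ** 2:.2f}")
```

Framing the interaction this way makes explicit that a non-significant interaction means the two groups’ average gains did not differ reliably, even when both groups improved.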

Results

Manipulation Check

The experimental group reported significantly more stress (t(82) = 6.28, p < .001, d = 1.40, 95% CI [0.99, ∞]) and mental effort (t(82) = 4.37, p < .001, d = 0.98, 95% CI [0.59, ∞]) in the training exercises than the control group.
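The reported effect sizes can be cross-checked against the t statistics. Assuming Cohen’s d was computed with the pooled standard deviation (the standard definition for an independent-samples t-test), d = t·√(1/n₁ + 1/n₂). With the group sizes from the Participants section (51 and 33, consistent with df = 51 + 33 − 2 = 82), this minimal Python sketch reproduces both values; the upper bounds of ∞ in the confidence intervals suggest directional (one-sided) tests. The function name is hypothetical.

```python
from math import sqrt


def cohens_d_from_t(t, n1, n2):
    """Cohen's d (pooled SD) recovered from an independent-samples t."""
    return t * sqrt(1 / n1 + 1 / n2)


# Group sizes from the Participants section: 51 experimental, 33 control
print(round(cohens_d_from_t(6.28, 51, 33), 2))  # stress        -> 1.4
print(round(cohens_d_from_t(4.37, 51, 33), 2))  # mental effort -> 0.98
```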

Evaluation of the Training Intervention

The groups did not differ in their subjective satisfaction with the training intervention (t(69) = -0.22, p = .585). The ANOVAs on ASDS use and performance effectiveness showed a significant main effect of test, but neither a main effect of group nor an interaction effect (see Table 2).

Table 2 Descriptive and inferential statistics of arrest and self-defense skills (ASDS) use, performance effectiveness, stress, mental effort, and performance parameters

Stress Responses

The ANOVA on perceived stress in the scenarios showed no significant main effect of test or group and no interaction (ps > .176). Likewise, the ANOVA on mental effort showed no significant effects (ps > .300; see Table 2).

Performance

The ANOVA on overall performance showed a significant main effect of test (F(1, 48) = 35.24, p < .001, ηp² = .42), but neither a main effect of group nor an interaction (ps > .487). Participants performed significantly better in the posttest than in the pretest, p < .001, 95% CI [-1.62, -0.80], with corresponding results for all performance variables (see Table 2).
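Partial eta squared can be recovered from any reported F statistic and its degrees of freedom as ηp² = F·df_effect / (F·df_effect + df_error). The error df of 48 corresponds to the 36 + 14 participants with complete performance data (36 + 14 − 2 = 48). The minimal Python sketch below (hypothetical function name) reproduces the reported value for the main effect of test.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of test on overall performance: F(1, 48) = 35.24
print(round(partial_eta_squared(35.24, 1, 48), 2))  # -> 0.42
```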

Discussion

When individuals experience stress in response to environmental threats, the associated stress responses may disrupt cognitive and motor processes, resulting in performance decrements (e.g., Beilock and Carr 2001; Eysenck et al. 2007; Masters 1992; Nieuwenhuys and Oudejans 2017). Previous research has shown that PT is effective in counteracting such performance decrements (Gröpel and Mesagno 2019; Kent et al. 2018; Low et al. 2021). Using a pretest-intervention-posttest design, the current study aimed to explore the extent to which police officers’ performance in occupationally relevant skills can be improved by PT. Although the experimental group reported more stress and mental effort during the training interventions than the control group, the results showed that both groups improved in self-reported use of knife-defense skills and performance effectiveness as well as in externally assessed performance variables. The rated survival rate in the critical-incident scenarios increased by 28% on average from pretest to posttest, and improvements in the performance variables amounted to about one scale point on the 5-point Likert scales. Since the performance improvement also applied to the control group, who trained under low-stress conditions, the effect was likely due to the introduction and training of novel skills, irrespective of the training conditions. Based on these findings, we conclude that PT did not significantly improve officers’ knife-defense performance beyond the non-pressurized training.

This finding contrasts with a recent meta-analysis that demonstrated performance-enhancing effects of PT across domains (Low et al. 2021). We propose two potential explanations for the divergent result. First, the knife-defense skill in the present study required coupled “what” and “how” decisions instead of testing and training them in isolation. Second, we tested this skill under task constraints and performance stressors that differed between the training exercises and the test scenarios. Thus, the performance-enhancing effects of PT may hold only when the same skills are tested under the same conditions under which they were practiced and do not require “what” decisions (Nieuwenhuys and Oudejans 2011; Oudejans 2008; Oudejans and Pijpers 2009). This interpretation aligns with the finding of Nieuwenhuys and colleagues (2015) that isolated “what” decisions (i.e., shoot vs. not shoot) under stress did not improve after PT compared to low-stress or no training. Therefore, we cautiously conclude that PT may not train a general ability to perform under stress, but rather produces a motor-specific adaptation to the training conditions, as proposed by the specificity of practice principle (Proteau 1992; also see Cassell et al. 2017; Lawrence et al. 2014). PT may be effective in enhancing movement execution (without decisions) under stress, especially when tested under the same stressors. However, practical implications of PT for high-stress domains such as policing, in which “what” and “how” decisions are required, should be derived with caution until transfer to real-life settings has been shown. To disentangle these effects, future research should employ a full experimental design in which “what” and “how” decisions are tested separately under both the same and different stressors as those applied during training.

An alternative explanation for the non-significant interaction effect may be the comparatively high stress levels in the control group. Although there was a significant difference in reported stress and mental effort between the experimental and control groups, stress levels and mental effort in the control group were comparable to or even higher than the levels reported in PT interventions in other police samples (cf. Nieuwenhuys and Oudejans 2011; Nieuwenhuys et al. 2015). Possibly, the stress elicited in the control group exceeded a certain threshold, so that the control group’s intervention effectively became a PT and therefore produced the same positive effects as in the experimental group. Although it has repeatedly been noted that stress levels in PT should be high enough to yield benefits but not so high that they interfere with learning and training, an empirical identification of the optimal amount of stress for PT is still outstanding (Di Nota and Huhta 2019; Giessing 2021; Kegelaers and Oudejans 2022). Thus, this alternative explanation remains speculative until future research has identified how much stress should be elicited in PT.

With group sizes of n = 36 in the experimental group and n = 14 in the control group in the analyses of the performance variables, one issue that might be raised is that our statistical power may have been too low to detect the hypothesized interaction effect. Nevertheless, our group sizes exceeded those of other studies that tested similar PT interventions using comparable designs and that showed positive effects of PT (see Low et al. 2021).
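Because the Group × Test interaction in a 2 × 2 mixed design is equivalent to comparing the two groups’ change scores, its power can be roughly approximated as that of a two-sample test with the analyzed group sizes (36 and 14). The Python sketch below uses a normal approximation, which is slightly optimistic compared with the exact noncentral t distribution; the standardized effect sizes passed in are hypothetical inputs, not estimates from this study, and the function name is illustrative.

```python
from statistics import NormalDist


def approx_power_two_sample(d, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test.

    d is a hypothetical standardized difference in change scores.
    The (negligible) rejection region in the opposite tail is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = abs(d) * (n1 * n2 / (n1 + n2)) ** 0.5
    return z.cdf(noncentrality - z_crit)


# Analyzed group sizes for the performance variables: 36 and 14
for d in (0.5, 0.8):
    print(d, round(approx_power_two_sample(d, 36, 14), 2))
```

Unbalanced groups reduce power relative to the same total N split evenly, because the noncentrality depends on the harmonic-mean term n₁n₂/(n₁ + n₂).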

Importantly, our results showed that it is possible to improve officers’ performance under high stress. While occupational performance under stress remained challenging after PT, officers did perform better on the posttest than on the pretest. Although this was a general effect that also emerged in the control group, it indicates that knife-defense skills are amenable to improvement. Notably, in contrast to previous studies on shooting skills in police samples (Nieuwenhuys and Oudejans 2011; Nieuwenhuys et al. 2015), the skill trained in the present study was novel to the participants. While they could rely on the general physical defense and arrest skills acquired in their previous training, they had almost no experience with the knife-defense principles and techniques trained in the present study. This might explain why we observed performance improvements in both the experimental and the control group. Given that there was still room for improvement after the training intervention (see Table 2), PT might only become effective later in the (motor) learning process, once a certain skill level has been reached. Support for this argument comes from an experiment that tested the influence of the timing of stress exposure in the learning process: training benefits were greatest when exposure to stress occurred in the latter half of the training intervention (Lawrence et al. 2014). Thus, we might have observed favorable outcomes of PT if participants had engaged in more practice sessions or repetitions. However, we chose the number of training sessions based on a recent meta-analysis that demonstrated beneficial effects of PT interventions even with fewer than five sessions (Low et al. 2021). As such, whether additional training indeed leads to superior efficacy of PT remains a topic for future studies.
Given the limited availability of resources for training in many high-stress domains, virtual reality might be a promising tool to implement PT with higher frequency and more repetitions, while requiring both decision making and action in a fully immersive environment (Giessing 2021; Kegelaers and Oudejans 2022; Kleygrewe et al. 2023).