Eyewitness testimony and perpetrator identification lies at the heart of the criminal justice system. In the absence of incriminating physical evidence, an eyewitness can be crucial in convincing a court of a defendant’s guilt. Despite its importance, a large body of research has found that the capacity of eyewitnesses to identify perpetrators is error prone (Steblay et al., 2001, 2003), and the nature of these errors remains largely unexplored, such that it is unclear if a systematic bias exists in the appearance of those wrongly identified as criminals. Specifically, do their appearances align with that of a stereotypical “criminal”?

Identification errors may be attributable to the difficulty people often experience with recognizing unfamiliar faces. Although early work in facial recognition suggested that people are experts in recognizing unfamiliar faces (Hochberg & Galper, 1967; Yin, 1969), it has since been argued that this “expertise” reflects our capacity to match exact images of people only (Bruce, 1982; Bruce et al., 1999). Even slight tweaks in the viewpoint or lighting of an unfamiliar face can disrupt recognition accuracy (Hancock et al., 2000). Person identification rates in immediate memory tasks have been shown to be as low as 60% correct (Megreya & Burton, 2008), with participants also recognizing the target in target-absent line-ups in 20% of cases.

In addition to difficulty in identifying faces, people may also struggle with accurately recalling other important visual features, such as body morphology. Indeed, body shape plays an important role in our perceptions of others (Aviezer et al., 2008; Aviezer et al., 2012a, 2012b; Bobak et al., 2016b, b; Enea & Iancu, 2016; Meeren et al., 2005; Van den Stock et al., 2007; Vrancken et al., 2017), in how accurately we recall them, particularly in the absence of useful facial information (Rice et al., 2013). However, our capacity for accurate body recall is limited. Buckhout et al. (1974) found that participants were relatively accurate in their estimations of criminal height following a staged mock crime, but underestimated weight by 10–15 kg. Others report the opposite trend, with target weight slightly overestimated and height slightly underestimated (Buckhout, 1980; Buckhout et al., 1975). More recent work (Ebbesen & Rienick, 1998) found that both target height and weight were underestimated, while a study of eyewitness recall under naturalistic settings (Yarmey & Yarmey, 1997) found that estimations for weight and height were only moderately accurate.

In a study on the recall of real-life eyewitnesses who had observed a particular gun-shooting incident, Yuille and Cutshall (1986) found that errors in estimating height, weight, and age accounted for 52% of the person description errors at first recall. This prompted the authors to declare that “witnesses should not be asked to provide descriptive statistics of people. It is obvious that attempts to guess at height, weight, and age are pointless. . . . police should be content with relative estimates of these characteristics. That is, they should ask each witness to make height judgments relative to some environmental fixture, or another individual” (Yuille & Cutshall, 1986, p. 299).

Given that recognition accuracy for unfamiliar faces under controlled laboratory conditions can be low (Bruce, 1982; Bruce et al., 1999; Hancock et al., 2000; Megreya & Burton, 2008), and that body morphology estimations are error prone, it is unsurprising that eyewitness identification is often poor. Indeed, changes to the eyewitness lineup procedure have yielded only relatively small improvements in identification accuracy (McQuiston-Surrett et al., 2006). Lineup procedures typically involve the presentation of one suspect embedded around known innocent distractors, or “foils”, together in a line (simultaneous lineup). A more recent method consists of a witness viewing one lineup member at a time and deciding whether that person matches the appearance of the criminal prior to viewing the next member (sequential lineup; Lindsay & Wells, 1985). This helps to minimize the risk of a witness simply selecting the member who most closely resembles the perpetrator in a relative comparison in a scenario where the actual perpetrator is absent (Steblay et al., 2001).

Despite this potential improvement in lineup procedure, Wells et al. (2015) found that, in a study on actual former eyewitnesses, among those who made an identification, 32% of witnesses in a sequential lineup paradigm selected an innocent foil. Given this, it is unsurprising that eyewitness misidentification of innocent defendants plays a significant role in most cases of prisoners later exonerated through DNA evidence (The Innocence Project, 2022; Wells et al., 1998; West & Meterko, 2016). In actual criminal cases, suspect identification is subject to additional extraneous variables, making the task even more difficult (Wells, 1978). Indeed, many aspects of a crime outside of the control of the justice system (known as estimator variables) can affect the accuracy of perpetrator identification. For example, recognition accuracy improves when the physical distance between witness and criminal is small, and the duration the witness views the criminal is long (Lindsay et al., 2008; MacLin et al., 2001; Memon et al., 2003; Pezdek & Blandon-Gitlin, 2005; Wagenaar & Van Der Schrier, 1996).

Directional biases in misidentification: The CMEI

It is clear that errors can occur in the identification of unfamiliar people accused of crimes; however, it is less clear whether some people are more likely to be erroneously identified as a criminal. The contextual model of eyewitness identification (CMEI; Osborne & Davies, 2014) posits that associating someone with a crime distorts their appearance in later recall by automatically activating a stereotype about a perpetrator’s appearance that is congruent with the crime being committed. This is most frequently illustrated through examples of racial stereotypes, wherein crimes such as identity theft or embezzlement activate a Caucasian stereotype, while crimes such as procuring prostitution and carjacking activate a Black stereotype (Osborne & Davies, 2013). Upon activation, eyewitnesses are primed (Bargh & Chartrand, 2014) to preferentially encode—and later retrieve from memory—features of the perpetrator that are prototypically consistent with that stereotype. As such, the recollection of the criminal will be biased to match a stereotypic culprit more closely. The extent to which the recalled appearance of the perpetrator is distorted by this stereotype is subject to the estimator variables present, with disruptive extraneous factors increasing contextual distortion.

Evidence for the ability of stereotype activation to shape later identification was found by Eberhardt et al. (2003). When participants were presented with an image of a racially ambiguous man, and informed that he was either Black or White, the recalled target was distorted to comply more with the activated racial context. Kleider et al. (2012) report similar findings, showing that participants misidentified people with stereotypically Black features (as rated subjectively by a set of participants) as drug dealers significantly more than those with less stereotypical features. In a direct test of the CMEI, Osborne and Davies (2013) demonstrated that framing a target as the perpetrator of a stereotypically “Black” crime (drive-by shooting) resulted in participants misidentifying the target as significantly more stereotypically Black than when framed as the perpetrator of a stereotypically “White” crime (serial killer). Furthermore, Ben-Zeev et al. (2014) found that when participants studying a Black face were subliminally primed with a word that was counterstereotypic of Black people, they subsequently recalled the face as Whiter than participants primed with a stereotypic “Black” prime. These findings are consistent with the CMEI’s prediction that context can result in witnesses recalling a face as more racially congruent with the provided information.

The CMEI, perceived threat, and criminality

According to the CMEI, criminal stereotype activation is not restricted to race. For example, the authors argue that prostitutes are associated with relatively feminine features (Ward et al., 2012), and therefore suspected prostitutes are more likely to be misidentified if they are high in perceived female gender stereotypicality (i.e., if they have relatively feminine features). Similarly, when someone is associated with a criminal act (Flowe, 2012), they argue that they should later be recalled as appearing less trustworthy than when they are associated with a neutral act. There is some support for this nonracial memory biasing. For example, in a facial recreation study, participants recalled an identical face as less intelligent and attractive (as rated by a separate set of participants) when they were told it depicted a murderer as opposed to a lifeboat captain (Shepherd et al., 1978). In a rare study on the biased recall of body morphology (Shaw & Wafler, 2016), participants were presented with footage of a staged crime and then asked to select the culprit from a (suspect-absent) lineup that consisted of digitally manipulated defendants of varying body morphologies. Suspects who were altered to appear muscular were significantly more likely to be identified as the criminal than suspects who more closely matched the morphology of the actual perpetrator. These results suggest that, as per the CMEI prediction, the suspect was perceived and later recalled in a biased fashion to match the appearance of a stereotypically threatening criminal. Indeed, perceived threat in computer-generated (CG) body stimuli has previously been shown to increase linearly with increases in musculature (McElvaney et al., 2021).

However, this prediction that a criminal framing can subsequently modify the manner in which appearance is processed and recalled has yet to be systematically explored. One key attribute associated with a “criminal appearance” is how threatening a person appears. In a study on the perception of criminality, Funk et al. (2017) found that the perceived threat of a face was highly positively correlated with the criminality associated with the face. In addition, Oosterhof and Todorov (2008) found that high perceived threat is a result of a combination of high dominance and low trustworthiness, which has been shown to be associated with harsher, more extreme criminal sentencing decisions (Wilson & Rule, 2015, 2016). Hence, the perception of a stereotypical criminal is likely to be biased towards high threat. Thus, if a person is associated with committing a crime, this should distort their recall by participants to align with that inherent stereotype (i.e., appearing more threatening; MacRae et al., 2002).

Research on the effects of a positive or negative framing context on person identification has yielded inconsistent results. Recognition accuracy for people paired with prosocial contexts is higher than when paired with antisocial or neutral contexts (Felisberti & McDermott, 2013; Felisberti & Pavey, 2010). Moreover, informational contexts that frame the target as belonging to an out-group of a lower social class than the participant result in something like a cross-race effect (CRE; Sporer, 2001), wherein recognition is impaired for targets outside of the participant’s cultural or socioeconomic/class circle (Marsh, 2021; Shriver et al., 2007). These observations broadly align with the CMEI model, whereby the encoding and memory of people with unsavoury behaviours should be biased or impaired.

However, the opposite pattern has also been found. For example, face identification is enhanced when people are framed as having a history of cheating (Mealey et al., 1996). Similarly, neutral male faces depicted as defectors in the Prisoner’s Dilemma game are identified with greater accuracy than cooperating males (Oda, 1997). Simply pairing faces with pleasant, neutral, or unpleasant information can impact later identification, with some studies showing that faces presented in unpleasant contexts are recalled with greater accuracy than positively/neutrally valanced faces (Mattarozzi et al., 2015; Mattarozzi et al., 2019). Faces that inherently appear untrustworthy are often recalled more accurately that those with trustworthy appearances (Rule et al., 2012).

These results run counter to the predictions of the CMEI and suggest that recognition for appearance may be enhanced by the presence of negative character trait information, or affectively salient contextual information (Kerr & Winograd, 1982). However, none of the above studies directly assessed the nature of the people misidentified in the wake of being initially framed in a threatening context. The incorrect choice alternatives (foils) were not varied systematically in their perceived threat relative to the target, making it difficult to conclude whether the targets were actually encoded in a biased fashion.

The current study

If the appearance of those accused of a crime is encoded in such a way that they are recalled as more threatening, this suggests a potential bias against people with naturally more threatening appearances. Hence, the goal of the current study was to conduct a systematic examination of the contextual effects of crime information on the identification of criminal faces and bodies.

We measured the impact of context using still images of CG face stimuli varying in perceived facial threat (Todorov et al., 2013) and CG body stimuli varying in musculature. We asked participants to study the image and later identify it from the foils. Although this does not match the procedural experience of real eyewitnesses, this allowed us to explore the potentially biasing effects of criminal context while maintaining tight control over the stimuli. We explored this potential bias to recall targets as more threatening when framed in criminal contexts in three experiments using (i) an immediate memory paradigm with a simultaneous lineup of targets and foils, (ii) a delayed memory paradigm with a sequential lineup of targets and foils, and (iii) a delayed memory paradigm with a simultaneous lineup of foils only. We hypothesized that, in the absence of criminal context, participants would not show any particular directional bias in recalling a body or face. We also hypothesized that, consistent with the CMEI, participants would select more threatening faces/larger bodies when the original target was presented in a criminal context than when presented in a neutral context. Each experiment and all hypotheses were preregistered before data collection began—Experiment 1 (https://osf.io/a82rw/), Experiment 2 (https://osf.io/4pmc5/), and Experiment 3 (https://osf.io/he84j). Ethical approval was granted by the Queen Mary University of London Institutional Review Board.

Experiment 1

We examined the effect of providing a context vignette (“framing”) of target CG faces and bodies as perpetrators of crimes on the encoding and later identification of these targets in an immediate memory paradigm. Participants were briefly presented with a target face or body, and then asked to identify the target from a lineup of either similar faces varying in perceived threat or similar bodies varying in musculature . Half of the participants were given a contextual backstory depicting the targets as perpetrators of a violent crime and informed their task would be to identify the targets, while the other half were given no contextual information (no context).

We first hypothesized (H1) that, in the no-context condition, participants would not be systematically biased in selecting a body or face. Second (H2), we predicted that participants would select more threatening faces/larger bodies in the criminal-context condition than in the no-context condition. We further expected an interaction between condition and the threat/musculature level of the target presented. Namely (H3), we expected the effect of the criminal context to be stronger for less threatening/smaller stimuli for two reasons. First, previous studies have shown that, when presented with a subliminal prime that contradicts the stereotypical nature of a presented stimulus, participants recall the stimulus as more closely matching the prime (Ben-Zeev et al., 2014). For the current experiment, if a nonthreatening/smaller person was accused of a crime, participants may subsequently select a stimulus that aligns more with the criminal context, i.e., a more threatening/larger person. Second, given the nature of the stimuli, smaller targets had a greater number of larger foil (more threatening) options available than larger targets. Hence, if the effect of crime condition emerged, it would be more pronounced in the less threatening/smaller stimuli.



To determine the required sample size for the current experiment, we conducted a power analysis using G*Power3 (Faul et al., 2007). For a mixed-methods design (ANOVA: Repeated measures, between factors), assuming a small-medium effect size (Cohen’s f = 0.20) a total sample size of 220 participants was required to assess the effects of both target stimulus (three levels of measurement) and context (two groups) on selection accuracy with a power of .95 and alpha level of .05. The experiment was conducted in April 2020 via the online software Qualtrics, with participants recruited from Prolific, an online crowdsourcing platform, and paid £0.85 for taking part. Participants were located in the UK, over 18 years of age, fluent English speakers, and had achieved an approval rate of at least 85% in their previous Prolific study participations (61 males; age: M =34.96 years, SD =12.57).

Stimuli (faces)

We selected 10 different identity face stimuli from the Todorov et al. (2013) dataset of computer-generated faces that have been previously validated to vary along the dimension of threat. All faces in the dataset are Caucasian male, and we selected those that appeared the least ambiguously Caucasian and also the least feminine in the low threat range. We converted the stimuli to greyscale and selected one face from each of these 10 sets of identity faces to act as the target stimulus. Six of these targets were at the central level of threat, 0SD (see Fig. 1), while the other four targets consisted of two faces each at −3SD and +3SD to avoid expectation effects.

Fig. 1
figure 1

Five levels of facial threat in one identity face. Left-to-right: −2SD, −1SD, 0SD, +1SD, +2SD, from Todorov et al. (2013)

Stimuli (bodies)

The body stimuli were realistic CG human male figures created using Daz Studio 4.10 Pro (https://www.daz3d.com/daz_studio) with the Male Anatomy Smart Content package. This software provides a default, anatomically accurate standard model (‘Genesis 8 Basic Male’) with dimensions that can be modified precisely to allow fine control over individual body shapes.

We manipulated the body’s appearance using the in-built musculature scale on the initial standard model to create target stimuli at three levels of musculature: 0% (unaltered), 50%, and 100%. There were 10 target identity bodies (using different clothing and blurred faces), six targets were at the central level of musculature (50%), two were at 0%, and two were at 100% musculature to avoid expectation effects. Foils of each target were created using step intervals of 25% musculature (see Fig. 2; see Appendix 1 for details).

Fig. 2
figure 2

Various levels of musculature in Experiment 1. Left-to-right: 0%, 25%, 50%, 75%, 100%. Target stimulus (e.g., 50% musculature) shown in the centre


Participants were randomized into either a criminal-context (N = 110) or no-context (N = 110) condition. In the criminal-context condition, they were informed that they were to adopt the role of an eyewitness, and that they would be presented with CG recreations of people accused of a crime. Participants were informed that the person they were about to see was accused of manslaughter, having shot and killed an innocent bystander while committing a robbery. In the no-context condition, participants were informed that the goal of the experiment was to examine how accurately they could identify unfamiliar people after a brief presentation.

Participants completed 20 trials in total, 10 target face trials (six trials at 0SD, two trials at −3SD, and two trials at +3SD) and 10 target body trials (six trials at 50% morphology, two trials at 0%, and two trials at 100%), using a different identity body/face target in each trial. In the criminal-context condition they were provided the same crime information prior to each trial, and then presented with the target stimulus (1 second), followed by a blank screen (1 second), followed by the lineup. In the no-context condition there was no information prior to each trial, and they were presented with a target stimulus on screen (1 second), followed by a blank screen (1 second), followed by the lineup (see Fig. 3). Body and face trials were randomly interleaved, and the participants’ task was to select the target from the lineup. After confirming their selection, participants proceeded to the next face or body trial.

Fig. 3
figure 3

Experiment 1 sample body trial in criminal-context condition. Participants read the crime of which the target was accused. This was followed by a brief presentation of the target (1,000 ms), followed by a blank screen (1,000 ms). Participants then selected the target from a lineup of five possibilities (no time limit imposed)

For face-target conditions, there were five options on the screen (the target and four incorrect foils randomly positioned on each trial) of the same identity face varying in threat. The options stayed on the screen until the participant made their selection (see Fig. 3). For the 0SD target-face conditions, the five options consisted of faces at −2SD, −1SD, 0SD, +1SD, and +2SD. For the −3SD target face condition, the five options consisted of faces at −3SD, −2SD, −1SD, 0SD, and +1SD. For the +3SD target-face condition, the five options consisted of faces at +3SD, +2SD, +1SD, 0SD and −1SD.

For body-target conditions, participants were presented with five same identity body options on screen in a randomized order. One of these was the original target body stimulus and the others were foils. On all trials, the five options consisted of the identity body at five levels of musculature: 0%, 25%, 50%, 75%, and 100%.

To check that increased musculature increased perceived threat, at the end of the experiment, participants were randomly presented with five new body stimuli wearing the same clothes and varying in musculature and asked to rate the perceived threat of the bodies on a scale of 1 to 7.


Musculature check

We used a mixed-effects ordered logit to assess the influence of level of musculature on perceived threat (PT) in the body stimuli. The overall model was significant in predicting the variation in PT of the body stimuli, Wald χ2(1) = 270.19, p < .001. Consistent with previous research (McElvaney et al., 2021), we found a significant stepwise, linear impact of musculature, with each increment accompanied by an increase in PT, OR = 3.74, 95% CI [3.19, 4.37], p < .001.

Descriptive statistics

Mean accuracy (proportion of correct identifications) was found to be higher for faces (M = .57, SE = .01) than for bodies (M = .49, SE = .01). In both cases accuracy was lower than the accuracy in a pilot study (Pilot 2) where participants made no errors when they simply had to match the target stimulus to the correct option (with target still on screen). For faces and bodies in our task, accuracy was also still well above chance (0.2). No clear difference emerged in performance between those in the criminal-context condition (M = .54, SE = .01) and those in the no-context condition (M = .52, SE = .01). A breakdown of performance across the two conditions can be found in Table 1.

Table 1 Identification accuracy (proportion of correct responses correct)

Directional bias in central stimuli

No context

To assess directional biases in participants’ responses, we focused on no context trials using the central target stimulus (0SD Threat Face or 50% Musculature) that had equal probability of choices above or below the target (face/musculature). If the correct face/body was selected, this response was coded as a difference score of zero. If the participant overestimated (underestimated) the threat/muscle level by one or two levels, this difference was coded as a score of +1/+2 (−1/−2) respectively. To investigate our first hypothesis, we used one-sample t tests to assess whether the mean difference scores for the no context central stimuli deviated significantly from the correct target value of zero. Consistent with H1, for the bodies, we found that the mean selected body (M = 0.04, SE = 0.03) did not vary significantly from the target body, t(109) = .336, p = .738, d = .03. However, for the faces, the mean selected face (M = 0.09, SE = 0.04) was significantly higher in threat than the target face, t(109) = 2.37, p = .019, d = .22.

Bias across all stimuli

To test our second and third hypotheses, we calculated absolute difference scores for each level of facial threat and musculature. We then used mixed ANOVAs to test the influence of target facial threat/musculature (within-participant) and the presence of criminal context (between-participant) on absolute difference scores.Footnote 1 As the absolute difference score for the noncentral trials could vary from 0 to 4 (since foils could be a maximum of four steps away), while the potential absolute difference for the central target trials varied from 0 to 2 (since foils could be a maximum of two steps away), we did not analyze the within-participant difference between the central stimuli and extreme stimuli.

We found no effect of target facial threat or criminal context on absolute difference scores for the facial trials, nor any significant interaction. Similarly, an exploratory analysis of response accuracy (correctly identifying the target stimulus) to facial trials revealed no effect of target threat or context. We did find a main effect of musculature on absolute difference scores, F(1.72, 374.05) = 24.08, p < .001, partial η2 = .10. The absolute distance of the mean chosen body from the target was significantly higher for the 0% muscle targets (M =.77, SE =.05) than for the 100% muscle targets (M = .51, SE = .04), p = .004, d = .22 indicating better performance on the larger bodies (Fig. 4).

Fig. 4
figure 4

Plots of mean absolute difference of selected face from target face (left) and selected body from target body (right). Error bars represent ±1 SE

Similarly, an exploratory analysis of response accuracy to body trials revealed a significant effect of musculature, F(1.85, 403.38) = 20.39, p < .001, partial η2 = .09, with accuracy for target selection significantly higher for the 100% muscle targets (M = .58, SE = .03) than for the 0% muscle targets (M = .50, SE = .03), p = .020, d = .15. Contrary to H2 and H3, we found no effect of context, nor any interaction.

Experiment 1 discussion

Contrary to our predictions, we found that pairing faces and bodies with criminally charged contextual information had no effect on the recall of either the face or body. The lack of significant differences between groups is unlikely to be due to ceiling/floor effects. Indeed, if contextual information had biased recall on the faces or bodies, we would have expected accuracy to go down relative to the no context case, which was not the case. Rather, according to the CMEI, the activation of a biasing stereotype is contingent on the moderating influence of estimator variables and system variables (Osborne & Davies, 2014). One key estimator variable thought to impinge on successful eyewitness identification is delay, or the length of time between witnessing the crime and later attempting to identify the perpetrator (Behrman & Davey, 2001; Deffenbacher et al., 2008). Due to the use of an immediate memory paradigm in the current experiment, it is possible that such a distortion may not have had sufficient time to manifest. Furthermore, it is possible that employing a longer, more detailed crime vignette (like that used by Yang et al., 2019), along with a reduction in the number of target faces/bodies presented might increase the salience of the criminal context.

Experiment 2

To address these issues, we conducted a second experiment, broadly following the methodology of Ben-Zeev et al. (2014) and included a neutral-context condition so participants had a text to read in one of the control conditions. There were three different conditions: (a) description of a crime, (b) description of a neutral event, and (c) no context. Participants studied a target face/body. To add delay, they then completed a distractor task, followed by with an individually viewed face/body trial, where participants indicated whether the stimulus was identical to the target (yes/no task). Participants also rated their confidence in their response (see Appendix 2 for confidence measures results).

We hypothesized that performance in all 3 conditions would be better on easier foil trials (foils further/more distinct from the target) than on difficult foil trials (foils closer to the target) (H1). Second, as person information has been shown to facilitate facial memory, we predicted that participants would make more errors (overall) in the no-context condition than in the neutral condition (H2). Third, we expected that participants in the crime condition would make more errors of identification on the target trials than in the neutral condition (H3). Fourth, we predicted that participants in the crime condition only would show a directional bias in their responses, making more errors of identification for more threatening/muscular face/body foils than for less threatening/muscular face/body foils (H4). Crucially, we expected an interaction between context condition and threat/musculature, such that participants in the criminal-context condition would make more errors of identification on more threatening foil trials and less errors of identification on less threatening foil trials than those in the neutral context and no-context conditions (H5). Furthermore, we expected participants in the criminal-context condition to make more errors of identification on the most threatening foil trials and fewer errors of identification on the least threatening foil trials than those in the other two groups (H6).



For a mixed-methods design (ANOVA: repeated measures, between factors), assuming a small-medium effect size (Cohen’s f = 0.22), our power analysis indicated that a minimum sample size of 192 participants would be required to assess the effects of both context condition (three groups) and level of threat/musculature (six levels of musculature) on participant response accuracy with a power of .95 and alpha level of .05. For the current study, this was increased to 200. Participants were recruited in July 2020 from Prolific (52 males; age: M = 32.34 years, SD = 11.04) and paid £1.50 for taking part. The experiment was built in JavaScript using PsychoJS, an online variant of PsychoPy (Peirce et al., 2019) and hosted on Pavlovia.org.



We selected one identity stimulus (seven levels of threat) from Todorov et al. (2013) and converted the faces to greyscale. The neutral threat face (0SD) was the target stimulus while the remaining six variants of this identity face (−3, −2, −1, +1, +2, and +3SD) were foils.


The body stimuli were realistic CG human male figures, created using Daz Studio, whose faces were blurred and wearing identical clothing. Because only one body stimulus was presented on-screen at a time, this allowed us to use smaller step differences in musculature of 16.67%, resulting in stimuli that varied in seven levels of test musculature; 0%, 16.67%, 33.33%, 50%, 66.67%, 83.33%, 100% (henceforth referred to as Musculature Levels 1–7). We selected the central body (50% musculature/Level 4) to be the target body stimulus and the remaining six variants to be the foils.


We designed two crime vignettes based on those originally used in studies by Porter et al. (2010) and Mendelsohn and Sewell (2004; see Appendix 3). Crime Vignette 1 described a shop robbery resulting in a murder, while Crime Vignette 2 described a street mugging also resulting in a murder. We then created two neutral vignettes that matched the crime vignettes in length and setting. Neutral Vignette 1 described someone purchasing a winning lottery in a shop, while Neutral Vignette 2 described someone asking for directions on a street corner. In all conditions, one vignette was randomly allocated to the face stimuli and the other to the body stimuli


Participants were randomized into one of the three context conditions. In the crime and neutral conditions, they were presented with one of the vignettes and told it described actions by a person about to be presented on screen that they would need to identify later (no time limit on reading was imposed). Participants in the no-context condition were simply told to study the person on the screen for later identification. In all conditions, they were given 30 seconds to study the stimulus. To introduce a delay before the identification stage of the experiment, they then completed eight trials of a modified n-back distractor task where they were presented with a random sequence of numbers and, at random intervals, asked to input the previous three numbers from the sequence. This distractor task lasted for approximately 5 minutes. Following this, participants did the identification task in a block of 24 trials, using either a body-target stimulus or a face-target stimulus in a counterbalanced order across participants.

In a replication of the procedure of Ben-Zeev et al. (2014), on a given trial the presented stimulus was randomly selected to be either the (central value) target or one of five valued foils. Target and foils were randomly repeated four times resulting in 24 trials. When participants completed one block of 24 trials for one stimulus type (e.g., body), they were given a second context vignette (or none in the no-context condition) and repeated the same procedure with the other stimulus type (e.g., face).

In each block, a trial started with a fixation dot presented for 2,000 ms, followed by a stimulus for 500 ms, followed by a random noise mask for 1,000 ms. After the mask was extinguished, participants saw a response screen that asked them to answer Yes/No (using a key press) to the question “Did that face/body EXACTLY match the one you previously studied?” and then rate, on a scale from 1 to 100, how confident they were in their response (see Fig. 5 and Appendix 2 for confidence ratings).

Fig. 5
figure 5

Experiment 2 sample body trial in criminal-context condition. Participants read a detailed crime vignette, then studied the target stimulus for 30 seconds, followed by a distractor task (5 minutes). Each trial consisted of a fixation dot (2,000 ms), the presentation of a foil/target stimulus (500 ms), followed by a random noise mask (1,000 ms). Participants were then asked whether the presented stimulus matched the original target, and to rate their confidence in the decision. Participants completed 24 trials for each face/body target stimulus

To counteract a potential regression to the mean where participants may have recognized that the true target stimulus lay at the centre of the threat/musculature of the stimuli, each participant was only presented with foil trials consisting of five out of the six possible foil levels, randomly excluding either the most/least threatening face trials and most/least muscular body trials.


Face accuracy

We examined accuracy (proportion of correct responses, including hits and correct rejections) across the varying levels of test facial threat (−2SD, −1SD, 0SD, +1SD, and +2SD) between the three context conditions. Since participants were only exposed to one of the extreme face foils (−3SD/+3SD), responses to these extreme trials were analyzed separately. Accuracy was calculated for each threat level by averaging the participant’s performance across each of the four trials completed at that level (Fig. 6). We used mixed ANOVAs to assess the effects of context (between-subject) and test threat level (within-subject) on participant accuracy. Consistent with H1, we found a significant effect of test threat level on accuracy, F(2.92, 574.81) = 123.88, p < .001, partial η2 = .386. Bonferroni corrected pairwise comparisons indicated significant differences (p < .05) in accuracy between all threat levels. Accuracy was highest on the least threatening −2SD trials (M = .87, SE = .02), followed by the 0SD trials (M = .78, SE = .02), followed by +2SD trials (M = .71, SE = .02), −1SD trials (M = .57, SE = .02) and +1SD trials (M = .29, SE = .02). Contrary to H2–H4, we found no main effect of context nor any interaction between context and threat level.

Fig. 6
figure 6

Mean response accuracy for face (left) and body (right) in Experiment 2. Error bars represent ±1 SE

Finally, we conducted a two-way between groups ANOVA to examine the effects of context condition and facial threat on performance on the most/least threatening face trials (−3SD/+3SD). Again, we found a significant effect of facial threat, F(1, 194) = 7.66, p = .006, partial η2 = .038, with performance significantly higher on the least threatening trials (M = .97, SE = .02) than on the more threatening trials (M = .90, SE = .02). However, no main effect of context condition was found and, contrary to H5, no interaction between facial threat and context was seen.

Body accuracy

We examined accuracy on the body trials across the varying levels of musculature (Levels 2–6) and between the three context conditions. Responses to extreme body foils (Level 1/Level 7) were analyzed separately. Consistent with H1, we found a significant effect of musculature level on accuracy, F(2.70, 531.88) = 29.10, p < .001, partial η2 = .129, with highest accuracy when the stimulus test was at the target level (Level 4) (M = .70, SE = .02). Bonferroni corrected pairwise comparisons revealed significantly higher (p < .05) performance on the target level test trials than all other stimuli test levels except Level 6 (M = .60, SE = .03). Performance on Level 6 was higher than all other trials other than Level 2 (M = .57, SE = .03). Accuracy on Level 2 was significantly higher than that on Levels 3 (M = .40, SE = .02) and Level 5 (M = .40, SE = .02).

Moreover, we found a main effect of context, F(2, 197) = 4.42, p = .013, partial η2 = .043. In line with H2, performance in the neutral context condition (M = .57, SE = .02) was significantly higher than that in the no-context condition (M = .50, SE = .02), p = .012, d = .58. There was no significant interaction between context condition and musculature. Contrary to H3–H5, participants did not make more errors in the crime condition than the neutral condition on the target trials or on foil trials higher in musculature than the target.

Finally, a two-way between-groups ANOVA was conducted to examine the effects of context condition and musculature on performance on the most/least muscular body trials (Level 1/Level 7). However, we found no effects of context or musculature, nor a significant interaction.

Experiment 2 discussion

Consistent with the results of Experiment 1, we found that pairing targets (faces or bodies) with criminally charged contexts did not significantly bias participant recall towards larger/more threatening foils. Identification accuracy was similar in the crime condition to that in the neutral information condition, despite the more detailed crime vignette, and a crime description that only related to a single target rather than different targets.

A possible explanation is that we included target-present trials. It is possible that many cases of false eyewitness identification arise under conditions where the actual criminal is not present in the lineup. According to the CMEI, innocent people whose appearance more closely matches that of a stereotypical criminal will be more likely to be misidentified. For example, Shaw and Wafler (2016) found that participants who witnessed staged crimes were more likely to misidentify more muscular defendants as criminals in the absence of the target criminal. However, their study lacked nuanced control over the body muscularity of the defendant stimuli and included facial information. Furthermore, it lacked a no-crime condition, instead using two different types of crime scenario. Thus, it is difficult to conclude that it was the actual criminal context that drove their pattern of results as opposed to a general tendency to recall bodies as more muscular.

Experiment 3

We conducted a final experiment aimed at assessing the potential bias in target-absent lineups, using body stimuli only. As in Experiment 2, participants were presented with a target body linked with either a criminal context, a neutral context, or no context at all. Following a distractor task, they were asked to identify the original target from a selection consisting of lower and higher musculature foils only, presented together in a randomized lineup on screen.

As person information has been shown to facilitate facial memory, we predicted that participants would make smaller errors (choose bodies that more closely match the original target) in the neutral condition than in the no-context condition (H1). We also predicted that participants in the crime condition only would show a directional bias in their responses and select more muscular body foils than the target (H2). Finally, we expected that participants in the crime condition would significantly differ from the neutral and no-context conditions in the nature of the foil selected, such that they would select significantly more muscular body foils (H3).



Our power analysis indicated that, for a one-way between-groups ANOVA (ANOVA: fixed effects, omnibus, one-way), with a between-groups factor of 3 levels, assuming a medium effect size (Cohen’s f = .25) a minimum sample size of 252 participants would be required to assess the effects of context condition on participant foil selection with a power of .95 and alpha level of .05.

Participants (69 males, 183 females; age: M = 26.85 years, SD = 11.75) were recruited in January 2021. The sample consisted of a mixture of undergraduate psychology students who were awarded one course credit for taking part in the study (N = 95), wth the remainder (N = 157) recruited from Prolific and paid £0.90 for taking part. The experiment was built and hosted on Gorilla Experiment Builder (www.gorilla.sc; Anwyl-Irvine et al., 2019).

Stimuli (bodies)

The body stimuli were realistic CG human male figures created using Daz Studio, with appearance varying in levels of musculature. The target body stimulus was at the central point of the musculature scale (50%). Since the target stimulus was never tested at identification stage, we used foils that were closer to the target on the musculature scale, covering 12.5%, 37.5%, 62.5% and 87.5% musculature. The default faces on the bodies were blurred and each was dressed in identical clothing, as in Experiments 1 and 2.


Participants were randomized into one of three context conditions. In the crime and neutral conditions, they were presented with either Crime Vignette 2/Neutral Vignette 2 (as used in Experiment 2; see Appendix 3) and told it applied to the action of a person about to be presented on screen. There was no time limit on reading the vignette. They were then presented with the body target stimulus on screen for 30 seconds and asked to study it for later identification. To introduce a larger delay, participants completed a modified n-back task before the identification task which lasted approximately 10 minutes. The identification task consisted of a single trial, where they were presented with the four body foils in a randomized-order lineup on the screen and asked to identify the one matching the previously viewed target (Fig 7). Crucially, none of these bodies matched the original target, as this was a target-absent paradigm.

Fig. 7
figure 7

Experiment 3 procedure (criminal-context condition). Participants read a detailed crime vignette, then studied the target body stimulus (30 seconds), followed by a distractor task. Participants were then presented with a lineup of four foils and asked to select the one that matched the target


This was a target-absent paradigm with no correct response. We found that participants tended to choose more muscular foils than the original target rather than less muscular foils. Participants selected the least muscular foil in 7.5% of cases, and the foil one step below the target level musculature in 25.8% of cases. Conversely, they chose the foil one step above the target level musculature in 43.3% of cases, and the most muscular foil in 23.4% of cases. However, no clear difference emerged between vignette conditions in pattern of response (Fig. 8).

Fig. 8
figure 8

Proportion of foil selections in Experiment 3. Foil 1 represents the least muscular foil, and Foil 4 represents the most muscular foil

To analyze the potential differences in vignette conditions, we first assessed absolute differences between the three context groups in body selection. To do this, we calculated difference scores for each participant to reflect their chosen body. If the participant had selected the body foil one level above/below the original target musculature, they received a score of +1/−1 that were converted to absolute values to reflect a nondirectional error score. We then used a one-way between-groups ANOVA to investigate differences between the three groups in their errors. Contrary to H1, there were no significant differences between the three context groups, F(2, 249) = .664, p = .516, partial η2 = .005.

To assess whether any of the groups displayed a directional bias in their responses, separate one-sample t tests were conducted for each group on the directional error scores, testing whether mean responses deviated significantly from zero. In line with H2, a positive bias was found for those in the crime condition (M = .50. SE = .14), with participants tending to select foils significantly larger than the original target, t(83) = 3.68, p < .001, d = .40. However, contrary to H2, the same bias was seen in both neutral (M = .52, SE = .14), t(83) = 3.63, p < .001, d = .40, and no-context conditions (M = .45, SE = .15), t(83) = 3.08, p = .003, d = .34.

Finally, to assess differences between the groups in this bias, a one-way ANOVA on these directional differences scores was conducted. Contrary to H3, no differences were observed between the three groups, F(2, 249) = .065, p = .937, partial η2 = .001.

Experiment 3 discussion

In line with our results in Experiments 1 and 2, we found that target bodies presented with criminally charged contexts did not significantly impact participant recall. We found that participants selected significantly larger bodies than the target, but this biased response was common across the three context groups and so cannot be attributable to an activated criminal stereotype. This pattern may reflect Weber’s law (Burt, 1960), whereby our capacity to recognize a change in a stimulus is proportional to the size of the stimulus. In this case, the foil that was least discernible from the original target may have been the one slightly larger than the original, despite the absolute difference in mass between the target and the foil just below the target in musculature being equal to that between the target and the foil just above the target in musculature.

General discussion

We examined identification accuracy for contextually primed face and body stimuli in three preregistered studies. Despite the predictions of the CMEI, we found no evidence for biased identification of either faces or bodies when paired with criminally charged contexts. Participants viewing images of alleged violent criminals were no more likely to overestimate the facial threat or musculature of the target stimuli than those who studied the targets in empty or neutral contexts. These results suggest that, although errors of eyewitness identification can and do occur, they may not be driven by systematic biases related to how threatening a criminal is later recalled.

Our findings add to the growing skepticism around the field of priming. Conceptual priming broadly refers to the activation of mental concepts via situational cues (Bargh & Chartrand, 2014). This is often through prompted recall of past experience (Callan et al., 2014; Fowler et al., 2011) or as was the case here, prompting participants to think about specific concepts (Cohn et al., 2010; Cohn et al., 2015). This is then used to assess the influence of these primed mental concepts on subsequent behaviours and judgments. Here, we primed participants with the social identity of a target person as a violent criminal, (e.g., Ben-Zeev et al., 2014); however, we found no effect of priming on performance. Our results align with recent failed replications of several prominent priming studies (Yong, 2012). Some suggest that observed priming phenomena are simply the result of false positives and publication bias (Simmons et al., 2011), while others suggest that priming effects are, by their very nature, considerably subtle and thus highly sensitive to even slight variations in experimental methodology (Cesario, 2014). Admittedly, the current study examined the effect of priming on criminal appearance somewhat indirectly, using facial threat and body musculature as proxies of a stereotypically criminal appearance. However, given previous work linking threat and dominance to perceived criminality (Funk et al., 2017; Oosterhof & Todorov, 2008), these measures are useful in exploring the potential role played by context in eyewitness recall. As no effect of priming was observed here across three different studies with varying methodologies, it appears unlikely that priming participants with a criminal context affects their recall of a target’s body morphology or facial threat.

Despite the lack of priming effect and support for the CMEI, this study offers considerable novel advancements in the realm of body recall and identification. Body size is a factor often overlooked as a potential source of prejudice, particularly in legal contexts. Prior investigations have primarily focused on the accuracy of estimation of exact body dimensions, such as height and weight, with these errors often accounting for a high proportion of description errors in culprit recall. However, as suggested by Yuille and Cutshall (1986), we have instead asked participants to identify targets based on their body morphology. This study represents the first attempt, to our knowledge, to explore body identification accuracy with careful control over fine morphology changes.

Experiment 1 revealed that, under a simultaneous lineup presentation format, participants were significantly more accurate in identifying briefly presented target bodies when they were high in musculature relative to target bodies low in musculature. This aligns with earlier research that suggests that inherently threatening or negative stimuli are likely to be recalled more accurately than neutral or positive stimuli (e.g., Mattarozzi et al., 2015; Mealey et al., 1996; Oda, 1997; Rule et al., 2012). Thus, it may be in Experiment 1 that the most muscular bodies were recalled with the highest accuracy because they appeared more threatening, consistent with our manipulation check and earlier work (McElvaney et al., 2021).

In Experiment 2, the target body was common across all participants and did not vary in musculature. While not supporting the CMEI, our results support the idea that contextual information can significantly affect person processing (Wieser et al., 2014; Wieser & Brosch, 2012), with participants in the neutral information condition significantly outperforming those in the no-context condition. Although this does not inform whether prosocial or antisocial contexts assist more in later person identification, it does suggest that paired contexts in general may be more helpful that empty (no) contexts.

Taken together, the results from Experiments 1 and uggest that in the absence of contextual information, suggest that in the absence of contextual information, an identification enhancement effect is observed with participants identifying the most threatening (muscular) bodies with the highest level of accuracy. However, when paired with more detailed and salient information, participants are significantly more accurate in identifying target bodies paired with neutral information than bodies not paired with any contextual information. Overall, these findings indicate that memory for body morphology may be enhanced by either the threat-signal of the stimulus or by salient contextual information.

Several limitations should be mentioned in our studies. First, the current methodology lacks ecological validity. Rather than witness a live staged event or video footage of a target culprit committing a staged crime (Ihlebæk et al., 2003; Pozzulo et al., 2008), participants simply read a description of a crime paired with a still target image. Moreover, while a delay was introduced in Experiments 2 and 3, this may still have been too short to reflect the much longer delays of days or weeks between exposure and identification experienced by real eyewitnesses. Given the importance of understanding what factors may affect priming, we sought to increase control at the expense of ecological validity.

In this departure from a true eyewitness procedure, it could be argued that our paradigm was more comparable to a delayed matching task since participants were tasked with identifying the image they had previously studied. It is possible that had a more ecological presentation of criminal target been used, a biased identification may have been observed. Indeed, under real conditions, myriad estimator variables can impinge on later identification. These include the viewing conditions (duration, distance, lighting), distracting stimuli (loud noises, bright lights), and psychological elements (witness motivation, attention), which add uncertainty, leading to a breakdown of accurate sensory communication (Albright, 2017).

Visual uncertainty can play a key role in eyewitness misidentification and increases odds of the intrusion of bias. While some noise was introduced via short target presentation in Experiment 1 and distractor tasks in Experiments 2 and 3 (accuracy in the tasks was lower than in the pilots of pure matching), it is possible that this type of noise was insufficient to activate a bias that would have been amplified by the priming. However, even simple face matching tasks (with no memory component required) are not easy or immune to error (Bruce et al., 1999), In addition, those who excel at unfamiliar face matching tasks tend to also excel at unfamiliar face recognition (Bobak, Dowsett, & Bate, 2016b; Bobak, Hancock, & Bate, 2016a), hinting at important shared mechanisms behind the two processes. Finally, it is unclear that during eyewitness procedures, the witnesses are not trying to do a matching task between the faces in the lineup and the originally viewed face.

Second, regarding our face stimuli, we elected to use tightly controlled CG stimuli that systematically varied in perceived threat due to the close link between facial threat and criminality (Todorov & Oosterhof, 2011). While this provided good experimental control, CG face stimuli can lack the photorealism of real faces (Fan et al., 2012; Fan et al., 2014; Farid & Bravo, 2012), which may disrupt facial identification (Crookes et al., 2015). Despite this, such CG faces have been used extensively and successfully to measure social cognition using behavioural (Brambilla et al., 2021; Oh et al., 2019; Sakuta et al., 2018) and brain imaging methods (Cao et al., 2020; Pelphrey et al., 2003; Todorov et al., 2011). However, the use of CG stimuli may have introduced an additional extraneous confound. Due to the data-driven method by which these faces are created, they may inadvertently correlate with other judgments. In the case of threat, the faces perceived as low in threat may have also been perceived as low in masculinity/high in femininity (Hester et al., 2020). As a result, this may have resulted in easier trials at the lower end of the threat scale than at the higher end, as participants could more reliably reject those more feminine cases as distinct from the original target.

Finally, all stimuli used in this study were Caucasian. Therefore, conclusions drawn from the results are restricted to Caucasian defendants. It is possible that race may interact with body morphology in the perception of threat. For example, Hester and Gray (2018) found that simply being tall increased perceptions of threat for black males only. In addition, we did not record the race of the participants themselves, so there may be an influence of the “other-race” effect (O’Toole et al., 1994). However, we did not test participants on the perceived race of the stimuli, and it is possible that since they were computer generated, participants did not perceive them to be of a particular race.

To address the issues above, future studies could attempt to use morphing software to produce photo-realistic face stimuli (of varying races) that vary in perceived threat. Using a tool such as Psychomorph (Tiddeman et al., 2001) or Abrosoft Fantamorph, two real faces high and low in threat could potentially be morphed together to create a target face of medium threat level (Holtzman, 2011). This morphing process would then allow researchers to create many faces along such a threat continuum, (Holtzman, 2011). This morphing process would then allow researchers to create many faces along such a threat continuum, biasing effects of contextual information on facial memory. Furthermore, a more ecological presentation of the target criminal stimulus, introducing a greater degree of visual uncertainty, with longer delays between target exposure and later recognition, may more accurately test the presence of a bias in criminal recognition.


We find no evidence of bias in the recollection of targets as a result of contextual framing; neither bodies nor faces were recalled as larger or more threatening than they were. However, we do find that accuracy of body identification is dependent on body morphology. Participant identification accuracy was highest for the most threatening body stimuli high in musculature. In addition, identification improved when the body was paired with detailed neutral contextual information. Limitations notwithstanding, these findings expand the literature on the identification of body morphology. Furthermore, they can help to inform future eyewitness paradigm studies on holistic suspect identification in the wake of a witnessed mock crime.