Cognitive Biases in Criminal Case Evaluation: A Review of the Research

Psychological heuristics are an adaptive part of human cognition, helping us operate efficiently in a world full of complex stimuli. However, these mental shortcuts also have the potential to undermine the search for truth in a criminal investigation. We reviewed 30 social science research papers on cognitive biases in criminal case evaluations (i.e., integrating and drawing conclusions based on the totality of the evidence in a criminal case), 18 of which were based on police participants or an examination of police documents. Only two of these police participant studies were done in the USA, with the remainder conducted in various European countries. The studies provide supporting evidence that lay people and law enforcement professionals alike are vulnerable to confirmation bias, and there are other environmental, individual, and case-specific factors that may exacerbate this risk. Six studies described or evaluated the efficacy of intervention strategies, with varying evidence of success. Further research, particularly in the USA, is needed to evaluate different approaches to protect criminal investigations from cognitive biases.


Introduction
Decades of research in cognitive and social psychology have taught us that there are limitations to human attention and decision-making abilities (see, for example, Gilovich et al. 2002). We cannot process all the stimuli that surround us on a daily basis, so instead we have adapted for efficiency by attuning to patterns and developing mental shortcuts or rules of thumb to help us effectively navigate our complex world. While this tendency to rely on heuristics and biases can serve us well by allowing us to make quick decisions with little cognitive effort, it also has the potential to inadvertently undermine accuracy and thus the fair administration of justice.
Cognitive bias is an umbrella term that refers to a variety of inadvertent but predictable mental tendencies that can affect perception, memory, reasoning, and behavior. Cognitive biases include phenomena like confirmation bias (e.g., Nickerson 1998), anchoring (e.g., Tversky & Kahneman 1974), hindsight bias (e.g., Fischhoff 1975), the availability heuristic (e.g., Tversky & Kahneman 1973), unconscious or implicit bias related to race or other identifying characteristics (e.g., Greenwald et al. 1998; Staats et al. 2017), and others. In this context, the word "bias" does not imply an ethical failing (e.g., Dror 2020) but simply denotes a probable response pattern. Indeed, social scientists have demonstrated and discussed how even those who actively endorse egalitarian values harbor unconscious biases (e.g., Pearson et al. 2009; Richardson 2017) and how expertise, rather than insulating us from biases, can actually create them through learned selective attention or reliance on expectations based on past experiences (e.g., Dror 2020). Consequently, we recognize the potential for these human factors to negatively influence our criminal justice process.
In an effort to explore the role of cognitive biases in criminal investigations and prosecutions, we conducted a literature review to determine the scope of available research and strength of the findings. The questions guiding this exercise were as follows: (1)

Methods
We searched PsycINFO for scholarly writing focused on cognitive biases in criminal investigations and prosecutions in December 2016 and again in January 2020. 1 We reviewed all results by title and then reviewed the subset of possibly-relevant titles by abstract, erring on the side of over-inclusivity. We repeated this process using the Social Sciences Full Text, PubMed, and Criminal Justice Abstracts with Full Text databases to identify additional papers. Finally, we manually reviewed the reference lists in the identified papers for any unique sources we may have missed in prior searches.
We sorted the articles into categories by the actor or action in the criminal investigation and prosecution process that they addressed, including physical evidence collection, witness evaluation, suspect evaluation, forensic analysis and testimony, police case evaluation (i.e., integrating and drawing conclusions based on the totality of the evidence), prosecutors, defense attorneys, judges, juries, and sentencing. Within each of these categories, we further sorted the articles into one of three types of sources: "primary data studies" describing experimental or observational studies that involved data collection or analysis, "intervention studies" that were solution-oriented and involved implementing some type of intervention or training to prevent or mitigate a phenomenon, and "secondary sources" (e.g., commentaries, letters, reviews, theoretical pieces, general book chapters) that discussed cognitive biases but did not present primary data.
To narrow the scope of this review, we did not include articles that focus solely on implicit racial bias or structural racial bias in the criminal legal system. The foundational and persistent problem of racial (particularly anti-Black) bias throughout our legal system, from policing to sentencing (e.g., Voigt et al. 2017; NYCLU 2011; Blair et al. 2004; Eberhardt et al. 2006), has been clearly demonstrated in laboratory experiments and analyses of real-world data and is well documented in an ever-growing body of academic publications and policy reports (e.g., Correll et al. 2002; Chanin et al. 2018; Owens et al. 2017; Staats et al. 2017).

Scope of Available Research and Methodology
Cognitive biases in forensic science have received the most attention from researchers to date (for a review of these forensic science studies, see Cooper & Meterko 2019). The second largest body of scholarship focused on case evaluation (i.e., integrating and drawing conclusions based on the totality of the evidence in a case). Ultimately, we found 43 scholarly sources that addressed various issues related to the evaluation of the totality of evidence in criminal cases: 25 primary data (non-intervention) studies, five intervention studies, one additional paper that presented both primary data and interventions, and 12 secondary sources. For the remainder of this article, we focus solely on the primary data and intervention studies. One of the primary data studies (Fahsing & Ask 2013) described the development of materials that were used in two subsequent studies included in this review (Fahsing & Ask 2016;, and thus, this materials-development paper is not reviewed further here. Table 1 presents an overview of the research participants and focus of the other 30 primary data and intervention studies included in our review.
One challenge in synthesizing this collection of research is that these studies address different but adjacent concepts using a variety of measures and, in some instances, report mixed results. The heterogeneity of this research reflects the complex nature of human factors in criminal case evaluations.
Eighteen of the 30 papers (13 primary data and three intervention) included participants who were criminal justice professionals (e.g., police, judges) or analyzed actual police documents. An appendix provides a detailed summary of the methods and results of the 18 criminal justice participant (or document) studies. Fifteen papers were based on or presented additional separate analyses with student or lay participants. Recruiting professionals to participate in research is notoriously challenging, which makes these studies all the more commendable; professional samples allow us to identify any differences between those with training and experience and the general public, and to be more confident that conclusions will generalize to real-world behavior. Of course, representativeness (or the lack of it) must still be considered when making generalizations about police investigations.
Reported sample sizes ranged from a dozen to several hundred participants and must be taken into account when interpreting individual study results. Comparison or control groups and manipulation checks are also essential to accurately interpreting results; some studies incorporated these components in their designs while others did not. Most studies used vignettes or case materials, both real and fictionalized, as stimuli. Some studies did not include enough information about stimulus or intervention materials to allow readers to critically interpret the results or replicate an intervention test. Future researchers would benefit from publishers making more detailed information available. Further, while the use of case vignettes is a practical way to study these complex scenarios, this approach may not completely mimic the pressures of a real criminal case, fully capture how the probative value of evidence can depend on context, or accurately reflect naturalistic decision-making.
Notably, only two of the criminal case evaluation studies using professional participants were conducted in the USA; all others were based in Europe (Austria, Netherlands, Norway, Sweden, and the UK). The differences between police training, operations, and the criminal justice systems writ large should be considered when applying lessons from these studies to the USA or elsewhere.
Finally, all of these papers were published relatively recently, within the past 15 years. This emerging body of research is clearly current and relevant, and it has room to grow.

Research Findings
The primary data studies address a constellation of concepts that demonstrate how human factors can inadvertently undermine the seemingly objective and methodical process of a criminal investigation. To organize these concepts, we used as a guide a taxonomy originally developed to describe potential sources of bias in forensic science observations and conclusions (Dror 2017; Dror et al. 2017), and we adapted it to this collection of case evaluation literature. 2 As in Dror's taxonomy, the broad base of this organizing pyramid is "human nature," and as the pyramid narrows to its peak, potential sources of bias become increasingly dependent on environmental, individual, and case-specific circumstances and characteristics (Fig. 1). Some authors in this collection address more than one of these research areas within the same paper through multiple manipulations or a series of studies (Table 1).

Human Nature
The "human nature" studies include those that demonstrate universal psychological phenomena and their underlying mechanisms in the context of a criminal case evaluation. Several studies focused on confirmation bias. Confirmation bias, sometimes colloquially referred to as "tunnel vision," denotes selective seeking, recalling, weighting, and/or interpreting information in ways that support existing beliefs, expectations, or hypotheses, while simultaneously avoiding or minimizing inconsistent or contradictory information (Nickerson 1998; Findley 2012).
Notes to Table 1:
a "Police" participant studies were those that used police personnel (including police officers, criminal investigators, crime analysts, police trainees or recruits) or a review of police documents (including investigative case files or decision logs); "Students or community" participant studies were those that used undergraduate students, graduate students, law students, US citizens, or a general public/online sample.
b Fahsing and Ask developed materials for these studies in 2013 by conducting semi-structured interviews to elicit factors that could disrupt optimal decision-making in homicide investigations; used content analysis to develop categories of tipping points (naming, arresting, or charging a suspect; choice of main hypotheses or lines of inquiry) and related situational factors (availability of information/evidence, external pressure/community impact, internal pressure/organizational issues, time pressure) and individual factors (detective experience, training and education, personal characteristics).
c O'Brien's 2009 publication includes a subset of studies from her 2007 dissertation. Consequently, the numbering of studies in these two papers is different: published "Study 1" = dissertation "Study 2" (N = 108) and published "Study 2" = dissertation "Study 3" (N = 109).
Some authors in this collection of studies used other terms to describe this concept or elements of it, including "context effects," the term used by Charman et al. (2015) to describe when "a preexisting belief affects the subsequent interpretation of evidence" (p. 214), and asymmetrical skepticism (Ask & Granhag 2007b;Marksteiner et al. 2010).
Eight studies with law enforcement personnel (Ask & Granhag 2007b; Ask et al. 2008; Charman et al. 2017; Ditrich 2015; Groenendaal & Helsloot 2015; Marksteiner et al. 2010; Rassin 2010; Wallace 2015) examined aspects of confirmation bias; one addressed the distinct but related phenomenon of groupthink (Kerstholt & Eikelboom 2007). The importance of this issue was demonstrated by a survey conducted by Ditrich (2015), which asked an unspecified number of professional crime scene officers for their opinions about the relative frequency and severity of various cognitive errors that could negatively affect a criminal investigation; based on their experiences, respondents highlighted confirmation bias (as well as overestimating the validity of partial information and shifting the burden of proof to the suspect). The other studies within this group used experimental designs to assess police officers' evaluation of evidence. Charman et al. (2017) reported that police officers' initial beliefs about the innocence or guilt of a suspect in a fictional criminal case predicted their evaluation of subsequent ambiguous evidence, which in turn predicted their final beliefs about the suspect's innocence or guilt. This is not the only study to demonstrate that, like the rest of us, police officers are susceptible to confirmation bias. Ask and colleagues (2008) found that police recruits discredited or supported exactly the same evidence ("the viewing distance of 10 m makes the witness identification unreliable" versus "from 10 m one ought to see what a person looks like") depending on whether it was consistent or inconsistent with their hypothesis of a suspect's guilt. Ask and Granhag (2007b) found that when experienced criminal investigators read a vignette that implied a suspect's guilt (but left room for an alternative explanation), they rated subsequent guilt-consistent evidence as more credible and reliable than evidence that was inconsistent with their theory of guilt; Rassin (2010) found similar results in a study of police officers, district attorneys, and judges. Marksteiner et al. (2010) investigated the motivational underpinnings of this type of asymmetrical skepticism among police trainees, asking whether it is driven by a desire to reconcile inconsistent information with prior beliefs or by the goal of case closure, and obtained mixed results. The group who initially hypothesized guilt reacted as expected, rating subsequent incriminating evidence as more reliable, but the group whose initial hypothesis was innocence showed no difference in the way they rated additional consistent or inconsistent information. Wallace (2015) found that the order in which evidence was presented influenced guilt beliefs: when police officers encountered exculpatory evidence before inculpatory evidence, guilt belief scores decreased, suggesting that their final decisions were influenced by their initial impressions. Kerstholt and Eikelboom (2007) describe how teams tend to converge on one interpretation; once such an interpretation is adopted, individual members are less able to examine underlying assumptions critically. The researchers asked independent crime analysts to evaluate a realistic criminal investigation with fresh eyes and found that the analysts were demonstrably influenced when they were aware of the investigative team's existing working hypothesis.
Fig. 1 Organizational framework for case evaluation studies, adapted from Dror's (2017) taxonomy of different sources of potential bias that may cognitively contaminate forensic observations and conclusions. The specific factors listed in this pyramid are those that were examined in the collection of studies in the present literature review.
Studies in student and general populations examining confirmation bias and other aspects of human cognition (Ask et al. 2011b; Charman et al. 2015; Greenspan & Surich 2016; O'Brien 2007; Price & Dahl 2014; Rassin et al. 2010; Simon et al. 2004; Wastell et al. 2012) reported patterns similar to those described above with police participants. O'Brien (2007; found that students who named a suspect early in a mock criminal investigation were biased toward confirming that person's guilt as the investigation continued. O'Brien measured memory for hypothesis-consistent versus hypothesis-inconsistent information, interpretation of ambiguous evidence, participants' decisions to select lines of inquiry into the suspect or an alternative, and ultimate opinions about guilt or innocence. In a novel virtual crime scene investigation, Wastell et al. (2012) found that all students (both those who ultimately chose the predetermined "correct" suspect from the multiple available persons of interest and those who chose incorrectly) sought more information consistent with their chosen suspect during the exercise. However, those who were ultimately unsuccessful (i.e., chose the wrong person) spent more time in a virtual workspace (a measure of the importance placed on potential evidence) after accessing confirmatory information. They also found that students who settled on a suspect early in the exercise (as measured by prompts throughout the virtual investigation) were comparatively unsuccessful.
Other psychological phenomena, such as recency effects (i.e., the ease of recalling information presented at the end of a list relative to information presented at the beginning or middle) and the feature positive effect (i.e., our general tendency to attend to presence more than absence), were also examined in studies with student or general population participants. Price and Dahl (2014) explored evidence presentation order and found that, under certain circumstances, evidence presented later in an investigation had a greater impact on student participants' decision-making in a mock criminal investigation. Charman and colleagues also found that the order of evidence presentation influenced ratings of strength of evidence and likelihood of guilt in their 2015 study of evidence integration with student participants. These results appear to provide evidence against the presence of confirmation bias, but recency effects still demonstrate the influence of human factors because, arguably, the order in which one learns about various pieces of evidence (whether first or last) should not affect its interpretation. Several research teams found that a positive eyewitness identification is seen as more credible than a failure to identify someone (Price & Dahl 2014, p. 147) and that the presence of fingerprints, as opposed to a lack of fingerprints, is more readily remembered and used to make decisions about a criminal case, even though the absence of evidence can also be diagnostic. Other researchers highlighted our psychological discomfort with cognitive dissonance (Ask et al. 2011b) and our tendency to reconcile ambiguity and artificially impose consistency in a criminal case by engaging in "bidirectional coherence-based reasoning" (Simon et al. 2004; Greenspan & Surich 2016).

Environment and Culture
The three "environment and culture" studies with police personnel (Ask & Granhag 2007b; Ask et al. 2011a; Fahsing & Ask 2016) revealed ways in which external factors can influence an investigation. For instance, type of training appears to affect the ability to generate a variety of relevant hypotheses and actions in an investigation: English and Norwegian investigators are trained differently and performed differently when faced with semi-fictitious crime vignettes (Fahsing & Ask 2016). Organizational culture can affect the integrity of an investigation as well. Ask and colleagues (2011a) concluded that a focus on efficiency, as opposed to thoroughness, produces more cursory processing among police participants, which could be detrimental to the accurate assessment of evidence found later in an investigation. Ask and Granhag (2007b) observed that induced time pressure influenced officers' decision-making, producing a greater tendency to stick with initial beliefs and a lesser tendency to be influenced by the evidence presented.

Individual Characteristics
Seven "individual characteristics" studies with police personnel (Ask & Granhag 2005, 2007a; Dando & Ormerod 2017; Fahsing & Ask 2016; Kerstholt & Eikelboom 2007; Wallace 2015) plus two studies with student populations (Rassin 2010, 2018a) examined ways in which personal attributes can influence an investigation. Varying amounts of professional experience may matter when it comes to assessing potential criminal cases and making assumptions about guilt. For instance, police recruits appear to have a strong tendency toward criminal (as opposed to non-criminal) explanations for an ambiguous situation like a person's disappearance (Fahsing & Ask 2017), and less experienced recruits show more suspicion than seasoned investigators (Wallace 2015). In a departure from the typical mock crime vignette method, Dando and Ormerod (2017) reviewed police decision logs (used for recording and justifying decisions made during serious crime investigations) and found that, compared with inexperienced investigators, senior officers generated more hypotheses early in an investigation and switched between different hypotheses both early and late in an investigation, suggesting a willingness to entertain alternative theories. An experimental study, however, found that professional crime analysts' experience level (mean 7 months versus 7 years) was not related to case evaluation decisions and did not protect against knowledge of prior interpretations of the evidence influencing their conclusions (Kerstholt & Eikelboom 2007).
Two studies examined differences in reasoning skills in relation to the evaluation of evidence. Fahsing and Ask (2017) found that police recruits' deductive and inductive reasoning skills were not associated with performance on an investigative reasoning task. In contrast, in a study with undergraduate students, the accuracy of decisions about guilt or innocence in two case scenarios was associated with differences in logical reasoning ability, as measured by a test adapted from the Wason Card Selection Test (Rassin 2018a). Ask and Granhag (2005) found inconsistent results in a study of police officers' dispositional need for cognitive closure and its effect on criminal investigations. Officers with a high need for cognitive closure (measured with an established scale) were less likely to acknowledge inconsistencies in case materials when those materials contained a potential motive for the suspect, but were more likely to acknowledge inconsistencies when made aware of the possibility of an alternative perpetrator. In a replication study with undergraduate students, Ask and Granhag (2005) found that initial hypotheses significantly affected subsequent evidence interpretation but found no interaction with individual need for cognitive closure. Students who were aware of an alternative suspect (compared with those aware of a potential motive for the prime suspect) were simply less likely to evaluate subsequent information as evidence supporting guilt.
In another study, when Ask and Granhag (2007a) induced negative emotions in police officers and then asked them to make judgments about a criminal case, sad participants were better able to substantively process the consistency of evidence or lack thereof, whereas angry participants used heuristic processing.

Case-Specific
Four studies of police personnel (Ask et al. 2008; Fahsing & Ask 2016; Wallace 2015), one using police records (Dando & Ormerod 2017), and three studies of student populations (Ask et al. 2011b; O'Brien 2007; Rassin et al. 2010) examined "case-specific" and evidence-specific factors. In a study of police officers, Ask and colleagues (2008) showed that the perceived reliability of some types of evidence (DNA versus photographs versus witnesses) is more malleable than that of others; similar results pertaining to DNA versus witness evidence were found in a study of law students (Ask et al. 2011b). Fahsing and Ask (2016) found that police recruits who were presented with a scenario including a clear "tipping point" (an arrest) did not produce significantly fewer hypotheses than those who were not presented with a tipping point, though they acknowledge that the manipulation (one sentence embedded in a case file) may not have been ecologically valid. In a subsequent study with police recruits, the presence of a tipping point was associated with fewer generated hypotheses, but the difference was again not statistically significant (Fahsing & Ask 2017).
Other studies using law students or undergraduate students (O'Brien 2007) examined the influence of crime severity on decision-making. Rassin et al. (2010) observed that the preference for incriminating evidence increases with crime severity, but in one of O'Brien's (2007) studies, crime severity did not have a demonstrable impact on confirmation bias.

Interventions
Taken together, this body of work demonstrates vulnerabilities in criminal investigations. Some researchers have suggested theoretically supported solutions to protect against these vulnerabilities, such as gathering facts rather than building a case (Wallace 2015) or institutionalizing the role of a "contrarian" in a criminal investigation (MacFarlane 2008). Few studies have tested and evaluated these potential remedies, however. Testing is an essential prerequisite to any advocacy for policy changes because theoretically sound interventions may not, in fact, have the intended effect when applied (e.g., see below for a description of O'Brien's work testing multiple interventions with differing results).
Four studies have examined various intervention approaches with police departments or investigators (Groenendaal & Helsloot 2015; Jones et al. 2008; Rassin 2018b; Salet & Terpstra 2014). Jones et al. (2008) created a tool that helped an experimental group of investigators produce higher-quality reviews of a closed murder case than those working without the aid of the review tool. Their article provides an appendix with "categories used in the review tool" (e.g., crime scene management, house-to-house enquiries, community involvement) but lacks a detailed description of the tool itself and the outcome measures. Importantly, the authors raise the possibility that a review tool like this may improve how officers think through a case either because of the tool's structure or content or simply because it slows them down so they can think more critically and thoroughly. Another approach that shows promise in reducing tunnel vision is using a pen-and-paper tool to prompt investigators to consider how well the same evidence supports different hypotheses (Rassin 2018b). In a study of actual case files, supplemented with interviews, Salet and Terpstra (2014) explored "contrarians" and found that there are real-world challenges to the position's efficacy (e.g., personal desire to be a criminal investigator, desire for solidarity with colleagues) and considerable variability in the way contrarians approach their work, with some opting for closeness to an investigation and others opting for distance; individuals also embraced different roles (e.g., supervisor, devil's advocate, focus on procedure). The researchers concluded that, in practice, these contrarians appear to have exerted subtle influence on investigations, but there is no evidence of a radical change in case trajectory.
Similarly, members of criminal investigation teams in the Netherlands reported that, in practice, designated devil's advocates tend to provide sound advice but do not fundamentally change the course of investigations (Groenendaal & Helsloot 2015). Groenendaal and Helsloot describe the development and implementation of the Criminal Investigation Reinforcement Programme in the Netherlands, which was prompted by a national reckoning stemming from a widely publicized wrongful conviction. The program included new policies aimed at, among other things, reducing tunnel vision (including the use of devil's advocates, structured decision-making around "hypotheses and scenarios," and professionalized, permanent "Command Core Teams" dedicated to major crimes). This deliberate intervention provided an opportunity for researchers to interview investigators who were directly affected by the new policies. Groenendaal and Helsloot conclude that the main effect of this intervention was an increased awareness of the potential problem of tunnel vision, and they focus on an unresolved tension between "efficacy" (more convictions) and "precaution" (minimizing wrongful convictions). Their work underscores the importance of collecting criminal legal system data, as interviewees reported their experiences and impressions but could not report whether more correct convictions had been obtained or more wrongful convictions avoided.
Other studies have examined various intervention ideas with student populations (Haas et al. 2015; O'Brien 2007;). Haas et al. (2015) found that using a checklist tool to evaluate evidence appears to improve students' abductive reasoning and reduce confirmation bias. O'Brien (2007; found that orienting participants toward being accountable for good process versus good outcome had no impact, and that when participants expected to have to persuade someone of their hypothesis, this anticipation actually worsened bias. More promisingly, she discovered that participants who were asked to name a suspect early in an investigation, but were then told to consider how their selected suspect could be innocent and to generate counter-arguments, displayed less confirmation bias across a variety of measures (they looked the same as those who did not name a suspect early). However, another approach, asking participants to generate two additional alternative suspects, was not effective (these participants showed the same amount of bias as those who identified just one suspect).
Zalman and Larson (2016) have observed "the failure of innocence movement advocates, activists, and scholars to view the entirety of police investigation as a potential source of wrongful convictions, as opposed to exploring arguably more discrete police processes (e.g., eyewitness identification, interrogation, handling informants)" (p. 3). While the thorough examination of these discrete processes has led to a better understanding of risk factors and, ultimately, to reforms in police practices (e.g., see the Department of Justice 2017 guidelines for best practices with eyewitnesses), a recent shift toward viewing wrongful convictions from a "sentinel events" 3 perspective advances the conversation around these criminal justice system failures (Doyle 2012, 2014; Rossmo & Pollock 2019).

Discussion
This literature review has identified a body of research that lends support to this holistic perspective. The studies reviewed here address a constellation of concepts that demonstrate how the human element (including universal psychological tendencies, predictable responses to situational and organizational factors, personal factors, and characteristics of the crime itself) can unintentionally undermine truth-seeking in the complex evidence integration process. Some concepts are addressed by one study, some are addressed by several, and some studies explored multiple variables (e.g., demonstrating the existence of confirmation bias and measuring how level of professional experience plays a role).
Several contemporary studies have demonstrated the existence of confirmation bias in police officers within the context of criminal investigations. Other psychological phenomena have not been examined in police populations but have been examined in student or general populations, using study materials designed to assess the interpretation of criminal case evidence and decision-making. This collection of studies also investigates the role of environmental factors that may be specific to a department or organization, of characteristics of individual investigators, and of characteristics of the specific case under review. At the environmental level, type of training and organizational customs were influential; these are promising areas for further research because they are within the control of police departments and can be modified. With respect to individual characteristics, a better understanding of advantageous dispositional tendencies, of what is gained through professional experience, and of the unique risks of expertise could lead to better recruitment and training methods. Case-specific factors are outside the control of investigators, but awareness of the factors that pose a greater risk of bias could serve as an alert, and future research could identify ways to use this information in practice (see also Rossmo & Pollock 2019 for an in-depth discussion of "risk recipes").
Charman and colleagues (2017) present a particularly interesting illustration of the way in which a criminal case is not merely the sum of its parts. In this study, the researchers presented law enforcement officers with exonerating, incriminating, or neutral DNA or eyewitness evidence, collected their initial beliefs about guilt, asked them to evaluate a variety of other ambiguous evidence (alibi, composite sketch, handwriting comparison, and informant information that could reasonably be interpreted in different ways), and then asked them to provide a final rating of guilt. As hypothesized, those who were primed with incriminating evidence at the beginning were more likely to believe the suspect guilty at the end. However, even those who initially received exonerating information, and initially rated the likelihood of the suspect's guilt as relatively low, increased their guilt rating after reviewing the other ambiguous evidence. It appears that the cumulative effect of ambiguous evidence tilted the scales towards guilt. This unexpected outcome underscores the value of understanding how the totality of evidence in a criminal case is evaluated, and it has implications for the legal doctrine of "harmless error," which is rooted in assumptions of evidentiary independence (e.g., Hasel & Kassin 2009).
Consistently incorporating control groups into future study designs and including complete stimulus materials in future publications could build on this foundation. This would help future researchers fully interpret and replicate study results and would assist in determining which elements of intervention strategies work. Since the majority of these studies were conducted in Europe, it would be worthwhile to explore whether these results can be replicated in the USA, given the similarities and differences between European and US criminal justice systems and the variety of approaches used to select and train detectives across police departments. Finally, valuable future research will move beyond demonstrating these human vulnerabilities to designing and testing strategies that mitigate them in the complex real world.⁴
4 Vignettes and mock-investigations are clever ways of studying criminal investigations, but it is worth remembering that these approaches cannot fully capture the dynamics of a real criminal investigation. As Snook and Cullen (2008) assert, "it is unrealistic to expect police officers to investigate all possible suspects, collect evidence on all of those suspects, explore all possible avenues concerning the circumstances surrounding a crime, search for disconfirming and confirming evidence of guilt for every suspect, and integrate all of this information" (p. 72). Dando and Ormerod (2017) illustrate this real-world complexity when they describe an investigation that was delayed because a call for tips led to a flood of false leads, suggesting that more information is not always better. Further, though it addresses procedural justice in street policing rather than evidence integration in a criminal investigation (and thus was not included in this review), Owens et al. (2018) provide an example of a field study, complete with published scripts. Recognizing the automatic thinking and behavior that come with job experience, these researchers tested an intervention intended to reduce the number of incidents resolved with arrests and use of force, implementing a training program that encouraged beat officers to think more slowly and deliberately during routine encounters; they also assessed the cost of this intervention to the police department. Collaboration between academic researchers and criminal investigators could generate robust expansions of this work.
Evidence evaluation and synthesis in criminal investigations is, of course, just one part of a larger legal process. In addition to police, defense attorneys, prosecutors, and judges have powerful roles in determining case outcomes, especially in a system that is heavily reliant on plea bargaining. Critically addressing the potential influence of cognitive biases throughout this system, and promoting and implementing proven, practical protections against these tendencies, will advance accuracy and justice.

• N=49 criminal investigators, mean experience 10 years, Sweden
• Explanatory Measures: Need for cognitive closure induced by time pressure (assignment to one of two groups: unlimited time to complete the task, or a 20-minute limit, which was less than the median time taken by participants in the comparison group); case scenario version (random assignment not specified).
• Primary Outcome Measures: Pre- and post-eyewitness-evidence ratings of guilt or innocence (dichotomous), confidence, strength of evidence, and adequacy of evidence for prosecution; ratings of witness credibility, conditions for making reliable observations, impact of the witness's emotional reaction, effect of a 7-day delay in the witness's report to police, weight of witness evidence in relation to other evidence, and agreement of witness evidence with other evidence (all 9-point Likert scales).
• Results (Human Nature; Environment and Culture): Lower ratings of witness reliability, witnessing conditions, retention interval, and weight of witness evidence for a witness whose evidence contradicted the suspect's guilt (exonerating) than for one whose evidence supported it (incriminating). No significant differences were seen in ratings of witness credibility or impact of witness emotions, though the patterns were in the same direction. There was little difference in asymmetrical skepticism between the low- and high-time-pressure groups, though the greatest response to information conveyed in the witness statement was found in the low-time-pressure condition.

Ask, Rebelius, and Granhag (2008)
• Materials: Homicide case description (vignette c) suggesting the guilt of a suspect apprehended near the scene, plus one additional type of evidence (DNA, eyewitness, or photo) that was either consistent or inconsistent with a guilty conclusion.
• N=117 police trainees, Sweden
• Explanatory Measures: Type of evidence (random assignment); consistent or inconsistent evidence version (random assignment).
• Primary Outcome Measures: Ratings of guilt probability and strength of evidence before and after review of the additional evidence; ratings of the reliability of the specific evidence provided and of the reliability of that evidence type in general (all 9-point Likert scales).
• Results (Human Nature; Case-Specific): Evidence consistent with guilt was rated as more reliable than evidence inconsistent with guilt, across types of evidence. Type of evidence: DNA evidence was rated as more reliable than photo or witness evidence (p < 0.01), with no difference between photo and witness evidence. Similar results were seen for ratings of each evidence type in general. Results provide evidence for asymmetrical skepticism in criminal investigations.

Ask, Granhag, and Rebelius (2011a)
• Materials: Description of a "good investigator" that emphasized efficiency, thoroughness, or neither; then an assault case description (vignette b) suggesting that the victim's father was guilty, plus two witness statements. The second witness statement was either consistent or inconsistent with the hypothesis that the victim's father was responsible.
• Primary Outcome Measures: Task completion speed (relative to other participants); ratings of guilt probability and strength of evidence (1 to 7 Likert scales), combined into a composite score; ratings of their own case processing with respect to judgment spontaneity, cognitive effort, difficulty deciding guilt, and confidence (1 to 7 Likert scales).
• Results (Environment and Culture): The efficiency versus thoroughness emphasis was associated with the composite guilt rating, with an interaction by type of witness evidence; guilt ratings in the thoroughness group were most influenced. Results provide evidence that salient social norms influence information processing in criminal investigations.

Charman, Kavetski, and Mueller (2017)
• Materials: Computer-administered homicide case description with either DNA or eyewitness evidence that was incriminating, exonerating, or neutral. After making an initial rating of the likelihood of guilt, all participants reviewed additional ambiguous alibi, facial composite, handwriting, and informant evidence in a randomized order.
• N=89 police officers, mean experience 20 years, United States
• Explanatory Measures: Type of evidence (DNA or eyewitness) and its incriminating, exonerating, or neutral nature (random assignment).
• Primary Outcome Measures: Initial rating of probable guilt (1 to 100 scale) after reviewing the case description; ratings of each additional type of evidence (1 to 7 Likert scale for strength of alibi; 1 to 100 scales for similarity of the facial composite to the suspect, similarity of the handwriting to the suspect, and the extent to which the informant evidence implicated the suspect); final rating of probable guilt (1 to 100 scale) after reviewing all of the additional evidence.
• Results (Human Nature): Type of evidence interacted with its interpretation: initial guilt ratings were higher with incriminating DNA evidence than with incriminating eyewitness evidence, and lower with exonerating DNA evidence than with exonerating eyewitness evidence. The strength of the initial guilt belief influenced the subsequent evaluation of evidence, except for alibi strength. The cumulative effect of ambiguous evidence was to increase the perception of guilt: all final guilt ratings were higher than initial guilt ratings, except among those who began with incriminating DNA evidence, perhaps indicating a ceiling effect. Results provide evidence of confirmation bias in the evaluation of ambiguous criminal evidence.

Dando and Ormerod (2017)
• Materials: Randomly selected decision-log entries from two police forces in which the entry concerned a crime, the detective made clear a preferred course of action, and a reason was given for following it. Case timelines were plotted, noting when hypotheses were generated and tested.
• N=60 police decision logs, United Kingdom
• Explanatory Measures: Police officer experience level (mean 10 versus mean 2 years).
• Primary Outcome Measures: Number of log entries and length of investigation; number of hypotheses generated, number of evidence sources examined to support these hypotheses, and the order in which hypotheses were generated; ratio of horizontal to vertical activity transitions (indicating the number of hypotheses being examined: > 1 = multiple lines of inquiry; < 1 = "satisficing," or focusing on a single line of inquiry).
• Results (Individual Characteristics; Case-Specific): A similar number of log entries was made each week at both experience levels. The number of hypotheses generated was highest in the first quartile and was higher for more experienced investigators. The mean number of evidence sources opened was highest at the beginning of the investigation, with little difference by experience level. Experienced investigators explored multiple hypotheses at the beginning and at the end of the investigation, while less experienced investigators focused on a single hypothesis throughout. Results suggest that the use of decision logs varies by type of case and the officer involved, but this documentation reveals differences in investigative decision-making by professional experience level throughout an investigation.

Ditrich (2015)
• Materials: List of cognitive errors adapted from the medical literature to the criminal justice context, presented verbally.
• "Small" number of experienced crime scene officers, Austria
• Explanatory Measures: Not applicable.
• Primary Outcome Measures: Opinions about frequency of appearance (5-point Likert scale from "never" to "very often"), as well as the concepts selected as having the strongest adverse effect in practice.
• Results (Human Nature): An "inside view" from experienced crime scene officers suggested that the most common and potentially detrimental cognitive errors included confirmation bias, anchoring, and shifting the burden of proof from the investigator to the suspect.

Fahsing and Ask (2016)
• Materials: Two missing person case descriptions (vignettes d) that did or did not contain a tipping point (a decision to arrest a particular suspect); participants were asked to write down investigative hypotheses and actions to be taken.
• Results: The presence of a tipping point was not associated with the generation of hypotheses or actions ("case-specific"), but the researchers saw an interaction between experience level ("individual characteristics") and type of training ("environment and culture"). In regression analyses, no association was seen between the number of "gold standard" hypotheses generated in either version of the cases and the inductive or deductive reasoning score. An earlier tipping point (arrest) did not significantly decrease the number of "gold standard" hypotheses generated, but there were trends in that direction.

Groenendaal and Helsloot (2015)
• Materials: Historical record of "the Schiedammer Park murder" in the Netherlands; semi-structured group interviews with leaders and coordinators of Command Core Teams from police forces across the country.
• Primary Outcome Measures: Investigators' self-reported experiences with various elements of the new Programme, as well as "efficacy" (i.e., "number of solved crimes") and "precaution" (i.e., "minimisation of the chance of wrongful conviction").
• Results (Intervention): Narrative description of policies produced in response to the Schiedammer Park murder and the subsequent Posthumus Commission. Interviewees reported that the Major Investigation Team model improved investigations, but did not know whether more crimes were solved as a result. They reported both pros and cons of permanent positions and of record-keeping about the hypotheses and scenarios in the decision-making process. They felt positively about the devil's advocate system but found that it generally confirmed the direction already being pursued and did not identify flaws. They reported that, compared to previous years, the culture had become more open. The study authors identified tension between the concepts of efficacy and precaution, and concluded that the main result of the Programme was increased awareness about (but no measurable elimination of) tunnel vision.

• Materials: … providing best practices for 31 categories of investigative activities; the control group received no instructions but could consult the ACPO MIM or other materials.
• N=12 police officers (6 experienced and 6 inexperienced investigators), United Kingdom
• Intervention: Review Tool or control group (random assignment); officer experience level (experienced = senior investigating officers; inexperienced = little or no murder investigation experience).
• Primary Outcome Measures: Amount of information produced in each investigation review; content analysis, with each comment rated for usefulness (1 to 7 Likert scale); amount of time to complete the review.
• Results (Intervention): The Review Tool produced an increased quantity (amount not specified) and quality (37% higher) of information, and took longer to complete (approximately 33% longer). Experienced officers in both the intervention and control groups produced higher quantity and quality work. Results show that both experience and the tested Review Tool helped, though the mechanism is unclear: the Review Tool might help officers think more thoroughly and critically because of its content and structure, or it might simply succeed by slowing officers down so they can think more thoroughly and critically.

Kerstholt and Eikelboom (2007)
• Materials: Two realistic case scenarios (Case 1: possible sex trafficking; Case 2: disappearance of a young woman). Half of the participants received a plausible, but not the most likely (based on pilot testing), prior interpretation for each case (Case 1: "Rodriguez" played a key role; Case 2: the missing woman's father played a role in her disappearance).
• N=38 crime analysts, Netherlands
• Explanatory Measures: Knowledge or no knowledge of a prior plausible but unlikely interpretation; experience level (mean 7 years versus 7 months).
• Primary Outcome Measures: For Case 1, description of the role of each person and the most likely scenario, ranking of sources of information and their importance, noted missing information, and suggestions for further investigation. For Case 2, number and type of possible explanations generated, conclusion about the most likely hypothesis, and suggestions for further investigation.
• Results (Human Nature; Individual Characteristics): Case 1: Knowledge of the prior interpretation was associated with identifying "Rodriguez" as a "key player," with the evidence mentioned, and with suggestions for further investigation, but not with the ranking of information sources or reports of missing information. Case 2: Knowledge of the prior interpretation was associated with including the father scenario as a possible explanation, as the most likely explanation, and as a suggestion for further investigation, but not with the number of explanations generated. In both cases, experience level was not associated with decision-making. Groupthink and confirmation bias were demonstrated: analysts who are privy to an investigative team's working hypothesis suggest that hypothesis as the most likely at a higher rate than analysts who have access only to the facts of the case, without a prior interpretation. Novices and experts performed the same way.

Marksteiner et al. (2010)
• Materials: Computer-administered homicide case description (vignette c) suggesting the guilt or innocence of the suspect, plus incriminating or exonerating eyewitness evidence.
• N=107 police trainees, Sweden
• Explanatory Measures: Eyewitness evidence consistent or inconsistent with initial beliefs (random assignment).
• Primary Outcome Measures: Ratings of strength of evidence and probability of guilt (both 1 to 9 Likert scales) and conviction decision (dichotomous), before and after review of the witness information; ratings of the reliability of the specific witness evidence provided and of the reliability of witness evidence in general (1 to 9 Likert scales) after review of the witness information.
• Results (Human Nature): Asymmetrical skepticism among participants with a guilty hypothesis: incriminating witness evidence (both case-specific and in general) was viewed most favorably by those who initially considered the suspect guilty; the same pattern did not emerge for those who initially considered the suspect innocent.
Rassin (2010)
• Materials: Homicide case description (vignette a) that included information about a named female suspect's motive and information about an alternative male suspect.
• Study 2: N=45 police officers, district attorneys, and judges, Netherlands
• Intervention: Pen-and-paper exercise rating the degree to which each of 20 case details indicated the named suspect's guilt (10-point scale; 0 = exonerating, 10 = incriminating), probability of guilt (0 = definitely innocent to 100 = definitely guilty), and conviction decision (yes/no). Participants were then told to imagine that the unnamed suspect had committed the crime and asked to rate the degree to which each of the 20 case details indicated the alternative suspect's guilt (10-point scale; 0 = exonerating, 10 = incriminating). After this exercise, they rated the probability of the named suspect's guilt a second time (0 = definitely innocent to 100 = definitely guilty) and made a final conviction decision (yes/no).
• Primary Outcome Measures: Guilt estimates and conviction rates before and after the pen-and-paper and imagination exercise.
• Results (Intervention): Ratings of the extent to which the evidence fit the named-suspect hypothesis and the alternative-suspect hypothesis were similar. Guilt estimates decreased with the intervention. Results provided some evidence in support of, and some evidence against, the presence of confirmation bias.

Salet and Terpstra (2014)
• Materials: In response to a wrongful conviction in a child sex abuse and murder case in the Netherlands, a national commission recommended the use of a critical review protocol (including the use of contrarians) for complex criminal investigations. Researchers assessed the implementation and results of this critical review procedure through a review of real case files and review dossiers from five different police forces, and through interviews with lead investigators and contrarians.
• N=26 case files and dossiers; interviews with 47 leaders of investigative teams and "contrarians," Netherlands
• Intervention: Critical review procedures introduced into police work.
• Results: Most police forces were practicing critical reviews in 2011, but the role of contrarians varied between forces. Some contrarians used a closeness strategy (actively involved in the investigation), while others used a distance strategy (reviewing decisions after they were made). Critical reviews had concrete effects on criminal investigations but did not radically change the direction of any case.

Wallace (2015)
• Materials: Computer-administered sexual assault case vignette (with either a child or an adult victim), including 10 items of evidence (presented simultaneously, sequentially, or in reverse-sequential order). Participants rated their confidence in guilt or innocence after every piece of evidence and provided an overall, final decision about guilt or innocence.
• N=166 police officers (basic training recruits, patrol officers, and criminal investigators), United States
• Explanatory Measures: Presumed emotional impact of case scenario version (random assignment); professional experience (recruit, patrol, investigator); evidence presentation order, with exculpatory evidence presented early or late (random assignment).
• Primary Outcome Measures: Confidence in the suspect's guilt or innocence (0 to 10 Likert scale); final decision about guilt or innocence (yes/no).
• Results (Human Nature; Individual Characteristics; Case-Specific): The age of the sexual assault victim did not influence guilt judgments, so the results did not support the hypothesis that extreme emotion impacts guilt judgments (or suggest that victim age is not an effective manipulation of extreme emotion). The order of evidence presentation did matter, demonstrating confirmation bias: when exculpatory evidence was viewed before inculpatory evidence, guilt belief scores decreased significantly. Finally, confirmation bias was greater among police recruits (i.e., those with the least professional experience).

a Homicide case vignette was the same as the others with this designation.
b Assault case vignette was the same as the others with this designation.
c Homicide case vignette was the same as the others with this designation.
d Missing person case vignettes were the same as the others with this designation.
e The second study reported in this article used undergraduate student participants.