Introducing EMMIE: an evidence rating scale to encourage mixed-method crime prevention synthesis reviews
This paper describes the need for, and the development of, a coding system to distil the quality and coverage of systematic reviews of the evidence relating to crime prevention interventions. The starting point for the coding system concerns the evidence needs of policymakers and practitioners.
The proposed coding scheme (EMMIE) builds on previous scales that have been developed to assess the probity, coverage and utility of evidence both in health and criminal justice. It also draws on the principles of realist synthesis and review.
The proposed EMMIE scale identifies five dimensions to which systematic reviews intended to inform crime prevention should speak. These are the Effect of intervention, the identification of the causal Mechanism(s) through which interventions are intended to work, the factors that Moderate their impact, the articulation of practical Implementation issues, and the Economic costs of intervention.
Systematic reviews of crime prevention, and the primary studies on which they are based, typically address the question of effect size, but are often silent on the other dimensions of EMMIE. This lacuna of knowledge is unhelpful to practitioners who want to know more than what might work to reduce crime. The EMMIE framework is intended to encourage the collection of primary data regarding these issues and the synthesis of such knowledge in future systematic reviews.
Keywords: Systematic reviews; Evidence quality; Evidence standards; Realist review; Meta-analysis
The volume of research relating to crime prevention is enormous, but of varying quality. Policymakers and practitioners who want to improve their decisions by drawing on evidence thus face a variety of problems. These include, for example, finding the evidence, assessing its quality, working out which evidence is relevant to their issues, and persuading stakeholders that policy and practice should accord with what the evidence suggests.
Systematic reviews (SRs) have emerged as a method for finding, sifting, sorting and synthesizing the findings of primary evaluations relevant to particular interventions. Methods have been developed for the conduct of SRs, including the process of selecting studies for analysis, and the statistical meta-analytic procedures used to summarize the overall impact(s) of intervention. Despite this, just like the primary evaluations on which they are based, SRs vary in quality and do so in ways that should be considered by those involved in evidence-based policy.
Building on earlier work concerned with primary studies (Perry et al. 2010; Sidebottom and Tilley 2012), this paper focuses on the assessment of the evidence quality of SRs, and provides guidance for the conduct of future ones. Measures of effect size are discussed, but consideration is also given to other dimensions of importance to practitioners—the intended primary consumers of SRs. These include what an intervention actually comprises and the ease with which it can be implemented. While the work reported is primarily focused on SRs, many of the issues are equally germane to primary studies.
In what follows, we first consider existing efforts that have provided the means to assess the quality of evaluation evidence. We draw on research from public health and medicine as well as crime prevention. Next, we consider what practitioners need to know. SRs that most adequately attend to all of the issues of importance will be more valuable to practitioners and so we present a rating scale¹ designed to enable the systematic assessment of the quality of SRs of crime prevention initiatives, and to inform future ones.
Existing scales for assessing the evidence base
Scholars have noted that evaluations and clinical trials vary in quality, and that their reporting is often incomplete (e.g., Adetugbo and Williams 2000; Perry et al. 2010). In response, efforts have been undertaken to produce guidance regarding the conduct of primary studies (e.g., the CONSORT statement: Schulz et al. 2010; STROBE: von Elm et al. 2007), and SRs of them (e.g., AMSTAR, GRADE, PRISMA, RAMESES). In criminology, the Maryland Scale (Sherman et al. 1997) was developed to gauge the strength of individual studies according to their methodological rigor. It represents a 5-level hierarchy of evaluation evidence intended to indicate the extent to which an evaluation is able to rule out forms of bias as alternative explanations to a program effect. That is, it speaks to the issue of internal validity (see Campbell and Stanley 1963). It says little, however, about the level of detail that authors should report about primary evaluations conducted (i.e., their ‘descriptive validity’, see Gill 2011).
In other disciplines, more effort has been invested in the provision of such guidance. In the case of primary studies, Moher et al. (2010) report the most recent incarnation of the CONSORT instrument. CONSORT 2010 is a 25-item checklist that focuses on the reporting of randomized controlled trials (RCTs). It primarily focuses on the extent to which study conclusions can reasonably be attributed to the treatment investigated (i.e., internal validity). The Cochrane Risk of Bias Scale (Higgins et al. 2011) also considers such issues, paying particular attention to the blinding of treatment providers, recipients and analysts, and problems with placebos. While systematic, these scales are silent on other types of validity (Sidebottom and Tilley 2012).
Apropos SRs, the AMSTAR (Shea et al. 2007), GRADE (Guyatt et al. 2008) and PRISMA (Moher et al. 2009) guidelines were developed to facilitate the assessment of the methodological quality of conducted studies (see also Higgins and Green 2011). Like the checklists for primary studies, however, they tend to focus on issues of internal validity.
Beyond internal validity
The proper agenda for the next generation of treatment effectiveness research, for both primary and meta-analytic studies, is investigation into which treatment variants are most effective, the mediating causal processes through which they work, and the characteristics of recipients, providers, and settings that most influence their results. (Lipsey and Wilson 1993: 1201)
Others have made similar suggestions (e.g., Cartwright and Hardie 2012; Rosenbaum 1988). The process of ‘realist evaluation’ and review has attempted to address such issues more directly (Pawson 2006; Pawson and Tilley 1997) and speaks to this agenda. In particular, realist studies explicitly focus on the causal ‘mechanisms’ through which interventions bring about their effects, the ‘contexts’ or conditions needed for treatments to activate potential causal mechanisms, and the ‘outcomes’ realized by the activation of causal mechanisms in the conditions in which they are introduced. Realist evaluations and reviews produce Context, Mechanisms, Outcome pattern Configurations (CMOCs). This provides a framework for thinking about things other than effect size and factors that SRs might address.
To illustrate the importance of this, consider that interventions may bring about their effects in various ways. One example is the variation in mechanisms through which CCTV might reduce crime in car parks. These include, for example, the ‘caught in the act’ mechanism which leads to specific deterrence and incapacitation of the offender; ‘you’ve been framed’, where the offender perceives an increased risk; and ‘memory jogging’, where the presence of cameras reminds users to take precautions (Pawson and Tilley 1997: 55–82).
Crucially, the mechanisms being activated will depend on the particular conditions of the car park. For example, ‘memory jogging’ can only occur when the cameras are positioned in observable places, and the ‘you’ve been framed’ mechanism will only be activated if offenders can see the cameras or are aware of them. SRs and primary evaluations alike can only tease out the possible mechanisms through which interventions work by articulating ‘logic models’ of how they might do so and collecting the necessary data to test them. SRs will, of course, be limited by what can be found in primary studies, but they should explicitly seek to locate such information, and indicate the absence of information as well as synthesize what is available.
In the case of SRs, the nearest counterpart to realists’ mechanisms and contexts are meta-analysts’ ‘mediators’ and ‘moderators’. Mediators describe the chains of events (or intermediate outcomes) that occur between a treatment and the ultimate outcomes produced. In our CCTV example, mediator variables that might be used to test for activation of the ‘caught in the act’ mechanism include the volume of offenders identified on CCTV footage, and the number subsequently prosecuted. In the absence of evidence that offenders had at least been identified on CCTV footage, this mechanism would not represent a plausible explanation for any impact observed. Such data should not be difficult to obtain in primary evaluations, and systematic reviewers should have no difficulty in determining whether chains of causality have been explored in primary studies.
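The mediator logic in the CCTV example can be sketched in a few lines of Python. This is an illustration only, not a procedure from the paper: the function name `assess_mechanism`, the mediator labels and the three status strings are all hypothetical. The sketch separates a mediator that was measured and found to be zero (the mechanism is not a plausible explanation) from one that was never reported (the mechanism is simply untested).

```python
from typing import Mapping, Optional, Sequence

# Hypothetical mediator chain for the 'caught in the act' mechanism.
CAUGHT_IN_THE_ACT = ("offenders_identified_on_footage", "offenders_prosecuted")


def assess_mechanism(chain: Sequence[str],
                     observed: Mapping[str, Optional[int]]) -> str:
    """Classify a mechanism from counts recorded for each mediator in its chain.

    A count of None, or a missing key, means the study did not report it.
    """
    counts = [observed.get(step) for step in chain]
    # Any measured link with a zero count breaks the causal chain.
    if any(count == 0 for count in counts):
        return "implausible"
    # Otherwise, an unreported link means the chain could not be tested.
    if any(count is None for count in counts):
        return "untested"
    return "plausible"


# Hypothetical study: offenders were identified, but prosecutions were not reported.
print(assess_mechanism(CAUGHT_IN_THE_ACT, {"offenders_identified_on_footage": 12}))
```

Returning "untested" rather than "implausible" for missing data mirrors the point that reviews should indicate the absence of information as well as synthesize what is available.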
While the checklists discussed above are silent on these issues, the SQUIRE guidelines (Ogrinc et al. 2008), developed to inform primary studies of quality improvement in healthcare, are not. The authors draw on the realist approach (see also RAMESES: Wong et al. 2013) suggesting (for example) that primary studies should “describe the mechanisms by which intervention components were expected to cause changes, and plans for testing whether those mechanisms were effective” (p. 65). The SQUIRE guidelines thus represent a useful complement to those that focus on issues of internal validity. However, such guidance has yet to be incorporated into advice for the conduct or rating of SRs—the focus of this paper.
Moderators are equally important. They refer to variables that may explain variation in outcomes across different studies. They can include circumstances associated with differences in the efficacy of the intervention, such as the type of location. For example, CCTV may work more effectively in contained environments (e.g., car parks) than in open spaces (e.g., town centers). They can also include the study methods employed. For example, weaker effect sizes may be reported for RCTs than quasi-experimental studies (Weisburd 2010). While SRs typically consider the latter type of moderator, more attention could arguably be given to the former.
As suggested by Lipsey and Wilson (see also Cartwright and Hardie 2012; Weisburd et al. 2015), to better inform policy, the evidence base needs to speak to how interventions work and where and when they might do so most effectively. Consequently, when assessing the quality of the available evidence, in addition to considering the extent to which evaluations manage to rule out biases that might distort estimates of effect size, we also need to gauge the extent to which they contribute to understanding of the contexts/moderators relevant to the activation of the mechanisms/mediators that produce variations in outcome across differing sub-groups.
Despite their focus on internal validity, the CONSORT and SQUIRE guidelines for primary studies include items on the implementation of interventions, asking whether they provide sufficient detail to allow replication elsewhere or to determine whether they will be suited to particular situations. In a clinical trial, this would include the dose of drug, and how and when it was administered. This is encouraging as implementation is rarely straightforward, but we suggest more is required.
Finally, because practitioners have limited budgets, resourcing one intervention means that something else must be forgone. Moreover, the most effective intervention tested will be of little practical value if it is prohibitively expensive to implement or maintain. Thus, to make good decisions, policymakers and practitioners need information on the overall costs and benefits of particular interventions and their alternatives. Current guidelines are typically silent on these issues.
The EMMIE framework
the overall effect direction and size (alongside major unintended effects) of an intervention and the confidence that should be placed on that estimate
the mechanisms/mediators activated by the policy, practice or program in question
the moderators/contexts relevant to the production/non-production of intended and major unintended effects of different sizes
the key sources of success and failure in implementing the policy, practice or program
the economic costs (and benefits) associated with the policy, practice or program.
Both primary evaluations and SRs may attend to each of these more or less adequately. In assessing the evidence, it is thus important to differentiate between what the evidence suggests (e.g., an estimate of effect size) and the quality of that evidence (e.g., the methodological adequacy of the studies on which the estimate is based). With respect to assessing evidence quality, a key question concerns how meticulous the reviewers were in attending to each dimension. In the next sections, we discuss each in turn. As noted, we focus on SRs. We do so as their intended purpose is to synthesize evidence on treatments—an exercise which can provide practitioners with a good starting point in selecting interventions.
E - Effects: overall effect direction and size
Table 1 Factors that should inform the assessment of the methodological adequacy of an SR in terms of estimating effect sizes (components, where appropriate, are listed beneath each factor, with supporting references in parentheses)
A transparent and well-designed search strategy* (Higgins and Green 2011)
High statistical conclusion validity (at least four of the following are necessary for a study to be considered sufficient)*
–Calculation of appropriate effect sizes (Lipsey and Wilson 2001)
–The analysis of heterogeneity (Borenstein et al. 2009)
–Use of a random effects model where appropriate (Borenstein et al. 2010)
–Attention to the issue of dependency (Hedges et al. 2010)
–Appropriate weighting of individual effect sizes in the calculation of mean effect sizes (Lipsey and Wilson 2001)
Sufficient assessment of the risk of bias (at least two necessary for sufficient consideration)*
–Assessment of potential publication bias (Hedges and Vevea 1996)
–Consideration of inter-rater reliability (Stock et al. 1982)
–Consideration of the influence of statistical outliers (Huffcutt and Arthur 1995)
Attention to the validity of the constructs, with only comparable outcomes combined and/or exploration of the implications of combining outcome constructs* (Petticrew and Roberts 2006)
Assessment of the influence of study design (e.g., separate overall effect sizes for experimental and quasi-experimental design) (Bryant and Wortman 1984)
Assessment of the influence of unanticipated outcomes or spin-offs on the size of the effect (e.g., quantification of displacement or diffusion of benefit)
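The statistical components listed above (weighting, heterogeneity, random effects) can be made concrete with a short sketch. The Python below is illustrative and is not taken from the paper. It assumes each primary study reports an effect size and its sampling variance, uses inverse-variance weights, and estimates the between-study variance with the DerSimonian-Laird method, which is one common choice among several. The function name `pool_effects` and the figures in the usage lines are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class PooledEffect(NamedTuple):
    mean: float   # random-effects weighted mean effect size
    se: float     # standard error of that mean
    q: float      # Cochran's Q heterogeneity statistic
    i2: float     # I^2: percentage of variation attributed to heterogeneity
    tau2: float   # DerSimonian-Laird estimate of between-study variance


def pool_effects(effects: Sequence[float],
                 variances: Sequence[float]) -> PooledEffect:
    """Pool study effect sizes with inverse-variance weights."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect step: weight each study by the inverse of its variance.
    w = [1.0 / v for v in variances]
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Heterogeneity: Cochran's Q and I^2.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance tau^2, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects step: add tau^2 to each variance and re-weight.
    w_re = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return PooledEffect(mean, se, q, i2, tau2)


# Hypothetical effect sizes (e.g., log odds ratios) and variances from five studies.
result = pool_effects([-0.42, -0.10, -0.55, 0.05, -0.30],
                      [0.04, 0.09, 0.06, 0.12, 0.05])
print(f"mean={result.mean:.3f}, se={result.se:.3f}, I2={result.i2:.1f}%")
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights, so the two models give the same mean.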
Table 2 EMMIE evidence and five-point scales for assessing quality on each dimension
Effect
EMMIE-E (evidence itself):
–Measurement/consideration of unanticipated effects
Five-point quality scale:
0: Insufficient consideration of validity elements listed in Table 1
1: Sufficient consideration of one of the elements of validity marked with an '*'
2: Sufficient consideration of two of the elements of validity marked with an '*'
3: Sufficient consideration of three or four of the elements of validity marked with an '*'
4: Sufficient consideration of five or six elements of validity (including all of those marked with an '*')
Mechanisms/mediators
EMMIE-E (evidence itself):
–Map of possible mechanisms/logic maps
–A priori mediator or mechanism-based moderator analysis
–Post hoc mediator or mechanism-based moderator analysis
–Assessment/statements of most likely mechanisms and any contextual conditions (these can be narratives)
Five-point quality scale:
0: No reference to theory; simple black box
1: Broad statement of assumed program theory stated (mechanisms and/or processes)
2: Detailed articulation of theory, based on interrogation of relevant literature and/or elicited from practice
3: Formalization of theory and derivation of precise predictions from it
4: Test, corroboration, falsification and refinement of theories, using data assembled for the purpose
Moderators/contexts
EMMIE-E (evidence itself):
–A priori context-based moderator analysis/subgroup analysis (analysis testing the differences that context makes to outcome; theoretically driven)
–Post hoc context-based moderator analysis/subgroup analysis (analysis testing the difference context makes to outcome; conducted due to data availability/not theoretically driven/not mentioned prior to analysis)
–Statements qualifying contextual variations (these can be narratives)
Five-point quality scale:
0: No reference to condition contexts or moderators that may be significant for activation of mediators or mechanisms
1: Ad hoc description of possible relevant moderators or contexts
2: Tests of the effects of moderators or mechanisms defined post hoc using variables that are at hand
3: Theory-based pre-specification of expected moderators and mediators relevant to the activation of mediators or mechanisms
4: Collection and analysis of relevant data relating to the pre-specified expected moderators and contexts
Implementation
EMMIE-E (evidence itself):
–A list/statement of key components necessary for implementation of reviewed interventions
–A list/statement of key components deemed necessary for replication elsewhere
Five-point quality scale:
0: No account of implementation or implementation challenges
1: Ad hoc comments on implementation
2: Systematic efforts to document implementation issues
3: Detailed evidence-based account of expected levels of fidelity to program, policy or treatment plans
4: Complete evidence-based account of expected levels of fidelity to program, expected obstacles and specification of elements necessary for replication elsewhere
Economic analysis
EMMIE-E (evidence itself):
–Quantification of inputs to the intervention
–Quantification of intervention outputs
–Quantification of intensity (e.g., spend per head)
–Estimate of cost of implementation
–Estimate of cost of implementation by subgroup
–Estimate of cost-effectiveness per unit output
–Estimate of cost-effectiveness by subgroup
–Estimate of cost-benefit
–Estimate of cost-benefit by subgroup
Five-point quality scale:
0: No mention of costs (and/or benefits)
1: Only direct or explicit costs (and/or benefits) estimated
2: Direct or explicit and indirect and implicit costs (and/or benefits) estimated
3: Marginal or total or opportunity costs (and/or benefits) estimated
4: Marginal or total or opportunity costs (and/or benefits) by bearer (or recipient) estimated
EMMIE-Q individual elements for scoring existing SRs and checklist for new SRs
Effect
1a. Transparent well-designed search strategy
1b. Calculation of appropriate mean effect size (ES)
1c. Analysis of heterogeneity in ES
1d. Attention to statistical dependency in effect sizes if appropriate
1e. Appropriate weighting of individual ESs
2a. Assessment of publication bias
2b. Consideration of inter-coder reliability
2c. Consideration of the influence of statistical outliers
3. Attention to construct validity
4. Moderator analysis of study design
5. Assessment of unintended outcomes
Mechanisms/mediators
1. Mention of mechanisms and/or mediators
2. Search of literature relating to mechanisms and/or mediators
3. Discussion with stakeholders about mechanisms and reporting of these in the review
4. Articulation of the theory of change, including sets of testable intermediate (or proxy) variables and patterns that would be observed in the data
5. Collection and analysis of data that test whether mechanisms and mediators are operating as expected
6. Conclusions that corroborate, falsify or suggest refinements to the mediator or mechanisms theories
Moderators/contexts
1. Mention of moderators/contexts
2. Search of literature for causally significant moderators or contexts
3. Consultation with others (including practitioners) about the contextual factors that might matter
4. Pre-specification of theoretically or empirically derived moderators or contexts
5. Collection of data to allow the effects of context/moderators to be tested
6. Subgroup/moderator analysis undertaken
7. A priori moderator analysis undertaken theoretically
Implementation
1. Description of what was delivered in practice
2. Identification (by review authors) of enablers and obstacles encountered when attempting to implement an intervention
3. Specification of what is crucial to the successful implementation of the intervention
4. Specification of what would represent a replication of the intervention
Economic analysis
1. Collection of data to assess the direct financial revenue and set-up costs of the intervention as incurred by the provider
2. Collection of data to assess the indirect financial revenue and set-up costs of the intervention
3. Evidence-based quantification of the direct, financial costs of the intervention to the provider per unit cost of output (marginal costs)
4. Evidence-based quantified estimate of the direct financial provider costs per unit of (positive or negative) outcome
5. Evidence-based estimate of the monetized financial and non-financial costs and benefits per unit of monetized financial and non-financial unit of intended and unintended outcome
6. Evidence-based estimate of the distribution by stakeholder of direct and indirect costs and benefits
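For the Effect dimension, the sufficiency thresholds in Table 1 and the five-point scale can be combined into a single scoring rule. The Python sketch below is one possible reading and not an official EMMIE tool. It follows the five statistical components named in Table 1, and the class and field names are hypothetical. Two choices are assumptions rather than statements from the scale: a review that satisfies only unstarred elements is scored 0, and a review that satisfies all four starred elements but neither unstarred element is scored 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectChecklist:
    """Yes/no judgements on the Effect elements (field names are hypothetical)."""
    search_strategy: bool = False          # starred
    effect_sizes: bool = False             # statistical conclusion validity
    heterogeneity: bool = False            # statistical conclusion validity
    random_effects: bool = False           # statistical conclusion validity
    dependency: bool = False               # statistical conclusion validity
    weighting: bool = False                # statistical conclusion validity
    publication_bias: bool = False         # risk of bias
    inter_rater_reliability: bool = False  # risk of bias
    outliers: bool = False                 # risk of bias
    construct_validity: bool = False       # starred
    study_design_influence: bool = False   # unstarred
    unanticipated_outcomes: bool = False   # unstarred


def effect_quality_score(c: EffectChecklist) -> int:
    """Return a 0-4 score for the Effect dimension."""
    statistical = [c.effect_sizes, c.heterogeneity, c.random_effects,
                   c.dependency, c.weighting]
    bias = [c.publication_bias, c.inter_rater_reliability, c.outliers]

    starred = [
        c.search_strategy,
        sum(statistical) >= 4,   # at least four of five components
        sum(bias) >= 2,          # at least two of three components
        c.construct_validity,
    ]
    n_starred = sum(starred)
    n_total = n_starred + c.study_design_influence + c.unanticipated_outcomes

    if n_starred == 4 and n_total >= 5:
        return 4                 # five or six elements, all starred ones included
    if n_starred >= 3:
        return 3                 # three or four starred elements
    return n_starred             # zero, one or two starred elements


# Hypothetical review: good search and statistics, but no bias assessment.
review = EffectChecklist(search_strategy=True, effect_sizes=True,
                         heterogeneity=True, random_effects=True,
                         weighting=True, construct_validity=True)
print(effect_quality_score(review))
```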
M - Mechanisms/mediators: how the policy, practice or program produces its effects
In pharmaceutical medicine, prior to clinical trials, much laboratory work is undertaken to test and refine understanding of the chemical and physiological processes through which a drug produces its effects. Such background work is rarely undertaken in crime prevention, and hence the mechanism(s) through which an intervention might impact upon crime are often poorly understood prior to implementation.
Moreover, social interventions are generally complex. What is delivered may differ from one site and time to another and there can be long causal chains between the intervention implemented and effects realized. Working out what it is about an intervention that brings about its intended (and unintended) outcomes is thus of practical importance. A strong primary evaluation will explicate the underlying theory or theories of an intervention, and assemble the relevant data to test it. A strong SR will summarize these theories, and synthesize the available evidence to test them.
To do this, authors of an SR may need to engage with a wider literature than is necessary to estimate the effect size of an intervention. Such studies might more explicitly articulate the mechanism(s) through which an intervention is expected to work, or provide a test of this.
An example of such a review is provided by Weisburd et al. (2015), who conducted an SR of broken windows policing (Wilson and Kelling 1982). To test for evidence of the broken windows mechanism (that intervention reduces residents’ fear of crime, and this in turn increases their willingness to act collectively to deter crime: p. 6), the authors searched not for studies that examined the impact of intervention on crime but for those that examined the impact on fear of crime and/or collective efficacy. They found no evidence to support this mechanism, but also concluded that “[t]here have simply been too few studies of the mechanisms underlying crime control in the broken windows policing model” (p. 11). We agree, and suggest that this is a more general issue in primary evaluations and SRs of them.
Table 2 lists the types of evidence that could be included in an SR that seeks to explain how an intervention works. As with the rating of effect size, we propose a 5-point scale for assessing the quality of an SR on this dimension.
M - Moderators/contexts: conditions for the activation of the mediator or mechanism
Interventions rarely work unconditionally or equally effectively each time they are applied. The location (geographic or otherwise) and time they are implemented can affect the outcomes observed, as can the characteristics of those who receive or implement them. In deciding if, when, where and on whom to target a specific intervention, policymakers need evidence on which settings and subgroups are most likely to benefit from the intervention, which will most likely be unaffected, and which may have possibly negative outcomes. For this, estimates of mean effect sizes will be insufficient.
Most SRs include statistical moderator analyses to examine effect size variation across subgroups. However, the selection of subgroups varies, as does the rationale for choosing them. Subgroups should not be chosen using standard variables of convenience. Instead, the moderators selected should ideally be those for which the theory (mechanisms and mediators) suggests variations are to be expected. Of course, in the case of SRs, a moderator analysis requires a statistical approach, and so the study authors will be constrained by the data available in the primary studies. However, where the relevant data are unavailable this should be explicitly stated in the review with a view to mobilizing its collection in subsequent primary studies.
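A statistical moderator analysis of the kind described here can be sketched as a subgroup analysis. The Python below is a simplified illustration under stated assumptions, not the authors' procedure: it uses a fixed-effect model within each subgroup, takes effect sizes and variances as given, and the function name, setting labels and figures are hypothetical. It returns the pooled effect for each subgroup together with the between-groups Q statistic, which would be compared against a chi-square distribution with (number of subgroups minus 1) degrees of freedom.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def subgroup_analysis(
    effects: Sequence[float],
    variances: Sequence[float],
    groups: Sequence[Hashable],
) -> Tuple[Dict[Hashable, float], float, int]:
    """Return (pooled effect per subgroup, Q_between, degrees of freedom)."""
    if not (len(effects) == len(variances) == len(groups)) or not effects:
        raise ValueError("inputs must be non-empty and of equal length")

    # Collect (effect, inverse-variance weight) pairs for each subgroup.
    members: Dict[Hashable, List[Tuple[float, float]]] = defaultdict(list)
    for y, v, g in zip(effects, variances, groups):
        members[g].append((y, 1.0 / v))

    # Grand mean across all studies (fixed-effect).
    total_w = sum(w for pairs in members.values() for _, w in pairs)
    grand = sum(y * w for pairs in members.values() for y, w in pairs) / total_w

    pooled: Dict[Hashable, float] = {}
    q_between = 0.0
    for g, pairs in members.items():
        group_w = sum(w for _, w in pairs)
        pooled[g] = sum(y * w for y, w in pairs) / group_w
        # Weighted squared distance of the subgroup mean from the grand mean.
        q_between += group_w * (pooled[g] - grand) ** 2

    return pooled, q_between, len(members) - 1


# Hypothetical CCTV evaluations, grouped by a theoretically chosen moderator.
pooled, q_between, df = subgroup_analysis(
    effects=[0.2, 0.4, 0.8, 1.0],
    variances=[0.1, 0.1, 0.1, 0.1],
    groups=["town centre", "town centre", "car park", "car park"],
)
print(pooled, round(q_between, 3), df)
```

The grouping variable is passed in explicitly so that the choice of moderator, and its theoretical rationale, is a visible decision rather than a by-product of whatever variables happen to be available.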
Table 2 lists the types of evidence that should be included in an SR to document the contexts in which an intervention works, and how the quality of the evidence and the thoroughness with which it was sought out may be assessed.
I - Implementation: how the policy, practice, treatment or intervention is applied
For both successful and unsuccessful initiatives, it is important for the practitioner to know what was done, what was crucial to the intervention and what difficulties might be experienced if it were to be replicated elsewhere. For example, SRs of hot spots policing (e.g., Braga and Weisburd 2012) suggest that this approach to crime reduction is successful. However, those intending to replicate previous efforts need to know more than this. They need to know what to do. This would include an indication of how a spatial hot spot is defined—what density of crime defines a hot spot, or what should inform the selection of such an area. In the case of police patrols, they need to know how many officers might need to be deployed, how frequently and for how long. They need to know whether the effect on crime depends on patrol dosage (e.g., the number of officer patrol hours per day per unit area). And so on. This is problematic for crime prevention because without such information, attempts at replication may vary considerably in terms of what is actually done (e.g., Tilley 1996). It is also problematic for evidence synthesis, as evaluations of interventions that prima facie appear to be the same thing, might actually be rather different, and in some cases it may be that nothing was implemented at all. In this case, the primary evaluator and the systematic reviewer should take account of this.
Finally, even simple interventions can be fraught with difficulties (e.g., Johnson and Loxley 2001; Knutsson and Tilley 2009). Thus, practitioners need to know if particular interventions are easy or difficult to implement, if successful implementation is contingent upon particular conditions, and what is liable to impede or facilitate the process. We suggest that a strong review will focus on the issues listed in Table 2.
E - Economic analysis: the cost-effectiveness of the policy, practice, program, treatment or intervention
In policy terms, it is necessary but not sufficient that a given measure is capable of producing an intended outcome. In addition to the issues already discussed, the cost of intervention will ideally be known.
Estimating costs is complex. Comprehensive costing will include not only costs incurred by those responsible for the policy but also those falling on any third parties implicated in the delivery of interventions, the program participants themselves and those bearing any negative side effects (‘indirect costs’). As programs expand, there are often diminishing marginal costs on those delivering interventions, as set-up and capital costs (‘fixed costs’) are spread over an increasing volume of activity, and so only those variable costs that are explicitly associated with increased output (e.g., police time) will increase.
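The effect of spreading fixed costs over a growing volume of activity can be shown with simple arithmetic. The Python sketch below is illustrative: the function name and all figures are hypothetical, and it assumes a constant variable cost per unit, which real programs may not have.

```python
def average_cost_per_unit(fixed_cost: float,
                          variable_cost_per_unit: float,
                          units: int) -> float:
    """Average cost per unit of output when fixed costs are shared across units."""
    if units <= 0:
        raise ValueError("units must be a positive number of outputs")
    total_cost = fixed_cost + variable_cost_per_unit * units
    return total_cost / units


# Hypothetical program: 10,000 in set-up and capital costs, 50 per unit delivered.
for volume in (100, 1000, 10000):
    print(volume, average_cost_per_unit(10_000, 50, volume))
```

As volume grows, the average cost per unit falls toward the variable cost per unit, because only the variable component rises with output.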
Various forms of economic analysis exist, two of which will be briefly discussed. Cost effectiveness is relatively straightforward. It can speak either to the unit of output (e.g., cost of treatment per day per offender imprisoned) or the unit of outcome (e.g., cost per crime prevented). Such analysis helps to inform practitioners of what it may cost to deliver a given level of intervention, or crime reduction, and enables comparisons across interventions.
Cost–benefit calculations are more difficult as they require monetization of both the costs of intervention and (say) crimes prevented. This is particularly complicated as the range of those implicated expands, as unintended side effects are incorporated and as emotional as well as direct financial costs and benefits are swept into the calculations (see Farrell et al. 2004).
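The two forms of analysis differ in what must be measured, which a brief sketch can make explicit. The Python below is a minimal illustration with hypothetical function names and figures. Cost-effectiveness needs only a cost and a count of outputs or outcomes, whereas cost-benefit additionally needs the benefits expressed in money, which is assumed here to have been done already.

```python
def cost_per_unit(total_cost: float, units: float) -> float:
    """Cost-effectiveness: cost per unit of output or per unit of outcome."""
    if units <= 0:
        raise ValueError("units must be positive for the ratio to be defined")
    return total_cost / units


def benefit_cost_ratio(monetized_benefits: float, total_cost: float) -> float:
    """Cost-benefit: monetized benefits returned per unit of money spent."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return monetized_benefits / total_cost


def net_benefit(monetized_benefits: float, total_cost: float) -> float:
    """Cost-benefit: monetized benefits minus costs."""
    return monetized_benefits - total_cost


# Hypothetical intervention costing 50,000 that prevented 40 crimes,
# with the prevented crimes valued at 120,000 in total.
print(cost_per_unit(50_000, 40))            # cost per crime prevented
print(benefit_cost_ratio(120_000, 50_000))  # benefit returned per unit spent
print(net_benefit(120_000, 50_000))
```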
We will not discuss these forms of analysis further (but refer the interested reader to Farrell et al. 2004; McDougal et al. 2008), except to emphasize the fact that the estimation of costs should ideally enumerate the complete portfolio of costs that are necessary to implement an intervention. McDougal et al. (2008) suggest a rating scale to assess the methodological adequacy of SRs that includes a cost–benefit analysis, but this has no provision for rating SRs that include only a cost-effectiveness analysis. Since the latter are helpful to practitioners, Table 2 shows the forms of evidence that could be reported in a review and our proposed quality rating scale for this dimension of EMMIE.
We have proposed five dimensions for rating the quality of SRs, described by the acronym EMMIE. Each dimension speaks to a different element of an SR, and may inform the decision-making or activity of different practitioners, or different stages of the policy-making process. Consequently, when rating reviews, we suggest that an EMMIE profile be produced rather than a single overall score. While our focus here has been on the rating of SRs, as noted above, with slight adaptation EMMIE scores can and should also be awarded to primary studies.
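One way to record an EMMIE profile, as opposed to a single overall score, is sketched below in Python. The class name, field names and the compact summary format are hypothetical conveniences, not part of the framework. The only rule taken from the text is that each dimension is rated on a five-point scale from 0 to 4. The class deliberately provides no total or average, in keeping with the suggestion that the dimensions be reported separately.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EmmieProfile:
    """Quality ratings (0-4) for one review on each EMMIE dimension."""
    effect: int
    mechanism: int
    moderator: int
    implementation: int
    economic: int

    def __post_init__(self) -> None:
        # Reject anything that is not an integer on the 0-4 scale.
        for name, score in asdict(self).items():
            if not isinstance(score, int) or not 0 <= score <= 4:
                raise ValueError(f"{name} must be an integer from 0 to 4, got {score!r}")

    def summary(self) -> str:
        """Compact one-line profile (a hypothetical display format)."""
        return (f"E{self.effect} M{self.mechanism} M{self.moderator} "
                f"I{self.implementation} E{self.economic}")


# Hypothetical review that is strong on effect size but weaker elsewhere.
profile = EmmieProfile(effect=3, mechanism=1, moderator=2,
                       implementation=1, economic=0)
print(profile.summary())
```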
The use of EMMIE to rate existing SRs can help practitioners to assess the confidence they should place on the conclusions of a review. Applying the framework on an ad hoc basis will be helpful, but efforts by a consortium of universities led by UCL, in collaboration with the UK College of Policing, are also underway to systematically rate existing SRs using EMMIE (see Bowers et al. 2014; note that future publications will discuss the practicalities of operationalizing the approach and provide empirical examples). The ultimate aim of the exercise is to provide practitioners with an online tool (hosted by the UK College of Policing) to assist their engagement with, and understanding of, the available evidence.
As well as being used to rate existing studies, it is our hope that the EMMIE framework will inform the conduct of future primary studies and SRs. At this point in time, we expect (and have started to find that) existing SRs achieve relatively lower ratings on the MMIE dimensions than they do for effect size (E). However, by encouraging researchers to explicitly focus on these issues in future primary studies and reviews of them, we hope that this will soon change.
With this in mind, three points are worthy of discussion. First, the research methods required to score high on each dimension are liable to differ, some depending heavily on quantitative methods, others on more qualitative approaches, such as realist synthesis (e.g., Pawson 2002). Thus, as is hinted in the title of this article, we encourage the use of mixed-method SRs. Second, to score high on all dimensions of EMMIE, future SRs will ideally employ broader inclusion criteria during the search stage of the review than is traditional, searching for research that addresses dimensions of EMMIE other than effect size. SRs are, of course, time consuming to conduct and hence some pragmatism will be required. Where an extended search proves to be impractical, we suggest that the review authors note this and synthesize what evidence is uncovered as it speaks to each dimension of EMMIE. Moreover, to set an agenda for primary studies, one role of future SRs will be to explicitly note the absence of evidence for each dimension of EMMIE (see also Gill 2011; Perry et al. 2010).
It is unlikely that any single primary study will or could score full marks on all dimensions of EMMIE. One reason for synthesizing diverse studies is to draw together what is known across all dimensions. Confining attention to the methodological adequacy with which effect sizes are estimated can establish with some certainty what has worked and hence what can work. Limiting attention in this way, however, is less useful in working out what will work, particularly in new conditions, and what needs to be present and what needs to be done to make something work as efficiently and as effectively as possible. Yet the latter are crucial for policy decisions. Consequently, the EMMIE framework is intended to catalyze both primary and secondary research that speaks to this agenda.
Developed as part of an ongoing project, jointly funded by the ESRC and UK College of Policing, to identify and rate existing SRs of what works to reduce crime.
This was undertaken as part of the Commissioned Partnership Programme: the What Works Centre for Crime Reduction. As this is delivered by a large consortium partnership in which ideas are freely exchanged, we would like to thank all of our colleagues who have assisted with this research. The opinions stated in this article are solely those of the named authors, and are not necessarily shared by other academics or organizations (such as the College of Policing). The research reported in this article was funded by the Economic and Social Research Council (ESRC) grant ES/L007223/1, and the College of Policing.
- Bowers, K., Tompson, L., & Johnson, S. D. (2014). Implementing Information Science in Policing: Mapping the Evidence Base. Policing, advanced online access.
- Campbell, D. T., & Stanley, J. C. (1963). Experimental and Quasi-Experimental Designs for Research. Boston: Houghton Mifflin.
- Farrell, G., Bowers, K., & Johnson, S. D. (2004). Cost-benefit analysis for crime science: making cost-benefit analysis useful through a portfolio of outcomes. In M. Smith & N. Tilley (Eds.), Launching Crime Science. London: Willan.
- Higgins, J. P. T., & Green, S. (Eds.). (2011). Cochrane handbook for systematic reviews of interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration. Available from www.cochrane-handbook.org.
- Johnson, S., & Loxley, C. (2001). Installing Alley-Gates: Practical Lessons from Burglary Prevention Projects. Home Office Briefing Note 2/01. London: Home Office.
- Johnson, S. D., Guerette, R. T., & Bowers, K. (2014). Crime displacement: what we know, what we don’t know, and what it means for crime reduction. Journal of Experimental Criminology, 10(4), 549–571.
- Knutsson, J., & Tilley, N. (2009). Introduction. In J. Knutsson & N. Tilley (Eds.), Evaluating crime reduction initiatives (pp. 1–6). New Jersey: Prentice Hall.
- Lipsey, M., & Wilson, D. (2001). Practical Meta-Analysis. London: Sage.
- Moher, D., Hopewell, S., Schulz, K. F., Montori, V., Gotzsche, P. C., Devereaux, P. J., Elbourne, D., Egger, M., & Altman, D. G. (2010). CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomized trials. Journal of Clinical Epidemiology, 63, e1–e27.
- Ogrinc, G., Mooney, S. E., Estrada, C., Foster, T., Goldmann, D., Hall, L. W., Huizinga, M. M., Liu, S. K., Mills, P., Neily, J., Nelson, W., Pronovost, P. J., Provost, L., Rubenstein, L. V., Speroff, T., Splaine, M., Thomson, R., Tomolo, A. M., & Watts, B. (2008). The SQUIRE (standards for QUality improvement reporting excellence) guidelines for quality improvement reporting: explanation and elaboration. Quality and Safety in Health Care, 17(Suppl 1), i13–i32.
- Pawson, R. (2006). Evidence-Based Policy. London: Sage.
- Pawson, R., & Tilley, N. (1997). Realistic Evaluation. London: Sage.
- Shea, B. J., Grimshaw, J. M., Wells, G. A., Boers, M., Andersson, N., Hamel, C., Porter, A. C., Tugwell, P., Moher, D., & Bouter, L. M. (2007). Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Medical Research Methodology, 7(10), 1–7.
- Sherman, L., Gottfredson, D., MacKenzie, D., Eck, J., Reuter, P., & Bushway, S. (1997). Preventing Crime: What Works, What Doesn’t, What’s Promising. Washington DC: US Department of Justice Office of Justice Programs.
- von Elm, E., Altman, D. G., Egger, M., Pocock, S. J., Gøtzsche, P. C., et al. (2007). The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. PLoS Medicine, 4(10), e296. doi: 10.1371/journal.pmed.0040296.
- Weisburd, D., Hinkle, J., Braga, A., & Wooditch, A. (2015). Understanding the Mechanisms Underlying Broken Windows Policing: The Need for Evaluation Evidence. Journal of Research in Crime and Delinquency (in press).
- Wilson, J. Q., & Kelling, G. L. (1982). Broken windows: the police and neighborhood safety. Atlantic Monthly, 211, 29–38.
- Wong, G., Greenhalgh, T., Westhorp, G., Buckingham, J., & Pawson, R. (2013). RAMESES publication standards: realist syntheses. BMC Medicine, 11(21), 1–14.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.