Introduction

Systematic reviews are a contemporary method in the evolution of evidence synthesis, aiming to synthesize findings from primary studies to build a composite picture of the evidence in a field. They supplant traditional narrative reviews, common throughout the 20th century, which have been extensively criticized for having opaque methods and potentially unreliable conclusions (Glass et al. 1981). The defining feature of a systematic review is that it uses explicit and accountable methods (Gough et al. 2013). The method is grounded in the principle that decision points in the research process should be reported in enough detail that they are transparent and replicable by others, and that findings can be scientifically verified. By embracing wide-ranging and comprehensive search tactics, through which multidisciplinary academic and grey literature is pursued, and having “firm editorial criteria” (White 2009, p. 53) that are explicitly documented, systematic reviews mitigate the possibility of bias.

When appropriate and possible, primary study data located via systematic search methods are statistically aggregated by meta-analytic techniques. Crime and criminal justice studies have been slower than other fields (for example, psychology and medicine) to embrace meta-analytic approaches for knowledge accumulation (Wells 2009). However, the last decade has seen a tremendous increase in the application of this aggregative method: in 1995, Petrosino reported that there had been 13 meta-analyses of crime prevention programmes to date (Petrosino 1995), whereas in 2014 this number exceeded 184 (Bowers et al. 2014a).

A transparent search strategy is a distinctive characteristic of systematic reviews and is intended to avoid subjectively choosing the sample of studies for synthesis. Subjectivity in the search and selection process—synonymous with data collection—creates the risk of producing unreliable or unrepresentative findings (Brunton et al. 2013). Hence, objectivity is essential for calculating unbiased estimates of effect in quantitative meta-analysis, as well as thematic configuration in qualitative evidence synthesis.

Although information retrieval is a specialist skill (Rothstein and Hopewell 2009), relatively little scholarly attention has been paid to how primary studies are retrieved while synthesizing evidence in the fields of criminology, criminal justice, and crime prevention. Pioneering work by Petrosino (1995) is one of the few attempts to describe in detail the data collection phase in a crime prevention meta-analysis. Whilst comprehensive, and still relevant for many of the principles of searching, this work is dated, since the technological tools used to search and locate relevant records for evidence synthesis have rapidly changed in the last two decades.

This paper is intended as an update of Petrosino’s 1995 methodological note on information search and retrieval. It documents the experience of applying principles of information retrieval espoused by the Campbell Collaboration (Hammerstrøm et al. 2010) to a research project assembling the evidence base in crime prevention. In doing so, we elicit the practical difficulties involved in systematically searching for evidence in the current digital era and present possible solutions for overcoming them.

We begin by describing the nature and background of the research project that will be used to illustrate our arguments. Subsequently, we present our experiences of applying the principles of information retrieval in the area of crime prevention and discuss our key findings. The final section presents our conclusions and some guidelines to aid future researchers conducting systematic reviews in allied fields. This is the first attempt of its kind to develop a ‘how-to’ guide to conduct systematic searches in crime prevention and can be extended to criminology and criminal justice studies.

Overcoming bias

The introduction of conscious and unconscious bias into the systematic review process can take many forms. Conceptually, the most important form of bias is selection (or sampling) bias, in which the sample of studies selected fundamentally differs from the wider population of studies. In evidence synthesis nomenclature this is known as publication bias, and has been acknowledged in the academic literature for four decades (Greenwald 1975). It might be apt to rename this ‘easier to find bias’ since studies that are more visible are more likely to feature in a systematic review.

Strong, positive effect sizes—that is, in support of a hypothesis—are attractive to the academic community, to journal editors, and to funding bodies. They are therefore likely to be published more (publication bias), published more rapidly (time lag bias), cited more often (citation bias), published in multiple publications and outlets (multiple publication bias), and published in English (language bias) (Alderson and Green 2002).

Studies published in academic outlets (such as journal articles) are most visible to other researchers. Empirical research has established that published studies are not representative of all the research findings that are generated (Lipsey and Wilson 1993; Rothstein and Hopewell 2009; Smith 1980)—a problem commonly referred to as the ‘file drawer’ problem (Rosenthal 1979). That is, findings that produce neutral or negative results often languish in the bottom of filing cabinets, rather than get written up for publication in peer-reviewed journals (Dickersin 2005; Wilson 2009). If publication bias favors studies with positive results, then the aggregation or synthesis of these skewed results can culminate in a review that overestimates the effect of an intervention. Basing policies or practices on such erroneous effects can lead to a wastage of resources and potentially worsen a problem.

In criminology and criminal justice substantial research is conducted and produced by practitioners, or funded by bodies that receive no benefit in publishing findings in peer-reviewed journals. If written up at all, practitioner generated research is published outside of the control of academic publishers—known as grey (gray) literature (Auger 1998). Omitting grey literature from a systematic review can be costly, for Smith (1980) found that academic publications reported a higher programme effect than comparative studies in the grey literature.

A high-quality systematic review, then, aims to minimize bias in the sampling of studies for synthesis. In fields like medicine, exhaustive searching is the ‘gold standard’, as the point of such systematic reviews is to aggregate the findings of quantitative data (DeLuca et al. 2008). Here, researchers strive for a “sufficient sample of studies for unbiased aggregation” (Brunton et al. 2013, p.108), which involves identifying studies in the universe of studies that fit the inclusion criteria. Crucially, however, it is rarely possible to know the limits of the universe of the population being studied (i.e., the sampling frame).

The survey of crime prevention evidence syntheses

As part of the UK Government’s commitment to evidence-led policy-making, the ‘What Works Centre for Crime Reduction’ (WWCCR) was formed in 2013. This centre, hosted by the College of Policing (CoP), is supported by a ‘commissioned partnership programme’, delivered by a consortium of universities. A vital component of the research programme is a project to systematically identify the evidence base for interventions with a crime reduction focus. The overarching aim of the research programme is to inform the decision-making of practitioners and policy-makers.

Our research question for this project was set by the mission of the WWCCR: to identify the best available evidence on approaches to reducing crime (and the potential savings to the police service, their crime reduction partners and the public). [Footnote 1] We defined ‘best available evidence’ as the findings from systematic reviews (anchored in any methodological philosophy) or meta-analysis evidence syntheses. ‘Approaches to reducing crime’ was translated into the broadly defined ‘crime prevention’, which required a measured crime reduction or prevention outcome (for more details on the inclusion criteria see Bowers et al. 2014b). Our overall aim was to search for evaluations of interventions in all relevant fields that might have a crime prevention outcome. To our knowledge, this was the first time such an ambitious task had been attempted.

To identify evidence on crime prevention we adopted principles of high-quality information retrieval commonly used by evidence synthesists (see more below). In this tradition, we created a review protocol that outlined the search strategy to be used (Bowers et al. 2014b). For quality assurance, this was reviewed by a panel of external experts and the core team at CoP. The search strategy articulated in the protocol stated that we would (1) review lists of known systematic reviews in crime prevention; (2) perform keyword searches of electronic databases; (3) review ‘grey literature’ reports from government, research and policing organizations; (4) check the reference lists of relevant studies; and (5) consult with experts. The literature searches were performed at the authors’ institution and the UK National Police Library, with additional support from information specialists in the US and Australia.

Overall, the search tactics yielded a list of around 17,500 research articles, which were imported into the EPPI Reviewer 4 software tool. [Footnote 2] This web-based research synthesis tool assisted in automatically identifying many of the duplicates, had functionality for coding and data extraction, and provided a means of tracking allocated workloads. Once duplicate records were removed, just under 14,000 articles remained. An extensive two-stage screening process was then employed: the first stage sifted the articles on title and abstract for potential candidate studies, and the second stage screened the shortlisted full-text articles for eligibility against a set of inclusion criteria (Bowers et al. 2014b). Concomitantly with the second stage, the articles that met the inclusion criteria (n = 320) were ‘light coded’ to extract basic information about each evidence synthesis, such as study characteristics, population, outcomes, and methods used. [Footnote 3] A description of the ‘evidence map’ produced from these light codes is available in Bowers and Tompson (2014).

The search strategy

We first reviewed materials from organizations with a well-established pedigree in systematic review methods. Chiefly, this included the internationally renowned Campbell Collaboration (Petrosino et al. 2001). We drew heavily from their comprehensive guidance document on searching for studies by Hammerstrøm and colleagues (2010), and viewed their specialist SPECTR databases and YouTube channel of conference sessions. We garnered further best practice from another known center of excellence—the UK University of York’s Centre for Reviews and Dissemination.

Our search strategy was designed to follow procedures identified in best practice, but the process was also iterative to balance sensitivity and precision (Hammerstrøm et al. 2010). Sensitivity relates to casting a wide enough net so that all relevant studies are retrieved, whereas precision refers to the proportion of relevant studies to all studies retrieved. Crucially, precision and sensitivity are incompatible aims; highly sensitive strategies result in extensive searches that generate a lot of leads to irrelevant studies (false positives) (White 1994). High-precision strategies necessitate low sensitivity, and result in a higher number of false negatives (relevant studies not identified). Balancing the two is vital.
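The trade-off between the two measures can be made concrete with a small worked example. The following is a minimal sketch, not part of the original study: the record identifiers are hypothetical, and it assumes that the full set of relevant studies is known, which in practice is rarely the case, so a benchmark set of known studies would have to stand in for it.

```python
def sensitivity_and_precision(retrieved, relevant):
    """Return (sensitivity, precision) for a single search.

    sensitivity = relevant records retrieved / all relevant records
    precision   = relevant records retrieved / all records retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    true_positives = retrieved & relevant
    sensitivity = len(true_positives) / len(relevant) if relevant else 0.0
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    return sensitivity, precision


# Hypothetical record identifiers, for illustration only.
relevant = {"r1", "r2", "r3", "r4"}
broad_search = {"r1", "r2", "r3", "r4", "x1", "x2", "x3", "x4", "x5", "x6"}
narrow_search = {"r1", "r2"}

broad = sensitivity_and_precision(broad_search, relevant)
narrow = sensitivity_and_precision(narrow_search, relevant)
print(broad)   # (1.0, 0.4): nothing missed, but six false positives to screen
print(narrow)  # (0.5, 1.0): no false positives, but two relevant studies missed
```

In this toy example the broad search retrieves every relevant record at the cost of screening six irrelevant ones, whereas the narrow search returns only relevant records but misses half of them.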

As the objective was to amass the evidence base in crime reduction, our protocol was intentionally sensitive. This meant that we chose many different sources and generated a wide range of keywords to apply in electronic searches. It is well known that the knowledge base across criminal justice is disparate and fragmented (Farrington et al. 1986; Hammerstrøm et al. 2010), and that studies on crime prevention span many fields. To ensure that we were not limited to our own disciplinary boundaries (and thereby liable to introduce bias into the selection of studies; see Brunton et al. 2013), wide-ranging and multi-disciplinary sources were selected. The feedback garnered from circulating the protocol assisted in the development of our sources and keywords. Because electronic bibliographic databases are principal sources of primary studies, we started the search process by conducting preliminary searches of the relevant databases.

Electronic bibliographic databases

Each electronic bibliographic database usually focuses on a discipline, a collection of disciplines, or a commercial publisher, and within this purview a collection of research records is held. Some are specific to a subject area (e.g., criminal justice) or discipline (e.g., social science); others are multi-disciplinary (e.g., SCOPUS and Web of Science) and contain millions of records. Since no single database contains all extant research records, it is essential to discover the scope of each database by examining its particular focus. While Google Scholar is a popular source for validating search results from electronic databases, the limitations of its functionality mean that it has yet to usurp professional literature retrieval tools (Boeker et al. 2013).

Each record in a database has standard fields (such as Citation details) and indexed fields that can be used in a search. Oftentimes an abstract will be contained within a record. A few databases hold the full text documents for each record, but this is not typical. Detailed overviews of the databases relevant to criminology and criminal justice focused research questions are available in Reed and Baxter (2009) and Hammerstrøm et al. (2010).

Selecting appropriate databases to search is important, but usually constrained by access rights. At an early stage of our project, we carried out a scoping exercise that determined where the key journals likely to publish systematic reviews on crime prevention topics were indexed. These journals were derived from lists of known systematic reviews and meta-analyses in the criminal justice field (Petrosino 1997; Weisburd and Farrington 2015; Wells 2009). We also called upon information specialists in other countries to search databases not available to us (CINCH and Social Sciences Full Texts).

Generating search terms for electronic databases

Keyword searches of electronic databases require the generation and testing of individual search terms, and thereafter the compilation of a search syntax combining all the terms. Generating search terms requires a two-fold approach: the creation of a list of natural language search terms, and the creation of a list of controlled vocabulary terms. Controlled vocabulary consists of descriptors that are manually assigned to a research record in order to link studies that might not share the same natural language vocabulary. These descriptors are supposed to ‘re-express’ the content of documents with standard indices, often referred to as subject headings (White 1994). (Anticipated controlled vocabulary for the current study can be found in the author keywords for this paper.) The multidisciplinary nature of criminology and criminal justice studies results in idiosyncratic and scattered vocabulary. For this reason, to reduce false negatives, all relevant synonyms and appropriate subject headings should be used in a search (Reed and Baxter 2009).

The research question is the orientating force for search term generation. The question, therefore, should be expressed clearly and precisely, and subsequently be broken down by major ideas (concepts) or components of the question. A list of terms and their synonyms can then be developed which articulate these individual components. In our survey of crime prevention evidence syntheses we needed to identify crime terms, prevention terms and research design terms. [Footnote 4] Defining crime prevention required a considered approach; we chose to break it down into crime and prevention, which afforded the opportunity to define crime types of interest. We compiled a list of these from UK government websites, [Footnote 5] and augmented this with terms for more general criminal behavior. We then considered international equivalents, for example ‘delinquent’ and ‘offender’ for the term ‘criminal’.

To refine our keywords, and to assist in generating controlled vocabulary, we took advice from information specialists who recommend using known (potentially) eligible studies to establish a baseline of search terms (Hammerstrøm et al. 2010; White 1994). These were harvested from the reference lists of existing meta-analyses and systematic searches in criminal justice and crime prevention (Petrosino 1997; Weisburd and Farrington 2015; Wells 2009). This search tactic is known as backwards reference checking, which is often undertaken towards the end of a search strategy. In our experience, however, this is fruitful to do at the outset, for it produces a set of studies that can be used in various ways to benchmark the relevance of syntaxes developed for electronic database searches.

We looked up the potential candidate studies (n = 199, 63 of which were eventually appraised as eligible; see Table 3) to ascertain how they were recorded in electronic databases. In doing so, we found a large variance in the controlled vocabulary used to index the research records. Prominently, we found that records indexed in separate databases held by the same commercial provider [Footnote 6] had different controlled vocabulary. This chimes with the warnings issued in the information specialist literature; index terms are considered relatively robust in medical databases, but are inconsistently applied in other databases (Hammerstrøm et al. 2010). So whilst controlled vocabulary appears more objective, it is not infallible; it is often produced by postgraduate students without domain expertise who subjectively interpret the study.

Further, we used database thesauri whenever available to generate broader and narrower terms associated with crime and crime prevention. These thesauri organize controlled vocabulary in a database into an alphabetical listing or hierarchical system so that a term can be used to search for other related terms. When searching in some databases, one can select clusters of controlled terms to appear in the search; this is referred to as ‘exploding’ a subject heading (index term) so that its subordinate terms are also used. Thesauri are database-specific; controlled vocabulary therefore needs to be tailored for each database searched, and particular care must be taken if multiple databases are searched simultaneously through a commercial provider interface. [Footnote 7]

The final stage of generating search terms related to the research design component of our inclusion criteria: evidence syntheses. This stipulated that a study had to be a systematic review, a meta-analysis, or both. As there are no developed reporting guidelines for systematic reviews in crime prevention (see Sidebottom and Tilley 2012), there are many search terms that could be used by scholars to describe an evidence synthesis. For example, ‘scientific review’ and ‘bibliographic review’ are, perhaps, two of the less obvious terms used. For this reason, we consulted search filters designed to identify systematic reviews in other disciplines, [Footnote 8] and added terms we felt might be used across the social sciences and public health literature.

Testing search terms

It is prudent to test search terms before embarking on searching electronic databases (Petrosino 1997). Dip sampling some results may reveal missing search terms. Additionally, estimates of the likely search hits obtained can be compared to available resources. This testing can be likened to the ‘piloting’ stage of a primary study, whereby the feasibility, time expended and adverse effects can be estimated. Some search terms in their first incarnation may be too broad, and return an unmanageable number of hits in a database search. For example, one of the search terms in our first list was ‘review’, representing the research design we were interested in. When this term was used to search the title, abstract, and keywords in the large multi-disciplinary electronic database SCOPUS, over three million hits were returned. Other imprecise terms were ‘drugs’ and ‘disorder’, which retrieved an unwieldy number of medical studies.

To increase the precision of our keywords, we incorporated speech marks, wildcards, and proximity operators into our search terms. Speech marks are simply used to treat the words within a phrase as related, such as “calls for service” and “bodily harm”. Larger databases typically have functionality for inserting wildcards into search terms. These take the form of left-hand truncation (e.g., *crime), right-hand truncation (e.g., crime*), or both. Wildcards can also be inserted in the middle of words to account for international spelling variations (if the database does not automatically search for these); for example, ‘offen?es’. These can be applied with ascending sophistication depending on the functionality of the database.
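The behavior of phrases and wildcards can be emulated locally to check which word forms a term would capture before running it in a database. The sketch below is illustrative only and rests on assumed conventions: `*` is taken to match any run of word characters and `?` to match zero or one word character. Real databases differ on both points, so the search tips of each database remain the authority. The function names are hypothetical.

```python
import re


def wildcard_to_regex(term):
    """Translate a wildcard search term into a compiled regular expression.

    Assumed conventions (these vary between databases):
      *  matches any run of word characters (truncation)
      ?  matches zero or one word character
    """
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w?")
        else:
            parts.append(re.escape(ch))
    # Word boundaries stop the term matching inside a longer word.
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def matches(term, text):
    """Return every substring of `text` that the wildcard term matches."""
    return [m.group(0) for m in wildcard_to_regex(term).finditer(text)]


print(matches("offen?es", "Firearms offences and drug offenses"))
print(matches("crime*", "crime, crimes and criminal"))
print(matches("*crime", "cybercrime and crime"))
print(matches("calls for service", "Repeat calls for service fell"))
```

Under these assumptions, ‘offen?es’ captures both spellings, right-hand truncation on ‘crime*’ captures the plural but not ‘criminal’, and a phrase is only matched when its words appear together and in order.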

Furthermore, proximity operators offered in many databases help increase the precision of terms. For instance, a search for studies that focus on offenders who have committed firearms offences can be expressed in many ways. Ignoring synonyms here for the sake of clarity, the two key search terms are firearms and offenders. Using these with the Boolean operator AND yields an unmanageable number of search hits when applied to the title and abstract fields. Using a proximity operator (e.g., NEAR or WITH) allows the association between the words to be specified within given parameters. For example, the following syntax specifies that the first word (with wildcard for plurals) needs to be located within five words of the second (also with wildcard)—“firearm? NEAR/5 offender?”.
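The difference between AND and a proximity operator can be sketched in a few lines. This is a hedged illustration rather than a description of how any particular database implements NEAR: it assumes that ‘within five words’ means the word positions differ by at most five, that word order does not matter, and that the trailing `?` allows one optional character. The function names and example sentences are hypothetical.

```python
import re


def _positions(words, pattern):
    """Indices of the words that fully match the regular expression `pattern`."""
    return [i for i, word in enumerate(words) if re.fullmatch(pattern, word)]


def both_present(text, first_pattern, second_pattern):
    """Emulate AND: both terms occur somewhere in the text."""
    words = re.findall(r"\w+", text.lower())
    return bool(_positions(words, first_pattern)) and bool(_positions(words, second_pattern))


def near(text, first_pattern, second_pattern, max_distance=5):
    """Emulate NEAR/n: the terms occur within `max_distance` word positions."""
    words = re.findall(r"\w+", text.lower())
    return any(
        abs(i - j) <= max_distance
        for i in _positions(words, first_pattern)
        for j in _positions(words, second_pattern)
    )


# "firearm?" and "offender?" written as regular expressions.
FIREARM = r"firearm\w?"
OFFENDER = r"offender\w?"

on_topic = "Offenders convicted of carrying a firearm were interviewed."
off_topic = (
    "Firearms legislation was debated at length in parliament, while a "
    "separate chapter on prisons describes young offenders."
)

print(both_present(on_topic, FIREARM, OFFENDER), near(on_topic, FIREARM, OFFENDER))
print(both_present(off_topic, FIREARM, OFFENDER), near(off_topic, FIREARM, OFFENDER))
```

Both example abstracts satisfy the AND search, but only the first satisfies the proximity search, which is how the operator trims false positives.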

We performed sensitivity analysis on each search term with the aim of generating the most precise version to be included in the final search. This involved trialing different combinations of the wildcards and proximity operators, and empirically testing the results to ascertain if the known studies (as derived through the backwards searching described above) would be captured.
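One way to picture this kind of testing is as a comparison of candidate versions of a term against the benchmark set of known studies. The sketch below is an assumption-laden illustration, not the procedure used in the study: it supposes that a version is acceptable only if it still retrieves every benchmark study, and that among acceptable versions the one returning the fewest records is preferred. The syntaxes, record identifiers and function name are hypothetical.

```python
def pick_most_precise(variants, known_studies):
    """Choose the variant with the fewest hits that still captures all known studies.

    variants: dict mapping a search syntax (str) to the set of record IDs it returned.
    known_studies: iterable of record IDs that the term is expected to retrieve.
    Returns the chosen syntax, or None if every variant loses a known study.
    """
    known = set(known_studies)
    acceptable = {
        syntax: hits for syntax, hits in variants.items() if known <= set(hits)
    }
    if not acceptable:
        return None
    return min(acceptable, key=lambda syntax: len(acceptable[syntax]))


# Hypothetical results: k* are benchmark studies, n* are irrelevant records.
variants = {
    "firearm? AND offender?": {"k1", "k2", "n1", "n2", "n3", "n4"},
    "firearm? NEAR/5 offender?": {"k1", "k2", "n1"},
    "firearm? NEAR/1 offender?": {"k1"},
}
best = pick_most_precise(variants, known_studies={"k1", "k2"})
print(best)  # firearm? NEAR/5 offender?
```

Here the tightest version is rejected because it drops a benchmark study, and the loosest is rejected because a narrower version retrieves the same benchmark studies with fewer irrelevant records.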

The next step was translating the search syntax to each database. Typically this requires identifying non-universal controlled vocabulary and technical conventions. Careful reading of the FAQs and search tips provided by each database reveals appropriate application of wildcards and proximity operators alongside limitations that searchers should be mindful of. Not all databases alert a searcher when they misapply these operators, and therefore adhering to technical conventions of each database can protect against unsuitable results being returned.

Building up search syntaxes

Electronic database fields can be searched with individual search terms, but the most powerful types of searches apply a search syntax, which is a collection of search terms interspersed with Boolean operators (see below). The chief operators referred to here are ‘AND’ and ‘OR’; the ‘NOT’ operator should be applied with caution due to the instability of the results it can return (Hammerstrøm et al. 2010).

As noted previously, a review’s research question will steer the search strategy and syntax used. The area of interest usually lies at the intersection of multiple concepts—be they population, intervention, comparator, or outcome. Our search had three primary concepts: crime, prevention, and an evidence synthesis research design. Figure 1 illustrates these as a Venn diagram: the shaded area represents our area of interest—where we wished to locate relevant studies. The Boolean operator AND is used to retrieve studies that fall within the overlap between such concepts.

Fig. 1 A Venn diagram illustrating the locale of the studies of relevance to the survey of crime prevention evidence syntheses

Recall that researchers may use different words to describe the same concepts, especially across fields studying similar phenomena from different ideologies. It is essential to capture this diversity of expression in a search, which is why synonyms for each of the concepts from the research question are used. These synonyms are combined with the Boolean OR operator. A useful schematic from Hammerstrøm (no date) is adapted in Fig. 2 to demonstrate how the OR and the AND operators are combined in a search. [Footnote 9] This shows that search terms that represent the same concept are combined with OR, whereas search terms that represent different concepts are combined with AND. Applying Boolean operators in an appropriate structure is critical to a well-performing search.

Fig. 2 Schematic illustrating how search terms are combined with Boolean operators across the different concepts of a research question (adapted from Hammerstrøm (no date))

The preceding discussion segues to the logic of combining natural language and controlled vocabulary terms in a search. Usually an electronic database will permit the searcher to ‘build’ search syntax incrementally, so that each step of the search can be viewed alone, and in combination with other steps. Our search, with the typical sequence of steps, is illustrated below. This emphasizes that each concept is broken down into natural language and controlled vocabulary terms; and these are searched individually before being combined with OR (steps 3, 6, and 10). Once the search terms from the distinct concepts are appropriately fused, they are then combined with each other using AND (steps 7 and 11)—resulting in the final search syntax.

  1. Natural language terms for crime types
  2. Controlled vocabulary terms for crime types
  3. #1 OR #2
  4. Natural language terms for prevention outcome
  5. Controlled vocabulary terms for prevention outcome
  6. #4 OR #5
  7. #3 AND #6
  8. Natural language terms for evidence synthesis research design
  9. Controlled vocabulary terms for evidence synthesis research design
  10. #8 OR #9
  11. #7 AND #10
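The sequence of steps above can be sketched as a small routine that assembles a single search string: terms within a concept are joined with OR, and the resulting concept blocks are joined with AND. This is a minimal illustration under stated assumptions. The terms and subject headings shown are hypothetical placeholders rather than the syntax used in the study, and real databases usually require field tags for controlled vocabulary, which are omitted here.

```python
def any_of(terms):
    """Combine the terms for one concept with OR (steps 3, 6 and 10)."""
    return "(" + " OR ".join(terms) + ")"


def build_syntax(concepts):
    """Combine the concept blocks with AND (steps 7 and 11).

    concepts: list of (natural_language_terms, controlled_vocabulary_terms)
    pairs, one pair per concept in the research question.
    """
    blocks = [any_of(natural + controlled) for natural, controlled in concepts]
    return " AND ".join(blocks)


# Hypothetical terms for the three concepts: crime, prevention, research design.
concepts = [
    (["crime*", "burglar*"], ['"Crime"']),
    (["prevent*", "reduc*"], ['"Crime Prevention"']),
    (['"systematic review"', "meta-analy*"], ['"Literature Review"']),
]
syntax = build_syntax(concepts)
print(syntax)
```

Because AND is associative, combining all three blocks in one pass is equivalent to combining the first two (step 7) and then adding the third (step 11).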

Refining search results

Once the search syntax was formulated for each database, we considered how the search results might be refined. Common ways of doing this include restricting the results by time period or by language. We chose to restrict our search to after 1975—the first documentation of systematic review methods (Gough et al. 2013). Further, due to resource constraints, our protocol specified a restriction to studies published in English—a common but risky strategy (as it incurs publication bias, see Brunton et al. 2013).

To manage the numbers of the search results we applied further restrictions. These included the document type (e.g., excluding speeches, meeting papers, teaching materials, book reviews); the precise subject heading (e.g., excluding subjects such as “film studies” and “food science”); or, in the case of larger databases which permit visual analysis of the search results, exact keywords (e.g., in SCOPUS we excluded terms such as “heart arrest” and “prognosis”). For quality assurance, and to protect against subjective bias creeping into the search at this stage, two researchers agreed the removal of every term from the search results. We again tested the results against our list of known studies to check that the removal of terms did not have deleterious consequences.

Lastly, in the databases containing grey literature (see more below) we found few index terms that could be used to restrict the considerable search results. For the theses and dissertations, we reasoned that crime or security must be mentioned at least once in the document, and therefore refined the results with these keywords (using the full text as the search field).

Results of electronic database searches

Table 1 lists the 15 databases that were searched in our study, along with, for each, the total number of records retrieved, the number of records duplicated across multiple databases, the number of unique records, the number of relevant studies, [Footnote 10] and, where available, the approximate number of records in each database at the time of the search. Table 1 shows that over 15,000 records were identified across the 15 databases, with over a fifth of these being duplicates. The number of records returned from each database roughly corresponds with the overall database size, so that large interdisciplinary databases like SCOPUS and Web of Knowledge returned a substantial number of records. However, more modestly sized topic-specific databases such as Criminal Justice Abstracts and National Criminal Justice Reference Service are the exception to this trend, returning many relevant and non-relevant records. It should be noted that PsycINFO was the database used to test the reference data set of studies against decisions made in the search strategy process, and therefore the proportion of relevant hits for this database could be an artefact of this process, rather than being the most fruitful in terms of precision.

Table 1 Studies found in individual databases for the survey of systematic evidence in crime prevention

The distribution of eligible records across the 13 databases we searched (n = 174) [Footnote 11] can be seen in Table 2. The ‘total’ column is the number of records in the eventual list of included studies, for each database. The ‘unique’ column refers to the number of studies, for each database, that were not retrieved from other databases. This shows that just under half (84 of 174 studies) were found in a single database, which reinforces that a thorough search cannot rely on one database alone. Table 2 also shows that, for this search strategy at least, there is some redundancy across the databases; for example, three databases (ASSIA, IBSS, and T & D) have no unique records. This suggests that it was unnecessary to search these databases to arrive at the eventual sample of eligible studies. However, with no empirically derived guidelines available this could not be anticipated prospectively. For other research questions, it is likely that the pattern of high and low uniqueness across databases would be different. We now turn to the complementary sources used in our search.
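The ‘total’ and ‘unique’ columns described here can be derived mechanically once each eligible study is tagged with the databases that returned it. The following sketch shows one way to do so; the database labels and study identifiers are hypothetical and do not reproduce the figures in Table 2.

```python
from collections import Counter


def overlap_summary(found_in):
    """Return {database: (total, unique)} for the eligible studies.

    found_in: dict mapping a database name to the set of eligible study IDs
    retrieved from it. A study is 'unique' to a database when no other
    database returned it.
    """
    times_found = Counter(
        study for studies in found_in.values() for study in studies
    )
    return {
        database: (
            len(studies),
            sum(1 for study in studies if times_found[study] == 1),
        )
        for database, studies in found_in.items()
    }


# Hypothetical data: three databases and four eligible studies.
found_in = {
    "Database A": {"s1", "s2", "s3"},
    "Database B": {"s2", "s4"},
    "Database C": {"s2", "s3"},
}
summary = overlap_summary(found_in)
print(summary)
```

In this toy data, Database C contributes no unique studies, which mirrors the kind of redundancy that can only be observed retrospectively.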

Table 2 Database overlap for the studies found in the survey of systematic evidence in crime prevention

Grey literature sources

Systematic reviews that rely exclusively on electronic databases are likely to miss important studies, thus biasing the overall results of the synthesis. An essential part of our search strategy was pursuing ‘grey’ literature. This was difficult: presumably grey literature is also known as ‘fugitive literature’ (Sechrest et al. 1979) because it is elusive and hard to track down via traditional means. Purposive searching for grey literature is demanding throughout the entire research process: it is more challenging to search for, because structured databases on grey studies are few; it is more resource intensive to retrieve documents (Reed and Baxter 2009); and it can considerably increase the final sample of studies for synthesis (Weisburd et al. 1990). Accordingly, the retrieval of grey literature is now considered a distinct research specialization (Rothstein and Hopewell 2009) and guidance strongly recommends collaborating with information specialists (Brunton et al. 2013; Hammerstrøm et al. 2010; Reed and Baxter 2009). To this end, we commissioned the services of a grey literature information specialist [Footnote 12] with extensive criminal justice knowledge to assist in searching for grey literature.

First, we searched for grey literature through the electronic database search. Some of the databases (in Tables 1 and 2) contained grey literature as well as academic publications—for example National Criminal Justice Reference Service and Criminal Justice Abstracts. Others focus exclusively on grey literature, namely, PsycEXTRA, ProQuest theses and dissertations and Social Policy and Practice. Structured keyword searches, as outlined above, were used to retrieve grey literature from these sources.

Conference proceedings can also be an invaluable source of grey literature. For reasons unrelated to research quality, many studies that are presented at conferences are never subsequently published (Rothstein and Hopewell 2009). Those that are may be published many months or years after the conference (Scherer et al. 2007). Whilst searching can be time-consuming for large conferences, it can generate important leads to studies that may not otherwise be found, and is a good way of horizon-scanning for ongoing studies. The abstracts from the American Society of Criminology are indexed in Criminal Justice Abstracts and thus were picked up in our keyword search of that database. These led to one eligible dissertation being identified.

The grey literature information specialist assisted in manual searches (called hand searches despite these sometimes being conducted electronically) of publications from 25 international policing agencies and practitioner-oriented research organizations (see Bowers et al. 2014b for more details), along with the large gray (grey) literature database managed by Rutgers University. We also had help searching the UK National Police Library catalogue and consulted wider grey literature resources (see Footnote 13).

Supplementary sources

As previously mentioned, we employed backwards reference checking at the beginning of the search. This is part of a wider form of lead-generation that can be done in either direction: backward searches look at the references of eligible studies, forward searches look at which authors have cited an eligible study in their own references (known as ‘citation analysis’). The approach is akin to snowball sampling in primary studies, whereby one study leads to several others, and each lead is followed until all are exhausted. It is referred to as ‘pearl growing’ or ‘pearling’ (Cooke et al. 2012).
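The snowballing logic described above can be expressed as a simple traversal of a citation network. The sketch below is illustrative only and is not the procedure we ran: it assumes that reference lists and citing studies are already available as lookup tables (`references` and `cited_by`, both hypothetical names), and that eligibility can be decided by a function, whereas in practice it requires human screening.

```python
from collections import deque


def pearl_grow(seeds, references, cited_by, is_eligible):
    """Follow backward and forward citation leads until none remain.

    seeds       -- iterable of starting study IDs
    references  -- dict: study -> set of studies it cites (backward leads)
    cited_by    -- dict: study -> set of studies citing it (forward leads)
    is_eligible -- callable deciding whether a study meets inclusion criteria
    """
    included = set()
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        study = queue.popleft()
        if not is_eligible(study):
            continue  # leads from ineligible studies are not followed
        included.add(study)
        leads = references.get(study, set()) | cited_by.get(study, set())
        for lead in leads:
            if lead not in seen:  # each lead is screened only once
                seen.add(lead)
                queue.append(lead)
    return included


# Hypothetical toy network.
references = {"A": {"B", "X"}, "B": {"C"}}
cited_by = {"A": {"D"}, "C": {"E"}}
eligible = {"A", "B", "C", "D"}
found = pearl_grow(["A"], references, cited_by, lambda s: s in eligible)
print(sorted(found))
```

Starting from the single seed A, the traversal reaches B and D directly and C through B, and it stops once every remaining lead has been screened.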

In previous studies, reference checking has been reported as a particularly fruitful search tactic; for example Papaioannou et al. (2009) found 7 % of their eligible studies in a social science systematic review through this method. This is because the citations that authors choose “make substantive or methodological links between studies explicit” (White 1994, pp.51–52). Such links are made independent of vocabulary and, accordingly, they remedy the limitations of relying on natural language terms and controlled vocabulary in electronic databases. Therefore they provide connections between studies on a given topic which might not have been obvious through other means (Pao and Worthen 1989).

Citation analysis is time-consuming, and it is wise to devise a means of prioritizing studies to be analyzed in this way. We performed citation analyses in Web of Knowledge on ‘review of reviews’ method studies, and those that had been initially ‘light’ coded as integrating multiple systematic reviews. These 35 studies generated 460 leads: 395 were already identified, but the remainder mostly consisted of leads to grey literature. Six studies were eventually included in our survey through these means, and several others were linked to included studies (that is, they were the same study written up in a different publication).
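The yield of the citation analysis can be made explicit with some simple arithmetic on the figures reported above. The short Python sketch below uses only those figures; the variable names are illustrative, and the studies that were merely linked to included studies are not counted.

```python
# Figures reported in the text for the citation analysis.
seed_studies = 35
leads_total = 460
already_identified = 395
included = 6

# Leads that had not been found by any earlier search tactic.
new_leads = leads_total - already_identified

# Proportion of leads that ended up as included studies.
yield_all = included / leads_total
yield_new = included / new_leads

print(f"new leads: {new_leads}")
print(f"included per lead screened: {yield_all:.1%}")
print(f"included per new lead: {yield_new:.1%}")
```

On these figures, 65 leads were new to the search, and roughly one in eleven of those new leads resulted in an included study.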

During the course of searching, we called upon the so-called ‘invisible college’ (Cooper 1989) of researchers in crime prevention to help us identify eligible studies. In particular, the lead author contacted scholars via a range of electronic routes to chase ‘leads’ from conference abstracts and other studies that were mentioned to us by academic and practitioner contacts. The scientific community was especially helpful when sourcing the full texts of potentially relevant studies, and our grey literature information specialist utilized her well-developed network for this task. Researcher networks are fundamental to scientific communication and should be considered a source of studies in their own right—in particular ones that have never been published and are sitting in (electronic or physical) file-drawers. Unpublished studies are inaccessible to any other search tactic, and therefore contacting experts in relevant fields is an important component of any search strategy. Moreover, in the process of screening and coding the studies derived through our search, we noted prolific authors and consulted online lists of their publications. Relatedly, when sourcing book chapters we browsed other chapters in the books in the event they led us to other candidate studies (Reed and Baxter 2009).

Sources of candidate studies

Eventually, once duplicates and linked documents (that is, studies published in multiple places) were removed, 320 candidate studies were eligible according to our inclusion criteria. The sources of these studies are presented in Table 3 according to the chronology in which they were searched (duplicates found as the search progressed are not included). This shows that, as previously mentioned, backwards searches of lists of known systematic reviews in crime prevention were productive in producing a baseline of studies to work with. Electronic database searching was the most lucrative, netting 164 studies (if the dissertation lead from the conference proceedings is included). However, it should be noted that searching and screening the database search results was also the most time-consuming part of the process: it took a team of three full-time researchers approximately 6 months to search and screen all the electronic database hits (including the protracted stage of testing and refining the search terms), with two postgraduate students sourcing several hundred full texts for the second screening phase.

Table 3 Sources of candidate studies

Outside of electronic databases, searches for grey literature (the UK Police Library Catalogue and hand searches of practitioner websites) yielded a total of 72 studies; see Table 3. Forward searches based on a selection of eligible studies generated six studies. This is a smaller proportion than reported in searches for primary studies, which may have been influenced by our method of prioritizing studies for citation analysis. Lastly, serendipity led us to identify 15 studies as eligible for our final sample. These were studies found through our scouring of prolific authors’ publication lists, skimming additional chapters in identified books, journal alerts and notifications from platforms that share academic publications.

The composition of our final sample in terms of publication types demonstrated an interesting trend. As might be expected, 192 studies, or 60 %, were journal articles, with smaller numbers of book chapters and books (18 and 12 respectively). A sizeable proportion (just over 30 %) of our eligible studies came from grey literature (85 reports and 14 dissertations). This is lower than the 48 % of unpublished sources that Wilson (2009) reported when examining 11 Campbell Collaboration systematic reviews, but we think it still noteworthy given that we were specifically seeking evidence syntheses. This demonstrates the value governments and other non-academic bodies place on evidence syntheses across the spectrum of crime prevention (see Bowers et al. 2014a), and is a trend that is likely to persist.
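The proportions quoted above follow directly from the reported counts. As a small worked check in Python, the sketch below takes the 320 candidate studies reported earlier as the denominator; the dictionary keys are illustrative labels rather than the coding categories used in our database.

```python
# Denominator: the 320 candidate studies reported in the text.
TOTAL_CANDIDATES = 320

# Counts by publication type, as reported in the text.
counts = {
    "journal article": 192,
    "book chapter": 18,
    "book": 12,
    "report": 85,
    "dissertation": 14,
}

shares = {kind: n / TOTAL_CANDIDATES for kind, n in counts.items()}

# Grey literature is taken here to be reports plus dissertations.
grey_share = (counts["report"] + counts["dissertation"]) / TOTAL_CANDIDATES

for kind, share in shares.items():
    print(f"{kind:<16}{share:6.1%}")
print(f"{'grey literature':<16}{grey_share:6.1%}")
```

The grey literature share works out at just under 31 %, which is the basis for the ‘just over 30 %’ figure.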

As a final note, the point made earlier about the lack of reporting guidelines in criminology generally, and crime prevention more specifically, is an important one (Sidebottom and Tilley 2012). This is amplified by the lack of structured abstracts and common vocabulary across the social sciences (Glanville et al. 2008). For these reasons, systematic searchers cannot rely on the key information being in the title alone. Underscoring this point, in our eventual database of evidence syntheses in crime prevention, of the 224 records that were coded as systematic reviews, only 89 had ‘systematic review’ in the title; of the 176 meta-analyses, only 92 had ‘meta-analysis’ (or a variant of it) in the title. Cross-disciplinary common vocabulary is hence sorely needed for high-quality evidence synthesis, and is a fertile area for future research efforts.
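To put these figures in perspective, the proportion of records that a title-only search would have captured can be computed directly. The sketch below is a simple illustration in Python using the counts reported above; the function name `title_hit_rate` is hypothetical.

```python
def title_hit_rate(with_term_in_title, total_records):
    """Proportion of records whose title contains the method term."""
    return with_term_in_title / total_records


# Counts reported in the text.
sr_rate = title_hit_rate(89, 224)   # systematic reviews
ma_rate = title_hit_rate(92, 176)   # meta-analyses

print(f"systematic reviews found by title alone: {sr_rate:.1%}")
print(f"meta-analyses found by title alone:      {ma_rate:.1%}")
print(f"systematic reviews missed: {1 - sr_rate:.1%}")
print(f"meta-analyses missed:      {1 - ma_rate:.1%}")
```

On these counts, a search restricted to titles would have missed roughly three in five of the systematic reviews and just under half of the meta-analyses.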

Discussion

Currently, scholars undertaking systematic searches of criminology and crime prevention literature have a limited reference point, for there is a dearth of empirical examples to guide them through the process. In articulating the experiences of applying principles of high-quality information retrieval to the field of crime prevention this paper goes some way to addressing this. In doing so, we have elicited and marshaled tacit knowledge, which can usefully inform the search process for future evidence synthesists. Furthermore, this paper provides results from the first empirical tests of optimizing the balance between sensitivity and precision with the criminological literature. In this discussion we summarize these key findings.

Systematic searching of the literature mirrors the wider research process in that it is not a linear process. A high-quality search will involve a substantial time investment in honing the research question, specifying the precise scope of the work, and trialing and testing of search tactics. Whilst it is good practice to publish the intentions of the search strategy prospectively in a protocol document, it is not uncommon for the search strategy to evolve once the process is underway (Reed and Baxter 2009). Indeed, refinement of the strategy through reflective practice assists in shrewd use of resources.

Time and practical constraints are critical contributory factors in systematic search decisions. While an exhaustive search (finding all relevant studies across all possible sources) is the common ambition, it should be emphasized that achieving it would require unlimited resources. Pragmatically, White advises that “the point is not to track down every paper that is somehow related to the topic… the point is to avoid missing a useful paper that lies outside of one's regular purview” (White 1994, p.44). Crucially, it is about minimizing publication bias. As stressed throughout this paper, an optimal search combines high sensitivity with high precision, and thus strikes a balance between identifying all the records of relevance to a research question and keeping the search manageable within the available resources. Tactics should therefore be devised with resources in mind, as each source can significantly increase the number of studies that need to be screened and retrieved (prior to the coding and synthesis stage of a review).
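The trade-off between sensitivity and precision can be illustrated numerically. The sketch below uses the standard information retrieval definitions (sensitivity as the share of all relevant records that a search retrieves, precision as the share of retrieved records that are relevant). The two search scenarios are entirely hypothetical and are not drawn from our results; note also that in a real review the total number of relevant records is unknown and must be approximated, for example with a set of known eligible studies.

```python
def sensitivity(relevant_retrieved, relevant_total):
    """Share of all relevant records that the search retrieved."""
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved, retrieved_total):
    """Share of retrieved records that are relevant."""
    return relevant_retrieved / retrieved_total


# Hypothetical scenarios, assuming 100 relevant records exist in total.
RELEVANT_TOTAL = 100
searches = {
    "broad":  {"retrieved": 10_000, "relevant_retrieved": 95},
    "narrow": {"retrieved": 500,    "relevant_retrieved": 60},
}

summary = {}
for name, s in searches.items():
    summary[name] = (
        sensitivity(s["relevant_retrieved"], RELEVANT_TOTAL),
        precision(s["relevant_retrieved"], s["retrieved"]),
    )
    sens, prec = summary[name]
    print(f"{name}: sensitivity={sens:.0%}, precision={prec:.2%}, "
          f"records to screen={s['retrieved']}")
```

In this illustration the broad search finds 95 % of the relevant records but requires 10,000 records to be screened, whereas the narrow search requires only 500 but misses 40 % of the relevant records.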

Studies on crime prevention topics (and criminology more generally) originate from a broad range of constituent fields. For this reason, systematic searches on topics in this area should employ a myriad of sources so that disciplinary boundaries and the limits of researcher knowledge do not bias the search. In addition, the multidisciplinary nature of crime prevention results in idiosyncratic and scattered vocabulary. Such language is not easily anticipated in searches of electronic databases, and thus care should be taken to generate a wide range of synonyms so that the diversity of expression may be accounted for.
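One common way of accommodating scattered vocabulary is to group synonyms by concept, combining terms within a concept with OR and the concepts themselves with AND. The sketch below shows the idea in Python. The terms are hypothetical examples rather than our search string, and the exact syntax (quotation marks, truncation symbols, field codes) differs between database platforms, so the output would need adapting to each one.

```python
def build_query(concepts):
    """Combine synonym groups into a Boolean search string.

    concepts -- list of synonym lists, one list per concept.
    Terms within a concept are joined with OR; concepts are joined with AND.
    Multi-word terms are wrapped in double quotes as phrases.
    """
    groups = []
    for synonyms in concepts:
        quoted = [f'"{term}"' if " " in term else term for term in synonyms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical synonym groups for two concepts.
concepts = [
    ["crime prevention", "crime reduction"],
    ["systematic review", "meta-analysis"],
]
query = build_query(concepts)
print(query)
```

Adding a synonym to a group widens the search and raises sensitivity, while adding a further concept narrows it, so the synonym lists are a natural place to tune the balance against the resources available for screening.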

On a related note, high-quality systematic searches recognize the value of using a range of search tactics, and do not rely solely on database searches. Many search tactics exist to mitigate the known problems with publication bias (for example backwards and forwards reference checking, hand searches and consultation of relevant annotated bibliographies). Pursuant to this, in the field of crime prevention it is vital to search widely for grey literature; that is, research findings that are published outside of academic outlets. It is worth acknowledging that grey literature retrieval is a specialist skill, which few academics have. Accordingly, we affirm the recommendation in the information retrieval guidance to work closely with information specialists in the field of interest (Hammerstrøm et al. 2010).

Our survey of crime prevention evidence syntheses accumulated a sample in which just under a third of the studies were grey literature, a finding that is consistent with previous empirical enquiry (Wilson 2009). This is, however, an under-researched area and the quantification of grey literature in the criminology field is an important research agenda. For example, it would be useful to know what proportion of conference abstracts make it to final publication, the time it takes from conference to publication and the reasons for not publishing (e.g., weak or null results). This would facilitate an empirically based estimate of the impact of publication bias in the field.

In a related vein, methods to detect inflated estimates of effect in primary studies are now being applied in allied fields such as psychology (Francis et al. 2014). These tests are performed to estimate the probability that appropriately conducted experiments will produce as many ‘successful’ outcomes as reported in the literature. Quantification of potential bias in primary study evidence is an exciting development, and one the criminological community could apply to estimate the magnitude of bias in published findings.

Lastly, it is important to recognize that each research question will require its own tailored approach. The focal topic of the systematic search will dictate the selection of the sources, the keywords and controlled vocabulary used in electronic database searching, and complementary search tactics. The overarching principle is to search widely, meticulously and judiciously so that as many studies as possible are found within the scope of the available resources. Akin to research more generally, data collection is rarely perfect, but adopting these precepts can protect against biased results.