Background

One of the principles underpinning evidence-informed policy and practice is that of knowledge accumulation: that we do the most good, and avoid harms, by basing our decisions on systematic reviews of high quality research [1]. Systematic reviews can synthesize a large amount of sometimes conflicting evidence and can therefore be a potentially important influence on practitioners' and policy-makers' decisions [2].

However, how suitable are systematic reviews for informing decisions when they were not commissioned for that specific decision? The applicability of review findings has recently been called into question, with some reviews criticized for lacking the context-specific detail essential to translating their findings to specific practical situations [3]. Equally important is the question of whether systematic reviews can genuinely be relied upon to reflect the state of the evidence base. To do this they must: (1) ensure that all relevant studies are identified through exhaustive search strategies; and (2) ensure that their conclusions are based on reliable studies.

In this paper we examine eight reviews of community interventions to promote physical activity in order to investigate the comprehensiveness and reliability of reviews and to consider the problem of applicability. If we were practitioners wishing to use these reviews to inform our practice, how confident could we be that our decisions were based on all the available evidence and that the conclusions drawn were reliable? And, since the reviews we might draw upon all appear to address similar questions, would we be able to mediate differences between them?

In essence, we placed ourselves in the hypothetical position of wanting to identify evidence about ‘what community interventions work’ to promote physical activity among children to inform our decision-making. Using a systematic ‘map’ of reviews we selected a set of reviews that are ostensibly all about the same broad issue – that of community interventions to promote physical activity. Our confidence in the evidence base as portrayed by the reviews would be increased if we could see how they each contributed to an overall understanding of the field; and if reviews addressing the same question identified the same studies and treated them in the same way. If they did not, we might worry that other, equally relevant, studies were missing too, and without considerable effort on our part, we would have no way of mediating between conflicting findings.

We wished to explore any differences between reviews in terms of the studies they included. While there may be legitimate reasons for reviews on the same subject not containing the same studies (for example, differences in scope or population), it may be difficult to understand the inclusion or exclusion of certain studies purely in terms of the scope of the reviews; where differences between reviews cannot be explained by their having different scopes or purposes, these differences might instead be explained by differences in review quality. In addition, we would hope that reviews which included the same studies would report the same results from those studies and, given their similar scope, reach similar conclusions about the effectiveness of community interventions for physical activity.

One of the justifications for systematic reviews is encapsulated by the concept of 'knowledge accumulation': that new research should build on previous work and state how it contributes to existing knowledge. Locating new systematic reviews in the context of other reviews should also facilitate the process of piecing together knowledge from multiple reviews to inform practical decisions. To see how far the eight reviews facilitated this, we investigated the extent to which they cited one another, since inter-citation may be taken as evidence that reviewers were both aware of previous work and sought explicitly to advance the state of knowledge in their area.

Since it is not always clear whether a review is systematic or non-systematic, and given that literature reviews are often commissioned to inform decisions, we included all types of literature in this area (not just systematic reviews) and assessed the relationship between review methods and included data, reporting, and conclusions.

Our research questions were:

  1. To what extent do reviews answering a similar research question include the same primary studies?

  2. Where reviews do not contain the same studies, is this explicable in terms of differences in their scope?

  3. How similarly do reviews answering a similar research question report the results of the primary studies they have in common?

  4. To what extent do reviews answering a similar research question draw the same conclusions?

  5. To what extent do reviews answering a similar research question cite other reviews on the same topic?

  6. Does the methodological quality of reviews answering a similar research question help us to understand any differences between included studies in terms of results and conclusions?

Methods

Identifying reviews which answer similar research questions

In 2008 we published a systematic map of reviews on 'Social and environmental interventions to reduce childhood obesity', which included 33 reviews about the impact of upstream or 'social and environmental' interventions on eating, physical activity, sedentary behavior, and/or associated attitudes [4]. This map included reviews about physical activity (or sedentary behavior) and/or healthy eating (or weight management) with an OECD country focus and which included children in their focus. In order to investigate study overlap between reviews, we needed a sufficiently large sample of reviews that were as homogeneous as possible in terms of their topic areas. We therefore used a subset of the reviews in the above 'map' as the focus of our investigation: those investigating the effectiveness of community interventions to promote physical activity (either alone or in combination with healthy eating). There were 16 such reviews in the above 'map' but, in order to maximize the comparability of research question and scope, we excluded reviews which:

  1. only had very little effectiveness data (for example, were primarily a description of funded interventions);

  2. had inclusion criteria that restricted the population in terms of ethnicity, race, or age (for example, only included studies about Aboriginal/Torres Strait Islander people); or

  3. did not draw any conclusions about physical activity.

On this basis, we excluded eight reviews [5–11]. We tabulated the inclusion criteria of the remaining eight reviews (Table 1) [12–19] and judged that they were similar enough in scope to be compared. Although two reviews [14, 15] had been updated since our searches (2007–2008), we based our analyses on the original reviews included in our map.

Table 1 Characteristics of the reviews which met our inclusion criteria (n = 8)

Methodological quality of the included reviews

As discussed in the Background, systematic reviews are promoted as an important means of ensuring decisions are informed by reliable research evidence. Unfortunately, some reviews that describe themselves as 'systematic' may not be, while others are systematic without being described as such. We therefore assessed the quality of the reviews using the AMSTAR quality assessment tool [20] to establish the degree to which they met accepted standards for systematic reviews, broadly examining their reporting of: inclusion (and/or exclusion) criteria; search strategy; synthesis methods; quality assessment; details of included studies; and quality assurance measures (that is, screening, data extraction, and/or quality assessment of studies completed independently by two reviewers, at least in part, with differences resolved) (Table 1). This tool was developed through a systematic survey of other review quality assessment tools and a consultation exercise; it therefore identifies what are widely held to be the most important characteristics of systematic reviews. We classified reviews with clear inclusion criteria, an adequate search strategy, and quality appraisal of included studies as 'systematic'. We included non-systematic reviews because they are often used for the same purposes as systematic reviews and are frequently commissioned to inform policy.

Given the challenges of locating data for public health interventions [21, 22], we went beyond the AMSTAR criteria and judged a search strategy to be adequate only if the authors reported all of the following: searching more than two databases using both free text and thesaurus terms; searching at least one topic-specific database or journal (such as those relating to physical activity, obesity, eating or food, or public health, that is, the scope of the original map); and using at least one non-database search source (internet searching, website searching, contacting experts, checking reference lists, or hand-searching key journals). Where a quality indicator was not mentioned or was unclear, we assumed it was not present.
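To make these decision rules concrete, the short Python sketch below encodes them. This is our own minimal illustration, not the tooling used in the original project; the data structure and field names (ReviewMethods, databases_searched, and so on) are illustrative assumptions.

```python
# A minimal sketch of the review-classification rules described above.
# The data structure and field names are illustrative assumptions, not
# the coding scheme used in the original project.
from dataclasses import dataclass

@dataclass
class ReviewMethods:
    databases_searched: int         # number of bibliographic databases searched
    free_text_and_thesaurus: bool   # both free text and thesaurus terms used
    topic_specific_sources: int     # topic-specific databases or journals searched
    non_database_sources: int       # e.g., websites, experts, reference lists
    clear_inclusion_criteria: bool
    quality_appraisal_of_studies: bool

def search_is_adequate(m: ReviewMethods) -> bool:
    """Adequate search: more than two databases searched with both free text
    and thesaurus terms, at least one topic-specific source, and at least one
    non-database source. Unreported or unclear indicators are coded as absent
    (False or 0), as described above."""
    return (m.databases_searched > 2
            and m.free_text_and_thesaurus
            and m.topic_specific_sources >= 1
            and m.non_database_sources >= 1)

def is_systematic(m: ReviewMethods) -> bool:
    """'Systematic' review: clear inclusion criteria, an adequate search
    strategy, and quality appraisal of included studies."""
    return (m.clear_inclusion_criteria
            and search_is_adequate(m)
            and m.quality_appraisal_of_studies)
```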

Identifying the studies included in the reviews

We compiled a list of all the studies included in the above reviews. We determined whether a study had been 'included' in a review by assessing whether the review authors reported its findings about the effectiveness of a community intervention. We defined 'findings about effectiveness' as any report of the impact of a social and environmental change, or any report of an observational comparison between populations with and without a specific social and environmental factor (for example, access to walking paths). We interpreted 'social and environmental' broadly, excluding only evaluations of purely educational interventions delivered exclusively in the workplace or classroom. Outcomes relevant to 'physical activity' were defined as any measure of activity, sedentary behavior, knowledge, or beliefs, or of body weight, BMI, or energy intake, following an intervention with a physical activity component.
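The inclusion test described in this subsection can be summarized as a simple predicate. The following sketch is a hypothetical illustration of that logic; the argument names are our own, not part of the original coding frame.

```python
# Hypothetical sketch of the 'included study' test described above.
PHYSICAL_ACTIVITY_OUTCOMES = {
    "activity", "sedentary behavior", "knowledge", "beliefs",
    "body weight", "BMI", "energy intake",
}

def counts_as_included(effectiveness_findings_reported: bool,
                       purely_educational_school_or_workplace: bool,
                       outcomes: set[str],
                       physical_activity_component: bool) -> bool:
    """A study counts as 'included' in a review if the review reports its
    effectiveness findings for a social/environmental intervention with a
    relevant outcome following an intervention with a physical activity
    component; purely educational classroom/workplace interventions are
    excluded."""
    relevant_outcome = bool(outcomes & PHYSICAL_ACTIVITY_OUTCOMES)
    return (effectiveness_findings_reported
            and not purely_educational_school_or_workplace
            and relevant_outcome
            and physical_activity_component)
```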

Analysis

We entered each included study into our review management software, EPPI-Reviewer [23], and coded the studies according to: (1) the reviews in which they were included; and (2) whether there was an obvious reason for a study's exclusion from certain reviews, based on the inclusion criteria of each review and the abstract of the included study. In two cases the abstract was not available (one study was very old [24] and the other was a conference abstract without any details except the title [25]); we excluded these two studies from the analyses that relied on abstracts. Researcher judgment was needed to determine whether there was a likely reason for exclusion, especially where inclusion criteria were not clearly reported. Despite their overlap in scope, the reviews answered different research questions (Table 1). With this in mind, we classified the reason for a primary study's non-inclusion as 'unclear' (Table 2) only if we could not discern any reason at all, based on its date, its scope, and the review's inclusion criteria, why it was not included in the review. In addition, based on a detailed reading of the full text, we described how each review reported the results of its included studies and summarized the conclusions each review drew about the effectiveness of community interventions for promoting or increasing physical activity. Finally, we checked the reference lists of each review to establish how frequently the reviews cited other relevant reviews. As manuscripts are submitted many months before publication, we judged that when publication dates were within 1 year of each other, reviews were not necessarily able to cite one another.
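The one-year rule for judging whether one review could plausibly have cited another reduces to a comparison of publication years. The sketch below, with hypothetical review names and years, shows how the set of 'possible citation' pairs (28 across the eight reviews, as reported in the Results) can be counted under this rule.

```python
# Sketch of the citation-window rule described above; review names and
# publication years are hypothetical.
from itertools import permutations

publication_year = {"review_A": 2002, "review_B": 2004, "review_C": 2006}

def could_cite(citing: str, cited: str) -> bool:
    """Treat a review as able to cite another only when it was published
    more than one year later (manuscripts are submitted many months
    before publication)."""
    return publication_year[citing] - publication_year[cited] > 1

possible = [(a, b) for a, b in permutations(publication_year, 2)
            if could_cite(a, b)]
print(len(possible), possible)
# 3 ordered pairs here; applying the same rule to the eight reviews'
# actual publication dates yields the 28 pairs reported in the Results
```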

Table 2 The reasons that we deduced why studies may have been excluded from each review

Multiple publications arising from one study were analyzed as a group (that is, our unit of analysis was the included study rather than the publication). Three of the studies included in the eight reviews had generated multiple publications (Table 3).

Table 3 Studies with multiple included publications (the first publication in the list is the one that has been used to reference the study in the text and tables above)

Quality assurance

Data about review quality, research question, and scope (Tables 1 and 4) were extracted as part of the original project [4]. These data were extracted independently by two researchers and discrepancies resolved by recourse to the original publications or, in some cases, by a third reviewer. Identification of the 'included' studies in the eight reviews was also carried out independently by two reviewers, with differences resolved by discussion and consensus. All other analyses were conducted by one reviewer, with quality assurance checks conducted by a second reviewer on a subset of the data.

Table 4 Overlap between primary studies included in reviews

Results

A total of 28 primary studies in the eight reviews met our criteria for being 'included studies' [24–26, 35, 39, 41–63]. Twenty-six of these studies (93%) had an abstract available. In many cases, especially in the lower-quality reviews, it was difficult to judge which studies were 'included' (that is, had their results used to answer questions about effectiveness) and which were referenced for another reason.

To what extent do reviews answering a similar research question include the same primary studies?

There was little overlap between the studies included in the eight reviews: the majority of primary studies (n = 22/28; 79%) were included in only one review (Table 4). Of the six studies included in multiple reviews, four [39, 61–63] were included in two reviews and two [26, 35], both of which had generated multiple publications, were included in five reviews.
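The overlap figures above reflect a two-step tally: publications are first grouped into their parent studies (the unit of analysis described in the Methods), and the number of reviews including each study is then counted. The following sketch illustrates the computation with hypothetical identifiers.

```python
# Sketch of the overlap tally, using hypothetical identifiers.
from collections import Counter

# publication -> parent study (multiple publications can share one study)
publication_to_study = {"pub_a1": "study_A", "pub_a2": "study_A",
                        "pub_b1": "study_B"}

# review -> publications it includes
review_includes = {"review_1": {"pub_a1"},
                   "review_2": {"pub_a2", "pub_b1"}}

# collect, for each study, the set of reviews that include it
reviews_per_study: dict[str, set[str]] = {}
for review, publications in review_includes.items():
    for publication in publications:
        study = publication_to_study[publication]
        reviews_per_study.setdefault(study, set()).add(review)

# distribution: number of studies appearing in exactly k reviews
overlap = Counter(len(reviews) for reviews in reviews_per_study.values())
print(overlap)  # Counter({2: 1, 1: 1}): study_A in two reviews, study_B in one
```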

Where reviews do not contain the same studies, can we explain why not?

For most of the 26 included studies with an available abstract, it was possible to deduce why a primary study had been excluded from a given review, although this required a high degree of reviewer judgment (Table 2). Systematic reviews had fewer inexplicable exclusions: it was possible to explain the absence of every non-included primary study from the three systematic reviews. The reason for exclusion was usually the research design of the primary study (some reviews specified controlled trials, of which there are few in this field) or its outcomes (Table 2).

As we could usually deduce why primary studies were not included in particular reviews, the limited overlap between included primary studies may be due to slight variations in scope and inclusion criteria (Table 1) rather than solely to inadequate search strategies.

How similarly do reviews answering a similar research question report the results of the primary studies they have in common?

We were able to analyze similarity in the reporting of primary study results for the six studies that were included in more than one review (Table 5). Results were reported similarly by different review authors for the three studies which generated only one publication. However, for the remaining three studies (Welsh heart project, Minnesota Heart Health Program, and Stanford Five-City project; Table 5), there were discrepancies between the results reported by different review authors in terms of effectiveness data, subgroup analyses, and emphasis. These studies were conducted over a longer time period, with staged and multiple evaluations and, in one case, adaptation of the intervention for subgroups. None of these reviews referenced the same combination of publications generated by the two studies with multiple publications (Table 5).

Table 5 How the results about physical activity from the six studies that were included in more than one review were reported in each review

To what extent do reviews answering a similar research question draw the same conclusions?

Despite the low levels of overlap of included studies in the eight reviews, the conclusions of the reviews were similar (Table 6). All review authors made cautious claims about the effectiveness of interventions in this field for increasing physical activity behavior. All reviews except one concluded that there was limited or no evidence of effectiveness for increasing physical activity; the exception concluded that there was evidence of effectiveness in all studies but that the size of the impact was very modest [16]. Where authors discussed subgroup effects, it was either to highlight a need for evidence in this area or to suggest that targeting interventions was likely to be a promising avenue for future interventions [12, 16]. Five authors drew conclusions specifically relating to the quality and methods of the evidence: four reported that good quality evidence was limited or lacking [13–15, 18], while Dobbins and Beyers suggested that there was good quality but very complex evidence [12]. The three authors who gave clear explanations of their findings [12, 13, 16] suggested that a lack of strong evidence for the positive impact of community interventions for physical activity might be at least partly due to difficulties in measuring impact and/or design problems such as small sample sizes. All authors concluded that we should not abandon community interventions to increase physical activity; instead, they recommended that more research was needed, and most gave specific recommendations.

Table 6 Conclusions from each review about physical activity

To what extent do reviews answering a similar research question cite other reviews on the same topic?

There was little citation of the eight reviews by one another: only three reviews [16, 18, 19] cited any of the others. Of the 28 possible instances in which the eight reviews could have cited one another (once dates of publication had been taken into account), there were only four instances of citation (Table 7). All four citations were of the same two non-systematic reviews [17, 18], one of which was cited by three different reviews (Table 7).

Table 7 To what extent do the included reviews reference each other

Does the methodological quality of reviews answering a similar research question help us to understand any differences between included studies, results, and conclusions?

We found that the methodological quality of the reviews varied (Table 1). Three were 'systematic' reviews (Table 1) [12, 14, 15]. Only the two Cochrane reviews [14, 15] fully met our criteria for an adequate search strategy; however, the search in the third review [12] met all of our criteria except the reporting of both free text and thesaurus terms, so we also treated it as 'systematic'.

For the three systematic reviews (two of which were 'empty' reviews, that is, they did not contain any included primary studies), it was possible to explain the absence of all non-included primary studies [12, 14, 15]. In the lower quality reviews, however, it was more difficult to explain the reasons for exclusion; in one such review, almost half of the exclusions could not be explained (n = 12/26; 46%) [17].

As two of the three 'systematic' reviews were 'empty', we could not meaningfully compare systematic and non-systematic reviews in terms of differences between included studies, results, and conclusions.

Discussion

Main findings

It was often difficult to identify 'included' studies, and considerable deduction was needed to explain why some primary studies may not have been included in a specific review.

We found little overlap of included studies across the eight reviews, despite the similarity of their research questions. Studies with multiple publications were more likely to be included in reviews than shorter-term studies which generated single publications. The results of studies with multiple publications were also more likely to be reported differently by different review authors.

Although search strategies in the majority of cases did not meet our quality threshold, the inclusion criteria of the reviews appeared to justify the lack of inclusion of specific primary studies. Unsurprisingly, it was easier to explain the exclusion of studies in better quality reviews, as they had clearer inclusion criteria and search strategies.

Reviews of longitudinal and multi-stage interventions were more likely to find larger studies, but less likely to report their findings comprehensively, because those findings are dispersed across many publications, not all of which were necessarily reported.

Discrepancies in findings did not lead to discrepancies in conclusions. This may be because it is particularly challenging to show an impact arising from complex interventions and reviewers tended to be cautious with their interpretations.

There was little cross-citation between reviews, and only the lower-quality reviews cited other reviews in our analysis.

It was possible to explain why all non-included studies were absent from the systematic reviews, but more difficult to do so for the non-systematic reviews. (Since two out of the three systematic reviews were ‘empty’ we were unable to compare differences in terms of how reviews of different quality treated their included studies.)

Strengths of this study

This study has several strengths. First, our searches were far-reaching and sensitive and our definition of 'community intervention' was broad. Consequently, the eight reviews analyzed here are likely to fully represent the reviews available at the time of the searches which aimed to evaluate the effectiveness of community interventions for promoting physical activity. Secondly, by excluding reviews which were mainly descriptive, which did not draw conclusions specifically about physical activity, or which restricted their population of interest, we ensured that the scope of the reviews was similar enough to warrant comparison. Thirdly, we assessed the quality of the reviews and were able to comment on the relationship between review quality and our findings. High levels of researcher judgment were necessary at several key stages of analysis: when classifying primary studies as 'included', when extracting authors' conclusions, and when assessing whether reasons for the exclusion of primary data could be ascertained. We implemented quality assurance measures to minimize the potential for inconsistencies when extracting and analyzing data, especially for the lower-quality reviews, which had less clearly defined boundaries.

Weaknesses of this study

Our analyses of the reasons for exclusion of primary studies were based on the abstracts of the included studies. It is possible that our analyses would have been different had we used the full text of the included studies and/or contacted the review authors for data. We assumed that a primary study had been found and excluded by a review if we could justify its exclusion by the inclusion criteria or the search/publication dates. We cannot quantify how much primary data were never found by the reviews and cannot, therefore, comment on whether it was the scope of a review or the methods used that led to the non-inclusion of specific primary studies.

We also acknowledge that, since the searches for the original review of reviews were carried out in November/December 2007, other reviews on this topic have been published. These may reflect developments in review methods that overcome some of the weaknesses in the reviewed evidence base; however, the general messages in this paper about how different reviews on the same subject relate to one another will remain important.

Methodological issues

To some extent, we were surprised by our findings. We had expected to find greater overlap between reviews and, where overlap was limited, diversity in findings. The similarity in findings can be explained by the fact that no review found compelling evidence of effectiveness in the studies it included; all were therefore cautious in their conclusions. This finding echoes the results of a similar study, which found that, even though the scope and quality assessment methods employed in health promotion reviews differed, this was 'unlikely to divide opinion radically about effectiveness amongst cautious reviewers' [64]. In contrast, two reviews with a similar research question came to very different conclusions about the effectiveness of interventions for childhood obesity [65]. In those reviews, conclusions were based on the results of randomized controlled trials (RCTs), and it may be that reviewers tend to be more cautious, and therefore their conclusions less divergent, when interpreting observational data.

The lack of overlap of primary studies warrants further examination because it cannot be explained (entirely) in terms of deficiencies in the reviews' search strategies; rather, it seems to be due to differences in the scope (inclusion criteria) of the reviews, which in turn reflects heterogeneity in their review questions. This finding is consistent with other methodological studies which found that many apparent inconsistencies in the citation and selection of primary studies, especially non-RCTs, could be attributed to differences in the inclusion criteria and outcome assessments of the reviews rather than primarily to problems in their search strategies [65, 66]. Even though we had selected our sample of reviews to be as similar as possible in scope so that we could investigate overlap, in practice the scope of the reviews did not overlap very much. This has important implications for the utility of reviews to inform policy and practice.

First, in areas where evaluation and impact measurement is known to be difficult and where research and policy interest is relatively recent, it is likely that the findings of reviews will reflect uncertainties in the primary studies and be less enlightening about the substantive topic. Review conclusions can only ever be as good as the available data on the topic [67]; this was certainly the case in the reviews that we examined. Across the topic of community interventions to promote physical activity, reviews were necessarily cautious in their findings because of uncertainties in the evidence base. While this is useful for researchers and research commissioners to know, it is less useful for people involved in determining policy and practice.

Second, dealing with linked publications (multiple publications from the same study) was complicated and confusing, both for us and, seemingly, for the authors of our eight included reviews. To improve the fidelity of reporting and ensure that all relevant evidence informs review results and conclusions, it is important to identify all publications from studies with multiple or staged evaluations. We therefore recommend that study authors aid researchers by clearly citing all previous and intended work in each publication, and that editors check this before publication. Larger studies might consider maintaining a website which details all related publications (as some already do). Reviewers can search for multiple publications from a study by searching for papers by the authors, studies, and research groups that feature in the provisional list of included studies for the review. In order to build on existing knowledge, review authors should search for existing relevant reviews in the area and use this knowledge to contextualize their aims and findings. Inclusion (and citation) of relevant reviews will also help direct readers to relevant resources.

The study has also highlighted some of the unavoidable complexities that face potential users of systematic reviews. We placed ourselves in a hypothetical situation, but one that is similar to that faced by many policymakers and practitioners who would like their decisions to be informed by evidence; for example, a newly formed Health and Wellbeing board in the UK, tasked with reducing obesity among young people, might well want to examine what works in terms of promoting physical activity. If they used the map of community interventions and identified these eight reviews as being relevant, they would find that: while all the reviews were about the promotion of physical activity, they each had a particular ‘angle’, which determined the range of research they included; where the same studies were included in reviews, their findings were not always reported consistently; the concept of ‘community’ was often discussed in reviews, but there were also differences in its conceptualization; and on the whole, the reviews did not position themselves as contributing to a wider evidence base around the promotion of physical activity (as evidenced by the lack of inter-citation between them).

There was an inevitable tension in this analysis between a narrowness that ensured all reviews were on exactly the same topic and a breadth that ensured all potentially relevant reviews were included; the same tension concerning homogeneity of focus exists in many systematic reviews in public health. Given that most public health decisions are about identifying solutions to a problem (in this case, increasing levels of physical activity), obtaining a range of reviews is to be expected, and the question that this paper begins to unpick then arises: 'how coherent is the picture that emerges?'

Reviews which give a limited 'slice' of the evidence are extremely valuable when the policy/practice question is closely aligned with the scope of the review, but less useful when they can give only a partial picture. In our topic area, however, even with the findings of all eight reviews at our disposal, we would not be confident that we were building on the results of all research about community interventions to promote physical activity, because each review contains a limited portion of the evidence and there may well be relevant studies that fall outside the scope of any of our reviews. (We should reiterate the point made above: systematic review methods are developing quickly, and some of these 'gaps' may now be filled.)

The above points relate to wider and unsolved issues about the amount of 'work done' in a review [68]. Some reviews have a relatively narrow focus, undertaking a detailed look at a relatively small area; additional 'work' must then be done by users in identifying a range of such reviews and 'synthesizing' them to inform their particular decision. Other reviews are broader in scope, meaning that, potentially, less 'work' needs to be done by their users, though there is a tension between achieving both breadth and depth in the same review, the risk being that broad reviews may suffer from a lack of focus and be deficient in essential detail [16]. While a detailed discussion of these issues is beyond the scope of this paper, we have highlighted areas in which review authors might usefully assist potential users.

Conclusions

One possible way forward is to undertake more systematic 'maps' of research activity. Systematic maps find and describe the research on a given topic and help researchers and policymakers to judge where there is, and is not, sufficient data to justify a narrow and in-depth review which seeks to answer a specific policy or practice question [32]. It is important, however, that systematic maps are kept up to date and that funders allocate resources to this end. To maximize access to the knowledge gathered in systematic maps, they should be made freely available to researchers, funders, and policymakers.

Finally, we recommend for further reading the Guidelines for systematic reviews of health promotion and public health interventions [69], written by members of the Cochrane Public Health Review Group. This document discusses many of the issues mentioned above and aims to build reviewing capacity among those working in the difficult areas that create much of the complexity identified in this analysis. For those interested in the substantive topic of the reviews discussed here, we also refer readers to a recent Cochrane review on the subject [70].