Background

The World Health Report 2013 argued that “adding to the impetus to do more research is a growing body of evidence on the returns on investment” [1]. While much of the evidence on the benefits of research came originally from high-income countries, interest in producing such evidence is spreading globally, with examples from Bangladesh [2], Brazil [3], Ghana [4] and Iran [5] published in 2015–2016. Studies typically identify the impacts of health research in one or more categories such as health policy, clinical practice, health outcomes and the healthcare system. Individual research impact assessment studies can provide powerful evidence, but their nature and findings vary greatly [6–9], and ways to combine findings systematically across studies are being sought.

Previous reviews of studies assessing the impact of health research have analysed the methods and frameworks that are being developed and applied [6, 8–13]. An additional question, which has to date received less attention, is what level of impact might be expected from different types of programmes and portfolios of health research.

This paper describes the methods used in two successive comprehensive reviews of research impact studies, by Hanney et al. [6] and Raftery et al. [9], and justifies a sample of those studies for inclusion in the current analysis. We also consider the methodological challenges of seeking to draw comparisons across programmes that go beyond summing the impacts of individual projects within programmes. Importantly, programmes would need to be comparable in certain ways for such cross-programme comparisons to be legitimate.

For this paper, we deliberately sought studies that had assessed the impact of all projects in multi-project programmes, whether coordinated or not. We focused on such multi-project programmes because this approach offered the best opportunities for meaningful comparisons across programmes both of the methods and frameworks most frequently used for impact assessment and, crucially, of the levels of impact achieved and some of the factors associated with such impact. Furthermore, such an approach focused attention on the desirability of finding ways to introduce greater standardisation in research impact assessment. However, we also discuss the severe limitations on how far this analysis can be taken. Finally, we consider the implications of our findings for investment in health research and development and the methodology of research on research impact.

Methods

The methods used to conduct the two previous reviews on which this study is based [6, 9] are described in Box 1.

Box 1 Search strategy of two original reviews

The two narrative systematic reviews of impact assessment studies on which this paper is based were conducted in broadly similar ways that included systematic searching of various databases and a range of additional techniques. Both were funded by the United Kingdom National Institute for Health Research (NIHR) Health Technology Assessment (HTA) Programme.

The searches for the first review, published in 2007, were run from 1990 to July 2005 [6]. The second was a more recent meta-synthesis of studies of research impact covering primary studies published between 2005 and 2014 [9]. The search strategy used in the first review was adapted to take account of new indexing terms and of a modified version of the strategy used by Banzi et al. [11] (see Additional file 1: Literature search strategies for the two reviews, for a full description of the search strategies). Although the updated search strategy increased the sensitivity of the search, filters were used to improve the precision of the results and the quality of the studies retrieved.

The electronic databases searched in both studies included: Ovid MEDLINE, MEDLINE(R) In-Process, EMBASE, CINAHL, the Cochrane Library including the Cochrane Methodology Register, Health Technology Assessment Database, the NHS Economic Evaluation Database and Health Management Information Consortium, which includes grey literature such as unpublished papers and reports. The first review included additional databases not included in the updated review: ECONLIT, Web of Knowledge (incorporating Science Citation Index and Social Science Citation Index), National Library of Medicine Gateway Databases and Conference Proceedings Index.

In addition to the standard searching of electronic databases, other methods to identify relevant literature were used in both studies. This included in the second review an independent hand-searching of four journals (Implementation Science, International Journal of Technology Assessment in Health Care, Research Evaluation, Health Research Policy and Systems), a list of known studies identified by team members, reviewing publication lists identified in major reviews published since 2005, and citation tracking of selected key publications using Google Scholar.

The 2007 review highlighted nine separate frameworks and approaches to assessing health research impact and identified 41 studies describing the application of these, or other, approaches. The second review identified over 20 different impact models and frameworks (five of them continuing or building on ones from the first review) and 110 additional studies describing their empirical applications (as single or multiple case studies), although only a handful of frameworks had proven robust and flexible across a range of examples.

For the current study the main inclusion criterion was studies that had attempted to identify projects within multi-project programmes in which investigators had claimed to have made some wider impact, especially on policy or practice, and/or for which there was an external assessment showing such impact. We included only one paper per impact assessment and therefore, for example, excluded papers that reported in detail on a subset of the projects included in a main paper. We did not include studies that reported only on the total number of incidents of impacts on policy claimed for a whole programme, rather than the number of projects claiming to make such impact. We included only those studies where the findings were described in a way that allowed them to be collated with others, then analysed and presented in a broadly standardised way. This meant, for example, that the categories of impacts described by the study had to fit into at least one of a number of broad categories.

We defined the categories as broadly as possible to be inclusive and avoid creating overlapping categories. Following an initial scan of the available studies we identified four impact categories that were broadly compatible with, but not necessarily identical to, the impact categories in the widely used Payback Framework [14, 15] and the Canadian Academy of Health Sciences adaptation of that framework [10]. The four categories were: (1) impact on health policy or on a healthcare organisation; (2) informing practice or clinician behaviour; (3) a combined category covering policy and clinician impact; and (4) impact on health gain, patient benefit, improved care or other benefits to the healthcare system.

Studies were included if they had presented findings in one or more of these categories in a way that could allow standardised comparison across programmes. In some cases, the studies presented findings solely in terms of the numbers of projects that had claimed or been shown to have had impact in a particular category. These had to be standardised and presented as percentages. Each study was given the same weight in the analysis, irrespective of the number of individual projects covered by the study. For each of the four categories of impacts we then calculated the median figure for those studies showing the percentage of projects that had claimed to make an impact in that category. We also presented the full range of percentages in each category.
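As a concrete illustration of this standardisation step, the short Python sketch below converts per-study counts into percentages, gives each study equal weight, and reports the median and range for each impact category. The study records and field names are hypothetical, invented purely for illustration; they are not data from the included studies.

```python
from statistics import median

# Hypothetical study records: for each impact category a study reports
# (projects assessed, projects claiming/shown to have impact).
# All figures are illustrative, not data from the 36 included studies.
studies = [
    {"policy": (40, 14), "practice": (40, 22)},
    {"policy": (22, 17), "health_gain": (22, 3)},
    {"policy": (118, 95)},
]

def summarise(studies, category):
    """Equal-weight per-study percentages for one category, with median and range."""
    percentages = []
    for study in studies:
        if category in study:
            assessed, with_impact = study[category]
            percentages.append(100.0 * with_impact / assessed)
    if not percentages:
        return None
    return {
        "n_studies": len(percentages),
        "median_pct": median(percentages),
        "range_pct": (min(percentages), max(percentages)),
    }

for category in ("policy", "practice", "health_gain"):
    print(category, summarise(studies, category))
```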

We extracted data on methods and conceptual frameworks for assessment of research impact described in each study, and on categories of factors considered by the authors to be relevant for the level of impact achieved. In identifying the latter, our approach was informed by a range of international research literature, in particular the 1983 analysis by Kogan and Henkel of the importance of researchers and potential users working together in a collaborative approach, the role of research brokers, and the presence of bodies that are ready to receive and use the research findings [16, 17]. Other papers on these and related themes that influenced our approach to the analysis included literature related to North and Central America [18–21], Africa [22], the European Union [23], and the United Kingdom [6, 14, 24], as well as international studies and reviews [25–31].

Results

Thirty-six studies met the inclusion criteria for this analysis [6, 32–66]. These were highly diverse in terms of the location of the research, the nature and size of the funder’s research programme or portfolio, the fields of research and modes of funding, the time between completion of the programme and impact assessment, the methods (and sometimes conceptual frameworks) used to assess the impact, and the levels of impact achieved. A brief summary of each study is provided in Table 1.

Table 1 Thirty-six impact assessment studies: methods, frameworks, findings, factors linked to impact achieved

The studies came from 11 different countries, plus a European Union study and one covering various locations in Africa. The number of projects supplying data to the studies ranged from just eight in a study of an occupational therapy research programme in the United Kingdom [59], through 22 operational research projects in Guatemala [35] and 153 projects in a range of programmes within the portfolio of the Australian National Breast Cancer Foundation [38], to 178 projects from the Hong Kong Health and Health Services Research Fund [51].

In terms of the methods used to gather data about the projects in a programme, 21 of the 36 studies surveyed the researchers, usually just each project’s Principal or Chief Investigator (PI), either as the sole source of data or combined with other methods such as documentary review, interviews and case studies. Six studies relied exclusively, or primarily, on documentary review and desk analysis. In at least three studies, interviewing all PIs was the main method or the key starting point used to identify further interviewees. The picture is complicated because some studies used one approach, usually surveys, to gain information about all projects, and then supplemented that with other approaches for selected projects on which case studies were additionally conducted, often involving interviews with PIs. In total, over a third of the studies involved interviews with stakeholders, again sometimes in combination with documentary review. Many studies drew on a range of methods, but two examples illustrate a particularly wide range. In the case of Brambila et al. [35] in Guatemala, this included site visits that were used to support key informant interviews. Hera’s [46] assessment of the impact of the Africa Health Systems Initiative Support to African Research Partnerships also involved a range of methods, including documentary review and programme-level interviews. Project-level information was obtained from workshops for six projects and from a total of 12 interviews for the remaining four projects. In addition, they used participant observation of an end-of-programme workshop, at which they also presented some preliminary findings. In this instance, while the early timing of the assessment meant that it was unable to capture all the impact, the programme’s interactive approach led to some policy impact during the time the projects were underway.

In 20 of the 36 studies, the various methods used were organised according to a named conceptual framework (see Hanney et al. [6] and Raftery et al. [9] for a summary of all these frameworks); 16 of the 36 studies drew partly or wholly on the Payback Framework [15]. Other existing named frameworks each informed one of the 36 studies: the Research Impact Framework [24], applied by Caddell et al. [37]; the Canadian Academy of Health Sciences framework [10], applied by Adam et al. [32]; the Banzi Research Impact model [11], applied by Milat et al. [53]; and the Becker Medical Library model [67], applied by Sainty [59].

In addition, various studies were identified as drawing, at least to some degree, on particular approaches, albeit without an explicitly named framework being described. Jacob and Battista [47] developed and applied their own approach to evaluate the impact of studies conducted by the Quebec Council of Health Care Technology Assessments (CETS); the approach was broadly replicated in a further evaluation of the impact from CETS [48] and informed subsequent studies in Quebec [52], France [34] and Austria [66]. The interactive approach was referred to by several studies [35, 46]. The study by Molas-Gallart et al. [54] of the impact from a programme of AIDS research funded by the United Kingdom’s Economic and Social Research Council used an approach that the authors subsequently developed further with Spaapen et al. [23] into the Social Impact Assessment Methods through the study of Productive Interactions (SIAMPI) approach.

Only one included study assessed the monetary value of a research programme’s resultant health gain. Johnston et al.’s [49] assessment of the impact from a National Institutes of Health (NIH) programme of clinical trials in the United States is described in some detail here because studies providing a rate of return were seen in the World Health Report as key evidence for promoting the future funding of health research [1]. For the trials identified as making an impact in terms of health gain and/or cost savings, Johnston et al. [49] employed a bottom-up approach. They identified cost-utility estimates for the interventions implemented following the NIH research to obtain a per patient net monetary benefit. A timeline of usage was constructed for each of the interventions to produce a population timeline of net monetary benefit, which was then related to the investment in research. The results indicated an impact, with a return on investment for the whole programme of 46% per year. However, the authors acknowledged the difficulty of acquiring the necessary data to conduct an exercise of this kind, with only 8 out of 28 trials contributing the benefits used to calculate the rate of return on investment. While we did not have a category related specifically to the economic impacts of health research, we included this study in the health gain category because the latter was a key step towards being able to calculate monetary value and was identified as occurring in six out of the 28 projects (21%).
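To show the general shape of such a bottom-up calculation, the sketch below combines a per patient net monetary benefit with an assumed usage timeline and relates the discounted total to the research investment. It is a minimal illustration of the logic described above, not a reproduction of Johnston et al.’s model; all figures, the willingness-to-pay value and the discount rate are assumptions made for the example.

```python
# Minimal sketch of a bottom-up monetisation calculation of the kind described
# above; not a reproduction of Johnston et al.'s model. All figures, the
# willingness-to-pay per QALY and the discount rate are illustrative assumptions.
WTP_PER_QALY = 50_000   # assumed monetary value of one quality-adjusted life year
DISCOUNT_RATE = 0.03    # assumed annual discount rate

def per_patient_nmb(qaly_gain, incremental_cost):
    """Net monetary benefit per treated patient from a cost-utility estimate."""
    return qaly_gain * WTP_PER_QALY - incremental_cost

def benefit_to_investment_ratio(research_investment, usage_by_year,
                                qaly_gain, incremental_cost):
    """Discounted population net monetary benefit divided by the research spend."""
    nmb = per_patient_nmb(qaly_gain, incremental_cost)
    population_benefit = sum(
        patients * nmb / (1 + DISCOUNT_RATE) ** year
        for year, patients in enumerate(usage_by_year)
    )
    return population_benefit / research_investment

# Hypothetical 10-year usage timeline for one intervention adopted after a trial.
ratio = benefit_to_investment_ratio(
    research_investment=25_000_000,
    usage_by_year=[1_000, 5_000, 12_000, 20_000] + [25_000] * 6,
    qaly_gain=0.05,
    incremental_cost=500,
)
print(f"Population benefit per unit of research investment: {ratio:.1f}")
```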

Despite the diversity, each of the 36 studies reported on the number of projects in the multi-project programme making an impact in one, or more, of four broad categories. The number of studies reporting on each category, and the number (and range) of projects that reported having achieved some such impact, are set out in Table 2.

Table 2 Analysis of quantitative data from 36 studies reporting on findings from each project in a multi-project programme

One example from the various studies can be used to illustrate what is included in each of the four types of impact. The 1997 study by Jacob and McGregor [48] reported that 86% of the HTAs conducted in Canada by the Quebec CETS had influenced policy. One of these HTAs found that the likelihood of health benefits from routine preoperative chest radiography was extremely slender; prior to the publication of that HTA report, 55 out of 118 hospitals questioned had a policy of using such routine chest radiography, yet 3 years later, all but three had abandoned this policy and in 79% of cases the HTA was cited as a reason for the policy change. In terms of impact on practice, in 2007, Kwan et al. gave the following as an example of the local impact on provider behaviour made by the health and health services research programme in Hong Kong: “improved reporting of unintentional child injury cases and liaison between the Hospital Authority Informatics and Accident and Emergency” ([51], p. 8).

Illustrating the combined category, Milat et al. [53] used a category called ‘Policy and practice impacts’ in their 2013 assessment of the impact from the research funded in Australia by the New South Wales Health Promotion Demonstration Research Grants Scheme. While the analysis provided overall figures only for this combined category, the few examples that were given were presented separately for policy impacts and practice impacts. In some, but not all, instances the accounts covered both dimensions, for example, research informed policy planning by identifying areas for investment in tai chi for older people (as a way of preventing falls) and smoking cessation brief interventions. Then, in terms of practice, the research in those same two areas helped inform professional development for the relevant staff providing the services. An example of health gain comes from one of the NIH trials analysed in the 2006 assessment by Johnston et al. [49] described above, where the authors estimated that implementation of the findings from the trial of the use of tissue plasminogen activator in cases of acute ischemic stroke, published in 1995, had a projected health gain in the 10 years after funding was completed of 134,066 quality-adjusted life years.

For each category, apart from the combined one, there was a wide range in the proportion of projects per programme that had demonstrated (or claimed) impact in that category.

Most included studies had considered key factors that might help explain the level of impact achieved (see the last column in Table 1 for direct quotes or comments, which in most cases came from the original paper). Differences in impact appeared to relate partly to the approaches used and the timing of the assessment. For example, one study that appeared to show a very low proportion of projects with impact on policy had assessed this purely through desk analysis of end-of-project reports. Such an approach restricted the opportunities to identify the actual levels of impact achieved, as opposed to the expected levels of impact, which were much higher and at least some of which would presumably have arisen later [39].

Various features of the different programmes of research also influenced the levels of impact achieved. In four studies of research programmes, 10% or fewer of PIs reported that their research had made an impact on policy, but three of these studies [38, 50, 65] included basic research (from which direct policy impact would be much less likely to occur) and, in two of those, assessment of impact was performed relatively soon after completion of the research.

While the median for the 31 studies reporting on policy impact made by programmes was 35% of projects making such an impact, the interquartile range was 20–70%. This reflects the existence of both a group of studies, as described above, in which a very low proportion of projects informed policies, and a group of studies with a very high proportion of projects informing policies. In fact, a median of 77% (range 29–100%) of projects in the nine included HTA programmes [6, 34, 43, 44, 47, 48, 52, 55, 66] had had a demonstrable impact on policy. Even within this group of programmes, the type of research conducted varied. Most were technology appraisal reviews that had usually been requested by those making decisions for the relevant health service about funding (or disinvesting in) particular technologies or services. In some cases, an extremely high proportion of projects in these programmes made an impact on policy; for example, 97% of the assessments from the Austrian HTA programme were classified as making at least some impact on coverage policies [66], as were 100% of the HTA reports from the HTA unit of McGill University Health Centre in Quebec, Canada [52]. By contrast, while the Health Care Efficiency Research programme from the Netherlands was classified as an HTA programme, it included a large responsive-mode element, most of its studies were prospective clinical trials, and the impact assessment occurred soon after the end of the trials [55]; a lower proportion of projects in this programme (29%) had demonstrated a policy impact.

The review of programmes funded in the first decade of the United Kingdom HTA Programme showed that, overall, 73% of projects had an impact on policy [6]. Within that programme, 96% of the technology appraisal reviews undertaken to inform the work of the then National Institute for Health and Clinical Excellence actually did so (that is, they were commissioned to inform the work of a specific user body), and 60% of other projects (mostly trials) had a direct impact on policy. The 60% figure for these latter studies compares favourably with the median of 35% in our sample overall, probably because, even though the projects were not usually commissioned by a specific user body, they were on topics that had been identified as meeting a need within the healthcare system. In only four of the 22 non-HTA programmes that reported making an impact on policy was the claimed figure higher than 50% of projects [46, 56, 57, 60]. In three of those [46, 56, 57], the authors identified involvement of potential users in agenda setting and/or interaction over the research as a key factor facilitating impact. For example, Reed et al. said that the figure of 53% of projects from a programme of primary care research in Australia making an impact on policy and organisational decisions reflected “a high level of engagement of the researchers with potential users of their research findings” ([57], p. 5) (see Table 1 for further details).

Similarly, of the seven non-HTA programmes with a high proportion of projects making an impact in terms of informing practice or clinician behaviour, three highlighted the importance of interaction with potential users [32, 33, 51] and a further two were small-scale funding initiatives where the impact was often on clinicians at the location where the research had been conducted [37, 59]. In all three of the programmes where the impact was reported in the combined policy and practice category, the proportion of projects making an impact was at least 60%, and there was interaction with users and/or the research met their needs [35, 41, 53].

Finally, in some instances observations were recorded on how the impact evaluations of whole programmes of work had been, or could be, used to inform policies of the research funding body whose work had been assessed and/or used to highlight the benefits that arise from donating to medical research charities. Examples include public research funders, such as the Catalan Agency for Health Information, Assessment and Quality, and the Northern Ireland Executive [32, 58], and medical research charities such as Asthma UK and the Australian National Breast Cancer Foundation [38, 45].

Discussion

The findings provide lessons about how a range of methods for assessing research impact can be applied, with surveys of PIs being the most frequently used, but interviews and desk analysis also being adopted as alternatives or supplements. Such methods could be adopted elsewhere in future research impact assessments. Furthermore, the methods adopted and the whole impact study were often, but not always, organised using an existing conceptual framework. The various approaches used in impact assessments have different strengths and weaknesses, and a range of theoretical underpinnings. A selection of six key established frameworks was analysed in Greenhalgh et al. [8], namely the Payback Framework [14], the Research Impact Framework [24], the Canadian Academy of Health Sciences framework [10], monetary value approaches [68], social impact assessment [23, 69] and the Research Excellence Framework (REF) [70], a pioneering approach used in the United Kingdom to assess the impact from university research groups and on which considerable subsequent analysis has been conducted [71]. While the approach used in the REF is not related to specific programmes of research, but to the research of teams who often had multiple sources of funding, the REF built on approaches originally developed to assess the impact of research programmes. The first five of the six frameworks highlighted by Greenhalgh et al. [8] helped inform at least one of the 36 studies in this current analysis and, according to the Higher Education Funding Council for England, the sixth (i.e. the REF) was itself partly informed by studies applying the Payback Framework [72]. These six key frameworks are described in Box 2.

Box 2 Summary of major impact assessment frameworks

The Payback Framework

Developed by Buxton and Hanney in 1996, the Payback Framework consists of two elements, namely a logic model of the seven stages of research from conceptualisation to impact and five categories to classify the paybacks [14]:

 • knowledge (e.g. academic publications)

 • benefits to future research (e.g. training new researchers)

 • benefits to policy (e.g. information base for clinical policies)

 • benefits to health and the health system (including cost savings and greater equity)

 • broader economic benefits (e.g. commercial spin-outs)

Two interfaces for interaction between researchers and potential users of research (‘project specification, selection and commissioning’ and ‘dissemination’) and various feedback loops connecting the stages are seen as crucial. The Payback Framework can be applied through surveys, which can cover all PIs but have various limitations, or through case studies. For the latter, researcher interviews are combined with document analysis and verification of claimed impacts to prepare a detailed case study containing both qualitative and quantitative information; this provides a fuller picture than surveys, but is more labour intensive.

Research Impact Framework (RIF)

Originally developed by Kuruvilla et al. [24] for academics who were interested in measuring and monitoring the impact of their own research, the RIF is a ‘light touch’ checklist intended for use by individual researchers who seek to identify and select impacts from their work. Categories include:

 • research-related impacts

 • policy and practice impacts

 • service (including health) impacts

 • ‘societal impact’ (with seven sub-categories)

Because of its (intentional) trade-off between comprehensiveness and practicality, it generally produces a less thorough assessment than the Payback Framework and was not designed to be used in formal impact assessment studies by third parties. However, the approach proved to be highly acceptable to those researchers with whom it was applied.

Canadian Academy of Health Sciences (CAHS) Framework

The CAHS Framework was developed from the Payback Framework through a multi-stakeholder consensus-building process; it is claimed to be a ‘systems approach’ that takes greater account of non-linear influences [10]. It encourages a careful assessment of context and the subsequent consideration of impacts under five categories:

 • advancing knowledge (measures of research quality, activity, outreach and structure)

 • capacity building (developing researchers and research infrastructure)

 • informing decision-making (decisions about health and healthcare, including public health and social care, decisions about future research investment, and decisions by public and citizens)

 • health impacts (including health status, determinants of health – including individual risk factors and environmental and social determinants – and health system changes)

 • economic and social benefits (including commercialisation, cultural outcomes, socioeconomic implications and public understanding of science)

For each category, a menu of metrics and measures (66 in total) is offered, and users are encouraged to draw on these flexibly to suit their circumstances. By choosing appropriate sets of indicators, CAHS can be used to track impacts within any of the four ‘pillars’ of health research (basic biomedical, applied clinical, health services and systems, and population health – or within domains that cut across these pillars) and at various levels (individual, institutional, regional, national or international).

Monetisation models

Monetisation models, which are mostly at a relatively early stage of development [68], express returns on research investment in various ways, including as cost savings, as the monetary value of net health gains (via cost per quality-adjusted life year, using metrics such as willingness-to-pay or opportunity cost), and as internal rates of return (return on investment expressed as an annual percentage yield). These models draw largely on the economic evaluation literature and differ principally in which costs and benefits (health and non-health) they include and in how they value the seemingly non-monetary components of the estimation. Prevailing debates on monetisation models of research impact centre on the nature of the simplifying assumptions in different models and on the balance between ‘top-down’ approaches (which start at a macro level with an aggregate health gain, usually at a national level over a specific period, and then consider how far a (national) body of research might have been responsible for it) and ‘bottom-up’ approaches (which start with particular research advances, sometimes all the projects in a specific programme, and calculate the health gain from them).
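The core quantities these models manipulate can be written compactly. The formulation below is one standard way of expressing them, given here for illustration rather than taken from any of the reviewed studies: the net monetary benefit of an intervention valued at a willingness-to-pay of λ per QALY, and the internal rate of return of a research investment generating a stream of net benefits over time.

```latex
% Illustrative notation only; not taken from the reviewed studies.
% Net monetary benefit of an intervention, valuing health gain at \lambda per QALY:
\[
  \mathrm{NMB} = \lambda \cdot \Delta\mathrm{QALY} - \Delta C
\]
% The internal rate of return r of a research investment generating net benefits
% B_t over T years (B_0 being the negative research outlay) is the rate solving:
\[
  \sum_{t=0}^{T} \frac{B_t}{(1+r)^{t}} = 0
\]
```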

Societal impact assessment (SIA)

Used mainly in the social sciences, SIA emphasises impacts beyond health. Its protagonists distinguish the social relevance of knowledge from its monetised impacts, arguing that the intrinsic value of knowledge may be less significant than the varied and changing social configurations that enable its production, transformation and use. Assessment of SIA usually begins with self-evaluation by a research team of the relationships, interactions and interdependencies that link it to other elements of the research ecosystem (e.g. the nature and strength of links with clinicians, policymakers and industry), as well as external peer review of these links. SIA informed the Evaluating Research in Context programme that produced the Sci-Quest model [69], and also the EU-funded SIAMPI (Social Impact Assessment Methods through the study of Productive Interactions) framework [23].

Sci-Quest was described by its authors as a ‘fourth-generation’ approach to impact assessment – the previous three generations having been characterised, respectively, by measurement (e.g. an unenhanced logic model), description (e.g. the narrative accompanying a logic model) and judgement (e.g. an assessment of whether the impact was socially useful or not). Fourth-generation impact assessment, they suggest, is fundamentally a social, political and value-oriented activity and involves reflexivity on the part of researchers to identify and evaluate their own research goals and key relationships [69]. Whilst the approach has many theoretical strengths, it has been criticised for being labour intensive to apply and difficult to systematically compare across projects and programmes.

United Kingdom Research Excellence Framework (REF)

The 2014 REF – an extensive exercise developed by the Higher Education Funding Council for England to assess United Kingdom universities’ research performance – allocated 20% of the total score to research impact [70]. Each institution submitted an impact template describing its strategy and infrastructure for achieving impact, along with several four-page impact case studies, each of which described a programme of research, the impacts claimed and the supporting evidence. These narratives, which were required to follow a linear and time-bound structure (describing research undertaken between 1993 and 2013, followed by a description of impact occurring between 2008 and 2013), were peer reviewed by an intersectoral assessment panel representing academia and research users (industry and policymakers). Almost 7000 impact case studies were produced for the 2014 REF; these have been collated in a searchable online database on which further research is currently being undertaken [71]. An independent evaluation by RAND concluded that the narrative form of the REF impact case studies and their peer review by a mixed panel of experts from within and beyond academia had been a robust and fair way of assessing research impact.

In its internal review of the REF, the members of Main Panel A, which covered biomedical and health research, noted that “International MPA [Main Panel A] members cautioned against attempts to ‘metricise’ the evaluation of the many superb and well-told narrations describing the evolution of basic discovery to health, economic and societal impact” [70].

One of the featured approaches currently receiving more attention is the attempt to put a monetary value on the impact of health research, in particular through studies that attempt to value the health gain from research. Various examples of the latter were identified in the two reviews [73–79]. One study, that of Johnston et al. [49], occupies a particular place in the consideration of frameworks because it included all individual projects within a programme (see above); while all of the projects were examined, only a small proportion were identified as making a measurable impact, and those projects ensured the programme as a whole had a high rate of return. Some other studies with a more limited scope have also used a bottom-up approach to assess the impact of specific projects, but have not gone as far as attempting a comprehensive valuation of the impact of a whole programme of research. Nevertheless, such studies can indicate probable minimum levels of returns from the whole programme studied [79].

It is important to acknowledge that this review has a number of limitations. First, fine distinctions had to be made about which studies to include, and some studies that initially seemed relevant had to be excluded because the data extracted could not be meaningfully combined with those of other studies, thus reducing the comprehensiveness of the review. The seven studies [80–86] assessing the impact of multi-project programmes that were included in the two reviews on which this study was based, but excluded from this current analysis, are listed in Table 3, along with the reasons for their exclusion.

Table 3 Seven excluded studies

Second, each of the included studies was liable to have inherent weaknesses associated with the type of data gathering techniques employed in assessing impact from multi-project programmes. Many of the studies relied on self-reported survey data, and some of them acknowledged potential concerns about such data [51]. Nevertheless, approaches such as triangulation can somewhat mitigate these weaknesses and, in at least four examples, data were collected by both surveys and interviews; in each case, the self-reported survey data did not seem, on average, to over-emphasise the level of impact [6, 36, 42, 45]. A further limitation with surveys is that the response rate was generally between 50% and 75%, with only four studies receiving replies from more than three-quarters of projects: Kwan et al. [51], 87%; Oortwijn et al. [55], 79%; Soper and Hanney [61], 83%; and Wooding et al. [65], 87%. Other approaches, such as the desk analysis based on end-of-project reports [39], obtained data from a higher proportion of projects but, as described above, provided limited opportunities to gather data on actual impacts achieved. To the extent that differences in the impact identified for each programme reflect differences in the approach used to conduct the assessment, there will be limitations in drawing lessons from the overall dataset of 36 assessments of the impact from programmes.

Third, in various studies, it was observed that the impact assessment was occurring at a time that was too early for some, or most, of the research to have had time to make an impact [38, 39, 42, 55, 65]. In such cases, the reported level of impact achieved was not only likely to be lower than it would have been in a later assessment, but it might also look comparatively lower than that from other programmes included in the analysis where the assessment took place some years after the research had been completed. This again complicates attempts to draw lessons from the overall dataset of 36 programmes.

Fourth, in order to facilitate the analysis, it was necessary to create a small number of impact categories, but the definitions of impact categories used in the diverse studies varied widely. Compromises therefore had to be made: not all the examples included in each category had been defined in precisely the same way, so what was included in a category from one study might not exactly match what was included in the same category from another study. Particular problems arose in relation to whether there should be a ‘cost-savings’ category. There has been considerable debate about the place of cost savings within an impact categorisation [9]; it was decided not to include a separate cost-savings category in this current analysis. However, various studies had cost savings as one element in the broader category of ‘impact on health gain, patient benefit, improved care or other benefits to the healthcare system’ and these were included.

A final limitation is that each project counted equally towards the final tally, and the question of whether impact had occurred was framed as a binary yes/no. This meant that large, well-conducted projects that had produced very significant impacts counted the same as smaller, more methodologically questionable projects whose impact was limited (but which could still be said to have occurred). In quite a few of the individual impact assessments this limitation was reduced because more detailed case studies were also conducted on selected projects; these were often reported to provide examples of significant impact. However, in our current analysis, any supplementary case studies were not included in the data used to construct Table 2, which is the main comparative account of the findings.

Given the various limitations, the findings should be viewed with a degree of caution. Nevertheless, the included studies do present evidence of wide-ranging levels of impact resulting from diverse programmes of health research. Quite large numbers of projects made at least some impact, and case studies often illustrated extensive impact arising from certain projects. Our findings resonate with theoretical models of research impact, namely that impact is more likely to be achieved when the topics of applied research, and how it might best be conducted, are discussed with potential users of the findings, and when mechanisms are in place to receive and use the findings [6, 13, 16–21, 28–30]. We also found variations depending on the nature of the research being conducted. These points can be illustrated by some of the more notable examples from Table 1. For example, in the case of 100% of HTA reports from the HTA unit of McGill University Health Centre in Quebec, Canada, the impact was said to be because of “(i) relevance (selection of topics by administration with on-site production of HTAs allowing them to incorporate local data and reflect local needs), (ii) timeliness, and (iii) formulation of policy reflecting community values by a local representative committee” ([52], p. 263). In the case of the 97% of assessments from the Austrian HTA programme classified as making at least some impact on coverage policies [66], there were features of the Austrian policymaking structures that facilitated the use of HTA reports. The authors explained that, to be used, the HTA reports “need primarily to be in German language and they have to be produced within a time period that is strongly linked to the decision-making process” ([66], p. 77). By contrast, and as noted above, while the Health Care Efficiency Research programme from the Netherlands was also classified as an HTA programme, it included a large responsive-mode element and most studies were prospective clinical trials rather than the technology appraisal reports that are the main element of many HTA programmes [55]. The lower proportion of projects in these studies (29%) demonstrating a policy impact illustrates that variations in levels of impact achieved can be linked to the type of research conducted, even within the same overall field; in this case the effect was further exacerbated by the impact assessment occurring soon after the end of the trials [55].

Overall, as Jacob and McGregor reported for the HTAs conducted in Canada by the Quebec CETS, “The best insurance for impact is a request by a decider that an evaluation be made” ([48], p. 78). Furthermore, for those programmes (or parts of wider programmes) for which there were explicit mechanisms such as formal committees to receive and use the findings from technology appraisal reports in coverage decisions about investment or disinvestment, the proportion of projects making an impact was very high.

Further examples of studies of the impact of multi-project programmes have been published since the second review was conducted, with the examples from Bangladesh, Brazil, Ghana and Iran [2–5] illustrating a widening interest in producing evidence of impact. In the Ghanaian example, 20 out of 30 studies were used to contribute to action, and Kok et al. again showed that considerable levels of impact could be achieved by adopting an interactive approach; they reported that “the results of 17 out of 18 user-initiated studies were translated into action” ([4], p. 1). These four impact assessments provide further evidence that contributes to the global pool of studies showing the breadth of impact made by health research, and also reinforce the evidence that research impact assessment has become a rapidly growing field.

As was noted, some individual studies provided lessons for the specific funder on whose research they focussed as to how that funder might best use its research resources. Some more general lessons could also be drawn about the types of research programmes, for example, needs-led and collaborative ones, that seem more likely to lead to impacts, though it is widely understood that a diversity of funded health research remains desirable overall. Additionally, the growing body of evidence about the impacts that come from health research could potentially be used to promote research funding along the lines argued in the World Health Report 2013 [1]. Studies showing the monetary value, in terms of a high rate of return on health research expenditure, whether from specific programmes or more widely, seem to have particular potential to be used to promote the case for further funding for medical research [77].

Lessons can also be learnt from the review about the range of methods and frameworks available to conduct health research impact assessments. Furthermore, in addition to the continuing refinement of existing frameworks, for example, of the Canadian Academy of Health Sciences’ framework in Canada [87], there are also ever-increasing numbers of studies on which to draw to inform analysis, including current work in Australia [88]. Given the expanding focus on research impact assessment, the potential lessons that could be drawn from such studies, individually and collectively, are likely to be more significant if there were somewhat greater standardisation. Any standardisation of methods might attempt to reduce the current diversity on items such as the categories of impact to include and their definition, and the timing of data collection and its presentation. Such moves towards standardisation might facilitate comparisons between the processes used in different programmes and, in that way, inform the strategic decisions that funding organisations will always need to make about how best to use resources.

Some ideas about standardisation, as well as some potential dangers, might come from recent experience in the United Kingdom, where many research funders are now using a standardised approach called Researchfish® (Researchfish Ltd, Cambridge, United Kingdom). This is an online survey, originally developed with the United Kingdom’s Medical Research Council, that an increasing number of research funders now send annually to the PIs of all the projects they support. It asks for information on outputs, outcomes and impacts (see Raftery et al. [9] for a more detailed account). It has several advantages, including a high formal response rate, wide use that could facilitate comparability between programmes and funders (though it does not currently report data in a way that would have facilitated its use in the comparisons made in our analysis), and a database that builds up a fuller picture over successive years, including a number of years after a project’s completion, thus allowing the capture of certain data that a one-off bespoke survey might miss. Its main limitations include the burden it places on researchers (although this has been reducing as successive versions of the survey have been made more user-friendly), the potential danger of a poorer response rate to key questions than can be obtained by bespoke surveys, and reduced specificity for some aspects of health research because it has been standardised to cover many research fields. As with other survey approaches, Researchfish provides less detailed information and understanding than can come from case studies, but allows wider coverage for the same resources.

How best to address these issues when seeking more standardised approaches could be of interest to the newly established WHO Global Observatory for Health Research and Development [89]. Furthermore, perhaps there would be scope for bringing together the expanding body of evidence providing examples of the impact from programmes of health research, with the increasing sophistication, and global spread, of the analysis of factors that might be associated with research use [90, 91].

Conclusion

The quite high proportion of projects that reported making an impact from some multi-project programmes, including needs-led and collaborative ones, as well as the demonstration of the monetary value of a programme, could potentially be used to promote future research funding along the lines argued in the World Health Report 2013 [1]. This review also indicates that the evidence about health research impact is continuing to grow.

In addition to being of value to research managers in identifying factors that might lead to increased impact, this review of impact studies also demonstrates the range of methods and conceptual frameworks that can be used in conducting such studies. However, weaknesses in some studies, and diversity between studies in the methods and timing used, reduce the value of some individual studies and the ability to make comparisons across the full suite of 36 studies.

A standardised approach to assessing the impact of research programmes could address existing methodological inconsistencies and better inform strategic decisions about research investment in order to enhance impact. However, experience from the United Kingdom shows that moving towards such standardisation can itself generate further difficulties. There could be a role for the newly established WHO Global Observatory for Health Research and Development [89] in both drawing on the existing evidence from many countries about the impact of health research and in promoting ideas for achieving greater standardisation in health research impact assessment.