1 Introduction

Evaluation has been a legislative requirement for European Research and Technology Development (RTD) programmes since the early 1980s. Since then, the Commission Services have gained considerable experience in evaluating research. The launch of the Fourth Framework Programme (FP) in 1994 led the European Commission (EC) to introduce a new evaluation scheme consisting of annual reports based on continuous monitoring, and a five-year assessment that includes the review of the two previous research programmes [10]. The most recent ex-post evaluation of FP6 [7] deals with the entirety of FP6 and provides some input into the interim evaluation of FP7 to be performed in 2010. The Expert Group of the FP6 evaluation addressed three broad sets of issues, namely the rationale, implementation and achievements of FP6. For FP7, a new monitoring system, an internal management tool consisting of a series of annual reports and a system of indicators, is under development.

In the field of transport, the evaluation of European research projects’ achievements and impacts does not have a long tradition. Some national-level evaluations have been carried out in recent years (e.g. Pihlajamaa and Berg [25] and Kalenoja et al. [12] in Finland; Albrecht and Vaněček [1] in the Czech Republic), but essentially research evaluation is a new, emerging field in the transport context at both national and European levels.

Currently, EU RTD evaluation practices comprise continuous monitoring, five-year assessments and mid-term evaluations. They are characterised by a strong focus on monitoring rather than impact assessment, on projects and programmes rather than the broad policy context, and by a heavy reliance on expert panels rather than studies. There is also a constraint imposed by the limited time and financial resources devoted to evaluation (EC Joint Research Centre (2002) RTD Evaluation Toolbox. http://www.fteval.at/files/evstudien/epub.pdf). Georghiou and Polt [10] note that, at the European level, ‘there is no single model of good practice’, but peer reviews and expert groups are used in evaluation processes. This is also emphasised by Durieux and Fayl [6], who state that the most important instruments in the evaluation of European RTD programmes are independent expert panels, interviews, questionnaires and core indicators. The panels are made up of people with high levels of responsibility in the field, which in practice results in a balance of experts with either an industrial or an academic background [6]. Experts are selected by the Commission on the basis of their experience and knowledge of Community research policy, which indicates that they will be drawn primarily from the knowledge sector. Efforts are also made to ensure a balance among different sectors of the research community as well as a geographic spread of evaluators.

The range of users of knowledge produced by evaluations is broad, because evaluations may be conducted both internally and externally and by different organisations. The most typical user categories are decision makers, policy makers, practitioners, scientists, consultants, auditors, trained evaluators, programme and project managers, project participants, economic analysts, NGOs and consumer groups [8, 11, 23, 24].

Within each of these categories there is significant diversity of users, whose expectations for evaluation results and methodologies may vary. Consequently, the nature of the knowledge produced depends on its use, i.e. by whom and how the knowledge will be used. In addition, utilising the evaluation knowledge produced appears to be challenging. Even though FP evaluations are becoming a permanent practice, the development of evaluation methodologies is often short-sighted and discontinuous, and the results are not disseminated widely to different stakeholders. Our view is that these issues need to be addressed carefully in the future to allow evaluations to gain a greater role in guiding future policy and research agendas.

Our interest in transport research evaluation was initiated by the METRONOME project, financed under FP7 of the EC and aiming to develop a methodology for evaluating the impacts of research projects in the field of transport. The project, together with four other transport research evaluation projects (AGAPE, AIMS, MEFISTO and SITPRO Plus), represents transport research’s contribution to the overall trend in EC FP evaluations and provides a means to obtain a more detailed view of transport research achievements in previous FP projects. In addition, the project contributes to new research policy objectives in the field of transport.

In the traditional view, European investment in RTD creates a demand for information on the efficiency with which RTD is managed, the quality of the work itself, and the economic and social returns. Evaluation schemes set up to supply this information are important tools for policymakers, and they give the research community an opportunity to demonstrate its achievements. Hence, the traditional role of research programme evaluations has been to legitimise past research activities. Since the focus has been on ex-post evaluations, only very little attention has been given to the elements of future development, learning and strategic long-term planning, elements which are growing strong in the contemporary evaluation literature (e.g. [2, 9, 14–17]). Kuhlmann [14], for example, argues that the current RTD arena, with well-organised actors (having differing interests, values and power) but no dominant player, competition for impact and resources, and a search for (some) alignment and policy learning, requires more from evaluation practices than just legitimacy. It requires treating research evaluations as ‘Strategic Intelligence’ in order to steer future research policy developments. Arnold [3] complements Kuhlmann’s arguments by claiming that a growing EU research budget also means an increased need for accountability; scrutiny of the efficiency of the European RTD system; timing of forthcoming evaluations in line with the need for an informed debate on future EU RTD policy; a need to focus more on the “fundamental” aspects and less on minor implementation issues; and a need to develop evaluation capacities as part of the European Research Area.

Our contribution to the research evaluation discussion lies in the development of an evaluation methodology for transport research projects, which includes multiple evaluation methods and considers many of the above aspects. We claim that the key questions to be addressed through the developed transport research evaluation methodology are as follows:

  • What kind of elements should a framework for evaluating the achievements and potential impacts of transport projects supported in EC framework programmes include?

  • What are the forms of evaluation methods required within such a framework?

  • Can such an evaluation framework produce recommendations for future transport research and policy objectives, as well as mutual learning as a basis for strategic long-term planning, i.e. the strategic intelligence?

In order to answer the above questions, we have structured the article as follows. First, we present the theoretical background for the evaluation of research. Second, we describe the evaluation methodology developed in the course of the METRONOME project. In the subsequent section we present, based on the methodology testing, the main results relating to the rationale, implementation and achievements of the FP5 and FP6 transport research projects. We conclude with a discussion of both the theoretical and practical implications of our method and by presenting some relevant future research needs.

2 Theoretical background for evaluation of research

According to the classical definition of Scriven [26], “Evaluation is the process of determining the merit, worth and value of things.” It is the process of distinguishing the worthwhile from the worthless, the precious from the useless. Chelimsky [5] emphasises that evaluation is by definition social research. As regards programme evaluation, she points out that it is the application of systematic research methods to the assessment of programme design, implementation and effectiveness.

The evaluation of RTD activity, e.g. research programmes, makes use of the same basic concepts as evaluation activity in general. These are output, outcome, impact and effectiveness. It is clear that the borderlines between these concepts and their contents are not absolute, but rather flexible, and hence they are often used inconsistently. It is common, for example, to regard impact and effectiveness as interchangeable. Our view on these concepts and their contents is presented below.

  1) Output: the concrete result of a research project (e.g. the final report of a project)

  2) Outcome: the product or process arising from the research result (e.g. a new methodology, software tool or process)

  3) Impact: the product, event, condition and/or change that follows from the outcome (e.g. a policy initiative, new product/service development)

  4) Effect/effectiveness: broad, general, societal change that indicates the extent to which the impacts of a programme, policy or organisation have promoted the achievement of set goals and/or initiated societal change (e.g. established norms and regulations, contributions to strategy processes of public and private organisations) [22].

In addition to the different evaluation concepts, another dimension for analysis is that of temporal scale. We can differentiate between the immediate, intermediate and ultimate impacts (and effects) of projects (e.g. [29]). This indicates the expected time required for the achievement of impacts and effectiveness (see Fig. 1).

Fig. 1 Expected time perspective of impacts (source: [29])

In RTD evaluation, there are two basic types of evaluation. The first, summative evaluation, focuses on relationships between inputs and outputs. Here, we can distinguish between impact and effectiveness evaluation and goal achievement evaluation. Impact and effectiveness evaluation differs from goal achievement evaluation in that it takes into account the side impacts or unanticipated impacts that a programme may have, which the latter type of evaluation does not cover. In this light, it is useful to classify impacts as: (1) anticipated and unanticipated, (2) inside and outside the target area (or relevant and irrelevant) and (3) productive and detrimental (or neutral in impact) (e.g. [21]). Goal achievement evaluation, in turn, focuses on the relevance of objectives and the costs arising from the activity, which the former type does not take into account.

The second type, formative evaluation, focuses on future development, learning, strategic long-term planning and structural change, issues grouped under the umbrella concept of ‘strategic intelligence’ in the contemporary evaluation literature (e.g. [2, 14–17]).

In general, the main purpose of recent research programme evaluations has been to justify past research actions (value for money), and consequently the focus has been on summative evaluations. It seems, however, that the perspective of ‘strategic intelligence’ is also growing stronger in European programme evaluation. The METRONOME evaluation method presented in the following sections includes features of both summative and formative evaluation.

Evaluations of research often include both qualitative and quantitative elements. The qualitative aspect tends to constitute a process of peer review by people with expertise within the appropriate area or different kinds of participatory approaches (workshops, interviews, etc.), whilst the quantitative aspect frequently involves the use of indicators. In the latter case, the data can be obtained e.g. by questionnaire survey. Traditionally, involvement of informed peers has been regarded as the most reliable and comprehensive way (and indeed sometimes the only way) to judge scientific quality and societal impact [4, 21]. Quantitative data has been seen as a supportive element to the peer review process [23].

Basically, carrying out valid evaluations requires complementary information and knowledge, produced by various methods. In addition, the nature of produced knowledge depends on its use, i.e. by whom and how the knowledge will be used. For example, for legitimising purposes, the knowledge can be indicator based and quantitative, but if the focus is on strategic development, the knowledge needs to be qualitative and participatory.

In order to carry out valid and transparent evaluations that take into consideration the different evaluation concepts, types, methods and expected results, the evaluation process needs to be structured in a comprehensive way. An example of such a process (evaluation steps) is introduced, e.g., by Kuitunen and Hyytinen [13] in Lähteenmäki-Smith et al. [19]:

  1. Setting and defining of evaluation objectives

  2. Choice of evaluation methods

  3. Specification of the goals of the policy, programme, organisation or similar to be evaluated

  4. Identification of the evaluation target’s impact and effectiveness mechanisms

  5. Identification of contextual issues

  6. Reviewing objectives in relation to observed impacts

  7. Utilisation of evaluation information in setting goals and future needs

Based on the previous practical and theoretical considerations, there seems to be a need for improved strategic intelligence and indicators to understand the actual dynamics (see e.g. LEG [16]) and impacts of research programmes, and for the involvement of relevant stakeholders. It is not, however, realistic to try to find one general methodology for programme evaluation, but rather to specify different mixes of approaches depending on the overall focus and purpose of the evaluation. The following framework for evaluating the impacts of transport research projects presents our view of such a framework for the transport domain.

3 Method

3.1 The framework

The proposed evaluation framework focuses on three themes currently relevant for European transport research: Strengthening industrial competitiveness (IndCo); Contributing to sustainable development (SuD); and Improving community and public policies (CPP). The methodology takes a two-dimensional approach to project impact evaluation. On the one hand, the projects’ achievements are evaluated against the FP Work Programme objectives and targets set for the IndCo, SuD and CPP themes (goal achievement evaluation). On the other hand, it evaluates, through the METRONOME impact model, the impacts of the FP research projects according to four impact groups (impact evaluation). Based on these two approaches, which include a mix of evaluation methods, ‘strategic intelligence’, i.e. recommendations relating to the definition of performance targets for future FPs and to new research policy objectives, research instruments and actor networks, can be formed (formative evaluation).

The METRONOME screening, selection and evaluation methodology has three main phases (Fig. 2):

  1. Identification of European transport research and policy objectives for Industrial Competitiveness, Sustainable Development, and Community and Public Policies

  2. Screening and selection of FP themes and projects for the evaluation

  3. Evaluating project achievements and impacts

Fig. 2 The three phases of the METRONOME evaluation methodology and their linkages

In the first phase, the thematic European transport research and policy objectives are derived from relevant European policy documents and research work programmes. The second phase includes the following three steps. First, the FP themes and key actions relevant to the transport theme are identified from the FP Work Programme. Second, the outputs (final reports) of projects under the selected themes are gathered. Third, the projects to undergo detailed evaluation are selected with the help of text mining software and a checklist (for details, see METRONOME Deliverable D2.1). As a result of the project selection, a sample of (e.g. 30) “best matching” projects within each of the evaluation themes (IndCo, SuD and CPP) can be selected for detailed evaluation; the selection idea is sketched below.
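To illustrate the “best matching” selection, the following is a minimal sketch in Python, assuming simple keyword counting. The theme keyword lists, function names and scoring rule are our illustrative assumptions; they stand in for, and do not reproduce, the actual METRONOME text mining software and checklist (Deliverable D2.1).

```python
import re
from collections import Counter

# Hypothetical keyword lists per evaluation theme (illustrative only).
THEME_KEYWORDS = {
    "IndCo": ["competitiveness", "industry", "patent", "market", "prototype"],
    "SuD": ["sustainable", "emission", "safety", "environment", "intermodal"],
    "CPP": ["policy", "governance", "regulation", "decision"],
}

def keyword_score(report_text: str, keywords: list[str]) -> int:
    """Count how often the theme keywords occur in a project's final report."""
    tokens = Counter(re.findall(r"[a-z]+", report_text.lower()))
    return sum(tokens[word] for word in keywords)

def select_best_matching(final_reports: dict[str, str], theme: str, n: int = 30) -> list[str]:
    """Rank projects by keyword score for one theme and keep the top n."""
    scores = {pid: keyword_score(text, THEME_KEYWORDS[theme])
              for pid, text in final_reports.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```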

The third phase, the actual project evaluation in the METRONOME framework, is based on two pillars. The first is the evaluation of research project performance against FP objectives and targets (goal achievement evaluation); two complementary approaches are proposed here and presented in the following sections. The second (impact evaluation) is the METRONOME impact model (Fig. 3), which is founded on and further elaborated from the Impact Assessment Model by Lähteenmäki-Smith et al. [19]. The model illustrates how FPs can bring about four kinds of impacts, namely (1) impacts on management and co-ordination, (2) scientific impacts, (3) customer/end user impacts, and (4) societal impacts in the fields of IndCo, SuD and CPP. The main beneficiaries of the research results are listed on the right-hand side of Fig. 3. The above four impact groups were identified as best reflecting project impacts in the FP research programme evaluation context. The “lower level” impacts in the impact pile can be seen as enabling factors for the upper-level impacts. For example, good management and co-ordination impacts enable (but do not guarantee) good scientific impacts.

Fig. 3 The METRONOME impact model

The METRONOME impact model thus proposes four indicator groups (Table 1). Impact indicators on management and co-ordination reflect the ‘enabling factors’ or ‘tools’ complementing the impacts measured in the other three groups. Scientific impact indicators reflect the quality and validity of research project results (outcomes) against the project’s own objectives and the FP objectives and targets set at different levels. Customer/end user impact indicators reflect the (short-term) benefit of the research results to their actual end users (e.g. the EC, industry, national governments, ministries, research organisations, etc.). Societal impact indicators reflect the long-term effects of the research on society (e.g. on transport system end-users: individuals, logistics companies, industry, etc.).

3.2 The evaluation methods

The following sections present the four complementary evaluation methods developed and tested in the course of the FP7 METRONOME project. The methods are: two different project evaluation matrices (based on project reports); a coordinator questionnaire; and lead-user interviews. In order to get a comprehensive view of the programme achievements and impacts, a specific mix of evaluation methods was applied to each of the three evaluation themes (IndCo, SuD and CPP).

3.2.1 Evaluation of achievements of FP objectives and targets by research projects—a matrix approach

The proposed evaluation method includes twelve distinct steps and is applied below to the Industrial Competitiveness theme (for details see METRONOME Deliverable D3.1).

  Step 1: Identification of Industrial Competitiveness domains

    Based on a detailed review of the scientific literature and European Union policy documents, relevant domains are identified. As an example, such domains can be:

    • Technologies, Processes and Services

    • Products

    • Infrastructures

    • Patents & Standards

    • Societal & Environmental

    • Legislative

    • Financial

  Step 2: Identification of Framework Programme specific objectives and targets related to Industrial Competitiveness

    Here, a detailed analysis of the policy objectives and measurable targets of the FP Work Programmes is carried out.

  Step 3: Definition of indicators based on each Framework Programme target

    An indicator is defined as an effort to quantify and simplify phenomena and to help understand complex realities. Indicators are aggregates of raw and processed data, but they can be further aggregated to form complex indices. Here, the indicators are defined by transforming the targets set by the FP Work Programmes into measurable statistics and indices.

  Step 4: Grouping of indicators based on Framework Programme objectives

    The grouping of indicators is carried out on two levels: (1) according to the objectives to which the targets addressed by each indicator relate; (2) in order to reduce the number of indicators that address the same topic both semantically and logically. In the latter case, two or more indicators within the same group of indicators per objective can be merged into one indicator that measures more than one characteristic.

  Step 5: Relating each indicator to one of the domains

    Each indicator is associated with the domains it addresses, on semantic and logical grounds. The association indicates the exact domains to which each indicator relates and also provides some useful qualitative insights for each indicator in terms of its relation to these domains.

  Step 6: Definition of the Evaluation Framework and success/failure criteria

    The Evaluation Framework is a database consisting of general information on the project under evaluation (such as name, acronym, etc.), together with fields in which each indicator is measured. The indicators selected for each project assessed are based on the objectives, and thus on the resulting targets, which are transformed into indicators according to Steps 3 and 4. The overall question to be answered for each indicator is: “Rate the extent to which the project contributed to/addressed the indicator”. The measuring scale for each indicator is presented in Table 2.

    Table 1 Indicator groups and examples of indicators
    Table 2 The scale for measuring indicators and related definitions

    The actual implementation of the proposed evaluation method is carried out by deploying the Evaluation Framework files and filling in the necessary information, i.e. measuring the extent to which the project under evaluation addressed each indicator. An indicative sample of an Evaluation Framework database file is presented in Table 3; a code sketch of such a record is given after Table 3.

    Table 3 Sample Evaluation Framework database file
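    As a sketch of what one Evaluation Framework record could look like, assuming a hypothetical five-point scale (the actual labels and definitions are those of Table 2, and the field names here are illustrative):

```python
from dataclasses import dataclass, field

# Hypothetical scale labels standing in for Table 2.
SCALE = {1: "not at all", 2: "slightly", 3: "moderately", 4: "largely", 5: "fully"}

@dataclass
class EvaluationRecord:
    """One Evaluation Framework entry: project metadata plus indicator ratings."""
    name: str
    acronym: str
    ratings: dict[str, int] = field(default_factory=dict)  # indicator id -> 1..5

    def rate(self, indicator: str, value: int) -> None:
        # "Rate the extent to which the project contributed to/addressed the indicator"
        if value not in SCALE:
            raise ValueError(f"rating must be one of {sorted(SCALE)}")
        self.ratings[indicator] = value
```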
  Step 7: Definition of the Justification Matrix for selecting projects

    The selection of the projects to be evaluated is determined through a project selection Justification Matrix. This matrix is composed of the domains that have already been defined. For a project to be selected, at least one of the Industrial Competitiveness domains must be addressed by the project.

  Step 8: Selection of projects based on the Justification Matrix and sampling

    The identification of the projects to be evaluated is executed in two steps. First, the thematic area addressed by the project, together with the project objectives, is identified. Second, a two-page indexed project identification document is created for each project. If one or more of the search criteria are identified during this indexed search, the corresponding domain is considered relevant to the specific project and is marked positively in the Justification Matrix. The process is iterative and has to be repeated several times in order to establish the relevance of each of the domains assessed; the selection rule is sketched below.
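    A minimal sketch of the Justification Matrix rule, under the assumption that the indexed search has already produced one boolean flag per domain (the data shapes and names are ours):

```python
# Domains as identified in Step 1.
DOMAINS = [
    "Technologies, Processes and Services", "Products", "Infrastructures",
    "Patents & Standards", "Societal & Environmental", "Legislative", "Financial",
]

def is_selectable(domain_flags: dict[str, bool]) -> bool:
    """Step 7 rule: a project qualifies for evaluation if the indexed search
    marked at least one Industrial Competitiveness domain as relevant."""
    return any(domain_flags.get(domain, False) for domain in DOMAINS)
```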

  Step 9: Testing the applicability of the method on a small number of projects

    In order to ensure the applicability of the proposed evaluation method, a validation step has to be executed at this stage [18]. The method is tested by applying it to a small number of projects. This test sample is considered sufficient when it reaches 2–5% of the total number of projects to be assessed, and should lie within a range of 12–25 projects in total, independent of the actual total number of projects [27, 28]. Note that this is only the sample size for testing the method, i.e. for investigating the effectiveness and applicability of the data mining techniques of the previous eight steps; it is not the actual project sample size mentioned in Step 8. If the application of the method is not considered successful (e.g. no results can be measured, no projects can be found, no association between indicators and targets can be justified, etc.), the user is advised to return to Step 3 and re-run the evaluation process, based on the shortcomings identified.
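    One possible reading of this sizing rule, sketched below; the clamping of the 2–5% share to the 12–25 range follows the text above, while the function name and default share are our assumptions.

```python
def pilot_sample_size(total_projects: int, share: float = 0.05) -> int:
    """Step 9 test sample: 2-5% of the projects to be assessed, kept within
    12-25 projects regardless of the total (one reading of [27, 28])."""
    if not 0.02 <= share <= 0.05:
        raise ValueError("share should lie between 2% and 5%")
    return min(max(round(share * total_projects), 12), 25)

# For the 700 transport projects of Section 4: 5% of 700 is 35, which the
# 12-25 range clamps to 25; with share=0.02 the result would be 14.
```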

  Step 10: Qualitative analysis of all projects and analysis of the results

    This step involves the analysis of the results of the evaluation process described above. Each selected project is rated for each indicator. The ratings of all projects are then analysed collectively in the following manner. Each point of the rating scale is assigned a number from one to five. For each indicator, an index is then created for each scale value according to the number of times that value was used (through the rating process) across all projects assessed; the indices for an indicator always sum to one. This procedure is repeated for each indicator per objective. A graphical chart is then created as follows: the x-axis is labelled with the five scale values and the indicators of each objective, and the y-axis shows the index achieved for each indicator per scale value. An illustrative example of such a chart is presented in Fig. 4, and a sketch of the index computation is given below.

    Fig. 4 Example of an analysis chart for the indicators (columns in different colors) of one FP objective

    The same procedure yields an analysis of each indicator separately against each objective or Framework Programme.
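    A minimal sketch of the Step 10 index computation, assuming the ratings for one indicator are already coded as integers 1–5 (the function name is ours):

```python
from collections import Counter

def scale_indices(ratings: list[int]) -> dict[int, float]:
    """Share of projects rated at each scale value (1-5) for one indicator.
    By construction the five indices sum to one, as Step 10 requires."""
    counts = Counter(ratings)
    return {scale: counts.get(scale, 0) / len(ratings) for scale in range(1, 6)}

# e.g. scale_indices([5, 4, 4, 2]) -> {1: 0.0, 2: 0.25, 3: 0.0, 4: 0.5, 5: 0.25}
```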

  Step 11: Relation of all projects’ evaluation results to Industrial Competitiveness domains

    The project results are related to the defined domains (see Step 5) as follows. Each time a response on the evaluation scale is recorded, it is related to the domains assigned to the respective indicator. The total number of responses at each of the five scale values then indicates the performance for each evaluation domain; a sketch of this tally is given below.
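    A sketch of this tally, reusing the Step 5 indicator-to-domain assignment; the data shapes are our assumptions:

```python
from collections import Counter

def domain_performance(
    ratings: dict[str, int],                   # indicator id -> scale value (1..5)
    indicator_domains: dict[str, list[str]],   # Step 5: indicator id -> domains
) -> dict[str, Counter]:
    """Tally, per domain, how often each scale value was recorded (Step 11)."""
    performance: dict[str, Counter] = {}
    for indicator, rating in ratings.items():
        for domain in indicator_domains.get(indicator, []):
            performance.setdefault(domain, Counter())[rating] += 1
    return performance
```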

  Step 12: Conclusions, recommendations and further use by EC services

    The final step of the proposed method consists of the interpretation of the results and drawing of conclusions and recommendations.

3.2.2 A simple matrix approach to evaluate project achievements and impacts

An alternative matrix approach, simpler than the one presented above, was tested in the course of the METRONOME project. The approach includes two complementary evaluation matrices and contributes both to goal achievement and impact assessment. It was tested under the Sustainable Development and the Community and Public Policies themes (for details see METRONOME Deliverables D4.1 and D5.1).

The first evaluation matrix supports a qualitative evaluation of the extent to which research projects financed under an FP have contributed to the evaluation theme, e.g. SuD. Based on a review of the FP research and commissioning structures, at least three levels of objectives can be identified as relevant to many of the transport research projects commissioned under the past two Framework Programmes. These are: (1) FP Work Programme-level (WP) objectives; (2) Work Programme sub-level (thematic) objectives (under which the project was commissioned); (3) project-level objectives. The evaluation matrix enables evaluators to specify whether each of the above objectives has been met fully, partially, indirectly or not at all. The same approach is applied to evaluate the potential impacts of research projects in the four impact groups and with the related indicators (see Table 1). In addition, each completed evaluation needs to be accompanied by a textual summary, which supplements the evaluation matrix by detailing other relevant and specific information about the projects and/or their outcomes. The research objectives at different levels and the impact indicators form the basis of the evaluation matrices, which are completed on the basis of the published Final Reports of the projects. A skeleton template for the approach adopted is shown in Appendix 1, and a data-structure sketch is given below.
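A data-structure sketch of such a goal-achievement matrix; the achievement categories come from the text above, while the objective identifiers are hypothetical:

```python
from enum import Enum

class Achievement(Enum):
    FULLY = "fully"
    PARTIALLY = "partially"
    INDIRECTLY = "indirectly"
    NOT_AT_ALL = "not at all"

# One entry per objective, grouped by objective level (cf. Appendix 1).
evaluation_matrix = {
    "WP objectives": {"WP-obj-1": Achievement.FULLY},
    "Thematic (sub-level) objectives": {"KA-obj-1.2": Achievement.PARTIALLY},
    "Project objectives": {"proj-obj-A": Achievement.FULLY},
}
```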

The second evaluation matrix concerns the success of project result dissemination (Appendix 2). FP projects typically result in the publication of a wide range of deliverables and outputs, both formal and informal. The dissemination quality matrix enables evaluators to specify the characteristics of specific dissemination activities undertaken during and after the project lifetime. Consequently, it assesses the potential effects of project results and indicates whether the estimated impacts upon the objectives are likely to have been achieved in practice. The matrix indicators (a list of activities) are selected on the basis that they are comprehensive whilst also feasible to answer from written documents in the public domain. Project dissemination reports, project final reports and websites provide evidence of the scope and nature of the dissemination activities conducted. In addition, each completed evaluation should be accompanied by a textual summary of the dissemination information assessment, detailing other relevant and specific dissemination information about the projects.

3.2.3 Assessment of potential project impacts

A questionnaire designed for and distributed to a sample of FP project coordinators provides the main tool for this method, which was tested in the context of the Sustainable Development theme (for details see METRONOME Deliverable D4.1). The main aim of the questionnaire is to collect information on the impacts of research projects in the four impact groups presented in Table 1. The questionnaire is composed of four parts according to the impact groups. Indicators are identified to describe the impacts within each of the groups, and the statements or questions to be answered are in turn designed based on the indicators (Table 4). In the METRONOME project, an email survey was considered the best approach to gather information from a geographically dispersed group of coordinators. The questionnaire uses a qualitative Likert scale as follows: Completely disagree—Partially disagree—Neutral—Mostly agree—Completely agree—Don’t know. The Likert scale is the most widely used scale in survey research, but it always carries a risk of bias relating to neutral answers; one way of coding the responses is sketched after Table 4.

Table 4 Relationships between indicators and questions in the METRONOME co-ordinator questionnaire
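As a sketch of how the responses could be coded for analysis, keeping ‘Don’t know’ out of the numeric scale (the coding itself is our assumption, not part of the METRONOME deliverables):

```python
LIKERT = {
    "Completely disagree": 1, "Partially disagree": 2, "Neutral": 3,
    "Mostly agree": 4, "Completely agree": 5,
}  # "Don't know" is deliberately not coded on the 1-5 scale

def mean_agreement(responses: list[str]) -> float | None:
    """Average the substantive answers; 'Don't know' is excluded rather than
    folded into the neutral midpoint (the bias risk noted above)."""
    scores = [LIKERT[r] for r in responses if r in LIKERT]
    return sum(scores) / len(scores) if scores else None
```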

During the METRONOME project, it was discovered that to complement the information from the questionnaires, it is advisable to carry out detailed co-ordinator interviews. Interviews can provide additional information about the projects, dissemination and use of results in order to draw conclusions on the impacts of the evaluated projects.

3.2.4 Lead-user views on project achievements and impacts

The fourth approach was considered by the METRONOME consortium as the most important one for collecting information on programme or project impacts. This approach includes a workshop with potential lead-users, and interviews conducted among potential and target users of FP projects. The approach was tested in the context of the Contribution to Community and Public Policies theme (for details see METRONOME Deliverable D5.1).

Lead-users can be defined as persons (civil servants, consultants, scientists, policy makers, etc.) actually using the knowledge gained from EU research projects.

The workshop organised as part of the METRONOME project focused primarily on gathering information on specific evaluation indicators that would be relevant to lead-users. Based on this information, a specific questionnaire was produced and used to collect information from potential lead-users. The selection of potential lead-users was based on the respondents’ characteristics and not on their potential interest in specific projects in the METRONOME sample sets. This meant that the lead-users’ views did not reflect their opinion of the sample projects, but of a wider sample of FP projects.

The data collection, using the questionnaire, took place in two ‘waves’. In the first wave, the METRONOME partners approached self-selected lead-users. The use of a uniform questionnaire format maintained consistency between the results of interviews conducted by different partners. The partners were free to use either telephone or face-to-face interviews or to distribute the questionnaire by email to pre-selected respondents. The questions related to the perceived impact of FP research in general, the results of specific projects in which the respondents had been involved, the benefits for the respondent and his/her organisation, and what did and did not work in FP projects. After analysis of the first wave of responses, it was decided to enlarge the number of responses with a second wave, which mainly covered the people registered as potentially interested participants for a planned second METRONOME workshop. In addition, the questionnaire format was slightly changed to better accommodate the use of email.

Of the total number of questionnaires available for analysis, around 20% came from respondents who had not (in any way) been involved in FP projects and could therefore not answer any questions on specific project results. For those involved in projects, the involvement ranged from project partner to participant in project events (workshops, etc.).

4 Results

In order to test the feasibility of the developed framework and the evaluation methods within it, a case study of 100 FP5 and FP6 transport projects was carried out in the course of the METRONOME project. The projects represented the themes of Industrial Competitiveness (50 projects), Sustainable Development (30 projects) and Contribution to Community and Public Policies (20 projects). A specific combination of the evaluation methods presented above was applied to each of the themes. The case study projects were financed under either the FP5 thematic priorities Sustainable Mobility and Intermodality and Land Transport and Marine Technologies or the FP6 priorities Sustainable Surface Transport and Research for Policy Support. Altogether, 700 transport projects were financed under these priorities during the years 1999–2006.

4.1 Rationale

The case study showed that FP5 and FP6 Work Programmes presented a wide variety of transport objectives and targets at different levels. The analysis revealed the following three levels of objectives relevant to many of the transport research projects commissioned under the programmes:

  • Specific Work Programme (WP) or Thematic Area objectives

  • Key Action (KA) of the Work Programme or Programme Subdivision (PS) objectives (or targets)

  • Strategic project objectives

Table 5 is based on the matrix-approach results and shows the number of objectives set for the themes Industrial Competitiveness, Sustainable Development and Community and Public Policies at FP5 and FP6 Work Programme level and at the lower KA/PS level. The number of thematic objectives indicates a much higher significance given to the Industrial Competitiveness theme in FP6 than in FP5. As regards contribution to Community and Public Policies, a similar but weaker trend can be identified. In the field of Sustainable Development, the emphasis seems to be quite similar in both FPs.

Table 5 FPs and number of thematic objectives/targets set for different levels

The objectives best met were the strategic project objectives. This is hardly surprising, as these are the objectives with the most direct relevance to the project. A surprising finding, based on the matrix evaluations, was that in the fields of Sustainable Development and Community and Public Policies, both FP5 and FP6 projects were considered to have contributed more to the higher-level WP objectives than to the lower-level KA or PS objectives, which could be considered more directly applicable to the projects commissioned. One explanation could be that the higher-level objectives are more general and thus easier to meet than the more specific lower-level objectives. Also, when a project meets its specific objectives satisfactorily but not the European policy objectives, this may be because the project focuses on a single goal only and has thus been rated low on the wider objectives. In the field of SuD only 20%, and in the field of CPP 50%, of the projects reviewed met their strategic project objectives, the relevant KA (or equivalent) objectives they were commissioned under, and one or more of the relevant WP objectives. This suggests that there could be considerable discrepancies between the different levels of objectives set for the SuD and CPP themes.

Based on the co-ordinator surveys, the level of funding was considered sufficient in FP5 projects but not in FP6 projects. For example, in the field of CPP fewer than 30% of the respondents considered the research budget adequate. In addition, the availability of (input) data was considered much better in FP5 than in FP6. The lead-user interviews did not corroborate these results; instead, based on the interviews, the cost effectiveness of the projects in terms of money or resources spent was considered better in FP6 than in FP5.

4.2 Implementation

In general, and based on all approaches, project management in both FPs was carried out satisfactorily. The level of expertise among project participants was considered high in both FPs by both the co-ordinators and the lead-users. Dissemination of project results, however, appeared to be a contradictory issue. On the one hand, the majority of project co-ordinators agreed that the project results in both FPs were adequately disseminated to the end-users, and the lead-users agreed regarding FP6 projects. On the other hand, in more general terms, e.g. for FP evaluation purposes, neither the level nor the quality of project result dissemination was adequate. At the time of the METRONOME evaluation, the project results were not easily available from a centralised web address.

4.3 Achievements

Within all three evaluation themes, the vast majority of strategic project objectives of the project sample were considered to be fully met. This indicates that on individual project levels, in terms of both substance and practicalities, the projects worked well. However, as argued in the previous section, this does not guarantee a positive contribution to higher-level objectives, since there seem to be discrepancies between the different levels of objectives and targets set for the FPs.

In the field of Industrial Competitiveness, major achievements were found in the fields of development of advanced technologies, processes and services, and in contribution to societal, environmental (e.g. safety, traffic congestion) and financial issues. These same fields were emphasised in both FPs. The main contributions of Sustainable Development projects in both FPs were identified as developing, integrating and managing a more efficient, safer, more secure and environmentally friendly transport system to provide user-friendly door-to-door services. Contribution to the development of decision-making tools was the main achievement of Community and Public Policies related projects in both FPs.

Based on both the lead-user survey and the co-ordinator survey, scientific publications and a high level of scientific expertise in general were considered the main immediate impacts of FP5. Improved networking between researchers and public/private organisations and strengthened networks between international parties were seen as the major immediate impacts of FP6. In addition, patents and standards produced in IndCo projects (especially in FP6) represent immediate impacts, even though the transport industry and service sector seem not to have been greatly involved.

As regards the intermediate impacts, the major successes of the activities of both FPs can be considered to be strengthened expertise, the development of decision-making tools, and co-operation with end-users in the projects. Contributions (often indirect) to new transport policy development, but also to new product or service development, were also considered slightly positive. The networking and co-operation evoked seemed to be strongest among project research partners, but could also be identified among stakeholders in both the public and private sectors. Failure to convert project results into standards, norms or regulations, and the fact that the projects did not raise new unsolved research questions, were considered to be the weaknesses of the FP5 and FP6 transport projects. This indicates that even though tools for decision-making have been developed and some contributions to e.g. transport and SuD policies and strategies have been made, the practical, regulatory outcomes have either been modest or are not known. The discrepancy identified between the FP objectives at different levels, and the low level of achievement of European-level objectives in the matrix evaluations, support this finding.

Evaluating the ultimate impacts, which may be realised ten or more years down the road, is a difficult task within all of the METRONOME evaluation themes, since most of the FP5 and FP6 projects are too recent. In addition, investigating the impact pathways and mechanisms (e.g. follow-up research projects and their impacts and consequences) was considered too time- and money-consuming for our case study resources, but should certainly be considered an essential part of future evaluations. Improved transport safety, awareness of the environmental impacts of transport and the consequent utilisation of developed environmental impact assessment methods, or even the implementation of identified transport measures, are examples of such ultimate impacts.

5 Discussion

Testing the METRONOME methodology illustrated that different mixes of evaluation methods (both qualitative and quantitative) are needed for evaluation of projects under the themes of IndCo, SuD and CPP. The main findings regarding the suitability of tested evaluation methods are presented below.

First, the project evaluation matrix provided an indication of the results of each research project evaluated, as well as a holistic summary of the research project findings, their contribution to objectives, and estimation of impact areas and types. The dissemination quality matrix supported the Final Report analysis by providing a more detailed indication of potential impacts of research projects. The matrix approaches were easy to apply, but time consuming. However, in order to gain a thorough understanding of the projects, their background and achievements, allocating enough time to their evaluation is necessary.

Second, the co-ordinator survey provided the co-ordinators’ self-evaluation of the potential and actual impacts of the projects. The results were useful as supplements to other evaluation methods but, as always with self-evaluations, they carry a risk of bias in the co-ordinator responses. The lead-user interviews were found to be the most valuable source of information regarding the actual use of research results. These kinds of interviews should be promoted in the future, combined with in-depth, long-term project impact evaluations, in co-operation with technology platforms and EC officers.

The third motivation for using various methods in thematic evaluations stems from the different time perspectives of the expected impacts. The most typical immediate impacts of all projects were publications and networking. Excluding those, IndCo projects were more likely to produce immediate results (e.g. patents and prototypes) than SuD and CPP projects, which focused more on intermediate impacts such as strengthened expertise, public discourse and support for decision making and strategy development (see also Fig. 1). The ultimate impacts in all themes were very difficult to evaluate because of the long-term perspective (10 years or more). In addition, the different target areas of thematic impacts would require different approaches.

The main difficulty encountered during the METRONOME methodological development was the availability of project result data (e.g. Final Reports). A structured, up-to-date FP project result database, ready and available for the evaluators, would enable more reliable, less time-consuming and less costly FP impact evaluations. Other major difficulties identified were the relatively low response rate in the co-ordinator survey and the interpretation of the multi-level objective and target structures of the FPs as the basis for evaluation. In order to avoid missing or misinterpreting objectives and targets, strategic research objectives and targets from official EC data sources should be ready and available to the evaluators. As regards the surveys, having the questionnaires sent officially by EC bodies could improve the response rate; responses could even be required as part of project proceedings.

In our view, the METRONOME evaluation presents only the first phase of an FP impact evaluation process. As it often takes a long time for project impacts to materialise, only a repeated (and progressively elaborated) evaluation process can provide a more detailed analysis of project or programme impacts. Further, and as a complement, more emphasis and resources are needed to integrate into evaluation methodologies those future-oriented elements (formative evaluation) that can better support strategic research and policy planning (including WP objective setting) in the changing European transport and research environment.

6 Conclusions and future research needs

Based on the testing of the METRONOME framework with a sample of 100 FP5 and FP6 projects, we can conclude the following. The evaluation methodology proved useful in producing information for the definition of performance targets for future FPs and for new research policy objectives from the perspectives of: (1) achieving FP objectives and targets, (2) the FPs’ implementation and operational environment, and (3) research project outcomes and impacts. These areas represent the traditional evaluation perspectives. Further, the framework provided information from new evaluation perspectives, such as using complementary evaluation methods, focusing on a wider perspective than just (managerial) implementation issues (e.g. WP structure analysis), and seeking alignment and mutual learning with research and policy development. Consequently, we may argue that the developed framework can be seen as a first step towards formative evaluation, the requested ‘strategic intelligence’, in transport research evaluations.

Testing our methodology showed that the achievement of objectives in both FPs was good throughout, and in some cases even very good. Potential impacts in all four impact groups (management and co-ordination, scientific, end-user and societal) were positive. The impacts of projects in both FPs were strongest within the management and co-ordination group. Scientific and end-user impacts were also adequate, but wider societal impacts were quite modest.

To conclude, it seems that FP5 and FP6 have certainly played a significant role in the European science and technology agenda. For evaluating the role of the FPs on the global map, or their contribution to EU research competitiveness at the international level, the project sample does not give a representative insight. Experience from the METRONOME evaluation methodology development and testing revealed the following future research needs in relation to FP impact evaluation and transport research in general, in the fields of IndCo, SuD and CPP.

Looking at the potential impacts of FPs on shaping the European Research Area (ERA), the most critical issues are the availability and dissemination of FP project result data. This concerns both the lower, individual project level and the centralised EC level. Currently, the project results are not easily available for the use of individual projects/persons or for FP evaluation purposes. Consequently, FP output quality needs improvement both at the project level (e.g. longer supported maintenance of websites) and at the Community level (a centralised FP project output database). Managerial incentives from the Commission, such as rewards and bonuses for successful projects and excellent R&D achievements, could help increase project quality in terms of both project results and dissemination activities.

Another important issue is the lack of consistency identified between the different levels of objectives set for the FP Work Programmes. Only a few of the evaluated projects met their own strategic objectives, the WP objectives on two levels and the relevant European policy objectives. In order to clarify future FP evaluation in terms of objective achievement, the consistency of the WP objective structure should be increased. In addition, the supporting role of current and future FP evaluation methodologies in WP objective/target setting should be analysed carefully and the methodologies developed further.

Other aspects identified as relevant for formative future FP evaluations were the following. First, close co-operation with technology platforms and EC officials in project evaluations might result in a more comprehensive and detailed view of project achievements and enhance the uptake of evaluation results. Second, and related to the former, investigating the follow-up research project paths that certain (groups of) projects have evoked might lead to a detailed understanding of the intermediate or even ultimate impacts of FP projects in a certain field. Third, including transport projects commissioned under programmes other than transport (e.g. Information Society, Environment and Security) in the evaluation could provide a more comprehensive view of the impacts of FP transport research. Finally, finding the right time for FP evaluation is always difficult. In our case, for example, FP5 and FP6 are not directly comparable in the evaluation because of temporal aspects: the later implementation of FP6 might have evoked, depending on the circumstances, more intense (positive or negative) responses in the surveys than the more distant FP5.