Introduction

Peer review is the foundation of quality assurance in scholarly research. Both for the social study of science and for research policy, it is fundamental to understand the processes determining the outcome of peer review – peer review sets the standard for good research and decides who gets tenure, and what kind of research is funded and published. The present study deals with the processes determining the outcome of peer review of grant proposals. The focus is on how such seemingly irrelevant or ‘innocent’ factors as rating scales and peer panels’ ranking methods and voting systems affect the assessments of proposals, and thus what kind of projects are funded. Do such factors influence what counts as good research? Do they influence de facto research ‘policy’?

Numerous studies of peer review focus on the reliability of, and the possible biases in, peer review, and find low degrees of agreement between referees and various kinds of bias: academic and institutional status, nationality, gender and research field of the applicant, as well as different kinds of cognitive bias, are all found to affect the outcome.Footnote 1 Other studies focus on criteria and find a common ‘language’ in evaluation of research quality, certain criteria that researchers pay attention to.Footnote 2 The combination of findings of low degrees of agreement between referees and of a common set of criteria for assessments of research quality indicates that while there is a certain set of criteria that reviewers pay attention to – more or less explicitly – these criteria are interpreted or operationalized differently by various reviewers.

There are no clear norms for assessments, and there may be a large variation in what criteria reviewers emphasize – and how they are emphasized. The determinants of peer review may in this way be accidental: for example, who reviews what research and how reviews are organized may determine outcomes, and this process may be open to various kinds of bias. As criteria have no standard operationalization or interpretation, there are ample possibilities, for instance, to choose interpretations that promote the personal favourites of the reviewers.

It should be noted that the concept of bias is seldom discussed, but is interpreted in various ways. Some studies that find disagreements among reviewers interpret these as some sort of ‘cognitive particularism’ (Travis & Collins 1991), or ‘confirmatory bias’,Footnote 3 while others interpret disagreements as ‘real and legitimate differences of opinion among experts about what good science is or should be’ (Cole et al. 1981, p. 885).Footnote 4 Such divergent interpretations reveal a lack of common understanding not only of the notion of bias, but also of what counts as a legitimate consideration when assessing research. One view may be that ‘grant applications should be judged on universalistic criteria, such as the scientific merit of the proposal’ (Travis & Collins 1991, p. 325), and that ‘school of thought’ is a particularistic criterion. This implies that peer review should use uncontroversial criteria, and not take a stand in ongoing debates. When scholars disagree, such ‘scholarly-neutral’ assessments may not be feasible.

Another point of view is to see the reviewers as central actors in the definition and redefinition of ‘good research’. In this view, low inter-reviewer agreement on a peer panel is no indication of low validity or low legitimacy of the assessments. In fact, it may indicate that the panel is highly competent because it represents a wide sample of the various views on what is good and valuable research (see Harnad 1985). Broad representation of divergent judgements and open debate about criteria and assessments are then desirable, and focussing on how various models manage these concerns is consequently important.

The problems of handling disagreements between reviewers, and the need to understand the effects of various peer review models, are increased by today’s high rejection rates for grant proposals. When a very small proportion of projects are funded, the effects of grant review as censorship against certain kinds of research (cognitive bias) or researchers (for example, institutional bias or gender bias) may be high. Organizational factors may increase or reduce the problems of cronyism and conservative assessments, as well as the arbitrariness of outcomes.

The main characteristic of peer review – that quality criteria have no standard operationalization, and that judgements depend on the ‘intimate craft knowledge’ (Ravetz 1971, p. 274) of the reviewers – is the main problem for students of peer review; biases are hard to prove for outsiders.Footnote 5 The aim of the present study is to understand and explain the decision-making processes of grant review and their policy effects. There are no attempts to identify ‘biases’ on the basis of quantitative correlation between organizational factors and outcome. The review processes of different review units are contrasted to gain insight into the factors decisive for the overall direction of review outcome in terms of policy effects, not in terms of measuring specific biases. Consequently, the study does not aim at conclusions about whether peer review is ‘reliable’ or ‘biased’, nor at defining such terms in relation to peer review. The aim is a more general understanding of the decision-making processes of grant review, and how organizational constraints influence review outcome. The constraints in focus are: review guidelines; rating scales; the review panels’ ranking methods; disciplinary versus multi-disciplinary panels; mail reviews versus panel reviews; and budgets. The kind of ‘bias’ in focus is the effect on the overall direction of the outcome, in terms of the weight put on various research policy objectives: scholarly pluralism; innovative research; the strengthening of weak research fields; geographical distribution of funds; and priority to female applicants. These effects are analysed through the weight put on such concerns in review documents, panel discussions and panels’ ranking lists.Footnote 6

Data Sources and Methods

This study includes grant reviews for the Research Council of Norway (RCN) for 1997/98 in 10 different fields: economics; history; social anthropology; philology; interdisciplinary social science and humanities; clinical medicine; pre-clinical medicine; biology; environment and development research; and mathematics. Fields were selected to cover all the different review models of RCN and a broad variety of research fields.

Data sources are 619 applications and review documents, direct observation of the panel meetings, and interviews with 25 panel members. Fieldnotes were taken at the panel meetings, and interviews were taped, transcribed and analysed with the help of the NUD*IST software for qualitative data analysis.Footnote 7 The interviews with panel members were semi-structured and dealt with the objectives of distributing research grants, review criteria, the different points of view on the panel, the decision processes, the effects of panel discussions on the panel members’ assessments, views on various models for grant review, the role of the research council staff, and the relations between the panel and the research council.

One selected panel refused to be observed and was replaced by another panel (the stated reason for refusal was that they did not want their meeting disturbed by an outsider). Panel members were given general written information about the objective of the project (to study the policy implications of different models of grant review).

The review documents, the fieldnotes and the interview transcripts were analysed with regard to the emphases on different criteria for assessing applications, and the arguments used for the ranking of applications (all observation, interviewing, transcriptions, categorizing and analyses were done by the author). Both for the assessment criteria and the ranking arguments, the main categories used for the analysis included (with examples of criteria/arguments in brackets):

  • the applicants’ prior merits (publications, citations, originality, solidity, experience in the research field, position/reputation of the applicant/group or institution);

  • the project descriptions (methods, clarity, originality, up-to-dateness);

  • the expected value of the projects (scholarly/scientific value, expected use/applicability for specific audiences or in general);

  • distributional policy (research field, institution, region, gender);

  • research policy objectives (building up research competence within specific fields/‘needs’ of the fields, [large] multi-disciplinary projects, international collaboration, scholarly breadth/pluralism/diversity, national importance of the field);

  • other considerations (budget and budget obligations, maximizing the panel budget, applicant’s prior/other grants).

Analysis of the Grant-Review Practices of RCN

The Various Models

The Research Council of Norway (RCN) practises several different models of grant review, and is therefore especially suited for the study of implications of different models. The present Council is a merger of the five previous Norwegian research councils (one council for basic research and four sector councils). Today’s models for reviewing grant applications are partly adopted from the old councils, partly results of reforms after the merger. The processes of allocating general grants (‘responsive mode’ funding) in four divisions were studied:Footnote 8

  • Medicine and Health Division: There were 4 medium-sized peer panels (10 members each) that reviewed applications in their respective areas. There was no mail review. Each of the panel members marked all applications on a fine-graded scale (1.0–4.0), and tables of these individual marks and average marks were set up before the panel meetings (available only to the chair of the panel). Panel decisions were based on discussion, average marks, and the chair’s discretion.

  • Culture and Society Division (the social sciences and humanities): There were 15 small discipline-based peer panels (3–5 members) that reviewed applications. A review of each application, with marks on a 4-graded scale, was written by one of the panel members before the panel meeting. Advisory mail reviewers – selected by RCN-staff and panel chair in collaboration – might be used, but seldom were. Panel decisions were based on discussion, negotiation and/or majority rule.Footnote 9

  • Science and Technology Division: Here the administrative staff ranked the applications based on mail review reports. There were usually two reviewers per application, selected by RCN staff from a pool of (Scandinavian) reviewers within the field. A 5-graded scale was used, and there were extensive guidelines for reviewers. There were three multidisciplinary advisory panels that commented on the staff’s ranking of the applications related to their areas, but these panels had no concrete influence on the outcome. When ranking the applications, staff used the average marks from the (two) mail reviews and criteria for priorities given by the Division Board. When a more fine-graded ranking was needed to reach a decision, staff had discretion to interpret reviews.

  • Environment and Development Division: There was one medium-sized peer panel (9 members), with representatives of a broad range of disciplines, which reviewed all applications for general grants. There was one advisory mail reviewer per application, selected by RCN staff. The criteria for selecting the mail reviewers were not specified, and were unclear to the panel members, as they were not involved in this process. Responsibility for the applications was divided among the panel members, and the panel members’ assessments were presented orally at the panel meeting. Panel decisions were based on discussion, negotiation and/or majority rule. The mark set by the mail reviewer was altered by the panel for 51% of the applications.

In all divisions, the formal decisions on grants were taken by the Division Board.Footnote 10 The panels’ ranking of the applications, and the RCN staff’s ranking in the case of Science and Technology, were advisory to the Board, but in reality these rankings were the final outcome. The influence of the Board was more indirect: appointing the review panels, allocating the budgets between panels, and setting the review criteria and guidelines. It should be added that there were substantial differences in funding rates: in the studied units of review, from 12% to 51% of the applications were funded (see Table 13.1). Each panel was given a budget for new applications before its panel meeting. This budget might be adjusted by the Division Board when it judged the lists from the various panels, or it might be reduced or increased for all panels because, at the time of the panel meetings, the budget proposal had not yet been passed by the Parliament.

Table 13.1 Overview of Studied Proposals and Grants (RCN 1997/98)

The main criteria for selecting the members of the review panels were research competence and coverage of the panel’s field, and a fair representation of regions and gender. Rules for handling conflicts of interest were common to all divisions: in the case of any affiliation to an application (for example, an applicant from the department of one of the panel members), the involved panel member leaves the meeting during the discussion and the ranking of the application. The formal written criteria for review varied between the divisions. Translated quotes from the guidelines are given in the Appendix (below: pp. 839–41), to illustrate the differences. In addition to these review criteria, the guidelines for review included policy directions (see the section on ‘Considerations Other than Research Quality in the Assessments’, below: pp. 828–31).

Overview of Proposals and Grants

General grants (responsive mode funding) at RCN are organized into 30 review units/fields. This study encompasses 10 of these units of review. Table 13.1 shows the number of proposals, budget restrictions, grading, proportion of successful proposals, and proportion of successful female applicants for each of the studied units.

As illustrated in the first column of Table 13.1, there was a large variation in the number of proposals between the review units. The panel for interdisciplinary social science and humanities reviewed 23 proposals for the studied year, whereas the panel for environment and development research reviewed 122 applications. Comparing the columns ‘applied sums in relation to RCN budget’ and ‘% of proposals funded’, we see that the latter is substantially higher than the former. This is because project budgets were cut before funding, which gave room for funding a larger number of proposals.

The column ‘% of proposals judged as “clearly fundable”’ tells us more about the different rating scales and directions for rating than about differences in the quality of proposals. Within the social sciences and humanities, there was a 4-graded scale on which the best mark was ‘clearly fundable’. Within the sciences and within environment and development research, there was a 5-graded scale on which the two best marks were ‘clearly fundable’. Within medicine, there was a fine-graded scale (1.0–4.0) on which 1.0–1.9 was ‘clearly fundable’. Medicine differed from the other divisions in that there were no restrictions against funding a proposal that was not marked ‘clearly fundable’ (for instance, a proposal getting 2.2 might get funds). The result of the demand that a proposal be marked ‘clearly fundable’ to be funded in the social sciences and humanities was little differentiation in marking: a large proportion of the proposals got the best mark (which was needed to be part of the priority discussion of the panels). In conclusion, the percentage of proposals judged as ‘clearly fundable’ in the different review units does not tell us much about differences in the quality of the proposals. For instance, the low percentages of clearly fundable proposals in medicine reflect different directions for review, and more differentiated rating.
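To make the differences between the rating regimes concrete, the following minimal sketch (in Python) restates the ‘clearly fundable’ rules described above. The division labels, the assumption that 1 is the best mark on the graded scales, and the example marks are illustrative only; they are not taken from the RCN forms.

```python
# A minimal sketch of the 'clearly fundable' rules described above.
# Division labels and example marks are illustrative; on the graded scales,
# 1 is assumed here to be the best mark.

def clearly_fundable(division, mark):
    """Return True if a mark counts as 'clearly fundable' under the division's rating scale."""
    if division == "culture_and_society":          # 4-graded scale; the best mark qualifies
        return mark == 1
    if division in ("science_and_technology", "environment_and_development"):
        return mark in (1, 2)                       # 5-graded scale; the two best marks qualify
    if division == "medicine_and_health":           # fine-graded 1.0-4.0 scale; 1.0-1.9 qualifies,
        return 1.0 <= mark <= 1.9                   # but other proposals could still be funded
    raise ValueError(f"unknown division: {division}")

print(clearly_fundable("culture_and_society", 1))     # True
print(clearly_fundable("science_and_technology", 2))  # True
print(clearly_fundable("medicine_and_health", 2.2))   # False, yet such a proposal might still get funds
```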

The column ‘% of female applicants funded’ shows that proposals from female applicants had a somewhat lower chance of being funded than proposals in general. This was most evident in anthropology, interdisciplinary social science and humanities, philology, and mathematics. It should be added that the numbers of female applicants varied and were particularly small in economics (4), interdisciplinary social science/humanities (4), and mathematics (3). As the number of funded proposals was also small, statistics for several years are needed to draw any conclusions about discrimination against female applicants. Analysis of prior funding (obligations for the particular year) showed that anthropology, interdisciplinary social science/humanities and philology had a majority of grants going to female applicants.

The Quality Criteria in the Assessments

There are notable differences between the RCN-divisions regarding the content of the studied reviews. With one important exception, the clearest differences in the weight given to the various quality criteria do not seem related to the differences in directions and lists of criteria given to the reviewers. At the same time there are substantial differences between the review units within the divisions – which operate under the same directions – further substantiating that directions to reviewers are of limited importance. Examples illustrating the assessments (of highly-ranked applications), and the differences within the divisions, are given in Table 13.2.Footnote 11

Table 13.2 Review criteria

The most striking differences (see Table 13.2) between the reviews of the different RCN-divisions regard the weight given to criteria related to the project description within the Social Sciences and Humanities, the weight given to criteria related to the prior merits of the applicant (and his/her group) within the Medicine and Health Division, and the special emphasis put on the centrality and the importance of the research field in the reviews within the Science and Technology Division.Footnote 12 The reviews in the Social Sciences and Humanities were more concerned with criteria related to the project description than those in the other divisions. There is nothing in the guidelines for reviews indicating such different emphases (see the Appendix). On the contrary, if any of the divisions’ guidelines may be said to put more emphasis on criteria related to the project description, it would be those of the Medicine and Health Division, which put these criteria first and clearly set up more detailed criteria than did the Culture and Society Division. The reviews in the medical sciences, however, put less weight on the project description and more weight on prior research merits and reputation than did the reviews in the other divisions.

Reviews within Mathematics and Biology were more concerned about the centrality and the importance of the research field than were reviews in other divisions. This special emphasis can be related to a specification which is found only in the guidelines of the Science and Technology Division:

What importance does research in this area have for the development of the field (the area of research in question may no longer be of importance or, at the other end of the scale, be new and rapidly expanding)?

The first part of this question is found both in the Science and Technology guidelines and in the Environment and Development guidelines, whereas the specification in parentheses is found only in the Science and Technology guidelines. In this case, then, even a parenthetical difference in the guidelines had a clear impact on the content of reviews.

As shown in Table 13.2, there are also differences within divisions. Review units that operated under the same guidelines differed clearly with regard to emphases on the various quality criteria. Within Medicine and Health, more weight was placed on applicants’ prior merits in the pre-clinical panel than in the clinical panel. Within Culture and Society, there was less weight on the project description in the economics panel than in the other panels. Within Science and Technology, there were more detailed assessments of the project descriptions within biology than within mathematics.

In conclusion, weight on quality criteria differed both between and within the RCN-divisions, regardless of guidelines.

Considerations Other than Research Quality in the Assessments

When applications were given equal rating on scholarly quality, considerations related to policy objectives and distributional policy affected the ranking. As with the quality criteria, these considerations differed both within and between the divisions.

The Divisions’ Boards, to varying degrees, gave directions on distributional policy and research policy objectives to be included in the review:

  • The Board of the Medicine and Health Division gave a limited set of policy directions. These concerned the Board’s priorities regarding gender and two of the funding modes: ‘applications for postdocs are to be given priority’, ‘the number of female postdocs should be increased’, and ‘senior fellowships have low priority within the Medicine and Health Division’. In addition, research needs and priorities had their own separate points on the review form: ‘Field with special national needs for new knowledge; Field with special national conditions for doing research; Field with special national needs for building up new competence; Field with good alternative funding sources’.Footnote 13

  • The guidelines within the Culture and Society Division summarized the policy priorities in this way: ‘Support research recruits in fields with a need for recruits; Support projects that lead to scholarly innovation of research fields and groups; Support projects with a potential for internationalization; Support projects that strengthen the recruitment of female researchers in fields with a low percentage of females, and projects that may lead to more females obtaining tenure’. They were also told to consider the needs for geographical distribution, for increased activity in particular fields, and for scholarly breadth versus depth. In sum, the guidelines allowed special arguments for most kinds of applications. It was still emphasized that most weight should be put on the need for research recruits.

  • In the Science and Technology Division, the administrative staff ranked applications based on mail reviews and the Division Board’s policy priorities. The Board’s priorities included internationalization, research recruits, the national importance of the research fields, scholarly diversity, as well as distribution on gender and institutions, and the applicants’ prior/other resources.Footnote 14

  • The Environment and Development panel was given no policy directions. However, the Division Board had more direct budget control over priorities, as it divided the budget into separate sums for fellowships for research recruits, for larger group projects, and for ordinary projects.

The considerations taken were only partly related to these directions. An example of implemented directions is the need for research recruits, which was given high priority within most units of review. The priority given to female applicants is an example of a priority not related to the variation in directions. Within the two divisions with general directions for priority to female applicants (the Culture and Society Division and the Science and Technology Division), explicit priority to female applicants was rare: the sample included priority to one female researcher within mathematics and a general priority to female applicants within economics. Except for the economics panel, the studied panels within the Culture and Society Division saw no need to be concerned about the gender of applicants (see the section on ‘Overview of Proposals and Grants’, above). With regard to mathematics, there were disagreements on the topic. In the staff’s ranking of applications, explicit priority was given to one of the three female applicants within mathematics. The advisory panel that commented on the ranking of proposals in mathematics wanted to give priority also to another female applicant, who was number one among those not recommended for funding. This was argued against in the administration’s recommendation, which was accepted by the Division Board:

The opinion of the administration is that an individual fellowship is a funding mode that should primarily be assessed by the quality criterion. ... Based on this criterion ... the administration maintains its initial recommendation.

Contrasting this limited concern for the funding of female applicants in the divisions that gave general directions to prioritize them with what happened in the divisions that gave no such directions, we find that the situation was much the same: within Medicine, there were cases of priority to female applicants within all kinds of applications, not only the postdocs specified in the directions; within Environment and Development, the panel gave explicit priority to a female applicant, although it had been given no directions about such priorities.

Priorities related to geographical distribution also differed, regardless of directions. Such priority was given within Medicine, where there were no such directions, but not within Culture and Society, which had such directions. Priorities other than those specified in the directions also included distribution over different research fields within Environment and Development, and priority to small/weak research fields and considerations of needs/quality in terms of prior funding within Medicine. For instance, research field was a major concern when the Environment and Development panel ranked applications for fellowships, and the five candidates ranked above the cut-off line were all from different disciplines: sociology, history, biology, geography and agriculture.

We will now turn to a discussion of the conditions for taking considerations other than research quality into account.

The Effects of the Various Ranking Processes

The rough-rating scales and open decision-making processes within the Culture and Society Division and the Environment and Development Division gave ample room for research policy considerations, such as scholarly pluralism and support to innovative projects. In these two divisions, the clearest examples of processes in favour of support to innovative projects were observed: enthusiastic panel members managed to change the panels’ views on projects that first were seen as too risky, peripheral or immature. In one case (in one of the Culture and Society panels), the chair and a new panel member had divergent opinions about an application. On the first day of the meeting, when all applications were graded, the chair said it should have the next best mark, which meant no grant. The new member wanted to give it the best mark, which meant it might get a grant (as it would be included in the ranking process the following day):

Chair: The applicant does not substantiate the reasons for doing what she wants to do, and the reasons are not evident.

New member: I disagree with your reading of the project description. The research of [name of another researcher] seems to show that this is well substantiated. I see this as a springy project [potential for jumping high or long].

The new member was supported by another panel member, and the application ended up with the best mark. In the ranking process the following day, the new member stated what should be the three top-ranked applications from the point of view of ‘springiness’, and managed to get the disputed application placed among the three at the top, and consequently granted (his three proposed top candidates ended up as the three top candidates).

Medicine and Health’s ranking processes, on the other hand – with a larger number of panel members all marking every application individually, a fine-graded rating scale, and decisions based on the average of the individual marks – promoted more thoroughness and predictability. The room for explicit considerations other than research quality was rather modest compared to the other divisions, though larger budgets per panel secured the possibility of funding a broad spectrum of research. The decision-making processes of the two observed medical panels differed: one panel put more weight on average marks, whereas the chair’s discretion was more central in the other panel. In the panel depending most heavily on average marks, it was left to the individual panel member to adjust his/her marks after the panel discussion (and a new average was calculated if someone did). This process was more conservative with regard to letting the discussion influence the outcome than the process on the other panel. This discussion of an application was typical:

Chair: The average mark is 2.3, the variance is generally low. [Panel member 1] gives it 2.9?

Panel member 1: The applicant has little experience. . . .

Chair: [Lists applicant’s prior merits]. I think this is a fundable project.

Panel member 2: It is somewhat incomplete.

Panel member 3: I don’t think so. . . .

Panel member 2: The design is not stringent, it is somewhat imprecise. . . .

Panel member 1: Yes, the planning is incomplete.

Chair: . . . No one changes his mark, so 2.3 is upheld.

On the other panel, the chair had the power to adjust marks after the panel discussion (though in some cases the chair had to revise his decisions when panel members objected), and he also initiated proposals to move applicants up and down the ‘final’ list, so as to include various policy objectives more effectively. The example below illustrates this role of the chair (the first panel member was primarily responsible for the review):

Panel member 1: The applicant is well qualified. The project has some formal shortcomings, but it is important to establish this kind of project in Norway. I say 2.1.

Panel member 2: I gave 1.9. There are some clear frustrations in the application about the conditions for doing research. They have got unique competence and from a strategic point of view this has top priority. It is a challenging project. . . The ambitions may be too high.

Chair: The applicant is perfectionist and might manage. Do we say 2.1? The average was 2.2.

Panel member 3: At the minimum.

Chair: Then we say 2.0.

The processes in this panel gave considerably more leeway for policy and distributional concerns than those of the former panel, which relied more on the average marks. In the former panel, the initial average mark was changed for 34% of the applications, whereas in the panel where the chair used his discretion to adjust marks, the initial average mark was changed for 48% of the applications.
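The contrast between the two procedures can be summarized in a short sketch. This is a hypothetical illustration: the ten marks, the function names and the adjustment values are assumed for the example and are not data from the observed panels.

```python
# A hypothetical sketch of the two mark-adjustment procedures described above,
# on the Medicine and Health Division's fine-graded 1.0-4.0 scale (lower is
# assumed to be better). All marks and names below are illustrative.

def average_mark(marks):
    """Average of the individual marks set by the panel members before the meeting."""
    return round(sum(marks) / len(marks), 1)

def outcome_average_based(marks, adjustments):
    """First panel: after the discussion, each member may adjust his or her own
    mark; a new average is calculated only if someone does."""
    adjusted = [adjustments.get(i, mark) for i, mark in enumerate(marks)]
    return average_mark(adjusted)

def outcome_chair_discretion(marks, chair_mark=None):
    """Second panel: the chair may set a consensus mark after the discussion
    (subject to objections from the panel); otherwise the pre-meeting average stands."""
    return chair_mark if chair_mark is not None else average_mark(marks)

marks = [2.1, 2.4, 2.3, 2.2, 2.5, 2.3, 2.4, 2.2, 2.1, 2.5]   # ten members' pre-meeting marks
print(average_mark(marks))                    # 2.3
print(outcome_average_based(marks, {}))       # nobody changes a mark, so 2.3 is upheld
print(outcome_chair_discretion(marks, 2.0))   # the chair proposes 2.0 after the discussion
```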

In the Science and Technology Division there were only two reviewers per proposal, and no panel was involved in comparing and ranking the proposals.Footnote 15 In general, the RCN staff who ranked the proposals seemed more concerned about getting the ranking ‘right’ with regard to the scientific merit of the applications than the panels were (see, for example, the panel’s concern with gender priorities in mathematics, discussed above). When several applications had the same average mark, the staff looked to the content of the reviews and the graduation marks of research recruits, whereas review panels were more concerned about distributional policy.

Only 30% of the studied applications within Science and Technology got the same mark (on the scale from 1 to 5) from the two mail reviewers. This means that outcomes may depend heavily on which reviewers are picked for a particular proposal (randomness). The individual reviewer had a significant role in determining the outcome for single applications. In this way, the composition of the pool of reviewers may be important for the scholarly pluralism in the overall outcome of review. The large number of reviewers in the pool, and the disagreements between the reviewers, indicate some scholarly pluralism in the pool, and should also give leeway for scholarly pluralism in the final outcome. On the other hand, few reviewers per application and no scholars to compare and rank the whole portfolio of applications give ample room for randomness – which may or may not be moderated by the competence, discretion and guidelines of the RCN staff who rank the proposals. Among the interviewed members of the advisory panels within the division, there were some doubts about the system. The most critical expressed it like this:

Now it is one staff-member that does it all. I am convinced that we [the advising panel] are more able to do it.... We cannot have a system that relies on the [one] staff-member being good.

In conclusion, the models studied support different outcome profiles: leeway for scholarly pluralism and innovative/risky projects within the Culture and Society Division and the Environment and Development Division; thoroughness and predictability – and consequently more conservative assessments – within the Medicine and Health Division; and randomness and possibly scholarly pluralism within the Science and Technology Division. The effects of some of the organizational factors are further explained in the next section.

Important Organizational Factors Affecting Ranking

Distributional policy and research policy objectives may or may not be decisive for the outcome of grant review. A central finding is that the available budget and the rating scale both affect the degree to which such considerations are taken. Ample budgets and a rough-rating scale give much more room for policy priorities than do tight budgets and fine-rating scales. When there are funds only for a few highly-selected projects, the wide range of policy objectives given in (some of the RCN divisions’) guidelines is impossible to fulfil. The panels do not want policy priorities to overrule research quality assessments, and instructions to include policy concerns carry little weight in such a context. There is simply ‘no room’ for distributional policy. With ample budgets, on the other hand, there is room for funding more than a small number of ‘obviously best’ applications and, with a rough-rating scale for research quality, the panel ends up with several applications with identical marks. In such a situation, the panels seemed glad to have a set of policy priorities to help them reach decisions: criteria for ranking the group of identically-rated, possibly-funded applications. The existence of a set of identically-rated, possibly-funded proposals seemed a central condition for giving priority to research with special needs (strengthening weak fields), and for taking the distribution over various subfields into consideration (pluralism). Also, original/innovative projects that do not easily compete with projects formulated within established traditions and methods have better chances of funding when there is room for more than just the top-rated proposals. Tight budgets and fine-rating scales, on the other hand, tend to strengthen established research and give less pluralism in funded research. The panel discussion of an unorthodox application (which did not receive a grant) within a field with high rejection rates illustrates that budgets may also be a direct argument against risk projects:

Panel member 1: I doubt this project, I don’t think it will succeed.

Panel member 2: It has got charm, it tries to do away with the force of gravity!

Chair: With a better budget we could have taken the chance on a wild card a year.

In addition to budgets and rating scales, the decision-making process itself is found to be important for the scholarly pluralism in funded research. The ranking of applications depends heavily on the method applied. Methods implying that all panel members get their favourite candidate funded secure pluralism far better than methods eliminating proposals to which a majority of panel members do not give priority (given some scholarly pluralism on the panel). On the other hand, the former methods let single panel members decide outcomes, and are thus open to accidental circumstances.

Table 13.3 illustrates a situation where three different methods of ranking give very different outcomes. With method 1, each panel member has as many votes as there are applications to be funded. With method 2, each panel member has one decisive vote. With method 3, applications not to be funded are excluded by majority votes. The table illustrates the logic of methods that were observed at the panel meetings when ranking applications with identical marks. In the meetings, the decision-making processes were far from as simple, explicit and structured as in Table 13.3. An example from the meetings is given in Table 13.4 – an example in which elements from various methods were combined.

Table 13.3 The results of different methods for the ranking of applications a-f, of which there is room for three within the budget
Table 13.4 Combination of elements of methods 2 and 3, and other methods, in one of the Culture and Society Panels

The situation in Table 13.3 is simplified to include only three panel members and six applications, of which only three may be funded. Only application ‘b’ is funded with all three methods of ranking:

  • Method 1: All members propose 3 candidates for funding; that is, they have 3 votes each. Application b gets 3 votes and first rank; e gets 2 votes and second rank; a, c, d and f get 1 vote each. Third rank goes to member X’s favourite, a, because X has only one of his ‘votes’ among the first two on the funding list, while the other two members each have two.

  • Method 2: All members propose 1 candidate for funding; that is, they have 1 vote each. The three applications that receive one vote are funded. These are a, b and f.

  • Method 3: Elimination. Candidates to be eliminated from funding are voted on in the order they are proposed. The outcome depends on the chosen voting order. Here the assumed voting order is f, c, a. These three are all eliminated by votes of two against one, and the three applications remaining for funding are b, d and e.
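The logic of the three methods can also be spelled out in code. The sketch below uses a hypothetical preference profile (it is not the data behind Table 13.3), and the rule used for the elimination votes in method 3 (a member is assumed to vote to eliminate an application lying in the lower half of his or her ranking of the applications still in play) is an assumption introduced for illustration.

```python
# A sketch of the three ranking methods, using a hypothetical preference
# profile (not the data of Table 13.3): three members, six applications,
# room for three within the budget.

from collections import Counter

preferences = {                        # each member's ranking, best first (illustrative)
    "X": ["a", "b", "d", "e", "c", "f"],
    "Y": ["b", "e", "c", "a", "d", "f"],
    "Z": ["f", "b", "e", "d", "c", "a"],
}
BUDGET = 3                             # number of applications that can be funded


def method_1(prefs, k):
    """Each member has k votes (their k favourites); the k applications with most
    votes are funded. Ties are broken alphabetically here; the observed panels
    broke ties in favour of the member with fewest favourites already on the list."""
    votes = Counter(app for ranking in prefs.values() for app in ranking[:k])
    return set(sorted(votes, key=lambda app: (-votes[app], app))[:k])


def method_2(prefs):
    """Each member has one decisive vote: every member's single favourite is funded."""
    return {ranking[0] for ranking in prefs.values()}


def method_3(prefs, k, elimination_order):
    """Applications are eliminated by majority vote, in the order they happen to be
    proposed, until only k remain. A member is assumed to vote for elimination if the
    application lies in the lower half of his or her ranking of those still in play."""
    remaining = set(next(iter(prefs.values())))
    for candidate in elimination_order:
        if len(remaining) <= k or candidate not in remaining:
            continue
        votes_out = 0
        for ranking in prefs.values():
            still_in = [app for app in ranking if app in remaining]
            if still_in.index(candidate) >= len(still_in) // 2:
                votes_out += 1
        if votes_out > len(prefs) / 2:
            remaining.discard(candidate)
    return remaining


print(method_1(preferences, BUDGET))                    # {'a', 'b', 'e'}
print(method_2(preferences))                            # {'a', 'b', 'f'}
print(method_3(preferences, BUDGET, ["f", "c", "a"]))   # {'b', 'd', 'e'}
```

With this particular profile, only application b is funded under all three methods, mirroring the pattern described above; the point is that the funded set changes with the method even though the preferences do not.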

A likely reaction when seeing the methods spelled out and analysed as in Table 13.3 is that serious funding agencies should and would never allow such arbitrary outcomes, and that the RCN findings must be exaggerated or exceptional. It should be noted that the table explains the underlying logic of methods that were adopted ad hoc by the observed panels. The methods had no formal status, and no stated rationale. The panels had to make a decision in one way or another, and found a way to do it. There are also more general grounds for believing that the problem is widespread. Social choice theory has shown that there are fundamental problems in the aggregation of preferences. Voting methods do affect outcomes, and the choice between methods is problematic, as there are no simple clues as to which method is better (from the point of view of a set of fundamental requirements).Footnote 16
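The underlying difficulty can be illustrated by the classic majority-cycle example from social choice theory. The sketch below is a textbook illustration, not drawn from the RCN material.

```python
# Classic Condorcet cycle: three members, three proposals. Every proposal loses
# to another by a 2-1 majority, so majority voting alone cannot produce a stable
# ranking. Purely illustrative, not RCN data.

from itertools import permutations

rankings = {                 # each member's ranking, best first
    "member_1": ["a", "b", "c"],
    "member_2": ["b", "c", "a"],
    "member_3": ["c", "a", "b"],
}

def majority_prefers(x, y):
    """True if a majority of members rank x above y."""
    wins = sum(r.index(x) < r.index(y) for r in rankings.values())
    return wins > len(rankings) / 2

for x, y in permutations("abc", 2):
    if majority_prefers(x, y):
        print(f"a majority prefers {x} over {y}")
# Prints: a over b, b over c, and c over a -- a cycle with no majority winner.
```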

The implications of the method chosen may be substantial. Given that the panel members’ different rankings (at least partly) represent conflicting scholarly norms and interests, method 2 gives funding to projects scoring very differently in this regard. The favourite candidates of members X and Z are funded, although these applications are at the bottom of other panel members’ lists. If agreement on high ranking indicates that projects are uncontroversial regarding research questions, scientific methods, and so on, while disagreement indicates controversial research and risk-projects, method 2 may fund controversial research and risk-projects, while method 3 tends to fund uncontroversial and safe projects. In this way, the panels’ choice of ranking methods has far-reaching implications for the chances of various kinds of research being funded.

Conclusions

The most unambiguous conclusions to be drawn from this study regard the effects of guidelines, rating scales, budgets and ranking methods.

The guidelines given to the panels had little effect on the criteria they emphasized,Footnote 17 whereas mail reviewers attempted more consciously to write reviews in accordance with the guidelines. Put more clearly, it seems that panels do as they like, whereas mail reviewers do as they are told – or, at least, mail reviewers phrase their reviews more in accordance with the guidelines, to make sure they have influence on the ranking of proposals. The criteria emphasized in the review documents, in the panel discussions and by the interviewed panel members were studied. The panel reviews within medical science were those most focused on the applicants’ prior merits, whereas the panel reviews within the social sciences and humanities were more focused on criteria related to the project description. According to their guidelines, the focus of these panels ought to have been the opposite. The mail reviewers within the sciences, on the other hand, wrote reviews more in accordance with their guidelines – with a specific focus on the centrality and importance of the research field (for example, stating that this is outdated research, or that this research will have central future importance), which was a specific concern of the guidelines within this RCN division.

The differences in reviews found within the divisions (that is, differences between units given the same guidelines) underline the limited effects of guidelines, both for panel reviews and for mail reviews. Furthermore, geographical priority, priority to female applicants and several other policy concerns were taken into account by panels contrary to their guidelines: the guidelines of some divisions told panels to include such concerns, but the panels did not; other divisions gave no instructions on such concerns, but the panels included them nevertheless.

Whereas the guidelines, which are supposed to influence reviews, did so to a limited extent, factors that are not supposed to influence outcomes were found to be much more important. The size of the budgets and the kind of rating scale applied were found to affect such policy concerns as the funding of fields with special needs, the distribution on the various sub-fields, priority to female applicants, and geographical priority. Also, original and controversial projects seemed to have better chances with ample budgets and rough-rating scales. With a rough-rating scale for research quality, the panel ends up with several applications with identical marks, and with a good budget, the panel may fund more than a small number of ‘obviously best’ applications. Such a situation, with identically-rated projects that may get funds, was a central condition for peer panels to rank applications on the basis of policy objectives. With the opposite kind of situation, with funds only for a few highly-selected projects that were ranked on the basis of a fine-rating scale, there was no room for supplementary policy priorities. The members of the review panels did not want policy priorities to overrule the research quality assessments. Tight budgets and fine-rating scales may therefore easily strengthen established research fields, and give less pluralism in funded research.

The ranking method applied by the panel may be decisive for the outcome of review. Some methods imply that all panel members get their favourite candidate funded. If the panel members’ different ratings represent conflicting scholarly norms and interests, and there is some scholarly pluralism on the panel, such methods ensure some scholarly pluralism. This kind of method also involves more randomness as, in reality, single panel members decide outcomes. On the other hand, there are methods that eliminate proposals to which a majority of the panel members do not give priority. It is argued that these methods tend to support uncontroversial and safe projects, as agreement on the high ranking of a proposal indicates that the project is uncontroversial regarding research questions, scientific methods, and the like, while disagreement indicates controversial research and risk-projects. In this way, the panel’s choice of ranking method may have far-reaching implications for the chances of various kinds of research being funded.

In sum, the organization of grant review is found to influence what counts as a good and relevant grant application: (1) guidelines seem to have limited effects on review panels; but (2) the review outcome is found to be highly dependent on (a) ranking methods, (b) rating scales and (c) budgets. Each of these factors may have far-reaching implications for the funding chances of applications for different kinds of research. Various organizational factors also interact and reinforce each other. As candidates for further study of how the organization of grant review affects outcomes, two hypotheses – covering a broad set of organizational factors – should, on the basis of the present study, be emphasized:

  • Ample budgets, rough-rating scales, heterogeneous panels and open decision-making processes give leeway for scholarly pluralism and innovative/risky projects.

  • Tight budgets, fine-rating scales, average marks and majority decisions tend to strengthen established research and give less pluralism.

Another finding is that administrative ranking gave less room for policy concerns than panel ranking, as administrative staff were more concerned about ranking on the basis of scientific merit. There is no obvious reason for this effect of administrative ranking, and further study may show that the effect of administrative ranking depends on the kind of administrative staff involved, and what instructions they are given (see Rip 1994, pp. 16-17).

It should be noted that there is an inherent tension between the different aims of research councils: good and reliable peer review on the one hand, and various policy aims on the other. Those review models that score highly on thoroughness and reliability do not score highly with regard to encouraging controversial projects or securing greater scholarly pluralism, and vice versa – leaving those trying to improve grant-review processes in a constant dilemma. Consequently, unambiguous recommendations for the design of grant review cannot be made, except that conflicting concerns should be balanced consciously.

At the beginning of this paper, two different views on peer review bias were presented, views on whether ‘school of thought’ is (or is not) a legitimate consideration when assessing research. Regardless of one’s standing on this question, one might embrace the intuitive supposition that processes properly aggregating marks on the scientific merit of the applications yield less bias than a process of discussions and negotiations.Footnote 18 The findings of this study might be used to challenge that view. Suppose we accept that the possibilities of ranking applications uniquely on the basis of neutral, universalistic criteria of scientific merit are limited (which must be said to be one central implication of decades of studies of peer review), and adopt the view that ‘school of thought’ is an inherent and legitimate basis of peer review, and that reviewers are central actors in the definition and redefinition of ‘good research’. This position opens up the view that peer-panel discussions and negotiations involving policy objectives may be a better way to avoid biases than processes simply aggregating marks on scientific merit. Review panels that are concerned about the distribution of funds across research directions/‘schools of thought’, gender, institutions, positions, and so on, may reduce such biases. The present study shows that peer panels are willing to take up such concerns, and in some cases do so more than their administrative staff. However, this willingness is restricted by budgets, and indirectly by rating scales. Such seemingly irrelevant factors affect the review outcome, as they decide whether panels deal only with ‘scientific merit’, or also include discussions and negotiations on distributional policy and other research policy concerns.