Introduction

It is well known that the average number of citations per publication varies significantly across scientific fields. The average number of citations per publication also varies with the age of a publication: older publications on average have more citations than newer ones. Because of these effects, citation counts of publications published in different fields or in different years cannot be directly compared with each other.

It is generally agreed that in citation-based research performance evaluations one needs to control for the field and the year in which a publication was published. In performance evaluation studies, our institute, the Centre for Science and Technology Studies (CWTS) of Leiden University, uses a standard set of bibliometric indicators (Van Raan 2005). Our best-known indicator, which we often refer to as the crown indicator, relies on a normalization mechanism that aims to correct for the field and the year in which a publication was published. An indicator similar to the crown indicator is used by the Centre for R&D Monitoring (ECOOM) in Leuven, Belgium. ECOOM calls its indicator the normalized mean citation rate (e.g., Glänzel et al. 2009).

The normalization mechanism of the crown indicator basically works as follows. Given a set of publications, we count for each publication the number of citations it has received. We also determine for each publication its expected number of citations. The expected number of citations of a publication equals the average number of citations of all publications of the same document type (i.e., article, letter, or review) published in the same field and in the same year. To obtain the crown indicator, we divide the sum of the actual number of citations of all publications by the sum of the expected number of citations of all publications.

As an alternative to the above normalization mechanism, one could take the following approach. One first calculates for each publication the ratio of its actual to its expected number of citations, and one then takes the average of the resulting ratios. An indicator that corrects for field differences using this alternative normalization mechanism was introduced by Lundberg (2007), who called it the item-oriented field-normalized citation score average. More recently, Opthof and Leydesdorff (2010) argued in favor of the alternative normalization mechanism. Their paper has been the starting point of a debate in the literature. A reply to Opthof and Leydesdorff was given by CWTS (Van Raan et al. 2010). Other contributions to the discussion were made by Bornmann (2010), Bornmann and Mutz (2011), Gingras and Larivière (2011), Leydesdorff and Opthof (2010, 2011), Moed (2010), and Spaan (2010). Indicators that rely on the alternative normalization mechanism are used by various institutes, including the Karolinska Institute in Sweden (Rehn and Kronman 2008), Science-Metrix in the US and Canada (e.g., Campbell et al. 2008, p. 12), the SCImago research group in Spain (SCImago Research Group 2009), and Wageningen University in the Netherlands (Van Veller et al. 2009). The alternative mechanism is also employed in studies by Colliander and Ahlgren (2011) and Sandström (2009, pp. 33–34).
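
To illustrate the difference between the two mechanisms, consider a hypothetical unit with two publications: one with 10 citations against an expected 5, and one with 1 citation against an expected 2. The ratio-of-sums mechanism of the crown indicator yields (10 + 1) / (5 + 2) ≈ 1.57, whereas the average-of-ratios mechanism yields (10/5 + 1/2) / 2 = 1.25. The numbers are invented, but they show that the two mechanisms can assign quite different scores to the same set of publications.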

In a recent paper (Waltman et al. 2011), we have presented a theoretical comparison between the normalization mechanism of the crown indicator and the alternative normalization mechanism advocated by Lundberg (2007) and Opthof and Leydesdorff (2010). The main conclusion that we have reached is that, at least for the purpose of correcting for the field in which a publication was published, the alternative mechanism has more satisfactory properties than the mechanism of the crown indicator. In particular, the alternative mechanism weighs all publications equally while the mechanism of the crown indicator gives more weight to publications from fields with a high expected number of citations. The alternative mechanism also has a so-called consistency property. Basically, this property ensures that the ranking of two units relative to each other does not change when both units make the same progress in terms of publications and citations. The normalization mechanism of the crown indicator does not have this important property.

At CWTS, we are currently exploring a new crown indicator, in which we use the alternative normalization mechanism. In this paper, we perform an empirical comparison between on the one hand the normalization mechanism of our current crown indicator and on the other hand the alternative normalization mechanism of the new crown indicator that we are exploring. The comparison that we perform provides a detailed empirical illustration of various issues discussed in the indicator debate initiated by Opthof and Leydesdorff (2010). Our focus in this paper is on the problem of correcting for the field and the year in which a publication was published. We do not consider the problem of correcting for a publication’s document type. We study four aggregation levels at which bibliometric indicators can be calculated, namely the level of research groups, the level of research institutions, the level of countries, and the level of journals. We pay special attention to the way in which recent publications are handled when the alternative normalization mechanism is used. Finally, we want to emphasize that this is an empirical paper. It is not our aim to argue on theoretical grounds in favor of one of the two normalization mechanisms. For a theoretical discussion of the two normalization mechanisms, we refer to our earlier work (Waltman et al. 2011).

Definitions of indicators

In this section, we formally define the CPP/FCSm indicator and the MNCS indicator. The CPP/FCSm indicator, where CPP and FCSm are acronyms for, respectively, citations per publication and mean field citation score, has been used as the so-called crown indicator of CWTS for more than a decade. The MNCS indicator, where MNCS is an acronym for mean normalized citation score, is the new crown indicator that CWTS is currently exploring.

Consider a set of n publications, denoted by 1, …, n. Let $c_i$ denote the number of citations of publication i, and let $e_i$ denote the expected number of citations of publication i given the field and the year in which publication i was published. In other words, $e_i$ equals the average number of citations of all publications published in the same field and in the same year as publication i. The field in which a publication was published can be defined in many different ways. At CWTS, we normally define fields based on subject categories in the Web of Science database. The CPP/FCSm indicator is defined as

$$ \text{CPP/FCSm} = \frac{\sum\nolimits_{i = 1}^{n} c_{i} / n}{\sum\nolimits_{i = 1}^{n} e_{i} / n} = \frac{\sum\nolimits_{i = 1}^{n} c_{i}}{\sum\nolimits_{i = 1}^{n} e_{i}} . $$
(1)

The CPP/FCSm indicator was introduced by De Bruin et al. (1993) and Moed et al. (1995). A similar indicator, the normalized mean citation rate, was introduced somewhat earlier by Braun and Glänzel (1990). The normalization mechanism of the CPP/FCSm indicator goes back to Schubert and Braun (1986) and Vinkler (1986). Schubert and Braun employed the mechanism for normalization at the level of journals, while Vinkler employed it for normalization at the level of fields. For a discussion of the conceptual foundation of the CPP/FCSm indicator, we refer to Moed (2010).

We now turn to the MNCS indicator (Waltman et al. 2011). This indicator is defined as

$$ \text{MNCS} = \frac{1}{n}\sum\limits_{i = 1}^{n} \frac{c_{i}}{e_{i}} . $$
(2)

The MNCS indicator is similar to the item-oriented field-normalized citation score average indicator introduced by Lundberg (2007). The normalization mechanism of the MNCS indicator is also applied in the relative paper citation rate indicator discussed by Vinkler (1996). The difference between the indicators of Lundberg and Vinkler is that Lundberg’s indicator normalizes at the level of fields while Vinkler’s indicator normalizes at the level of journals. Comparing Eqs. 1 and 2, it can be seen that the CPP/FCSm indicator normalizes by calculating a ratio of averages while the MNCS indicator normalizes by calculating an average of ratios.
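
The computational difference between the two mechanisms can be made concrete with a small sketch. In the following Python fragment, the citation counts and expected citation counts are invented for illustration only; the fragment simply evaluates Eqs. 1 and 2 for the same set of publications.

```python
# Sketch: ratio of averages (CPP/FCSm, Eq. 1) versus average of ratios (MNCS, Eq. 2).
# The citation data below are invented for illustration only.

actual = [12, 3, 0, 25, 7]              # c_i: citations received by each publication
expected = [8.0, 4.5, 1.2, 20.0, 6.3]   # e_i: average citations in the same field and year

# Eq. 1: divide the sum of actual citations by the sum of expected citations
cpp_fcsm = sum(actual) / sum(expected)

# Eq. 2: average the per-publication ratios c_i / e_i
mncs = sum(c / e for c, e in zip(actual, expected)) / len(actual)

print(f"CPP/FCSm = {cpp_fcsm:.3f}")
print(f"MNCS     = {mncs:.3f}")
```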

There is an interesting relation between the CPP/FCSm indicator and the MNCS indicator. It turns out that the CPP/FCSm indicator is a kind of weighted version of the MNCS indicator (Waltman et al. 2011). This can be seen by rewriting Eq. 1 as

$$ \text{CPP/FCSm} = \frac{1}{n}\sum\limits_{i = 1}^{n} w_{i} \frac{c_{i}}{e_{i}} $$
(3)

where $w_i$ is given by

$$ w_{i} = \frac{e_{i}}{\sum\nolimits_{j = 1}^{n} e_{j} / n} . $$
(4)

It follows from Eqs. 3 and 4 that, like the MNCS indicator, the CPP/FCSm indicator can be written as an average of ratios. However, unlike the MNCS indicator, the CPP/FCSm indicator does not weigh all ratios equally. Instead, it gives more weight to ratios corresponding to publications that have a higher expected number of citations. In other words, publications from fields with a high average number of citations per publication have more weight in the calculation of the CPP/FCSm indicator than publications from fields with a low average number of citations per publication. Similarly, older publications have more weight in the calculation of the CPP/FCSm indicator than more recent publications.
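
The equivalence expressed by Eqs. 3 and 4 is easy to verify numerically. The sketch below, again with invented citation data, confirms that the weighted average of ratios in Eq. 3 coincides with the ratio of sums in Eq. 1.

```python
# Sketch: CPP/FCSm written as a weighted average of ratios (Eqs. 3 and 4).
# The citation data below are invented for illustration only.

actual = [12, 3, 0, 25, 7]
expected = [8.0, 4.5, 1.2, 20.0, 6.3]
n = len(actual)

mean_expected = sum(expected) / n
# Eq. 4: the weight of a publication is proportional to its expected number of citations
weights = [e / mean_expected for e in expected]

# Eq. 3: weighted average of the normalized citation scores c_i / e_i
weighted_average = sum(w * c / e for w, c, e in zip(weights, actual, expected)) / n

# Eq. 1: ratio of sums, for comparison
ratio_of_sums = sum(actual) / sum(expected)

assert abs(weighted_average - ratio_of_sums) < 1e-12  # the two expressions coincide
```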

How to handle recent publications?

We now consider in more detail the way in which recent publications are handled in our indicators of interest. As indicated by Eqs. 3 and 4, the CPP/FCSm indicator weighs publications proportionally to their expected number of citations. Recent publications tend to have a low expected number of citations, and their effect in the calculation of the CPP/FCSm indicator therefore tends to be small. This is different in the case of the MNCS indicator. Unlike the CPP/FCSm indicator, the MNCS indicator weighs all publications equally. Because of this, recent publications affect the calculation of the MNCS indicator just as strongly as older publications do.

Weighing all publications equally seems very natural and has theoretical advantages (Waltman et al. 2011). However, it also has a disadvantage. Recent publications have not had much time to earn citations, and their current number of citations therefore need not be a very accurate indicator of their long-run impact. To illustrate this issue, we look at some empirical data.

Our analysis is based on the Web of Science database. We selected seven subject categories in this database. We interpret these subject categories as scientific fields. The selected subject categories are listed in the first column of Table 1. For each of the selected subject categories, we identified all publications of the document types article and review published in 1999 in journals belonging to the subject category. For each of the identified publications, we counted the number of times the publication had been cited by the end of each year between 1999 and 2008. Author self-citations are not included in our citation counts. For each subject category, the number of identified publications is listed in the second column of Table 1. Average citation counts of the identified publications are reported in the remaining columns of the table.

Table 1 Average citation counts of publications published in 1999 in seven subject categories

The citation counts in Table 1 show large differences among fields. Biochemistry & molecular biology has the highest citation counts, and Mathematics has the lowest. The difference is roughly one order of magnitude. This difference clearly indicates the importance of correcting for the field in which a publication was published. Table 1 further shows that during the first 10 years after publication, citation counts on average increase approximately linearly with time.

As shown in the third column of Table 1, publications receive almost no citations in the year in which they were published. This is not surprising. Citing publications need to be written, reviewed, revised, and copyedited, which even under the most favorable conditions takes at least several months. In addition, some journals have a substantial backlog of manuscripts waiting to be published. This also delays the citation process. For these reasons, it is unlikely that publications receive more than a few citations in the year in which they were published. This is especially true for publications published towards the end of the year. Notice in Table 1 that in some fields, in particular Mathematics, publications are unlikely to be cited not only in the year in which they were published but also in the following year.

How well does the number of citations of a publication 1 or 2 years after it appeared predict the number of citations of the publication in the medium or long run, say, after 5 or 10 years? In Table 2, we report, for any two years $y_1$ and $y_2$ between 1999 and 2008, the Pearson correlation between the number of citations a publication has received by the end of year $y_1$ and the number of citations it has received by the end of year $y_2$. The correlations in the upper right part of the table were calculated for publications published in 1999 in the subject category Biochemistry & molecular biology. The correlations in the lower left part of the table were calculated for publications published in 1999 in the subject category Mathematics.

Table 2 Pearson correlations between the number of citations a publication has received by the end of one year and the number of citations a publication has received by the end of another year

As can be seen in Table 2, correlations between short-run citation counts and long-run citation counts can be quite weak. In the case of publications in Mathematics published in 1999, the correlation between the number of citations received by the end of 1999 and the number of citations received by the end of 2008 equals just 0.25. The correlation between the number of citations received by the end of 2000 and the number of citations received by the end of 2008 equals 0.59, which is still only a very moderate correlation. Of the seven subject categories that we have selected, Biochemistry & molecular biology has the strongest correlations between short-run citation counts and long-run citation counts. This is to be expected, since Biochemistry & molecular biology also has the highest citation counts. However, even in the case of publications in Biochemistry & molecular biology, the correlation between the number of citations received by the end of 1999 and the number of citations received by the end of 2008 is rather moderate, with a value of 0.55.
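
As a sketch of how the correlations in Table 2 can be computed, the following fragment correlates hypothetical cumulative citation counts of a set of publications at two census dates; in the actual analysis these counts would be taken from the Web of Science database.

```python
# Sketch: Pearson correlation between short-run and long-run cumulative citation counts.
# The counts below are hypothetical; they stand in for per-publication counts
# at the end of year y1 and at the end of year y2.
from statistics import correlation  # Pearson correlation, available in Python 3.10+

citations_by_end_y1 = [0, 1, 0, 2, 0, 3, 1, 0]
citations_by_end_y2 = [4, 15, 2, 30, 1, 22, 9, 3]

r = correlation(citations_by_end_y1, citations_by_end_y2)
print(f"Pearson correlation: {r:.2f}")
```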

Based on Tables 1 and 2, we conclude that in the calculation of the MNCS indicator recent publications need special attention. These publications have low citation counts (Table 1), and because of this their long-run impact cannot be predicted very well (Table 2). This is not a big problem in the case of the CPP/FCSm indicator, since this indicator gives less weight to recent publications than to older ones. The MNCS indicator, however, weighs all publications equally, and recent publications may then introduce a quite significant amount of noise in the indicator. Especially when the MNCS indicator is calculated at lower aggregation levels (e.g., at the level of research groups or individual researchers), where only a limited number of publications are available, this can be a serious problem. To alleviate this problem, one may consider leaving out the most recent publications in the calculation of the MNCS indicator. For example, all publications that have had less than 1 year to earn citations could be left out. In this way, one loses some relevant information, but one also gets rid of a lot of noise.
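
One way to implement this exclusion is sketched below. The one-year threshold and the census year are assumptions chosen to mirror the MNCS2 variant introduced in the next section; the publication records are invented for illustration only.

```python
# Sketch: MNCS with the most recent publications left out.
# Each record is (publication_year, citations, expected_citations); values are invented.

publications = [
    (2005, 12, 8.0),
    (2007, 3, 4.5),
    (2008, 1, 0.9),   # published in the census year: less than 1 year to earn citations
    (2006, 25, 20.0),
]
census_year = 2008  # citations are counted until the end of this year

# Keep only publications that have had at least one full year to earn citations.
included = [(c, e) for year, c, e in publications if year < census_year]

mncs_without_recent = sum(c / e for c, e in included) / len(included)
print(f"MNCS (recent publications excluded) = {mncs_without_recent:.3f}")
```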

Empirical comparison

In this section, we present an empirical comparison between the CPP/FCSm indicator and the MNCS indicator. We distinguish between two variants of the MNCS indicator. In one variant, referred to as the MNCS1 indicator, all publications are taken into consideration. In the other variant, referred to as the MNCS2 indicator, publications that have had less than 1 year to earn citations are left out.

We study four aggregation levels at which bibliometric indicators can be calculated, namely the level of research groups, the level of research institutions, the level of countries, and the level of journals. We do not consider the level of individual researchers. An analysis at this level can be found elsewhere (Van Raan et al. 2010). We use the following four data sets:

  • Research groups. Chemistry and chemical engineering research groups in the Netherlands. This data set has been employed in a performance evaluation study for the Association of Universities in the Netherlands (VSNU 2002).

  • Research institutions. The 365 universities with the largest number of publications in the Web of Science database.

  • Countries. The 58 countries with the largest number of publications in the Web of Science database.

  • Journals. All journals in the Web of Science database except arts and humanities journals.

The main characteristics of the data sets are listed in Table 3.

Table 3 Characteristics of the data sets used to compare the CPP/FCSm indicator and the MNCS indicator

The comparison between the CPP/FCSm indicator and the MNCS indicator was performed as follows. For each research group, research institution, country, or journal, we retrieved from the Web of Science database all publications of the document types article, note, and review published in the relevant time period specified in Table 3. Publications in the arts and humanities were left out of the analysis. This was done because these publications tend to have very low citation counts, which makes the use of citation-based performance indicators problematic. We counted citations until the end of the relevant time period. Author self-citations were ignored. In the calculation of the indicators, we normalized for the field and the year in which a publication was published. We did not normalize for a publication’s document type. Fields were defined by Web of Science subject categories. As mentioned earlier, in the MNCS2 indicator, publications that have had less than 1 year to earn citations are left out. In the other two indicators, all publications are taken into consideration.

For each of the four data sets that we use, Pearson and Spearman correlations between the CPP/FCSm indicator, the MNCS1 indicator, and the MNCS2 indicator are reported in Table 4. The Pearson correlation measures to what degree two indicators are linearly related. The Spearman correlation, on the other hand, measures to what degree two indicators are monotonically related (i.e., to what degree two indicators yield the same ranking of items). Scatter plots of the relations between the indicators are shown in Figs. 1, 2, 3, 4 and 5. Items with no more than 50 publications (excluding publications that have had less than 1 year to earn citations) are indicated by red squares in the scatter plots. Items with more than 50 publications are indicated by blue circles. In each scatter plot, a 45° line through the origin has been drawn. The closer items are located to this line, the stronger the relation between two indicators.
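
For readers who want to reproduce this type of comparison, the fragment below sketches the two correlation measures and the scatter plot with a 45° reference line. SciPy and Matplotlib are assumptions on our part rather than the software used in the original study, and the indicator scores are invented for illustration only.

```python
# Sketch: Pearson and Spearman correlations between two indicator score vectors,
# plus a scatter plot with a 45-degree line through the origin.
# The scores below are invented for illustration only.
import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr

cpp_fcsm_scores = [0.8, 1.1, 1.4, 0.9, 2.0, 1.3]
mncs_scores = [0.7, 1.2, 1.5, 1.0, 1.8, 1.3]

pearson_r, _ = pearsonr(cpp_fcsm_scores, mncs_scores)    # degree of linear relation
spearman_r, _ = spearmanr(cpp_fcsm_scores, mncs_scores)  # degree of monotonic relation
print(f"Pearson:  {pearson_r:.2f}")
print(f"Spearman: {spearman_r:.2f}")

plt.scatter(cpp_fcsm_scores, mncs_scores)
plt.plot([0, 2.5], [0, 2.5])  # 45-degree reference line
plt.xlabel("CPP/FCSm")
plt.ylabel("MNCS")
plt.show()
```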

Table 4 Pearson and Spearman correlations between the CPP/FCSm indicator, the MNCS1 indicator, and the MNCS2 indicator
Fig. 1 Relation between the CPP/FCSm indicator and the MNCS1 and MNCS2 indicators for the research groups data set

Fig. 2 Relation between the CPP/FCSm indicator and the MNCS1 and MNCS2 indicators for the research institutions data set

Fig. 3 Relation between the CPP/FCSm indicator and the MNCS1 and MNCS2 indicators for the countries data set

Fig. 4 Relation between the CPP/FCSm indicator and the MNCS1 and MNCS2 indicators for the journals data set

Fig. 5 Relation between the CPP/FCSm indicator and the MNCS1 and MNCS2 indicators for the journals data set. Only journals with a CPP/FCSm score and an MNCS1 or MNCS2 score below 2.5 are shown

We first consider the research groups data set. For this data set, we observe a moderately strong relation between the CPP/FCSm indicator and the MNCS1 indicator (see Fig. 1, left panel). For most research groups, the difference between the CPP/FCSm score and the MNCS1 score is not very large. However, there are a number of research groups for which the MNCS1 score is much higher or much lower than the CPP/FCSm score. The relation between the CPP/FCSm indicator and the MNCS2 indicator is considerably stronger (see Fig. 1, right panel). There are only a small number of research groups for which the CPP/FCSm score and the MNCS2 score differ substantially from each other.

The three research groups for which the difference between the CPP/FCSm score and the MNCS2 score is largest have been marked with the letters A, B, and C in the right panel of Fig. 1. Let us consider these research groups in more detail. Research group A has only 15 publications. For each of these publications, we report in Table 5 the publication year, the number of citations, the expected number of citations, and the normalized citation score. The normalized citation score of a publication is defined as the ratio of the actual to the expected number of citations of the publication. Why is the CPP/FCSm score of research group A so much lower than the MNCS2 score of this research group? As can be seen in Table 5, the three publications of research group A with the highest normalized citation score were all published in 1999, which is the second-to-last year of the analysis. These publications have a large effect on the MNCS2 score of research group A. Their effect on the CPP/FCSm score of research group A is much smaller. This is because, as discussed earlier, recent publications have less weight in the CPP/FCSm indicator than in the MNCS2 indicator. This explains why the CPP/FCSm score of research group A is much lower than the MNCS2 score. Research groups B and C have more publications than research group A (respectively 42 and 165), but the explanation for the difference between the CPP/FCSm score and the MNCS2 score is similar. Like research group A, research group B has a number of recent publications with a high normalized citation score. Because of this, the MNCS2 score of research group B is much higher than the CPP/FCSm score. Research group C has two very highly cited publications from 1991, the first year of the analysis. These publications have more weight in the CPP/FCSm indicator than in the MNCS2 indicator, which explains the difference between the CPP/FCSm score and the MNCS2 score of research group C.

Table 5 Publication year, number of citations, expected number of citations, and normalized citation score of the publications of research group A

We now turn to the research institutions data set. For this data set, we observe a very strong relation between on the one hand the CPP/FCSm indicator and on the other hand the MNCS1 indicator and the MNCS2 indicator (see Fig. 2). The relation is approximately equally strong for both MNCS variants. As can be seen in the left panel of Fig. 2, there is one university for which the MNCS1 score (1.66) is much higher than the CPP/FCSm score (1.06). It turns out that in 2008 this university, the University of Göttingen, published an article that by the end of 2008 had already been cited 3489 times. Since this is a very recent article, it has much more weight in the MNCS1 indicator than in the CPP/FCSm indicator. This explains the very different CPP/FCSm and MNCS1 scores of the university. Notice that in the MNCS2 indicator articles published in 2008 are not taken into consideration. Because of this, there is no substantial difference between the CPP/FCSm score (1.06) and the MNCS2 score (1.10) of the university.

The results obtained for the countries data set are similar to those obtained for the research institutions data set. We again observe a very strong relation between the CPP/FCSm indicator and the two MNCS variants (see Fig. 3), and again the relation is approximately equally strong for both MNCS variants. A striking observation is that there are almost no countries for which the MNCS1 and MNCS2 scores are lower than the CPP/FCSm score. We currently do not have an explanation for this observation. In Table 6, we list the ten highest-ranked countries according to each of the three indicators that we study. As can be seen, the three indicators yield very similar results.

Table 6 The ten highest-ranked countries according to the CPP/FCSm indicator, the MNCS1 indicator, and the MNCS2 indicator

Finally, we turn to the journals data set. For a large majority of the journals, we observe a strong relation between the CPP/FCSm indicator and the MNCS1 indicator (see the left panels of Figs. 4, 5). However, there are also a substantial number of journals for which the MNCS1 score is much higher or much lower than the CPP/FCSm score. Comparing the CPP/FCSm indicator with the MNCS2 indicator, we observe far fewer journals with largely different scores (see the right panels of Figs. 4, 5). Hence, the CPP/FCSm indicator has a considerably stronger relation with the MNCS2 indicator than with the MNCS1 indicator. This is similar to what we found for the research groups data set. Notice that even when CPP/FCSm scores are compared with MNCS2 scores, there are a number of journals for which rather large differences can be observed. However, given that overall we have more than 8,000 journals, these journals constitute a small minority of exceptional cases.

Conclusions

We have presented an empirical comparison between two normalization mechanisms for citation-based indicators of research performance. One normalization mechanism is implemented in the CPP/FCSm indicator, which is the current so-called crown indicator of CWTS. The other normalization mechanism is implemented in the MNCS indicator, which is the new crown indicator that CWTS is currently exploring. The use of the latter normalization mechanism was advocated by Lundberg (2007) and Opthof and Leydesdorff (2010), and in a recent theoretical paper (Waltman et al. 2011) we have also argued in favor of this mechanism. Our empirical results indicate that at high aggregation levels, such as at the level of large research institutions or at the level of countries, the differences between the CPP/FCSm indicator and the MNCS indicator are very small. At lower aggregation levels, such as at the level of research groups or at the level of journals, the differences between the two indicators are somewhat larger. Hence, at lower aggregation levels, the choice between the two indicators is not only of theoretical interest but also has significant practical relevance.

We have also pointed out that recent publications need special attention in the calculation of the MNCS indicator. These publications have low citation counts, and because of this their long-run impact cannot be predicted very well. Since the MNCS indicator gives the same weight to recent publications as to older ones, recent publications may introduce a significant amount of noise in this indicator. To alleviate this problem, one may consider leaving out the most recent publications in the calculation of the indicator. In our empirical analysis, we have examined the effect of leaving out publications that have had less than 1 year to earn citations. At lower aggregation levels, the effect turns out to be quite substantial. In particular, leaving out the most recent publications in the calculation of the MNCS indicator turns out to lead to a stronger relation between the CPP/FCSm indicator and the MNCS indicator. This suggests that differences between the CPP/FCSm indicator and the MNCS indicator may be partly due to noise introduced in the MNCS indicator by recent publications.