1 Introduction

Arts and science have more in common than is apparent at first sight [6]. One shared feature is that their significance cannot be reduced to a simple number. Yet this is, in principle, what occurs when citations of scientists, groups, departments, whole universities or even countries are counted, either in total or per paper. Older [15] and more recent [30] efforts to develop relevant numerical parameters as one of the components for the assessment of scientific quality have provoked vigorous debate, in particular when such methods were applied at the level of individual scientists [1, 7, 12, 17, 19–21, 26, 27, 29, 31]. At the interface between society, politics and science, some feel a need for ‘objective criteria’. Strangely, however, it has never been investigated whether such criteria can be defined at all. The debate on the technicalities of how to interpret differences in the citation frequency of scientific work as a parameter of scientific quality ignores one pivotal issue: is it at all possible to use citation data in the assessment of the quality of science? And if the answer were ‘no’, what would be the alternative to counting citations? Even if the only alternative for quality assessment were reading and study by experts, inevitable bias would remain because of personal relationships and conflicts of interest, which are tied to the choice of individuals for a peer review (site visit) committee [9]. If, on the other hand, the answer were ‘yes’, how can citation analysis be done in a fair way? How should one relate the citations obtained to the papers published by scientists and groups when there may be differences between disciplines?

Three factors influence the number of citations that a scientific paper can obtain. (1) The journal of publication. One might argue that the peer review system, despite all its well-known shortcomings [23, 24], including geographically oriented bias [22], is a more or less safe gatekeeper of the system. (2) The number of references normally included in the citing papers. The differences between fields such as mathematics and, for example, biochemistry are substantial. Fractional counting of such references has been suggested as a solution to this problem [13], and the large differences between impact factors of journals in different scientific fields then indeed tend to disappear almost completely [11]. (3) A completely ignored third factor is the number of scientists active in the same (sub)field. Obviously, a paper cannot gather many citations when relatively few colleagues work on the same topic. One of the scarce pieces of circumstantial evidence for the presumption that the total number of scientists in a subfield is an important determinant of the number of citations that an individual paper/scientist can obtain is that the impact factors of the number 1 journals in 69 scientific categories of the Science Citation Index are positively correlated with the number of journals (active scientists?) in these categories [16]. Within the category ‘cardiology & cardiovascular systems’ of the Web of Science of Thomson Reuters, but also within specific cardiovascular journals, heterogeneity between subfields can be anticipated. This might imply that scientists active in more basic fields obtain different numbers of citations than more clinically oriented scientists, but also that within clinical cardiology some fields may acquire more citations than others. For example, different numbers of scientists are active in subfields such as ‘atherosclerosis’, ‘arrhythmias’ or ‘congenital heart disease’.
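As an aside, the idea behind fractional counting [13] can be made concrete with a small sketch (in Python; this illustrates the principle only and is not code from the cited studies, and all citation counts are invented):

```python
# Fractional citation counting: a citation from a paper with R references
# contributes 1/R to the cited paper instead of 1, so citations coming
# from reference-dense fields weigh less (cf. [11, 13]).

def fractional_citations(citing_reference_counts):
    """For each citing paper, add the reciprocal of its reference-list length."""
    return sum(1.0 / r for r in citing_reference_counts if r > 0)

# Invented example: a mathematics paper cited by 5 papers with short
# reference lists vs. a biochemistry paper cited by 20 reference-rich papers.
print(fractional_citations([12, 15, 10, 18, 14]))  # ~0.38 (5 raw citations)
print(fractional_citations([45] * 20))             # ~0.44 (20 raw citations)
```

Under this weighting, the 20 raw citations from a reference-dense field carry roughly the same weight as the 5 raw citations from a field with short reference lists, which is why field differences in impact factors largely vanish [11].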

In this brief study, heterogeneity of citation is assessed for subfields within fields that are considered a priori homogeneous. To this end, part of the contents of Circulation as published in 1998 is considered, and it is analyzed how the individual papers were cited between 1998 and 2006. In addition, citation of papers in three clinical cardiovascular journals is compared with that in three basic cardiovascular journals. The results of these analyses are described first, after which the factors that influence citation are discussed further.

2 Datasets

Two datasets were explored.

Set 1 concerns papers published during 1997–2006 by Circulation, the Journal of the American College of Cardiology and the European Heart Journal (three clinical cardiovascular journals), and by Circulation Research, Cardiovascular Research and the Journal of Molecular and Cellular Cardiology (three basic cardiovascular journals). These were the journals with the highest impact factors within the category “Cardiac and cardiovascular system” in the Journal Citation Reports, a product of Thomson Reuters that publishes the impact factors of scientific journals on a yearly basis. Citation of papers in these six journals was analyzed with respect to most frequent citation, average citation and most frequent relative to average citation, using the Journal Citation Reports and the Web of Science of Thomson Reuters. In this set, differences in citation between papers published by clinical versus basic cardiovascular journals are explored.

Set 2 consists of papers published by Circulation in 1998. Citation of these papers was analyzed on a per-paper basis using the Web of Science of Thomson Reuters. In this set, differences between citations of clinically oriented versus basic science oriented papers are addressed as well, but now for papers published by one and the same journal. In addition, the topics covered by the most and least frequently cited papers within the clinical and basic science contents of Circulation in 1998 were assessed.

2.1 Set 1: citation of clinical versus basic science papers in the six leading cardiovascular journals

In this set of data, the aim is simply to compare citation of clinically oriented cardiovascular papers in the three clinical top journals with citation of basic science oriented papers in the three basic top journals. Figure 1 shows the number of citations of the most frequently cited paper published by the clinical top journals Circulation, Journal of the American College of Cardiology and European Heart Journal (filled symbols) and by the basic science top journals Circulation Research, Cardiovascular Research and Journal of Molecular and Cellular Cardiology (open symbols) for each of the publication years 1997 through 2006, as obtained during a citation window from 1997 to 2006. The publication year is on the abscissa and the absolute number of citations on the ordinate. Thus, papers published in 2006 had a citation window of only 1 year, whereas papers published in 1997 had a citation window of 10 years. The number of citations was higher for the clinical journals. The 1997–2006 window is identical to that used by the Center for Science and Technology Studies (CWTS, Leiden, The Netherlands) for a bibliometric analysis of the output of principal investigators (PIs) of the Academic Medical Center (AMC) in Amsterdam ([5]; vide infra). With the exception of 1997 and 1999, when the basic science journal Circulation Research published the most frequently cited paper, all number 1 and 2 positions were taken by papers published in a clinical journal. In 5 of the 10 years, the first three positions were all taken by clinical journals.

Fig. 1

Citation from 1997 to 2006 of the most cited papers published by Circulation, the Journal of the American College of Cardiology (JACC), the European Heart Journal (EHJ), Circulation Research (Circ Res), Cardiovascular Research (CVR) and the Journal of Molecular and Cellular Cardiology (JMCC) in the years indicated along the abscissa. The three clinical journals are shown with filled symbols, the three basic journals with open symbols

Figure 2 shows that there was a huge difference in the number of citations of the most frequently cited paper in the clinical versus the basic group (373 ± 53.7 vs. 140 ± 30.4; mean ± SEM, n = 30: 10 years, 3 journals; P < 0.0005, ANOVA). One might argue that clinical journals often publish trials or statement-type papers on public health, which attract many citations, so that the difference might mainly apply to the two or three most frequently cited papers. To explore this in more detail, we also compared citation of the papers at the 5th percentile, that is, the paper at position 10 when a journal published 200 articles in a given year or the paper at position 30 when it published 600 articles. The results in the clinical and basic sets of journals were averaged. Figure 2 shows that papers at the 5th percentile were still cited more often when published in clinical journals than in basic journals (92 ± 12.1 vs. 58 ± 8.8; mean ± SEM, n = 30: 10 years, 3 journals; P < 0.05).
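The position of the 5th-percentile paper follows directly from the size of a journal’s yearly output; a minimal sketch of the procedure described above (our reconstruction, with fabricated citation counts):

```python
import math

def citations_at_percentile(citations, pct=5):
    """Citation count of the paper at the top `pct` percentile of a
    ranked list, e.g. rank 10 of 200 papers or rank 30 of 600 papers."""
    ranked = sorted(citations, reverse=True)
    rank = max(1, math.ceil(len(ranked) * pct / 100))
    return ranked[rank - 1]

# Fabricated example: a journal year with 200 articles.
counts = [400 - 2 * i for i in range(200)]
print(citations_at_percentile(counts))  # citation count of the rank-10 paper
```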

Fig. 2

Comparison of the most cited papers in the three clinical journals during 10 years (n = 30) with the most cited papers in the three basic journals. The same comparison is made for the papers at the 5th percentile

When all papers were assessed, citation of papers in the clinical journals was 34% higher than in the basic journals (30 ± 3.6 vs. 22 ± 3.3). That this difference remained insignificant is no surprise, given the large differences in average citation among the three clinical journals on the one hand and among the three basic journals on the other. The ratios between the most frequently cited and the average cited paper were 12.7 ± 1.08 and 6.4 ± 0.62 in the clinical and basic journals, respectively (P < 0.0005). This implies that it is ‘easier’ to accumulate a high number of citations, in an absolute but also in a relative sense, in clinical cardiovascular science than in basic cardiovascular science. The outcome is all the more remarkable because Circulation and the Journal of the American College of Cardiology also publish basic science papers, which may have ‘diluted’ the citation of their clinical papers. Obviously, a fair and acceptable system of bibliometric analysis should take these differences in citation frequency between clinical and basic cardiovascular papers/journals into account.

2.2 Set 2: citation of basic and clinical cardiovascular papers within one journal: the case of Circulation (1998)

The aim of this section is to compare the citation of clinically oriented and basic science oriented cardiovascular papers published by one and the same journal. Circulation was chosen because (i) it has retained a leading position in the field of cardiac and vascular physiology since the publication of its first issue in 1950 and (ii) it is a clinical journal that also publishes basic science papers in cardiovascular science. During 1998, Circulation published 753 ‘articles’ according to the indexing data of the Web of Science. Advantage was taken of the editorial policy of Circulation, which in 1998 subdivided its contents into several categories, among which ‘Clinical investigation reports’ and ‘Basic science reports’. In 1998, Circulation published a total of 567 such papers, of which 381 appeared in the clinical category and 186 in the basic category. The listing of these papers for the present analysis was performed by hand. It was not possible to select them directly from the Web of Science, because the majority of these 567 papers were classified as ‘articles’ but others as ‘proceedings papers’. Conversely, there were papers among the 753 ‘articles’ in the Web of Science that were published not under the two categories mentioned, but under other categories chosen by the Editors of Circulation. Therefore, the data in this subsection cannot be retrieved from the Web of Science without substantial editing (the relevant files are available for those interested in the technical details). All 567 individual papers were scored for the number of citations accumulated between 1998 and 2006 (the time frame of the CWTS analysis [5] of the scientific output of the PIs of the AMC in Amsterdam, The Netherlands).

The most frequently cited paper in the clinical category was that of Laufs et al. [10], a study performed in human saphenous vein endothelial cells. The paper addresses the important question whether inhibitors of HMG-CoA reductase, the enzyme controlling a rate-limiting step in the production of cholesterol, have effects beyond reducing cholesterol. Statins are such inhibitors. First, they interfere with the mevalonate pathway, leading to a reduction of serum cholesterol. Second, they have an independent effect on the induction of endothelial cell nitric oxide (ecNO) synthase, which improves impaired vasodilatation. Third, restored ecNO synthase has an independent, beneficial effect on atherosclerosis.

The most frequently cited paper in the basic category was that of Wei et al. [33]. These authors measured myocardial blood flow in dogs with a technique based on continuous infusion of microbubbles. This novel technique has the potential to measure tissue perfusion in any organ accessible to ultrasound. Figure 3 shows the citation profile of both papers. Between 1998 and 2006, the total number of citations was 708 for Laufs et al. [10] and 506 for Wei et al. [33].

Fig. 3

Citation numbers of the most cited clinical paper published by Circulation in 1998 (Laufs et al. [10]) and of the most cited basic paper (Wei et al. [33]). The dashed box indicates the time window over which the citations were counted for these papers and for all 565 others in set 2 (see text)

Figure 4 shows the number of papers in the two categories (n = 381 and n = 186) along the abscissa, ranked from most frequently cited to not cited. The larger number of papers in the clinical category as well as their higher number of citations is obvious (compare the thin and thick lines). The difference in citation between the two categories was significant (Wilcoxon signed rank test, P < 0.0005). The average citation of all 567 papers was 84 ± 3.8 (mean ± SEM) over the period 1998–2006.

Fig. 4

Citation of 381 clinical investigation reports and 186 basic science reports published by Circulation in 1998 and cited between 1998 and 2006 (see text for explanation of the selection of the papers; the set cannot easily be retrieved from the Web of Science owing to lack of congruence between the indexing by the Editors of Circulation and the indexing by the Web of Science of Thomson Reuters)

In Fig. 5, the 381 clinical and the 186 basic papers were rescaled to a percentage scale along the abscissa. Along the ordinate, the number of citations of each paper (see Fig. 4) was divided by the average of all papers (84). Obviously, the basic papers are less frequently cited over the whole range. The vertical dashed lines indicate that 38% of the clinical papers are cited more often than average (ratio above 1.0), whereas this holds for only 23% of the basic papers. The top ratio (number of citations obtained by the most frequently cited paper divided by the average citation of all papers) was 8.4 in the clinical category and 6.0 in the basic category.

Fig. 5

The same data are presented as in Fig. 4. The numbers of papers, 381 and 186, are both rescaled to 100%. The number of citations was divided by the average citation (84) of all 567 papers. The vertical dashed lines indicate that 38% of the clinical papers and 23% of the basic papers were cited above average

The average citation was 93 ± 4.9 for the clinical papers and 66 ± 5.4 for the basic papers, a difference of about 40%. Figure 6 shows that the two distributions superimpose when the numbers of citations are divided by the average of each of the two groups. Obviously, there is heterogeneity below the aggregation level of a scientific journal.
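The two normalizations behind Figs. 5 and 6 amount to the following steps (a sketch under stated assumptions; the citation lists below are synthetic stand-ins for the real 381 clinical and 186 basic papers):

```python
import random

def rank_curve(citations, reference_average):
    """Rank papers from most to least cited, rescale rank to a 0-100% scale,
    and express each citation count relative to a reference average."""
    ranked = sorted(citations, reverse=True)
    n = len(ranked)
    return [(100.0 * (i + 1) / n, c / reference_average)
            for i, c in enumerate(ranked)]

random.seed(1)
clinical = [max(0, int(random.gauss(93, 80))) for _ in range(381)]  # synthetic
basic = [max(0, int(random.gauss(66, 55))) for _ in range(186)]     # synthetic
overall = sum(clinical + basic) / (len(clinical) + len(basic))

# Fig. 5 style: both groups divided by the overall average (84 in the study);
# the clinical curve then lies above the basic curve over the whole range.
fig5 = rank_curve(clinical, overall), rank_curve(basic, overall)

# Fig. 6 style: each group divided by its own average (93 and 66 in the study);
# the two curves then approximately superimpose.
fig6 = (rank_curve(clinical, sum(clinical) / len(clinical)),
        rank_curve(basic, sum(basic) / len(basic)))
```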

Fig. 6

The same data are presented as in Fig. 5. The only difference is that the number of citations (see Fig. 4) was divided not by the average of the total of 567 papers, but by the average of each of the two groups. These averages were 93 and 66 citations, respectively

Although the procedure followed in Fig. 6 corrects for the overall difference between the clinical and basic cardiovascular papers, it does not exclude further unexplained differences within the clinical group of 381 papers and within the basic group of 186 papers.

To investigate why some papers within each of the two categories were cited more often than others, the key-words assigned to the papers by the authors themselves were assessed. The first ten clinical papers (C1–C10), the next ten clinical papers (C11–C20) and the last ten clinical papers (C372–C381) were selected. The same procedure was followed for the basic papers, yielding the series B1–B10, B11–B20 and B177–B186, all based on citation from 1998 through 2006. Table 1 shows that the articles in categories C1–C10 and C11–C20 had 58% of their key-words in common, but C1–C10 and C372–C381 only 12%. For the basic papers, categories B1–B10 and B11–B20 shared 42% of their key-words, whereas this number was only 14% for categories B1–B10 and B177–B186. This implies that within each category the topic is a determinant of the citation frequency of cardiovascular papers. In addition, Table 1 shows that any combination of a C (clinical) group with a B (basic) group always leads to a lower percentage of shared key-words than the combinations C1–C10 and C11–C20 (58%) or B1–B10 and B11–B20 (42%).
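The exact overlap measure behind the percentages in Table 1 is not spelled out in the text; one plausible reading, sketched below with invented key-word sets, is the percentage of unique key-words that two groups of papers have in common:

```python
def shared_keyword_percentage(group_a, group_b):
    """group_a, group_b: lists of key-word sets, one set per paper.
    Returns the share of all unique key-words that occur in both groups.
    NOTE: this symmetric definition is our assumption, not the paper's."""
    kw_a = set().union(*group_a)
    kw_b = set().union(*group_b)
    return 100.0 * len(kw_a & kw_b) / len(kw_a | kw_b)

# Invented key-word sets standing in for two groups of clinical papers:
c1_c10 = [{"statins", "endothelium", "nitric oxide"},
          {"statins", "atherosclerosis"}]
c11_c20 = [{"endothelium", "atherosclerosis"},
           {"nitric oxide", "restenosis"}]
print(round(shared_keyword_percentage(c1_c10, c11_c20)))  # 60 (% shared)
```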

Table 1 Comparison of the key-words of sets of ten papers taken from the series shown in Fig. 4

The implication of the data in Figs. 3, 4, 5, 6 and Table 1 is that there are (i) differences in citation frequency between clinical and basic cardiovascular papers, but (ii) also differences in topics between the most frequently and least frequently cited papers within each of these two categories. From this observation, one may infer that a journal is not a suitable reference level for citation analysis of individual papers and/or authors, let alone a set of journals, as in the case of the former ‘crown indicator’ of the CWTS [15] or its recent alternative [32]. Bornmann et al. [3] also suggested that reference sets based on Medical Subject Headings (MeSH) can deviate substantially from reference sets based on the complete contents of journals, such as those used by the former and new citation indicators of the CWTS.

3 Other factors that influence citation statistics: “productivity”

The aim of this section is to discuss the difference between productivity and citation frequency. Important citation parameters do not take into account the number of papers published by a university, a department, a group or an individual. The total number of citations that a scientist can acquire depends, of course, on the number of papers he/she publishes. The number of papers that a scientist can publish depends, apart from his/her own activity, on the number of co-workers and, more globally, on the network in which he or she is active. It goes without saying that the number of papers written as first author is a clear-cut indication of the scientific productivity of a scientist. However, how should one deal with co-authorships and with senior authorships? Should a mid-position authorship of a paper with more than 50 authors, which is not exceptional nowadays, for example in clinical trials, receive the same credit as a mid-position authorship of a paper with only three authors? To us the answer is no, but if it were yes, one would not know how to cope with this in a reasonable way. Counting citations is easy to do but hard to interpret, as we saw in the previous section. If this were further complicated by weighting different author contributions, even if very simply restricted to the number of co-authors, it is not difficult to imagine the terrible bureaucracy that would follow (see Ref. [34]). Nevertheless, it is strange that neither previous [15] nor current [32] citation indicators take into account the total number of papers published by an author: it makes no difference whether that number is 10, 100 or 1000. The citation of the total body of work is simply made relative to a reference set, either the same journals as the publications of the scientist under assessment or a set of journals. At present it is simply not possible to define a fair reference set.
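To see how quickly such weighting becomes arbitrary, consider one conceivable scheme (purely illustrative; none of these weights comes from the text or from any accepted standard):

```python
def author_credit(position, n_authors, first=0.4, last=0.3):
    """One of many possible, and equally arbitrary, weighting schemes:
    extra weight for the first and last author, the remainder split
    equally among middle authors. Assumes n_authors >= 3; the weights
    0.4 and 0.3 are invented for illustration only."""
    if position == 1:
        return first
    if position == n_authors:
        return last
    return (1.0 - first - last) / (n_authors - 2)

# A mid-position author on a 50-author trial vs. on a 3-author paper:
print(author_credit(25, 50))  # 0.00625
print(author_credit(2, 3))    # 0.3
```

Every choice in such a scheme (how much for the first author, how much for the last, what to do with shared first authorship) would have to be negotiated and administered, which is exactly the bureaucracy the text warns against.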

The Hirsch index (h-index) is a very simple parameter [8] that was recently applied to 28 Dutch professors in clinical cardiology ([19]; see [21] for a recent update). The h-index combines productivity with citation in a very simple way: when a scientist has an h-index of 50, he/she has published 50 papers that were each cited 50 times or more, while the remainder of his/her papers were all cited fewer than 50 times. Numerous efforts have been made to improve on this parameter [2], as can be appreciated in almost every issue of the current specialized literature. If a Hirsch-type index were restricted to first-authored papers, it would lose much of the inflation that is inherent to these indicators, because scientists can, and will, find strategies to optimize their ratings. Such gaming endangers the meaning of authorship and may thereby flaw accountability for scientific claims. A restriction to first-authored papers is also a good idea when one aims at preventing double counting of productivity. Although it will not easily meet with enthusiasm among senior scientists, for obvious reasons, it would put younger scientists in a much more favourable position.
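For reference, computing the h-index from a list of citation counts is straightforward; the sketch below also shows the age-corrected variant mentioned in the next paragraph (the citation record and the 20 active years are fabricated):

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each [8]."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c < rank:
            break
        h = rank
    return h

record = [100, 80, 50, 50, 3, 2, 1]  # fabricated citation counts
print(h_index(record))               # 4
print(h_index(record) / 20)          # per-year variant over 20 active years
```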

An advantage (or disadvantage…) of the h-index is that it can only increase with time. This can be corrected by dividing it by the number of years of scientific activity. An h-index can be determined not only for authors, but also for journals or topics (‘key-words’). To give one example: if we select the six leading cardiovascular journals (described as set 1 in Sect. 2.1) and restrict the publication years to 2000–2009, an h-index can be calculated for ‘Marfan syndrome’, a congenital developmental heart disease, and for ‘Brugada syndrome’, an arrhythmic disease. The results are 29 and 59, respectively. If it ever becomes possible to construct appropriate reference levels for a citation indicator that appreciates these types of differences, there may be a future for such indicators. If not, it may be better to stop this type of analysis, in particular at the level of individual scientists [5, 17, 19, 21, 27, 31].

4 Quality assessment of the Academic Medical Center by the CWTS

The aim of this section is to demonstrate that citation analysis by professional/industrial parties may fall short when differences between (sub)fields are not taken into consideration. The CWTS (Leiden, The Netherlands) recently performed an analysis of the scientific output of the PIs of the AMC in Amsterdam, The Netherlands (internal report 1997–2006 [5]; 1997–2008 and 1997–2009 updates also exist).

At the level of individual scientists, one of the applied indicators varied from 0.30 to >3.00 relative to an averaged worldwide (reference) level of 1.00. The tacit assumption underlying the analysis is that it is possible to correct for differences in citation frequency between highly specialized fields. This assumption leans heavily on previous work performed by Moed et al. at the CWTS [15]. The problem arises already at the stage of definition: a journal is a journal, but what is a ‘field’? At the operational level of the CWTS, a field is a set of journals that were originally grouped into categories by a commercial company (Thomson Reuters, formerly ISI).

The results of the present analyses (datasets 1 and 2; Figs. 3, 4, 5, 6 and, in particular, Table 1), together with the substantially different h-indices calculated for subfields such as ‘Marfan syndrome’ and ‘Brugada syndrome’, show that the assumption of homogeneity in citation frequency of the topics covered by cardiovascular journals is not justified. Regardless of whether one relates the citation of a specific paper to the average citation of a paper in the same journal or in a set of journals (a field or category), the basic comparator remains at the journal level. Apparently, this level lacks specificity.

In the terminology of the present paper, the author would prefer to reserve ‘field’, within the category ‘cardiac & cardiovascular system’, for topics such as ‘atherosclerosis’, ‘atrial fibrillation’ or ‘sinus node’, but ‘key-word’ or ‘topic’ would serve as well.

5 Peer review and citation of papers

The aim of this section is to discuss the relation between peer review ratings and citation parameters. The assumption that frequent citation equals high scientific quality rests on thin ice. It is true that Nobel Prize winners are cited more frequently than other scientists, and it is also true that Nobel Prize winners are not cited more frequently after winning the award than before, which excludes the possibility that the prize itself increased the visibility of the laureate [4]. However, it is unknown whether findings derived from such a special category of scientists can be translated to larger groups of more modest quality. The correlation between peers’ perceptions of the importance/relevance of the work of individual scientists and the citation frequency of their papers is simply too weak: with correlation coefficients (r) between 0.53 and 0.70 for categories such as biochemistry, psychology, chemistry, physics and sociology, only 25–50% of the variance in peer judgment on individuals (r² = 0.28–0.49) can be accounted for by the associated citation numbers [4].

Figure 7 shows an analysis of peer reviewers’ quality assessments of 12 chemistry departments of a Dutch university and their correlation with the h-index and two citation indicators developed by the CWTS (CPP/JCSm and CPP/FCSm; both parameters have recently been adapted [32] in response to criticism [17]). The h-index was explained above; the two other parameters relate citation of a set of papers to that of the same type of papers published in the same years in the same journals or sets of journals. The ordinate is labelled in ‘arbitrary units’ and has a different meaning for each of the three parameters (the h-index and the two CWTS citation parameters); differences between the three sets of parameters therefore carry no meaning. The data were taken from Table 1 in Ref. [30]. The original study [30] aimed at a comparison between the three citation parameters, not at the relation between peer review ratings and these parameters; here, this small amount of data is analyzed in an alternative way. The result is obvious. The pivotal point is the absence of any difference between research labelled by peers as ‘good’ and research labelled as ‘excellent’, for whatever citation parameter, including the h-index (see also [18]). There is another problem in relating peer judgment to citation parameters, one that also applies to the judgment of grant proposals: what is the dependent and what is the independent parameter? Moreover, it appears important whether or not peers are informed on citation data by the organization asking for their advice. Even when such information is NOT provided, modern retrieval tools make it possible for peers to look up these data themselves, so that a judgment is influenced that would otherwise have been based only on reading a paper or proposal. This problem has been emphasized previously by Moed [14]. Recently, Van den Besselaar and Leydesdorff [28] pointed out that the best non-granted biomedical research proposals at ‘NWO’ (the Netherlands Organisation for Scientific Research) actually had higher bibliometric parameters than the granted proposals, emphasizing again that citation data are not suitable for distinguishing between what is considered ‘good’ and what is considered ‘excellent’. Finally, Spaan [25] has drawn attention to the fact that citation parameters related to the total years of scientific activity may be unfair to women in general and to scientists with a ‘break’ in their career, for example due to a period of extensive clinical training. Likewise, a major move from one field to another may put creative scientists with a temporary decrease in scientific output at a disadvantage. This may not be good news for scientific innovation.

Fig. 7

Comparison of peer esteem of 12 chemistry groups of a Dutch university with three citation indicators. There is no difference between the peer ratings ‘good’ and ‘excellent’. Data from Table 1 in Ref. [30]

6 Historical aspects

The aim of this section is to explore the meaning of citation parameters when work from different eras is compared. In the Introduction, it was mentioned that one of the parameters determining the citation of a paper is the number of scientists active within the same topic. Not a single citation indicator corrects for this important parameter. It was indicated above that the number of journals in a category is a rough measure of the number of scientists in a field: in ‘Biochemistry and Molecular Biology’ there are 283 journals, whereas in ‘Biomedical Engineering’ there are ‘only’ 59. The number of journals in a category is significantly correlated with the impact factor of the number 1 journal of that category [16].

Circulation is the top journal in the category ‘Cardiac & cardiovascular system’ of the Web of Science database of Thomson Reuters. Figure 8 shows the cumulative citation of papers published by Circulation in the years 1955–1995, at 10-year intervals, and in the year 2000 as well. In the database, the paper type ‘article’ was selected in the general search mode, and the cumulative number of citations was scored from 1955 till 2007. Thus, a paper published in 1955 had a citation window of 53 years, whereas a paper published in 2000 had a citation window of only 8 years. Obviously, far more scientists publish nowadays than in 1955. Even so, it came as a surprise to us that an average article published in 2000 was already cited as frequently after 4 years as an average paper published in 1955 or 1965 after 53 or 43 years, respectively. Figure 8 also shows a huge increase in citation frequency between 1965 and 1975. These data are not presented to claim that older scientists are at a disadvantage. Rather, if one accepts the presumption that Circulation was and is the top journal in the category ‘Cardiac and cardiovascular system’ over the last 60 years, the numerical increase over the years makes clear that the number of citations that can be obtained is a function of the number of contemporary scientists. It then logically follows that the same holds for the different topics covered by a journal, which is exactly what is shown in Figs. 3, 4, 5, 6 and Table 1.

Fig. 8

Average citation of ‘articles’ published by Circulation in 1955, 1965, 1975, 1985, 1995 and 2000, between date of publication and 2007

7 Conclusion

There is considerable variability in the number of citations, not only between clinical and basic cardiovascular papers but also within these two categories, that is, between different topics within clinical and within basic cardiovascular science. Consequently, citation indicators that are sophisticated at first glance but rest on a journal-based reference level may legitimize quality labels that are unjustified and thus unfair to individuals, as well as to research topics with smaller numbers of scientists.