The integrated impact indicator revisited (I3*): a non-parametric alternative to the journal impact factor
Abstract
We propose the I3* indicator as a non-parametric alternative to the journal impact factor (JIF) and the h-index. We apply I3* to more than 10,000 journals and compare the results with other journal metrics. I3* is a promising variant within the general scheme of non-parametric I3 indicators introduced previously: I3* provides a single metric that correlates with both impact in terms of citations (c) and output in terms of publications (p). We argue for weighting four percentile classes: the top-1% and top-10% as excellence indicators, and the top-50% and bottom-50% as output indicators. Like the h-index, which also incorporates both c and p, I3*-values are size-dependent; however, dividing I3* by the number of publications (I3*/N) yields a size-independent indicator that correlates strongly with the 2- and 5-year journal impact factors (JIF2 and JIF5). Unlike the h-index, I3* correlates significantly with both the total number of citations and the total number of publications. The values of I3* and I3*/N can be statistically tested against the expectation or against one another using chi-squared tests or effect sizes. A template (in Excel) for the relevant tests is provided online.
Keywords
Journal indicator; Percentile; Citation analysis; I3*; Journal impact factor
Introduction
Citations create links between publications; but to relate citations and publications as two different things, one needs a model (for example, an equation). The journal impact factor (JIF) indexes only one aspect of this relationship: citation impact. The h-index counts the h papers of a set that have each received at least h citations. One can also count papers with h² or h/2 citations (Egghe 2008). This paper is based on a different and, in our opinion, more informative model: the Integrated Impact Indicator I3.
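For comparison with the percentile-based approach developed below, the h-index can be computed from a list of per-paper citation counts as in the following minimal Python sketch. The function name `h_index` and the example counts are illustrative assumptions, not data from this study. The example also shows that citations beyond the h threshold leave the index unchanged.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for two small publication sets.
modest = [5, 4, 4, 4]
one_hit = [1000, 4, 4, 4]
print(h_index(modest), h_index(one_hit))  # both sets have h = 4
```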
The 2-year JIF was outlined by Garfield and Sher (1963; cf. Garfield 1955; Sher and Garfield 1965) at the time of establishing the Institute for Scientific Information (ISI). JIF2 is defined as the number of citations in the current year (t) to any of a journal’s publications of the two previous years (t − 1 and t − 2), divided by the number of citable items (substantive articles, reviews, and proceedings) in the same journal in these two previous years. Although not strictly a mathematical average, JIF2 provides a functional approximation of the mean early citation rate per citable item. A JIF2 of 2.5 implies that, on average, the citable items published 1 or 2 years ago were cited two and a half times. Other JIF variants are also available; for example, JIF5 covers a 5-year window.^{1}
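The JIF definition above can be sketched in Python as follows. The function `impact_factor` and all numbers are hypothetical; the sketch only mirrors the stated definition (citations in year t to any items from the preceding years, divided by the citable items from those years) and does not reproduce Clarivate's rules for classifying citable items.

```python
def impact_factor(citations_in_year, citable_items, year, window=2):
    """Citations received in `year` by items from the preceding `window` years,
    divided by the citable items published in those years.

    citations_in_year: {publication year: citations received in `year` by
        any item the journal published in that publication year}
    citable_items: {publication year: number of citable items published}
    """
    pub_years = range(year - window, year)
    cites = sum(citations_in_year.get(y, 0) for y in pub_years)
    items = sum(citable_items.get(y, 0) for y in pub_years)
    return cites / items


# Hypothetical journal: citations received in 2014, by publication year.
cites_2014 = {2009: 90, 2010: 110, 2011: 150, 2012: 300, 2013: 200}
items = {2009: 80, 2010: 90, 2011: 100, 2012: 100, 2013: 100}

jif2 = impact_factor(cites_2014, items, 2014)            # (300 + 200) / (100 + 100)
jif5 = impact_factor(cites_2014, items, 2014, window=5)  # 850 / 470
print(f"JIF2 = {jif2:.2f}, JIF5 = {jif5:.2f}")
```

The hypothetical JIF2 of 2.5 corresponds to the example given in the text.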
The central problem that led Garfield (1972, 1979) to use the JIF when developing the Science Citation Index was the selection of journals for inclusion in this database. He argued that citation analysis provides an excellent source of information for evaluating journals. The choice of a 2-year time window was based on experiments with the Genetics Citation Index and the early Science Citation Index (Garfield 2003, at p. 364; Martyn and Gilchrist 1968). However, one possible disadvantage of this short (2-year) window could be that “the journal impact factors enter the picture when an individual’s most recent papers have not yet had time to be cited” (Garfield 2003, p. 365; cf. Archambault and Larivière 2009). Biomedical fields have a fast-moving research front with a short citation cycle, and JIF2 may be an appropriate measure for such fields but less so for other fields (Price 1970). In the 2007 edition of Journal Citation Reports (reissued for this reason in 2009), a 5-year JIF (JIF5, considering five instead of only two publication years) was added to balance the focus on short-term citations provided by JIF2 (Jacsó 2009; cf. Frandsen and Rousseau 2005).^{2}
The skew in citation distributions provides another challenge to the evaluation (Seglen 1992, 1997). The mean of a skewed distribution provides less information than the median as a measure of central tendency. To address this problem, McAllister et al. (1983, at p. 207) proposed the use of percentiles or percentile classes as a non-parametric indicator (Narin 1987^{3}; see later: Bornmann and Mutz 2011; Tijssen et al. 2002). Using this non-parametric approach, and on the basis of a list of criteria provided by Leydesdorff et al. (2011), two of us first developed the Integrated Impact Indicator (I3) based on the integration of the quantile values attributed to each element in a distribution (Leydesdorff and Bornmann 2011).
Since I3 is based on integration, the development of I3 presents citation analysts with a construct fundamentally different from a methodology based on averages. An analogy that demonstrates the difference between integration and averaging is given by basic mechanics: the impact of two colliding bodies is determined by their combined mass and velocity, and not by the average of their velocities. Analogously, it can be argued that the gross impact of a journal as an entity is the combined volume and citedness of its contents (articles and other items), not their average. Journals differ both in size (the number of published items) and in the skew and kurtosis of the distribution of citations across items. A useful and informative indicator for the comparison of journal influences should respond to these differences. A citation average cannot reflect the variation in both publications and citations, but an indicator based on integration can do so.
One route to indexing both performance and impact via a single number has been provided by the h-index (Hirsch 2005) and its variants (e.g., Bornmann et al. 2011a, b; Egghe 2008). However, the h-index has many drawbacks, not least mathematical inconsistency (Marchant 2009; Waltman and Van Eck 2012). Furthermore, Bornmann et al. (2008) showed that the h-index is mainly determined by the number of papers (and not by citation impact). In other words, the impact dimension of a publication set may not be properly measured using the h-index. One aspect that I3 has in common with the h-index is that the focus is no longer on impact as an attribute but on the information production process (Egghe and Rousseau 1990; Ye et al. 2017). This approach could be applied not only to journals but also to other sets of documents with citations such as the research portfolios of departments or universities. In this study, however, we focus on journal indicators.
At the time of our previous paper about I3 (Leydesdorff and Bornmann 2011), we were unable to demonstrate the generic value of the non-parametric approach because of limited data access. Recently, however, the complete Web of Science became accessible under license to the Max Planck Society (Germany). This enables us to compare I3-values across the database with other journal indicators such as JIF2 and JIF5, total citations (NCit), and numbers of publications (NPub). Choosing journals as units of analysis provides us with a rich and well-studied domain.
Our approach based on percentiles can be considered part of the development of “second-generation indicators” for two reasons. First, we build on the first-generation approach that Garfield (1979, 2003, 2006) developed for the selection of journals. Second, the original objective of journal selection is very different from the purposes of research evaluation to which the JIF has erroneously been applied (e.g., Alberts 2013); indicators for research evaluation should accordingly be more sophisticated.
The weighting scheme and the I3* indicator
In this study, we introduce I3*, a variant within the general I3 scheme, by proposing a weighting scheme for percentile classes. We build on Bornmann and Mutz (2011), who counted six percentile classes with weights from one to six. Since that publication, however, several threads of work have clarified the position of the top-10% and top-1% categories as proxies for excellence. On the basis of this literature (e.g., Bornmann 2014), our basic assertion is that a paper in the top-1% class can be weighted at ten times the value of a paper in the top-10% class. Extending this log-linear (factor of ten) step, a top-1% paper weighs 100 times more than a paper at the bottom. This weighting scheme reflects the highly skewed nature of citation distributions. As a second assertion, we add a weighting that distinguishes between papers in the top-50% (weight = 2) and bottom-50% (weight = 1); the factor is only two here because the dividing line between the bottom-50% and top-50% is less pronounced than the line between an averagely cited paper and an exceptionally cited one. The resulting weights for the four disjoint percentile-rank classes are thus 100 (99–100), 10 (90–98), 2 (50–89), and 1 (0–49).
As is the case for all I3 evaluations, I3* is size-dependent: it scales ceteris paribus with journal size. Dividing I3* by the number of documents in the distribution, N = Σ_i n_i, where n_i is the number of papers in percentile class i, yields a size-independent equivalent. Not surprisingly, this latter measure is highly correlated with JIF2 and JIF5. In other words, I3*_j/N_j provides the journal-specific expected I3* value of a paper published in journal j. This value can be used as a benchmark for testing whether the observed citation count for a specific paper is above or below expectation.
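The weighting scheme and its size-independent variant can be written as a short Python sketch. The class labels, the `WEIGHTS` mapping, and the function names are our own illustrative choices; the class counts for PLOS One and RSC Advances are the 2014 values reported later in this paper.

```python
# Weights of the I3* scheme for the four disjoint percentile-rank classes.
WEIGHTS = {"99-100": 100, "90-98": 10, "50-89": 2, "0-49": 1}


def i3_star(class_counts, weights=WEIGHTS):
    """Size-dependent I3*: sum over classes of weight times number of papers."""
    return sum(weights[cls] * n for cls, n in class_counts.items())


def i3_star_per_paper(class_counts, weights=WEIGHTS):
    """Size-independent I3*/N, where N is the total number of papers."""
    return i3_star(class_counts, weights) / sum(class_counts.values())


# Disjoint class counts for 2014 (non-normalized).
plos_one = {"99-100": 91, "90-98": 2454, "50-89": 17596, "0-49": 9901}
rsc_adv = {"99-100": 30, "90-98": 879, "50-89": 5010, "0-49": 2516}

for name, counts in [("PLOS One", plos_one), ("RSC Advances", rsc_adv)]:
    print(name, i3_star(counts), round(i3_star_per_paper(counts), 3))

# Doubling every class count doubles I3* but leaves I3*/N unchanged.
doubled = {cls: 2 * n for cls, n in plos_one.items()}
print(i3_star(doubled) == 2 * i3_star(plos_one),
      i3_star_per_paper(doubled) == i3_star_per_paper(plos_one))
```

With these counts, the sketch yields I3* = 78,733 (I3*/N ≈ 2.621) for PLOS One and I3* = 24,326 (I3*/N ≈ 2.884) for RSC Advances.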
Note that we test expected citation rates against observed ones at the level of a sample (e.g., a journal). Consequently, our approach avoids the “ecological fallacy” of using a journal characteristic as an expected value to compare with observed values derived from the individual papers published in the respective journal (Robinson 1950; Kreft and de Leeuw 1988; cf. Waltman and Traag 2017). The observed values are not estimated on the basis of a journal characteristic, but are measured in order to inform the expectation.
Methods
Data
Data were harvested at the Max Planck Digital Library (MPDL) in-house database of the Max Planck Society during the period October 15–29, 2018. This database contains an analytically enriched copy of the Science Citation Index-Expanded (SCI-E), the Social Sciences Citation Index (SSCI), and the Arts and Humanities Citation Index (AHCI). Citation count data can be normalized for the Clarivate Web of Science Subject Categories (WoS Categories) and could in principle be based on whole-number counting or, for papers with more than one author, fractional counting. The unit of analysis in this study, however, is the individual paper to which citation counts are attributed irrespective of whether the paper is single- or multi-authored.
At the time of data collection, the in-house database contained citations up to the end of 2017. We collected substantive items (articles and reviews) with publication year 2014 and a 3-year citation window to the end of 2017. The results were checked against a similar download for publication year 2009, that is, 5 years earlier. The year 2014 was chosen as the last year with a complete 3-year citation window at the time of this research (October–November 2018); the year 2009 is the first year after the update of WoS to its current version 5.
Non-normalized data
The in-house database contains many more journals than the Journal Citation Reports (JCR), which form the basis for the computation of the JIF. To compare I3*-values with other indicators, we use only the subset of publications in the 11,761 journals contained in the JCR 2014. These journals all have JIFs and other standard indicators. Of these journals, 11,149 are unique in the SCI-E and SSCI, and the overlap between SSCI and SCI-E is 612 journals. Another 207 journals could not be matched unequivocally on the basis of journal name abbreviations in the in-house database and JCR, so that our sample is 10,942 journals. Note that we are using individual-journal attributes, so that the inclusion or exclusion of a specific journal does not affect the values for the other journals under study.
We collected the data as follows. On the basis of the number of papers (articles and reviews, excluding non-academic ephemera such as editorials) in a specific year (in this case, 2014), we identified the threshold number of citations at each class boundary, e.g., the lower boundary of the 1% most frequently cited papers. If there are, for example, a total of 100,000 papers in a year, then by definition 1,000 of them belong to the 1% most cited. If the papers are ranked by descending citation counts, the citation count of the 1,000th paper is the threshold value (Ahlgren et al. 2014). For each journal, the number of its papers with a citation count equal to or exceeding this threshold value can then be counted; in this way, the problem of ties is circumvented. However, more than 1,000 papers may thereby be included in the top-1%, because several papers may have the same citation count as the threshold paper (in 2014, e.g., 1.03% of the papers instead of exactly 1%). The same applies to the other top-x% classes.
In summary, we harvest the top-1%, top-10%, top-50%, and bottom-50% publication scores for each journal by first determining the thresholds of these percentile classes for the entire database and, second, by counting each journal’s participation in the respective layers of the database. Using a dedicated routine, the data are organized in a relational database with JCR-2014 data. The tables resulting from the analyses can be read into standard software (e.g., Excel, SPSS) for further processing and statistical analysis.
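The two steps summarized above (database-wide thresholds, then per-journal counts) can be sketched in Python as follows. The toy database, the journal names "A" and "B", and the rounding rule for the threshold rank k are assumptions made for illustration; the actual routine used with the MPDL database is not reproduced here. The example deliberately contains a tie at the top-1% threshold, so that three papers (1.5%) rather than two (1%) qualify.

```python
from collections import defaultdict

# Cumulative "top-x%" classes, as shares of the whole database.
CLASSES = {"top1": 0.01, "top10": 0.10, "top50": 0.50}


def class_thresholds(all_citations, classes=CLASSES):
    """Citation count of the k-th most-cited paper, with k = share * N (rounded)."""
    ranked = sorted(all_citations, reverse=True)
    n = len(ranked)
    return {label: ranked[max(1, round(share * n)) - 1]
            for label, share in classes.items()}


def journal_counts(papers, thresholds):
    """Count, per journal, all papers and the papers at or above each threshold.

    Papers tied with the threshold value are all included, so a class may
    hold slightly more than x% of the database.
    """
    counts = defaultdict(lambda: {"N": 0, **{label: 0 for label in thresholds}})
    for journal, cites in papers:
        row = counts[journal]
        row["N"] += 1
        for label, threshold in thresholds.items():
            if cites >= threshold:
                row[label] += 1
    return dict(counts)


# Toy database of 200 papers in two hypothetical journals, A and B.
papers = ([("A", 120), ("B", 95), ("A", 95)]
          + [("A", 30)] * 5 + [("B", 30)] * 12
          + [("A", 12)] * 30 + [("B", 12)] * 50
          + [("A", 3)] * 40 + [("B", 3)] * 60)

thresholds = class_thresholds([c for _, c in papers])
print(thresholds)                          # top-1% threshold: the 2nd paper, 95
print(journal_counts(papers, thresholds))  # three papers reach 95: 1.5%, not 1%
```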
Normalized data
Citation counts are also field-normalized in the in-house database using the WoS Categories, because citation rates differ between fields. These field-normalized scores are available at individual document level for all publications since 1980. The I3* indicator calculated with field-normalized data will be denoted as I3*F—pragmatically abbreviating I3*F(99-100, 90-10, 50-2, 0-1) in this case. Some journals are assigned to more than a single WoS category: in these instances, the journal items and their citation counts are fractionally attributed. In the case of ties at the thresholds of a top-x% class of papers (see above), the field-normalized indicators have been calculated following Waltman and Schreiber (2013). Thus, the in-house database shows whether a paper belongs to the top-1%, top-10%, or top-50% of papers in the corresponding WoS Categories. Papers at the threshold separating the top from the bottom are fractionally assigned to the top paper set.
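The fractional treatment of ties can be illustrated with a simplified Python sketch in the spirit of Waltman and Schreiber (2013). It is an illustration under stated assumptions, not the procedure implemented in the in-house database: it handles a single field, ignores journals assigned to several WoS Categories, and uses raw citation counts of hypothetical journals "A" and "B". Papers above the threshold count fully, and papers tied at the threshold share the remaining places, so that the class contains exactly x% of the papers.

```python
import math


def fractional_top_membership(citations, share):
    """Membership (0 to 1) of each paper in the top-`share` class of one field.

    Papers above the threshold count fully; papers tied at the threshold share
    the remaining places, so memberships sum to exactly share * len(citations).
    """
    k = share * len(citations)                  # target class size, may be fractional
    ranked = sorted(citations, reverse=True)
    threshold = ranked[max(1, math.ceil(k)) - 1]
    n_above = sum(1 for c in citations if c > threshold)
    n_tied = sum(1 for c in citations if c == threshold)
    tie_fraction = (k - n_above) / n_tied
    return [1.0 if c > threshold else tie_fraction if c == threshold else 0.0
            for c in citations]


# Toy field of 200 papers; two papers share the citation count at the 1% boundary.
cites = [120, 95, 95] + [30] * 17 + [12] * 80 + [3] * 100
journals = (["A", "B", "A"] + ["A"] * 5 + ["B"] * 12 + ["A"] * 30 + ["B"] * 50
            + ["A"] * 40 + ["B"] * 60)

membership = fractional_top_membership(cites, 0.01)
top1_by_journal = {}
for journal, m in zip(journals, membership):
    top1_by_journal[journal] = top1_by_journal.get(journal, 0.0) + m

print(sum(membership), top1_by_journal)
```

Here the two papers tied at 95 citations each receive half a place, so the top-1% class holds exactly two papers (1.5 for journal A, 0.5 for journal B).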
Statistics
PLOS One data as an example of the calculation of I3*, based on non-normalized and field-normalized values
| Class | In-house database, non-normalized (a) | In-house database, field-normalized (b) | Percentile rank class | Papers in distinct class, non-normalized (c) | Papers in distinct class, field-normalized (d) | Weight | I3*, non-normalized (f) | I3*F, field-normalized (g) |
|---|---|---|---|---|---|---|---|---|
| Top 1% | 91 | 14.000 | 99–100 | 91 | 14.000 | × 100 | 9100 | 1400 |
| Top 10% | 2545 | 926.821 | 90–98 | 2454 | 912.821 | × 10 | 24,540 | 9128.21 |
| Top 50% | 20,141 | 14,853.688 | 50–89 | 17,596 | 13,926.867 | × 2 | 35,192 | 27,853.73 |
| Bottom 50% | 7265 | 14,247.191 | 0–49 | 9901 | 15,188.312 | × 1 | 9901 | 15,188.31 |
| Total | 30,042 | 30,042 | | 30,042 | 30,042 | | 78,733 | 53,570.26 |
The maximal I3* is (30,042 × 100) + (0 × 10) + (0 × 2) + (0 × 1) = 3,004,200, which would be reached if all papers in the journal belonged to the 1% most frequently cited papers in the corresponding fields. With I3*F = 53,570.256, the journal reaches 1.78% of this maximum; without field normalization (I3* = 78,733), this is 2.62%. In other words, there is ample room for improvement.^{4}
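The conversion from the cumulative top-x% counts delivered by the database to the four disjoint classes, and the share of the maximal I3*, can be checked with a few lines of Python. The function names are illustrative; the inputs are the PLOS One values from the table above (the field-normalized counts are fractional).

```python
WEIGHTS = (100, 10, 2, 1)  # classes 99-100, 90-98, 50-89, 0-49


def disjoint_classes(top1, top10, top50, n):
    """Convert cumulative top-x% counts into the four disjoint classes."""
    return (top1, top10 - top1, top50 - top10, n - top50)


def share_of_maximum(classes, weights=WEIGHTS):
    """I3* and its share of the maximum, reached if every paper were top-1%."""
    score = sum(w * c for w, c in zip(weights, classes))
    maximum = weights[0] * sum(classes)
    return score, score / maximum


# PLOS One, 2014: cumulative counts (non-normalized and field-normalized).
raw = disjoint_classes(91, 2545, 20141, 30042)
field = disjoint_classes(14.000, 926.821, 14853.688, 30042)
for label, classes in [("I3*", raw), ("I3*F", field)]:
    score, share = share_of_maximum(classes)
    print(f"{label}: {score:,.3f} ({share:.2%} of maximum)")
```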
As noted, I3* can be divided by N, the number of publications (which is by definition equal to the sum of the numbers in the four percentile classes). I3*/N is based on relative frequencies, since the number in each term (n_{i}) is divided by N (= Σ_{i}n_{i}). One can expect I3*/N to no longer be size-dependent and thus to have applications different from I3*, as we shall show below. We focus on I3* in this paper; we will discuss potential applications of I3*/N in a later paper.
We factor-analyze the following seven indicators:
1. total numbers of publications (NPub);
2. total numbers of citations (NCit);
3. JIF2;
4. JIF5;
5. non-normalized I3*-values (I3*);
6. field-normalized I3*-values (I3*F);
7. I3*/N for the non-normalized case (I3*/N).
The results are shown as factor-plots using the first two components as x- and y-axes. This representation in a two-dimensional map provides a ready means of assessing the results visually.
We chose two components in accordance with our design; the number of eigenvalues larger than one (Kaiser criterion) is also two. In the subsequent analyses, the first two components explain about 85–90% of the variance. Since the distributions are non-normal, Spearman’s rank-order correlations are preferable to Pearson correlations.^{6} Note that the factor analysis is based on Pearson correlations, so the results are, in this respect, approximations. Rotated factor matrices and the percentages of explained variance are also provided for each analysis.
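The difference between Pearson and Spearman correlations on skewed data can be illustrated with a self-contained Python sketch. The helper names and the five data points are hypothetical; in practice, a library routine such as `scipy.stats.spearmanr` would normally be used. Spearman's rho is computed here as Pearson's r on ranks, with tied values receiving their average rank.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical, highly skewed indicator values for five journals.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 1000]
print(round(pearson(x, y), 3), spearman(x, y))
```

The relationship is perfectly monotonic, so Spearman's rho is 1, whereas the single extreme value pulls Pearson's r well below 1.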
Results
Full set (journal count, n = 10,942)
Rotated factor matrix of the seven indicators plotted in Fig. 1
Rotated component matrix

| Indicator | Component 1 | Component 2 |
|---|---|---|
| I3*F | .925 | .284 |
| I3* | .915 | .286 |
| NPub | .870 | −.094 |
| NCit | .802 | .304 |
| JIF5 | .175 | .958 |
| JIF2 | .188 | .957 |
| I3*/N | .162 | .917 |
Spearman rank-order correlations between the variables listed in Table 3
| | NCit | JIF2 | JIF5 | NPub | I3* | I3*F | I3*/N |
|---|---|---|---|---|---|---|---|
| NCit | 1.000 | .766** | .776** | .719** | .816** | .802** | .706** |
| JIF2 | | 1.000 | .924** | .444** | .668** | .638** | .882** |
| JIF5 | | | 1.000 | .417** | .635** | .623** | .848** |
| NPub | | | | 1.000 | .920** | .861** | .420** |
| I3* | | | | | 1.000 | .940** | .697** |
| I3*F | | | | | | 1.000 | .683** |
| I3*/N | | | | | | | 1.000 |
Top 25 journals ranked by non-normalized I3* values (I3*), field-normalized values (I3*F), and size-independent non-normalized values (I3*/N)
Journal | I3* | Journal | I3*F | Journal | I3*/N |
---|---|---|---|---|---|
PLOS One | 78,733 | PLOS One | 53,570.26 | Nat. Rev. Drug Discov. | 90.8 |
J. Am. Chem. Soc. | 55,786 | Nature | 27,397.23 | Physiol. Rev. | 76.8 |
Nature | 52,888 | Phys. Rev. Lett. | 24,909.74 | Nat. Rev. Genet. | 72.5 |
Proc. Natl. Acad. Sci. U. S. A. | 47,041 | Adv. Mater. | 23,741.25 | Prog. Mater. Sci. | 71.9 |
Nat. Commun. | 46,762 | Nat. Commun. | 21,689.35 | Nat. Rev. Mol. Cell Biol. | 70.6 |
Science | 41,946 | Science | 21,493.38 | Nat. Rev. Cancer | 69.8 |
Angew. Chem.-Int. Edit. | 40,572 | Proc. Natl. Acad. Sci. U. S. A. | 21,204.46 | Chem. Rev. | 63.7 |
Adv. Mater. | 34,435 | J. Am. Chem. Soc. | 20,435.29 | Nat. Rev. Neurosci. | 63.1 |
Phys. Rev. Lett. | 30,549 | J. Mater. Chem. A | 18,323.90 | Nat. Rev. Immunol. | 62.1 |
ACS Nano | 29,284 | Nano Lett. | 16,905.86 | N. Engl. J. Med. | 62.0 |
J. Mater. Chem. A | 28,260 | ACS Nano | 16,608.79 | Nature | 61.4 |
Chem. Commun. | 26,209 | Phys. Rev. B | 16,176.17 | Living Rev. Relativ. | 57.7 |
Nano Lett. | 24,717 | Appl. Phys. Lett. | 16,077.51 | Chem. Soc. Rev. | 56.8 |
ACS Appl. Mater. Interfaces | 24,407 | Angew. Chem.-Int. Edit. | 15,136.02 | Lancet | 55.8 |
RSC Adv. | 24,326 | Opt. Express | 14,893.89 | Rev. Mod. Phys. | 55.7 |
Cell | 22,993 | Org. Lett. | 14,819.56 | Cell Stem Cell | 54.6 |
N. Engl. J. Med. | 21,874 | Energy Environ. Sci. | 14,565.50 | Nat. Photonics | 54.4 |
Chem. Soc. Rev. | 21,576 | RSC Adv. | 14,479.96 | Nature Genet. | 54.2 |
Sci Rep | 20,098 | Cell | 13,689.86 | Prog. Polym. Sci. | 53.5 |
Nanoscale | 19,631 | ACS Appl. Mater. Interfaces | 13,678.24 | Nat. Biotechnol. | 53.2 |
Astrophys. J. | 19,130 | Anal. Chem. | 12,548.23 | Psychol. Sci. Public Interest | 52.8 |
Phys. Rev. B | 18,831 | Phys. Chem. Chem. Phys. | 12,459.76 | Cell | 52.7 |
Chem. Rev. | 17,889 | Nanoscale | 12,418.65 | Nat. Med. | 52.5 |
Energy Environ. Sci. | 17,151 | Astrophys. J. | 12,269.03 | Nat. Mater. | 51.3 |
Phys. Rev. D | 16,893 | Chem. Soc. Rev. | 11,710.45 | Cancer Cell | 51.2 |
There may be a disciplinary interaction with normalization: field normalization seems to affect the leading chemistry journals more than others. The Journal of the American Chemical Society (JACS), for example, holds second place on the (left-side) list of I3*, but only eighth place on I3*F. By contrast, leading physics journals seem to rank higher on the normalized indicator. Perhaps these relatively well-cited chemistry journals have a longer-tailed citation distribution than comparable physics journals: normalization (division by N) will have a greater effect with increasing values of N. As noted, the two indicators are highly correlated overall, but the possibility of a disciplinary factor will need further elucidation in future research.
The Social Sciences Citation Index (SSCI)
Rotated factor matrix of the seven indicators plotted in Fig. 4
| Indicator | Component 1 | Component 2 |
|---|---|---|
| JIF2 | .922 | .260 |
| JIF5 | .891 | .277 |
| I3*/N | .872 | .134 |
| NPub | −.032 | .938 |
| I3* | .416 | .848 |
| I3*F | .399 | .848 |
| NCit | .519 | .626 |
Spearman rank-order correlations among the seven indicators under study for 3105 journals in the SSCI
| | NCit | JIF2 | JIF5 | NPub | I3* | I3*F | I3*/N |
|---|---|---|---|---|---|---|---|
| NCit | 1.000 | .799** | .820** | .633** | .775** | .742** | .746** |
| JIF2 | | 1.000 | .881** | .412** | .644** | .626** | .846** |
| JIF5 | | | 1.000 | .397** | .616** | .605** | .797** |
| NPub | | | | 1.000 | .904** | .786** | .403** |
| I3* | | | | | 1.000 | .893** | .704** |
| I3*F | | | | | | 1.000 | .693** |
| I3*/N | | | | | | | 1.000 |
Rotated factor matrices for two specialist WoS categories, one each from the SSCI and the SCI-E
Rotated component matrices

| Indicator (Library and Information Science, 83 journals) | Component 1 | Component 2 | Indicator (Spectroscopy, 41 journals) | Component 1 | Component 2 |
|---|---|---|---|---|---|
| JIF5 | 0.959 | 0.170 | I3* | 0.982 | |
| JIF2 | 0.927 | 0.215 | I3*F | 0.962 | |
| I3*/N | 0.799 | 0.265 | NPub | 0.959 | −0.113 |
| NCit | 0.757 | 0.527 | NCit | 0.771 | 0.167 |
| NPub | | 0.968 | JIF2 | | 0.987 |
| I3* | 0.41 | 0.903 | JIF5 | | 0.977 |
| I3*F | 0.444 | 0.861 | I3*/N | | 0.930 |
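While the number of publications drives the number of citations in the SCI-E, this appears to be less the case in the SSCI: in the SSCI, NCit loads on both components (.519 and .626), whereas in the full set it loads mainly on the size component (.802 versus .304). Size is thus less important for impact in the SSCI than in the SCI-E. In the social sciences, I3* correlates with size (NPub) more than with citations (NCit).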
Comparison with 2009
It is possible that the results obtained for 2014 were specific to that year, because 2014 is relatively recent and the citation counts were not yet stable. We tested this by repeating the analysis for 2009, which was chosen because the WoS (version 5) was reorganized in 2008/2009.
Rotated factor matrices for full sets in 2009 and 2014
Rotated component matrices

| Indicator (JCR 2009: 8904 journals) | Component 1 | Component 2 | Indicator (JCR 2014: 10,942 journals) | Component 1 | Component 2 |
|---|---|---|---|---|---|
| NPub | 0.904 | −0.048 | I3*F | 0.925 | 0.284 |
| I3*F | 0.903 | 0.276 | I3* | 0.915 | 0.286 |
| I3* | 0.884 | 0.329 | NPub | 0.870 | −0.094 |
| NCit | 0.87 | 0.259 | NCit | 0.802 | 0.304 |
| JIF5 | 0.212 | 0.949 | JIF5 | 0.175 | 0.958 |
| JIF2 | 0.201 | 0.939 | JIF2 | 0.188 | 0.957 |
| I3*/N | 0.145 | 0.923 | I3*/N | 0.162 | 0.917 |
Two factors explain 88.1% of the variance in 2009 and 87.5% in 2014. Figure 5 shows the 2-component plot for 2009. The results are virtually identical in these two sample years. Thus, the indicator appears to be robust over time.
Statistics
PLOS One was by far the largest journal in 2014 with 30,042 publications. It was followed in this analysis by RSC Advances with 8435 citable items. In terms of total citations, however, PLOS One is in eighth place with 332,716 citations. In the same year, Nature accrued 617,363 citations to 862 publications. The simple citations/publication (c/p) ratio is 716.2 for Nature and 11.1 for PLOS One. By comparison, the values of I3*/N are 61.4 for Nature and 2.6 for PLOS One and, in seeming contradiction to conventional indicators, the (non-normalized) I3* values are 78,733 for PLOS One and 52,888 for Nature.
Comparison of PLOS One with expected values
| PLOS One (a) | Observed (b) | Expected (c) | Classes (d) | Observed (e) | Expected (f) | I3* (g) | I3*_exp (h) | Standardized residual of χ² (i) | Significance (j) | χ² (k) |
|---|---|---|---|---|---|---|---|---|---|---|
| Top-1% | 91 | 300.42 | 99–100 | 91 | 300.42 | 9100 | 30,042 | −6.10 | p < .001 | 7498.42 |
| Top-10% | 2545 | 3004.2 | 90–98 | 2454 | 2703.78 | 24,540 | 27,038 | 0.82 | n.s. | 135.94 |
| Top-50% | 20,141 | 15,021 | 50–89 | 17,596 | 12,016.8 | 35,192 | 24,034 | 4.96 | p < .001 | 4958.68 |
| Bottom-50% | | 15,021 | 0–49 | 9901 | 15,021 | 9901 | 15,021 | −1.18 | n.s. | 282.45 |
| Sum | 30,042 | 33,346.62 | | 30,042 | 30,042 | 78,733 | 96,135 | | χ² = | 12,875.49 |

χ² = 12,875.49, df = 3, p < .001; Cramér’s V = 0.271
| PLOS One (a) | I3*/N obs. (l) | I3*/N exp. (m) | p(obs) (o) | p(exp) (p) | z test (q) | Significance (r) | Cohen’s w (s) | Cohen’s h (t) |
|---|---|---|---|---|---|---|---|---|
| Top-1% | 0.303 | 1.000 | 0.0030 | 0.01 | −0.086 | n.s. | 0.005 | −0.090 |
| Top-10% | 0.817 | 0.900 | 0.0817 | 0.09 | −0.028 | n.s. | 0.001 | −0.030 |
| Top-50% | 1.171 | 0.800 | 0.5857 | 0.4 | 0.265 | n.s. | 0.086 | 0.374 |
| Bottom-50% | 0.330 | 0.500 | 0.3296 | 0.5 | −0.265 | n.s. | 0.058 | −0.348 |
| Sum | 2.62 | 3.20 | 1.0000 | 1 | | | 0.387 | |
| Critical values | χ² (df = 3) | z |
|---|---|---|
| p < 0.001 | 16.266 | 3.291 |
| p < 0.01 | 11.345 | 2.576 |
| p < 0.05 | 7.815 | 1.96 |
Comparison of PLOS One with RSC Advances
| PLOS One versus RSC Advances (a) | Unit 1 (b) | Unit 2 (c) | Classes (d) | n1 (e) | n2 (f) | I3*1 (g) | I3*2 (h) | Standardized residual of χ² (i) | Significance (j) | χ² (k) |
|---|---|---|---|---|---|---|---|---|---|---|
| Top-1% | 91 | 30 | 99–100 | 91 | 30 | 9100 | 3000 | 1.196 | n.s. | 9.493 |
| Top-10% | 2545 | 909 | 90–98 | 2454 | 879 | 24,540 | 8790 | 4.621 | p < .001 | 141.686 |
| Top-50% | 20,141 | 5919 | 50–89 | 17,596 | 5010 | 35,192 | 10,020 | −2.802 | p < .01 | 52.113 |
| Bottom-50% | | | 0–49 | 9901 | 2516 | 9901 | 2516 | −3.404 | p < .001 | 76.881 |
| Sum | 30,042 | 8435 | | 30,042 | 8435 | 78,733 | 24,326 | | χ² = | 280.173 |

χ² = 280.173, df = 3, p < .001; Cramér’s V = 0.0521
| PLOS One versus RSC Advances (a) | I3*/N unit 1 (l) | I3*/N unit 2 (m) | p1 (o) | p2 (p) | z test (q) | Significance (r) | Cohen’s w (s) | Cohen’s h (t) |
|---|---|---|---|---|---|---|---|---|
| Top-1% | 0.303 | 0.356 | 0.003 | 0.004 | 0.765 | n.s. | 0.000 | 0.009 |
| Top-10% | 0.817 | 1.042 | 0.082 | 0.104 | 6.498 | p < .001 | 0.005 | 0.078 |
| Top-50% | 1.171 | 1.188 | 0.586 | 0.594 | 1.358 | n.s. | 0.000 | 0.017 |
| Bottom-50% | 0.330 | 0.298 | 0.330 | 0.298 | −5.432 | p < .001 | 0.003 | −0.067 |
| Sum | 2.621 | 2.884 | 1.000 | 1.000 | | | 0.091 | |
| Critical values | χ² (df = 3) | z |
|---|---|---|
| p < 0.001 | 16.266 | 3.291 |
| p < 0.01 | 11.345 | 2.576 |
| p < 0.05 | 7.815 | 1.96 |
The results of the chi-squared tests are statistically significant (p < .001), both when comparing PLOS One with the expectation and when comparing PLOS One with RSC Advances. The chi-squared results can be summarized ex post using Cramér’s V, which conveniently ranges from zero to one. In this case, Cramér’s V = 0.27 in Table 10 and Cramér’s V = 0.05 in Table 11. In other words, the difference between the observed and expected percentile-rank distributions of PLOS One is more than five times larger than the corresponding difference between PLOS One and RSC Advances. (The template provides these values automatically.) The chi-squared values based on the I3* values (columns g and h in both Tables 10 and 11) are provided at the bottom of column k.
While the chi-squared statistic provides a test for comparing the entire distributions (two vectors of four classes), the decomposition of chi-squared into standardized residuals \(\frac{\text{observed} - \text{expected}}{\sqrt{\text{expected}}}\) provides a statistic for each class. Standardized residuals can be considered as z-values: they are significant at the 5% level if the absolute value is larger than 1.96, at the 1% level for an absolute value > 2.576, and at the 1‰ level for an absolute value > 3.291 (Sheskin 2011, at p. 672). These statistics are provided in columns i and j (value and significance level, respectively).
Furthermore, the residuals are signed and indicate (in Table 10, for example) that PLOS One scores significantly below expectation in the top-1% class (p < .001) but above expectation in the top-50% class (p < .001). The overall distribution over the percentile classes (including the vertical direction of columns e and f) is statistically significant at the 1‰ level: the journal as a whole performs significantly below expectation in terms of I3*. (Note that each of the four class contributions to the chi-squared in column k is based on two observations, since eight cells are used in the computation of the chi-squared.)
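The chi-squared statistic, its degrees of freedom, Cramér’s V, and the cell-wise standardized residuals discussed above can be computed from two rows of four I3* values, as in the following Python sketch. This is our own illustration, not the Excel template: the function names are hypothetical, the p-value uses the closed-form upper tail of the chi-squared distribution with 3 degrees of freedom, and the expected I3* values are derived from expected shares of 1%, 9%, 40%, and 50%. With the PLOS One and RSC Advances counts, it gives χ² ≈ 12,875.5 with V ≈ 0.271 (Table 10) and χ² ≈ 280.2 with V ≈ 0.052 (Table 11). The residuals in column i are considerably smaller in magnitude than the raw cell residuals returned here, so the template apparently scales them differently; the sketch does not attempt to reproduce that scaling.

```python
import math


def chi_square_two_rows(row1, row2):
    """Pearson chi-squared test of homogeneity for a 2 x k table.

    Returns (chi2, degrees of freedom, Cramer's V, standardized residuals),
    where each residual is (observed - expected) / sqrt(expected) for one cell.
    """
    rows = [list(row1), list(row2)]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(row1, row2)]
    n = sum(row_totals)
    chi2, residuals = 0.0, []
    for row, r_tot in zip(rows, row_totals):
        row_res = []
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / n             # expected under homogeneity
            row_res.append((observed - expected) / math.sqrt(expected))
            chi2 += (observed - expected) ** 2 / expected
        residuals.append(row_res)
    dof = len(col_totals) - 1                        # (2 - 1) * (k - 1)
    cramers_v = math.sqrt(chi2 / (n * (min(2, len(col_totals)) - 1)))
    return chi2, dof, cramers_v, residuals


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) of the chi-squared distribution with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


WEIGHTS = [100, 10, 2, 1]                 # classes 99-100, 90-98, 50-89, 0-49
EXPECTED_SHARES = [0.01, 0.09, 0.40, 0.50]
plos = [91, 2454, 17596, 9901]            # PLOS One, 2014
rsc = [30, 879, 5010, 2516]               # RSC Advances, 2014

plos_i3 = [w * c for w, c in zip(WEIGHTS, plos)]
expected_i3 = [w * s * sum(plos) for w, s in zip(WEIGHTS, EXPECTED_SHARES)]
rsc_i3 = [w * c for w, c in zip(WEIGHTS, rsc)]

for label, other in [("vs. expectation", expected_i3), ("vs. RSC Advances", rsc_i3)]:
    chi2, dof, v, _ = chi_square_two_rows(plos_i3, other)
    print(f"PLOS One {label}: chi2 = {chi2:,.2f}, df = {dof}, "
          f"p = {chi2_sf_df3(chi2):.3g}, Cramer's V = {v:.3f}")
```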
In Table 11, RSC Advances scores statistically significantly higher than PLOS One in the top-10% (columns i and j) and significantly lower in the two lower-ranked classes. Tables 10b and 11b add the statistics for I3*/N. The division by N makes all the frequencies relative. Since these relative frequencies can also be considered as proportions, one can use a z test for the difference between proportions (Sheskin 2011, pp. 656f.) or compute an effect size using Cohen’s w (1988, at p. 216; Leydesdorff et al. 2019).
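The z test for the difference between two proportions and Cohen’s h can be sketched as follows. We assume the textbook pooled two-proportion z test and take h as the difference between the arcsine-transformed proportions of unit 2 and unit 1; the function names are ours. With the class counts of PLOS One (unit 1) and RSC Advances (unit 2), these assumptions give values that agree with columns q, r, and t of Table 11b to about three decimals.

```python
import math

CLASSES = ["99-100", "90-98", "50-89", "0-49"]
plos = [91, 2454, 17596, 9901]   # unit 1: PLOS One, 2014
rsc = [30, 879, 5010, 2516]      # unit 2: RSC Advances, 2014


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for p2 - p1."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


def cohens_h(p1, p2):
    """Cohen's h (arcsine-transformed difference), positive if p2 > p1."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))


def significance(z):
    """Two-sided significance label using the critical values of z."""
    for critical, label in [(3.291, "p < .001"), (2.576, "p < .01"), (1.96, "p < .05")]:
        if abs(z) > critical:
            return label
    return "n.s."


n1, n2 = sum(plos), sum(rsc)
for cls, x1, x2 in zip(CLASSES, plos, rsc):
    z = two_proportion_z(x1, n1, x2, n2)
    h = cohens_h(x1 / n1, x2 / n2)
    print(f"{cls:>6}: z = {z:7.3f} ({significance(z)}), h = {h:6.3f}")
```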
The z-values in column q of Table 10b show that PLOS One scores above expectation in the percentile class between 50 and 89, but this difference is not statistically significant (column r). PLOS One also scores below expectation in the top-1%, top-10%, and bottom-50% classes, but none of these differences is statistically significant either.
These results may come as no surprise, but cases other than PLOS One may offer less intuitive results about the status of a journal. For example, specification of the differences between RSC Advances and Nature in terms of these four classes would be far from obvious. The template available at https://www.leydesdorff.net/I3/template.xlsx automatically fills out the numbers and significance levels when the user provides the field-normalized and non-normalized values for top-1%, top-10%, top-50%, and total number of papers in the respective cells.
In order to have information about the significance of the results on the basis of effect sizes (Cohen 1988; Schneider 2013; Wasserstein and Lazar 2016; Williams and Bornmann 2014), we added Cohen’s w and h for the comparison among proportions as columns s and t of Tables 10b and 11b. The w index is 0.4 (0.387) in Table 10b, a medium effect by Cohen’s conventions (0.1 small, 0.3 medium, 0.5 large); thus, the difference between PLOS One and its expected citation rates in these four categories is meaningful for practical purposes. This is not the case for the difference between the two journals: w = 0.1 (0.091). The values of h accord with those of the z test for each of the classes.
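Cohen’s w and h can be computed from proportions alone, as in the following Python sketch (function names are ours). For the comparison between the two journals, RSC Advances’ proportions are taken as the reference distribution; this is an assumption on our part that reproduces the w = 0.091 of Table 11b, while PLOS One versus its expectation gives w ≈ 0.387 (Table 10b). The magnitude labels follow Cohen’s (1988) conventional benchmarks of 0.1 (small), 0.3 (medium), and 0.5 (large).

```python
import math


def cohens_w(props, reference_props):
    """Cohen's w: sqrt of sum over classes of (p - p_ref)**2 / p_ref."""
    return math.sqrt(sum((p - r) ** 2 / r for p, r in zip(props, reference_props)))


def cohens_h(p_ref, p):
    """Cohen's h: difference of arcsine-transformed proportions (p minus p_ref)."""
    return 2 * math.asin(math.sqrt(p)) - 2 * math.asin(math.sqrt(p_ref))


def magnitude(w):
    """Label an effect size using Cohen's conventional benchmarks for w."""
    if w >= 0.5:
        return "large"
    if w >= 0.3:
        return "medium"
    if w >= 0.1:
        return "small"
    return "below small"


plos = [91, 2454, 17596, 9901]            # classes 99-100, 90-98, 50-89, 0-49
rsc = [30, 879, 5010, 2516]
expected = [0.01, 0.09, 0.40, 0.50]

p_plos = [c / sum(plos) for c in plos]
p_rsc = [c / sum(rsc) for c in rsc]

w_expected = cohens_w(p_plos, expected)   # PLOS One vs. its expectation
w_journals = cohens_w(p_plos, p_rsc)      # RSC Advances as the reference
h_expected = [cohens_h(e, p) for e, p in zip(expected, p_plos)]

print(round(w_expected, 3), magnitude(w_expected))
print(round(w_journals, 3), magnitude(w_journals))
print([round(h, 3) for h in h_expected])
```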
It should be kept in mind that the tests on proportions address the size-independent indicator I3*/N. As noted, this measure can be used as the expected I3* value of a paper published in the relevant journal. In other words, a paper accepted for publication in RSC Advances has a statistically significantly greater likelihood of being cited in the overall top-10% than a paper in PLOS One. It is also less likely to be cited below the 50% threshold.
Effects of different weighting schemes
Weighting schemes have a substantial effect on the outcome and interpretation of the analysis of categorized data, and weighting introduces a level of subjectivity. Within the general scheme of I3, variants can be adapted to the context of the evaluation. For example, if the focus is solely on research excellence, the percentile classes reflecting high impact can be given higher weights; reducing the weights of the high-impact classes places relatively more emphasis on productivity.
What happens if, instead of the logarithmic set, we use the linear set of Bornmann and Mutz (2011) specified in Table 1 or the respective quantile values as used by Leydesdorff and Bornmann (2011)? Because our data are categorized in four classes, the linear set reduces to a weight of 6 for the top-1% papers, 4 for the top-10%, 2 for the top-50%, and 1 for the bottom-50%. Bornmann and Mutz (2011) used two additional classes: the top-5% papers were weighted with 5, and the class between the 50th and 89th percentiles was divided into 75–89 (weighted with 3) and 50–74 (weighted with 2). The analysis is now less sensitive: with a linear scale, the information gain of I3* is considerably reduced.
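To see why the linear set is less sensitive, one can apply both weight sets to the same class counts. The Python sketch below assumes that each paper is counted once, in the highest class it reaches; the two journals and their counts are hypothetical. Journal A follows the expected distribution (1%, 9%, 40%, 50%), whereas journal B has no top-1% papers but more papers in the top-10% and top-50% classes.

```python
# Four mutually exclusive percentile classes (assumption: each paper is counted
# once, in the highest class it reaches).
CLASSES = ("top1", "top10", "top50", "bottom50")

LOG_WEIGHTS = {"top1": 100, "top10": 10, "top50": 2, "bottom50": 1}   # I3*
LINEAR_WEIGHTS = {"top1": 6, "top10": 4, "top50": 2, "bottom50": 1}   # four-class linear set


def weighted_score(counts: dict[str, int], weights: dict[str, int]) -> int:
    """Sum of class counts multiplied by class weights (an I3-type score)."""
    return sum(counts[c] * weights[c] for c in CLASSES)


def excellence_share(counts: dict[str, int], weights: dict[str, int]) -> float:
    """Fraction of the weighted score contributed by the top-1% and top-10% classes."""
    top = counts["top1"] * weights["top1"] + counts["top10"] * weights["top10"]
    return top / weighted_score(counts, weights)


# Hypothetical journals with N = 1,000 papers each.
journal_a = {"top1": 10, "top10": 90, "top50": 400, "bottom50": 500}
journal_b = {"top1": 0, "top10": 120, "top50": 420, "bottom50": 460}

if __name__ == "__main__":
    for label, weights in (("logarithmic", LOG_WEIGHTS), ("linear", LINEAR_WEIGHTS)):
        for name, counts in (("A", journal_a), ("B", journal_b)):
            n = sum(counts.values())
            score = weighted_score(counts, weights)
            print(f"{label:>11} {name}: I3 = {score}, I3/N = {score / n:.2f}, "
                  f"excellence share = {excellence_share(counts, weights):.2f}")
```

With the logarithmic weights, A scores 3,200 and B 2,500; with the linear weights the order reverses (1,720 versus 1,780), because the top-1% class no longer carries enough weight to offset B’s larger top-10% and top-50% classes. This constructed case only illustrates the mechanism; it is not a result from the data analyzed here.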
Rotated factor matrix^{a} using six percentile ranks (Bornmann and Mutz 2011) and using quantile values (Leydesdorff and Bornmann 2011)

| Variable | Six percentile ranks: component 1 | Six percentile ranks: component 2 | Quantile values: component 1 | Quantile values: component 2 |
|---|---|---|---|---|
| I3*F | **.987** | .106 | **.978** | .147 |
| I3* | **.987** | .105 | **.981** | .121 |
| NPub | **.976** | .006 | **.964** | −.001 |
| NCit | **.642** | *.418* | **.660** | *.404* |
| JIF5 | .084 | **.969** | .087 | **.963** |
| JIF2 | .093 | **.968** | .096 | **.963** |
| I3*/N | .150 | **.888** | .184 | **.820** |

Major factor loadings are boldfaced; italics indicate interfactorial complexity
Extraction method: Principal component analysis
Rotation method: Varimax with Kaiser normalization
^{a}Rotation converged in 3 iterations (both analyses)
Recall that we found a similar effect for the social sciences (SSCI compared with SCI), particularly when we focused on the 83 journals in the LIS category. In that case, the effect was caused by a difference in the data; in the general case, the cause is the (mis)specification of a model that does not give appropriate attention to the skew of the distribution.
Summary and conclusions
We argue in this paper that an indicator can be developed that reflects both impact and output and that combines the two dimensions of publications and citations into a single measure by using non-parametric statistics. The generic Integrated Impact Indicator I3 is a sum of weighted publication numbers in different percentile classes. The indicator can be used very flexibly with a range of percentile classes and weights. Depending on the chosen parameters, I3 can be made more output- or more impact-orientated. In this study, we introduced I3* = I3(99-100, 90-10, 50-2, 0-1), in which each pair specifies the lower percentile bound of a class and the weight given to papers in that class. I3* categorises and weights papers in the higher citation-impact range in a more informed way, given the skew of the distribution, than the indicator proposed by Bornmann and Mutz (2011) and the quantile-based approach elaborated by Leydesdorff and Bornmann (2011).
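The threshold-weight notation can be read as a small function. The Python sketch below assumes that percentile ranks (0 to 100) have already been computed for each paper against the reference set (computing them is a separate step), that a paper at or above a threshold receives the weight of the highest threshold it reaches, and that papers below all thresholds receive zero. Under this reading, I3(90-1) counts the top-10% papers, as stated in footnote 8, and dividing by N gives the size-independent variant. The function names and the example percentiles are hypothetical.

```python
from bisect import bisect_right

# I3*(99-100, 90-10, 50-2, 0-1) and P_top10% = I3(90-1) as (threshold, weight) pairs.
I3_STAR = [(99, 100), (90, 10), (50, 2), (0, 1)]
P_TOP10 = [(90, 1)]


def i3(percentiles: list[float], spec: list[tuple[float, int]]) -> int:
    """Weighted count of papers: each paper gets the weight of the highest
    threshold its percentile rank reaches (>=); below all thresholds -> 0."""
    pairs = sorted(spec)                      # ascending thresholds
    thresholds = [t for t, _ in pairs]
    weights = [w for _, w in pairs]
    total = 0
    for p in percentiles:
        i = bisect_right(thresholds, p) - 1   # index of highest threshold <= p
        if i >= 0:
            total += weights[i]
    return total


def i3_per_paper(percentiles: list[float], spec: list[tuple[float, int]]) -> float:
    """Size-independent variant I3/N."""
    return i3(percentiles, spec) / len(percentiles)


if __name__ == "__main__":
    papers = [99.5, 95.0, 91.0, 70.0, 55.0, 40.0, 10.0, 5.0]  # hypothetical ranks
    print("I3*       =", i3(papers, I3_STAR))
    print("I3*/N     =", i3_per_paper(papers, I3_STAR))
    print("P_top10%  =", i3(papers, P_TOP10))
    print("PP_top10% =", i3_per_paper(papers, P_TOP10))
```

For these eight hypothetical papers, I3* = 127 and P_top10% = 3. If every paper lies below the 50th percentile, I3* equals N, which is the minimum noted in footnote 4 for PLOS One.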
I3* can be size-normalized by dividing its value by the total number of publications; this yields a secondary indicator that expresses the expected contribution of a single paper given the journal’s characteristics. The size-dependent (I3*) and size-independent (I3*/N) indicators can be considered as relating to two nearly orthogonal axes. When we consider the relationship between conventional journal indicators and these new indicators, we see that I3* correlates strongly with both the total number of citations and the total number of publications, whereas I3*/N correlates with size-independent indicators such as the JIF.
The journal impact factor developed by Garfield and Sher (1963) was originally intended as a journal statistic of value to publishers and librarians for portfolio management. It was not intended for research evaluation, but it has increasingly been employed for this purpose and mistakenly used as a benchmark for individual researchers and their research output. An average citation rate over two (JIF2) or five (JIF5) years is not representative of the journal as a whole. The JIF can be used as one indicator of the reputation or status of a journal, subject to appropriate contextual considerations, but it cannot be used as an impact value for single papers (Pendlebury and Adams 2012; Bornmann and Williams 2017; Leydesdorff et al. 2016a, b; cf. Waltman and Traag 2017).
Can the I3* indicator be compared with the h-index? Only to the extent that both indicators combine the measurement of output and impact into a single number. However, the h-index is mathematically inconsistent, it overrides discipline-specific cultural and other considerations, and its observed values cannot be tested systematically against expected ones. By contrast, I3* can be analyzed using various statistical tests or power analysis, depending on the context in which one wishes to use the indicator. Furthermore, unlike the h-index, I3* provides not only a single value but also four additional reference values with performance information for the different impact classes. This information can be compared with expected values and between different publication sets (e.g., of two or more institutions). Thus, I3* can be used as a single number (e.g., for policy purposes), but it can also be decomposed into the contributions of the percentile rank classes (e.g., the top-10% group). Importantly, error terms can be specified on the basis of statistics.
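Testing observed against expected values and decomposing I3* into class contributions can both be done from the four class counts. The Python sketch below assumes mutually exclusive classes with expected shares of 1%, 9%, 40%, and 50%, and uses invented counts; it computes a Pearson chi-squared goodness-of-fit statistic with three degrees of freedom and the contribution of each class to I3*. The p-value uses the closed-form upper tail of the chi-squared distribution for three degrees of freedom; `scipy.stats.chisquare` would return the same statistic if SciPy is available.

```python
import math

EXPECTED_SHARES = {"top1": 0.01, "top10": 0.09, "top50": 0.40, "bottom50": 0.50}
WEIGHTS = {"top1": 100, "top10": 10, "top50": 2, "bottom50": 1}  # I3* weights


def chi2_statistic(counts: dict[str, int], shares: dict[str, float]) -> float:
    """Pearson chi-squared: sum over classes of (observed - expected)^2 / expected."""
    n = sum(counts.values())
    return sum((counts[c] - n * s) ** 2 / (n * s) for c, s in shares.items())


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X >= x) for chi-squared with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def decompose(counts: dict[str, int], weights: dict[str, int]) -> dict[str, int]:
    """Contribution of each percentile class to the I3* total."""
    return {c: counts[c] * w for c, w in weights.items()}


if __name__ == "__main__":
    counts = {"top1": 5, "top10": 65, "top50": 430, "bottom50": 500}  # hypothetical
    stat = chi2_statistic(counts, EXPECTED_SHARES)
    print(f"chi2 = {stat:.2f}, p = {chi2_sf_df3(stat):.4f} (df = 3)")
    parts = decompose(counts, WEIGHTS)
    total = sum(parts.values())
    for c, v in parts.items():
        print(f"{c:>9}: {v:5d} ({v / total:.1%} of I3*)")
    print("I3* =", total, " I3*/N =", total / sum(counts.values()))
```

For proportions, the chi-squared statistic equals N times the square of Cohen’s w, so the significance test and the effect size describe the same deviation from expectation: the test depends on N, whereas the effect size does not.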
The versatility of I3* is illustrated in an Excel spreadsheet containing a template for the computation at https://www.leydesdorff.net/i3/template.xlsx. The P_{top10%} and PP_{top10%} indicators have become established as quasi-standard indicators in professional bibliometrics, especially when research institutions are compared (Waltman et al. 2012). The use of these percentile-based indicators is recommended, for instance, in the Leiden Manifesto, which formulates ten guiding principles for research evaluation (Hicks et al. 2015).^{8} An advantage of the I3* indicator, which is also percentile-based, is that it integrates the top-1% with the top-10% information and combines them with information about the other percentile classes. Thus, I3* provides a broader picture than P_{top10%} and PP_{top10%}.
The almost weekly invention of a new h-type indicator signals that many innovative analysts are not aware of a central problem that bibliometric data share with other forms of collected data: indicators necessarily generate error, both in source measurement and through analytical methodology (Leydesdorff et al. 2016b, pp. 2144f.). Consequently, one should not underestimate the need to elaborate, test, and report on algorithms and their analytics, both empirically and statistically. Elegance on purely mathematical (that is, a priori) grounds is not a sufficient condition for claiming scientometric utility (Ye and Leydesdorff 2014).
Perspectives for further research
The convergent validity of different (field-normalized) indicators can be investigated by comparing the indicators with assessments by peers (Bornmann et al. 2019). Peer assessments of papers published in the biomedical area are available in the F1000Prime database (see https://f1000.com/prime). High correlations between quantitative and qualitative assessments signal the convergent validity of bibliometric indicators; indicators with such validity should be preferred in the practice of research evaluation. Bornmann and Leydesdorff (2013) correlated different indicators with assessments by peers provided in the F1000Prime database. The results showed, for instance, that “Percentile in Subject Area achieves the highest correlation with F1000 ratings” (p. 286). In a follow-up study, Bornmann et al. (2019) use a similar design to examine whether the I3* indicators also have convergent validity.
Footnotes
- 1.
A journal that publishes many items that do not report substantive research, but nonetheless attract citations, can inflate its JIF (Moed and van Leeuwen 1996).
- 2.
- 3.
- 4.
Analogously, the minimal I3* that PLOS One 2014 could reach is 30,042: all publications would in this case belong to the bottom-50% and thus each be weighted with one (0 * 100 + 0 * 10 + 0 * 2 + 30,042 * 1 = 30,042).
- 5.
We also checked an oblique rotation; the results are very similar.
- 6.
A non-parametric alternative would be to use multidimensional scaling (MDS, Schiffman et al. 1981).
- 7.
Cohen’s h compares the observed and expected proportions of each row using \(h = 2\arcsin\sqrt{p_{\text{observed}}} - 2\arcsin\sqrt{p_{\text{expected}}}\) (Cohen 1988, pp. 180 ff.), whereas Cohen’s w first sums the normalized squared differences over the rows and then takes the square root (Cohen 1988, pp. 216f.): \(w = \sqrt{\sum_{i=1}^{m} \frac{\left(p_{i}(\text{observed}) - p_{i}(\text{expected})\right)^{2}}{p_{i}(\text{expected})}}\).
- 8.
As explained above, I3(90-1) is the notation for P_{top10%} whereas PP_{top10%} can be written as I3(90-1)/N.
Notes
Acknowledgements
The bibliometric data used in this paper are from an in-house database developed and maintained in collaboration with the Max Planck Digital Library (MPDL, Munich) of the Max Planck Society, and derived from the Science Citation Index Expanded (SCI-E), the Social Sciences Citation Index (SSCI), and the Arts and Humanities Citation Index (AHCI) prepared by Clarivate Analytics (Philadelphia, Pennsylvania, USA). We are also grateful to ISI/Clarivate Analytics for providing one of us with JCR data.
References
- Ahlgren, P., Persson, O., & Rousseau, R. (2014). An approach for efficient online identification of the top-k percent most cited documents in large sets of Web of Science documents. ISSI Newsletter, 10(4), 81–89.
- Alberts, B. (2013). Impact factor distortions. Science, 340(6134), 787.
- Antonoyiannakis, M. (2018). Impact factors and the central limit theorem: Why citation averages are scale dependent. Journal of Informetrics, 12(4), 1072–1088.
- Archambault, É., & Larivière, V. (2009). History of the journal impact factor: Contingencies and consequences. Scientometrics, 79(3), 635–649.
- Bensman, S. J. (2007). Garfield and the impact factor. Annual Review of Information Science and Technology, 41(1), 93–155.
- Bornmann, L. (2014). How are excellent (highly cited) papers defined in bibliometrics? A quantitative analysis of the literature. Research Evaluation, 23(2), 166–173.
- Bornmann, L., De Moya Anegón, F., & Leydesdorff, L. (2012). The new excellence indicator in the World Report of the SCImago Institutions Rankings 2011. Journal of Informetrics, 6(2), 333–335. https://doi.org/10.1016/j.joi.2011.11.006.
- Bornmann, L., & Leydesdorff, L. (2013). The validation of (advanced) bibliometric indicators through peer assessments: A comparative study using data from InCites and F1000. Journal of Informetrics, 7(2), 286–291. https://doi.org/10.1016/j.joi.2012.12.003.
- Bornmann, L., & Mutz, R. (2011). Further steps towards an ideal method of measuring citation performance: The avoidance of citation (ratio) averages in field-normalization. Journal of Informetrics, 5(1), 228–230.
- Bornmann, L., Mutz, R., & Daniel, H.-D. (2008). Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine. Journal of the American Society for Information Science and Technology, 59(5), 830–837. https://doi.org/10.1002/asi.20806.
- Bornmann, L., Mutz, R., Hug, S. E., & Daniel, H.-D. (2011a). A multilevel meta-analysis of studies reporting correlations between the h index and 37 different h index variants. Journal of Informetrics, 5(3), 346–359.
- Bornmann, L., Mutz, R., Marx, W., Schier, H., & Daniel, H.-D. (2011b). A multilevel modelling approach to investigating the predictive validity of editorial decisions: Do the editors of a high profile journal select manuscripts that are highly cited after publication? Journal of the Royal Statistical Society: Series A (Statistics in Society), 174(4), 857–879.
- Bornmann, L., Tekles, A., & Leydesdorff, L. (2019). How well does I3 perform for impact measurement compared to other bibliometric indicators? The convergent validity of several (field-normalized) indicators. Scientometrics. https://doi.org/10.1007/s11192-019-03071-6.
- Bornmann, L., & Williams, R. (2017). Can the journal impact factor be used as a criterion for the selection of junior researchers? A large-scale empirical study based on ResearcherID data. Journal of Informetrics, 11(3), 788–799. https://doi.org/10.1016/j.joi.2017.06.001.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
- Egghe, L. (2008). Mathematical theory of the h- and g-index in case of fractional counting of authorship. Journal of the American Society for Information Science and Technology, 59(10), 1608–1616.
- Egghe, L., & Rousseau, R. (1990). Introduction to informetrics. Amsterdam: Elsevier.
- Frandsen, T. F., & Rousseau, R. (2005). Article impact calculated over arbitrary periods. Journal of the American Society for Information Science and Technology, 56(1), 58–62.
- Garfield, E. (1955). Citation indexes for science: A new dimension in documentation through association of ideas. Science, 122(3159), 108–111.
- Garfield, E. (1972). Citation analysis as a tool in journal evaluation. Science, 178(4060), 471–479.
- Garfield, E. (1979). Is citation analysis a legitimate evaluation tool? Scientometrics, 1(4), 359–375.
- Garfield, E. (2003). The meaning of the impact factor. Revista Internacional de Psicologia Clinica y de la Salud, 3(2), 363–369.
- Garfield, E. (2006). The history and meaning of the journal impact factor. JAMA, 295(1), 90–93.
- Garfield, E., & Sher, I. H. (1963). New factors in the evaluation of scientific literature through citation indexing. American Documentation, 14(3), 195–201.
- Gross, P. L. K., & Gross, E. M. (1927). College libraries and chemical education. Science, 66(1713), 385–389.
- Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520(7548), 429–431.
- Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the USA, 102(46), 16569–16572.
- Jacsó, P. (2009). Five-year impact factor data in the journal citation reports. Online Information Review, 33(3), 603–614.
- Kreft, G. G., & de Leeuw, E. (1988). The see-saw effect: A multilevel problem? Quality & Quantity, 22(2), 127–137.
- Leydesdorff, L., & Bornmann, L. (2011). Integrated impact indicators compared with impact factors: An alternative research design with policy implications. Journal of the American Society for Information Science and Technology, 62(11), 2133–2146. https://doi.org/10.1002/asi.21609.
- Leydesdorff, L., Bornmann, L., Comins, J., & Milojević, S. (2016a). Citations: Indicators of quality? The impact fallacy. Frontiers in Research Metrics and Analytics. https://doi.org/10.3389/frma.2016.00001.
- Leydesdorff, L., Bornmann, L., & Mingers, J. (2019). Statistical significance and effect sizes of differences among research universities at the level of nations and worldwide based on the Leiden rankings. Journal of the Association for Information Science and Technology, 70(5), 509–525. https://doi.org/10.1002/asi.24130.
- Leydesdorff, L., Bornmann, L., Mutz, R., & Opthof, T. (2011). Turning the tables on citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology, 62(7), 1370–1381. https://doi.org/10.1002/asi.21534.
- Leydesdorff, L., Wagner, C., & Bornmann, L. (2018). Discontinuities in citation relations among journals: Self-organized criticality as a model of scientific revolutions and change. Scientometrics, 116(1), 623–644. https://doi.org/10.1007/s11192-018-2734-6.
- Leydesdorff, L., Wouters, P., & Bornmann, L. (2016b). Professional and citizen bibliometrics: Complementarities and ambivalences in the development and use of indicators—A state-of-the-art report. Scientometrics, 109(3), 2129–2150. https://doi.org/10.1007/s11192-016-2150-8.
- Marchant, T. (2009). An axiomatic characterization of the ranking based on the h-index and some other bibliometric rankings of authors. Scientometrics, 80(2), 325–342.
- Martyn, J., & Gilchrist, A. (1968). An evaluation of British scientific journals. London: Aslib.
- McAllister, P. R., Narin, F., & Corrigan, J. G. (1983). Programmatic evaluation and comparison based on standardized citation scores. IEEE Transactions on Engineering Management, 30(4), 205–211.
- Moed, H. F., & Van Leeuwen, T. N. (1996). Impact factors can mislead. Nature, 381(6579), 186.
- Narin, F. (1976). Evaluative bibliometrics: The use of publication and citation analysis in the evaluation of scientific activity. Washington, DC: National Science Foundation.
- Narin, F. (1987). Bibliometric techniques in the evaluation of research programs. Science and Public Policy, 14(2), 99–106.
- Pendlebury, D. A., & Adams, J. (2012). Comments on a critique of the Thomson Reuters journal impact factor. Scientometrics, 92, 395–401. https://doi.org/10.1007/s11192-012-0689-6.
- Price, D. J. (1970). Citation measures of hard science, soft science, technology, and nonscience. In C. E. Nelson & D. K. Pollock (Eds.), Communication among scientists and engineers (pp. 3–22). Lexington, MA: Heath.
- Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351–357.
- Schiffman, S. S., Reynolds, M. L., & Young, F. W. (1981). Introduction to multidimensional scaling: Theory, methods, and applications. New York: Academic Press.
- Schneider, J. W. (2013). Caveats for using statistical significance tests in research assessments. Journal of Informetrics, 7(1), 50–62.
- Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43(9), 628–638.
- Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. British Medical Journal, 314, 498–502.
- Sher, I. H., & Garfield, E. (1965). New tools for improving and evaluating the effectiveness of research. Paper presented at the Second conference on Research Program Effectiveness, July 27–29, Washington, DC.
- Sheskin, D. J. (2011). Handbook of parametric and nonparametric statistical procedures (5th ed.). Boca Raton, FL: Chapman & Hall/CRC.
- Tijssen, R. J. W., Visser, M. S., & Van Leeuwen, T. N. (2002). Benchmarking international scientific excellence: Are highly cited research papers an appropriate frame of reference? Scientometrics, 54(3), 381–397.
- Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E., Tijssen, R. J., van Eck, N. J., et al. (2012). The Leiden ranking 2011/2012: Data collection, indicators, and interpretation. Journal of the American Society for Information Science and Technology, 63(12), 2419–2432.
- Waltman, L., & Schreiber, M. (2013). On the calculation of percentile-based bibliometric indicators. Journal of the American Society for Information Science and Technology, 64(2), 372–379.
- Waltman, L., & Traag, V. A. (2017). Use of the journal impact factor for assessing individual articles need not be wrong. arXiv preprint arXiv:1703.02334.
- Waltman, L., & Van Eck, N. J. (2012). The inconsistency of the h-index. Journal of the American Society for Information Science and Technology, 63(2), 406–415.
- Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: context, process, and purpose. The American Statistician, 70(2), 129–133.
- Williams, R., & Bornmann, L. (2014). The substantive and practical significance of citation impact differences between institutions: Guidelines for the analysis of percentiles using effect sizes and confidence intervals. In Y. Ding, R. Rousseau, & D. Wolfram (Eds.), Measuring scholarly impact: Methods and practice (pp. 259–281). Heidelberg: Springer.
- Ye, F. Y., Bornmann, L., & Leydesdorff, L. (2017). h-based I3-type multivariate vectors: multidimensional indicators of publication and citation scores. COLLNET Journal of Scientometrics and Information Management, 11(1), 153–171.
- Ye, F. Y., & Leydesdorff, L. (2014). The “Academic Trace” of the Performance Matrix: A Mathematical Synthesis of the h-Index and the Integrated Impact Indicator (I3). Journal of the Association for Information Science and Technology, 65(4), 742–750. https://doi.org/10.1002/asi.23075.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.