The Integrated Impact Indicator (I3) Revisited: A Non-Parametric Alternative to the Journal Impact Factor

We propose the I3* indicator as a non-parametric alternative to the Journal Impact Factor (JIF) and h-index. We apply I3* to more than 10,000 journals. The results can be compared with other journal metrics. I3* is a promising variant within the general scheme of non-parametric indicators I3 introduced previously: it provides a single metric which correlates with both impact in terms of citations (c) and output in terms of publications (p). We argue for weighting using four percentile classes: the top-1% and top-10% as excellence indicators; the top-50% and bottom-50% as output indicators. Like the h-index, which also incorporates both c and p, I3*-values are size-dependent; however, division of I3* by the number of publications (I3*/N) provides a size-independent indicator which correlates strongly with the two- and five-year Journal Impact Factors (JIF2 and JIF5). Unlike the h-index, I3* correlates significantly with both the total number of citations and publications. The values of I3* and I3*/N can be statistically tested against the expectation or against one another using chi-square tests or effect sizes. A template (in Excel) is provided online for relevant tests.


Introduction
Citations create links between publications; but to relate citations to publications as two different things, one needs a model (for example, an equation). The Journal Impact Factor (JIF) indexes only one aspect of this relationship: citation impact. The h-index counts the h papers in a set that have each received at least h citations; one can also count papers with h² or h/2 citations (Egghe, 2008). This paper is based on a different and, in our opinion, more informative model: the Integrated Impact Indicator I3.
The two-year JIF was outlined by Garfield & Sher (1963; cf. Garfield, 1955; Sher & Garfield, 1965) at the time of establishing the Institute for Scientific Information (ISI). JIF2 is defined as the number of citations in the current year (t) to any of a journal's publications of the two previous years (t-1 and t-2), divided by the number of citable items (substantive articles, reviews, and proceedings) in the same journal in these two previous years. Although not strictly a mathematical average, JIF2 provides a functional approximation of the mean early citation rate per citable item. A JIF2 of 2.5 implies that, on average, the citable items published one or two years ago were cited two and a half times. Other JIF variants are also available; for example, JIF5 covers a five-year window. The central problem that led Garfield (1972; 1979) to use the JIF when developing the Science Citation Index was the selection of journals for inclusion in this database. He argued that citation analysis provides an excellent source of information for evaluating journals. The choice of a two-year time window was based on experiments with the Genetics Citation Index and the early Science Citation Index (Garfield, 2003, at p. 364; Martyn & Gilchrist, 1968). However, one possible disadvantage of the short term (two years) could be that "the journal impact factors enter the picture when an individual's most recent papers have not yet had time to be cited" (Garfield, 2003, p. 365; cf. Archambault & Larivière, 2009). Bio-medical fields have a fast-moving research front with a short citation cycle, and JIF2 may be an appropriate measure for such fields but less so for others (Price, 1970). In the 2007 edition of Journal Citation Reports (reissued for this reason in 2009), a five-year JIF (JIF5, considering five instead of only two publication years) was added to balance the focus on short-term citations provided by JIF2 (Jacsó, 2009; cf. Frandsen & Rousseau, 2005).
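The arithmetic of the definition above can be sketched in a few lines; the counts used in the example are invented for illustration only.

```python
def jif2(cites_to_prev_two_years, citable_items_prev_two_years):
    """Two-year Journal Impact Factor: citations received in year t to
    the journal's publications of years t-1 and t-2, divided by the
    number of citable items the journal published in those two years."""
    return cites_to_prev_two_years / citable_items_prev_two_years

# A journal whose 200 citable items of the two previous years attracted
# 500 citations in the current year has JIF2 = 2.5.
print(jif2(500, 200))
```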
The skew in citation distributions provides another challenge to evaluation (Seglen, 1992; 1997). The mean of a skewed distribution provides less information than the median as a measure of central tendency. To address this problem, McAllister, Narin, & Corrigan (1983, at p. 207) proposed the use of percentiles or percentile classes as a non-parametric indicator (Narin, 1987; see later: Tijssen, Visser, & van Leeuwen, 2002). Using this non-parametric approach, and on the basis of a list of criteria provided by Leydesdorff, Bornmann, Mutz, & Opthof (2011), two of us first developed the Integrated Impact Indicator (I3) based on the integration of the quantile values attributed to each element in a distribution.
Since I3 is based on integration, the development of I3 presents citation analysts with a construct fundamentally different from a methodology based on averages. An analogy that demonstrates the difference between integration and averaging is given by basic mechanics: the impact of two colliding bodies is determined by their combined mass and velocity, and not by the average of their velocities. It can thus be argued that the gross impact of a journal as an entity is given by the combined volume and citations of its contents (articles and other items), not by an average. Journals differ both in size (the number of published items) and in the skew and kurtosis of the distribution of citations across items. A useful and informative indicator for the comparison of journal influences should respond to these differences. A citation average cannot reflect the variation in both publications and citations, but an indicator based on integration can do so.
One route to indexing both performance and impact via a single number has been provided by the h-index (Hirsch, 2005) and its variants (e.g., Egghe, 2008). However, the h-index has many drawbacks, not least mathematical inconsistency (Marchant, 2009; Waltman & Van Eck, 2012). Furthermore, Bornmann, Mutz, & Daniel (2008) showed that the h-index is mainly determined by the number of papers (and not by citation impact). In other words, the impact dimension of a publication set may not be properly measured using the h-index. One aspect that I3 has in common with the h-index is that the focus is no longer on impact as an attribute but on the information production process (Egghe & Rousseau, 1990; Ye et al., 2017). This approach could be applied not only to journals but also to other sets of documents with citations, such as the research portfolios of departments or universities. In this study, however, we focus on journal indicators.
At the time of our previous paper about I3, we were unable to demonstrate the generic value of the non-parametric approach because of limited data access.
Recently, however, the complete Web of Science became accessible under license to the Max Planck Society (Germany). This enables us to compare I3-values across the database with other journal indicators such as JIF2 and JIF5, total citations (NCit), and numbers of publications (NPub). The choice for journals as units of analysis provides us with a rich and well-studied domain.
Our approach based on percentiles can be considered as the development of "second-generation indicators" for two reasons. First, we build on the first-generation approach that Garfield (1979; 2003; 2006) developed for the selection of journals. Second, the original objective of journal selection is very different from the purposes of research evaluation to which the JIF has erroneously been applied (e.g., Alberts, 2013). The relevant indicators should accordingly be appropriately sophisticated.

The weighting scheme
In this study, we introduce I3*-a variant within the general I3 scheme-by proposing a weighting scheme of percentile classes. We elaborate on an earlier scheme that counted six percentile classes with weights from one to six. Since that publication, however, several threads of work have clarified the position of the top-10% and top-1% categories as proxies for excellence. On the basis of this literature (e.g., Bornmann, 2014), our basic assertion is that a paper in the top-1% class can be weighted at ten times the value of a paper in the top-10% class.
It follows log-linearly that a top-1% paper weighs 100 times more than a paper at the bottom.
This weighting scheme reflects the highly skewed nature of citation distributions. We add, as a second assertion, a weighting to distinguish between papers in the top-50% (weight = 2) and the bottom-50% (weight = 1). The dividing line between the bottom-50% and the top-50% is less pronounced than the line between an averagely-cited paper and an exceptionally-cited one. Figure 1 and Table 1 clarify the correspondence between the approaches. (We will show the differences empirically in a later section.) In Figure 1 the left axis is logarithmic (log(1) to log(100)), whereas the right axis is linear (one to six). In the original six-class scheme, the relative weighting of a top-1% and a top-10% paper was only 6 : 4.5 (equivalent to 4 : 3), whereas we apply 10 : 1 (= 10) in the new scheme. Using quantiles, the relation between a top-1% and a top-10% paper would only be 99 : 89 (= 1.1). In other words, we distinguish between I3 as a general scheme and a family of specific weighting schemes; the latter are applications for specific evaluation contexts. In general, I3 can be written as follows:

I3 = Σi Wi ni,

where PRi defines the lower threshold of the i-th percentile rank class, ni is the number of papers in that class, and Wi is the corresponding weight; the sum runs over the n classes. In this notation, the earlier six-class scheme (at the time called PR6) can be written as I3(99-6, 95-5, 90-4, 75-3, 50-2, 0-1), and the scheme in this paper can be formalized as I3(99-100, 90-10, 50-2, 0-1). The scheme can also be used more broadly for percentile-based indicators: the top-10% so-called excellence indicator (e.g., Bornmann et al., 2012), for example, can be formalized as the special case I3(90-1). In this study, we propose this new variant, which we denote as I3*; I3* can thus be considered as a pragmatic shorthand for I3(99-100, 90-10, 50-2, 0-1).
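The general scheme can be sketched as a weighted sum over percentile classes. The class counts in the example below are invented for illustration; the weight dictionaries encode the I3* scheme of this paper and the earlier six-class PR6 scheme.

```python
def i3(counts, scheme):
    """Compute I3 as the weighted sum of papers over percentile classes.

    counts: dict mapping the lower percentile threshold of each class
            (e.g. 99 for the top-1%) to the number of papers in it.
    scheme: dict mapping the same thresholds to weights.
    """
    return sum(scheme[threshold] * n for threshold, n in counts.items())

# The I3* scheme proposed in this paper: I3(99-100, 90-10, 50-2, 0-1).
I3_STAR = {99: 100, 90: 10, 50: 2, 0: 1}

# The earlier six-class PR6 scheme: I3(99-6, 95-5, 90-4, 75-3, 50-2, 0-1).
PR6 = {99: 6, 95: 5, 90: 4, 75: 3, 50: 2, 0: 1}

# Hypothetical journal: 5 papers in the top-1%, 40 more in the top-10%,
# 300 more in the top half, and 255 in the bottom half.
counts = {99: 5, 90: 40, 50: 300, 0: 255}
print(i3(counts, I3_STAR))  # 5*100 + 40*10 + 300*2 + 255*1
```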
As is the case for all I3 variants, I3* is size-dependent: it scales, ceteris paribus, with journal size. By dividing I3* by the number of documents N (= Σi ni) in the distribution, a size-independent equivalent can be generated. Not surprisingly, this latter measure is highly correlated with JIF2 and JIF5. In other words, I3*i / Ni provides the journal-specific expected I3* value of a paper published in journal i. This value can be used as a benchmark for testing whether the observed citation count of a specific paper is above or below expectation.
Note that we test expected citation rates against observed ones at the level of a sample (e.g., a journal). Consequently, our approach avoids the "ecological fallacy" of using a journal characteristic as an expected value to compare with observed values derived from the individual papers published in the respective journal (Robinson, 1950;Kreft & de Leeuw, 1988;cf. Waltman & Traag, 2017). The observed values are not estimated on the basis of a journal characteristic, but are measured in order to inform the expectation.

Data
Data were harvested from the in-house database of the Max Planck Digital Library. Counting theoretically could be based on whole-number counting or on fractional counting in the case of more than a single co-author. The unit of analysis in this study, however, is the individual paper, to which citation counts are attributed irrespective of whether the paper is single- or multi-authored. The citation window in the in-house database was the period to the end of 2017, at the time of the data collection. We collected substantive items (articles and reviews) using the publication year 2014.

We collected the data as follows. On the basis of the number of papers (articles and reviews, excluding non-academic ephemera such as editorials) in a specific year (in this case: 2014), we identified the threshold number of citations at the category boundaries, e.g. the lower boundary of the 1% most-frequently cited papers. If there are, for example, a total of 100,000 papers in a year, then one thousand of them should belong to the most-cited 1% for obvious stochastic reasons. If the papers are ranked by descending citation count, then the citation count of the 1,000th paper is the threshold value (Ahlgren et al., 2014). For each journal, the number of papers in this set can be counted. By counting the number of papers with a citation count at or above this threshold value, the problem of ties is circumvented. However, more than 1,000 papers may thereby be included in the top-1%, because several papers may have the same citation count as the threshold (in 2014, e.g., 1.03% of the papers instead of exactly 1%). The same applies to the other top-x% classes.
In summary, we harvest the top-1%, top-10%, top-50%, and bottom-50% publication scores for each journal by first determining the thresholds of these percentile classes for the entire database and, second, by counting each journal's participation in the respective layers of the database.
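The two-step procedure above can be sketched as follows; the citation counts in the test data are invented, and the function names are ours.

```python
def class_thresholds(citation_counts, tops=(1, 10, 50)):
    """Determine the citation thresholds of the top-x% classes for the
    whole database: rank all papers by descending citation count and
    read off the count of the paper at rank ceil(x% of N)."""
    ranked = sorted(citation_counts, reverse=True)
    n = len(ranked)
    # -(-(n * x) // 100) is an integer-arithmetic ceiling of n*x/100.
    return {x: ranked[-(-(n * x) // 100) - 1] for x in tops}

def journal_class_counts(journal_cites, thresholds):
    """Count a journal's papers at or above each threshold. Ties at the
    threshold are included, which is why slightly more than x% of all
    papers (e.g. 1.03% instead of 1%) can fall into a top-x% class."""
    return {x: sum(c >= t for c in journal_cites)
            for x, t in thresholds.items()}
```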
Using a dedicated routine, the data are organized in a relational database with JCR-2014 data.
The tables resulting from the analyses can be read into standard software (e.g., Excel, SPSS) for further processing and statistical analysis.

Normalized data
Citation counts are also field-normalized in the in-house database using the WoS Categories, because citation rates differ between fields. These field-normalized scores are available at the individual document level for all publications since 1980. The I3* indicator calculated with field-normalized data will be denoted as I3*F, pragmatically abbreviating I3*F(99-100, 90-10, 50-2, 0-1) in this case. Some journals are assigned to more than a single WoS Category; in these instances, the journal items and their citation counts are fractionally attributed. In the case of ties at the thresholds of a top-x% class of papers (see above), the field-normalized indicators have been calculated following Waltman & Schreiber (2013). Thus, the in-house database shows whether a paper belongs to the top-1%, top-10%, or top-50% of papers in the corresponding WoS Categories. Papers at the threshold separating the top from the bottom are fractionally assigned to the top paper set.

Table 2 shows how to calculate I3* based on publication numbers, using PLOS ONE as an example. The maximal I3* is ((30,042 * 100) + (0 * 10) + (0 * 2) + (0 * 1) =) 3,004,200, which would be reached if all papers in the journal belonged to the 1% most frequently cited papers in the corresponding fields. With I3*F = 53,570.256, the journal reaches 1.78% of this maximum; without field-normalization, the corresponding figure is 2.62%. In other words, there is ample room for improvement.
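The arithmetic of this worked example can be sketched as follows. The journal size (30,042 papers) is taken from the text, but the class counts below are hypothetical stand-ins, not the actual values from Table 2.

```python
N = 30_042  # number of papers (the PLOS ONE 2014 figure used in the text)
WEIGHTS = {"top1": 100, "top10": 10, "top50": 2, "bottom50": 1}

# Hypothetical class counts for illustration only; the real values
# are given in Table 2.
counts = {"top1": 150, "top10": 2_500, "top50": 12_000, "bottom50": 15_392}
assert sum(counts.values()) == N  # the four classes partition the journal

i3_star = sum(WEIGHTS[k] * counts[k] for k in WEIGHTS)

# The maximum is reached when every paper sits in the top-1% class:
i3_max = N * WEIGHTS["top1"]      # 30,042 * 100 = 3,004,200
share = 100 * i3_star / i3_max    # I3* as a percentage of the maximum
print(i3_star, i3_max, round(share, 2))
```

Dividing `i3_star` by `N` yields the size-independent I3*/N discussed in the next section.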

Statistics
As noted, I3* can be divided by N, the number of publications (which is by definition equal to the sum of the numbers in the four percentile classes). I3*/N is based on relative frequencies, since the number in each term (ni) is divided by N (= Σi ni ). One can expect I3*/N to no longer be size-dependent and thus to have applications different from I3*, as we shall show below. We focus on I3* in this paper; we will discuss potential applications of I3*/N in a later paper.
We have applied Spearman rank-correlation analysis and factor analysis (principal component analysis with varimax rotation) to the following variables: (1) the total number of publications (NPub); (2) the total number of citations (NCit); (3) JIF2; (4) JIF5; (5) I3*; and (6) its field-normalized variant I3*F. The results are shown as factor plots using the first two components as x- and y-axes. This representation in a two-dimensional map provides a ready means of assessing the results visually.
We chose two components in accordance with our design, but the number of eigenvectors with a value larger than one is also two. The results indicate that the first two eigenvectors explain about 85-90% of the variance in the subsequent analyses. Since the distributions are non-normal, Spearman's rank-order correlations are preferable to Pearson correlations. Note that the factor analysis is based on Pearson correlations, and the results are consequently, in this respect, approximations. Rotated factor matrices and the percentages of explained variance are also provided for each analysis.

Results
Full set (journal count, n = 10,942)

Figure 2 shows the two-dimensional factor plot of the data provided numerically in Table 3. The first two factors explain 87.5% of the variance. The correlations between I3* and its field-normalized equivalent I3*F, and between them and this first component, are greater than 0.9, so they can be considered as capturing essentially the same characteristic. The factor loadings of the numbers of citations (NCit) and publications (NPub) on this first factor are greater than 0.8. NPub, which is the size indicator of output (number of publications), does not load substantially on the second factor, which represents impact (number of citations); the number of citations (NCit) loads on factor 1 (.802) much more than on factor 2 (.324).

The correlations in Table 4 are all statistically significant (p < .01). Note that the number of journals is large (n = 10,942) and that significance is therefore less meaningful. However, JIF2 and JIF5 correlate with publication count (NPub) at an observably lower level (0.44 and 0.42) than I3* and I3*F (0.92 and 0.86). Obviously, size-normalization (dividing by N) does not completely remove the effect of size. This is in accordance with the recently published conclusions of Antonoyiannakis (2018). I3*/N can also be considered as a mean and thus a parametric statistic.

Size affects the I3* and I3*F rankings, but not the third column (I3*/N), which is size-independent because of the division by N. Twelve of the 25 titles in this latter column are attributed to journals in the Nature Publishing Group, indicating the high quality of this portfolio. Note that Science, which occupies sixth position in the first two columns, drops to 29th position on the size-independent indicator.
PLOS One falls much further, to position 2,064.
There may be a disciplinary interaction with normalization: field-normalization seems to affect the leading chemistry journals more than others. The Journal of the American Chemical Society (JACS), for example, holds second place on the (left-side) list of I3*, but only ninth place on I3*F. By contrast, leading physics journals seem to list higher on the normalized indicator.
Perhaps these relatively well-cited journals in chemistry have a longer-tailed citation distribution than comparable physics journals: normalization (division by N) will have a greater effect with increasing values of N. As noted, the two indicators are highly correlated overall, but the possibility of a disciplinary, and therefore research-cultural, factor will need further elucidation.

The Social Sciences Citation Index (SSCI)
The citation environment of journals listed in the Social Sciences Citation Index (SSCI) is very different from that of journals in the SCI-E. The SSCI journals in JCR constitute about 28% (3,105 / 10,942) of the total serial titles, but the total citations to SSCI journals constitute less than 10% of all citations to JCR titles (4,506,510 / 48,340,046 in our time window). The average yearly total cites (NCit) of a journal in the SSCI is 1,451.3, compared with 4,417.8 for the combined set. Figure 3 shows the relatively small contribution of the SSCI journals to the citation indices in terms of citations.

The correlations between the variables are weaker in the SSCI than in the full set (Tables 4 and 7, respectively). Consequently, NPub and NCit are distanced in Figure 4, and the order of the two factors is reversed. Nonetheless, these two factors together still explain 84% of the variance.

If we focus on a specific journal category of the SSCI, such as the 83 journals in Information & Library Science, the difference in citation cultures between SCI and SSCI outcomes is further emphasized. Alternatively, if we focus on a narrow specialism in the natural sciences, such as Spectroscopy with 41 journals, we find that the distinction between the two components is even more pronounced than for the full set of 10,942 journals. Table 8 juxtaposes the rotated factor matrices, showing these differences numerically. While the number of publications drives the number of citations in the SCI-E, this appears to be less the case in the SSCI. Size is less important for impact in the SSCI than in the SCI-E; I3* correlates with size (NPub) more than with citations (NCit) in the social sciences.

Comparison with 2009
It is possible that the results obtained for 2014 were specific to that year, because it is relatively recent and the citation counts were not yet stable. We tested this by repeating the analysis for publication year 2009. Figure 5 shows that the outcome for the 2009 data is very similar to that for the 2014 data (Table 9).

What do these figures mean, and are the differences statistically and practically significant? One can test the distribution of papers over the classes against the expected numbers. This can be done for the frequencies in the matrix using chi-square statistics, or by a test between means (in the case of I3*/N) using the z-test and/or Cohen's h for "practical significance." Table 10 shows various options for testing observed values against expected ones; Table 11 generalizes this to the possibility of testing any two distributions against each other. As empirical instances, we again use PLOS ONE for the comparison of observed with expected values (Table 10), and this same journal versus RSC Advances in Table 11.
The results of the chi-square tests are statistically significant (p < .001), both when comparing PLOS ONE with the expectation and when comparing PLOS ONE with RSC Advances. One can summarize the results of the chi-square ex post using Cramér's V, which conveniently ranges from zero to one. (Cohen's h tests proportions against each other for each row, using h = 2 * (arcsin√p_obs - arcsin√p_exp) (Cohen, 1988, pp. 180 ff.), whereas Cohen's w first sums the squared deviations of the observed from the expected proportions, relative to the expected proportions, over the rows and then takes the square root: w = √[Σ (p_obs - p_exp)² / p_exp] (Cohen, 1988, pp. 216 f.).)
In this case, Cramér's V = 0.27 in Table 10 and Cramér's V = 0.05 in Table 11. In other words, the difference between the expected and observed percentile-rank distributions is more than five times larger than the corresponding difference between PLOS ONE and RSC Advances. (The template provides these values automatically.) The results of the chi-square tests based on the I3* values (in columns g and h of Tables 10 and 11) are provided in column k at the bottom. The standardized residual of each class, (observed - expected) / √expected, provides us with a statistic for each class. Standardized residuals can be considered as z-values: they are significant at the 5% level if the absolute value is larger than 1.96, at the 1% level if larger than 2.576, and at the 1‰ level if larger than 3.291 (Sheskin, 2011, at p. 672).
Furthermore, the residuals are signed and indicate (in Table 10) whether a journal scores above or below expectation in each class. In Table 11, RSC Advances scores statistically significantly higher than PLOS ONE in the top-10% (column l), but not statistically significantly below PLOS ONE in the lower-ranked classes.
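A minimal sketch of this test, assuming two journals' paper counts over the four percentile classes (the counts below are invented, and the function name is ours): the chi-square statistic over the 2 x k table, the standardized residuals of the first row, and Cramér's V.

```python
import math

def chi2_2xk(row_a, row_b):
    """Chi-square for a 2 x k contingency table (e.g. two journals over
    k percentile classes), plus the standardized residuals of row_a
    (interpretable as z-values: |r| > 1.96 is significant at the 5%
    level) and Cramer's V (which ranges from zero to one)."""
    k = len(row_a)
    n = sum(row_a) + sum(row_b)
    col = [a + b for a, b in zip(row_a, row_b)]
    chi2, resid = 0.0, []
    for row in (row_a, row_b):
        rtot = sum(row)
        for j in range(k):
            e = rtot * col[j] / n                # expected cell count
            chi2 += (row[j] - e) ** 2 / e
            if row is row_a:
                resid.append((row[j] - e) / math.sqrt(e))
    v = math.sqrt(chi2 / n)                      # min(r-1, c-1) = 1 here
    return chi2, resid, v

# Invented class counts (top-1%, top-10%, top-50%, bottom-50%):
journal_a = [30, 300, 2_000, 1_670]
journal_b = [5, 120, 1_500, 2_375]
chi2, resid, v = chi2_2xk(journal_a, journal_b)
```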
Tables 10b and 11b add the statistics for I3*/N. The division by N makes all the frequencies relative. Since these relative frequencies can be considered as proportions, one can apply the z-test for the difference between proportions (Sheskin, 2011, pp. 656 f.) or compute an effect size using Cohen's w (Cohen, 1988, at p. 216; Leydesdorff, Bornmann, & Mingers, in press).
The z-values in column q of Table 10 show (in column r) that PLOS ONE scores above expectation in the percentile class between the 50th and 89th percentiles, but this value is not statistically significant. PLOS ONE scores non-significantly below expectation in the top-1%, and even more so in the top-10% and the bottom-50%.
These results may come as no surprise, but cases other than PLOS ONE may offer less intuitive results about the status of a journal. For example, the specification of the differences between RSC Advances and Nature in terms of these four classes would be far from obvious. The template available at https://www.leydesdorff.net/I3/template.xlsx automatically fills out the numbers and significance levels when the user provides the field-normalized and non-normalized values for the top-1%, top-10%, top-50%, and the total number of papers in the respective cells.
In order to have information about the significance of the results on the basis of effect sizes (Cohen, 1988;Schneider, 2013;Wasserstein & Lazar, 2016;Williams & Bornmann, 2014), we added Cohen's h and w for the comparison among proportions as column s to Tables 10 and 11.
The w index is 0.4 in Table 10, and thus the difference between PLOS ONE and its expected citation rates in these four categories is meaningful and significant for practical purposes. This is not the case for the difference between the two journals: w = 0.1. The values of h accord with those of the z-test for each of the classes.
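The proportion tests and effect sizes used above can be sketched as follows (formulas after Cohen, 1988, and Sheskin, 2011; the proportions and sample sizes in the example are invented).

```python
import math

def cohens_h(p_obs, p_exp):
    """Effect size for a single pair of proportions (Cohen, 1988)."""
    return 2 * (math.asin(math.sqrt(p_obs)) - math.asin(math.sqrt(p_exp)))

def cohens_w(obs, exp):
    """Effect size over all classes: square root of the summed squared
    deviations of observed from expected proportions, relative to the
    expected proportions (Cohen, 1988)."""
    return math.sqrt(sum((o - e) ** 2 / e for o, e in zip(obs, exp)))

def z_proportions(p1, n1, p2, n2):
    """z-test for the difference between two independent proportions
    (Sheskin, 2011): |z| > 1.96 indicates significance at the 5% level."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)            # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Invented example: the shares of top-10% papers in two journals.
z = z_proportions(0.15, 4_000, 0.11, 3_000)
h = cohens_h(0.15, 0.11)
```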
It should be kept in mind that these tests on proportions address the size-independent indicator I3*/N. This measure can be used as the expected value of citations of a publication published in the relevant journal. In other words, a paper that is accepted for publication in RSC Advances has a significantly greater likelihood of being cited in the overall top-10% than a paper in PLOS One.
It is also less likely to be cited below the 50% threshold.

Effects of different weighting schemes
Weighting schemes have a significant effect on the outcome and interpretation of the analysis of categorized data; weighting introduces a level of subjectivity. Using the general scheme of I3, I3 variants can be adapted to the context of the evaluation situation. For example, if the focus is solely on research excellence, the percentile classes reflecting high impact can be provided with a higher weight. Reducing the weighting for higher impact classes would mean that productivity is relatively more emphasized.
What happens if, instead of the logarithmic set, we use the linear set of Mutz & Bornmann (2011) specified in Table 1 (Leydesdorff & Bornmann, 2011)? With linear weighting, Figure 6 shows that I3* no longer captures the number of citations, but becomes a size indicator (correlated with NPub more than NCit). The top-1% papers, for example, are now given a relative weight of six instead of one hundred, and thus the highly skewed citation frequencies no longer play a strongly differential role in the assessment across higher- and lower-ranked percentile classes. Recall the similar effect for the social sciences (SSCI compared with SCI), particularly when we focused on the 83 journals in the LIS category. In that case the reason was a difference in the data, but in the general case the reason is the (mis)specification of a model which does not give appropriate attention to the skew of the distribution.

Summary and conclusions
We argue in this paper that an indicator can be developed that reflects both impact and output, combining the two dimensions of publications and citations into a single measure by using non-parametric statistics. The generic Integrated Impact Indicator I3 is a sum of weighted publication numbers in different percentile classes. The indicator can be used very flexibly with a range of percentile classes and weights. Depending on the chosen parameters, I3 can be made more output- or more impact-orientated. In this study, we introduced I3* = I3(99-100, 90-10, 50-2, 0-1), which categorises and weights papers published in the higher citation-impact range in a more informed way, given the skew of the distribution, than the earlier six-class (PR6) indicator and the quantile-based approach.
I3* can be size-normalized by dividing the value by the original number of publications, to obtain a secondary indicator that expresses the expected contribution made by a single paper given the journal's characteristics. The size-dependent and size-independent indicators can be considered as relating to two nearly orthogonal axes. When we consider the relationship between conventional journal indicators and these new indicators, we see that I3* correlates strongly with both the total number of citations and publications, whereas I3*/N correlates with size-independent indicators such as the JIF.
The Journal Impact Factor developed by Garfield and Sher (1963) was originally intended as a journal statistic of value to publishers and librarians for portfolio management. It was not intended for research evaluation, but it has in fact been increasingly employed for this purpose and mistakenly used as a benchmark for individual researchers and their research output. An average citation rate of two (JIF2) or five (JIF5) years is not representative of the journal as a whole. The JIF can be used as one indicator of the reputation or status of a journal, subject to appropriate contextual considerations, but it cannot be used as an impact value for single papers (Pendlebury & Adams, 2012;Bornmann & Williams, 2017;Leydesdorff, Wouters, & Bornmann, 2016;cf. Waltman & Traag, 2018).
Can the I3* indicator be compared with the h-index? Only to the extent that both combine the measurement of output and impact into a single number. However, the h-index is mathematically inconsistent; it overrides discipline-specific cultural and other considerations, and observed values cannot be tested systematically against expected ones. By contrast, I3* can be analyzed using various statistical tests or power analysis, depending on the context in which one wishes to use the indicator. Furthermore, I3* does not provide only a single value like the h-index, but gives four additional reference values with performance information in different impact classes. This information can be compared with expected values and between different publication sets (e.g., of two or more institutions). Thus, I3* can be used as a single number (e.g., for policy purposes), but it can also be decomposed into the contributions of the percentile rank classes (e.g., the top-10% group). Importantly, one is able to specify error terms on the basis of statistics.
The versatility of I3* is illustrated in an Excel spreadsheet containing a template for the computation at https://www.leydesdorff.net/i3/template.xlsx. The Ptop 10% and PPtop 10% indicators have become established as quasi-standard indicators in professional bibliometrics, especially when research institutions are compared. The use of these percentile-based indicators is recommended, for instance, in the Leiden Manifesto, which includes ten guiding principles for research evaluation (Hicks et al., 2015). It is an advantage of the I3* indicator, itself a percentile-based indicator, that it integrates the top-1% with the top-10% information and combines them with information about other percentile classes. Thus, I3* provides a broader picture than Ptop 10% and PPtop 10% alone.
(As explained above, I3(90-1) is the notation for Ptop 10%, whereas PPtop 10% can be written as I3(90-1)/N.) The almost weekly invention of a new h-type indicator signals that many innovative analysts are not aware of a central problem with bibliometric data, shared with other forms of collected data: indicators necessarily generate error, both in source measurement and through the analytical methodology (Leydesdorff, Wouters, & Bornmann, 2016). Consequently, one should not underestimate the need to elaborate, test, and report on algorithms and their analytics, both empirically and statistically. Elegance on purely mathematical (that is, a priori) grounds is not a sufficient claim to scientometric utility (Ye & Leydesdorff, 2014).

Perspectives for further research
The convergent validity of different (field-normalized) indicators can be investigated by comparing the indicators with assessments by peers (Bornmann et al., 2019). Peer assessments of papers published in the biomedical area are available in the F1000Prime database (see https://f1000.com/prime). High correlations between quantitative and qualitative assessments signal the convergent validity of bibliometric indicators; indicators with convergent validity should be preferred in the practice of research evaluation. Bornmann & Leydesdorff (2013) correlated different indicators with assessments by peers provided in the F1000Prime database. The results showed, for instance, that "Percentile in Subject Area achieves the highest correlation with F1000 ratings" (p. 286). In a follow-up study, the I3* indicators are investigated with a similar design to determine whether these new indicators also have convergent validity (Bornmann et al., in press).
Acknowledgements

The data were harvested from the in-house database, which is based on the Web of Science of Clarivate Analytics (Philadelphia, Pennsylvania, USA). We are also grateful to ISI/Clarivate Analytics for providing one of us with JCR data.