Problems of using the h-index to assess the research productivity and impact of authors

The h-index (Hirsch, 2005) has been widely used to assess the productivity and impact of journals (such as www.scimagojr.com) and authors (such as scholar.google.com and researchgate.net). Because a citation is a reference in a publication to another publication, it is appropriate to use the h-index to assess a journal’s productivity and impact based on its number of publications (h) that have received at least h citations each.
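For concreteness, the following is a minimal sketch (not taken from Hirsch, 2005) of how an h-index can be computed from a list of citation counts; the function name and the sample data are illustrative assumptions only.

```python
def h_index(citations):
    """Return the largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # highest citation counts first
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank   # the top `rank` publications all have at least `rank` citations
        else:
            break
    return h

# Illustrative data only: six publications with these citation counts.
print(h_index([25, 17, 9, 4, 2, 0]))  # -> 4
```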

However, because the h-index does not take into account the number of authors in each publication (Schubert & Schubert, 2019), the following four problems exist in the research community when the h-index is used to assess each author’s research productivity and impact:

Problem #1–individually taking full credit for all contributions to a multiauthored publication. A multiauthored publication is generated by the intellectual contributions of multiple authors. Each citation cites a multiauthored publication as a whole, not each of its authors individually. When the full number of citations that a multiauthored publication has received is used in computing an individual author’s h-index, that author takes full credit for all of the publication’s contributions (including those from the other authors), which is fundamentally and ethically improper.

Problem #2–creating inflation in counting citations. When the h-index is used to assess the research productivity and impact of authors, it also creates the problem of “inflation in counting citations”. For instance, suppose that a three-authored publication has received 60 citations. If each of the three authors credits all 60 of the publication’s citations to himself or herself when computing his or her h-index, then these 60 citations are redundantly counted three times on the authors’ separate webpages (e.g., scholar.google.com). This generates a total of 180 citation counts across the three authors, thereby inappropriately inflating their research productivity and impact from a single publication.

Here is an example from Google Scholar: The publication “Guidelines for the use and interpretation of assays for monitoring autophagy (4th edition)” had 10,477 citations as of December 18, 2021, and lists 2931 authors. If Google Scholar listed this publication’s 10,477 citations on the webpage of each of those 2931 authors, then each author would gain over 10,000 citations from this single publication, and Google Scholar would redundantly generate an enormous total of 30,708,087 citation counts from it, drastically exaggerating research productivity in the research community.
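As a quick check of the arithmetic in the two examples above, a minimal sketch (illustrative only; the helper name is an assumption):

```python
# Redundant counting: if every author of an m-authored publication is credited with
# all c of its citations, that one publication generates c * m citation counts.
def redundant_counts(citations, num_authors):
    return citations * num_authors

print(redundant_counts(60, 3))        # -> 180 (the three-authored example)
print(redundant_counts(10477, 2931))  # -> 30708087 (the autophagy-guidelines example)
```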

Problem #3–creating unfairness in evaluating research contributions. Each coauthor makes only a partial contribution to a multiauthored publication. Because the h-index does not take into account the number of authors in publications, it is computed without distinguishing between partial contributions to multiauthored publications and full contributions to single-authored publications, thus creating unfairness in assessing research contributions.

Problem #4–driving the unethical behavior of gift authorship. The h-index has been used in decision-making on appointments, promotion, tenure, etc. Its popularity and importance may drive some people to unethically increase their h-index through gift authorship. Because the h-index does not take the number of authors in each publication into account, some people may add each other’s names to the author lists of publications to which they have contributed little or nothing, to unethically boost their h-index.

Solution: taking the number of authors in each publication into account

Although many variants of the h-index have been proposed (Alonso et al., 2009; Batista et al., 2006; Hirsch, 2019; Schreiber, 2008a, b; Schubert & Schubert, 2019; Todeschini & Baccini, 2016), the h-index is still widely used for two reasons: it is easy for almost all users to understand, compute, and interpret when evaluating their research productivity and impact; and it can be used across different disciplines, different types of publications, etc.

Assessing the research productivity and impact of individual authors more accurately

Among the proposed variants of the h-index, “the fractional h-index” takes into account the number of authors in publications by “giving an author of an m-authored paper only a credit of \(\frac{c}{m}\) if the paper received c citations” (Egghe, 2008). This paper renames “the fractional h-index” to “the hi-index”; one reason is that although a fractional share of the citations received by each multiauthored publication is used to compute each author’s hi-index, the subscript “i” emphasizes the individual contributions for which each author should be credited. Note that this paper does not propose any new variant of the h-index, but attempts to use the hi-index (i.e., the fractional h-index) to address the aforementioned four problems.

Note that the hi-index is different from the hI-index (Batista et al., 2006), the hm-index (Schreiber, 2008a, b), and the hα index (Hirsch, 2019). The hI-index is obtained by dividing “h by the mean number of researchers in the h publications” (Batista et al., 2006); the hI-index “disfavours people with some papers with a large number of co-authors” (Schreiber, 2008a). The hm-index is determined by comparing “an effective rank” with the number of citations that publications have received (Schreiber, 2008a, b); it seems more difficult for users (especially non-technical users) to interpret their research productivity and impact by using “an effective rank” than by directly using the number of citations that publications have received. The hα index is proposed as “a measure of the scientific production of a scientist that counts only those papers where the scientist is the leading author”, with an assumption that “the coauthor with the highest h-index is the most likely” leading author in a multiauthored publication (Hirsch, 2019); in reality, however, this assumption is questionable for many publications.

To discuss how the hi-index can address the aforementioned four problems, it is necessary to first explain how it works:

The hi-index is used to assess the research productivity and impact of individual authors. For an author who has published k publications in total, the author’s hi-index is defined as the maximum value of n such that the author has n publications, each of which has \(\frac{c_j}{m_j} \ge n\), where the jth publication has mj authors and has received cj citations (cj ≥ 1, mj ≥ 1, 1 ≤ j ≤ k, 1 ≤ n ≤ k), or is 0 if every publication has \(\frac{c_j}{m_j} < 1\) (cj ≥ 0, mj ≥ 1, 1 ≤ j ≤ k).

When computing the hi-index, it is recommended that if the percentages of contributions from the authors to a publication are known, then these percentages should be used to distribute the number of citations that the publication has received among the authors (Tscharntke et al., 2007); otherwise, the computation of the hi-index assumes equal contributions from the authors by default.

Mathematically, suppose that an author has k publications in total, and these k publications have m1, m2, …, mk (mj ≥ 1, 1 ≤ j ≤ k) authors and have received c1, c2, …, ck (cj ≥ 0, 1 ≤ j ≤ k) citations, respectively;

for each publication, let f be the function that corresponds to the number of citations per author, i.e., \(f(j) = \frac{c_j}{m_j}\) (1 ≤ j ≤ k);

for the k publications, if the values of \(f(j) = \frac{c_j}{m_j}\) (1 ≤ j ≤ k) are ordered in descending order (i.e., the highest value f(1) in the 1st position and the lowest value f(k) in the kth position), then the hi-index is computed as follows:

hi-index (f) = \(\begin{cases} \max\{\, j \in \mathbb{N} : 1 \le j \le k,\ \lfloor f(j) \rfloor \ge j \,\} & \text{if } f(1) \ge 1 \\ 0 & \text{if } f(1) < 1 \end{cases}\), where \(\lfloor \cdot \rfloor\) is the floor function.
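The definition above can be turned directly into a short script. The following is a minimal sketch rather than code from the paper; the function name and the sample data are hypothetical, and equal contributions from coauthors are assumed by default.

```python
from math import floor

def hi_index(publications):
    """Compute the h_i-index (fractional h-index) from (citations, num_authors) pairs.

    Each publication j contributes f(j) = c_j / m_j citations per author,
    assuming equal contributions from coauthors by default.
    """
    # Citations per author for each publication, ordered from high to low.
    f = sorted((c / m for c, m in publications), reverse=True)
    hi = 0
    for rank, value in enumerate(f, start=1):
        if floor(value) >= rank:   # floor(f(j)) >= j, as in the formula above
            hi = rank
        else:
            break
    return hi

# Hypothetical data only: (citations, number of authors) for five publications.
pubs = [(60, 3), (37, 5), (25, 3), (12, 1), (3, 2)]
print(hi_index(pubs))  # f ordered = [20.0, 12.0, 8.33, 7.4, 1.5] -> h_i-index = 4
```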

As an illustrative example to compare the h-index and the hi-index, Table 1 shows the number of citations (cj) and the number of authors (mj, in parentheses) of an author’s 15 publications (i.e., k = 15). According to Table 1, this author’s h-index = 9 based on the publications highlighted in bold. Next, Table 2 calculates the quotient \(f(j) = \frac{c_j}{m_j}\) for each publication. Then, Table 3 reorders the values of \(f(j)\) in Table 2 from high to low, along with the corresponding publications. According to Table 3, this author’s hi-index = 7 based on the publications highlighted in bold.

Table 1 An example of the h-index (1 ≤ j ≤ 15)
Table 2 Calculating the number of citations per author for each publication in Table 1
Table 3 The hi-index: Reordering the number of citations per author in Table 2 from high to low

Here are two observations from this example. First, publication #11 does not contribute to the h-index = 9 in Table 1, but it does contribute to the hi-index = 7 in Table 3. It is a single-authored publication. Because all of #11’s citations are credited to this single author’s intellectual contribution, it makes sense that #11 boosts this author’s individual hi-index.

Second, publications #4 and #5 contribute to the h-index = 9 in Table 1, but they do not contribute to the hi-index = 7 in Table 3. Although #4’s 37 citations and #5’s 34 citations are higher than those of most of the other publications in Table 1, these two publications have the largest numbers of authors among all of the publications. Having a relatively large number of coauthors probably implies that this author’s contributions to these two publications are relatively small, and thus it makes sense that #4 and #5 do not contribute to increasing this author’s individual hi-index in Table 3.

Solving the four problems

The aforementioned four problems can be effectively addressed by using the hi-index: The hi-index prevents each author from taking full credit for all contributions to a multiauthored publication (Problem #1). The hi-index eliminates inflation in counting citations, because it ensures that the portions of a publication’s received citations distributed among its authors add up to its total number of received citations (Problem #2). By taking all of a publication’s authors into account when assessing their contributions to the publication, the hi-index promotes fairness in assessing the research contributions of authors (Problem #3).

To discuss how the hi-index addresses Problem #4, assume that one more person was added to publication #7 in Table 1 through gift authorship; that is, it would then have 4 authors and 25 citations, yielding \(\frac{c_7}{m_7} = \frac{25}{4} = 6.25\). This change does not affect the h-index in Table 1, but it reduces the hi-index from 7 to 6 in Table 3. In general, adding more people to a publication’s author list through gift authorship reduces the publication’s potential contribution to each author’s hi-index. The hi-index can make it difficult to “inflate results with coauthorship of documents for reasons other than good scientific performance” (Vieira & Gomes, 2011). Due to this effect, the use of the hi-index will discourage authors from adding people with little or no research contribution to their publications, and thus can effectively curb the unethical practice of gift authorship.
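Reusing the hi_index sketch above with hypothetical data (Table 1’s full data are not reproduced here), the last publication mirrors the example in the text: 25 citations, 3 authors, then a fourth gift author.

```python
# Hypothetical data only; the last publication has 25 citations and 3 authors.
pubs = [(100, 1), (80, 2), (60, 2), (45, 3), (30, 2), (27, 3), (25, 3)]
print(hi_index(pubs))        # f ordered = [100, 40, 30, 15, 15, 9, 8.33] -> h_i-index = 7

# The same publication now lists a fourth (gift) author: 25 / 4 = 6.25, and floor(6.25) < 7.
pubs_gift = pubs[:-1] + [(25, 4)]
print(hi_index(pubs_gift))   # f ordered = [100, 40, 30, 15, 15, 9, 6.25] -> h_i-index = 6
```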

From the h-index to the hi-index

Potential impact and causes

To study how the change from the h-index to the hi-index may potentially affect authors in different fields, this paper compares the h-index and the hi-index of 12 Nobel laureates in four scientific fields. As shown in Table 4, the citation data of three Nobel laureates in each field are obtained from their Google Scholar webpages.

Table 4 The Google Scholar data of 12 Nobel laureates (accessed December 18, 2021). P = Physics, C = Chemistry, M = Physiology or Medicine, E = Economic Sciences

For each author in Table 4, (1) = \(\sum\nolimits_{j=1}^{k} c_j\), (2) = \(\sum\nolimits_{j=1}^{k} \frac{c_j}{m_j}\), (3) = \(\frac{\sum\nolimits_{j=1}^{k} \frac{c_j}{m_j}}{\sum\nolimits_{j=1}^{k} c_j} \times 100\%\), (6) = \(\frac{h\text{-index} - h_i\text{-index}}{h\text{-index}} \times 100\%\), (7) = \(\frac{\sum\nolimits_{j=1}^{k} m_j}{k}\), and (8) = \(\sum\nolimits_{j=1}^{k} c_j m_j\), where the author has k cited publications, and the jth publication has mj authors and has received cj citations (cj ≥ 1, mj ≥ 1, 1 ≤ j ≤ k).
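The following is a rough sketch (illustrative only; the function name is an assumption, and the laureates’ per-publication data are not reproduced here) of how these columns could be computed from one author’s per-publication data.

```python
def table4_columns(publications, h_index_value, hi_index_value):
    """Compute columns (1), (2), (3), (6), (7), and (8) of Table 4 for one author.

    `publications` is a list of (citations, num_authors) pairs for the author's
    k cited publications; the author's h-index and h_i-index are passed in.
    """
    k = len(publications)
    col1 = sum(c for c, m in publications)          # (1) total citations
    col2 = sum(c / m for c, m in publications)      # (2) total fractional citations
    col3 = col2 / col1 * 100                        # (3) = (2) / (1) as a percentage
    col6 = (h_index_value - hi_index_value) / h_index_value * 100  # (6) percentage decrease
    col7 = sum(m for c, m in publications) / k      # (7) average authors per cited publication
    col8 = sum(c * m for c, m in publications)      # (8) redundantly counted citations
    return col1, col2, col3, col6, col7, col8
```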

Column (8) in Table 4 shows that if the citations in column (1) were redundantly counted for every author listed in these publications, then Google Scholar would generate much higher total citation counts across all authors. Such huge inflation in counting citations may create misleading impressions of research productivity and impact in the research community.

Based on the data in Table 4, Fig. 1 shows that for these four scientific fields, the authors in Economic Sciences (i.e., E1, E2, and E3) generally have lower average numbers of authors per cited publication; the authors in Physiology or Medicine (i.e., M1, M2, and M3) generally have higher average numbers of authors per cited publication; however, P2 has the highest average number of authors per cited publication. Figure 1 reveals that different fields, as well as different research areas within the same field (e.g., physics), may have different practices of research collaboration and publication.

Fig. 1 The relationship between Table 4’s column (6) (in blue) and columns (3) (in green) and (7) (in red)

Although Fig. 1 does not show a simple linear relationship between the blue bars and the red or green lines, it shows that, in general, the smaller an author’s average number of authors per cited publication, or the larger the author’s fractional citations as a percentage of total citations (column (3)), the smaller the author’s percentage decrease from the h-index to the hi-index.

Based on the data in Table 4, Table 5 shows the rankings of 12 authors when the h-index is used; Table 6 shows how their rankings change when the hi-index is used instead of the h-index.

Table 5 Rankings of 12 authors when the h-index is used (based on Table 4)
Table 6 From the h-index to the hi-index: Change of rankings (based on Table 4)

There are a few observations from the analysis of these 12 authors in four research fields. First, the rankings of the authors from different fields are intermixed in both Table 5 and Table 6; no field is ranked consistently higher or lower than all other fields based on either the h-index or the hi-index. Second, when the hi-index is used instead of the h-index, the rankings of all three authors in Economic Sciences rise (see Table 6); this is probably because these three authors have lower average numbers of authors per cited publication than the authors in the other three fields (see Fig. 1). Lastly, when the hi-index is used instead of the h-index (see Table 6), in each field other than Economic Sciences some authors’ rankings rise while others’ fall; the rankings of M1 and P1 remain unchanged. These observations reveal that changing from the h-index to the hi-index may have different impacts on the rankings of authors in different fields, or in different research areas within the same field.

Determining the percentages of partial contributions of coauthors

To calculate an author’s hi-index, it is necessary to determine the author’s percentage of contributions in each multiauthored publication. This paper considers the following two methods:

The first method is for the authors to disclose the percentages of their respective contributions to a multiauthored publication. Because different fields may have different practices of research collaboration and publication, it should be the authors who decide their publication’s author list. In general, any individual who has contributed significantly to a publication should be named in the author list. When the authors decide the author list and the order of their names, they typically have some sense of their respective contributions. It is thus possible to translate that sense into approximate percentages of contributions, which sum to 100% for each publication.

The second method is to assume equal contributions from coauthors. When the authors do not disclose the percentages of their respective contributions to a multiauthored publication, this method is desirable for a few reasons: First, because different fields may have different practices for determining the author list, the contributions of authors, the order of authors, etc., this method simplifies and standardizes the implementation of the hi-index across all fields. Second, this method does not force the authors to negotiate and agree on the percentages of their respective contributions to a multiauthored publication, thus encouraging productive research collaboration. Lastly, a prior study showed that, for overcoming the h-index’s problem of ignoring the number of authors in each publication, the improvement of authorship-weighted methods (e.g., first-author-emphasis, corresponding-author-emphasis) over the equal-contribution method “is not as high as one would expect” (Vavryčuk, 2018). It is also worth mentioning that while the authorship-weighted methods “may be very useful as applied to a particular field or discipline, they cannot be used across the board because of the very different practices in different disciplines regarding order of authors, significance of authorship position in the author’s list, etc.” (Hirsch, 2019).

For calculating the hi-index, this paper recommends that if the authors disclose the percentages of their respective contributions to a multiauthored publication, then these percentages should be used to allocate the publication’s received citations among its authors; if the authors do not disclose these percentages, then the second method can be used. The second method strikes a good balance among overcoming the h-index’s problem of ignoring the number of authors in each publication, making the hi-index implementable across different fields, encouraging productive research collaboration among coauthors, and keeping it straightforward for authors to interpret their research productivity and impact based on the number of citations that each publication has received.
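As a sketch of the first method (illustrative only; the function name and the disclosed shares are hypothetical assumptions), an author’s disclosed contribution percentage replaces the default equal share 1/mj when allocating a publication’s citations to that author.

```python
from math import floor

def hi_index_weighted(publications):
    """Compute one author's h_i-index from (citations, contribution_share) pairs.

    `contribution_share` is the author's disclosed fraction of the publication's
    contributions (e.g., 0.25); if nothing is disclosed, use 1 / num_authors.
    """
    f = sorted((c * share for c, share in publications), reverse=True)
    hi = 0
    for rank, value in enumerate(f, start=1):
        if floor(value) >= rank:
            hi = rank
        else:
            break
    return hi

# Hypothetical example: the author disclosed 60% of the first publication's contributions
# and 25% of the second, and is the sole author of the third.
print(hi_index_weighted([(40, 0.60), (30, 0.25), (6, 1.0)]))  # f = [24.0, 7.5, 6.0] -> 3
```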

Discussion and conclusion

This paper revealed that because the h-index does not take into account the number of authors in each publication, four major problems exist when the h-index is used to assess the research productivity and impact of authors: individually taking full credit for all contributions to a multiauthored publication, creating inflation in counting citations, creating unfairness in evaluating research contributions, and driving the unethical behavior of gift authorship. This paper showed that the hi-index (i.e., the fractional h-index), which distributes each publication’s received citations among its authors, can help solve these four problems effectively.

The hi-index has several advantages. First, the hi-index assesses each author’s research productivity and impact more accurately and fairly than the h-index. Second, like the original h-index, the hi-index is easy for almost all users (including non-technical users) to understand, compute, and interpret, in comparison with many other variants of the h-index such as the hI-index and the hm-index. Lastly, the hi-index can be used across different disciplines, different types of publications, etc.

This paper used the Google Scholar data of 12 Nobel laureates in four scientific fields to show what happens when the hi-index is used instead of the h-index. These examples demonstrated that the existing h-index drastically exaggerates research productivity and impact. In addition, they showed that the percentage decrease from the h-index to the hi-index is generally smaller for authors whose average number of authors per publication is smaller (or whose portions of contributions to publications are larger). This finding implies that the use of the hi-index can potentially motivate authors to make greater individual research contributions.

Although software, such as Publish or Perish (Harzing, 2021), has been developed for authors to install and use to compute their own hi-index, the hi-index (i.e., the fractional h-index) is still not widely used. This is probably because, nowadays, most authors rely on websites where they can find their h-index instantly. Therefore, this paper recommends that websites (such as scholar.google.com and researchgate.net) add the hi-index for the sake of building a fairer and more ethical research community, assessing each author’s research productivity and impact more accurately, and encouraging more contributions to research and publication.