Introduction

One of the key aspects of scientific research is the publication of results, which not only allows new scientific knowledge to be disseminated (Lafrenière et al., 2013; Mabe, 2010) but also increases the recognition of researchers (Bavdekar & Tullu, 2016; Post et al., 2012; Vale, 2015). Quantitative analysis of researchers' productivity (by documenting the number of articles published) and qualitative analysis of their publications (usually by looking at the number of citations and/or the impact factor of journals) have become essential for comparing researchers (Pfeiffer et al., 2016; Prpić, 1996). Such analyses also make it possible to explore important topics related to research policy, such as the influence of gender and collaboration between institutions (Akers et al., 2016; Gasparyan et al., 2014, 2016; Lerchenmueller & Sorenson, 2016).

The h-index is a bibliometric variable that is increasingly used to evaluate researchers and to quantitatively measure their contributions to scientific production. It takes into account both the productivity of researchers and the quality of their articles, since it combines the number of publications and citations (Hirsch, 2005). Indeed, the number of citations alone does not allow a reliable assessment of the overall quality of researchers’ publications, as it can be inflated by a few highly cited articles. The h-index was introduced by Jorge Hirsch in 2005 (Hirsch, 2005). It was initially used to determine the scientific quality of articles published by theoretical physicists and is now used in many scientific fields, including medicine (Baldock, 2008). The index is defined as the largest number h such that h of a researcher's articles have each been cited at least h times. It is easily determined using databases such as Web of Science, Scopus and Google Scholar.
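The definition above can be illustrated with a minimal sketch. The function name `h_index` and the citation counts are hypothetical and serve only as an illustration; they are not data from this study.

```python
def h_index(citations):
    """Return the largest h such that h articles have at least h citations each."""
    # Rank articles from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` articles all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for two researchers (illustration only).
steady = [25, 8, 5, 3, 3, 1, 0]   # 45 citations spread over several articles
skewed = [100, 1, 1]              # 102 citations, almost all from one article

print(h_index(steady))
print(h_index(skewed))
```

In this hypothetical example, the second researcher has more citations in total but a lower h-index, which is the situation described above in which a few highly cited articles dominate the raw citation count.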

Despite its advantages, the h-index is often criticized because it does not take into account the relative contribution of each author to the articles. To overcome this limitation, Butson and Yu proposed introducing a weighting factor for the first author into the h-index (Butson & Yu, 2010). This modified h-index, called the h(fa)-index, could provide a more appropriate impact index. It is calculated with the following formula: h(fa)-index = h-index × (Np + Fp)/Np, where Np and Fp are, respectively, the total number of articles and the number of first-author articles entering into the calculation of the h-index. This new indicator is relatively easy to calculate, as the h-index is immediately available through various databases and Fp is easy to obtain by manual verification.
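The formula can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: we assume that the articles "entering into the calculation of the h-index" are the h most cited articles, and that ties at the boundary are broken by input order. The function name `h_fa_index` and the example data are hypothetical.

```python
def h_fa_index(articles):
    """Return (h, h_fa) for a list of (citations, is_first_author) tuples.

    Assumption: Np and Fp are counted among the h most cited articles.
    Ties at the boundary keep their input order (Python's sort is stable).
    """
    ranked = sorted(articles, key=lambda a: a[0], reverse=True)
    h = 0
    for rank, (cites, _) in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    if h == 0:
        return 0, 0.0
    core = ranked[:h]                           # articles counted in the h-index
    n_p = len(core)                             # Np: number of such articles
    f_p = sum(1 for _, first in core if first)  # Fp: first-author articles among them
    return h, h * (n_p + f_p) / n_p


# Hypothetical researcher: (citations, first author?) for each article.
example = [(25, True), (8, False), (5, True), (3, True), (1, False)]
h, h_fa = h_fa_index(example)
print(h, h_fa)
```

Under this reading, Np equals h, so the formula reduces to h + Fp: the h-index is increased by one unit for each first-author article among the articles that determine it.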

The main objective of this study was to assess whether the classification of researchers differed depending on whether the h-index or the h(fa)-index was used (inter-individual evaluation). To reach this objective, we collected data allowing us to calculate these indices for senior hospital physicians practicing internal medicine or family medicine in Swiss university hospitals. We hypothesized that there would be significant differences in the classification of researchers between the two indicators. The h(fa)-index would then add value to the h-index by making it possible to correct the categorization of researchers and to characterize more precisely the contribution of each researcher to scientific production.

Materials and methods

Study site and study population

This was a secondary study based on data collected between 1 and 14 March 2020 for a bibliometric study aimed at assessing the productivity of senior hospital physicians in Switzerland (Sebo et al., 2020, 2021). We included all senior physicians (heads of division, staff physicians and senior registrars) practicing internal medicine or family medicine in one of the six Swiss university hospitals: three hospitals in predominantly French-speaking cantons (Western Switzerland: Geneva, Lausanne and Fribourg) and three hospitals in predominantly German-speaking cantons (Eastern Switzerland: Bern, Basel, Zurich).

Data collection

The list of physicians selected for the study was established by visiting hospital websites. Their main socio-demographic characteristics (gender, medical department [internal medicine, family medicine], title of professor [Y/N], hierarchical position [head of division, staff physician, senior resident]) were then recorded. Finally, using Web of Science, their list of publications, the year of publication of their first article and their h-index were extracted, and their h(fa)-index was calculated. The procedure used to retrieve these bibliometric variables is described in detail in the main article (Sebo et al., 2020). Analyses were conducted only on data for physicians with a non-zero h-index.

Statistical analyses and sample size

Frequency tables were used to summarize the socio-demographic characteristics of physicians, and medians and interquartile ranges (IQR) were used to summarize their h-index and h(fa)-index. Spearman correlation coefficients were calculated to determine the correlation between the h-index and the h(fa)-index. We considered correlations of 0.10 as "small", 0.30 as "medium" and 0.50 as "large" (Cohen, 1988).
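As an illustration of the rank-based correlation described above, the following minimal sketch computes Spearman's rho as the Pearson correlation of average ranks, which handles tied values. The analyses in this study were run in Stata; this Python version, its function names and its example values are hypothetical and shown only to make the computation explicit.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical h-index and h(fa)-index values for six physicians.
h_values = [1, 2, 3, 5, 8, 13]
h_fa_values = [2, 2, 5, 6, 9, 20]
print(round(spearman_rho(h_values, h_fa_values), 3))
```

Because rho depends only on ranks, two indices can be almost perfectly correlated even when individual researchers move between adjacent categories.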

We then sorted the physicians by their h-index and h(fa)-index, respectively, divided the two samples into ten quantiles and calculated the inter-rater agreement between the h-index and the h(fa)-index using Cohen’s kappa statistic. Cohen's kappa was originally designed to measure agreement between two raters, but it has also been used as a performance indicator in classification when comparing measures (Tian et al., 2018; Duro et al., 2012). We considered that an inter-rater agreement between 0.2 and 0.4 was "fair", between 0.4 and 0.6 was "moderate", between 0.6 and 0.8 was "substantial" and between 0.8 and 1.0 was "excellent" (Landis & Koch, 1977). To avoid including overly large subgroups in this analysis, we removed physician subgroups that contained more than 10% of the total study sample (physician subgroups with an h-index of 1 and 2).
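The agreement analysis can be sketched as follows. This is a simplified illustration, not the Stata procedure used in the study: the rank-based assignment to ten groups (with tied values kept in the same group) is an assumption, and the function names and example values are hypothetical.

```python
from collections import Counter
from math import ceil


def quantile_groups(values, q=10):
    """Assign each value to one of q rank-based groups (1..q); ties share a group."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        group = min(q, max(1, ceil(mean_rank * q / n)))
        for k in range(i, j + 1):
            groups[order[k]] = group
        i = j + 1
    return groups


def cohen_kappa(a, b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the marginal category frequencies of each rating.
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


# Hypothetical h-index and h(fa)-index values for ten physicians.
h_values = [3, 4, 5, 6, 8, 10, 12, 15, 20, 30]
h_fa_values = [4, 7, 6, 9, 10, 13, 21, 18, 24, 41]
groups_h = quantile_groups(h_values)
groups_h_fa = quantile_groups(h_fa_values)
print(cohen_kappa(groups_h, groups_h_fa))
```

As a rough plausibility check, if the ten categories are of about equal size, chance agreement is close to 0.10, so an observed agreement of 0.634 corresponds to a kappa of roughly (0.634 − 0.10)/0.90 ≈ 0.59.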

Finally, because the h-index and the h(fa)-index are over-dispersed count data, negative binomial regressions were used to assess the associations between physicians’ socio-demographic characteristics and these two indices (Negative Binomial Regression | Stata Annotated Output, 2019; Negative Binomial Regression | Stata Data Analysis Examples, 2019).

We calculated the required sample size for the primary study using Zhu and Lakkis’ equations, which were developed to compare two negative binomial rates (Zhu & Lakkis, 2014). We estimated that, when comparing two separate groups, a sample size of 336 (168 in each group) would be sufficient to detect a difference of five publications with type I and type II error rates both set at 5%. We assumed that the mean number of publications would be ten in one group and five in the other, that the ratio of the number of participants would be one, and that the dispersion parameter would be five. We performed all statistical analyses with Stata version 15.1 (College Station, USA).

Results

Of the 349 senior physicians working in one of Switzerland's six university hospitals, 244 had a non-zero h-index and were included in the analyses (men: 63%). Of these physicians, 147 (60%) were located in French-speaking Switzerland and 126 (52%) were affiliated with a department of family medicine. In addition, the proportion of experienced physicians was relatively high: 102 (42%) were staff physicians or heads of department, 48 (20%) had the title of professor, and the mean number of years since the publication of the first article was 12 (SD 7).

The 244 physicians in the study had a median h-index of 3 (IQR 11, min–max 1–94) and a median h(fa)-index of 4.5 (IQR 14, min–max 1–130). The correlation between the h-index and the h(fa)-index was very high (Table 1). If we restricted the analysis to physicians with the highest h-index, the correlation decreased slightly (rho 0.98 for the total sample, vs. 0.89 for the group consisting of the 25% of physicians with the highest h-index). The decrease in correlation for high h-indices is clearly illustrated in Fig. 1 (scatterplot).

Table 1 Correlation between the h-index and the h(fa)-index
Fig. 1 Scatterplot for the association between the h-index and the h(fa)-index (N = 244)

Although the correlation was very high, Fig. 2 shows that the inter-rater agreement between the h-index and the h(fa)-index was only moderate (agreement 63.4%, kappa 0.59 [95% CI 0.56–0.64], p-value < 0.001).

Fig. 2 Inter-rater agreement between the h-index and the h(fa)-index (the two indices were categorized into ten quantiles after removing physicians with an h-index of 1 or 2, N = 142)

Finally, Table 2 shows the association between the two indices and physicians’ socio-demographic characteristics. With the exception of hospital location, all physician characteristics were significantly associated with the h-index and the h(fa)-index, and the levels of association were the same or nearly the same for the two indices. The differences in incidence rate ratios (IRRs) were at most 0.2 in absolute value. For example, in univariate analysis, both indices were on average 30% higher for physicians in internal medicine than for those in family medicine, with identical IRRs (h-index: IRR 1.3 [95% CI 1.0–1.8], p-value 0.04; h(fa)-index: IRR 1.3 [95% CI 1.0–1.8], p-value 0.05). In multivariate analysis, the IRRs were identical as well (h-index: IRR 1.3 [95% CI 1.0–1.5], p-value 0.03; h(fa)-index: IRR 1.3 [95% CI 1.0–1.6], p-value 0.03).

Table 2 Unadjusted and adjusted associations between physicians’ socio-demographic characteristics, and the h-index and the h(fa)-index (N = 244)

Discussion

Main findings

In summary, we found that the senior physicians included in our study had a median h-index of 3 and a median h(fa)-index of 4.5. The correlation between the two indices was very high, although slightly lower for physicians with a higher h-index. Despite the high correlation, we found that the inter-rater agreement was only moderate. Finally, the levels of association with physicians' socio-demographic characteristics were almost the same for the h-index and the h(fa)-index.

Comparison with existing literature

We showed that the h(fa)-index was very useful in quantifying the productivity and impact of individual researchers. The correlation between the two indices was very high. However, in our sample of hospital-based physicians sorted into ten categories according to the h-index, we found that the h(fa)-index changed the categorization of researchers in about one-third of cases (level of agreement: 63%). Of course, the results probably depend on the sample examined and the number of categories used, but they do suggest that the h(fa)-index could improve the estimation made with the h-index. These data are consistent with those presented by Butson and Yu for a small sample of Australia's leading medical physics researchers (Butson & Yu, 2010).

This index does not seem to add value when comparing groups of physicians rather than individual physicians. We found no difference in analyzing the association with various socio-demographic variables. The levels of association and p-values were very similar. For this type of subgroup analysis, the h-index, which is simpler and therefore quicker to obtain, seems to be sufficient.

Surprisingly, despite the relative simplicity of the h(fa)-index, there are to our knowledge no published bibliometric studies that have used this index or compared it with the h-index. Some authors have used a similar index that is even simpler to calculate: the h-index restricted to first-author articles (McClelland et al., 2018; Vavryčuk, 2018). This index is probably too simplistic because, unlike the h(fa)-index, it does not take into account co-authored articles on which the researcher is not first author.

Perspectives

Although the h-index is increasingly used to evaluate researchers and to quantitatively measure their respective contributions to the scientific community, it has its critics (Bornmann & Marx, 2011; Schubert & Schubert, 2019; Waltman & Eck, 2012). The limitations that have been documented and discussed extensively in the literature have been summarized by Schubert and Schubert (2019). The h-index depends on the duration of a researcher’s career. Citation practices differ between fields, so the h-index should not be used to compare researchers from different disciplines. Since the h-index is easy to obtain, there is a risk of indiscriminate use (research performance is a complex, multifaceted endeavor that cannot be captured in a single indicator). It does not take into account the publication type. Finally, it does not take into account multiple co-authorship or self-citations. Baldock has shown, for example, that self-citations artificially improve the h-index (Baldock, 2008).

Whether we like it or not, the h-index is unlikely to disappear soon, and we will have to make do with it. There are, however, undoubtedly ways to improve the evaluation of researchers. One solution would be to weight the h-index according to the position of each author in the author list. Unfortunately, the contribution of each author would be difficult to estimate, as it would have to be assessed separately for each article. Another solution, much easier to implement, would be to introduce a weighting factor for the first author: this is the h(fa)-index (Butson & Yu, 2010). This index could supplant the h-index in the future because it is fairly easy to obtain and probably evaluates the impact of researchers more reliably.

Thus, although it is certainly challenging to use a single quantitative indicator to evaluate researchers' publications, the h(fa)-index would at least have the advantage of weighting the score according to the number of first-author articles entering into the calculation of the h-index. This could allow a more equitable evaluation of some researchers, such as those working in small centers, who are currently disadvantaged by the use of the h-index as an indicator of the impact of their research (Butson & Yu, 2010). However, although simple to calculate, the h(fa)-index is unfortunately not directly available in databases such as Web of Science, Scopus and Google Scholar. Therefore, in situations where promotion from junior to senior faculty or from senior faculty to department chair is not at stake, the h(fa)-index probably does not always add enough value to replace the h-index.

Limitations

First, we included only physicians practicing general internal medicine and family medicine in Swiss university hospitals. The results may therefore not be generalizable to other physicians or to other countries. Second, we cannot exclude errors in the lists of publications. However, this risk should be low: we standardized search procedures, duplicated all searches and reviewed all publications to remove articles published by homonyms. In addition, we extracted the lists using the full first name of the physicians, not just the first letter of their first name.

Conclusion

In conclusion, in this sample of Swiss hospital-based senior physicians, we found that despite a high correlation between the h-index and the h(fa)-index, the inter-rater agreement was only moderate. The h(fa)-index changed the categorization of researchers in about one-third of cases. Although it is certainly challenging to use a single quantitative indicator, the h(fa)-index could allow a more equitable evaluation of researchers.