Abstract
We seek a unified and distinctive citation description of both journals and individuals. The journal impact factor has a restrictive definition that constrains its extension to individuals, whereas the h-index for individuals can easily be applied to journals. Going beyond any single parameter, the shape of each negative-slope Hirsch curve of citations vs. rank index is distinctive. This shape can be described through five minimal parameters or ‘flags’: the h-index itself on the curve; the average citation of each segment on either side of h; and the two axis endpoints. We obtain the five flags from real data for two journals and ten individual faculty members, showing that they provide unique citation fingerprints and enable detailed comparative assessments. A computer code is provided that takes citation data as input and returns the five flags as output. Since papers (citations) can form nodes (links) of a network, Hirsch curves and five flags could carry over to describe local degree sequences of general networks.
References
D J de Solla Price, Science 149, 510 (1965)
D J de Solla Price, J. Am. Soc. Inform. Sci. 27, 292 (1976)
S Redner, Eur. Phys. J. B 4, 131 (1998)
S Redner, Phys. Today 58, 49 (2009)
R Albert and A-L Barabasi, Rev. Mod. Phys. 74, 47 (2002)
G Bianconi, Multilayer networks (Oxford University Press, 2018)
G Bianconi and A-L Barabasi, Europhys. Lett. 54, 436 (2001)
A-L Barabasi and Z Oltvai, Nature Rev.: Genetics 5, 104 (2004)
G Bianconi, Pramana – J. Phys. 70, 1135 (2008)
E Garfield, Science 178, 471 (1972)
E Garfield, http://www.garfield.library.upenn.edu/papers/jifchicago2005.pdf
http://www.garfield.library.upenn.edu/commentaries/tsv12(03)p10y19980202.pdf
http://www.garfield.library.upenn.edu/commentaries/tsv12(14)p12y19980706.pdf
E Garfield, Council of Scientific Editors Annual Meeting, May, 2000
E Garfield, The Scientist 10(17), 13 (1996)
E Garfield, Science 144, 649 (1964)
https://clarivate.com/webofsciencegroup/essays/impact-factor
J Hirsch, Proc. Natl. Acad. Sci. 102, 16569 (2005)
S Alonso, F J Cabrerizo, E Herrera-Viedma and F Herrera, J. Informetrics 3, 273 (2009)
S Lehmann, A D Jackson and B E Lautrup, Nature 444, 1003 (2006)
A L Kinney, Proc. Natl. Acad. Sci. 104, 17943 (2007)
J-F Molinari and A Molinari, Scientometrics 75, 163 (2008)
A Chatterji, A Ghosh and B Chakrabarti, PLoS One (2016), https://doi.org/10.1371/journal.pone.0146762
A Khaleque, A Chatterji and P Sen, J. Scientometric Res. 5(1), 25 (2016)
P Wouters, C R Sugimoto, V Larivière, M E McVeigh, B Pulverer, S de Rijcke and L Waltman, Nature 569, 621 (2019)
D Hicks, P Wouters, L Waltman, S de Rijcke and I Rafols, Nature 520, 429 (2015)
R Adler, J Ewing and P Taylor, Stat. Sci. 24, 1 (2009)
P O Seglen, Brit. Med. J. 314, 497 (1997)
A M Grimwade, Front. Res. Metr. Anal. (2018), https://doi.org/10.3389/frma.2018.00014.
M Rossner, H Van Epps and E Hill, J. Exp. Med. 204, 3052 (2007)
M Rossner, H Van Epps and E Hill, J. Exp. Med. 205, 260 (2008)
N-X Wang, Nature 476, 253 (2011)
M Price, https://www.sciencemag.org/careers013/09/should-we-ditch-journal-impact-factor
J Bollen, H Van de Sompel, A Hagberg and R Chute, https://arxiv.org/abs/0902.2183
M R Berenbaum, Proc. Natl Acad. Sci. 116, 16659 (2019)
A Fersht, Proc. Natl. Acad. Sci. 106, 688 (2009)
San Francisco Declaration on Research Assessment (DORA) (2012), https://sfdora.org/read/
V Larivière, V Kiermer, C J MacCallum, M McNutt, M Patterson, B Pulverer, S Swaminathan, S Taylor and S Curry, https://www.biorxiv.org/content/early/2016/07/05/062109
T Braun, W Glanzel and A Schubert, Scientometrics 69, 169 (2006)
Garfield later stated [11]: “Further, I myself deplore the quotation of impact factors to three decimal places. ISI uses three decimal places to reduce the number of journals with identical impact rank. It matters very little whether the impact of JAMA (J. American Medical Association) is quoted as 21.5 rather than 21.455”
Novel citational correlations may be discovered by analysing proprietary databases that are properly subscribed to, with the database use duly acknowledged in the paper. However, authors may still not be allowed to make their detailed research analysis available to colleagues in a journal data repository. See the Data Availability section of L Bornmann, Quant. Sci. Stud. 1, 1553 (2020)
S Saha, S Saint and D A Christakis, J. Med. Libr. Assoc. 91, 42 (2003)
A I Pudovkin, Front. Res. Metr. Anal., https://doi.org/10.3389/frma.2018.00002
L Waltman and V A Traag, https://arxiv.org/ftp/arxiv/papers/1703/1703.02334.pdf
Garfield notes [11] “Thus the impact factor is used to estimate the influence of individual papers, which is rather dubious considering the known skewness observed for most journals”
Database searches for individuals yield \(N_{\rm items}\) items, but not all of them are (original or review) research articles. Items displayed could include arXiv preprints, conference abstracts, seminar notices, etc. Eliminating them ‘by hand’ could be tedious. However, empirical examination shows that such ‘ephemera’ either have zero cites (\(N_0\) items) or are cited only once (\(N_1\) items). Subtracting these yields a pruned number of papers \(N_p \equiv N_{\rm items} - (N_0 + N_1)\), from which ephemera tend to be automatically filtered out. The resulting \(N_p(A)\) items, each cited more than once, i.e. \(c(s = N_p) \ge 2\), are taken as the research papers. New research publications would eventually get cited, meet this criterion and be included. Similarly, for journals, a database search for ‘all item’ mentions would include non-research items like editorials, letters of opinion, news items, etc. Again, we retain only those items cited more than once, to filter out ephemera. For the ten faculty members, the average fractions discarded are \(\langle N_0/N_{\rm items}\rangle = 0.28\) and \(\langle N_1/N_{\rm items}\rangle = 0.08\). For the two journals, J1 has \(N_0/N_{\rm items} = 0.18\), \(N_1/N_{\rm items} = 0.11\), while J2 has \(N_0/N_{\rm items} = 0.02\), \(N_1/N_{\rm items} = 0.03\).
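The pruning rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: `prune_ephemera` and `discard_fractions` are hypothetical helper names, each search result is assumed to be already reduced to one citation count per item, and the sample list is invented.

```python
from collections import Counter


def prune_ephemera(citations):
    """Keep only items cited at least twice, i.e. N_p = N_items - (N_0 + N_1)."""
    return [c for c in citations if c >= 2]


def discard_fractions(citations):
    """Return (N_0/N_items, N_1/N_items) for one list of search results."""
    counts = Counter(citations)  # missing keys count as 0
    n_items = len(citations)
    return counts[0] / n_items, counts[1] / n_items


# Hypothetical search result: citation counts for N_items = 10 items.
items = [45, 30, 12, 7, 3, 2, 1, 1, 0, 0]
papers = prune_ephemera(items)
print(len(papers))               # 6, since N_p = 10 - (2 + 2)
print(discard_fractions(items))  # (0.2, 0.2)
```

Filtering on the citation count alone means no item metadata is needed; the trade-off, as the note says, is that genuinely new papers stay excluded until they are cited twice.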
B-H Jin, L-M Liang, R Rousseau and L Egghe, Chin. Sci. Bull. 52, 855 (2007). Their parameters are related to the 5F as ‘A-index’ = \(hac\) and ‘R-index’ = \(\sqrt{h \times hac}\)
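As a sketch of how these two indices follow from 5F ingredients, the Python below computes \(h\) and \(hac\) from a citation list and then the A- and R-indices. The function names and the sample list are hypothetical; only the relations A-index = \(hac\) and R-index = \(\sqrt{h \times hac}\) come from the note above.

```python
import math


def h_index(citations):
    """Largest s such that the s-th most cited item has c(s) >= s."""
    ranked = sorted(citations, reverse=True)
    # c(s) is non-increasing in s, so the condition holds exactly for s = 1..h.
    return sum(1 for s, c in enumerate(ranked, start=1) if c >= s)


def a_and_r_index(citations):
    """Return (A-index, R-index), written via the 5F as (hac, sqrt(h * hac))."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    if h == 0:
        return 0.0, 0.0
    hac = sum(ranked[:h]) / h        # mean citation of the h-core
    return hac, math.sqrt(h * hac)   # h * hac = total citations in the h-core


# Hypothetical citation list.
print(h_index([10, 8, 5, 4, 3]))        # 4
print(a_and_r_index([10, 8, 5, 4, 3]))  # A = 6.75, R = sqrt(27)
```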
Each 5F data set could be depicted as a symbol in a 3D plot with Cartesian axes \((x,y,z) = (nac, hac, h)\). The other two 5F parameters could enter through variations in symbol size (diameter \(\sim \ln u\)) and symbol colour (\(0 < r < 1\) fixes the position in a colour bar). In a simpler 2D plot of \(hac\) vs. \(nac\), the better-cited individuals or journals will be points near the upper right corner
R Koch, The 80:20 principle (Little Brown, 2013)
R Sinatra, D Wang, P Deville and A-L Barabasi, Science 354, 6312 (2016)
Garfield recognised that [10] the “citation frequency of a journal is thus a function not only of the scientific significance of the material it publishes (as reflected by citation), but also of the amount of material it publishes”
https://www.topuniversities.com/university-rankings/world-university-rankings/2022
H Jeong, B Tombor, R Albert, Z N Oltvai and A-L Barabasi, Nature 407, 651 (2000)
G Bagler, Physica A 387, 2972 (2008)
P Bak, C Tang and K Wiesenfeld, Phys. Rev. Lett. 59, 381 (1987)
D Dhar and R Ramaswamy, Phys. Rev. Lett. 63, 1659 (1989)
The five flags defined in §3 can be obtained as follows from Google Scholar, which lists citations in decreasing order. Note down your academic age \(A\), the number of years since your first paper. Find the citations \(c(s)\) to your papers \(s = 1, 2, \ldots\), with the highest \(c(1) = C_{\rm max}\). Note down the largest \(s\) for which \(c(s) \ge s\): this is your \(h\)-index. The largest serial number \(s\) of papers cited more than once, \(c(s = N_p) \ge 2\), fixes \(N_p(A)\). Three of the 5F are then known: \(h\), \(r = h/N_p\) and \(u = C_{\rm max}/h\). The average citation of the first \(h\) papers, over \(s = 1, \ldots, h\), is the \(hac\)-number. The average citation of the remaining \(n = N_p - h\) papers, over \(s = h+1, \ldots, N_p\), is the \(nac\)-number. These are the five flag components \(\phi_5 = (h, r, u, nac, hac)\)
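The recipe above translates directly into a short routine. The Python below is a minimal sketch under stated assumptions, not the authors' citation-profiler code: `five_flags` and `FiveFlags` are hypothetical names, the input is one citation count per paper, and setting \(nac = 0\) when no papers lie between \(h\) and \(N_p\) is our own convention. Sorting first means the counts may be supplied in any order.

```python
from typing import NamedTuple, Sequence


class FiveFlags(NamedTuple):
    h: int
    r: float    # h / N_p
    u: float    # C_max / h
    nac: float  # mean citation of papers s = h+1 .. N_p
    hac: float  # mean citation of papers s = 1 .. h


def five_flags(citations: Sequence[int]) -> FiveFlags:
    """Sketch of the 5F recipe; the input may be in any order."""
    c = sorted(citations, reverse=True)               # c[s-1] = c(s), decreasing
    h = sum(1 for s, cs in enumerate(c, start=1) if cs >= s)
    n_p = sum(1 for cs in c if cs >= 2)               # largest s with c(s) >= 2
    if h == 0 or n_p < h:
        raise ValueError("need h >= 1 and at least h papers cited twice or more")
    c_max = c[0]
    hac = sum(c[:h]) / h
    n = n_p - h
    # Assumption: nac is set to 0.0 when no papers lie between h and N_p.
    nac = sum(c[h:n_p]) / n if n > 0 else 0.0
    return FiveFlags(h=h, r=h / n_p, u=c_max / h, nac=nac, hac=hac)


# Hypothetical, unsorted citation counts for one individual.
cites = [3, 50, 0, 12, 1, 7, 25, 2, 9, 4]
print(five_flags(cites))
# FiveFlags(h=5, r=0.625, u=10.0, nac=3.0, hac=20.6)
```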
We have also developed, and provide, a computer code that yields the 5F as output directly from citation data supplied in any order as input. This is useful when adding new papers to previous-year data files. See URL https://citation-profiler.tifrh.res.in. The source code is also available at URL https://github.com/pankajpopli/cit-prof
A-W Harzing and S Alakangas, Scientometrics 106, 787 (2016). See also the Harzing blog for useful packages to obtain citational information from Google Scholar. URL: https://harzing.com/
J Li, S Fortunato and D Wang, Nat. Rev. Phys. 1, 302 (2019)
S E Cozzens, Scientometrics 15, 437 (1989)
T S Kuhn, The structure of scientific revolutions (University of Chicago Press, 1962)
P Popli and S R Shenoy, unpublished (2022)
Acknowledgements
It is a pleasure to thank Mustansir Barma, Smarajit Karmakar, Prasad Perlekar and Surajit Sengupta for helpful discussions.
Appendix A. Properties of the IF
We introduce a notation to describe citations to published papers. Citations to papers involve pairs of publication year and citation year, and a suitable notation is needed to uniquely identify the various citation parameters.
Consider \(n_p(y)\) papers published in the year \(y\), in a publication block \(y(A)\) of duration A years. The total number of papers is \(N_p(A) = \sum_{y(A)} n_p(y)\). We define \(P_c(C; y,Y)\) as the number of citations C garnered in the year Y, in a citation block Y(B) of duration B years. The difference between the start of the publication block of A years and the start of the citation block of B years is the delay \(D \equiv \text{Start}(B) - \text{Start}(A)\) years, which is zero for coincident blocks (\(B = A\)).
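A useful notation to describe different citation variables is the sum over all allowed C, y, Y that defines the total citations \(S_c(A,B;D)\):

\[ S_c(A,B;D) = \sum_C N_c(C; A,B,D), \]

where

\[ N_c(C; A,B,D) \equiv \sum_{y(A)} \sum_{Y(B)} P_c(C; y,Y). \]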
We make four observations.
1.1 Observation 1: The A-year IF is not an A-year average
In the familiar case of a citation average, the publication and citation year blocks are equal and coincident rather than sequential, so \(A = B\) and \(D = 0\). The total number of citations is \(N_{c,\mathrm{tot}} = \sum_C N_c(C; A,A,0) = S_c(A,A;0)\), where \(N_c(C; A,A,0) \equiv \sum_{y(A)} \sum_{Y(A)} P_c(C; y,Y)\). The average citation per paper \(\bar{C}\) over A years is

\[ \bar{C}(A) = \frac{S_c(A,A;0)}{N_p(A)}. \]
Here \(N_c(C;5,5,0)\) is the 5-year citation frequency, written simply as \(N_c(C)\) and shown in figures 2 and 3. On the other hand, the A-year current impact factor IF(A) has publication and citation year blocks that are unequal and not coincident, but sequential, so \(A \ne B\) and \(D \ne 0\). The single citation year \(B = 1\) commences right after the A-year block, so \(D = A\). Thus

\[ \mathrm{IF}(A) = \frac{S_c(A,1;A)}{N(A)}. \]
Clearly, IF\((A) \ne {{\bar{C}}}(A)\). Here, the number of papers \(N_p(A)\) (cited more than once [47]) is replaced by the number of items N(A) (with any citations).
1.2 Observation 2: \({I\!F}(A)\) has far fewer citations than A-year average
The JIF, with a single citation year, has restricted (and hence fewer) publication–citation year pairs. A toy model for the 2018 citation year is illustrative. Publications in \(y = 2016\) have citations \(P_c(C;16,16)\), \(P_c(C;16,17)\), \(P_c(C;16,18)\). Publications in \(y = 2017\) have citations \(P_c(C;17,17)\), \(P_c(C;17,18)\). Now suppose that each publication year contributes 100 papers, and that these papers garner \(P_c = 2000\) citations in the year of publication, 1500 in the second year, 500 in the third year and zero thereafter. For the JIF in the year 2018, the two allowed publication–citation pairs are \((y,Y) = (16,18), (17,18)\), so the total citations are \(500 + 1500 = 2000\), giving the smaller \(\mathrm{IF}(2) \equiv \mathrm{JIF} = 2000/[100 + 100] = 10\). For the average citation with the same two publication years, the pairs are \((y,Y) = (16,16), (16,17), (17,17)\), so the total citations are the larger \((2000 + 1500) + 2000 = 5500\), yielding \(\bar{C} = 5500/[100 + 100] = 27.5\) cites \(> \mathrm{JIF} = 10\) cites.
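The toy-model arithmetic can be checked with a small Python sketch. Here the dictionary `P` maps a (publication year, citation year) pair to the citations garnered in that pair, standing in for \(P_c(C; y, Y)\) summed over C, and `total_citations` is a hypothetical helper that plays the role of \(S_c\) for a given publication block and citation block.

```python
def total_citations(P, pub_years, cite_years):
    """Sum P[(y, Y)] over allowed pairs: y in the publication block,
    Y in the citation block, and Y >= y (no citations before publication)."""
    return sum(P.get((y, Y), 0) for y in pub_years for Y in cite_years if Y >= y)


# Toy model from the text: 100 papers per year, garnering 2000, 1500 and 500
# citations in years 0, 1 and 2 after publication, and zero afterwards.
profile = [2000, 1500, 500]
papers_per_year = 100
P = {(y, y + k): cites for y in (2016, 2017) for k, cites in enumerate(profile)}

n_papers = 2 * papers_per_year
jif = total_citations(P, (2016, 2017), (2018,)) / n_papers         # IF(2)
c_bar = total_citations(P, (2016, 2017), (2016, 2017)) / n_papers  # 2-year average
print(jif, c_bar)  # 10.0 27.5
```

Both quantities share the same denominator; the difference comes entirely from which (y, Y) pairs each definition admits.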
1.3 Observation 3: Different parameters give different rankings
In the early scientometric literature, an adjective made a difference: a current IF means one year of citations, while a cumulative IF means summing over several years of citations. One multiple-citation-year parameter [11] is the 5-year ‘cumulated’ impact factor, with \(B = 5\) years of citations, e.g. 1999–2004, to papers from a single \(A = 1\) publication year, say 1999, with the same start year, so \(D = 0\). It is given by \(\sim S_c(1,5;0)/N(1)\) and was applied to JAMA for different single years of publication.
Another parameter is the 15-year ‘cumulative’ impact factor [12, 13] for citations over \(B = 15\) years, e.g. 1981–1995, to \(A = 2\) years of publications, e.g. 1981–1982, again with the same start year, so \(D = 0\). It is \(\sim S_c(2,15;0)/N(2)\) and was applied to generate rankings of 100 journals [12, 13], which were compared with the JIF rankings for JCR reference year 1983.
With \(\Delta R\) the difference between the two rankings for a given journal, the average ranking-change magnitude \(\langle|\Delta R|\rangle\) can be found over subsets of the ranks. For the top 10 journals, the average \(\langle|\Delta R|\rangle = 2\) is small, consistent with a claimed insensitivity [12, 13]. However, over all 100 journals the rankings are substantially reshuffled, with \(\langle|\Delta R|\rangle \simeq 34\). The specific journal rankings thus depend on the chosen impact-factor parameter: other choices could give other rankings.
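Given two rankings, \(\langle|\Delta R|\rangle\) is a simple mean of absolute rank differences. The Python sketch below uses invented rankings for four journals; `mean_rank_shift` is a hypothetical name, and taking a subset as the top journals of the first ranking is an assumption about how the subsets are chosen.

```python
def mean_rank_shift(rank_a, rank_b, top_n=None):
    """<|dR|> between two rankings (dicts: journal -> rank), optionally
    restricted to the top_n journals of ranking A."""
    journals = sorted(rank_a, key=rank_a.get)  # journals ordered by ranking A
    if top_n is not None:
        journals = journals[:top_n]
    return sum(abs(rank_a[j] - rank_b[j]) for j in journals) / len(journals)


# Hypothetical rankings of four journals under two impact parameters.
jif_rank = {"J1": 1, "J2": 2, "J3": 3, "J4": 4}
cumulative_rank = {"J1": 1, "J2": 2, "J3": 4, "J4": 3}
print(mean_rank_shift(jif_rank, cumulative_rank, top_n=2))  # 0.0
print(mean_rank_shift(jif_rank, cumulative_rank))           # 0.5
```

As in the 100-journal comparison, a stable top of the list can coexist with a larger average shift once lower-ranked journals are included.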
1.4 Observation 4: The \({I\!F}(A)\) definition makes rankings A-insensitive
For different durations A, how different are the rankings obtained from the A-year current impact factor \(\mathrm{IF}(A) = S_c(A,1;A)/N(A)\)? Surprisingly, the large-A ranking can be close to the usual \(A = 2\) ranking from the JIF. Suppose that the numerator rises as more years are included, but then flattens to a constant for A larger than the half-life (‘old papers are less cited’). Suppose further that the denominator grows with the number of years A in the block, \(N(A) = A\,N(1)\) (‘journal size is the same every year’). In such a case, \(\mathrm{IF}(A) \simeq (2/A)\,\mathrm{IF}(2)\), and the journal rankings (not values) can be A-insensitive purely as a consequence of the definition. This ranking commonality does not imply that the JIF ranking has any property of uniqueness or optimisation [35].
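The A-insensitivity argument can be illustrated with a Python sketch that takes the two stated assumptions in their simplest form: every journal publishes the same number of papers each year, and citations to a paper stop after its second year, so the numerator is already saturated at \(A = 2\). The journals and their per-paper citation profiles are hypothetical, and `current_if` and `ranking` are illustrative names.

```python
def current_if(profile, A):
    """IF(A) = S_c(A,1;A) / N(A), assuming N(A) = A * N(1).

    profile[k] = citations per paper received k years after publication
    (k = 0 is the publication year). The single citation year follows the
    A-year block, so the cited papers are k = 1..A years old.
    """
    per_paper = sum(profile[k] for k in range(1, A + 1) if k < len(profile))
    return per_paper / A


def ranking(journals, A):
    """Journal names ordered by decreasing IF(A)."""
    return sorted(journals, key=lambda j: current_if(journals[j], A), reverse=True)


# Hypothetical journals whose citations vanish after k = 2 (short half-life).
journals = {"JA": [3.0, 6.0, 2.0], "JB": [1.0, 4.0, 3.0], "JC": [2.0, 2.0, 1.0]}
print(ranking(journals, 2), ranking(journals, 5))
# ['JA', 'JB', 'JC'] ['JA', 'JB', 'JC']
```

Here every journal's IF(5) is exactly 2/5 of its IF(2), so the ordering cannot change; a longer citation tail would break this exact proportionality.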
Cite this article
Popli, P., Shenoy, S.R. Unified citation parameters for journals and individuals: Beyond the journal impact factor or the h-index alone. Pramana - J Phys 96, 189 (2022). https://doi.org/10.1007/s12043-022-02413-z