Abstract
Citation-based measures are widely used as quantitative proxies for subjective factors such as the importance of a paper or even the worth of individual researchers. Here we analyze the citation histories of 4669 papers published in journals of the American Physical Society between 1960 and 1968 and argue that state-of-the-art models of citation dynamics and algorithms for forecasting nonstationary time series are very likely to fail to predict the long-term (50 years after publication) citation counts of highly cited papers using citation data collected in a short period (say, 10 years) after publication. This is so because those papers do not exhibit distinctive short-term citation patterns, although their long-term citation patterns clearly set them apart from the other papers. We conclude that even if one accepts that citation counts are proxies for the quality of papers, they are not useful evaluative tools, since the short-term counts are not informative about the long-term counts in the case of highly cited papers.
Data Availability Statement
This manuscript has associated data in a data repository. [Authors’ comment: The citing article pairs and bibliographic metadata used to generate the citation histories of the 4669 papers considered in the manuscript are available upon request from the APS Data Sets for Research at https://journals.aps.org/datasets.]
References
J.Z. Muller, The Tyranny of Metrics (Princeton University Press, Princeton, 2018)
E. Garfield, R.K. Merton, Citation Indexing: Its Theory and Application in Science, Technology, and Humanities, vol. 8 (Wiley, New York, 1979)
L.I. Meho, Phys. World 20, 32 (2007)
S. Fortunato, C.T. Bergstrom, K. Börner, J.A. Evans, D. Helbing, S. Milojević, A.M. Petersen, F. Radicchi, R. Sinatra, B. Uzzi, A. Vespignani, L. Waltman, D. Wang, A.-L. Barabási, Science 359 (2018)
S.E. Phelan, Emergence 3, 120 (2001)
N. Wade, Science 188, 429 (1975)
E. Garfield, Science 178, 471 (1972)
E. Garfield, Scientometrics 1, 359 (1979)
N. De Bellis, Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics (Scarecrow Press, London, 2009)
D. Wang, C. Song, A.-L. Barabási, Science 342, 127 (2013)
S. Seabold, J. Perktold, in 9th Python in Science Conference (2010)
APS Data Sets for Research. https://journals.aps.org/datasets. Accessed 27 April 2021
S.M. Reia, J.F. Fontanari, Eur. Phys. J. Plus 136, 207 (2021)
D. Steinley, Br. J. Math. Stat. Psychol. 59, 1 (2006)
E. Garfield, JAMA 295, 90 (2006)
R.L. Thorndike, Psychometrika 18, 267 (1953)
C. Chatfield, Time-Series Forecasting (CRC Press, London, 2000)
R.J. Hyndman, G. Athanasopoulos, Forecasting: Principles and Practice (OTexts, New York, 2018)
D.J.S. Price, Science Since Babylon (Yale University Press, New Haven, 1975)
R. Merton, The Sociology of Science (University of Chicago Press, Chicago, 1973)
S. Redner, Phys. Today 58, 49 (2005)
J.G. Foster, A. Rzhetsky, J.A. Evans, Am. Sociol. Rev. 80, 875 (2015)
J. Li, Y. Yin, S. Fortunato, D. Wang, Nat. Rev. Phys. 1, 301 (2019)
Y.-H. Eom, S. Fortunato, PLoS ONE 6, e24926 (2011)
Y. Dong, R.A. Johnson, N.V. Chawla, IEEE Trans. Big Data 2, 18 (2016)
L.M. Bettencourt, A. Cintrón-Arias, D.I. Kaiser, C. Castillo-Chávez, Phys. A 364, 513 (2006)
W.O. Kermack, A.G. McKendrick, Proc. R. Soc. A 115, 700 (1927)
J. Mingers, J. Oper. Res. Soc. 59, 1013 (2008)
C. Min, Y. Ding, J. Li, Y. Bu, L. Pei, J. Sun, J. Assoc. Inf. Sci. Technol. 69, 1271 (2018)
A.H. Rosenfeld, A. Barbaro-Galtieri, W.J. Podolsky, L.R. Price, P. Soding, C.G. Wohl, M. Roos, W.J. Willis, Rev. Mod. Phys. 39, 1 (1967)
W. Galbraith, E.W. Jenkins, T.F. Kycia, B.A. Leontic, R.H. Phillips, A.L. Read, R. Rubinstein, Phys. Rev. 138, B913 (1965)
C.M. Perey, F. Perey, Phys. Rev. 132, 755 (1963)
R.P. Madden, K. Codling, Phys. Rev. Lett. 10, 516 (1963)
J. Wang, Y. Mei, D. Hicks, Science 345, 149 (2014)
H. Shen, D. Wang, C. Song, A.-L. Barabási, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28 (2014)
Acknowledgements
We thank the American Physical Society for letting us use their citation database. The research of JFF was supported in part by Grant No. 2020/03041-3, Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) and by Grant No. 305058/2017-7, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). SMR was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)-Finance Code 001.
Author information
Contributions
All the authors contributed equally to the paper.
Appendices
Appendix A
The 4669 cumulative distribution functions exhibited in Fig. 1 can be grouped into a number K of classes (or clusters) according to their similarities. Here the dissimilarity or distance between the cumulative functions \(c_i(t)\) and \(c_j(t)\) of papers i and j is given by the sum of the squared errors \(\sum _t \left[ c_i(t) - c_j(t) \right] ^2\).
Given the K classes obtained from the application of the K-means clustering algorithm [14] to the APS data set, we can readily determine the mean cumulative function \(\bar{c}_k (t)\) for each class \(k=1, \ldots ,K\), namely \(\bar{c}_k (t) = \frac{1}{| \varOmega _k |} \sum _{i \in \varOmega _k} c_i(t)\),
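where \(\varOmega _k\) is the set of papers in class k and \(| \varOmega _k |\) is the cardinality of \(\varOmega _k\). Hence the squared distance \(\bar{d}_k\) of the cumulative distributions of the papers in class k to the mean cumulative distribution of class k follows by applying the same sum of squared errors to \(c_i(t)\) and \(\bar{c}_k (t)\) for the papers \(i \in \varOmega _k\).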
Finally, we can measure the goodness of the choice of the number of classes K used as input to the K-means clustering algorithm by the mean quadratic error \(\bar{d} = \sum _{k=1}^K \bar{d}_{k}\), which is shown in Fig. 8 as a function of K. The idea of the elbow method is to pick the elbow of the curve of \(\bar{d}\) versus K as the number of classes to use, since adding another class beyond that point does not result in a significant decrease of the error measure. Hence the choice \(K=4\) used in the paper.
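The clustering and elbow procedure described in this appendix can be illustrated with a minimal sketch in plain Python. It is not the code used in the paper: the function names (`sq_dist`, `mean_curve`, `kmeans`, `total_error`, `elbow_curve`) are hypothetical, the curves are synthetic stand-ins for the cumulative functions \(c_i(t)\), the random restarts are an added convenience, and the error is taken as an unnormalized sum of squared errors, which is an assumption about the normalization.

```python
import math
import random

def sq_dist(a, b):
    """Sum of squared errors between two curves sampled at the same times."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_curve(curves):
    """Pointwise mean of a non-empty list of equally long curves."""
    n = len(curves)
    return [sum(vals) / n for vals in zip(*curves)]

def kmeans(curves, k, n_iter=100, seed=0):
    """Plain Lloyd iteration; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(curves, k)]
    labels = None
    for _ in range(n_iter):
        # Assign every curve to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(c, centers[j]))
                      for c in curves]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Move every center to the mean curve of its class.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # keep the old center if a class becomes empty
                centers[j] = mean_curve(members)
    return labels, centers

def total_error(curves, labels, centers):
    """Squared distance of all curves to the center of their own class."""
    return sum(sq_dist(c, centers[lab]) for c, lab in zip(curves, labels))

def elbow_curve(curves, k_values, n_init=5):
    """Lowest total error over a few random restarts, for each K."""
    return {k: min(total_error(curves, *kmeans(curves, k, seed=s))
                   for s in range(n_init))
            for k in k_values}

# Synthetic saturating curves with three groups of time scales
# (illustrative only, not APS data).
times = range(1, 21)
taus = [1.0, 1.2, 1.4, 5.0, 5.5, 6.0, 20.0, 22.0, 24.0]
curves = [[1.0 - math.exp(-t / tau) for t in times] for tau in taus]

errors = elbow_curve(curves, k_values=range(1, 7))
for k, err in errors.items():
    print(k, round(err, 4))
```

Plotting the printed error against K and looking for the point where the decrease flattens is the elbow criterion; on real citation histories the same functions would be applied to the 4669 cumulative curves instead of the toy data.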
Appendix B
It is clear from Fig. 1 that different papers have very heterogeneous time scales. This heterogeneity is captured by the parameter \(\mu _i\) of the WSB model (see Eq. 1), which determines the time \(\tau _i\) necessary for paper i to reach the geometric mean of its final citations, viz., \(\tau _i= \exp (\mu _i)\) [10]. Accordingly, in Fig. 9 we show the boxplots of \(\mu _i\) and \(\tau _i\) for papers in classes \(k=1,2,3,4\), which demonstrate that papers in class \(k=4\) have larger \(\mu _i\) (and, consequently, larger \(\tau _i\)) values than papers in the other classes. In addition, these parameters exhibit a much larger variation for papers in class \(k=4\) than for papers in the other classes.
Considering the remarkable distinctiveness of the parameters \(\mu _i\) and \(\tau _i\) for papers in class \(k=4\), the use of the same training period T for all papers, as is usually done in the literature, may undermine the performance of the mechanistic models of citation dynamics for papers of extended longevity. A possible way to circumvent this issue is to introduce a paper-dependent training period \( T_i = \tau ^* \tau _i\) for a fixed \(\tau ^*\) so that papers of extended longevity (i.e., papers characterized by large \(\tau _i\) values) are assigned long training periods. The training periods \(T_i\) are shown in Fig. 10 for \(\tau ^* = 0.001, 0.01\) and 0.1, where the white band indicates the region \(T_i \in [1,600]\) months of training periods that are feasible to implement with the APS data set used in this paper. It is clear from this figure that there is no choice of \(\tau ^*\) that includes all the 4669 papers considered in our study. Hence the paper-dependent training period cannot be implemented in practice, which is why we used the 10-year training period for all papers in the analysis of Sect. 5.
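The feasibility check behind this argument can be sketched in a few lines of Python. This is only an illustration: the \(\mu _i\) values below are hypothetical, the helper names are not from the paper, and the sketch assumes that \(\tau _i\) and \(T_i\) are expressed in the same time unit, so that \(T_i\) can be compared directly with the window of 1 to 600 months.

```python
import math

# Feasible window for the training period, in months, as quoted in the text.
T_MIN, T_MAX = 1.0, 600.0

def training_period(mu, tau_star):
    """Paper-dependent training period T_i = tau_star * exp(mu_i)."""
    return tau_star * math.exp(mu)

def feasible_fraction(mus, tau_star):
    """Fraction of papers whose T_i falls inside [T_MIN, T_MAX]."""
    periods = [training_period(mu, tau_star) for mu in mus]
    inside = [T_MIN <= period <= T_MAX for period in periods]
    return sum(inside) / len(inside)

# Hypothetical mu_i values spanning a wide range of time scales
# (assumed for illustration, not fitted to APS data).
mus = [5.0, 7.0, 9.0, 12.0, 16.0]

for tau_star in (0.001, 0.01, 0.1):
    print(tau_star, feasible_fraction(mus, tau_star))
```

When the \(\mu _i\) values are spread widely enough, no single \(\tau ^*\) brings every \(T_i\) into the feasible window, which is the situation the text describes for the 4669 papers.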
Cite this article
Reia, S.M., Fontanari, J.F. Long-term scientific impact revisited. Eur. Phys. J. Plus 137, 161 (2022). https://doi.org/10.1140/epjp/s13360-022-02376-5