Long-term scientific impact revisited

  • Regular Article
  • The European Physical Journal Plus

Abstract

Citation-based measures are widely used as quantitative proxies for subjective factors such as the importance of a paper or even the worth of individual researchers. Here we analyze the citation histories of 4669 papers published in journals of the American Physical Society between 1960 and 1968 and argue that state-of-the-art models of citation dynamics and algorithms for forecasting nonstationary time series are very likely to fail to predict the long-term (50 years after publication) citation counts of highly-cited papers using citation data collected in a short period (say, 10 years) after publication. This is so because those papers do not exhibit distinctive short-term citation patterns, although their long-term citation patterns clearly set them apart from the other papers. We conclude that even if one accepts that citation counts are proxies for the quality of papers, they are not useful evaluative tools since the short-term counts are not informative about the long-term counts in the case of highly-cited papers.


Data Availability Statement

This manuscript has associated data in a data repository. [Authors’ comment: The citing-article pairs and bibliographic metadata used to generate the citation histories of the 4669 papers considered in the manuscript are available on request from the APS Data Sets for Research at https://journals.aps.org/datasets.]

References

  1. J.Z. Muller, The Tyranny of Metrics (Princeton University Press, Princeton, 2018)

  2. E. Garfield, R.K. Merton, Citation Indexing: Its Theory and Application in Science, Technology, and Humanities, vol. 8 (Wiley, New York, 1979)

  3. L.I. Meho, Phys. World 20, 32 (2007)

  4. S. Fortunato, C.T. Bergstrom, K. Börner, J.A. Evans, D. Helbing, S. Milojević, A.M. Petersen, F. Radicchi, R. Sinatra, B. Uzzi, A. Vespignani, L. Waltman, D. Wang, A.-L. Barabási, Science 359 (2018)

  5. S.E. Phelan, Emergence 3, 120 (2001)

  6. N. Wade, Science 188, 429 (1975)

  7. E. Garfield, Science 178, 471 (1972)

  8. E. Garfield, Scientometrics 1, 359 (1979)

  9. N. De Bellis, Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics (Scarecrow Press, London, 2009)

  10. D. Wang, C. Song, A.-L. Barabási, Science 342, 127 (2013)

  11. S. Seabold, J. Perktold, in 9th Python in Science Conference (2010)

  12. APS Data Sets for Research. https://journals.aps.org/datasets. Accessed 27 April 2021

  13. S.M. Reia, J.F. Fontanari, Eur. Phys. J. Plus 136, 207 (2021)

  14. D. Steinley, Br. J. Math. Stat. Psychol. 59, 1 (2006)

  15. E. Garfield, JAMA 295, 90 (2006)

  16. R.L. Thorndike, Psychometrika 18, 267 (1953)

  17. C. Chatfield, Time-Series Forecasting (CRC Press, London, 2000)

  18. R.J. Hyndman, G. Athanasopoulos, Forecasting: Principles and Practice (OTexts, New York, 2018)

  19. D.J.S. Price, Science Since Babylon (Yale University Press, New Haven, 1975)

  20. R. Merton, The Sociology of Science (University of Chicago Press, Chicago, 1973)

  21. S. Redner, Phys. Today 58, 49 (2005)

  22. J.G. Foster, A. Rzhetsky, J.A. Evans, Am. Sociol. Rev. 80, 875 (2015)

  23. J. Li, Y. Yin, S. Fortunato, D. Wang, Nat. Rev. Phys. 1, 301 (2019)

  24. Y.-H. Eom, S. Fortunato, PLoS ONE 6, e24926 (2011)

  25. Y. Dong, R.A. Johnson, N.V. Chawla, IEEE Trans. Big Data 2, 18 (2016)

  26. L.M. Bettencourt, A. Cintrón-Arias, D.I. Kaiser, C. Castillo-Chávez, Phys. A 364, 513 (2006)

  27. W.O. Kermack, A.G. McKendrick, Proc. R. Soc. A 115, 700 (1927)

  28. J. Mingers, J. Oper. Res. Soc. 59, 1013 (2008)

  29. C. Min, Y. Ding, J. Li, Y. Bu, L. Pei, J. Sun, J. Assoc. Inf. Sci. Technol. 69, 1271 (2018)

  30. A.H. Rosenfeld, A. Barbaro-Galtieri, W.J. Podolsky, L.R. Price, P. Soding, C.G. Wohl, M. Roos, W.J. Willis, Rev. Mod. Phys. 39, 1 (1967)

  31. W. Galbraith, E.W. Jenkins, T.F. Kycia, B.A. Leontic, R.H. Phillips, A.L. Read, R. Rubinstein, Phys. Rev. 138, B913 (1965)

  32. C.M. Perey, F. Perey, Phys. Rev. 132, 755 (1963)

  33. R.P. Madden, K. Codling, Phys. Rev. Lett. 10, 516 (1963)

  34. J. Wang, Y. Mei, D. Hicks, Science 345, 149 (2014)

  35. H. Shen, D. Wang, C. Song, A.-L. Barabási, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28 (2014)

Acknowledgements

We thank the American Physical Society for letting us use their citation database. The research of JFF was supported in part by Grant No. 2020/03041-3, Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) and by Grant No. 305058/2017-7, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). SMR was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)-Finance Code 001.

Contributions

All the authors contributed equally to the paper.

Appendices

Appendix A

The 4669 cumulative distribution functions exhibited in Fig. 1 can be grouped into K classes (or clusters) according to their similarities. Here the dissimilarity, or distance, between the cumulative functions \(c_i(t)\) and \(c_j(t)\) of papers i and j is given by the sum of the squared errors

$$\begin{aligned} d_{ij} = \sum _{t=1}^T \left[ c_i(t) - c_j(t) \right] ^2. \end{aligned}$$
(A1)

Fig. 8. Mean quadratic error \(\bar{d}\) that measures the mean of the squared distances of the cumulative distributions of the papers to the mean cumulative distributions of their classes as a function of the number of classes K used as input to the K-means clustering algorithm

Fig. 9. Boxplots of the parameter \(\mu _i\) of the WSB model (panel A) and of the time \(\tau _i= \exp (\mu _i)\) necessary for paper i to reach the geometric mean of its final citations (panel B) for papers in classes \(k=1,2,3,4\)
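
The dissimilarity of Eq. (A1) is simple enough to compute directly. The following is a minimal sketch, assuming each cumulative function is stored as a plain list sampled at the same T time points; the function name and the two example curves are hypothetical and are not taken from the APS data set.

```python
from typing import Sequence


def squared_distance(c_i: Sequence[float], c_j: Sequence[float]) -> float:
    """Sum of squared errors between two cumulative curves, as in Eq. (A1)."""
    if len(c_i) != len(c_j):
        raise ValueError("both curves must be sampled at the same T time points")
    return sum((a - b) ** 2 for a, b in zip(c_i, c_j))


# Hypothetical curves sampled at T = 4 time points (illustrative values only).
early_riser = [0.5, 0.75, 1.0, 1.0]
late_bloomer = [0.0, 0.25, 0.5, 1.0]

print(squared_distance(early_riser, late_bloomer))  # prints 0.75
```

Two curves that collect their citations on the same schedule give a distance of zero, whereas curves that differ early on, as in this example, are pushed apart even when they end at the same value.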

Given the K classes obtained from the application of the K-means clustering algorithm [14] to the APS data set, we can readily determine the mean cumulative function \(\bar{c}_k (t)\) for each class \(k=1, \ldots ,K\),

$$\begin{aligned} \bar{c}_k (t) = \frac{1}{|\varOmega _k| }\sum _{i \in \varOmega _k } c_i(t) \end{aligned}$$
(A2)

where \(\varOmega _k\) is the set of papers in class k and \(| \varOmega _k |\) is the cardinality of \(\varOmega _k\). Hence the sum of the squared distances of the cumulative distributions of the papers in class k to the mean cumulative distribution of class k is

$$\begin{aligned} \bar{d}_{k} = \sum _{i \in \varOmega _k } \sum _{t=1}^T \left[ c_i(t) - \bar{c}_k(t) \right] ^2. \end{aligned}$$
(A3)

Finally, we can define a measure of the goodness of the choice of the number of classes K used as input to the K-means clustering algorithm as the mean quadratic error \(\bar{d} = \sum _{k=1}^K \bar{d}_{k}\), which is shown in Fig. 8 as a function of K. The idea of the elbow method is to pick as the number of classes the value of K at the elbow of the curve of \(\bar{d}\) versus K, since beyond that point adding another class does not result in a significant decrease of the error measure. Hence the choice \(K=4\) used in the paper.
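
The procedure of this appendix can be sketched in a few lines. The code below is an illustration only: it uses a plain Lloyd iteration written in pure Python as a stand-in for whichever K-means implementation was actually used (the text cites Ref. [14] but does not name one), and the nine toy curves, the seed and all function names are assumptions made for this example.

```python
import random
from typing import List, Sequence

Curve = Sequence[float]


def sq_dist(a: Curve, b: Curve) -> float:
    """Sum of squared errors between two curves, as in Eq. (A1)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_curve(curves: List[Curve]) -> List[float]:
    """Pointwise mean of a non-empty list of curves, as in Eq. (A2)."""
    return [sum(column) / len(curves) for column in zip(*curves)]


def k_means(curves: List[Curve], k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd iteration; returns one class label per curve."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(curves, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(curve, centroids[j]))
            for curve in curves
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # an emptied class keeps its previous centroid
                centroids[j] = mean_curve(members)
    return labels


def total_error(curves: List[Curve], labels: List[int], k: int) -> float:
    """Error measure d-bar: the class errors of Eq. (A3) summed over classes."""
    error = 0.0
    for j in range(k):
        members = [c for c, lab in zip(curves, labels) if lab == j]
        if members:
            centre = mean_curve(members)
            error += sum(sq_dist(c, centre) for c in members)
    return error


# Hypothetical data: three curve shapes, each with two slightly shifted copies.
bases = [[0.6, 0.9, 1.0, 1.0], [0.2, 0.5, 0.8, 1.0], [0.0, 0.1, 0.3, 1.0]]
curves = [
    [min(1.0, max(0.0, v + eps)) for v in base]
    for base in bases
    for eps in (-0.02, 0.0, 0.02)
]

errors = {
    k: total_error(curves, k_means(curves, k), k)
    for k in range(1, len(curves) + 1)
}
for k, err in errors.items():
    print(f"K={k}: error={err:.4f}")
```

Plotting the printed errors against K gives the kind of curve shown in Fig. 8. Since K-means can stop at a local optimum that depends on the initial centroids, in practice one would typically repeat the clustering with several seeds and keep the lowest error for each K before looking for the elbow.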

Appendix B

It is clear from Fig. 1 that different papers have very heterogeneous time scales. This heterogeneity is captured by the parameter \(\mu _i\) of the WSB model (see Eq. 1), which determines the time \(\tau _i\) necessary for paper i to reach the geometric mean of its final citations, viz., \(\tau _i= \exp (\mu _i)\) [10]. Accordingly, in Fig. 9 we show the boxplots of \(\mu _i\) and \(\tau _i\) for papers in classes \(k=1,2,3,4\), which demonstrate that papers in class \(k=4\) have larger \(\mu _i\) (and, consequently, larger \(\tau _i\)) values than papers in the other classes. In addition, these parameters exhibit a much larger variation for papers in class \(k=4\) than for papers in the other classes.
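
The role of \(\exp (\mu _i)\) can be checked in one line, assuming the usual form of the WSB model given in Ref. [10]. The symbols \(C_i(t)\) (cumulative citation count), \(m\), \(\lambda _i\), \(\sigma _i\) and \(\varPhi \) (the standard normal cumulative distribution) are introduced here for illustration and may not match the notation of Eq. 1 exactly:

$$\begin{aligned} C_i(t)&= m \left[ \exp \left( \lambda _i \, \varPhi \left( \frac{\ln t - \mu _i}{\sigma _i} \right) \right) - 1 \right] , \\ C_i(\tau _i) + m&= m \, e^{\lambda _i/2} = \sqrt{ m \cdot m \, e^{\lambda _i} } = \sqrt{ \left[ C_i(0) + m \right] \left[ C_i(\infty ) + m \right] } . \end{aligned}$$

The second line uses \(\varPhi (0) = 1/2\) at \(t = \tau _i = \exp (\mu _i)\), together with \(\varPhi \rightarrow 0\) for \(t \rightarrow 0\) and \(\varPhi \rightarrow 1\) for \(t \rightarrow \infty \). Under this assumption, the geometric mean is that of the initial and final values of the shifted count \(C_i + m\).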

Given how distinctive the parameters \(\mu _i\) and \(\tau _i\) are for papers in class \(k=4\), using the same training period T for all papers, as is usually done in the literature, may undermine the performance of the mechanistic models of citation dynamics for papers of extended longevity. A possible way to circumvent this issue is to introduce a paper-dependent training period \( T_i = \tau ^* \tau _i\), with \(\tau ^*\) fixed, so that papers of extended longevity (i.e., papers characterized by large \(\tau _i\) values) are assigned long training periods. The training periods \(T_i\) are shown in Fig. 10 for \(\tau ^* = 0.001, 0.01\) and 0.1, where the white band indicates the region \(T_i \in [1,600]\) months of training periods that are feasible to implement with the APS data set used in this paper. It is clear from this figure that no choice of \(\tau ^*\) places all 4669 papers considered in our study inside this band. Hence the paper-dependent training period cannot be implemented in practice, which is why we used a 10-year training period for all papers in the analysis of Sect. 5.
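
The feasibility argument can be made concrete with a short sketch. A single \(\tau ^*\) keeps every \(T_i\) inside \([1,600]\) months only if \(1/\min _i \tau _i \le \tau ^* \le 600/\max _i \tau _i\), which is possible only when the spread of the \(\mu _i\) does not exceed \(\ln 600 \approx 6.4\). The code below assumes that \(\tau _i\) is expressed in months, as \(T_i\) is; the \(\mu _i\) values and function names are hypothetical and are not fitted values from the paper.

```python
import math
from typing import Optional, Sequence, Tuple

T_MIN, T_MAX = 1.0, 600.0  # feasible training window in months, from the text


def training_period(mu_i: float, tau_star: float) -> float:
    """Paper-dependent training period T_i = tau_star * exp(mu_i)."""
    return tau_star * math.exp(mu_i)


def is_feasible(t_i: float) -> bool:
    """True if the training period lies inside the feasible window."""
    return T_MIN <= t_i <= T_MAX


def feasible_tau_star_range(mus: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Interval of tau_star values that keeps every T_i feasible, or None."""
    taus = [math.exp(mu) for mu in mus]
    lowest = T_MIN / min(taus)    # the shortest-lived paper needs T_i >= T_MIN
    highest = T_MAX / max(taus)   # the longest-lived paper needs T_i <= T_MAX
    return (lowest, highest) if lowest <= highest else None


# Hypothetical mu_i values with a spread larger than ln(600).
mus = [2.0, 5.0, 9.0, 14.0]
for tau_star in (0.001, 0.01, 0.1):
    n_ok = sum(is_feasible(training_period(mu, tau_star)) for mu in mus)
    print(f"tau*={tau_star}: {n_ok} of {len(mus)} papers have a feasible T_i")

print(feasible_tau_star_range(mus))  # None, because the spread exceeds ln(600)
```

A set of papers whose \(\mu _i\) values lie within about 6.4 of each other would instead return a non-empty interval of admissible \(\tau ^*\) values.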

Fig. 10. Paper-dependent training periods \( T_i = \tau ^* \tau _i\) for \(\tau ^* = 0.001\) (panel A), 0.01 (panel B) and 0.1 (panel C). The white band indicates the region \(T_i \in [1,600]\) months of training periods that are feasible to implement with the APS data set

About this article

Cite this article

Reia, S.M., Fontanari, J.F. Long-term scientific impact revisited. Eur. Phys. J. Plus 137, 161 (2022). https://doi.org/10.1140/epjp/s13360-022-02376-5
