Predicting citation patterns: defining and determining influence
This work surveys and expands upon definitions of influence in bibliometrics. Using the union of DBLP and CiteSeerX, a data set of approximately 6 million publications, a relatively small number of features are developed to describe the set, including two novel features: loyalty and community longevity. These features are successfully used to predict the influential set of papers in a series of machine learning experiments. The most predictive features are highlighted and discussed.
Keywords: Citation analysis, Bibliometrics, Big data, Machine learning
This research was supported, in part, under National Science Foundation Grants CNS-0958379, CNS-0855217, ACI-1126113 and the City University of New York High Performance Computing Center at the College of Staten Island. The authors also acknowledge the Office of Information Technology at The Graduate Center, CUNY for providing database and server resources that have contributed to the research results reported within this paper. URL: http://it.gc.cuny.edu/.