Applied Intelligence, Volume 44, Issue 2, pp 252–268

Citation count prediction as a link prediction problem

Abstract

The citation count is an important factor in estimating the relevance and significance of academic publications. However, this measure cannot be used for papers that are too new. A solution to this problem is to estimate future citation counts. Existing work indicates that graph mining techniques lead to the best results. We aim to improve the prediction of future citation counts by introducing a new feature. This feature is based on frequent graph pattern mining in the so-called citation network, which is constructed from a dataset of scientific publications. Experiments on two real datasets show that our new feature improves the accuracy of citation count prediction and outperforms the state-of-the-art features in many cases.

Keywords

Citation count; Graph pattern mining; Feature selection

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
  2. Principles of Informatics Research Division, National Institute of Informatics, Tokyo, Japan
