Estimating Number of Citations Using Author Reputation

  • Carlos Castillo
  • Debora Donato
  • Aristides Gionis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4726)


We study the problem of predicting the popularity of items in a dynamic environment in which authors continuously post new items and provide feedback on existing items. This formulation can be applied to predict the popularity of blog posts, rank photographs in a photo-sharing system, or predict the citations of a scientific article, using author information and monitoring the items of interest for a short period of time after their creation. As a case study, we show how to estimate the number of citations for an academic paper using information about past articles written by the same author(s). If we use only the citation information over a short period of time, we obtain a predicted value that has a correlation of r = 0.57 with the actual value. This is our baseline prediction. Our best-performing system improves on that prediction by adding features extracted from the past publishing history of the paper's authors, increasing the correlation between the actual and the predicted values to r = 0.81.
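The comparison described in the abstract (a baseline that sees only early citations versus a model that also sees author-history features, both scored by the correlation r between predicted and actual values) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' system: the linear model, the feature names `early` and `author_avg`, and the data-generating process are all assumptions made for the example, and the values it prints are unrelated to the r = 0.57 and r = 0.81 reported in the paper.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(weights, rows):
    """Apply fitted weights (intercept first) to feature rows."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], r)) for r in rows]


# Synthetic papers (hypothetical): each has an early citation count, an
# author-history feature, and a final citation count to be predicted.
rng = random.Random(0)
papers = []
for _ in range(400):
    author_avg = rng.uniform(0, 20)  # assumed: mean citations of authors' past papers
    early = max(0.0, 0.2 * author_avg + rng.gauss(0, 2))  # assumed early citations
    final = 2.0 * early + 1.0 * author_avg + rng.gauss(0, 3)
    papers.append((early, author_avg, final))

train, test = papers[:300], papers[300:]
actual = [p[2] for p in test]

# Baseline: early citations only.
w_base = fit_least_squares([(p[0],) for p in train], [p[2] for p in train])
r_baseline = pearson_r(predict(w_base, [(p[0],) for p in test]), actual)

# Augmented: early citations plus the author-history feature.
w_full = fit_least_squares([(p[0], p[1]) for p in train], [p[2] for p in train])
r_full = pearson_r(predict(w_full, [(p[0], p[1]) for p in test]), actual)

print(f"baseline r = {r_baseline:.2f}, with author feature r = {r_full:.2f}")
```

The correlations are computed on held-out papers so that the comparison reflects predictive value rather than fit to the training set.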


Link Prediction; Average Citation; Prediction Task; Citation Information; Citation Relationship
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Carlos Castillo (1)
  • Debora Donato (1)
  • Aristides Gionis (1)
  1. Yahoo! Research Barcelona, C/Ocata 1, 08003 Barcelona, Catalunya, Spain
