From popularity prediction to ranking online news

  • Alexandru Tatar
  • Panayotis Antoniadis
  • Marcelo Dias de Amorim
  • Serge Fdida
Original Article


News articles are an engaging type of online content that captures the attention of a significant number of Internet users. They are particularly popular with mobile users and spread widely through online social platforms. As a result, there is growing interest in discovering the articles that will become popular among users. This objective falls under the broad scope of content popularity prediction and has direct implications for the development of new services for online advertisement and content distribution. In this paper, we address the problem of predicting the popularity of news articles based on user comments. We formulate the prediction task as a ranking problem, where the goal is not to infer the precise attention that an article will receive but to accurately rank articles based on their predicted popularity. Using data obtained from two important news sites in France and the Netherlands, we analyze the ranking effectiveness of two prediction models. Our results indicate that popularity prediction methods are adequate solutions for this ranking task and could be considered a valuable alternative for automatic online news ranking.
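The ranking formulation can be illustrated with a minimal sketch. It is not the authors' model: the constant-factor predictor, the `scale` parameter, the sample comment counts, and the use of NDCG as the ranking measure are all assumptions chosen for illustration. The sketch shows why ranking is a more forgiving target than exact prediction: any monotonic predictor yields the same ordering, even when its absolute estimates are far off.

```python
import math

def predict_final_comments(early_comments, scale=3.0):
    """Hypothetical predictor: scale the early comment count by a constant."""
    return scale * early_comments

def rank_by_prediction(early_counts, scale=3.0):
    """Return article ids sorted by predicted popularity, most popular first."""
    return sorted(
        early_counts,
        key=lambda a: predict_final_comments(early_counts[a], scale),
        reverse=True,
    )

def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranking, actual, k):
    """Normalised DCG of the top-k ranked articles, using actual counts as gains."""
    gains = [actual[a] for a in ranking[:k]]
    ideal = sorted(actual.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Illustrative (made-up) data: comments observed early vs. final comment counts.
early = {"a": 12, "b": 40, "c": 5, "d": 22}
final = {"a": 90, "b": 150, "c": 10, "d": 60}

ranking = rank_by_prediction(early)
score = ndcg_at_k(ranking, final, k=3)
print(ranking, round(score, 3))
```

Changing `scale` alters every predicted value but leaves `ranking` unchanged, so the ranking score depends only on how well the predictor orders the articles.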


Online news, User comments, Ranking, Predictions



The work presented in this paper has been carried out at LINCS and was partially supported by the ANR project Crowd under contract ANR-08-VERS-006 and EINS, the Network of Excellence in Internet Science, FP7 grant 288021. The authors are grateful to Manos Tsagkias for sharing the Telegraaf data set.



Copyright information

© Springer-Verlag Wien 2014

Authors and Affiliations

  • Alexandru Tatar (1)
  • Panayotis Antoniadis (2)
  • Marcelo Dias de Amorim (1)
  • Serge Fdida (1)

  1. LIP6/CNRS, UPMC Sorbonne Universités, Paris, France
  2. Communication Systems Group, ETH Zurich, Zurich, Switzerland
