From popularity prediction to ranking online news

  • Original Article
Social Network Analysis and Mining

Abstract

News articles are an engaging type of online content that captures the attention of a significant number of Internet users. They are particularly popular among mobile users and spread massively through online social platforms. As a result, there is increasing interest in discovering the articles that will become popular among users. This objective falls under the broad scope of content popularity prediction and has direct implications for the development of new services for online advertisement and content distribution. In this paper, we address the problem of predicting the popularity of news articles based on user comments. We formulate the prediction task as a ranking problem, where the goal is not to infer the precise attention that a piece of content will receive but to accurately rank articles based on their predicted popularity. Using data obtained from two important news sites in France and the Netherlands, we analyze the ranking effectiveness of two prediction models. Our results indicate that popularity prediction methods are adequate solutions for this ranking task and could be considered a valuable alternative for automatic online news ranking.
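
To make the ranking formulation concrete, the following minimal Python sketch scores articles with a predictor and then judges only the resulting order, not the error of each predicted value. Everything in it is an assumption made for illustration: the comment counts are invented, the predictor (a log transform of early comment counts) is a stand-in rather than one of the two models analyzed in the paper, and NDCG@k is chosen here as the ranking measure because the abstract does not name one.

```python
import math


def ndcg_at_k(predicted_scores, true_gains, k):
    """NDCG@k of the ordering induced by predicted_scores.

    true_gains holds the observed popularity of each article
    (here: its final number of comments).
    """
    # Rank article indices by predicted popularity, highest first.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)

    def dcg(gains):
        # Discounted cumulative gain over the top-k positions.
        return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))

    ideal = dcg(sorted(true_gains, reverse=True))
    if ideal == 0:
        return 0.0
    return dcg([true_gains[i] for i in order]) / ideal


# Hypothetical data: comments observed shortly after publication
# and the final comment counts for the same five articles.
early_comments = [3, 40, 12, 0, 25]
final_comments = [10, 150, 90, 2, 60]

# Hypothetical predictor: any monotone function of early activity
# yields the same ranking; a real model would be fitted on training data.
predicted = [math.log1p(c) for c in early_comments]

score = ndcg_at_k(predicted, final_comments, k=3)
print(f"NDCG@3 = {score:.3f}")
```

In this toy example the predictor swaps the second and third most popular articles, so the score falls slightly below 1 even though the predicted values themselves are on a different scale from the true counts. This is the sense in which a ranking evaluation tolerates errors in magnitude but penalizes errors in order.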

Notes

  1. http://www.20minutes.fr/.

  2. http://www.telegraaf.nl/.

  3. We are aware that there are other fine-grained methods of evaluating the decay of attention over time (Lee et al. 2010; Simkin and Roychowdhury 2012; Wu et al. 2007), but for the scope of our work, this coarse characterization provides us with sufficient information.

  4. According to the latest statistics of the two sites: http://corporate.tmg.nl/en/result-second-quarter-2012 (telegraaf); http://www.mediametrie.fr (20minutes).

  5. Statistical techniques based on maximum-likelihood methods and Kolmogorov–Smirnov statistics.

  6. The algorithm uses the number of blog posts to predict users’ interest in articles.

  7. We deploy these algorithms using the RankLib open-source library (Dang 2012).

  8. Information about section and author is available only for the 20minutes data set.

References

  • Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election: divided they blog. In: Proceedings of the 3rd international workshop on Link discovery. ACM, New York, pp 36–43

  • Bandari R, Asur S, Huberman B (2012) The pulse of news in social media: forecasting popularity. arXiv preprint arXiv:1202.0332

  • Breslau L, Cao P, Fan L, Phillips G, Shenker S (1999) Web caching and Zipf-like distributions: evidence and implications. In: INFOCOM’99. Eighteenth annual joint conference of the IEEE computer and communications societies. Proceedings. IEEE, vol 1, pp 126–134

  • Cha M, Kwak H, Rodriguez P, Ahn Y, Moon S (2007) I tube, you tube, everybody tubes: analyzing the world’s largest user generated content video system. In: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement. ACM, New York, pp 1–14

  • Cha M, Mislove A, Gummadi KP (2009) A measurement-driven analysis of information propagation in the flickr social network. In: Proceedings of the 18th international conference on World wide web. ACM, New York, pp 721–730

  • Cha M, Pérez JAN, Haddadi H (2012) The spread of media content through blogs. Soc Netw Anal Min 2(3):249–264

  • Clauset A, Shalizi C, Newman M (2009) Power-law distributions in empirical data. SIAM Rev 51:661–703

  • Crane R, Sornette D (2008) Robust dynamic classes revealed by measuring the response function of a social system. Proc Natl Acad Sci 105(41):15649–15653

  • Dang V (2012) Ranklib library. http://people.cs.umass.edu/vdang/ranklib.html

  • De Francisci Morales G, Gionis A, Lucchese C (2012) From chatter to headlines: harnessing the real-time web for personalized news recommendation. In: Proceedings of the fifth ACM international conference on Web search and data mining, ACM, New York, pp 153–162

  • Dezsö Z, Almaas E, Lukács A, Rácz B, Szakadát I, Barabási AL (2006) Dynamics of information access on the web. Phys Rev E 73(6):066132

  • Easley D, Kleinberg J (2010) Networks, crowds, and markets: reasoning about a highly connected world. Cambridge University Press, Cambridge

  • Fortunato S, Flammini A, Menczer F, Vespignani A (2006) Topical interests and the mitigation of search engine bias. Proc Natl Acad Sci 103(34):12684–12689

  • Freund Y, Iyer R, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232

  • Guo L, Tan E, Chen S, Xiao Z, Zhang X (2008) The stretched exponential distribution of internet media access patterns. In: Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing, ACM, New York, pp 283–294

  • Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst (TOIS) 20(4):422–446

  • Kaltenbrunner A, Gomez V, Lopez V (2007a) Description and prediction of slashdot activity. In: Proceedings of the 2007 Latin American web conference. IEEE Computer Society, Washington, DC, pp 57–66

  • Kaltenbrunner A, Gómez V, Moghnieh A, Meza R, Blat J, López V (2007b) Homogeneous temporal activity patterns in a large online communication space. CoRR URL http://arxiv.org/abs/0708.1579v1

  • Lee J, Moon S, Salamatian K (2010) An approach to model and predict the popularity of online contents with explanatory factors. In: Proceedings of the 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, vol 01. IEEE Computer Society, Washington, DC

  • Lerman K, Ghosh R (2010) Information contagion: an empirical study of the spread of news on Digg and Twitter social networks. In: Proceedings of 4th international conference on weblogs and social media (ICWSM), URL http://arxiv.org/abs/1003.2664

  • Lerman K, Hogg T (2010) Using a model of social dynamics to predict popularity of news. In: Proceedings of the 19th international conference on World wide web, WWW ’10. ACM, New York, pp 621–630

  • Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, KDD ’09

  • Li CT, Kuo TT, Ho CT, Hong SC, Lin WS, Lin SD (2013) Modeling and evaluating information propagation in a microblogging social network. Soc Netw Anal Min, pp 1–17

  • Liu TY (2009) Learning to rank for information retrieval. Found Trends Inf Retr 3(3):225–331

  • Macskassy SA (2011) Contextual linking behavior of bloggers: leveraging text mining to enable topic-based analysis. Soc Netw Anal Min 1(4):355–375

  • McCreadie R, Macdonald C, Ounis I (2010) News article ranking: Leveraging the wisdom of bloggers. In: Adaptivity, Personalization and Fusion of Heterogeneous Information, pp 40–48

  • Mitzenmacher M (2004) A brief history of generative models for power law and lognormal distributions. Internet Math 1(2):226–251

  • Simkin M, Roychowdhury V (2012) Why does attention to web articles fall with time? arXiv preprint arXiv:1202.3492

  • Szabo G, Huberman BA (2008) Predicting the popularity of online content. Commun ACM 53(8):80

  • Tatar A, Antoniadis P, Limbourg A, de Amorim MD, Leguay J, Fdida S (2011) Predicting the popularity of online articles based on user comments. In: WIMS’11. ACM, New York, pp 67–75

  • Tsagkias M, Weerkamp W, De Rijke M (2009) Predicting the volume of comments on online news stories. In: Proceeding of the 18th ACM conference on Information and knowledge management. ACM, New York, pp 1765–1768

  • Tsagkias M, Weerkamp W, De Rijke M (2010) News comments: exploring, modeling, and online prediction. In: Proceedings of the 32nd European conference on advances in information retrieval, ECIR2010. Springer, Berlin

  • Van Mieghem P, Blenn N, Doerr C (2011) Lognormal distribution in the Digg online social network. Eur Phys J B 83:251–261

  • Wu F, Huberman B (2007) Novelty and collective attention. Proc Natl Acad Sci 104(45):17599

  • Wu Q, Burges CJ, Svore KM, Gao J (2010) Adapting boosting for information retrieval measures. Inf Retr 13(3):254–270

  • Xu J, Li H (2007) AdaRank: a boosting algorithm for information retrieval. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 391–398

  • Zhou R, Khemmarat S, Gao L (2010) The impact of YouTube recommendation system on video views. In: Proceedings of the 10th ACM SIGCOMM conference on internet measurement, IMC’10. ACM, New York, pp 404–410

Acknowledgments

The work presented in this paper has been carried out at LINCS (http://www.lincs.fr) and was partially supported by the ANR project Crowd under contract ANR-08-VERS-006 and EINS, the Network of Excellence in Internet Science, FP7 grant 288021. The authors are grateful to Manos Tsagkias for sharing the telegraaf data set.

Author information

Corresponding author

Correspondence to Alexandru Tatar.

Additional information

This paper is a significant extension of our previous work, “Ranking news articles based on popularity prediction”, published at ASONAM 2012.

About this article

Cite this article

Tatar, A., Antoniadis, P., Amorim, M.D.d. et al. From popularity prediction to ranking online news. Soc. Netw. Anal. Min. 4, 174 (2014). https://doi.org/10.1007/s13278-014-0174-8

  • DOI: https://doi.org/10.1007/s13278-014-0174-8