Abstract
News articles are an engaging type of online content that captures the attention of a significant number of Internet users. They are particularly enjoyed by mobile users and spread massively through online social platforms. As a result, there is increased interest in discovering the articles that will become popular among users. This objective falls under the broad scope of content popularity prediction and has direct implications for the development of new services for online advertisement and content distribution. In this paper, we address the problem of predicting the popularity of news articles based on user comments. We formulate the prediction task as a ranking problem, where the goal is not to infer the precise amount of attention that a piece of content will receive but to accurately rank articles based on their predicted popularity. Using data obtained from two important news sites in France and the Netherlands, we analyze the ranking effectiveness of two prediction models. Our results indicate that popularity prediction methods are adequate solutions for this ranking task and could be considered a valuable alternative for automatic online news ranking.
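The ranking formulation can be illustrated with a minimal sketch. It assumes that each article already has a predicted comment count from some prediction model, and it scores the resulting order with NDCG, one common ranking measure. The article ids, the counts, and the choice of NDCG are illustrative assumptions, not values or methods taken from the paper.

```python
import math

def rank_by_prediction(predicted):
    """Return article ids sorted by predicted popularity, highest first."""
    return sorted(predicted, key=lambda article: predicted[article], reverse=True)

def dcg(gains):
    """Discounted cumulative gain of a list of gains given in rank order."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranking, actual, k):
    """NDCG@k of a ranking, using the observed popularity as the gain."""
    gains = [actual[article] for article in ranking[:k]]
    ideal = sorted(actual.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical data: predicted vs. observed final comment counts.
predicted = {"a": 120.0, "b": 45.0, "c": 300.0, "d": 10.0}
actual = {"a": 150, "b": 20, "c": 90, "d": 5}

ranking = rank_by_prediction(predicted)
score = ndcg_at_k(ranking, actual, k=3)
print(ranking, round(score, 3))
```

Only the order of the predictions matters here: a model that misestimates every count can still obtain a perfect score if it orders the articles correctly.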
Notes
According to the latest statistics of the two sites: http://corporate.tmg.nl/en/result-second-quarter-2012 (telegraaf); http://www.mediametrie.fr (20minutes).
Statistical techniques based on maximum-likelihood methods and Kolmogorov–Smirnov statistics.
The algorithm uses the number of blog posts to predict users’ interest in articles.
We deploy these algorithms using the RankLib open-source library (Dang 2012).
Information about section and author is available only for the 20minutes data set.
References
Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election: divided they blog. In: Proceedings of the 3rd international workshop on Link discovery. ACM, New York, pp 36–43
Bandari R, Asur S, Huberman B (2012) The pulse of news in social media: forecasting popularity. Arxiv preprint arXiv:1202.0332
Breslau L, Cao P, Fan L, Phillips G, Shenker S (1999) Web caching and Zipf-like distributions: evidence and implications. In: INFOCOM’99. Eighteenth annual joint conference of the IEEE computer and communications societies. Proceedings. IEEE, vol 1, pp 126–134
Cha M, Kwak H, Rodriguez P, Ahn Y, Moon S (2007) I tube, you tube, everybody tubes: analyzing the world’s largest user generated content video system. In: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement. ACM, New York, pp 1–14
Cha M, Mislove A, Gummadi KP (2009) A measurement-driven analysis of information propagation in the flickr social network. In: Proceedings of the 18th international conference on World wide web. ACM, New York, pp 721–730
Cha M, Pérez JAN, Haddadi H (2012) The spread of media content through blogs. Soc Netw Anal Min 2(3):249–264
Clauset A, Shalizi C, Newman M (2009) Power-law distributions in empirical data. SIAM Rev 51:661–703
Crane R, Sornette D (2008) Robust dynamic classes revealed by measuring the response function of a social system. Proc Natl Acad Sci 105(41):15649–15653
Dang V (2012) Ranklib library. http://people.cs.umass.edu/vdang/ranklib.html
De Francisci Morales G, Gionis A, Lucchese C (2012) From chatter to headlines: harnessing the real-time web for personalized news recommendation. In: Proceedings of the fifth ACM international conference on Web search and data mining, ACM, New York, pp 153–162
Dezsö Z, Almaas E, Lukács A, Rácz B, Szakadát I, Barabási AL (2006) Dynamics of information access on the web. Phys Rev E 73(6):066132
Easley D, Kleinberg J (2010) Networks, crowds, and markets: reasoning about a highly connected world. Cambridge University Press, Cambridge
Fortunato S, Flammini A, Menczer F, Vespignani A (2006) Topical interests and the mitigation of search engine bias. Proc Natl Acad Sci 103(34):12684–12689
Freund Y, Iyer R, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
Guo L, Tan E, Chen S, Xiao Z, Zhang X (2008) The stretched exponential distribution of internet media access patterns. In: Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing, ACM, New York, pp 283–294
Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst (TOIS) 20(4):422–446
Kaltenbrunner A, Gomez V, Lopez V (2007a) Description and prediction of slashdot activity. In: Proceedings of the 2007 Latin American web conference. IEEE Computer Society, Washington, DC, pp 57–66
Kaltenbrunner A, Gómez V, Moghnieh A, Meza R, Blat J, López V (2007b) Homogeneous temporal activity patterns in a large online communication space. CoRR URL http://arxiv.org/abs/0708.1579v1
Lee J, Moon S, Salamatian K (2010) An approach to model and predict the popularity of online contents with explanatory factors. In: Proceedings of the 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, vol 01. IEEE Computer Society, Washington, DC
Lerman K, Ghosh R (2010) Information contagion: an empirical study of the spread of news on Digg and Twitter social networks. In: Proceedings of 4th international conference on weblogs and social media (ICWSM), URL http://arxiv.org/abs/1003.2664
Lerman K, Hogg T (2010) Using a model of social dynamics to predict popularity of news. In: Proceedings of the 19th international conference on World wide web, WWW ’10. ACM, New York, pp 621–630
Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, KDD ’09
Li CT, Kuo TT, Ho CT, Hong SC, Lin WS, Lin SD (2013) Modeling and evaluating information propagation in a microblogging social network. Soc Netw Anal Min, pp 1–17
Liu TY (2009) Learning to rank for information retrieval. Found Trends Inf Retr 3(3):225–331
Macskassy SA (2011) Contextual linking behavior of bloggers: leveraging text mining to enable topic-based analysis. Soc Netw Anal Min 1(4):355–375
McCreadie R, Macdonald C, Ounis I (2010) News article ranking: Leveraging the wisdom of bloggers. In: Adaptivity, Personalization and Fusion of Heterogeneous Information, pp 40–48
Mitzenmacher M (2004) A brief history of generative models for power law and lognormal distributions. Internet Math 1(2):226–251
Simkin M, Roychowdhury V (2012) Why does attention to web articles fall with time? Arxiv preprint arXiv:1202.3492
Szabo G, Huberman BA (2008) Predicting the popularity of online content. Commun ACM 53(8):80
Tatar A, Antoniadis P, Limbourg A, de Amorim MD, Leguay J, Fdida S (2011) Predicting the popularity of online articles based on user comments. In: WIMS’11. ACM, New York, pp 67–75
Tsagkias M, Weerkamp W, De Rijke M (2009) Predicting the volume of comments on online news stories. In: Proceeding of the 18th ACM conference on Information and knowledge management. ACM, New York, pp 1765–1768
Tsagkias M, Weerkamp W, De Rijke M (2010) News comments: exploring, modeling, and online prediction. In: Proceedings of the 32nd European conference on advances in information retrieval, ECIR2010. Springer, Berlin
Van Mieghem P, Blenn N, Doerr C (2011) Lognormal distribution in the digg online social network. Eur Phys J B 83:251–261
Wu F, Huberman B (2007) Novelty and collective attention. Proc Natl Acad Sci 104(45):17599
Wu Q, Burges CJ, Svore KM, Gao J (2010) Adapting boosting for information retrieval measures. Inf Retr 13(3):254–270
Xu J, Li H (2007) Adarank: a boosting algorithm for information retrieval. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 391–398
Zhou R, Khemmarat S, Gao L (2010) The impact of youtube recommendation system on video views. In: Proceedings of the 10th ACM SIGCOMM conference on internet measurement, IMC’10. ACM, New York, pp 404–410
Acknowledgments
The work presented in this paper has been carried out at LINCS (http://www.lincs.fr) and was partially supported by the ANR project Crowd under contract ANR-08-VERS-006 and EINS, the Network of Excellence in Internet Science, FP7 grant 288021. The authors are grateful to Manos Tsagkias for sharing the telegraaf data set.
Additional information
This paper is a significant extension of our previous work, “Ranking news articles based on popularity prediction”, published at ASONAM 2012.
Cite this article
Tatar, A., Antoniadis, P., Amorim, M.D.d. et al. From popularity prediction to ranking online news. Soc. Netw. Anal. Min. 4, 174 (2014). https://doi.org/10.1007/s13278-014-0174-8
DOI: https://doi.org/10.1007/s13278-014-0174-8