Abstract
As the market for globally available online news is large and still growing, online publishers compete strongly to reach the largest possible audience. An intelligent online publishing strategy is therefore of the highest importance to publishers. A prerequisite for optimizing any online strategy is having trustworthy predictions of how popular new online content may become. This paper presents a novel methodology to model and predict the popularity of online news. We first introduce a new strategy and mathematical model to capture view patterns of online news. After a thorough analysis of such view patterns, we show that well-chosen base functions lead to suitable models, and we show how the influence of day versus night on the total view patterns can be taken into account to further increase accuracy without making the models more complex. Second, we turn to the prediction of future popularity, given recently published content. Using a new real-world dataset, we show that combining features related to content, metadata, and temporal behavior leads to significantly improved predictions compared to existing approaches, which only consider features based on the historical popularity of the considered articles. Whereas linear regression is traditionally used for the application under study, we show that the more expressive gradient tree boosting method proves beneficial for predicting news popularity.
Notes
BuzzFeed is a popular American news platform, and one of the first to focus on highly shareable breaking news, original reporting, entertainment, and video.
Acknowledgments
We thank Ke Zhou for useful suggestions on drafts of the manuscript. Steven Van Canneyt is funded by a Ph.D. grant of the Agency for Innovation by Science and Technology in Flanders (IWT). Part of the presented research was performed within the MIX-ICON project PROVIDENCE, facilitated by iMinds-Media and funded by the IWT.
Cite this article
Van Canneyt, S., Leroux, P., Dhoedt, B. et al. Modeling and predicting the popularity of online news based on temporal and content-related features. Multimed Tools Appl 77, 1409–1436 (2018). https://doi.org/10.1007/s11042-017-4348-z
DOI: https://doi.org/10.1007/s11042-017-4348-z