Modeling and predicting the popularity of online news based on temporal and content-related features

Abstract

As the market for globally available online news is large and still growing, online publishers compete strongly to reach the largest possible audience. An intelligent online publishing strategy is therefore of the highest importance to publishers. A prerequisite for optimizing any online strategy is having trustworthy predictions of how popular new online content may become. This paper presents a novel methodology to model and predict the popularity of online news. We first introduce a new strategy and mathematical model to capture the view patterns of online news. After a thorough analysis of these view patterns, we show that well-chosen base functions lead to suitable models, and we show how the influence of day versus night on the total view patterns can be taken into account to further increase accuracy without making the models more complex. Second, we turn to predicting the future popularity of recently published content. Using a new real-world dataset, we show that combining features related to content, meta-data, and temporal behavior leads to significantly better predictions than existing approaches, which only consider features based on the historical popularity of the articles. Whereas linear regression has traditionally been used for this task, we show that the more expressive gradient tree boosting method is beneficial for predicting news popularity.
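The abstract contrasts its approach with existing predictors that use only an article's historical popularity, typically with linear regression. For reference, here is a minimal sketch of one such baseline. It is a simplified form of the log-linear model associated with Szabo and Huberman (2008): the logarithm of an article's later view count is predicted as the logarithm of its early view count plus a constant offset learned from training articles. This is not the model proposed in this paper. The function names, the toy view counts, and the choice of estimating the offset as the mean log-ratio (the least-squares solution for a constant offset) are illustrative assumptions. Refinements such as a correction for the noise variance are omitted.

```python
import math


def fit_log_offset(early_views, later_views):
    """Least-squares fit of beta in ln(later) = ln(early) + beta.

    With a single constant offset, the least-squares estimate is the mean
    log-ratio between later and early view counts. Articles with zero views
    at either time are skipped because the logarithm is undefined there.
    """
    log_ratios = [
        math.log(later) - math.log(early)
        for early, later in zip(early_views, later_views)
        if early > 0 and later > 0
    ]
    if not log_ratios:
        raise ValueError("need at least one article with positive view counts")
    return sum(log_ratios) / len(log_ratios)


def predict_later_views(early, beta):
    """Baseline prediction: scale the early view count by exp(beta)."""
    return early * math.exp(beta)


# Hypothetical training data: views 1 hour and 24 hours after publication.
early = [100, 200, 50, 400]
later = [300, 500, 180, 1100]
beta = fit_log_offset(early, later)
print(round(predict_later_views(80, beta)))
```

This baseline sees only the early view count. It therefore cannot tell apart articles whose content, meta-data, or publication time suggest different growth. The abstract argues that adding such features, and replacing linear regression with gradient tree boosting, improves on this kind of predictor.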

Figs. 1–11 (images not reproduced)

Notes

  1. http://newsmonkey.be
  2. https://sites.google.com/site/predictivechallenge2014/
  3. http://newsmonkey.be
  4. Buzzfeed is a popular American news platform and one of the first to focus on highly shareable breaking news, original reporting, entertainment, and video.
  5. https://www.kaggle.com/
  6. http://scikit-learn.org
  7. https://sites.google.com/site/predictivechallenge2014/

Acknowledgments

We thank Ke Zhou for useful suggestions on drafts of the manuscript. Steven Van Canneyt is funded by a Ph.D. grant of the Agency for Innovation by Science and Technology in Flanders (IWT). Part of the presented research was performed within the MIX-ICON project PROVIDENCE, facilitated by iMinds-Media and funded by the IWT.

Author information

Correspondence to Steven Van Canneyt.

Cite this article

Van Canneyt, S., Leroux, P., Dhoedt, B. et al. Modeling and predicting the popularity of online news based on temporal and content-related features. Multimed Tools Appl 77, 1409–1436 (2018). https://doi.org/10.1007/s11042-017-4348-z

Keywords

  • Online news
  • Popularity modeling
  • Popularity prediction
  • Regression
  • Feature engineering