A Framework for Recommendation of Highly Popular News Lacking Social Feedback

  • Special Feature
New Generation Computing

Abstract

Social media is rapidly becoming the main source of news for users, raising significant challenges for news aggregation and recommendation tasks. One of these challenges concerns the recommendation of very recent news. To tackle this problem, approaches to the prediction of news popularity have been proposed. In this paper, we study the task of predicting the popularity of news items upon publication, when social feedback is unavailable or scarce, and of using such predictions to produce news rankings. Unlike previous work, we focus on accurately predicting highly popular news. Such cases are rare, which causes known issues for standard prediction models and evaluation metrics. To overcome these issues, we propose the use of resampling strategies to bias learners towards these rare cases of highly popular news, and a utility-based framework for evaluating their performance. An experimental evaluation using real-world data tests our proposal in distinct scenarios. Results show that our proposed approaches improve the ability to predict and recommend highly popular news upon publication, in comparison to previous work.
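As a rough illustration of the resampling idea mentioned in the abstract, one simple option is random undersampling: keep every rare, highly popular item and only a random share of the remaining items, so that a learner trained on the result sees the rare cases proportionally more often. The sketch below is an assumption-laden example, not necessarily one of the strategies evaluated in the paper; the function name `undersample`, the fixed popularity `threshold`, and the `keep_ratio` parameter are all hypothetical, and the data are toy values.

```python
import random


def undersample(items, threshold, keep_ratio, seed=0):
    """Keep all rare cases and a random share of the common ones.

    items      -- list of (features, popularity) pairs
    threshold  -- popularity at or above which an item counts as rare (assumed)
    keep_ratio -- fraction of common items to retain, between 0 and 1 (assumed)
    """
    rng = random.Random(seed)  # seeded for reproducibility
    rare = [it for it in items if it[1] >= threshold]
    common = [it for it in items if it[1] < threshold]
    n_keep = round(len(common) * keep_ratio)
    return rare + rng.sample(common, n_keep)


if __name__ == "__main__":
    # Toy data: 100 items, of which every 20th one is "highly popular".
    toy = [(i, 1000 if i % 20 == 0 else i % 7) for i in range(100)]
    balanced = undersample(toy, threshold=100, keep_ratio=0.2)
    n_rare = sum(1 for _, y in balanced if y >= 100)
    print(f"{n_rare} rare items out of {len(balanced)} after undersampling")
```

Undersampling discards data, so in practice it is usually weighed against oversampling or synthetic-case generation, which keep all common cases and add copies or interpolations of the rare ones.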

Notes

  1. This process was carried out using the density function in R.

  2. The total number of news items across all topics is 106,456.

  3. Twitter API: https://dev.twitter.com/docs/api. The count method was deprecated on 20 November 2015.

  4. Preliminary results obtained with the evaluation metric described in Sect. 5.3.1 show that applying a bag-of-words approach to the headline of a news item produces better results than applying it to the title.

  5. We consider as relevant the items that belong to the top 10.

  6. Lingpipe 4.1.0: http://aliasi.com/lingpipe.

Acknowledgements

This work is financed by the ERDF—European Regional Development Fund through the COMPETE 2020 Programme within project POCI-01-0145-FEDER-006961, and by National Funds through the FCT—Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013. The work of N. Moniz is supported by a Ph.D. scholarship of FCT (SFRH/BD/90180/2012). The work of P. Branco is supported by a Ph.D. scholarship of FCT (PD/BD/105788/2014).

Author information

Corresponding author

Correspondence to Nuno Moniz.

Appendix

The following list describes the parameter values tested for each regression algorithm in the experimental evaluation (Sect. 6) carried out in this paper. For SVM models, we tested the parameters cost (c) and gamma (g); for MARS models, the parameters nk, degree (d) and thresh (th); and for Random Forest models, the parameters mtry (mt) and ntree (nt). Note that these parameter names correspond to the R implementations of these models used in this paper. The optimal parametrization is presented in Table 9.

  • svm: \(c \in \{10, 150, 300\}\), \(g \in \{0.01, 0.001\}\);

  • mars: \(nk \in \{10, 17\}\), \(d \in \{1, 2\}\), \(th \in \{0.01, 0.001\}\);

  • rf: \(mt \in \{5, 7\}\), \(nt \in \{500, 750, 1500\}\).
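To make the size of this search space concrete, the candidate values listed above can be expanded into every parameter combination. The sketch below uses only the Python standard library and does not call any modelling library; the values are copied from the list, while the dictionary layout and the helper name `expand_grid` are illustrative assumptions, not the authors' code.

```python
from itertools import product

# Candidate values copied from the appendix list. The keys use the R
# parameter names given in the text; the structure itself is an assumption.
PARAM_GRIDS = {
    "svm": {"cost": [10, 150, 300], "gamma": [0.01, 0.001]},
    "mars": {"nk": [10, 17], "degree": [1, 2], "thresh": [0.01, 0.001]},
    "rf": {"mtry": [5, 7], "ntree": [500, 750, 1500]},
}


def expand_grid(grid):
    """Return every combination of candidate values as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


if __name__ == "__main__":
    for model, grid in PARAM_GRIDS.items():
        combos = expand_grid(grid)
        print(f"{model}: {len(combos)} combinations, first = {combos[0]}")
```

Each combination would then be evaluated once per resampling strategy and topic, which is why the tested grids are kept small.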

Table 9 Optimal parametrization for regression algorithms and resampling strategies, for all topics

About this article

Cite this article

Moniz, N., Torgo, L., Eirinaki, M. et al. A Framework for Recommendation of Highly Popular News Lacking Social Feedback. New Gener. Comput. 35, 417–450 (2017). https://doi.org/10.1007/s00354-017-0019-x

  • DOI: https://doi.org/10.1007/s00354-017-0019-x
