A Framework for Recommendation of Highly Popular News Lacking Social Feedback

Moniz, Nuno; Torgo, Luís; Eirinaki, Magdalini; Branco, Paula

doi:10.1007/s00354-017-0019-x

Special Feature
Published: 23 May 2017

Volume 35, pages 417–450, (2017)
New Generation Computing

Nuno Moniz¹,
Luís Torgo¹,
Magdalini Eirinaki² &
Paula Branco¹


Abstract

Social media is rapidly becoming the main source of news for users, which raises significant challenges for news aggregation and recommendation tasks. One of these challenges concerns the recommendation of very recent news. To tackle this problem, approaches to the prediction of news popularity have been proposed. In this paper, we study the task of predicting the popularity of news items upon publication, when social feedback is unavailable or scarce, and of using such predictions to produce news rankings. Unlike previous work, we focus on accurately predicting highly popular news. Such cases are rare, which causes known issues for standard prediction models and evaluation metrics. To overcome these issues, we propose the use of resampling strategies to bias learners towards these rare cases of highly popular news, and a utility-based framework for evaluating their performance. An experimental evaluation using real-world data tests our proposal in distinct scenarios. Results show that, compared with previous work, our proposed approaches improve the ability to predict and recommend highly popular news upon publication.
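To make the resampling idea in the abstract more concrete, the following Python sketch replicates training cases whose popularity exceeds a threshold, so that a learner sees the rare, highly popular items more often. It is only a minimal illustration under stated assumptions and does not reproduce the resampling strategies evaluated in the paper: the function name `oversample_rare`, the fixed `threshold`, the replication `factor`, and the toy data are all hypothetical.

```python
import random


def oversample_rare(cases, threshold, factor, seed=0):
    """Return a new training set in which rare cases are replicated.

    cases     -- list of (features, popularity) pairs
    threshold -- popularity above which a case counts as rare (assumed, user-chosen)
    factor    -- number of copies of each rare case in the result (>= 1)
    """
    rng = random.Random(seed)
    common = [c for c in cases if c[1] <= threshold]
    rare = [c for c in cases if c[1] > threshold]
    # Keep every common case once and repeat each rare case `factor` times.
    resampled = common + rare * factor
    rng.shuffle(resampled)
    return resampled


# Toy data (hypothetical): most items receive few shares, one receives many.
train = [({"id": i}, shares) for i, shares in enumerate([1, 3, 0, 2, 250, 4])]
balanced = oversample_rare(train, threshold=100, factor=3)
print(len(train), len(balanced))
```

Plain replication is the simplest way to shift the training distribution towards rare cases; it adds no new information and can encourage overfitting, which is one reason more elaborate resampling strategies exist.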

Notes

This process was carried out using the density function in R.
The total number of news items across all topics is 106,456.
Twitter API: https://dev.twitter.com/docs/api. The count method was deprecated on 20 November 2015.
Preliminary results obtained with the evaluation metric described in Sect. 5.3.1 show that applying a bag-of-words approach to the headline of a news item produces better results than applying it to the title.
We have established that the relevant items are those which belong to the top 10.
Lingpipe 4.1.0: http://aliasi.com/lingpipe.

Acknowledgements

This work is financed by the ERDF—European Regional Development Fund through the COMPETE 2020 Programme within project POCI-01-0145-FEDER-006961, and by National Funds through the FCT—Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013. The work of N. Moniz is supported by a Ph.D. scholarship of FCT (SFRH/BD/90180/2012). The work of P. Branco is supported by a Ph.D. scholarship of FCT (PD/BD/105788/2014).

Author information

Authors and Affiliations

DCC, Faculdade de Ciências, Universidade do Porto, LIAAD-INESC TEC, Porto, Portugal
Nuno Moniz, Luís Torgo & Paula Branco
Computer Engineering Department, San Jose State University, San Jose, USA
Magdalini Eirinaki

Corresponding author

Correspondence to Nuno Moniz.

Appendix

The following list gives the parameter values tested for each regression algorithm in the experimental evaluation (Sect. 6). For SVM models, we tested the parameters cost (c) and gamma (g); for MARS models, the parameters nk, degree (d) and thresh (th); and for Random Forest models, the parameters mtry (mt) and ntree (nt). These parameter names are those of the R implementations of the models used in this paper. The optimal parametrization is presented in Table 9.

svm: \(c \in \{ 10, 150, 300 \}\), \(g \in \{ 0.01, 0.001 \}\);
mars: \(nk \in \{ 10, 17 \}\), \(d \in \{ 1, 2 \}\), \(th \in \{ 0.01, 0.001 \}\);
rf: \(mt \in \{ 5, 7 \}\), \(nt \in \{ 500, 750, 1500 \}\).
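The parameter sets listed above define a small grid per algorithm. As a rough sketch of how such a grid could be enumerated, the Python snippet below expands each set of candidate values into the full list of combinations. The values are copied from the list; the dictionary layout and the helper name `expand_grid` are assumptions made for illustration, and the snippet does not train any model or reflect how the search was actually run in R.

```python
from itertools import product

# Candidate values copied from the appendix list; the layout is an assumption.
GRIDS = {
    "svm": {"cost": [10, 150, 300], "gamma": [0.01, 0.001]},
    "mars": {"nk": [10, 17], "degree": [1, 2], "thresh": [0.01, 0.001]},
    "rf": {"mtry": [5, 7], "ntree": [500, 750, 1500]},
}


def expand_grid(grid):
    """Return every combination of the candidate values as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


candidates = {model: expand_grid(grid) for model, grid in GRIDS.items()}
for model, configs in candidates.items():
    # Number of parameter combinations to evaluate for each algorithm.
    print(model, len(configs))
```

Under these values the grid contains 6 combinations for SVM, 8 for MARS, and 6 for Random Forest, so an exhaustive search stays cheap.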

Table 9 Optimal parametrization for regression algorithms and resampling strategies, for all topics

About this article

Cite this article

Moniz, N., Torgo, L., Eirinaki, M. et al. A Framework for Recommendation of Highly Popular News Lacking Social Feedback. New Gener. Comput. 35, 417–450 (2017). https://doi.org/10.1007/s00354-017-0019-x

Received: 10 December 2016
Accepted: 07 May 2017
Published: 23 May 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s00354-017-0019-x
