Resampling Approaches to Improve News Importance Prediction

Moniz, Nuno; Torgo, Luís; Rodrigues, Fátima

doi:10.1007/978-3-319-12571-8_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8819)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

The methods that recommender systems use to produce news rankings are not public, and it is unclear whether they reflect the real importance that readers assign to news. We address the task of forecasting the number of times a news item will be tweeted, as a proxy for the importance assigned by its readers. We focus on methods for accurately forecasting which news items will receive a high number of tweets, as these are key to accurate recommendations. Such news items are rare, which creates difficulties for standard prediction methods. Recent research has shown that most models fail on tasks where the goal is accuracy on a small subset of rare values of the target variable. To overcome this, we tested resampling approaches, with several methods for handling imbalanced regression tasks, in our domain. This paper describes and discusses the results of these experimental comparisons.
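
To make the idea of resampling for imbalanced regression concrete, the following is a minimal sketch of random under-sampling. It is not the authors' implementation: the function name `undersample_common`, the fixed tweet-count `threshold` that separates rare from common cases, the `keep_fraction` parameter, and the toy data are all assumptions made for illustration. The approaches studied in the paper may define rarity differently.

```python
import random


def undersample_common(cases, threshold, keep_fraction, seed=0):
    """Keep every rare case and a random fraction of the common ones.

    cases: list of (item_id, target) pairs, where target is a tweet count.
    threshold: hypothetical cut-off; targets >= threshold count as rare.
    keep_fraction: share of the common cases to retain (0.0 to 1.0).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    rare = [c for c in cases if c[1] >= threshold]
    common = [c for c in cases if c[1] < threshold]
    n_keep = round(len(common) * keep_fraction)
    kept = rng.sample(common, n_keep)  # sampling without replacement
    return rare + kept


# Hypothetical toy data: 90 items with few tweets, 10 with many.
cases = [(i, 1) for i in range(90)] + [(100 + i, 500) for i in range(10)]
balanced = undersample_common(cases, threshold=100, keep_fraction=0.2)
```

A model trained on `balanced` sees the highly tweeted items in a much larger proportion than in the original data, which is the effect that resampling aims for when the rare values of the target variable are the ones that matter.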

Author information

Authors and Affiliations

LIAAD-INESC TEC / FCUP-DCC, University of Porto, Portugal
Nuno Moniz & Luís Torgo
GECAD - ISEP/IPP, ISEP-DEI, Polytechnic of Porto, Portugal
Fátima Rodrigues

Editor information

Editors and Affiliations

Department of Computer Science, KU Leuven, 3001, Heverlee, Belgium
Hendrik Blockeel & Matthijs van Leeuwen
Brunel University, UB8 3PH, Uxbridge, UK
Veronica Vinciotti

Cite this paper

Moniz, N., Torgo, L., Rodrigues, F. (2014). Resampling Approaches to Improve News Importance Prediction. In: Blockeel, H., van Leeuwen, M., Vinciotti, V. (eds) Advances in Intelligent Data Analysis XIII. IDA 2014. Lecture Notes in Computer Science, vol 8819. Springer, Cham. https://doi.org/10.1007/978-3-319-12571-8_19

DOI: https://doi.org/10.1007/978-3-319-12571-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12570-1
Online ISBN: 978-3-319-12571-8
eBook Packages: Computer Science, Computer Science (R0)
