Abstract
The methods used by recommender systems to produce news rankings are not public, and it is unclear whether these rankings reflect the real importance assigned by readers. We address the task of forecasting the number of times a news item will be tweeted, as a proxy for the importance assigned by its readers. We focus on methods for accurately forecasting which news items will have a high number of tweets, as these are the key to accurate recommendations. Such news items are rare, and this creates difficulties for standard prediction methods. Recent research has shown that most models fail on tasks where the goal is accuracy on a small subset of rare values of the target variable. To overcome this, we tested several resampling approaches for handling imbalanced regression tasks in our domain. This paper describes and discusses the results of these experimental comparisons.
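The abstract does not detail the resampling procedures, so the following is only a hedged illustration of two generic strategies from the imbalanced-regression literature: random undersampling of common cases, and SMOTE-style oversampling of rare cases in which each synthetic target is a distance-weighted average of its two seeds' targets. It assumes that "rare" means a target at or above a fixed percentile of the training targets (the literature usually defines rarity through a relevance function instead), uses Euclidean distance on numeric features, and the function names `rare_mask`, `undersample` and `smote_r` are hypothetical. It is a sketch, not the authors' implementation.

```python
import math
import random


def rare_mask(y, quantile=0.9):
    """Flag targets at or above the given empirical quantile as rare.

    Assumption: a plain percentile cut-off stands in for a relevance function.
    """
    ordered = sorted(y)
    threshold = ordered[int(quantile * (len(ordered) - 1))]
    return [v >= threshold for v in y], threshold


def undersample(X, y, rare, keep_frac, rng):
    """Random undersampling: keep every rare case plus a random fraction of common ones."""
    rare_idx = [i for i, r in enumerate(rare) if r]
    common_idx = [i for i, r in enumerate(rare) if not r]
    k = min(len(common_idx), int(round(keep_frac * len(common_idx))))
    chosen = rare_idx + rng.sample(common_idx, k)
    return [X[i] for i in chosen], [y[i] for i in chosen]


def smote_r(X, y, rare, n_new, k, rng):
    """SMOTE-style oversampling for regression: synthesise n_new rare cases."""
    rare_idx = [i for i, r in enumerate(rare) if r]
    if len(rare_idx) < 2:
        raise ValueError("need at least two rare cases to interpolate")
    new_X, new_y = [], []
    for _ in range(n_new):
        i = rng.choice(rare_idx)
        # k nearest rare neighbours of seed i (Euclidean distance on features)
        neighbours = sorted(
            (j for j in rare_idx if j != i), key=lambda j: math.dist(X[i], X[j])
        )[:k]
        j = rng.choice(neighbours)
        gap = rng.random()
        # The new feature vector lies on the segment between the two seeds
        synth = [a + gap * (b - a) for a, b in zip(X[i], X[j])]
        d_i, d_j = math.dist(synth, X[i]), math.dist(synth, X[j])
        if d_i + d_j == 0:
            target = (y[i] + y[j]) / 2
        else:
            # Distance-weighted average of the seeds' targets: the closer seed weighs more
            target = (d_j * y[i] + d_i * y[j]) / (d_i + d_j)
        new_X.append(synth)
        new_y.append(target)
    return new_X, new_y


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: one feature and a heavily skewed "tweet count" target
    X = [[float(i)] for i in range(50)]
    y = [float(i ** 2) for i in range(50)]
    rare, thr = rare_mask(y, quantile=0.9)
    Xu, yu = undersample(X, y, rare, keep_frac=0.3, rng=rng)
    Xs, ys = smote_r(X, y, rare, n_new=20, k=3, rng=rng)
    print(len(yu), len(ys), thr)
```

Undersampling shrinks the training set so that rare cases make up a larger share of it, while oversampling grows it with synthetic rare cases; in either case the resampled set would then be passed to an ordinary regression learner.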
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Moniz, N., Torgo, L., Rodrigues, F. (2014). Resampling Approaches to Improve News Importance Prediction. In: Blockeel, H., van Leeuwen, M., Vinciotti, V. (eds) Advances in Intelligent Data Analysis XIII. IDA 2014. Lecture Notes in Computer Science, vol 8819. Springer, Cham. https://doi.org/10.1007/978-3-319-12571-8_19
DOI: https://doi.org/10.1007/978-3-319-12571-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12570-1
Online ISBN: 978-3-319-12571-8
eBook Packages: Computer Science, Computer Science (R0)