Resampling Approaches to Improve News Importance Prediction

  • Conference paper
Advances in Intelligent Data Analysis XIII (IDA 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8819)

Abstract

The methods that recommender systems use to rank news are not public, and it is unclear whether they reflect the real importance assigned by readers. We address the task of forecasting the number of times a news item will be tweeted, as a proxy for the importance assigned by its readers. We focus on methods for accurately forecasting which news items will receive a high number of tweets, as these are the key to accurate recommendations. Such news is rare, which creates difficulties for standard prediction methods. Recent research has shown that most models fail on tasks where the goal is accuracy on a small subset of rare values of the target variable. To overcome this, we tested resampling approaches with several methods for handling imbalanced regression tasks in our domain. This paper describes and discusses the results of these experimental comparisons.
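As a rough illustration of what resampling for an imbalanced regression task can look like, the sketch below randomly undersamples the common low-count items while keeping every rare high-count item. It is not the procedure evaluated in the paper: the function name `undersample_common`, the fixed threshold on tweet counts, the `ratio` parameter, and the toy data are all assumptions made for this example.

```python
import random


def undersample_common(rows, threshold, ratio=1.0, seed=0):
    """Keep every rare row and a random subset of the common rows.

    rows      -- list of (features, target) pairs
    threshold -- targets at or above this value count as rare (assumed rule)
    ratio     -- number of common rows to keep per rare row (assumed knob)
    """
    rare = [r for r in rows if r[1] >= threshold]
    common = [r for r in rows if r[1] < threshold]
    rng = random.Random(seed)
    # Never ask for more common rows than are available.
    n_keep = min(len(common), int(round(ratio * len(rare))))
    return rare + rng.sample(common, n_keep)


# Toy data: (features, tweet count). Only two items are highly tweeted.
counts = [1, 0, 3, 2, 250, 1, 4, 0, 900, 2]
data = [({"id": i}, c) for i, c in enumerate(counts)]

balanced = undersample_common(data, threshold=100, ratio=1.0)
```

With `ratio=1.0` the resampled training set holds as many common items as rare ones, so a learner fitted on it is less dominated by low tweet counts. Evaluation should still use the original, unresampled distribution.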


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Moniz, N., Torgo, L., Rodrigues, F. (2014). Resampling Approaches to Improve News Importance Prediction. In: Blockeel, H., van Leeuwen, M., Vinciotti, V. (eds) Advances in Intelligent Data Analysis XIII. IDA 2014. Lecture Notes in Computer Science, vol 8819. Springer, Cham. https://doi.org/10.1007/978-3-319-12571-8_19

  • DOI: https://doi.org/10.1007/978-3-319-12571-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12570-1

  • Online ISBN: 978-3-319-12571-8

  • eBook Packages: Computer Science (R0)
