Learning Readers’ News Preferences with Support Vector Machines

  • Elena Hensinger
  • Ilias Flaounas
  • Nello Cristianini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6594)

Abstract

We explore the problem of learning and predicting the popularity of articles from online news media. The only information we exploit is the textual content of the articles and whether or not they became popular, as measured by users clicking on them. First, we show that this problem cannot be solved satisfactorily in a naive way by modelling it as a binary classification problem. Next, we cast it as a ranking task over pairs of popular and non-popular articles and show that this approach can reach an accuracy of up to 76%. Finally, we show that prediction performance can improve if more content-based features are used. All experiments use Support Vector Machine approaches.
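The pairwise ranking formulation can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words features, the Pegasos-style sub-gradient training loop, the toy article texts, and every function name below are assumptions made for illustration. The idea shown is that each (popular, non-popular) pair becomes a difference vector, and a linear model w is trained with a hinge loss so that w scores the popular article higher.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Toy feature extraction: sparse term-frequency vector of a text."""
    return Counter(text.lower().split())


def diff(a, b):
    """Sparse difference vector a - b."""
    out = dict(a)
    for term, value in b.items():
        out[term] = out.get(term, 0.0) - value
    return out


def dot(w, x):
    """Dot product between a sparse weight vector and a sparse feature vector."""
    return sum(w.get(term, 0.0) * value for term, value in x.items())


def score(w, text):
    """Linear appeal score of an article under the weight vector w."""
    return dot(w, bag_of_words(text))


def train_pairwise_ranker(pairs, epochs=50, lam=0.01, seed=0):
    """Learn w such that score(popular) > score(non_popular) for each pair.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - w.(p - n))) with
    stochastic sub-gradient steps. The optimiser is an assumption;
    the abstract only states that SVM approaches are used.
    """
    rng = random.Random(seed)
    diffs = [diff(bag_of_words(p), bag_of_words(n)) for p, n in pairs]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            t += 1
            eta = 1.0 / (lam * t)       # decreasing step size
            margin = dot(w, d)          # margin before this step
            shrink = 1.0 - eta * lam    # step on the regulariser
            for term in w:
                w[term] *= shrink
            if margin < 1.0:            # hinge loss is active for this pair
                for term, value in d.items():
                    w[term] = w.get(term, 0.0) + eta * value
    return w


def pairwise_accuracy(w, pairs):
    """Fraction of pairs in which the popular article receives the higher score."""
    correct = sum(1 for p, n in pairs if score(w, p) > score(w, n))
    return correct / len(pairs)


# Hypothetical (popular, non-popular) article pairs.
pairs = [
    ("celebrity scandal shocks fans", "council approves budget report"),
    ("football star wins final", "committee reviews trade policy"),
    ("celebrity wedding photos revealed", "budget committee meeting postponed"),
]
w = train_pairwise_ranker(pairs)
print(pairwise_accuracy(w, pairs))
print(score(w, "celebrity football scandal") > score(w, "budget policy report"))
```

Training on difference vectors means the model only has to order two articles relative to each other, rather than place every article on one side of a single popular/non-popular boundary. Pairwise accuracy, the fraction of correctly ordered pairs, is the kind of figure the 76% in the abstract refers to.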

Keywords

Pattern recognition, Data mining, Applications, Machine learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Elena Hensinger (1)
  • Ilias Flaounas (1)
  • Nello Cristianini (1)
  1. Intelligent Systems Laboratory, University of Bristol, UK
