Modelling and Explaining Online News Preferences

  • Elena Hensinger
  • Ilias Flaounas
  • Nello Cristianini
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 204)

Abstract

We use Machine Learning techniques to model the reading preferences of the audiences of 14 online news outlets. The models, which describe the appeal of a given article to each audience, are linear functions of word frequencies and are obtained by comparing articles that became “Most Popular” on a given day in a given outlet with articles that did not. We use 2,432,148 such article pairs, collected over more than 1.5 years. These models are shown to be predictive of user choices, and they are then used to compare both the audiences and the contents of various news outlets. In the first case, we find a significant correlation between the demographic profiles of audiences and their preferences. In the second case, we find that content appeal is related both to writing style, with more sentimentally charged language being preferred, and to content, with “Public Affairs” topics such as “Finance” and “Politics” being less preferred.
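The pairwise setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenisation, relative word frequencies as features, and a plain subgradient descent on a pairwise hinge loss in place of a full Ranking SVM solver. The function names, the hyperparameters and the toy article pairs are all hypothetical.

```python
from collections import Counter


def word_frequencies(text):
    """Relative word frequencies of a text (lower-cased, whitespace tokens)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {word: count / total for word, count in counts.items()}


def score(weights, features):
    """Linear appeal score: dot product of weights and word frequencies."""
    return sum(weights.get(word, 0.0) * value for word, value in features.items())


def train_pairwise(pairs, epochs=50, lr=0.5, reg=0.01):
    """Learn word weights from (popular, non-popular) article pairs.

    Assumption: subgradient descent on an L2-regularised pairwise hinge
    loss, used here as a simple stand-in for a Ranking SVM solver.
    """
    weights = {}
    featurised = [(word_frequencies(p), word_frequencies(n)) for p, n in pairs]
    for _ in range(epochs):
        for popular, other in featurised:
            margin = score(weights, popular) - score(weights, other)
            # L2 regularisation: shrink every weight slightly.
            for word in list(weights):
                weights[word] *= 1.0 - lr * reg
            # Hinge loss: update only when the pair is not separated by a margin of 1.
            if margin < 1.0:
                for word, value in popular.items():
                    weights[word] = weights.get(word, 0.0) + lr * value
                for word, value in other.items():
                    weights[word] = weights.get(word, 0.0) - lr * value
    return weights


# Hypothetical toy pairs: (article that became popular, article that did not).
toy_pairs = [
    ("shocking scandal stuns fans", "committee reviews budget report"),
    ("tragic crash shocking footage", "finance ministers discuss budget"),
]

toy_weights = train_pairwise(toy_pairs)
```

Because the learned model is a weight per word, the highest and lowest weights can be read off directly, which is what makes a linear model of this kind usable for explaining preferences as well as predicting them.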

Keywords

Pattern Analysis; Ranking SVM; News Appeal; Text Analysis; User Preference Modelling; Prediction of user choices

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Elena Hensinger (1)
  • Ilias Flaounas (1)
  • Nello Cristianini (1)
  1. Intelligent Systems Laboratory, University of Bristol, Bristol, UK
