Journal of Intelligent Information Systems, Volume 32, Issue 2, pp 191–212

An adaptive personalized news dissemination system

  • Ioannis Katakis
  • Grigorios Tsoumakas
  • Evangelos Banos
  • Nick Bassiliades
  • Ioannis Vlahavas

Abstract

With the explosive growth of the World Wide Web, information overload has become a crucial concern. In a data-rich, information-poor environment like the Web, separating useful or desirable information from large volumes of mostly worthless data has become a tedious task. The role of Machine Learning in tackling this problem is thoroughly discussed in the literature, but few systems are available for public use. In this work, we bridge theory and practice by implementing a web-based news reader enhanced with a specifically designed machine learning framework for dynamic content personalization. This gives us the chance to examine applicability and implementation issues and to discuss the effectiveness of machine learning methods for the classification of real-world text streams. The main features of our system, named PersoNews, are: (a) the aggregation of many different news sources that offer an RSS version of their content, (b) incremental filtering, offering dynamic personalization of the content not only per user but also per feed a user is subscribed to, and (c) the ability for every user to follow a more abstract topic of interest by filtering through a taxonomy of topics. PersoNews is freely available for public use on the WWW (http://news.csd.auth.gr).
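
As a rough illustration of feature (b), the sketch below keeps one incrementally updated text classifier per (user, feed) pair and lets its vocabulary grow as new words arrive. It is a minimal sketch under stated assumptions, not the PersoNews implementation: the abstract does not name the learning algorithm, so multinomial naive Bayes is used here only because it is a common choice for incremental text classification. The class name `IncrementalNaiveBayes`, the `models` registry, and the labels "interesting" and "boring" are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class IncrementalNaiveBayes:
    """Hypothetical multinomial naive Bayes with a growing vocabulary."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> documents seen
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.total_words = Counter()             # label -> total word count
        self.vocabulary = set()                  # every word seen so far

    def update(self, text, label):
        """Fold one labelled news item into the model without retraining."""
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.total_words[label] += 1
            self.vocabulary.add(word)

    def predict(self, text):
        """Return the most probable label, or None if nothing was learned."""
        if not self.doc_counts:
            return None
        total_docs = sum(self.doc_counts.values())
        vocab_size = max(len(self.vocabulary), 1)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][word]
                # Laplace smoothing over the current vocabulary size
                score += math.log(
                    (count + 1) / (self.total_words[label] + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Assumption: one independent model per (user, feed) pair,
# created lazily the first time the pair is used.
models = defaultdict(IncrementalNaiveBayes)

model = models[("alice", "tech-feed")]
model.update("new machine learning framework released", "interesting")
model.update("celebrity gossip and fashion week", "boring")
print(model.predict("machine learning news"))
```

Keying the models by (user, feed) means feedback on one feed never affects the filter of another feed, even for the same user. The sketch does not forget old examples, so it would need an additional mechanism (for instance a sliding window or example weighting) to follow concept drift.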

Keywords

Personalization, Text classification, Concept drift, Ontology, News filtering, Dynamic feature space

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Ioannis Katakis (1)
  • Grigorios Tsoumakas (1)
  • Evangelos Banos (1)
  • Nick Bassiliades (1)
  • Ioannis Vlahavas (1)
  1. Department of Informatics, Aristotle University, Thessaloniki, Greece
