Abstract
This paper describes a new research proposal for multi-document summarization of dynamic content in web pages. Much information on the Web is lost because of the temporal character of web documents. Therefore, adapting summarization techniques to the web genre is a promising task. The aim of our research is to provide methods for summarizing volatile content retrieved from collections of topically related web pages over defined time periods. Ideally, the resulting summary would reflect the most popular topics and concepts found in retrospective web collections. Because web changes are diverse in both content and timing, it is necessary to apply techniques that differ from the standard methods used for static documents. In this paper we propose an initial solution to this summarization problem. Our approach exploits temporal similarities between web pages by applying a sliding window over the dynamic parts of the collection.
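The abstract does not say how the sliding window is applied, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the dynamic parts of the collection are available as `(timestamp, page_id, text)` tuples of inserted content, and it treats a term as temporally shared when it appears in changes from several distinct pages within the same window. The function name `window_term_scores`, the parameters `window` and `step`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def window_term_scores(changes, window, step):
    """Slide a time window over page changes and count, per window,
    how many distinct pages introduced each term.

    changes: list of (timestamp, page_id, text) tuples (assumed input format).
    window:  window length, in the same units as the timestamps.
    step:    how far the window advances each iteration.
    Returns a list of (window_start, {term: distinct_page_count}).
    """
    if not changes:
        return []
    start = min(t for t, _, _ in changes)
    end = max(t for t, _, _ in changes)
    results = []
    t0 = start
    while t0 <= end:
        pages_per_term = defaultdict(set)
        for t, page, text in changes:
            # Keep only changes that fall inside the half-open window [t0, t0 + window).
            if t0 <= t < t0 + window:
                for term in set(text.lower().split()):
                    pages_per_term[term].add(page)
        counts = {term: len(pages) for term, pages in pages_per_term.items()}
        results.append((t0, counts))
        t0 += step
    return results


# Hypothetical sample: inserted text observed on three pages over a week.
changes = [
    (0, "a", "election results announced"),
    (1, "b", "election turnout high"),
    (5, "a", "storm warning issued"),
    (6, "c", "storm damage reported"),
]
scores = window_term_scores(changes, window=3, step=1)
```

Terms with a high distinct-page count in a window are candidates for the topics a summary of that period might cover. A real system would also need change detection, stop-word removal, and term weighting, none of which are shown here.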
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jatowt, A., Ishizuka, M. (2004). Summarization of Dynamic Content in Web Collections. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. PKDD 2004. Lecture Notes in Computer Science(), vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_24
DOI: https://doi.org/10.1007/978-3-540-30116-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive