Abstract
This paper addresses the problem of clustering text documents retrieved from the Web. We apply clustering to support users in interactively browsing hierarchically organized search results, as opposed to the standard ranked-list presentation. We propose a clustering method that is tailored to on-line processing of Web documents and takes into account the time aspect, the particular requirements of clustering texts, and the readability of the produced hierarchy. Finally, we present the user interface of an actual system in which the method is applied to the results of a popular search engine.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Masłowska, I. (2003). Phrase-Based Hierarchical Clustering of Web Search Results. In: Sebastiani, F. (ed.) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_42
DOI: https://doi.org/10.1007/3-540-36618-0_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01274-0
Online ISBN: 978-3-540-36618-8
eBook Packages: Springer Book Archive