Phrase-Based Hierarchical Clustering of Web Search Results

Masłowska, Irmina

doi:10.1007/3-540-36618-0_42

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2633)

Included in the following conference series: European Conference on Information Retrieval

Abstract

The paper addresses the problem of clustering text documents coming from the Web. We apply clustering to support users in interactive browsing through hierarchically organized search results as opposed to the standard ranked-list presentation. We propose a clustering method that is tailored to on-line processing of Web documents and takes into account the time aspect, the particular requirements of clustering texts, and readability of the produced hierarchy. Finally, we present the user interface of an actual system in which the method is applied to the results of a popular search engine.

Author information

Authors and Affiliations

Institute of Computing Science, Poznań University of Technology, Piotrowo 3A, 60-965, Poznań, Poland
Irmina Masłowska

Editor information

Editors and Affiliations

Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi, 1, 56124, Pisa, Italy
Fabrizio Sebastiani

Cite this paper

Masłowska, I. (2003). Phrase-Based Hierarchical Clustering of Web Search Results. In: Sebastiani, F. (ed.) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_42

DOI: https://doi.org/10.1007/3-540-36618-0_42
Published: 15 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01274-0
Online ISBN: 978-3-540-36618-8
eBook Packages: Springer Book Archive
