Phrase-Based Hierarchical Clustering of Web Search Results

Irmina Masłowska
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2633)


The paper addresses the problem of clustering text documents retrieved from the Web. We apply clustering to help users interactively browse hierarchically organized search results, as an alternative to the standard ranked-list presentation. We propose a clustering method tailored to the on-line processing of Web documents; it takes into account processing time, the particular requirements of clustering text, and the readability of the resulting hierarchy. Finally, we present the user interface of an actual system in which the method is applied to the results of a popular search engine.
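The abstract does not spell out the algorithm. As a rough illustration of what phrase-based hierarchical clustering of search-result snippets can look like, the Python sketch below follows the general idea of Suffix Tree Clustering (Zamir and Etzioni). Snippets that share a phrase form a base cluster, phrases with identical document sets are merged, and clusters are nested by document-set inclusion. It is not the method proposed in this paper. Assumptions: shared phrases are found by enumerating word n-grams up to `max_len` words rather than with a suffix tree; `STOP_WORDS`, `min_docs`, and all function names are hypothetical; each cluster is attached to its smallest strict superset so that the result stays a tree.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a fuller, language-specific one.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "is"}


def tokenize(text):
    """Lower-case a snippet and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def base_clusters(snippets, max_len=3, min_docs=2):
    """Map each phrase (word n-gram) shared by >= min_docs snippets to its document set.

    Phrases that start or end with a stop word are skipped (hypothetical rule).
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = tokenize(text)
        for i in range(len(words)):
            for n in range(1, max_len + 1):
                gram = words[i:i + n]
                if len(gram) < n:  # ran past the end of the snippet
                    break
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                phrase_docs[" ".join(gram)].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def merge_identical(clusters):
    """Collapse phrases with identical document sets into one labelled cluster."""
    by_docs = defaultdict(list)
    for phrase, docs in clusters.items():
        by_docs[frozenset(docs)].append(phrase)
    # Label each cluster with its phrases, longest (most specific) first.
    return {docs: sorted(phrases, key=lambda p: (-len(p.split()), p))
            for docs, phrases in by_docs.items()}


def build_hierarchy(merged):
    """Attach each cluster to its smallest strict superset; clusters without one are roots."""
    children = defaultdict(list)
    roots = []
    for docs in merged:
        supersets = [other for other in merged if docs < other]  # strict subset test
        if supersets:
            parent = min(supersets, key=lambda s: (len(s), sorted(s)))
            children[parent].append(docs)
        else:
            roots.append(docs)
    return roots, children


def render_tree(merged, nodes, children, depth=0):
    """Return indented lines 'label [doc ids]', larger clusters first."""
    lines = []
    for docs in sorted(nodes, key=lambda s: (-len(s), sorted(s))):
        lines.append("  " * depth + f"{merged[docs][0]} {sorted(docs)}")
        lines.extend(render_tree(merged, children.get(docs, []), children, depth + 1))
    return lines


if __name__ == "__main__":
    results = [
        "Jaguar car dealers and used jaguar cars",
        "Jaguar cars: new models and prices",
        "The jaguar is a big cat of the Americas",
        "Big cat conservation: jaguar habitat",
    ]
    merged = merge_identical(base_clusters(results))
    roots, children = build_hierarchy(merged)
    print("\n".join(render_tree(merged, roots, children)))
```

On these four toy snippets the sketch prints a root labelled `jaguar [0, 1, 2, 3]` with two children, `jaguar cars [0, 1]` and `big cat [2, 3]`. Enumerating n-grams caps phrase length at `max_len`; a generalized suffix tree, which can be built incrementally with Ukkonen's on-line construction, finds shared phrases of any length in time linear in the total input, which suits the on-line setting the abstract emphasizes. The pairwise inclusion test in `build_hierarchy` is quadratic in the number of clusters, which stays small when only phrases shared by several snippets are kept.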





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

Irmina Masłowska, Institute of Computing Science, Poznań University of Technology, Poznań, Poland
