Carrot2 and Language Properties in Web Search Results Clustering

  • Jerzy Stefanowski
  • Dawid Weiss
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2663)


This paper concerns search results clustering, a technique for improving the visualization of results in Web search engines. We introduce Carrot2, an open, extensible research system for examining and developing search results clustering algorithms. We also discuss attempts to measure the quality of discovered clusters and report the results of our quality assessment experiments in which an inflectionally rich language (Polish) is clustered using a representative algorithm, Suffix Tree Clustering.


information retrieval, web browsing and exploration, web search clustering, suffix tree clustering




Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Jerzy Stefanowski (1)
  • Dawid Weiss (1)
  1. Institute of Computing Science, Poznań University of Technology, Poznań, Poland
