Carrot2: Design of a Flexible and Efficient Web Information Retrieval Framework

  • Stanisław Osiński
  • Dawid Weiss
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3528)


In this paper we present the design goals and implementation outline of Carrot2, an open source framework for rapid development of applications dealing with Web Information Retrieval and Web Mining. The framework was written from scratch with flexibility and processing efficiency in mind. We present two software architectures that meet these two requirements and provide evidence of their use in clustering search results.

We also discuss the importance and advantages of contributing the results of scientific projects to the open source community and integrating them with it.


Keywords: Information Retrieval, Clustering, Systems Design





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Stanisław Osiński (1)
  • Dawid Weiss (1)
  1. Institute of Computing Science, Poznań University of Technology, Poznań, Poland
