Statistical vs. Rule-Based Stemming for Monolingual French Retrieval

  • Prasenjit Majumder
  • Mandar Mitra
  • Kalyankumar Datta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4730)

Abstract

This paper describes our approach to the 2006 Adhoc Monolingual Information Retrieval run for French. The goal of our experiment was to compare the performance of a proposed statistical stemmer with that of a rule-based stemmer, specifically the French version of Porter’s stemmer. The statistical stemming approach is based on lexicon clustering, using a novel string distance measure. We submitted three official runs, besides a baseline run that uses no stemming. The results show that stemming significantly improves retrieval performance (as expected) by about 9–10%, and the performance of the statistical stemmer is comparable with that of the rule-based stemmer.

Keywords

French Version; Relevance Judgment; Statistical Stemmer; Partitive Cluster Algorithm; Improve Retrieval Performance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Prasenjit Majumder (1)
  • Mandar Mitra (1)
  • Kalyankumar Datta (2)

  1. CVPR Unit, Indian Statistical Institute, Kolkata
  2. Dept. of EE, Jadavpur University, Kolkata
