Statistical vs. Rule-Based Stemming for Monolingual French Retrieval
This paper describes our approach to the 2006 Ad Hoc Monolingual Information Retrieval run for French. The goal of our experiment was to compare the performance of a proposed statistical stemmer with that of a rule-based stemmer, specifically the French version of Porter’s stemmer. The statistical stemming approach is based on lexicon clustering, using a novel string distance measure. We submitted three official runs, in addition to a baseline run that uses no stemming. The results show that stemming improves retrieval performance significantly (as expected), by about 9-10%, and that the performance of the statistical stemmer is comparable to that of the rule-based stemmer.
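To make the idea of clustering-based stemming concrete, the following is a minimal sketch and not the stemmer evaluated in this paper. The abstract does not define the string distance measure, so `prefix_distance` is a hypothetical stand-in that simply rewards a long shared prefix; the complete-linkage clustering, the `threshold` value, the function names, and the choice of the shortest word as the cluster representative are likewise assumptions made for illustration.

```python
from itertools import combinations


def prefix_distance(a: str, b: str) -> float:
    """Hypothetical distance (an assumption, not the paper's measure):
    0.0 for identical words, growing toward 1.0 as the shared prefix shrinks."""
    if a == b:
        return 0.0
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return 1.0 - common / max(len(a), len(b))


def cluster_lexicon(words, threshold=0.5):
    """Complete-linkage agglomerative clustering of a lexicon.

    Two clusters are merged only if every cross pair of words is within
    `threshold`; the closest such pair of clusters is merged first.
    """
    clusters = [[w] for w in sorted(set(words))]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(prefix_distance(a, b)
                    for a in clusters[i] for b in clusters[j])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected


def build_stem_map(words, threshold=0.5):
    """Map every word to a representative of its cluster.

    The representative (shortest word, ties broken alphabetically) plays
    the role of the stem at indexing and query time.
    """
    stem_map = {}
    for cluster in cluster_lexicon(words, threshold):
        representative = min(cluster, key=lambda w: (len(w), w))
        for w in cluster:
            stem_map[w] = representative
    return stem_map


if __name__ == "__main__":
    lexicon = ["chanter", "chantons", "chantez", "maison", "maisons"]
    for word, stem in sorted(build_stem_map(lexicon).items()):
        print(word, "->", stem)
```

In a sketch like this, the threshold controls how aggressively variants are conflated: a lower value keeps clusters small and conservative, while a higher value merges more word forms at the risk of grouping unrelated words. Unlike a rule-based stemmer, nothing here encodes French morphology; the groupings come only from the lexicon and the distance.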
Keywords: French Version; Relevance Judgment; Statistical Stemmer; Partitive Cluster Algorithm; Improve Retrieval Performance