Bulgarian, Hungarian and Czech Stemming Using YASS

  • Prasenjit Majumder
  • Mandar Mitra
  • Dipasree Pal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5152)

Abstract

This is the second year in a row that we have participated in CLEF. Our aim is to test the performance of a statistical stemmer on various languages. For CLEF 2006, we tried the stemmer on French [1]; for CLEF 2007, we ran experiments for the Hungarian, Bulgarian and Czech monolingual tasks. We find that, for all of these languages, YASS produces significant improvements over the baseline (unstemmed) runs. The performance of YASS is also comparable to that of other available stemmers for all three East European languages.

Keywords

Model Topic; Retrieval Model; Cluster Threshold; Statistical Stemmer; Hungarian Language
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Majumder, P., Mitra, M., Datta, K.: Statistical vs. rule-based stemming for monolingual French retrieval. In: Peters, C., Clough, P., Gey, F.C., Karlgren, J., Magnini, B., Oard, D.W., de Rijke, M., Stempfhuber, M. (eds.) CLEF 2006. LNCS, vol. 4730, pp. 107–110. Springer, Heidelberg (2007)
  2. Majumder, P., Mitra, M., Parui, S.K., Kole, G., Mitra, P., Datta, K.: YASS: Yet another suffix stripper. ACM Trans. Inf. Syst. 25(4), 18 (2007)
  3. Salton, G. (ed.): The SMART Retrieval System - Experiments in Automatic Document Retrieval. Prentice Hall Inc., Englewood Cliffs (1971)
  4. Ounis, I., Lioma, C., Macdonald, C., Plachouras, V.: Research directions in Terrier. In: Baeza-Yates, R., et al. (eds.) Novatica/UPGRADE Special Issue on Web Information Access (Invited Paper) (2007)
  5. Tordai, A., de Rijke, M.: Four Stemmers and a Funeral: Stemming in Hungarian at CLEF 2005. In: Peters, C., Gey, F.C., Gonzalo, J., Müller, H., Jones, G.J.F., Kluck, M., Magnini, B., de Rijke, M., Giampiccolo, D. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 179–186. Springer, Heidelberg (2006)
  6. Dolamic, L., Savoy, J.: Stemming approaches for East European languages. In: Peters, C., et al. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 37–44. Springer, Heidelberg (2008)
  7. Ircing, P., Muller, L.: Czech Monolingual Information Retrieval Using Off-The-Shelf Components - the University of West Bohemia at CLEF 2007 Ad-Hoc track. In: Peters, C., et al. (eds.) CLEF 2007. LNCS, vol. 5152. Springer, Heidelberg (2008)
  8. Ceska, P., Pecina, P.: Charles University at CLEF 2007 Ad-Hoc Track. In: Peters, C., et al. (eds.) CLEF 2007. LNCS, vol. 5152. Springer, Heidelberg (2008)
  9. Savoy, J., Abdou, S.: Experiments with monolingual, bilingual, and robust retrieval. In: Peters, C., Clough, P., Gey, F.C., Karlgren, J., Magnini, B., Oard, D.W., de Rijke, M., Stempfhuber, M. (eds.) CLEF 2006. LNCS, vol. 4730, pp. 137–144. Springer, Heidelberg (2007)
  10. Savoy, J.: Searching strategies for the Hungarian language. Inf. Process. Manage. 44(1), 310–324 (2008)
  11. Savoy, J.: Report on CLEF-2003 Monolingual Tracks: Fusion of Probabilistic Models for Effective Monolingual Retrieval. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds.) CLEF 2003. LNCS, vol. 3237, pp. 322–336. Springer, Heidelberg (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Prasenjit Majumder (1)
  • Mandar Mitra (1)
  • Dipasree Pal (1)

  1. CVPR Unit, Indian Statistical Institute, Kolkata
