
STEMBR: A Stemming Algorithm for the Brazilian Portuguese Language

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 3808)

Abstract

Stemming algorithms have traditionally been used in information retrieval systems because they reduce related word forms to a more concise common representation. However, the performance of these algorithms varies with the language to which they are applied. This paper presents STEMBR, a stemmer for Brazilian Portuguese in which suffix treatment is based on a statistical study of the frequency of the final letter of words found in Brazilian web pages. The proposed stemmer is compared with another algorithm developed specifically for Portuguese, and the results demonstrate the efficiency of the proposed stemmer.
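The abstract ties suffix treatment to the distribution of word-final letters. The following sketch is a minimal illustration of that idea and assumes the corpus is a plain list of words. It shows two steps: counting how often each final letter occurs, and organizing suffix-stripping rules into groups keyed by a word's final letter. The rule table `SUFFIX_RULES`, the threshold `MIN_STEM`, and the function names are hypothetical. They are not STEMBR's published rules or code.

```python
import unicodedata
from collections import Counter


def last_letter_frequencies(words):
    """Return the relative frequency of each word-final letter in `words`."""
    # NFC normalization keeps letters such as "ç" or "õ" as single code points.
    counts = Counter(unicodedata.normalize("NFC", w)[-1].lower() for w in words if w)
    total = sum(counts.values())
    # most_common() yields letters from most to least frequent.
    return {letter: n / total for letter, n in counts.most_common()}


# HYPOTHETICAL rule table: candidate suffixes grouped by the word's final
# letter. Illustrative only, NOT the actual STEMBR rule set.
SUFFIX_RULES = {
    "s": ["ções", "ias", "os", "as", "es"],
    "o": ["mento", "ção", "o"],
    "e": ["mente", "e"],
    "a": ["ia", "a"],
}
MIN_STEM = 3  # hypothetical minimum stem length kept after stripping


def stem(word):
    """Strip the longest matching suffix from the rule group for the final letter."""
    word = unicodedata.normalize("NFC", word.lower())
    # Only the rules for this word's final letter are consulted.
    candidates = SUFFIX_RULES.get(word[-1:], [])
    for suffix in sorted(candidates, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word  # no rule applies: return the word unchanged


if __name__ == "__main__":
    print(last_letter_frequencies(["casa", "mesa", "livro"]))
    print(stem("informações"), stem("informação"))  # informa informa
```

Grouping the rules by final letter means each word is compared only against suffixes that could actually match its ending. This is one plausible way a final-letter frequency study could guide how suffix rules are organized and ordered.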

Keywords

  • Hidden Markov Model
  • Information Retrieval
  • Information Retrieval System
  • Suffix Treatment
  • Brazilian Version

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alvares, R.V., Garcia, A.C.B., Ferraz, I. (2005). STEMBR: A Stemming Algorithm for the Brazilian Portuguese Language. In: Bento, C., Cardoso, A., Dias, G. (eds) Progress in Artificial Intelligence. EPIA 2005. Lecture Notes in Computer Science, vol 3808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11595014_67


  • DOI: https://doi.org/10.1007/11595014_67

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30737-2

  • Online ISBN: 978-3-540-31646-6

  • eBook Packages: Computer Science, Computer Science (R0)
