Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Stemming

  • Chris D. Paice
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_942

Synonyms

Affix removal; Suffix stripping; Suffixing; Word conflation

Definition

Stemming is a process by which word endings or other affixes are removed or modified in order that word forms which differ in non-relevant ways may be merged and treated as equivalent. A computer program which performs such a transformation is referred to as a stemmer or stemming algorithm. The output of a stemming algorithm is known as a stem.
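To make the definition concrete, here is a minimal sketch of suffix stripping in Python. The suffix list, the minimum stem length of three characters and the function names `stem` and `conflate` are hypothetical choices for illustration only. This is not Lovins', Porter's or Paice's algorithm, all of which use much larger rule sets with context conditions. The sketch shows how "explosion" and "explosives" can be reduced to a common stem and so treated as equivalent.

```python
# A toy longest-match suffix stripper (illustrative only).
# SUFFIXES and MIN_STEM are hypothetical values chosen for this sketch;
# published stemmers use far larger rule sets and context conditions.
from collections import defaultdict

SUFFIXES = ["ions", "ives", "ing", "ion", "ive", "es", "ed", "s"]
MIN_STEM = 3  # never leave a stem shorter than this many characters

# Try longer suffixes first so that "ives" is preferred to "es" or "s".
_BY_LENGTH = sorted(SUFFIXES, key=len, reverse=True)


def stem(word: str) -> str:
    """Remove the longest listed suffix that leaves an acceptable stem."""
    w = word.lower()
    for suffix in _BY_LENGTH:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w  # no suffix could be removed safely


def conflate(words):
    """Group word forms that reduce to the same stem."""
    classes = defaultdict(list)
    for w in words:
        classes[stem(w)].append(w)
    return dict(classes)


if __name__ == "__main__":
    print(stem("explosion"))   # explos
    print(stem("explosives"))  # explos
    print(stem("bus"))         # bus (removing "s" would leave too short a stem)
    print(conflate(["explosion", "explosives", "explosive", "bus"]))
    # {'explos': ['explosion', 'explosives', 'explosive'], 'bus': ['bus']}
```

Trying the longest suffix first is what lets "explosives" lose "ives" rather than only "s". The minimum stem length is a crude guard against over-stripping short words such as "bus".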

Historical Background

The need for stemming first arose in the field of information retrieval (IR), where queries containing search terms need to be matched against document surrogates containing index terms. With the development of computer-based systems for IR, the problem immediately arose that a small difference in form between a search term and an index term could result in a failure to retrieve some relevant documents. Thus, if a query used the term “explosion” and a document was indexed by the term “explosives,” there would be no match on this term (whether or...


Recommended Reading

  1. Adamson GW, Boreham J. The use of an association measure based on character structure to identify semantically related pairs of words and document titles. Inf Process Manage. 1974;10(7/8):253–60.
  2. Ahmad F, Yusoff M, Sembok MT. Experiments with a stemming algorithm for Malay words. J Am Soc Inf Sci. 1996;47(12):909–18.
  3. Al-Sughaiyer IA, Al-Kharashi IA. Arabic morphological analysis techniques: a comprehensive survey. J Am Soc Inf Sci Technol. 2004;55(3):189–213.
  4. Aljlayl M, Frieder O. On Arabic search: improving the retrieval effectiveness via a light stemming approach. In: Proceedings of the International Conference on Information and Knowledge Management; 2002. p. 340–7.
  5. Bacchin M, Ferro N, Melucci M. A probabilistic model for stemmer generation. Inf Process Manage. 2005;41(1):121–37.
  6. Frakes WB, Fox CJ. Strength and similarity of affix removal stemming algorithms. SIGIR Forum. 2003;37(1):26–30.
  7. Harman D. How effective is suffixing? J Am Soc Inf Sci. 1991;42(1):7–15.
  8. Hull DA. Stemming algorithms: a case study for detailed evaluation. J Am Soc Inf Sci. 1996;47(1):70–84.
  9. Krovetz R. Viewing morphology as an inference process. Artificial Intelligence. 2000;118(1/2):277–94.
  10. Lennon M, Pierce DS, Tarry BD, Willett P. An evaluation of some conflation algorithms for information retrieval. J Inf Sci. 1981;3(4):177–83.
  11. Lovins JB. Development of a stemming algorithm. Mech Transl Comput Linguist. 1968;11:22–31.
  12. Paice CD. Another stemmer. SIGIR Forum. 1990;24(3):56–61.
  13. Paice CD. A method for the evaluation of stemming algorithms based on error counting. J Am Soc Inf Sci. 1996;47(8):632–49.
  14. Porter MF. An algorithm for suffix stripping. Program. 1980;14(3):130–7.
  15. Xu J, Croft WB. Corpus-based stemming using cooccurrence of word variants. ACM Trans Inf Syst. 1998;16(1):61–81.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Lancaster University, Lancaster, UK

Section editors and affiliations

  • Edie Rasmussen
    Library, Archival & Inf. Studies, The Univ. of British Columbia, Vancouver, Canada