Definition
Stemming is a process by which word endings or other affixes are removed or modified so that word forms differing only in ways that are not relevant to the task can be merged and treated as equivalent. A computer program that performs such a transformation is referred to as a stemmer or stemming algorithm. The output of a stemming algorithm is known as a stem.
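As an illustration of the definition above, the following Python sketch implements a deliberately crude suffix stripper. The suffix list, the MIN_STEM limit, and the function name stem are hypothetical choices made for this example. It is not the Lovins, Porter, or Paice algorithm, all of which use much larger rule sets and additional conditions on when a rule may fire.

```python
# A deliberately crude suffix stripper, for illustration only.
# SUFFIXES and MIN_STEM are hypothetical choices; published stemmers such as
# Lovins (1968) or Porter (1980) use far richer rules and conditions.
SUFFIXES = sorted(
    ["ions", "ion", "ings", "ing", "ives", "ive", "ed", "es", "s"],
    key=len,
    reverse=True,  # try longer suffixes first (longest-match removal)
)
MIN_STEM = 3  # never leave a stem shorter than this many letters


def stem(word: str) -> str:
    """Strip the longest listed suffix that leaves at least MIN_STEM letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w  # no rule applies: the word is its own stem


if __name__ == "__main__":
    for w in ["connect", "connected", "connecting", "connection", "connections"]:
        print(w, "->", stem(w))
```

Applied to connect, connected, connecting, connection, and connections, this sketch returns the stem "connect" in every case, so the five forms are merged. The same crude rules also reduce "news" to "new", showing how a stemmer can wrongly conflate words whose meanings are unrelated.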
Historical Background
The need for stemming first arose in the field of information retrieval (IR), where queries containing search terms need to be matched against document surrogates containing index terms. With the development of computer-based systems for IR, the problem immediately arose that a small difference in form between a search term and an index term could result in a failure to retrieve some relevant documents. Thus, if a query used the term “explosion” and a document was indexed by the term “explosives,” there would be no match on this term (whether or...
Recommended Reading
Adamson GW, Boreham J. The use of an association measure based on character structure to identify semantically related pairs of words and document titles. Inf Process Manage. 1974;10(7/8):253–60.
Ahmad F, Yusoff M, Sembok MT. Experiments with a stemming algorithm for Malay words. J Am Soc Inf Sci. 1996;47(12):909–18.
Al-Sughaiyer IA, Al-Kharashi IA. Arabic morphological analysis techniques: a comprehensive survey. J Am Soc Inf Sci Technol. 2004;55(3):189–213.
Aljlayl M, Frieder O. On Arabic search: improving the retrieval effectiveness via a light stemming approach. In: Proceedings of the International Conference on Information and Knowledge Management; 2002. p. 340–7.
Bacchin M, Ferro N, Melucci M. A probabilistic model for stemmer generation. Inf Process Manage. 2005;41(1):121–37.
Frakes WB, Fox CJ. Strength and similarity of affix removal stemming algorithms. SIGIR Forum. 2003;37(1):26–30.
Harman D. How effective is suffixing? J Am Soc Inf Sci. 1991;42(1):7–15.
Hull DA. Stemming algorithms: a case study for detailed evaluation. J Am Soc Inf Sci. 1996;47(1):70–84.
Krovetz R. Viewing morphology as an inference process. Artif Intell. 2000;118(1/2):277–94.
Lennon M, Pierce DS, Tarry BD, Willett P. An evaluation of some conflation algorithms for information retrieval. J Inf Sci. 1981;3:177–83.
Lovins JB. Development of a stemming algorithm. Mech Transl Comput Linguist. 1968;11:22–31.
Paice CD. Another stemmer. SIGIR Forum. 1990;24(3):56–61.
Paice CD. A method for the evaluation of stemming algorithms based on error counting. J Am Soc Inf Sci. 1996;47(8):632–49.
Porter MF. An algorithm for suffix stripping. Program. 1980;14(3):130–7.
Xu J, Croft WB. Corpus-based stemming using cooccurrence of word variants. ACM Trans Inf Syst. 1998;16(1):61–81.
Copyright information
© 2016 Springer Science+Business Media LLC
About this entry
Cite this entry
Paice, C.D. (2016). Stemming. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_942-2
Publisher Name: Springer, New York, NY
Online ISBN: 978-1-4899-7993-3