Artificial Intelligence Review, Volume 48, Issue 2, pp 157–217

A systematic review of text stemming techniques

Abstract

Stemming is the process of mapping the morphological variants of a word to its root word. It is widely used as a pre-processing step in natural language processing, information retrieval, and language modeling. Although many advances have been made in the field, a systematic organization of previous work is still lacking. In this paper, we present a review of text stemming theory, algorithms, and applications. We first survey the existing literature on text stemming, classifying it according to certain key parameters; we then present an in-depth analysis of some well-known stemming algorithms on standard data sets. Finally, we present the current state of the art and certain open issues related to unsupervised stemming. The main aim of this paper is to provide an extensive and useful understanding of the important aspects of text stemming. The open issues and the analysis of current stemming techniques should help researchers identify new lines of future research.
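
To make the definition of stemming concrete, the following is a minimal sketch of suffix stripping in Python. It is a toy illustration only, not Porter's algorithm or any technique evaluated in this review; the suffix list, the minimum stem length, and the function names `toy_stem` and `conflate` are assumptions chosen for the example.

```python
# Toy suffix-stripping stemmer: an illustrative sketch, not a published algorithm.
# The suffix list below is an assumption; it is tried in the order given.
SUFFIXES = ("ations", "ation", "ions", "ings", "ion", "ing", "ed", "es", "s")
MIN_STEM_LEN = 3  # hypothetical guard against over-stripping short words


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group words that share the same toy stem (insertion order is kept)."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    variants = ["connect", "connected", "connecting", "connection", "connections"]
    print(conflate(variants))
```

With this suffix list, all five variants of "connect" fall into a single group, which is the conflation effect that a retrieval system relies on. Real stemmers add many more rules and conditions to limit over-stemming and under-stemming.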

Keywords

Stemming, Natural language processing, Information retrieval, Language modeling

Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. University Institute of Engineering and Technology, Panjab University, Chandigarh, India
