A systematic review of text stemming techniques
Abstract
Stemming is a process that maps the morphological variants of a word to its root word. It is used extensively as a pre-processing step in natural language processing, information retrieval, and language modeling. Although the field has advanced considerably, an organized account of previous work is still lacking. In this paper, we present a review of text stemming theory, algorithms, and applications. We first survey the existing literature on text stemming, classifying it according to certain key parameters; we then analyze in depth some well-known stemming algorithms on standard data sets. Finally, we present the current state of the art and certain open issues related to unsupervised stemming. The main aim of this paper is to provide an extensive and useful understanding of the important aspects of text stemming. The open issues and the analysis of current stemming techniques should help researchers identify new lines of future research.
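To make the notion of mapping word variants to a common root concrete, the following is a minimal sketch of naive suffix stripping in Python. It is not one of the algorithms reviewed in the paper; the suffix list, the minimum stem length, and the function name `naive_stem` are assumptions chosen only for illustration.

```python
# Hypothetical suffix list for illustration only; longer suffixes are tried first.
SUFFIXES = ("ations", "ation", "ions", "ion", "ings", "ing", "ed", "es", "s")

# Assumed lower bound on stem length so that very short words are left intact.
MIN_STEM_LEN = 3


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, provided the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


variants = ["connect", "connected", "connecting", "connections", "connects"]
stems = {w: naive_stem(w) for w in variants}
print(stems)
```

In this sketch all five variants conflate to the same stem, which is the behavior a retrieval system relies on when matching query terms against indexed terms. A rule list this crude will over-stem and under-stem many words, which is the kind of error that more careful stemming algorithms aim to reduce.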
Keywords
Stemming, Natural language processing, Information retrieval, Language modeling