Abstract
Stemming and lemmatization are two basic text-normalization modules in natural language processing (NLP) that prepare text, words, and documents for further processing. Stemming strips the affixes from an inflected word to obtain its root word, but the extracted stem may not be a valid word. Lemmatization also removes affixes, but it returns the word in its dictionary form, known as the lemma, which is always a meaningful word. Hence, semantic knowledge must be taken into account when developing a lemmatizer. In this paper, an unsupervised stemmer and a rule-based lemmatizer are proposed for Kannada. Experiments are conducted on a dataset of 17,825 root words built with the help of a Kannada dictionary.
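To make the stem/lemma distinction concrete, here is a minimal Python sketch. It assumes romanized Kannada input and uses a hypothetical three-suffix list, a hypothetical glide-repair rule, and a one-word root dictionary. It is not the paper's unsupervised stemmer or its lemmatization rules. It only shows why a stripped stem such as `maney` (from `maneyalli`, "in the house") can be invalid, while a dictionary-checked lemma resolves to the real root `mane` ("house").

```python
"""Toy contrast between a suffix-stripping stemmer and a dictionary-checked
rule-based lemmatizer. Words are romanized Kannada. The suffix list, glide
rule, and tiny root dictionary are hypothetical illustrations, not the
rules or data used in the paper."""

# Hypothetical suffixes: plural+locative, plural, locative.
SUFFIXES = ["galalli", "galu", "alli"]

# Hypothetical stand-in for a root-word dictionary (the paper builds one
# of 17,825 roots from a Kannada dictionary).
ROOTS = {"mane"}


def stem(word: str) -> str:
    """Strip the longest matching suffix; the result need not be a real word."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Strip a suffix, then apply repair rules until a dictionary root is found."""
    candidate = stem(word)
    if candidate in ROOTS:
        return candidate
    # Hypothetical repair rule: drop a glide 'y' inserted before a
    # vowel-initial suffix (mane + alli -> maneyalli).
    if candidate.endswith("y") and candidate[:-1] in ROOTS:
        return candidate[:-1]
    # No rule yields a known root: fall back to the surface form.
    return word


if __name__ == "__main__":
    for w in ["manegalu", "maneyalli", "manegalalli"]:
        print(w, stem(w), lemmatize(w))
    # manegalu mane mane
    # maneyalli maney mane
    # manegalalli mane mane
```

The stemmer never consults the dictionary, so it is cheap but can emit non-words. The lemmatizer pays for a lookup and hand-written repair rules in exchange for always returning a dictionary entry when one can be reached.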
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Trishala, G., Mamatha, H.R. (2021). Implementation of Stemmer and Lemmatizer for a Low-Resource Language—Kannada. In: Pandian, A.P., Palanisamy, R., Ntalianis, K. (eds) Proceedings of International Conference on Intelligent Computing, Information and Control Systems. Advances in Intelligent Systems and Computing, vol 1272. Springer, Singapore. https://doi.org/10.1007/978-981-15-8443-5_28
DOI: https://doi.org/10.1007/978-981-15-8443-5_28
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8442-8
Online ISBN: 978-981-15-8443-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)