Abstract
Language is one of the primary tools for communication in a multilingual society. It is composed of many elements, and the most basic of these is the structure of words. Understanding word structure is necessary not only for a proper understanding of a language but also for language translation. A word has numerous variant forms depending on its usage; for example, “depend” has the variants dependency, dependent, independent, etc., where “depend” is the root word. Extracting the root from a variant form requires a tool such as a stemmer or a lemmatizer. To extract a correct and meaningful root word, a lemmatizer should be used, because stemming does not always yield a meaningful root. In this sense, a lemmatizer is an extension of a stemmer. In this paper, a rule-based Urdu lemmatizer is presented that works by stripping the suffix from the inflected word and then adding the required and relevant information to recover the meaningful root.
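The difference between stripping a suffix and recovering a meaningful root can be illustrated with a minimal sketch. The rule table, the minimum root length, and the function names below are assumptions made for illustration only; the rules are for English words such as the abstract's “depend” example and are not the Urdu rule set developed in the paper.

```python
# Hypothetical (suffix, replacement) rules, checked in order.
# Illustrative English rules only; NOT the paper's Urdu rule set.
RULES = [
    ("ency", ""),   # dependency -> depend
    ("ies", "y"),   # studies    -> study (needs "y" added back)
    ("ent", ""),    # dependent  -> depend
    ("s", ""),      # words      -> word
]

# Assumed guard so that very short words are not over-stripped.
MIN_ROOT_LEN = 3


def stem(word: str) -> str:
    """Strip the first matching suffix and add nothing back."""
    for suffix, _ in RULES:
        if word.endswith(suffix):
            root = word[: len(word) - len(suffix)]
            if len(root) >= MIN_ROOT_LEN:
                return root
    return word


def lemmatize(word: str) -> str:
    """Strip the first matching suffix, then append the rule's replacement."""
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            root = word[: len(word) - len(suffix)]
            if len(root) >= MIN_ROOT_LEN:
                return root + replacement
    return word


if __name__ == "__main__":
    for w in ["dependency", "dependent", "studies", "words", "bus"]:
        print(w, stem(w), lemmatize(w))
```

With these assumed rules, both functions reduce “dependency” to “depend”, but for “studies” plain stripping leaves “stud” while the replacement step yields “study”, which is the kind of added information the abstract refers to.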
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Gupta, V., Joshi, N., Mathur, I. (2016). Design and Development of a Rule-Based Urdu Lemmatizer. In: Satapathy, S., Joshi, A., Modi, N., Pathak, N. (eds) Proceedings of International Conference on ICT for Sustainable Development. Advances in Intelligent Systems and Computing, vol 409. Springer, Singapore. https://doi.org/10.1007/978-981-10-0135-2_15
DOI: https://doi.org/10.1007/978-981-10-0135-2_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0133-8
Online ISBN: 978-981-10-0135-2
eBook Packages: Engineering, Engineering (R0)