Design and Development of a Rule-Based Urdu Lemmatizer

  • Conference paper
Proceedings of International Conference on ICT for Sustainable Development

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 409)

Abstract

Language is one of the main tools for communication in a translingual society. It is composed of many elements, the most fundamental of which is the structure of words. Understanding word structure is necessary not only for a proper understanding of a language but also for language translation. A word has numerous variant forms depending on its usage; for example, “depend” has the variants dependency, dependent, independent, etc., where depend is the root word. Extracting the root from a variant form requires a tool such as a stemmer or a lemmatizer. To extract a correct and meaningful root word, a lemmatizer should be used, because stemming does not always yield a meaningful root. In this sense, a lemmatizer is an extension of a stemmer. This paper presents a rule-based Urdu lemmatizer that works by removing the suffix from the variant form of a word and adding the required and relevant information to recover the meaningful root.
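The strip-then-add idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the rule table, the function name `lemmatize`, and the `min_stem_len` parameter are all hypothetical, and the rules are English ones chosen for readability rather than the Urdu rules the paper uses. It also assumes that the "information" added back is a short string of characters.

```python
# Hypothetical rule table: (suffix to remove, string to add back).
# English rules for illustration only; these are NOT the paper's Urdu rules.
RULES = [
    ("ency", ""),   # dependency -> depend
    ("ent", ""),    # dependent  -> depend
    ("ies", "y"),   # studies    -> study (a plain stemmer would stop at "stud")
]


def lemmatize(word, rules=RULES, min_stem_len=2):
    """Return a candidate root by applying the first matching suffix rule.

    Longer suffixes are tried first so that a short suffix does not
    shadow a more specific one. Words with no matching rule are
    returned unchanged.
    """
    for suffix, addition in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        stem_len = len(word) - len(suffix)
        if word.endswith(suffix) and stem_len >= min_stem_len:
            # Step 1: remove the suffix. Step 2: add back what the root needs.
            return word[:stem_len] + addition
    return word


if __name__ == "__main__":
    for w in ["dependency", "dependent", "studies", "depend"]:
        print(w, "->", lemmatize(w))
```

The second element of each rule is what separates this sketch from pure suffix stripping: removing "ies" alone leaves a fragment, while adding "y" back yields a meaningful root. Prefixed variants such as "independent" are not handled by suffix rules alone.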

Author information

Corresponding author

Correspondence to Vaishali Gupta.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Gupta, V., Joshi, N., Mathur, I. (2016). Design and Development of a Rule-Based Urdu Lemmatizer. In: Satapathy, S., Joshi, A., Modi, N., Pathak, N. (eds) Proceedings of International Conference on ICT for Sustainable Development. Advances in Intelligent Systems and Computing, vol 409. Springer, Singapore. https://doi.org/10.1007/978-981-10-0135-2_15

  • DOI: https://doi.org/10.1007/978-981-10-0135-2_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0133-8

  • Online ISBN: 978-981-10-0135-2

  • eBook Packages: Engineering, Engineering (R0)
