Rule-Based Derivational Stemmer for Sindhi Devanagari Using Suffix Stripping Approach

  • Bharti Nathani
  • Nisheeth Joshi
  • G. N. Purohit
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 141)

Abstract

Stemming is an important task in Natural Language Processing applications such as information retrieval and machine translation. In this paper, we present a derivational stemmer for Sindhi, a resource-poor language, written in the Devanagari script, built using a suffix-stripping approach. A dictionary of frequent words is added to reduce over-stemming and under-stemming errors. This is our first attempt to develop a rule-based derivational stemmer for Sindhi in the Devanagari script. We compared the results of this derivational stemmer with those of the inflectional stemmer for Sindhi Devanagari that we developed previously.
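
The general idea of suffix stripping with a dictionary lookup can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list the Sindhi Devanagari rules or dictionary entries, so the suffix list, the exception dictionary, the minimum stem length, and the names `build_stemmer_rules` and `stem` are all assumptions, and English-like placeholder suffixes stand in for the real rules.

```python
from typing import Dict, Iterable, List


def build_stemmer_rules(suffixes: Iterable[str]) -> List[str]:
    """Order suffixes longest first so the longest matching suffix wins."""
    return sorted(set(suffixes), key=lambda s: (-len(s), s))


def stem(
    word: str,
    suffixes: List[str],
    exceptions: Dict[str, str],
    min_stem_len: int = 2,  # hypothetical guard against over-stemming
) -> str:
    """Return the stem of `word` using dictionary lookup, then suffix stripping."""
    # 1. Frequent words listed in the dictionary bypass the rules entirely.
    if word in exceptions:
        return exceptions[word]
    # 2. Strip the longest suffix that leaves a long enough stem.
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # 3. No rule applies: the word is returned unchanged.
    return word


# Placeholder data (assumed for illustration; NOT the paper's Sindhi rules).
SUFFIXES = build_stemmer_rules(["ness", "ful", "less", "fulness"])
EXCEPTIONS = {"business": "business"}  # would otherwise be cut to "busi"

if __name__ == "__main__":
    for w in ["kindness", "hopefulness", "business", "ful"]:
        print(w, "->", stem(w, SUFFIXES, EXCEPTIONS))
```

For Devanagari input, note that Python's `len` counts Unicode code points rather than visible characters, so a length threshold such as `min_stem_len` would need to be chosen with dependent vowel signs and conjuncts in mind.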

Keywords

Stemming; Natural language processing; Derivational; Rule based; Devanagari script; Sindhi language; Machine translation; Information retrieval

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Bharti Nathani (1), email author
  • Nisheeth Joshi (1)
  • G. N. Purohit (1)
  1. Computer Science Department, Faculty of Mathematics and Computing, Banasthali Vidyapith, Jaipur, India