A Hybrid Stemmer for the Affix Stacking Language: Marathi

  • Harshali B. Patil
  • Ajay S. Patil
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1025)

Abstract

Stemming is a term-conflation process that reduces the morphological variants of a term to a common stem. It plays a significant role in preprocessing for most natural language processing, text mining, and information retrieval applications. Stemmers have proven highly effective for information retrieval in many languages, such as English and Arabic. This paper focuses on the development of an automated stemmer for the Marathi language, for which we adopt a hybrid technique. The goal of this work is to overcome the limitations of the existing Marathi stemmers and to improve the accuracy of Marathi stemming. The proposed stemmer is tested on Marathi news articles, and the evaluation shows a significant improvement in accuracy over the existing rule-based stemmer. The proposed hybrid stemmer achieves an average accuracy of 84.82% for Marathi.
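
The abstract does not spell out how the hybrid technique works. As a rough illustration only, the sketch below assumes that "hybrid" means a root-word dictionary combined with rule-based suffix stripping, as the keywords suggest. The lexicon, the suffix list, the minimum stem length, and the function names `hybrid_stem` and `accuracy` are all hypothetical toy choices, not the authors' resources or algorithm.

```python
# Toy data (assumed for illustration; not taken from the paper).
ROOT_LEXICON = {"घर", "सुरुवात"}        # known root words
SUFFIXES = ["ून", "ात", "ला", "ने"]      # a few Marathi suffixes


def hybrid_stem(word, lexicon=ROOT_LEXICON, suffixes=SUFFIXES, min_len=2):
    """Strip stacked suffixes one at a time until a dictionary root is reached."""
    ordered = sorted(suffixes, key=len, reverse=True)  # try longest suffix first
    current = word
    while current not in lexicon:          # dictionary step: stop at a known root
        for suffix in ordered:             # rule step: strip one matching suffix
            remaining = len(current) - len(suffix)
            if current.endswith(suffix) and remaining >= min_len:
                current = current[:remaining]
                break
        else:
            break                          # no rule applies: give up
    return current


def accuracy(gold_pairs):
    """Percentage of (word, expected_stem) pairs that are stemmed correctly."""
    correct = sum(1 for word, stem in gold_pairs if hybrid_stem(word) == stem)
    return 100.0 * correct / len(gold_pairs)
```

With this toy data, `hybrid_stem("घरातून")` strips the stacked suffixes `ून` and `ात` in turn and stops at the lexicon entry `घर`. The lexicon entry `सुरुवात` is returned unchanged, because the dictionary check runs before the `ात` rule can over-stem it. The `accuracy` helper shows one simple way a percentage figure can be computed from a hand-labelled word list; the paper's actual evaluation procedure may differ.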

Keywords

Stemming, Hybrid, Rule-based, Dictionary, Marathi, NLP, IR

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon, India
