Kannada Stemmer and Its Effect on Kannada Documents Classification

  • Conference paper
Computational Intelligence in Data Mining - Volume 3

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 33)

Abstract

Stemming reduces a word to its root or stem form. Kannada is a morphologically rich language in which words are inflected for person, number, gender and tense. Stemming is an important pre-processing step in Natural Language Processing applications. In this paper, Kannada words are stemmed with an unsupervised method based on suffix arrays. An accuracy of 0.58 % was achieved with this method. The performance of the stemmer is further improved by combining the unsupervised method with a stem-list dictionary. A list of 18,804 Kannada stem words was created manually as part of this work. A 10 % improvement in performance is observed. The effect of the proposed stemmer on text classification of Kannada documents is compared for the Naïve Bayes and Maximum Entropy methods. The results show that stemming improves the performance of text classification.
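
The combination of an unsupervised step with a stem-list lookup can be illustrated with a minimal sketch. It is not the method of the paper: suffixes are induced here by simple frequency counting over word endings rather than with suffix arrays, and the function names (`induce_suffixes`, `stem`), the thresholds, and the romanized toy vocabulary are all assumptions made for illustration.

```python
from collections import Counter


def induce_suffixes(vocabulary, min_count=2, min_stem_len=3, max_suffix_len=5):
    """Collect word endings that recur across the vocabulary (assumed heuristic)."""
    counts = Counter()
    for word in vocabulary:
        for k in range(1, max_suffix_len + 1):
            # Only count an ending if a plausible stem remains in front of it.
            if len(word) - k >= min_stem_len:
                counts[word[-k:]] += 1
    return {suffix for suffix, n in counts.items() if n >= min_count}


def stem(word, suffixes, stem_list=frozenset(), min_stem_len=3):
    """Prefer the longest prefix found in the stem list, else strip a suffix."""
    # Dictionary step: the longest prefix of the word that is a known stem wins.
    for end in range(len(word), min_stem_len - 1, -1):
        if word[:end] in stem_list:
            return word[:end]
    # Unsupervised fallback: strip the longest induced suffix that still
    # leaves at least min_stem_len characters.
    for k in range(len(word) - min_stem_len, 0, -1):
        if word[-k:] in suffixes:
            return word[:-k]
    return word


# Toy romanized word forms, used only for illustration (not data from the paper).
vocabulary = [
    "mane", "manegalu", "maneyalli", "maneyinda",
    "shaale", "shaalegalu", "shaaleyalli", "shaaleyinda",
]
suffixes = induce_suffixes(vocabulary)
stem_list = frozenset({"mane", "shaale"})  # hypothetical stand-in for a stem list

print(stem("manegalu", suffixes))             # man  (over-stemmed without the list)
print(stem("manegalu", suffixes, stem_list))  # mane (corrected by the stem list)
print(stem("uurugalu", suffixes, stem_list))  # uuru (unseen stem, suffix fallback)
```

In this toy setting the purely statistical step over-stems "manegalu" to "man", because the longer ending "egalu" also recurs in the vocabulary; the stem-list lookup restores "mane". Words whose stems are missing from the list still fall back to suffix stripping.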

Author information

Corresponding author

Correspondence to N. Deepamala.

Copyright information

© 2015 Springer India

About this paper

Cite this paper

Deepamala, N., Ramakanth Kumar, P. (2015). Kannada Stemmer and Its Effect on Kannada Documents Classification. In: Jain, L., Behera, H., Mandal, J., Mohapatra, D. (eds) Computational Intelligence in Data Mining - Volume 3. Smart Innovation, Systems and Technologies, vol 33. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2202-6_7

  • DOI: https://doi.org/10.1007/978-81-322-2202-6_7

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2201-9

  • Online ISBN: 978-81-322-2202-6

  • eBook Packages: Engineering, Engineering (R0)
