Abstract

The main task of a stemmer is to find the root of words that appear in an inflected form and are therefore absent from the dictionary. After stemming, the stemmer looks the resulting word up in the dictionary. If no match is found, the word may be misspelled or may be a name; otherwise the word is correct. For any language, a stemmer is a basic linguistic resource required to develop high-accuracy Natural Language Processing (NLP) applications such as machine translation, document classification, document clustering, text question answering, topic tracking, text summarization, and keyword extraction. This paper concentrates on fully automatic stemming of Punjabi words, covering Punjabi nouns, verbs, adjectives, adverbs, pronouns, and proper names. A list of 18 suffixes for Punjabi nouns and proper names, a number of other suffixes for Punjabi verbs, adjectives, and adverbs, and different stemming rules for Punjabi nouns, verbs, adjectives, adverbs, pronouns, and proper names have been generated after analysis of a Punjabi corpus. This is the first time that a complete Punjabi stemmer covering Punjabi nouns, verbs, adjectives, adverbs, pronouns, and proper names has been proposed, and it will be useful for developing other Punjabi NLP applications with high accuracy. A portion of the Punjabi stemmer, covering proper names and nouns, has been implemented as part of a Punjabi text summarizer, with MS Access as the back end and ASP.NET as the front end, with 87.37% efficiency.
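The strip-then-look-up idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stem`, the longest-suffix-first ordering, and the toy suffix list and dictionary are all assumptions made for illustration, and the placeholder suffixes are not the paper's Punjabi suffix list.

```python
from typing import Iterable, Set, Tuple


def stem(word: str, suffixes: Iterable[str], dictionary: Set[str]) -> Tuple[str, bool]:
    """Return (stem, found), where found says whether the stem is in the dictionary.

    Hypothetical helper: strips one known suffix and checks the remainder
    against the dictionary, as a rough reading of the abstract.
    """
    # A word already present in the dictionary needs no stemming.
    if word in dictionary:
        return word, True

    # Assumption: try longer suffixes first so a short suffix does not
    # shadow a longer one that shares the same ending.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)]
            if candidate in dictionary:
                return candidate, True

    # No match: the word may be misspelled or may be a name.
    return word, False


# Placeholder data for illustration only (not Punjabi, not from the paper).
toy_suffixes = ["s", "es", "ing"]
toy_dictionary = {"box", "walk"}

print(stem("boxes", toy_suffixes, toy_dictionary))
print(stem("walking", toy_suffixes, toy_dictionary))
print(stem("zzz", toy_suffixes, toy_dictionary))
```

A real stemmer for Punjabi would also need the category-specific rules the abstract mentions for nouns, verbs, adjectives, adverbs, pronouns, and proper names; this sketch only covers the generic suffix-stripping and dictionary check.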

Keywords

Punjabi Stemming; Punjabi Noun Stemming; Punjabi Verb Stemmer; Punjabi Names Stemmer; Punjabi Adjective Stemmer

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. University Institute of Engineering & Technology, Panjab University Chandigarh, Chandigarh, India
