Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 51))

Abstract

Literature in Indian languages must be classified so that it can be retrieved easily. For a Punjabi literature classifier, five categories (nature, romantic, religious, patriotic and philosophical) are manually populated with 250 poems. These poems are pre-processed through data cleaning, tokenization, bag-of-words construction, stop word identification and stemming. Because no list of Punjabi stop words is available in the public domain, 256 stop words were collected manually from poetry and articles. After stemming, 184 unique stemmed stop words remain. Based on part-of-speech tagging, these 184 stop words are categorized into 98 adverbs, 7 conjunctions, 43 verbs, 24 pronouns and 12 miscellaneous words. The 184 unique stemmed words are being released for use by other Punjabi language processing algorithms. This paper aims to provide a better and deeper understanding of Punjabi stop words in light of Punjabi grammar and part-of-speech-based word class categorization.
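
The pre-processing phases named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop words shown are a handful of common Punjabi function words chosen for the example (not the released list), the suffix list is a hypothetical placeholder for a real stemmer, and all function names are invented here. Only the part-of-speech counts are taken from the abstract.

```python
import re
from collections import Counter

# Illustrative stop words only; NOT the 184-word list released by the paper.
STOP_WORDS = {"ਦਾ", "ਦੀ", "ਦੇ", "ਨੂੰ", "ਅਤੇ", "ਹੈ"}

# Hypothetical suffix list for a toy stemmer (a real Punjabi stemmer is far richer).
SUFFIXES = ("ਆਂ",)

# Part-of-speech counts as reported in the abstract.
POS_COUNTS = {
    "adverb": 98,
    "conjunction": 7,
    "verb": 43,
    "pronoun": 24,
    "miscellaneous": 12,
}


def clean(text):
    """Data cleaning: replace everything outside the Gurmukhi block with spaces."""
    return re.sub(r"[^\u0A00-\u0A7F\s]", " ", text)


def tokenize(text):
    """Tokenization: split the cleaned text on whitespace."""
    return clean(text).split()


def bag_of_words(tokens):
    """Bag of words: map each token to its frequency."""
    return Counter(tokens)


def split_stop_words(tokens):
    """Stop word identification: separate content tokens from stop words."""
    content = [t for t in tokens if t not in STOP_WORDS]
    stops = [t for t in tokens if t in STOP_WORDS]
    return content, stops


def stem(word):
    """Toy stemming: strip one known suffix if at least two characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    line = "ਰੱਬ ਦਾ ਨਾਮ ਸੱਚਾ ਹੈ।"
    tokens = tokenize(line)
    content, stops = split_stop_words(tokens)
    print(bag_of_words(tokens))
    print([stem(t) for t in content], stops)
    # The reported word classes should account for every stemmed stop word.
    print(sum(POS_COUNTS.values()))
```

The final sum is a quick consistency check on the reported figures: 98 + 7 + 43 + 24 + 12 = 184, which matches the number of unique stemmed stop words.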

Author information

Corresponding author

Correspondence to Kaur Jasleen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Jasleen, K., Saini Jatinderkumar, R. (2016). POS Word Class Based Categorization of Gurmukhi Language Stemmed Stop Words. In: Satapathy, S., Das, S. (eds) Proceedings of First International Conference on Information and Communication Technology for Intelligent Systems: Volume 2. Smart Innovation, Systems and Technologies, vol 51. Springer, Cham. https://doi.org/10.1007/978-3-319-30927-9_1

  • DOI: https://doi.org/10.1007/978-3-319-30927-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30926-2

  • Online ISBN: 978-3-319-30927-9

  • eBook Packages: Engineering, Engineering (R0)
