Design and Implementation of Stop Words Removal Method for Punjabi Language Using Finite Automata

  • Conference paper
Advances in Data Computing, Communication and Security

Abstract

With the growing accessibility of the Internet, online data has become one of the main sources of information, and a large amount of it is updated daily. Although this data may be useful for research purposes, it cannot be used in its raw form. In general, unstructured data contains many common words that do not add to the semantic meaning of a document. These words are known as stop words, and removing them is an important requirement for efficient text processing in information retrieval systems and other natural language processing applications. A significant amount of research has been done on removing stop words in languages such as English, Chinese, Urdu, Arabic, and Hindi. However, little work has been done on stop word removal for the Punjabi language. Most of the available works use corpus-based methods for removing stop words, which tend to be time-consuming. This paper proposes a method for removing stop words in the Punjabi language using finite automata. The performance of the proposed method is compared with the classical method of stop word removal. The implementation results show that the proposed algorithm performs better in terms of execution time.
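To make the idea of automaton-based stop word removal concrete, the following minimal Python sketch may help. It is not the authors' implementation, which is not given in the abstract. It assumes that the automaton is a trie-shaped deterministic finite automaton built from a stop word list, and that the "classical method" is a linear search through that list. The stop words shown are a small illustrative set chosen for this example, not the list used in the paper, and all function names are hypothetical.

```python
# Illustrative stop words only (an assumption, not the paper's list).
STOP_WORDS = ["ਦਾ", "ਦੇ", "ਦੀ", "ਹੈ", "ਤੇ", "ਨੂੰ", "ਅਤੇ"]


def build_dfa(words):
    """Build a trie-shaped DFA from a word list; state 0 is the start state."""
    transitions = [{}]  # transitions[state][char] -> next state
    accepting = set()   # states that mark the end of a stop word
    for word in words:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def is_stop_word(token, dfa):
    """Run the token through the DFA, one Unicode code point at a time."""
    transitions, accepting = dfa
    state = 0
    for ch in token:
        state = transitions[state].get(ch)
        if state is None:  # missing transition acts as an implicit dead state
            return False
    return state in accepting


def remove_stop_words(text, dfa):
    """Keep only the whitespace-separated tokens the DFA rejects."""
    return " ".join(t for t in text.split() if not is_stop_word(t, dfa))


def remove_stop_words_classical(text, words):
    """Assumed baseline: compare each token against the list by linear search."""
    return " ".join(t for t in text.split() if t not in words)


if __name__ == "__main__":
    dfa = build_dfa(STOP_WORDS)
    print(remove_stop_words("ਇਹ ਕਿਤਾਬ ਦਾ ਨਾਮ ਹੈ", dfa))
```

In this sketch, the work per token in the automaton depends on the token's length rather than on the size of the stop word list, which is one plausible reason an automaton could outperform a linear search. The sketch makes no claim about the timings reported in the paper, and it assumes that the input text and the stop word list use the same Unicode normalization form.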

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kochhar, T.S., Goyal, G. (2022). Design and Implementation of Stop Words Removal Method for Punjabi Language Using Finite Automata. In: Verma, P., Charan, C., Fernando, X., Ganesan, S. (eds) Advances in Data Computing, Communication and Security. Lecture Notes on Data Engineering and Communications Technologies, vol 106. Springer, Singapore. https://doi.org/10.1007/978-981-16-8403-6_8
