
Sense disambiguation for Punjabi language using supervised machine learning techniques

Sādhanā

Abstract

Automatic identification of the meaning of a word in its context is termed Word Sense Disambiguation (WSD). It is a vital and hard artificial intelligence problem that underlies several natural language processing applications such as machine translation, question answering and information retrieval. In this paper, an explicit WSD system for the Punjabi language using supervised techniques is analysed. A sense-tagged corpus of 150 ambiguous Punjabi nouns has been prepared manually. Six supervised machine learning techniques, namely Decision List, Decision Tree, Naive Bayes, K-Nearest Neighbour (K-NN), Random Forest and Support Vector Machines (SVM), are investigated. Every classifier uses the same feature space, comprising lexical (unigram, bigram, collocation and co-occurrence) and syntactic (part-of-speech) count-based features. Semantic features for Punjabi are derived from unlabelled Punjabi Wikipedia text using the word2vec continuous bag-of-words and skip-gram shallow neural network models. Two deep learning neural network classifiers, multilayer perceptron and long short-term memory (LSTM), are also applied to WSD of Punjabi words. The word embedding features are evaluated with the six classifiers on the Punjabi WSD task. The performance of the supervised classifiers on Punjabi WSD improves when word embedding features are used. In this work, an accuracy of 84% is achieved by the LSTM classifier using word embedding features.
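To make the count-based setup in the abstract concrete, the following is a minimal sketch of supervised WSD with lexical context features and a Naive Bayes classifier, one of the six techniques named above. It is not the authors' implementation. The English toy sentences, the sense labels, the window size, the add-one smoothing and the names `context_features` and `NaiveBayesWSD` are all assumptions introduced for illustration, and the part-of-speech and embedding features are left out.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, target, window=3):
    """Return co-occurrence unigrams within `window` tokens of the target,
    plus the bigrams formed with its immediate left and right neighbours."""
    i = tokens.index(target)  # first occurrence of the ambiguous word
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    feats = ["w=" + tokens[j] for j in range(lo, hi) if j != i]
    if i > 0:
        feats.append("bi=" + tokens[i - 1] + "_" + target)
    if i + 1 < len(tokens):
        feats.append("bi=" + target + "_" + tokens[i + 1])
    return feats


class NaiveBayesWSD:
    """Multinomial Naive Bayes over feature counts with add-one smoothing."""

    def fit(self, examples):
        self.sense_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, sense in examples:
            self.sense_counts[sense] += 1
            for f in feats:
                self.feat_counts[sense][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, feats):
        total = sum(self.sense_counts.values())
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            # Smoothed denominator: feature tokens seen with this sense + |V|
            denom = sum(self.feat_counts[sense].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the sense
            for f in feats:
                score += math.log((self.feat_counts[sense][f] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best


# Hypothetical sense-tagged examples for the ambiguous noun "bank"
# (invented toy data, not drawn from the Punjabi corpus in the paper).
TRAIN = [
    ("he deposited money in the bank yesterday", "finance"),
    ("the bank approved the loan", "finance"),
    ("she opened an account at the bank", "finance"),
    ("they sat on the river bank", "river"),
    ("the boat reached the muddy bank of the river", "river"),
    ("fish swim near the bank of the stream", "river"),
]

model = NaiveBayesWSD().fit(
    [(context_features(s.split(), "bank"), sense) for s, sense in TRAIN]
)

for sentence in ["the bank approved my loan", "we walked along the river bank"]:
    print(sentence, "->", model.predict(context_features(sentence.split(), "bank")))
```

Because every classifier in the study shares one feature space, a sketch like this would keep `context_features` fixed and swap only the model when comparing techniques.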



Acknowledgements

This publication is an outcome of the R&D work undertaken in the project under the Visvesvaraya PhD Scheme of the Ministry of Electronics & Information Technology, Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia).

Author information

Corresponding author

Correspondence to Varinder Pal Singh.

Cite this article

Singh, V.P., Kumar, P. Sense disambiguation for Punjabi language using supervised machine learning techniques. Sādhanā 44, 226 (2019). https://doi.org/10.1007/s12046-019-1206-x


  • DOI: https://doi.org/10.1007/s12046-019-1206-x
