Abstract
Word Sense Disambiguation (WSD) is the automatic identification of the meaning of a word in its context. It is a vital and hard artificial intelligence problem that arises in several natural language processing applications, such as machine translation, question answering and information retrieval. In this paper, an explicit WSD system for the Punjabi language based on supervised techniques is analysed. A sense-tagged corpus of 150 ambiguous Punjabi nouns has been prepared manually. Six supervised machine learning techniques have been investigated: Decision List, Decision Tree, Naive Bayes, K-Nearest Neighbour (K-NN), Random Forest and Support Vector Machines (SVM). Every classifier uses the same feature space, comprising count-based lexical features (unigrams, bigrams, collocations and co-occurrences) and syntactic features (part of speech). Semantic features for Punjabi have been derived from unlabelled Punjabi Wikipedia text using the word2vec continuous bag-of-words (CBOW) and skip-gram shallow neural network models. Two deep learning neural network classifiers, multilayer perceptron (MLP) and long short-term memory (LSTM), have also been applied to WSD of Punjabi words. The word embedding features have been tested with the six classifiers on the Punjabi WSD task. The performance of the supervised classifiers on this task improves when word embedding features are used. In this work, the LSTM classifier with word embedding features achieves an accuracy of 84%.
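As an illustration of the count-based lexical features and one of the supervised classifiers named in the abstract, the following minimal sketch extracts unigram and bigram context features around a target word and trains a Naive Bayes classifier with add-one smoothing. It is not the authors' implementation. The function `context_features`, the class `NaiveBayesWSD`, the window size of 2, and the toy English sentences and sense labels are all hypothetical and stand in for the sense-tagged Punjabi corpus. Part-of-speech and word embedding features are omitted.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, target_index, window=2):
    """Unigram and bigram features from a window around the target word.

    The window size is an assumption chosen for illustration only.
    """
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    # Co-occurrence unigrams: neighbouring words, excluding the target itself.
    feats = [f"uni={tokens[i]}" for i in range(lo, hi) if i != target_index]
    # Bigrams inside the window (target included) to capture local collocations.
    feats += [f"bi={tokens[i]}_{tokens[i + 1]}" for i in range(lo, hi - 1)]
    return feats


class NaiveBayesWSD:
    """Multinomial Naive Bayes over context features, with add-one smoothing."""

    def fit(self, examples):
        # examples: iterable of (tokens, target_index, sense_label)
        self.sense_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, idx, sense in examples:
            self.sense_counts[sense] += 1
            for feat in context_features(tokens, idx):
                self.feat_counts[sense][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, tokens, idx):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            # Log prior of the sense plus smoothed log likelihood of each feature.
            score = math.log(count / total)
            denom = sum(self.feat_counts[sense].values()) + len(self.vocab)
            for feat in context_features(tokens, idx):
                score += math.log((self.feat_counts[sense][feat] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy data: the ambiguous English noun "bank" with two senses.
train = [
    ("she sat on the bank of the river".split(), 4, "river_edge"),
    ("the bank of the stream was muddy".split(), 1, "river_edge"),
    ("he deposited money in the bank today".split(), 5, "finance"),
    ("the bank approved the loan quickly".split(), 1, "finance"),
]

model = NaiveBayesWSD().fit(train)
prediction = model.predict("money in the bank was safe".split(), 3)
print(prediction)  # finance
```

In a fuller pipeline, the same feature dictionary could be fed to any of the other classifiers, and the count-based features could be replaced or augmented by word vectors learned from unlabelled text.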
Acknowledgements
This publication is an outcome of the R&D work undertaken in the project under the Visvesvaraya PhD Scheme of the Ministry of Electronics & Information Technology, Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia).
Cite this article
Singh, V.P., Kumar, P. Sense disambiguation for Punjabi language using supervised machine learning techniques. Sādhanā 44, 226 (2019). https://doi.org/10.1007/s12046-019-1206-x
DOI: https://doi.org/10.1007/s12046-019-1206-x