Word Sense Disambiguation in Bengali language using unsupervised methodology with modifications

Sādhanā

Abstract

This work implements Word Sense Disambiguation (WSD) for the Bengali language using an unsupervised methodology. In the first phase, sentences are clustered using the Maximum Entropy method, and the clusters are manually labelled with their innate senses so that these sense-tagged clusters can serve as sense inventories for the subsequent experiments. In the next phase, when a test instance is to be disambiguated, the Cosine Similarity Measure is used to find its closeness to each of the sense-tagged clusters. The test instance is assigned the sense of the cluster it is closest to, i.e., the cluster with the highest cosine similarity (equivalently, the minimum distance). This is the baseline strategy, which achieves 35% accuracy on the WSD task. Two extensions are then applied to this baseline: (a) Principal Component Analysis (PCA) over the feature vector, which achieves 52% accuracy, and (b) Context Expansion of the sentences using the Bengali WordNet coupled with PCA, which achieves 61% accuracy. The data sets used in this work are obtained from the Bengali corpus developed under the Technology Development for the Indian Languages (TDIL) project of the Government of India, and the lexical knowledge base (i.e., the Bengali WordNet) was developed at the Indian Statistical Institute, Kolkata, under the Indradhanush Project of the DeitY, Government of India. The challenges and pitfalls of this work are described in detail in the pre-conclusion section.
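The baseline assignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words features, the use of a cluster centroid as the cluster representative, the function names, and the toy English data are all assumptions made for illustration. The abstract does not specify the feature representation, and the PCA and Context Expansion extensions are not shown.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Term-frequency vector stored sparsely (assumed feature representation)."""
    return Counter(tokens)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(sentences):
    """Mean term-frequency vector of a cluster (assumed cluster representative)."""
    total = Counter()
    for tokens in sentences:
        total.update(tokens)
    n = len(sentences)
    return {term: count / n for term, count in total.items()}


def assign_sense(test_tokens, sense_clusters):
    """Return the sense whose cluster centroid is most similar to the test instance.

    The highest cosine similarity corresponds to the minimum cosine
    distance (1 - similarity).
    """
    test_vec = bag_of_words(test_tokens)
    scores = {
        sense: cosine_similarity(test_vec, centroid(sentences))
        for sense, sentences in sense_clusters.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical, manually sense-tagged clusters (toy English data, not from the paper).
SENSE_CLUSTERS = {
    "bank/finance": [["bank", "loan", "money"], ["bank", "account", "money"]],
    "bank/river": [["bank", "river", "water"], ["bank", "river", "boat"]],
}

if __name__ == "__main__":
    sense, scores = assign_sense(["river", "bank", "boat"], SENSE_CLUSTERS)
    print(sense, scores)
```

In this sketch the extensions would slot in before the similarity computation: PCA would project the feature vectors to fewer dimensions, and Context Expansion would add WordNet-related words to each sentence's tokens.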

Notes

  1. The TDIL Bengali corpus is obtained from the Linguistic Research Unit Department, ISI, Kolkata.

Author information

Corresponding author

Correspondence to Alok Ranjan Pal.

About this article

Cite this article

Pal, A.R., Saha, D. Word Sense Disambiguation in Bengali language using unsupervised methodology with modifications. Sādhanā 44, 168 (2019). https://doi.org/10.1007/s12046-019-1149-2

  • DOI: https://doi.org/10.1007/s12046-019-1149-2
