Lexeme connexion measure of cohesive lexical ambiguity revealing factor: a robust approach for word sense disambiguation of Bengali text

Multimedia Tools and Applications

Abstract

Word sense disambiguation (WSD) is the process of identifying the appropriate meaning of a polysemous word in a given context. The Bengali language contains a large number of polysemous words. Researchers in linguistics have recently been drawn to the problem of WSD in Bengali text because of its many applications, such as machine translation, opinion polarity identification, and question-answering systems. This paper proposes the lexeme connexion measure of the cohesive lexical ambiguity revealing factor, which decides how to disambiguate the senses of a Bengali polysemous word. All polysemous words are treated as target words, and context windows of three sizes (five, seven, and ten) are considered around these target words. The paper defines a lexeme harmony measure that heuristically quantifies the syntactic belonging of a collection of lexemes in Bengali text. The proposed methodology extracts a feature vector from the cohesive lexical ambiguity revealing factor (CLARF), which depends on the frame lexeme harmony (FLH), sense lexeme harmony (SLH), polysemy singularity coherence (PSC), polysemy distribution factor (PDF), and relative polysemy singularity coherence (RPSC) of a lexeme. For sense recognition, the technique applies the max-rule to the integrated lexeme connexion measure (LCM) scores of each lexeme in both the testing and training cases. The proposed algorithm overcomes a drawback of existing Bengali WSD approaches because it considers both the lexical and the semantic relationships between words. Its performance has been evaluated on a dataset of 100 polysemous words, each with three or four senses. Several evaluation metrics have been used to analyse the results, which indicate the robustness of the proposed algorithm.
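
The two steps of the abstract that can be illustrated without the paper's formulas are the context window around a target word and the max-rule for picking a sense. The sketch below is illustrative only and is not the authors' implementation: the function names `context_window` and `max_rule`, the way the window is split around the target, the placeholder tokens, and the per-sense scores are all assumptions. The actual LCM, CLARF, FLH, SLH, PSC, PDF, and RPSC computations are not given in the abstract and are not reproduced here.

```python
from typing import Dict, List


def context_window(tokens: List[str], target_index: int, size: int) -> List[str]:
    """Return up to `size` context tokens around the target (target excluded).

    Assumption: the window is split as evenly as possible, with the extra
    token (for odd sizes) taken from the right-hand side. Windows are
    truncated at sentence boundaries.
    """
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token list")
    left = size // 2
    right = size - left
    start = max(0, target_index - left)
    end = min(len(tokens), target_index + right + 1)
    return tokens[start:target_index] + tokens[target_index + 1:end]


def max_rule(sense_scores: Dict[str, float]) -> str:
    """Pick the sense with the highest score (ties go to the first seen)."""
    if not sense_scores:
        raise ValueError("no candidate senses to choose from")
    return max(sense_scores, key=sense_scores.get)


# Placeholder tokens; a real run would use a tokenised Bengali sentence.
tokens = ["w0", "w1", "w2", "w3", "TARGET", "w5", "w6", "w7", "w8", "w9", "w10"]
windows = {size: context_window(tokens, 4, size) for size in (5, 7, 10)}

# Hypothetical per-sense scores standing in for the integrated LCM values.
chosen = max_rule({"sense_a": 0.42, "sense_b": 0.77, "sense_c": 0.31})
```

In this sketch the window of size ten is truncated to nine tokens because the placeholder sentence has only four tokens to the left of the target; how the paper handles such boundary cases is not stated in the abstract.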

Data Availability

The datasets generated and/or analysed during the current study are available in the “Kaggle” repository, https://www.kaggle.com/dsv/3985193 with DOI: 10.34740/KAGGLE/DSV/3985193.

Author information

Corresponding author

Correspondence to Debapratim Das Dawn.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

About this article

Cite this article

Das Dawn, D., Khan, A., Shaikh, S.H. et al. Lexeme connexion measure of cohesive lexical ambiguity revealing factor: a robust approach for word sense disambiguation of Bengali text. Multimed Tools Appl 83, 12939–12983 (2024). https://doi.org/10.1007/s11042-023-14676-8

  • DOI: https://doi.org/10.1007/s11042-023-14676-8
