A Novel Unsupervised Graph-Based Algorithm for Hindi Word Sense Disambiguation

  • Original Research

SN Computer Science

Abstract

Natural languages are inherently ambiguous. Ambiguity exists at many levels, and word sense ambiguity is one of them. Resolving sense ambiguity is crucial for many natural language processing (NLP) applications. In this paper, we focus on word sense ambiguity and propose an unsupervised graph-based algorithm for Hindi word sense disambiguation (WSD). The work is motivated by the encouraging results that graph-based WSD algorithms have achieved for English and other European languages, and by the lack of a wide-coverage sense-annotated dataset for Hindi. The proposed algorithm builds a weighted graph in which nodes represent the senses of the words appearing in the context of an ambiguous word and edges represent relations between those senses. Edge weights are assigned using semantic similarity derived from Hindi WordNet, and a random-walk-style algorithm then assigns the most appropriate sense to the polysemous word in the given context. We evaluated the algorithm on a sense-annotated dataset of 20 polysemous nouns and observed an overall accuracy of 63.39%, which is higher than previously reported results on the same dataset.
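The abstract describes the pipeline only at a high level: a sense graph, WordNet-based edge weights, and a random walk. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes that weighted PageRank (computed by power iteration) plays the role of the random-walk step and that edges connect only senses of different words. The sense labels (e.g. `सोना#gold`), the `toy_similarity` function and its scores are hypothetical stand-ins for a Hindi WordNet-derived similarity measure.

```python
from itertools import combinations
from typing import Callable, Dict, List

Sense = str  # hypothetical sense label, e.g. "सोना#gold"
Graph = Dict[Sense, Dict[Sense, float]]


def build_sense_graph(
    candidates: Dict[str, List[Sense]],
    similarity: Callable[[Sense, Sense], float],
) -> Graph:
    """Undirected weighted graph over the candidate senses of all context words.

    Edges link senses of *different* words only (an assumption of this sketch):
    senses of the same word compete with each other, so they are not connected.
    """
    graph: Graph = {s: {} for senses in candidates.values() for s in senses}
    for w1, w2 in combinations(candidates, 2):
        for s1 in candidates[w1]:
            for s2 in candidates[w2]:
                weight = similarity(s1, s2)
                if weight > 0:  # keep only positively related sense pairs
                    graph[s1][s2] = weight
                    graph[s2][s1] = weight
    return graph


def weighted_pagerank(
    graph: Graph, damping: float = 0.85, iterations: int = 50
) -> Dict[Sense, float]:
    """Weighted PageRank by power iteration (assumed stand-in for the random walk)."""
    if not graph:
        return {}
    n = len(graph)
    score = {v: 1.0 / n for v in graph}
    out_weight = {v: sum(graph[v].values()) for v in graph}
    for _ in range(iterations):
        # In an undirected graph the neighbours of v are also its in-links;
        # each neighbour u passes on its rank in proportion to the edge weight.
        score = {
            v: (1 - damping) / n
            + damping * sum(w / out_weight[u] * score[u] for u, w in graph[v].items())
            for v in graph
        }
    return score


def disambiguate(
    candidates: Dict[str, List[Sense]],
    similarity: Callable[[Sense, Sense], float],
) -> Dict[str, Sense]:
    """For every word, pick the candidate sense with the highest rank."""
    scores = weighted_pagerank(build_sense_graph(candidates, similarity))
    return {word: max(senses, key=scores.__getitem__) for word, senses in candidates.items()}


# Toy data: hypothetical similarity scores, NOT values taken from Hindi WordNet.
TOY_SIM = {
    frozenset({"सोना#gold", "गहना#ornament"}): 0.8,
    frozenset({"सोना#gold", "चाँदी#metal"}): 0.9,
    frozenset({"सोना#sleep", "गहना#ornament"}): 0.05,
    frozenset({"गहना#ornament", "चाँदी#metal"}): 0.6,
}


def toy_similarity(a: Sense, b: Sense) -> float:
    return TOY_SIM.get(frozenset({a, b}), 0.0)


context = {
    "सोना": ["सोना#gold", "सोना#sleep"],  # ambiguous: "gold" vs. "to sleep"
    "गहना": ["गहना#ornament"],
    "चाँदी": ["चाँदी#metal"],
}

print(disambiguate(context, toy_similarity)["सोना"])  # prints सोना#gold
```

Because the "gold" sense is strongly related to the senses of the context words "गहना" and "चाँदी", it collects most of the rank mass, while the weakly connected "sleep" sense stays near the teleport baseline. In a real system, `toy_similarity` would be replaced by a similarity measure computed over Hindi WordNet synsets, and the candidate senses would be the synsets of each context word.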

Fig. 1
Fig. 2
Fig. 3

Funding

No funding was received for the work reported in this paper.

Author information

Corresponding author

Correspondence to Prajna Jha.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh and S. Karthikeyan.

Appendix I

अगर

अगली

अगले

अच्छी

अति

अथवा

अधिक

अनुसार

अनेक

अन्य

अपना

अपनी

अपने

अब

अभी

अलावा

आई

आएँ

आगे

आती

आदि

आने

आप

आम

आसपास

इतनी

इतने

इन

इनमे

इन्हीं

इन्हे

इस

इसका

इसकी

इसके लिए

इसके

इसमें

इसलिए

इससे

इसी

इसीलिए

इसे

उतनी

उधर

उन

उनका

उनकी

उनके

उनमें

उनसे

उन्हीं

उन्हे

उन्हें

उन्होंने

उन्होने

उस

उसका

उसकी

उसके

उससे

उसी

उसे

ऊपर

एक

एक-एक

एवं

ऐसा

ऐसी

ऐसे

ओर

कई

कछ

कब

कभी

कभी-कभी

कम

कया

कर

करके

करता

करती

करते

करना

करनी

करने

करा

कराने

कराया

करेंगे

करेगा

करेगी

का

काफी

कि

किंतु

किए

कितनी

कितने

किन

किया

किये

किस

किसी

की

कुछ

कुल

के

कारण

कैसे

को

कोई

कौन

क्या

क्यो

क्योकि

गई

गईं

गए

गया

गयी

गये

चलता

चलने

चली

चाहती

चाहते

चाहिए

चाहे

चुका

चुकी

चुके

छह

छू

जगह

जब

जबकि

जल्द

जल्दी

जहाँ

जहां

जहां-तहां

जा

जाए

जाएं

जाएंगी

जाएगी

जाएँ

जाकर

जाता

जाती

जाते

जानना

जाना

जाने

जाये

जारी

जितना

जितनी

जिनमें

जिन्हें

जिसका

जिससे

जिसे

जी

जैसा

जैसे

जो

जोर

ज्यादा

ठीक

तक

तथा

तब

तभी

तरफ

तरह

तहत

ताकि

तीन

तो

तौर

था

थी

थे

थोडा

दरअसल

दिए

दिखाए

दिया

दी

दूर

दूसरी

दूसरे

दे

देंगी

देंगे

देकर

देता

देती

देते

देना

देने

दो

दोनो

द्वारा

नई

नए

नया

नहीं

नीचे

ने

पडता

पडने

पडा

पर

परंत

पहला

पहले

पांच

पाएं

पाँच

पीछे

पूरी

प्रति

प्रत्येक

फिर

बजाय

बजे

बडी

बढ़

बढ़ा

बढ़े

बताया

बन

बनाई

बनाए

बनाना

बनाने

बनी

बने

बल्कि

बहुत

बाकी

बाद

बार

बार-बार

बारे

बिना

बीच

बेहद

भी

मगर

मुताबिक

मे

में

यदि

यद्यपि

यह

यहाँ

यही

या

यानी

ये

रखना

रह

रहती

रही

रहे

रहेगा

रहेगी

रहो

रोका

लगभग

लगा

लगाई

लगे

लाकर

लाने

लिए

लिया

लिये

ली

ले

लेकर

लेकिन

लेगी

लेना

लेने

वनाट

वह

वहाँ

वहीं

वाला

वाली

वाले

वालो

विभिन्न

वे

वैसे

वो

शायद

सकता

सकती

सकते

सका

सके

सकेगा

सकेगी

सब

सबकी

सबके

सबसे

सभी

सहज

सही

सा

सात

साथ

साथ-साथ

साफ

सामने

सारे

सिर्फ

सीधे

से

हाँ

हम

हमने

हमारी

हमारे

हमें

हर

हां

हांलांकि

ही

हुआ

हुई

हुए

हूँ

है

हैं

हो

हों

होंगी

होगा

होगी

होता

होती

होते

होना

होनी

होने

  

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jha, P., Agarwal, S., Abbas, A. et al. A Novel Unsupervised Graph-Based Algorithm for Hindi Word Sense Disambiguation. SN COMPUT. SCI. 4, 675 (2023). https://doi.org/10.1007/s42979-023-02116-1

  • DOI: https://doi.org/10.1007/s42979-023-02116-1