Word Sense Disambiguation from English to Indic Language: Approaches and Opportunities

  • Conference paper
Soft Computing and Its Engineering Applications (icSoftComp 2022)

Abstract

Ambiguity is one of the major challenges in Natural Language Processing, and the task of resolving it is known as Word Sense Disambiguation (WSD). WSD uses computational methods to determine the appropriate meaning of a polysemous word in a given context. Knowledge-based, supervised, and unsupervised approaches are the most common methods for resolving ambiguity within a sentence. Over the last decade, the Government of India has introduced many digital services for its citizens. These services rely on natural language processing to be easily accessible through web portals and electronic devices, and they are offered in Hindi and other Indian languages to better serve Indian citizens. English and languages such as Chinese, Japanese, and Korean have plentiful resources for building applications based on natural language processing, whereas the scarcity of resources for disambiguating polysemous words in Hindi and other Indian languages hinders the development of applications in these languages. The method proposed in this paper determines the correct sense while retaining the sequential information in the data. It uses a recurrent neural network (RNN) model to extract features automatically and integrates glosses from IndoWordNet. The results show that the proposed technique performs consistently and significantly better than the alternatives.
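
To make the role of glosses concrete, the following is a minimal sketch of knowledge-based disambiguation by gloss overlap (a simplified Lesk-style heuristic). It is not the RNN-based method proposed in the paper. The sense inventory `TOY_GLOSSES`, the stopword list, and the function names are all hypothetical and chosen only for illustration; a real system would draw glosses from a resource such as IndoWordNet.

```python
from typing import Dict, List

# Hypothetical toy sense inventory: these glosses are illustrative only
# and are NOT taken from IndoWordNet or any other real lexical resource.
TOY_GLOSSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank_finance": "an institution that accepts deposits and lends money",
        "bank_river": "sloping land beside a body of water such as a river",
    }
}

# Small assumed stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as",
             "to", "in", "on", "at", "by", "he", "she"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def gloss_overlap(context: str, gloss: str) -> int:
    """Count the distinct content words shared by the context and a gloss."""
    return len(set(tokenize(context)) & set(tokenize(gloss)))


def disambiguate(word: str, context: str,
                 glosses: Dict[str, Dict[str, str]] = TOY_GLOSSES) -> str:
    """Return the sense label whose gloss shares the most words with the context.

    Ties are resolved in favour of the sense listed first.
    """
    senses = glosses[word]
    return max(senses, key=lambda s: gloss_overlap(context, senses[s]))


if __name__ == "__main__":
    print(disambiguate("bank", "she sat beside the river and watched the water"))
    print(disambiguate("bank", "he went to deposit money"))
```

Exact word matching is brittle (for example, "deposit" does not match "deposits"), which is one reason learned representations such as the RNN features mentioned in the abstract are attractive for low-resource languages.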

Author information

Corresponding author

Correspondence to Binod Kumar Mishra.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mishra, B.K., Jain, S. (2023). Word Sense Disambiguation from English to Indic Language: Approaches and Opportunities. In: Patel, K.K., Santosh, K.C., Patel, A., Ghosh, A. (eds) Soft Computing and Its Engineering Applications. icSoftComp 2022. Communications in Computer and Information Science, vol 1788. Springer, Cham. https://doi.org/10.1007/978-3-031-27609-5_11

  • DOI: https://doi.org/10.1007/978-3-031-27609-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27608-8

  • Online ISBN: 978-3-031-27609-5

  • eBook Packages: Computer Science, Computer Science (R0)
