English-Hindi Cross Language Query Translation and Disambiguation Using Most Salient Seed Word

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1351)

Abstract

Natural languages suffer from two types of ambiguity, namely lexical ambiguity and syntactic ambiguity. This paper deals only with lexical ambiguity, the ambiguity that arises when a word has two or more possible meanings, and English is no exception. When an English query is translated into Hindi for Cross Language Information Retrieval (CLIR), such ambiguous words must be disambiguated properly so that relevant Hindi documents are retrieved. This paper aims to find the most salient context word in the English query and use it as the single disambiguation feature, instead of using the entire query as the context for disambiguation. When the entire query is used as the context, all terms are assumed to be equally important, which is not always true: some terms in the source query are more predictive of the word being translated than others, so treating all query terms as uniformly important may not be a wise decision. This paper investigates this claim by proposing two methods that use either the statistical mean or a contribution ratio to find the best context seed word for disambiguating user query terms. The proposed methods are compared with a baseline method that uses the entire query as the disambiguation feature. In comparison with the baseline method, the proposed methods achieve 85% precision, which is a good result, and thus these methods can be used with high confidence for query translation and disambiguation instead of the entire query context used by most researchers.
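The abstract contrasts two ways of choosing a Hindi translation for an ambiguous English query term: scoring each candidate against the whole query, or against one most salient seed word. The Python sketch below illustrates that contrast under stated assumptions only. The association scores, the toy query, and every function name are hypothetical, and the salience measure used here (best association minus the mean association over candidates) is merely a stand-in; it is not the paper's statistical-mean or contribution-ratio method, whose exact definitions the abstract does not give.

```python
from statistics import mean
from typing import Dict, List, Tuple

# HYPOTHETICAL association scores: (candidate Hindi translation, English context word) -> strength.
# In practice such scores might come from co-occurrence statistics in a Hindi corpus;
# here they are invented purely for illustration.
Scores = Dict[Tuple[str, str], float]


def assoc(scores: Scores, candidate: str, word: str) -> float:
    """Association of one candidate translation with one context word (0.0 if unseen)."""
    return scores.get((candidate, word), 0.0)


def baseline_choice(candidates: List[str], context: List[str], scores: Scores) -> str:
    """Whole-query baseline: sum associations over all context words, weighting each equally."""
    return max(candidates, key=lambda c: sum(assoc(scores, c, w) for w in context))


def salience(word: str, candidates: List[str], scores: Scores) -> float:
    """HYPOTHETICAL salience: how far the best candidate's association with this
    context word lies above the mean association across all candidates."""
    values = [assoc(scores, c, word) for c in candidates]
    return max(values) - mean(values)


def seed_word_choice(candidates: List[str], context: List[str], scores: Scores) -> Tuple[str, str]:
    """Select the single most salient context word, then disambiguate using it alone."""
    seed = max(context, key=lambda w: salience(w, candidates, scores))
    best = max(candidates, key=lambda c: assoc(scores, c, seed))
    return seed, best


# Toy example: the ambiguous word "bank" in the hypothetical query "river bank flood city".
CANDIDATES = ["बैंक", "किनारा"]  # financial bank vs. riverbank (shore)
CONTEXT = ["river", "flood", "city"]
SCORES: Scores = {
    ("बैंक", "river"): 0.1, ("किनारा", "river"): 0.9,
    ("बैंक", "flood"): 0.3, ("किनारा", "flood"): 0.5,
    ("बैंक", "city"): 0.6, ("किनारा", "city"): 0.2,
}

if __name__ == "__main__":
    print("baseline:", baseline_choice(CANDIDATES, CONTEXT, SCORES))
    print("seed word:", seed_word_choice(CANDIDATES, CONTEXT, SCORES))
```

In this toy case both strategies pick किनारा, with "river" selected as the seed word. The two strategies can disagree when several weakly discriminating context words together outweigh the one sharply discriminating word in the summed baseline score.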

Author information

Corresponding author

Correspondence to Pratibha Maurya.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Maurya, P. (2021). English-Hindi Cross Language Query Translation and Disambiguation Using Most Salient Seed Word. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_5
