English-Hindi Cross Language Query Translation and Disambiguation Using Most Salient Seed Word

Maurya, Pratibha

doi:10.1007/978-3-030-71187-0_5

Conference paper
First Online: 03 June 2021

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1351)

Abstract

Natural languages suffer from two types of ambiguity: lexical ambiguity and syntactic ambiguity. This paper deals only with lexical ambiguity, which arises when a word has two or more possible meanings. The English language is no exception. To translate an English query into Hindi in Cross Language Information Retrieval, these ambiguous words need to be disambiguated properly so that relevant Hindi-language documents are retrieved. This paper aims to find the most salient context word in the English query and use it as a single disambiguation feature, in contrast to using the entire query as the context for disambiguation. When the entire query is used as context, all the terms are assumed to be equally important, which is not always true. Not all terms in the source query are equally predictive of the word being translated, so treating all query terms as uniformly important may not always be a wise decision. This paper investigates this claim by proposing two methods that use either the statistical mean or the contribution ratio to find the best context seed word for disambiguating user query terms. The proposed methods are compared with a baseline method that uses the entire query as the disambiguation feature. The proposed methods achieve 85% precision as compared to the baseline method, which is quite good, and thus these methods can be used with high confidence for query translation and disambiguation instead of the entire query context used by most researchers.
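The contrast between a single seed word and the whole query as context can be illustrated with a small sketch. The abstract does not define how the statistical mean or the contribution ratio is computed, so the definitions below are assumptions made only for illustration, and the function names, the association scores, and the transliterated Hindi candidates are all hypothetical rather than taken from the paper.

```python
from statistics import mean

# Hypothetical association scores between each context word of an English
# query and each candidate Hindi translation of the ambiguous word "bank".
# The numbers and the transliterations are invented for illustration.
scores = {
    "river": {"kinara": 0.9, "baink": 0.1},
    "water": {"kinara": 0.6, "baink": 0.2},
    "near": {"kinara": 0.3, "baink": 0.3},
}


def seed_by_mean(scores):
    """Assumed reading of 'statistical mean': pick the context word whose
    average association with the candidate translations is highest."""
    return max(scores, key=lambda word: mean(scores[word].values()))


def seed_by_contribution_ratio(scores):
    """Assumed reading of 'contribution ratio': pick the context word whose
    strongest candidate takes the largest share of that word's total score."""
    def ratio(word):
        values = scores[word].values()
        total = sum(values)
        return max(values) / total if total else 0.0
    return max(scores, key=ratio)


def translate_with_seed(scores, seed):
    """Choose the candidate most strongly associated with the seed word."""
    return max(scores[seed], key=scores[seed].get)


def translate_with_full_query(scores):
    """Baseline: every context word contributes with equal weight."""
    totals = {}
    for candidate_scores in scores.values():
        for candidate, score in candidate_scores.items():
            totals[candidate] = totals.get(candidate, 0.0) + score
    return max(totals, key=totals.get)


seed = seed_by_mean(scores)
print(seed, translate_with_seed(scores, seed))
print(seed_by_contribution_ratio(scores))
print(translate_with_full_query(scores))
```

In this toy data both selection rules pick the same seed word and agree with the baseline. The seed-word approach would differ from the baseline only when weakly related context words outweigh the most informative one.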

Author information

Authors and Affiliations

Amity Institute of Information Technology, Amity University, Lucknow Campus, Uttar Pradesh, India
Pratibha Maurya

Corresponding author

Correspondence to Pratibha Maurya.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Department of Computer Science, Università degli Studi di Milano, Milan, Milano, Italy
Vincenzo Piuri
Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
Campus Centre de Créteil, Université Paris-Est Créteil, Créteil, France
Patrick Siarry
Department of Construction Management and Real Estate, Vilnius Gediminas Technical University, Vilnius, Lithuania
Arturas Kaklauskas
School of Engineering, Instituto Superior de Engenharia do Porto, Porto, Portugal
Ana Madureira

Cite this paper

Maurya, P. (2021). English-Hindi Cross Language Query Translation and Disambiguation Using Most Salient Seed Word. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_5

DOI: https://doi.org/10.1007/978-3-030-71187-0_5
Published: 03 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71186-3
Online ISBN: 978-3-030-71187-0
eBook Packages: Intelligent Technologies and Robotics (R0)
