Abstract
Natural languages exhibit two types of ambiguity: lexical and syntactic. This paper deals only with lexical ambiguity, which arises when a word has two or more possible meanings. English is no exception. To translate an English query into Hindi for Cross Language Information Retrieval, ambiguous words must be disambiguated correctly so that relevant Hindi documents are retrieved. This paper aims to find the most salient context word in the English query and use it as the single disambiguation feature, rather than using the entire query as context. When the entire query is used as context, all terms are assumed to be equally important, which is not always true. Not all terms in the source query are equally predictive of the word being translated, so treating all query terms as uniformly important may not always be a wise decision. This paper investigates this claim by proposing two methods that use either the statistical mean or the contribution ratio to find the best context seed word for disambiguating user query terms. The proposed methods are compared with a baseline method that uses the entire query as the disambiguation feature. The proposed methods achieve 85% precision as compared to the baseline method, which is quite good, and thus they can be used with high confidence for query translation and disambiguation instead of the entire query context used by most researchers.
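The idea of selecting a single seed word can be illustrated with a minimal sketch. The abstract does not define how the statistical mean or the contribution ratio is computed, so both scoring functions below are assumptions made for illustration only, not the paper's formulas. The co-occurrence table `COOC`, its toy numbers, the romanized Hindi candidates, and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical association scores between each English context word and each
# candidate Hindi translation (romanized) of the ambiguous query word "bank".
# "kinara" = river bank, "baink" = financial bank. Toy numbers only.
COOC = {
    "river":   {"kinara": 9.0, "baink": 1.0},
    "near":    {"kinara": 3.0, "baink": 3.0},
    "fishing": {"kinara": 6.0, "baink": 2.0},
}

def mean_salience(scores):
    """Assumed mean-based criterion: gap between the best score and the mean."""
    values = list(scores.values())
    return max(values) - mean(values)

def contribution_ratio(scores):
    """Assumed ratio criterion: share of total score held by the best candidate."""
    total = sum(scores.values())
    return max(scores.values()) / total if total else 0.0

def pick_seed(cooc, criterion):
    """Return the context word that scores highest under the given criterion."""
    return max(cooc, key=lambda word: criterion(cooc[word]))

def disambiguate(cooc, criterion):
    """Pick the seed word, then the translation it associates with most strongly."""
    seed = pick_seed(cooc, criterion)
    scores = cooc[seed]
    return seed, max(scores, key=scores.get)

if __name__ == "__main__":
    print(disambiguate(COOC, mean_salience))
    print(disambiguate(COOC, contribution_ratio))
```

In this toy table, "near" associates equally with both candidates and therefore carries no disambiguating signal, which is the situation the abstract describes when it argues against weighting all query terms uniformly.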
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Maurya, P. (2021). English-Hindi Cross Language Query Translation and Disambiguation Using Most Salient Seed Word. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_5
DOI: https://doi.org/10.1007/978-3-030-71187-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71186-3
Online ISBN: 978-3-030-71187-0
eBook Packages: Intelligent Technologies and Robotics (R0)