Abstract
In this paper, we evaluate several approaches to determining the most frequent senses of ambiguous Russian words. We compare frequency-based, topic-model, information-retrieval, and embedding-based approaches, and consider two forms of representing the information about multiword expressions described in RuThes. We found that the information-retrieval approach outperforms the method based on probabilistic topic models. The best results are obtained using distributional vector representations with thesaurus path weighting.
Keywords
- Lexical ambiguity
- Most frequent sense
- Thesaurus
This work was partially supported by the Russian Science Foundation, grant N16-18-02074.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Loukachevitch, N., Mischenko, N. (2018). Evaluation of Approaches for Most Frequent Sense Identification in Russian. In: , et al. Analysis of Images, Social Networks and Texts. AIST 2018. Lecture Notes in Computer Science, vol 11179. Springer, Cham. https://doi.org/10.1007/978-3-030-11027-7_10
DOI: https://doi.org/10.1007/978-3-030-11027-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11026-0
Online ISBN: 978-3-030-11027-7
eBook Packages: Computer Science (R0)