Abstract
Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgements about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgements and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples \(({word}_{i}, {word}_{j}, {similarity}_{ij}\)). Four of them are designed for evaluation of systems for computing semantic relatedness, complementing each other in terms of the semantic relation type they represent. These benchmarks were used to organise a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth high-coverage resource, the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The HJ dataset was first released in November 2014 and first published in June 2015, while the SimLex-999 was first published December 2015.
- 2.
- 3.
- 4.
Annotation guidelines for the HJ dataset: http://russe.nlpub.ru/task/annotate.txt.
- 5.
The associations were sampled from the sociation.org database in July 2014.
- 6.
Annotation guidelines are available at http://crowd.russe.nlpub.ru.
- 7.
- 8.
References
Budanitsky, A., Hirst, G.: Evaluating WordNet-based measures of lexical semantic relatedness. Comput. Linguist. 32(1), 13–47 (2006)
Pedersen, T., Pakhomov, S.V., Patwardhan, S., Chute, C.G.: Measures of semantic similarity and relatedness in the biomedical domain. J. Biomed. Inform. 40(3), 288–299 (2007)
Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI 2007, pp. 1606–1611. Morgan Kaufmann Publishers Inc. (2007)
Batet, M., Sánchez, D., Valls, A.: An ontology-based measure to compute semantic similarity in biomedicine. J. Biomed. Inform. 44(1), 118–125 (2011)
Bär, D., Biemann, C., Gurevych, I., Zesch, T.: UKP: computing semantic textual similarity by combining multiple content similarity measures. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics, vol. 1: Proceedings of the Main Conference and the Shared Task, vol. 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, SemEval 2012, pp. 435–440. Association for Computational Linguistics (2012)
Tsatsaronis, G., Varlamis, I., Vazirgiannis, M.: Text relatedness based on a word thesaurus. J. Artif. Intell. Res. 37(1), 1–40 (2010)
Patwardhan, S., Banerjee, S., Pedersen, T.: Using measures of semantic relatedness for word sense disambiguation. In: Gelbukh, A. (ed.) CICLing 2003. LNCS, vol. 2588, pp. 241–257. Springer, Heidelberg (2003). doi:10.1007/3-540-36456-0_24
Hsu, M.-H., Tsai, M.-F., Chen, H.-H.: Query expansion with ConceptNet and WordNet: an intrinsic comparison. In: Ng, H.T., Leong, M.-K., Kan, M.-Y., Ji, D. (eds.) AIRS 2006. LNCS, vol. 4182, pp. 1–13. Springer, Heidelberg (2006). doi:10.1007/11880592_1
Panchenko, A.: Similarity measures for semantic relation extraction. Ph.D. thesis, UCLouvain (2013)
Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
Rubenstein, H., Goodenough, J.B.: Contextual correlates of synonymy. Commun. ACM 8(10), 627–633 (1965)
Miller, G.A., Charles, W.G.: Contextual correlates of semantic similarity. Lang. Cogn. Processes 6(1), 1–28 (1991)
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., Ruppin, E.: Placing search in context: the concept revisited. In: Proceedings of the 10th International Conference on World Wide Web, WWW 2001, pp. 406–414. ACM (2001)
Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Paşca, M., Soroa, A.: A study on similarity and relatedness using distributional and WordNet-based approaches. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2009, pp. 19–27. Association for Computational Linguistics (2009)
Gurevych, I.: Using the structure of a conceptual network in computing semantic relatedness. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds.) IJCNLP 2005. LNCS (LNAI), vol. 3651, pp. 767–778. Springer, Heidelberg (2005). doi:10.1007/11562214_67
Hassan, S., Mihalcea, R.: Cross-lingual semantic relatedness using encyclopedic knowledge. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, vol. 3, pp. 1192–1201. Association for Computational Linguistics (2009)
Postma, M., Vossen, P.: What implementation and translation teach us: the case of semantic similarity measures in WordNets. In: Proceedings of the Seventh Global Wordnet Conference, pp. 133–141 (2014)
Jin, P., Wu, Y.: Semeval-2012 task 4: evaluating Chinese word similarity. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics, vol. 1: Proceedings of the Main Conference and the Shared Task, vol. 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, SemEval 2012, pp. 374–377. Association for Computational Linguistics (2012)
Yang, D., Powers, D.M.W.: Verb similarity on the taxonomy of WordNet. In: Proceedings of the Third International WordNet Conference – GWC 2006, Masaryk University, pp. 121–128 (2006)
Meyer, C.M., Gurevych, I.: To exhibit is not to loiter: a multilingual, sense-disambiguated wiktionary for measuring verb similarity. In: Proceedings of COLING 2012: Technical Papers, The COLING 2012 Organizing Committee, pp. 1763–1780 (2012)
Hill, F., Reichart, R., Korhonen, A.: SimLex-999: evaluating semantic models with (genuine) similarity estimation. Comput. Linguist. 41(4), 665–695 (2015)
Bruni, E., Tran, N.K., Baroni, M.: Multimodal distributional semantics. J. Artif. Intell. Res. 49(1), 1–47 (2014)
Ferraresi, A., Zanchetta, E., Bernardini, S., Baroni, M.: Introducing and evaluating ukWaC, a very large web-derived corpus of English. In: Proceedings of the 4th Web as Corpus Workshop (WAC-4): Can we beat Google? pp. 47–54 (2008)
Faruqui, M., Dyer, C.: Community evaluation and exchange of word vectors at wordvectors.org. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 19–24. Association for Computational Linguistics (2014)
Baroni, M., Lenci, A.: How we BLESSed distributional semantic evaluation. In: Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, GEMS 2011, pp. 1–10. Association for Computational Linguistics (2011)
Van de Cruys, T.: Mining for meaning: the extraction of lexicosemantic knowledge from text. Ph.D. thesis, University of Groningen (2010)
Biemann, C., Riedl, M.: Text: now in 2D! A framework for lexical expansion with contextual similarity. J. Lang. Model. 1(1), 55–95 (2013)
Sahlgren, M.: The word-space model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Ph.D. thesis, Stockholm University (2006)
Griffiths, T.L., Steyvers, M.: Prediction and semantic association. In: Advances in Neural Information Processing Systems, vol. 15, pp. 11–18. MIT Press (2003)
Rapp, R., Zock, M.: The CogALex-IV shared task on the lexical access problem. In: Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex), pp. 1–14. Association for Computational Linguistics and Dublin City University (2014)
Kiss, G.R., Armstrong, C., Milroy, R., Piper, J.: An associative thesaurus of English and its computer analysis. In: The Computer and Literary Studies, pp. 153–165. Edinburgh University Press (1973)
Panchenko, A., Loukachevitch, N.V., Ustalov, D., Paperno, D., Meyer, C.M., Konstantinova, N.: RUSSE: the first workshop on Russian semantic similarity. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual conference “Dialogue”, vol. 2, pp. 89–105. RGGU (2015)
Resnik, P.: Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, IJCAI 1995, vol. 1, pp. 448–453. Morgan Kaufmann Publishers Inc. (1995)
Lin, D.: An information-theoretic definition of similarity. In: Proceedings of the Fifteenth International Conference on Machine Learning, ICML 1998, pp. 296–304. Morgan Kaufmann Publishers Inc. (1998)
Patwardhan, S., Pedersen, T.: Using WordNet-based context vectors to estimate the semantic relatedness of concepts. In: Proceedings of the Workshop on Making Sense of Sense: Bringing Psycholinguistics and Computational Linguistics Together, pp. 1–8. Association for Computational Linguistics (2006)
Zesch, T., Müller, C., Gurevych, I.: Using Wiktionary for computing semantic relatedness. In: Proceedings of the 23rd National Conference on Artificial Intelligence, AAAI 2008, vol. 2, pp. 861–866. AAAI Press (2008)
Loukachevitch, N.V., Dobrov, B.V., Chetviorkin, I.I.: RuThes-Lite, a publicly available version of Thesaurus of Russian language RuThes. In: Computational Linguistics and Intellectual Technologies: papers from the Annual conference “Dialogue”, pp. 340–349. RGGU (2014)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26, pp. 3111–3119. Curran Associates, Inc. (2013)
Arefyev, N., Panchenko, A., Lukanin, A., Lesota, O., Romanov, P.: Evaluating three corpus-based semantic similarity systems for Russian. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual conference “Dialogue”, vol. 2, pp. 106–118. RGGU (2015)
Lopukhin, K.A., Lopukhina, A.A., Nosyrev, G.V.: The impact of different vector space models and supplementary techniques on Russian semantic similarity task. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual conference “Dialogue”, vol. 2, pp. 115–127. RGGU (2015)
Korobov, M.: Morphological analyzer and generator for Russian and Ukrainian languages. In: Khachay, M.Y., Konstantinova, N., Panchenko, A., Ignatov, D.I., Labunets, V.G. (eds.) AIST 2015. CCIS, vol. 542, pp. 320–332. Springer, Heidelberg (2015). doi:10.1007/978-3-319-26123-2_31
Ustalov, D.: A crowdsourcing engine for mechanized labor. In: Proceedings of the Institute for System Programming, vol. 27, no. 3, pp. 351–364 (2015)
Acknowledgements
We would like to acknowledge several funding organisations that partially supported this research. Dmitry Ustalov was supported by the Russian Foundation for Basic Research (RFBR) according to the research project no. 16-37-00354. Denis Paperno was supported by the European Research Council (ERC) 2011 Starting Independent Research Grant no. 283554 (COMPOSES). Natalia Loukachevitch was supported by Russian Foundation for Humanities (RFH), grant no. 15-04-12017. Alexander Panchenko was supported by the Deutsche Forschungsgemeinschaft (DFG) under the project “Joining Ontologies and Semantics Induced from Text (JOIN-T)”.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Panchenko, A. et al. (2017). Human and Machine Judgements for Russian Semantic Relatedness. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_21
Download citation
DOI: https://doi.org/10.1007/978-3-319-52920-2_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52919-6
Online ISBN: 978-3-319-52920-2
eBook Packages: Computer ScienceComputer Science (R0)