Human and Machine Judgements for Russian Semantic Relatedness

  • Conference paper
  • Analysis of Images, Social Networks and Texts (AIST 2016)

Abstract

Semantic relatedness of terms quantifies their similarity of meaning with a numerical score. On the one hand, humans easily make judgements about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been studied extensively for English using numerous language resources, such as associative norms, human judgements and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples \((\mathit{word}_{i}, \mathit{word}_{j}, \mathit{similarity}_{ij})\). Four of them are designed for the evaluation of systems that compute semantic relatedness and complement each other in terms of the semantic relation type they represent. These benchmarks were used to organise a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth, high-coverage resource: the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy.
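Both uses of the triple format described in the abstract, scoring a relatedness system against a benchmark and generating a thesaurus from a system, can be illustrated with a small sketch. It assumes a hypothetical comma-separated layout `word_i,word_j,similarity`. It uses Spearman's rank correlation as the evaluation measure, a common choice for human-judgement benchmarks; the abstract does not state which metric each benchmark uses. It also stands in for a real system with cosine similarity over toy word vectors. All function names and vectors are made up for illustration. This is not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

Triple = Tuple[str, str, float]  # (word_i, word_j, similarity_ij)


def parse_triples(lines: List[str]) -> List[Triple]:
    """Parse 'word_i,word_j,sim' rows (a hypothetical CSV layout) into triples."""
    triples = []
    for line in lines:
        w1, w2, sim = line.strip().split(",")
        triples.append((w1, w2, float(sim)))
    return triples


def ranks(values: List[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def evaluate(gold: List[Triple], vectors: Dict[str, List[float]]) -> float:
    """Correlate system scores (cosine of word vectors) with gold scores."""
    pairs = [(g, cosine(vectors[a], vectors[b])) for a, b, g in gold
             if a in vectors and b in vectors]  # skip out-of-vocabulary pairs
    return spearman([g for g, _ in pairs], [s for _, s in pairs])


def thesaurus(vectors: Dict[str, List[float]], k: int) -> List[Triple]:
    """Emit (word, neighbour, cosine) triples for each word's top-k neighbours."""
    out = []
    for w, vec in vectors.items():
        scored = [(o, cosine(vec, ov)) for o, ov in vectors.items() if o != w]
        scored.sort(key=lambda t: t[1], reverse=True)
        out.extend((w, o, s) for o, s in scored[:k])
    return out


if __name__ == "__main__":
    toy_vectors = {  # made-up 3-dimensional vectors, for illustration only
        "кошка": [1.0, 0.9, 0.0],
        "собака": [0.9, 1.0, 0.1],
        "машина": [0.0, 0.1, 1.0],
        "автомобиль": [0.1, 0.0, 0.9],
    }
    toy_gold = parse_triples(["кошка,собака,0.8", "машина,автомобиль,0.95",
                              "кошка,машина,0.1", "собака,автомобиль,0.05"])
    print(evaluate(toy_gold, toy_vectors))
    print(thesaurus(toy_vectors, k=1))
```

Because the thesaurus output uses the same triple shape as the benchmarks, a generated resource can be inspected or evaluated with the same tooling. A real system would replace the toy vectors with embeddings trained on a large corpus, and an approximate nearest-neighbour index would replace the exhaustive loop in `thesaurus`.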

Notes

  1. The HJ dataset was first released in November 2014 and first published in June 2015, while SimLex-999 was first published in December 2015.

  2. http://wordvectors.org/suite.php.

  3. http://russe.nlpub.ru.

  4. Annotation guidelines for the HJ dataset: http://russe.nlpub.ru/task/annotate.txt.

  5. The associations were sampled from the sociation.org database in July 2014.

  6. Annotation guidelines are available at http://crowd.russe.nlpub.ru.

  7. http://mtsar.nlpub.org.

  8. http://russe.nlpub.ru/downloads.

Acknowledgements

We would like to acknowledge several funding organisations that partially supported this research. Dmitry Ustalov was supported by the Russian Foundation for Basic Research (RFBR) under research project no. 16-37-00354. Denis Paperno was supported by the European Research Council (ERC) 2011 Starting Independent Research Grant no. 283554 (COMPOSES). Natalia Loukachevitch was supported by the Russian Foundation for Humanities (RFH), grant no. 15-04-12017. Alexander Panchenko was supported by the Deutsche Forschungsgemeinschaft (DFG) under the project “Joining Ontologies and Semantics Induced from Text (JOIN-T)”.

Author information

Correspondence to Alexander Panchenko.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Panchenko, A. et al. (2017). Human and Machine Judgements for Russian Semantic Relatedness. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_21

  • DOI: https://doi.org/10.1007/978-3-319-52920-2_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52919-6

  • Online ISBN: 978-3-319-52920-2

  • eBook Packages: Computer Science, Computer Science (R0)