Human and Machine Judgements for Russian Semantic Relatedness

  • Alexander Panchenko
  • Dmitry Ustalov
  • Nikolay Arefyev
  • Denis Paperno
  • Natalia Konstantinova
  • Natalia Loukachevitch
  • Chris Biemann
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 661)

Abstract

Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgements about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgements and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples \(({word}_{i}, {word}_{j}, {similarity}_{ij})\). Four of them are designed for evaluation of systems for computing semantic relatedness, complementing each other in terms of the semantic relation type they represent. These benchmarks were used to organise a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth high-coverage resource, the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy.
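As an illustration of the triple format described above, the following minimal Python sketch stores a benchmark as a list of (word, word, similarity) triples and compares a system's scores against it. The abstract does not name an evaluation metric, so the use of Spearman's rank correlation here is an assumption, chosen only because it is a common measure for word-pair benchmarks built from human judgements. The word pairs, all scores, and the names `average_ranks`, `pearson` and `spearman` are hypothetical and are not taken from the resources themselves.

```python
from typing import Dict, List, Tuple

# One benchmark entry: (word_i, word_j, similarity_ij).
Triple = Tuple[str, str, float]


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(gold: List[Triple], system: Dict[Tuple[str, str], float]) -> float:
    """Spearman correlation between gold scores and system scores (assumed metric)."""
    gold_scores = [score for _, _, score in gold]
    system_scores = [system[(a, b)] for a, b, _ in gold]
    return pearson(average_ranks(gold_scores), average_ranks(system_scores))


# Hypothetical gold triples; the scores are invented for illustration.
gold = [
    ("книга", "журнал", 0.8),
    ("книга", "тетрадь", 0.6),
    ("книга", "яблоко", 0.1),
    ("кошка", "собака", 0.7),
]

# Hypothetical system output for the same word pairs.
system = {
    ("книга", "журнал"): 0.9,
    ("книга", "тетрадь"): 0.5,
    ("книга", "яблоко"): 0.2,
    ("кошка", "собака"): 0.6,
}

correlation = spearman(gold, system)
```

A rank-based measure only compares the ordering of the pairs, so a system does not need to reproduce the gold scores themselves, only rank related pairs above unrelated ones.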

Keywords

Semantic similarity, Semantic relatedness, Evaluation, Distributional thesaurus, Crowdsourcing, Language resources

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Alexander Panchenko (1)
  • Dmitry Ustalov (2)
  • Nikolay Arefyev (3)
  • Denis Paperno (4)
  • Natalia Konstantinova (5)
  • Natalia Loukachevitch (3)
  • Chris Biemann (1)

  1. TU Darmstadt, Darmstadt, Germany
  2. Ural Federal University, Yekaterinburg, Russia
  3. Moscow State University, Moscow, Russia
  4. University of Trento, Rovereto, Italy
  5. University of Wolverhampton, Wolverhampton, UK
