RuThes Thesaurus in Detecting Russian Paraphrases

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 789)


In this paper, we study the contribution of semantic features to the detection of Russian paraphrases. The features were calculated using the Russian thesaurus RuThes. First, we applied RuThes synonyms to the clustering of news articles, many of which had been created by rewriting (that is, paraphrasing) source news, and found a significant improvement. Second, we applied several semantic similarity measures proposed for the English thesaurus WordNet to RuThes and used them to detect paraphrased Russian sentences.


Clustering News Articles Feature Thesaurus Paraphrase Detection Ontology Synonyms Part-whole Relations 



This work was partially supported by the Russian National Foundation, grant N16-18-02074.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Lomonosov Moscow State University, Moscow, Russia
