Effect of Semantic Parsing Depth on the Identification of Paraphrases in Russian Texts

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 789)

Abstract

The paper proposes the semantic-syntactic parser SemSin and a semantic classifier as tools for identifying paraphrases in Russian texts. Several alternative methods for evaluating the similarity of sentence pairs are analyzed (by words, by lemmas, by classes, by semantically related concepts, and by predicate groups), and their advantages and drawbacks are discussed. Paraphrase identification quality is shown to rise as semantic information is used at greater depth. However, complementing the analysis with predicate groups identified from the dependency tree may even degrade identification, because the number of false positive decisions grows.
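
To illustrate how comparing a sentence pair at increasing semantic depth can change the similarity score, here is a minimal sketch. It is not the SemSin pipeline: the Jaccard overlap measure, the tiny `LEMMAS` and `CLASSES` dictionaries, the class labels, and all function names are assumptions made only for illustration, and only three of the five comparison levels named in the abstract are shown.

```python
# Illustrative sketch only. Toy dictionaries stand in for a real
# lemmatizer and semantic classifier; Jaccard overlap is an assumed
# similarity measure, not necessarily the one used in the paper.

# Hypothetical word form -> lemma map (transliterated Russian).
LEMMAS = {
    "kupili": "kupit",
    "kupyat": "kupit",
    "mashinu": "mashina",
}

# Hypothetical lemma -> semantic class map.
CLASSES = {
    "kupit": "ACQUIRE",
    "mashina": "VEHICLE",
    "avtomobil": "VEHICLE",
}


def jaccard(a, b):
    """Share of distinct items common to both collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def by_words(tokens):
    """Surface level: lowercased word forms."""
    return [t.lower() for t in tokens]


def by_lemmas(tokens):
    """Lemma level: map each word form to its lemma if known."""
    return [LEMMAS.get(t, t) for t in by_words(tokens)]


def by_classes(tokens):
    """Class level: map each lemma to its semantic class if known."""
    return [CLASSES.get(lemma, lemma) for lemma in by_lemmas(tokens)]


def similarity(sentence1, sentence2, level):
    """Compare two whitespace-tokenized sentences at the given level."""
    return jaccard(level(sentence1.split()), level(sentence2.split()))
```

For the toy pair "Oni kupili mashinu" and "Oni kupyat avtomobil", the score grows from the word level to the lemma level to the class level. The same coarsening is also what can let unrelated sentences match, which is in line with the abstract's remark about false positives.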

Keywords

Russian texts, Paraphrases, Semantic dictionary, Lemmas, Classifier, Classes, Semantic-syntactic parsing, Synonymy

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. ITMO University, St. Petersburg, Russia