Abstract
To identify paraphrases in Russian texts, the paper proposes using the semantic-syntactic parser SemSin together with a semantic classifier. Several alternative methods for evaluating the similarity of sentence pairs are analyzed (by words, by lemmas, by classes, by semantically related concepts, and by predicate groups), and their advantages and drawbacks are discussed. Paraphrase identification quality is shown to improve as semantic information is used in greater depth. However, adding predicate groups identified from the dependency tree may degrade identification, because the number of false positive decisions grows.
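To make the idea of comparing sentence pairs at different depths concrete, here is a minimal sketch in Python. It is not the SemSin parser or the paper's classifier: the Jaccard overlap measure, the toy English lemma and class tables, and all function names are assumptions introduced purely for illustration.

```python
from typing import Callable, Dict, Set

# Hypothetical toy resources; a real system would take lemmas and
# semantic classes from a morphological analyzer and a classifier.
TOY_LEMMAS: Dict[str, str] = {"sits": "sit", "sat": "sit"}
TOY_CLASSES: Dict[str, str] = {"dog": "ANIMAL", "cat": "ANIMAL"}


def by_word(token: str) -> str:
    """Shallowest level: the lowercased surface form."""
    return token.lower()


def by_lemma(token: str) -> str:
    """Map a token to its lemma, falling back to the word itself."""
    word = by_word(token)
    return TOY_LEMMAS.get(word, word)


def by_class(token: str) -> str:
    """Map a token to its semantic class, falling back to the lemma."""
    lemma = by_lemma(token)
    return TOY_CLASSES.get(lemma, lemma)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(s1: str, s2: str, normalize: Callable[[str], str]) -> float:
    """Overlap of two whitespace-tokenized sentences after normalization."""
    t1 = {normalize(tok) for tok in s1.split()}
    t2 = {normalize(tok) for tok in s2.split()}
    return jaccard(t1, t2)


if __name__ == "__main__":
    first, second = "A dog sits", "a cat sat"
    for name, fn in [("words", by_word), ("lemmas", by_lemma), ("classes", by_class)]:
        print(name, similarity(first, second, fn))
```

In this toy pair the overlap grows as the normalization gets coarser. At the class level the two sentences become indistinguishable even though they are not paraphrases, which shows how a coarser comparison can also produce a false positive.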
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Boyarsky, K., Kanevsky, E. (2018). Effect of Semantic Parsing Depth on the Identification of Paraphrases in Russian Texts. In: Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science, vol 789. Springer, Cham. https://doi.org/10.1007/978-3-319-71746-3_19
DOI: https://doi.org/10.1007/978-3-319-71746-3_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71745-6
Online ISBN: 978-3-319-71746-3
eBook Packages: Computer Science, Computer Science (R0)