Effect of Semantic Parsing Depth on the Identification of Paraphrases in Russian Texts

Boyarsky, Kirill; Kanevsky, Eugeni

doi:10.1007/978-3-319-71746-3_19

Part of the book series: Communications in Computer and Information Science (CCIS, volume 789)

Included in the following conference series:

Conference on Artificial Intelligence and Natural Language

Abstract

To identify paraphrases in Russian texts, the paper proposes using the semantic-syntactic parser SemSin together with a semantic classifier. Several alternative methods for evaluating the similarity of sentence pairs (by words, by lemmas, by classes, by semantically related concepts, and by predicate groups) are analyzed, and their advantages and drawbacks are discussed. Paraphrase identification quality is shown to rise as semantic information is used in greater depth. However, complementing the analysis with predicate groups identified from the dependency tree may degrade identification, because the number of false positive decisions grows.
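The progression from word-level to class-level comparison can be illustrated with a minimal sketch. It is not the SemSin pipeline or the authors' scoring method: the Jaccard overlap, the English toy sentences, and the hand-written `LEMMAS` and `CLASSES` tables are all assumptions chosen only to show how the measured similarity of one sentence pair changes as tokens are mapped to deeper representations.

```python
from typing import Callable, Iterable, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(s1: Iterable[str], s2: Iterable[str],
               key: Callable[[str], str]) -> float:
    """Compare two tokenized sentences after mapping each token with `key`."""
    return jaccard({key(t) for t in s1}, {key(t) for t in s2})

# Hypothetical toy lookup tables; a real system would use a morphological
# analyzer and a semantic classifier instead.
LEMMAS = {"cats": "cat", "chased": "chase", "chases": "chase", "mice": "mouse"}
CLASSES = {"cat": "FELINE", "kitten": "FELINE",
           "mouse": "RODENT", "chase": "PURSUE"}

def by_word(token: str) -> str:
    return token.lower()

def by_lemma(token: str) -> str:
    word = token.lower()
    return LEMMAS.get(word, word)

def by_class(token: str) -> str:
    lemma = by_lemma(token)
    return CLASSES.get(lemma, lemma)

a = ["cats", "chased", "mice"]
b = ["kitten", "chases", "mouse"]

for name, key in [("words", by_word), ("lemmas", by_lemma), ("classes", by_class)]:
    print(name, similarity(a, b, key))
```

On this toy pair the overlap is 0.0 by words, 0.5 by lemmas, and 1.0 by classes. The same coarsening that raises recall can also merge sentences that are not paraphrases, which is one way false positives can grow.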

Author information

Authors and Affiliations

ITMO University, St. Petersburg, Russia
Kirill Boyarsky & Eugeni Kanevsky

Corresponding author

Correspondence to Kirill Boyarsky.

Editor information

Editors and Affiliations

ITMO University, St. Petersburg, Russia
Andrey Filchenkov
University of Helsinki, Helsinki, Finland
Lidia Pivovarova
Mendel University, Brno, Czech Republic
Jan Žižka

About this paper

Cite this paper

Boyarsky, K., Kanevsky, E. (2018). Effect of Semantic Parsing Depth on the Identification of Paraphrases in Russian Texts. In: Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science, vol 789. Springer, Cham. https://doi.org/10.1007/978-3-319-71746-3_19

DOI: https://doi.org/10.1007/978-3-319-71746-3_19
Published: 28 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71745-6
Online ISBN: 978-3-319-71746-3
eBook Packages: Computer Science, Computer Science (R0)
