Corpus of Syntactic Co-Occurrences: A Delayed Promise

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 789)

Abstract

The paper gives a technical description of CoSyCo, a corpus of syntactic co-occurrences that provides information on syntactically connected words in Russian. It includes an overview of the corpora collected to build CoSyCo and the number of word combinations gathered. We also provide a brief evaluation of the gathered information.

Keywords

Corpora creation; Shallow parsing; Grammatically ambiguous text; Words combinations; The Russian language

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Eduard S. Klyshinsky (1)
  • Natalia Y. Lukashevich (2)
  1. Keldysh IAM RAS, Moscow, Russia
  2. Moscow State University, Moscow, Russia
