Automatic Word Sense Disambiguation and Construction Identification Based on Corpus Multilevel Annotation

  • Olga Lyashevskaya
  • Olga Mitrofanova
  • Maria Grachkova
  • Sergey Romanov
  • Anastasia Shimorina
  • Alexandra Shurygina
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6836)

Abstract

The research project reported in this paper aims to extract linguistic information automatically from contexts in the Russian National Corpus (RNC) and to use it in building a comprehensive lexicographic resource, the Index of Russian lexical constructions. The proposed approach relies on automatic context classification for word sense disambiguation (WSD) and construction identification (CxI). The automatic context processing procedure takes into account the following types of contextual information represented in the RNC multilevel annotation: lexical (lemma) tags (lex), morphological (grammatical) tags (gr), semantic (taxonomy) tags (sem), and combinations of these tag types. Multiple experiments on WSD and CxI are performed on representative RNC context samples for nouns. In each series of experiments we analyze (1) different context markers of the meanings of target words and (2) constructions that include context markers and target words.
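
As an illustration only, the tag-based context classification described above might be sketched as follows. The abstract does not name the classifier used, so this sketch assumes a small Naive Bayes model with add-one smoothing; the class name `TagContextClassifier`, the helper `features`, the target noun senses ("tool" and "spring"), and all tag values are hypothetical and are not taken from the RNC annotation scheme.

```python
import math
from collections import Counter, defaultdict


def features(context, levels):
    """Build one feature string per context token from the chosen tag levels,
    e.g. levels ("gr", "sem") give "gr=S|sem=artifact"."""
    return ["|".join(f"{lvl}={tok[lvl]}" for lvl in levels) for tok in context]


class TagContextClassifier:
    """Toy Naive Bayes over tag features (an assumed model, not the paper's)."""

    def __init__(self, levels):
        self.levels = tuple(levels)           # which tag levels to use
        self.sense_counts = Counter()         # training contexts per sense
        self.feat_counts = defaultdict(Counter)  # feature counts per sense
        self.vocab = set()                    # all features seen in training

    def fit(self, samples):
        for context, sense in samples:
            self.sense_counts[sense] += 1
            for feat in features(context, self.levels):
                self.feat_counts[sense][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, context):
        total = sum(self.sense_counts.values())
        feats = features(context, self.levels)
        scores = {}
        for sense, n in self.sense_counts.items():
            # Add-one smoothing so unseen features do not zero out a sense.
            denom = sum(self.feat_counts[sense].values()) + len(self.vocab)
            score = math.log(n / total)
            for feat in feats:
                score += math.log((self.feat_counts[sense][feat] + 1) / denom)
            scores[sense] = score
        return max(scores, key=scores.get)

    def markers(self, sense, k=3):
        """Most frequent context features for a sense (candidate markers)."""
        return self.feat_counts[sense].most_common(k)


# Invented training contexts for one ambiguous noun with two senses.
TRAIN = [
    ([{"lex": "open", "gr": "V", "sem": "action"},
      {"lex": "door", "gr": "S", "sem": "artifact"}], "tool"),
    ([{"lex": "lose", "gr": "V", "sem": "action"},
      {"lex": "lock", "gr": "S", "sem": "artifact"}], "tool"),
    ([{"lex": "cold", "gr": "A", "sem": "temperature"},
      {"lex": "water", "gr": "S", "sem": "substance"}], "spring"),
    ([{"lex": "flow", "gr": "V", "sem": "motion"},
      {"lex": "forest", "gr": "S", "sem": "place"}], "spring"),
]

if __name__ == "__main__":
    unseen = [{"lex": "gate", "gr": "S", "sem": "artifact"}]
    for levels in (["lex"], ["sem"], ["gr", "sem"]):
        model = TagContextClassifier(levels).fit(TRAIN)
        print(levels, model.predict(unseen), model.markers("tool"))
```

Training one model per tag level, or per combination of levels, mirrors the comparison of lex, gr, and sem information mentioned in the abstract. In this toy data the lemma "gate" never occurs in training, so only the models that use sem tags have evidence to separate the two senses.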

Keywords

WSD; constructions; construction identification; Russian National Corpus; context classification

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Olga Lyashevskaya (1)
  • Olga Mitrofanova (2)
  • Maria Grachkova (2)
  • Sergey Romanov (2)
  • Anastasia Shimorina (2)
  • Alexandra Shurygina (2)

  1. NRU Higher School of Economics, Moscow, Russia
  2. St. Petersburg State University, St. Petersburg, Russia
