Extracting Frame-Like Structures from Google Books NGram Dataset

  • Vladimir Ivanov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8856)


We propose a method that facilitates semi-automatic FrameNet construction. The method requires the Google Books NGram dataset and WordNet or another thesaurus for the target language. We evaluated the method on Russian ngrams. Because of the huge amount of available data, the method does not require sophisticated natural language processing techniques (e.g., word sense disambiguation), and it shows promising results.
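As a rough illustration of the idea in the abstract (not the paper's actual algorithm, which this page does not describe), the sketch below aggregates ngram counts for chosen verbs and generalises the following word to a thesaurus class. It assumes a four-column tab-separated record layout (ngram, year, match_count, volume_count); the names `HYPERNYMS`, `parse_line`, and `collect_slot_fillers`, as well as the English toy data, are hypothetical and stand in for a real thesaurus and real Russian ngrams.

```python
from collections import Counter, defaultdict

# Toy stand-in for a thesaurus: word -> broader semantic class (hypothetical data).
HYPERNYMS = {
    "book": "document",
    "letter": "document",
    "teacher": "person",
    "student": "person",
}


def parse_line(line):
    """Parse one record, assumed to be: ngram, year, match_count, volume_count."""
    ngram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
    return ngram.split(" "), int(year), int(match_count)


def collect_slot_fillers(lines, verbs):
    """Sum match counts of (verb, class of the following word) pairs over all years."""
    fillers = defaultdict(Counter)
    for line in lines:
        tokens, _year, count = parse_line(line)
        # Only bigrams that start with one of the verbs of interest are used.
        if len(tokens) != 2 or tokens[0] not in verbs:
            continue
        # Generalise the word to its thesaurus class; skip words the thesaurus lacks.
        cls = HYPERNYMS.get(tokens[1])
        if cls is not None:
            fillers[tokens[0]][cls] += count
    return fillers


# Made-up sample records, for illustration only.
SAMPLE = [
    "reads book\t1990\t120\t40",
    "reads book\t1991\t80\t30",
    "reads letter\t1990\t50\t20",
    "reads quickly\t1990\t70\t25",
    "teaches student\t1990\t60\t22",
]

result = collect_slot_fillers(SAMPLE, verbs={"reads", "teaches"})
for verb, classes in result.items():
    print(verb, dict(classes))
```

Summing counts over many records is what lets frequent verb and class pairs stand out without any word sense disambiguation step; a real pipeline would still need lemmatisation and a full thesaurus, which this sketch leaves out.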


natural language processing; FrameNet; information extraction; subordination models; ngrams





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Vladimir Ivanov (1, 2, 3)
  1. Kazan Federal University, Kazan, Russia
  2. National University of Science and Technology "MISIS", Moscow, Russia
  3. Institute of Informatics, Tatarstan Academy of Sciences, Kazan, Russia
