Extracting Idiomatic Hungarian Verb Frames

  • Bálint Sass
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)


We describe a machine learning method for collecting idiomatic fixed-stem verb frames. First, we collect frequent frame candidates from the output of a partial parser; second, we apply an idiomaticity metric to this list to select the most idiomatic frames. Running the implemented system yields a list of ten thousand frames for more than 900 verbs, which will be translated into English and used as a resource in a Hungarian-to-English machine translation system.
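The two steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the frame representation, the frequency threshold, or the idiomaticity metric. Here a frame is reduced to a hypothetical (verb stem, case, complement stem) triple, `min_count` is an arbitrary cutoff, and the score is a simple stand-in (the share of a complement's occurrences that fall with the given verb), not the metric used in the paper.

```python
from collections import Counter

# Hypothetical partial-parser output: one (verb stem, case, complement stem)
# triple per clause. The data and the triple format are illustrative only.
observations = (
    [("vesz", "ACC", "rész")] * 3      # "részt vesz" (take part)
    + [("vesz", "ACC", "könyv")] * 2   # "könyvet vesz" (buy a book)
    + [("olvas", "ACC", "könyv")] * 4  # "könyvet olvas" (read a book)
    + [("ír", "ACC", "könyv")] * 2     # "könyvet ír" (write a book)
    + [("vesz", "ACC", "kenyér")] * 1  # "kenyeret vesz" (buy bread)
)


def collect_candidates(observations, min_count=2):
    """Step 1: keep frames seen at least min_count times (assumed cutoff)."""
    counts = Counter(observations)
    return {frame: n for frame, n in counts.items() if n >= min_count}


def rank_by_idiomaticity(candidates, observations):
    """Step 2: score each candidate and sort, most idiomatic first.

    Stand-in score (an assumption, not the paper's metric): the fraction of
    all occurrences of the (case, stem) complement that occur with this verb.
    """
    filler_totals = Counter((case, stem) for _, case, stem in observations)
    scored = [
        (frame, n / filler_totals[frame[1:]])
        for frame, n in candidates.items()
    ]
    # Sort by descending score; break ties by frame for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ranked = rank_by_idiomaticity(collect_candidates(observations), observations)
for frame, score in ranked:
    print(frame, round(score, 2))
```

On this toy data, "részt vesz" ranks first because its complement occurs with no other verb, while the "könyv" frames score lower because the complement combines freely with several verbs. The single "kenyér" observation is dropped in step 1 for falling below the assumed threshold.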





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Bálint Sass, Research Institute for Linguistics, Hungarian Academy of Sciences
