
Corpus-Based Acquisition of Support Verb Constructions for Portuguese

  • Britta D. Zeller
  • Sebastian Padó
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7243)

Abstract

We present a resource-poor approach to automatically acquire Support Verb Constructions (SVCs) for European Portuguese with a two-stage procedure. First, we apply a cross-lingual approach with a bilingual parallel corpus: starting with a Portuguese full verb, we use the translations into another language and the corresponding backtranslations to identify Portuguese verb-noun pairs with the same meaning. Since not all of these are SVCs, the candidates are ranked and filtered in a second, monolingual step based on association statistics. We discuss two parametrisations of our procedure for a high-precision and a high-recall setting. In our experiments, these parametrisations achieve a maximum precision of 91% and a maximum recall of 86%, respectively.
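The two-stage procedure described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: the aligned phrase pairs in `ALIGNED` are toy data standing in for word-aligner output, the function names are hypothetical, and pointwise mutual information is used only as one possible association statistic, since the abstract does not name the measure actually applied.

```python
import math
from collections import Counter

# Toy aligned phrase pairs (Portuguese, pivot language). Hypothetical data
# standing in for word-alignment output over a bilingual parallel corpus.
ALIGNED = [
    ("decidir", "decide"),
    ("tomar decisão", "decide"),
    ("tomar decisão", "decide"),
    ("resolver", "decide"),
    ("tomar decisão", "take decision"),
]


def backtranslation_candidates(full_verb, aligned):
    """Stage 1 (cross-lingual): collect the translations of full_verb, then
    backtranslate them and keep two-word Portuguese phrases as candidates."""
    pivots = {tgt for src, tgt in aligned if src == full_verb}
    candidates = Counter()
    for src, tgt in aligned:
        words = src.split()
        if tgt in pivots and src != full_verb and len(words) == 2:
            candidates[(words[0], words[1])] += 1
    return candidates


def pmi(pair_count, verb_count, noun_count, total):
    """Pointwise mutual information of a verb-noun pair (assumed measure)."""
    return math.log2((pair_count * total) / (verb_count * noun_count))


def rank_and_filter(candidates, pair_counts, verb_counts, noun_counts,
                    total, threshold):
    """Stage 2 (monolingual): score candidates by association strength,
    drop those below the threshold, and sort best first."""
    scored = []
    for verb, noun in candidates:
        score = pmi(pair_counts[(verb, noun)], verb_counts[verb],
                    noun_counts[noun], total)
        if score >= threshold:
            scored.append(((verb, noun), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    cands = backtranslation_candidates("decidir", ALIGNED)
    # Hypothetical monolingual counts for the single candidate pair.
    ranked = rank_and_filter(
        cands,
        pair_counts={("tomar", "decisão"): 8},
        verb_counts={"tomar": 20},
        noun_counts={"decisão": 10},
        total=100,
        threshold=1.0,
    )
    print(ranked)
```

In a sketch of this kind, the threshold is the knob that trades precision against recall: a higher value keeps only strongly associated pairs, while a lower value admits more candidates. How the paper's own high-precision and high-recall parametrisations are defined is not stated in the abstract.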

Keywords

lexical acquisition, support verbs, multi-word expressions, parallel bilingual data, word alignment, association measures

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Britta D. Zeller (1)
  • Sebastian Padó (1)
  1. Department of Computational Linguistics, Heidelberg University, Germany
