Corpus-Based Acquisition of Support Verb Constructions for Portuguese
Abstract
We present a resource-poor approach to automatically acquire Support Verb Constructions (SVCs) for European Portuguese with a two-stage procedure. First, we apply a cross-lingual approach with a bilingual parallel corpus: starting with a Portuguese full verb, we use the translations into another language and the corresponding backtranslations to identify Portuguese verb-noun pairs with the same meaning. Since not all of these are SVCs, the candidates are ranked and filtered in a second, monolingual step based on association statistics. We discuss two parametrisations of our procedure for a high-precision and a high-recall setting. In our experiments, these parametrisations achieve a maximum precision of 91% and a maximum recall of 86%, respectively.
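The two-stage procedure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the alignment and frequency counts are invented toy data, the function names (`backtranslate`, `pmi`, `rank`) are hypothetical, and pointwise mutual information stands in for the association statistics, which the abstract does not name. It is not the authors' implementation.

```python
import math
from collections import defaultdict

# Hypothetical toy alignment counts (not from the paper):
# (Portuguese lemma tuple, pivot-language expression) -> co-alignment count.
ALIGNED = {
    (("decidir",), "decide"): 40,
    (("decidir",), "take decision"): 5,
    (("tomar", "decisão"), "decide"): 25,
    (("tomar", "decisão"), "take decision"): 10,
    (("chegar", "acordo"), "decide"): 3,
}

# Hypothetical monolingual frequencies used in the second stage.
PAIR_COUNTS = {("tomar", "decisão"): 80, ("chegar", "acordo"): 2}
VERB_COUNTS = {"tomar": 400, "chegar": 500}
NOUN_COUNTS = {"decisão": 200, "acordo": 300}
TOTAL = 100_000


def backtranslate(full_verb, aligned):
    """Stage 1: collect verb-noun pairs that share a pivot translation with full_verb."""
    pivots = {pivot for (pt, pivot) in aligned if pt == full_verb}
    candidates = defaultdict(int)
    for (pt, pivot), count in aligned.items():
        # Keep only two-lemma (verb, noun) expressions reached via a shared pivot.
        if pivot in pivots and pt != full_verb and len(pt) == 2:
            candidates[pt] += count
    return dict(candidates)


def pmi(pair_count, verb_count, noun_count, total):
    """Pointwise mutual information (assumed measure; the abstract names none)."""
    return math.log2(pair_count * total / (verb_count * noun_count))


def rank(candidates, pair_counts, verb_counts, noun_counts, total, threshold):
    """Stage 2: score candidates monolingually, drop those below threshold, sort descending."""
    scored = [
        (pmi(pair_counts[(v, n)], verb_counts[v], noun_counts[n], total), v, n)
        for (v, n) in candidates
    ]
    return sorted((item for item in scored if item[0] >= threshold), reverse=True)


if __name__ == "__main__":
    cands = backtranslate(("decidir",), ALIGNED)
    for score, verb, noun in rank(cands, PAIR_COUNTS, VERB_COUNTS, NOUN_COUNTS, TOTAL, 2.0):
        print(f"{verb} {noun}: {score:.2f}")
```

In this sketch the `threshold` argument plays the role of the parametrisation: raising it favours precision, lowering it favours recall.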
Keywords
lexical acquisition, support verbs, multi-word expressions, parallel bilingual data, word alignment, association measures