Corpus-Based Acquisition of Support Verb Constructions for Portuguese

Zeller, Britta D.; Padó, Sebastian

doi:10.1007/978-3-642-28885-2_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7243)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

We present a resource-poor approach to automatically acquire Support Verb Constructions (SVCs) for European Portuguese with a two-stage procedure. First, we apply a cross-lingual approach with a bilingual parallel corpus: starting with a Portuguese full verb, we use the translations into another language and the corresponding backtranslations to identify Portuguese verb-noun pairs with the same meaning. Since not all of these are SVCs, the candidates are ranked and filtered in a second, monolingual step based on association statistics. We discuss two parametrisations of our procedure for a high-precision and a high-recall setting. In our experiments, these parametrisations achieve a maximum precision of 91% and a maximum recall of 86%, respectively.
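
The two-stage procedure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy alignment data, the monolingual co-occurrence counts, the function names `pivot_candidates` and `rank_candidates`, and the use of pointwise mutual information as the association statistic (the abstract does not name a specific measure). It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy data, not from the paper: (Portuguese expression, pivot
# translation) pairs as they might come out of a word-aligned parallel corpus.
ALIGNED = [
    ("decidir", "decide"),
    ("decidir", "decide"),
    ("decidir", "take decision"),
    ("tomar decisão", "decide"),
    ("tomar decisão", "take decision"),
    ("resolver problema", "decide"),
    ("tomar café", "have coffee"),
]

# Hypothetical monolingual verb-noun co-occurrences for the association score.
MONO_PAIRS = (
    [("tomar", "decisão")] * 6
    + [("tomar", "café")] * 3
    + [("resolver", "problema")] * 1
    + [("ter", "problema")] * 5
    + [("resolver", "questão")] * 5
)


def pivot_candidates(full_verb, aligned):
    """Stage 1: verb-noun pairs that share a pivot translation with full_verb."""
    translations = {tgt for src, tgt in aligned if src == full_verb}
    candidates = set()
    for src, tgt in aligned:
        parts = src.split()
        # Keep only two-token backtranslations, read here as (verb, noun).
        if tgt in translations and len(parts) == 2:
            candidates.add((parts[0], parts[1]))
    return candidates


def rank_candidates(candidates, mono_pairs, threshold):
    """Stage 2: score candidates by PMI, drop those below threshold, sort."""
    pair_counts = Counter(mono_pairs)
    verb_counts = Counter(v for v, _ in mono_pairs)
    noun_counts = Counter(n for _, n in mono_pairs)
    total = len(mono_pairs)
    scored = []
    for verb, noun in candidates:
        if pair_counts[(verb, noun)] == 0:
            continue  # unseen pairs get no score in this sketch
        pmi = math.log2(
            (pair_counts[(verb, noun)] / total)
            / ((verb_counts[verb] / total) * (noun_counts[noun] / total))
        )
        if pmi >= threshold:
            scored.append(((verb, noun), pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)


candidates = pivot_candidates("decidir", ALIGNED)
# A stricter threshold favours precision; a looser one favours recall.
strict = rank_candidates(candidates, MONO_PAIRS, threshold=0.0)
lenient = rank_candidates(candidates, MONO_PAIRS, threshold=-1.0)
print(strict)
print(lenient)
```

In this toy setting the threshold plays the role of the parametrisation: raising it keeps only strongly associated pairs, while lowering it admits more candidates at the cost of more non-SVC pairs.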

Author information

Authors and Affiliations

Department of Computational Linguistics, Heidelberg University, Germany
Britta D. Zeller & Sebastian Padó

Editor information

Editors and Affiliations

UFSCAR, Rod. Washington Luís, 13565-905, São Carlos, Brazil
Helena Caseli
UFRGS, Av. Bento Gonçalves, 9500, 91501-970, Porto Alegre, Brazil
Aline Villavicencio
DETI/IEETA, Universidade de Aveiro, Campus Universitário de Santiago, 3810-193, Aveiro, Portugal
António Teixeira
UC/ IT, DEEC, Universidade de Coimbra, Polo 2, 3030-290, Coimbra, Portugal
Fernando Perdigão

About this paper

Cite this paper

Zeller, B.D., Padó, S. (2012). Corpus-Based Acquisition of Support Verb Constructions for Portuguese. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_8

DOI: https://doi.org/10.1007/978-3-642-28885-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science, Computer Science (R0)
