Abstract
We propose a methodology for populating the verb type hierarchy of a deep computational grammar in the HPSG formalism using syntactic and morphological information from Universal Dependencies (UD) treebanks. We illustrate it with the UD Bosque corpus and PorGram, a computational grammar for Portuguese built in the LinGO Grammar Matrix framework, but it can be applied to analogous grammars of other languages using other UD treebanks. The main component of the methodology is a Python module that extracts, from the annotated sentences, the core arguments and other features relevant to determining verb valence. This module makes it possible to build a Python dictionary that maps valence frames to verb objects. The dictionary makes it easy not only to determine which frames occur with which verbs, but also to detect annotation errors. Concrete examples illustrate the module's potential for rapidly expanding the lexical coverage of PorGram and for detecting errors in the corpus annotation.
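The central idea of the abstract can be sketched in plain Python. The sketch extracts core-argument frames from UD-annotated sentences and indexes verbs by frame. It is not the authors' module. The following are all illustrative assumptions: the relation inventory treated as valence-relevant (`CORE_RELS`), the helper names (`read_conllu`, `verb_frames`, `build_lexicon`, `row`), the use of lemma strings in place of verb objects, and the heuristic that flags a verb carrying the same core relation twice as a possible annotation error. The three toy sentences are invented and are not taken from Bosque.

```python
from collections import defaultdict

# Hypothetical set of dependency relations treated as valence-relevant.
# The authors' module may use a different or richer inventory.
CORE_RELS = {"nsubj", "nsubj:pass", "csubj", "obj", "iobj", "ccomp", "xcomp"}


def read_conllu(text):
    """Yield sentences as lists of token dicts, skipping comment lines,
    multiword-token ranges (e.g. '3-4') and empty nodes (e.g. '5.1')."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append({"id": int(cols[0]), "lemma": cols[2], "upos": cols[3],
                         "head": int(cols[6]), "deprel": cols[7]})
    if sentence:
        yield sentence


def verb_frames(sentence):
    """Return (lemma, frame) pairs for every VERB token; a frame is the sorted
    tuple of the verb's dependents' relations that belong to CORE_RELS."""
    children = defaultdict(list)
    for tok in sentence:
        children[tok["head"]].append(tok)
    return [
        (tok["lemma"],
         tuple(sorted(d["deprel"] for d in children[tok["id"]] if d["deprel"] in CORE_RELS)))
        for tok in sentence if tok["upos"] == "VERB"
    ]


def build_lexicon(text):
    """Map each frame to the set of verb lemmas attested with it and collect
    suspect cases in which one verb has the same core relation twice."""
    lexicon = defaultdict(set)
    suspects = []
    for sentence in read_conllu(text):
        for lemma, frame in verb_frames(sentence):
            lexicon[frame].add(lemma)
            if len(set(frame)) < len(frame):
                suspects.append((lemma, frame))
    return lexicon, suspects


def row(idx, form, lemma, upos, head, deprel):
    # One CoNLL-U token line: 10 tab-separated columns, unused ones set to '_'.
    return "\t".join([str(idx), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"])


# Invented toy sentences (not from Bosque); the third one is deliberately
# mis-annotated, with the adverb 'ontem' labelled 'obj' instead of 'advmod'.
SAMPLE = "\n".join([
    "# text = Ela comprou um livro.",
    row(1, "Ela", "ela", "PRON", 2, "nsubj"),
    row(2, "comprou", "comprar", "VERB", 0, "root"),
    row(3, "um", "um", "DET", 4, "det"),
    row(4, "livro", "livro", "NOUN", 2, "obj"),
    row(5, ".", ".", "PUNCT", 2, "punct"),
    "",
    "# text = Ele dormiu no sofá.",
    row(1, "Ele", "ele", "PRON", 2, "nsubj"),
    row(2, "dormiu", "dormir", "VERB", 0, "root"),
    "3-4\tno" + "\t_" * 8,  # multiword token 'no' = 'em' + 'o'
    row(3, "em", "em", "ADP", 5, "case"),
    row(4, "o", "o", "DET", 5, "det"),
    row(5, "sofá", "sofá", "NOUN", 2, "obl"),
    row(6, ".", ".", "PUNCT", 2, "punct"),
    "",
    "# text = Ela leu o livro ontem.",
    row(1, "Ela", "ela", "PRON", 2, "nsubj"),
    row(2, "leu", "ler", "VERB", 0, "root"),
    row(3, "o", "o", "DET", 4, "det"),
    row(4, "livro", "livro", "NOUN", 2, "obj"),
    row(5, "ontem", "ontem", "ADV", 2, "obj"),
    row(6, ".", ".", "PUNCT", 2, "punct"),
])

if __name__ == "__main__":
    lexicon, suspects = build_lexicon(SAMPLE)
    for frame, lemmas in sorted(lexicon.items()):
        print(frame, sorted(lemmas))
    print("suspect annotations:", suspects)
```

Run on the toy sample, the sketch groups `dormir` under `('nsubj',)` and `comprar` under `('nsubj', 'obj')`. It places `ler` under `('nsubj', 'obj', 'obj')` and reports it as a suspect annotation. Keying the dictionary by frame makes it straightforward to list all verbs that would share one lexical type. An unusual frame, such as one with two objects, is only a cue for manual review and does not prove an error. Note that this sketch ignores `obl` dependents entirely, so prepositional complements (as in *gostar de*) would need additional handling.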
Notes
- 4. We consider this fair use, given the small amount of knowledge from these sources that was incorporated into PorGram.
- 9. One of the reviewers suggested the newly released BDCamões Treebank, whose dependency graphs are also available in UD format, but with a different tag set [17]. Despite its potential advantages, it is mostly a historical corpus, which limits its immediate usefulness here, since our main concern with PorGram is modern Portuguese. We therefore leave the use of BDCamões Treebank data for future work.
References
de Alencar, L.F.: BrGram: uma gramática computacional de um fragmento do português brasileiro no formalismo da LFG. In: Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology (2013)
Bender, E.M., Drellishak, S., Fokkens, A., Poulson, L., Saleem, S.: Grammar customization. Res. Lang. Comput. 8(1), 23–72 (2010)
Bender, E.M., Flickinger, D., Oepen, S.: The grammar matrix: an open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In: Carroll, J., Oostdijk, N., Sutcliffe, R. (eds.) Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, pp. 8–14. Taipei, Taiwan (2002)
Bick, E.: The Parsing System Palavras: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus University Press, Aarhus (2000)
Branco, A., Costa, F.: A computational grammar for deep linguistic processing of Portuguese: LXGram (version 5). University of Lisbon, Department of Informatics, Technical Report (2014)
Copestake, A.: Implementing typed feature structure grammars. CSLI, Stanford (2002)
Copestake, A., Flickinger, D., Pollard, C., Sag, I.A.: Minimal recursion semantics: an introduction. Res. Lang. Comput. 3(2), 281–332 (2005)
Costa, F., Branco, A.: LXGram: a deep linguistic processing grammar for Portuguese. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds.) PROPOR 2010. LNCS (LNAI), vol. 6001, pp. 86–89. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12320-7_11
Cunha, C., Cintra, L.: Nova Gramática do Português Contemporâneo. Nova Fronteira, Rio de Janeiro (1985)
Davis, A.R.: Linking by types in the hierarchical lexicon. CSLI, Stanford (2001)
Droganova, K., Zeman, D.: Towards deep universal dependencies. In: Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), pp. 144–152. Association for Computational Linguistics, Paris, August 2019
Emerson, G., Copestake, A.: Leveraging a semantically annotated corpus to disambiguate prepositional phrase attachment. In: Proceedings of the 11th International Conference on Computational Semantics (IWCS), Association for Computational Linguistics (2015)
Falk, Y.: Lexical-functional grammar: an introduction to parallel constraint-based syntax. CSLI, Stanford (2001)
Fernandes, F.: Dicionário de verbos e regimes, 35th edn. Globo, Rio de Janeiro (1991)
Ferrucci, D., et al.: Building Watson: an overview of the DeepQA project. AI Mag. 31(3), 59–79 (2010)
Flickinger, D.: On building a more efficient grammar by exploiting types. Nat. Lang. Eng. 6(1), 15–28 (2000)
Grilo, S., Bolrinha, M., Silva, J., Vaz, R., Branco, A.: The BDCamões collection of Portuguese literary documents: a research resource for digital humanities and language technology. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 849–854. European Language Resources Association, Marseille, France, May 2020. https://aclanthology.org/2020.lrec-1.106
Helbig, G., Schenkel, W.: Wörterbuch zur Valenz und Distribution deutscher Verben, 8th edn. Niemeyer, Tübingen (1991)
Kuhnle, A., Copestake, A.: ShapeWorld - a new test methodology for multimodal language understanding (2017). https://arxiv.org/abs/1704.04517
Lien, E., Kouylekov, M.: Semantic parsing for textual entailment. In: Proceedings of the 14th International Conference on Parsing Technologies, pp. 40–49. Association for Computational Linguistics, Bilbao, July 2015
de Marneffe, M.C., Manning, C.D., Nivre, J., Zeman, D.: Universal dependencies. Comput. Linguist. 47(2), 255–308 (2021)
Mateus, M.H.M., Brito, A.M., et al.: Gramática da língua Portuguesa. Caminho, Lisboa (1989)
McCord, M.C., Murdock, J.W., Boguraev, B.K.: Deep parsing in Watson. IBM J. Res. Dev. 56(3.4), 3-1 (2012)
Müller, S.: Grammatical Theory, 4th edn. Language Science Press, Berlin (2020)
Nunes, A.L., Rademaker, A., de Alencar, L.F.: Utilizando um dicionário morfológico para expandir a cobertura lexical de uma gramática do português no formalismo HPSG. In: Ruiz, E.E.S., Torrent, T.T. (eds.) Proceedings of the XIII Brazilian Symposium in Information and Human Language Technology, pp. 11–18. Departamento de Computação e Matemática, Universidade de São Paulo, Ribeirão Preto (2021)
de Paiva, V., Rademaker, A., de Melo, G.: OpenWordNet-PT: an open Brazilian WordNet for reasoning. In: Proceedings of COLING 2012: Demonstration Papers, pp. 353–360. The COLING 2012 Organizing Committee, Mumbai, India, December 2012. http://www.aclweb.org/anthology/C12-3044, also published as a technical report: http://hdl.handle.net/10438/10274
Rademaker, A., Chalub, F., Real, L., Freitas, C., Bick, E., de Paiva, V.: Universal dependencies for Portuguese. In: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling), pp. 197–206. Pisa, Italy, September 2017
Sag, I.A., Wasow, T., Bender, E.M.: Syntactic Theory: A Formal Introduction, 2nd edn. University of Chicago Press, Chicago (2003)
Schäfer, U., Kiefer, B., Spurk, C., Steffen, J., Wang, R.: The ACL anthology searchbench. In: Proceedings of the ACL-HLT 2011 System Demonstrations, pp. 7–13 (2011)
Siegel, M., Bender, E.M., Bond, F.: JACY: an implemented grammar of Japanese. CSLI (2016)
da Silva Borba, F.: Dicionário gramatical de verbos do português contemporâneo do Brasil, 2nd edn. Editora da UNESP, São Paulo (1991)
Straka, M., Straková, J.: Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 88–99. Association for Computational Linguistics, Vancouver, August 2017
Zamaraeva, O.: Assembling Syntax: Modeling Constituent Questions in a Grammar Engineering Framework. Ph.D. thesis, University of Washington (2021)
Čulo, O.: Automatische Extraktion von bilingualen Valenzwörterbüchern aus deutsch-englischen Parallelkorpora: Eine Pilotstudie. Universaar, Saarbrücken (2011)
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
de Alencar, L.F., Coutinho, L.R., da Silva, W.J.L., Nunes, A.L., Rademaker, A. (2022). Extracting Valences from a Dependency Treebank for Populating the Verb Lexicon of a Portuguese HPSG Grammar. In: Pinheiro, V., et al. (eds.) Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol. 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_34
DOI: https://doi.org/10.1007/978-3-030-98305-5_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98304-8
Online ISBN: 978-3-030-98305-5
eBook Packages: Computer Science, Computer Science (R0)