Extracting Valences from a Dependency Treebank for Populating the Verb Lexicon of a Portuguese HPSG Grammar

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2022)

Abstract

We propose a methodology to populate the verb type hierarchy of a deep computational grammar in the HPSG formalism using syntactic and morphological information from Universal Dependencies (UD) treebanks. It is exemplified by means of the UD Bosque corpus and PorGram, a computational grammar for Portuguese constructed in the LinGO Grammar Matrix framework, but it can be applied to analogous grammars of other languages using other UD treebanks. The main component of the methodology is a Python module that extracts from the annotated sentences the core arguments and other features that are relevant to determine verb valence. This module enables the creation of a Python dictionary that maps valence frames to verb objects. This dictionary facilitates not only determining which frames occur with which verbs, but also detecting annotation errors. The potential of the module for rapid expansion of the lexical coverage of PorGram and corpus annotation error detection is illustrated with concrete examples.
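The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' module: it uses only the Python standard library, reads CoNLL-U text directly, and maps each valence frame to a set of verb lemmas rather than to verb objects. The names `CORE_DEPRELS`, `read_sentences`, `valence_frames`, and the two-sentence `SAMPLE` are hypothetical, and treating exactly this set of relations as valence-relevant is an assumption made for illustration.

```python
from collections import defaultdict

# UD core-argument relations. Restricting the frame to this set is an
# assumption of the sketch, not a claim about the paper's module.
CORE_DEPRELS = {"nsubj", "csubj", "obj", "iobj", "ccomp", "xcomp"}

# Hypothetical two-sentence sample in CoNLL-U format (10 tab-separated columns).
SAMPLE = """\
# text = Maria deu o livro.
1\tMaria\tMaria\tPROPN\t_\t_\t2\tnsubj\t_\t_
2\tdeu\tdar\tVERB\t_\t_\t0\troot\t_\t_
3\to\to\tDET\t_\t_\t4\tdet\t_\t_
4\tlivro\tlivro\tNOUN\t_\t_\t2\tobj\t_\t_
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

# text = Ele dorme.
1\tEle\tele\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tdorme\tdormir\tVERB\t_\t_\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of token dicts."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line closes the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword-token ranges and empty nodes
            continue
        sentence.append({
            "id": int(cols[0]),
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if sentence:
        yield sentence


def valence_frames(text):
    """Map each frame (sorted tuple of core relations) to a set of verb lemmas."""
    frames = defaultdict(set)
    for sentence in read_sentences(text):
        for verb in sentence:
            if verb["upos"] != "VERB":
                continue
            # Keep subtyped labels (e.g. nsubj:pass) but test the base relation.
            frame = tuple(sorted(
                tok["deprel"]
                for tok in sentence
                if tok["head"] == verb["id"]
                and tok["deprel"].split(":")[0] in CORE_DEPRELS
            ))
            frames[frame].add(verb["lemma"])
    return dict(frames)


frames = valence_frames(SAMPLE)
```

On the sample, the frame `("nsubj", "obj")` maps to `{"dar"}` and `("nsubj",)` maps to `{"dormir"}`. Inverting the lookup, so that an unexpected frame for a given verb can be traced back to its sentences, is one way such a dictionary could support the annotation error detection mentioned in the abstract.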

Notes

  1. https://github.com/LR-POR/PorGram.

  2. https://michaelis.uol.com.br/.

  3. https://aulete.com.br/.

  4. We consider this to be fair use, due to the small amount of knowledge from these sources that was incorporated into PorGram.

  5. http://hdl.handle.net/11234/1-4611.

  6. http://stanza.run.

  7. http://ufal.mff.cuni.cz/udpipe.

  8. http://spacy.io.

  9. One of the reviewers suggested the newly released BDCamões Treebank, whose dependency graphs are also available in UD format, but with a different tag set [17]. Despite its potential advantages, it is mostly a historical corpus, which makes it not immediately useful in our context, since our main concern with PorGram is modern Portuguese. Therefore, we leave the use of BDCamões Treebank data for future work.

  10. https://universaldependencies.org/u/dep/.

  11. https://github.com/EmilStenstrom/conllu/.

  12. https://github.com/LR-POR/tools.

  13. https://ufal.mff.cuni.cz/deep-universal-dependencies.

References

  1. de Alencar, L.F.: BrGram: uma gramática computacional de um fragmento do português brasileiro no formalismo da LFG. In: Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology (2013)

  2. Bender, E.M., Drellishak, S., Fokkens, A., Poulson, L., Saleem, S.: Grammar customization. Res. Lang. Comput. 8(1), 23–72 (2010)

  3. Bender, E.M., Flickinger, D., Oepen, S.: The grammar matrix: an open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In: Carroll, J., Oostdijk, N., Sutcliffe, R. (eds.) Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, pp. 8–14. Taipei, Taiwan (2002)

  4. Bick, E.: The Parsing System Palavras: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus University Press, Aarhus (2000)

  5. Branco, A., Costa, F.: A computational grammar for deep linguistic processing of Portuguese: LXGram (version 5). University of Lisbon, Department of Informatics, Technical Report (2014)

  6. Copestake, A.: Implementing typed feature structure grammars. CSLI, Stanford (2002)

  7. Copestake, A., Flickinger, D., Pollard, C., Sag, I.A.: Minimal recursion semantics: an introduction. Res. Lang. Comput. 3(2), 281–332 (2005)

  8. Costa, F., Branco, A.: LXGram: a deep linguistic processing grammar for Portuguese. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds.) PROPOR 2010. LNCS (LNAI), vol. 6001, pp. 86–89. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12320-7_11

  9. Cunha, C., Cintra, L.: Nova Gramática do Português Contemporâneo. Nova Fronteira, Rio de Janeiro (1985)

  10. Davis, A.R.: Linking by types in the hierarchical lexicon. CSLI, Stanford (2001)

  11. Droganova, K., Zeman, D.: Towards deep universal dependencies. In: Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), pp. 144–152. Association for Computational Linguistics, Paris, August 2019

  12. Emerson, G., Copestake, A.: Leveraging a semantically annotated corpus to disambiguate prepositional phrase attachment. In: Proceedings of the 11th International Conference on Computational Semantics (IWCS), Association for Computational Linguistics (2015)

  13. Falk, Y.: Lexical-functional grammar: an introduction to parallel constraint-based syntax. CSLI, Stanford (2001)

  14. Fernandes, F.: Dicionário de verbos e regimes, 35th edn. Globo, Rio de Janeiro (1991)

  15. Ferrucci, D., et al.: Building Watson: an overview of the DeepQA project. AI Mag. 31(3), 59–79 (2010)

  16. Flickinger, D.: On building a more efficient grammar by exploiting types. Nat. Lang. Eng. 6(1), 15–28 (2000)

  17. Grilo, S., Bolrinha, M., Silva, J., Vaz, R., Branco, A.: The BDCamões collection of Portuguese literary documents: a research resource for digital humanities and language technology. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 849–854. European Language Resources Association, Marseille, France, May 2020. https://aclanthology.org/2020.lrec-1.106

  18. Helbig, G., Schenkel, W.: Wörterbuch zur Valenz und Distribution deutscher Verben, 8th edn. Niemeyer, Tübingen (1991)

  19. Kuhnle, A., Copestake, A.: Shapeworld - a new test methodology for multimodal language understanding (2017). https://arxiv.org/abs/1704.04517

  20. Lien, E., Kouylekov, M.: Semantic parsing for textual entailment. In: Proceedings of the 14th International Conference on Parsing Technologies, pp. 40–49. Association for Computational Linguistics, Bilbao, July 2015

  21. de Marneffe, M.C., Manning, C.D., Nivre, J., Zeman, D.: Universal dependencies. Comput. Linguist. 47(2), 255–308 (2021)

  22. Mateus, M.H.M., Brito, A.M., et al.: Gramática da língua Portuguesa. Caminho, Lisboa (1989)

  23. McCord, M.C., Murdock, J.W., Boguraev, B.K.: Deep parsing in Watson. IBM J. Res. Dev. 56(3.4), 3-1 (2012)

  24. Müller, S.: Grammatical Theory, 4th edn. Language Science Press, Berlin (2020)

  25. Nunes, A.L., Rademaker, A., de Alencar, L.F.: Utilizando um dicionário morfológico para expandir a cobertura lexical de uma gramática do português no formalismo HPSG. In: Ruiz, E.E.S., Torrent, T.T. (eds.) Proceedings of the XIII Brazilian Symposium in Information and Human Language Technology, pp. 11–18. Departamento de Computação e Matemática, Universidade de São Paulo, Ribeirão Preto (2021)

  26. de Paiva, V., Rademaker, A., de Melo, G.: OpenWordNet-PT: an open Brazilian WordNet for reasoning. In: Proceedings of COLING 2012: Demonstration Papers. pp. 353–360. The COLING 2012 Organizing Committee, Mumbai, India, December 2012. http://www.aclweb.org/anthology/C12-3044, published also as Techreport http://hdl.handle.net/10438/10274

  27. Rademaker, A., Chalub, F., Real, L., Freitas, C., Bick, E., de Paiva, V.: Universal dependencies for Portuguese. In: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling), pp. 197–206. Pisa, Italy, September 2017

  28. Sag, I.A., Wasow, T., Bender, E.M.: Syntactic Theory: A Formal Introduction, 2nd edn. University of Chicago Press, Chicago (2003)

  29. Schäfer, U., Kiefer, B., Spurk, C., Steffen, J., Wang, R.: The ACL anthology searchbench. In: Proceedings of the ACL-HLT 2011 System Demonstrations, pp. 7–13 (2011)

  30. Siegel, M., Bender, E.M., Bond, F.: JACY: an implemented grammar of Japanese. CSLI (2016)

  31. da Silva Borba, F.: Dicionário gramatical de verbos do português contemporâneo do Brasil, 2nd edn. Editora da UNESP, São Paulo (1991)

  32. Straka, M., Straková, J.: Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 88–99. Association for Computational Linguistics, Vancouver, August 2017

  33. Zamaraeva, O.: Assembling Syntax: Modeling Constituent Questions in a Grammar Engineering Framework. Ph.D. thesis, University of Washington (2021)

  34. Čulo, O.: Automatische Extraktion von bilingualen Valenzwörterbüchern aus deutsch-englischen Parallelkorpora: Eine Pilotstudie. Universaar, Saarbrücken (2011)

Author information

Corresponding author

Correspondence to Leonel Figueiredo de Alencar.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

de Alencar, L.F., Coutinho, L.R., da Silva, W.J.L., Nunes, A.L., Rademaker, A. (2022). Extracting Valences from a Dependency Treebank for Populating the Verb Lexicon of a Portuguese HPSG Grammar. In: Pinheiro, V., et al. Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_34

  • DOI: https://doi.org/10.1007/978-3-030-98305-5_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98304-8

  • Online ISBN: 978-3-030-98305-5

  • eBook Packages: Computer Science, Computer Science (R0)
