Adaptation of Data and Models for Probabilistic Parsing of Portuguese

Wing, Benjamin; Baldridge, Jason

doi:10.1007/11751984_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3960)

Included in the following conference series:

International Workshop on Computational Processing of the Portuguese Language

Abstract

We present the first results for recovering word-word dependencies from a probabilistic parser for Portuguese trained on and evaluated against human-annotated syntactic analyses. We use the Floresta Sintá(c)tica treebank with the Bikel multi-lingual parsing engine and evaluate performance on both PARSEVAL and unlabeled dependencies. We explore several configurations, varying both the parameterization of the parser and the enhancements made to the trees used for training it. Our best configuration achieves 80.6% dependency accuracy on unseen test material, well above adjacency baselines and on par with previous results for unlabeled dependencies.
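
As a rough illustration of the evaluation terms used in the abstract, the following Python sketch computes unlabeled dependency accuracy and builds a simple adjacency baseline. It is not the authors' evaluation code. The abstract does not define the baselines, so the functions `unlabeled_accuracy` and `adjacency_baseline`, the head-index encoding (0 for an artificial root), and the toy sentence are all assumptions made for illustration.

```python
from typing import List, Sequence


def unlabeled_accuracy(gold_heads: Sequence[int], pred_heads: Sequence[int]) -> float:
    """Fraction of words whose predicted head index matches the gold head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have equal length")
    if not gold_heads:
        return 0.0
    correct = sum(1 for g, p in zip(gold_heads, pred_heads) if g == p)
    return correct / len(gold_heads)


def adjacency_baseline(n_words: int, direction: str = "right") -> List[int]:
    """Attach every word to a neighbouring word (assumed baseline definition).

    Words are numbered 1..n_words and 0 marks an artificial root.
    direction="right": each word takes the following word as its head,
    and the last word attaches to the root.
    direction="left": each word takes the preceding word as its head,
    so the first word attaches to the root.
    """
    if direction == "right":
        return [i + 1 if i < n_words else 0 for i in range(1, n_words + 1)]
    if direction == "left":
        return [i - 1 for i in range(1, n_words + 1)]
    raise ValueError("direction must be 'left' or 'right'")


# Hypothetical toy sentence "O gato dorme":
# "O" depends on "gato", "gato" on "dorme", and "dorme" on the root.
gold = [2, 3, 0]
right = adjacency_baseline(len(gold), "right")
left = adjacency_baseline(len(gold), "left")
print("right-adjacency accuracy:", unlabeled_accuracy(gold, right))
print("left-adjacency accuracy:", unlabeled_accuracy(gold, left))
```

On a real test set the same accuracy function would be applied to the heads extracted from the parser's output trees, and the baseline figure gives a floor against which a score such as the reported 80.6% can be read.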

Author information

Authors and Affiliations

Department of Linguistics, University of Texas at Austin, Austin, TX, 78712, USA
Benjamin Wing & Jason Baldridge

Editor information

Editors and Affiliations

Pontifícia Universidade do Rio Grande do Sul, Porto Alegre, Brasil
Renata Vieira
Departamento de Informática, Universidade de Évora, Portugal
Paulo Quaresma
NILC-ICMC, University of São Paulo, CP 668P, 13560-970, São Carlos, SP, Brazil
Maria das Graças Volpe Nunes
L2F/INESC-ID Lisboa, Email: qa-clef@l2f.inesc-id.pt, Rua Alves Redol, 9, 1000-029, Lisboa, Portugal
Nuno J. Mamede
Instituto Militar de Engenharia, Praça General Tibúrcio, 80, Rio de Janeiro, Brazil
Cláudia Oliveira
Pontifícia Universidade Católica do Rio de Janeiro, Rua Marquês de São Vicente, 225, Rio de Janeiro, Brazil
Maria Carmelita Dias

About this paper

Cite this paper

Wing, B., Baldridge, J. (2006). Adaptation of Data and Models for Probabilistic Parsing of Portuguese. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_15

DOI: https://doi.org/10.1007/11751984_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34045-4
Online ISBN: 978-3-540-34046-1
eBook Packages: Computer Science (R0)