An Account of the Challenge of Tagging a Reference Corpus for Brazilian Portuguese

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2721)

Abstract

This article identifies and addresses the major linguistic and conceptual, as opposed to logistic, issues faced in the morphosyntactic tagging of MAC-Morpho, a 1.1-million-word Brazilian Portuguese corpus of newspaper articles developed in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria for establishing the tagset and analyze some interesting cases among the linguistic problems we faced in this work.

This project is partially funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). We are grateful to E. Bick for parsing MAC-Morpho.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aluísio, S., Pelizzoni, J., Marchi, A.R., de Oliveira, L., Manenti, R., Marquiafável, V. (2003). An Account of the Challenge of Tagging a Reference Corpus for Brazilian Portuguese. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_17

  • DOI: https://doi.org/10.1007/3-540-45011-4_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40436-1

  • Online ISBN: 978-3-540-45011-5

  • eBook Packages: Springer Book Archive
