
Automated Creation of a Medieval Portuguese Partial Treebank

Chapter in Treebanks

Part of the book series: Text, Speech and Language Technology (TLTB, volume 20)

Abstract

The growing trend towards corpus-based linguistics has led researchers to annotate large quantities of text manually. The human effort involved is often enormous and requires highly specialised, linguistically trained staff. In our view, a different approach should be followed: this highly trained manpower should be freed for other, more rewarding and creative activities, in a constructive dialogue among the various kinds of expertise needed to overcome our ignorance about languages. As an experiment, we used tools and linguistic resources previously built for Contemporary Portuguese to partially automate the partial annotation of a Medieval Portuguese corpus. In this paper, we describe the tools used (a POS tagger, a lexical analyser and a partial parser) and demonstrate that the similarities between two historical stages of the same language are sufficient for bootstrapping the annotation and for acquiring lexical knowledge from the automatically annotated, partially parsed corpus.
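To make "acquiring lexical knowledge from an automatically annotated corpus" concrete, here is a minimal sketch under our own assumptions. It is not the authors' method. We assume that a tagger built for the contemporary language has already labelled the medieval text. Word forms missing from the contemporary lexicon are collected, and for each one we propose the tag the tagger assigned most often. The function name `acquire_lexicon`, the tag set, the `min_count` threshold and the sample forms (`hũa`, `cousa`) are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def acquire_lexicon(tagged_sentences, known_lexicon, min_count=2):
    """Propose lexical entries for word forms absent from known_lexicon.

    Hypothetical sketch: tagged_sentences is a list of sentences, each a list
    of (form, tag) pairs as a contemporary-language tagger might output them
    on older text. For every unknown form, count the tags it received across
    the corpus and keep the most frequent one if it occurred at least
    min_count times.
    """
    tag_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for form, tag in sentence:
            form = form.lower()
            if form not in known_lexicon:
                tag_counts[form][tag] += 1

    proposed = {}
    for form, counts in tag_counts.items():
        best_tag, freq = counts.most_common(1)[0]
        if freq >= min_count:  # ignore forms seen too rarely to trust
            proposed[form] = best_tag
    return proposed


# Illustrative input: archaic spellings unknown to the (toy) modern lexicon.
sentences = [
    [("Era", "V"), ("hũa", "DET"), ("cousa", "N")],
    [("hũa", "DET"), ("cousa", "N"), ("boa", "ADJ")],
    [("hũa", "PRON"), ("vez", "N")],
]
print(acquire_lexicon(sentences, {"era", "boa", "vez"}))
```

With the default threshold, `hũa` is proposed as `DET` (two votes against one for `PRON`) and `cousa` as `N`. Raising `min_count` trades coverage for precision: fewer forms are proposed, but each proposal rests on more evidence. A real system would also use the partial parser's output rather than isolated tags.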




Copyright information

© 2003 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Rocio, V., Alves, M.A., Lopes, J.G., Xavier, M.F., Vicente, G. (2003). Automated Creation of a Medieval Portuguese Partial Treebank. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_12


  • DOI: https://doi.org/10.1007/978-94-010-0201-1_12

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-1335-5

  • Online ISBN: 978-94-010-0201-1

  • eBook Packages: Springer Book Archive
