Grammatical Annotation of Historical Portuguese: Generating a Corpus-Based Diachronic Dictionary

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9924)


In this paper, we present an automatic system for the morphosyntactic annotation and lexicographical evaluation of historical Portuguese corpora. Using rule-based orthographical normalization, we were able to apply a standard parser (PALAVRAS) to historical data (the Colonia corpus) and to achieve accurate annotation for both POS and syntax. By aligning original and standardized word forms, our method makes it possible to create tailor-made standardization dictionaries for historical Portuguese, optionally with per-period or per-author frequencies.
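
The idea of pairing each original spelling with its standardized form and counting variants per period can be illustrated with a minimal sketch. The rewrite rules, function names, and sample tokens below are assumptions made for illustration only; they are not the rule set or data format of the system described above, and the parsing step with PALAVRAS is not shown.

```python
import re
from collections import Counter, defaultdict

# Hypothetical rewrite rules (pattern, replacement). Illustrative only:
# these are NOT the rules used by the system described in the abstract.
RULES = [
    (re.compile(r"ph"), "f"),      # pharmacia -> farmacia (accents are not restored)
    (re.compile(r"th"), "t"),      # theatro -> teatro
    (re.compile(r"ll"), "l"),      # elle -> ele
    (re.compile(r"^hum$"), "um"),  # hum -> um (whole-word rule)
]


def normalize(token: str) -> str:
    """Lowercase a token and apply every rewrite rule in order."""
    form = token.lower()
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form


def build_dictionary(tokens_by_period):
    """Map standardized form -> original variant -> Counter of periods.

    Only tokens whose spelling was changed by normalization are recorded.
    """
    dictionary = defaultdict(lambda: defaultdict(Counter))
    for period, tokens in tokens_by_period.items():
        for token in tokens:
            original = token.lower()
            standard = normalize(token)
            if original != standard:
                dictionary[standard][original][period] += 1
    return dictionary


# Toy input: tokens grouped by period (invented for this example).
corpus = {
    "18th century": ["elle", "hum", "theatro", "elle"],
    "19th century": ["elle", "pharmacia", "um"],
}
entries = build_dictionary(corpus)

for standard, variants in sorted(entries.items()):
    for original, periods in sorted(variants.items()):
        print(standard, original, dict(periods))
```

Keeping the original form next to the standardized one is what lets such a dictionary be filtered by period or author afterwards; a real system would also need to handle accents, context-dependent rules, and ambiguous forms.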


Historical corpus · Corpus annotation · Dictionary



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. University of Southern Denmark, Odense, Denmark
  2. Saarland University, Saarbrücken, Germany
  3. German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
