Language Resources and Evaluation, Volume 47, Issue 4, pp 1327–1342

An open diachronic corpus of historical Spanish

  • Felipe Sánchez-Martínez
  • Isabel Martínez-Sempere
  • Xavier Ivars-Ribes
  • Rafael C. Carrasco
Project Note

DOI: 10.1007/s10579-013-9239-y

Cite this article as:
Sánchez-Martínez, F., Martínez-Sempere, I., Ivars-Ribes, X. et al. Lang Resources & Evaluation (2013) 47: 1327. doi:10.1007/s10579-013-9239-y

Abstract

The IMPACT-es diachronic corpus of historical Spanish compiles over one hundred books, containing approximately 8 million words, together with a complementary lexicon that links more than 10,000 lemmas with attestations of the different variants found in the documents. The textual corpus and the accompanying lexicon have been released under an open license (Creative Commons by-nc-sa) to permit their intensive exploitation in linguistic research. Approximately 7% of the words in the corpus (a selection aimed at enhancing the coverage of the most frequent word forms) have been annotated with their lemma, part of speech, and modern equivalent. This paper describes the annotation criteria followed and the standards, based on the Text Encoding Initiative recommendations, used to represent the texts in digital form.

Keywords

Diachronic corpus, Historical Spanish, Linguistic annotation, TEI

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Felipe Sánchez-Martínez (1)
  • Isabel Martínez-Sempere (1)
  • Xavier Ivars-Ribes (1)
  • Rafael C. Carrasco (1)

  1. Dep. de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant, Spain
